Using Geo Queries and Date Queries in Elasticsearch

Ridwan Fajar, 16 January 2017

This time we will try out geo queries and date queries. We will use several filters to retrieve documents based on their geolocation, and we will also bucket the documents by a date-typed field. In this tutorial you will use data on earthquakes in the United States provided by data.gov.us. The data has already been adapted for Elasticsearch, so you only need to download it.

Once you understand geo queries, you will see how useful Elasticsearch is for the specific task of processing data that carries a geolocation.

##1. Preparation

First, install elasticdump, which we will use to load the mapping and the earthquake dataset used in this tutorial.

$ npm install elasticdump -g

Then download the data from the following link: earthquake.zip

Then run the following commands to create the demo index and load the earthquake mapping and data into Elasticsearch.

$ curl -XPUT http://localhost:9200/demo
$ elasticdump --input earthquake-mapping.json --output http://localhost:9200/demo/earthquake --type=mapping
$ elasticdump --input earthquake-data.json --output http://localhost:9200/demo/earthquake --type=data

If nothing went wrong, you can check the mapping as follows:

$ curl -XGET http://localhost:9200/demo/earthquake/_mapping?pretty
{
  "demo" : {
    "mappings" : {
      "earthquake" : {
        "properties" : {
          "depth" : {
            "type" : "float"
          },
          "distance" : {
            "type" : "float"
          },
          "event_at" : {
            "type" : "date"
          },
          "location" : {
            "type" : "geo_point"
          },
          "magnitude" : {
            "type" : "float"
          },
          "source" : {
            "type" : "keyword"
          }
        }
      }
    }
  }
}

##2. Geo bounding box query

As the name suggests, a bounding box is an imaginary rectangle that filters documents by whether their location falls within its boundaries. It is defined by a top left and a bottom right location, and the query returns the documents located inside that rectangle. Suppose there is a file named query.json that contains the following query:

{	
	"size":5,
	"query":{
		"bool": {
			"must":{
				"match_all":{}
			},
			"filter":{
				"geo_bounding_box":{
					"location":{
						"top_left":{
							"lat":38.6,
							"lon":-123
						},
						"bottom_right":{
							"lat":33,
							"lon":-116
						}
					}
				}
			}
		}
	}
}
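
The geo filters in this tutorial all share the same wrapper: a match_all clause under must and a single geo filter under filter. As a minimal sketch of how such a body could be generated instead of typed by hand, the Python helper below builds the bounding box query and, when run as a script, writes it to query.json. The name build_geo_query is a hypothetical helper for illustration, not part of any Elasticsearch client.

```python
import json


def build_geo_query(filter_name, filter_body, size=5):
    """Wrap one geo filter in the bool/must/filter structure used in this tutorial.

    build_geo_query is a hypothetical helper for illustration only.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {filter_name: filter_body},
            }
        },
    }


# The bounding box from the query above, applied to the "location" field.
bounding_box = {
    "location": {
        "top_left": {"lat": 38.6, "lon": -123},
        "bottom_right": {"lat": 33, "lon": -116},
    }
}

query = build_geo_query("geo_bounding_box", bounding_box)

if __name__ == "__main__":
    # Write the body to query.json so curl can send it with -d '@query.json'.
    with open("query.json", "w") as handle:
        json.dump(query, handle, indent=2)
```

The same helper could wrap a geo_polygon or geo_distance body by changing only the first two arguments.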

Now let's execute the query in the console:

$ curl -XGET http://localhost:9200/demo/earthquake/_search?pretty -d '@query.json'
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 8745,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "3750ed8a-b8a9-456f-aee7-2b622210692c",
        "_score" : 1.0,
        "_source" : {
          "distance" : "",
          "event_at" : "2016-01-04T00:47:38.45Z",
          "source" : "CI",
          "depth" : "3.27",
          "magnitude" : "1.51",
          "location" : {
            "lat" : "33.2773",
            "lon" : "-116.2567"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "3ac25c4f-8825-411e-ac45-b76c8e59dfb7",
        "_score" : 1.0,
        "_source" : {
          "distance" : "",
          "event_at" : "2016-01-04T05:06:17.79Z",
          "source" : "CI",
          "depth" : "14.59",
          "magnitude" : "1.04",
          "location" : {
            "lat" : "33.6695",
            "lon" : "-116.7675"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "c9b0685e-e0a5-42ec-b64d-a915b5398195",
        "_score" : 1.0,
        "_source" : {
          "distance" : "3",
          "event_at" : "2016-01-04T08:13:15.03Z",
          "source" : "NC",
          "depth" : "5.73",
          "magnitude" : "1",
          "location" : {
            "lat" : "37.5767",
            "lon" : "-118.8572"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "7f1cd2e9-6a5c-4aa5-bc51-6cc60bbd3b56",
        "_score" : 1.0,
        "_source" : {
          "distance" : "4",
          "event_at" : "2016-01-04T13:41:40.23Z",
          "source" : "NC",
          "depth" : "2.52",
          "magnitude" : "1.69",
          "location" : {
            "lat" : "36.0968",
            "lon" : "-120.678"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "8ea12b68-b570-42fa-a1bd-b892669db758",
        "_score" : 1.0,
        "_source" : {
          "distance" : "8",
          "event_at" : "2016-01-04T19:23:19.45Z",
          "source" : "NC",
          "depth" : "9.34",
          "magnitude" : "1.21",
          "location" : {
            "lat" : "36.7085",
            "lon" : "-121.3883"
          }
        }
      }
    ]
  }
}
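
To make the filter's meaning concrete, here is a small Python sketch of the test a bounding box expresses: a point matches when its latitude lies between the bottom and top edges and its longitude between the left and right edges. This is only an illustration of the idea, not how Elasticsearch evaluates the filter internally, and it assumes the box does not cross the 180th meridian. The function name in_bounding_box is made up for this example.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Return True if (lat, lon) lies inside the box, edges included.

    Illustrative only; assumes the box does not cross the 180th meridian.
    """
    return (
        bottom_right["lat"] <= lat <= top_left["lat"]
        and top_left["lon"] <= lon <= bottom_right["lon"]
    )


# Corners taken from the query in this section.
top_left = {"lat": 38.6, "lon": -123}
bottom_right = {"lat": 33, "lon": -116}

# Location of the first hit in the output above: inside the box.
first_hit_inside = in_bounding_box(33.2773, -116.2567, top_left, bottom_right)

# A made-up point north of the box: outside.
made_up_inside = in_bounding_box(40.0, -120.0, top_left, bottom_right)

print(first_hit_inside, made_up_inside)
```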

##3. Geo polygon query

While a geo bounding box can only describe an imaginary rectangle, geo polygon lets you draw a boundary of any shape. You only need to specify the geolocations that form the vertices of the polygon, and the documents will be filtered by that shape.

{	
	"size":5,
	"query":{
		"bool": {
			"must":{
				"match_all":{}
			},
			"filter":{
				"geo_polygon":{
					"location":{
						"points":[
							{"lat":38.6, "lon":-123},
							{"lat":38.6, "lon":-118},
							{"lat":35.5, "lon":-117},
							{"lat":30.6, "lon":-110}
						]
					}
				}
			}
		}
	}
}

Executing the query above produces the following output:

$ curl -XGET http://localhost:9200/demo/earthquake/_search?pretty -d '@query.json'
{
  "took" : 36,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2729,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "c9b0685e-e0a5-42ec-b64d-a915b5398195",
        "_score" : 1.0,
        "_source" : {
          "distance" : "3",
          "event_at" : "2016-01-04T08:13:15.03Z",
          "source" : "NC",
          "depth" : "5.73",
          "magnitude" : "1",
          "location" : {
            "lat" : "37.5767",
            "lon" : "-118.8572"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "d775cddc-fd6d-41bd-a6d6-38856f7bf74c",
        "_score" : 1.0,
        "_source" : {
          "distance" : "",
          "event_at" : "2016-01-04T23:30:54.39Z",
          "source" : "CI",
          "depth" : "2.79",
          "magnitude" : "1.75",
          "location" : {
            "lat" : "35.2023",
            "lon" : "-117.2765"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "ab75e9fd-96b9-4828-ae27-6645d3e0d085",
        "_score" : 1.0,
        "_source" : {
          "distance" : "27",
          "event_at" : "2016-01-05T05:06:22.56Z",
          "source" : "NC",
          "depth" : "19.88",
          "magnitude" : "2.8",
          "location" : {
            "lat" : "37.33",
            "lon" : "-120.0315"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "e5ab07c3-25ce-409d-ab7a-78bc427d5a4b",
        "_score" : 1.0,
        "_source" : {
          "distance" : "",
          "event_at" : "2016-01-05T09:56:31.56Z",
          "source" : "CI",
          "depth" : "2.6",
          "magnitude" : "1.24",
          "location" : {
            "lat" : "34.6682",
            "lon" : "-116.308"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "0d705c9d-3e23-44c8-aa45-7f190df82319",
        "_score" : 1.0,
        "_source" : {
          "distance" : "23",
          "event_at" : "2016-01-05T13:31:55.96Z",
          "source" : "NN",
          "depth" : "11.94",
          "magnitude" : "1.79",
          "location" : {
            "lat" : "37.1588",
            "lon" : "-117.9696"
          }
        }
      }
    ]
  }
}
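
The idea behind a polygon filter can be sketched with the classic ray-casting test: draw a line from the point toward the east and count how many polygon edges it crosses; an odd count means the point is inside. The Python below treats latitude and longitude as flat coordinates, which is an approximation chosen for this illustration and not a description of Elasticsearch's own implementation. The name in_polygon is hypothetical.

```python
def in_polygon(lat, lon, points):
    """Ray-casting test on a flat lat/lon plane (an approximation)."""
    inside = False
    count = len(points)
    for i in range(count):
        a = points[i]
        b = points[(i + 1) % count]  # wrap around to close the polygon
        # Does the edge a-b straddle the horizontal line through the point?
        if (a["lat"] > lat) != (b["lat"] > lat):
            # Longitude at which the edge crosses that line.
            ratio = (lat - a["lat"]) / (b["lat"] - a["lat"])
            cross_lon = a["lon"] + ratio * (b["lon"] - a["lon"])
            if lon < cross_lon:
                inside = not inside
    return inside


# Vertices taken from the query in this section.
polygon = [
    {"lat": 38.6, "lon": -123},
    {"lat": 38.6, "lon": -118},
    {"lat": 35.5, "lon": -117},
    {"lat": 30.6, "lon": -110},
]

# First hit of the polygon output above.
print(in_polygon(37.5767, -118.8572, polygon))
# First hit of the bounding box output, which the polygon excludes.
print(in_polygon(33.2773, -116.2567, polygon))
```

The second point shows the practical difference between the two filters: it lies inside the rectangle from the previous section but outside this polygon.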

##4. Geo distance query

Now we will use a single geolocation and search within a circular (radial) area. You only need to specify the center point of the circle and the distance that serves as its radius.

{	
	"size":5,
	"query":{
		"bool": {
			"must":{
				"match_all":{}
			},
			"filter":{
				"geo_distance":{
					"distance":"100km",
					"location":{
						"lat":38.6, 
						"lon":-123
					}
				}
			}
		}
	}
}

Executing it produces the following output:

$ curl -XGET http://localhost:9200/demo/earthquake/_search?pretty -d '@query.json'
{
  "took" : 115,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3595,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "feeed7d8-84b6-4991-aff5-e4be8ed0fcf8",
        "_score" : 1.0,
        "_source" : {
          "distance" : "2",
          "event_at" : "2016-01-04T18:42:48.20Z",
          "source" : "NC",
          "depth" : "2.68",
          "magnitude" : "1.01",
          "location" : {
            "lat" : "38.7565",
            "lon" : "-122.727"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "bdd288fa-5000-4734-84e7-cb1e37472102",
        "_score" : 1.0,
        "_source" : {
          "distance" : "0",
          "event_at" : "2016-01-04T23:05:31.85Z",
          "source" : "NC",
          "depth" : "3.28",
          "magnitude" : "1.32",
          "location" : {
            "lat" : "38.809",
            "lon" : "-122.7928"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "ba2c7de1-7c0c-4a23-b4b0-bf2df31141a9",
        "_score" : 1.0,
        "_source" : {
          "distance" : "4",
          "event_at" : "2016-01-05T08:49:54.71Z",
          "source" : "NC",
          "depth" : "11.3",
          "magnitude" : "1.71",
          "location" : {
            "lat" : "38.0877",
            "lon" : "-122.8457"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "9b89ecd7-1ffa-4cf5-b382-08aa88f73ab6",
        "_score" : 1.0,
        "_source" : {
          "distance" : "2",
          "event_at" : "2016-01-06T08:23:47.93Z",
          "source" : "NC",
          "depth" : "1.01",
          "magnitude" : "1.57",
          "location" : {
            "lat" : "38.8008",
            "lon" : "-122.7713"
          }
        }
      },
      {
        "_index" : "demo",
        "_type" : "earthquake",
        "_id" : "cc9047a7-faf2-4800-a35a-68457f4ff387",
        "_score" : 1.0,
        "_source" : {
          "distance" : "2",
          "event_at" : "2016-01-06T12:55:18.19Z",
          "source" : "NC",
          "depth" : "1.53",
          "magnitude" : "1.08",
          "location" : {
            "lat" : "38.8413",
            "lon" : "-122.8398"
          }
        }
      }
    ]
  }
}
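
As a rough sketch of what a radius filter computes, the Python below measures the great-circle distance between the center and a point with the haversine formula and compares it to the radius. The mean Earth radius of 6371 km and the function names are assumptions of this example; Elasticsearch's own distance calculation may differ in detail.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(lat, lon, center, radius_km):
    """Return True if the point is at most radius_km from the center."""
    return haversine_km(center["lat"], center["lon"], lat, lon) <= radius_km


# Center and radius taken from the query in this section.
center = {"lat": 38.6, "lon": -123}

# First hit of the output above: well within 100 km.
print(within_distance(38.7565, -122.727, center, 100))
# A hit from the bounding box section in southern California: too far away.
print(within_distance(33.2773, -116.2567, center, 100))
```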

##5. Geo distance aggregation

Now we will group documents by distance ranges. We use the geo distance aggregation to count the documents that fall into each range of distance from an origin point.

{	
	"size":0,
	"query":{
		"match_all":{}
	},
	"aggregations":{
		"geo_stats":{
			"geo_distance":{
				"field":"location",
				"origin":{
					"lat":37.2,
					"lon":-118.8
				},
				"ranges":[
					{"from":20},
					{"from":20, "to":40},
					{"from":40, "to":60},
					{"from":60, "to":80},
					{"from":80, "to":100},
					{"from":100}
				]
			}
		}
	}
}

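Two details are worth noting. The values in ranges are interpreted in meters unless a unit parameter (for example "unit":"km") is set, and the first range, {"from":20}, has no upper bound, so it overlaps every other range. In the output below, every earthquake lies more than 100 m from the origin, so all 38103 documents fall into the two open-ended buckets and the bounded buckets stay empty: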

$ curl -XGET http://localhost:9200/demo/earthquake/_search?pretty -d '@query.json'
{
  "took" : 102,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 38103,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "geo_stats" : {
      "buckets" : [
        {
          "key" : "20.0-*",
          "from" : 20.0,
          "doc_count" : 38103
        },
        {
          "key" : "20.0-40.0",
          "from" : 20.0,
          "to" : 40.0,
          "doc_count" : 0
        },
        {
          "key" : "40.0-60.0",
          "from" : 40.0,
          "to" : 60.0,
          "doc_count" : 0
        },
        {
          "key" : "60.0-80.0",
          "from" : 60.0,
          "to" : 80.0,
          "doc_count" : 0
        },
        {
          "key" : "80.0-100.0",
          "from" : 80.0,
          "to" : 100.0,
          "doc_count" : 0
        },
        {
          "key" : "100.0-*",
          "from" : 100.0,
          "doc_count" : 38103
        }
      ]
    }
  }
}
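
The bucketing rule itself is simple enough to sketch in Python: each range counts the distances that are at least from and strictly less than to, and a missing bound is treated as open-ended. The helper bucket_counts and the sample distances are made up for this illustration; only the ranges come from the query above.

```python
def bucket_counts(distances, ranges):
    """Count distances per range; "from" is inclusive, "to" is exclusive."""
    buckets = []
    for spec in ranges:
        low = spec.get("from")
        high = spec.get("to")
        count = sum(
            1
            for d in distances
            if (low is None or d >= low) and (high is None or d < high)
        )
        low_label = "*" if low is None else str(float(low))
        high_label = "*" if high is None else str(float(high))
        buckets.append({"key": f"{low_label}-{high_label}", "doc_count": count})
    return buckets


# Ranges exactly as written in the query of this section.
ranges = [
    {"from": 20},
    {"from": 20, "to": 40},
    {"from": 40, "to": 60},
    {"from": 60, "to": 80},
    {"from": 80, "to": 100},
    {"from": 100},
]

# Made-up distances in meters, not taken from the dataset.
sample_distances = [5, 25, 150]

for bucket in bucket_counts(sample_distances, ranges):
    print(bucket["key"], bucket["doc_count"])
```

With these sample values the distance 25 is counted in both the first and the second range, which shows how an open-ended first range overlaps the bounded ones.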

##6. Date histogram aggregation

In addition, you can aggregate on a date field by interval. The interval can be set to month, quarter, year, and so on.

{	
	"size":0,
	"query":{
		"match_all":{}
	},
	"aggregations":{
		"date_stats":{
			"date_histogram":{
				"field":"event_at",
				"interval":"month"
			}
		}
	}
}

Executing it produces the following output:

$ curl -XGET http://localhost:9200/demo/earthquake/_search?pretty -d '@query.json'
{
  "took" : 107,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 38103,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_stats" : {
      "buckets" : [
        {
          "key_as_string" : "2016-01-01T00:00:00.000Z",
          "key" : 1451606400000,
          "doc_count" : 3911
        },
        {
          "key_as_string" : "2016-02-01T00:00:00.000Z",
          "key" : 1454284800000,
          "doc_count" : 3681
        },
        {
          "key_as_string" : "2016-03-01T00:00:00.000Z",
          "key" : 1456790400000,
          "doc_count" : 3767
        },
        {
          "key_as_string" : "2016-04-01T00:00:00.000Z",
          "key" : 1459468800000,
          "doc_count" : 3903
        },
        {
          "key_as_string" : "2016-05-01T00:00:00.000Z",
          "key" : 1462060800000,
          "doc_count" : 3618
        },
        {
          "key_as_string" : "2016-06-01T00:00:00.000Z",
          "key" : 1464739200000,
          "doc_count" : 3684
        },
        {
          "key_as_string" : "2016-07-01T00:00:00.000Z",
          "key" : 1467331200000,
          "doc_count" : 3517
        },
        {
          "key_as_string" : "2016-08-01T00:00:00.000Z",
          "key" : 1470009600000,
          "doc_count" : 3898
        },
        {
          "key_as_string" : "2016-09-01T00:00:00.000Z",
          "key" : 1472688000000,
          "doc_count" : 4371
        },
        {
          "key_as_string" : "2016-10-01T00:00:00.000Z",
          "key" : 1475280000000,
          "doc_count" : 2783
        },
        {
          "key_as_string" : "2016-11-01T00:00:00.000Z",
          "key" : 1477958400000,
          "doc_count" : 970
        }
      ]
    }
  }
}

The same aggregation can also be run with a quarter interval:

{
	"size":0,
	"query":{
		"match_all":{}
	},
	"aggregations":{
		"date_stats":{
			"date_histogram":{
				"field":"event_at",
				"interval":"quarter"
			}
		}
	}
}

Executing it produces the following output:

$ curl -XGET http://localhost:9200/demo/earthquake/_search?pretty -d '@query.json'
{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 38103,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_stats" : {
      "buckets" : [
        {
          "key_as_string" : "2016-01-01T00:00:00.000Z",
          "key" : 1451606400000,
          "doc_count" : 11359
        },
        {
          "key_as_string" : "2016-04-01T00:00:00.000Z",
          "key" : 1459468800000,
          "doc_count" : 11205
        },
        {
          "key_as_string" : "2016-07-01T00:00:00.000Z",
          "key" : 1467331200000,
          "doc_count" : 11786
        },
        {
          "key_as_string" : "2016-10-01T00:00:00.000Z",
          "key" : 1475280000000,
          "doc_count" : 3753
        }
      ]
    }
  }
}
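
What the two intervals do can be sketched in Python: each timestamp is rounded down to the first day of its month or quarter, and the documents are counted per rounded date. The function names and the last two sample timestamps are made up for this illustration, and unlike the real aggregation this sketch only emits buckets that contain at least one timestamp.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_event_at(text):
    """Parse timestamps shaped like 2016-01-04T00:47:38.45Z as UTC."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def bucket_start(moment, interval):
    """Round a datetime down to the start of its month or quarter."""
    if interval == "month":
        month = moment.month
    elif interval == "quarter":
        month = 3 * ((moment.month - 1) // 3) + 1
    else:
        raise ValueError("this sketch only supports month and quarter")
    return datetime(moment.year, month, 1, tzinfo=timezone.utc)


def date_histogram(timestamps, interval):
    """Count timestamps per bucket; keys are epoch milliseconds."""
    counts = Counter(bucket_start(parse_event_at(t), interval) for t in timestamps)
    return [
        {"key": int(start.timestamp() * 1000), "doc_count": counts[start]}
        for start in sorted(counts)
    ]


timestamps = [
    "2016-01-04T00:47:38.45Z",  # from the search output earlier in the tutorial
    "2016-01-05T05:06:22.56Z",  # from the search output earlier in the tutorial
    "2016-02-10T10:00:00.00Z",  # made up for this example
    "2016-05-20T08:30:00.00Z",  # made up for this example
]

print(date_histogram(timestamps, "month"))
print(date_histogram(timestamps, "quarter"))
```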

##7. References

  • Elasticsearch official documentation
  • Tutorialspoint - Elasticsearch

(arslan/elasticsearch)