I will use Logstash to model the data and translate it into JSON so that it can be sent to Elasticsearch. On top of that, I will show how to create some simple visualizations with Kibana.
Let's start. Here is a single row from the geonames dataset, in tab-delimited CSV format:
2986043 Pic de Font Blanca Pic de Font Blanca Pic de Font Blanca,Pic du Port 42.64991 1.53335 T PK AD 00 02860 Europe/Andorra 2014-11-05
- geonameid: integer id of the record in the geonames database
- name: name of the geographical point (utf8), varchar(200)
- asciiname: name of the geographical point in plain ascii characters, varchar(200)
- alternatenames: alternate names, comma separated, ascii names
- latitude: latitude in decimal degrees (wgs84)
- longitude: longitude in decimal degrees (wgs84)
- feature class: see http://www.geonames.org/export/codes.html, char(1)
- feature code: see http://www.geonames.org/export/codes.html, varchar(10)
- country code: ISO-3166 2-letter country code, 2 characters
- cc2: alternate country codes, comma separated, ISO-3166 2-letter country codes, 200 characters
- admin1 code: fips code; see file admin1Codes.txt for the display names of this code, varchar(20)
- admin2 code: code for the second administrative division, varchar(80)
- admin3 code: code for the third-level administrative division, varchar(20)
- admin4 code: code for the fourth-level administrative division, varchar(20)
- population: bigint (8 byte int)
- elevation: in meters, integer
- dem: digital elevation model, srtm3 or gtopo30, integer
- timezone: the iana timezone id (see file timeZone.txt), varchar(40)
- modification date: date of last modification in yyyy-MM-dd format
Converting each field to the appropriate data type is necessary so that we can use these fields properly in Elasticsearch, for example for numeric range queries on population or geo queries on the coordinates.
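As a rough sketch, the conversions could be done in the Logstash filter section with the csv filter followed by a mutate/convert. The column names below follow the field list and the output document shown further down; the exact configuration used in this series may differ:

filter {
  csv {
    # The geonames file is tab delimited; the separator below is a literal tab character.
    separator => "	"
    columns => [
      "geonameid", "name", "asciiname", "alternatenames",
      "latitude", "longitude", "feature_class", "feature_code",
      "country_code", "country_code_alternate", "admin1_code", "admin2_code",
      "admin3_code", "admin4_code", "population", "elevation",
      "dem", "timezone", "modification date"
    ]
  }
  mutate {
    # Convert the numeric columns so Elasticsearch does not index them as strings.
    convert => {
      "geonameid"  => "integer"
      "latitude"   => "float"
      "longitude"  => "float"
      "population" => "integer"
      "elevation"  => "integer"
      "dem"        => "integer"
    }
  }
}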
Each row of the CSV file has a unique id (geonameid). Because it is unique, we use it as the document_id in Elasticsearch. This way, the next time we load a new geonames file, any id that already exists in the index is updated in place instead of a new document being inserted.
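A minimal sketch of the matching elasticsearch output block (the host and index name here are assumptions for illustration):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "geonames"
    # Using the unique geonames id as the document id means re-loading
    # a file updates existing documents instead of creating duplicates.
    document_id => "%{geonameid}"
  }
}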
So we have transformed the data from this input:
2986043 Pic de Font Blanca Pic de Font Blanca Pic de Font Blanca,Pic du Port 42.64991 1.53335 T PK AD 00 02860 Europe/Andorra 2014-11-05
into this JSON document:
{
  "@timestamp": "2018-10-02T19:56:12.263Z",
  "feature_name": "peak",
  "country_code": "AD",
  "feature_code": "PK",
  "name": "Pic de Font Blanca",
  "geonameid": 2986043,
  "position": {
    "lon": 1.53335,
    "lat": 42.64991
  },
  "feature": "T.PK",
  "timezone": "Europe/Andorra",
  "feature_class": "T",
  "modification date": "2014-11-05",
  "elevation": null,
  "country_code_alternate": null,
  "population": 0,
  "asciiname": "Pic de Font Blanca"
}
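The position and feature fields in the document above are derived rather than read directly from the CSV. Here is a hedged sketch of how they could be built in the filter section; the field names follow the output shown, the real pipeline may differ, and feature_name would need an additional lookup against the feature codes page:

filter {
  # Nest latitude/longitude under "position" so the field can be
  # mapped as a geo_point in Elasticsearch.
  mutate {
    rename => {
      "latitude"  => "[position][lat]"
      "longitude" => "[position][lon]"
    }
  }
  # Combine the class and code into the "feature" key, e.g. "T.PK".
  mutate {
    add_field => { "feature" => "%{feature_class}.%{feature_code}" }
  }
}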
Have a look at the documentation for Logstash at: https://www.elastic.co/guide/en/logstash/current/index.html
Here are the other parts:
- https://datamelt.weebly.com/blog/elasticsearch-a-practical-example-part-2
- https://datamelt.weebly.com/blog/elasticsearch-a-practical-example-part-3
Carpe Diem