In the meantime, I have added the following data to my processing chain with Logstash, Elasticsearch and Kibana:
- Administrative Division, 1st to 4th order
- Continent
- Continent Code
- Remove double quotes from the file. There appear to be some unbalanced quotes in the file, so I remove them all (using sed).
- Create two lookup CSV files for the Administrative Division 3rd and 4th order. The first two are already available as files, but these two have to be derived from the data file itself (using awk).
- Look up the continent code and name from the country code of each row of data.
- Finally, the data is sent to Elasticsearch.
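
The preparation steps above can be sketched roughly as follows. This is not the actual script, which uses sed and awk; it is a Python illustration of the same idea. It assumes the data file is tab-separated with the name in column 2, the feature code in column 8, the country code in column 9 and the four administrative codes in columns 11 to 14. The function names, the dotted key format and the tiny country-to-continent table are hypothetical and only there to make the example self-contained.

```python
# Assumed zero-based column positions in the tab-separated data file.
NAME, FEATURE_CODE, COUNTRY = 1, 7, 8
ADMIN1, ADMIN4 = 10, 13

# Continent codes and names.
CONTINENTS = {
    "AF": "Africa", "AN": "Antarctica", "AS": "Asia", "EU": "Europe",
    "NA": "North America", "OC": "Oceania", "SA": "South America",
}

# Hypothetical excerpt; a real mapping would cover every country code.
COUNTRY_TO_CONTINENT = {"DE": "EU", "US": "NA", "JP": "AS"}


def strip_quotes(line):
    """Drop every double quote so unbalanced quotes cannot break parsing."""
    return line.replace('"', "")


def build_admin_lookup(lines, feature_code):
    """Derive a lookup of admin key -> name for ADM3 or ADM4 rows."""
    depth = {"ADM3": 3, "ADM4": 4}[feature_code]
    lookup = {}
    for line in lines:
        fields = strip_quotes(line).rstrip("\n").split("\t")
        # Skip short rows and rows that are not the requested division level.
        if len(fields) <= ADMIN4 or fields[FEATURE_CODE] != feature_code:
            continue
        codes = fields[ADMIN1:ADMIN1 + depth]
        key = ".".join([fields[COUNTRY]] + codes)
        lookup[key] = fields[NAME]
    return lookup


def lookup_continent(country_code):
    """Return (continent code, continent name), or None if unknown."""
    code = COUNTRY_TO_CONTINENT.get(country_code)
    if code is None:
        return None
    return code, CONTINENTS[code]


# A made-up row with an unbalanced quote in the name field.
sample_row = "\t".join([
    "1", 'Some "Town', "Some Town", "", "0", "0", "A", "ADM3", "DE", "",
    "01", "012", "01234", "", "0", "", "0", "Europe/Berlin", "2020-01-01",
])
adm3_lookup = build_admin_lookup([sample_row], "ADM3")
```

In the real pipeline the lookup is written to a CSV file and the continent is resolved per row inside Logstash; the dictionary here only stands in for those files.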
The script uses environment variables that are passed to Logstash, so no paths or file names are hardcoded in the Logstash pipeline file.
The files are all available on GitHub.
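
As a small illustration of the same idea, here is a Python sketch that reads all paths from environment variables instead of hardcoding them. The variable names `GEONAMES_DATA_FILE` and `GEONAMES_LOOKUP_DIR` are hypothetical and not taken from the actual script. In a Logstash pipeline file, the equivalent is to reference a variable as `${VAR_NAME}`.

```python
import os

# Hypothetical variable names, chosen only for this example.
REQUIRED_VARIABLES = ("GEONAMES_DATA_FILE", "GEONAMES_LOOKUP_DIR")


def load_settings(env=os.environ):
    """Collect the required paths from the environment, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARIABLES if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}
```

Failing early on a missing variable keeps a misconfigured run from silently reading the wrong file.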
Carpe Diem