So I have created a small awk script that processes a CSV file and converts it into output that can be piped into the Redis client command. According to the Redis documentation, this is the fastest way of importing data into Redis. Here is the link: Redis Mass Insertion
So Redis has a specific protocol one has to adhere to, and the awk script does exactly that: it reads the header row - which is mandatory (for the moment) - from the CSV file to determine the field names, then reads each data row, converts it to the Redis protocol, and outputs the result. This output can then be piped into the Redis client command.
The script sends the data to Redis as hashes: each row gets a unique id as its key, and each field of the row is stored under its name (taken from the header) together with its value. The HMSET command is used to create this structure in Redis.
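For illustration, here is a sketch of what one such protocol block could look like for a hypothetical two-column row (the column names "name" and "country" and the values are invented for this example; the default "csvfile" key and row number 1 are used):

```shell
# Hypothetical example of the Redis protocol lines emitted for one CSV row.
# *6 announces six arguments (HMSET, the key, and two field/value pairs);
# each $N gives the byte length of the argument that follows it.
printf '*6\r\n$5\r\nHMSET\r\n$9\r\ncsvfile:1\r\n$4\r\nname\r\n$4\r\nBern\r\n$7\r\ncountry\r\n$2\r\nCH\r\n'
```

Piping output like this into the Redis client creates the hash "csvfile:1" with the fields "name" and "country".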
Btw, here is the link to the awk script on GitHub, where you will also find other awk scripts that may be useful.
Here is an example of how to execute the awk script:
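The original command is not reproduced here, so the following is a runnable stand-in sketch: the CSV contents and column names are assumptions, a stripped-down inline awk program plays the role of the real script, and `cat` replaces `redis-cli --pipe` so the example runs without a Redis server.

```shell
# Runnable sketch only: file name, column names, and the inline awk program
# are stand-ins for the real script linked above. `cat` replaces
# `redis-cli --pipe` so this runs without a Redis server, and gawk's -b flag
# is omitted for portability.
cat > sample.csv <<'EOF'
geonameid,name,countrycode
2994701,Roc Meler,AD
EOF

awk -F',' '
NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }  # header row: field names
{
    key = "geonames:" $1                      # unique id taken from column 1 here
    printf "*%d\r\n$5\r\nHMSET\r\n", 2 + 2 * NF
    printf "$%d\r\n%s\r\n", length(key), key
    for (i = 1; i <= NF; i++) {               # field name, then its value
        printf "$%d\r\n%s\r\n", length(name[i]), name[i]
        printf "$%d\r\n%s\r\n", length($i), $i
    }
}' sample.csv | cat        # for a real import: | redis-cli --pipe
```

With the real script the pipeline would look like `gawk -b -f csv2redis.awk sample.csv | redis-cli --pipe` (the script name csv2redis.awk is hypothetical).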
The -b (characters-as-bytes) flag of gawk is used so that special characters such as é, à, etc. are passed through unchanged and arrive in Redis correctly.
Now the awk script has some additional variables you can pass to it:
- separator: the separator used in the CSV file to divide the individual columns
- rediskey: lets you group certain keys together in Redis, for example by domain or system name. This key is used as the first part of the unique identifier of each row. If not specified, "csvfile" is used.
- uidcolumn: the number of the column that holds the unique identifier of each row. If it is not specified, or the column is not present in the file, the row number is used instead.
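These are ordinary awk variables, so they can be passed on the command line with -v. A small sketch (the inline BEGIN program below is just a stub that echoes the variables to show how they arrive; with the real script you would pass `-f csv2redis.awk yourfile.csv` instead, the script name being hypothetical):

```shell
# Sketch of passing the optional variables via awk's -v mechanism.
# The stub program only prints them; the real script uses them while
# converting the CSV rows.
awk -v separator=";" -v rediskey="geonames" -v uidcolumn=1 \
    'BEGIN { print separator, rediskey, uidcolumn }' </dev/null
```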
Here is an example CSV file with a single data row:
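The original file is not reproduced here; a file of the same shape might look like this (the header and values are assumptions for illustration, using the geonames id from the key mentioned in the next sentence):

```
geonameid,name,countrycode
2994701,Roc Meler,AD
```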
The row would be inserted into Redis with the key "geonames:2994701" and you can retrieve it from Redis like this:
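Assuming a running Redis instance, the stored hash can be read back with HGETALL (the field names returned depend on your CSV header):

```shell
# Requires a running Redis server: HGETALL returns all field/value pairs
# of the hash stored under the given key.
redis-cli HGETALL geonames:2994701
```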
Carpe Diem