So I have created a small awk script that processes a CSV file and converts it into output that can be piped into the Redis client command. According to the Redis documentation, this is the fastest way of importing data into Redis. Here is the link: Redis Mass Insertion
So Redis has a specific protocol one has to adhere to, and the awk script does exactly that: it reads the header row - which is mandatory (for the moment) - from the CSV file to determine the field names, then reads each data row, converts it to the Redis protocol, and outputs the result. This output can then be piped into the Redis client command.
The script sends the data to Redis as hashes: each row gets a unique id as its key, and each field of the row is stored under its name (taken from the header) together with its value. The HMSET command is used to create this structure in Redis.
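For illustration, here is a sketch of what one such protocol block could look like for a hypothetical two-column row (the column names "name" and "country" and the values are invented for this example; the default "csvfile" key and row number 1 are used):

```shell
# Hypothetical example of the Redis protocol lines emitted for one CSV row.
# *6 announces six arguments (HMSET, the key, and two field/value pairs);
# each $N gives the byte length of the argument that follows it.
printf '*6\r\n$5\r\nHMSET\r\n$9\r\ncsvfile:1\r\n$4\r\nname\r\n$4\r\nBern\r\n$7\r\ncountry\r\n$2\r\nCH\r\n'
```

Piping output like this into the Redis client creates the hash "csvfile:1" with the fields "name" and "country".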
Btw, here is the link to the awk script on GitHub, where you will also find other awk scripts that may be useful.
Here is an example of how to execute the awk script:
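The original command is not reproduced here, so the following is a runnable stand-in sketch: the CSV contents and column names are assumptions, a stripped-down inline awk program plays the role of the real script, and `cat` replaces `redis-cli --pipe` so the example runs without a Redis server.

```shell
# Runnable sketch only: file name, column names, and the inline awk program
# are stand-ins for the real script linked above. `cat` replaces
# `redis-cli --pipe` so this runs without a Redis server, and gawk's -b flag
# is omitted for portability.
cat > sample.csv <<'EOF'
geonameid,name,countrycode
2994701,Roc Meler,AD
EOF

awk -F',' '
NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }  # header row: field names
{
    key = "geonames:" $1                      # unique id taken from column 1 here
    printf "*%d\r\n$5\r\nHMSET\r\n", 2 + 2 * NF
    printf "$%d\r\n%s\r\n", length(key), key
    for (i = 1; i <= NF; i++) {               # field name, then its value
        printf "$%d\r\n%s\r\n", length(name[i]), name[i]
        printf "$%d\r\n%s\r\n", length($i), $i
    }
}' sample.csv | cat        # for a real import: | redis-cli --pipe
```

With the real script the pipeline would look like `gawk -b -f csv2redis.awk sample.csv | redis-cli --pipe` (the script name csv2redis.awk is hypothetical).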
The -b (characters-as-bytes) flag of gawk is used so that special characters such as é, à, etc. are passed through unchanged and arrive in Redis correctly.
Now the awk script has some additional variables you can pass to it:
- separator: the separator used in the CSV file to divide the individual columns
- rediskey: lets you group certain keys together in Redis, for example by domain or system name. This key is used as the first part of the unique identifier of each row. If not specified, "csvfile" is used.
- uidcolumn: the number of the column that holds the unique identifier of each row. If it is not specified, or the column is not present in the file, the row number is used instead.
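These are ordinary awk variables, so they can be passed on the command line with -v. A small sketch (the inline BEGIN program below is just a stub that echoes the variables to show how they arrive; with the real script you would pass `-f csv2redis.awk yourfile.csv` instead, the script name being hypothetical):

```shell
# Sketch of passing the optional variables via awk's -v mechanism.
# The stub program only prints them; the real script uses them while
# converting the CSV rows.
awk -v separator=";" -v rediskey="geonames" -v uidcolumn=1 \
    'BEGIN { print separator, rediskey, uidcolumn }' </dev/null
```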
Here is an example CSV file with a single data row:
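The original file is not reproduced here; a file of the same shape might look like this (the header and values are assumptions for illustration, using the geonames id from the key mentioned in the next sentence):

```
geonameid,name,countrycode
2994701,Roc Meler,AD
```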
The row would be inserted into Redis with the key "geonames:2994701" and you can retrieve it from Redis like this:
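Assuming a running Redis instance, the stored hash can be read back with HGETALL (the field names returned depend on your CSV header):

```shell
# Requires a running Redis server: HGETALL returns all field/value pairs
# of the hash stored under the given key.
redis-cli HGETALL geonames:2994701
```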
Carpe Diem