Make sure you have downloaded the two jar libraries from my GitHub account: one is the CSV processor tool, the other is the neo4j JDBC driver we need.
Here is a sample to make it clearer how the CSV processor tool works.
As indicated, the design is done in neo4j itself. We will create three nodes and two relationships.
- Node Car
- Node Country
- Node Person
- Relationship Person to Car
- Relationship Person to Country
As input data we use a sample CSV file. The first line is the header with the names of the individual columns:
After that, the properties (key/value pairs) follow. Each property value refers to a column name in the CSV file we use. Choose the property key as you like; the value is the name of the CSV column.
The first property is "id_field". The CSV processor tool requires it: it indicates which column of the CSV file is used as the unique key of the node. In this case it points to the column "person_id". "id_field" is only metadata that helps the tool generate the output; it is not written to the output itself. Note, however, that a regular property must also be mapped to the same column. This is the "id" property: a regular node property that also points to "person_id". Again: one is metadata for the tool, the other is a regular property.
There is also a "name" property. Finally, there is the property "year_of_birth", which points to the field "born" in the CSV file. It also carries an indicator of the field's data type; string, integer, float and other types are possible. The supported types are documented in the Operations Manual on the neo4j website.
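To make the roles of "id_field", "id" and the type indicator concrete, here is a minimal Java sketch of how such a mapping could be applied to one CSV row. It is not the CSV processor tool's code: the class `NodeMapping`, the type names `"string"`, `"int"` and so on, and the `"name"` column are assumptions for illustration. The sketch reads the unique key through `idField` without emitting it as a property, maps the regular `id` property to the same column, and converts `born` to an integer for `year_of_birth`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the property mapping described above (not the tool's actual code).
class NodeMapping {
    final String idField;                                        // metadata: column holding the unique key
    final Map<String, String> columns = new LinkedHashMap<>();   // property key -> CSV column name
    final Map<String, String> types = new LinkedHashMap<>();     // property key -> type indicator

    NodeMapping(String idField) {
        this.idField = idField;
    }

    // Registers a regular property; returns this so calls can be chained.
    NodeMapping property(String key, String column, String type) {
        columns.put(key, column);
        types.put(key, type);
        return this;
    }

    // Converts a raw CSV value according to its (assumed) type indicator.
    static Object convert(String raw, String type) {
        switch (type) {
            case "int":    return Integer.parseInt(raw.trim());
            case "long":   return Long.parseLong(raw.trim());
            case "float":  return Float.parseFloat(raw.trim());
            case "double": return Double.parseDouble(raw.trim());
            case "string": return raw;
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Unique key of the node: read via id_field, but never stored as a property itself.
    String uniqueKey(List<String> header, List<String> row) {
        return row.get(indexOf(header, idField));
    }

    // Regular node properties built from one CSV row, with typed values.
    Map<String, Object> properties(List<String> header, List<String> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            String raw = row.get(indexOf(header, e.getValue()));
            out.put(e.getKey(), convert(raw, types.get(e.getKey())));
        }
        return out;
    }

    private static int indexOf(List<String> header, String column) {
        int i = header.indexOf(column);
        if (i < 0) {
            throw new IllegalArgumentException("Unknown CSV column: " + column);
        }
        return i;
    }
}
```

For a Person mapping you would chain `new NodeMapping("person_id").property("id", "person_id", "string").property("name", "name", "string").property("year_of_birth", "born", "int")`. The resulting properties contain `id`, `name` and `year_of_birth`, but no entry named after `id_field`, mirroring the metadata/property split described above.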
Let's create the other two nodes:
Next we create the first relationship:
Then we create the next relationship:
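For orientation: the neo4j-import tool reads relationships from CSV files whose header uses the special columns `:START_ID`, `:END_ID` and `:TYPE`, optionally with an ID space in parentheses such as `:START_ID(Person)` that must match the `:ID(Person)` column of the corresponding node file. The Java sketch below writes such rows from CSV data. It is only an illustration of that file format, not what the CSV processor tool actually emits; the relationship type `OWNS`, the car key column and the class name `RelationshipRows` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: relationship rows in the neo4j-import CSV format.
// The ID spaces in parentheses must match the ":ID(...)" columns of the node files.
class RelationshipRows {
    static String header(String startIdSpace, String endIdSpace, String delimiter) {
        return ":START_ID(" + startIdSpace + ")" + delimiter
             + ":END_ID(" + endIdSpace + ")" + delimiter + ":TYPE";
    }

    // One relationship per CSV row: start node key, end node key, relationship type.
    static List<String> rows(List<List<String>> csvRows, int startCol, int endCol,
                             String type, String delimiter) {
        List<String> lines = new ArrayList<>();
        for (List<String> r : csvRows) {
            lines.add(r.get(startCol) + delimiter + r.get(endCol) + delimiter + type);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<List<String>> csv = new ArrayList<>();
        csv.add(Arrays.asList("42", "Alice", "1980", "c7")); // made-up row: person key ... car key
        System.out.println(header("Person", "Car", ","));   // prints ":START_ID(Person),:END_ID(Car),:TYPE"
        for (String line : rows(csv, 0, 3, "OWNS", ",")) {
            System.out.println(line);                        // prints "42,c7,OWNS"
        }
    }
}
```

The Person to Country relationship would follow the same pattern with a different end ID space, end column and type.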
At this point, the neo4j browser should show a layout similar to this:
Next we run the CSV processor tool. Put the CSV file to process in a folder; in this case my CSV file is named "persons.csv". Make sure the jar files you downloaded are on the Java classpath.
Replace the hostname, username, password, output folder and delimiter in the statement below with the appropriate values. Then run the tool like this:
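If you are unsure whether the jars are really on the classpath, a tiny Java program can check whether a class can be loaded. This is a minimal sketch; the driver class name `org.neo4j.jdbc.Driver` is an assumption based on the neo4j JDBC driver and may differ for your version, so you can pass another class name as an argument.

```java
// Checks whether a class can be loaded from the current classpath.
class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed driver class name of the neo4j JDBC jar; override via the first argument.
        String className = args.length > 0 ? args[0] : "org.neo4j.jdbc.Driver";
        System.out.println(className + (isOnClasspath(className) ? " found" : " NOT found")
                + " on the classpath");
    }
}
```

Run it with the same classpath you intend to use for the tool, for example `java -cp .:neo4j-jdbc-driver.jar ClasspathCheck` (the jar name is a placeholder; on Windows separate classpath entries with `;` instead of `:`).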
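If the tool cannot connect, it can help to test the hostname, username and password on their own first. The Java sketch below uses the standard `java.sql` API and runs a Cypher count query through the JDBC driver. It assumes the neo4j JDBC driver's Bolt URL scheme `jdbc:neo4j:bolt://host:port` and the default Bolt port 7687; both are assumptions, so check the documentation of the driver version you downloaded.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Connectivity check through the neo4j JDBC driver (URL scheme is an assumption).
class ConnectionCheck {
    static String jdbcUrl(String host, int port) {
        return "jdbc:neo4j:bolt://" + host + ":" + port;
    }

    // Counts all nodes; try-with-resources closes result set, statement and connection.
    static long countNodes(String url, String user, String password) throws SQLException {
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("MATCH (n) RETURN count(n) AS nodes")) {
            rs.next(); // count() always returns exactly one row
            return rs.getLong("nodes");
        }
    }

    public static void main(String[] args) throws SQLException {
        // args: hostname username password (replace with your own values)
        String url = jdbcUrl(args[0], 7687);
        System.out.println("Nodes in the database: " + countNodes(url, args[1], args[2]));
    }
}
```

If the database contains only the design from above, the count should be the three design nodes.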
The neo4j-import tool then takes these files and creates the complete files and structure of the database for us. It runs offline and bypasses the transactional layer of neo4j, which is why it is so fast.
So we can now delete the database we used for the design. It is located in the data/databases folder of the neo4j installation. Go ahead and delete the relevant folder and files, but be careful to delete the correct one if you have several. The active database is listed in the "neo4j.conf" file in the "conf" folder. Be careful. OK, maybe you simply rename it... just in case...
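Renaming instead of deleting is easy to script. In neo4j 3.x the active database is set by `dbms.active_database` in `conf/neo4j.conf` and defaults to `graph.db` when the setting is commented out. The Java sketch below reads that setting and moves the folder to a `.bak` sibling. It assumes the 3.x folder layout described above; the class name and the `.bak` suffix are illustrative choices. Stop neo4j before running it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Finds the active database in neo4j.conf and renames its folder as a backup.
class ActiveDatabase {
    // Returns the first uncommented dbms.active_database value, or the 3.x default "graph.db".
    static String activeDatabase(List<String> confLines) {
        for (String line : confLines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue; // skip blank lines and commented-out settings
            }
            int eq = t.indexOf('=');
            if (eq > 0 && t.substring(0, eq).trim().equals("dbms.active_database")) {
                return t.substring(eq + 1).trim();
            }
        }
        return "graph.db";
    }

    // Moves data/databases/<active> to data/databases/<active>.bak and returns the new path.
    static Path backup(Path neo4jHome) throws IOException {
        List<String> conf = Files.readAllLines(neo4jHome.resolve("conf").resolve("neo4j.conf"));
        Path db = neo4jHome.resolve("data").resolve("databases").resolve(activeDatabase(conf));
        return Files.move(db, db.resolveSibling(db.getFileName() + ".bak"));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the neo4j installation (stop neo4j first)
        System.out.println("Moved database to " + backup(Paths.get(args[0])));
    }
}
```

`Files.move` fails if the `.bak` folder already exists, so an earlier backup is never overwritten silently.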
Then run the import tool using this command:
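The exact command depends on the files the CSV processor tool generated for you. As a hedged illustration of the general shape of a call, the Java sketch below assembles a neo4j-import invocation with the `--into`, `--nodes`, `--relationships` and `--delimiter` options of the 3.x tool and starts it with `ProcessBuilder`. The file names, the class name `ImportCommand` and the Unix-style `bin/neo4j-import` path are assumptions; check the neo4j-import documentation for your version before relying on the options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds and runs a neo4j-import command line.
class ImportCommand {
    static List<String> build(String neo4jHome, String database, List<String> nodeFiles,
                              List<String> relationshipFiles, String delimiter) {
        List<String> cmd = new ArrayList<>();
        cmd.add(neo4jHome + "/bin/neo4j-import");          // Unix layout assumed
        cmd.add("--into");
        cmd.add(neo4jHome + "/data/databases/" + database); // must not hold an existing database
        for (String f : nodeFiles) {
            cmd.add("--nodes");
            cmd.add(f);
        }
        for (String f : relationshipFiles) {
            cmd.add("--relationships");
            cmd.add(f);
        }
        cmd.add("--delimiter");
        cmd.add(delimiter);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: neo4j installation folder; all file names below are placeholders
        List<String> cmd = build(args[0], "graph.db",
                Arrays.asList("person.csv", "car.csv", "country.csv"),
                Arrays.asList("person_car.csv", "person_country.csv"), ",");
        Process p = new ProcessBuilder(cmd).inheritIO().start(); // show the tool's output
        System.exit(p.waitFor());
    }
}
```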
If you want, you can wipe the data, import the original schema we designed and change its layout: more nodes, relationships and properties. Or, even better, create a new layout for your own CSV data with lots of rows and go through the process again.
The CSV processor tool is in beta. It is not feature complete and probably contains bugs. At the moment it will probably not work with very large files; I plan to design another approach for working with very large data files. I will also keep improving and developing the tool and would be grateful for comments or suggestions.
Read more about the neo4j-import tool on the neo4j website.
Hope you enjoyed it.
Carpe Diem.