Next I tackled LOAD CSV to get more data into my database than the handful of records I had added manually before. LOAD CSV is very convenient: if you just return the first rows, without storing or manipulating anything, they are displayed right there in the browser, so you don't have to open another tool to review the structure and content of your rows. The command also gives you a lot of possibilities to convert your flat data into nodes and relationships, which makes it useful while developing a schema.
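As a sketch, previewing and then loading might look like this in Cypher; the file name and column names are assumptions for illustration, not from an actual data set:

```cypher
// Preview the first rows in the browser without writing anything
LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
RETURN row LIMIT 5;

// Once the structure looks right, turn rows into nodes
LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
MERGE (p:Person {name: row.fullname});
```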
The APOC library offers a lot of functions for querying data, exporting data, working with geo positions and more. I think there are close to 300 functions, making the developer's or user's life considerably easier.
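For example, assuming APOC is installed on your server, you can browse the catalogue and call functions directly from Cypher; these two calls are just illustrations:

```cypher
// List the available text helpers
CALL apoc.help('text');

// Use one of them inline in a query
RETURN apoc.text.join(['Neo4j', 'APOC'], ' - ') AS joined;
```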
While learning, if you look left and right, you discover additional tools such as cypher-shell and the neo4j-import tool. The latter imports data fast because, unlike LOAD CSV or cypher-shell, it does not go through the transactional layer; it simply creates the store files on disk: the files which make up the nodes, relationships and whatever else a complete Neo4j database requires.
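A minimal invocation might look like this (Neo4j 3.x syntax; the file names and target directory are placeholders):

```shell
# Bulk import: writes the store files directly, bypassing the transactional layer
neo4j-import --into /var/lib/neo4j/data/databases/graph.db \
             --nodes persons.csv \
             --relationships knows.csv
```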
The CSV processor
And this was the point where I started to think it would be good to have a tool that does the preparation work for me. My normal way of designing a database is to do it in the Neo4j Desktop or browser: I manually create some nodes and relationships and consider whether the model works well for my use case. Once I am convinced the schema is right, I load data and revisit whether it really holds up for the larger data set.
As my design starts in Neo4j, I thought the tool should simply read what I have designed and create - together with the data given in a CSV file - the structure required by the neo4j-import tool. After thinking it through, I was convinced it would work and started coding a tool in Java.
It is really simple: you design your nodes and relationships, label them and add properties to them as required. The value of each property is a reference (a pointer) to a column in the CSV file. So if you have a node "Person" with a property "name", and the CSV file contains a column "fullname", then the value of the "Person" property "name" is "fullname". When you run the CSV processor tool, it reads the Neo4j database schema, maps the keys and values from the schema to the CSV file and prepares the files for the neo4j-import tool - ready to use.
Ok, there is a little more to it. For the CSV processor tool to work, you need to provide some additional metadata so that the files for the import tool can be created. One piece is a certain label on each node, so that the tool knows which nodes to use. You also have to define which field/property is the key of the node, because the neo4j-import tool needs unique keys to work with. There is more to say about this, but I will cover it in another blog entry.
You can also add metadata to the node and relationship properties to define which type they are (string, integer, float, etc.).
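To make this concrete, here is a sketch of what the generated files could look like. The file names and values are made up, but the header syntax (:ID, :LABEL, :START_ID, :END_ID, :TYPE and type suffixes such as :int) is the one the neo4j-import tool expects:

```
persons_nodes.csv - one column is the unique key, one holds the label
email:ID,name,born:int,:LABEL
jane@example.com,Jane Doe,1980,Person
john@example.com,John Smith,1978,Person

knows_rels.csv - relationships reference the node keys
:START_ID,:END_ID,:TYPE
jane@example.com,john@example.com,KNOWS
```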
The complete workflow is: design the schema in the Neo4j browser, run the CSV processor tool together with the CSV file containing the data, then run the neo4j-import tool to import all data from the files the CSV processor created. That's it.
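Sketched as commands, the workflow could look like this; the jar name and option of the CSV processor are assumptions on my part, only the neo4j-import call follows the documented syntax:

```shell
# 1. Design the schema manually in the Neo4j browser (no command needed)

# 2. Run the CSV processor against the schema and the data file
#    (hypothetical jar name and option)
java -jar csv-processor.jar --csv data.csv

# 3. Bulk-import the generated node and relationship files
neo4j-import --into data/databases/graph.db \
             --nodes persons_nodes.csv \
             --relationships knows_rels.csv
```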
So much for the moment. In the next days I will write another blog post with more details, an example and screenshots to visualize the process and make it easy to understand.
The code is on GitHub: https://github.com/uwegeercken. The tool is in beta at the moment, but come back frequently to get the latest updates.
Carpe Diem