Information about the connector is available here: github.com/neo4j-contrib/neo4j-streams
Some more documentation is here: neo4j-contrib.github.io/neo4j-streams/
My setup is as follows:
- MySQL: holds the data about airlines, airports, and routes (which airline flies from which origin airport to which destination airport)
- Nifi: I use Apache Nifi to listen for changes to the MySQL database tables and to send the data to Kafka
- Neo4j: is configured to consume data from the three topics created using Nifi
MySQL:
Here are the tables:
Nifi:
Any update to a record in the MySQL tables also updates that record's last_update column. Nifi picks up the changed records and sends them to Kafka in JSON format.
My Dataflow looks like this:
Neo4j:
I have adjusted the Neo4j configuration as documented (see link at the beginning). First, I have added the Kafka config at the end of the neo4j.conf file:
The last Cypher statement creates the relationship between the airports and the airlines: which airline flies from which airport (origin) to which other airport (destination).
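For readers who want a starting point, the sink part of such a neo4j.conf can be sketched roughly as follows. This is a minimal sketch and not my exact file: the broker address, the topic names (airports, airlines, routes), the JSON field names, and the FLIES_TO relationship with an airline property are all assumptions chosen for illustration. The streams.sink.topic.cypher.<topic> keys are described in the neo4j-streams documentation, where each incoming message is available in the Cypher statement as the variable event.

```properties
# Kafka connection (assumed: a single local broker)
kafka.bootstrap.servers=localhost:9092

# Enable the sink so that Neo4j consumes from Kafka
streams.sink.enabled=true

# Assumed topic "airports", assumed message shape {"code": "...", "name": "..."}
streams.sink.topic.cypher.airports=MERGE (a:Airport {code: event.code}) SET a.name = event.name

# Assumed topic "airlines", assumed message shape {"code": "...", "name": "..."}
streams.sink.topic.cypher.airlines=MERGE (l:Airline {code: event.code}) SET l.name = event.name

# Assumed topic "routes", assumed message shape {"airline": "...", "origin": "...", "destination": "..."}
# MERGE on the airports keeps the statement working even if a route arrives before its airports
streams.sink.topic.cypher.routes=MERGE (o:Airport {code: event.origin}) MERGE (d:Airport {code: event.destination}) MERGE (o)-[:FLIES_TO {airline: event.airline}]->(d)
```

Using MERGE rather than CREATE means a record that is sent again after an update in MySQL overwrites the existing node or relationship instead of duplicating it.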
So this is my data pipeline: MySQL holds the data, and all updates are made there. The changes are picked up by Nifi, which sends them to the relevant Kafka topic. Because I configured the three Cypher statements in the Neo4j config, Neo4j consumes every message that arrives in the three Kafka topics. Any change to the MySQL data therefore arrives in Neo4j automatically.
Once the data is available or updated in Neo4j, I can query it, for example to see which destinations Swiss (airline code LX) flies to from Zurich (airport code ZRH).
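Such a query could look roughly like the sketch below. The Airport label, the code and name properties, and the FLIES_TO relationship carrying an airline property are assumptions about the graph model, so the names need to be adapted to whatever the sink statements actually create.

```cypher
// Assumed model: (:Airport {code})-[:FLIES_TO {airline}]->(:Airport {code, name})
// Destinations served by Swiss (LX) out of Zurich (ZRH)
MATCH (origin:Airport {code: 'ZRH'})-[:FLIES_TO {airline: 'LX'}]->(dest:Airport)
RETURN dest.code AS destination, dest.name AS name
ORDER BY destination;
```

Because the airline is stored on the relationship in this sketch, the same pattern without the airline filter would list every destination reachable from Zurich.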
Besides Kafka and Neo4j, Apache Nifi is used for dataflow management. It is a very good tool for dataflows: it is flexible and scalable, has many connectors, and is strong when it comes to schemas (inheriting, inferring), data provenance, and routing data to various target systems.
Carpe Diem