The last post showed how to retrieve tweets from Twitter and store them in separate folders based on an attribute. This time I take a similar approach, but the tweets are stored in a MongoDB collection and finally displayed using Highcharts.
The first step retrieves the tweets for some of my favorite bands. I extract the text, the user id and name, and the date and time from each tweet's JSON representation. The JSON is then enriched with some extra attributes: createddate, createdyear, createdmonth, createdtime and searchtopic. Below is a sample JSON document.
{
"_id" : ObjectId("56e7115f63438a0a724b5afe"),
"createdmonth" : "03",
"createddate" : "2016-03-14",
"searchtopic" : "Black Sabbath",
"createdyear" : "2016",
"handle" : "Dra_AmeMontalvo",
"message" : "#NowPlaying Paranoid de Black Sabbath \nRock ✌🏻️ ♫ https://t.co/0fgFjmwr7p",
"user" : "La niña mala.",
"createdtime" : "20:32:39"
}
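To make the enrichment step concrete, here is a minimal sketch in Python. In the post this work happens inside the NiFi flow, so this is only a standalone illustration of the same idea. The function name `enrich_tweet` is hypothetical, and the input field names (`created_at`, `text`, `user.screen_name`, `user.name`) are assumptions based on the classic Twitter JSON layout; the output field names are taken from the sample document above.

```python
from datetime import datetime

# Assumed layout of Twitter's created_at field,
# e.g. "Mon Mar 14 20:32:39 +0000 2016".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def enrich_tweet(tweet, search_topic):
    """Reduce a raw tweet to the stored fields and add the date attributes."""
    created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    return {
        "handle": tweet["user"]["screen_name"],
        "user": tweet["user"]["name"],
        "message": tweet["text"],
        # Extra attributes, all stored as strings like in the sample document.
        "createddate": created.strftime("%Y-%m-%d"),
        "createdyear": created.strftime("%Y"),
        "createdmonth": created.strftime("%m"),
        "createdtime": created.strftime("%H:%M:%S"),
        "searchtopic": search_topic,
    }


# A hand-made raw tweet, only for illustration.
sample = {
    "created_at": "Mon Mar 14 20:32:39 +0000 2016",
    "text": "#NowPlaying Paranoid de Black Sabbath",
    "user": {"screen_name": "Dra_AmeMontalvo", "name": "La niña mala."},
}
doc = enrich_tweet(sample, "Black Sabbath")
```

Storing the month and year as separate attributes makes it easy to group the tweets by month later without parsing dates again.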
Below is a screenshot of the NiFi flow. It shows three processors that retrieve the tweets. UpdateAttribute processors then tag the incoming flowfiles (tweets), and the three streams come together in the "Store in MongoDb" group.
Groups in NiFi allow you to group multiple processors (a part of the flow) into a logical unit of parts that belong together. Here I put everything that is the same for all three incoming streams of data into one group.
When I double-click the "Store in MongoDb" group, the content of the group is shown, as can be seen below.
The next thing I did was create a MongoDB map-reduce job. Yes, MongoDB supports map and reduce as well. Because I like Groovy and can use it as a scripting language (it runs on the Java platform), I chose it for retrieving the results from MongoDB. The script outputs a JSON representation of the results from the MongoDB server.
Here is the JSON I generate from MongoDB. It contains the name of the band and the tweet counts per month as an array of twelve values, January through December. The counts are computed by the MongoDB map-reduce job that is run from the Groovy script.
{"name": "Black Sabbath","data":[0, 0, 52.0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
{"name": "3 Doors Down","data":[0, 0, 35.0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
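The counting logic can be sketched in a few lines of plain Python. This is not the Groovy script or the MongoDB job from the post. It only emulates the map and reduce steps in memory to show how documents with searchtopic and createdmonth turn into one twelve-slot array per band. The function names `map_tweet`, `reduce_counts` and `to_series` are hypothetical.

```python
import json
from collections import defaultdict


def map_tweet(doc):
    """Map step: emit a (band, month) key with a count of 1."""
    return (doc["searchtopic"], int(doc["createdmonth"])), 1


def reduce_counts(pairs):
    """Reduce step: sum all counts that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def to_series(totals):
    """Build one entry per band with twelve monthly counts (index 0 = January)."""
    series = {}
    for (band, month), count in totals.items():
        series.setdefault(band, [0] * 12)[month - 1] = count
    return [{"name": band, "data": data} for band, data in series.items()]


# A handful of made-up documents, only for illustration.
docs = [
    {"searchtopic": "Black Sabbath", "createdmonth": "03"},
    {"searchtopic": "Black Sabbath", "createdmonth": "03"},
    {"searchtopic": "3 Doors Down", "createdmonth": "03"},
]

series = to_series(reduce_counts(map_tweet(d) for d in docs))
for entry in series:
    print(json.dumps(entry))
```

Each printed line has the same name/data shape as the output above, although the counts here are plain integers rather than values like 52.0.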