So here is the second part of the starter for Apache Nifi. Apache Nifi works with processors and the connections between them. That's what you see in the flow above. Processors are like puzzle pieces: each one does a distinct task, and you connect them together to design a flow.

Now the question was how to use a common processing flow for both "GetTwitter" processors, so as not to duplicate things, and yet still be able to divide the results into separate files later on. I used the "UpdateAttribute" processor. It allows me to assign properties, so I am tagging my different flows: they get a property named "tweettype" with a value of either "privatetweet" or "worktweet".

Twitter tweets, in the form of JSON data, contain a lot of information that I don't want to store. I am only interested in the information about the user and the message itself. So I had to look for a way to eliminate the rest of the information in the flow.

I have set up my main logic by now. Now I want to store the results in two separate folders: one for private tweets and one for work-related tweets. I will evaluate the property "tweettype", which I assigned earlier, and route the data based on its value. The property "private" will result in a true or false condition, and when I connect the processor I can route the results based on this condition. If the property "private" is true, the result is routed to the "PutFile: Private" processor. If not, it is routed to the "PutFile: Work" processor.

The "PutFile" processor saves the file to a given folder. I have set up two folders, one for the work-related tweets and one for the private ones. The property "Directory" defines where the file is stored.

That's it. To summarize: I retrieve private and work-related tweets, extract the information I am interested in and store it in different folders.
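For readers who want to see the logic of the flow without opening Nifi, here is a minimal Python sketch of the same four steps: tag, trim, route, store. It is only an illustration and not how Nifi works internally. The dictionary used as a stand-in for a flow file, the function names, and the JSON field names "user" and "text" are all assumptions made for this example. In Nifi itself the routing condition would probably be written in the Expression Language, for instance something like `${tweettype:equals('privatetweet')}`, but the exact expression is not given above, so treat that as a guess too.

```python
import json
import tempfile
from pathlib import Path


def tag(tweet_json: str, tweettype: str) -> dict:
    """Stand-in for UpdateAttribute: attach a 'tweettype' attribute to the data.

    The dict with 'attributes' and 'content' is a hypothetical model of a flow file.
    """
    return {"attributes": {"tweettype": tweettype}, "content": tweet_json}


def keep_user_and_text(flowfile: dict) -> dict:
    """Drop every JSON field except the user and the message itself.

    Assumes the tweet JSON carries these under the keys 'user' and 'text'.
    """
    tweet = json.loads(flowfile["content"])
    slim = {"user": tweet.get("user"), "text": tweet.get("text")}
    return {**flowfile, "content": json.dumps(slim)}


def route(flowfile: dict) -> str:
    """Evaluate 'tweettype' into a true/false condition named 'private'."""
    private = flowfile["attributes"]["tweettype"] == "privatetweet"
    return "private" if private else "work"


def put_file(flowfile: dict, directories: dict, filename: str) -> Path:
    """Stand-in for PutFile: write the content into the directory chosen by route()."""
    target = Path(directories[route(flowfile)]) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(flowfile["content"], encoding="utf-8")
    return target


def demo() -> None:
    """Run one private and one work tweet through the common flow."""
    raw = json.dumps({"id": 1, "lang": "en", "text": "hello",
                      "user": {"screen_name": "someone"}})
    with tempfile.TemporaryDirectory() as base:
        directories = {"private": f"{base}/private", "work": f"{base}/work"}
        for name, tweettype in (("a.json", "privatetweet"), ("b.json", "worktweet")):
            flowfile = keep_user_and_text(tag(raw, tweettype))
            print(put_file(flowfile, directories, name))


if __name__ == "__main__":
    demo()
```

The point of the design is the same as in the flow: both sources share `keep_user_and_text`, and only the attribute set at the very beginning decides where the result ends up.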
Two days ago I saw a tweet about Apache Nifi, got curious and had a deeper look into it. It immediately looked interesting to me and so I spent a couple of hours understanding the basics. I read through the excellent documentation and different posts. As with any new tool, at the beginning there are many open questions.
Author: Uwe Geercken
Archives: September 2020