Blog Archives - datamelt blog

Ruleengine - Division of Responsibilities

24/2/2016

For a while now I have been trying to promote the idea of separating responsibilities in application maintenance. Many applications contain a mix of IT logic and business logic: technical logic such as unzipping files, checking servers or folders, loading configuration files, accessing databases and much more is mixed with logic from the business, such as adjusting field values, manipulating data and calculating values.

So what is the problem? When you mix your regular IT code or processes with business logic, then you, the IT expert, will always be the one responsible for changing it. You are responsible for changing both, because both live in IT-related code or processes, and the business does not understand much of that.

For the business user it is completely opaque where the business logic is located in the middle of all the IT-related material. Mixing the logic is also a quality issue: the IT expert is not necessarily a business expert, so changes to the IT system can go wrong when they break business logic that sits in the middle of the development work.

Even in a moderately complex system, changing IT-related logic, e.g. application logic or an ETL process, might break the business logic. The other way around, if the business comes up with new or changed business logic, the business expert has no idea where that logic should go in the IT process. So the IT expert has to implement it. And it does not help much to have the business user sitting next to you, watching your ETL flows, your database tools or your IDE. Changes to one part of the system may break the other part, and IT has to fix it in any case.

This is what the screenshot below illustrates: every change to the system requires IT work, because the business has to delegate the work to IT. They cannot do it themselves!

[Image: IT and business logic mixed; every change goes through IT]

So why do we create this mixture of IT and business logic? Sometimes the simple answer is that IT has the strongest analytical skills and the required tools. Another reason is that we often sit in the middle: between a source system that cannot be changed and a target system that takes a long time or is expensive to change (meaning it effectively cannot be changed either). ETL processes are a good example: IT has the tools, and they are flexible, scalable and configurable. If source and target cannot be changed, IT in the middle can make the change. Often IT corrects, enhances or streamlines data, because bad data comes from source systems which, as mentioned, are hard to change, or because the business processes capturing the data cannot be changed.

IT code mixed with business logic often cannot handle time constraints on rules. Rules, e.g. for customers or contracts, sometimes have a validity period: a time span during which they apply. Usually IT has to take care of that by changing the code at the right moment to get the correct result at the correct time.
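To make the validity problem concrete, here is a minimal Java sketch in which each rule carries its own validity period, so the result changes on the right date without anyone touching the code. The classes `DatedRule` and `DatedRules` are hypothetical names for illustration only; they are not part of JaRE or any other library.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch (not JaRE classes): a rule that carries its own
// validity period, so nobody has to change code on the day the rule
// starts or stops applying.
final class DatedRule<T> {
    final String name;
    final LocalDate validFrom;   // first day the rule applies (inclusive)
    final LocalDate validUntil;  // last day the rule applies (inclusive)
    final Predicate<T> condition;

    DatedRule(String name, LocalDate validFrom, LocalDate validUntil, Predicate<T> condition) {
        this.name = name;
        this.validFrom = validFrom;
        this.validUntil = validUntil;
        this.condition = condition;
    }

    // True if 'date' lies within [validFrom, validUntil].
    boolean isValidOn(LocalDate date) {
        return !date.isBefore(validFrom) && !date.isAfter(validUntil);
    }
}

final class DatedRules {
    private DatedRules() {}

    // Names of the rules that are valid on 'date' and whose condition holds for 'input'.
    static <T> List<String> passed(List<DatedRule<T>> rules, T input, LocalDate date) {
        return rules.stream()
                .filter(r -> r.isValidOn(date))
                .filter(r -> r.condition.test(input))
                .map(r -> r.name)
                .collect(Collectors.toList());
    }
}
```

A rule for next year can then be entered in advance as data and simply starts applying on its first valid day.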

But mixing IT and business logic is not a good idea. It does not draw a clear line between the responsibilities. The responsibilities should really be divided: IT is responsible for the IT logic and the technical implementation, support and architecture, and the business defines the business logic. Each expert is responsible for the part they know best.

When there is a clear division, both parts can be adjusted separately: IT changes the code or process, does the testing and migrates everything without affecting the business logic. Business logic changes are implemented separately, are not mixed up with IT code and don't influence the IT part of the process.

So how can this be achieved? Use a ruleengine! The idea is that an application runs (or uses) a ruleengine. The ruleengine executes business rules, and the rules are defined in a separate tool, external to the application. The application does what it was built for and simply uses a set of rules that are defined by the business. This gives us a clear division of responsibilities, as shown below.

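[Image: clear division: IT maintains the application, the business maintains the rules in a separate tool]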
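The division of responsibilities described above can be sketched in a few lines of Java, assuming a made-up plain-text rule format ("field operator value", one rule per line) as a stand-in for rules maintained in a separate tool. `ExternalRules` and this format are hypothetical; they are not JaRE's API or file format. The point is only that the application code stays the same while the rule text changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the rule text format below is an illustration,
// not JaRE's format. Business owns the rule text, IT owns this code.
final class ExternalRules {
    private ExternalRules() {}

    // Parses "field operator value" lines; blank lines and '#' comments are skipped.
    static List<String[]> parse(String ruleText) {
        List<String[]> rules = new ArrayList<>();
        for (String line : ruleText.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+", 3); // field, operator, value
            if (parts.length != 3) {
                throw new IllegalArgumentException("Invalid rule: " + trimmed);
            }
            rules.add(parts);
        }
        return rules;
    }

    // Evaluates one rule against a record of field values.
    static boolean evaluate(String[] rule, Map<String, String> record) {
        String actual = record.get(rule[0]);
        if (actual == null) {
            return false;
        }
        switch (rule[1]) {
            case "equals":
                return actual.equals(rule[2]);
            case "greaterThan":
                return Double.parseDouble(actual) > Double.parseDouble(rule[2]);
            default:
                throw new IllegalArgumentException("Unknown operator: " + rule[1]);
        }
    }

    // IT side: the application only loads the rules and checks whether all of them pass.
    static boolean allPass(String ruleText, Map<String, String> record) {
        for (String[] rule : parse(ruleText)) {
            if (!evaluate(rule, record)) {
                return false;
            }
        }
        return true;
    }
}
```

If the business changes a threshold in the rule text, `allPass` picks it up without any change to the Java code.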

Changes are applied more quickly because both experts are professionals in their own domain. The IT application or process is not intermingled with business logic, so IT can concentrate on a cleaner design without dependencies on business logic.

Business experts can define and update the business rules in a system that is made for a business user. They are not distracted or confused by all the IT-related tools and processes around the business logic and can concentrate on the rules, which naturally belong to their domain anyway.

There will of course be situations when both experts have to work closely together, because changes affect both parts of the system. Don't get me wrong: good and close communication between the two units is always required and an advantage. The division of responsibilities described above allows them to work independently of each other at times, but communication is of course always required.

At times the two experts have to work more closely together, because business logic and the definition of business rules can become quite difficult and require the skills of an IT expert who thinks in "if-then-else" constructs. Business rules rarely come alone. Usually it is a group of rules that defines a condition that has to be met: constructs and combinations of "if", "then", "or" and "and". This is when IT and business together find the correct solution to the problem.
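A small Java sketch of how rules can be combined with "and" and "or" into groups that nest inside each other. `RuleGroups` and its methods are hypothetical and model only the idea; this is not how JaRE stores or evaluates its rule groups.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of rule groups: conditions are combined with
// AND / OR, and a group can itself be a member of another group.
final class RuleGroups {
    private RuleGroups() {}

    enum Operator { AND, OR }

    // A group passes if all (AND) or any (OR) of its members pass.
    static <T> Predicate<T> group(Operator op, List<Predicate<T>> members) {
        return input -> op == Operator.AND
                ? members.stream().allMatch(m -> m.test(input))
                : members.stream().anyMatch(m -> m.test(input));
    }

    // A single rule: the given field must equal the given value.
    static Predicate<Map<String, String>> fieldEquals(String field, String value) {
        return record -> value.equals(record.get(field));
    }
}
```

For example, "segment is premium AND (country is CH OR country is DE)" becomes an AND group whose second member is an OR group.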

Having said all this (and I could continue for a while), I vote to divide the responsibilities between business and IT and have each one do what she or he is best at and enjoys doing most. The ruleengine JaRE I have written allows you to do this. It comes with a web-based interface for the user to manage the rules. The engine can be embedded in your Java code or in your ETL process (there is a plugin for Pentaho PDI), or it can run standalone in server mode. It's all there, free for everyone. Go and download it, test it and contribute to make it better.

Have a look at http://github.com/uwegeercken

Apache Nifi - Starter - Part 2

21/2/2016

So here is the second part of the starter for Apache Nifi.

The sample I worked from retrieves tweets from Twitter, pulls out some attributes and, based on these attributes, decides whether to store the tweet or not.

While discovering and learning, I wanted to build a slightly more complicated flow. Bear with me: "complicated" is of course relative, as I have used Nifi for only two days...

You can see the result below. Here are the key points I wanted to achieve:
- Retrieve tweets only for some selected topics
- Divide tweets into those of personal interest and those that are work related
- Pull out those attributes that I later want to store in files (I do not want the whole tweet with all its attributes)
- Store the Json representation of the tweets in separate folders for private and work related tweets

[Image: the complete Nifi flow]

Apache Nifi works with processors and the connections between them. That's what you see in the flow above. Processors are a sort of puzzle piece: each one does a distinct task, and you connect them together to design a flow.

It starts in the upper left-hand corner. There is a processor named "GetTwitter" and another one next to it. When you select "Configure", you get the dialog displayed below. The properties shown in bold are mandatory: they are the security credentials required to access the tweets (you obtain them from Twitter). I have entered four comma-separated terms under "Terms to Filter on" and selected "Filter Endpoint" for the property "Twitter Endpoint", so that I receive only tweets that contain those terms.

As described above I have two of these "GetTwitter" processors. One retrieves tweets for the terms shown below and the other one for different terms - I wanted to separate work related tweets from private ones.

The screenshot shows the configuration dialog for the selected processor.

[Image: configuration dialog of the "GetTwitter" processor]

Now the question was how to use a common processing flow for both "GetTwitter" processors, without duplicating things, and yet be able to divide the results into separate files later on. I used the "UpdateAttribute" processor. It allows you to assign properties, so I tag my two flows: they get a property named "tweettype" with the value "privatetweet" or "worktweet".

[Image: "UpdateAttribute" processor setting the property "tweettype"]

After this, both paths flow into the "EvaluateJsonPath" processor. This processor pulls some attributes out of the tweet.

[Image: "EvaluateJsonPath" processor configuration]
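A rough Java sketch of what this extraction step does when the results are written to flowfile attributes: each property maps an attribute name to a JsonPath expression. Assumptions: the tweet is already parsed into nested Java maps, only simple `$.a.b` paths are supported, and the example paths (`$.text`, `$.user.screen_name`, `$.user.name`) are illustrative guesses rather than the exact expressions in my flow. `JsonPathSketch` is a hypothetical name, not a NiFi class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of EvaluateJsonPath-style extraction into attributes.
// Assumes the JSON is already parsed into nested Maps and supports only
// simple dotted paths like "$.user.name" (no arrays, no filters).
final class JsonPathSketch {
    private JsonPathSketch() {}

    // Resolves a dotted path; returns null if any step is missing.
    static Object resolve(Map<String, Object> json, String path) {
        if (!path.startsWith("$.")) {
            throw new IllegalArgumentException("Only simple $.a.b paths are supported: " + path);
        }
        Object current = json;
        for (String key : path.substring(2).split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Applies every attribute -> path mapping; in this sketch a missing
    // path becomes an empty string.
    static Map<String, String> extract(Map<String, Object> tweet, Map<String, String> paths) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : paths.entrySet()) {
            Object value = resolve(tweet, e.getValue());
            attributes.put(e.getKey(), value == null ? "" : value.toString());
        }
        return attributes;
    }
}
```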

Next comes a "RouteOnAttribute" processor. It checks whether the tweet actually has a message, so tweets without a message (an empty message) are dropped. It uses the Nifi Expression Language for the evaluation.

[Image: "RouteOnAttribute" processor checking for an empty message]
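The empty-message check can be mirrored in plain Java. In Nifi, an expression along the lines of `${twitter.msg:isEmpty():not()}` would express it; the exact expression in my flow is not shown here, so treat that as an assumption. Nifi's `isEmpty` also treats whitespace-only values as empty, which the sketch copies. `MessageFilter` is a hypothetical name, and flowfiles are represented simply as attribute maps.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the "drop tweets without a message" decision.
final class MessageFilter {
    private MessageFilter() {}

    // Null, empty or whitespace-only counts as "no message".
    static boolean hasMessage(Map<String, String> attributes) {
        String msg = attributes.get("twitter.msg");
        return msg != null && !msg.trim().isEmpty();
    }

    // Keeps only the flowfiles (here: attribute maps) that carry a message.
    static List<Map<String, String>> keepWithMessage(List<Map<String, String>> flowFiles) {
        return flowFiles.stream()
                .filter(MessageFilter::hasMessage)
                .collect(Collectors.toList());
    }
}
```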

Twitter tweets, in the form of Json data, contain a lot of information, most of which I don't want to store. I am only interested in the information about the user and the message itself, so I had to find a way to eliminate the rest of the information in the flow.

The "AttributesToJSON" processor allows me to do this. For the property "Attribute List" I specified the attributes I wanted to keep. It is important to set the property "Destination": by setting it to "flowfile-content", I overwrite the content of the Json file (the tweet) with these attributes.

[Image: "AttributesToJSON" processor configuration]
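A Java sketch of the effect of this step: only the listed attributes are written out as a flat Json object, which then replaces the original tweet as the flowfile content. `AttributesToJsonSketch` is a hypothetical name. In this sketch the key order follows the attribute list and a missing attribute becomes an empty string; the processor's own ordering and null handling may differ.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build a flat JSON object from selected attributes,
// roughly what Destination = flowfile-content produces.
final class AttributesToJsonSketch {
    private AttributesToJsonSketch() {}

    static String toJson(Map<String, String> attributes, List<String> attributeList) {
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < attributeList.size(); i++) {
            String name = attributeList.get(i);
            String value = attributes.getOrDefault(name, "");
            if (i > 0) {
                json.append(',');
            }
            json.append(quote(name)).append(':').append(quote(value));
        }
        return json.append('}').toString();
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }
}
```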

My main logic is set up by now. Next I want to store the results in two separate folders, one for private tweets and one for work related tweets. I evaluate the property "tweettype", which I assigned earlier, and route the data based on its value.

I call the new property "private" and use a Nifi expression to check whether "tweettype" contains the value "privatetweet". I use the "RouteOnAttribute" processor for this.

[Image: "RouteOnAttribute" processor with the property "private"]

The property "private" results in a true or false condition, and when I connect the processor I can route the results based on it. If the property "private" is true, the result is routed to the "PutFile: Private" processor; if not, it is routed to the "PutFile: Work" processor.

[Image: routing to the "PutFile: Private" and "PutFile: Work" processors]
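The tagging done by "UpdateAttribute" and the routing on "tweettype" can be sketched together in Java. In Nifi the property "private" would hold an expression along the lines of `${tweettype:contains('privatetweet')}` (an assumption, since the screenshot is not reproduced here); matching flowfiles leave through the "private" relationship and all others through "unmatched". `TweetRouter` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of tagging (UpdateAttribute) and routing (RouteOnAttribute).
final class TweetRouter {
    private TweetRouter() {}

    // Mirrors UpdateAttribute: each GetTwitter branch stamps its flowfiles.
    static Map<String, String> tag(Map<String, String> attributes, String tweetType) {
        Map<String, String> copy = new HashMap<>(attributes);
        copy.put("tweettype", tweetType);
        return copy;
    }

    // Mirrors the "private" property: returns the relationship the flowfile takes.
    static String route(Map<String, String> attributes) {
        String type = attributes.getOrDefault("tweettype", "");
        return type.contains("privatetweet") ? "private" : "unmatched";
    }
}
```

Tagging early and routing late is what lets both Twitter sources share one processing path.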

The "PutFile" processor saves the file to a given folder. I have set up two folders, one for the work related tweets and one for private ones. The property "Directory" defines where the file is stored.

[Image: "PutFile" processor configuration]
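For completeness, a Java sketch of what each "PutFile" processor does here: write the flowfile content into the configured directory. `PutFileSketch`, the folder and the file names are placeholders. Nifi's PutFile takes the file name from the flowfile's "filename" attribute and has its own settings for missing directories and name conflicts; this sketch simply creates the folder and overwrites an existing file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of writing flowfile content into a target directory.
final class PutFileSketch {
    private PutFileSketch() {}

    static Path put(Path directory, String fileName, String content) throws IOException {
        Files.createDirectories(directory);  // make sure the target folder exists
        Path target = directory.resolve(fileName);
        // Files.write creates the file or truncates and overwrites an existing one.
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        return target;
    }
}
```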

That's it. To summarize: I retrieve private and work related tweets, extract the information I am interested in and store it in different folders.

I can now run my process and after a short time the files appear in my output folders. Here is an example:

{
  "twitter.handle": "vilo836",
  "twitter.msg": "RT @couchbase: Independent benchmark of #Couchbase and #MongoDB: https://t.co/705tg8fZU4",
  "twitter.user": "victor lopez"
}

Apache Nifi is a flow management tool. It lets you easily design data flows, monitor their execution and queues, see what data went where and change the flow in real time. I recommend spending some time with the tool: it has unique features, runs out of the box and is well documented. My immediate reaction when I started to understand Nifi was: "why don't we do that?".

Hope you enjoyed the post. Carpe diem.

Apache Nifi - Starter - Part 1

21/2/2016

Two days ago I saw a tweet about Apache Nifi, got curious and had a deeper look into it. It immediately looked interesting to me and so I spent a couple of hours understanding the basics. I read through the excellent documentation and different posts. As with any new tool, at the beginning there are many open questions.

As I like to play around and discover (learn by doing), I downloaded a so-called template containing a sample flow that processes Twitter posts, and used it to slowly understand how one can use Apache Nifi.

So the first thing was to find out what the sample flow does: it gets tweets from Twitter, pulls some attributes out of the Json representation and, based on a decision made on those attributes, writes out the Json file.

This was a good starter for me: not too complex and something I can easily redo to learn more and discover. So I changed and extended the dataflow a bit while I was discovering the flow and while many questions came up in my mind.

I'm not gonna write here about how to use Nifi - there are a couple of really good resources: One is the documentation on http://nifi.apache.org. There are also some good video tutorials on Youtube from KISSTechDocs: https://www.youtube.com/watch?v=LGXRAVUzL4U&list=PLHre9pIBAgc4e-tiq9OIXkWJX8bVXuqlG - I can recommend those to get started. The other place I found with interesting articles is: http://www.nifi.rocks/

So hang on and read the second part to see how I developed the sample Nifi flow into a more complex one.

Ruleengine: Application mode versus Server mode

10/2/2016

There are two distinct modes available with the Ruleengine JaRE:

1) Application mode:

Run the ruleengine standalone, that is, embedded in your Java application. Of course you can also use other JVM languages such as Groovy.

If you use the Pentaho ETL Tool (PDI aka Kettle) you can use the ruleengine plugin to run the ruleengine inside your transformation.

In this mode all results and logging are available inside the application that uses the ruleengine. Of course it also uses the CPU and memory of the client's machine.

2) Server mode:

When you run the ruleengine in server mode, it runs on a defined host (which can of course also be virtual or containerized) and a defined port. Clients connect to the ruleengine using a socket connection.

If you use the Pentaho ETL Tool, you can use the ruleengine client plugin to connect to the host that runs the ruleengine. In this case, compared to application mode, the ruleengine returns only the minimum required result details to the client; the full details are kept and logged on the server.

And of course the ruleengine is using the resources of the server for the business rules computations and not the resources of the client.
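The server-mode round trip described above can be sketched with plain Java sockets: the client sends a record, the server evaluates the rule with its own resources and returns only a minimal result. The line protocol ("amount=1500" in, "PASSED" or "FAILED" out), the single hard-coded rule and the class name `RuleServerSketch` are all hypothetical; this is not JaRE's actual protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of server mode: a made-up line protocol, not JaRE's.
final class RuleServerSketch {
    private RuleServerSketch() {}

    // Server side: accepts one connection, evaluates the rule locally,
    // and sends back only the outcome.
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            boolean passed = evaluate(in.readLine());
            // Detailed results would be logged on the server here; the sketch skips that.
            out.println(passed ? "PASSED" : "FAILED");
        }
    }

    // The hypothetical business rule: "amount" must be greater than 1000.
    static boolean evaluate(String request) {
        if (request == null) {
            return false;
        }
        String[] kv = request.split("=", 2);
        if (kv.length != 2 || !kv[0].equals("amount")) {
            return false;
        }
        try {
            return Double.parseDouble(kv[1]) > 1000;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Client side: connects to host and port, sends one record, reads the result.
    static String evaluateRemotely(String host, int port, String record) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(record);
            return in.readLine();
        }
    }
}
```

The client never loads the rules; it only needs host, port and the record, which is what keeps the computation on the server.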

Both modes have their advantages and you can freely choose the one that better fits your needs. The ruleengine is lightweight, and the rule maintenance web application makes it easy to construct complex rule logic and groupings of rules. All free software waiting for you to give it a try.

uwe geercken, 2014-2020