- Data from geonames.org
- Apache Hadoop - runs the mapreduce task on the data
- JaRE - my Java ruleengine, used to change the business logic dynamically
- Groovy script - formats the mapreduce result and merges it with an Apache Velocity (HTML) template
- Highcharts - shows the results in a web browser
The idea is to process data and then display it on a web page, as shown below. I do not want to hardcode the logic that decides which placenames are displayed on the web page. Instead, I use the Business Rules Maintenance Tool - a web application - to centrally define the business logic of what data to process.
The types of placenames displayed above, per country, are:
GRVC = coconut grove - a planting of coconut trees
GRVO = olive grove - a planting of olive trees
GRVP = palm grove - a planting of palm trees
GRVPN = pine grove - a planting of pine trees
On the Hadoop mapreduce side I have created a process that is generic in the sense that it processes the data, but does not itself define which data is processed. The map task includes a reference to the JaRE ruleengine. The ruleengine determines which rows of data should be considered and which should be skipped, so the rule logic is maintained outside the code.
Here is a snippet from the code:
The other part of the puzzle is the business rules themselves. I have defined a project named "Mapreduce Countries" in the Business Rules Maintenance Tool - a web application that is freely available.
So one can say that in this case the ruleengine and the rules are used as a simple filter mechanism that decides whether or not to process a row of data.
Once the business logic is defined, you can export the complete project to a single file. This file is then used by the ruleengine, which is hooked into the mapreduce job.
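The filter idea can be sketched in plain Java, without Hadoop and without the real JaRE API. Everything in this sketch is an assumption made for illustration: the class and method names are hypothetical, the rule is a hardcoded predicate standing in for the rules that would normally be loaded from the exported project file, and the rows are assumed to be tab-delimited with the feature code in the eighth column and the country code in the ninth.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

class RowFilterSketch {
    // Hypothetical stand-in for the externally maintained rules.
    static final Set<String> GROVE_CODES = Set.of("GRVC", "GRVO", "GRVP", "GRVPN");

    // A "rule" is simply a predicate on the fields of one row.
    static Predicate<String[]> groveRule() {
        return fields -> fields.length > 8 && GROVE_CODES.contains(fields[7]);
    }

    // Generic processing: counts rows per country, but only rows the rule accepts.
    static Map<String, Integer> countByCountry(List<String> lines, Predicate<String[]> rule) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t", -1); // -1 keeps empty columns
            if (rule.test(fields)) {
                counts.merge(fields[8], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not real geonames records.
        List<String> lines = List.of(
            "1\tOlive Hill\tOlive Hill\t\t37.0\t23.0\tV\tGRVO\tGR",
            "2\tOld Grove\tOld Grove\t\t38.0\t22.0\tV\tGRVO\tGR",
            "3\tCoco Farm\tCoco Farm\t\t14.0\t121.0\tV\tGRVC\tPH",
            "4\tSome Town\tSome Town\t\t37.9\t23.7\tP\tPPL\tGR");
        System.out.println(countByCountry(lines, groveRule())); // prints {GR=2, PH=1}
    }
}
```

The counting code never mentions a feature code. Swapping the predicate changes what is processed without touching the processing logic, which is the role the ruleengine plays in the map task.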
The next step is to run the mapreduce process. It processes the data from the geonames data file (CSV) and creates the result. The result is then copied from HDFS to the local filesystem. At this point a Groovy script runs, which formats the data so that Highcharts can display it. Highcharts makes it easy for developers to set up interactive charts in their web pages. To produce the chart I use an Apache Velocity template. This is a blueprint of an HTML page that contains placeholders instead of the data. Groovy formats the data and then merges it with the Velocity template. The result is a web page as shown at the top of this article.
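To make the formatting and merging step more concrete, here is a rough sketch in Java (the actual step uses a Groovy script and Apache Velocity). It assumes the mapreduce result consists of lines with a country code and a count separated by a tab, and it uses plain string replacement as a stand-in for Velocity. The placeholder names `${categories}` and `${data}` and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ChartPageSketch {
    // Turns "country<TAB>count" lines into two JavaScript array literals
    // and fills them into the placeholders of a template.
    static String render(List<String> resultLines, String template) {
        List<String> categories = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (String line : resultLines) {
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // skip lines that do not match the assumed layout
            }
            categories.add("'" + parts[0] + "'");
            values.add(String.valueOf(Integer.parseInt(parts[1].trim())));
        }
        return template
            .replace("${categories}", "[" + String.join(", ", categories) + "]")
            .replace("${data}", "[" + String.join(", ", values) + "]");
    }

    public static void main(String[] args) {
        // A fragment of a chart configuration with placeholders instead of data.
        String template = "xAxis: { categories: ${categories} }, series: [{ data: ${data} }]";
        System.out.println(render(List.of("GR\t2", "PH\t1"), template));
        // prints xAxis: { categories: ['GR', 'PH'] }, series: [{ data: [2, 1] }]
    }
}
```

Keeping the page layout in a template and the numbers in the job output means the chart can be redesigned without rerunning the job, and the job can be rerun without editing the page.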
That's it! Now I can change the business logic to display different data without touching my mapreduce code. You can create very complex business rule logic. For example, you can find data rows with the placenames mentioned above, but only for selected countries, or within a specific geofence, or depending on the timezone. The web tool lets you build logic of this kind by combining rules and subgroups using "and" and "or" conditions.
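Combining rules and subgroups with "and" and "or" can be pictured with Java's standard `Predicate` composition. This is only an illustration of the idea and not how the ruleengine is implemented. The `Place` class, the country list and the geofence coordinates are made up for the example.

```java
import java.util.Set;
import java.util.function.Predicate;

class RuleGroupSketch {
    // Hypothetical row type holding only the fields the rules look at.
    static final class Place {
        final String featureCode;
        final String country;
        final double lat;
        final double lon;

        Place(String featureCode, String country, double lat, double lon) {
            this.featureCode = featureCode;
            this.country = country;
            this.lat = lat;
            this.lon = lon;
        }
    }

    static Predicate<Place> isGrove() {
        Set<String> codes = Set.of("GRVC", "GRVO", "GRVP", "GRVPN");
        return p -> codes.contains(p.featureCode);
    }

    static Predicate<Place> inCountries(Set<String> countries) {
        return p -> countries.contains(p.country);
    }

    // A simple rectangular geofence.
    static Predicate<Place> inGeofence(double minLat, double maxLat, double minLon, double maxLon) {
        return p -> p.lat >= minLat && p.lat <= maxLat && p.lon >= minLon && p.lon <= maxLon;
    }

    // Rule "is a grove" AND subgroup (selected countries OR inside the geofence).
    static Predicate<Place> combined() {
        Predicate<Place> subgroup = inCountries(Set.of("GR", "IT")).or(inGeofence(0, 20, 90, 130));
        return isGrove().and(subgroup);
    }

    public static void main(String[] args) {
        System.out.println(combined().test(new Place("GRVO", "GR", 37.0, 23.0)));   // true: grove in a selected country
        System.out.println(combined().test(new Place("GRVC", "PH", 14.0, 121.0)));  // true: grove inside the geofence
        System.out.println(combined().test(new Place("GRVP", "BR", -10.0, -50.0))); // false: grove, but neither condition holds
    }
}
```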
In the rule logic you can evaluate the data using checks such as: "is equal", "is greater/smaller", regular expressions, mathematical calculations, the soundex algorithm, checks whether the data is not null, not empty, is in a list or is between certain values, and much more.
And - very important - when running the data through the ruleengine, you can also update the data. You can apply actions that perform calculations on or modifications to the data, such as: mathematical calculations (plus, minus, multiply, divide, sin, cos, tan, modulo, square root, etc.), set values, sum field values, upper-/lowercase, percentage, substring, append/prepend values and more.
The ruleengine is extensible: you can create additional checks that are used to evaluate the data, and you can create additional actions that modify the data according to your needs.
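The relationship between checks and actions can be sketched as follows: a check evaluates a row, and an action modifies the row only if the check passes. The names below are hypothetical and do not reflect the ruleengine's actual interfaces. A regular expression check and an uppercase action serve as examples.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

class CheckActionSketch {
    // Check: the given field is present and matches the regular expression.
    static Predicate<Map<String, String>> matches(String field, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return row -> row.get(field) != null && pattern.matcher(row.get(field)).matches();
    }

    // Action: returns a copy of the row with the given field in upper case.
    static UnaryOperator<Map<String, String>> upperCase(String field) {
        return row -> {
            Map<String, String> copy = new HashMap<>(row);
            copy.computeIfPresent(field, (key, value) -> value.toUpperCase(Locale.ROOT));
            return copy;
        };
    }

    // Applies the action only to rows that pass the check.
    static Map<String, String> apply(Map<String, String> row,
                                     Predicate<Map<String, String>> check,
                                     UnaryOperator<Map<String, String>> action) {
        return check.test(row) ? action.apply(row) : row;
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("name", "Olive Grove");
        Map<String, String> result = apply(row, matches("name", ".*Grove"), upperCase("name"));
        System.out.println(result.get("name")); // prints OLIVE GROVE
    }
}
```

In this picture, a new check is one more predicate and a new action is one more function on a row, so the set of available checks and actions can grow without changing the code that applies them.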
If you think about this setup, you will see that it separates the IT code from the business logic. The two are not mixed, as is so often the case. Because they are separated, IT experts can manage the mapreduce job while business experts maintain the business rules logic. This is a clear separation of responsibilities and makes the IT code cleaner. A major benefit for agility and quality!
The ruleengine and the web application are open source. So go ahead and integrate them into your Java projects, mapreduce tasks or your web application. Everything is available on GitHub, including documentation, presentations and examples.
Carpe diem.
Uwe