As I have argued in my previous posts, it is never a good idea to mix your IT code with the business rules. That's what the ruleengine is for: you define and maintain the business rules outside of your IT code. If the rules change, you change them in a central web tool and you don't have to touch your IT code. In your code you simply reference the ruleengine and the file that contains all business rules. That keeps the code clearer, divides responsibilities properly (rules managed by the business, IT code managed by IT), makes IT code changes more agile and ultimately improves the overall quality.
As the ruleengine is written in Java, it is a natural fit for Apache Hadoop. You could also update or manipulate the data using the ruleengine, but for now the goal is simply to filter data. As rows of data are processed by the mapreduce job, the ruleengine runs against each row and filters out those rows that are not applicable according to the business rules.
For example, you have a large file with data for many different customers and you want to mapreduce the data for only a few of them. Or, as another example, you have data from various websites and you want to filter out data that does not fulfill certain requirements such as type of web browser, URL or origin of the web page.
Instead of (hard-)coding these rules in your mapreduce job you would tackle the task as follows:
- Use the Business Rules Web tool to define the business rules. This can be arbitrarily complex logic. The tool helps you to define single rules and combine them into groups of rules.
- Export the business rules (project) from the web tool.
- In your mapreduce job, reference the business rules engine and the exported file from the previous step by adding a few lines of code.
- Depending on the result of the ruleengine for the current row (passed or failed), either add the row to the context of the map part of the mapreduce job or skip it.
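To make the last step concrete, here is a minimal sketch of the filtering logic in plain Java, so it runs without Hadoop on the classpath. It does not use the ruleengine's actual API: the `Predicate` returned by the hypothetical `customerRule` method only stands in for the pass/fail result the ruleengine would produce from the exported rules file, and the class and method names are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Illustration only: a plain predicate stands in for the ruleengine here.
final class RuleFilterSketch {

    // Assumed example rule: a row passes if its first comma-separated field
    // (the customer id) is in the allowed set. In the approach described
    // above, this decision would come from the exported rules file instead.
    static Predicate<String> customerRule(Set<String> allowedCustomers) {
        return row -> {
            String[] fields = row.split(",", -1);
            return allowedCustomers.contains(fields[0].trim());
        };
    }

    // Mirrors the map step: rows that pass are emitted, all others are dropped.
    // In a Hadoop Mapper, output.add(row) would be a context.write(...) call.
    static List<String> filterRows(List<String> rows, Predicate<String> rules) {
        List<String> output = new ArrayList<>();
        for (String row : rows) {
            if (rules.test(row)) {
                output.add(row);
            }
        }
        return output;
    }
}
```

Called with a handful of rows and `customerRule` for a single customer, `filterRows` returns only that customer's rows. The point of the design is that the mapper only ever sees "passed" or "failed"; what that means is decided elsewhere.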
If you now need to change which data is filtered (the business comes with new requirements...), simply go back to the web tool and change the business rules (ideally have the business construct the rule logic). Then export them again and re-run the mapreduce job.
The creation of the rules file could also be automated, so that a new file is created and distributed at regular intervals or based on a certain trigger or condition.
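One way such a regular refresh might look is sketched below. This is only an assumption about how you could wire it up: the `exporter` supplier is a hypothetical placeholder for whatever produces the exported rules, the file is assumed to live on a local or shared filesystem (copying it to HDFS is left out), and the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustration only: publishes a freshly exported rules file on a schedule.
final class RulesFilePublisher {

    // Write to a temporary file in the same directory, then move it over the
    // target, so a job that starts during the update does not read a
    // half-written file. Whether ATOMIC_MOVE replaces an existing target is
    // implementation specific according to the Java documentation.
    static void publish(Path target, byte[] exportedRules) {
        try {
            Path dir = target.toAbsolutePath().getParent();
            Path tmp = Files.createTempFile(dir, "rules-", ".tmp");
            Files.write(tmp, exportedRules);
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical scheduling helper: 'exporter' stands in for the export step.
    // Note that a task that throws cancels all later runs of the schedule.
    static ScheduledExecutorService publishEvery(long minutes, Path target,
                                                 Supplier<byte[]> exporter) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            () -> publish(target, exporter.get()), 0, minutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

The caller keeps the returned scheduler and calls `shutdown()` on it when the refresh is no longer needed. A trigger-based variant would simply call `publish` from whatever event signals that the rules have changed.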
All parts, the web tool and the rule engine, are open source, so you can freely use them in your Java based projects. Go ahead and give it a try. Using the ruleengine will make your IT life easier and your code clearer, and the business user has a central location in which to manage or review the rules. In the web tool, the business user is NOT confronted with your IT code, and it will be much easier for her/him to understand what the rules do because she/he is not distracted or discouraged by a mixture of IT code and business rules. This is much more transparent to the user.
Carpe Diem.
Uwe