Below you see the output (preview) of a transformation running the rule engine. The input data is a list of airports with their names and elevations. In the background you see a rule group with two rules, connected with an "and" condition. The first rule checks that Field_001 (the first field in the CSV file) does not equal "KABI". The second rule checks that the elevation of the airport (Field_002) is greater than or equal to zero.
The preview window shows, for each row, how many groups passed or failed and how many rules passed or failed. Because rules and groups can be connected using "and" and "or", it may happen that some rules fail but the group as a whole passes. Based on the results of the rules you can apply one or more actions to the row, such as updating a field or calculating a value and writing it to another field.
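To make the pass/fail counting concrete, here is a minimal sketch in Python. It is not the ruleengine's own API (the ruleengine is a Java library). The names `Rule`, `RuleGroup`, `summarize`, `apply_action` and the `Status` field are hypothetical and exist only for this illustration, and the sample row uses made-up values. It shows how the same two airport rules can fail as an "and" group but pass as an "or" group.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class Rule:
    """A single check against one row (hypothetical structure)."""
    name: str
    check: Callable[[Row], bool]


@dataclass
class RuleGroup:
    """Rules connected with either "and" or "or"."""
    name: str
    rules: List[Rule]
    connector: str = "and"

    def evaluate(self, row: Row) -> Tuple[bool, List[bool]]:
        results = [rule.check(row) for rule in self.rules]
        if self.connector == "and":
            passed = all(results)   # every rule must pass
        else:
            passed = any(results)   # one passing rule is enough
        return passed, results


def summarize(row: Row, groups: List[RuleGroup]) -> Dict[str, int]:
    """Count passed/failed groups and rules for one row."""
    stats = {"groups_passed": 0, "groups_failed": 0,
             "rules_passed": 0, "rules_failed": 0}
    for group in groups:
        group_passed, rule_results = group.evaluate(row)
        stats["groups_passed" if group_passed else "groups_failed"] += 1
        stats["rules_passed"] += sum(rule_results)
        stats["rules_failed"] += len(rule_results) - sum(rule_results)
    return stats


def apply_action(row: Row, stats: Dict[str, int]) -> Row:
    """Example action: write the outcome into a new field of the row."""
    updated = dict(row)
    updated["Status"] = "ok" if stats["groups_failed"] == 0 else "rejected"
    return updated


# The two rules described in the text above.
RULES = [
    Rule("Field_001 is not KABI", lambda row: row["Field_001"] != "KABI"),
    Rule("Field_002 is >= 0", lambda row: row["Field_002"] >= 0),
]
AND_GROUP = RuleGroup("airport checks (and)", RULES, "and")
OR_GROUP = RuleGroup("airport checks (or)", RULES, "or")

# Made-up sample row: the first rule fails, the second passes.
SAMPLE_ROW = {"Field_001": "KABI", "Field_002": 1791}
```

Calling `summarize(SAMPLE_ROW, [AND_GROUP])` counts one failed group, while `summarize(SAMPLE_ROW, [OR_GROUP])` counts one passed group although one rule still failed. This is the situation described above where some rules fail but the group as a whole passes.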
The ruleengine step is a regular step for Pentaho PDI. You simply drag it onto the canvas like any other step and connect it to the stream.
Rules are defined in external files. When you run PDI, the rules are parsed and executed against each row of the dataset. Rules are combined into rule groups, which lets you connect them flexibly using "and" or "or" conditions. Many checks you might want to run against the data are predefined: equality, greater than, less than, not null, is in list, is null, is empty, etc. There are also other checks, such as regular expressions or soundex. A check compares two fields against each other or a field against a desired value.
The ruleengine step can output data to one or two output steps. The main output step adds a couple of fields to the stream showing the statistics of the results of the ruleengine: how many rules failed, how many rule groups failed, etc. The rule results step shows all the details of what happened when the rules were executed. For each row and rule an output row is created, so that you can see in detail which rules failed and which ones passed.
Additionally, rule results can trigger actions. For example, you can concatenate fields, set field values, or prepend data to fields and then update another field.
The ruleengine is open source and free. It is also extensible: you can write your own actions if you need more complex ones, or define other checks. Because it is written in Java, you have all ...
Do you find that your ETL processes in Pentaho PDI contain a lot of logic in various places? Do you sometimes feel lost about where to search for the logic you configured inside transformations? Do users keep calling you to change how data is processed because things have changed?
There is a solution to these problems. Define all those rules externally, outside PDI, and use the ruleengine step available in the marketplace to process the rules inside the transformations. The advantage? The rules are no longer part of the transformation and can be maintained by, for example, the business user, who knows the rules best. If changes have to be made, the users can make them and you don't have to touch your code (the transformation). This lets you centralize the rules and logic, which is good for ETL processes, interfaces or other processing tasks. Use the ruleengine to test or check your data, to filter data that you want or do not want to keep, and to modify the data based on the results of the rules. And keep in mind that all rules are outside PDI: changes to the rules will not require you to change the transformation. Apart from that, the ruleengine can also be used outside of PDI. It is a standard Java library which can be embedded in websites and applications and can run as a server/client technology, also inside a Docker container.
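The idea of keeping rules outside the processing code can be sketched in a few lines of Python. This is an illustration only: the JSON layout, the check names and the functions `load_rules`, `run_rules` and `keep_passing` are assumptions made for this example and are not the ruleengine's actual file format or API. The rules are given as a string here so the sketch is self-contained; in practice they would live in a separate file that the business user maintains.

```python
import json
import operator
import re

# Hypothetical rule definitions; in practice this text would be read from a file.
RULES_JSON = """
[
  {"name": "not_kabi", "field": "Field_001",
   "check": "not_equal", "expected": "KABI"},
  {"name": "elevation_non_negative", "field": "Field_002",
   "check": "greater_or_equal", "expected": 0}
]
"""

# A small registry of predefined checks: each compares a field value
# against a desired value.
CHECKS = {
    "equal": operator.eq,
    "not_equal": operator.ne,
    "greater_or_equal": operator.ge,
    "less_than": operator.lt,
    "matches": lambda value, pattern: re.fullmatch(pattern, str(value)) is not None,
}


def load_rules(text):
    """Parse the external rule definitions."""
    return json.loads(text)


def rule_passes(row, rule):
    """Run one rule against one row."""
    check = CHECKS[rule["check"]]
    return check(row[rule["field"]], rule["expected"])


def run_rules(rows, rules):
    """Detail output: one result record per row and rule."""
    details = []
    for row_number, row in enumerate(rows, start=1):
        for rule in rules:
            details.append({"row": row_number,
                            "rule": rule["name"],
                            "passed": rule_passes(row, rule)})
    return details


def keep_passing(rows, rules):
    """Filter: keep only rows for which every rule passes."""
    return [row for row in rows
            if all(rule_passes(row, rule) for rule in rules)]


# Made-up sample rows for the illustration.
SAMPLE_ROWS = [
    {"Field_001": "KABI", "Field_002": 1791},
    {"Field_001": "EDDF", "Field_002": 364},
]
```

With this separation, changing `"expected": "KABI"` to another value only touches the rule definitions. The functions that process the rows stay the same, which is the point made above about not having to change the transformation.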
Author: Uwe Geercken
September 2020