Reusing Transforms with Different Data in Pentaho Kettle Data Integration

I am working with Pentaho Kettle (PDI) and I am trying to build a flow made of multiple transformations that work like functions. To be more specific: I have created several transformations that each modify one field of a CSV file. The first transformation changes values only in the first column, the second works on another column, and so on. Since creating each transformation from scratch every time is wasted effort, I would like them to be reusable in other jobs/transformations that handle the same kind of values. For example, I have created a transformation that improves the quality of phone numbers (and many others). Here is the general idea of the workflow:

[Screenshot: overall workflow]

My problem is how to pass data through the transformations. At the moment, each transformation reads its data from the result using the "Get rows from result" step and, after making its changes, writes the data back to the result using the "Copy rows to result" step. This is just a sample (of course, the real transformations are more complicated than this):

[Screenshot: sample transformation]

As you probably know, the incoming fields have to be declared in "Get rows from result", so if I want to use this transformation in another job/transformation that works with a different file, I need to change the field list in the "Get rows from result" step.

Maybe there is another way to move the data stream that is easier than this. I have also looked at using parameters, but I don't know whether they can be populated from fields coming from the result rows. That leads to another question: is the result the only way to return values from a transformation?

I also considered running all the transformations in parallel inside one transformation, passing each of them only the key and the value of interest, and then merging the individual fields back together with a join step. But this also raises a synchronization problem. Does anyone know a good way to solve this? I think there must be a standard way of doing it.
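The parallel idea above can be sketched in plain Python. This is only an analogy, not PDI code: `split_field`, `join_on_key`, and the sample rows are hypothetical names invented for illustration. Each branch receives only the key and one field, and the join matches rows by key, so the order in which a branch delivers its rows does not matter.

```python
def split_field(rows, key, field):
    """Keep only the key and the one field a branch needs."""
    return [{key: row[key], field: row[field]} for row in rows]


def join_on_key(rows, key, *branches):
    """Merge the branch outputs back into the original rows, matching on key."""
    by_key = {row[key]: dict(row) for row in rows}
    for branch in branches:
        for partial in branch:
            by_key[partial[key]].update(partial)
    # Return rows in the original input order.
    return [by_key[row[key]] for row in rows]


rows = [
    {"id": 1, "name": " alice ", "phone": "555-0100"},
    {"id": 2, "name": "BOB", "phone": "555 0199"},
]

# Branch 1: tidy up names only.
names = [
    {"id": r["id"], "name": r["name"].strip().title()}
    for r in split_field(rows, "id", "name")
]

# Branch 2: tidy up phone numbers only.
phones = [
    {"id": r["id"], "phone": r["phone"].replace("-", "").replace(" ", "")}
    for r in split_field(rows, "id", "phone")
]
phones.reverse()  # simulate a branch that delivers rows in a different order

joined = join_on_key(rows, "id", names, phones)
print(joined)
```

Because the join is done on the key rather than on row position, the branches do not have to stay in step with each other, which is the synchronization concern raised above.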



1 answer


The solution to my problem is based on the "Mapping (sub-transformation)" step. If you are working inside a job, you can call all the transformations from inside another transformation, invoking each of them through a "Mapping" step. Here's a sample:

[Screenshot: main transformation calling the sub-transformations through Mapping steps]

In each of these steps we have to declare the input fields that we want to change. Only those fields need to be passed to the sub-transformation. Here is an example of the Input tab for this step:

[Screenshot: Input tab of the Mapping step]



As you can see, we have to specify the field under the name it has in the main transformation, and we can rename it to match what the sub-transformation expects (in this case, the "phone" field becomes "PHONE"). We also need to specify the output fields in the Output tab, just as we did for the input.

The sub-transformation looks like this:

[Screenshot: the sub-transformation]

To receive the input fields, the sub-transformation must start with a "Mapping input specification" step, and to return the changed fields it must end with a "Mapping output specification" step. In "Mapping input specification" you declare the input fields, which stay the same every time you use this transformation. Any adaptation to these field names is done outside, in the main transformation, so the sub-transformation can be reused without changing anything in it.
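To make the idea concrete, here is a minimal sketch in plain Python. It is an analogy only, not PDI code or a PDI API: `clean_phone_rows` stands in for the sub-transformation, `run_mapping` stands in for the Mapping step, and all names and sample rows are hypothetical. The sub-transformation always works on a field called PHONE, while the caller supplies the renames for the input and output sides.

```python
import re


def clean_phone_rows(rows):
    """Hypothetical sub-transformation: always reads and writes a field named PHONE."""
    cleaned = []
    for row in rows:
        digits = re.sub(r"\D", "", row["PHONE"])  # keep digits only
        cleaned.append({**row, "PHONE": digits})
    return cleaned


def run_mapping(rows, sub_transformation, input_map, output_map):
    """Rename fields on the way in, run the sub-transformation, rename on the way out.

    input_map:  parent field name -> sub-transformation field name
    output_map: sub-transformation field name -> parent field name
    Assumes the sub-transformation returns rows in the same order it received them.
    """
    # Input side: pass only the mapped fields, under the names the sub-transformation expects.
    sub_rows = [
        {sub_name: row[parent_name] for parent_name, sub_name in input_map.items()}
        for row in rows
    ]
    results = sub_transformation(sub_rows)

    # Output side: write the changed fields back under the parent's names.
    merged = []
    for row, result in zip(rows, results):
        out = dict(row)
        for sub_name, parent_name in output_map.items():
            out[parent_name] = result[sub_name]
        merged.append(out)
    return merged


# Two callers with different schemas reuse the same sub-transformation unchanged.
customers = [{"id": 1, "phone": "(555) 010-2030"}]
suppliers = [{"code": "S1", "contact_number": "555.010.9999"}]

clean_customers = run_mapping(
    customers, clean_phone_rows, {"phone": "PHONE"}, {"PHONE": "phone"}
)
clean_suppliers = run_mapping(
    suppliers, clean_phone_rows, {"contact_number": "PHONE"}, {"PHONE": "contact_number"}
)
print(clean_customers)
print(clean_suppliers)
```

Only the two rename dictionaries differ between the callers, which mirrors how the Input and Output tabs absorb the schema differences while the sub-transformation itself stays untouched.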
