How do I store a single line variable for use in later lines in a Pentaho teapot?

Question

How do I store a single line variable for use in later lines in a Pentaho teapot?

I need to process a table that has multiple levels of aggregation. This is mostly ok, but in one case I need to use information from the highest aggregation level in combination with information from the next aggregation level. Here's an example:

Title,         Platform,    Quantity, Revenue
"Some title",          ,    100,      1000.00
            , "Platform A", 12,       120.00
            , "Platform B", 20,       200.00
            , "Platform C", 68,       680.00

"Some other title",       , 20,       200.00
            , "Platform B", 20,       200.00
...

The first entry has the title "Some Title" and aggregates for quantity and revenue for all platforms. This is basically what I need to read. However, in one case, say for "Platform B", I need to get specific values for that platform, and I need to know what the header is for those values.

Does anyone know how to do this?

+3

pentaho kettle data-integration pdi

Brian Dec 15. 14 at 13:56

source to share

2 answers

Codek · Answer 1 · 2014-12-16T08:29:57+0000

I raised a jira for this:

http://jira.pentaho.com/browse/PDI-13296

Because it is not immediately obvious directly.

I can only think of doing this in a "modified javascript step" little piece of code that remembers the previous header values and then replaces it when the header is null.

This means your data is always sorted that way, and it also assumes that your first row always has a value.

If you have something that you can group (i.e. some additional attribute that you don't show?), Then you can use the "pass all lines" option in this step and get the value "first is not empty "for group per aggregate for header field.

bcayer · Answer 2 · 2016-05-04T16:43:29+0000

To me, it looks like you are trying to flatten your data (sort of).

I may be wrong, but here's what worked for me with a similar dataset:

(Assuming your data is already fine as shown above)
(Marker) Formula Step: IF ([TITLE] = ""; 1; 0) - These flags are 0 at the beginning of a new group
(Seq) Sequence step: just a sequence starting at 1
(SeqCounter) Add value fields, change the sequence step: use the "Marker" field as the change field.
(Factor) Calculator field: A * B; SeqCounter * Marker
(Group) Calculator field: A + B; Seq + Factor

This SHOULD give you a GROUP field which tells you which group each row belongs to.

Then I just split the data and rejoined it using GROUP as the JOIN field.

I know this is an old post, but since I didn't find a solution when I searched, I thought I'd put it down for others to find.

How do I store a single line variable for use in later lines in a Pentaho teapot?

More articles: