I have to process a spreadsheet that has multiple levels of aggregation within it. Mostly, this is just fine, but in one case, I need to use information from the highest aggregation level in conjunction with information from the next aggregation level. Here's an example:
Title, Platform, Quantity, Revenue
"Some title", , 100, 1000.00
, "Platform A", 12, 120.00
, "Platform B", 20, 200.00
, "Platform C", 68, 680.00
"Some other title", , 20, 200.00
, "Platform B", 20, 200.00
...
The first record has a title, "Some Title", and aggregated quantity and revenue metrics for all platforms. This is mostly what I need to read. However, in one case, say for "Platform B", I need to retrieve the specific values for that platform, and I need to know what the title is for those values.
Anyone know how to get this done?
I raised a jira for this:
http://jira.pentaho.com/browse/PDI-13296
Because it's not immediately obviously straightforward.
I can only think to do it in a "modified javascript step" a tiny snippet of code that remembers previous values of title and then substitutes in when title is null.
That does assume your data is always sorted in this way, and also assumes your first row always has a value.
If you had something you could group by (i.e. some additional attribute you're not showing?) then you could use the "pass all rows" option in that step and get the "first non null" value for the group on the aggregate for the title field.