DataStage Column Generator Stage not working working properly

59 Views Asked by At

I need to add a column with a progressive number to a table with two columns: ID and Amount. The resulting table should be like this:

ID Amount Progressive n
10 10000.00 1
21 9.67 1
21 11.27 2
21 9.46 3
21 12.27 4
21 90780.51 5
21 200.00 6
21 20.18 7
21 1045.04 8
21 3477.98 9
21 454390.00 10
21 24.50 11
21 54.76 12
21 12000.00 13
5 5.99 1
45 160000.00 1
45 29.00 2
45 1.98 3
45 2.56 4
56 659034.90 1
56 12000000.00 2
56 4781.35 3
56 1065.05 4
56 29.50 5
56 92000.00 6
56 10.00 7
56 4.35 8
56 52000000.00 9
56 8763.57 10
56 3065.05 11

But I get the wrong output. Something like this:

ID Amount Progressive n
10 10000.00 1
21 9.67 1
21 11.27 2
21 9.46 3
21 12.27 4
21 90780.51 5
21 200.00 6
21 20.18 7
21 1045.04 8
21 3477.98 9
21 454390.00 10
21 24.50 11
21 54.76 12
21 12000.00 13
5 5.99 1
45 160000.00 1
45 29.00 2
45 1.98 3
45 2.56 4
56 659034.90 5
56 12000000.00 6
56 4781.35 7
56 1065.05 8
56 29.50 9
56 92000.00 10
56 10.00 11
56 4.35 12
56 52000000.00 13
56 8763.57 14
56 3065.05 15

When ID = 56, the progressive number starts with 5, instead of 1.

I tried to add a Sort Stage, ordering the table by ID, but I get almost the same error. It changes the order of the ID column, but it still assign the wrong number

ID Amount Progressive n
5 5.99 1
10 10000.00 1
21 9.67 2
21 11.27 3
21 9.46 4
21 12.27 5
21 90780.51 6
21 200.00 7
21 20.18 8
21 1045.04 9
21 3477.98 10
21 454390.00 11
21 24.50 12
21 54.76 13
21 12000.00 14
45 160000.00 1
45 29.00 2
45 1.98 3
45 2.56 4
56 659034.90 1
56 12000000.00 2
56 4781.35 3
56 1065.05 4
56 29.50 5
56 92000.00 6
56 10.00 7
56 4.35 8
56 52000000.00 9
56 8763.57 10
56 3065.05 11

When ID = 21, the progressive number starts with 2, instead of 1.

I also tried to add the Amount column to the ordering in the Sort Stage, but it just changes the row where the error appears.

It seems to me that the Column Generator Stage gets the partition wrong, but I don't know how to fix it.

Just ask if any more information is required.

1

There are 1 best solutions below

0
Justus Kenklies On

To add a column, I would suggest to use a Transform Stage instead of a Column Generator Stage.

You can use stage variables within the Transformer to generate the values. For simplification, set the transform stage (or the entire job) to run in sequential mode.

The data on the input link needs to be sorted.

If you need to run in parallel, you could use @PARTITIONNUM, @NUMPARTITIONS and @INROWNUM to deal with partitioned processing, but the following should also work in parallel without that, as long as the input link is partitioned on the ID column (only) and of course also sorted.

Example

Add a sort stage followed by a transform stage to your job. in the sort stage, partition and sort the data by ID. Edit the transform stage like this:

Input Link

DSLink2
ID
Amount

Stage Variables

Derivation Stage Variable Initial value
svID svPreviousID (not needed)
DSLink2.ID svID -1
If svID = svPreviousID Then svProgN +1 Else 1 svProgN (not needed)

Output Link

Derivation Column Name
DSLink2.ID ID
DSLink2.Amount Amount
svProgN ProgressiveN

Process Explanation

First Row (ID = 5):

  1. svID is -1 (init. value), setting the svPreviousID to -1.
  2. DSLink2.ID will set svID to 5
  3. The condition svID = svPreviousID is false, thus svProgN is set to 1

Second Row (ID = 10):

  1. svID (from previous step) sets svPrevID to 5
  2. DSLink2.ID sets svID to 10
  3. Condition is false, sets svProgN to svProgN + 1 which is 2

Third row (ID = 20):

  1. svPreviousID := svID (10)
  2. svID := DSLink2.ID (20)
  3. (condition is false) svProgN := 1

Fourth row (ID = 20):

  1. svPreviousID := svID (20)
  2. svID := DSLink2.ID (20)
  3. (condition is true) svProgN := svProgN + 1 ( which is 2)

and so on...