Subfield() for bulk dataset

86 Views Asked by At

I have a field called Subject Error which measures which subjects students earned below passing grade.

Currently, the value spits out with one raw value:

Count of 23: English
Count of 10: Math; Science 
Count of 2: English; Math; Social Studies 

How can I separate semi-colon from each subject so my analysis will show below:

Count of 24: English 
Count of 11: Math
Count of 10: Science
Count of 1: Social Studies 
1

There are 1 best solutions below

0
On

I think this is how you could do it:

/* ==== STEP 1 ==== */
[data]:
Load * Inline [
Subject Error
Count of 23: English
Count of 10: Math; Science 
Count of 2: English; Math; Social Studies
];


[data 1]:
Load
    [Subject Error]
  , [Start]
  , /* ==== STEP 4 ==== */ KeepChar([Start], '0123456789') as [Student Count]
  , /* ==== STEP 5 ==== */ Trim(SubField([End], ';')) as [Subject]
;
Load
    [Subject Error]
  , /* ==== STEP 2 ==== */ Mid([Subject Error], 1, Index([Subject Error], ':')) as [Start]
  , /* ==== STEP 3 ==== */ SubField([Subject Error], ':', 2) as [End]
Resident [data];

Drop Table [data];


/* ==== STEP 7 ==== */
Join ([data 1])
Load
    [Subject]
  , /* ==== STEP 6 ==== */ Sum([Student Count]) as [Total Student Count]
Resident [data 1]
  Group By [Subject]
;


[data 2]:
Load Distinct
    /* ==== STEP 8 ==== */ 'Count of ' & [Total Student Count] & ': ' & [Subject] as [Final]
Resident [data 1];

Drop Table [data 1];

Here's how this works:

  • Step 1: I load the data here. I'm just using an Inline table but you will load your data from a file/database/API presumably.
  • Step 2: I use the Index() function to find where the colon character is in the [Subject Error] field. That result is used in the Mid() function to grab just the Count of n:' part of the field. I name this new field [Start].
  • Step 3: I use the SubField() function to get the list of subjects that come after the colon character in the [Subject Error] field. I name this new field [End].
  • Step 4: Building off of the previous table in a preceding load, I then use the KeepChar() function to just grab the numbers from the Count of n:' string in the [Start] field so that I can Sum() them in Step 6. I name this new field [Student Count].
  • Step 5: I use the SubField() function to create a new record for each subject that gets split out on the semicolon character. Normally, you have the utilize the third parameter of the SubField() function to specify which split-out item to return but in the Data Load Editor you can omit that third parameter, which will result in a new row for each split-out item. (See Script Example 2 on the examples section of the Qlik Help page for this function.) I name this new field [Subject].
  • Step 6: In a separate table load, I use the Sum() aggregation function to sum up the values in the [Student Count] field. I group these summations on the [Subject] field using the Group By clause so that I get a total count of student for each subject. I name this new field [Total Student Count].
  • Step 7: That separate table gets Joined back in to the previous table so that I can use the new [Total Student Count] along with the other fields in the final step.
  • Step 8: Finally, in a new table, I use the ampersand operator & for string concatenation to combine the string Count of , the [Total Student Count] field, the colon character :, and then the [Subject] field. Note that I use the Distinct keyword in the load statement so that I am just getting one row per subject.

That should result in the desired output:

Screenshot of Qlik Sense table output