Merge Rows Based on index Order U-SQL

128 Views Asked by At

I would love if someone could help me with this(updated):

I have a table that looks like this:

 Id      | name |          countryId           Sequence
--------+--------+------------------------+-------------------
 23     | John   |     102                |       1
 45     | Luis   |     102                |       1
 23     | Alex   |     102                |       2
 45     | James  |     102                |       2

And I want to merge similar ids into a single row using the ARRAY_AVG, so I use this query

  SELECT DISTINCT
    b.Id
    String.Join(",", ARRAY_AGG(b.name)) AS names
  FROM table AS b
  GROUP BY b.Id;

However doing this returns the following table but the name has been sorted on ASC

 Id      | names           
--------+--------+
 23     | Alex,John   
 45     | James,Luis   

Since I want to create the names field as the name appears(based on index) so e.g John,Alex. so I tried this:

  SELECT DISTINCT
    b.Id,
    String.Join(",", ARRAY_AGG(b.name).OrderBy(p => b.Sequence)) AS names
  FROM table AS b
  GROUP BY b.Id, b.Sequence;

But I get the results but this time, it is a bit different

 Id      | names           
--------+--------+
 23     | Alex
 23     | John   
 45     | James
 45     | Luis   

The Sequence is ignored and the result isn't desired.

So I want to ask for the use the ARRAG_AVG properly in U-SQL.

1

There are 1 best solutions below

5
iamdave On

Is your dummy data at the start of your question accurate? If so, there is no difference in countryId values so there is no 'definitive' sort order for your name values when using your OrderBy.

Within SQL based languages, there is also no concept of data order when consuming or exporting data besides what is explicity specified in an order by. If you don't have a suitable, definitive sort order as defined by the data, then you cannot guarantee consistency over multiple executions.

In order to get your sort order dependant on the position in the source file, you will need to either augment your source data to include an index value, or create a custom exporter that will scan through the file line by line and add the index on read.


What you have to ask yourself is why you even need to do this in the first place? Concatenating strings like this into the one row is generally bad practice in set based languages and is very rarely appropriate.