How to allow LISTAGG overflow into new record in Snowflake

Question

How to allow LISTAGG overflow into new record in Snowflake

113 Views Asked by Debaser231 At 12 December 2023 at 22:30

I want to aggregate a list of strings together, but I know this list of strings will exceed the limit. This set of strings has an identifier associated with it. So a table might look like this

id | my_strings
______________
x1 | string{1}
x1 | string{2}
.
.
.
x1 | string{N}
x2 | string{1}

What I would love for a result is the following:

id | all_strings
------------------
x1 | string{1},...,string{M}
x1 | string{M+1},...,string{N}
x2 | string{1}

So essentially, if all the strings from 1 to N for the id x1 exceed the limit, it cuts it off at the largest possible M and then makes a new record also with x1 as the id where the "all_strings" column is the rest.

Is there anyway in standard Snowflake SQL code to achieve this? I know that the strings are never identical to one another and but the length of the strings is constant.

Any help would be greatly appreciated.

Original Q&A

There are 2 best solutions below

Simeon Pilgrim On 13 December 2023 at 03:32

using a Javascript user defined table function:

Change the 40 to the cutoff you want.

CREATE OR REPLACE FUNCTION "CHAR_SUM"(INS STRING)
    RETURNS TABLE (BUCKET FLOAT)
    LANGUAGE JAVASCRIPT
    AS '{
    processRow: function (row, rowWriter, context) {
      var nl = row.INS.length > 0 ? 
            (row.INS.length + (this.sum > 0 ? 1: 0)): 0;
      
      if(this.sum + nl > 40){ 
        this.bucket = this.bucket + 1;
        this.sum = 0;
      }
      
      this.sum = this.sum + nl;
      rowWriter.writeRow({BUCKET: this.bucket});
      
    },
    initialize: function(argumentInfo, context) {
     this.bucket = 0;
     this.sum = 0;
    }}';

then we can:

with data as ( 
    select 
        x.id,
        seq8() as seq,
        'string' || seq::text as str
    from table(generator(rowcount=>10))
    cross join (values ('x1'),('x2')) as x(id)
)
select *
from data, TABLE(char_sum(str) over (partition by id order by seq));

now we can to the aggregation, like so:

with data as ( 
    select 
        x.id,
        seq8() as seq,
        'string' || seq::text as str
    from table(generator(rowcount=>10))
    cross join (values ('x1'),('x2')) as x(id)
)
select 
    id,
    listagg(str, ',') within group (order by seq) as batch,
    length(batch) as qa
from data, TABLE(char_sum(str) over (partition by id order by seq))
group by 1, bucket
order by 1, bucket;

**Lukasz Szozda** · Accepted Answer · 2023-12-13T09:38:40.867000

It is possible to add subgroup to determine max number of elements per group:

-- max 5 elements
WITH cte AS (
  SELECT *,  CEIL(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY str) / 5) AS grp
  FROM t
)
SELECT ID, LISTAGG(str, ',') WITHIN GROUP (ORDER BY str) AS all_strings
FROM cte
GROUP BY ID, grp
ORDER BY ID, all_strings;

For input:

CREATE TABLE  t(id INT, str TEXT);

INSERT INTO t(id, str)
VALUES (1,'string01'),(1,'string02'),(1,'string03'),(1,'string04'),(1,'string05'),
(1,'string06'),(1,'string07'),(1,'string08'),(1,'string09'),(1,'string10'),(1,'string11'),
(2,'string12'),(2,'string13'),(2,'string14'),(2,'string15'),(2,'string16'),
(2,'string17'),(2,'string18'),(2,'string19');

Output:

ID  ALL_STRINGS
1   string01,string02,string03,string04,string05
1   string06,string07,string08,string09,string10
1   string11
2   string12,string13,string14,string15,string16
2   string17,string18,string19

How to allow LISTAGG overflow into new record in Snowflake

There are 2 best solutions below

Related Questions in SQL

Related Questions in SNOWFLAKE-CLOUD-DATA-PLATFORM

Related Questions in OVERFLOW

Related Questions in LISTAGG

Trending Questions

Popular # Hahtags

Popular Questions