BigQuery SPLIT() ignores empty values

1.4k Views Asked by At

It appears that SPLIT() treats empty values as though they don't exist at all, which yields unexpected results.

For example:

SELECT
  NTH(3, SPLIT(values, ","))
FROM
  (SELECT "a,b,,d,e" as values)

returns "d", when I would expect it to return NULL. You can see how this would be problematic for several rows of comma-delimited text. One would expect the following query to return NULL and "c", but it doesn't:

SELECT
  NTH(3, SPLIT(values, ","))
FROM
  (SELECT "a,b,,d,e" as values),
  (SELECT "a,,c,d,e" as values)

Rather, it returns "d" and "d".

Is this behavior by design, can it be altered, or is there a better way to do what I'm doing?

2

There are 2 best solutions below

0
On BEST ANSWER

This is By Design behavior, and it is not specific to SPLIT function, but to REPEATED fields in general. BigQuery REPEATED fields cannot store NULLs (same behavior as in protocol buffers), therefore nothing that SPLIT does can make NULLs appear inside REPEATED fields.

2
On

As Mosha explains, obtaining NULLs from a SPLIT() operation is not possible. As an alternative, is getting empty spaces an option?

SELECT
  NTH(7, SPLIT(values, ","))
FROM
(SELECT REGEXP_REPLACE(values, ',,', ', ,') values FROM
  (SELECT "a,b,,d,e,,g" as values),
  (SELECT "a,,c,d,e,f,g" as values),
  (SELECT "a,,c,d,e,f,," as values),
  (SELECT "a,,c,d,e,f," as values),
)

Row f0_  
1   g    
2   g    
3        
4   null