How to select a set of fields from input data as an array of repeated fields in beam SQL

612 Views Asked by At

Problem Statement:

I have an input PCollection with following fields:

{
   firstname_1,
   lastname_1,
   dob,
   firstname_2,
   lastname_2, 
   firstname_3,
   lastname_3,
}

then I execute a Beam SQL operation such that output of resultant PCollection should be like

 ----------------------------------------------
   name.firstname |  name.lastname | dob
 ---------------------------------------------- 
      firstname_1 |  lastname_1    | 202009
      firstname_2 |  lastname_2    | 
      firstname_3 |  lastname_3    |
-----------------------------------------------

To be precise:

array[
    (firstname_1,lastname_1,dob),
    (firstname_2,lastname_2,dob),
    (firstname_3,lastname_3,dob)
]

Here is the code snippet where I execute Beam SQL:

PCollectionTuple tuple=
    PCollectionTuple.of(new TupleTag<>("testPcollection"), testPcollection);

PCollection<Row> result = tuple
    .apply(SqlTransform.query(
        "SELECT array[(firstname_1,lastname_1,dob), (firstname_2,lastname_2,dob), (firstname_3,lastname_3,dob)]"));

I am not getting proper results.

Can someone guide me how to query an array of repeated field in Beam SQL?

2

There are 2 best solutions below

1
Jayadeep Jayaraman On
2
Kenn Knowles On

Your SQL query has a few errors.

  1. You have named the input to the SQL query testPcollection. Your SQL query does not select FROM testPcollection. Let us assume you meant it to be FROM testPcollection.
  2. You use the syntax (firstname_1, lastname_1, doc) in both your expected output and your query. This is not any valid SQL expression.