Can dynamic queries be written in U-Sql

1.2k Views Asked by At

I want to write same query for multiple files. Is this possible to write dynamic query in U-SQL or is there any way to eliminate re-writing of same piece of code like

Select count(*) as cnt from @table1;
Select count(*) as cnt from @table2;

can be replaced to

Select count(*) as cnt from @dynamic

where @dynamic = table1, table2

3

There are 3 best solutions below

3
On

(Azure Data Lake team here)

Your question mentions reading from files, but your example shows tables. If you do really do want to read from files, the EXTRACT statement supports "File Sets" that allow a single EXTRACT statement to read multiple files that are specified by a pattern

@data =
    EXTRACT name string,
            age int,
            FROM "/input/{*}.csv"
    USING Extractors.Csv();

Sometimes, the data needs to include the filename the data came frome, so you can specify it like this:

@data =
    EXTRACT name string,
            age int,
            basefilename string
            FROM "/input/{basefilename}.csv"
    USING Extractors.Csv();
0
On

I use a custom CSV extractor that matches columns to values using the first row in the CSV file.

Here is the Gist to be added as in code behind or as a custom assembly: https://gist.github.com/serri588/ff9e3047d8341398df4aea7557f0a82c

I made it because I have a list of files that have a similar structure, but slightly different columns. The standard CSV extractor is not well suited to this task. Write your EXTRACT with all the possible column names you want to pull and it will fill those values and ignore the rest.

For example:

Table_1 has columns A, B, and C. Table_2 has columns A, C, and D.

I want A, B, and C so my extract would be

EXTRACT 
  A string, 
  B string, 
  C string
FROM "Table_{*}.csv"
USING new yourNamespace.CSVExtractor();

Table 1 will populate all three columns, while Table 2 will populate A and C, ignoring D.

0
On

U-SQL does not provide a dynamic execution mode per se, but it is adding some features that can help with some of the dynamic scenarios.

Today, you have to provide the exact schema for table type parameters for TVFs/SPs, however, we are working on a feature that will give you flexible schema parameters that will make it possible to write a TVF/SP that can be applied to any table shape (as long as your queries do not have a dependency on the shape).

Until this capability becomes available, the suggestions are:

  1. If you know what the possible schemas are: Generate a TVF/SP for each possible schema and call it accordingly.

  2. Use any of the SDKs (C#, PowerShell, Java, Python, node.js) to code-gen the script based on the schema information (assuming you are applying it to an object from which you can get schema information and not just a rowset expression).