Config file for input and output folder location

511 Views Asked by At

I have multiple U-SQL scripts and I am using filename variable at the top of each U-SQL script. Is there any way we can define input and output folder to any config file and read that variable, constant or any thing to use them with Extract and OUTPUT

@tab1 = 
    EXTRACT f1 string,
            f2 string
    FROM **<path from config file>**
    USING Extractors.Csv(skipFirstNRows:1);
2

There are 2 best solutions below

0
On

It is possible to run parameterised U-SQL queries using Azure Data Factory as per here and here. Example start of pipeline where input and output folders are parameterised:

{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "adlascripts\\SearchLogProcessing.txt",
                    "scriptLinkedService": "StorageLinkedService",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/input/SearchLog.tsv",
                        "out": "/output/Result.tsv"
                    }
                },
...

You could read your config file when deploying the ADF job, eg using Powershell.

0
On

Currently U-SQL supports the notion of a base URI but that is currently hard-coded to the default ADLS account's root. You can file a request to set the base URI to a different value at http://aka.ms/adlfeedback or use a bit of script parameterization to build this yourself. Since a script can have many input or output locations, making this a system config option would be adding too much complexity.

You can parameterize your U-SQL script as follows:

// if you want to parameterize the whole path
DECLARE EXTERNAL @in = "/default/input.csv";
// if you have standard output path that you want to optionally overwrite and a user specified output file
DECLARE EXTERNAL @outdir = "/defaultoutput";
DECLARE EXTERNAL @outfile = "output.csv";
DECLARE @out = @outdir + "/" + @outfile;

@data = EXTRACT .... FROM @in USING ....;
...
OUTPUT @res TO @out USING ...;

The above can also be wrapped into a procedure and stored in the account.

Note: We are working on a capability to group declarations and references without U-SQL statements and store them in meta data for future reuse. That will help in your scenario to cut down on the "boiler-plate".

On submission you can prepend the actual paths with

DECLARE @in = "/actual/input.csv";
DECLARE @outfile = "myoutput.csv";

//... rest of script or invocation of Procedure.

The ADF example in wBob's reply is doing the prepending for you. Other tools and the SDKs are currently working on defining their parameter passing model (beyond you doing the prepending yourself).

Does this address your need?