Accessing SQL Server data from SparkCLR


How can data be fetched from SQL Server in SparkCLR?


There are 2 best solutions below


My recommendation is to use JDBC to connect to SQL Server and then query the resulting DataFrame.
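
A minimal sketch of that approach follows. It assumes that SparkCLR's `DataFrame.RegisterTempTable` and `SqlContext.Sql` behave like their Spark counterparts. The connection URL, the table name `xyz`, and the temp table name `xyz_temp` are placeholders for illustration, not values from a real setup.

```csharp
using System.Collections.Generic;
using Microsoft.Spark.CSharp.Core;
using Microsoft.Spark.CSharp.Sql;

class JdbcQuerySketch
{
    static void Main(string[] args)
    {
        var sparkContext = new SparkContext(new SparkConf());
        var sqlContext = new SqlContext(sparkContext);

        // Load the SQL Server table over JDBC (URL and table name are placeholders).
        var dataFrame = sqlContext.Read()
            .Jdbc("jdbc:sqlserver://localhost:1433;databaseName=Temp;integratedSecurity=true;",
                "xyz", new Dictionary<string, string>());

        // Register the DataFrame under a hypothetical name so Spark SQL can query it.
        dataFrame.RegisterTempTable("xyz_temp");
        var result = sqlContext.Sql("SELECT COUNT(*) AS row_count FROM xyz_temp");
        result.Show();

        sparkContext.Stop();
    }
}
```

The SQL Server JDBC driver jar still has to be on the classpath when the job is submitted, otherwise the `Jdbc` call cannot open the connection.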


You could use the following SparkCLR code as a reference for loading a Spark DataFrame in C# from data in SQL Server, Azure SQL Database, or any other JDBC-compliant data source.

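        // C# sample to load SQL Server data as a Spark DataFrame using JDBC.
        // Requires: using System; using System.Collections.Generic;
        //           using Microsoft.Spark.CSharp.Core; using Microsoft.Spark.CSharp.Sql;
        var sparkConf = new SparkConf();
        var sparkContext = new SparkContext(sparkConf);
        var sqlContext = new SqlContext(sparkContext);
        // "xyz" is the table to load; no extra connection properties are passed.
        var dataFrame = sqlContext.Read()
            .Jdbc("jdbc:sqlserver://localhost:1433;databaseName=Temp;integratedSecurity=true;", "xyz",
                new Dictionary<string, string>());
        dataFrame.ShowSchema(); // print the schema read from the table
        var rowCount = dataFrame.Count();
        Console.WriteLine("Row count is " + rowCount);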

A few things to note:

  • This sample code uses the Microsoft JDBC driver. If you use a different driver or JDBC data source, you need to update the URL accordingly.
  • You need to include the driver JAR file when submitting your SparkCLR job.
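
As an illustration of the first note, here is a rough sketch of the same load against a different JDBC source. It assumes a PostgreSQL database and assumes that SparkCLR forwards the dictionary entries to Spark as JDBC connection properties (`driver`, `user`, `password`). The host, database, table, and credentials are all hypothetical.

```csharp
using System.Collections.Generic;
using Microsoft.Spark.CSharp.Core;
using Microsoft.Spark.CSharp.Sql;

class JdbcOtherSourceSketch
{
    static void Main(string[] args)
    {
        var sparkContext = new SparkContext(new SparkConf());
        var sqlContext = new SqlContext(sparkContext);

        // The URL format is driver specific; this one is for the PostgreSQL JDBC driver.
        var url = "jdbc:postgresql://localhost:5432/temp";

        // Assumed to be passed through as Spark JDBC connection properties.
        var properties = new Dictionary<string, string>
        {
            { "driver", "org.postgresql.Driver" }, // JDBC driver class name
            { "user", "spark_reader" },            // hypothetical account
            { "password", "<password>" }           // placeholder, not a real secret
        };

        var dataFrame = sqlContext.Read().Jdbc(url, "xyz", properties);
        dataFrame.ShowSchema();

        sparkContext.Stop();
    }
}
```

Only the URL and the properties change between drivers; the rest of the DataFrame code stays the same. The matching driver JAR must be supplied at submit time, as the second note says.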

The SparkCLR project for this sample is available at https://github.com/Microsoft/SparkCLR/tree/master/examples/JdbcDataFrame