How can data be fetched from SQL Server in SparkCLR?
Accessing SQL Server data from SparkCLR
There are 2 best solutions below
skaarthik
You can use the following SparkCLR code as a reference for loading a Spark DataFrame in C# from data in SQL Server, Azure SQL Database, or any other JDBC-compliant data source.
// C# sample to load SQL Server data as a Spark DataFrame using JDBC
using System;
using System.Collections.Generic;
using Microsoft.Spark.CSharp.Core;
using Microsoft.Spark.CSharp.Sql;

var sparkConf = new SparkConf();
var sparkContext = new SparkContext(sparkConf);
var sqlContext = new SqlContext(sparkContext);
// Arguments: JDBC URL, table name ("xyz"), and connection properties (none needed here)
var dataFrame = sqlContext.Read()
    .Jdbc("jdbc:sqlserver://localhost:1433;databaseName=Temp;integratedSecurity=true;", "xyz",
        new Dictionary<string, string>());
dataFrame.ShowSchema();
var rowCount = dataFrame.Count();
Console.WriteLine("Row count is " + rowCount);
A few things to note:
- This sample code uses the Microsoft JDBC driver. If you use a different driver or JDBC data source, you need to update the URL accordingly.
- You need to include the driver jar file when submitting your SparkCLR job.
The SparkCLR project for this sample is available at https://github.com/Microsoft/SparkCLR/tree/master/examples/JdbcDataFrame
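If integrated security is not an option (for example with Azure SQL Database), the connection properties dictionary can carry the credentials instead. The following is only a sketch under assumptions: the server name, login name and the `SQL_PASSWORD` environment variable are hypothetical, and the `user`, `password` and `driver` keys are the usual Spark JDBC connection properties rather than anything specific to this sample.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Spark.CSharp.Core;
using Microsoft.Spark.CSharp.Sql;

class JdbcWithSqlLogin
{
    static void Main(string[] args)
    {
        var sparkContext = new SparkContext(new SparkConf());
        var sqlContext = new SqlContext(sparkContext);

        // Hypothetical server name; "Temp" and "xyz" are reused from the sample above.
        var url = "jdbc:sqlserver://myserver.example.com:1433;databaseName=Temp;";

        // Connection properties handed to the JDBC driver.
        // The password is read from a (hypothetical) environment variable
        // so that it is not hard-coded in the source.
        var properties = new Dictionary<string, string>
        {
            { "user", "spark_reader" },
            { "password", Environment.GetEnvironmentVariable("SQL_PASSWORD") ?? "" },
            { "driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver" }
        };

        var dataFrame = sqlContext.Read().Jdbc(url, "xyz", properties);
        dataFrame.ShowSchema();

        sparkContext.Stop();
    }
}
```

For a different JDBC data source, the URL prefix and the `driver` class name would change to match that driver, and its jar would still have to be supplied when the job is submitted.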
My recommendation is to use JDBC to connect to SQL Server and then query against the DataFrame.
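As a rough illustration of that approach, the sketch below loads a table over JDBC, registers it as a temporary table and runs a Spark SQL query against it. It assumes the same local server, `Temp` database and `xyz` table as the first answer; the temporary table name `xyz_table` and the query are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Spark.CSharp.Core;
using Microsoft.Spark.CSharp.Sql;

class QueryJdbcDataFrame
{
    static void Main(string[] args)
    {
        var sparkContext = new SparkContext(new SparkConf());
        var sqlContext = new SqlContext(sparkContext);

        // Load the SQL Server table as a DataFrame over JDBC.
        var dataFrame = sqlContext.Read().Jdbc(
            "jdbc:sqlserver://localhost:1433;databaseName=Temp;integratedSecurity=true;",
            "xyz",
            new Dictionary<string, string>());

        // Expose the DataFrame to Spark SQL under a (hypothetical) temporary table name.
        dataFrame.RegisterTempTable("xyz_table");

        // Query the DataFrame with Spark SQL; the result is itself a DataFrame.
        var result = sqlContext.Sql("SELECT COUNT(*) AS row_count FROM xyz_table");
        result.Show();

        sparkContext.Stop();
    }
}
```

The query runs in Spark against the loaded DataFrame, not as T-SQL on the server, so it has to be written in Spark SQL syntax.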