Connect AWS Glue to SQL Server


For a project, I have to move data from SQL Server to AWS Redshift.

For the ETL process, I'd like to use AWS Glue and connect to my source (SQL Server), but I always get the following error message when testing the connection:

Check that your connection definition references your JDBC database with correct URL syntax, username, and password. The TCP/IP connection to the host, port 1433 has failed. Error: "Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall."

Ports are correctly opened on my server, and I did several tests with other tools (Power BI and SQL Workbench can both connect to it).

Here is the connection string used:

jdbc:sqlserver://host:1433;databaseName=AdventureWorksDW2012
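A timeout like this usually means there is no TCP path from the Glue environment to the host, rather than a problem with the JDBC URL itself. One quick way to check basic reachability from any machine with Python is a plain socket probe. This is a minimal sketch; `host` and the port are placeholders for your own values:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection handles DNS resolution and the connect() call.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unresolvable hosts.
        return False

# Example (placeholder host):
# print(tcp_reachable("host", 1433))
```

If this returns False from inside the network Glue runs in but True from your workstation, the problem is routing or security-group rules, not SQL Server.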

I'm not sure what the issue is at this point. If anyone has an explanation, thanks for sharing!


There are 2 best solutions below

Answer 1

I have similar code in my Glue job:

jdbcDF = spark.read.format("jdbc") \
.option("url", "jdbc:sqlserver://sqlserverip:1433;databaseName=databasename") \
.option("dbtable", "sqltable") \
.option("user", "username") \
.option("password", "password") \
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
.load()

Did you set up the VPC networking (connection, subnet, and security groups) so that AWS Glue can reach your SQL Server?
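For reference, a Glue JDBC connection carries its VPC placement in `PhysicalConnectionRequirements`; without it, the job has no route to a private database and the TCP connection times out. Below is a hedged sketch of the `ConnectionInput` structure accepted by the Glue `create_connection` API, where every identifier (name, URL, subnet, security group, availability zone) is a placeholder to substitute with your own:

```python
# Sketch of a Glue JDBC connection definition that pins the connection to a
# VPC subnet and security group, so jobs using it can reach a private
# SQL Server. All identifiers below are placeholders, not real resources.
connection_input = {
    "Name": "sqlserver-source",
    "ConnectionType": "JDBC",
    "ConnectionProperties": {
        "JDBC_CONNECTION_URL": "jdbc:sqlserver://host:1433;databaseName=AdventureWorksDW2012",
        "USERNAME": "username",
        "PASSWORD": "password",
    },
    # Without this block, Glue does not attach to your VPC and the
    # database connection test fails with a TCP timeout.
    "PhysicalConnectionRequirements": {
        "SubnetId": "subnet-0123456789abcdef0",
        "SecurityGroupIdList": ["sg-0123456789abcdef0"],
        "AvailabilityZone": "eu-west-1a",
    },
}

# To actually create it (requires AWS credentials):
# import boto3
# boto3.client("glue").create_connection(ConnectionInput=connection_input)
```

The security group attached here must allow outbound traffic to port 1433, and the database side must allow inbound traffic from that group.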

Answer 2

I had the same problem trying to connect to an on-prem SQL Server and move data into an S3 bucket. The odd thing was that my Glue crawler and connection both worked, and so did other applications using JDBC drivers, so it definitely was not a firewall issue. I changed the Glue version in the job details from "Spark 2.4, Python 3 with improved job start up times (Glue Version 2)" to "Spark 2.4, Python 2, Glue Version 1", and that fixed it for me. I have not tried any other versions. I think this could be a bug and have raised it with AWS support.