AWS Glue: Snowflake connector is not downloading while the job is starting


I am using the Snowflake connector for AWS Glue. When I run the job, it fails with an error saying the connector could not be downloaded.

I have attached the following policies to the Glue job's IAM role:

AmazonEC2ContainerRegistryFullAccess
AmazonS3FullAccess
AWSGlueServiceRole

But while the job is running, it throws the following error:

2022-08-02 10:40:14,425 - main - INFO - Glue ETL Marketplace - Requesting ECR authorization token for registryIds=maskedid and region_name=us-east-1.

Traceback (most recent call last):
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/httpsession.py", line 353, in send
    chunked=self._chunked(request.headers),
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 386, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
    conn.connect()
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connection.py", line 309, in connect
    conn = self._new_conn()
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connection.py", line 167, in _new_conn
    % (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f4289911950>, 'Connection to api.ecr.us-east-1.amazonaws.com timed out. (connect timeout=60)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 361, in <module>
    main()
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 351, in main
    res += download_jars_per_connection(conn, region, endpoint, proxy)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 293, in download_jars_per_connection
    token = get_ecr_authorization_token(ecr_root)
  File "/tmp/aws_glue_custom_connector_python/docker/util.py", line 22, in wrapper
    return func(*args, **kwargs)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 122, in get_ecr_authorization_token
    response = ecr.get_authorization_token(registryIds=[registry_id])
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 692, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 711, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 137, in _send_request
    success_response, exception):
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 256, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 357, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 269, in _send
    return self.http_session.send(request)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/httpsession.py", line 377, in send
    raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://api.ecr.us-east-1.amazonaws.com/"
Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector. Please refer to logs for details.
Exception in thread "main"
java.lang.Exception: Glue ETL Marketplace - failed to download connector.
    at com.amazonaws.services.glue.PrepareLaunch.downloadConnectorJar(PrepareLaunch.scala:876)
    at com.amazonaws.services.glue.PrepareLaunch.com$amazonaws$services$glue$PrepareLaunch$prepareCmd(PrepareLaunch.scala:667)
    at com.amazonaws.services.glue.PrepareLaunch$.main(PrepareLaunch.scala:44)
    at com.amazonaws.services.glue.PrepareLaunch.main(PrepareLaunch.scala)

I followed this blog:

https://aws.amazon.com/blogs/big-data/ingest-data-from-snowflake-to-amazon-s3-using-aws-glue-marketplace-connectors/
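For context, the job script follows the pattern from that blog. Here is a minimal sketch of the relevant read (the connection name, table, and option keys below are illustrative placeholders, not verified values):

```python
# Illustrative sketch only: the option keys and names here are assumptions
# based on the AWS Marketplace connector blog, adapt them to your setup.

def snowflake_connection_options(connection_name, table, database):
    """Build the options dict for the Glue Marketplace Snowflake connector."""
    return {
        "connectionName": connection_name,  # Glue connection created from the Marketplace connector
        "dbtable": table,                   # Snowflake table to read
        "sfDatabase": database,             # Snowflake database
    }

# Inside the Glue job (requires the awsglue runtime) it would be used roughly as:
#
#   dyf = glueContext.create_dynamic_frame.from_options(
#       connection_type="marketplace.spark",
#       connection_options=snowflake_connection_options(
#           "snowflake-connection", "MY_TABLE", "MY_DB"),
#   )

print(snowflake_connection_options("snowflake-connection", "MY_TABLE", "MY_DB"))
```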

Please help me resolve this issue.


1 Answer

Answer by Sujan S ghosh:

The error is a connection timeout on the endpoint https://api.ecr.us-east-1.amazonaws.com; it is not related to Snowflake:

botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://api.ecr.us-east-1.amazonaws.com/"

Since the job is running inside a VPC, its workers have no public IP addresses and cannot reach the public ECR endpoint directly. You may need to set up a NAT gateway in a public subnet, with a route to it from the job's private subnet.
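Alternatively, instead of a NAT gateway, you can create VPC endpoints so the job reaches ECR and S3 privately. The helper below (illustrative, not an AWS API) just lists the endpoint service names involved: `ecr.api` and `ecr.dkr` are interface endpoints, and `s3` is a gateway endpoint needed because ECR stores image layers in S3.

```python
# Illustrative helper: VPC endpoint service names Glue needs to pull a
# Marketplace connector image without internet access.

def required_endpoint_services(region):
    """Return the VPC endpoint service names for private ECR pulls in a region."""
    return [
        f"com.amazonaws.{region}.ecr.api",  # ECR API (GetAuthorizationToken, etc.)
        f"com.amazonaws.{region}.ecr.dkr",  # ECR Docker registry endpoint
        f"com.amazonaws.{region}.s3",       # S3 gateway endpoint for image layers
    ]

print(required_endpoint_services("us-east-1"))
```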

  1. If you're using a VPC endpoint for Amazon S3, verify that the correct Region is set; VPC endpoints for Amazon S3 are Region-specific. A typo or error in the specified Region or endpoint can produce a "Could not connect to the endpoint URL" error.

  2. Confirm that your network's firewall allows traffic to the Amazon S3 endpoints on the port that you're using for S3 traffic.

Example: telnet s3.ap-southeast-2.amazonaws.com 443

  3. Check that DNS resolution is working.

Example: nslookup s3.amazonaws.com
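The same two checks (DNS resolution and TCP reachability) can be scripted. A small sketch using only the Python standard library, which you could run from a machine in the same subnet as the Glue job:

```python
# Scripted versions of the nslookup / telnet checks above.
import socket

def resolve_host(host):
    """Return the resolved IP address for host, or None if DNS lookup fails."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From inside the job's network you would check, e.g.:
#   resolve_host("api.ecr.us-east-1.amazonaws.com")
#   can_connect("api.ecr.us-east-1.amazonaws.com", 443)
```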

Regards, Sujan