Not a real issue (typo in URL) - Netezza pySpark connection fails while DbVisualizer works

381 Views Asked by At

Update: This has been resolved. It was a typo in the URL.

--
I'm trying to read data from Netezza using pyspark on Windows 10 1909.

I can read from it using DbVisualizer no problem. Then I tried running pyspark --driver-class-path <path to nzjdbc.jar> --jars <path to nzjdbc.jar> --master local[*] (same machine, VPN connection, JDBC driver jar, and all).

I used this code from the pyspark shell:

dataframe = spark.read.format("jdbc").options(
    url="jdbc:netezza://<server>:5480/<database>",
    dbtable="ADMIN.<table>",
    user="***",
    password="***",
    driver="org.netezza.Driver",
).load()

but this fails for me, with the following stack, after about 10-20 seconds (I also tried adding queryTimeout="300", but that didn't make a difference):

"...\AppData\Local\Continuum\miniconda3\envs\spark\lib\site-packages\pyspark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o41.load.
: org.netezza.error.NzSQLException: Connection timed out: connect
        at org.netezza.sql.NzConnection.initSocket(NzConnection.java:2859)
        at org.netezza.sql.NzConnection.open(NzConnection.java:293)
        at org.netezza.datasource.NzDatasource.getConnection(NzDatasource.java:675)
        at org.netezza.datasource.NzDatasource.getConnection(NzDatasource.java:662)
        at org.netezza.Driver.connect(Driver.java:155)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:64)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:268)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:203)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)

A coworker is able to run the same code from his Mac with no issues (also on VPN).

Is there something in Windows or in Netezza itself that could affect what clients are able connect to Netezza? Or could I be missing something in the pyspark command?

2

There are 2 best solutions below

3
On BEST ANSWER

I've found a typo in my URL. That was a really sneaky one

3
On

Can you try increasing the LoginTimeout value ? FYI queryTimeout refers to timeout for a single query.