Housepower JDBC connection with PySpark


I am using the Housepower ClickHouse JDBC driver (ClickHouse-Native-JDBC) because I need support for ClickHouse's Array data type, and I am reading over JDBC from PySpark. The integration steps to follow are listed at https://housepower.github.io/ClickHouse-Native-JDBC/guide/spark_integration.html#integration-with-spark.
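For context, the read I ultimately want to run looks roughly like the sketch below; the host, database, table, and credentials are placeholders, and the driver class and native-protocol URL follow the Housepower docs.

    # Placeholder host/table/credentials; driver class per the Housepower docs.
    df = (spark.read.format("jdbc")
          .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
          .option("url", "jdbc:clickhouse://clickhouse-host:9000")
          .option("dbtable", "default.example_table")
          .option("user", "default")
          .option("password", "")
          .load())

Registering the dialect as the guide suggests, however, throws the error below in PySpark: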

In [1]: spark.sparkContext._jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(spark.sparkContext._jvm.org.apache.spark.sql.jdbc.ClickHouseDialect)
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 spark.sparkContext._jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(spark.sparkContext._jvm.org.apache.spark.sql.jdbc.ClickHouseDialect)

File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1296, in JavaMember.__call__(self, *args)
   1295 def __call__(self, *args):
-> 1296     args_command, temp_args = self._build_args(*args)
   1298     command = proto.CALL_COMMAND_NAME +\
   1299         self.command_header +\
   1300         args_command +\
   1301         proto.END_COMMAND_PART
   1303     answer = self.gateway_client.send_command(command)

File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1266, in JavaMember._build_args(self, *args)
   1262     new_args = args
   1263     temp_args = []
   1265 args_command = "".join(
-> 1266     [get_command_part(arg, self.pool) for arg in new_args])
   1268 return args_command, temp_args

File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1266, in <listcomp>(.0)
   1262     new_args = args
   1263     temp_args = []
   1265 args_command = "".join(
-> 1266     [get_command_part(arg, self.pool) for arg in new_args])
   1268 return args_command, temp_args

File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py:298, in get_command_part(parameter, python_proxy_pool)
    296         command_part += ";" + interface
    297 else:
--> 298     command_part = REFERENCE_TYPE + parameter._get_object_id()
    300 command_part += "\n"
    302 return command_part

File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1530, in JavaClass.__getattr__(self, name)
   1527         return get_return_value(
   1528             answer, self._gateway_client, self._fqn, name)
   1529 else:
-> 1530     raise Py4JError(
   1531         "{0}.{1} does not exist in the JVM".format(self._fqn, name))

Py4JError: org.apache.spark.sql.jdbc.ClickHouseDialect._get_object_id does not exist in the JVM

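From the traceback, my reading is that py4j resolves org.apache.spark.sql.jdbc.ClickHouseDialect to a JavaClass rather than to an object instance, so it cannot serialize it as an argument (hence the failing _get_object_id lookup), while JdbcDialects.registerDialect expects a JdbcDialect instance. Since ClickHouseDialect is a Scala object, I assume its singleton lives in the static MODULE$ field of the compiled ClickHouseDialect$ class; an untested sketch of that workaround:

    jvm = spark.sparkContext._jvm
    # A Scala `object Foo` compiles to a class Foo$ whose singleton instance
    # sits in the static MODULE$ field; getattr is needed because `$` is not
    # valid in a Python attribute name. (Untested sketch.)
    dialect_class = getattr(jvm.org.apache.spark.sql.jdbc, "ClickHouseDialect$")
    clickhouse_dialect = getattr(dialect_class, "MODULE$")
    jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(clickhouse_dialect)

Is that the right way to register this dialect from PySpark, or is there a supported approach?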

The question below is similar but remains unsolved: How to add custom JDBC dialects in PySpark. Thanks in advance.
