I am using the housepower ClickHouse JDBC driver because I need support for array-typed columns, and I am reading the data over JDBC from PySpark. The integration steps are listed at https://housepower.github.io/ClickHouse-Native-JDBC/guide/spark_integration.html#integration-with-spark (a sketch of my setup is included after the traceback). However, registering the dialect in PySpark throws the error below:
In [1]: spark.sparkContext._jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(spark.sparkContext._jvm.org.apache.spark.sql.jdbc.ClickHouseDialect)
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
Cell In[1], line 1
----> 1 spark.sparkContext._jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(spark.sparkContext._jvm.org.apache.spark.sql.jdbc.ClickHouseDialect)
File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1296, in JavaMember.__call__(self, *args)
1295 def __call__(self, *args):
-> 1296 args_command, temp_args = self._build_args(*args)
1298 command = proto.CALL_COMMAND_NAME +\
1299 self.command_header +\
1300 args_command +\
1301 proto.END_COMMAND_PART
1303 answer = self.gateway_client.send_command(command)
File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1266, in JavaMember._build_args(self, *args)
1262 new_args = args
1263 temp_args = []
1265 args_command = "".join(
-> 1266 [get_command_part(arg, self.pool) for arg in new_args])
1268 return args_command, temp_args
File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1266, in <listcomp>(.0)
1262 new_args = args
1263 temp_args = []
1265 args_command = "".join(
-> 1266 [get_command_part(arg, self.pool) for arg in new_args])
1268 return args_command, temp_args
File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py:298, in get_command_part(parameter, python_proxy_pool)
296 command_part += ";" + interface
297 else:
--> 298 command_part = REFERENCE_TYPE + parameter._get_object_id()
300 command_part += "\n"
302 return command_part
File /opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1530, in JavaClass.__getattr__(self, name)
1527 return get_return_value(
1528 answer, self._gateway_client, self._fqn, name)
1529 else:
-> 1530 raise Py4JError(
1531 "{0}.{1} does not exist in the JVM".format(self._fqn, name))
Py4JError: org.apache.spark.sql.jdbc.ClickHouseDialect._get_object_id does not exist in the JVM
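For context, here is roughly how I create the session and the read I am ultimately trying to run. The jar path, version, table name, and connection details are placeholders rather than my real values:

from pyspark.sql import SparkSession

# housepower shaded driver jar added to the classpath (path and version are placeholders)
spark = (
    SparkSession.builder
    .appName("clickhouse-arrays")
    .config("spark.jars", "/path/to/clickhouse-native-jdbc-shaded-<version>.jar")
    .getOrCreate()
)

# The read I ultimately want; the ClickHouse table contains Array(...) columns,
# which is why I need the ClickHouse dialect registered.
df = (
    spark.read.format("jdbc")
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
    .option("url", "jdbc:clickhouse://localhost:9000/default")
    .option("dbtable", "my_table")
    .load()
)
df.printSchema()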
The question below is similar but not yet solved: How to add custom JDBC dialects in PySpark. Thanks in advance.