PySpark Kusto Connector on Azure Databricks


I have been using Azure Databricks with LTS 7.3 and Spark 3.0 (PySpark) with the com.microsoft.azure.kusto:kusto-spark_3.0_2.12:2.9.1 connector for quite some time, but recently my jobs have started failing with the error below (randomly: sometimes they run, other times they simply fail):

df = pyKusto.read                                                        \
           .format("com.microsoft.kusto.spark.datasource")               \
           .option("kustoCluster", kustoOptions["kustoCluster"])          \
           .option("kustoDatabase", kustoOptions["kustoDatabase"])         \
           .option("kustoQuery", Query)                                    \
           .option("kustoAadAppId", kustoOptions["kustoAadAppId"])           \
           .option("kustoAadAppSecret", kustoOptions["kustoAadAppSecret"])    \
           .option("kustoAadAuthorityID", kustoOptions["kustoAadAuthorityID"]) \
           .load()
java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.kusto.spark.datasource. Please find packages at http://spark.apache.org/third-party-projects.html

I have already installed the library on the cluster, and it ran without issues for some time, so I am not sure what has changed recently. Has anyone seen this issue, and is there a workaround?

Thanks


2 Answers

Answer by Abhishek Khandave

In Databricks, try upgrading the kusto-spark library from kusto-spark_3.0_2.12:2.9.1 to kusto-spark_3.0_2.12:3.0.0:

Libraries -> Install New -> Maven -> copy the following coordinates:

com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.0.0

If it still does not work, you can open a new issue in the azure-kusto-spark GitHub repository.

Refer to https://github.com/Azure/azure-kusto-spark#Linking

Answer by yesss

It's a bit late, but here is what works for me:

df = spark.read                                                        \
           .format("com.microsoft.kusto.spark.datasource")               \
           .option("kustoCluster", kustoOptions["kustoCluster"])          \
           .option("kustoDatabase", kustoOptions["kustoDatabase"])         \
           .option("kustoQuery", Query)                                    \
           .option("kustoAadAppId", kustoOptions["kustoAadAppId"])           \
           .option("kustoAadAppSecret", kustoOptions["kustoAadAppSecret"])    \
           .option("kustoAadAuthorityID", kustoOptions["kustoAadAuthorityID"]) \
           .load()

So: use the built-in spark session instead of the separately created pyKusto session.
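For completeness, the snippets above assume a kustoOptions dictionary and a Query string defined elsewhere in the notebook. A minimal sketch of what they might look like is below; every value is a placeholder, not the asker's actual configuration, and in a real Databricks job the app secret would normally be read from a secret scope via dbutils.secrets.get rather than hard-coded:

```python
# Hypothetical sketch of the kustoOptions dictionary used by the read snippets above.
# All values are placeholders; in Databricks, pull the secret from a secret scope
# (dbutils.secrets.get) instead of embedding it in the notebook.
kustoOptions = {
    "kustoCluster": "https://mycluster.westeurope.kusto.windows.net",  # cluster URI (placeholder)
    "kustoDatabase": "MyDatabase",                                     # target database (placeholder)
    "kustoAadAppId": "00000000-0000-0000-0000-000000000000",           # AAD app (client) ID
    "kustoAadAppSecret": "<app-secret>",                               # placeholder, do not hard-code
    "kustoAadAuthorityID": "<tenant-id>",                              # AAD tenant (authority) ID
}

# Any valid KQL query string (placeholder table name).
Query = "MyTable | take 10"
```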