PySpark Kusto Connector on Azure Databricks


I have been using Azure Databricks with LTS 7.3 and Spark 3.0 (PySpark) with the com.microsoft.azure.kusto:kusto-spark_3.0_2.12:2.9.1 connector for quite some time, but recently my jobs have started failing with the error below (randomly: sometimes they run, other times they simply fail):

df = pyKusto.read                                                        \
           .format("com.microsoft.kusto.spark.datasource")               \
           .option("kustoCluster", kustoOptions["kustoCluster"])          \
           .option("kustoDatabase", kustoOptions["kustoDatabase"])         \
           .option("kustoQuery", Query)                                    \
           .option("kustoAadAppId", kustoOptions["kustoAadAppId"])           \
           .option("kustoAadAppSecret", kustoOptions["kustoAadAppSecret"])    \
           .option("kustoAadAuthorityID", kustoOptions["kustoAadAuthorityID"]) \
           .load()
java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.kusto.spark.datasource. Please find packages at http://spark.apache.org/third-party-projects.html

I have already installed the library on the cluster, and it ran without issues for some time, so I am not sure what has changed recently. Has anyone seen this issue, and is there a workaround?

Thanks


2 Answers

Answer by Abhishek Khandave

In Databricks, try upgrading the kusto-spark library from kusto-spark_3.0_2.12:2.9.1 to kusto-spark_3.0_2.12:3.0.0:

Libraries -> Install New -> Maven -> copy the following coordinates:

com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.0.0

If it still does not work, you can open a new issue in the azure-kusto-spark GitHub repository.

Refer to https://github.com/Azure/azure-kusto-spark#Linking

Answer by yesss

It's a bit late, but here is what works for me:

df = spark.read                                                        \
           .format("com.microsoft.kusto.spark.datasource")               \
           .option("kustoCluster", kustoOptions["kustoCluster"])          \
           .option("kustoDatabase", kustoOptions["kustoDatabase"])         \
           .option("kustoQuery", Query)                                    \
           .option("kustoAadAppId", kustoOptions["kustoAadAppId"])           \
           .option("kustoAadAppSecret", kustoOptions["kustoAadAppSecret"])    \
           .option("kustoAadAuthorityID", kustoOptions["kustoAadAuthorityID"]) \
           .load()

So: use the built-in spark session instead of the separately created pyKusto session.
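For completeness, the snippets above assume a kustoOptions dictionary and a Query string defined elsewhere in the notebook. A minimal sketch of what they might look like is below; every value is a placeholder, not the asker's actual configuration, and in a real Databricks job the app secret would normally be read from a secret scope via dbutils.secrets.get rather than hard-coded:

```python
# Hypothetical sketch of the kustoOptions dictionary used by the read snippets above.
# All values are placeholders; in Databricks, pull the secret from a secret scope
# (dbutils.secrets.get) instead of embedding it in the notebook.
kustoOptions = {
    "kustoCluster": "https://mycluster.westeurope.kusto.windows.net",  # cluster URI (placeholder)
    "kustoDatabase": "MyDatabase",                                     # target database (placeholder)
    "kustoAadAppId": "00000000-0000-0000-0000-000000000000",           # AAD app (client) ID
    "kustoAadAppSecret": "<app-secret>",                               # placeholder, do not hard-code
    "kustoAadAuthorityID": "<tenant-id>",                              # AAD tenant (authority) ID
}

# Any valid KQL query string (placeholder table name).
Query = "MyTable | take 10"
```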