Unable to load data from Cloudant into Python/Spark dataframe in Watson Studio Notebook


I am trying to load data from a Cloudant DB into a Python/Spark dataframe in a Python with Spark environment in Watson Studio. I followed the steps mentioned in this link and am stuck at Procedure 3, Step 5. I already have a Cloudant DB named 'twitterdb' and am trying to load data from it.

[Error screenshot when loading the data from the Cloudant DB]


1 Answer


Looking at the error, I see that you must have installed a Cloudant connector that does not match the Spark version available on Spark as a Service on IBM Cloud. Spark as a Service offers Spark version 2.1.2.

Now, one of the steps in the tutorial instructs you to install the Spark Cloudant package:

pixiedust.installPackage("org.apache.bahir:spark-sql-cloudant_2.11:0")

which I think installs the wrong version of the Spark Cloudant connector, as the error states it is trying to use:

/gpfs/global_fs01/sym_shared/YPProdSpark/user/s97c-0d96df4a6a0cd8-8754c7852bb5/data/libs/spark-sql-cloudant_2.11-2.2.1.jar

The right version to install/use would be https://mvnrepository.com/artifact/org.apache.bahir/spark-sql-cloudant_2.11/2.1.2
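The version mismatch is visible in the Maven coordinate itself, which packs the group, artifact, Scala binary version, and connector version into one string. A minimal sketch of how to read such a coordinate (pure Python; the coordinate strings are the ones from this answer, and `parse_coordinate` is a hypothetical helper, not part of pixiedust or Bahir):

```python
def parse_coordinate(coord):
    # A Maven coordinate has the form group:artifact:version.
    group, artifact, version = coord.split(":")
    # By convention, the suffix after the last underscore in the artifact
    # is the Scala binary version the jar was built against
    # (e.g. spark-sql-cloudant_2.11 -> Scala 2.11).
    name, _, scala_binary = artifact.rpartition("_")
    return {"group": group, "artifact": name,
            "scala_binary": scala_binary, "version": version}

wrong = parse_coordinate("org.apache.bahir:spark-sql-cloudant_2.11:2.2.1")
right = parse_coordinate("org.apache.bahir:spark-sql-cloudant_2.11:2.1.2")
# The connector version should track the Spark version of the service
# (2.1.x here), which is why the 2.2.1 jar fails against Spark 2.1.2.
print(wrong["version"], right["version"])  # → 2.2.1 2.1.2
```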

The important part is that a Spark Cloudant connector is already installed by default, under /usr/local/src/dataconnector-cloudant-2.0/spark-2.0.0/libs/.

You should uninstall your user-installed package using pixiedust.

pixiedust.packageManager.uninstallPackage("org.apache.bahir:spark-sql-cloudant_2.11:2.2.1")

Then restart the kernel and use the Cloudant connector as described below to read from your Cloudant database.

from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("Cloudant Spark SQL Example in Python using dataframes")\
    .config("cloudant.host","ACCOUNT.cloudant.com")\
    .config("cloudant.username", "USERNAME")\
    .config("cloudant.password","PASSWORD")\
    .config("jsonstore.rdd.partitions", 8)\
    .getOrCreate()

# 1. Load a dataframe from a Cloudant database
df = spark.read.load("n_airportcodemapping", "org.apache.bahir.cloudant")
df.cache()
df.printSchema()
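Since the session config above is pasted with ALL-CAPS placeholders, a quick sanity check before building the session can save a confusing connection error. A minimal sketch, assuming the same config keys held in a plain dict (`validate_cloudant_config` is a hypothetical helper, not part of the connector):

```python
REQUIRED_KEYS = ("cloudant.host", "cloudant.username", "cloudant.password")

def validate_cloudant_config(conf):
    # Flag keys that are missing, empty, or still contain the
    # ALL-CAPS placeholders from the example snippet above.
    placeholders = {"ACCOUNT.cloudant.com", "USERNAME", "PASSWORD"}
    problems = []
    for key in REQUIRED_KEYS:
        value = conf.get(key, "")
        if not value or value in placeholders:
            problems.append(key)
    return problems

conf = {"cloudant.host": "ACCOUNT.cloudant.com",
        "cloudant.username": "alice",
        "cloudant.password": "s3cret"}
print(validate_cloudant_config(conf))  # → ['cloudant.host']
```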

Ref:- https://github.com/apache/bahir/tree/master/sql-cloudant

Thanks, Charles.