I am encountering an issue when trying to load data from Kafka using Apache Spark version 3.3.4 with Scala version 2.12.8. The error message I receive is as follows:

py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.lang.NoClassDefFoundError: scala/$less$colon$less
        at org.apache.spark.sql.kafka010.KafkaSourceProvider.org$apache$spark$sql$kafka010$KafkaSourceProvider$$validateStreamOptions(KafkaSourceProvider.scala:338)
        ...
Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
        ...

Environment:

Apache Spark: 3.3.4
Scala: 2.12.8
Java: 1.8.0_261
Python: 3.10.0
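
In case it helps, this is roughly how I checked which versions my local Spark install reports (run as a separate script; the _jvm gateway call is just a quick check I found, not necessarily the canonical way):

from pyspark.sql import SparkSession

# Run separately from the consumer script below, just to confirm what the
# local Spark distribution reports (the values in the comments are from my machine)
spark = SparkSession.builder.appName("version_check").getOrCreate()

print(spark.version)  # 3.3.4 for me

# Scala version of the library on the driver classpath, read through the
# internal py4j gateway
print(spark.sparkContext._jvm.scala.util.Properties.versionString())  # prints "version 2.12.x" for me

spark.stop()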

I am trying to run a Kafka consumer using PySpark:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

output_path = 'D:/uni/Python/spark-kafka/Spark-Streaming/output'
checkpoint_path = 'D:/uni/Python/spark-kafka/Spark-Streaming/checkpoint'

# Build the session; spark.jars.packages tells Spark to download the
# Kafka connector from Maven when the session starts
spark = SparkSession.builder \
    .appName("Kafka_consumer_demo") \
    .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.13:3.5.0") \
    .getOrCreate()

kafka_bootstrap_servers = 'localhost:9092'
kafka_topic = 'my_topic3'

# Subscribe to the topic as a streaming source; the error above is raised
# as soon as load() is called
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", kafka_bootstrap_servers) \
    .option("subscribe", kafka_topic) \
    .load()
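
For completeness, output_path and checkpoint_path above are meant for the write side of the job. The plan is roughly the sketch below (the schema and field name are placeholders for my actual payload; I never get this far because load() already fails):

# Sketch of the intended write side (never reached, since load() above fails).
# "value_field" and the schema are placeholders for the real message layout.
schema = StructType([StructField("value_field", StringType(), True)])

parsed = df.selectExpr("CAST(value AS STRING) AS json_str") \
    .select(from_json(col("json_str"), schema).alias("data")) \
    .select("data.*")

query = parsed.writeStream \
    .format("parquet") \
    .option("path", output_path) \
    .option("checkpointLocation", checkpoint_path) \
    .start()

query.awaitTermination()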
