I am encountering an issue when trying to load data from Kafka using Apache Spark 3.3.4 with Scala 2.12.8. The error message I receive is as follows:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.lang.NoClassDefFoundError: scala/$less$colon$less
    at org.apache.spark.sql.kafka010.KafkaSourceProvider.org$apache$spark$sql$kafka010$KafkaSourceProvider$$validateStreamOptions(KafkaSourceProvider.scala:338)
    ...
Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
    ...
```
Environment:
- Apache Spark: 3.3.4
- Scala: 2.12.8
- Java: 1.8.0_261
- Python: 3.10.0
I am trying to run a Kafka consumer using PySpark:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

output_path = 'D:/uni/Python/spark-kafka/Spark-Streaming/output'
checkpoint_path = 'D:/uni/Python/spark-kafka/Spark-Streaming/checkpoint'

spark = SparkSession.builder \
    .appName("Kafka_consumer_demo") \
    .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.13:3.5.0") \
    .getOrCreate()

kafka_bootstrap_servers = 'localhost:9092'
kafka_topic = 'my_topic3'

df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", kafka_bootstrap_servers) \
    .option("subscribe", kafka_topic) \
    .load()
```
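While investigating, I noticed that the Maven coordinate passed to `spark.jars.packages` encodes the Scala binary version in the artifact name after the underscore. A small helper (my own, hypothetical, plain Python) I used to make this explicit:

```python
def scala_binary_version(coordinate: str) -> str:
    """Extract the Scala binary version suffix from a Maven coordinate
    such as 'org.apache.spark:spark-sql-kafka-0-10_2.13:3.5.0'."""
    # Coordinates have the form group:artifact:version.
    artifact = coordinate.split(":")[1]      # e.g. 'spark-sql-kafka-0-10_2.13'
    # The Scala binary version follows the last underscore in the artifact.
    return artifact.rsplit("_", 1)[1]

pkg = "org.apache.spark:spark-sql-kafka-0-10_2.13:3.5.0"
print(scala_binary_version(pkg))  # → 2.13
```

So the connector I am pulling in targets Scala 2.13, while my environment reports Scala 2.12.8. Is this mismatch the cause of the `NoClassDefFoundError`, and if so, which coordinate should I be using with Spark 3.3.4?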