1- I have a Spark cluster on Databricks Community Edition and a Kafka instance on GCP.
2- I want to ingest the Kafka stream into Databricks Community Edition and analyze the data with Spark.
3- This is my connection code:
val UsYoutubeDf =
  spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "XXX.XXX.115.52:9092") // Kafka broker on GCP (address masked)
    .option("subscribe", "usyoutube")                         // topic to read from
    .load()
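The Kafka source always delivers the message payload as a binary `value` column (next to `key`, `topic`, `partition`, `offset`, `timestamp` and `timestampType`), so it has to be decoded before any analysis. Also, a streaming query starts from the `"latest"` offsets by default, so messages produced before the query starts are not read. The sketch below assumes the messages are UTF-8 text; the object and helper names are hypothetical, and setting `startingOffsets` to `"earliest"` is an assumption about what you want, not something from the original code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaIngestSketch {
  // Reads the "usyoutube" topic. startingOffsets = "earliest" is an assumption:
  // the streaming default is "latest", which skips messages produced before the query starts.
  def readUsYoutube(spark: SparkSession, bootstrapServers: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", "usyoutube")
      .option("startingOffsets", "earliest")
      .load()

  // Hypothetical helper: the Kafka source delivers key and value as binary,
  // so cast them to strings (assumes UTF-8 text payloads) before any analysis.
  def decodeKafkaValues(raw: DataFrame): DataFrame =
    raw.selectExpr(
      "CAST(key AS STRING) AS key",
      "CAST(value AS STRING) AS value",
      "timestamp")
}
```

In a notebook this would be used as `KafkaIngestSketch.decodeKafkaValues(KafkaIngestSketch.readUsYoutube(spark, "XXX.XXX.115.52:9092"))`, with the masked address replaced by the real broker address; any parsing of the `value` string (CSV, JSON, ...) would then build on the decoded column.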
As mentioned, my data is arriving in Kafka.
I added the cluster's spark.driver.host address to the firewall settings; otherwise I cannot even ping my Kafka machine from the Databricks cluster.
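`ping` uses ICMP, while the Kafka client needs a TCP connection to the broker port (9092 here), so a firewall can allow one and block the other. A minimal sketch for testing TCP reachability from the driver with plain JVM sockets follows; `TcpCheck.canConnect` is a hypothetical helper and the 3-second timeout is an arbitrary choice.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object TcpCheck {
  // Returns true if a TCP connection to host:port can be opened within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers refused connections, timeouts and unresolvable hosts.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }
}
```

Running `TcpCheck.canConnect("XXX.XXX.115.52", 9092)` (with the real address) in a notebook cell tests the path from the driver. A `true` result only shows that the bootstrap address is reachable: Kafka clients then connect to the addresses the broker advertises in its `advertised.listeners` setting, and those must be reachable from the cluster too.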
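import org.apache.spark.sql.streaming.Trigger.ProcessingTime

// Print the full aggregated result table to the console every 5 seconds
val sortedModelCountQuery = sortedyouTubeSchemaSumDf
  .writeStream
  .outputMode("complete")        // re-emit the whole aggregate on each trigger
  .format("console")
  .option("truncate", "false")   // show full column values
  .trigger(ProcessingTime("5 seconds"))
  .start()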
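On Databricks, the console sink typically writes to the driver's standard output (visible in the driver logs) rather than to the notebook cell. One way to look at the aggregated result from the notebook is the memory sink, which keeps the latest result in an in-memory table on the driver that can be queried with SQL. This is a sketch for small, debugging-sized results; the object name and the table name `model_counts` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

object MemorySinkSketch {
  // Writes an aggregated stream to an in-memory table named tableName.
  // The table lives in driver memory, so this is only suitable for small results.
  def startToMemory(aggregated: DataFrame, tableName: String = "model_counts"): StreamingQuery =
    aggregated.writeStream
      .outputMode("complete")                       // memory sink supports complete mode for aggregations
      .format("memory")
      .queryName(tableName)                         // becomes the name of the in-memory table
      .trigger(Trigger.ProcessingTime("5 seconds"))
      .start()

  // Reads the most recent complete result written by the query above.
  def latestResult(spark: SparkSession, tableName: String = "model_counts"): DataFrame =
    spark.sql(s"SELECT * FROM $tableName")
}
```

For example, `MemorySinkSketch.startToMemory(sortedyouTubeSchemaSumDf)` in one cell, then `MemorySinkSketch.latestResult(spark).show(false)` in a later cell once a few triggers have run.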
After I start this query, no data shows up in my Spark cluster. The cell only prints:
import org.apache.spark.sql.streaming.Trigger.ProcessingTime
sortedModelCountQuery: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@3bd8a775
It stays like this. The data is actually arriving, but the analysis code I wrote does not work here.
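To tell whether the running query is receiving rows from Kafka at all, `StreamingQuery` exposes `isActive`, `status`, `lastProgress`, `recentProgress` and `exception`. A sketch that summarizes them in one line follows; `QueryInspection.describe` is a hypothetical helper, not a Spark API.

```scala
import org.apache.spark.sql.streaming.StreamingQuery

object QueryInspection {
  // Hypothetical helper: one-line summary of whether a running query is receiving rows.
  def describe(query: StreamingQuery): String = {
    // Rows ingested across the progress reports Spark still retains
    val recentRows = query.recentProgress.map(_.numInputRows).sum
    // lastProgress is null until the first micro-batch has completed
    val lastBatchId = Option(query.lastProgress).map(_.batchId).getOrElse(-1L)
    // exception is defined only if the query terminated with an error
    val error = query.exception.map(_.getMessage).getOrElse("none")
    s"active=${query.isActive}, status=${query.status.message}, " +
      s"lastBatchId=$lastBatchId, recentInputRows=$recentRows, error=$error"
  }
}
```

Running `println(QueryInspection.describe(sortedModelCountQuery))` in a later cell separates the two cases: if `recentInputRows` stays at 0 while messages are being produced, the stream is not reading from Kafka; if it grows, the query is working and its output is simply going somewhere other than the notebook cell.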