Twitter API with Structured Spark Streaming

I am trying to access the JSON data from tweets in my Kafka topic. When creating the schema in Spark Structured Streaming, is it necessary to explicitly specify every key from the Twitter API? Can I not access only the ones I want to analyse, such as the text field alone?

OneCricketeer On

While a schema is recommended, it is optional. You should be able to do something like this:

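import org.apache.spark.sql.functions.{col, get_json_object}

kafkaDf
    // Kafka delivers the value as bytes, so cast it to a JSON string first
    .select(col("value").cast("string").as("value"))
    .select(get_json_object(col("value"), "$.text").as("text"))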

https://spark.apache.org/docs/latest/api/sql/index.html#get_json_object
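
If you would rather keep a schema, it does not have to list every key either: from_json only extracts the fields named in the schema you pass it and ignores the rest of the JSON. Below is a minimal sketch of that idea, not a definitive implementation. The object name PartialTweetSchema, the helper extractFields, the extra "lang" field, and the sample tweet are all assumptions made for illustration, and a static DataFrame stands in for the Kafka source so the example can run locally.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical names throughout; only the Spark calls themselves are real API.
object PartialTweetSchema {
  // Partial schema: only the fields we care about.
  // Any other key present in the tweet JSON is simply not extracted.
  val tweetSchema: StructType = StructType(Seq(
    StructField("text", StringType, nullable = true),
    StructField("lang", StringType, nullable = true) // assumed extra field, for illustration
  ))

  // Expects a DataFrame with a Kafka-style `value` column holding the tweet JSON.
  def extractFields(kafkaDf: DataFrame): DataFrame =
    kafkaDf
      .select(from_json(col("value").cast("string"), tweetSchema).as("tweet"))
      .select(col("tweet.text").as("text"), col("tweet.lang").as("lang"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("partial-schema-demo")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the Kafka source: one made-up tweet with extra keys.
    val sample = Seq(
      """{"id": 1, "text": "hello spark", "lang": "en", "user": {"name": "x"}}"""
    ).toDF("value")

    extractFields(sample).show(false)
    spark.stop()
  }
}
```

With a real stream, the same extractFields call should apply to the DataFrame returned by spark.readStream.format("kafka"), since the transformation itself does not depend on whether the source is static or streaming.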