I want to stream or batch load data from a Spark DataFrame to Pub/Sub. I came across some libraries:
- Apache Bahir: Useful only for streaming data from Pub/Sub. https://bahir.apache.org/docs/spark/2.2.1/spark-streaming-pubsub/
- Pub/Sub Lite Connector: Capable of writing to Pub/Sub Lite; not sure whether this works for Pub/Sub.
You cannot use the Pub/Sub Lite connector to write messages to Pub/Sub. Although Pub/Sub and Pub/Sub Lite are both horizontally scalable, managed messaging services, they differ in several ways and are two separate products. You can refer to this documentation to check the differences between Pub/Sub and Pub/Sub Lite. From the doc:
To stream or batch load data from a Spark DataFrame to Pub/Sub, you can use Apache Bahir’s Pub/Sub connector. You can find this example from Google Cloud Platform, in which Apache Bahir’s Spark Streaming connector for Google Cloud Pub/Sub is used.
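
For the write path itself, here is a rough sketch. It is an assumption on my part, not something taken from the linked example: it uses the `google-cloud-pubsub` Python client directly instead of Bahir, and publishes each DataFrame row as one JSON message from inside `foreachPartition`. The names `publish_partition`, `write_dataframe`, `project_id` and `topic_id` are hypothetical.

```python
import json


def publish_partition(rows, publisher, topic_path):
    """Publish each row (a dict) as one JSON-encoded Pub/Sub message.

    `publisher` is expected to behave like pubsub_v1.PublisherClient:
    publish(topic, data=...) returns a future. Returns the message count.
    """
    futures = []
    for row in rows:
        # default=str is a simplification for dates, decimals, etc.
        data = json.dumps(row, default=str).encode("utf-8")
        futures.append(publisher.publish(topic_path, data=data))
    # Wait for every publish so that failures surface in the Spark task.
    for future in futures:
        future.result()
    return len(futures)


def write_dataframe(df, project_id, topic_id):
    """Hypothetical wiring for a batch DataFrame.

    Requires pyspark and google-cloud-pubsub on the executors.
    """
    def send(rows):
        # Imported and created per partition: the client is not picklable.
        from google.cloud import pubsub_v1

        publisher = pubsub_v1.PublisherClient()
        topic_path = publisher.topic_path(project_id, topic_id)
        publish_partition((r.asDict() for r in rows), publisher, topic_path)

    df.foreachPartition(send)
```

For a streaming DataFrame, the same helper could probably be reused through `df.writeStream.foreachBatch(lambda batch_df, batch_id: write_dataframe(batch_df, project_id, topic_id))`. I have not benchmarked this, and it gives at-least-once delivery only, since a retried Spark task publishes its rows again.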