I'm currently publishing messages via a Pub/Sub topic that is using a Protobuf schema. This was working fine as my consumer was able to read and decode these messages using the aforementioned schema.
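For context, the publisher side looks roughly like this (a minimal sketch; `my_schema_pb2.MyEvent` and its fields stand in for my actual generated message class):

```python
from google.cloud import pubsub_v1
from my_schema_pb2 import MyEvent  # generated from the topic's Protobuf schema

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")

# Hypothetical message; the real schema has different fields.
event = MyEvent(id="123", name="example")

# Messages are published as binary-encoded Protobuf, matching the topic schema.
future = publisher.publish(topic_path, data=event.SerializeToString())
future.result()
```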
I now want to create an export subscription that instead reads these messages and writes them out to files in a GCS bucket. I want these files to contain those same protobuf messages that I can then read in via a Dataflow job to decode and process them.
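The downstream Dataflow job would, roughly, look like the sketch below (assuming, for illustration, one serialized message per exported file and the same hypothetical `MyEvent` class):

```python
import apache_beam as beam
from apache_beam.io.fileio import MatchFiles, ReadMatches
from my_schema_pb2 import MyEvent


def parse_event(readable_file):
    # Read the whole file and decode it as a single Protobuf message.
    event = MyEvent()
    event.ParseFromString(readable_file.read())
    return event


with beam.Pipeline() as p:
    (p
     | MatchFiles("gs://my-bucket/exports/*")
     | ReadMatches()
     | beam.Map(parse_event)
     | beam.Map(print))  # placeholder for the actual processing
```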
I've been able to successfully set up the export subscription to write out files to the GCS bucket using two approaches:
- Explicitly creating a "Write to Cloud Storage" subscription via the Cloud Console (with the file format set to text)
- Creating a Dataflow job from the "Pub/Sub to Text Files on Cloud Storage" template
However, in both cases the files produced contain unreadable content. When I download these files and try to decode them manually using `protoc` (along with the schema used to generate the records), I get the error `Failed to parse input.`.
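For reference, this is roughly how I'm checking a downloaded file locally (again a sketch using the hypothetical `MyEvent`); `protoc --decode` fails the same way:

```python
from google.protobuf.message import DecodeError
from my_schema_pb2 import MyEvent

with open("exported-file-from-gcs", "rb") as f:
    data = f.read()

event = MyEvent()
try:
    event.ParseFromString(data)
except DecodeError as e:
    print(f"Failed to parse: {e}")
```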
Naturally, I suspect the pain point is writing Protobuf messages out as text files. Is there a way to make this setup work while the publisher continues to publish Protobuf messages?