I am pushing video to workers for a cloud dataflow pipeline. I have been advised to use beam directly to manage my objects. I can't understand the best practices for downloading objects. I can see the class
Apache Beam IO GCP So one could use it like so:
def read_file(element,local_path):
with beam.io.gcp.gcsio.GcsIO().open(element, 'r') as f:
Where element is the gcs path read from a previous beam step.
Checking out the available methods, downloader looks like.
f.downloader
Download with 57507840/57507840 bytes transferred from url https://www.googleapis.com/storage/v1/b/api-project-773889352370-testing/o/Clips%2F00011.MTS?generation=1493431837327161&alt=media
This message makes it seem like it has been downloaded, it has the right file size (57mb). But where does it go? I would like to pass a variable (local_path), so that subsequent process can handle the object. The class doesn't seem accept a path destination, its not in current working directory, /tmp/ or downloads folder. I'm testing locally on OSX before I deploy.
Am I using this tool correctly? I know that streaming video bytes may be preferable for large videos, we'll get to that once I understand basic functions. I'll open a separate question for streaming into memory (named pipe?) to be read by opencv.