Download location for apache_beam.io.gcp.gcsio.GcsBufferedReader object


I am pushing video to workers for a Cloud Dataflow pipeline. I have been advised to use Beam directly to manage my objects, but I can't work out the best practice for downloading them. I can see the class under Apache Beam IO GCP, so one could use it like so:

from apache_beam.io.gcp import gcsio

def read_file(element, local_path):
  with gcsio.GcsIO().open(element, 'r') as f:
    ...  # unclear how to get the contents into local_path

where element is the GCS path read from a previous Beam step.

Checking the available attributes, f.downloader looks like this:

f.downloader
Download with 57507840/57507840 bytes transferred from url https://www.googleapis.com/storage/v1/b/api-project-773889352370-testing/o/Clips%2F00011.MTS?generation=1493431837327161&alt=media

This message makes it seem like the object has been downloaded, and it reports the right file size (57 MB). But where does it go? I would like to pass a variable (local_path) so that a subsequent process can handle the object. The class doesn't seem to accept a destination path, and the file is not in the current working directory, /tmp/, or the Downloads folder. I'm testing locally on OSX before I deploy.

Am I using this tool correctly? I know that streaming video bytes may be preferable for large videos; we'll get to that once I understand the basic functions. I'll open a separate question for streaming into memory (named pipe?) to be read by OpenCV.
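For reference, one possible way to get the bytes into local_path is sketched below. It assumes the object returned by GcsIO().open() behaves like an ordinary readable binary file object (as the `with ... as f` usage above suggests) and that nothing is written to disk unless the caller does it explicitly. The helper name copy_to_local and the chunk_size parameter are hypothetical, introduced only for illustration; this is a sketch under those assumptions, not a confirmed Beam recipe.

```python
import shutil


def copy_to_local(src, local_path, chunk_size=1024 * 1024):
    """Copy a readable binary file-like object to local_path in chunks.

    Hypothetical helper: src can be any object with a read() method,
    assumed here to include the handle returned by GcsIO().open().
    """
    with open(local_path, 'wb') as dst:
        # copyfileobj reads chunk_size bytes at a time, so the whole
        # video never has to sit in memory at once.
        shutil.copyfileobj(src, dst, chunk_size)
    return local_path


def read_file(element, local_path):
    # Imported lazily so copy_to_local can be tried without Beam installed.
    from apache_beam.io.gcp import gcsio

    # 'rb' is used because video content is binary.
    with gcsio.GcsIO().open(element, 'rb') as f:
        return copy_to_local(f, local_path)
```

Here read_file would be called with the GCS path from the previous step and a destination such as a file under tempfile.mkdtemp(); a later step can then open local_path as a normal file.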
