When using DataflowRunner, we stage files to GCS via the filesToStage option; however, this does not happen with DirectRunner. Is there a way to have DirectRunner stage files to GCS and use those files the way DataflowRunner does, perhaps by using a ClassLoader or another method?
Staging Files to GCS using Dataflow DirectRunner
Asked by Chase
There is 1 best solution below
No. The DirectRunner simply runs the pipeline locally, so it doesn't stage files to GCS; it just uses the local files. My best suggestion is to write a small tool that looks for the files in two possible places, and that detects whether it is running on the DataflowRunner or the DirectRunner by where it finds them.
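A rough sketch of the "look in two places" idea, using only the JDK. `ResourceLocator`, `Located`, `Source` and the `localDir` parameter are hypothetical names made up for this illustration; none of them are part of Beam or the Dataflow SDK. It also assumes that the staged files end up on the worker's classpath when the job runs on Dataflow, which you should verify for your own setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Beam: finds a file locally or on the classpath. */
final class ResourceLocator {

    /** Where the resource was found. */
    public enum Source { LOCAL_FILE, CLASSPATH }

    /** An opened resource together with the place it came from. */
    public static final class Located {
        public final Source source;
        public final InputStream stream;

        Located(Source source, InputStream stream) {
            this.source = source;
            this.stream = stream;
        }
    }

    private ResourceLocator() {}

    /**
     * Tries the local directory first (the DirectRunner case), then falls back
     * to the classpath (assumed here to be where staged files land on Dataflow).
     */
    public static Located open(Path localDir, String name) throws IOException {
        Path candidate = localDir.resolve(name);
        if (Files.isRegularFile(candidate)) {
            return new Located(Source.LOCAL_FILE, Files.newInputStream(candidate));
        }
        InputStream in = ResourceLocator.class.getClassLoader().getResourceAsStream(name);
        if (in != null) {
            return new Located(Source.CLASSPATH, in);
        }
        throw new IOException("Resource not found locally or on the classpath: " + name);
    }
}
```

Calling `ResourceLocator.open(Paths.get("src/main/resources"), "lookup.txt")` inside a `DoFn` would then work under either runner, and the returned `source` field tells you which location was used. Keep in mind that finding the file locally is only a hint about the runner, not proof of it.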