delete file from Google Storage from a Dataflow job

Asked by Felipe Sierra

I have a dataflow pipeline built with Apache Beam in Python 3.7 where I process a file and then have to delete it. The file comes from a Google Cloud Storage bucket, and the problem is that when I use the DataflowRunner my job doesn't work, because the google-cloud-storage API is not installed in the Google Dataflow Python 3.7 environment. Do you guys know how I could delete this file inside my Dataflow pipeline without using this API? I've seen Apache Beam modules like https://beam.apache.org/releases/pydoc/2.22.0/apache_beam.io.filesystem.html, but I have no idea how to use it and haven't found a tutorial or example for this module.

1 Answer
I don't think you can delete the file while the Dataflow job is running; you have to delete it after the job has completed. For that I normally recommend some kind of orchestration, such as Apache Airflow or Google Cloud Composer (GCP's managed Airflow).
You can make a DAG in Airflow as follows (the original answer illustrated this with a diagram of the DAG; see the sketch after this list):

- "Custom DAG Workflow" runs the Dataflow job.
- "Custom Python Code" deletes the file once the job has completed.
For a complete example, refer to https://github.com/GoogleCloudPlatform/professional-services/tree/master/examples/cloud-composer-examples/composer_dataflow_examples