BigQuery parsing error while reading Snappy-compressed Avro file

I have a requirement to load a Hadoop Snappy-compressed Avro file into BigQuery. The Google docs say that BigQuery detects Snappy compression, but when I ran

bq load --source-format=AVRO project:dataset.table gs://mybucket/inputsnappy.snappy

I got this error: "Apache Avro library failed to parse the header with the following error: Invalid data file. Magic does not match". Any input on this would really help.

The Google docs also say that BigQuery can only detect compression on data blocks. Can someone help me understand what that means, and how compressed data blocks differ from a file that is compressed as a whole?

I also tried to decompress the Snappy file back to plain Avro using python-snappy, but the call "decompressed_data = snappy.decompress(input_data)" fails with "Uncompress:invalid input file". I am not sure how to proceed now.
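For anyone trying to reproduce this, a quick check that may help narrow it down: the Avro specification says an object container file starts with the four bytes 'O', 'b', 'j', 1, and the "Magic does not match" message suggests the reader did not find them. Below is a minimal sketch that tests the first bytes of a local copy of the file. The helper names and the local path are hypothetical, and this only inspects the header; it does not decompress anything or confirm how the file was written.

```python
# Avro object container files begin with these four magic bytes
# ('O', 'b', 'j', followed by the byte 0x01), per the Avro specification.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(header: bytes) -> bool:
    """Return True if the given bytes start with the Avro container magic."""
    return header[:4] == AVRO_MAGIC


def file_looks_like_avro(path: str) -> bool:
    """Read the first four bytes of a local file and check the Avro magic.

    `path` is a hypothetical local copy of the file that was uploaded to GCS.
    """
    with open(path, "rb") as f:
        return looks_like_avro_container(f.read(4))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical local file name): python check_magic.py inputsnappy.snappy
    if len(sys.argv) == 2:
        if file_looks_like_avro(sys.argv[1]):
            print("Starts with the Avro magic bytes")
        else:
            print("Does not start with the Avro magic bytes")
```

If the check fails, one possible explanation (an assumption, not something confirmed here) is that the whole file was compressed after it was written, rather than Snappy being applied as the Avro codec to the data blocks inside an otherwise uncompressed container header.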
Asked by Sruthi Chandran