I need to use the parquet-mr library to read Parquet files programmatically in Java. I need to read only a few columns and skip the rest (for example, 3 columns out of 500). I can't find any documentation on how to do that. Can someone please point me to some, if there is any?
Documentation for Parquet-mr java library
1.8k Views Asked by User29519 At
There is 1 best solution below
Unfortunately, this is not well documented. There are some examples you can check out here. They use the ExampleParquetWriter class that ships with Parquet, which was meant to serve as an example only. Nevertheless, it works.
The proper way to use Parquet is either through one of the supported object models (such as Avro, Thrift, or Protobuf) or by implementing your own object model, which leads to the best performance. You can read more about object models here.
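As for reading only 3 of 500 columns: as far as I know, parquet-mr does this by taking a projection schema that lists only the wanted columns. Below is a minimal sketch of building such a schema string using only the JDK. The `ProjectionSchema` helper and the column names `id`, `price`, and `quantity` are made up for illustration and are not part of parquet-mr. Each declaration has to match the type and repetition of that column in the file's own schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not part of parquet-mr): builds a Parquet message-type
// string that names only the columns to be read.
final class ProjectionSchema {

    private ProjectionSchema() {}

    /**
     * @param messageName name for the projected message
     * @param columns     column name -> declaration such as "optional int64";
     *                    each declaration must agree with the file's schema.
     *                    Columns with a logical-type annotation (which is written
     *                    after the column name) are not handled by this sketch.
     */
    static String build(String messageName, Map<String, String> columns) {
        StringBuilder sb = new StringBuilder("message ").append(messageName).append(" {\n");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            sb.append("  ").append(column.getValue()).append(' ')
              .append(column.getKey()).append(";\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical column names and types; substitute the ones in your file.
        Map<String, String> wanted = new LinkedHashMap<>();
        wanted.put("id", "optional int64");
        wanted.put("price", "optional double");
        wanted.put("quantity", "optional int32");
        System.out.print(build("projection", wanted));
    }
}
```

With the example object model, the resulting string would be set on the Hadoop `Configuration` under `ReadSupport.PARQUET_READ_SCHEMA` (the key `parquet.read.schema`) before building a `ParquetReader<Group>` with `GroupReadSupport`. With Avro, `AvroReadSupport.setRequestedProjection` plays the same role but takes an Avro schema. I have not checked this against every parquet-mr release, so treat it as a starting point rather than a reference.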