How to read parquet files from HDFS in R


I need to read parquet files stored on HDFS from my R program. The Hadoop cluster is Kerberos-protected. I came across a couple of packages, but neither fully meets my needs:

  • rhadoop: It looks like an old project that is no longer being developed. Its rhdfs package supports neither parquet files nor Kerberos.
  • arrow: It seems to be able to read parquet files, but it offers no connectivity to HDFS.

Is there any other library that lets me read parquet files from HDFS in R?

I'm aware of sparklyr, but I believe it requires Spark to be installed on the machine that runs the Spark driver. Is that correct? My R client runs on a different machine.

