ibis ImpalaTable to pyspark dataframe

Asked by JEONGHYEON OH on 26 October 2021 at 05:48

I need to load Impala data into Spark (PySpark) because I want to use FPGrowth from Spark MLlib.

The data is stored in Kudu tables that were created through Impala. Connecting Spark directly to Kudu was rejected by the relevant department, and I also failed to connect using Cloudera's Impala JDBC driver.
So my last resort is to:

Load the data with ibis (https://github.com/ibis-project/ibis)
Convert the ImpalaTable to a Spark DataFrame

But I couldn't find a way to do the conversion.
Am I approaching this the wrong way?
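A minimal sketch of those two steps, assuming the ibis Impala backend and an existing SparkSession: an ibis table expression's `execute()` returns a pandas DataFrame, and `SparkSession.createDataFrame` accepts a pandas DataFrame directly. The function name `impala_table_to_spark` and the table/database names are hypothetical. Because the whole result is collected on the driver first, this only suits data that fits in the driver's memory.

```python
def impala_table_to_spark(con, spark, table_name, database=None, limit=None):
    """Hypothetical helper: Impala table -> ibis -> pandas -> Spark DataFrame.

    `con` is assumed to be an ibis Impala connection and `spark` an active
    SparkSession. All rows pass through the driver as a pandas DataFrame.
    """
    # Build a lazy ibis table expression; no query is sent to Impala yet.
    expr = con.table(table_name, database=database)
    # Run the query on Impala. limit=None requests all rows; some older ibis
    # releases otherwise apply a default row limit (ibis.options.sql.default_limit).
    pdf = expr.execute(limit=limit)
    # Let Spark infer the schema from the pandas DataFrame.
    return spark.createDataFrame(pdf)
```

On a real cluster, `con` would typically come from `ibis.impala.connect(host=..., port=21050)` and `spark` from `SparkSession.builder.getOrCreate()`; the host is a placeholder for your own coordinator. On Spark 3.x, enabling `spark.sql.execution.arrow.pyspark.enabled` can speed up the pandas-to-Spark conversion.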

Answer by JEONGHYEON OH on 28 October 2021 at 01:02

Previously, this approach did not work for me.
I could retrieve the table schemas, but queries failed with a timeout.

I finally found the problem: it was caused by the firewall.
I had opened ports only on the master nodes, but I also needed to open ports on the data nodes.
Now everything works.
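Because the symptom was a timeout rather than an outright error, a quick TCP reachability check from the client machine against every node (not just the masters) can expose this kind of firewall gap. A minimal sketch using only the standard library; the function name `check_ports`, the host names, and the port numbers are hypothetical, and which ports actually need to be open depends on your cluster's services.

```python
import socket

def check_ports(targets, timeout=3.0):
    """Return {(host, port): bool} telling whether a TCP connection succeeded."""
    results = {}
    for host, port in targets:
        try:
            # Resolve the host and attempt a full TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Refused, unreachable, or timed out (socket.timeout is an OSError).
            results[(host, port)] = False
    return results

# Hypothetical usage; replace hosts and ports with your cluster's values.
# targets = [("master-1.example", 21050), ("datanode-1.example", 21050)]
# for target, ok in check_ports(targets).items():
#     print(target, "reachable" if ok else "blocked or unreachable")
```

A port that hangs until the timeout, rather than being refused immediately, often points to a firewall silently dropping packets.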
