How to read from Kudu in Python


I am trying to retrieve data from Kudu, but I am not able to install the kudu-python package in Anaconda or on my server. Can I get some help with it? The documentation I found online is not really clear.


There are 2 best solutions below

0
On BEST ANSWER

@Karthik, did you encounter any errors? I just installed the kudu-python client on Anaconda on CentOS 6.9. There was one gotcha with versioning, but otherwise it was straightforward. The only error I ran into was:

kudu/client.cpp:589:30: fatal error: kudu/util/int128.h: No such file or directory

There is a solution for it here: https://community.cloudera.com/t5/Data-Ingestion-Integration/can-not-install-kudu-python/td-p/67496

Otherwise, the steps are:

  1. Install the Kudu client libraries as described on the Kudu website (https://kudu.apache.org/docs/installation.html#_install_on_rhel_or_centos_hosts):

    wget http://archive.cloudera.com/kudu/redhat/6/x86_64/kudu/cloudera-kudu.repo
    sudo mv cloudera-kudu.repo /etc/yum.repos.d/
    sudo yum update
    sudo yum install kudu kudu-client0 kudu-client-devel

  2. Install the development dependencies if you don't have them already:

    sudo yum install autoconf automake libtool make gcc gcc-c++

  3. Install Cython and kudu-python:

    pip install Cython kudu-python==1.2.0
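
To confirm that the pip step actually installed something, you can ask Python which versions it sees. This is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata), and installed_version is a helper name made up for this example, not part of kudu-python.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints None for a package if the corresponding pip install did not succeed.
print("kudu-python:", installed_version("kudu-python"))
print("Cython:", installed_version("Cython"))
```

If kudu-python shows None here, the build most likely failed during pip install and the compiler output is the place to look.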

Once this is installed, you can find examples at https://github.com/apache/kudu/tree/master/examples/python
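
For the original question (just reading a table), a minimal sketch along the lines of the official example could look like this. The function name read_rows is made up here, and the host "kudu.master", the port 7051 and the table "python-example" are placeholders for your own cluster; the code assumes the Kudu master is reachable from the machine running Python.

```python
def read_rows(master_host, table_name, port=7051):
    """Return every row of a Kudu table as a list of tuples."""
    # Imported inside the function so this file still loads where
    # kudu-python is not installed.
    import kudu

    client = kudu.connect(host=master_host, port=port)
    table = client.table(table_name)
    scanner = table.scanner()
    # open() starts the scan; read_all_tuples() pulls all rows into memory,
    # so this is only suitable for tables that fit in RAM.
    return scanner.open().read_all_tuples()


# Example call (placeholder host and table name):
#   for row in read_rows("kudu.master", "python-example"):
#       print(row)
```

For large tables you would probably want to add predicates to the scanner before opening it, as the linked examples do, rather than reading everything.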

0
On

I could not install kudu-client (Windows is not supported), so I used the cluster's Impala to query the Kudu tables instead:

from impala.dbapi import connect

# Replace <Impala Daemon> with the hostname of an Impala daemon in your cluster
conn = connect('<Impala Daemon>', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()  # list of row tuples

https://github.com/cloudera/impyla
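
Since fetchall() returns plain tuples, it can help to pair them with the column names from cursor.description. The helper below is only a sketch (rows_as_dicts is a name invented for this example); it relies on nothing but the standard DB-API cursor attributes, so it should work with an impyla cursor after execute() has been called.

```python
def rows_as_dicts(cursor):
    """Turn the current result set of a DB-API cursor into a list of dicts."""
    # Each entry of cursor.description is a sequence whose first item
    # is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Example, assuming `cursor` has just executed a SELECT as shown above:
#   for record in rows_as_dicts(cursor):
#       print(record)
```

If you would rather have a DataFrame, impyla also provides impala.util.as_pandas(cursor), which needs pandas installed.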