Issue with SQLAlchemy accessing Impala database via cloudera ODBC DSN

37 Views Asked by At

I'm trying to access an Impala DB via SQLAlchemy - I have configured a DSN that allows me to connect to the DB when using directly pyodbc.

However when using SQLAlchemy I get an error:

When using a db called datamart_x in the DSN:

pyodbc.Error: ('HY000', '[HY000] [Cloudera][ImpalaODBC] (370) Query analysis error occurred during query execution: [HY000] : AnalysisException: datamart_x.schema_name() unknown for database datamart_x. Currently this db has 0 functions.\n (370) (SQLExecDirectW)')

sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('HY000', '[HY000] [Cloudera][ImpalaODBC] (370) Query analysis error occurred during query execution: [HY000] : AnalysisException: datamart_x.schema_name() unknown for database datamart_x. Currently this db has 0 functions.\n (370) (SQLExecDirectW)') [SQL: SELECT schema_name()]

It is not a permission issue - as I can connect directly to the DB with pyodbc using the same DSN.

I suspect the issue is in the SQL statement: SELECT schema_name() that is executed when the SQLAlchemy engine is access ed (e.g. in my case with pandas read_sql)

Any ideas if there are connection parameters to get this to work?

Below a the code that I use to create the SQLAlchemy engine

connection_string = 'mssql+pyodbc://DataLake'
SQL = 'SHOW DATABASES'
args = {'autocommit': True}
engine = create_engine(connection_string, connect_args=args)
df = pd.read_sql(SQL, engine)

Kind Regards, Ernst

Tried different kind of connection strings .

Update Enabled logging for the sqlalchemy module(below): This seems to confirm my suspicion - as the SELECT schema_name() Bu I still don't know how to prevent sqlalchemy from running this sql statement.

INFO:sqlalchemy.engine.Engine:SELECT CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR)
INFO:sqlalchemy.engine.Engine:[raw sql] ()
INFO:sqlalchemy.engine.Engine:ROLLBACK
INFO:sqlalchemy.engine.Engine:SELECT schema_name()
INFO:sqlalchemy.engine.Engine:[generated in 0.00042s] ()
INFO:sqlalchemy.engine.Engine:ROLLBACK
2

There are 2 best solutions below

1
Gord Thompson On BEST ANSWER

You are trying to use mssql+pyodbc://, which is the dialect for Microsoft SQL Server. You'll need to use a dialect specifically for Impala. Usage examples can be found here:

https://github.com/cloudera/impyla/blob/master/impala/tests/test_sqlalchemy.py

0
ErnstW On

Code below is now working using the input from Gord Thompson

from sqlalchemy import create_engine
from impala.dbapi import connect
from impala.sqlalchemy import ImpalaDialect
from getpass import getpass
import pandas as pd

my_host = 'impalahost.com'
my_user = '[email protected]'
my_pass = getpass('The password please : ')
SQL = 'SHOW TABLES'

imp_conn = connect(host=my_host, port=21050, user=my_user, password=my_pass,
               use_ssl=True, auth_mechanism='PLAIN', database='default')


engine = create_engine('impala://', creator=lambda: imp_conn, echo=True)
conn = engine.connect()
df = pd.read_sql(SQL, conn)
print(df.head(15))