This sequence:
from airflow.hooks.mysql_hook import MySqlHook
conn = MySqlHook(mysql_conn_id='conn_id')
engine = conn.get_sqlalchemy_engine()
df.to_sql('test_table', engine, if_exists='append', index=False)
produces the following:
UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 57: ordinal not in range(256)
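The failure can be reproduced without a database. This is a minimal sketch, assuming only that the offending character is U+FFFD as the traceback reports; the sample string is made up for illustration:

```python
# U+FFFD (the Unicode replacement character) lies outside latin-1,
# which only covers code points 0-255.
text = "bad byte: \ufffd"  # hypothetical sample value

try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Same error class as the one raised by to_sql above.
    latin1_ok = False

# UTF-8 can encode every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")
```

This suggests the connection is falling back to latin-1 rather than using utf8.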
This sequence works great:
from sqlalchemy import create_engine
engine = create_engine("mysql://{0}:{1}@{2}/capone?charset=utf8".format(user, pwd, host))
df.to_sql('test_table', engine, if_exists='append', index=False)
The key is explicitly declaring the charset. I have attempted to do this in Airflow by adding {"charset": "utf8"} to the connection.
But this has not fixed the error. I've restarted my dev environment since making the changes, and the admin panel confirms that the edit was saved. How can I configure Airflow connections to use utf8 as the charset?
I realised that this is a bug in Airflow and I have reported it here: https://issues.apache.org/jira/browse/AIRFLOW-4824
For now I have a workaround with the following code:
And then use it as follows:
The real solution will be to send a pull request to the project that overrides get_uri in mysql_hook.py.
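As a rough illustration of what a charset-aware URI builder could look like, here is a minimal sketch. It is not Airflow's code: `build_mysql_uri` is a hypothetical helper, and the argument names simply mirror the fields an Airflow connection usually carries (login, password, host, schema, port, and the extras as a dict).

```python
from urllib.parse import quote_plus, urlencode


def build_mysql_uri(login, password, host, schema, port=None, extra=None):
    """Build a SQLAlchemy MySQL URI, appending the charset from `extra` if present.

    Hypothetical helper for illustration; not part of Airflow.
    """
    # Percent-encode credentials so characters like '@' do not break the URI.
    auth = "{0}:{1}".format(quote_plus(login), quote_plus(password))
    netloc = host if port is None else "{0}:{1}".format(host, port)
    uri = "mysql://{0}@{1}/{2}".format(auth, netloc, schema)

    # Only add the query string when a charset was actually configured.
    charset = (extra or {}).get("charset")
    if charset:
        uri += "?" + urlencode({"charset": charset})
    return uri


# Example values are made up, except the 'capone' schema from the question.
uri = build_mysql_uri("user", "p@ss", "db.example.com", "capone",
                      extra={"charset": "utf8"})
plain_uri = build_mysql_uri("user", "p@ss", "db.example.com", "capone")
```

In Airflow, the arguments would presumably come from the hook's connection object (for example its `extra_dejson` dict for the extras), and the resulting string could be passed to `sqlalchemy.create_engine` in place of the engine returned by `get_sqlalchemy_engine()`.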