Loading CSV to MySQL with odo and SQLAlchemy


I'm using odo to write a CSV file to a MySQL table.

uri = 'mysql+pymysql://username:password@database/schema::table'
odo('data.csv', uri)

When I do not specify a datashape, the MySQL table gets created with all string columns as TEXT and all integer columns as BIGINT(20). So, I tried the following datashape, and got the errors below:

dshape = """{} * {{
    ID: 10 * string,
    FOO: float16,
    URL: 175 * var * string,
    PRICE: uint32,
    BAR: int8,
}}""".format(num_rows)
uri = 'mysql+pymysql://username:password@database/schema::table'
odo('data.csv', uri, dshape=dshape)

NotImplementedError: No SQLAlchemy dtype match for datashape: var

NotImplementedError: No SQLAlchemy dtype match for datashape: float16

NotImplementedError: No SQLAlchemy dtype match for datashape: uint32

NotImplementedError: No SQLAlchemy dtype match for datashape: int8
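As a sanity check, the datashape odo infers from the CSV on its own can be printed with its discover helper and compared against a hand-written dshape (a minimal sketch):

from odo import discover, resource

# Print the datashape odo infers from data.csv; this is the shape
# odo used above when no dshape was passed.
print(discover(resource('data.csv')))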

Ideally, I would be able to have int8 map to TINYINT, 175 * var * string map to VARCHAR(175), and 10 * string map to CHAR(10), for example, but it appears that the odo/SQLAlchemy conversion does not support this. I settled for setting all my numeric types to int32 or float32 and removed the var designation from my string types.

dshape = """{} * {{
    ID: 10 * string,
    FOO: float32,
    URL: 175 * string,
    PRICE: int32,
    BAR: int32,
}}""".format(num_rows)
uri = 'mysql+pymysql://username:password@database/schema::table'
odo('data.csv', uri, dshape=dshape)

This maps int32 to INT(11) and float32 to DOUBLE. I can live with that, but I cannot accept that all of my string columns are still created as TEXT in MySQL.

How must I change my datashape to get CHAR and VARCHAR datatypes in MySQL?
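One workaround might be to create the table up front with explicit MySQL types through SQLAlchemy and then point odo at it, since odo should append to a table that already exists rather than create one. A sketch of that idea (untested here; the column types are my assumptions from the schema above):

import sqlalchemy as sa
from sqlalchemy.dialects import mysql
from odo import odo

engine = sa.create_engine('mysql+pymysql://username:password@database/schema')
metadata = sa.MetaData()

# Declare the desired MySQL column types explicitly (assumed layout).
table = sa.Table(
    'table', metadata,
    sa.Column('ID', mysql.CHAR(10)),
    sa.Column('FOO', mysql.FLOAT()),
    sa.Column('URL', mysql.VARCHAR(175)),
    sa.Column('PRICE', mysql.INTEGER(unsigned=True)),
    sa.Column('BAR', mysql.TINYINT()),
)
metadata.create_all(engine)

# With the table already present, odo appends rows rather than
# creating the table itself.
odo('data.csv', 'mysql+pymysql://username:password@database/schema::table')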

1 Answer

You can try using pandas and SQLAlchemy to achieve the same result:

import pandas as pd
from sqlalchemy import create_engine

# Use the same pymysql driver as in the question; the bare mysql://
# scheme defaults to the MySQLdb driver.
engine = create_engine('mysql+pymysql://username:password@host:port/database', echo=False)

# Read the CSV into a DataFrame
df = pd.read_csv('data.csv')

# Write the DataFrame to SQL
df.to_sql('Table_Name', con=engine, if_exists='replace', index=False)

This will read the data.csv file and load it into Table_Name in your database.
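If you also need control over the resulting MySQL column types, to_sql accepts a dtype mapping from column names to SQLAlchemy types; a sketch using the column names from the question:

from sqlalchemy.dialects import mysql

# Pass explicit MySQL types per column instead of letting pandas
# pick its defaults (TEXT for strings, BIGINT for integers).
df.to_sql('Table_Name', con=engine, if_exists='replace', index=False,
          dtype={'ID': mysql.CHAR(10),
                 'URL': mysql.VARCHAR(175),
                 'PRICE': mysql.INTEGER(unsigned=True),
                 'BAR': mysql.TINYINT()})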