python odo sql AssertionError: datashape must be Record type, got 0 * {...}


I'm trying to import a CSV into MySQL using odo but am getting a datashape error.

My understanding is that datashape takes the format:

var * {
    column: type
    ... 
}

where var means a variable number of rows. I'm getting the following error:

AssertionError: datashape must be Record type, got 0 * {
  tod: ?string,
  interval: ?string,
  iops: float64,
  mb_per_sec: float64
}

I'm not sure where that 0 (the number of rows) is coming from. I've tried setting the datashape explicitly with dshape(), but I still get the same error.
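For context: a datashape like var * {...} or 0 * {...} is one or more dimensions followed by a measure, and the Record is only the {...} measure. The assertion is complaining that it was handed the whole shape, including the leading 0 dimension, instead of the bare record. With the real datashape library, dshape(...).measure returns that record part. The toy splitter below is a stdlib-only illustration with hypothetical helper names (split_datashape, is_record), not the library's parser, and it only handles simple shapes like the one in the error:

```python
from typing import List, Tuple


def split_datashape(text: str) -> Tuple[List[str], str]:
    """Split a datashape string into (dimensions, measure) at top-level '*'."""
    parts = []
    current = []
    depth = 0
    for ch in text:
        if ch in "{[(":
            depth += 1
        elif ch in "}])":
            depth -= 1
        if ch == "*" and depth == 0:
            # A '*' outside any braces separates one dimension from the rest.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts[:-1], parts[-1]


def is_record(text: str) -> bool:
    """True only for a bare record measure such as '{a: int64}'."""
    dims, measure = split_datashape(text)
    return not dims and measure.startswith("{") and measure.endswith("}")


shape = "0 * {tod: ?string, interval: ?string, iops: float64, mb_per_sec: float64}"
dims, measure = split_datashape(shape)
print(dims)                # ['0']
print(is_record(shape))    # False: the full shape still carries a dimension
print(is_record(measure))  # True: the {...} part alone is the Record
```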

Here's a stripped down version of the code that recreates the error:

from odo import odo

# mysql_database_uri holds the MySQL connection string (defined elsewhere)
odo('test.csv', mysql_database_uri)

I'm running Ubuntu 16.04 and Python 3.6.1 using Conda.

Thanks for any input.

There are 3 answers below.


Try replacing

odo('test.csv', mysql_database_uri) 

with

import pandas
odo(pandas.read_csv('test.csv'), mysql_database_uri)
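The same idea without odo: pandas can write a DataFrame to SQL itself with DataFrame.to_sql, where the table name is a required argument. This is a swapped-in technique rather than what the answer uses. A minimal sketch, assuming an in-memory sqlite3 database as a stand-in for MySQL, made-up sample rows in place of test.csv, and a hypothetical table name test:

```python
import io
import sqlite3

import pandas as pd

# Stand-in for test.csv so the sketch runs anywhere; the rows are made up.
csv_text = "tod,interval,iops,mb_per_sec\n00:00,60,120.5,3.2\n00:01,60,98.0,2.9\n"
df = pd.read_csv(io.StringIO(csv_text))

# In-memory sqlite3 replaces MySQL here so no server is needed.
conn = sqlite3.connect(":memory:")
df.to_sql("test", conn, if_exists="append", index=False)

print(conn.execute("SELECT COUNT(*) FROM test").fetchone()[0])  # 2
```

For MySQL you would pass a SQLAlchemy engine (for example sqlalchemy.create_engine(mysql_database_uri)) instead of the sqlite3 connection; if_exists='append' adds rows to an existing table rather than failing.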

I had this error too and needed to specify the target table in the URI with the ::table suffix:

# fails with the datashape error (no table given)
odo('data.csv', 'postgresql://usr:pwd@ip/db')

# works: ::table names the target table
odo('data.csv', 'postgresql://usr:pwd@ip/db::table')
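The ::table suffix matters because a bare database URI does not say which table the rows should land in. As a rough stdlib-only illustration (not how odo works internally), the loader below takes the target table as a required argument, builds it from the CSV header, and appends the rows. load_csv_into_table is a hypothetical helper, sqlite3 stands in for PostgreSQL/MySQL so it runs without a server, and the rows are made-up sample data using the column names from the question:

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_csv_into_table(conn: sqlite3.Connection, source: TextIO, table: str) -> int:
    """Create `table` from the CSV header (if missing) and append every row.

    Columns are declared without types, so values are stored as the strings
    the csv module returns; a real loader would infer proper column types.
    Returns the number of rows inserted.
    """
    reader = csv.reader(source)
    header = next(reader)
    # Quote identifiers so names like "table" or "interval" are safe to use.
    cols = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(quoted_table, cols))
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
data = io.StringIO("tod,interval,iops,mb_per_sec\n00:00,60,120.5,3.2\n00:01,60,98.0,2.9\n")
n = load_csv_into_table(conn, data, "table")
print(n)  # 2
```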

Odo seems to be buggy and discontinued. As an alternative, you can use d6tstack, which has fast pandas-to-SQL functionality because it uses native database import commands. It supports Postgres, MySQL and MS SQL:

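import glob
import d6tstack.combine_csv
cfg_uri_mysql = 'mysql+mysqlconnector://testusr:testpwd@localhost/testdb'
# apply_fun is your own preprocessing function: takes a DataFrame, returns a DataFrame
d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'), apply_after_read=apply_fun).to_mysql_combine(cfg_uri_mysql, 'table')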

It is also particularly useful for importing multiple CSV files with schema changes and/or preprocessing them with pandas before writing to the database; see further down in the examples notebook.
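On the schema-change point: if you stay with plain pandas, pd.concat already lines up CSVs whose columns differ, filling the gaps with NaN, and the result can then be written with DataFrame.to_sql. This is a pandas-only sketch, not d6tstack's implementation; the two in-memory files are made-up stand-ins for what glob.glob('*.csv') would return:

```python
import io

import pandas as pd

# Two made-up CSV files whose columns differ (the second adds mb_per_sec).
file_a = io.StringIO("tod,iops\n00:00,120.5\n")
file_b = io.StringIO("tod,iops,mb_per_sec\n00:01,98.0,2.9\n")

frames = [pd.read_csv(f) for f in (file_a, file_b)]
# concat aligns columns by name; a column missing from a file becomes NaN there.
combined = pd.concat(frames, ignore_index=True, sort=False)
print(list(combined.columns))  # ['tod', 'iops', 'mb_per_sec']
```

With real files, the list comprehension would read paths instead, e.g. [pd.read_csv(path) for path in glob.glob('*.csv')].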