Pandas to_gbq error due to type inconsistencies between MacOS and Windows


My Python/pandas code works fine on MacOS, but after moving it to Windows it fails when writing to gbq (Google BigQuery) with to_gbq. The error appears to be caused by type differences between the two platforms.

The code is as follows:

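import math


def formatNumber(x):
    # NaN becomes the float 0.0; every other value becomes a string
    # rounded to 8 decimal places, so the result column has mixed types.
    if math.isnan(x):
        f_number = 0.0
    else:
        f_number = str(round(x, 8))

    return f_number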


... <reading df from file> ...

print("A")
print(df.info())
df['Date'] = [x.date().strftime("%Y-%m-%d") for x in df['Date']]
df['A'] = [formatNumber(x) for x in df['A']]

# drop duplicates
print(df.shape)
df = df.drop_duplicates()
print(df.shape)

# upload to bigquery
print("B")
print(df.info())

table_schema = [{
    'name': 'Date',
    'type': 'date'
}, {
    'name': 'A',
    'type': 'numeric'
}, {
    'name': 'B',
    'type': 'string'
}]


df.to_gbq('tablename',
          'dbname',
          chunksize=None,
          if_exists='replace',
          table_schema=table_schema,
          credentials=credentials
          )

The output is:

A
<class 'pandas.core.frame.DataFrame'>
Int64Index: 82624 entries, 0 to 9
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Date                     82624 non-null  datetime64[ns]
 1   A                        82624 non-null  float64
 2   B                        80769 non-null  object
 ...

dtypes: datetime64[ns](1), float64(6), object(6)
memory usage: 8.8+ MB
None
(82624, 13)
(82624, 13)

[5 rows x 13 columns]
B
<class 'pandas.core.frame.DataFrame'>
Int64Index: 82624 entries, 0 to 9
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Date                     82624 non-null  datetime64[ns]
 1   A                        82624 non-null  object
 2   B                        80769 non-null  object
 ...

dtypes: datetime64[ns](1), float64(6), object(6)
memory usage: 8.8+ MB

Error message:

  File "pyarrow\array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
  File "pyarrow\array.pxi", line 316, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow\error.pxi", line 123, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'datetime.time' object

Another difference I've noticed between MacOS and Windows is that the index changes between printout A and printout B on MacOS, whereas nothing changes on Windows.

MacOS:

  • A --> Int64Index: 82624 entries, 0 to 1015
  • B --> RangeIndex: 1016 entries, 0 to 1015

Windows:

  • A and B --> Int64Index: 82624 entries, 0 to 9
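
For context, an index summary such as "82624 entries, 0 to 9" means the first label is 0 and the last is 9, so the labels cannot be a simple 0..N-1 sequence. One way this can happen is when a frame is assembled from several smaller frames without renumbering. That is only an assumption here, since the file-reading step is elided in the post; the frames `part_a` and `part_b` below are hypothetical stand-ins. A minimal sketch:

```python
import pandas as pd

# Hypothetical stand-ins for pieces of a file that was read in parts.
part_a = pd.DataFrame({"A": [1.0, 2.0, 3.0]})
part_b = pd.DataFrame({"A": [4.0, 5.0]})

# Concatenating keeps each piece's own labels, so labels repeat.
combined = pd.concat([part_a, part_b])
print(combined.index.tolist())  # [0, 1, 2, 0, 1]

# Renumbering the rows produces a RangeIndex running from 0 to len - 1.
renumbered = combined.reset_index(drop=True)
print(type(renumbered.index).__name__)  # RangeIndex
```

Comparing `df.index.is_unique` on both machines would be a quick way to check whether the two frames really start out the same.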

There is 1 solution below


Try changing

df['Date'] = [x.date().strftime("%Y-%m-%d") for x in df['Date']]

to

df['Date'] = [x.date().strftime("%Y-%m-%d %Z") for x in df['Date']]

The error you are receiving suggests a type mismatch: pyarrow expected bytes (a string value) in a column but found a datetime.time object. This may be caused by the strftime() method of the datetime object behaving differently on MacOS and Windows.
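
Since the traceback names a datetime.time object, it may also be worth checking whether one of the object columns still holds time values when the frame reaches to_gbq. The following is a rough diagnostic sketch, not a confirmed fix: the helper names `find_time_columns` and `times_to_strings`, the sample frame, and the column "C" are all made up for illustration.

```python
import datetime

import pandas as pd


def find_time_columns(df):
    """Return names of object columns holding at least one datetime.time value."""
    hits = []
    for name in df.select_dtypes(include="object").columns:
        if df[name].map(lambda v: isinstance(v, datetime.time)).any():
            hits.append(name)
    return hits


def times_to_strings(df, columns):
    """Return a copy where datetime.time values in `columns` become HH:MM:SS strings."""
    out = df.copy()
    for name in columns:
        out[name] = out[name].map(
            lambda v: v.strftime("%H:%M:%S") if isinstance(v, datetime.time) else v
        )
    return out


# Hypothetical sample data; the column names are illustrative only.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "B": ["x", None],
    "C": [datetime.time(9, 30), datetime.time(17, 0)],
})

time_columns = find_time_columns(sample)
cleaned = times_to_strings(sample, time_columns)
print(time_columns)           # ['C']
print(cleaned["C"].tolist())  # ['09:30:00', '17:00:00']
```

Running `find_time_columns(df)` just before the upload on both machines would show whether the same columns are affected on each platform.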