Strange warning using dask.dataframe to read csv


I am using the dask dataframe module to read a CSV file.

In [3]: from dask import dataframe as dd                                                                               

In [4]: dd.read_csv("/file.csv", sep=",", dtype=str, encoding="utf-8", error_bad_lines=False, collection=True, blocksize=64e6) 

I used to do this with no problem, but today a strange warning showed up:

   FutureWarning: The default value of auto_mkdir=True has been deprecated and will be changed to auto_mkdir=False by default in a future release.
      FutureWarning,

This didn't worry me until I realised it breaks my unit tests: when I run this from the console it is simply a warning, but my app's test suite now fails because of it.

Does anyone know the cause of this warning or how to get rid of it?


There are 2 best solutions below

BEST ANSWER

Answering my own question for documentation:

  • This issue appears with fsspec==0.6.3 and dask==2.12.0, and the warning should go away in a future release.
  • To prevent pytest from failing because of the warning, add or edit a pytest.ini file in your project and set the filter below. The warning is a FutureWarning, so that is the category to ignore.
[pytest]
filterwarnings =
    error
    ignore::FutureWarning
  • If you want dask to silence the warning altogether, set the option explicitly in the function call: storage_options={"auto_mkdir": True}
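
As an alternative to a project-wide pytest.ini filter, the warning can also be silenced only around the call that triggers it. Below is a minimal sketch using just the standard-library warnings module. The function read_with_auto_mkdir_warning is a hypothetical stand-in for the dd.read_csv call, so the example runs without dask installed, and the message pattern is assumed from the warning text quoted in the question.

```python
import warnings


def read_with_auto_mkdir_warning():
    """Hypothetical stand-in for dd.read_csv(...): emits the warning from the question."""
    warnings.warn(
        "The default value of auto_mkdir=True has been deprecated and will be "
        "changed to auto_mkdir=False by default in a future release.",
        FutureWarning,
    )
    return "dataframe"


def read_quietly():
    """Run the call with only the auto_mkdir FutureWarning suppressed."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text;
        # the filter is dropped again when the `with` block exits.
        warnings.filterwarnings(
            "ignore",
            message="The default value of auto_mkdir",
            category=FutureWarning,
        )
        return read_with_auto_mkdir_warning()


if __name__ == "__main__":
    # Turn every warning into an error, as a strict test configuration would.
    warnings.simplefilter("error")
    print(read_quietly())
```

Because the filter is scoped to the with block and to one message, other FutureWarnings elsewhere in the test suite are still reported.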
Another answer

I got the same thing. Finding no answers as to what might have replaced the feature, I decided to see if the feature is even needed any more. Sure enough, as of Pandas 1.3.0 the warnings that previously motivated the feature no longer appear. So

pd.read_csv(import_path, error_bad_lines=False, warn_bad_lines=False, names=cols)

simply became

pd.read_csv(import_path, names=cols)

and works fine with no errors or warnings.
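
If malformed rows do still need to be skipped on pandas 1.3.0 or later, on_bad_lines is the parameter that replaced error_bad_lines and warn_bad_lines. Here is a minimal sketch, assuming pandas >= 1.3 is installed. The inline CSV text and its column names are made up for illustration, and a header row is used here instead of names=cols.

```python
import io

import pandas as pd

# Hypothetical sample data: the row "3,4,5" has one field too many.
csv_text = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines="skip" drops malformed rows silently, which is what
# error_bad_lines=False together with warn_bad_lines=False used to do.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")

print(df)
```

Passing on_bad_lines="warn" instead keeps the skipping behaviour but reports each dropped row.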