Using polars.read_csv on a large data set fails because of a field delimiter issue. Setting ignore_errors skips the erroneous records, but I have no idea whether one record or thousands were ignored. Is there a way to pipe the bad records to a bad file, or to report the number of ignored rows?
I wish the world were simple enough for data to get by with single-character column delimiters, but that hasn't happened yet. Why don't pandas, pyarrow, and polars support multi-character field delimiters?
Polars doesn't provide a mechanism to pipe bad records to a separate file or to report the number of ignored rows when you use the ignore_errors parameter. You could do it manually in the following way, though I don't know if it's what you want:
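A minimal sketch of the manual approach, using only the standard library's csv module to pre-filter the file before Polars sees it. It assumes a single-character delimiter, a header row, and that a "bad" record is simply one whose field count differs from the header's; the function name split_bad_records and the file paths are hypothetical, not part of Polars.

```python
import csv


def split_bad_records(src_path, good_path, bad_path, delimiter=","):
    """Copy well-formed rows to good_path and malformed rows to bad_path.

    A row is treated as malformed when its field count differs from the
    header's. Returns (good_count, bad_count); the header is not counted.
    """
    good_count = 0
    bad_count = 0
    with open(src_path, newline="") as src, \
            open(good_path, "w", newline="") as good, \
            open(bad_path, "w", newline="") as bad:
        reader = csv.reader(src, delimiter=delimiter)
        good_writer = csv.writer(good, delimiter=delimiter)
        bad_writer = csv.writer(bad, delimiter=delimiter)

        # Assumption: the first row is a header that defines the column count.
        header = next(reader)
        good_writer.writerow(header)

        for row in reader:
            if len(row) == len(header):
                good_writer.writerow(row)
                good_count += 1
            else:
                # Wrong number of fields: route the record to the bad file.
                bad_writer.writerow(row)
                bad_count += 1
    return good_count, bad_count
```

Calling split_bad_records("data.csv", "good.csv", "bad.csv") returns the two counts, and good.csv can then be passed to polars.read_csv without ignore_errors. This only catches records with the wrong number of fields; rows that fail for other reasons, such as a value that cannot be parsed into the column's type, would need an additional check.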
Regarding your second question: in pandas you can change the field delimiter when reading a CSV file by passing the "sep" parameter to pandas.read_csv(). The "sep" parameter accepts either a single delimiter character or a longer string; a separator longer than one character is interpreted as a regular expression and forces the slower Python parsing engine. For example:
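A short sketch of a multi-character delimiter in pandas. The sample data and the "||" delimiter are made up for illustration. Because pandas treats a multi-character "sep" as a regular expression, the delimiter is escaped with re.escape, and engine="python" is passed explicitly to avoid the fallback warning.

```python
import io
import re

import pandas as pd

# Hypothetical sample data using "||" as the field delimiter.
raw = "id||name||city\n1||Ada||London\n2||Linus||Helsinki\n"

# "|" is a regex metacharacter, so escape the delimiter before passing it.
df = pd.read_csv(io.StringIO(raw), sep=re.escape("||"), engine="python")

print(df)
```

In real use, replace the io.StringIO(raw) argument with the path to your file. Note that regex separators in pandas tend to ignore quoting, so a quoted field that contains the delimiter may still be split.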