One of four 300 MB DOS Borland dBase .dbf files fails in Python's dbfread after many minutes with a non-ASCII character error


I have four .dbf files. Three of them load fine with the code below, but on the fourth dbfread raises this error:

    return decode_text(text, self.encoding, errors=self.char_decode_errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'ascii' codec can't decode byte 0xac in position 11: ordin

It happens while loading a 300 MB file, many minutes into the read. From what I found by searching, dbfread needs to be told the character encoding. All four .dbf files are loaded the same way with dbfread, and only this one fails.

import pandas as pd
from dbfread import DBF

def load_data_from_dbf(self, num=10000):
    # Method of a class that provides dbf_file_path, data_lock and indexf.
    # Read every record, then keep only the last `num`.
    records = list(DBF(self.dbf_file_path))[-num:]
    with self.data_lock:
        self.df = pd.DataFrame(records)
        self.index1DF = self.df.set_index([self.indexf[0]])
        if len(self.indexf) > 1:
            self.index2DF = self.df.set_index([self.indexf[1]])
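
To see what the traceback is complaining about, the byte can be decoded on its own, outside dbfread. The sketch below is only an illustration: the helper name `decode_candidates` is made up, and the code pages in the list (cp437 and cp850 for DOS, cp1252 for Windows) are guesses, since the question does not say which one the file was written with.

```python
def decode_candidates(raw, codecs=("ascii", "cp437", "cp850", "cp1252")):
    """Try each codec on `raw`; map codec name to decoded text, or None on failure."""
    results = {}
    for codec in codecs:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            # The codec has no character for at least one byte in `raw`.
            results[codec] = None
    return results


# 0xac is the byte named in the traceback. ASCII stops at 0x7f, so it
# cannot be decoded as ASCII, but the 8-bit code pages all accept it.
for codec, text in decode_candidates(b"\xac").items():
    print(codec, "->", repr(text))
```

Every 8-bit code page will decode the byte to something, so a clean decode does not prove the codec is right. The decoded field still has to be checked by eye against what the record is supposed to contain.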

There is 1 solution below

Sam Marvasti

After some searching I tried geopandas, and it resolved the issue. This is because, as the documentation puts it, "The format drivers will attempt to detect the encoding of your data, but may fail. In this case, the proper encoding can be specified explicitly by using the encoding keyword parameter, e.g. encoding='utf-8'." For this file geopandas detected a usable encoding, while dbfread ended up using ASCII (as the traceback shows), so reading the file with geopandas works. It does add a 'geometry' column, which can be ignored.

Here is a code snippet:

import geopandas as gpd

def load_data_from_dbf(self, num=100000):
    # read_file lets the format driver detect the encoding.
    # Keep only the last `num` rows so both indexes cover the same data.
    tempdf = gpd.read_file(self.dbf_file_path).tail(num)
    indexDF = tempdf.set_index([self.indexf[0]])
    with self.data_lock:
        self.df = tempdf
        self.index1DF = indexDF
        if len(self.indexf) > 1:
            self.index2DF = self.df.set_index([self.indexf[1]])
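
If you would rather stay with dbfread, its `DBF` class also accepts an explicit `encoding` and a `char_decode_errors` argument, so the same idea can be tried without switching libraries. This is a rough sketch and not something tested against the file in the question: `tail` and `load_last_records` are hypothetical helper names, and `cp437` is only an assumed DOS code page that may need to be swapped for cp850 or another codec.

```python
from collections import deque


def tail(iterable, num):
    """Return the last `num` items while holding at most `num` in memory."""
    return list(deque(iterable, maxlen=num))


def load_last_records(path, num=10000, encoding="cp437"):
    """Read the last `num` records of a DBF file with an explicit codec.

    encoding="cp437" is an assumption (a common DOS code page); the
    correct value depends on how the file was originally written.
    """
    # Imported here so that tail() can be used without dbfread installed.
    from dbfread import DBF

    # char_decode_errors="replace" substitutes U+FFFD for undecodable
    # bytes instead of raising UnicodeDecodeError part-way through the
    # file. It only matters for codecs that do not cover every byte.
    table = DBF(path, encoding=encoding, char_decode_errors="replace")

    # Iterating the table streams records from disk, so only the last
    # `num` records are held in memory at any time.
    return tail(table, num)
```

In the method above, `pd.DataFrame(load_last_records(self.dbf_file_path, num))` would take the place of the `gpd.read_file` call, with no extra 'geometry' column. To see why only one of the four files behaves differently, printing `DBF(path).encoding` for each file should show which codec dbfread picked when none was passed.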