Mismatched column specs give wrong values when reading with pd.read_fwf and colspecs

Question

675 Views Asked by BobbyF At 17 May 2021 at 00:05

I am reading a text file using pd.read_fwf as below:

import pandas as pd

specs_test =[(19, 20),(20, 21),(21, 23),(23,26)]
names_test = ["Record_Type","Resident_Status","State_Occurrence_FIPS",
"County_Occurrence_FIPS"]

test_l = pd.read_fwf('test.txt', header=None, names = names_test, colspecs= specs_test)

and test.txt is as follows:

After reading the file, test_l is as follows:

   Record_Type  Resident_Status  State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                S                     C0                      59
1            1                S                     C0                      51
2            1                S                     C0                      19
3            1                S                     C0                      33
4            1                S                     C0                       7
5            1                S                     C0                      41
6            2                S                     C0                      79
7            1                S                     C0                      43
8            1                S                     C0                      45
9            2                S                     C0                      79

But based on my colspecs, it should contain the following (I have only shown the first row, as I expected it):

1   1  SC  059

What am I missing here? Thank you very much for your help!

There are 2 best solutions below

**Jonathan Leon** · Answer 1 · 2021-05-17T00:44:32.110000

I got this when pasting your data into a test file and fixing the tuples.

specs_test =[(18, 19),(19, 20),(20, 22),(22,25)]
names_test = ["Record_Type","Resident_Status","State_Occurrence_FIPS",
"County_Occurrence_FIPS"]
pd.read_fwf('test.txt', header=None, names = names_test, colspecs= specs_test )

It's dropping the leading zeroes on the 4th column, so you may have to play around with kwargs to pass in data types, or fix the column after import.

   Record_Type  Resident_Status State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                1                    SC                      59
1            1                1                    SC                      51
2            1                1                    SC                      19
3            1                1                    SC                      33
4            1                1                    SC                       7
5            1                1                    SC                      41
6            2                2                    SC                      79
7            1                1                    SC                      43
8            1                1                    SC                      45
9            2                2                    SC                      79
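
The two options mentioned above (passing data types, or fixing the column after import) might look roughly like the sketch below. The two input records are made up for illustration, since the real test.txt is not shown; they only assume 18 leading characters before the four fields. `dtype` is a regular `read_fwf` keyword, and `str.zfill` pads a string with zeros on the left.

```python
import io
import pandas as pd

# Hypothetical stand-in for test.txt: 18 filler characters, then the four fields.
data = "\n".join(["x" * 18 + "11SC059", "x" * 18 + "11SC007"]) + "\n"

names_test = ["Record_Type", "Resident_Status", "State_Occurrence_FIPS",
              "County_Occurrence_FIPS"]
specs_test = [(18, 19), (19, 20), (20, 22), (22, 25)]

# Option 1: read the county code as text so the leading zeros survive.
by_dtype = pd.read_fwf(io.StringIO(data), header=None, names=names_test,
                       colspecs=specs_test,
                       dtype={"County_Occurrence_FIPS": str})

# Option 2: let pandas infer integers, then zero-pad back to 3 characters.
padded = pd.read_fwf(io.StringIO(data), header=None, names=names_test,
                     colspecs=specs_test)
padded["County_Occurrence_FIPS"] = (
    padded["County_Occurrence_FIPS"].astype(str).str.zfill(3)
)

print(by_dtype["County_Occurrence_FIPS"].tolist())
print(padded["County_Occurrence_FIPS"].tolist())
```

Option 2 only works cleanly when the column has no missing values, because a column with NaN would be read as float before the conversion to string.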

**Mohammad Moghadamfalahi** · Answer 2 · 2021-05-17T06:18:59.030000

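First, your colspecs are off by one. Each (start, end) pair is zero-based and half-open, so every field begins one position earlier than you specified. Try: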

specs_test =[(18, 19),(19, 20),(20, 22),(22,25)]
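
A quick way to check colspecs is to slice one line by hand, since each (start, end) pair selects the same characters as the Python slice line[start:end]. The record below is hypothetical (the real test.txt is not shown); it only assumes 18 characters before the fields and a blank after them.

```python
# Hypothetical record: 18 filler characters, the four fields, then a blank and more data.
line = "x" * 18 + "1" + "1" + "SC" + "059" + " zz"

specs_original = [(19, 20), (20, 21), (21, 23), (23, 26)]  # from the question
specs_fixed = [(18, 19), (19, 20), (20, 22), (22, 25)]     # shifted left by one

def slice_fields(text, specs):
    """Cut text at each half-open (start, end) pair, stripping blanks like read_fwf does."""
    return [text[start:end].strip() for start, end in specs]

print(slice_fields(line, specs_original))
print(slice_fields(line, specs_fixed))
```

With the original specs, every field starts one character too late, which reproduces the S / C0 / 59 pattern seen in the question.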

Also, for numerical values the leading zeros will be ignored. To keep them, you can convert to string by adding:

converters = {h:str for h in names_test}

The final code can be:

import pandas as pd

specs_test =[(18, 19),(19, 20),(20, 22),(22,25)] ## Here you were off by one index.

names_test = ["Record_Type","Resident_Status","State_Occurrence_FIPS", "County_Occurrence_FIPS"]

test_l = pd.read_fwf('test.txt', 
                 header=None, 
                 names = names_test, 
                 colspecs= specs_test, 
                 converters = {h:str for h in names_test}) ## If you want to keep the leading 
                                                           ## zeros you can convert to string.

The result:

   Record_Type  Resident_Status  State_Occurrence_FIPS  County_Occurrence_FIPS
0            1                1                     SC                     059
1            1                1                     SC                     051
2            1                1                     SC                     019
3            1                1                     SC                     033
4            1                1                     SC                     007
5            1                1                     SC                     041
6            2                2                     SC                     079
7            1                1                     SC                     043
8            1                1                     SC                     045
9            2                2                     SC                     079
