Reading misaligned .txt file to a pd dataframe

39 Views Asked by oshanshan248 At 25 January 2024 at 06:00

I'm trying to read numerical data from a .txt file into a pandas DataFrame, but it needs some wrangling. Some rows are misaligned (by extra tabs, I think).

Snippet of data (pasting the table actually makes it appear aligned): [image: .txt dataset with mixed alignment]

What worked for now was simply dropping the misaligned rows, but it's a small dataset and I'd like to keep every row. Code:

import pandas as pd

df = pd.read_table('path/file.txt', on_bad_lines='skip', header=None)
df

Output:

    0
0   15.26\t14.84\t0.871\t5.763\t3.312\t2.221\t5.22\t1
1   14.88\t14.57\t0.8811\t5.554\t3.333\t1.018\t4.9...
2   14.29\t14.09\t0.905\t5.291\t3.337\t2.699\t4.82...

Calling read_table without on_bad_lines='skip' raises: 'ParserError: Error tokenizing data. C error: Expected 8 fields in line 8, saw 10'

I've tried rewriting the .txt to replace each tab with a single space (or a comma) and then reading the new file with that delimiter, but that brings me back to the ParserError (strategy inspired by "Replace Tab with space in entire text file python").

# Rewrite the file, replacing every tab with a comma.
# The with statement closes both files automatically.
with open('path/file.txt', 'r') as inputFile, open('path/file_v1.txt', 'w') as exportFile:
    for line in inputFile:
        new_line = line.replace('\t', ',')
        exportFile.write(new_line)

(PS. Python beginner, and first StackOverflow problem. Thanks and sorry in advance if I missed some posting convention)

Corralien On 25 January 2024 at 08:24

You can pass sep=r'\s+' to tell pandas how to split each line. This regular expression matches one or more whitespace characters (spaces or tabs), so a run of consecutive tabs is treated as a single delimiter.
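To see why a whitespace regex helps, here is a small sketch using only the standard library. The line below is a made-up example (not taken from the asker's file) that assumes the misaligned rows contain doubled tabs, which would be consistent with the "Expected 8 fields ... saw 10" error.

```python
import re

# Hypothetical misaligned row: two of the separators are doubled tabs.
line = "14.11\t14.1\t0.8911\t5.42\t3.302\t2.7\t\t5\t\t1\n"

# Splitting on a single tab yields empty strings for the doubled tabs.
fields_single_tab = line.rstrip("\n").split("\t")

# Splitting on one or more whitespace characters collapses each run.
fields_whitespace = re.split(r"\s+", line.strip())

print(len(fields_single_tab))   # number of fields when every tab is a delimiter
print(len(fields_whitespace))   # number of fields when runs of whitespace are merged
```

Replacing each tab with a comma keeps the doubled delimiters, which is probably why the rewritten file still fails to parse.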

Try:

df = pd.read_table('path/file.txt', header=None, sep=r'\s+')  # or sep=r'\t+'
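As a quick check, the same call can be tried on an in-memory sample. This is only a sketch: the second row is invented and assumes the misalignment comes from doubled tabs, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample: the first row uses single tabs,
# the second row has doubled tabs in two places.
sample = (
    "15.26\t14.84\t0.871\t5.763\t3.312\t2.221\t5.22\t1\n"
    "14.11\t14.1\t0.8911\t5.42\t3.302\t2.7\t\t5\t\t1\n"
)

# sep=r'\s+' treats any run of whitespace as one delimiter,
# so both rows parse into the same number of columns.
df = pd.read_table(io.StringIO(sample), header=None, sep=r"\s+")
print(df.shape)
print(df)
```

One caveat: because runs of whitespace are merged, this approach would shift columns if a row had a genuinely empty field, so it fits best when every row is expected to hold a value in every column.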
