Read a CSV file with pandas, skipping rows while keeping the original line numbers


I want to keep track of the original line numbers.

I tried using the skiprows parameter of pd.read_csv(), but the original line numbers are not preserved.

If I start reading at row 100, the first row of the resulting DataFrame gets index 0 instead of 100. I also want to preserve the original header.

There are 5 best solutions below

1
mozway On BEST ANSWER

If you have a range index and want to keep the header and skip the next n rows, use a range in skiprows:

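import io
import pandas as pd

file_path = io.StringIO('''A,B
1,2
3,4
5,6
7,8
''')

n = 2  # number of data rows to skip after the header
# Row 0 is the header; skip rows 1..n, then shift the index by n
df = pd.read_csv(file_path, header=0, skiprows=range(1, n+1))
df.index += n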

Output:

   A  B
2  5  6
3  7  8
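
skiprows also accepts a callable that pandas evaluates against each row index, which avoids building the range. The following is only a minimal sketch of the same idea wrapped in a function; the helper name read_keep_row_numbers is hypothetical and not part of pandas.

```python
import io

import pandas as pd


def read_keep_row_numbers(source, n):
    """Read a CSV, keep the header, skip the first n data rows,
    and shift the index to match the original 0-based row numbers.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Row 0 is the header; rows 1..n are the data rows to skip.
    df = pd.read_csv(source, header=0, skiprows=lambda i: 1 <= i <= n)
    # The fresh RangeIndex starts at 0, so add the number of skipped rows.
    df.index += n
    return df


sample = io.StringIO("A,B\n1,2\n3,4\n5,6\n7,8\n")
result = read_keep_row_numbers(sample, 2)
print(result)
```

Only the rows that survive skiprows are parsed, so this should stay cheap even when n is large.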
1
Mr. Irrelevant On

Set index=True when writing the CSV so the index is saved with the data:

df.to_csv('MyLists.csv', sep=",", index=True)
1
Panda Kim On

Example

We need a minimal, reproducible example, so build the DataFrame from an in-memory CSV:

import io
import pandas as pd

txt = '''1,2
3,4
5,6
7,8
'''

file_path = io.StringIO(txt)

Code

Add the skipped number to the RangeIndex.

n = 2  # number of rows to skip
df = (pd.read_csv(file_path, header=None, skiprows=n)
      .pipe(lambda x: x.set_axis(x.index + n))
)

df (with the first 2 rows of txt skipped):

    0   1
2   5   6
3   7   8
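
If "line number" should mean the 1-based line of the file, as a text editor would show it, the offset is n + 1 rather than n. This is a sketch under that assumption; the index name "line" is an arbitrary label chosen for illustration.

```python
import io

import pandas as pd

txt = "1,2\n3,4\n5,6\n7,8\n"
n = 2  # number of leading lines to skip

df = pd.read_csv(io.StringIO(txt), header=None, skiprows=n)
# Assumption: lines are counted from 1, so the first kept row is line n + 1.
df.index = pd.RangeIndex(start=n + 1, stop=n + 1 + len(df), name="line")
print(df)
```

For a file with a header line, the header itself occupies line 1, so the offset would presumably be n + 2 instead.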
4
mozway On

If you want to keep the original index, why not just slice after reading the data:

pd.read_csv(filename).iloc[100:]
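
A small self-contained check of this, using an in-memory CSV as a stand-in for filename and a slice at 2 instead of 100 (both are assumptions made only to keep the example short):

```python
import io

import pandas as pd

sample = io.StringIO("A,B\n1,2\n3,4\n5,6\n7,8\n")

# Parse everything, then drop the first 2 data rows by position.
# iloc keeps the existing index labels, so no renumbering is needed.
tail = pd.read_csv(sample).iloc[2:]
print(tail)
```

The trade-off is that the whole file is parsed before slicing, which may matter for very large files.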
0
M.Nemes On

Combining the solutions of @mozway and @Panda Kim and adding the preservation of the header (which I later added to my original question):

    import pandas as pd
    import io
    
    txt = '''A,B
    1,2
    3,4
    5,6
    7,8
    '''
    
    file_path = io.StringIO(txt)
    h = pd.read_csv(file_path, header=None, nrows=1)
    h = h.iloc[0].tolist() # preserve the header
    
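    n = 2  # number of data rows to skip
    file_path = io.StringIO(txt)
    # skiprows=n drops the header line and the first n-1 data rows;
    # header=0 then uses the next row as a throwaway header,
    # so n data rows are skipped in total
    df = pd.read_csv(file_path, header=0, skiprows=n)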
    df.columns = h # restore the header
    df.index += n  # restore the line numbers
    print(df)
    
    #    A  B
    # 2  5  6
    # 3  7  8