Add header to .data file in Pandas

2.4k Views Asked by At

Given a file with the extention of .data, I have read it with pd.read_fwf("./input.data", sep=",", header = None):

Out:

    0
0   63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3...
1   67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5...
2   67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6...
3   37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5...
4   41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4...
... ...
292 57.0,0.0,4.0,140.0,241.0,0.0,0.0,123.0,1.0,0.2...
293 45.0,1.0,1.0,110.0,264.0,0.0,0.0,132.0,0.0,1.2...
294 68.0,1.0,4.0,144.0,193.0,1.0,0.0,141.0,0.0,3.4...
295 57.0,1.0,4.0,130.0,131.0,0.0,0.0,115.0,1.0,1.2...
296 57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0...

How can I add the following column names to it? Thanks.

col_names = ["age", "sex", "cp", "restbp", "chol", "fbs", "restecg", 
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

Update:

pd.read_fwf("./input.data", names = col_names)

Out:

    age sex cp  restbp  chol    fbs restecg thalach exang   oldpeak slope   ca  thal    num
0   63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1   67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2   67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3   37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
292 57.0,0.0,4.0,140.0,241.0,0.0,0.0,123.0,1.0,0.2...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
293 45.0,1.0,1.0,110.0,264.0,0.0,0.0,132.0,0.0,1.2...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
294 68.0,1.0,4.0,144.0,193.0,1.0,0.0,141.0,0.0,3.4...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
295 57.0,1.0,4.0,130.0,131.0,0.0,0.0,115.0,1.0,1.2...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
296 57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0...   NaN NaN NaN NaN NaN NaN
2

There are 2 best solutions below

5
On BEST ANSWER

If check read_fwf:

Read a table of fixed-width formatted lines into DataFrame.

So if there is separator , use read_csv:

col_names = ["age", "sex", "cp", "restbp", "chol", "fbs", "restecg", 
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

df = pd.read_csv("input.data", names=col_names)
print (df)

      age  sex   cp  restbp   chol  fbs  restecg  thalach  exang  oldpeak  \
0    63.0  1.0  1.0   145.0  233.0  1.0      2.0    150.0    0.0      2.3   
1    67.0  1.0  4.0   160.0  286.0  0.0      2.0    108.0    1.0      1.5   
2    67.0  1.0  4.0   120.0  229.0  0.0      2.0    129.0    1.0      2.6   
3    37.0  1.0  3.0   130.0  250.0  0.0      0.0    187.0    0.0      3.5   
4    41.0  0.0  2.0   130.0  204.0  0.0      2.0    172.0    0.0      1.4   
..    ...  ...  ...     ...    ...  ...      ...      ...    ...      ...   
292  57.0  0.0  4.0   140.0  241.0  0.0      0.0    123.0    1.0      0.2   
293  45.0  1.0  1.0   110.0  264.0  0.0      0.0    132.0    0.0      1.2   
294  68.0  1.0  4.0   144.0  193.0  1.0      0.0    141.0    0.0      3.4   
295  57.0  1.0  4.0   130.0  131.0  0.0      0.0    115.0    1.0      1.2   
296  57.0  0.0  2.0   130.0  236.0  0.0      2.0    174.0    0.0      0.0   

     slope   ca  thal  num  
0      3.0  0.0   6.0    0  
1      2.0  3.0   3.0    1  
2      2.0  2.0   7.0    1  
3      3.0  0.0   3.0    0  
4      1.0  0.0   3.0    0  
..     ...  ...   ...  ...  
292    2.0  0.0   7.0    1  
293    2.0  0.0   7.0    1  
294    2.0  2.0   7.0    1  
295    2.0  1.0   7.0    1  
296    2.0  1.0   3.0    1  

[297 rows x 14 columns]
2
On

Just do a read_csv without header and pass col_names:

df = pd.read_csv('input.data', header=None, names=col_names);

Output (head):

      age    sex    cp    restbp    chol    fbs    restecg    thalach    exang    oldpeak    slope    ca    thal    num
--  -----  -----  ----  --------  ------  -----  ---------  ---------  -------  ---------  -------  ----  ------  -----
 0     63      1     1       145     233      1          2        150        0        2.3        3     0       6      0
 1     67      1     4       160     286      0          2        108        1        1.5        2     3       3      1
 2     67      1     4       120     229      0          2        129        1        2.6        2     2       7      1
 3     37      1     3       130     250      0          0        187        0        3.5        3     0       3      0
 4     41      0     2       130     204      0          2        172        0        1.4        1     0       3      0