Transposing a large csv file


I have a huge csv file in this format

https://i.stack.imgur.com/ksQzS.png

I have tried a lot but have been unable to transpose it. Is there some way I can do this using awk? The file contains thousands of records. I want it in this format:

https://i.stack.imgur.com/PHQ52.png


There are 2 best solutions below


Unfortunately, I do not know how to do this in awk. However, if you don't mind using Python, this kind of data reshaping is simple. For example:

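import numpy as np
import pandas

# Build a small example frame of random values (10 rows, 4 columns).
# numpy.random.randn is used because scipy.randn is not available in current SciPy.
df = pandas.DataFrame({
    "s1_x": np.random.randn(10),
    "s1_y": np.random.randn(10),
    "s2_x": np.random.randn(10),
    "s2_y": np.random.randn(10)
    })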

The first three rows of df initially look like this:

       s1_x      s1_y      s2_x      s2_y
0 -0.075796  2.191362 -0.960267  0.619519
1 -1.201713  0.015710  0.121307 -0.273759
2 -0.549812  1.089105 -0.525985  1.383265

But if you use df.stack() it becomes:

0  s1_x   -0.075796
   s1_y    2.191362
   s2_x   -0.960267
   s2_y    0.619519
1  s1_x   -1.201713
   s1_y    0.015710
   s2_x    0.121307
   s2_y   -0.273759
2  s1_x   -0.549812
   s1_y    1.089105
   s2_x   -0.525985
   s2_y    1.383265
dtype: float64
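
To make the reshaping concrete, here is a minimal sketch in plain Python of roughly what happens to the shape of the data: every cell of the wide table becomes one (row, column, value) record. The function name stack_rows is made up for this illustration and is not part of pandas; the values are the first two rows of the example above.

```python
def stack_rows(header, rows):
    """Yield (row_index, column_name, value) for every cell, row by row."""
    for index, row in enumerate(rows):
        # Pair each value with the name of the column it came from.
        for name, value in zip(header, row):
            yield index, name, value


# Hypothetical sample data copied from the example table above.
header = ["s1_x", "s1_y", "s2_x", "s2_y"]
rows = [
    [-0.075796, 2.191362, -0.960267, 0.619519],
    [-1.201713, 0.015710, 0.121307, -0.273759],
]

for index, name, value in stack_rows(header, rows):
    print(index, name, value)
```

This only mirrors the wide-to-long idea; a real pandas Series keeps the row and column labels in a MultiIndex rather than in tuples.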

My awk skills are more "functional" than "elegant" but this might get you started

awk -F'|' '{for(f=1;f<=NF;f++){x[NR SUBSEP f]=$f}} END{for(f=1;f<=NF;f++){p="";for(r=1;r<=NR;r++){if(r>1)p=p "|";p=p x[r SUBSEP f]}print p}}' file.csv
r1f1|r2f1|r3f1
r1f2|r2f2|r3f2
r1f3|r2f3|r3f3
r1f4|r2f4|r3f4
r1f5|r2f5|r3f5

file.csv

r1f1|r1f2|r1f3|r1f4|r1f5
r2f1|r2f2|r2f3|r2f4|r2f5
r3f1|r3f2|r3f3|r3f4|r3f5

So, for every line of your input file, the fields are saved into a 2-D array called x[], indexed by line number (NR) and field number (1..NF). Once the whole file has been read, the END{} block loops over the fields and, for each field, over the records, printing the transpose and inserting pipe symbols between values as it goes.
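
For comparison, the same read-everything-then-print-by-column idea can be sketched in Python with the standard csv module. This is only an illustration under assumptions: the function name transpose_delimited is hypothetical, the pipe delimiter is taken from the sample file above, and, like the awk version, it holds the whole table in memory.

```python
import csv
import io


def transpose_delimited(lines, delimiter="|"):
    """Return the rows of a delimited table with rows and columns swapped."""
    rows = list(csv.reader(lines, delimiter=delimiter))
    # zip(*rows) groups the first field of every row, then the second, and so on.
    # It stops at the shortest row, so ragged input would be silently truncated.
    return [list(column) for column in zip(*rows)]


# Small in-memory stand-in for file.csv (assumed sample data).
sample = io.StringIO(
    "r1f1|r1f2|r1f3\n"
    "r2f1|r2f2|r2f3\n"
)

for row in transpose_delimited(sample):
    print("|".join(row))
```

To run it on a real file, an open file object can be passed instead of the StringIO, for example open("file.csv", newline="").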