Getting correct row and column names when reading with read.csv2


I have the following CSV file:

;A;C;D;E;F;G;H;I;K;L;M;N;P;Q;R;S;T;V;W;X;Y
Position1;0,054213776;0,003005945;0,027905128;0,00375423;0,290228233;0,064954976;0,002462278;0,047134442;0,005404894;0,081739388;0,002012803;0,046380669;0,020762236;0,03654459;0,057469835;0,011760176;0,002482397;0,026511666;0,108202585;0,011974854;0,058416108
Position2;0,004057157;0,041518985;0,019806132;0,051610208;0,003572703;0,036402843;0,074879075;0,010325334;0,044981263;0,09328763;0,03897166;0,064762246;0,029074767;0,004175355;0,013691361;0,109767515;0,046100376;0,002930728;0,248865169;0;0,028268182
Position3;0,051305224;0,064958634;0,025061506;0,001931642;0,022646096;0,053596034;0,060665537;0,002355053;0,002426384;0,264133805;0,030836312;0,032183821;0,018242803;0,048333116;0,11381004;0,066739613;0,052130556;0,005772064;0,047369009;2,92638E-05;0,033100145

Basically, my row names are Position1, Position2, Position3 and my column names are A, C, D, ..., Y. I have loaded the file in R using the following command:

 data<- read.csv2(f, header=TRUE) 

where f is the file path, chosen earlier.

If I ask for the row names using data[,1], I get

[1] Position1 Position2 Position3
Levels: Position1 Position2 Position3

which seems OK. However, if I now ask for the column names via data[1,], I get the following:

          X          A           C          D          E         F          G           H          I           K          L           M          N          P          Q          R          S           T          V         W        X.1
1 Position1 0.05421378 0.003005945 0.02790513 0.00375423 0.2902282 0.06495498 0.002462278 0.04713444 0.005404894 0.08173939 0.002012803 0.04638067 0.02076224 0.03654459 0.05746983 0.01176018 0.002482397 0.02651167 0.1082026 0.01197485
           Y
1 0.05841611

which I do not understand. For some reason, R thinks that the first element [1,1] should have a name and calls it X, even though in the CSV file that element is empty, i.e.:

[1,1]=empty   A   C   D   E..........Y
Position1
Position2
Position3

How should I read the CSV file in R?

Edit: I removed the leading semicolon from the header row and used the following command:

 data<- read.table(f, header=TRUE, sep=";")

However, if I now ask for the row names via data[,1], I get the following:

[1] 0,054213776 0,004057157 0,051305224
Levels: 0,004057157 0,051305224 0,054213776

while the column names via data[1,] are:

                    A           C           D          E           F           G           H           I           K           L           M           N           P          Q           R           S           T           V           W
Position1 0,054213776 0,003005945 0,027905128 0,00375423 0,290228233 0,064954976 0,002462278 0,047134442 0,005404894 0,081739388 0,002012803 0,046380669 0,020762236 0,03654459 0,057469835 0,011760176 0,002482397 0,026511666 0,108202585
                    X           Y
Position1 0,011974854 0,058416108

This is still not correct. Any suggestions?


Answer 1:

I think

dat <- read.csv2("csvex.txt",row.names=1)

will do what you want.

rownames(dat)
## [1] "Position1" "Position2" "Position3"
dat[,1]
## [1] 0.054213776 0.004057157 0.051305224
dat["Position1",]
##                    A           C          D  ...
## Position1 0.05421378 0.003005945 0.02790513  ...
dat[1,] ## same as dat["Position1",]

The row names and column names of an R matrix or data.frame are not part of the table data (that is, they are not its first row and first column). Instead, they are stored as separate attributes, which you can retrieve with colnames(dat) and rownames(dat) and set with rownames(dat) <- ... and colnames(dat) <- .... dimnames() retrieves or sets both the row and column names at the same time.
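A quick sketch of this point, using a small made-up data frame (the names and values below are illustrative, not taken from the question's file). The names live in attributes, so indexing with [, 1] returns the first data column, not the row names:

```r
# Small made-up data frame with explicit row names
dat <- data.frame(A = c(0.05, 0.004), C = c(0.003, 0.04),
                  row.names = c("Position1", "Position2"))

rownames(dat)   # "Position1" "Position2"  (an attribute, not a column)
colnames(dat)   # "A" "C"
dat[, 1]        # the A column: 0.05 and 0.004; row names are not data
dimnames(dat)   # list of the row names and the column names

# The same accessors can be used to change the names
rownames(dat) <- c("Pos1", "Pos2")
colnames(dat)[2] <- "C_renamed"
```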

header=TRUE (the default for read.csv and read.csv2) tells R to treat the first row of the CSV file as column names, rather than treating it as data and making up generic column names. row.names=1 tells R to treat the first column of the CSV file as row names, again rather than treating it as data and making up generic row names.
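A self-contained sketch of both behaviours, using an excerpt of the question's data (three columns, two rows) passed through the text argument instead of a file. Without row.names, the empty first header cell is converted by make.names() into the syntactic name X; the same function, with unique = TRUE, is also what turns the real X column into X.1 in the question's output:

```r
# Excerpt of the question's file: columns A, C, D and two rows
csv_text <- c(";A;C;D",
              "Position1;0,054213776;0,003005945;0,027905128",
              "Position2;0,004057157;0,041518985;0,019806132")

# Without row.names: the empty header cell becomes a column called "X"
bad <- read.csv2(text = csv_text)
names(bad)      # "X" "A" "C" "D"

# make.names() is the source of both "X" and "X.1"
make.names(c("", "A", "X"), unique = TRUE)   # "X" "A" "X.1"

# With row.names = 1: the first column becomes the row names
good <- read.csv2(text = csv_text, row.names = 1)
rownames(good)  # "Position1" "Position2"
names(good)     # "A" "C" "D"
good$A          # numeric, because read.csv2 uses dec = ","
```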

Answer 2:

You have a semicolon preceding A in the header row. I wonder if that's wreaking havoc on the read. Remove it and see.
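A follow-up sketch, using an excerpt of the question's data; the dec = "," argument is an addition beyond what this answer suggests. Removing the leading semicolon does work, because when the header line has one field fewer than the data rows, read.table uses the first column as row names automatically. The values still need dec = ","; without it, entries like 0,054213776 are read as text, which is why data[,1] in the question's edit printed as a factor with levels:

```r
# Same excerpt, but with the leading ";" removed from the header line,
# so the header has one field fewer than each data row
csv_text <- c("A;C;D",
              "Position1;0,054213776;0,003005945;0,027905128",
              "Position2;0,004057157;0,041518985;0,019806132")

# read.table takes the first column as row names because of the shorter
# header; dec = "," makes the comma-decimal values numeric
dat <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ",")
rownames(dat)   # "Position1" "Position2"
dat[, 1]        # the A column, as numbers
```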