I have the following data:
GOBPID Term ADX_KD_06.ip ADX_KD_24.ip ADX_LG_06.ip (more columns)
GO:0000003 reproduction 0 0 0
GO:0000165 MAPK cascade 0 0 0
(more rows)
When I read it like this:
d1 <- read.table("http://dpaste.com/1487049/plain/",sep="\t",header=TRUE)
I expect d1$GOBPID to contain values like GO:0000003, but it returns the contents of the Term column instead:
> d1$GOBPID
[1] reproduction MAPK cascade ....
Basically, the header names are not assigned to the columns they belong to. Why is that? What's the right way to do it?
How big are your actual data?
As Richie Cotton pointed out, count.fields is useful for identifying how many fields there are in each row of your data. In this case, however, it was a little more useful to open the file in a text editor that shows tab characters, where you can see that every line except the first has a trailing tab. Because all the other rows have one more field than the first, R assumes the first column should be the row.names, which leads to the problem you're having.
Here are two possible options for this data:
Option 1
This is convenient if your data are small: use gsub to get rid of the trailing tabs, and use read.delim on the output of that.
Option 2
Read the table in, skipping the first line, drop the last column (which should be all NA values), and add names by reading just the first line using scan.
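For illustration, here is a minimal sketch of the diagnosis and both options. It runs on a small made-up sample written to a temporary file rather than on the original URL, so the sample rows, the temp file, and the variable names (sample_lines, tmp, n_fields, bad, d1, d2) are assumptions and not taken from the actual data.

```r
# Hypothetical sample mimicking the file: the header line has 5 fields,
# while every data line ends with an extra tab and so has 6.
sample_lines <- c(
  "GOBPID\tTerm\tADX_KD_06.ip\tADX_KD_24.ip\tADX_LG_06.ip",
  "GO:0000003\treproduction\t0\t0\t0\t",
  "GO:0000165\tMAPK cascade\t0\t0\t0\t"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Diagnosis: the data lines should report one more field than the header.
n_fields <- count.fields(tmp, sep = "\t")
print(n_fields)

# Reproduce the problem: the first column is taken as row names,
# so every header name shifts one column to the right.
bad <- read.table(tmp, sep = "\t", header = TRUE)
print(bad$GOBPID)

# Option 1: strip the trailing tab from each line, then parse the cleaned text.
cleaned <- gsub("\t$", "", readLines(tmp))
d1 <- read.delim(text = cleaned, header = TRUE)

# Option 2: skip the header, drop the empty last column,
# then take the names from the first line only.
d2 <- read.delim(tmp, header = FALSE, skip = 1)
d2 <- d2[-ncol(d2)]
names(d2) <- scan(tmp, what = "", sep = "\t", nlines = 1, quiet = TRUE)

print(d1$GOBPID)
print(d2$GOBPID)
```

Option 1 holds the whole file in memory as a character vector before parsing, which is why it suits small data. Option 2 lets read.delim do the bulk of the work and only reads one extra line with scan.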