How to get column name and column index


I have the data frame below. Because the columns contain NAs, their data type is character. I need to get the name and index of each column that contains only string values.

In the example below, I want to get the column name and column index of Zo-A and Zo-B:

 ZONE-1        Zo-A         Zone-3        Zo-B
 58            On             75          NA
 60            NA             NA          High
 NA            Off            68          Low
 70            On             NA          NA

So far I have tried converting all columns to numeric, which produced NAs for the Zo-A and Zo-B columns. When I use the code below to get the column index, I get no match:

a <- which(colnames(df)=="Zo-A" )
integer(0)

match_col <- match(c("Zo-A","Zo-B"), names(df))
NA NA

I need to do the following:

  1. Get the names of the columns that consist of string values
  2. Get the column index of those same columns

There are 3 best solutions below


To obtain this we can use the code below:

K <- sapply(df, function(x) any(grepl("\\D+", x)))
names(df)[K]
[1] "Zo.A" "Zo.B"

which(K)
Zo.A Zo.B 
   2    4 
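The same idea can be sketched end to end on the sample data from the question. This is only an illustration under a few assumptions: the names `has_text`, `text_names` and `text_index` are made up for the example, and the data is read with `check.names = FALSE` so the headers stay as typed. Note that the pattern flags any non-digit character, so decimal or negative numbers stored as text (for example "1.5" or "-3") would also be treated as strings.

```r
# Sample data mirroring the question; check.names = FALSE keeps "Zo-A" as typed.
df <- read.table(text = "
ZONE-1 Zo-A Zone-3 Zo-B
58 On 75 NA
60 NA NA High
NA Off 68 Low
70 On NA NA
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# TRUE for columns holding at least one value with a non-digit character.
# grepl() returns FALSE for NA entries, so missing values are not counted.
has_text <- sapply(df, function(x) any(grepl("\\D", x)))

text_names <- names(df)[has_text]   # names of the string columns
text_index <- which(has_text)       # named integer vector of their positions

print(text_names)
print(text_index)
```

`which()` on a named logical vector keeps the names, so `text_index` carries both the column name and its position.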

From what I understand of your question, what you need is really simple.

First, read the data in.

df <- read.table(text = "
ZONE-1        Zo-A         Zone-3        Zo-B
 58            On             75          NA
 60            NA             NA          High
 NA            Off            68          Low
 70            On             NA          NA
", header = TRUE, check.names = FALSE)

str(df)
'data.frame':   4 obs. of  4 variables:
 $ ZONE-1: int  58 60 NA 70
 $ Zo-A  : Factor w/ 2 levels "Off","On": 2 NA 1 2
 $ Zone-3: int  75 NA 68 NA
 $ Zo-B  : Factor w/ 2 levels "High","Low": NA 1 2 NA

df
  ZONE-1 Zo-A Zone-3 Zo-B
1     58   On     75 <NA>
2     60 <NA>     NA High
3     NA  Off     68  Low
4     70   On     NA <NA>

Now, question (1), "first get the column names which consists of String values". All column names consist of string values so this can be done either with names or with colnames.

names(df)
[1] "ZONE-1" "Zo-A"   "Zone-3" "Zo-B" 

colnames(df)
[1] "ZONE-1" "Zo-A"   "Zone-3" "Zo-B" 

Now question (2), to get the column index of "the same". (I assume it's of column Zo-A you are asking for.)

a <- which(colnames(df) == "Zo-A")
a
[1] 2

a2 <- grep("Zo-A", colnames(df))
a2
[1] 2
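A possible reason the attempt in the question returned `integer(0)` and `NA NA` is that the data was read with the default `check.names = TRUE`, which passes the header through `make.names()` and turns "Zo-A" into "Zo.A". That is an assumption about how the original data was loaded; the sketch below only shows the difference between the two settings on a shortened copy of the sample data (`txt`, `df_default` and `df_raw` are names chosen for the example).

```r
txt <- "ZONE-1 Zo-A Zone-3 Zo-B
58 On 75 NA
60 NA NA High"

# Default check.names = TRUE: headers are made syntactically valid,
# so every "-" becomes ".".
df_default <- read.table(text = txt, header = TRUE)
print(names(df_default))
print(which(names(df_default) == "Zo-A"))   # no match for the original name

# check.names = FALSE keeps the headers exactly as written.
df_raw <- read.table(text = txt, header = TRUE, check.names = FALSE)
print(match(c("Zo-A", "Zo-B"), names(df_raw)))
```

If the data frame already exists with converted names, matching against "Zo.A" and "Zo.B" instead should find the columns.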

Data in dput format.

df <-
structure(list(`ZONE-1` = c(58L, 60L, NA, 70L), `Zo-A` = structure(c(2L, 
NA, 1L, 2L), .Label = c("Off", "On"), class = "factor"), `Zone-3` = c(75L, 
NA, 68L, NA), `Zo-B` = structure(c(NA, 1L, 2L, NA), .Label = c("High", 
"Low"), class = "factor")), .Names = c("ZONE-1", "Zo-A", "Zone-3", 
"Zo-B"), class = "data.frame", row.names = c(NA, -4L))

Edit
If you need to get only the column names composed of alphabetic characters and punctuation marks, you can use the following regular expression.

a3 <- grep("^[[:alpha:][:punct:]]*$", colnames(df))
a3
[1] 2 4

When reading the data, you can specify stringsAsFactors = FALSE so that text columns stay character. If missing values appear in the file as the string "NA", you can declare that with the na.strings argument of read.csv:

df = read.csv('file.csv', header = TRUE, stringsAsFactors = FALSE, na.strings = c("NA"))

Then try:

type = sapply(df,class) 
indexes = which(type=='character')
nameofindexes = names(indexes)
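For a version that runs without a file on disk, here is a minimal sketch that feeds the sample data to `read.csv` through its `text` argument. The inline `csv_text` and the names `is_chr`, `char_index` and `char_names` are assumptions made for the example, and `check.names = FALSE` is added so the headers keep their hyphens.

```r
csv_text <- "ZONE-1,Zo-A,Zone-3,Zo-B
58,On,75,NA
60,NA,NA,High
NA,Off,68,Low
70,On,NA,NA"

df <- read.csv(text = csv_text, header = TRUE, check.names = FALSE,
               stringsAsFactors = FALSE, na.strings = "NA")

# One logical per column: TRUE where the column was read as character.
is_chr <- vapply(df, is.character, logical(1))

char_index <- which(is_chr)       # positions of the character columns
char_names <- names(char_index)   # their names

print(char_index)
print(char_names)
```

Using `vapply` with `is.character` avoids comparing class strings, which can misbehave for columns that have more than one class.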