How to name/title rownames in R


I have a dataframe named 'res', where the row names are numbers corresponding to genes.

>res

        baseMean log2FoldChange     lfcSE      stat      pvalue        padj
       <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
27395    1268.40       0.100013  0.164840  0.606731 5.44029e-01 0.737925231
18777    1413.56      -0.266365  0.175847 -1.514758 1.29834e-01 0.312449929
21399    3376.09      -0.243707  0.132616 -1.837687 6.61086e-02 0.196027163

I am wondering how to give the row names of my dataframe the heading 'gene_id', so that it ends up looking like this:


>res
gene_id baseMean log2FoldChange     lfcSE      stat      pvalue        padj
       <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
27395    1268.40       0.100013  0.164840  0.606731 5.44029e-01 0.737925231
18777    1413.56      -0.266365  0.175847 -1.514758 1.29834e-01 0.312449929
21399    3376.09      -0.243707  0.132616 -1.837687 6.61086e-02 0.196027163

I am planning to join this dataframe with another dataframe (anno), which contains information about the actual genes, on the 'gene_id' column using the left_join function.

>anno
   gene_id  SYMBOL                                                                     GENENAME
1    27395  Mrpl15                                          mitochondrial ribosomal protein L15
2    18777  Lypla1                                                          lysophospholipase 1
3    21399   Tcea1                                    transcription elongation factor A (SII) 1

res_anno <- left_join(res, anno,by="gene_id")

James On BEST ANSWER

Is this what you're looking for?

Creating two dataframes that represent the example:

library(tidyverse)

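# creating the res dataframe (simplified: a tibble has no custom row names,
# so its row names are just 1, 2, 3 rather than the real gene IDs)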
res = tibble(
  baseMean = c(1268.40,1413.56,3376.09),
  log2FoldChange = c(0.100013,-0.266365,-0.243707)
)

# A tibble: 3 × 2
  baseMean log2FoldChange
     <dbl>          <dbl>
1    1268.          0.100
2    1414.         -0.266
3    3376.         -0.244


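# creating the anno dataframe (gene_id is 1, 2, 3 here to match the row names of res above;
# the real data would use the actual gene IDs, e.g. 27395)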
anno = tibble(
  gene_id = c(1,2,3),
  SYMBOL = c('Mrpl15', 'Lypla1', 'Tcea1')
)

# A tibble: 3 × 2
  gene_id SYMBOL
    <dbl> <chr> 
1       1 Mrpl15
2       2 Lypla1
3       3 Tcea1


Then you can apply this to your dataset:

# extracting the rownames and putting them in a column;
# rownames_to_column() returns them as character, so convert to numeric to match anno$gene_id
res = res %>% 
  rownames_to_column('gene_id') %>% 
  mutate(gene_id = as.numeric(gene_id))

# A tibble: 3 × 3
  gene_id baseMean log2FoldChange
    <dbl>    <dbl>          <dbl>
1       1    1268.          0.100
2       2    1414.         -0.266
3       3    3376.         -0.244

And finally left_join them:

# left joining both datasets (the piped res is passed as the first argument)
res_anno = res %>% 
  left_join(anno, by = 'gene_id')

# A tibble: 3 × 4
  gene_id baseMean log2FoldChange SYMBOL
    <dbl>    <dbl>          <dbl> <chr> 
1       1    1268.          0.100 Mrpl15
2       2    1414.         -0.266 Lypla1
3       3    3376.         -0.244 Tcea1 
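
The toy example above uses row names 1 to 3, so here is a minimal sketch of the same steps with the gene IDs from the question stored as real row names. The object name `res_df` and the trimmed-down columns are stand-ins for illustration. The `as.data.frame()` call is an assumption: the `<numeric>` header in the question's printout suggests `res` may be a Bioconductor-style results object rather than a plain data.frame, and `rownames_to_column()` expects a data.frame. On a plain data.frame the call changes nothing.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for the question's res, with gene IDs as row names
res_df <- data.frame(
  baseMean = c(1268.40, 1413.56, 3376.09),
  log2FoldChange = c(0.100013, -0.266365, -0.243707),
  row.names = c("27395", "18777", "21399")
)

# Stand-in for anno, keyed by the same gene IDs (numeric, as in the question)
anno <- data.frame(
  gene_id = c(27395, 18777, 21399),
  SYMBOL = c("Mrpl15", "Lypla1", "Tcea1")
)

res_anno <- res_df %>%
  as.data.frame() %>%                      # no-op here; assumed necessary for non-data.frame objects
  rownames_to_column("gene_id") %>%        # row names become a character column
  mutate(gene_id = as.numeric(gene_id)) %>% # match the numeric type of anno$gene_id
  left_join(anno, by = "gene_id")
```

The key types have to agree on both sides of the join. If `anno$gene_id` were stored as character instead, the `mutate()` step could be dropped.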

As per your comment, if you don't want to add a column to your original dataframe, you can start from the original res and do the row-name conversion and the left_join in a single pipe, so that the gene_id column only exists in your new dataframe:

res_anno = res %>% 
  rownames_to_column('gene_id') %>% 
  mutate(gene_id = as.numeric(gene_id)) %>% 
  left_join(anno, by = 'gene_id')


# A tibble: 3 × 4
  gene_id baseMean log2FoldChange SYMBOL
    <dbl>    <dbl>          <dbl> <chr> 
1       1    1268.          0.100 Mrpl15
2       2    1414.         -0.266 Lypla1
3       3    3376.         -0.244 Tcea1
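
If you would rather not load the tidyverse, the same idea can be sketched in base R. The names `res_df` and `res_keyed` are hypothetical, and the data are trimmed-down stand-ins for the question's objects. The row names are copied into a `gene_id` column on a copy of the data, and `merge()` with `all.x = TRUE` plays the role of the left join.

```r
# Hypothetical stand-ins for the question's data
res_df <- data.frame(
  baseMean = c(1268.40, 1413.56, 3376.09),
  log2FoldChange = c(0.100013, -0.266365, -0.243707),
  row.names = c("27395", "18777", "21399")
)
anno <- data.frame(
  gene_id = c(27395, 18777, 21399),
  SYMBOL = c("Mrpl15", "Lypla1", "Tcea1")
)

# Work on a copy so the original object keeps its shape
res_keyed <- res_df
res_keyed$gene_id <- as.numeric(rownames(res_keyed))

# all.x = TRUE keeps every row of res_keyed, as left_join does
res_anno <- merge(res_keyed, anno, by = "gene_id", all.x = TRUE)
```

Note that `merge()` sorts the result by the key column by default, so the row order may differ from that of the original `res`.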