I have a table with a long list of aliased values like this:

> head(transmission9, 50)
# A tibble: 50 x 2
   In_Node  End_Node
   <chr>    <chr>   
 1 c4ca4238 2838023a
 2 c4ca4238 d82c8d16
 3 c4ca4238 a684ecee
 4 c4ca4238 fc490ca4
 5 28dd2c79 c4ca4238
 6 f899139d 3def184a

I would like to have R go through both columns and assign a number sequentially to each value, in the order that an aliased value appears in the dataset. I would like R to read across rows first, then down columns. For example, for the dataset above:

   In_Node  End_Node
   <chr>    <chr>   
 1  1       2
 2  1       3
 3  1       4
 4  1       5
 5  6       1
 6  7       8

Is this possible? Ideally, I'd also love to be able to generate a "key" which would match each sequential code to each aliased value, like so:

Code Value
1    c4ca4238
2    2838023a
3    d82c8d16
4    a684ecee
5    fc490ca4

Thank you in advance for the help!

There are 3 best solutions below


A dplyr version

  • First, re-create the sample data:
library(tidyverse)

transmission9 <- read.table(header = TRUE, text = "   In_Node  End_Node
 1 c4ca4238 2838023a
 2 c4ca4238 d82c8d16
 3 c4ca4238 a684ecee
 4 c4ca4238 fc490ca4
 5 28dd2c79 c4ca4238
 6 f899139d 3def184a")

Then recode both columns in one step (note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick()):

transmission9 %>% 
  mutate(across(everything(), ~ match(., unique(c(t(cur_data()))))))
#>   In_Node End_Node
#> 1       1        2
#> 2       1        3
#> 3       1        4
#> 4       1        5
#> 5       6        1
#> 6       7        8

Use the .names argument if you want to create new columns instead of overwriting the originals:

transmission9 %>% 
  mutate(across(everything(), ~ match(., unique(c(t(cur_data())))),
                .names = '{.col}_code'))

   In_Node End_Node In_Node_code End_Node_code
1 c4ca4238 2838023a            1             2
2 c4ca4238 d82c8d16            1             3
3 c4ca4238 a684ecee            1             4
4 c4ca4238 fc490ca4            1             5
5 28dd2c79 c4ca4238            6             1
6 f899139d 3def184a            7             8
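
The question also asks for a key that maps each code back to its value. A minimal sketch in base R, assuming the same sample data as in the question; the names values and key are made up for this illustration:

```r
# Sample data from the question, rebuilt as a plain data frame
transmission9 <- data.frame(
  In_Node  = c("c4ca4238", "c4ca4238", "c4ca4238", "c4ca4238", "28dd2c79", "f899139d"),
  End_Node = c("2838023a", "d82c8d16", "a684ecee", "fc490ca4", "c4ca4238", "3def184a"),
  stringsAsFactors = FALSE
)

# Unique values in order of first appearance, reading across rows first
values <- unique(c(t(transmission9)))

# The position of each value in `values` is its code
key <- data.frame(Code = seq_along(values), Value = values,
                  stringsAsFactors = FALSE)
key
#>   Code    Value
#> 1    1 c4ca4238
#> 2    2 2838023a
#> 3    3 d82c8d16
#> 4    4 a684ecee
#> 5    5 fc490ca4
#> 6    6 28dd2c79
#> 7    7 f899139d
#> 8    8 3def184a
```

Because the codes are just positions in the vector of unique values, the key should agree with the recoded columns above.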

You could do (where df is your data frame):

df1 <- df
df1[] <- as.numeric(factor(unlist(df), unique(c(t(df)))))
df1
  In_Node End_Node
1       1        2
2       1        3
3       1        4
4       1        5
5       6        1
6       7        8

You can match against the unique values. For a single vector, the code is straightforward:

match(vec, unique(vec))
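
As a small illustration of the single-vector case, here is a sketch on a made-up vector (the values are hypothetical, not taken from the question):

```r
# Toy vector, invented for this example
vec <- c("b", "a", "b", "c", "a")

# unique(vec) keeps values in order of first appearance: "b" "a" "c"
# match() then returns each element's position in that vector
codes <- match(vec, unique(vec))
codes
#> [1] 1 2 1 3 2
```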

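The requirement to read across each row before moving down to the next makes this slightly tricky: unlist() flattens a data frame column by column, so you need to transpose the values first to get them in row-wise order. After that, match them.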

Finally, use [<- to assign the result back to a data.frame of the same shape as your original data (here x):

y = x
y[] = match(unlist(x), unique(c(t(x))))
y
  V2 V3
1  1  2
2  1  3
3  1  4
4  1  5
5  6  1
6  7  8

c(t(x)) is a bit of a hack:

  • t first converts the tibble to a matrix and then transposes it. If your tibble contains multiple data types, these will be coerced to a common type.
  • c(…) discards attributes. In particular, it drops the dimensions of the transposed matrix, i.e. it converts the matrix into a vector, with the values now in the correct order.
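
To see what the transpose buys you, compare the two flattening orders on a small made-up data frame (the object names and values here are hypothetical):

```r
# Toy 2 x 2 data frame, invented for this example
x <- data.frame(a = c("p", "r"), b = c("q", "s"), stringsAsFactors = FALSE)

# unlist() walks down each column in turn
col_major <- unlist(x, use.names = FALSE)
col_major
#> [1] "p" "r" "q" "s"

# t() gives a character matrix with one column per original row;
# c() drops the dimensions and reads that matrix column by column,
# which is row by row in terms of the original data
row_major <- c(t(x))
row_major
#> [1] "p" "q" "r" "s"
```

Only row_major lists the values in the order the question asks for, which is why unique() is applied to c(t(x)) rather than to unlist(x).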