R Reticulate--Use Python Module with Tidyverse Mutate


I'd like to use the tiktoken python module with the R tidyverse mutate function. The module works fine on its own but throws an error when used in a tidyverse mutate statement. The error is:

Error in cnd_type(): ! cnd is not a condition object

Here's a reproducible example:

library(reticulate)
library(dplyr)  # provides %>% and mutate()
tiktoken <- import("tiktoken")
encoding <- tiktoken$encoding_for_model("gpt-4")

#This works fine
prompt <- "John Lennon"
length(encoding$encode(prompt))

#Create df
beatles <- data.frame(name = c("John Lennon", "Paul McCartney",
                               "George Harrison", "Ringo Starr"),
                      instrument = c("Guitar", "Bass", "Guitar", "Drums"))

#run tiktoken on each cell of the name column
#this produces the error
beatles %>% 
  mutate(n_tokens = length(encoding$encode(name)))

Thanks in advance.


There are 2 best solutions below

Nick ODell

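I got it to work by using sapply() to apply the function to each element of the vector. Inside mutate(), name is the whole column (a character vector of length 4), not a single string. encoding$encode() expects one string, so passing the full vector is the likely cause of the error. Even if the call succeeded, length() would return one number for the whole column rather than one count per row.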

beatles %>% 
  mutate(n_tokens = sapply(name, function (x) length(encoding$encode(x))))

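You can also make this faster using encode_batch(), which takes the whole vector in a single Python call and returns one token list per string. lengths() then gives the number of tokens in each list:

beatles %>% 
  mutate(n_tokens = lengths(encoding$encode_batch(name)))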

Zhiqiang Wang

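You could also try a tidyverse approach with rowwise(). It makes mutate() evaluate the expression once per row, so name is a single string each time encoding$encode() is called. You may want to add ungroup() afterwards so later steps are not also evaluated row by row.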

beatles %>% 
  rowwise() %>% 
  mutate(n_tokens = length(encoding$encode(name)))
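
The scalar-versus-vector behaviour behind both answers can be tried without Python. The sketch below is only an illustration: count_chars() is a hypothetical stand-in for encoding$encode() that, by assumption, accepts exactly one string and counts characters rather than tokens. It compares an element-wise vapply() call with the rowwise() approach.

```r
library(dplyr)

# Hypothetical stand-in for encoding$encode(): it only accepts a single
# string and fails on longer vectors. It counts characters, not tokens.
count_chars <- function(x) {
  stopifnot(is.character(x), length(x) == 1)
  length(strsplit(x, "")[[1]])
}

beatles <- data.frame(name = c("John Lennon", "Paul McCartney",
                               "George Harrison", "Ringo Starr"),
                      instrument = c("Guitar", "Bass", "Guitar", "Drums"))

# Element-wise application: vapply() calls count_chars() once per name
# and checks that every result is a single integer.
by_vapply <- beatles %>%
  mutate(n_chars = vapply(name, count_chars, integer(1), USE.NAMES = FALSE))

# Row-wise evaluation: inside rowwise(), name is one string per call.
by_rowwise <- beatles %>%
  rowwise() %>%
  mutate(n_chars = count_chars(name)) %>%
  ungroup()
```

Calling count_chars(beatles$name) directly stops with an error, much like passing the whole column to a function that expects one string. Both pipelines above give the same n_chars column.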