I'd like to use the tiktoken Python module with the tidyverse mutate() function in R. The module works fine on its own but throws an error when used inside a mutate() call. The error is:
Error in cnd_type():
! cnd is not a condition object
Here's a reproducible example:
library(reticulate)
library(dplyr)  # provides %>% and mutate()
tiktoken <- import("tiktoken")
encoding <- tiktoken$encoding_for_model("gpt-4")
#This works fine
prompt <- "John Lennon"
length(encoding$encode(prompt))
# Create df
beatles <- data.frame(name = c("John Lennon", "Paul McCartney",
                               "George Harrison", "Ringo Starr"),
                      instrument = c("Guitar", "Bass", "Guitar", "Drums"))
# Run tiktoken on each cell of the name column
# This produces the error
beatles %>%
  mutate(n_tokens = length(encoding$encode(name)))
Thanks in advance.
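I got it to work by using sapply() to apply the function to each element of the vector. Inside mutate(), name is the whole column, so encoding$encode(name) receives all four names at once rather than one string at a time.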
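A minimal sketch of that approach, reusing the data from the question. The count_tokens helper is a hypothetical name introduced here for illustration, and the code assumes the Python tiktoken package is installed in the environment that reticulate uses:

```r
library(reticulate)
library(dplyr)

# Assumption: tiktoken is available in the Python environment reticulate picks up
tiktoken <- import("tiktoken")
encoding <- tiktoken$encoding_for_model("gpt-4")

beatles <- data.frame(
  name = c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"),
  instrument = c("Guitar", "Bass", "Guitar", "Drums"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of tokens in a single string
count_tokens <- function(x) length(encoding$encode(x))

# sapply() calls the helper once per element, so encode() only ever sees one string
beatles <- beatles %>%
  mutate(n_tokens = sapply(name, count_tokens, USE.NAMES = FALSE))

print(beatles)
```

Adding rowwise() before mutate() should have a similar effect, since it makes name refer to one value per row, but sapply() keeps the change local to the one column.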
You can also make this faster using encode_batch(), which encodes all the strings in a single call.
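A sketch of the batch version, again using the data from the question. It assumes that reticulate converts the result of encode_batch() into an R list with one element per input string. The as.list() wrapper is a precaution of mine so that Python receives a list of strings even when the data frame has a single row. I have not benchmarked it against the sapply() version:

```r
library(reticulate)
library(dplyr)

# Assumption: tiktoken is available in the Python environment reticulate picks up
tiktoken <- import("tiktoken")
encoding <- tiktoken$encoding_for_model("gpt-4")

beatles <- data.frame(
  name = c("John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"),
  instrument = c("Guitar", "Bass", "Guitar", "Drums"),
  stringsAsFactors = FALSE
)

# encode_batch() takes a list of strings and returns one token sequence per string;
# lengths() then counts the tokens in each sequence
beatles <- beatles %>%
  mutate(n_tokens = lengths(encoding$encode_batch(as.list(name))))

print(beatles)
```

This makes one call into Python for the whole column instead of one call per row, which is where any speedup would come from.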