execution speed of string in if statement

38 Views Asked by At

I'm trying to do an automatic code, I have build a loop to match people in my 2 databases. here an example of thoses two database:

some library might be needed so here all I used:

library(readr)
library(stringi)
library(stringr)
library(dplyr)
library(tidyr)
library(readr)
library(rlang)
library(Hmisc)
library(sqldf)
library(tcltk)
library(tcltk2)
library(gWidgets2)
library(gWidgets2tcltk)
df1 <- data.frame(
  id = c(1, 2, 3),
  name = c("John", "Jane", "Jim"),
  age = c(25, 30, 35)
)

df2 <- data.frame(
  id_2 = c(4, 5, 3,9),
  name_2 = c("Johny", "Janey", "Jim","Gar"),
  age_2 = c(26, 31, 35,NA)
)

and I have some condition in string that I got by readline():

#condtion I got by readline()
condition =c("df1$id[i]==df2$id_2[j]","df1$name[i]==df2$name_2[j]","df1$age[i]==df2$age_2[j] && !is.na(df2$age_2[j])")

I wanna have a loop that apply a score for every condition respected and put people with a minimal score of similarity in a new dataframe.

here what I tried but it take so much time to execute:

#new dataframe that assign similarity score between people
df3 <- data.frame(id=integer(), id_2=integer(), score=integer())


#only way I know to use string condition in if statement
  start_time <- Sys.time()
  
# loop to search on df1 and df2
  for (i in 1:nrow(df1)){
    for (j in 1:nrow(df2)){
      score = 0
      
      # Vérifie condition
      for (k in 1:length(condition)){
      if(eval(parse(text = condition[k]))){
        score = score + 1
      }
      }
      
      #add to new dataframe if condition verified 
      if(score == (length(condition))/2){
        df3 <- rbind(df3, data.frame(id=df1[i, 1], id_2=df2[j, 1], score=score))
    }
    }
  }
  
  #system to verify time spend
  end_time <- Sys.time()
  elapsed_time <- difftime(end_time, start_time, units = "secs")
  elapsed_time

View(df3)
#the result of df3:
id  id_2  score
3   3      3
#then just need to merge by ID and id_2 to get all other information 
#what you should get after merging
id  id_2  score  name  age  name_2 age_2
3   3      3     "Jim"  35  "Jim"  35

I need to repoduce this for 2 dataframe of 683870 * 3681

I already did a sql code that is almost instant but there no score so some client with some small difference are left out. Im also gonna add a readline for the minimal score intended later.

edit: I got the condition by readline() since I try to make the code accesible for not regular R user:

condition = readline("what data you wanna compare exemple: [data1==data2] ")

for the if(score == (length(condition))/2) I know its not useful here but its just for the exemple I will add a readline() for the score too

0

There are 0 best solutions below