I have 2 large dataframes respectively with the following columns:
> colnames(uci_ce_inv) [1] "Ndg" "CTP" "CNT_ID" "NOMINAL_EUR" "FXD_CCYPAIR" "ISIN" "SICOVAM" "TRN_ID" "SOURCE" "INCLUDED" [11] "EXCLUDED"
> colnames(assess_data) [1] "GROUP_SOURCE" "GROUP" "MD_ISS_NAME" "MD_ISS_RATING" "MD_GROUP" "MD_SECTOR" "MD_CCY" "MD_COUNTRY" [9] "MD_REGION" "MD_SEC_TYPE"
I'd like to perform an inner join based on the partial match between uci_ce_inv$TRN_ID and assess_data$GROUP. First of all, in order to do that I define relevant vectors for matching as characters:
uci_ce_inv$TRN_ID <- as.character(uci_ce_inv$TRN_ID) assess_data$GROUP <- as.character(assess_data$GROUP)
I tried different ways to perform the matching but the resulting daframe is always empty:
method 1:
uci_ce_inv <- uci_ce_inv %>% fuzzy_inner_join(assess_data, by = c("TRN_ID"="GROUP"), match_fun=stri_detect_fixed)
method 2:
uci_ce_inv <- uci_ce_inv %>% fuzzy_inner_join(assess_data, by = c("TRN_ID"="GROUP"), match_fun=str_detect)
method 3:
uci_ce_inv <- uci_ce_inv %>% regex_inner_join(assess_data, by = c("TRN_ID"="GROUP"))
method 4: `partial_join <- function(x, y, by_x, pattern_y) { idx_x <- sapply(y[[pattern_y]], grep, x[[by_x]]) idx_y <- sapply(seq_along(idx_x), function(i) rep(i, length(idx_x[[i]])))
df <- dplyr::bind_cols(x[unlist(idx_x), , drop = F],
y[unlist(idx_y), , drop = F])
return(df)
} a <- partial_join(uci_ce_inv,assess_data, by_x="TRN_ID",pattern_y="GROUP") `
What's going wrong with this codes? Thanks in advance!
e.g.
` uci_ce_inv$TRN_ID=123456789
assess_data$GROUP=ewitwr/kdk/123456789/gfhdhfd`
all columns of assess_data for matchig rows should be added in the related uci_ce_inv row
If your tables are `data.table~ objects, you can use this :
The
nomatch = 0
ensures the inner join, otherwise you get a left join onassess_data
.If your tables are data frames, this should work: