I'm trying to merge two datasets by the only column name they have in common, but the result is a data frame with one dataset stacked after the other, without any rows actually being merged.
Here is an example:
File1
ID Age
GBI0061M 20
GBI0067M 21
GBI0069M 24
File2
ID Var1
GHU008F 0,55
GBI0067M 2,01
GFB0045F 1,50
I would like a file containing only the rows common to both files:
Filemerged:
ID Age Var1
GBI0067M 21 2,01
This is my R script
library(dplyr)
library(plyr)
File1 <- read.csv2("C:/Users/..............csv", sep = ";")
File2 <- read.csv("C:/Users.............csv", sep = ";")
m3 <- merge(File1, File2, by.x = "ï..Codice", all.x = TRUE, all.y = TRUE)
or
m3 <- full_join(File1, File2, by.x = "ï..Codice", all.x = TRUE, all.y = TRUE)
I even tried Python's .merge with how="outer", with the same result. In Excel, conditional formatting does NOT recognize the IDs as identical (except for "ï..Codice"), even though they are exactly the same strings. What should I do?
You can control which column to use when joining. In the example below, the column to join on is specified explicitly.
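A minimal sketch of that idea in base R, assuming the sample data from the question (recreated inline with read.csv2(text = ...) rather than read from the real files). merge() defaults to all = FALSE, which is an inner join and keeps only IDs found in both data frames. The second call uses all = TRUE, equivalent to all.x = TRUE, all.y = TRUE: a full outer join that keeps every row of both inputs. When no key values match, that join simply stacks the rows with NAs, which resembles the output described in the question. Note also that the question reads File2 with read.csv(), whose default decimal mark is ".", so values like 2,01 would be read as text; read.csv2() is used for both files here.

```r
# Sample data from the question, recreated inline (assumption: same content
# as the real files). read.csv2() uses ";" as separator and "," as decimal mark.
File1 <- read.csv2(text = "ID;Age
GBI0061M;20
GBI0067M;21
GBI0069M;24")

File2 <- read.csv2(text = "ID;Var1
GHU008F;0,55
GBI0067M;2,01
GFB0045F;1,50")

# Inner join on the explicitly named key column.
# merge() defaults to all = FALSE, so only IDs present in both are kept.
# (dplyr equivalent: inner_join(File1, File2, by = c("ID" = "ID")))
merged <- merge(File1, File2, by = "ID")
print(merged)  # one row: GBI0067M, Age 21, Var1 2.01

# For comparison: a full outer join keeps all rows from both inputs and
# fills the columns of non-matching rows with NA.
full <- merge(File1, File2, by = "ID", all = TRUE)
print(full)    # 5 rows: 3 + 3 minus the 1 shared ID
```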
General observations
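In the example, the join is specified as c("ID" = "ID"): the name on the left refers to the first data set (x) and the name on the right to the second (y). You could change either side to reflect the different columns available in each of the data sets, with x and y used as suffixes in the same way as in by.x and by.y.
Your code uses by.x = "ï..Codice", and that name was likely created by make.names(), which read.csv() applies to the header by default (check.names = TRUE). While the string on its own may be a syntactically valid column name, it would be better to rename it to something easier to handle, like id_codice.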
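A sketch combining both observations, under stated assumptions. It assumes the "ï..Codice" name comes from a UTF-8 byte-order mark (the bytes EF BB BF) at the start of the CSV, a common cause of that prefix when such a file is read on Windows; the temporary file and its contents are hypothetical stand-ins for the real file. Reading with fileEncoding = "UTF-8-BOM" drops the mark, the key column is then renamed to id_codice, and by.x/by.y map the differently named key columns onto each other.

```r
# Hypothetical stand-in for the question's CSV: a UTF-8 file that starts
# with a byte-order mark (EF BB BF) before the "Codice" header.
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("Codice;Age\nGBI0061M;20\nGBI0067M;21\nGBI0069M;24\n")),
         tmp)

# fileEncoding = "UTF-8-BOM" makes R strip the mark while reading, so the
# header comes in as plain "Codice" with no leftover BOM characters.
File1 <- read.csv2(tmp, fileEncoding = "UTF-8-BOM")
unlink(tmp)

# Rename the key column to something easier to type.
names(File1)[names(File1) == "Codice"] <- "id_codice"

File2 <- read.csv2(text = "ID;Var1
GHU008F;0,55
GBI0067M;2,01
GFB0045F;1,50")

# Key columns have different names, so map them explicitly:
# by.x names the column in File1, by.y the column in File2.
# (dplyr equivalent: inner_join(File1, File2, by = c("id_codice" = "ID")))
merged <- merge(File1, File2, by.x = "id_codice", by.y = "ID")
print(merged)  # one row: GBI0067M, Age 21, Var1 2.01
```

The result keeps the by.x name (id_codice) for the key column.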
Created on 2022-04-12 by the reprex package (v2.0.1)