Why is left_join creating NAs when values seem to match in the by="x" argument?

271 Views Asked by At

I am trying to carry out a left_join between two dataframes, called multi_scenario and production_targets. I am trying to carry out the join based on the following code, using a left_join based on the matched column "mean_needed" :

library(dplyr)
comb <- left_join(multi_scenario, production_targets, by = "mean_needed")

Here are my two dataframes

1.multi_scenario:

structure(list(scenario_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L), yield.spect = c(3, 
13, 19, 22, 23, 14, 16, 20, 23, 1, 7, 16, 26, 35, 4, 6, 10, 33, 
47, 5, 6, 10, 28, 64, 7, 8, 10, 29, 59, 7, 12, 28, 35, 36, 12, 
13, 15, 27, 58, 13, 16, 25, 57, 4, 12, 22, 33, 54, 2, 17, 28, 
29, 50, 19, 21, 23, 28, 35, 14, 17, 25, 52, 21, 23, 24, 26, 45, 
20, 21, 23, 30, 45, 9, 21, 31, 57, 12, 14, 50, 0, 10, 37, 44, 
55, 2, 9, 34, 48, 54, 3, 28, 31, 40, 44, 13, 17, 28, 38, 50, 
14, 26, 29, 34, 49, 13, 30, 31, 37, 41, 14, 15, 17, 44, 64, 11, 
13, 15, 56, 57, 7, 25, 30, 44, 47, 2, 30, 32, 38, 58, 18, 25, 
26, 36, 55, 13, 29, 32, 34, 57, 13, 26, 38, 40, 49, 1, 4, 44, 
55, 63, 10, 21, 36, 40, 58, 23, 25, 38, 57, 6, 18, 44, 50, 54, 
17, 27, 32, 39, 63, 14, 20, 34, 51, 60, 5, 6, 55, 56, 57, 11, 
27, 50, 64, 22, 25, 40, 44, 54, 18, 24, 41, 43, 59, 24, 26, 35, 
45, 63, 7, 21, 37, 62, 64, 13, 37, 46, 48, 49, 3, 19, 46, 62, 
7, 50, 63, 65, 28, 30, 37, 49, 56, 2, 38, 42, 59, 64, 21, 30, 
47, 53, 56, 13, 31, 41, 59, 62, 8, 27, 49, 62, 65, 20, 21, 51, 
60, 31, 36, 39, 45, 62, 36, 37, 42, 45, 60, 18, 33, 52, 56, 59, 
43, 44, 51, 27, 43, 49, 53, 61, 30, 41, 47, 53, 62, 18, 49, 53, 
60, 64), number_parcel = c(1000, 1000, 1000, 1000, 1000, 2000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 2000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 2000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 2000, 1000, 1000, 1000, 2000, 2000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 2000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 2000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 2000, 2000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 2000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 2000, 2000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 
1000, 1000, 1000, 1000, 1000, 1000, 1000), mean_needed = c(15.9204, 
15.9204, 15.9204, 15.9204, 15.9204, 17.2471, 17.2471, 17.2471, 
17.2471, 17.2471, 17.2471, 17.2471, 17.2471, 17.2471, 19.9005, 
19.9005, 19.9005, 19.9005, 19.9005, 22.5539, 22.5539, 22.5539, 
22.5539, 22.5539, 22.5539, 22.5539, 22.5539, 22.5539, 22.5539, 
23.8806, 23.8806, 23.8806, 23.8806, 23.8806, 25.2073, 25.2073, 
25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 
25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 
25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 25.2073, 
25.2073, 26.534, 26.534, 26.534, 26.534, 27.8607, 27.8607, 27.8607, 
27.8607, 27.8607, 27.8607, 27.8607, 27.8607, 27.8607, 27.8607, 
27.8607, 27.8607, 27.8607, 27.8607, 27.8607, 27.8607, 27.8607, 
29.1874, 29.1874, 29.1874, 29.1874, 29.1874, 29.1874, 29.1874, 
29.1874, 29.1874, 29.1874, 29.1874, 29.1874, 29.1874, 29.1874, 
29.1874, 29.1874, 29.1874, 29.1874, 29.1874, 29.1874, 30.5141, 
30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 
30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 
30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 30.5141, 
30.5141, 30.5141, 30.5141, 31.8408, 31.8408, 31.8408, 31.8408, 
31.8408, 31.8408, 31.8408, 31.8408, 31.8408, 31.8408, 33.1675, 
33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 
33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 
33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 33.1675, 
33.1675, 33.1675, 34.4942, 34.4942, 34.4942, 34.4942, 34.4942, 
35.8209, 35.8209, 35.8209, 35.8209, 35.8209, 35.8209, 35.8209, 
35.8209, 35.8209, 35.8209, 35.8209, 35.8209, 35.8209, 35.8209, 
35.8209, 35.8209, 35.8209, 35.8209, 35.8209, 37.1476, 37.1476, 
37.1476, 37.1476, 37.1476, 37.1476, 37.1476, 37.1476, 37.1476, 
37.1476, 38.4743, 38.4743, 38.4743, 38.4743, 38.4743, 38.4743, 
38.4743, 38.4743, 38.4743, 38.4743, 38.4743, 38.4743, 38.4743, 
38.4743, 38.4743, 38.4743, 38.4743, 38.4743, 38.4743, 38.4743, 
38.4743, 38.4743, 38.4743, 39.801, 39.801, 39.801, 39.801, 39.801, 
41.1277, 41.1277, 41.1277, 41.1277, 41.1277, 41.1277, 41.1277, 
41.1277, 41.1277, 41.1277, 41.1277, 41.1277, 41.1277, 41.1277, 
41.1277, 42.4544, 42.4544, 42.4544, 42.4544, 42.4544, 42.4544, 
42.4544, 42.4544, 42.4544, 42.4544, 42.4544, 42.4544, 42.4544, 
42.4544, 43.7811, 43.7811, 43.7811, 43.7811, 43.7811, 43.7811, 
43.7811, 43.7811, 43.7811, 43.7811, 46.4345, 46.4345, 46.4345, 
46.4345, 46.4345, 46.4345, 46.4345, 46.4345, 46.4345, 46.4345, 
46.4345, 46.4345, 46.4345, 49.0879, 49.0879, 49.0879, 49.0879, 
49.0879)), row.names = c(NA, -277L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000000000321ef0>)
  1. production_targets:
structure(list(rel_prod = c(0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 
0.8, 0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3, 1.35, 
1.4, 1.45, 1.5, 1.55, 1.6, 1.65, 1.7, 1.75, 1.8, 1.85, 1.9, 1.95, 
2), production_target = c(66335, 72968.5, 79602, 86235.5, 92869, 
99502.5, 106136, 112769.5, 119403, 126036.5, 132670, 139303.5, 
145937, 152570.5, 159204, 165837.5, 172471, 179104.5, 185738, 
192371.5, 199005, 205638.5, 212272, 218905.5, 225539, 232172.5, 
238806, 245439.5, 252073, 258706.5, 265340), mean_needed = c(13.267, 
14.5937, 15.9204, 17.2471, 18.5738, 19.9005, 21.2272, 22.5539, 
23.8806, 25.2073, 26.534, 27.8607, 29.1874, 30.5141, 31.8408, 
33.1675, 34.4942, 35.8209, 37.1476, 38.4743, 39.801, 41.1277, 
42.4544, 43.7811, 45.1078, 46.4345, 47.7612, 49.0879, 50.4146, 
51.7413, 53.068)), row.names = c(NA, -31L), class = "data.frame")

I was expecting the above code to add two new columns (rel_production and production_target) to my new dataframe, comb, with associated values. However, for some reason, my new dataframe has some areas where NAs are being shown instead of the expected values. This only seems to be happening for certain observations of mean_needed. But when I look at the values for mean_needed in the two dataframes I am trying to join, they look identical. There aren't any extra hidden spaces, and both columns are numeric. I include a print to show that mean_needed values seem to be the same in both columns. Is there a reason I am getting NAs instead of expected values? Thanks

comb dataframe, showing NAs for certain mean_values. This allocation of NA is repeated at other mean_values after the join

Values I am trying to left join by:

production_targets$mean_needed

13.2670 14.5937 15.9204 17.2471 18.5738 19.9005 21.2272 22.5539 23.8806 25.2073 26.5340 27.8607 29.1874 30.5141 31.8408 33.1675 34.4942 35.8209 37.1476 38.4743 39.8010 41.1277 42.4544 43.7811 45.1078 46.4345 47.7612 49.0879 50.4146 51.7413 53.0680
comb$mean.needed %>% unique()

14.5937 15.9204 17.2471 18.5738 19.9005 21.2272 22.5539 25.2073 26.5340 27.8607 29.1874 30.5141 31.8408 33.1675 34.4942 35.8209 37.1476 38.4743 39.8010 41.1277 42.4544 43.7811 45.1078 46.4345 47.7612 49.0879 27] 50.4146
0

There are 0 best solutions below