Using leftjoin to merge csv files based on multiple columns and receiving NAs

61 Views Asked by At

I have GPS collared data of lions and giraffes for a masters project. I've had to seperate the animals into different csv and shp files due to the high volume of data. I also have multiple files for each animal where I have added columns such as the season the gps location was in or the elevation, NDVI, etc. I now want to combine all these files together and have been using RStudio and the leftjoin and tried other methods such as merge, st_join, fulljoin etc. Example:

combined_datacas <- left_join(
  waterholescas, casdatacov,
  by=c("Lat", "Lon", "Timestamp", "animal_id"))

Here is an example of my data from waterholecas:

      animal_id  Timestamp distance_to_nearest_waterhole   Lon   Lat
1       Cas 2012-04-08 02:39:00  3333.13041830703   721363.1 7166414
2       Cas 2012-04-08 06:39:00  3331.69779980209 721354.5 7166301
3       Cas 2012-04-08 10:39:00  3464.1329308895 722307.2 7166891
4       Cas 2012-04-08 14:39:00  2417.74343771813 724203.8 7167332
5       Cas 2012-04-08 18:39:00  2983.85441244622 725335.3 7168174
6       Cas 2012-04-08 22:39:00  2807.4428112668 725241.8 7167996

Here is an example of casdatacov:

animal_id Timestamp realrand type Scaled_NDVI landform    
   landcover elevation   Lon     Lat
1       Cas 2012-04-08 02:39:00        1 real      0.3186 Lower slope (flat) Natural Grassland      1032 721363.1 7166414
2       Cas 2012-04-08 06:39:00        1 real      0.3186 Lower slope (flat) Natural Grassland      1030 721354.5 7166301
3       Cas 2012-04-08 10:39:00        1 real      0.3408 Lower slope (flat) Natural Grassland      1031 722307.2 7166891
4       Cas 2012-04-08 14:39:00        1 real      0.3582 Lower slope (flat) Natural Grassland      1032 724203.8 7167332
5       Cas 2012-04-08 18:39:00        1 real      0.3378 Upper slope (flat) Natural Grassland      1033 725335.3 7168174
6       Cas 2012-04-08 22:39:00        1 real      0.3494             Valley Natural Grassland      1032 725241.8 7167996

Here is an example of the combined file with NAs and without: Image of r output table

I also get this warning message:

Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  display list redraw incomplete
2: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
3: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
4: In left_join(waterholescas, casdatacov, by = c("Lat", "Lon", "animal_id",  :
  Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 3434 of `x` matches multiple rows in `y`.
ℹ Row 139 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.

When I do this, the result table has NAs in multiple places but some rows have combined successfully. In the original files I have no NAs. When I manually look to ensure that the lat, lon, timestamp, and animal id are the same on both files that I am joining, I am able to find them but there are still NAs in the combined file. Has anyone had a similar problem?

0

There are 0 best solutions below