Merging two dataframes in R by date when some dates are NA

558 Views Asked by At

I have two dataframes;

df.tweets <- data.frame(Date = c(as.Date("2020-12-26"), as.Date="2020-12-15", as.Date="2021-01-12"), Tweet = c("abs", "ksk", "sks"), Ticker = c("NFLX", "GOOGL", "FB"))

           Date Tweet Ticker
1 2020-12-26   abs   NFLX
2 2020-12-15   ksk  GOOGL
3 2021-01-12   sks     FB

df.finance <- data.frame(Date = c(as.Date("2020-12-25"), as.Date="2020-12-27", as.Date="2020-12-15", as.Date="2021-01-12"), AR = c("0.23", "0.34", "0.44", "0.91"), Ticker = c("NFLX", "NFLX", "GOOGL", "FB"))

        Date   AR Ticker
1 2020-12-25 0.23   NFLX
2 2020-12-27 0.34   NFLX
3 2020-12-15 0.44  GOOGL
4 2021-01-12 0.91     FB

I would like to merge these two dataframes into something like this;

df.output <- data.frame(Date = c(as.Date("2020-12-27"), as.Date="2020-12-15", as.Date="2021-01-12"), Tweet = c("abs", "ksk", "sks"), Ticker = c("NFLX", "GOOGL", "FB"), AR = c("0.34", "0.44", "0.91"))

        Date Tweet Ticker   AR
1 2020-12-27   abs   NFLX 0.34
2 2020-12-15   ksk  GOOGL 0.44
3 2021-01-12   sks     FB 0.91

Originally used this code;

df.new_analysis <- merge(df.new, df.finance, by = c("Date", "Ticker"), all.x = TRUE)

However, since some of the tweets are tweeted in the weekend, there is no stock-data that corresponds to the date. I found the tweets from Saturday and Sunday and changed the date to the following Monday (before merging the data frames) to find the next trading day.

df.tweets$weekday <- weekdays(df.tweets$Date)
df.tweets$Date <- as.Date(df.tweets$Date)

for (row in 1:nrow(df.tweets)){
  if (df.tweets[row, 4] == "Saturday") {                # 4 = Weekday column
    df.tweets[row, 1] <- df.tweets[row, 1] + 2          # 1 = Date column
  }
  else if (df.tweets[row, 4] == "Sunday"){
    df.tweets[row, 1] <- df.tweets[row, 1] + 1
  }
}

However, now I run into the problem that there are some trading dates that are weekdays (not weekend), but the stock-exchange was closed due to a public holiday (e.g. Easter Monday).

Would there be an easier way to solve this than to add days to the day-column until there are no NA's? E.g. when merging the data frames to look for the next day in df.finance if there is no corresponding date.

I dont really care whether the data column is the date of the tweet or AR (abnormal return), as long as the AR is from the following and not the previous trading day.

1

There are 1 best solutions below

0
On

One way using dplyr :

library(dplyr)

df.tweets %>%
  #Join the two dataframes
  full_join(df.finance, by = 'Ticker') %>%
  #Arrange the data by date
  arrange(Ticker, Date.x) %>%
  #Get the difference between the dates
  mutate(diff = Date.y - Date.x) %>%
  #For each ticker and date
  group_by(Ticker, Date.x) %>%
  #Select the first row where the 
  #difference is greater than equal to 0
  slice(which.max(diff >= 0)) %>%
  ungroup %>%
  #Select and rename columns.
  select(Date = Date.y, everything(), -diff, -Date.x)

#   Date       Tweet Ticker AR   
#  <date>     <chr> <chr>  <chr>
#1 2021-01-12 sks   FB     0.91 
#2 2020-12-15 ksk   GOOGL  0.44 
#3 2020-12-27 abs   NFLX   0.34