I have two dataframes;
df.tweets <- data.frame(Date = c(as.Date("2020-12-26"), as.Date="2020-12-15", as.Date="2021-01-12"), Tweet = c("abs", "ksk", "sks"), Ticker = c("NFLX", "GOOGL", "FB"))
Date Tweet Ticker
1 2020-12-26 abs NFLX
2 2020-12-15 ksk GOOGL
3 2021-01-12 sks FB
df.finance <- data.frame(Date = c(as.Date("2020-12-25"), as.Date="2020-12-27", as.Date="2020-12-15", as.Date="2021-01-12"), AR = c("0.23", "0.34", "0.44", "0.91"), Ticker = c("NFLX", "NFLX", "GOOGL", "FB"))
Date AR Ticker
1 2020-12-25 0.23 NFLX
2 2020-12-27 0.34 NFLX
3 2020-12-15 0.44 GOOGL
4 2021-01-12 0.91 FB
I would like to merge these two dataframes into something like this;
df.output <- data.frame(Date = c(as.Date("2020-12-27"), as.Date="2020-12-15", as.Date="2021-01-12"), Tweet = c("abs", "ksk", "sks"), Ticker = c("NFLX", "GOOGL", "FB"), AR = c("0.34", "0.44", "0.91"))
Date Tweet Ticker AR
1 2020-12-27 abs NFLX 0.34
2 2020-12-15 ksk GOOGL 0.44
3 2021-01-12 sks FB 0.91
Originally used this code;
df.new_analysis <- merge(df.new, df.finance, by = c("Date", "Ticker"), all.x = TRUE)
However, since some of the tweets are tweeted in the weekend, there is no stock-data that corresponds to the date. I found the tweets from Saturday and Sunday and changed the date to the following Monday (before merging the data frames) to find the next trading day.
df.tweets$weekday <- weekdays(df.tweets$Date)
df.tweets$Date <- as.Date(df.tweets$Date)
for (row in 1:nrow(df.tweets)){
if (df.tweets[row, 4] == "Saturday") { # 4 = Weekday column
df.tweets[row, 1] <- df.tweets[row, 1] + 2 # 1 = Date column
}
else if (df.tweets[row, 4] == "Sunday"){
df.tweets[row, 1] <- df.tweets[row, 1] + 1
}
}
However, now I run into the problem that there are some trading dates that are weekdays (not weekend), but the stock-exchange was closed due to a public holiday (e.g. Easter Monday).
Would there be an easier way to solve this than to add days to the day-column until there are no NA's? E.g. when merging the data frames to look for the next day in df.finance if there is no corresponding date.
I dont really care whether the data column is the date of the tweet or AR (abnormal return), as long as the AR is from the following and not the previous trading day.
One way using
dplyr
: