Losing one observation using loess


I am using the data from Rafa's book about data science. One of the exercises asks for a smooth function relating deaths in Puerto Rico to the date. I convert date to numeric and fit the smooth with a span of 60 days. Both date and deaths have 1205 observations, but when I compute the fitted values for deaths based on date, I get only 1204. I did not find any NA values in the dataset.

When I use other, smaller datasets, the fitted results have the same number of observations as the original data.

I append the code:

library(tidyverse)
library(lubridate)  # provides make_date(); only attached by tidyverse itself from version 2.0.0
library(purrr)
library(pdftools)
library(dslabs)

# Path to the mortality report PDF shipped with dslabs
fn <- system.file("extdata", "RD-Mortality-Report_2015-18-180531.pdf",
                  package="dslabs")

# Parse each PDF page (one month per page) into a long table of daily deaths
dat <- map_df(str_split(pdf_text(fn), "\n"), function(s){
  s <- str_trim(s)
  header_index <- str_which(s, "2015")[1]
  tmp <- str_split(s[header_index], "\\s+", simplify = TRUE)
  month <- tmp[1]
  header <- tmp[-1]
  tail_index <- str_which(s, "Total")
  n <- str_count(s, "\\d+")
  out <- c(1:header_index, which(n == 1),
           which(n >= 28), tail_index:length(s))
  s[-out] %>% str_remove_all("[^\\d\\s]") %>% str_trim() %>%
    str_split_fixed("\\s+", n = 6) %>% .[,1:5] %>% as_tibble() %>%
    setNames(c("day", header)) %>%
    mutate(month = month, day = as.numeric(day)) %>%
    gather(year, deaths, -c(day, month)) %>%
    mutate(deaths = as.numeric(deaths))
}) %>%
  mutate(month = recode(month, "JAN" = 1, "FEB" = 2, "MAR" = 3,
                        "APR" = 4, "MAY" = 5, "JUN" = 6,
                        "JUL" = 7, "AGO" = 8, "SEP" = 9,
                        "OCT" = 10, "NOV" = 11, "DEC" = 12)) %>%
  mutate(date = make_date(year, month, day)) %>%
  filter(date <= "2018-05-01")


# span as a fraction of the data: a 60-day window out of 1205 days
span <- 60/1205

smooth1 <- loess(deaths ~ as.numeric(date), span = span, degree = 1, data = dat)

Answer by Nir Graham:

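The 181st record in the data has NA for 'deaths'. loess drops incomplete rows when fitting (its na.action defaults to getOption("na.action"), normally na.omit), so the fitted values come back with 1204 entries instead of 1205. The row in question (dat2 here appears to be dat with date converted to numeric):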

dat2[181,]
# A tibble: 1 × 5
    day month year  deaths  date
  <dbl> <dbl> <chr>  <dbl> <dbl>
1    29     2 2016      NA 16860
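
To see the effect in isolation, here is a minimal sketch on made-up data. The toy data frame, its column date_num, the span of 0.3 and the object names are all assumptions for illustration, not part of the original question. It shows how to locate the missing value, how the default fit loses a row, and two ways to get a result aligned with the original rows: na.action = na.exclude, or predict() with newdata.

```r
# Toy data standing in for dat (hypothetical): 100 days, one missing death count.
set.seed(1)
toy <- data.frame(
  date   = as.Date("2016-01-01") + 0:99,
  deaths = rpois(100, lambda = 80)
)
toy$deaths[60] <- NA
toy$date_num <- as.numeric(toy$date)

# Locate the rows with a missing response.
na_rows <- which(is.na(toy$deaths))

# Default fit: the incomplete row is dropped before fitting,
# so there is one fitted value fewer than there are rows.
fit_omit <- loess(deaths ~ date_num, data = toy, span = 0.3, degree = 1)
n_omit <- length(predict(fit_omit))

# na.exclude still leaves the row out of the fit, but fitted()
# pads the result with NA so it lines up with the original rows.
fit_excl <- loess(deaths ~ date_num, data = toy, span = 0.3, degree = 1,
                  na.action = na.exclude)
fitted_padded <- fitted(fit_excl)

# Alternatively, predict at every date, including the one whose
# death count is missing.
pred_all <- predict(fit_omit, newdata = toy)

c(rows = nrow(toy), default = n_omit,
  padded = length(fitted_padded), newdata = length(pred_all))
```

With the real data, the same idea would be which(is.na(dat$deaths)) to find the row, then either adding na.action = na.exclude to the loess call or filtering out the NA row before plotting fitted values against date.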