Reading filtered data from Socrata in R


Does anyone know how to filter a Socrata dataset by date_of_incident in R during the import step itself, so that only the needed rows are downloaded and the read is faster?

This is what I have so far:

library(RSocrata)
library(dplyr)

token <- "n15hFiXqJU6DBItiSjA4jWD2U"
PoliceIncidents <- read.socrata("https://www.dallasopendata.com/resource/qv6i-rri7.csv", app_token = token)

# Filter police incident data to 2019 to present (after downloading everything)
PoliceIncidents2019to2020 <- PoliceIncidents %>% filter(servyr > 2018)

Here is the source data: https://www.dallasopendata.com/Public-Safety/Police-Incidents/qv6i-rri7/data


There are 2 answers below.

Answer (score 1):

For big CSVs, I like the vroom package from the tidyverse. It's a lot faster than read_csv(). With vroom, it's often easier to read the whole file and then filter.

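library(vroom)
library(tidyverse)

# Police_Incidents.csv: the full dataset exported as CSV from the portal
df_raw <- vroom('Police_Incidents.csv')

# Keep incidents that occurred in 2019 or later
occurrence_2019 <- df_raw %>%
  filter(`Year1 of Occurrence` >= 2019)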

This took only about 10 seconds.
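If the file has many columns, vroom can also skip the ones you don't need at read time through its `col_select` argument, which cuts memory use before the filter runs. The sketch below is self-contained: it writes a tiny made-up CSV to a temporary file and reads and filters it the same way as above. It assumes vroom and dplyr are installed; the three rows and the `Incident Number` and `Division` columns are invented for illustration, and only `Year1 of Occurrence` comes from the answer.

```r
library(vroom)
library(dplyr)

# Hypothetical stand-in for the downloaded Police_Incidents.csv:
# three invented rows, with column names that contain spaces.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Incident Number,Year1 of Occurrence,Division",
  "A-1,2017,CENTRAL",
  "A-2,2019,NORTH",
  "A-3,2020,SOUTH"
), csv_path)

# Read only the two columns we need. col_select uses tidyselect syntax,
# so names with spaces are wrapped in backticks. The year is parsed as an
# integer so the >= comparison below is numeric.
df_raw <- vroom(
  csv_path,
  delim = ",",
  col_select = c(`Incident Number`, `Year1 of Occurrence`),
  col_types = cols(.default = col_character(), `Year1 of Occurrence` = col_integer())
)

# Same filter as in the answer
occurrence_2019 <- df_raw %>%
  filter(`Year1 of Occurrence` >= 2019)

print(occurrence_2019)
```

The `Division` column is never materialized, which is where the savings come from on a wide file; the filter itself still runs in R after the read.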

Answer (score 0):

You can put a filter in your original query so that only incidents from 2019 onward are pulled. This speeds up the read, mostly because the server response carries far less data. To build the query, use the column's "API Field Name" (listed in the dataset's column information on the portal), not its display name.

In this case:

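library(RSocrata)

# Simple filters (?field=value) only test equality, so a comparison needs a SoQL $where clause.
# If servyr turns out to be a text column, compare against '2018' (quoted) instead.
PoliceIncidents <- read.socrata("https://www.dallasopendata.com/resource/qv6i-rri7.csv?$where=servyr>2018")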
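When the condition contains spaces, operators, or quotes, it is safer to percent-encode it than to rely on the HTTP client accepting them raw. Here is a minimal sketch using base R's `utils::URLencode()`. `soql_where_url()` is a hypothetical helper, not part of RSocrata, and passing its result to `read.socrata()` assumes the function sends the URL through unchanged. The network call is commented out so the block runs offline.

```r
# Hypothetical helper (not part of RSocrata): append a SoQL $where clause to a
# Socrata resource URL. reserved = TRUE percent-encodes everything outside
# letters, digits and . _ ~ -, so spaces become %20 and > becomes %3E.
soql_where_url <- function(resource_url, where) {
  paste0(resource_url, "?$where=", utils::URLencode(where, reserved = TRUE))
}

url <- soql_where_url(
  "https://www.dallasopendata.com/resource/qv6i-rri7.csv",
  "servyr > 2018"
)
print(url)  # ends in ?$where=servyr%20%3E%202018

# Requires network access and the RSocrata package:
# PoliceIncidents <- RSocrata::read.socrata(url, app_token = token)
```

The same helper works for a date filter on whatever the date column's API field name is, e.g. `"<field> >= '2019-01-01'"`, where the field name has to be looked up on the portal.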