I'm trying to quickly download the MTA turnstile data from the following page:
http://web.mta.info/developers/turnstile.html
I've been planning on looping through the page numbers and running fread or download.file to store the data and bind it, but on some of the files I get an error. Here are two examples, one that works and one that doesn't. I notice the second file looks a little different:
test_mta_works = fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_161224.txt", sep = ',')
test_mta_wont_work = fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_140419.txt", sep = ',')
The error I'm receiving on the second one:
Error in fread("http://web.mta.info/developers/data/nyct/turnstile/turnstile_140419.txt", :
Expected sep (',') but new line, EOF (or other non printing character) ends field 12 when detecting types from point 0: A002,R051,02-00-00,04-18-14,16:00:00,REGULAR,004575433,001558298,04-18-14,20:00:00,REGULAR,004575838,001558374
Any ideas what the issue might be and/or how to solve this? I tried using fill = T, but it created issues with the data.
Thanks!
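For the looping plan described above, one option is to wrap each read in tryCatch so that a file that fails to parse is recorded rather than stopping the whole run. This is only a minimal sketch: read_all is a hypothetical helper name, and the reader argument is an assumption added so that the read step can be swapped out.

```r
# Hypothetical helper: read every URL, recording failures instead of stopping.
# `reader` defaults to data.table::fread; any function(url) returning a
# data.frame can be passed in its place.
read_all <- function(urls, reader = function(u) data.table::fread(u, sep = ",")) {
  results <- list()
  failed <- character(0)
  for (u in urls) {
    # A read that throws an error yields NULL here
    out <- tryCatch(reader(u), error = function(e) NULL)
    if (is.null(out)) {
      failed <- c(failed, u)
    } else {
      results[[u]] <- out
    }
  }
  list(data = results, failed = failed)
}
```

The successfully read tables in the data element could then be combined with data.table::rbindlist(res$data, fill = TRUE), while the URLs in the failed element would still need separate handling, since those files appear to use a different layout.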
EDIT
When using fill = T, I get output like the following:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1: A002 R051 02-00-00 04-12-14 00:00:00 REGULAR 4566812 1555499 04-12-14 04:00:00 REGULAR 4566850 1555508 04-12-14 08:00:00 REGULAR 4566875 1555536 04-12-14 12:00:00
2: A002 R051 02-00-00 04-13-14 08:00:00 REGULAR 4567968 1555789 04-13-14 12:00:00 REGULAR 4568069 1555842 04-13-14 16:00:00 REGULAR 4568278 1555903 04-13-14 20:00:00
3: A002 R051 02-00-00 04-14-14 16:00:00 REGULAR 4569148 1556362 04-14-14 20:00:00 REGULAR 4569786 1556420 04-15-14 00:00:00 REGULAR 4569949 1556447 04-15-14 04:00:00
4: A002 R051 02-00-00 04-16-14 00:00:00 REGULAR 4571423 1556965 04-16-14 04:00:00 REGULAR 4571442 1556966 04-16-14 08:00:00 REGULAR 4571486 1557049 04-16-14 12:00:00
5: A002 R051 02-00-00 04-17-14 08:00:00 REGULAR 4573294 1557587 04-17-14 12:00:00 REGULAR 4573469 1557848 04-17-14 16:00:00 REGULAR 4573800 1557901 04-17-14 20:00:00
6: A002 R051 02-00-00 04-18-14 16:00:00 REGULAR 4575433 1558298 04-18-14 20:00:00 REGULAR 4575838 1558374 NA NA
Meanwhile, the first file, which doesn't require fill = T, gives me the following:
C/A UNIT SCP STATION LINENAME DIVISION DATE TIME DESC ENTRIES EXITS
1: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 03:00:00 REGULAR 5967477 2022101
2: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 07:00:00 REGULAR 5967485 2022116
3: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 11:00:00 REGULAR 5967553 2022233
4: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 15:00:00 REGULAR 5967790 2022331
5: A002 R051 02-00-00 59 ST NQR456W BMT 12/17/2016 19:00:00 REGULAR 5968186 2022421
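Comparing the two outputs, the 2014 file seems to have no header row and to pack several readings into each line: three key fields followed by repeating groups of five values. The sketch below reshapes such lines into one row per reading using base R. It rests on an assumption inferred only from the output shown here, namely that each line is C/A, UNIT, SCP followed by groups of DATE, TIME, DESC, ENTRIES, EXITS. parse_old_layout is a hypothetical helper name, and the C/A column is called CA so that the name is syntactic.

```r
# Assumption: each line is C/A, UNIT, SCP followed by repeating groups of
# DATE, TIME, DESC, ENTRIES, EXITS (inferred from the fill = T output).
parse_old_layout <- function(lines) {
  rows <- lapply(strsplit(trimws(lines), ",", fixed = TRUE), function(f) {
    f <- trimws(f)
    key <- f[1:3]
    rest <- f[-(1:3)]
    n_groups <- length(rest) %/% 5   # ignore any incomplete trailing group
    if (n_groups == 0) return(NULL)
    # One matrix row per reading, five fields each
    m <- matrix(rest[seq_len(n_groups * 5)], ncol = 5, byrow = TRUE)
    data.frame(CA = key[1], UNIT = key[2], SCP = key[3],
               DATE = m[, 1], TIME = m[, 2], DESC = m[, 3],
               ENTRIES = as.numeric(m[, 4]), EXITS = as.numeric(m[, 5]),
               stringsAsFactors = FALSE)
  })
  # Drop lines that had no complete group, then stack the rest
  do.call(rbind, Filter(Negate(is.null), rows))
}

# The line quoted in the error message, used as sample input
sample_line <- "A002,R051,02-00-00,04-18-14,16:00:00,REGULAR,004575433,001558298,04-18-14,20:00:00,REGULAR,004575838,001558374"
long <- parse_old_layout(sample_line)
```

For a whole file, the lines could be fetched with readLines on the URL and passed to the same function. Note that the result would still lack the STATION, LINENAME and DIVISION columns that the 2016 file has.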
Using na.strings as a parameter for fread