GTFS - how to combine the protocol buffers and GTFS file?


I am trying to look at the New York City Subway Realtime GTFS Feeds. After a lot of reading around, I learned about Protocol Buffers and installed the protoc compiler.

New York City Transit has the file nyct-subway.proto.txt, whose first line says "NYCT Subway extensions for the GTFS-realtime protocol". Is this supposed to be combined with gtfs-realtime.proto? I compiled the two protocol buffers separately and got the warning:

[libprotobuf WARNING google/protobuf/compiler/parser.cc:471] 
No syntax specified for the proto file. 
Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)

In Python I wrote a line to import the modules that protoc had generated:

import gtfs_realtime_pb2, nyct_subway_pb2

Despite my previous installation effort, Python didn't know anything about import google.protobuf, so I ran sudo pip install protobuf.

At this point I am still not reading any data -- I can get a gtfs file with http://datamine.mta.info/mta_esi.php?key=<key>&feed_id=1 which is unreadable.

How do I combine this to read the data from the GTFS file?


There are 2 best solutions below


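You can combine the two by compiling them with protoc and using the protobuf Python package to load the result. nyct-subway.proto imports gtfs-realtime.proto, so both files need to sit in the same source directory. Download both .proto files (renaming nyct-subway.proto.txt to nyct-subway.proto), place them in docs/gtfs_proto, create a gtfs_proto folder for the output, and then run: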

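export SRC_DIR=docs/gtfs_proto
export DST_DIR=gtfs_proto
mkdir -p $DST_DIR
# compile both files: nyct_subway_pb2 imports gtfs_realtime_pb2 at load time
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/gtfs-realtime.proto $SRC_DIR/nyct-subway.proto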
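
Since the generated files land in a separate output folder, Python will only find them if that folder is on its import path. A minimal sketch for checking that, using only the standard library; the helper name missing_modules and its extra_path parameter are made up for illustration and are not part of protobuf:

```python
import importlib.util
import sys


def missing_modules(names, extra_path=None):
    """Return the entries of `names` that Python cannot locate.

    `extra_path` (hypothetical parameter) is a directory to put on
    sys.path first, e.g. the folder protoc wrote the *_pb2.py files to.
    """
    if extra_path and extra_path not in sys.path:
        sys.path.insert(0, extra_path)

    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (such as "google"
            # for "google.protobuf") cannot be imported.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling missing_modules(["google.protobuf", "gtfs_realtime_pb2", "nyct_subway_pb2"], extra_path="gtfs_proto") should return an empty list once the protobuf package is installed and both files have been compiled. Anything it does return is what still needs installing or compiling.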

To further clarify Jamie's comment, you should be able to do something like this:

from urllib.request import urlopen

import gtfs_realtime_pb2
# imported so that the NYCT extensions are registered with the base messages
import nyct_subway_pb2

...

# initialize the feed parser
feed = gtfs_realtime_pb2.FeedMessage()

# fetch the raw gtfs-realtime feed (binary protocol buffer data, not text)
gtfs_raw = urlopen("http://datamine.mta.info/mta_esi.php?key=<key>&feed_id=1").read()

# parse the raw feed
feed.ParseFromString(gtfs_raw)

# access the data structure as needed
print(feed.header.timestamp)
print(feed.header.gtfs_realtime_version)

for entity in feed.entity:
    print(entity.id)  # etc.
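
To flesh out the "etc." part, here is a rough sketch of walking the trip updates in a parsed feed and picking up the NYCT extension data along the way. The function name upcoming_arrivals and the trip_ext parameter are my own; the extension handle nyct_trip_descriptor and its train_id field are assumed to match the published nyct-subway.proto, so check them against your copy of the file.

```python
from datetime import datetime, timezone


def upcoming_arrivals(feed, trip_ext=None):
    """Collect (route_id, trip_id, train_id, stop_id, arrival_utc) tuples.

    `feed` is an already parsed FeedMessage. `trip_ext` is an optional
    extension handle (assumed: nyct_subway_pb2.nyct_trip_descriptor).
    """
    rows = []
    for entity in feed.entity:
        # An entity may carry a trip update, a vehicle position or an alert.
        if not entity.HasField("trip_update"):
            continue
        trip = entity.trip_update.trip

        # The NYCT-specific fields hang off the standard TripDescriptor.
        train_id = None
        if trip_ext is not None and trip.HasExtension(trip_ext):
            train_id = trip.Extensions[trip_ext].train_id

        for stop_update in entity.trip_update.stop_time_update:
            if not stop_update.HasField("arrival"):
                continue
            # arrival.time is POSIX time in seconds.
            when = datetime.fromtimestamp(stop_update.arrival.time, tz=timezone.utc)
            rows.append((trip.route_id, trip.trip_id, train_id,
                         stop_update.stop_id, when))
    return rows
```

With the feed parsed as above, upcoming_arrivals(feed, nyct_subway_pb2.nyct_trip_descriptor) should give one tuple per predicted arrival. Leaving out the second argument skips the NYCT extension and reads only the standard GTFS-realtime fields.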

Alternate Method (Command Line + JSON)

Personally, I think protocol buffers and gtfs-realtime can be a pain. To skip the work, I wrote a standalone tool to convert GTFS-realtime into JSON:
https://github.com/harrytruong/gtfs_realtime_json

Just download (no install), and run: gtfs_realtime_json <feed_url>

Here's a sample JSON output.