
Specification for GTFS Feed Archive Tool

ed-g edited this page Aug 20, 2013 · 11 revisions

GTFS Feed Archive Tool

Output Format

The archive of all files will be named

"Oregon-GTFS-feeds-YYYY-MM-DD.zip" with YYYY-MM-DD as the current date

and the feeds changed since a given date yyyy-mm-dd will be named

"Oregon-GTFS-updated-from-yyyy-mm-dd-to-YYYY-MM-DD.zip"

Each zip file will itself contain a number of GTFS zip files, and also a meta information file "last-updates.csv".

last-updates.csv will have a header row followed by one row per GTFS zip file. The columns are: the zip file name, the most recent update time for that file (in the UTC time zone), a unique feed name, and the URL the feed was downloaded from (which may or may not remain an active URL for future downloads, hence the name historical_download_url). Additional columns may be added later; existing columns will be preserved. For instance,

===> last-updates.csv <===  
zip_file_name,most_recent_update,feed_name,historical_download_url
a.zip,"2013-03-17T12:13:14Z",a,http://example.com/A.zip
b.zip,"2013-02-14T11:12:13Z",b,http://example.com/B.zip
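A minimal sketch of reading last-updates.csv, assuming Python. It keeps any unknown columns, since the spec allows columns to be added later.

```python
import csv
import io
from datetime import datetime

def read_last_updates(text):
    """Parse last-updates.csv text into a list of row dicts.
    Extra columns beyond the four defined ones are preserved."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Timestamps are ISO-8601 in UTC, e.g. 2013-03-17T12:13:14Z.
        row["most_recent_update"] = datetime.strptime(
            row["most_recent_update"], "%Y-%m-%dT%H:%M:%SZ")
        rows.append(row)
    return rows
```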

Requesting archives

Archives can be requested either through a web interface or by hitting a URL with an automated tool. Requests will be authenticated with a username/password or API key, since an unauthorized user who accidentally or intentionally requested many archives could overload the server, resulting in denial of service.

When an archive has been requested, the tool will give the URL where the archive will eventually be available, for instance http://gtfs-archive-server/full/Oregon-GTFS-feeds-2013-04-16.zip or http://gtfs-archive-server/changed/Oregon-GTFS-updated-from-2013-01-01-to-2013-04-16.zip

Since it may take time to generate the archive, the client will need to wait before downloading it. In the web interface the user will be notified when the archive is ready to download.

In the API, the client can poll the download URL. If the archive is not ready, the server will return an HTTP status of "204 No Content", which means the client should retry. Once it is ready, the server will return a status of "200 OK" along with the zip archive. The status is easy to poll from a script, for example with "curl -u user:password -I http://gtfs-archive-server/full/Oregon-GTFS-feeds-2013-04-16.zip", starting the download once the status becomes 200.
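The poll-and-retry protocol can be sketched as below. The server URL, credentials handling, and timing values are illustrative, not part of the spec.

```python
import time
import urllib.request

def next_action(status):
    """Map the server's HTTP status to a client action, per the protocol."""
    if status == 204:
        return "retry"      # archive not ready yet
    if status == 200:
        return "download"   # archive is ready
    return "error"          # anything else is unexpected

def wait_for_archive(url, opener, interval=30, max_tries=120):
    """Poll `url` with HEAD requests until the archive is ready.
    `opener` should carry the user's credentials (e.g. HTTP basic auth).
    Returns True once the archive is downloadable, False on timeout."""
    for _ in range(max_tries):
        request = urllib.request.Request(url, method="HEAD")
        with opener.open(request) as response:
            if next_action(response.status) == "download":
                return True
        time.sleep(interval)
    return False
```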

Input format

The input CSV file contains a list of GTFS feeds. Each has an alphanumeric feed_name (no spaces, but underscores and minus signs are OK) that is easy for computers to handle, a human-readable feed_description, and a link to the transit agency's gtfs.zip file. The feed_name must be globally unique among all users and all CSV input files, and must not change between updates to the input CSV file.

For this reason it is probably a bad idea to use transit agency name acronyms as the feed_name, since many agencies will share the same acronym and their feeds could easily get confused. Instead use a full name, for instance "benton_county_or" and "basin_transit_service_klamath_falls" instead of "bc" or "bts".

feed_name,feed_description,gtfs_zip_url  
trimet_portland,"Tri-Met: Portland Oregon Metro","http://developer.trimet.org/schedule/gtfs.zip"  
cherriots_salem_kaiser,"Cherriots, Salem-Kaiser Oregon","http://www.cherriots.org/developer/gtfs.zip"  
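A sketch of validating feed_name against the rule above (letters, digits, underscore, or minus; no spaces), assuming Python:

```python
import re

# Acceptable feed_name characters per the spec: alphanumerics,
# underscore, and minus sign; the name must be non-empty.
FEED_NAME_RE = re.compile(r"^[A-Za-z0-9_-]+$")

def valid_feed_name(name):
    """Return True if `name` is an acceptable feed_name."""
    return bool(FEED_NAME_RE.match(name))
```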

Command-line tool

For advanced users, the application supports a simple command-line mode that accepts multiple CSV files containing GTFS feed locations. This version can be installed on any sufficiently powerful machine with a recent Java runtime environment.

There are three scriptable commands:

(a) Check the feeds and see if they're all reachable. This would also warn about credential problems for password-protected feeds.

(b) Refresh the feeds. In other words, check every feed and record in the built-in database which feeds have changed since the last refresh.

Almost all computers provide a facility to run a command repeatedly and automatically on a schedule. On Linux and Mac OS X this is known as "cron"; on Windows the equivalent is the Task Scheduler.
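As a concrete illustration, a crontab entry like the following would run the refresh nightly at 02:00. The command name and installation path are hypothetical; they depend on how the tool is packaged and installed.

```
# Refresh all feeds every night at 02:00 (command name is hypothetical).
0 2 * * * /usr/local/bin/gtfs-archive refresh
```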

(c) Build a ZIP archive of feeds which have changed since a certain date, and copy the archive into a given folder.
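Step (c) can be sketched in Python as follows, assuming the refresh step has left us with each feed's zip data and last-update time. The in-memory data shape here is an assumption for the sketch; the real tool would read from its built-in database.

```python
import zipfile

def build_changed_archive(feeds, since, out_path):
    """Write a zip at `out_path` containing only the feeds updated
    on or after `since`. `feeds` maps zip file names to
    (last_update, zip_bytes) pairs. Returns the names included."""
    included = []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, (last_update, data) in sorted(feeds.items()):
            if last_update >= since:
                archive.writestr(name, data)
                included.append(name)
    return included
```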