Merge branch 'development'
Pieter Colpaert committed Nov 8, 2015
2 parents d002ae8 + 03ae2c6 commit 8385abc
Showing 23 changed files with 313 additions and 137 deletions.
28 changes: 2 additions & 26 deletions .gitignore
@@ -1,29 +1,5 @@
logs
*.log

# Runtime data
pids
*.pid
*.seed

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage

# Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# Compiled binary addons (http://nodejs.org/api/addons.html)
build/Release

# Dependency directory
# Deployed apps should consider commenting this line out:
# see https://npmjs.org/doc/faq.html#Should-I-check-my-node_modules-folder-into-git
node_modules

arrivals
dates
departures
stop_times
.services
.trips
6 changes: 6 additions & 0 deletions .npmignore
@@ -0,0 +1,6 @@
sample_feed
node_modules
.services
.trips
*.jsonstream
test
21 changes: 15 additions & 6 deletions README.md
@@ -1,26 +1,35 @@
# Guide: GTFS Linked Connections
# GTFS to Linked Connections

Transforms a GTFS feed into a directed acyclic graph of "connections".

A connection is the combination of a departure and the next arrival of the same trip.
Our goal is to retrieve a list of connections sorted by departure time, which can be read as a topologically sorted directed acyclic graph (DAG). Route planning algorithms can then be run over this list.
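Because the list is sorted by departure time, a route planner can answer an earliest-arrival query in a single pass over it. The sketch below illustrates this in the style of the Connection Scan Algorithm, a technique named here only for illustration: gtfs2lc itself produces the sorted connections and does not ship this function. It uses the `departureStop`, `arrivalStop`, `departureTime` and `arrivalTime` fields the converter emits, but assumes plain numeric times and ignores footpaths and minimum transfer times.

```javascript
// Return a copy of the connections sorted by departure time
function sortByDepartureTime(connections) {
  return connections.slice().sort((a, b) => a.departureTime - b.departureTime);
}

// Minimal earliest-arrival scan over connections sorted by departure time.
// Illustration only: no footpaths, no minimum transfer times.
function earliestArrival(connections, source, target, startTime) {
  // Best known arrival time per stop; only the source is reachable at the start
  const arrival = new Map([[source, startTime]]);
  const reached = (stop) => (arrival.has(stop) ? arrival.get(stop) : Infinity);
  for (const c of connections) {
    // Sorted input: once a connection departs after our best arrival at the target, stop scanning
    if (c.departureTime >= reached(target)) break;
    // Usable if we are at its departure stop in time and it improves the arrival stop
    if (reached(c.departureStop) <= c.departureTime && c.arrivalTime < reached(c.arrivalStop)) {
      arrival.set(c.arrivalStop, c.arrivalTime);
    }
  }
  return reached(target);
}
```

The single forward pass only works because no connection can lead back to an earlier departure time, which is exactly the acyclic property of the sorted list.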

## Install it

```
npm install -g gtfs2lc
```

## Use it

First, __unzip__ your GTFS file to a directory on your disk, e.g., `unzip gtfs.zip -d /tmp/gtfs`

Next, make sure that a couple of files are ordered in a specific way that the GTFS specification does not require. You can do this in bash as follows:
* __stop_times.txt__ must be ordered by `trip_id` and `stop_sequence`. You can do this using this command: `{ head -n 1 stop_times.txt ; tail -n +2 stop_times.txt | sort; } > stop_times2.txt ; mv stop_times2.txt stop_times.txt`. Mind that GTFS does not standardize the column order either, so you might need to tweak the sort command.
* __trips.txt__ must be ordered by `trip_id`. You can do this using this command: `{ head -n 1 trips.txt ; tail -n +2 trips.txt | sort; } > trips2.txt ; mv trips2.txt trips.txt`.
* __calendar.txt__ must be ordered by `service_id`. You can do this using this command: `{ head -n 1 calendar.txt ; tail -n +2 calendar.txt | sort; } > calendar2.txt ; mv calendar2.txt calendar.txt`.
* __calendar_dates.txt__ must be ordered by `service_id`. You can do this using this command: `{ head -n 1 calendar_dates.txt ; tail -n +2 calendar_dates.txt | sort; } > calendar_dates2.txt ; mv calendar_dates2.txt calendar_dates.txt`.
* __stop_times.txt__ must be ordered by `trip_id` and `stop_sequence`. Mind that GTFS does not standardize the column order either, so you might need to tweak the sort command in this repo.
* __calendar.txt__ must be ordered by `service_id`.
* __calendar_dates.txt__ must be ordered by `service_id`.
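Because GTFS does not fix the column order, a sort command that addresses columns by position may sort on the wrong field. One hedged alternative, for feeds that fit in memory, is to look the sort columns up by name in the header row. The helper `sortGtfsCsv` below is hypothetical and not part of gtfs2lc. It assumes plain CSV without quoted commas or newlines, and compares `stop_sequence` numerically so that 10 sorts after 2, which a lexicographic comparison of that column would not do.

```javascript
// Sort the rows of a GTFS CSV file by named columns, keeping the header row first.
// Assumptions: the whole file fits in memory and no field contains a quoted comma or newline.
function sortGtfsCsv(text, keys) {
  // Drop a leading UTF-8 byte order mark and any empty lines
  const lines = text.replace(/^\uFEFF/, '').split(/\r?\n/).filter((line) => line.length > 0);
  const header = lines[0].split(',').map((name) => name.trim());
  // Look up each sort column by name instead of by position
  const indices = keys.map(({ column }) => {
    const index = header.indexOf(column);
    if (index === -1) throw new Error('Missing column: ' + column);
    return index;
  });
  const rows = lines.slice(1).map((line) => line.split(','));
  rows.sort((a, b) => {
    for (let k = 0; k < keys.length; k++) {
      const x = a[indices[k]];
      const y = b[indices[k]];
      // Numeric keys such as stop_sequence compare as numbers, other keys as strings
      const cmp = keys[k].numeric ? Number(x) - Number(y) : (x < y ? -1 : (x > y ? 1 : 0));
      if (cmp !== 0) return cmp;
    }
    return 0;
  });
  return [lines[0], ...rows.map((row) => row.join(','))].join('\n') + '\n';
}
```

For example, `[{ column: 'trip_id' }, { column: 'stop_sequence', numeric: true }]` gives the order required for __stop_times.txt__, and `[{ column: 'service_id' }]` covers __calendar.txt__ and __calendar_dates.txt__; the result can be written back with Node's `fs.writeFileSync`.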

We've enclosed a bash script, `bin/gtfs2lc-sort.sh`, which does this for you. It isn't perfect, however, and may not always produce the desired ordering.

If you've ensured this, you can install this library using: `npm install -g gtfs2lc` and use it as follows:

```bash
gtfs2lc /path/to/extracted/gtfs
gtfs2lc -p /path/to/extracted/gtfs -f csv
```

For more options, check `gtfs2lc --help`

## How it works (for contributors)

We convert `stop_times.txt` to a stream of connection rules. Each rule still needs to know on which days it runs. This is retrieved via the rule's `trip_id`, which points to a trip in `trips.txt`, whose `service_id` in turn points to the service dates derived from `calendar.txt` and `calendar_dates.txt`.
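Turning `stop_times.txt` into connection rules amounts to pairing each stop time with the next stop time of the same trip. The real implementation lives in `lib/stoptimes/st2c.js`, which this commit does not show, so the generator below is a hypothetical illustration. It assumes rows already sorted by `trip_id` and `stop_sequence`, emits the `trip_id`, `departure_stop` and `arrival_stop` fields that `ConnectionsBuilder` reads, and adds hypothetical `departure_time`/`arrival_time` fields holding the raw GTFS times.

```javascript
// Hypothetical sketch: pair consecutive stop_times rows of the same trip into connection rules.
// Input rows must already be sorted by trip_id and stop_sequence.
function* connectionRules(stopTimes) {
  let previous = null;
  for (const stopTime of stopTimes) {
    // Only consecutive rows that belong to the same trip form a connection
    if (previous && previous.trip_id === stopTime.trip_id) {
      yield {
        trip_id: stopTime.trip_id,
        departure_stop: previous.stop_id,
        departure_time: previous.departure_time, // hypothetical field name
        arrival_stop: stopTime.stop_id,
        arrival_time: stopTime.arrival_time, // hypothetical field name
      };
    }
    previous = stopTime;
  }
}
```

A trip with n stop times yields n - 1 rules, which is why the sort order of `stop_times.txt` matters: an unsorted file would pair stops that are not adjacent on the trip.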
12 changes: 12 additions & 0 deletions bin/gtfs2lc-sort.sh
@@ -0,0 +1,12 @@
#!/bin/bash
[[ $# == 1 ]] && {
cd "$1" && {
echo "Sorting files in directory $1";
{ head -n 1 stop_times.txt ; tail -n +2 stop_times.txt | sort -t , -k 1,5n ; } > stop_times2.txt ; mv stop_times2.txt stop_times.txt &
{ head -n 1 trips.txt ; tail -n +2 trips.txt | sort -t , -k 3 ; } > trips2.txt ; mv trips2.txt trips.txt &
{ head -n 1 calendar.txt ; tail -n +2 calendar.txt | sort -t , -k 1; } > calendar2.txt ; mv calendar2.txt calendar.txt &
{ head -n 1 calendar_dates.txt ; tail -n +2 calendar_dates.txt | sort -t , -k 1n; } > calendar_dates2.txt ; mv calendar_dates2.txt calendar_dates.txt &
} ;
} || {
echo Give a path to the gtfs dir as the only argument;
}
54 changes: 45 additions & 9 deletions bin/gtfs2lc.js
@@ -1,22 +1,58 @@
#!/usr/bin/env node

var program = require('commander'),
Mapper = require('../lib/gtfs2lc.js');
gtfs2lc = require('../lib/gtfs2lc.js'),
N3 = require('n3'),
jsonldstream = require('jsonld-stream');

console.error("GTFS to linked connections converter; use --help to discover more functions");

program
.version('0.1.0')
.option('-p, --path <path>', 'Path to sorted GTFS files (default: ./)')
.parse(process.argv);
.version('0.3.0')
.option('-p, --path <path>', 'Path to sorted GTFS files (default: ./)')
.option('-f, --format <format>', 'Format of the output. Possibilities: csv, ntriples, turtle, json or jsonld (default: json)')
.parse(process.argv);

if (!program.path) {
program.path = './'
program.path = './';
}

var mapper = new Mapper();
var mapper = new gtfs2lc.Connections();
mapper.resultStream(program.path, function (stream) {
stream.on('data', function (connection) {
console.log(JSON.stringify(connection));
});
if (!program.format || program.format === "json") {
stream.on('data', function (connection) {
console.log(JSON.stringify(connection));
});
} else if (program.format === 'csv') {
//print header
console.log('"id","departureStop","departureTime","arrivalStop","arrivalTime","trip"');
var count = 0;
stream.on('data', function (connection) {
console.log(count + ',' + connection["departureStop"] + ',' + connection["departureTime"].toISOString() + ',' + connection["arrivalStop"] + ',' + connection["arrivalTime"].toISOString() + ',' + connection["trip"]);
count ++;
});
} else if (['ntriples','turtle','jsonld'].indexOf(program.format) > -1) {
stream = stream.pipe(new gtfs2lc.Connections2Triples()); //TODO: add configurable base uris here.
if (program.format === 'ntriples') {
stream = stream.pipe(new N3.StreamWriter({ format : 'N-Triples'}));
} else if (program.format === 'turtle') {
stream = stream.pipe(new N3.StreamWriter({ prefixes: { lc: 'http://semweb.mmlab.be/ns/linkedconnections#', gtfs : 'http://vocab.gtfs.org/terms#', xsd: 'http://www.w3.org/2001/XMLSchema#' } }));
} else if (program.format === 'jsonld') {
var context = {
'@context' : {
lc: 'http://semweb.mmlab.be/ns/linkedconnections#',
gtfs : 'http://vocab.gtfs.org/terms#',
xsd: 'http://www.w3.org/2001/XMLSchema#',
trip : { '@type' : '@id', '@id' : 'gtfs:trip' },
Connection : 'lc:Connection',
departureTime : { '@type' : 'xsd:dateTime', '@id' : 'lc:departureTime' },
departureStop : { '@type' : '@id', '@id' : 'lc:departureStop' },
arrivalStop : { '@type' : '@id', '@id' : 'lc:arrivalStop' },
arrivalTime : { '@type' : 'xsd:dateTime', '@id' : 'lc:arrivalTime' },
}
};
stream = stream.pipe(new jsonldstream.TriplesToJSONLDStream(context)).pipe(new jsonldstream.Serializer());
}
stream.pipe(process.stdout);
}
});
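The `csv` branch above quotes the header row but writes each value unquoted, so a stop or trip identifier containing a comma or a double quote would shift the columns of that row. A minimal sketch of RFC 4180-style quoting for the same six columns follows; the helpers `csvField` and `connectionToCsvRow` are hypothetical and not part of gtfs2lc.

```javascript
// Quote one CSV field RFC 4180-style: wrap it in double quotes and double any embedded quotes
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Same header as the committed csv output
const CSV_HEADER = ['id', 'departureStop', 'departureTime', 'arrivalStop', 'arrivalTime', 'trip']
  .map(csvField)
  .join(',');

// One row per connection; the times need a toISOString() method (a Date or a moment object)
function connectionToCsvRow(id, connection) {
  return [
    id,
    connection.departureStop,
    connection.departureTime.toISOString(),
    connection.arrivalStop,
    connection.arrivalTime.toISOString(),
    connection.trip,
  ].map(csvField).join(',');
}
```

Quoting every field keeps the header and the data rows consistent, at the cost of a few extra bytes per value.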
66 changes: 66 additions & 0 deletions lib/Connections2Triples.js
@@ -0,0 +1,66 @@
/**
* Pieter Colpaert © Ghent University - iMinds
* Transforms a stream of connections into a stream of RDF triples
*/
var Transform = require('stream').Transform,
util = require('util'),
moment = require('moment');

var Connections2Triples = function (baseUris) {
Transform.call(this, {objectMode : true});
var defaultBaseUris = {
stops : 'http://example.org/stops/',
connections : 'http://example.org/connections/',
trips : 'http://example.org/trips/'
};
if (!baseUris) {
baseUris = {};
}
// Fall back to the default for every base URI that was not provided
['stops', 'connections', 'trips'].forEach(function (key) {
if (!baseUris[key]) {
baseUris[key] = defaultBaseUris[key];
}
});
this._baseUris = baseUris;
this._count = 0;
};

util.inherits(Connections2Triples, Transform);

Connections2Triples.prototype._transform = function (connection, encoding, done) {
var id = this._baseUris.connections + connection.departureTime + connection.departureStop + connection.trip;
this.push({
subject : id,
predicate :'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
object : 'http://semweb.mmlab.be/ns/linkedconnections#Connection'
});
this.push({
subject : id,
predicate :'http://semweb.mmlab.be/ns/linkedconnections#departureStop',
object : this._baseUris.stops + connection.departureStop
});
this.push({
subject : id,
predicate :'http://semweb.mmlab.be/ns/linkedconnections#arrivalStop',
object : this._baseUris.stops + connection.arrivalStop
});
this.push({
subject : id,
predicate :'http://semweb.mmlab.be/ns/linkedconnections#departureTime',
object : '"' + connection.departureTime.toISOString() + '"^^http://www.w3.org/2001/XMLSchema#dateTime'
});
this.push({
subject : id,
predicate :'http://semweb.mmlab.be/ns/linkedconnections#arrivalTime',
object : '"' + connection.arrivalTime.toISOString() + '"^^http://www.w3.org/2001/XMLSchema#dateTime'
});
this.push({
subject : id,
predicate :'http://vocab.gtfs.org/terms#trip',
object : this._baseUris.trips + connection.trip
});
done();
};

module.exports = Connections2Triples;
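`Connections2Triples` emits six triples per connection and leaves serialization to `N3.StreamWriter`. To show what those triples look like as N-Triples, here is a dependency-free sketch; the function `connectionToNTriples` is hypothetical. Its connection ID uses the same ingredients as the committed `_transform` (departure time, departure stop, trip), but with an ISO timestamp and percent-encoding, whereas the committed code concatenates the `moment` object directly and so embeds moment's default string form.

```javascript
const LC = 'http://semweb.mmlab.be/ns/linkedconnections#';
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
const GTFS_TRIP = 'http://vocab.gtfs.org/terms#trip';
const XSD_DATETIME = 'http://www.w3.org/2001/XMLSchema#dateTime';

const DEFAULT_BASE_URIS = {
  stops: 'http://example.org/stops/',
  connections: 'http://example.org/connections/',
  trips: 'http://example.org/trips/',
};

// Wrap an IRI in angle brackets, as N-Triples requires
const iri = (value) => '<' + value + '>';

// Typed xsd:dateTime literal built from a Date
const dateTimeLiteral = (date) => '"' + date.toISOString() + '"^^' + iri(XSD_DATETIME);

function connectionToNTriples(connection, baseUris = {}) {
  // Keys missing from baseUris fall back to their defaults
  const base = Object.assign({}, DEFAULT_BASE_URIS, baseUris);
  // Hypothetical ID scheme: ISO departure time + departure stop + trip, percent-encoded
  const id = iri(base.connections + encodeURIComponent(
    connection.departureTime.toISOString() + connection.departureStop + connection.trip));
  const triples = [
    [id, iri(RDF_TYPE), iri(LC + 'Connection')],
    [id, iri(LC + 'departureStop'), iri(base.stops + encodeURIComponent(connection.departureStop))],
    [id, iri(LC + 'arrivalStop'), iri(base.stops + encodeURIComponent(connection.arrivalStop))],
    [id, iri(LC + 'departureTime'), dateTimeLiteral(connection.departureTime)],
    [id, iri(LC + 'arrivalTime'), dateTimeLiteral(connection.arrivalTime)],
    [id, iri(GTFS_TRIP), iri(base.trips + encodeURIComponent(connection.trip))],
  ];
  // One triple per line, each terminated by " ."
  return triples.map((triple) => triple.join(' ') + ' .').join('\n');
}
```

Percent-encoding keeps the generated IRIs free of spaces, which N-Triples does not allow inside `<...>`.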
29 changes: 17 additions & 12 deletions lib/ConnectionsBuilder.js
@@ -24,18 +24,23 @@ ConnectionsBuilder.prototype._transform = function (connectionRule, encoding, do
this._tripsdb.get(connectionRule['trip_id'], function (error, trip) {
trip = JSON.parse(trip);
self._servicesdb.get(trip['service_id'], function (error, service) {
service = JSON.parse(service);
for (var i in service) {
var serviceDay = service[i];
var departureTime = moment(serviceDay, 'YYYYMMDD').add(departureDFM);
var arrivalTime = moment(serviceDay, 'YYYYMMDD').add(arrivalDFM);
self.push({
departureTime: departureTime,
arrivalTime: arrivalTime,
arrivalStop: connectionRule['arrival_stop'],
departureStop: connectionRule['departure_stop'],
trip: connectionRule['trip_id']
});
if (error) {
console.error('Error: either you didn\'t sort the file, there\'s an undocumented service_id in the data, or there\'s a bug in our code:', error);
process.exit(1);
} else {
service = JSON.parse(service);
for (var i in service) {
var serviceDay = service[i];
var departureTime = moment(serviceDay, 'YYYYMMDD').add(departureDFM);
var arrivalTime = moment(serviceDay, 'YYYYMMDD').add(arrivalDFM);
self.push({
departureTime: departureTime,
arrivalTime: arrivalTime,
arrivalStop: connectionRule['arrival_stop'],
departureStop: connectionRule['departure_stop'],
trip: connectionRule['trip_id']
});
}
}
done();
});
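For each connection rule, the builder looks up the trip by `trip_id`, then the trip's `service_id`, and adds the rule's departure and arrival offsets to every service day with `moment`. A dependency-free sketch of that expansion step follows. The function names and the rule fields `departure_time`/`arrival_time` (GTFS `HH:MM:SS` strings) are hypothetical, and times are computed in UTC for simplicity, whereas the committed code uses `moment` in local time.

```javascript
// Parse a GTFS time such as "25:10:00" (hours may exceed 24) into milliseconds after midnight
function gtfsTimeToMs(time) {
  const [h, m, s] = time.split(':').map(Number);
  return ((h * 60 + m) * 60 + s) * 1000;
}

// Hypothetical sketch: expand one connection rule over its service days (YYYYMMDD strings).
// Assumption: times are computed in UTC, not in the agency's local time zone.
function expandRule(rule, serviceDays) {
  return serviceDays.map((day) => {
    const midnight = Date.UTC(Number(day.slice(0, 4)), Number(day.slice(4, 6)) - 1, Number(day.slice(6, 8)));
    return {
      departureTime: new Date(midnight + gtfsTimeToMs(rule.departure_time)),
      arrivalTime: new Date(midnight + gtfsTimeToMs(rule.arrival_time)),
      arrivalStop: rule.arrival_stop,
      departureStop: rule.departure_stop,
      trip: rule.trip_id,
    };
  });
}
```

Times past `24:00:00`, which GTFS allows for trips that run past midnight, simply roll over into the next calendar day.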
9 changes: 8 additions & 1 deletion lib/StreamIterator.js
@@ -3,6 +3,7 @@ var util = require('util'),

var StreamIterator = function (stream) {
this._stream = stream;
this._currentObject = null;
this._currentCB = null;
var self = this;
this._streamEnded = false;
@@ -16,6 +17,10 @@ var StreamIterator = function (stream) {

util.inherits(StreamIterator, EventEmitter);

StreamIterator.prototype.getCurrentObject = function () {
return this._currentObject;
};

StreamIterator.prototype.next = function (callback) {
var self = this;
this._currentCB = callback;
@@ -24,10 +29,12 @@ StreamIterator.prototype.next = function (callback) {
this._stream.once("readable", function () {
self.next(callback);
});
} else if (object && !this._streamEnded) {
} else if (object) {
this._currentObject = object;
callback(object);
} else {
//stream ended
this._currentObject = null;
callback(null);
}
};
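`StreamIterator.next()` waits for `readable` when nothing is buffered and, after this change, reports `null` and resets `getCurrentObject()` once the stream has ended. On Node.js versions where readable streams are async-iterable (an assumption about the runtime; this commit does not use it), the same pull-one-object-at-a-time behaviour can be sketched with promises. The class name `AsyncStreamIterator` is hypothetical.

```javascript
// Hypothetical promise-based counterpart to StreamIterator
class AsyncStreamIterator {
  constructor(stream) {
    // Readable streams expose an async iterator that yields one chunk per next() call
    this._iterator = stream[Symbol.asyncIterator]();
    this._currentObject = null;
  }

  getCurrentObject() {
    return this._currentObject;
  }

  // Resolves to the next object, or to null once the stream has ended
  async next() {
    const { value, done } = await this._iterator.next();
    this._currentObject = done ? null : value;
    return this._currentObject;
  }
}
```

Since object-mode streams cannot carry `null` as a data item, `null` stays unambiguous as the end-of-stream signal, just as in the callback version.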
53 changes: 53 additions & 0 deletions lib/gtfs2connections.js
@@ -0,0 +1,53 @@
var csv = require('fast-csv'),
ConnectionRules = require('./stoptimes/st2c.js'),
ConnectionsBuilder = require('./ConnectionsBuilder.js'),
Services = require('./services/calendar.js'),
through2 = require('through2'),
level = require('level'),
fs = require('fs');

var Mapper = function (options) {
this._options = options;
};

/**
* Returns a resultStream for connections
* Step 1: Convert calendar_dates.txt and calendar.txt to service ids mapped to a long list of dates
* Step 2: Pipe these services towards a leveldb: we want to use them later.
* Step 3: also index trips.txt in a leveldb on key trip_id
* Step 4: create a stream of connection rules from stop_times.txt
* Step 5: pipe this stream to something that expands everything into connections and returns this stream.
* Caveat: because this code is built from numerous callbacks and streams, it does not read in chronological order.
*/
Mapper.prototype.resultStream = function (path, done) {
var trips = fs.createReadStream(path + '/trips.txt', {encoding:'utf8', objectMode: true}).pipe(csv({objectMode:true,headers: true}));
var calendarDates = fs.createReadStream(path + '/calendar_dates.txt', {encoding:'utf8', objectMode: true}).pipe(csv({objectMode:true,headers: true}));
var services = fs.createReadStream(path + '/calendar.txt', {encoding:'utf8', objectMode: true}).pipe(csv({objectMode:true,headers: true})).pipe(new Services(calendarDates));
//Preparations for step 4
var connectionRules = fs.createReadStream(path + '/stop_times.txt', {encoding:'utf8', objectMode: true}).pipe(csv({objectMode:true,headers: true})).pipe(new ConnectionRules());

//Step 2 & 3: store in leveldb in 2 hidden directories
var tripsdb = level(path + '/.trips');
var servicesdb = level(path + '/.services');
var count = 0;
var finished = function () {
count ++;
//wait for the 2 streams to finish (services and trips) to write to the stores
if (count === 2) {
console.error("Indexing services and trips successful!");
//Step 4 and 5: let's create our connections!
done(connectionRules.pipe(new ConnectionsBuilder(tripsdb, servicesdb)));
}
};

services.pipe(through2.obj(function (service, encoding, doneService) {
servicesdb.put(service['service_id'], service['dates'], {valueEncoding: 'json'}, doneService);
})).on('finish', finished);

trips.pipe(through2.obj(function (trip, encoding, doneTrip) {
tripsdb.put(trip['trip_id'], trip, {valueEncoding: 'json'}, doneTrip);
})).on('finish', finished);

};

module.exports = Mapper;
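`resultStream` waits for both indexing streams by counting `finish` events until `count === 2`. The same synchronization can be expressed with `Promise.all` over `events.once(stream, 'finish')`, a swapped-in technique shown here for comparison. In this sketch, in-memory `Map`s stand in for the two leveldb stores, and the names `mapWriter` and `indexServicesAndTrips` are hypothetical.

```javascript
const { Writable } = require('stream');
const { once } = require('events');

// Object-mode writable that stores each record in a Map (stand-in for a leveldb store)
function mapWriter(map, keyField, valueField) {
  return new Writable({
    objectMode: true,
    write(record, _encoding, callback) {
      map.set(record[keyField], valueField ? record[valueField] : record);
      callback();
    },
  });
}

// Index services and trips concurrently; resolve only after both writers have finished
async function indexServicesAndTrips(services, trips) {
  const servicesdb = new Map();
  const tripsdb = new Map();
  await Promise.all([
    once(services.pipe(mapWriter(servicesdb, 'service_id', 'dates')), 'finish'),
    once(trips.pipe(mapWriter(tripsdb, 'trip_id')), 'finish'),
  ]);
  return { servicesdb, tripsdb };
}
```

`events.once` also rejects if the writer emits `error`, but `pipe` does not forward errors from the source stream; `stream.pipeline` would be the more robust choice when that matters.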