Original project proposal

The Chicago Transit Authority and the University of Chicago’s Data Science for Social Good (DSSG) program have been exploring potential data projects for DSSG’s 2013 summer fellowship. This proposal lays out how bus service simulation rooted in Bayesian statistics could improve CTA service planning, and scopes out an initial summer project to explore this idea.

Bus service simulation project ideas

We can use CTA’s historical bus location and ridership data to simulate bus service across Chicago. Bus service simulation could enhance CTA’s bus service planning in two key ways:

Evaluating service changes

Currently, service planners propose changes to bus service, and, after some friendly negotiation, schedule planning translates their service recommendations into a schedule. Bus Operations then executes the schedule.

Over the course of the next pick, CTA’s buses hit the streets and transmit their location and ridership counts back to the agency. As this raw bus data comes in, Planning Analytics aggregates it into performance metrics used by Service Planning and Scheduling to identify conditions that may warrant a service change - route crowding, on-time performance, bunching, big gaps, running time, and so on.

All told, it takes months for the agency to learn how a service tweak affected ridership patterns. But what if we could predict the impact of bus service changes on bus crowding before deploying a single vehicle under a new schedule?

By simulating future bus service, CTA could:

predict the impact of proposed service changes on route crowding before scheduling the next pick - the number of people on buses and the number of people left on the sidewalk at every stop.
visualize these ridership metrics across the city by using Open Trip Planner Analyst software.

In short, simulation could give service planning more tools to evaluate whether a proposed service change effectively aligns bus supply with demand, and help shorten the feedback loop of actionable crowding data from months to days.

It would let CTA know more, faster. It would also allow the agency to better compare the crowding impacts of alternative bus schedules, provided we figure out how to quickly convert service level recommendations into proposed schedules.

Optimize service planning

The first use case doesn’t replace service planning’s current method for figuring out service changes. It simply predicts the ridership impact of whatever proposals they come up with.

We could go further, though, and use simulation to predict demand (passenger boardings) at every bus stop and optimize the service levels needed to meet tomorrow’s demand.

Here’s how service planners currently arrive at service level recommendations. First, they try to project a route’s future demand by drawing on historical ridership trends and their deep knowledge of the specific routes and the neighborhoods they serve. Then, they use a simple and effective algorithm to propose headways that should meet that projected demand.

Simulation would enhance the first step in service planning - how CTA projects demand. Instead of looking at historical trends and making a best guess of a route’s future ridership, we could predict it.

Here’s how it would work: first, we would fit a statistical model to CTA’s historical ridership data. Then, we would use this model to simulate likely future passenger boardings at every bus stop. Finally, we would optimize headways to meet every route’s predicted demand. These headways would be optimized to deal with the natural variability and time-sensitivity of transit demand.

Bottom line: we could use simulation to predict future demand, allowing us to optimize future service levels proactively rather than reactively.

Summer project

Fully implementing these ideas will take months of prototyping, software development, piloting, and full deployment. But we think we can take the first step over the summer using existing data.

Service optimization prototype

As a first step, we propose building a working prototype of a service level optimization tool. It would likely be a web app modeled on the planning department’s existing Service Standards Excel tool.

You’d be able to view charts of a route’s simulated demand and the resulting optimized headways across the day. Certain routes might have multiple optimal headways, and the tool would allow you to compare each one and understand the trade-offs.

Service evaluation prototype

We also propose building a working prototype of a service change evaluation tool. The tool would display the predicted crowding impact of proposed service changes or new schedules on CTA bus routes.

The tool would consume proposed bus schedules, ideally in GTFS format. It would then run simulations in the background using bus location and ridership (AVL and APC) data and statistical bus demand and service models, described below. The user would view simulation results in a user-friendly interface.

Proposed bus schedules could be output by HASTUS scheduling software, or potentially by the service level optimizer. So you could use the optimizer to develop headways based on simulated demand, turn those headways into dummy schedules, and plug those schedules into the service evaluation tool to simulate their impact on system operation and riders.

Together, these tools will allow CTA to better predict future demand and simulate how well proposed service changes - at the route or system level - will meet that demand.

Deliverables

Software and statistical models: the CTA would receive backend data processing scripts and statistical models, as well as the user-facing software tool. All code would be open source and reusable by anyone.
Documentation: all code would be extensively documented. We could also write a report documenting our method and findings. We also propose that all knowledge be open, so that a transit analytics community can grow around our work online, and other transit agencies can benefit from the foundational work we develop with CTA.
Blogging: we want our fellows to “work out loud” by blogging extensively during the summer about the policy problems they’re working on, data techniques they’re using, and the findings they’re turning up.

Process

DSSG will provide one to two teams of fellows to work on the project. They’ll be working with a mentor, a professional statistician with extensive transit experience.

The team(s) will need substantial access to CTA’s planning department (domain experts and project end users) and to Mike Haynes (data expert) throughout the fellowship.

Fellows will spend the first two weeks, from June 3rd to June 14th, learning about transit policy, the CTA, and the day-to-day work of the planning department through interviews and field visits.

After this initial training phase, fellows will need weekly feedback from the planning department, the key “client.”

How simulation works

Our approach to simulation relies on Bayesian statistics, a branch of statistics which enables the simulation of complex outcomes.

Simulation is increasingly used in academic transit research, but hasn’t been applied to industry yet. CTA would be a pioneer in the area.

At a high level, here’s how it would work:

Build demand model

Before we can simulate bus trips, we need to predict how many people are likely to show up at every bus stop in the system.

To do this, we would first look at the distribution of passenger boardings at every stop during a certain time period. (Think of a histogram of the number of passengers getting on a bus at that stop last month, for example.) We could derive this from CTA’s stop-level passenger counter (APC) data.

We would then fit a statistical model to this data. The model would take into account the time-dependence of bus ridership - the fact that different times of day, days of week, and months of the year will have different passenger boarding distributions.
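As a rough illustration of what fitting such a model could look like, here is a minimal sketch. It assumes (our assumption, not a settled design) that boardings per bus trip in each stop, day type, and hour cell follow a Poisson distribution with a Gamma prior on the mean, which gives a closed-form Bayesian update. The record layout, the prior values, and the function name `fit_demand_model` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical APC records: (stop_id, day_type, hour, boardings on one bus trip).
apc_records = [
    ("stop_A", "weekday", 8, 12),
    ("stop_A", "weekday", 8, 9),
    ("stop_A", "weekday", 8, 15),
    ("stop_A", "weekday", 13, 3),
    ("stop_A", "weekday", 13, 5),
    ("stop_A", "saturday", 8, 2),
]

# Assumed weak Gamma(shape, rate) prior on the mean boardings per trip.
PRIOR_SHAPE = 1.0
PRIOR_RATE = 0.1


def fit_demand_model(records, prior_shape=PRIOR_SHAPE, prior_rate=PRIOR_RATE):
    """Return {(stop, day_type, hour): (posterior_shape, posterior_rate)}.

    With Poisson counts and a Gamma prior, the posterior is Gamma with
    shape = prior_shape + sum(counts) and rate = prior_rate + number of trips.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of boardings, trips seen]
    for stop_id, day_type, hour, boardings in records:
        cell = totals[(stop_id, day_type, hour)]
        cell[0] += boardings
        cell[1] += 1
    return {
        key: (prior_shape + total, prior_rate + n_trips)
        for key, (total, n_trips) in totals.items()
    }


model = fit_demand_model(apc_records)
for key, (shape, rate) in sorted(model.items()):
    print(key, "posterior mean boardings per trip:", round(shape / rate, 2))
```

Keeping a separate cell for each time period is the simplest way to capture time-dependence. A fuller model would likely share information across neighboring hours and stops.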

Simulate demand

Once we derive a statistical model from CTA’s historical ridership data, we can use this “demand model” to generate stop-level ridership predictions for the future. This is how we would simulate passengers arriving at bus stops.
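To make the idea of generating predictions from a fitted model concrete, the sketch below draws simulated boardings for one stop and hour. It assumes a Gamma posterior over the mean boardings per trip and Poisson counts; the posterior values and function names are hypothetical, and the Poisson sampler is a textbook method (Knuth's) chosen only to keep the example dependency-free.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_boardings(rng, posterior_shape, posterior_rate, n_draws):
    """Posterior predictive draws: sample a mean from the Gamma posterior, then a Poisson count."""
    draws = []
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        mean_boardings = rng.gammavariate(posterior_shape, 1.0 / posterior_rate)
        draws.append(sample_poisson(rng, mean_boardings))
    return draws


rng = random.Random(2013)  # fixed seed so the sketch is repeatable
# Hypothetical posterior for one stop in the weekday 8 a.m. hour.
draws = simulate_boardings(rng, posterior_shape=37.0, posterior_rate=3.1, n_draws=5000)
draws.sort()
print("mean simulated boardings per trip:", sum(draws) / len(draws))
print("90th percentile:", draws[int(0.9 * len(draws))])
```

Because each draw first samples a plausible mean and then a count, the simulated boardings reflect both uncertainty about the true demand level and day-to-day randomness.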

Once we have simulated demand data, we can develop an algorithm to optimize headways to meet predicted demand. At this point, we would have the statistical machinery in place to build a user-facing service optimization tool.
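One simple way such an optimizer could work is sketched below. This is an illustrative rule of our own, not CTA's existing algorithm: pick the longest candidate headway for which the average bus load exceeds an assumed load standard in no more than a chosen share of simulated hours. The demand figures, capacity, and risk threshold are all hypothetical.

```python
def crowding_probability(hourly_demand_draws, headway_min, bus_capacity):
    """Share of simulated hours in which the average bus load exceeds capacity."""
    buses_per_hour = 60.0 / headway_min
    over = sum(1 for demand in hourly_demand_draws if demand / buses_per_hour > bus_capacity)
    return over / len(hourly_demand_draws)


def choose_headway(hourly_demand_draws, candidate_headways, bus_capacity, max_crowding_prob):
    """Return the longest candidate headway whose crowding risk is acceptable, or None."""
    for headway in sorted(candidate_headways, reverse=True):
        if crowding_probability(hourly_demand_draws, headway, bus_capacity) <= max_crowding_prob:
            return headway
    return None


# Hypothetical simulated passengers per hour past a route's peak load point.
simulated_hourly_demand = [310, 280, 355, 402, 298, 330, 371, 289, 344, 318]

best = choose_headway(
    simulated_hourly_demand,
    candidate_headways=[4, 5, 6, 8, 10, 12, 15],
    bus_capacity=60,         # assumed load standard, passengers per bus
    max_crowding_prob=0.10,  # accept crowding in at most 10% of simulated hours
)
print("recommended headway (minutes):", best)  # 8 for the numbers above
```

Working from the full set of simulated hours, rather than a single average, is what lets the headway account for the variability of demand.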

How would we know if the demand model is accurate? We could use it to predict past data, and improve it until we are confident in the results. We could also visualize the data to make it comparable to the route-level load and flow charts service planning is already using, and ask them to visually validate the data.
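A minimal sketch of that kind of backtest follows, assuming we hold out the most recent month and check how often the observed boardings fall inside the model's predictive interval. The numbers and the function names `predictive_interval` and `coverage` are made up for illustration.

```python
def predictive_interval(simulated_draws, level):
    """Central interval holding roughly `level` of the simulated draws (empirical quantiles)."""
    ordered = sorted(simulated_draws)
    tail = (1.0 - level) / 2.0
    low_index = int(tail * (len(ordered) - 1))
    high_index = int((1.0 - tail) * (len(ordered) - 1))
    return ordered[low_index], ordered[high_index]


def coverage(simulated_draws, held_out_counts, level=0.9):
    """Share of held-out observations that fall inside the predictive interval."""
    low, high = predictive_interval(simulated_draws, level)
    inside = sum(1 for count in held_out_counts if low <= count <= high)
    return inside / len(held_out_counts)


# Hypothetical values: draws from a model fit on earlier months, and
# boardings actually observed at the same stop and hour last month.
simulated = [7, 9, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 15, 16, 17, 18, 19, 21, 24]
observed = [12, 9, 15, 11, 26, 13, 10, 14, 8, 16]

low, high = predictive_interval(simulated, 0.9)
print("90% predictive interval:", (low, high))
print("held-out coverage:", coverage(simulated, observed))
```

If coverage on held-out data sits far below the nominal level, the model is overconfident and needs more work before its predictions are shown to planners.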

Build service model

Bus service simulation is what powers the service evaluation tool. Once you can predict demand at each stop, you can simulate bus trips to pick up the passengers.

The first step would be to supply a proposed bus schedule, preferably in GTFS format. We would use the schedule to build a statistical model of bus service - the “service model”.
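For illustration, here is a small sketch of reading scheduled headways out of a GTFS feed's `stop_times.txt` file. The column names are standard GTFS fields, but the sample rows are made up, and the sketch ignores routes, directions, and service days, which a real tool would need to join in from `trips.txt` and the calendar files.

```python
import csv
import io

# Tiny made-up stop_times.txt extract; real GTFS feeds carry more rows and columns.
STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,07:00:00,07:00:00,1001,1
t1,07:06:00,07:06:00,1002,2
t2,07:10:00,07:10:00,1001,1
t2,07:16:00,07:16:00,1002,2
t3,07:25:00,07:25:00,1001,1
t3,07:31:00,07:31:00,1002,2
"""


def to_seconds(gtfs_time):
    """Convert HH:MM:SS to seconds; GTFS allows hours past 24 for after-midnight trips."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scheduled_headways(stop_times_file, stop_id):
    """Minutes between consecutive scheduled departures at one stop."""
    departures = sorted(
        to_seconds(row["departure_time"])
        for row in csv.DictReader(stop_times_file)
        if row["stop_id"] == stop_id
    )
    return [(later - earlier) / 60 for earlier, later in zip(departures, departures[1:])]


print(scheduled_headways(io.StringIO(STOP_TIMES_TXT), "1001"))  # [10.0, 15.0]
```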

Simulate service

We would first use the service model to simulate bus trips that fit the schedule perfectly.

Buses would move along the route according to schedule, picking up and dropping off passengers as predicted by the demand model. For every scheduled trip in the next pick, we can track how many people ride the bus, and how many are left on the sidewalk due to crowding.
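The bookkeeping behind this step can be sketched in a few lines. The example below is a simplified illustration under our own assumptions: buses run exactly on schedule, a fixed share of riders on board gets off at each stop, and riders who cannot fit wait for the next bus. The capacity, shares, and boarding counts are hypothetical.

```python
def simulate_route(trips_boardings, alight_share, bus_capacity):
    """Run buses down a route in schedule order, tracking loads and left-behind riders.

    trips_boardings[t][s] is the number of new riders arriving at stop s before trip t.
    alight_share[s] is the assumed share of riders on board who get off at stop s.
    """
    n_stops = len(alight_share)
    waiting = [0] * n_stops  # riders left on the sidewalk by earlier buses
    results = []
    for boardings in trips_boardings:
        load, max_load, left_behind = 0, 0, 0
        for stop in range(n_stops):
            load -= int(load * alight_share[stop])  # riders get off first
            waiting[stop] += boardings[stop]        # new arrivals join the queue
            boarded = min(waiting[stop], bus_capacity - load)
            load += boarded
            waiting[stop] -= boarded
            left_behind += waiting[stop]            # could not fit on this bus
            max_load = max(max_load, load)
        results.append({"max_load": max_load, "left_behind": left_behind})
    return results


# Hypothetical four-stop route; the last stop is the terminal, so everyone gets off.
results = simulate_route(
    trips_boardings=[[20, 25, 10, 0], [15, 30, 5, 0]],
    alight_share=[0.0, 0.0, 0.5, 1.0],
    bus_capacity=40,
)
for trip_number, outcome in enumerate(results, start=1):
    print("trip", trip_number, outcome)
```

In this toy run the first bus fills up at the second stop and leaves 5 riders behind; those riders join the queue for the second bus, which then leaves 10 behind.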

We would also make the simulation realistic, emulating the messiness of bus service in the real world. To do this, we would adjust the service model using historical bus location data, working in each route’s observed on-time performance, reliability and running time, and seeing how these real-world schedule deviations affect crowding. We would get this data from CTA’s timepoint-level vehicle arrival (AVAS) data.
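One simple stand-in for this adjustment is sketched below. It resamples historical schedule deviations and applies them to scheduled departures, then estimates how often a big gap in service appears. Plain resampling is our simplification here, not the Bayesian service model itself, and the schedule, deviations, and gap threshold are hypothetical.

```python
import random


def simulate_headways(scheduled_departures, observed_deviations, rng):
    """Perturb each scheduled departure by a resampled historical deviation (minutes)."""
    actual = sorted(
        departure + rng.choice(observed_deviations) for departure in scheduled_departures
    )
    return [later - earlier for earlier, later in zip(actual, actual[1:])]


def big_gap_probability(scheduled_departures, observed_deviations, gap_threshold, n_runs, seed=0):
    """Share of simulated runs with at least one headway longer than gap_threshold minutes."""
    rng = random.Random(seed)
    runs_with_gap = 0
    for _ in range(n_runs):
        headways = simulate_headways(scheduled_departures, observed_deviations, rng)
        if max(headways) > gap_threshold:
            runs_with_gap += 1
    return runs_with_gap / n_runs


# Hypothetical inputs: departures every 10 minutes (minutes after 7:00) and
# historical deviations in minutes (positive = late) at one timepoint.
schedule = [0, 10, 20, 30, 40, 50]
deviations = [-1, 0, 0, 1, 2, 2, 3, 5, 8]

gap_prob = big_gap_probability(schedule, deviations, gap_threshold=15, n_runs=2000)
print("P(at least one gap > 15 min):", gap_prob)
```

Feeding the perturbed departure times into the passenger simulation, instead of the scheduled ones, is how deviations would translate into extra crowding on late buses.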

This “realistic” simulation would power the user-facing service evaluation tool.
