Planning
The Chicago Transit Authority (CTA) has made a concerted effort to reduce crowding on public transportation, which has been a cause for concern over the last few years. The CTA has been collecting data such as the number of people riding a specific bus, the times at which more people board buses at a specific stop, and the delays on different routes. After some processing, this data is used to evaluate and modify schedules and routes for the following quarter.
Buses collect ridership and performance data while running a proposed schedule. The Planning Analytics department then calculates performance metrics such as Load, Flow, Bunching, and Crowding to determine the effectiveness of current CTA strategies.
While current best practices are data-focused, they are retrospective. Only after a proposed schedule is implemented and data has been collected (which takes several months, not including data clean-up and aggregation) can we assess how effectively that schedule reduces crowding. We propose to make scheduling a more prospective exercise through statistical modeling and simulation (described below). This will allow the Planning Analytics department to be proactive and to better understand the impact of scheduling decisions on bus crowding before implementation. Given the richness of the available data, we believe that even a simple statistical model and simulation approach will provide useful insight into bus de-crowding.
Demand Model: At the stop level, we assume that individuals arrive at stops stochastically. That is, the time T until the next individual shows up at the stop is random. This is a common approach to waiting-time problems.
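The demand model does not specify a distribution for T. A common choice, assumed here, is an exponential T, which makes arrivals at a stop a Poisson process with rate λ. The sketch below generates arrival times for one stop under that assumption; `simulate_stop_arrivals` and its parameters are hypothetical names for illustration.

```python
import random
from typing import List, Optional

def simulate_stop_arrivals(rate_per_min: float, horizon_min: float,
                           seed: Optional[int] = None) -> List[float]:
    """Return passenger arrival times (in minutes) at one stop over [0, horizon_min).

    Assumption: the gap T between consecutive arrivals is exponential with
    mean 1 / rate_per_min, i.e. arrivals form a homogeneous Poisson process.
    """
    rng = random.Random(seed)
    arrivals = []
    t = rng.expovariate(rate_per_min)  # time of the first arrival
    while t < horizon_min:
        arrivals.append(t)
        t += rng.expovariate(rate_per_min)  # draw the next random gap T
    return arrivals

# Usage: 0.5 passengers per minute over a 60-minute period.
times = simulate_stop_arrivals(rate_per_min=0.5, horizon_min=60, seed=1)
print(len(times))  # random; about 0.5 * 60 = 30 on average
```

A time-of-day pattern could be added by letting the rate vary by period, but a constant rate keeps the sketch minimal.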
Supply Model: At the route level, we have information on the deviation from the scheduled arrival time at timepoints (designated points along the route, which need not be stops). The supply model will jointly estimate the deviations from the scheduled time along a route for a specific bus. We expect the deviation at one timepoint to be positively correlated with the deviations at subsequent timepoints. That is, deviation from the schedule accumulates over the course of a bus running its route. At the most basic level, we do not wish to attribute this variation to specific sources, only to account for it in the simulations that follow.
Fitting a Model vs. Running a Simulation
The Model component requires historical data and a specified model for both the demand and supply components. Each model has a set of parameters to estimate, and the historical data is used to update those estimates.
The Simulation component requires only the parameter estimates from the model. With these, we can simulate the entire CTA bus system for a given time period. This includes simulating the arrival of people at bus stops and the buses as they follow their schedules (including the estimated deviations from the schedule described above).
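The split between fitting and simulating can be sketched for the demand side as two functions: one that turns historical records into rate estimates, and one that simulates from those rates alone. This is a minimal sketch assuming Poisson arrivals, for which the maximum-likelihood rate is total count divided by total observed time. The record layout and function names are hypothetical, and treating observed boardings as arrivals ignores anyone left behind.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

def fit_demand(history: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Model step: estimate an arrival rate (passengers per minute) for each stop.

    `history` holds (stop_id, boardings, observed_minutes) records, a
    hypothetical stand-in for aggregated APC data. Under a Poisson
    assumption, the maximum-likelihood rate is total count / total time.
    """
    counts: Dict[str, int] = defaultdict(int)
    minutes: Dict[str, float] = defaultdict(float)
    for stop_id, boardings, observed in history:
        counts[stop_id] += boardings
        minutes[stop_id] += observed
    return {stop: counts[stop] / minutes[stop] for stop in counts}

def simulate_demand(rates: Dict[str, float], horizon_min: float,
                    seed: Optional[int] = None) -> Dict[str, int]:
    """Simulation step: uses only the fitted rates, never the raw history."""
    rng = random.Random(seed)
    simulated = {}
    for stop_id, rate in rates.items():
        n = 0
        if rate > 0:  # a zero rate means no arrivals (and expovariate(0) would fail)
            t = rng.expovariate(rate)
            while t < horizon_min:
                n += 1
                t += rng.expovariate(rate)
        simulated[stop_id] = n
    return simulated

# Usage with made-up numbers (not CTA data).
history = [("stop_A", 30, 60.0), ("stop_A", 42, 60.0), ("stop_B", 6, 60.0)]
rates = fit_demand(history)
print(rates)  # {'stop_A': 0.6, 'stop_B': 0.1}
print(simulate_demand(rates, horizon_min=60.0, seed=3))  # random counts per stop
```

Because the simulator depends only on the fitted parameters, the expensive fit can run once per data refresh while many simulations reuse it.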
We would like the CTA to outline how such a tool would be used in practice by different departments (strategic planning, service planning, scheduling, operations) and which features are most important to its uptake and success.
- Web interface
- The key deliverable is a web interface that allows interactive querying of the simulated bus system. The interface will link to the APC and AVAS datasets so that model parameters can be re-estimated after new data is collected. Because of the computational cost, the model will be pre-computed, perhaps once per quarter.
- The interface will display plots of key metrics similar to what is already in use by service planning but computed with simulated rather than historical data.
- Users can adjust selected parameters, such as headway, and visualize the impact on crowding by time period, route, or stop.
- We hope to update the current plots to show the variability in performance metrics and provide a more holistic view.
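Adjusting headway in the interface amounts to rerunning a simulation with a different headway. The sketch shows this for a single stop under strong simplifying assumptions (Poisson arrivals, perfectly regular buses, no capacity limit); `boardings_per_bus` is a hypothetical helper, not part of the planned interface.

```python
import random
import statistics
from typing import List, Optional

def boardings_per_bus(rate_per_min: float, headway_min: float, n_buses: int,
                      seed: Optional[int] = None) -> List[int]:
    """Number of passengers each bus picks up at one stop for a fixed headway.

    Assumptions (hypothetical, for illustration): Poisson passenger arrivals,
    perfectly regular buses every `headway_min` minutes, no capacity limit,
    and every waiting passenger boards the next bus.
    """
    rng = random.Random(seed)
    horizon = headway_min * n_buses
    counts = [0] * n_buses
    t = rng.expovariate(rate_per_min)
    while t < horizon:
        # A passenger arriving in [k*h, (k+1)*h) is picked up by bus k.
        bus = min(int(t // headway_min), n_buses - 1)
        counts[bus] += 1
        t += rng.expovariate(rate_per_min)
    return counts

# Usage: compare headways of 5, 10, and 15 minutes at 1 passenger per minute.
for h in (5.0, 10.0, 15.0):
    loads = boardings_per_bus(rate_per_min=1.0, headway_min=h, n_buses=5000, seed=7)
    print(h, round(statistics.mean(loads), 1))  # mean is about rate * headway
```

In this idealized setting the mean load simply scales with headway; the simulation becomes informative once schedule deviations, bunching, and capacity are included, since those change the spread of loads and not just the mean.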
We are in the process of implementing the modeling, simulation, and web interface components of the project. During discussions, a number of issues have been raised:
- The Planning Analytics team generates a set of route-level plots that inform their decisions regarding service recommendations. However, one of the key unobservables is the number of people left behind at stops. This quantity is correlated with, and caused by, bunching and crowding on buses, but it is distinct from them. Our simulation approach will allow us to estimate such latent variables and generate plots accordingly. Along these lines, we are interested in discovering additional plots that would be useful to the CTA team. We are also interested in proposing additional metrics (some of which may only be computable in the simulated setting) that give a more holistic view of the crowding situation.
- A simple example is that the current best practices use the 75th percentile of the loads on buses when estimating optimal headway. Is there some more descriptive measure that will lead to better headway estimation?
- Currently, generating schedules in the required GTFS format takes a large amount of work. However, given a schedule, we can ask how changes to headway (the time between consecutive buses) affect crowding and bunching. Our web interface will allow the CTA team to adjust headways and see how these de-crowding metrics respond. This raises the question: what other parameters could we include as adjustable factors to give a more dynamic picture of crowding?
- The simulation-based approach lets us answer more questions. For example, our simulations will allow us to identify the demand shocks that correlate most strongly with schedule deviations. Proactively understanding the factors that lead to poor on-time performance will allow Operations to alter bus patterns more efficiently. We wish to find other questions that the simulations can answer effectively.
- Ex. We can study specific schedule changes and assess whether the parameter estimates are affected. Could a decrease in the estimated rate parameters be used to identify schedule changes that lead to drastic changes in demand at a given location?
- Ex. Are there bus substitutes? That is, are there bus stops where individuals are indifferent among several buses? If so, we will see negative correlations between ridership on these buses at that stop. We can try to capture this in the historical data, which would help us understand the impact of removing a bus route on the overall system.
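For the left-behind passengers and the 75th-percentile load raised in the list above, a single-stop simulation with a capacity limit shows how a latent quantity falls out of the simulation directly. The sketch assumes regular headways, Poisson arrivals, and a fixed number of free places per bus; the function name is hypothetical, and the "share of buses full" and "mean left behind" figures are illustrative candidate metrics, not CTA definitions.

```python
import random
import statistics
from typing import List, Optional, Tuple

def simulate_stop_with_capacity(rate_per_min: float, headway_min: float, capacity: int,
                                n_buses: int, seed: Optional[int] = None
                                ) -> Tuple[List[int], List[int]]:
    """Boardings per bus and passengers left behind at one stop.

    Hypothetical simplification: regular headways, Poisson arrivals, and each
    bus reaching the stop with `capacity` free places. Passengers who cannot
    board wait for the next bus and are counted as left behind.
    """
    rng = random.Random(seed)
    waiting = 0
    boardings, left_behind = [], []
    for _ in range(n_buses):
        # New arrivals during one headway (Poisson, via exponential gaps).
        t = rng.expovariate(rate_per_min)
        while t < headway_min:
            waiting += 1
            t += rng.expovariate(rate_per_min)
        board = min(waiting, capacity)
        waiting -= board
        boardings.append(board)
        left_behind.append(waiting)  # latent: not observed in APC data
    return boardings, left_behind

capacity = 12
boardings, left = simulate_stop_with_capacity(1.0, 10.0, capacity, n_buses=5000, seed=11)
p75 = statistics.quantiles(boardings, n=4)[2]  # 75th-percentile load, as in current practice
share_full = sum(b == capacity for b in boardings) / len(boardings)
mean_left = statistics.mean(left)
print(p75, round(share_full, 2), round(mean_left, 2))
```

Reporting the percentile next to the left-behind count makes it visible when a load percentile looks acceptable while riders are still being stranded.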
Note: These are possible directions that could be taken with the project after this summer. We would like the CTA to identify which directions are most promising so we can keep these in mind when producing the initial implementation this summer.
The simulation provides a natural platform on which we can build additional analytical tools to further the CTA's understanding of crowding. For example, the simulation can be adjusted to study the impact of route removal.
Answering some of these questions would require a more complex model capable of handling additional intricacies of the system. Factors include:
- Adding interdependencies / network effects
- Identify when customers can / do substitute one route for another (bring in payment data)
- One trip being late causes the next trip on that block / run to be late as well
- Add in other transportation options.
- Trains first.
- More demand factors (events, weather)
- More service factors (operator tardiness, bus breakdowns, traffic, construction)
- Feedback: level of service -> demand
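The block/run interdependency listed above (one late trip making the next trip on that block late) could be modeled with a simple carry-over rule. This sketch assumes, hypothetically, that the scheduled layover between trips absorbs delay up to its length and that any excess delays the next departure; the function and its inputs are illustrative, not an existing CTA model.

```python
from typing import List, Tuple

def propagate_block_delays(trip_delays: List[float],
                           layover_min: float) -> List[Tuple[float, float]]:
    """Start and end delay (minutes) for each consecutive trip on one block.

    `trip_delays[i]` is the extra running time accrued during trip i
    (negative = ran early). Hypothetical rule: the next trip starts late
    only by the amount the previous trip's end delay exceeds the layover.
    """
    results = []
    carried = 0.0  # the first trip is assumed to depart on time
    for incurred in trip_delays:
        start = carried
        end = start + incurred
        results.append((start, end))
        carried = max(0.0, end - layover_min)  # layover absorbs up to layover_min
    return results

print(propagate_block_delays([8.0, 2.0, -1.0], layover_min=5.0))
# [(0.0, 8.0), (3.0, 5.0), (0.0, -1.0)]
```

Combined with the per-trip deviation model, a rule like this would let lateness on one trip feed into the next trip's starting deviation.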
Another future goal would be to leverage this platform to inform real-time operations. For example, we can ask which factors are strong signals of imminent crowding and bunching. If these signals were incorporated into the existing CTA system, bus supervisors could alter schedules and headways accordingly. At the very least, the simulations can inform where supervisors should be stationed. An ambitious goal would be to update the model continuously so that Operations can use it to inform day-to-day decisions. This would be more useful once two-way communication with bus drivers is available.
Other future directions include:
- Optimization
- Link with GPS data from phone apps to find customers' actual origin and destination
- Deeper analysis, e.g. how much does actual reachability differ from what is calculated with GTFS?