This package allows you to create various layers directly from a GTFS feed and visualize the results in the most straightforward way possible. It is still in its testing phase.
- Installation
- GTFS parsing
- Stop frequencies
- Line frequencies
- Cut in Bus segments
- Speeds
- Segment frequencies
- Export/Save your work
- Mapping the results
- Other plots
!pip install gtfs_functions
import gtfs_functions as gtfs
The function import_gtfs takes the path or the zip file as an argument and returns five dataframes/geodataframes.
routes, stops, stop_times, trips, shapes = gtfs.import_gtfs(r"C:\Users\santi\Desktop\Articles\SFMTA_GTFS.zip")
routes.head(2)
| | route_id | agency_id | route_short_name | route_long_name | route_desc | route_type | route_url | route_color | route_text_color |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 15761 | SFMTA | 1 | CALIFORNIA | | 3 | https://SFMTA.com/1 | | |
| 1 | 15766 | SFMTA | 5 | FULTON | | 3 | https://SFMTA.com/5 | | |
stops.head(2)
| | stop_id | stop_code | stop_name | stop_desc | zone_id | stop_url | geometry |
|---|---|---|---|---|---|---|---|
| 0 | 390 | 10390 | 19th Avenue & Holloway St | | | | POINT (-122.47510 37.72119) |
| 1 | 3016 | 13016 | 3rd St & 4th St | | | | POINT (-122.38979 37.77262) |
stop_times.head(2)
trip_id | arrival_time | departure_time | stop_id | stop_sequence | stop_headsign | pickup_type | drop_off_type | shape_dist_traveled | route_id | service_id | direction_id | shape_id | stop_code | stop_name | stop_desc | zone_id | stop_url | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9413147 | 81840.0 | 81840.0 | 4015 | 1 | NaN | NaN | 15761 | 1 | 0 | 179928 | 14015 | Clay St & Drumm St | POINT (-122.39682 37.79544) | |||||
1 | 9413147 | 81902.0 | 81902.0 | 6294 | 2 | NaN | NaN | 15761 | 1 | 0 | 179928 | 16294 | Sacramento St & Davis St | POINT (-122.39761 37.79450) |
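The arrival_time and departure_time values in these rows look like seconds since midnight rather than clock strings. That is an inference from the sample output, not something stated by the package. A minimal sketch for turning them back into readable times, using a hypothetical helper called seconds_to_hms:

```python
def seconds_to_hms(seconds):
    """Format seconds since midnight as HH:MM:SS.

    Hypothetical helper, not part of gtfs_functions. GTFS allows times
    past 24:00:00 for trips that run after midnight, so hours are not wrapped.
    """
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# The two sample rows of stop_times shown above
print(seconds_to_hms(81840.0))  # 22:44:00
print(seconds_to_hms(81902.0))  # 22:45:02
```

The 62-second gap between the two stops matches the runtime_h of 0.017222 hours reported for the same segment in the speeds output.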
trips.head(2)
| | trip_id | route_id | service_id | direction_id | shape_id |
|---|---|---|---|---|---|
| 0 | 9547346 | 15804 | 1 | 0 | 180140 |
| 1 | 9547345 | 15804 | 1 | 0 | 180140 |
shapes.head(2)
| | shape_id | geometry |
|---|---|---|
| 0 | 179928 | LINESTRING (-122.39697 37.79544, -122.39678 37... |
| 1 | 179929 | LINESTRING (-122.39697 37.79544, -122.39678 37... |
This function will create a geodataframe with the frequency for each combination of stop, time of day and direction. Each row has a Point geometry. The stops_freq function takes stop_times and stops created in the previous steps as arguments. The user can optionally specify cutoffs as a list if the default does not fit. These cutoffs are the hours of the day that delimit the aggregation windows.
cutoffs = [0,6,9,15.5,19,22,24]
stop_freq = gtfs.stops_freq(stop_times, stops, cutoffs = cutoffs)
stop_freq.head()
| | stop_id | dir_id | window | ntrips | frequency | max_trips | max_freq | stop_name | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 8157 | 5763 | Inbound | 0:00-6:00 | 1 | 360 | 5 | 12 | Noriega St & 48th Ave | POINT (-122.50785 37.75293) |
| 13102 | 7982 | Outbound | 0:00-6:00 | 1 | 360 | 3 | 20 | Moscow St & RussiaAvet | POINT (-122.42996 37.71804) |
| 9539 | 6113 | Inbound | 0:00-6:00 | 1 | 360 | 5 | 12 | Portola Dr & Laguna Honda Blvd | POINT (-122.45526 37.74310) |
| 12654 | 7719 | Inbound | 0:00-6:00 | 1 | 360 | 5 | 12 | Middle Point & Acacia | POINT (-122.37952 37.73707) |
| 9553 | 6116 | Inbound | 0:00-6:00 | 1 | 360 | 5 | 12 | Portola Dr & San Pablo Ave | POINT (-122.46107 37.74040) |
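To make the role of cutoffs concrete, here is a small sketch of how a list of cutoff hours can be paired into the window labels seen in the output. The helper names are hypothetical and the label format for fractional hours (15.5 becoming 15:30) is an assumption; only labels such as 0:00-6:00 are confirmed by the output above.

```python
def hour_label(value):
    """Turn a decimal hour such as 15.5 into an 'H:MM' string (assumed format)."""
    hours = int(value)
    minutes = int(round((value - hours) * 60))
    return f"{hours}:{minutes:02d}"


def windows_from_cutoffs(cutoffs):
    """Pair consecutive cutoffs into 'start-end' window labels."""
    return [
        f"{hour_label(start)}-{hour_label(end)}"
        for start, end in zip(cutoffs, cutoffs[1:])
    ]


cutoffs = [0, 6, 9, 15.5, 19, 22, 24]
print(windows_from_cutoffs(cutoffs))
# ['0:00-6:00', '6:00-9:00', '9:00-15:30', '15:30-19:00', '19:00-22:00', '22:00-24:00']
```

A list of n cutoffs yields n - 1 windows, so the last cutoff only closes the final window.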
This function will create a geodataframe with the frequency for each combination of line, time of day and direction. Each row has a LineString geometry. The lines_freq function takes stop_times, trips, shapes and routes created in the previous steps as arguments. The user can optionally specify cutoffs as a list if the default does not fit. These cutoffs are the hours of the day that delimit the aggregation windows.
cutoffs = [0,6,9,15.5,19,22,24]
line_freq = gtfs.lines_freq(stop_times, trips, shapes, routes, cutoffs = cutoffs)
line_freq.head()
| | route_id | route_name | dir_id | window | frequency | ntrips | max_freq | max_trips | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 376 | 15808 | 44 O'SHAUGHNESSY | Inbound | 0:00-6:00 | 360 | 1 | 12 | 5 | LINESTRING (-122.46459 37.78500, -122.46352 37... |
| 378 | 15808 | 44 O'SHAUGHNESSY | Inbound | 0:00-6:00 | 360 | 1 | 12 | 5 | LINESTRING (-122.43416 37.73355, -122.43299 37... |
| 242 | 15787 | 25 TREASURE ISLAND | Inbound | 0:00-6:00 | 360 | 1 | 15 | 4 | LINESTRING (-122.39611 37.79013, -122.39603 37... |
| 451 | 15814 | 54 FELTON | Inbound | 0:00-6:00 | 360 | 1 | 20 | 3 | LINESTRING (-122.38845 37.73994, -122.38844 37... |
| 241 | 15787 | 25 TREASURE ISLAND | Inbound | 0:00-6:00 | 360 | 1 | 15 | 4 | LINESTRING (-122.39542 37.78978, -122.39563 37... |
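In the sample rows, frequency equals the window length in minutes divided by ntrips (360 minutes with 1 trip gives 360), so it reads as an average headway in minutes rather than trips per hour. Similarly, max_freq matches 60 divided by max_trips, which suggests a headway for the busiest hour. Both readings are inferred from the output and are not confirmed by the package. A minimal sketch with hypothetical helper names:

```python
def window_minutes(window):
    """Length in minutes of a window label such as '0:00-6:00'."""
    def to_minutes(clock):
        hours, minutes = clock.split(":")
        return int(hours) * 60 + int(minutes)

    start, end = window.split("-")
    return to_minutes(end) - to_minutes(start)


def headway_minutes(window, ntrips):
    """Average minutes between trips within the window (assumed definition)."""
    return window_minutes(window) / ntrips


print(headway_minutes("0:00-6:00", 1))   # 360.0
print(headway_minutes("6:00-9:00", 12))  # 15.0
```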
The function cut_gtfs takes stop_times, stops and shapes created by import_gtfs as arguments and returns a geodataframe where each segment is a row and has a LineString geometry.
segments_gdf = gtfs.cut_gtfs(stop_times, stops, shapes)
segments_gdf.head(2)
| | route_id | direction_id | stop_sequence | start_stop_name | end_stop_name | start_stop_id | end_stop_id | segment_id | shape_id | geometry | distance_m |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15761 | 0 | 1 | Clay St & Drumm St | Sacramento St & Davis St | 4015 | 6294 | 4015-6294 | 179928 | LINESTRING (-122.39697 37.79544, -122.39678 37... | 205.281653 |
| 1 | 15761 | 0 | 2 | Sacramento St & Davis St | Sacramento St & Battery St | 6294 | 6290 | 6294-6290 | 179928 | LINESTRING (-122.39761 37.79446, -122.39781 37... | 238.047505 |
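The distance_m column presumably holds the length of each segment's LineString in meters. How cut_gtfs computes it is not shown here, so the following is only a sketch of one common approach: summing haversine distances between consecutive (lon, lat) vertices. The function names and the Earth radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def polyline_length_m(coords):
    """Sum the distances between consecutive (lon, lat) vertices."""
    return sum(haversine_m(*a, *b) for a, b in zip(coords, coords[1:]))


# Straight line between the first vertices of the two sample segments above.
# The real segment follows the street network, so its distance_m is longer.
straight = polyline_length_m([(-122.39697, 37.79544), (-122.39761, 37.79446)])
print(round(straight, 1))
```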
This function will create a geodataframe with the speed_kmh and speed_mph for each combination of line, segment, time of day and direction. Each row has a LineString geometry. The function speeds_from_gtfs takes routes, stop_times and segments_gdf created in the previous steps as arguments. The user can optionally specify cutoffs as a list if the default does not fit. These cutoffs are the hours of the day that delimit the aggregation windows.
# Cutoffs to get hourly values
cutoffs = list(range(24))
speeds = gtfs.speeds_from_gtfs(routes, stop_times, segments_gdf, cutoffs = cutoffs)
speeds.head(1)
| | route_id | route_name | dir_id | segment_id | window | speed_kmh | s_st_id | s_st_name | e_st_id | e_st_name | distance_m | stop_seq | runtime_h | max_kmh | geometry | speed_mph | max_mph |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15761 | 1 CALIFORNIA | Inbound | 4015-6294 | 10:00-11:00 | 12.0 | 4015 | Clay St & Drumm St | 6294 | Sacramento St & Davis St | 205.281653 | 1 | 0.017222 | 12.0 | LINESTRING (-122.39697 37.79544, -122.39678 37... | 7.456452 | 7.456452 |
speeds.loc[(speeds.segment_id=='3114-3144')&(speeds.window=='0:00-6:00')]
| | route_id | route_name | dir_id | segment_id | window | speed_kmh | s_st_id | s_st_name | e_st_id | e_st_name | distance_m | stop_seq | runtime_h | max_kmh | geometry | speed_mph | max_mph |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11183 | 15792 | 30 STOCKTON | Inbound | 3114-3144 | 0:00-6:00 | 12.8 | 3114 | 3rd St & Brannan St | 3144 | 3rd St & Bryant St | 373.952483 | 2 | 0.028565 | 13.0 | LINESTRING (-122.39323 37.77923, -122.39431 37... | 7.953549 | 8.077823 |
| 16862 | 15809 | 45 UNION-STOCKTON | Inbound | 3114-3144 | 0:00-6:00 | 13.0 | 3114 | 3rd St & Brannan St | 3144 | 3rd St & Bryant St | 373.952483 | 2 | 0.027778 | 13.0 | LINESTRING (-122.39323 37.77923, -122.39431 37... | 8.077823 | 8.077823 |
| 19889 | 15831 | 91 3RD-19TH AVE OWL | Outbound | 3114-3144 | 0:00-6:00 | 17.0 | 3114 | 3rd St & Brannan St | 3144 | 3rd St & Bryant St | 373.952483 | 56 | 0.021667 | 17.0 | LINESTRING (-122.39323 37.77923, -122.39431 37... | 10.563307 | 10.563307 |
| 22823 | ALL_LINES | All lines | NA | 3114-3144 | 0:00-6:00 | 15.2 | 3114 | 3rd St & Brannan St | 3144 | 3rd St & Bryant St | 373.952483 | 2 | 0.024511 | NaN | LINESTRING (-122.39323 37.77923, -122.39431 37... | 9.444839 | NaN |
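The sample rows are consistent with a simple relationship: the scheduled speed is roughly the segment distance divided by the runtime, and speed_mph is speed_kmh multiplied by 0.621371. The reported speed_kmh values appear to be rounded or averaged over trips, so the raw division does not reproduce them exactly. This is a sketch inferred from the output, with hypothetical helper names, not the package's own implementation.

```python
KMH_TO_MPH = 0.621371  # conversion factor consistent with the sample rows


def speed_kmh(distance_m, runtime_h):
    """Scheduled speed in km/h from a distance in meters and a runtime in hours."""
    return (distance_m / 1000) / runtime_h


def kmh_to_mph(kmh):
    """Convert km/h to mph."""
    return kmh * KMH_TO_MPH


# First sample row: 205.281653 m covered in 0.017222 h (about 62 seconds)
raw = speed_kmh(205.281653, 0.017222)
print(round(raw, 2))               # 11.92, reported as 12.0 in the table
print(round(kmh_to_mph(12.0), 6))  # 7.456452, matching speed_mph
```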
cutoffs = [0,6,9,15.5,19,22,24]
seg_freq = gtfs.segments_freq(segments_gdf, stop_times, routes, cutoffs = cutoffs)
seg_freq.head(2)
| | route_id | route_name | dir_id | segment_id | window | frequency | ntrips | s_st_id | s_st_name | e_st_name | max_freq | max_trips | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23191 | ALL_LINES | All lines | NA | 3628-3622 | 0:00-6:00 | 360 | 1 | 3628 | Alemany Blvd & St Charles Ave | Alemany Blvd & Arch St | 20 | 18 | LINESTRING (-122.46949 37.71045, -122.46941 37... |
| 6160 | 15787 | 25 TREASURE ISLAND | Inbound | 7948-8017 | 0:00-6:00 | 360 | 1 | 7948 | Transit Center Bay 29 | Shoreline Access Road | 15 | 4 | LINESTRING (-122.39611 37.79013, -122.39603 37... |
seg_freq.loc[(seg_freq.segment_id=='3114-3144')&(seg_freq.window=='0:00-6:00')]
| | route_id | route_name | dir_id | segment_id | window | frequency | ntrips | s_st_id | s_st_name | e_st_name | max_freq | max_trips | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10566 | 15809 | 45 UNION-STOCKTON | Inbound | 3114-3144 | 0:00-6:00 | 120 | 3 | 3114 | 3rd St & Brannan St | 3rd St & Bryant St | 12 | 5 | LINESTRING (-122.39323 37.77923, -122.39431 37... |
| 7604 | 15792 | 30 STOCKTON | Inbound | 3114-3144 | 0:00-6:00 | 60 | 6 | 3114 | 3rd St & Brannan St | 3rd St & Bryant St | 12 | 5 | LINESTRING (-122.39323 37.77923, -122.39431 37... |
| 13209 | 15831 | 91 3RD-19TH AVE OWL | Outbound | 3114-3144 | 0:00-6:00 | 30 | 12 | 3114 | 3rd St & Brannan St | 3rd St & Bryant St | 30 | 2 | LINESTRING (-122.39323 37.77923, -122.39431 37... |
| 16580 | ALL_LINES | All lines | NA | 3114-3144 | 0:00-6:00 | 17 | 21 | 3114 | 3rd St & Brannan St | 3rd St & Bryant St | 5 | 59 | LINESTRING (-122.39323 37.77923, -122.39431 37... |
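The "All lines" row for segment 3114-3144 can be checked by hand against the three route rows: the trips add up (3 + 6 + 12 = 21) and the combined headway is the 360-minute window divided by that total, rounded. The sketch below reproduces that arithmetic; it is a reading of the sample output, not the package's aggregation code.

```python
# ntrips per route on segment 3114-3144 in the 0:00-6:00 window (from the table above)
rows = [
    {"route_name": "45 UNION-STOCKTON", "ntrips": 3},
    {"route_name": "30 STOCKTON", "ntrips": 6},
    {"route_name": "91 3RD-19TH AVE OWL", "ntrips": 12},
]

WINDOW_MINUTES = 6 * 60  # the 0:00-6:00 window

total_trips = sum(row["ntrips"] for row in rows)
combined_headway = round(WINDOW_MINUTES / total_trips)

print(total_trips, combined_headway)  # 21 17
```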
file_name = 'stop_frequencies'
# stop_freq is the geodataframe created by gtfs.stops_freq above
gtfs.save_gdf(stop_freq, file_name, shapefile=True, geojson=True)
# Stops
condition_dir = stop_freq.dir_id == 'Inbound'
condition_window = stop_freq.window == '6:00-9:00'
gdf = stop_freq.loc[(condition_dir & condition_window),:].reset_index()
gtfs.map_gdf(gdf = gdf,
             variable = 'ntrips',
             colors = ["#d13870", "#e895b3" ,'#55d992', '#3ab071', '#0e8955','#066a40'],
             tooltip_var = ['frequency'] ,
             tooltip_labels = ['Frequency: '],
             breaks = [10, 20, 30, 40, 120, 200])
# Line frequencies
condition_dir = line_freq.dir_id == 'Inbound'
condition_window = line_freq.window == '6:00-9:00'
gdf = line_freq.loc[(condition_dir & condition_window),:].reset_index()
gtfs.map_gdf(gdf = gdf,
             variable = 'ntrips',
             colors = ["#d13870", "#e895b3" ,'#55d992', '#3ab071', '#0e8955','#066a40'],
             tooltip_var = ['route_name'] ,
             tooltip_labels = ['Route: '],
             breaks = [5, 10, 20, 50])
If you are looking to visualize data at the segment level for all lines I recommend you go with something more powerful like kepler.gl (AKA my favorite data viz library). For example, to check the scheduled speeds per segment:
# Speeds
import keplergl as kp
# KeplerGl takes `data` as a dict mapping each dataset name to its (geo)dataframe
m = kp.KeplerGl(data={'Speed Lines': speeds}, height=400)
m
# Segment frequencies
import keplergl as kp
# KeplerGl takes `data` as a dict mapping each dataset name to its (geo)dataframe
m = kp.KeplerGl(data={'Segment frequency': seg_freq}, height=400)
m
# Histogram
import plotly.express as px
px.histogram(
    stop_freq.loc[stop_freq.frequency<50],
    x='frequency',
    title='Stop frequencies',
    template='simple_white',
    nbins=20)
# Heatmap
import plotly.express as px  # needed for the px.colors color scale
import plotly.graph_objects as go
dir_0 = speeds.loc[(speeds.dir_id=='Inbound')&(speeds.route_name=='1 CALIFORNIA')].sort_values(by='stop_seq')
# Extract the starting hour of each window so rows can be ordered chronologically
dir_0['hour'] = dir_0.window.apply(lambda x: int(x.split(':')[0]))
dir_0.sort_values(by='hour', ascending=True, inplace=True)
fig = go.Figure(data=go.Heatmap(
    z=dir_0.speed_kmh,
    y=dir_0.s_st_name,
    x=dir_0.window,
    hoverongaps=False,
    colorscale=px.colors.colorbrewer.RdYlBu,
    reversescale=False
))
fig.update_yaxes(title_text='Stop', autorange='reversed')
fig.update_xaxes(title_text='Hour of day', side='top')
fig.update_layout(showlegend=False, height=600, width=1000,
                  title='Speed heatmap per direction and hour of the day')
fig.show()
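The hour column is extracted because window labels are strings, and sorting them as text would put 10:00 before 6:00. A short illustration with made-up window labels (start_hour is a hypothetical helper that mirrors the lambda used in the plotting code):

```python
windows = ["10:00-11:00", "6:00-7:00", "0:00-1:00", "23:00-24:00"]


def start_hour(window):
    """Starting hour of a window label, e.g. '6:00-7:00' -> 6."""
    return int(window.split(":")[0])


print(sorted(windows))
# ['0:00-1:00', '10:00-11:00', '23:00-24:00', '6:00-7:00']
print(sorted(windows, key=start_hour))
# ['0:00-1:00', '6:00-7:00', '10:00-11:00', '23:00-24:00']
```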
# Mean and standard deviation of speed per time window
by_hour = speeds.pivot_table('speed_kmh', index = ['window'], aggfunc = ['mean','std'] ).reset_index()
# Flatten the two-level column names, e.g. ('mean', 'speed_kmh') -> 'mean_speed_kmh'
by_hour.columns = ['_'.join(col).strip() for col in by_hour.columns.values]
by_hour['hour'] = by_hour.window_.apply(lambda x: int(x.split(':')[0]))
by_hour.sort_values(by='hour', ascending=True, inplace=True)
# Line chart of mean speed by window
fig = px.line(by_hour,
              x='window_',
              y='mean_speed_kmh',
              template='simple_white',
              #error_y = 'std_speed_kmh'
              )
fig.update_yaxes(rangemode='tozero')
fig.show()
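The odd-looking column name window_ comes from flattening two-level column names. After a pivot with a list of aggregation functions followed by reset_index, pandas typically labels the columns with tuples like the ones below (the exact tuples are an assumption here), and joining each tuple with an underscore leaves a trailing underscore where the second level is empty:

```python
# Column tuples as they would typically look after pivot_table(...).reset_index()
columns = [("window", ""), ("mean", "speed_kmh"), ("std", "speed_kmh")]

flat = ["_".join(col).strip() for col in columns]
print(flat)  # ['window_', 'mean_speed_kmh', 'std_speed_kmh']
```

Note that strip() only removes whitespace, so the trailing underscore stays; the later code relies on that name.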
# Line graphs
import plotly.graph_objects as go
example2 = speeds.loc[(speeds.s_st_name=='Fillmore St & Bay St')&(speeds.route_name=='All lines')].sort_values(by='stop_seq')
example2['hour'] = example2.window.apply(lambda x: int(x.split(':')[0]))
example2.sort_values(by='hour', ascending=True, inplace=True)
trace = go.Scatter(
    name='Speed',
    x=example2.hour,
    y=example2.speed_kmh,
    mode='lines',
    line=dict(color='rgb(31, 119, 180)'),
    fillcolor='#F0F0F0',
    fill='tonexty',
    opacity=0.5)
data = [trace]
layout = go.Layout(
    yaxis=dict(title='Average Speed (km/h)'),
    xaxis=dict(title='Hour of day'),
    title='Average Speed by hour of day in stop Fillmore St & Bay St',
    showlegend=False, template='simple_white')
fig = go.Figure(data=data, layout=layout)
# Get the labels in the X axis right
axes_labels = []
tickvals = example2.hour.unique()[::3][1:]
for i in range(0, len(tickvals)):
    label = str(tickvals[i]) + ':00'
    axes_labels.append(label)
fig.update_xaxes(
    ticktext=axes_labels,
    tickvals=tickvals
)
# Add vertical lines
y_max_value = example2.speed_kmh.max()
for i in range(0, len(tickvals)):
    fig.add_shape(
        # Line Vertical
        dict(
            type="line",
            x0=tickvals[i],
            y0=0,
            x1=tickvals[i],
            y1=y_max_value,
            line=dict(
                color="Grey",
                width=1
            )
        )
    )
# Labels in the edge values
for i in range(0, len(tickvals)):
    y_value = example2.loc[example2.hour==tickvals[i], 'speed_kmh'].values[0].round(2)
    fig.add_annotation(
        x=tickvals[i],
        y=y_value,
        text=str(y_value),
    )
fig.update_annotations(dict(
    xref="x",
    yref="y",
    showarrow=True,
    arrowhead=0,
    ax=0,
    ay=-18
))
fig.update_yaxes(rangemode='tozero')
fig.show()