More than two flex zones in the same trip #76
Comments
It could be either, and we don't know based on the data itself. Because flex allows these windows to overlap and doesn't have any other data element indicating the actual expected flow of operating vehicles, the consumer of static data can't know which zones will be visited during a trip in this use case. I don't think this matters from a product perspective. To the user, we don't need to say anything that we don't already need to say ("we don't know the path of the vehicle" is already true when we're just talking about one zone). If the software practically speaking needs to answer this question, I think any answer is acceptable as long as documented and as long as we hide that information from the user. |
Adding to what @tsherlockcraig said, since Flex is not in the business of prescriptively defining paths of travel (because, as he said, we cannot know what the actual trip will be), the data then can only define the boundaries of what origin/destination cases are possible with the service. In any case, on an operational level I can't think of why the bus would always (i.e., prescriptively) go out of its way to Zone 2 before Zone 3 unless there was a fixed stop it always served, in which case that could and should be reflected in the data. If the bus did behave in such a way (going to Zone 2 just to fulfill that unknown requirement), the closest thing to accommodate such a behavior in the data is adding offsets for those trips so the estimated travel time is more reflective of reality. As a side note, if I'm interpreting correctly that the service allows any direction of travel between these zones, and intrazone travel for each, I would probably just use a trip for each case instead of grouping multiple cases in one trip just to be as explicit as possible: Lastly:
We actually do encounter a fair amount of multi-zone services with all sorts of directionality restrictions in the U.S., including ones with more than 2 zones. I'd be happy to point you to some examples if you're curious! |
@leonardehrenfried, @westontrillium, @tsherlockcraig et al. The summary of the problem with the current schema is that although transitions from zones to zones can be described via trips, the schema does not require the data producer to provide a "completely specified graph" of the possible transitions. However, a completely specified graph is what a routing algorithm needs in order to perform any kind of routing, and those are the gaps experienced by our OTP developing friends. When @westontrillium says, "just to be as explicit as possible", it is that level of having to address all the allowable transitions among zones that is currently missing from GTFS-flex v2 best practices, along with the enforcement of some crucial attributes to describe the time bounds on those allowable transitions. (Please note that I did not say you have to specify transitions that are not allowed. So long as we require that all allowable zone-to-zone transitions are specified, anything not explicitly named can then be assumed to be not allowed. The current issue is that when something is not specified, @leonardehrenfried does not know whether to interpret it as not allowed, or allowed by default. And since scenarios are different with different providers, we should not make either one a default.) Now I'd like to provide the more graph-specific explanation. It comes squarely from the routing use-case perspective, which I believe to be aligned with the purpose of GTFS-flex v2 as a routing-first schema (or so I was told when I was trying to suggest some proposals that came from an accessibility-first lens)... So, on routing: routing algorithms (even for fuzzy routing like flex) fundamentally require a fully specified graph. With GTFS-static, what happens inside the "nodes" is precisely a time-bounded stop. 
With GTFS-flex, we can interpret zones to be "nodes" in the graph, and then what happens inside the zone is governed by the type of flex service that is provided there. The routing algorithm can break down routing to What I'm suggesting is that GTFS-flex v2 will never, as suggested by @tsherlockcraig, provide a fully specified graph within the zone. You can read more as to why this would be necessary from routing algorithms, coming from graph theory. But you can skip everything below if you wish. Thanks! Anat & the Taskar team. In graph theory, a graph is considered "fully specified" when all of its essential characteristics and properties have been explicitly defined or provided. These essential characteristics typically include: Edges (Links): The graph should define all the edges (or links) that connect the nodes. For each edge, it should specify the two nodes it connects and any associated attributes or weights. Other things required for full graph specifications: Any Additional Attributes: Depending on the specific application, a fully specified graph may also include other attributes associated with nodes or edges, such as labels, colors, or metadata. I would say that this would be the role of documentation and best practices, as I understood was suggested by @tsherlockcraig in the MobilityData conversation, Bottom line is this: Thanks for reading thus far. |
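To make the "zones as nodes, trips as edges" reading concrete, here is a minimal sketch of how a consumer might derive the set of allowable zone-to-zone transitions from stop_times records. It assumes the interpretation discussed in this thread: a transition is allowed only when a trip lists the origin before the destination, the origin record permits pickup, and the destination record permits drop-off (a value of 1 in `pickup_type` or `drop_off_type` means not available). The function name `allowed_transitions`, the zone IDs, and the sample rows are hypothetical and do not come from any real feed or from OTP.

```python
from collections import defaultdict

# In GTFS, pickup_type / drop_off_type == 1 means "not available".
NOT_AVAILABLE = 1


def allowed_transitions(stop_times):
    """Return the set of (origin_zone, destination_zone) pairs a feed allows.

    stop_times is an iterable of dicts with the keys trip_id, stop_sequence,
    stop_id, pickup_type and drop_off_type. Anything not returned here is
    treated as not allowed (assumption taken from the discussion above).
    """
    rows_by_trip = defaultdict(list)
    for row in stop_times:
        rows_by_trip[row["trip_id"]].append(row)

    edges = set()
    for rows in rows_by_trip.values():
        rows.sort(key=lambda r: r["stop_sequence"])
        for i, origin in enumerate(rows):
            if origin["pickup_type"] == NOT_AVAILABLE:
                continue
            # Only records later in the trip can serve as the drop-off.
            for destination in rows[i + 1:]:
                if destination["drop_off_type"] != NOT_AVAILABLE:
                    edges.add((origin["stop_id"], destination["stop_id"]))
    return edges


# Hypothetical three-zone trip, for illustration only.
sample_stop_times = [
    {"trip_id": "t1", "stop_sequence": 1, "stop_id": "zone1",
     "pickup_type": 2, "drop_off_type": 1},
    {"trip_id": "t1", "stop_sequence": 2, "stop_id": "zone2",
     "pickup_type": 2, "drop_off_type": 2},
    {"trip_id": "t1", "stop_sequence": 3, "stop_id": "zone3",
     "pickup_type": 1, "drop_off_type": 2},
]

edges = allowed_transitions(sample_stop_times)
```

Under these assumptions the sample yields edges only in the forward direction (zone1 to zone2, zone1 to zone3, zone2 to zone3), and no intrazone travel, because no zone appears in two records of the same trip.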
The current schema definitely needs further specification, but I think the needed explicit identification of edges between nodes can be attained more easily by adding a further description to stops.stop_id. I've suggested the edit on the GTFS-flex PR here: https://github.com/google/transit/pull/388/files/e359750cb2d4a496cf96bdbb1c6e30a73b3fb59f..2efafbfe2b91e0b99313df2391adb3fbc9121861#diff-3ecf0760eb54b4953728042a1e30586705dc2335807be94faae0de5829cd12a1 I'm very interested in @leonardehrenfried 's take here. In previous discussion, it's seemed like we have bugs/unfinished logic in the current OTP implementation, but a path to resolve the issues we've run into so far that's in the works. (Current PR for OTP linked in Leonard's first comment.) |
Yes, it would be desirable for consumers to have as much detailed information about the service as possible. However, a spec like flex is always a compromise between the needs of the consumer and the producer. Overall, GTFS is optimized for being easy to produce at the cost of some ambiguity for the consumers. This strikes me as the correct balance as the consumers tend to have higher technical skills than the producers and are happy to "suck up" the complexity if that means that they get any data at all. If we put more burden on the producers that would probably mean that we get fewer data feeds. If you want to have a richer set of tools for describing fixed or flexible transit, I encourage you to check out NeTEX. This definitely enables you to describe services in excruciating detail, however barriers to creating these feeds are very high as the complexity is orders of magnitude greater than GTFS. Personally, I prefer a well done GTFS feed over a poorly implemented NeTEX one. Since NeTEX is a lot more complex it's way easier to make a mess.
OTP implementation
About @tsherlockcraig's point about the implementation: yes, there are definitely gaps between what is possible in the spec and what OTP deals with well, however the majority of the current popular use cases are supported. Nevertheless, I expect there to be a long tail of issues and edge cases that I will have to deal with. Given the complexity of the expected results, I think this is a normal process. Regarding this particular issue, since we didn't have a service with more than one zone, OTP didn't have an implementation for it. With opentripplanner/OpenTripPlanner#5376 this is about to change. |
Thanks for your insightful discussion! Totally agree that we need to further clarify how to model different use cases appropriately. Regarding where these clarifications should go: MobilityData is gradually incorporating best practices into the spec. In other words, we are gradually introducing more "best practice-like" descriptions into the spec, just like this suggestion! So if you have any other suggestions for clarifications in the spec, please let us know. Additionally, we plan to create a flex data examples page in gtfs.org once Flex is adopted (similar to this Fares-v2 page). Please share any useful use cases/edge cases in MobilityData/gtfs.org issue#195, and we can add them on data examples page in the future. |
Hi again. TL;DR: There will always be tension between data producers & data consumers, but only the end-device-users pay the price for badly formed schemas. We need to produce excellent, well formed schemas, and pair them with human-usable tools to support producers with the full data life-cycle of their streams (including data collecting, vetting, validation, distribution, update and maintenance). I'm hearing the need to "explicitly specify via a trip every allowable transition" (in the same way @westontrillium showed above) equated with "excruciating detail." That is an exaggeration. At some point, the producer needs to communicate what service they are provisioning, AND we need to give them easy to use, simple language, human-usable tools to express that. @leonardehrenfried says "Overall, GTFS is optimized for being easy to produce at the cost of some ambiguity for the consumers." We owe it to travelers to create well specified schemas to effectively communicate precise information about flexible services. Keep in mind that on-demand transportation users (at least in the U.S.) are enriched for traveler populations that have been travel-marginalized for decades already. Instead, we should support those schemas with human-usable data tooling ecosystems (tools for data collection, validation, vetting, maintenance and public stream distribution) to support good Flex data production, even by unskilled or lower-skilled service providers. |
End-device-users also pay if data is unproduced because standards are overspecified and introduce complexity not needed to achieve the relevant degree of specificity. There is no service that has been identified that cannot be explicitly defined within the current proposed specification (including this recent change). Is there some hypothetical service that might exist which cannot be specified, or is there an existing service which we have not considered? |
Maybe the "excruciating detail" line was a little harsh and I'm sorry if it came across as belittling your point, which I didn't want to. However, I still stand by my comment that adding more features to the spec doesn't result in more details being provided. |
I guess I don't see how you could avoid being explicit about what's allowable without either having invalid data or without producing an incomplete picture of the service, so I don't understand what issue is actually being raised (it's very possible it exists somewhere over my producer, non-developer head 🙂).
If it's not specified, it is not allowed. How could it be any other way? If the dataset says "Zone 1 to Zone 3" and nothing about a "Zone 2," as far as the data is concerned, Zone 2 doesn't exist. If the trip query falls within the parameters of what is specified, the trip is possible. This has always been the case with Flex; it's just that as our understanding of the required logic improved, so did our understanding of how best to structure particular cases with the spec. The earliest version of Flex v2 said you can just have a single stop_time record referring to itself for both pickup and drop-off. Then Flex v2 underwent what is probably its most significant evolution, which was two-fold: The requirement for travel to be expressed in consecutive stop_times (i.e., no intrazone travel in a single stop_time record) and specifying that Together, these new rules made ambiguity about what is allowed/not allowed impossible–at least impossible as far as I can determine. Taking the following example:
There is only one way to interpret this set of data because of the rules we have set for Flex.
Not allowed:
*We could alternatively express this scenario with its own |
@westontrillium Thank you for this explanation. Here is the explicit concern: Taking the following example:
and the reason this is associated with this particular issue that was opened by @leonardehrenfried are the questions regarding tripB tripC and tripD in my example-
I believe tripA is currently allowed, but should not be. The source of the problem is that the spec currently has no restriction requiring either pickup_type or drop_off_type to be 1 (no pickup/drop off). Bottom line: we should put in place appropriate restrictions in stop_times to ensure we are discretely describing zone transitions (edges in the graph) with no ambiguity. |
Yes, GTFS allows an origin/destination pair to skip I hadn't actually thought we needed to clarify skipping Regarding your tripA, I would actually disagree that there are three other interpretations; in fact I think there can only be one (rules around Allowed
Not allowed
As far as I can tell, there is no other way to interpret the data if following the spec 100%. The community have actually talked through this exact scenario before, which turned out to be an important turning point on whether or not to allow intrazone travel in a single stop_time. Suffice it to say, this case actually exists in the real world, and the current rules surrounding sequencing and intra/interzone travel actually make it easier and more intuitive to model: You can have that Zone2 stop_time work both for a drop-off from Zone1 and a pickup going to Zone3 ( |
Trying to capture possible clarifications here.
Based on previous discussions, it seems that when a trip planner provides an option from zoneA to zoneC, the trip planner would provide estimated travel times based on the direct way from zoneA to zoneC, i.e., "skipping" zoneB (stop_sequence = 5). Would adding something to the spec like - "If a trip's This previous comment also makes sense to me -
Regarding the tripA in this comment, the current flex spec states "Travel within the same stop area or GeoJSON location requires two records in stop_times.txt with the same stop_id." I agree there doesn't seem to be ambiguity. |
Adding more clarity is probably the right call. Perhaps something more like this that includes an example: "When providing routing or travel time between the origin and destination, data consumers should ignore intermediate stop_times.txt records that have
Consumers should not take Zone2 into consideration when providing routing or travel time for a trip from Zone1 to Zone3. That way we're capturing cases with a |
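As a rough illustration of the proposed clarification, the sketch below picks the pickup and drop-off records for a requested origin/destination pair and never consults the records in between. It is only one possible reading of the wording above, not OTP's implementation; `match_records` and the sample rows are hypothetical names and data.

```python
NOT_AVAILABLE = 1  # GTFS value meaning no pickup / no drop-off


def match_records(trip_rows, origin_zone, destination_zone):
    """Find the pickup record and the later drop-off record for one trip.

    Intermediate stop_times records are skipped entirely, so they have no
    influence on which records are matched (and, by extension, on any travel
    time a consumer would estimate between the two matched zones).
    Returns a (pickup_row, drop_off_row) tuple, or None if the trip does not
    serve this origin/destination pair.
    """
    rows = sorted(trip_rows, key=lambda r: r["stop_sequence"])
    for i, pickup in enumerate(rows):
        if pickup["stop_id"] != origin_zone:
            continue
        if pickup["pickup_type"] == NOT_AVAILABLE:
            continue
        for drop_off in rows[i + 1:]:
            if (drop_off["stop_id"] == destination_zone
                    and drop_off["drop_off_type"] != NOT_AVAILABLE):
                return pickup, drop_off
    return None


# Hypothetical trip visiting three zones in order.
trip_rows = [
    {"stop_sequence": 1, "stop_id": "Zone1", "pickup_type": 2, "drop_off_type": 2},
    {"stop_sequence": 2, "stop_id": "Zone2", "pickup_type": 2, "drop_off_type": 2},
    {"stop_sequence": 3, "stop_id": "Zone3", "pickup_type": 2, "drop_off_type": 2},
]

match = match_records(trip_rows, "Zone1", "Zone3")
```

Here `match` holds the Zone1 and Zone3 records; a consumer following the proposed wording would then estimate travel time directly between those two zones without routing via Zone2.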
I would also welcome the clarification suggested by @westontrillium and @tzujenchanmbd. |
I agree on that clarification. In addition, should the clarification state what happens if the trip requested from Zone 1 to Zone 3 falls between 8:00-10:00? I did not see an explicit mention of that. More explicitly, this allowable "skip" is also bound to a service time. So the transition in the graph from Zone 1 to Zone 3 is actually bound by service times that are different from the original ones mentioned for Zone 1
Thanks. |
I don't think the "skip" is bound in that way. The windows should only apply to their own rows. You could be picked up at 09:45 in Zone1 and dropped off in Zone3 at 10:05 because each record's window corresponds to its own pickup/drop-off rules, and the travel time of this hypothetical trip is such that the origin and destination each fall within a valid stop_time record. Example:
Assuming the travel time is 20 minutes, if the rider were to submit the same query but with the desired departure time of 09:30, no result would return because 9:50 (20 minutes from the departure time) does not fall within Zone3's drop-off window.* *Or, depending on the app, it may have the capability to show the next closest possible trip departing at 09:40 instead. |
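The window logic described above can be sketched as a simple check that the departure falls in the origin record's window and the estimated arrival falls in the destination record's window. The windows used here (Zone1 pickup 08:00-10:00, Zone3 drop-off 10:00-12:00) are assumptions chosen to be consistent with the times mentioned in this comment; the original example table is not reproduced, and the helper names are hypothetical.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def trip_is_possible(departure, travel_minutes, pickup_window, drop_off_window):
    """Check each time only against the window of its own stop_times record.

    pickup_window and drop_off_window are (start, end) tuples of 'HH:MM'
    strings; both ends are treated as inclusive (an assumption).
    """
    depart = to_minutes(departure)
    arrive = depart + travel_minutes
    pickup_start, pickup_end = (to_minutes(t) for t in pickup_window)
    drop_start, drop_end = (to_minutes(t) for t in drop_off_window)
    return pickup_start <= depart <= pickup_end and drop_start <= arrive <= drop_end


# Assumed windows, chosen to match the times discussed above.
ZONE1_PICKUP = ("08:00", "10:00")
ZONE3_DROP_OFF = ("10:00", "12:00")

at_0945 = trip_is_possible("09:45", 20, ZONE1_PICKUP, ZONE3_DROP_OFF)  # arrives 10:05
at_0930 = trip_is_possible("09:30", 20, ZONE1_PICKUP, ZONE3_DROP_OFF)  # arrives 09:50
at_0940 = trip_is_possible("09:40", 20, ZONE1_PICKUP, ZONE3_DROP_OFF)  # arrives 10:00
```

With a 20-minute estimate, the 09:45 and 09:40 departures pass and the 09:30 departure fails, matching the description above. A different travel-time estimate would shift which departures pass, which is the consumer-dependent behavior raised in the replies that follow.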
I find it unsettling that having a consistent "read" on the flex schema is dependent on the (possibly wildly) different time estimates downstream consuming routers would make regarding traversals from Zone 1 to Zone 3 at different times of day. Completely by coincidence, one of my teams was recently using Google and Here APIs to get (and compare) car traversal In the absence of consensus on this, at the least there ought to be a clarification that "a reasonable attempt at estimates should be made to buffer the time traversals between pick ups and drop offs in order to fall within the indicated stop_times." |
I understand the discomfort but isn't it the nature of this spec, which is called "flex" after all, that you cannot exactly predict how the trip is going to happen? In fact we can't even say if it's going to happen at all, just that, according to the data provided to us, it could happen. If we knew the exact route and times we wouldn't need the spec and could use static GTFS, couldn't we? |
If a service has these rules (buffer time between trips), shouldn't they put them in the data? There is nothing stopping a producer from using only short windows, several small areas or several trips to get the routing software to return the result they want. Or am I overlooking something? |
There's a difference between uncertainty and ambiguity. |
If I had to model such a service I would create several trips with the appropriately short windows at each zone rather than a single trip with very long windows. But most services that I've worked with are not planned to this level of detail. Most just work it out as they go along. |
@uwtcat Do There is also consensus on the clarification of default values in issue#73. We didn't include these fields in google/transit PR#388 simply because there are currently no consumers implementing these fields (as per the adoption tracker). We may think about including these fields if needed. |
I don't see a way to avoid this without specifying requirements for every factor in how consumers estimate travel time, which seems a bit of an overreach to me. Can we really enforce what base map an app is using? What routing algorithm? Whether it should factor in traffic? Realtime traffic? Closed roads? Weather? Anyway, I already get different results with fixed route transit between the various trip planning apps I use.
|
Sorry @tzujenchanmbd, I read on to @westontrillium's comments and just skipped the last sentence of yours! You can ignore the comment below. I am working on an OTP project that will be seeking to implement these soon.
I agree these are the values that clarify the boundary between what we're expecting consumer to know and what we're expecting producer to know, and that we're drawing the line in the right place. However @tzujenchanmbd I don't actually see those values as part of the PR on the GTFS repo--do we need to add those there? |
Tzu-Chen mentioned in his comment that the reason it's not in the PR is that there are no consumers for it. |
@leonardehrenfried Considering the adoption of GTFS-Flex, this repo is now out of date. Would you like to move this issue to google/transit before we close the repo? |
The issue itself is resolved but I would love it if there was an archive of the discussion somewhere that I can refer to when questions come up. |
We won't be deleting the repo per se, it will be in a read-only state. |
Then I would say that it can stay exactly where it is. |
While making the feed for the Catholic Community Services of Western Washington work in OTP, we hit upon an interesting edge case: what is the expected travel time for services that have more than 2 zones in their stop times which have a time window?
Let me illustrate what I mean with a sketch:
Here we have three zones which all have pretty long time windows. In stop_times they are listed in order and you can get on or off in all zones.
Now, if you're planning a trip from Zone 1 to Zone 3 is the expectation that the trip will always go via zone 2? Or can the shortest path from 1 to 3 be used?
Since services which have more than one zone are pretty rare (I only know CCSWW) and right now the case with more than 2 zones is theoretical I'm wondering what the correct router behavior should be.
Has anyone thought about this yet?
cc @t2gran @vpaturet @tsherlockcraig @jon-campbell-ibigroup @westontrillium