Implement threshold for triggering trip_distance_exceeds_shape_distance #1613

emmambd · 2023-11-27T18:53:38Z

Describe the problem

We've seen trip_distance_exceeds_shape_distance trigger as an error from anywhere between 6-7% of feeds in the Mobility Database. Several producers have expressed concern with this new error.

Analysis report

Python script

Proposed solution

Implement a threshold where this ERROR only generates if the difference between the max trips.txt shape_dist_traveled and the max shapes.txt shape_dist_traveled is greater than 1.11m, similar to our plans for equal_shape_dist_traveled_diff_coordinates. Rerun the acceptance tests to see how many feeds trigger this rule.

A warning should be added for when trip_distance_exceeds_shape_distance is less than 1.11m.

Alternatives you've considered

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

stevenmwhite · 2023-12-01T23:33:35Z

I'm curious what the specific concern from producers is. In my opinion, if there's an error, it should be stated in the validator report. The error itself does not depend on whether the distance traveled passes a certain threshold or not, so I'd suggest not putting a threshold in whether the error is reported.

As a producer, I would want to know that the feeds I produce have the error (and some of the feeds my company has produced do in fact have this error) so that I can fix it, and not have the validator refrain from telling me it if it's "not bad enough."

stevenmwhite · 2023-12-07T05:52:37Z

A few of our project managers working directly with customer agencies have noted that transit agencies are getting spooked because the NTD Program Manager is forwarding them validation reports that show this error. While they specifically state that the reports are offered "solely [as a] courtesy and the resolution of the above errors are not required for FTA to accept your 2023 NTD report" the agencies have a fear that it may impact NTD report acceptance in the future.

Perhaps this is also why some other producers are concerned?

If that's the case, maybe we adjust the suggestion of excluding errors when the difference is less than 1% to instead report a warning when the difference is less than 1%. That way, we can still get the information out of the validator regardless of the size of the difference, but don't need to spook anyone with a minor, non-impactful error?

I wouldn't say this is my preference (an error is an error and if this is truly "canonical" it should call it an error), but because the error doesn't have much impact in those minor cases, I'm certainly open to it being a warning if that helps ease concerns.

EDIT: On the other hand, if this doesn't have much impact for consumers perhaps it should just be a warning all the time? I'm not sure if that's canonically accurate, but would also be fine for me. I just don't want to fully exclude any instances.

emmambd · 2024-01-12T16:11:40Z

Thanks for sharing your thoughts publicly @stevenmwhite! We dug into some old conversations about distance thresholds on the validator, and plan to work on changing the rule this quarter so

any difference above 1.11m = ERRROR
any difference below 1.11m = WARNING

We'll see if we should make the threshold more permissive than that based on the number of feeds we see still triggered by the ERROR and what their differences are.

stevenmwhite · 2024-01-12T16:31:01Z

@emmambd The one issue I see with this is that the distance along route does not have a unit specified in GTFS. Some feeds may be in meters, some in kilometers, some in miles... it could even be in "average lengths of Steve's stride" (I hope nobody's feed using that as their distance measurements, haha!)

Our feeds happen to be in meters, but that's not part of the spec and we can't assume that all feeds are.

sivakarthik07 · 2024-02-01T17:49:46Z

Please share if we already have a pre-release msi available for this fix. We are seeing a ton of errors due to the trip distance exceeding the shape distance by 0.2 meters or less and we would like to get this fixed before testing the same infront of the transit authority. Is it already available, if not any approximate date on when it will be?

emmambd · 2024-02-01T18:34:25Z

@stevenmwhite Thanks for that note Steve! We've been exploring ways we can check the distance between the lat longs of the last stop and the end of the shape in cases where the threshold is met.

@sivakarthik07 There is no PR or fix for this issue yet, but we are planning to deliver it for our next release in the end of March. How are you using the validator?

sivakarthik07 · 2024-02-01T20:45:52Z

@emmambd Thanks for responding! I am using the windows based 4.2 version of the validator.
I guess end of March should be OK, but a pre-release version before would be much appreciated.

emmambd · 2024-02-21T20:31:54Z

Thanks @sivakarthik07 ! We're currently working on a fix and plan to provide package installers for every merge to main within the next month as well.

Could you share some sample feeds to see how our new threshold impacts them?

emmambd · 2024-03-08T18:32:28Z

cc @stevenmwhite @sivakarthik07

An update on the threshold that will be used in this rule — we initially set the threshold to 1.11m between the last shape point and the last stop point, but were still concerned about it being too restrictive given the high number of concerns we've received with this notice. After doing several distribution analyses of the data, we've concluded that 11.1m is permissive enough while still being useful to consumers to flag major problems in the shape_dist_traveled values.

Why 11.1m?

This is an extension of our logic for setting a threshold for equal_shape_dist_traveled_diff_coordinates, capturing differences between same values that resulting from rounding to 4 decimal places in lat/long values.
All 50th percentiles in the min distance distributions fell under 11.1m, indicating it's not restrictive for real world use cases.

We'll proceed with this threshold and seek feedback after the upcoming release this month for any other feedback on this notice change.

stevenmwhite · 2024-03-08T18:42:00Z

Thanks @emmambd, I know this will relieve the concern from our project teams and customers in most cases.

Just two questions:

Are you measuring the meter distance using lat/long values, knowing that the shape_distance_traveled value is unit-less?
Has there been consideration to still throw a warning (as opposed to an error) for values that are beyond the shape but within the threshold? As a producer, it's really helpful to know that we can trust the validator to be as clear as possible and know that we'll be able to discover any/every case where a feed we produce doesn't match the spec. With this change, we're now adding subjective judgement to the validator and there'll be cases where feeds are "incorrect" but we won't learn that from the validator.

emmambd · 2024-03-08T18:44:45Z

@stevenmwhite Hi Steve,

Correct - the meter distance calculation is based on the lat/longs precisely because there is no unit defined for shape_distance_traveled.
yes, included in the PR is a warning, so anything below this threshold will trigger a WARNING rather than nothing. We agree that all cases should be flagged when we're applying a threshold, just with varying severity.

stevenmwhite · 2024-03-08T18:45:39Z

Perfect! That seems to cover all of my concerns!

emmambd · 2024-03-11T18:35:26Z

@sivakarthik07 You can add this fix into your pipeline now by using the package installers associated with the most recent commit to master.

Please note that there's been a breaking change to the JSON report to improve the validatedAt timestamp: #1691. More info will be included about this in the release later this month.

emmambd added enhancement New feature request or improvement on an existing feature status: Needs triage Applied to all new issues labels Nov 27, 2023

emmambd added this to Schedule Validator Q2 backlog Nov 27, 2023

emmambd moved this to MobilityData Backlog (bugs & maintenance) in Schedule Validator Q2 backlog Nov 27, 2023

emmambd removed the status in Schedule Validator Q2 backlog Nov 27, 2023

emmambd moved this to MobilityData backlog (enhancements) in Schedule Validator Q2 backlog Nov 27, 2023

emmambd mentioned this issue Nov 30, 2023

Publish a validator release #1523

Closed

3 tasks

emmambd added status: Blocked Can't work on it currently because of an external factor. and removed status: Needs triage Applied to all new issues status: Blocked Can't work on it currently because of an external factor. labels Dec 4, 2023

emmambd added this to the Now milestone Jan 9, 2024

emmambd added the status: Ready An issue that is ready to be worked on. label Jan 12, 2024

cka-y self-assigned this Feb 16, 2024

cka-y mentioned this issue Feb 19, 2024

feat: reformat trip and shape dist validator #1676

Merged

5 tasks

emmambd mentioned this issue Mar 8, 2024

Improving stop-to-shape validation #1701

Open

cka-y closed this as completed in #1676 Mar 11, 2024

github-project-automation bot moved this from MobilityData backlog (enhancements) to Done in Schedule Validator Q2 backlog Mar 11, 2024

emmambd removed this from the Now milestone Mar 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement threshold for triggering trip_distance_exceeds_shape_distance #1613

Implement threshold for triggering trip_distance_exceeds_shape_distance #1613

emmambd commented Nov 27, 2023 •

edited

Loading

stevenmwhite commented Dec 1, 2023

stevenmwhite commented Dec 7, 2023 •

edited

Loading

emmambd commented Jan 12, 2024

stevenmwhite commented Jan 12, 2024

sivakarthik07 commented Feb 1, 2024 •

edited

Loading

emmambd commented Feb 1, 2024

sivakarthik07 commented Feb 1, 2024 •

edited

Loading

emmambd commented Feb 21, 2024

emmambd commented Mar 8, 2024

stevenmwhite commented Mar 8, 2024 •

edited

Loading

emmambd commented Mar 8, 2024

stevenmwhite commented Mar 8, 2024

emmambd commented Mar 11, 2024

Implement threshold for triggering trip_distance_exceeds_shape_distance #1613

Implement threshold for triggering trip_distance_exceeds_shape_distance #1613

Comments

emmambd commented Nov 27, 2023 • edited Loading

Describe the problem

Proposed solution

Alternatives you've considered

Additional context

stevenmwhite commented Dec 1, 2023

stevenmwhite commented Dec 7, 2023 • edited Loading

emmambd commented Jan 12, 2024

stevenmwhite commented Jan 12, 2024

sivakarthik07 commented Feb 1, 2024 • edited Loading

emmambd commented Feb 1, 2024

sivakarthik07 commented Feb 1, 2024 • edited Loading

emmambd commented Feb 21, 2024

emmambd commented Mar 8, 2024

stevenmwhite commented Mar 8, 2024 • edited Loading

emmambd commented Mar 8, 2024

stevenmwhite commented Mar 8, 2024

emmambd commented Mar 11, 2024

emmambd commented Nov 27, 2023 •

edited

Loading

stevenmwhite commented Dec 7, 2023 •

edited

Loading

sivakarthik07 commented Feb 1, 2024 •

edited

Loading

sivakarthik07 commented Feb 1, 2024 •

edited

Loading

stevenmwhite commented Mar 8, 2024 •

edited

Loading