Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement threshold for triggering trip_distance_exceeds_shape_distance #1613

Closed
emmambd opened this issue Nov 27, 2023 · 13 comments · Fixed by #1676
Closed

Implement threshold for triggering trip_distance_exceeds_shape_distance #1613

emmambd opened this issue Nov 27, 2023 · 13 comments · Fixed by #1676
Assignees
Labels
enhancement New feature request or improvement on an existing feature status: Ready An issue that is ready to be worked on.

Comments

@emmambd
Copy link
Contributor

emmambd commented Nov 27, 2023

Describe the problem

We've seen trip_distance_exceeds_shape_distance trigger as an error from anywhere between 6-7% of feeds in the Mobility Database. Several producers have expressed concern with this new error.

Analysis report

Python script

Proposed solution

Implement a threshold where this ERROR only generates if the difference between the max trips.txt shape_dist_traveled and the max shapes.txt shape_dist_traveled is greater than 1.11m, similar to our plans for equal_shape_dist_traveled_diff_coordinates. Rerun the acceptance tests to see how many feeds trigger this rule.

A warning should be added for when trip_distance_exceeds_shape_distance is less than 1.11m.

Alternatives you've considered

No response

Additional context

No response

@emmambd emmambd added enhancement New feature request or improvement on an existing feature status: Needs triage Applied to all new issues labels Nov 27, 2023
@emmambd emmambd moved this to MobilityData Backlog (bugs & maintenance) in Schedule Validator Q2 backlog Nov 27, 2023
@emmambd emmambd moved this to MobilityData backlog (enhancements) in Schedule Validator Q2 backlog Nov 27, 2023
@stevenmwhite
Copy link

I'm curious what the specific concern from producers is. In my opinion, if there's an error, it should be stated in the validator report. The error itself does not depend on whether the distance traveled passes a certain threshold or not, so I'd suggest not putting a threshold in whether the error is reported.

As a producer, I would want to know that the feeds I produce have the error (and some of the feeds my company has produced do in fact have this error) so that I can fix it, and not have the validator refrain from telling me it if it's "not bad enough."

@emmambd emmambd added status: Blocked Can't work on it currently because of an external factor. and removed status: Needs triage Applied to all new issues status: Blocked Can't work on it currently because of an external factor. labels Dec 4, 2023
@stevenmwhite
Copy link

stevenmwhite commented Dec 7, 2023

A few of our project managers working directly with customer agencies have noted that transit agencies are getting spooked because the NTD Program Manager is forwarding them validation reports that show this error. While they specifically state that the reports are offered "solely [as a] courtesy and the resolution of the above errors are not required for FTA to accept your 2023 NTD report" the agencies have a fear that it may impact NTD report acceptance in the future.

Perhaps this is also why some other producers are concerned?

If that's the case, maybe we adjust the suggestion of excluding errors when the difference is less than 1% to instead report a warning when the difference is less than 1%. That way, we can still get the information out of the validator regardless of the size of the difference, but don't need to spook anyone with a minor, non-impactful error?

I wouldn't say this is my preference (an error is an error and if this is truly "canonical" it should call it an error), but because the error doesn't have much impact in those minor cases, I'm certainly open to it being a warning if that helps ease concerns.

EDIT: On the other hand, if this doesn't have much impact for consumers perhaps it should just be a warning all the time? I'm not sure if that's canonically accurate, but would also be fine for me. I just don't want to fully exclude any instances.

@emmambd emmambd added this to the Now milestone Jan 9, 2024
@emmambd
Copy link
Contributor Author

emmambd commented Jan 12, 2024

Thanks for sharing your thoughts publicly @stevenmwhite! We dug into some old conversations about distance thresholds on the validator, and plan to work on changing the rule this quarter so

  • any difference above 1.11m = ERRROR
  • any difference below 1.11m = WARNING

We'll see if we should make the threshold more permissive than that based on the number of feeds we see still triggered by the ERROR and what their differences are.

@emmambd emmambd added the status: Ready An issue that is ready to be worked on. label Jan 12, 2024
@stevenmwhite
Copy link

@emmambd The one issue I see with this is that the distance along route does not have a unit specified in GTFS. Some feeds may be in meters, some in kilometers, some in miles... it could even be in "average lengths of Steve's stride" (I hope nobody's feed using that as their distance measurements, haha!)

Our feeds happen to be in meters, but that's not part of the spec and we can't assume that all feeds are.

@sivakarthik07
Copy link

sivakarthik07 commented Feb 1, 2024

Please share if we already have a pre-release msi available for this fix. We are seeing a ton of errors due to the trip distance exceeding the shape distance by 0.2 meters or less and we would like to get this fixed before testing the same infront of the transit authority. Is it already available, if not any approximate date on when it will be?

@emmambd
Copy link
Contributor Author

emmambd commented Feb 1, 2024

@stevenmwhite Thanks for that note Steve! We've been exploring ways we can check the distance between the lat longs of the last stop and the end of the shape in cases where the threshold is met.

@sivakarthik07 There is no PR or fix for this issue yet, but we are planning to deliver it for our next release in the end of March. How are you using the validator?

@sivakarthik07
Copy link

sivakarthik07 commented Feb 1, 2024

@emmambd Thanks for responding! I am using the windows based 4.2 version of the validator.
I guess end of March should be OK, but a pre-release version before would be much appreciated.

@emmambd
Copy link
Contributor Author

emmambd commented Feb 21, 2024

Thanks @sivakarthik07 ! We're currently working on a fix and plan to provide package installers for every merge to main within the next month as well.

Could you share some sample feeds to see how our new threshold impacts them?

@emmambd
Copy link
Contributor Author

emmambd commented Mar 8, 2024

cc @stevenmwhite @sivakarthik07

An update on the threshold that will be used in this rule — we initially set the threshold to 1.11m between the last shape point and the last stop point, but were still concerned about it being too restrictive given the high number of concerns we've received with this notice. After doing several distribution analyses of the data, we've concluded that 11.1m is permissive enough while still being useful to consumers to flag major problems in the shape_dist_traveled values.

Why 11.1m?

  • This is an extension of our logic for setting a threshold for equal_shape_dist_traveled_diff_coordinates, capturing differences between same values that resulting from rounding to 4 decimal places in lat/long values.

  • All 50th percentiles in the min distance distributions fell under 11.1m, indicating it's not restrictive for real world use cases.

We'll proceed with this threshold and seek feedback after the upcoming release this month for any other feedback on this notice change.

@stevenmwhite
Copy link

stevenmwhite commented Mar 8, 2024

Thanks @emmambd, I know this will relieve the concern from our project teams and customers in most cases.

Just two questions:

  1. Are you measuring the meter distance using lat/long values, knowing that the shape_distance_traveled value is unit-less?
  2. Has there been consideration to still throw a warning (as opposed to an error) for values that are beyond the shape but within the threshold? As a producer, it's really helpful to know that we can trust the validator to be as clear as possible and know that we'll be able to discover any/every case where a feed we produce doesn't match the spec. With this change, we're now adding subjective judgement to the validator and there'll be cases where feeds are "incorrect" but we won't learn that from the validator.

@emmambd
Copy link
Contributor Author

emmambd commented Mar 8, 2024

@stevenmwhite Hi Steve,

  1. Correct - the meter distance calculation is based on the lat/longs precisely because there is no unit defined for shape_distance_traveled.

  2. yes, included in the PR is a warning, so anything below this threshold will trigger a WARNING rather than nothing. We agree that all cases should be flagged when we're applying a threshold, just with varying severity.

@stevenmwhite
Copy link

Perfect! That seems to cover all of my concerns!

@github-project-automation github-project-automation bot moved this from MobilityData backlog (enhancements) to Done in Schedule Validator Q2 backlog Mar 11, 2024
@emmambd
Copy link
Contributor Author

emmambd commented Mar 11, 2024

@sivakarthik07 You can add this fix into your pipeline now by using the package installers associated with the most recent commit to master.

Please note that there's been a breaking change to the JSON report to improve the validatedAt timestamp: #1691. More info will be included about this in the release later this month.

@emmambd emmambd removed this from the Now milestone Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature request or improvement on an existing feature status: Ready An issue that is ready to be worked on.
Projects
Development

Successfully merging a pull request may close this issue.

4 participants