Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: reformat trip and shape dist validator #1676

Merged
merged 10 commits into from
Mar 11, 2024
Merged

feat: reformat trip and shape dist validator #1676

merged 10 commits into from
Mar 11, 2024

Conversation

cka-y
Copy link
Contributor

@cka-y cka-y commented Feb 18, 2024

Summary:
Resolves #1613 and #1611 by implementing a distance threshold for the trip_distance_exceeds_shape_distance notice.

Expected Behavior:

  • Introduces a 11.1m threshold for trip_distance_exceeds_shape_distance, triggering an ERROR for distances $\geq 11.1m$.
  • Creates a new notice, trip_distance_exceeds_shape_distance_below_threshold, with WARNING severity for distances $\lt11.1m$.
  • This update streamlines the cross-validation of trips against shapes by adhering to the GTFS specification. This approach has resulted in minimal changes to validation outcomes as evidenced here. Any minor discrepancies arise from instances where the feed does not comply with the specification, often leading to ERROR level notices such as decreasing_or_equal_stop_time_distance and decreasing_shape_distance.
    • By capitalizing on the expectation that both stop-time and shape distances should incrementally increase, the validation process is optimized. Instead of evaluating all points, we now only assess the last one, which changes our processing time complexity from linear to constant for these elements.

Empirical Performance Comparison:
Considering $n$ as the number of trips, $m$ as the number of stop-times, and $k$ as the number of shapes, the complexity in the worst-case scenario:

  • For the master branch is $\Omega(k^2nm)$
  • For the feature branch (feat/1613) is $\Omega(nk)$

Statistical Performance Comparison:
The performance improvements are depicted in the graph below. The datasets analyzed are from our catalog, with zipped file sizes of at least 1MB. Sizes have been normalized for a more meaningful comparison of slopes. The performance slope of the feature branch is significantly lower, decreasing from approximately 37 to approximately 25, indicating enhanced efficiency.
time_improvement_comparison

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification(https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@MobilityData MobilityData deleted a comment from github-actions bot Feb 19, 2024
@MobilityData MobilityData deleted a comment from github-actions bot Feb 19, 2024
Copy link
Contributor

✅ Rule acceptance tests passed.
New Errors: 0 out of 1485 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 1 out of 1485 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 0 out of 1485 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Warnings: 0 out of 1485 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
1 out of 1486 sources (~0 %) are corrupted.
Corrupted sources:
us-district-of-columbia-dc-circulator-gtfs-486
Commit: 89d1cc1
Download the full acceptance test report here (report will disappear after 90 days).
✅ Rule acceptance tests passed.

Copy link
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1485 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 17 out of 1485 datasets (~1%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 25 out of 1485 datasets (~2%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1485 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
1 out of 1486 sources (~0 %) are corrupted.
Corrupted sources:
fi-etela-pohjanmaa-komia-liikenne-gtfs-1255
Commit: bb18676
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

Copy link
Contributor

github-actions bot commented Mar 6, 2024

❌ Invalid acceptance test.
New Errors: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 36 out of 1520 datasets (~2%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 82 out of 1520 datasets (~5%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1520 sources (~0 %) are corrupted.
Commit: 97dc096
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@cka-y cka-y marked this pull request as ready for review March 6, 2024 13:27
@emmambd emmambd self-requested a review March 6, 2024 13:58
Copy link
Contributor

@emmambd emmambd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice description + name minor revisions.

Copy link
Contributor

github-actions bot commented Mar 8, 2024

❌ Invalid acceptance test.
New Errors: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 37 out of 1520 datasets (~2%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 82 out of 1520 datasets (~5%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1520 sources (~0 %) are corrupted.
Commit: b9deab7
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

Copy link
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 37 out of 1520 datasets (~2%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 91 out of 1520 datasets (~6%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1520 sources (~0 %) are corrupted.
Commit: cb6887b
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

Copy link
Contributor

@jcpitre jcpitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive, again!

Copy link
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 37 out of 1520 datasets (~2%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 91 out of 1520 datasets (~6%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1520 sources (~0 %) are corrupted.
Commit: ff5278a
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@cka-y cka-y merged commit 2f9fccd into master Mar 11, 2024
332 of 333 checks passed
@cka-y cka-y deleted the feat/1613 branch March 11, 2024 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants