[QUESTION] Why are the feeds from a direct_download and latest not the same?
#296
Comments
@wklumpen Thanks for flagging this issue! There shouldn't be any difference between the datasets from the two URLs - if there is, this likely indicates a bug with our GitHub Actions. Right now there's a cronjob that runs each day to check the
Of note: I've come across a few broken I'd like to trust the Should I raise an issue (that would then be linked presumably to a PR) for each broken URL? I don't want to come in and stomp on whatever workflow you have going for this.
@wklumpen I think the broken For the Thanks for checking in and asking about the best approach for this - it's very helpful that you've flagged this!
Sounds good. There will probably be more to come as I go through basically every agency in a number of US urban areas :)
@wklumpen We always welcome the help in our data updating/cleaning efforts! Really appreciate it 🚀
Thanks! Maybe I'll write a little validation script for the feeds I'm interested in. Stale feeds will cause a problem, but that's a separate issue.
You mean to check if the datasets at
Stale as in there's another URL somewhere with more up-to-date data?
Yes - I was going to do this for the feeds I'm using but internal validation on the MDB end would be even better
Yes, correct. I wonder if there's a possibility to detect if "active" feeds aren't actually being updated (e.g. the latest feed no longer covers the current date)
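One way to approach the "no longer covers the current date" check is to read the service calendars directly. The following is only a minimal sketch, assuming the text of calendar.txt and calendar_dates.txt has already been extracted from the feed's zip file; the function names `feed_end_date` and `is_stale` are hypothetical and are not part of the catalog or the GTFS Validator.

```python
import csv
import io
from datetime import date, datetime


def _parse_gtfs_date(value: str) -> date:
    """Parse a GTFS date string in YYYYMMDD form."""
    return datetime.strptime(value.strip(), "%Y%m%d").date()


def feed_end_date(calendar_txt: str = "", calendar_dates_txt: str = ""):
    """Return the last date with scheduled service, or None if none is found.

    Hypothetical helper: looks at end_date in calendar.txt and at
    added-service rows (exception_type 1) in calendar_dates.txt.
    """
    candidates = []
    for row in csv.DictReader(io.StringIO(calendar_txt)):
        if row.get("end_date"):
            candidates.append(_parse_gtfs_date(row["end_date"]))
    for row in csv.DictReader(io.StringIO(calendar_dates_txt)):
        # exception_type 2 removes service, so only type 1 extends coverage.
        added = (row.get("exception_type") or "").strip() == "1"
        if added and row.get("date"):
            candidates.append(_parse_gtfs_date(row["date"]))
    return max(candidates) if candidates else None


def is_stale(calendar_txt: str = "", calendar_dates_txt: str = "", today=None) -> bool:
    """Treat a feed as stale when its service ends before `today`."""
    today = today or date.today()
    end = feed_end_date(calendar_txt, calendar_dates_txt)
    return end is None or end < today
```

A feed with no calendar information at all is reported as stale here, which is a design choice rather than a rule from the spec. Real files may also start with a byte-order mark, which would need stripping before parsing.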
@wklumpen re: detecting date range from actual GTFS calendars, we plan on actually opening the feeds and sharing the dynamic data from datasets for V2 of the API we're developing right now (the logic will be from the GTFS Validator). But it won't be another 3-6 months, so if you want to do validation based on the text files themselves, I'd suggest going ahead and doing it yourself. However, re: internal validation, if you are open to just relying on our cronjob pass/fail to verify if
Apologies if this is asked and answered but a quick search didn't turn anything up.
I've noticed that the feeds that are archived in latest often do not match the datasets that come from the direct_download (e.g. fewer calendar_date rows, etc.).
An example: Arlington Transit (mdb_id = 485) direct download has calendar dates that extend to 20240131 while the latest URL has dates only to 20230902.
Is this simply because the set hasn't been updated on a recent pass?
Some further info/documentation on the differences between the two would be ideal, as I'm struggling to understand them from the current field descriptions.
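For anyone wanting to confirm a mismatch like the one described above, a rough comparison of the two archives can be sketched as follows. It assumes both zip files have already been downloaded into memory as bytes (fetching from the direct_download and latest URLs is not shown); `summarize_feed` and `diff_feeds` are hypothetical names used for illustration only.

```python
import hashlib
import io
import zipfile


def summarize_feed(zip_bytes: bytes) -> dict:
    """Map each file in a GTFS zip to (approximate row count, SHA-256 digest)."""
    summary = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            data = zf.read(name)
            # Physical lines minus the header; quoted newlines would inflate this.
            rows = max(len(data.splitlines()) - 1, 0)
            summary[name] = (rows, hashlib.sha256(data).hexdigest())
    return summary


def diff_feeds(first: bytes, second: bytes) -> dict:
    """Return {file name: (rows in first, rows in second)} for files that differ.

    A row count of None means the file is missing from that archive.
    """
    a, b = summarize_feed(first), summarize_feed(second)
    missing = (None, None)
    return {
        name: (a.get(name, missing)[0], b.get(name, missing)[0])
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

An empty result means every file is byte-identical between the two archives. A file that appears with equal row counts on both sides has the same number of lines but different content.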