Extending the API to expose crawl events as an RSS/Atom feed #28
Comments
Looking quickly at the existing experimental URL scanning tools to see what the timings are like. First, scanning up to 100 million entries from the Guardian, for all time... and it takes a while: just under 7 minutes! Now, if we restrict it to the last few months, does that help, or is it all scanning time? Oh no. Ho hum, this means querying CDX for changes as an API is likely not going to work, as clients will time out. Caching is possible, but would basically mean deriving millions of results from full table scans.
So, host-level change feeds would need to run off one of
At the level of individual URLs, expose CDX information as an RSS feed of crawl events, allowing users to be notified if a particularly interesting page is changed, e.g.
/api/mementos/rss?url=http://example.com/
where, by default, only changes (crawls whose hash differs from the previous crawl) are reported.
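As a rough illustration of the "only changes" behaviour, here is a minimal Python sketch. It assumes the CDX lookup for one URL has already been reduced to `(timestamp, url, digest)` tuples sorted by time, with 14-digit UTC timestamps. The function names `changed_crawls` and `crawl_events_rss` are hypothetical and not part of the existing API, and the feed layout is just one plausible choice.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def changed_crawls(crawls):
    """Yield only crawls whose digest differs from the preceding crawl.

    `crawls` is an iterable of (timestamp, url, digest) tuples, assumed
    to be sorted by timestamp (hypothetical input shape).
    """
    previous = None
    for timestamp, url, digest in crawls:
        if digest != previous:
            yield timestamp, url, digest
        previous = digest


def crawl_events_rss(url, crawls):
    """Render the changed crawls for `url` as an RSS 2.0 document string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Crawl events for {url}"
    ET.SubElement(channel, "link").text = url
    ET.SubElement(channel, "description").text = (
        "Crawls where the content hash changed"
    )
    for timestamp, original, digest in changed_crawls(crawls):
        # Assumes 14-digit YYYYMMDDhhmmss timestamps in UTC.
        when = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(
            tzinfo=timezone.utc
        )
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Changed at {when.isoformat()}"
        ET.SubElement(item, "link").text = original
        # The guid combines timestamp and digest so feed readers see each
        # change as a distinct entry.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = f"{timestamp}/{digest}"
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")
```

Because the filter only compares each crawl with the one before it, it works on a single pass over the per-URL CDX results and needs no full scan, which is why this is only suggested at the level of individual URLs.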