For external parties to know which URLs we can crawl, and hence what is worth posting to the `save` endpoint or what requires a new W3ACT record, we should allow the current permissible crawl scope to be queried.
Essentially, `GET /in-scope?url=http://test.url` returns `true`/`false`.
This Python Trie implementation would make a good 'backbone' for this kind of scope oracle. Given a URL, and using `urlcanon` to generate its SSURT, it can find the matching prefixes and use them to map the URL to scope rules.
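To make the idea concrete, here is a minimal sketch of such an oracle. Everything in it is an assumption for illustration: `ssurt_like_key` is a simplified stand-in built on the standard library, not the SSURT that `urlcanon` would produce; `PrefixTrie` is a hand-rolled trie rather than the implementation referred to above; and the rule names, example domains, and `handle_in_scope` helper are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def ssurt_like_key(url: str) -> str:
    """Simplified, SSURT-like key: reversed host labels, then the path.

    Assumption: a stand-in for a real urlcanon-generated SSURT.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = [label for label in reversed(host.split(".")) if label]
    # The trailing comma keeps label boundaries intact in prefix matches.
    return ",".join(labels) + "," + "/" + (parts.path or "/")


class PrefixTrie:
    """Character trie mapping key prefixes to scope rule names."""

    def __init__(self):
        self.children = {}
        self.rule = None

    def insert(self, key: str, rule: str) -> None:
        node = self
        for ch in key:
            node = node.children.setdefault(ch, PrefixTrie())
        node.rule = rule

    def longest_prefix(self, key: str):
        """Return the rule of the longest stored prefix of key, or None."""
        node, best = self, self.rule
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                break
            if node.rule is not None:
                best = node.rule
        return best


scope = PrefixTrie()
# Hypothetical rule: a whole domain, including its subdomains.
scope.insert("uk,org,example,", "whole-domain")
# Hypothetical rule: one path prefix on a single host.
scope.insert(ssurt_like_key("http://news.example.com/archive/"), "archive-only")


def in_scope(url: str) -> bool:
    return scope.longest_prefix(ssurt_like_key(url)) is not None


def handle_in_scope(query_string: str) -> str:
    """Body a GET /in-scope handler might return for a raw query string."""
    url = parse_qs(query_string).get("url", [""])[0]
    return "true" if url and in_scope(url) else "false"
```

Because the host labels are reversed, a rule stored for a domain is a string prefix of every key under that domain, so a single trie walk finds the most specific matching rule. A real service would swap in `urlcanon` for key generation and load its rules from the live crawl configuration.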
n.b. this is similar to: ukwa/ukwa-heritrix#37