Andy Jackson edited this page Nov 14, 2013 · 6 revisions

A critical function of w3act is deciding whether a given URL is in scope. Although any URL may be entered into w3act, the user interface MUST make it clear to the user whether the Target is in scope, and the [Crawl Feeds] MUST NOT contain Targets that are not in scope.

Current Scoping Rules

A Target is in scope if any of the following statements is true:

  • All URLs for this Target meet at least one of the following criteria:
    • The authority of the URI (i.e. the hostname) ends with '.uk' (AUTO).
    • The IP address associated with the URI is geo-located in the UK (AUTO, using the GeoIP database).
  • The Target is known to be hosted in the UK (manual boolean field).
  • The Target features a page that specifies a UK postal address (a manual boolean field plus a text field to hold the specific URL that contains the address).
  • The Target is known to be a UK publication, according to correspondence with a curator (a manual boolean field plus a text field to hold details of the correspondence).
  • The Target is known to be a UK publication, in the professional judgement of a curator (a manual boolean field plus a text field to hold the justification).
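
The rules above can be sketched as a single function. This is only an illustration under stated assumptions, not w3act's actual code: the Target class, its field names, and the is_uk_ip callable (standing in for the GeoIP lookup) are all hypothetical. It also assumes that a Target with no URLs does not pass the automatic checks, which the rules above do not state.

```python
from dataclasses import dataclass, field
from typing import Callable, List
from urllib.parse import urlsplit


@dataclass
class Target:
    # Hypothetical field names; not w3act's actual schema.
    urls: List[str] = field(default_factory=list)
    uk_hosting: bool = False              # manual: known to be hosted in the UK
    uk_postal_address: bool = False       # manual: a page gives a UK postal address
    via_correspondence: bool = False      # manual: UK publication per correspondence
    professional_judgement: bool = False  # manual: UK publication per curator


def url_is_uk(url: str, is_uk_ip: Callable[[str], bool]) -> bool:
    """Automatic checks for one URL: '.uk' hostname, or UK geo-location."""
    host = urlsplit(url).hostname  # lower-cased hostname, or None
    if not host:
        return False
    host = host.rstrip(".")  # tolerate a fully-qualified trailing dot
    # is_uk_ip is a placeholder for the GeoIP database lookup.
    return host.endswith(".uk") or is_uk_ip(host)


def is_in_scope(target: Target, is_uk_ip: Callable[[str], bool]) -> bool:
    """True if any one of the scoping statements holds for the Target."""
    # Assumption: an empty URL list does not satisfy the automatic rule.
    all_urls_uk = bool(target.urls) and all(
        url_is_uk(u, is_uk_ip) for u in target.urls
    )
    return (
        all_urls_uk
        or target.uk_hosting
        or target.uk_postal_address
        or target.via_correspondence
        or target.professional_judgement
    )
```

Note that the automatic checks apply to every URL of the Target, so one out-of-scope URL is enough to fail them, while any single manual field is enough to bring the whole Target into scope.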

This is a policy matter, and so may change in the future. Therefore, the code that implements this logic must be declared once, in a well-specified location, and re-used throughout.

Upon updating any of the relevant fields, this code should be re-run and used to populate an 'is in scope' field. This can be used to give immediate feedback to users as to whether further information is needed.
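
One possible shape for this re-run-on-update behaviour is sketched below. It is a minimal illustration, not the w3act implementation: the TargetRecord class, the SCOPE_FIELDS set, and the field names are hypothetical, and the scoping rule is passed in as a callable so that the policy stays defined in one place.

```python
from typing import Any, Callable, Dict

# Hypothetical names of the fields that can affect the scoping decision.
SCOPE_FIELDS = {
    "urls",
    "uk_hosting",
    "uk_postal_address",
    "via_correspondence",
    "professional_judgement",
}


class TargetRecord:
    """Holds a Target's fields and keeps a cached 'in_scope' value current."""

    def __init__(self, scope_rule: Callable[[Dict[str, Any]], bool], **fields: Any):
        self._scope_rule = scope_rule  # the single, shared scoping function
        self._fields: Dict[str, Any] = dict(fields)
        self.in_scope: bool = scope_rule(self._fields)

    def update(self, name: str, value: Any) -> bool:
        """Set one field; re-run the scoping rule only if the field is relevant."""
        self._fields[name] = value
        if name in SCOPE_FIELDS:
            self.in_scope = self._scope_rule(self._fields)
        return self.in_scope  # available for immediate feedback in the UI
```

Returning the refreshed value from update lets a form handler tell the user straight away whether the Target is now in scope or still needs further information.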