26-Sep-2018
In the Data Warehouse we have decided to track content through time and use a unique identifier that guarantees consistency across time.
Our initial approach was to use a random UUID:
require "securerandom"
SecureRandom.uuid #=> "1ca71cd6-08c4-4855-9381-2f41aeffe59c"
Following some feedback from @tijmenb, we have decided to use a compound id for each one of our items. The compound id will have one of the following formats, depending on the type of Content Item:
We concatenate the content_id and the locale for Content Items that don't have multiple paths.
warehouse_item_id = "1ca71cd6-08c4-4855-9381-2f41aeffe59c:en"
We concatenate the content_id, the locale and the base_path for Content Items that have multiple paths (Guides, Travel Advice...)
warehouse_item_id = "1ca71cd6-08c4-4855-9381-2f41aeffe59c:en:/marriage-abroad"
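The two formats above could be produced by a single helper. This is only a minimal sketch: the method name and its keyword arguments are hypothetical and not part of the Data Warehouse codebase, and it assumes the parts are joined with ":" as in the examples.

```ruby
# Hypothetical helper (name and arguments are assumptions, not existing code).
def warehouse_item_id(content_id:, locale:, base_path: nil)
  # Items with a single path are identified by content_id and locale only.
  parts = [content_id, locale]
  # Items with multiple paths (Guides, Travel Advice...) also include the base_path.
  parts << base_path if base_path
  parts.join(":")
end

single = warehouse_item_id(
  content_id: "1ca71cd6-08c4-4855-9381-2f41aeffe59c",
  locale: "en"
)
multi = warehouse_item_id(
  content_id: "1ca71cd6-08c4-4855-9381-2f41aeffe59c",
  locale: "en",
  base_path: "/marriage-abroad"
)
```

With these inputs, single and multi match the two example values shown above.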
We won't be able to track changes to base_paths, but as of today we have no simple way to work around this. This is a known issue that could be mitigated by getting aggregations at the parent level.
- Use a compound unique id following the naming conventions described above.
- Name it warehouse_item_id
- The unique identifier is not a random number. It has a meaning which mimics the existing rules that define uniqueness across all GOV.UK content.
- It is consistent with Content Publisher, the new publishing application for content.
- The unique identifier makes integration easy for external applications, as they can build URLs that depend on the unique id.
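As an illustration of the last point, an external application could split a warehouse_item_id back into its parts before building a URL. This is a sketch under assumptions: the struct and method names are hypothetical, and it relies on content_id and locale never containing ":".

```ruby
# Hypothetical names; nothing here is an existing Data Warehouse API.
WarehouseItemParts = Struct.new(:content_id, :locale, :base_path)

def parse_warehouse_item_id(id)
  # Limit the split to 3 parts so a ":" inside the base_path is left untouched.
  content_id, locale, base_path = id.split(":", 3)
  # base_path is nil for items that don't have multiple paths.
  WarehouseItemParts.new(content_id, locale, base_path)
end

guide = parse_warehouse_item_id("1ca71cd6-08c4-4855-9381-2f41aeffe59c:en:/marriage-abroad")
plain = parse_warehouse_item_id("1ca71cd6-08c4-4855-9381-2f41aeffe59c:en")
```

Here guide.base_path holds the path segment a caller could append to a site root, while plain.base_path is nil.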