RFC: Add truncation task #27
Comments
Another aspect of this is how it aligns with the long-term goals of Activity Feed. The designs for Activity Feed use snapshots in a very targeted way, which is to show the latest unpublished changes. Previously unpublished snapshots are a secondary feature. With a truncation task active, the power-user feature of old activity would only be available on a small set of versions, and that's probably fine. If anyone complains, they can just crank up the config setting.
Can we express this in SQL? Even a prefiltered list of "old snapshots" (by date) might accumulate a few entries where drafts are left unpublished for a long time.
FYI, in terms of database storage, it's $0.276 per GB-month for a MySQL RDS instance in the Sydney region, plus $0.095 per GiB-month for additional backup storage. So ... nothing. The impact of the additional write capacity required is probably higher, since it might push you into the next database tier, which can often add ~50% to the cost of the database component (dozens to hundreds of dollars). But unless you're doing something funky around batch publishing, the writes should only be sporadic peaks driven by author activity.
Storage isn't expensive, but the outage when the DB locks up due to lack of disk space is.
Summary
As we scale this module out, one of the hurdles we're bound to encounter is the unbounded expansion of the database it implies. Because the two snapshot tables record every edit in the ownership chain, the module at least doubles the size of the _versions tables. Unlike _versions, however, the snapshot tables are monolithic, so there are no writes to ancestral tables. All in all, as a "finger in the air" analysis, snapshots have at least as much impact on database size as versions, so we're effectively doubling the problem.
Example scenario
In the typical setup we base our tests on:
Versions only paradigm
BlockPage_versions, Page_versions, SiteTree_versions (3)
Block_versions, Element_versions (2)
BlockGallery_versions (1)
BlockImage_versions (1)
Roughly 7 new rows over this hypothetical editing timeline.
Snapshot paradigm
BlockPage_versions, Page_versions, SiteTree_versions (3)
Snapshot (1)
SnapshotItem (1)
Block_versions, Element_versions (2)
Snapshot (1)
SnapshotItem (Block, BlockPage) (2)
BlockGallery_versions (1)
Snapshot (1)
SnapshotItem (BlockGallery, Block, BlockPage) (3)
BlockImage_versions (1)
Snapshot (1)
SnapshotItem (BlockImage, BlockGallery, Block, BlockPage) (4)
Roughly 21 new rows over the same hypothetical editing timeline.
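The two totals can be checked with a few lines of arithmetic. The sketch below is in Python purely for illustration. The `EDITS` structure and the function names are hypothetical; only the per-edit row counts come from the two lists above. It assumes each edit writes one Snapshot row plus one SnapshotItem row per record in its ownership chain, which is how the counts above read.

```python
# Each entry: (record edited, _versions rows written, ownership chain of that record).
# The numbers are copied from the example scenario; the structure is illustrative only.
EDITS = [
    ("BlockPage", 3, ["BlockPage"]),
    ("Block", 2, ["Block", "BlockPage"]),
    ("BlockGallery", 1, ["BlockGallery", "Block", "BlockPage"]),
    ("BlockImage", 1, ["BlockImage", "BlockGallery", "Block", "BlockPage"]),
]


def versions_only_rows(edits):
    """Rows written when only the _versions tables are in play."""
    return sum(version_rows for _, version_rows, _ in edits)


def snapshot_rows(edits):
    """Rows written with snapshots: _versions rows + 1 Snapshot + 1 SnapshotItem per chain member."""
    return sum(version_rows + 1 + len(chain) for _, version_rows, chain in edits)


print(versions_only_rows(EDITS))  # 7
print(snapshot_rows(EDITS))       # 21
```

Under that assumption, the SnapshotItem count for an edit grows with the depth of the edited record, so deeply nested content adds more rows per edit than shallow content does.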
Impact
Database size, when left unchecked, can become a critical issue that culminates in a loss of service for a client. While mitigations are relatively cheap to implement (e.g. more disk space), they require a level of management and proactivity from the service provider that, in theory, never ends. Further, large databases have a significant impact on developer experience, as they are less portable and, in some cases, impossible to export.
It is unlikely that there are any significant runtime performance impacts to having a large number of snapshots, as all the queries are against indexed columns, but it's not impossible to imagine that it could become a problem given enough volume.
Possible solution
Ideally, we find a safe way to truncate the Snapshots tables in a way that:
Earlier investigations into truncating _versions tables proved fruitless as the implicit dependency graph across multiple unrelated tables was impossible to predict at the database level. Fortunately, with the monolithic table approach we use in Snapshots, it seems possible that we could truncate history, albeit with a less-performant executable task rather than a simple query.
Suggested implementation
SnapshotTruncationTask
truncate_irrelevant_snapshots_after (name TBD)
Pseudo-code:
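As a rough illustration only, and not the task's actual implementation, the selection step might look like the Python sketch below. It assumes that "irrelevant" means a snapshot that is both older than the configured threshold and no newer than the last publish of the record it belongs to, so that long-lived unpublished drafts are never touched. The `Snapshot` dataclass, the `origin` field, the `last_published` mapping, and `select_truncatable` are all hypothetical names invented for this sketch. A real task would also have to remove the matching SnapshotItem rows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for a row in the Snapshot table.
    id: int
    origin: str        # identifier of the record the snapshot was taken for
    created: datetime


def select_truncatable(
    snapshots: List[Snapshot],
    last_published: Dict[str, datetime],
    now: datetime,
    truncate_after: timedelta,
) -> List[Snapshot]:
    """Return snapshots that are old AND already superseded by a publish."""
    cutoff = now - truncate_after
    doomed = []
    for snap in snapshots:
        published_at = last_published.get(snap.origin)
        if published_at is None:
            # Never published: every snapshot still describes unpublished changes.
            continue
        if snap.created < cutoff and snap.created <= published_at:
            doomed.append(snap)
    return doomed


NOW = datetime(2020, 6, 1)
SNAPSHOTS = [
    Snapshot(1, "BlockPage#1", datetime(2020, 1, 1)),   # old, published since
    Snapshot(2, "BlockPage#1", datetime(2020, 5, 25)),  # newer than the cutoff
    Snapshot(3, "BlockPage#2", datetime(2020, 1, 1)),   # old draft, never published
    Snapshot(4, "BlockPage#1", datetime(2020, 3, 1)),   # old, but after the last publish
]
LAST_PUBLISHED = {"BlockPage#1": datetime(2020, 2, 1)}

TRUNCATABLE_IDS = [
    s.id
    for s in select_truncatable(SNAPSHOTS, LAST_PUBLISHED, NOW, timedelta(days=30))
]
print(TRUNCATABLE_IDS)  # [1]
```

Only snapshot 1 is selected here. Snapshots 3 and 4 are older than the threshold but still represent unpublished changes, which is the case a purely date-based filter would get wrong.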