Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unit-state.db grows too large on some deployments #736

Open
xmkg opened this issue Sep 20, 2022 · 2 comments
Open

unit-state.db grows too large on some deployments #736

xmkg opened this issue Sep 20, 2022 · 2 comments

Comments

@xmkg
Copy link
Contributor

xmkg commented Sep 20, 2022

The size of unit-state.db is growing too large (>=10GiB) on some deployments, causing out-of-space issues.

I've prepared a script to generate a summary of the database, which can be found in the attachments. The script summarizes the kv_revisions and hooks tables and runs sqlite3_analyzer over the database.
py-sqlite-analyzer.zip
I ran it over a large DB (13GiB), the output is as follows:

- summary of kv_revisions table - 
key              count(data)  AVG(length(data))  MAX(revision)
---------------  -----------  -----------------  -------------
charm_revisions  1            7.0                1            
env              189256       69926.7616138986   604476       
relid            164          13.0853658536585   579128       
unit             1            21.0               1           
- summary of hooks table -  
hook                   count(version)  min(date)                   max(date)                 
---------------------  --------------  --------------------------  --------------------------
amqp-relation-changed  117             2020-12-03T04:46:07.148879  2022-08-29T06:43:19.686370
amqp-relation-joined   9               2020-12-03T04:43:42.104154  2020-12-03T04:53:22.280197
ceph-access-relation-  30              2020-12-03T04:45:49.334308  2021-02-12T05:38:05.756322
ceph-access-relation-  24              2020-12-03T04:45:04.749447  2021-02-12T05:37:37.780087
ceph-relation-changed  137             2020-12-03T04:51:37.991545  2021-06-25T04:26:48.593918
ceph-relation-joined   12              2020-12-03T04:45:36.308443  2020-12-03T04:56:51.483697
cloud-compute-relatio  780             2020-12-03T04:54:55.135751  2022-08-29T06:43:36.708740
cloud-compute-relatio  9               2020-12-03T04:44:46.350884  2020-12-03T05:02:08.888557
compute-peer-relation  120             2020-12-03T04:38:06.043974  2021-06-25T03:57:23.200963
config-changed         106             2020-12-03T04:36:59.939021  2022-08-29T05:31:02.002973
image-service-relatio  12              2020-12-03T04:59:59.193568  2020-12-03T05:02:45.174421
install                3               2020-12-03T04:27:44.370234  2020-12-03T04:27:45.155717
neutron-plugin-relati  4               2020-12-03T04:41:51.152141  2020-12-03T04:42:11.740906
neutron-plugin-relati  3               2020-12-03T04:40:41.837813  2020-12-03T04:40:43.026942
nova-ceilometer-relat  4               2020-12-03T04:46:22.493887  2020-12-03T04:46:25.806789
nova-ceilometer-relat  3               2020-12-03T04:45:17.648627  2020-12-03T04:45:18.930743
nrpe-external-master-  3               2020-12-03T04:48:18.677064  2020-12-03T04:48:20.081757
nrpe-external-master-  3               2020-12-03T04:47:07.565641  2020-12-03T04:47:08.983417
secrets-storage-relat  30              2020-12-03T04:40:07.464698  2021-10-04T23:44:07.813179
secrets-storage-relat  3               2021-10-05T02:05:45.316307  2021-10-05T02:05:47.516711
secrets-storage-relat  12              2020-12-03T04:38:43.269038  2021-10-04T23:22:20.167861
start                  3               2020-12-03T04:37:27.040447  2020-12-03T04:37:28.201241
update-status          603045          2020-12-03T05:14:04.185822  2022-09-20T06:58:23.713675
upgrade-charm          7               2021-02-12T01:37:42.459655  2022-05-17T09:12:44.763132
/** Disk-Space Utilization Report For /var/lib/juju/agents/unit-nova-compute-kvm-20/charm/.unit-state.db

Page size in bytes ............................................. 4096
Pages in the whole file (measured) .......................... 3249243
Pages in the whole file (calculated) ........................ 3249242
Pages that store data ....................................... 3249242   100.00%
Pages on the freelist (per header) ................................ 0
Pages on the freelist (calculated) ................................ 1
Pages of auto-vacuum overhead ..................................... 0     0.00%
Number of tables in the database .................................. 5
Number of indices ................................................. 2
Number of defined indices ......................................... 0
Number of implied indices ......................................... 2
Size of the file in bytes: .............................. 13308899328
Bytes of user payload stored ............................ 13262405906    99.65%

*** Page counts for all tables with their indices *****************************

KV_REVISIONS ................................................ 3241918    99.77%
HOOKS .......................................................... 7303     0.22%
KV ............................................................... 19     0.00%
SQLITE_MASTER ..................................................... 1     0.00%
SQLITE_SEQUENCE ................................................... 1     0.00%

*** Page counts for all tables and indices separately *************************

KV_REVISIONS ................................................ 3241082    99.75%
HOOKS .......................................................... 7303     0.22%
SQLITE_AUTOINDEX_KV_REVISIONS_1 ................................. 836     0.03%
KV ............................................................... 18     0.00%
SQLITE_SEQUENCE ................................................... 1     0.00%
SQLITE_MASTER ..................................................... 1     0.00%
SQLITE_AUTOINDEX_KV_1 ............................................. 1     0.00%

*** All tables and indices ****************************************************

Percentage of total database ................................... 100%
Number of entries ............................................ 983346
Bytes of storage consumed ............................... 13308895232
Bytes of payload ........................................ 13264825620    99.67%
Bytes of metadata .......................................... 31951265     0.24%
Average payload per entry .................................. 13489.48
Average unused bytes per entry ................................ 24.64
Average metadata per entry .................................... 32.49
Non-sequential pages ........................................... 8112     0.25%
Maximum payload per entry ..................................... 70169
Entries that use overflow .................................... 189257    19.25%
Index pages used ................................................. 85
Primary pages used ............................................ 31788
Overflow pages used ......................................... 3217369
Total pages used ............................................ 3249242
Unused bytes on index pages ................................... 54415    15.63%
Unused bytes on primary pages ............................... 2518664     1.93%
Unused bytes on overflow pages ............................. 21657716     0.16%
Unused bytes on all pages .................................. 24230795     0.18%


*** All tables ****************************************************************

Percentage of total database ............................... 99.9742%
Number of entries ............................................ 793916
Bytes of storage consumed ............................... 13305466880
Bytes of payload ........................................ 13262406588    99.68%
Bytes of metadata .......................................... 31372939     0.24%
Average payload per entry .................................. 16705.05
Average unused bytes per entry ................................ 29.98
Average metadata per entry .................................... 39.52
Non-sequential pages ........................................... 7283     0.22%
Maximum payload per entry ..................................... 70169
Entries that use overflow .................................... 189257    23.84%
Index pages used ................................................. 79
Primary pages used ............................................ 30957
Overflow pages used ......................................... 3217369
Total pages used ............................................ 3248405
Unused bytes on index pages ................................... 46316    14.31%
Unused bytes on primary pages ............................... 2095769     1.65%
Unused bytes on overflow pages ............................. 21657716     0.16%
Unused bytes on all pages .................................. 23799801     0.18%


*** All indices ***************************************************************

Percentage of total database ............................. 0.0257598%
Number of entries ............................................ 189430
Bytes of storage consumed ................................... 3428352
Bytes of payload ............................................ 2419032    70.56%
Bytes of metadata ............................................ 578326    16.87%
Average payload per entry ..................................... 12.77
Average unused bytes per entry ................................. 2.28
Average metadata per entry ..................................... 3.05
Non-sequential pages ............................................ 829    99.16%
Maximum payload per entry ........................................ 51
Entries that use overflow ......................................... 0     0.00%
Index pages used .................................................. 6
Primary pages used .............................................. 831
Overflow pages used ............................................... 0
Total pages used ................................................ 837
Unused bytes on index pages .................................... 8099    32.95%
Unused bytes on primary pages ................................ 422895    12.42%
Unused bytes on overflow pages .................................... 0     0.00%
Unused bytes on all pages .................................... 430994    12.57%

*** Table KV_REVISIONS and all its indices ************************************

Percentage of total database ............................... 99.7746%
Number of entries ............................................ 378844
Bytes of storage consumed ............................... 13278896128
Bytes of payload ........................................ 13238742293    99.70%
Bytes of metadata .......................................... 28216692     0.21%
Average payload per entry ................................... 34945.1
Average unused bytes per entry ................................ 63.48
Average metadata per entry .................................... 74.48
Non-sequential pages ............................................ 829     0.03%
Maximum payload per entry ..................................... 70169
Entries that use overflow .................................... 189256    49.96%
Index pages used ................................................. 66
Primary pages used ............................................ 24500
Overflow pages used ......................................... 3217352
Total pages used ............................................ 3241918
Unused bytes on index pages ................................... 42171    15.60%
Unused bytes on primary pages ............................... 2349745     2.34%
Unused bytes on overflow pages ............................. 21657611     0.16%
Unused bytes on all pages .................................. 24049527     0.18%


*** Table KV_REVISIONS w/o any indices ****************************************

Percentage of total database ............................... 99.7488%
Number of entries ............................................ 189422
Bytes of storage consumed ............................... 13275471872
Bytes of payload ........................................ 13236323402    99.71%
Bytes of metadata .......................................... 27638398     0.21%
B-tree depth ...................................................... 3
Average payload per entry .................................. 69877.43
Average unused bytes per entry ............................... 124.71
Average metadata per entry ................................... 145.91
Non-sequential pages .............................................. 0     0.00%
Maximum payload per entry ..................................... 70169
Entries that use overflow .................................... 189256    99.91%
Index pages used ................................................. 60
Primary pages used ............................................ 23670
Overflow pages used ......................................... 3217352
Total pages used ............................................ 3241082
Unused bytes on index pages ................................... 34072    13.86%
Unused bytes on primary pages ............................... 1930773     1.99%
Unused bytes on overflow pages ............................. 21657611     0.16%
Unused bytes on all pages .................................. 23622456     0.18%

*** Index KV_REVISIONS of table SQLITE_AUTOINDEX_KV_REVISIONS_1 ***************

Percentage of total database ............................. 0.0257291%
Number of entries ............................................ 189422
Bytes of storage consumed ................................... 3424256
Bytes of payload ............................................ 2418891    70.64%
Bytes of metadata ............................................ 578294    16.89%
B-tree depth ...................................................... 3
Average payload per entry ..................................... 12.77
Average unused bytes per entry ................................. 2.25
Average metadata per entry ..................................... 3.05
Non-sequential pages ............................................ 829    99.28%
Maximum payload per entry ........................................ 19
Entries that use overflow ......................................... 0     0.00%
Index pages used .................................................. 6
Primary pages used .............................................. 830
Overflow pages used ............................................... 0
Total pages used ................................................ 836
Unused bytes on index pages .................................... 8099    32.95%
Unused bytes on primary pages ................................ 418972    12.32%
Unused bytes on overflow pages .................................... 0     0.00%
Unused bytes on all pages .................................... 427071    12.47%


*** Table HOOKS ***************************************************************

Percentage of total database ............................... 0.22476%
Number of entries ............................................ 604479
Bytes of storage consumed .................................. 29913088
Bytes of payload ........................................... 26012267    86.96%
Bytes of metadata ........................................... 3734216    12.48%
B-tree depth ...................................................... 3
Average payload per entry ..................................... 43.03
Average unused bytes per entry ................................. 0.28
Average metadata per entry ..................................... 6.18
Non-sequential pages ........................................... 7283    99.74%
Maximum payload per entry ........................................ 67
Entries that use overflow ......................................... 0     0.00%
Index pages used ................................................. 19
Primary pages used ............................................. 7284
Overflow pages used ............................................... 0
Total pages used ............................................... 7303
Unused bytes on index pages ................................... 12244    15.73%
Unused bytes on primary pages ................................ 154361     0.52%
Unused bytes on overflow pages .................................... 0     0.00%
Unused bytes on all pages .................................... 166605     0.56%


*** Table KV and all its indices **********************************************

Percentage of total database ........................... 0.000584752%
Number of entries ................................................ 16
Bytes of storage consumed ..................................... 77824
Bytes of payload .............................................. 70367    90.42%
Bytes of metadata ............................................... 210     0.27%
Average payload per entry ................................... 4397.94
Average unused bytes per entry ............................... 456.94
Average metadata per entry .................................... 13.12
Non-sequential pages .............................................. 0     0.00%
Maximum payload per entry ..................................... 69948
Entries that use overflow ......................................... 1     6.25%
Primary pages used ................................................ 2
Overflow pages used .............................................. 17
Total pages used ................................................. 19
Unused bytes on primary pages .................................. 7206    87.96%
Unused bytes on overflow pages .................................. 105     0.15%
Unused bytes on all pages ...................................... 7311     9.39%


*** Table KV w/o any indices **************************************************

Percentage of total database ........................... 0.000553975%
Number of entries ................................................. 8
Bytes of storage consumed ..................................... 73728
Bytes of payload .............................................. 70226    95.25%
Bytes of metadata ............................................... 178     0.24%
B-tree depth ...................................................... 1
Average payload per entry ................................... 8778.25
Average unused bytes per entry ................................ 423.5
Average metadata per entry .................................... 22.25
Non-sequential pages .............................................. 0     0.00%
Maximum payload per entry ..................................... 69948
Entries that use overflow ......................................... 1    12.50%
Primary pages used ................................................ 1
Overflow pages used .............................................. 17
Total pages used ................................................. 18
Unused bytes on primary pages .................................. 3283    80.15%
Unused bytes on overflow pages .................................. 105     0.15%
Unused bytes on all pages ...................................... 3388     4.60%

*** Index KV of table SQLITE_AUTOINDEX_KV_1 ***********************************

Percentage of total database ........................... 3.07764e-05%
Number of entries ................................................. 8
Bytes of storage consumed ...................................... 4096
Bytes of payload ................................................ 141     3.44%
Bytes of metadata ................................................ 32     0.78%
B-tree depth ...................................................... 1
Average payload per entry ..................................... 17.62
Average unused bytes per entry ............................... 490.38
Average metadata per entry ...................................... 4.0
Maximum payload per entry ........................................ 51
Entries that use overflow ......................................... 0     0.00%
Primary pages used ................................................ 1
Overflow pages used ............................................... 0
Total pages used .................................................. 1
Unused bytes on primary pages .................................. 3923    95.78%
Unused bytes on overflow pages .................................... 0     0.00%
Unused bytes on all pages ...................................... 3923    95.78%


*** Table SQLITE_SEQUENCE *****************************************************

Percentage of total database ........................... 3.07764e-05%
Number of entries ................................................. 1
Bytes of storage consumed ...................................... 4096
Bytes of payload ................................................. 11     0.27%
Bytes of metadata ................................................ 12     0.29%
B-tree depth ...................................................... 1
Average payload per entry ...................................... 11.0
Average unused bytes per entry ............................... 4073.0
Average metadata per entry ..................................... 12.0
Maximum payload per entry ........................................ 11
Entries that use overflow ......................................... 0     0.00%
Primary pages used ................................................ 1
Overflow pages used ............................................... 0
Total pages used .................................................. 1
Unused bytes on primary pages .................................. 4073    99.44%
Unused bytes on overflow pages .................................... 0     0.00%
Unused bytes on all pages ...................................... 4073    99.44%


*** Table SQLITE_MASTER *******************************************************

Percentage of total database ........................... 3.07764e-05%
Number of entries ................................................. 6
Bytes of storage consumed ...................................... 4096
Bytes of payload ................................................ 682    16.65%
Bytes of metadata ............................................... 135     3.30%
B-tree depth ...................................................... 1
Average payload per entry .................................... 113.67
Average unused bytes per entry ................................ 546.5
Average metadata per entry ..................................... 22.5
Maximum payload per entry ....................................... 208
Entries that use overflow ......................................... 0     0.00%
Primary pages used ................................................ 1
Overflow pages used ............................................... 0
Total pages used .................................................. 1
Unused bytes on primary pages .................................. 3279    80.05%
Unused bytes on overflow pages .................................... 0     0.00%
Unused bytes on all pages ...................................... 3279    80.05%

... trimmed the description part ...

To summarize the output above; the kv_revisions table is taking %99.8 of the space, and the env variable revisions account for nearly all the rows in the kv_revisions table. In this specific environment, the JSON-serialized list of environment variables is nearly ~70KiB in size, and on each hook invocation, this 70 KiB of data is being pushed into the kv_revisions table. So even if the charm is standing idle, the update-status hook will be invoked every 5 minutes, and this solely will produce 7 GiB of data in a year, and this is for a single charm. This is obviously bad and causes out-of-disk-space issues for some deployments.

I've reviewed the code that produces the revisions, and it is not blindly pushing all environment variables to the database. It checks whether the value has changed before pushing it as a revision. But in the environment variables scenario, some variables like JUJU_CONTEXT_ID constantly change on each hook invocation. So, env gets pushed as a revision no matter what.

To eliminate this issue, I have several ideas at hand:

1-) Implement a policy to keep last N revisions only
2-) Limit hooks & kv_revisions table max row count to N rows at most
3-) Exclude update-status from hooks & kv_revisions
4-) Rotate the database periodically (e.g. daily/weekly/monthly), compress & store the old ones
5-) Store only the delta for env
6-) Do not keep revisions for env at all?
7-) ... any other ideas?

Thanks in advance.

xmkg added a commit to xmkg/charm-helpers that referenced this issue Sep 21, 2022
The unitdata code keeps environment variable revisions in kv_revisions
table, and all of the environment variables serialized into JSON and
pushed into the table on each hook invocation. There is a check to see
if current env variable set is same with the previous one, but some of
the environment variables (e.g. JUJU_CONTEXT_ID) is always changing, which
means whole env variable list is serialized and pushed to the DB on every
single hook invocation. That has become a problem in deployments that
has very large (e.g. ~70 KiB) environment variable lists. Given that
update-status is running periodically (every 5 mins by default) and
having many charms in the same environment, the disk space of the host
running the charms runs out over time and needs manual intervention to
clear up kv_revisions table.

This fix aims to avoid that by keeping only the delta of environment
variable list as a revision. Note that this is a breaking change and
will break any charms that are relying on `kv_revisions` env values

Closes-Bug: LP#1930173
xmkg added a commit to xmkg/charm-helpers that referenced this issue Sep 21, 2022
The unitdata code keeps environment variable revisions in kv_revisions
table, and all of the environment variables serialized into JSON and
pushed into the table on each hook invocation. There is a check to see
if current env variable set is same with the previous one, but some of
the environment variables (e.g. JUJU_CONTEXT_ID) is always changing, which
means whole env variable list is serialized and pushed to the DB on every
single hook invocation. That has become a problem in deployments that
has very large (e.g. ~70 KiB) environment variable lists. Given that
update-status is running periodically (every 5 mins by default) and
having many charms in the same environment, the disk space of the host
running the charms runs out over time and needs manual intervention to
clear up kv_revisions table.

This fix aims to avoid that by keeping only the delta of environment
variable list as a revision. Note that this is a breaking change and
will break any charms that are relying on `env` value's layout on
returned from `gethistory()` function.

Closes-Bug: LP#1930173
xmkg added a commit to xmkg/charm-helpers that referenced this issue Sep 21, 2022
The unitdata code keeps environment variable revisions in kv_revisions
table, and all of the environment variables serialized into JSON and
pushed into the table on each hook invocation. There is a check to see
if current env variable set is same with the previous one, but some of
the environment variables (e.g. JUJU_CONTEXT_ID) is always changing, which
means whole env variable list is serialized and pushed to the DB on every
single hook invocation. That has become a problem in deployments that
has very large (e.g. ~70 KiB) environment variable lists. Given that
update-status is running periodically (every 5 mins by default) and
having many charms in the same environment, the disk space of the host
running the charms runs out over time and needs manual intervention to
clear up kv_revisions table.

This fix aims to avoid that by keeping only the delta of environment
variable list as a revision. Note that this is a breaking change and
will break any charms that are relying on `env` value's layout on
returned from `gethistory()` function.

Closes-Bug: LP#1930173
xmkg added a commit to xmkg/charm-helpers that referenced this issue Sep 21, 2022
The unitdata code keeps variable revisions in kv_revisions table, and
as a part of that all of the environment variables serialized into JSON and
pushed into the table if current set of environment variables differ from the
previous ones. But the code considers environment variable list as a whole and
even if a single environment variable changes, whole list gets pushed as a
revision. Some of the environment variables (e.g. JUJU_CONTEXT_ID) is always
changing, which means whole env variable list is serialized and pushed to
the DB on every single hook invocation. This has become a problem in deployments
with very large (e.g. ~70 KiB) environment variable lists. Given that
update-status is running periodically (every 5 mins by default) and
many charms running in the same environment, the disk space of the host
running the charms runs out over time, raising a need of manual intervention
to clear up the kv_revisions table.

This fix aims to avoid that by keeping only the delta of environment
variable list as a revision. As a result, only the changed environment
variables will be logged into the kv_revisions table. Note that this is a
breaking change and will break any charms that are relying on `env`
value's layout on returned from `gethistory()` function.

Closes-Bug: LP#1930173
@ajkavanagh
Copy link
Contributor

This is an interesting bug report, and I didn't even realise that the unit db kept this data. I wonder why? I can understand keeping data for the main hooks, but update-status seems like the odd one out. I wonder if the solution is to not (by default) log env for update-status?

@xmkg
Copy link
Contributor Author

xmkg commented Sep 22, 2022

I wonder if the solution is to not (by default) log env for update-status?

That would also work, but the bug will resurface if other hooks are called often enough. I have no idea why this data is being kept or whether changing the current behavior would break any downstream charms or not. I tried to locate any charms that rely on gethistory() by brute-force searching on GitHub and OpenDev, but I failed to find any:

GitHub
OpenDev

It would be great if anybody familiar with this particular part of the code base chime in and enlightens us about the rationale and use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants