Add long running log store tests. #419

sanebay · 2024-05-13T23:32:16Z

Changes.

Fix the truncation in corner cases.
Add a long-running test for logstore, a stripped-down version of test_log_store. It issues bursts of requests to all logstores and exercises truncate, restart, adding/removing logstores and logdevs, rollbacks, etc.
Enable logstore test except the parallel truncation UT.

More details.
Journalvdev maintains a list of chunks to store all the log entries. New log entries are appended to the last chunk in the list (the right side, at the tail offset), and truncation is applied to the head of the chunk list (the left side, at the data start offset). When we append log entries and the last chunk does not have enough space, we create a new chunk and append it to the list, so a log group (a batch of log entries) never spans chunks. As a result there can be a hole at the end of a chunk. Its start is marked by end_of_chunk in the chunk private data, and it extends from end_of_chunk to (chunk_start + chunk_size). When a read reaches this hole, there is no data, so we skip to the next chunk. Similarly, if the truncation point falls in that hole, we release the whole chunk, move to the next chunk, and set data_start_offset to the start of the next chunk.
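The behaviour described above can be illustrated with a minimal sketch. It is a simplified model, not the code in journal_vdev.cpp: `JournalModel`, `Chunk`, `append`, `truncate`, and `num_chunks` are hypothetical names, all chunks are assumed to have the same size, and offsets are plain logical byte offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, simplified model of the chunk list; not HomeStore's real classes.
struct Chunk {
    uint64_t start;        // logical offset of the first byte of the chunk
    uint64_t end_of_chunk; // end of valid data; [end_of_chunk, start + chunk_size) is the hole
};

class JournalModel {
public:
    explicit JournalModel(uint64_t chunk_size) : m_chunk_size{chunk_size} {}

    // Append one log group of `len` bytes and return the offset it was written at.
    uint64_t append(uint64_t len) {
        assert(len > 0 && len <= m_chunk_size);
        if (m_chunks.empty() || m_chunks.back().end_of_chunk + len > m_chunks.back().start + m_chunk_size) {
            // Log groups never span chunks: leave a hole and open a new tail chunk.
            const uint64_t start = m_chunks.empty() ? 0 : m_chunks.back().start + m_chunk_size;
            m_chunks.push_back(Chunk{start, start});
        }
        Chunk& tail = m_chunks.back();
        const uint64_t offset = tail.end_of_chunk;
        tail.end_of_chunk += len;
        return offset;
    }

    // Truncate all data before `offset` and return the resulting data start offset.
    uint64_t truncate(uint64_t offset) {
        if (m_chunks.empty() || offset <= m_data_start) { return m_data_start; }
        // Release every head chunk whose valid data lies entirely before `offset`.
        while (m_chunks.size() > 1 && offset >= m_chunks.front().end_of_chunk) {
            m_chunks.pop_front();
            // If the offset fell in the hole, the data start jumps to the next chunk start.
            if (offset < m_chunks.front().start) { offset = m_chunks.front().start; }
        }
        // Never move past the data actually written to the last remaining chunk.
        if (offset > m_chunks.front().end_of_chunk) { offset = m_chunks.front().end_of_chunk; }
        m_data_start = offset;
        return m_data_start;
    }

    std::size_t num_chunks() const { return m_chunks.size(); }

private:
    uint64_t m_chunk_size;
    uint64_t m_data_start{0};
    std::deque< Chunk > m_chunks;
};
```

In this model, with a chunk size of 100, appending log groups of 60, 60, 30, and 20 bytes places them at offsets 0, 100, 160, and 200. A truncate to offset 75 lands in the first chunk's hole [60, 100), so the first chunk is released and the returned data start offset is 100.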

src/lib/device/journal_vdev.cpp

yamingk · 2024-06-07T18:33:09Z

We also need to add a flip point where we add new chunks to the journal_vdev (the place where we persist the private data), think through what will happen on reboot, and verify it with a test case.

yamingk · 2024-06-07T18:36:46Z

Ideally we should have a long-running test, at either the log store level or the journal vdev level, that truncates periodically for a few hours and makes sure everything runs fine.
Another test case: run the journal or logstore test for 30 minutes (with truncation happening more frequently), do a clean shutdown, run for another 15 minutes, and repeat the clean shutdown multiple times.

We can review these, see if there are any comments from others, and create issues (they do not necessarily need to be included in this PR).

sanebay · 2024-06-10T23:28:10Z

> Ideally we should have a long-running test, at either the log store level or the journal vdev level, that truncates periodically for a few hours and makes sure everything runs fine. Another test case: run the journal or logstore test for 30 minutes (with truncation happening more frequently), do a clean shutdown, run for another 15 minutes, and repeat the clean shutdown multiple times.
>
> We can review these, see if there are any comments from others, and create issues (they do not necessarily need to be included in this PR).

test_log_store_long_run.cpp does what you mentioned. I will have to run it on 85 namespace.

sanebay · 2024-06-10T23:28:34Z

> We also need to add a flip point where we add new chunks to the journal_vdev (the place where we persist the private data), think through what will happen on reboot, and verify it with a test case.

Added this in the doc.

yamingk · 2024-06-11T00:28:57Z

> > Ideally we should have a long-running test, at either the log store level or the journal vdev level, that truncates periodically for a few hours and makes sure everything runs fine. Another test case: run the journal or logstore test for 30 minutes (with truncation happening more frequently), do a clean shutdown, run for another 15 minutes, and repeat the clean shutdown multiple times.
> > We can review these, see if there are any comments from others, and create issues (they do not necessarily need to be included in this PR).
>
> test_log_store_long_run.cpp does what you mentioned. I will have to run it on 85 namespace.

Right, is the truncation point randomized?

yamingk · 2024-06-11T00:31:35Z

> > We also need to add a flip point where we add new chunks to the journal_vdev (the place where we persist the private data), think through what will happen on reboot, and verify it with a test case.
>
> Added this in the doc.

Better to create an issue; otherwise we will lose track of which items are already done and which are still todos.

sanebay · 2024-06-11T00:40:49Z

> > > Ideally we should have a long-running test, at either the log store level or the journal vdev level, that truncates periodically for a few hours and makes sure everything runs fine. Another test case: run the journal or logstore test for 30 minutes (with truncation happening more frequently), do a clean shutdown, run for another 15 minutes, and repeat the clean shutdown multiple times.
> > > We can review these, see if there are any comments from others, and create issues (they do not necessarily need to be included in this PR).
> >
> > test_log_store_long_run.cpp does what you mentioned. I will have to run it on 85 namespace.
>
> Right, is the truncation point randomized?

Yes, see test_log_store_long_run.cpp:459.

sanebay · 2024-06-11T22:14:57Z

Created a ticket to use version instead of created time (#441)

Fix truncation issues in boundary cases. Release chunks if a truncate crosses end-of-chunk boundaries. Enable the logstore tests except the parallel write and truncate test case. Truncate can move the data start to the next chunk's start offset, so change the truncate API to return that offset.

sanebay force-pushed the logstore_long_running branch from 6f1a673 to 0db6b23 May 15, 2024 19:39

sanebay requested review from yamingk and hkadayam May 15, 2024 19:41

sanebay linked an issue May 16, 2024 that may be closed by this pull request

Logstore long duration testing with multiple logdevs #301

Closed

sanebay force-pushed the logstore_long_running branch from 0db6b23 to 54dd65e May 21, 2024 20:30

yamingk added this to the MileStone4.2 milestone May 21, 2024

shosseinimotlagh assigned shosseinimotlagh and unassigned shosseinimotlagh May 29, 2024

yamingk reviewed May 29, 2024

src/lib/device/journal_vdev.cpp (outdated)

yamingk reviewed May 29, 2024
