
Add batching in UT to support a million IOs for baseline test #510

Merged (1 commit) on Aug 26, 2024


sanebay (Contributor) commented Aug 16, 2024

Add batching of read and write values in the UT.
Keep a mapping from an index to key-value pairs for quick lookup during snapshot reads.
Add shutdown and start to simulate the sequence: follower goes down, writes complete, follower starts. Earlier we restarted with a sleep, but for 1 million IOs (or other values of num_io) we don't know how long to sleep.
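A minimal C++ sketch of the first two items, assuming a purely test-side tracker. The names `KvBatchTracker`, `write_in_batches`, `read_batch`, and the `batch_size` parameter are hypothetical illustrations, not code from this PR or HomeStore APIs. Writes are issued in bounded batches so the number of in-flight requests stays fixed however large `num_io` is, and each write's index maps to its key/value pair so a snapshot reader can locate a batch with `std::map::lower_bound` instead of scanning.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical test-side bookkeeping (not HomeStore code): log index -> (key, value).
class KvBatchTracker {
public:
    using Kv = std::pair<uint64_t, std::string>;

    // Record one write at the next index and return that index.
    uint64_t record(uint64_t key, std::string value) {
        uint64_t idx = ++last_index_;
        index_to_kv_.emplace(idx, Kv{key, std::move(value)});
        return idx;
    }

    // Point lookup: what was written at log index idx, if anything.
    std::optional<Kv> at(uint64_t idx) const {
        auto it = index_to_kv_.find(idx);
        if (it == index_to_kv_.end()) return std::nullopt;
        return it->second;
    }

    // Up to max_count entries starting at index `from`, as a snapshot reader
    // might fetch one batch to send to a follower.
    std::vector<Kv> read_batch(uint64_t from, uint64_t max_count) const {
        std::vector<Kv> out;
        for (auto it = index_to_kv_.lower_bound(from);
             it != index_to_kv_.end() && out.size() < max_count; ++it) {
            out.push_back(it->second);
        }
        return out;
    }

    uint64_t last_index() const { return last_index_; }

private:
    uint64_t last_index_{0};
    std::map<uint64_t, Kv> index_to_kv_;
};

// Issue num_io writes in batches of at most batch_size and return the batch count.
// A real test would submit each batch and wait for it before starting the next,
// which caps in-flight requests no matter how large num_io is.
uint64_t write_in_batches(KvBatchTracker& tracker, uint64_t num_io, uint64_t batch_size) {
    assert(batch_size > 0);
    uint64_t batches = 0;
    for (uint64_t done = 0; done < num_io; done += batch_size) {
        uint64_t n = std::min(batch_size, num_io - done);
        for (uint64_t i = 0; i < n; ++i) {
            uint64_t key = done + i;
            tracker.record(key, "value-" + std::to_string(key));
        }
        ++batches;  // placeholder for "wait for this batch to complete"
    }
    return batches;
}
```

With `num_io = 10` and `batch_size = 4`, this issues batches of 4, 4, and 2 writes.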

To test:
./test_raft_repl_dev --gtest_filter=RaftReplDevTest.BaselineTest --num_io=1000000 --log_mods=replication:debug --config_path ./config --dev_size_mb=2048600 --snapshot_distance=0

xiaoxichen (Collaborator) commented:

I think we can have one general test case for both baseline and incremental resync. The pattern is similar; the only difference is the size of the gap X:
X > #snapshot_to_keep * snapshot_distance ? mode = baseline : mode = incremental
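The mode rule can be written as a small predicate. This sketch takes the comment's expression at face value, reading `snapshots_to_keep * snapshot_distance` as roughly the span of log the leader still retains. The parameters are plain integers here and the function names are hypothetical, not the HomeStore configuration API.

```cpp
#include <cstdint>
#include <string_view>

enum class ResyncMode { Incremental, Baseline };

// Follows the comment's rule: if the follower missed more requests (X) than
// snapshots_to_keep * snapshot_distance, expect a baseline resync; otherwise
// expect it to catch up incrementally from the log.
ResyncMode expected_resync_mode(uint64_t missed_reqs, uint64_t snapshots_to_keep,
                                uint64_t snapshot_distance) {
    return missed_reqs > snapshots_to_keep * snapshot_distance ? ResyncMode::Baseline
                                                               : ResyncMode::Incremental;
}

std::string_view mode_name(ResyncMode mode) {
    return mode == ResyncMode::Baseline ? "baseline" : "incremental";
}
```

Choosing X just below or just above the product lets the same loop drive either resync path.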

while (#IO < TARGET_IOS_TO_RUN) {
  1. write some data
  2. shut down one replica
  3. write X reqs
  4. start up the down replica
  5. [optional] keep writing during startup
  6. wait for the replica to sync
}
// Let's also have a huge one:
erase the data on one replica (maybe go through format again?) and let it resync from the beginning.
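A sketch of the proposed loop against a hypothetical in-memory `FakeCluster`: one leader counter and one follower that applies entries on its own thread. None of these names are HomeStore APIs. The point it illustrates is step 6, which is also the sleep problem from the PR description: instead of sleeping for a guessed duration, the test blocks on a `std::condition_variable` until the follower's applied index reaches the leader's, with a timeout only as a safety net. Step 5 (writing during startup) is omitted for brevity.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a leader plus one follower; not a HomeStore API.
class FakeCluster {
public:
    ~FakeCluster() { shutdown_follower(); }

    // Leader appends n entries and wakes the follower (if it is up).
    void write(uint64_t n) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            leader_ += n;
        }
        cv_.notify_all();
    }

    // Simulates the follower going down: its apply thread exits.
    void shutdown_follower() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            follower_up_ = false;
        }
        cv_.notify_all();
        if (apply_thread_.joinable()) apply_thread_.join();
    }

    // Simulates the follower starting: it applies entries until caught up,
    // then waits for more writes.
    void start_follower() {
        shutdown_follower();  // make sure no previous apply thread is running
        {
            std::lock_guard<std::mutex> lk(mtx_);
            follower_up_ = true;
        }
        apply_thread_ = std::thread([this] {
            std::unique_lock<std::mutex> lk(mtx_);
            for (;;) {
                cv_.wait(lk, [this] { return !follower_up_ || applied_ < leader_; });
                if (!follower_up_) return;
                ++applied_;  // apply one log entry
                lk.unlock();
                cv_.notify_all();  // wake anyone blocked in wait_synced()
                lk.lock();
            }
        });
    }

    // Step 6: block until the follower has caught up, or give up after timeout.
    bool wait_synced(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(mtx_);
        return cv_.wait_for(lk, timeout, [this] { return applied_ >= leader_; });
    }

    uint64_t leader_index() const {
        std::lock_guard<std::mutex> lk(mtx_);
        return leader_;
    }

    uint64_t follower_index() const {
        std::lock_guard<std::mutex> lk(mtx_);
        return applied_;
    }

private:
    mutable std::mutex mtx_;
    std::condition_variable cv_;
    uint64_t leader_{0};
    uint64_t applied_{0};
    bool follower_up_{false};
    std::thread apply_thread_;
};

// The proposed loop; returns the number of recovery rounds, or -1 on a sync timeout.
int run_recovery_rounds(FakeCluster& c, uint64_t target_ios, uint64_t per_round, uint64_t missed) {
    int rounds = 0;
    c.start_follower();
    while (c.leader_index() < target_ios) {
        c.write(per_round);     // 1. write some data
        c.shutdown_follower();  // 2. shut down one replica
        c.write(missed);        // 3. write X reqs while it is down
        c.start_follower();     // 4. start up the down replica
        if (!c.wait_synced(std::chrono::seconds(10))) return -1;  // 6. wait for sync
        ++rounds;
    }
    return rounds;
}
```

Because each round ends only when the follower has actually caught up, the same loop works unchanged whether `missed` is 15 or a million.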

This needs a night-long run; hopefully we can exercise recovery many times rather than a single recovery with a big gap.

sanebay (Contributor, Author) commented Aug 19, 2024

> It needs a night for long running, hope we can exercise recovery more times vs exercise a recovery with big gap.

Yeah, this is a nice test case to add. We can add it to our SM long-running tests. The raft_repl UT is more of a functionality test, run with both small and large numbers of inputs.

xiaoxichen (Collaborator) commented:

In SM it is hard to ensure we go through incremental or baseline resync. I think it is quite a waste to generate 1M IOs and test only one recovery :)

sanebay force-pushed the baseline_test_batch branch from 334f941 to 272b1ca on August 22, 2024 20:04
yamingk (Contributor) commented Aug 26, 2024

> (Quotes xiaoxichen's comment above in full.)

In our HomeStore 4.x replication long-running test, we definitely should have a test case that runs recovery a few hundred times overnight (we also do this in the nublox 1.3 HomeStore long run), with I/Os running before and after each recovery, repeatedly rebooting one replica. Our goal is to "abuse" HomeStore so badly that it won't "abuse" us in production.

That being said, this can be done by someone else in another PR (create an issue to track it).

yamingk previously approved these changes Aug 26, 2024
sanebay force-pushed the baseline_test_batch branch from fd82772 to a7c2b9c on August 26, 2024 18:28
codecov-commenter commented:


Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 67.40%. Comparing base (1a0cef8) to head (a7c2b9c).
Report is 49 commits behind head on master.

Files | Patch % | Lines
--- | --- | ---
.../lib/replication/log_store/home_raft_log_store.cpp | 0.00% | 1 Missing ⚠️


Additional details and impacted files
@@             Coverage Diff             @@
##           master     #510       +/-   ##
===========================================
+ Coverage   56.51%   67.40%   +10.89%     
===========================================
  Files         108      109        +1     
  Lines       10300    10419      +119     
  Branches     1402     1398        -4     
===========================================
+ Hits         5821     7023     +1202     
+ Misses       3894     2717     -1177     
- Partials      585      679       +94     
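The diff's numbers are internally consistent. Codecov's coverage ratio is hits / (hits + misses + partials), and its default settings (`precision: 2`, `round: down`) truncate rather than round to nearest. The sketch below (function name hypothetical) reproduces the reported figures in hundredths of a percent using integer arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Coverage in hundredths of a percent. Ratio = hits / (hits + misses + partials);
// integer division truncates, matching Codecov's default precision: 2 / round: down.
int64_t coverage_hundredths(int64_t hits, int64_t misses, int64_t partials) {
    const int64_t lines = hits + misses + partials;
    assert(lines > 0);
    return hits * 10000 / lines;
}
```

For example, 7023 / 10419 is about 67.4057%, which would show as 67.41% under round-to-nearest; the reported 67.40% matches truncation, and 6740 - 5651 = 1089 gives the +10.89% delta.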


sanebay merged commit 7ab82d2 into eBay:master Aug 26, 2024
21 checks passed
sanebay deleted the baseline_test_batch branch August 26, 2024 22:58
5 participants