Default stress tests #19

Open
maethor opened this issue Nov 30, 2022 · 4 comments
Labels
good first issue Good for newcomers

Comments

@maethor
Collaborator

maethor commented Nov 30, 2022

For the first energizta.sh, my default stress test is too basic:

sleep 120
stress-ng -q --cpu 1
stress-ng -q --cpu 4
stress-ng -q --cpu 8

We need to cover other cases: memory, disk I/O, and maybe other types of CPU usage. stress-ng should be able to handle all of this.
Any help and advice would be appreciated.

But we also need to limit the number of tests. By default, each test runs for one minute: 20 seconds of warm-up, then 40 seconds of measurement. This can be discussed, but we should be careful that the full test suite doesn't run for 2 hours. I think we should limit the number of tests to 10 or 15. What do you think?
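
To make the time budget concrete, here is a small sketch of the arithmetic. It assumes the 20 s warm-up and 40 s measurement described above and that tests run back to back; the function name `total_runtime` is hypothetical and not part of energizta.sh.

```shell
#!/bin/sh
# Hypothetical helper: total wall-clock time of a test plan, in seconds.
# Assumes every test costs warm-up + measurement and tests run sequentially.
WARMUP=20    # seconds of warm-up per test (from the discussion above)
MEASURE=40   # seconds of measurement per test

total_runtime() {
    # $1 = number of tests in the plan
    echo $(( $1 * (WARMUP + MEASURE) ))
}

for n in 10 15 120; do
    echo "$n tests -> $(total_runtime "$n") s"
done
```

With these numbers, 15 tests take 900 s (15 minutes), while it would take 120 one-minute tests to reach the two-hour mark (7200 s).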

@github-benjamin-davy

Hello @maethor, with turbostress we focused on stressing the CPU and RAM, using several different options:

  • --cpu-load, which lets you define a target CPU load for the test. This is nice because the usual stress tests are more all-or-nothing.
  • --ipsec-mb, a stress test that performs cryptographic processing using advanced instructions like AVX-512 (this test is called ipsec later). We wanted to observe the impact of such instructions on power consumption.
  • --vm, a test that specifically exercises memory stress methods, so that we can observe the impact of memory-intensive workloads.
  • --maximize: in this test stress-ng launches different types of stressors (CPU, cache, memory, file) and sets them to the maximum settings allowed, in order to estimate a worst-case scenario. There is an even more aggressive setting, but it caused the VM to crash. I'm not sure it's needed anyway, because we want to estimate consumption under normal working conditions.
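
As a rough illustration, the options above could be combined into a plan like the one below. This is only a sketch: the worker counts, the 50% load target, the 60 s timeout, and the stressor mix passed alongside `--maximize` are assumptions, not the settings turbostress actually used, and `--ipsec-mb` may not be available on every stress-ng build. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: worker counts, load target and stressor mix are assumptions.
DURATION=60s

turbostress_like_plan() {
    cat <<EOF
stress-ng -q --cpu 4 --cpu-load 50 --timeout $DURATION
stress-ng -q --ipsec-mb 4 --timeout $DURATION
stress-ng -q --vm 4 --timeout $DURATION
stress-ng -q --cpu 1 --cache 1 --vm 1 --hdd 1 --maximize --timeout $DURATION
EOF
}

# Print the plan by default; set RUN=1 to actually execute each command.
turbostress_like_plan | while read -r cmd; do
    echo "$cmd"
    if [ "${RUN:-0}" = 1 ]; then
        $cmd
    fi
done
```

Printing by default makes it easy to review the commands before committing several minutes of machine time to them.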

There are also specific disk and network stress modes that we didn't investigate.

It would be interesting to check with a wattmeter how these different settings actually change the measurement, and what the main driver is (presumably CPU and memory, for hardware that doesn't include GPUs).

@da-ekchajzer
Contributor

da-ekchajzer commented Mar 8, 2023

With stress-ng

Looking at stress-ng options

I/O

 --iomix N
              start N workers that  perform  a  mix  of  sequential,  random  and  memory  mapped
              read/write  operations  as  well  as  forced  sync'ing  and  (if run as root) cache
              dropping.  Multiple child processes are spawned to all  share  a  single  file  and
              perform different I/O operations on the same file.
-i N, --io N
              start  N workers continuously calling [sync](https://manpages.ubuntu.com/manpages/bionic/man2/sync.2.html)(2) to commit buffer cache to disk.  This
              can be used in conjunction with the --hdd options.
-d N, --hdd N
              start N workers continually writing, reading  and  removing  temporary  files.  The
              default  mode is to stress test sequential writes and reads.  With the --aggressive
              option enabled without any --hdd-opts options the hdd stressor  will  work  through
              all the --hdd-opt options one by one to cover a range of I/O options.

CPU

   --getrandom N

RAM

stress-ng --vm 1 --vm-bytes 75% --vm-method all

Another interesting project

https://github.com/stressapptest/stressapptest

@maethor
Collaborator Author

maethor commented Mar 11, 2023

OK, I finally took the time to look at this. Thank you for your suggestions, @github-benjamin-davy and @da-ekchajzer.

I think for now we should be looking for generalist tests. I don't want to pinpoint specific use cases like crypto or random generation. And I absolutely want to test I/O.

So I played a little, and this is what I see:

  • --io alone is nice because it keeps the disk 100% busy, but generates no writes or reads.
  • With --io and --hdd we do not need more than 1 worker. For good measure I will use 2, but the stats do not seem to change even with 16 workers.
  • --hdd 2 alone seems to generate more writes than --io 2 --hdd 2.
  • --iomix seems able to cause a lot of load, so I will stress-test it with 1, 2, 4, 8…
  • --getrandom seems to load cpu_sys instead of cpu_user, so I will use it with 1, 2, 4, 8…
  • --memrate seems to generate a lot of power consumption in the DRAM section of RAPL, so I will stress-test it with 1, 2, 4, 8…
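
The observations above could translate into a plan along these lines. This is a sketch, not the actual energizta.sh test list: the 60 s timeout, the choice to keep all three disk variants, and the function name `default_plan` are assumptions.

```shell
#!/bin/sh
# Sketch of a test plan derived from the observations above.
# Timeout and the selection of disk variants are assumptions.
TIMEOUT=60s

default_plan() {
    # Disk: sync-only, write/read-only, and both combined (2 workers each).
    echo "stress-ng -q --io 2 --timeout $TIMEOUT"
    echo "stress-ng -q --hdd 2 --timeout $TIMEOUT"
    echo "stress-ng -q --io 2 --hdd 2 --timeout $TIMEOUT"
    # Mixed I/O, system-side CPU and memory bandwidth, scaled by worker count.
    for stressor in iomix getrandom memrate; do
        for workers in 1 2 4 8; do
            echo "stress-ng -q --$stressor $workers --timeout $TIMEOUT"
        done
    done
}

default_plan
echo "total: $(default_plan | grep -c .) tests"
```

This yields 3 disk tests plus 3 stressors at 4 worker counts each, 15 tests in total, which sits at the upper end of the 10 to 15 limit proposed at the top of the issue.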

What I am missing is a disk read test.

@da-ekchajzer
Contributor

Here are some tests that read recursively. They seem good for a read test, but will only affect some partitions…

--sysfs N
              start N workers that recursively read files from /sys (Linux only).  This may cause
              specific kernel drivers to emit messages into the kernel log.
--getdent N
              start N workers that recursively read directories /proc, /dev/, /tmp, /sys and /run
              using getdents and getdents64 (Linux only).
--procfs N
              start  N  workers  that  read  files  from  /proc  and  recursively read files from
              /proc/self (Linux only).
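
For a quick trial of these three stressors, a sketch like the following could be used. The default of 2 workers, the 60 s timeout, and the function name `read_plan` are assumptions.

```shell
#!/bin/sh
# Sketch: worker count and timeout defaults are assumptions.
read_plan() {
    # $1 = workers per stressor (default 2), $2 = timeout (default 60s)
    for stressor in sysfs getdent procfs; do
        echo "stress-ng -q --$stressor ${1:-2} --timeout ${2:-60s}"
    done
}

# Print the commands; pipe the output to sh to actually run them.
read_plan 2 60s
```

Since /sys and /proc are virtual filesystems served by the kernel, these stressors may well not show up as reads on a physical disk, so it is worth checking the disk statistics while they run.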
