forked from google/multichase
Multichase - a pointer chaser benchmark
Multiload - a superset of multichase which runs latency, memory bandwidth, and loaded-latency tests
1/ BUILD
- just type:
$ make
2/ INSTALL
- just run from the current directory, or copy multichase wherever you need it
3.1/ RUN Multichase
- To get help:
$ multichase -h
- By default, multichase performs a pointer chase through a 256MB array
with a stride size of 256 bytes for 2.5 seconds on a single
thread:
$ multichase
- Pointer chase through an array of 4MB with a stride size of 64 bytes:
$ multichase -m 4m -s 64
- Pointer chase through an array of 1GB for 10 seconds (-n is the number of 0.5 second samples):
$ multichase -m 1g -n 20
- Pointer chase through an array of 256KB with a stride size of 128 bytes on 2 threads.
Thread 0 accesses every 128th byte; thread 1 accesses every 128th byte, offset by
sizeof(void*) = 8 bytes on 64-bit architectures:
$ multichase -m 256k -s 128 -t 2
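To make the terms above concrete, here is a minimal Python sketch of a pointer chase. It is not multichase's implementation: it assumes one pointer per stride, a randomly ordered single cycle through all slots, and the per-thread offset of sizeof(void*) described above. The names slot_offsets, build_chain, chase and ns_per_hop are hypothetical, and in Python the measured time is dominated by interpreter overhead rather than memory latency.

```python
import random
import time

POINTER_SIZE = 8  # sizeof(void*) on a 64-bit architecture, as noted above


def slot_offsets(arena_bytes, stride, thread_id=0):
    """Byte offsets touched by one thread: one pointer per stride,
    shifted by thread_id * POINTER_SIZE (assumed layout)."""
    shift = thread_id * POINTER_SIZE
    return [base + shift for base in range(0, arena_bytes, stride)]


def build_chain(num_slots, seed=0):
    """Link all slots into one random cycle: nxt[i] is the slot visited after i."""
    order = list(range(num_slots))
    random.Random(seed).shuffle(order)
    nxt = [0] * num_slots
    for here, there in zip(order, order[1:] + order[:1]):
        nxt[here] = there
    return nxt


def chase(nxt, hops, start=0):
    """Follow the chain; every hop depends on the result of the previous one."""
    p = start
    for _ in range(hops):
        p = nxt[p]
    return p


def ns_per_hop(nxt, hops):
    """Average wall-clock time per dependent hop, in nanoseconds."""
    t0 = time.perf_counter_ns()
    chase(nxt, hops)
    return (time.perf_counter_ns() - t0) / hops


if __name__ == "__main__":
    # Mirrors "-m 256k -s 128 -t 2": 2048 slots per thread.
    offsets = slot_offsets(256 * 1024, 128, thread_id=1)
    chain = build_chain(len(offsets))
    print(offsets[:3], ns_per_hop(chain, 100_000))
```

Because each load address comes from the previous load, the hops cannot overlap, which is why the time per hop approximates access latency in a compiled implementation.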
3.2/ RUN Multiload
- Latency Only (simple pointer chase)
In this mode, Multiload can run any of the multichase commands above.
A "-c" chase argument other than chaseload can be given; if omitted, it defaults to "simple".
Using "-c chaseload" or any "-l" load argument selects a different test mode.
$ multiload
- Bandwidth Only
Multiload can run a memory bandwidth test using the "-l" load argument. The "-c" chase argument MUST NOT be used.
The command below runs 5 samples (~2.5 seconds each) on 16 threads, using the glibc memcpy() function
and a 512M buffer per thread.
$ multiload -n 5 -t 16 -m 512M -l memcpy-libc
- Loaded Latency
Multiload can run 1 pointer chaser thread on logical cpu0 with multiple memory bandwidth load threads.
The "-c chaseload" arg MUST be used. The "-l" arg MUST be used with one of the memory load arguments.
The command below runs 5 samples (~2.5 seconds each) on 16 threads (1 chase, 15 stream-sum bandwidth loads),
using a 512M buffer per thread. The chase thread uses a stride of 16.
$ multiload -s 16 -n 5 -t 16 -m 512M -c chaseload -l stream-sum
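The rules for combining "-c" and "-l" can be summarized as a small decision function. The Python sketch below is only one reading of the three modes described above, not multiload's actual argument parser; select_mode and thread_roles are hypothetical names, and the error cases reflect the MUST / MUST NOT wording in this section.

```python
def select_mode(chase=None, load=None):
    """Map the -c (chase) and -l (load) arguments to a test mode.

    None means the argument was not given on the command line.
    """
    if load is None:
        if chase == "chaseload":
            raise ValueError('"-c chaseload" requires a "-l" load argument')
        return "latency"  # chase defaults to "simple" when omitted
    if chase is None:
        return "bandwidth"
    if chase == "chaseload":
        return "loaded-latency"
    raise ValueError('"-c" must not be combined with "-l" unless it is chaseload')


def thread_roles(mode, threads):
    """Split the -t thread count into chase threads and load threads."""
    if mode == "loaded-latency":
        return {"chase": 1, "load": threads - 1}
    if mode == "bandwidth":
        return {"chase": 0, "load": threads}
    return {"chase": threads, "load": 0}


if __name__ == "__main__":
    mode = select_mode(chase="chaseload", load="stream-sum")
    print(mode, thread_roles(mode, 16))
```

For the loaded-latency command above, this yields 1 chase thread and 15 load threads, matching the description.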
3.3/ RUN Pingpong & fairness
- Pingpong: measure the latency of exchanging a cache line between cores.
To run, simply do:
$ pingpong -u
- Fairness: measure fairness with N threads competing to increment an atomic variable.
To run, simply do:
$ fairness
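The idea behind the fairness test can be illustrated with a short Python sketch. It is an assumption-laden model, not the fairness tool itself: Python has no user-level atomic increment, so a lock stands in for the atomic variable, and the min/max ratio used as a fairness figure is a hypothetical metric chosen for illustration. The names run_fairness and fairness_ratio are likewise hypothetical.

```python
import threading


def run_fairness(num_threads, total_increments):
    """Let num_threads compete to increment one shared counter.

    Returns how many increments each thread managed to perform.
    """
    counter = 0
    lock = threading.Lock()  # stands in for an atomic increment
    wins = [0] * num_threads

    def worker(tid):
        nonlocal counter
        while True:
            with lock:
                if counter >= total_increments:
                    return
                counter += 1
                wins[tid] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return wins


def fairness_ratio(wins):
    """Smallest share divided by largest share; 1.0 means perfectly even."""
    return min(wins) / max(wins) if max(wins) else 1.0


if __name__ == "__main__":
    wins = run_fairness(4, 100_000)
    print(wins, fairness_ratio(wins))
```

The per-thread counts vary from run to run; only their sum is fixed by construction.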