-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.PerfMeasurements
378 lines (253 loc) · 13.2 KB
/
README.PerfMeasurements
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
How to Run the TorFlow Performance Tools
I. Introduction
There are two main client-side performance measurement tools in TorFlow:
speedracer and buildtimes. Speedracer is meant for gathering average
stream capacity of nodes and buildtimes is meant for gathering
statistics on circuit construction speeds and success rates.
II. SpeedRacer
Speedracer functions by dividing the Tor network into groups of nodes of
similar advertised capacity and then fetching the same URL over and over
again via 2-hop circuits consisting of nodes in that group.
A. Configuring SpeedRacer
At the time of this writing, it has the following configuration
parameters at the top of its sourcefile in NetworkScanners/speedracer.py:
url = "http://svn.torproject.org/svn/tor/trunk/doc/design-paper/tor-design.pdf"
start_pct = 0
stop_pct = 78
pct_step = 3
count = 25
save_every = 5
The URL you specify should be big enough to amortize out TCP slow-start.
Shoot for somewhere between 200k-1M. The tor-design.pdf may actually be
a little on the small side to properly measure capacities of faster
nodes.
start_pct and stop_pct are the start and stop points for the run, in
terms of the rankings of nodes by their bandwidth. Lower percentiles
are faster.
pct_step is the size of the slices in percentile units.
count is the number of URL fetches to do for each slice.
save_every is used for saving incremental results to check for
convergence. Results will be saved after each multiple of 'save_every'
fetches. The incremental results are cumulative.
B. Running SpeedRacer
Like soat, speedracer should be given its own Tor that is not performing
any other stream activity. It will also require Tor 0.2.1.13 (r18556) or
later.
First, start up tor:
# ~/src/tor-trunk/src/or/tor -f ~/src/torflow-trunk/torrc >& tor.log &
Then, start up the Metatroller:
# ~/src/torflow-trunk/metatroller.py >& mt.log &
Finally, start up speedracer:
# cd ~/src/torflow-trunk/NetworkScanners
# ./speedracer.py >& speed.log &
C. Reading the Tea Leaves
SpeedRacer outputs a lot of statistics in aggregate form in
./NetworkScanners/data/speedraces/stats-<pct_start>-<pct_end>-<n>-<time>
and
./NetworkScanners/data/speedraces/ratios-<pct_start>-<pct_end>-<n>-<time>
pct_start and pct_end denote the range of the slice. N denotes the
number of fetches so far, and time is the timestamp of that run. The
results are cumulative, so the n=10 file will contain the results from
n=5 in addition to 5 more fetches.
The statistics stored with each node are indicated in the key at the top
of each stat file.
For the purposes of speedracer, the interesting statistics are actually
in the ratio files. The stats files are more auxiliary in nature, describing
failure and attempt counts.
1. Ratio files
The ratio files are the initial set of options created for consideration
for reweighting nodes' advertised bandwidths. They contain a set of ratios
that can be multiplied by an advertised bandwidth to produce a new value
to be voted on by participating authorities for use in NS documents and
client node selection. This means that faster and more reliable nodes have
higher ratio values.
They are described succinctly in the key for the file:
Metatroller Ratio Statistics:
SR=Stream avg ratio AR=Advertised bw ratio BRR=Adv. bw avg ratio
CSR=Circ suspect ratio CFR=Circ Fail Ratio SSR=Stream suspect ratio
SFR=Stream fail ratio CC=Circuit Count SC=Stream Count
P=Percentile Rank U=Uptime (h)
In detail:
a. SR=Stream avg ratio
This is the ratio of the node's observed average stream capacity to the
average observed stream capacity for the entire slice. It is candidate
#1 for reweighting, and may be the only one we eventually use. The ratio
file itself is sorted by this number.
b. AR=Advertised bw ratio
This value is provided only for reference. It is the ratio of the
advertised bandwidth of the router to the average advertised bandwidth
of the slice.
c. BRR=Adv. bw avg ratio
This ratio is actually a ratio of ratios. First, the ratio of the node's
observed stream capacity to its advertised bandwidth is taken. Then this
function is averaged across all nodes, and each node is given a value
that is the ratio of its observed bandwidth to stream capacity to the
average for the slice.
This was originally my first choice for ratio usage. I initially thought
it would be ideal to use for penalizing nodes lying about their
bandwidth. But upon reflection it seems to double-penalize these nodes:
Nodes that lie will naturally attract more traffic than they can handle,
which decreases their observed stream capacity proportionally. Taking
the ratio of of this to their already inflated advertised bandwidth
amount would double-count the discrepancy.
d. CSR=Circ suspect ratio
This value is the ratio of the node's circuit suspected failure rate
to the average circuit success rate for the slice. A "suspected failure"
is attributed to every member node currently present in a circuit at the
time of failure, plus the next hop if an extend was in progress. Nodes
beyond this position in the path are not blamed for the failure.
This is my second choice for a reweighting ratio. However, we currently
don't (and probably can't) differentiate between failures because of
hibernation or shutdown versus actual connectivity issues.
e. CFR=Circ Fail Ratio
This value is similar to the Circ suspect ratio except that in TorFlow
parlance, a "failure" is only counted against the extender and extendee
nodes. Earlier nodes are not counted.
This would be a nice ratio to use, except for the fact that it is
probably more useful if we could get separate stats on extender vs
extendee, but that is currently not supported. Also, given the relative
frequency of timeout failures, and the fact that timeout failures can be
caused by or contributed to by earlier hops in the circuit, we would
probably want to treat those specially for this stat as well.
f. SSR=Stream Suspect Ratio
The stream suspect ratio is counted similarly to the circuit suspect
ratio, except that stream suspects are only attributed to pre-exit nodes
if the failure reason is one of "TIMEOUT", "INTERNAL", "TORPROTOCOL",
or "DESTROY".
g. SFR=Stream Fail ratio
The stream fail ratio records only the success vs failure of exit nodes,
as such non-exit nodes will never have a value for this stat.
Both the stream stats are better dealt with by the SoaT exit scanner, as
the values for these tend to be pretty binary: either the exit is able
to make external connections or it isn't. In the few cases where they are
not binary, usually the same reliability information is represented in the
Circuit Suspect Ratio, and more accurately at that.
Rationale and suggestions for usage:
The ratios are computed relative to the average values of that stat for
the slice as opposed to the network as a whole primarily because this
will enable us to concurrently use Steven Murdoch's queuing theory
load-optimum selection weighting in tandem with these reweighting
ratios. His ratios are based on queuing theory effects of node
selection on faster vs slower nodes for various network loads. If we
both are correcting the load of individual nodes with respect to the
network as a whole, we will end up over-compensating.
My current thinking is that we should combine the circuit fail ratio
and the stream weighting ratio linearly, with 50% of the ratio changes
coming from each. There's no formal justification for this of course.
Typically the circuit failure ratios are usually pretty close to 1 for
most nodes except those with serious issues, so this primarily has the
effect of dampening the change by the observed stream ratios, except
in cases of really unreliable nodes.
So the formula would be something like:
NewBandwidth = Advertised*(0.5*SR+0.5*CSR)
2. Stats files
For ease of review, the nodes are sorted and printed in lists according
to a few different metrics. For speedracer, the most useful list is the
first one, but the others are useful for buildtimes, where these same
stat files are also available. The data being displayed is the same, it
is just reordered in each list. These lists are:
a. Bandwidth Ratios
This list is sorted by the ratio of advertised bandwidth to average
stream capacity (the BR stat). Nodes at the top of this list advertise a
disproportionately large amount of bandwidth in comparison to what they
actually were seen to carry over streams used to fetch the URL (the EB
stat).
b. Failed Counts
This list is less interesting for speedracer. In it, the nodes are
sorted by the sum of stream and circuit failures (SF and CF,
respectively). Stream failures are primarily attributed to exit nodes,
where as circuit failures are attributed to the extender and the
extendee at the time of failure.
c. Suspected Counts
This list is sorted by 'suspected' failure counts (SS and CS). Suspected
failure counts are attributed to each node that was a member of the
path at the time of failure.
Some failures (such as timeouts) are only attributed as 'suspected' to
all nodes in the path, and as such do not show up in the 'failed'
counts for nodes.
d. Fail Rates
This list is sorted by the rate of failures per hour of node uptime.
e. Suspect Rates
This list is sorted by the rate of suspected failures per hour of
node uptime.
f. Failed Reasons
This list groups nodes by their failure reason, and sorts the reasons by
most prevalent, and sorts the nodes within these lists.
g. Suspect Reasons
This is the same as the failed reasons, except it is sorted by
'suspected' counts.
III. Buildtimes
Buildtimes lives in
torflow-trunk/CircuitAnalysis/BuildTimes/buildtimes.py. It functions by
creating circuits over and over again through percentile slices of the
network, similar to speedracer.
A. Running Buildtimes
Buildtimes can actually be run concurrently with one of either
speedracer or soat using the same Tor process. It can also be run on a
Tor process that is being used for normal client activity.
Running it is a lot simpler too. It does not require the metatroller
(but again, it is fine to run the metatroller concurrently). The
full_run.sh script will run 3 different buildtimes invocations and
output the results to the 'slices' subdirectory.
Currently, these runs are:
./buildtimes.py -n 10000 -s 3 -e 93 -c 15 -d ./slices
./buildtimes.py -n 10000 -s 3 -g -e 50 -c 30 -d ./slices
./buildtimes.py -n 100000 -s 93 -c 100 -d ./slices
This will first run 10k circuits on each 3% slice from 0-93%, with at
most 15 concurrent circuits at a time. The results from this run are
split into their percentile ranges.
The second run will only apply the percentile restrictions to the first
hop, and ensure that this hop has the guard flag. The rest of the
network will be selected for the 2nd and 3rd hop using Tor's
bandwidth-weighted selection distribution. The results from this run
will have a g appended to their percentile ranges.
The final run will create 100k circuits over the entire Tor network,
using Guard flagged nodes for the first hop, and a bandwidth-weighted
selection mechanism for all three hops. The results from this run will
be the only ones with 100000 in their filenames.
In all three runs, the third node is chosen if it allows one of either
80 or 443. This is done to approximate the effect of Tor's circuit
prediction mechanism on the typical Tor user. Since Web traffic makes
up the bulk of Tor traffic by connection, it is likely that the typical
user's Tor client will prefer to pre-build circuits serving 80 or 443.
B. Reading the Tea Leaves
Buildtimes outputs a lot of data. Each of the three runs output a debug
log via the output redirection in full_run.sh.
Additionally, each percentile slice from each run has its own set of 9
data files:
1. .agg
This is the aggregate stats file that has the same format as described
above for speedracer. This time, circuit failure counts and reasons are
the most interesting items here.
2. .nodes
This file provides a well-formed record of which nodes were used in which
positions for each circuit ID.
3. .buildtimes
These are the total circuit creation times, indexed by circuit ID.
4. .extendtimes
These are the individual node extend times, indexed by circuit ID.
5. .failed
This file provides a list of failed circuit IDs and their nodes, but currently
with no reason codes.
6. .ranks
This file records the history of advertised bandwidth and the ranks of
nodes over the course of the run.
7. .uptime
This file outputs the uptime of each node over the course of the run.
8. .check
This file contains verification information for the selection mechanism. It
provides min/avg/max percentile ranks, selection counts, uptime, and counts
for flag presence to verify restrictions.
9. .log
This is the full control port log file.
C. Graphing Results
The shufflebt.py script provides histogram graphing for the results and
doing basic checks on convergence of this histogram for limited sample
sizes. It takes a .buildtimes file as input, and an optional number of
circuits to truncate/shuffle at:
usage: shufflebt.py [-n <number of circuits>] [-s] [-g] [-k <k value>] [-d
outdirname] [-r <res in ms>] <list of filenames>
So for example, to randomly select (shuffle) 1000 circuits and graph the
result:
# ./shufflebt.py -d ./slices -n 1000 -s -g ./slices/0-93.100000.buildtimes
eog ./slices/0-93.100000.buildtimes.shuffled.res100.png