[Performance Analysis] DPM/ACA gRPC Performance Report #384
base: master
update table for test result
@haboy52581 Some initial comments. Thanks for setting up tests and collecting the data points.
@@ -0,0 +1,215 @@
= ALCOR CONTROL AGENT-ALCOR DATAPLANE MANAGER Test Report
I suggest changing the title to "Alcor gRPC Performance Test Report".
|*cpu MHz* |2231.772 |2599.079
|*Memory* |192GB |386GB
|*Network* |NetXtreme BCM5719 Gigabit Ethernet PCIe (GB network) |82599ES 10-Gigabit SFI/SFP+ Network Connection
|*Storage* |LSI raid (no ssd) |AVAGO (no ssd)
I think the DPM machine (.188) has 6 x 1600 GB SSDs. Could you confirm?
|*Model Name* |Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz |Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
|*cpu MHz* |2231.772 |2599.079
|*Memory* |192GB |386GB
|*Network* |NetXtreme BCM5719 Gigabit Ethernet PCIe (GB network) |82599ES 10-Gigabit SFI/SFP+ Network Connection
Please check the network bandwidth. Since the results show that the DPM client is network-bound, we need to revisit this configuration.
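A quick back-of-the-envelope check of why the link speed matters here. This is only a sketch: it assumes the 2 MB payload quoted for the 10,000-neighbor case, decimal megabytes, and an ideal link with no protocol overhead. `min_transfer_ms` is a name made up for illustration.

```python
def min_transfer_ms(payload_bytes: int, link_bits_per_s: float) -> float:
    """Lower bound on the wire time for one payload, ignoring all protocol overhead."""
    return payload_bytes * 8 * 1000.0 / link_bits_per_s


PAYLOAD_BYTES = 2 * 1000 * 1000  # assumption: "2MB" read as decimal megabytes

gigabit_ms = min_transfer_ms(PAYLOAD_BYTES, 1e9)       # Gigabit Ethernet NIC
ten_gigabit_ms = min_transfer_ms(PAYLOAD_BYTES, 10e9)  # 10-Gigabit NIC

print(gigabit_ms)      # 16.0
print(ten_gigabit_ms)  # 1.6
```

At gigabit line rate, one 2 MB goal state message needs at least 16 ms on the wire, so a large number of concurrent sends would likely queue on the NIC before CPU becomes the limit.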
[arabic, start=2]
. *Test step:*

F send goal state message to A-E at the same time concurrently after first warming up then wait for the response, goal state message is different in each payload
Can you upload the test scripts or code that generates the payload to https://github.com/futurewei-cloud/alcor-int/tree/master/tools? This can be done in a separate PR.
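For reference, the kind of driver I have in mind could look roughly like the sketch below. It is not the actual test code: `send_goal_state` is a hypothetical stand-in for the real gRPC call, and the host names and payloads are placeholders. It only illustrates the described flow of a discarded warm-up round followed by a timed concurrent round with a distinct payload per target.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_goal_state(host: str, payload: bytes) -> bool:
    """Hypothetical stand-in for the real gRPC call; succeeds for any non-empty payload."""
    return len(payload) > 0


def run_round(hosts, payloads, threads):
    """Send one payload per host concurrently; return (latency_ms, ok) per call."""
    def timed_call(args):
        host, payload = args
        start = time.perf_counter()
        ok = send_goal_state(host, payload)
        return (time.perf_counter() - start) * 1000.0, ok

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(timed_call, zip(hosts, payloads)))


hosts = ["A", "B", "C", "D", "E"]                            # placeholder target names
payloads = [f"goal-state-{h}".encode() for h in hosts]       # a different payload per target

run_round(hosts, payloads, threads=len(hosts))               # warm-up round, results discarded
results = run_round(hosts, payloads, threads=len(hosts))     # measured round
success_rate = sum(ok for _, ok in results) / len(results)
print(f"success rate: {success_rate:.0%}")
```

Keeping the warm-up and the measured round in the same function makes it easy to confirm that both use identical settings.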
F send goal state message to A-E at the same time concurrently after first warming up then wait for the response, goal state message is different in each payload

On A-E there are 2600 ACA running on each box, ACA code has been revised to cut off the ovsdb and mq operations
2,600 or 2,000? I thought 2,000 was the stable setup. The image needs to be updated accordingly.
image::128-2.png["128 thread 2nd time",width=262,height=156]
____

for 256 threads and below, the success rate is 100%
Can we add one more data point of 256 threads? People will be interested in seeing the limit.
Also, can we add resource utilization diagrams (CPU, RAM, disk I/O, and network I/O) for this extreme case? That would help.
____

____
* 10k neighbor, every connection time cost for different concurrent thread number*
Please explain the x-axis: what do those numbers represent? For example, is the first number the number of threads and the second the number of successful runs out of a total of 10K runs?
____

____
* 10k neighbor, every connection time cost for different concurrent thread number*
Also, as discussed, we need to verify the extremely large value (5,594,098) and rerun the test.
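One simple way to flag such values automatically before rerunning is sketched below, assuming a plain median-ratio rule. The threshold `factor=100.0` is arbitrary, and every sample except 5,594,098 is made up for illustration; the unit is whatever the chart uses.

```python
from statistics import median


def flag_outliers(samples, factor=100.0):
    """Return the samples that exceed `factor` times the median of all samples."""
    mid = median(samples)
    return [s for s in samples if s > factor * mid]


# Illustrative values only: 5_594_098 is the value questioned above, the rest are invented.
samples = [12, 11, 32, 28, 78, 84, 292, 262, 5_594_098]
suspicious = flag_outliers(samples)
print(suspicious)  # [5594098]
```

A median-based rule is used here instead of a mean-based one because a single huge value shifts the mean far more than the median.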
____
* when neighbor number changed, every connection time cost and overall time cost for different concurrent thread number*
This image is important. Let us work to collect more data based on two dimensions (concurrent thread # and neighbor numbers), fix one and adjust the other.
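To make the "fix one and adjust the other" plan concrete, here is a small sketch of how the two test series could be enumerated. The thread and neighbor values are assumptions for illustration, not the report's actual test points, and `one_factor_sweeps` is a made-up helper name.

```python
def one_factor_sweeps(thread_counts, neighbor_counts, fixed_threads, fixed_neighbors):
    """Build two series of (threads, neighbors) test points, each varying one dimension."""
    vary_threads = [(t, fixed_neighbors) for t in thread_counts]
    vary_neighbors = [(fixed_threads, n) for n in neighbor_counts]
    return vary_threads, vary_neighbors


# Example grid values are assumptions, not the report's actual test points.
threads = [1, 16, 64, 128, 256]
neighbors = [1, 10, 100, 1000, 10000]

by_threads, by_neighbors = one_factor_sweeps(
    threads, neighbors, fixed_threads=128, fixed_neighbors=10000
)
print(by_threads[0], by_neighbors[-1])  # (1, 10000) (128, 10000)
```

Each series can then be plotted on its own chart with a single varying axis.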
____

____
* when neighbor number changed, overall time cost for different concurrent thread number*
Same comment as for image other-ov-jc.png:
"Let us work to collect more data based on two dimensions (concurrent thread # and neighbor numbers), fix one and adjust the other."
We can take out the data point for "1t-1w" and explain it in the text.
@@ -65,6 +65,28 @@ image::p1.png["Test Deployment",width=488,height=302]
|*90% TILE* |12 |11 |32 |28 |78 |84 |292 |262
|===

different payload sizes vary from 1 neighbor to 10000 neighbor(2MB) each

*1WR+other OV-MAX+average*
Could you elaborate on what this means?
@@ -65,6 +65,28 @@ image::p1.png["Test Deployment",width=488,height=302]
|*90% TILE* |12 |11 |32 |28 |78 |84 |292 |262
|===
The rows and columns of this table are transposed relative to the next one. Could we make them consistent?
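If the table data is kept in a script, flipping the orientation is a one-liner. A minimal sketch follows; the header labels `c1` to `c8` are placeholders because the real column headers are not visible in this hunk, and only the 90% TILE values come from the diff.

```python
def transpose(rows):
    """Swap rows and columns of a rectangular table given as a list of lists."""
    return [list(col) for col in zip(*rows)]


# Header labels are placeholders; only the 90% TILE values come from the diff above.
table = [
    ["metric", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"],
    ["90% TILE", 12, 11, 32, 28, 78, 84, 292, 262],
]
flipped = transpose(table)
print(flipped[0])  # ['metric', '90% TILE']
print(flipped[1])  # ['c1', 12]
```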
No description provided.