Performance of saga python
Comparing the shell adaptor to raw local shell performance (script here) yields the following numbers:
| id | type | Real Time (s) | System Time (s) | User Time (s) | Utilization (%) |
|----|------|---------------|-----------------|---------------|-----------------|
| 1 | `/bin/sleep 1 &` | 0.2 | 0.0 | 0.1 | 58.7 |
| 2 | `RUN sleep 1` | 11.4 | 0.2 | 0.8 | 9.2 |
| 3 | `RUN sleep 1 @ cyder.cct.lsu.edu/` | 35.5 | 0.1 | 0.2 | 0.7 |
| 4 | `js = saga.job.Service ('fork://localhost/')` <br> `j = js.create_job ({executable:'/bin/sleep', arguments:['1']})` | 17.3 | 8.1 | 0.8 | 51.2 |
| 5 | `js = saga.job.Service ('fork://localhost/')` <br> `j = js.run_job ("/bin/sleep 1")` | 17.6 | 8.2 | 0.8 | 51.3 |
| 6 | `js = saga.job.Service ('ssh://localhost/')` <br> `j = js.create_job ({executable:'/bin/sleep', arguments:['1']})` | 17.6 | 8.2 | 0.8 | 50.7 |
| 7 | `js = saga.job.Service ('ssh://[email protected]/')` <br> `j = js.create_job ({executable:'/bin/sleep', arguments:['1']})` | 228.1 | 65.9 | 9.0 | 32.8 |
| 8 | `js = saga.job.Service ('gsissh://trestles-login.sdsc.edu/')` <br> `j = js.create_job ({executable:'/bin/sleep', arguments:['1']})` | 247.3 | 68.6 | 9.6 | 31.6 |
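The `{executable, arguments}` notation in rows 4 to 8 is shorthand; in saga-python the job description is a `saga.job.Description` object. A minimal sketch of one benchmark iteration under that assumption (row 4 shown; swap in an `ssh://` or `gsissh://` URL for rows 6 to 8):

```python
import saga

# one job service per test run; the service holds the (possibly remote) shell
js = saga.job.Service('fork://localhost/')

# job description equivalent to the {executable, arguments} shorthand above
jd            = saga.job.Description()
jd.executable = '/bin/sleep'
jd.arguments  = ['1']

j = js.create_job(jd)   # rows 4, 6, 7, 8
j.run()

# the shortcut from row 5 creates the description internally:
# j = js.run_job('/bin/sleep 1')
```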
- Plain shell is (as expected) very quick, and needs virtually no system resources. Time is spent mostly in the shell internals.
- The shell wrapper we use adds an order of magnitude, mostly in user time. That wrapper makes sure that job information (state, pid) is written to disk, spawns a monitoring daemon, reports the job ID, etc. Most of the time, however, is spent waiting for that state information to become consistent and to be written (waitpid, sync) -- thus the low system utilization.
- Running jobs through the shell wrapper over ssh is relatively quick, too -- since input/output are streamed without blocking, almost no time is wasted in waits.
- The Python layer on top of the local wrapper adds a factor of ~1.5, which is quite acceptable, as that includes the complete SAGA and Python stack, I/O capturing/parsing, etc.
- The shortcut method `run_job()` does not add significant overhead, despite the fact that it creates a new job description per job.
- Exchanging `fork://localhost/` with `ssh://localhost/` again adds almost nothing -- the overhead is solely due to the increased startup time.
- As expected, real remote operations add significant overhead -- this is owed to the fact that operations are synchronous, i.e. the adaptor waits for confirmation that each operation succeeded. This adds one round trip per operation (200 ms * 1000 = 200 seconds). Locally that does not contribute, thus the adaptor and wrapper operations (4, 5, 6) are close to the non-blocking I/O versions (1, 2, 3).
- No significant difference between ssh and gsissh, as expected...
At this point we have reached saturation for a remote backend -- any adaptor building on top of the `PTYShell` and `PTYProcess` infrastructure will basically see the above limits. There are three options for further scaling though: (A) concurrent job service instances, (B) asynchronous operations, and (C) bulk operations.
We can run test (7) again, but use an increasing number of job services (each in its own application thread), and observe the following behavior:
| id   | threads | Real Time (s) | jobs/second |
|------|---------|---------------|-------------|
| 3    | 1       | 35.5          | 28.2        |
| 7.01 | 1       | 225.2         | 4.4         |
| 7.02 | 2       | 122.7         | 8.1         |
| 7.03 | 3       | 80.9          | 12.4        |
| 7.04 | 4       | 62.9          | 15.9        |
| 7.05 | 5       | 52.2          | 19.1        |
| 7.06 | 6       | 50.5          | 19.8        |
| 7.07 | 7       | 40.1          | 24.9        |
| 7.08 | 8       | 36.2          | 27.6        |
| 7.09 | 9       | 38.5          | 25.9        |
| 7.10 | 10      | 32.9          | 30.4        |
So, this scales up to what we have seen from a plain piped ssh submit (3)!
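A minimal sketch of how such a concurrency test can be structured, assuming each thread owns its own `saga.job.Service` (and thus its own ssh connection); the host name and job counts are placeholders, not the exact test script:

```python
import threading
import saga

def submit(host, n_jobs):
    # each thread creates its own job service, i.e. its own ssh connection
    js   = saga.job.Service('ssh://%s/' % host)
    jobs = [js.run_job('/bin/sleep 1') for _ in range(n_jobs)]
    for j in jobs:
        j.wait()

# 10 concurrent job services against one host (as in test 7.10)
threads = [threading.Thread(target=submit, args=('cyder.cct.lsu.edu', 100))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```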
Alas, I did not know that most ssh daemons limit the number of concurrent ssh connections to 10 -- so this is the end of scaling for a single host. But of course we can use multiple hosts concurrently:
| id   | threads | hosts                                  | Real Time (s) | jobs/second |
|------|---------|----------------------------------------|---------------|-------------|
| 7.10 | 10      | cyder                                  |               |             |
| 7.20 | 20      | cyder, repex1                          |               |             |
| 7.30 | 30      | cyder, repex1, trestles                |               |             |
| 7.40 | 40      | cyder, repex1, trestles, india         |               |             |
| 7.50 | 50      | cyder, repex1, trestles, india, sierra |               |             |
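The multi-host runs can reuse the `submit()` helper from the sketch above, capping the threads per host at 10 to stay below the observed sshd connection limit. The host list below is illustrative and only uses names that appear on this page:

```python
# reuses submit() and the imports from the previous sketch
hosts   = ['cyder.cct.lsu.edu', 'trestles-login.sdsc.edu']
threads = []

for host in hosts:
    for _ in range(10):          # at most 10 ssh connections per host
        t = threading.Thread(target=submit, args=(host, 100))
        threads.append(t)
        t.start()

for t in threads:
    t.join()
```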