Performance of saga python
Comparing the shell adaptor to raw local shell performance (script here) yields the following numbers:
| id | type | Real Time (s) | System Time (s) | User Time (s) | Utilization (%) |
|----|------|---------------|-----------------|---------------|-----------------|
| 1 | `/bin/sleep 1 &` | 0.2 | 0.0 | 0.1 | 58.7 |
| 2 | `RUN sleep 1` | 11.4 | 0.2 | 0.8 | 9.2 |
| 3 | `RUN sleep 1 @ cyder.cct.lsu.edu/` | 35.5 | 0.1 | 0.2 | 0.7 |
| 4 | `js = saga.job.Service ('fork://localhost/')` <br> `j = js.create_job ({executable:'/bin/sleep', arguments:['1']})` | 17.3 | 8.1 | 0.8 | 51.2 |
| 5 | `js = saga.job.Service ('fork://localhost/')` <br> `j = js.run_job ("/bin/sleep 1")` | 17.6 | 8.2 | 0.8 | 51.3 |
| 6 | `js = saga.job.Service ('ssh://localhost/')` <br> `j = js.create_job ({executable:'/bin/sleep', arguments:['1']})` | 17.6 | 8.2 | 0.8 | 50.7 |
| 7 | `js = saga.job.Service ('ssh://[email protected]/')` <br> `j = js.create_job ({executable:'/bin/sleep', arguments:['1']})` | 228.1 | 65.9 | 9.0 | 32.8 |
| 8 | `js = saga.job.Service ('gsissh://trestles-login.sdsc.edu/')` <br> `j = js.create_job ({executable:'/bin/sleep', arguments:['1']})` | 247.3 | 68.6 | 9.6 | 31.6 |
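The `{executable, arguments}` notation in rows 4 to 8 is shorthand; in saga-python the job description is a `saga.job.Description` object. A minimal sketch of one benchmark iteration under that assumption (row 4 shown; swap in an `ssh://` or `gsissh://` URL for rows 6 to 8):

```python
import saga

# one job service per test run; the service holds the (possibly remote) shell
js = saga.job.Service('fork://localhost/')

# job description equivalent to the {executable, arguments} shorthand above
jd            = saga.job.Description()
jd.executable = '/bin/sleep'
jd.arguments  = ['1']

j = js.create_job(jd)   # rows 4, 6, 7, 8
j.run()

# the shortcut from row 5 creates the description internally:
# j = js.run_job('/bin/sleep 1')
```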
- Plain shell is (as expected) very quick, and needs virtually no system resources. Time is spent mostly in the shell internals.
- The shell wrapper we use adds an order of magnitude, mostly in user time. That wrapper makes sure that job information (state, pid) is written to disk, spawns a monitoring daemon, reports the job ID, etc. Most of the time, however, is spent waiting for that state information to become consistent and to be written (waitpid, sync) -- thus the low system utilization.
- Running jobs through the shell wrapper over ssh is relatively quick, too -- since input/output are streamed without blocking, almost no time is wasted in waits.
- The Python layer on top of the local wrapper adds a factor of ~1.5, which is quite acceptable, as that includes the complete SAGA and Python stack, I/O capturing/parsing, etc.
- The shortcut method `run_job()` does not add significant overhead, despite the fact that it creates a new job description per job.
- Exchanging `fork://localhost/` with `ssh://localhost/` again adds almost nothing -- the overhead is solely due to the increased startup time.
- As expected, real remote operations add significant overhead -- this is owed to the fact that operations are synchronous, i.e. the adaptor waits for confirmation that each operation succeeded. This adds one round trip per operation (200 ms * 1000 = 200 seconds). Locally that does not contribute, thus the adaptor and wrapper operations (4, 5, 6) are close to the non-blocking I/O versions (1, 2, 3).
- No significant difference between ssh and gsissh, as expected...
At this point we have reached saturation for a remote backend -- any adaptor building on top of the `PTYShell` and `PTYProcess` infrastructure will basically see the above limits. There are three options for further scaling though: (A) concurrent job service instances, (B) asynchronous operations, and (C) bulk operations.
We can run test (7) again, but use an increasing number of job services (each in its own application thread), and observe the following behavior:
| id   | threads | Real Time (s) | jobs/second |
|------|---------|---------------|-------------|
| 3    | 1       | 35.5          | 28.2        |
| 7.01 | 1       | 225.2         | 4.4         |
| 7.02 | 2       | 122.7         | 8.1         |
| 7.03 | 3       | 80.9          | 12.4        |
| 7.04 | 4       | 62.9          | 15.9        |
| 7.05 | 5       | 52.2          | 19.1        |
| 7.06 | 6       | 50.5          | 19.8        |
| 7.07 | 7       | 40.1          | 24.9        |
| 7.08 | 8       | 36.2          | 27.6        |
| 7.09 | 9       | 38.5          | 25.9        |
| 7.10 | 10      | 32.9          | 30.4        |
So, this scales up to what we have seen from a plain piped ssh submit (3)!
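A minimal sketch of how such a concurrency test can be structured, assuming each thread owns its own `saga.job.Service` (and thus its own ssh connection); the host name and job counts are placeholders, not the exact test script:

```python
import threading
import saga

def submit(host, n_jobs):
    # each thread creates its own job service, i.e. its own ssh connection
    js   = saga.job.Service('ssh://%s/' % host)
    jobs = [js.run_job('/bin/sleep 1') for _ in range(n_jobs)]
    for j in jobs:
        j.wait()

# 10 concurrent job services against one host (as in test 7.10)
threads = [threading.Thread(target=submit, args=('cyder.cct.lsu.edu', 100))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```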
Alas, I did not know that most ssh daemons limit the number of concurrent ssh connections to 10 -- so this is the end of scaling for a single host. But of course we can use multiple hosts concurrently:
| id   | threads | hosts                                  | Real Time (s) | jobs/second |
|------|---------|----------------------------------------|---------------|-------------|
| 7.10 | 10      | cyder                                  |               |             |
| 7.20 | 20      | cyder, repex1                          |               |             |
| 7.30 | 30      | cyder, repex1, trestles                |               |             |
| 7.40 | 40      | cyder, repex1, trestles, india         |               |             |
| 7.50 | 50      | cyder, repex1, trestles, india, sierra |               |             |
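The multi-host runs can reuse the `submit()` helper from the sketch above, capping the threads per host at 10 to stay below the observed sshd connection limit. The host list below is illustrative and only uses names that appear on this page:

```python
# reuses submit() and the imports from the previous sketch
hosts   = ['cyder.cct.lsu.edu', 'trestles-login.sdsc.edu']
threads = []

for host in hosts:
    for _ in range(10):          # at most 10 ssh connections per host
        t = threading.Thread(target=submit, args=(host, 100))
        threads.append(t)
        t.start()

for t in threads:
    t.join()
```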