.. index:: SQL, monitoring, SQLite, SQLite library; sqlite3
Parsl can store information about workflow execution in an SQLite database. You can then look at that information in a few different ways.
.. index:: monitoring; configuration, MonitoringHub
Here's the workflow used in taskpath, but with monitoring turned on:
.. code-block:: python

    import parsl

    def fresh_config():
        return parsl.Config(
            executors=[parsl.HighThroughputExecutor()],
            monitoring=parsl.MonitoringHub(hub_address="localhost")
        )

    @parsl.python_app
    def add(x: int, y: int) -> int:
        return x + y

    @parsl.python_app
    def twice(x: int) -> int:
        return 2 * x

    with parsl.load(fresh_config()):
        print(twice(add(5, 3)).result())
Compared to the earlier version, the changes are: adding a ``monitoring=`` parameter to the Parsl configuration, and adding an additional app, ``twice``, to make the workflow a bit more interesting.
After running this, you should see a new file, ``runinfo/monitoring.db``:

.. code-block:: console

    $ ls runinfo/
    000  monitoring.db
This new file is an SQLite database shared between all workflow runs that use the same ``runinfo/`` directory.
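If you want a quick look at what's inside before going further, the standard ``sqlite3`` module can list the tables. This is a minimal sketch; the tables it prints are described in more detail later in this section.

.. code-block:: python

    import sqlite3

    # Open the shared monitoring database and list its tables.
    c = sqlite3.connect("runinfo/monitoring.db")
    for (name,) in c.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"):
        print(name)
    c.close()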
There are two main approaches to looking at the monitoring database: the prototype ``parsl-visualize`` tool, and Python data analysis.
.. index:: parsl-visualize, monitoring; parsl-visualize
Parsl comes with a prototype browser-based visualizer for the monitoring database.
Start it like this, and then point your browser at the given URL.
.. code-block:: console

    $ parsl-visualize
     * Serving Flask app 'parsl.monitoring.visualization.app'
     * Debug mode: off
    WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
     * Running on http://127.0.0.1:8080
    Press CTRL+C to quit
Here's a screenshot, showing the above two-task workflow spending most of its 5-second run with the ``add`` task in ``launched`` state (waiting for a worker to be ready to run it), and the ``twice`` task in ``pending`` state (waiting for the ``add`` task to complete).
I'm not going to go further into ``parsl-visualize``, but you can run your own workflows and click around to explore.
.. index:: pandas, monitoring; pandas, library; pandas, data frame
A different approach, preferred by many data-literate users, is to treat monitoring data like any other Python data, using Pandas.
This example loads the entire ``task`` table (for all known workflows) into a data frame and then extracts the task completion times using Pandas notation:
.. code-block:: python

    import pandas as pd
    import sqlite3

    # Load the entire task table into a data frame.
    c = sqlite3.connect("runinfo/monitoring.db")
    df = pd.read_sql_query("SELECT * FROM task", c)
    c.close()

    # Each row has a task_time_returned column recording when the task completed.
    print(df['task_time_returned'])
.. code-block:: console

    $ python3 panda_mon.py
    0    2024-09-22 17:44:52.947501
    1    2024-09-22 17:44:53.005619
    Name: task_time_returned, dtype: object
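Because the database is shared between every run that uses the same ``runinfo/`` directory, you'll often want to restrict a query to a single workflow run. Here's a minimal sketch of one way to do that; it assumes the ``workflow`` table has ``run_id`` and ``time_began`` columns and that the ``task`` table has a matching ``run_id`` column, so check those names against the schema described below.

.. code-block:: python

    import pandas as pd
    import sqlite3

    c = sqlite3.connect("runinfo/monitoring.db")

    # Pick the most recently started workflow run.
    latest = pd.read_sql_query(
        "SELECT run_id FROM workflow ORDER BY time_began DESC LIMIT 1", c)
    run_id = latest['run_id'].iloc[0]

    # Load only the tasks belonging to that run.
    df = pd.read_sql_query(
        "SELECT * FROM task WHERE run_id = ?", c, params=(run_id,))
    c.close()

    print(df['task_time_returned'])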
.. todo:: one example of plotting
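As a starting point, here's a minimal plotting sketch. It assumes matplotlib is installed, and it assumes the ``task`` table also records a ``task_time_invoked`` timestamp alongside ``task_time_returned``.

.. code-block:: python

    import sqlite3

    import matplotlib.pyplot as plt
    import pandas as pd

    c = sqlite3.connect("runinfo/monitoring.db")
    df = pd.read_sql_query("SELECT * FROM task", c)
    c.close()

    # Compute each task's duration in seconds from its invocation and
    # completion timestamps (task_time_invoked is assumed here).
    invoked = pd.to_datetime(df['task_time_invoked'])
    returned = pd.to_datetime(df['task_time_returned'])
    duration = (returned - invoked).dt.total_seconds()

    # Plot a histogram of task durations and save it to a file.
    duration.plot.hist(bins=20)
    plt.xlabel("task duration (seconds)")
    plt.savefig("task_durations.png")

This writes ``task_durations.png`` rather than opening a window, which is friendlier when working on a remote machine.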
.. index:: monitoring; schema
.. todo:: deeper dive into workflow/tasks/try table schema - not trying to be comprehensive of all schemas here but those three are a good set to deal with
The monitoring database SQL schema is defined using SQLAlchemy's object-relational model in ``parsl/monitoring/db_manager.py``, from line 132 onwards.
.. warning::

    The schema is defined a second time in ``parsl/monitoring/visualization/models.py``, from line 12 onwards. See issue #2266 for more discussion.
These tables are defined:
.. todo:: the core task-related tables can get a hierarchical diagram workflow/task/try+state/resource
``workflow``
    - each workflow run gets a row in this table. A workflow run is one call to ``parsl.load()`` with monitoring enabled, and everything that happens inside that initialized Parsl instance.

``task``
    - each task (so each invocation of a decorated app) gets a row in this table.

``try``
    - if/when Parsl tries to execute a task, the try will get a row in this table. As mentioned in elaborating, there might not be any tries, or there might be many tries.

``status``
    - this records the changes of task status, which include changes known on the submit side (in ``TaskRecord``) and changes which are not otherwise known to the submit side: when a task starts and ends running on a worker. You'll see ``running`` and ``running_ended`` states in this table which will never appear in the ``TaskRecord``. One ``task`` row may have many ``status`` rows (see the query sketch after this list).

``resource``
    - if Parsl resource monitoring is turned on (TODO: how?), a sub-mode of Parsl monitoring in general, then a resource monitor process will be placed alongside the task (see elaborating) which will report things like CPU time and memory usage periodically. Those reports will be stored in the ``resource`` table. So a try of a task may have many ``resource`` table rows.

``block``
    - when the scaling code starts or ends a block, or asks for the status of a block, it stores any changes into this table. If enough monitoring is turned on, the block where a try runs will be stored in the relevant ``try`` table row.

``node``
    - this one is populated with information about connected worker pools with htex (and not at all with other executors), populated by the interchange when a pool registers or when it changes status (disconnects, is set to holding, etc.)
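To look at the per-task state transitions that the ``status`` table records, something like this sketch works. The column names used here (``task_id``, ``task_status_name``, ``timestamp``) are assumptions to be checked against the schema in ``db_manager.py``.

.. code-block:: python

    import pandas as pd
    import sqlite3

    c = sqlite3.connect("runinfo/monitoring.db")

    # List every recorded state change, in time order, task by task.
    df = pd.read_sql_query(
        "SELECT task_id, task_status_name, timestamp "
        "FROM status ORDER BY task_id, timestamp", c)
    c.close()

    print(df)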