Report usage using a richer configuration traversing protocol #3229

Merged
merged 9 commits into from
Apr 25, 2024

Conversation

benclifford
Collaborator

@benclifford benclifford commented Mar 11, 2024

This PR introduces code to traverse the configuration object (in a similar manner to the RepresentationMixin style of logging the supplied configuration object) with the intention of giving each object a chance to report its own usage information.

The protocol now reports each configured object either as a JSON string containing the class name, or as a JSON object containing the class name plus any additional information that class wishes to report for usage (via the UsageInformation abstract class).

This PR modifies the HighThroughputExecutor to use this API to report richer usage information. One specific usage tracking question is how often the enable_mpi_mode parameter is used, so the HighThroughputExecutor will now report the boolean value of that parameter.
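A minimal sketch of the idea described above, not the code in this PR: the `Demo*` classes, the `get_usage_information` method name, the `COMPONENT_TYPES` tuple and `get_usage_sketch` are all hypothetical stand-ins. It shows how a traversal over constructor arguments can emit either a class-name string or an object with a `c` key plus extra fields.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Any, List


class UsageInformation(ABC):
    """Opt-in interface; the method name here is an assumption."""

    @abstractmethod
    def get_usage_information(self) -> dict:
        """Return extra JSON-serialisable fields to report."""


class DemoLauncher:
    def __init__(self):
        pass


class DemoExecutor(UsageInformation):
    def __init__(self, launcher, enable_mpi_mode=False):
        self.launcher = launcher
        self.enable_mpi_mode = enable_mpi_mode

    def get_usage_information(self) -> dict:
        return {"mpi": self.enable_mpi_mode}


class DemoConfig:
    def __init__(self, executors):
        self.executors = executors


# Only these (hypothetical) component types are traversed and reported.
COMPONENT_TYPES = (DemoConfig, DemoExecutor, DemoLauncher)


def get_usage_sketch(obj: Any) -> List[Any]:
    """Return a flat list of usage entries for obj and its constructor arguments."""
    if isinstance(obj, list):
        return [entry for item in obj for entry in get_usage_sketch(item)]
    if not isinstance(obj, COMPONENT_TYPES):
        return []  # plain values such as booleans are not reported on their own
    name = f"{type(obj).__module__}.{type(obj).__qualname__}"
    if isinstance(obj, UsageInformation):
        me = [{"c": name, **obj.get_usage_information()}]
    else:
        me = [name]
    argspec = inspect.getfullargspec(type(obj).__init__)
    for arg in argspec.args[1:]:  # skip first arg, self
        me += get_usage_sketch(getattr(obj, arg))
    return me


report = get_usage_sketch(
    DemoConfig(executors=[DemoExecutor(DemoLauncher(), enable_mpi_mode=True)])
)
```

The sketch relies on each object storing its constructor arguments as attributes of the same name, which is the same convention the RepresentationMixin-style logging depends on.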

Beware that this reports on configuration, not use, of components: for example, configurations by default will include three staging providers, even though I believe it is extremely rare that either the FTP or HTTP staging providers are actually used to stage data. The UsageInformation API is intended to support reporting whether these staging providers actually stage anything, but this PR does not implement that in those staging provider components.

To support UsageInformation instances which report on usage during a run, the component tree is traversed both for the start message and the end message, and may result in different information in each message - for example, the DFK report occurs only in the end message.
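To illustrate why traversing twice matters, here is a hedged sketch with a hypothetical `StagingCounter` component and a hypothetical `build_message` helper. The `end` key and the `files_staged` field are assumptions for illustration; only `correlator`, `start` and `components` appear in the example message in this description.

```python
import time
import uuid


class StagingCounter:
    """Hypothetical component whose reportable usage changes during a run."""

    def __init__(self):
        self.files_staged = 0

    def stage(self, n: int) -> None:
        self.files_staged += n

    def usage(self) -> dict:
        return {"c": "demo.StagingCounter", "files_staged": self.files_staged}


def build_message(kind: str, correlator: str, components: list) -> dict:
    # Components are queried afresh for every message, so each message
    # reflects the state of the components at the time it is built.
    return {
        "correlator": correlator,
        kind: int(time.time()),
        "components": [c.usage() for c in components],
    }


counter = StagingCounter()
correlator = str(uuid.uuid4())

start_msg = build_message("start", correlator, [counter])
counter.stage(3)  # work happens between the two messages
end_msg = build_message("end", correlator, [counter])
```

Both messages share the correlator, so a consumer can pair them and compare what was configured at start with what was actually used by the end.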

An example start message looks like this: (pretty-formatted)

{'correlator': 'f7595d08-7b94-49bc-b3d7-1ea7532b2f51',
 'parsl_v': '1.3.0-dev',
 'python_v': '3.12.2',
 'platform.system': 'Linux',
 'start': 1710150467,
 'components': ['parsl.config.Config',
                {'c': 'parsl.executors.high_throughput.executor.HighThroughputExecutor', 'mpi': False},
                'parsl.providers.local.local.LocalProvider',
                'parsl.channels.local.local.LocalChannel',
                'parsl.launchers.launchers.SingleNodeLauncher',
                'parsl.data_provider.ftp.FTPInTaskStaging',
                'parsl.data_provider.http.HTTPInTaskStaging',
                'parsl.data_provider.file_noop.NoOpFileStaging',
                'parsl.monitoring.monitoring.MonitoringHub']}

Changed Behaviour

Different information will be reported via usage tracking - anyone processing that usage data will need to adapt their code.
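For anyone adapting downstream processing code, a minimal sketch of handling both entry shapes might look like the following. The helper names `normalise_component` and `mpi_settings` are hypothetical; the `c` and `mpi` keys and the `components` list are taken from the example message above.

```python
from typing import Any, Dict, List, Tuple, Union

ComponentEntry = Union[str, Dict[str, Any]]


def normalise_component(entry: ComponentEntry) -> Tuple[str, Dict[str, Any]]:
    """Turn either entry shape into (class_name, extra_fields)."""
    if isinstance(entry, str):
        return entry, {}
    extra = dict(entry)  # copy so the original message is left untouched
    return extra.pop("c"), extra


def mpi_settings(message: Dict[str, Any]) -> List[bool]:
    """Collect the 'mpi' flag from every component that reports one."""
    entries = message.get("components", [])  # absent in older-format messages
    return [
        extra["mpi"]
        for _, extra in map(normalise_component, entries)
        if "mpi" in extra
    ]


message = {
    "correlator": "f7595d08-7b94-49bc-b3d7-1ea7532b2f51",
    "components": [
        "parsl.config.Config",
        {"c": "parsl.executors.high_throughput.executor.HighThroughputExecutor",
         "mpi": False},
        "parsl.providers.local.local.LocalProvider",
    ],
}

names = [normalise_component(e)[0] for e in message["components"]]
```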

Type of change

  • New feature

This PR introduces code to traverse the configuration object (in a similar
manner to the RepresentationMixin style of logging the supplied configuration
object) with the intention of giving each object a chance to report its
own usage information.

This PR modifies the HighThroughputExecutor to use this API to report
richer usage information: one specific usage question is how often the
enable_mpi_mode parameter is used, and this modification supports that.

Beware that this reports on configuration of components, and does not
report any further usage unless those components are so augmented using
the new API: for example, configurations by default will include three
staging providers, even though I believe it is extremely rare that either
the FTP or HTTP staging providers are actually used to stage data.
(It is hopefully a straightforward change to add a UsageInformation
implementation to report if those classes are actually used to stage
data in any run).

To support UsageInformation instances which report on usage during a run,
the component tree is traversed both for the start message and the end
message, and may result in different information in each message.

@benclifford benclifford changed the title Traverse configuration heirarchy to report more usage information Report usage using a richer configuration traversing protocol Mar 11, 2024
@benclifford benclifford marked this pull request as ready for review March 11, 2024 11:47
Collaborator

@khk-globus khk-globus left a comment

Looks good; a couple of minor comments/suggestions but no blockers.

parsl/usage_tracking/api.py (outdated, resolved)
parsl/usage_tracking/api.py (outdated, resolved)
Comment on lines +46 to +49
    for arg in argspec.args[1:]:  # skip first arg, self
        arg_value = getattr(obj, arg)
        d = get_parsl_usage(arg_value)
        me += d

Stylistic alternative (but by no means a blocker):

    me.extend(get_parsl_usage(getattr(obj, arg)) for arg in argspec.args[1:])
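One thing worth checking before adopting the one-liner: assuming the helper returns a list (which the `me += d` in the quoted lines suggests), `extend` over a generator of lists appends each list as a single element rather than concatenating them. A small sketch with a hypothetical stand-in `usage_of` shows the difference and a flattening form that matches the loop:

```python
def usage_of(value):
    """Hypothetical stand-in for a helper that returns a list of entries."""
    return [f"component:{value}"] if value else []


values = ["a", None, "b"]

# Original loop: += concatenates each returned list onto the result.
loop = []
for v in values:
    loop += usage_of(v)

# extend() over a generator of lists adds each list as one nested element.
nested = []
nested.extend(usage_of(v) for v in values)

# A nested comprehension flattens, matching the loop's behaviour.
flat = []
flat.extend(entry for v in values for entry in usage_of(v))
```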

parsl/usage_tracking/api.py (resolved)
parsl/usage_tracking/api.py (outdated, resolved)
parsl/usage_tracking/usage.py (outdated, resolved)
@benclifford benclifford marked this pull request as draft March 18, 2024 15:18
@benclifford
Collaborator Author

I converted this to draft status, because others are working on usage tracking now (which @kylechard and @yadudoc are driving I think?) - so this can be merged if they want it, otherwise we can close it.

NishchayKarle added a commit to NishchayKarle/parsl that referenced this pull request Apr 24, 2024
Merge changes from Ben related to usage tracking update Parsl#3229
@benclifford benclifford marked this pull request as ready for review April 25, 2024 10:53
@benclifford benclifford merged commit 06f25fc into master Apr 25, 2024
6 checks passed
@benclifford benclifford deleted the benc-usage-protocol branch April 25, 2024 11:18
benclifford pushed a commit that referenced this pull request Jun 6, 2024
This PR introduces a choice of 3 levels for users to select based on their preferred level of usage reporting. It introduces updates on top of #3229.

Tracking Levels
• Level 1: Python version, Parsl version, operating system details.
• Level 2: configuration details + Level 1
• Level 3: total apps run, total failed apps, execution time + Level 2
If usage tracking is currently enabled, it will default to level 1.

Usage Data sent at launch (Levels 1 and 2)
• Capture Parsl version, Python version, and environment details at startup.
• Configuration Reporting: Log details about providers, launchers, executors, channels, and storage access methods used.

Usage Data sent on closure (Level 3 only)
• Number of apps run
• Number of failed apps
• Total time elapsed
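
The cumulative levels above could be gated roughly as in this sketch. The `TrackingLevel` enum, the `DISABLED` value, both function names and the closure field names are assumptions for illustration, not the API introduced by that commit; the launch field names follow the example start message in #3229.

```python
import platform
from enum import IntEnum
from typing import Any, Dict, List


class TrackingLevel(IntEnum):
    DISABLED = 0   # assumption: a level meaning "send nothing"
    BASIC = 1      # versions and operating system details
    CONFIG = 2     # level 1 plus configuration details
    RUN_STATS = 3  # level 2 plus app counts and execution time


def launch_message(level: int, parsl_version: str,
                   components: List[Any]) -> Dict[str, Any]:
    """Fields sent at launch; higher levels include everything below them."""
    if level < TrackingLevel.BASIC:
        return {}
    message: Dict[str, Any] = {
        "parsl_v": parsl_version,
        "python_v": platform.python_version(),
        "platform.system": platform.system(),
    }
    if level >= TrackingLevel.CONFIG:
        message["components"] = list(components)
    return message


def closure_message(level: int, apps_run: int, apps_failed: int,
                    elapsed_s: float) -> Dict[str, Any]:
    """Fields sent on closure; only level 3 reports run statistics."""
    if level < TrackingLevel.RUN_STATS:
        return {}
    return {"apps_run": apps_run, "apps_failed": apps_failed,
            "elapsed_s": elapsed_s}


basic = launch_message(TrackingLevel.BASIC, "1.3.0-dev", ["parsl.config.Config"])
full = launch_message(TrackingLevel.RUN_STATS, "1.3.0-dev", ["parsl.config.Config"])
```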