Report usage using a richer configuration traversing protocol #3229
This PR introduces code to traverse the configuration object (in a similar manner to the RepresentationMixin style of logging the supplied configuration object) with the intention of giving each object a chance to report its own usage information.
This PR modifies the HighThroughputExecutor to use this API to report richer usage information: a specific usage query is to ask about use of the enable_mpi_mode parameter and this modification supports that.
Beware that this reports on configuration of components, and does not report any further usage unless those components are so augmented using the new API: for example, configurations by default will include three staging providers, even though I believe it is extremely rare that either the FTP or HTTP staging providers are actually used to stage data. (It is hopefully a straightforward change to add a UsageInformation implementation to report if those classes are actually used to stage data in any run).
To support UsageInformation instances which report on usage during a run, the component tree is traversed both for the start message and the end message, and may result in different information in each message.
An example start message looks like this: (pretty-formatted)
{'correlator': 'f7595d08-7b94-49bc-b3d7-1ea7532b2f51',
'parsl_v': '1.3.0-dev',
'python_v': '3.12.2',
'platform.system': 'Linux',
'start': 1710150467,
'components': ['parsl.config.Config',
{'c': 'parsl.executors.high_throughput.executor.HighThroughputExecutor', 'mpi': False},
'parsl.providers.local.local.LocalProvider',
'parsl.channels.local.local.LocalChannel',
'parsl.launchers.launchers.SingleNodeLauncher',
'parsl.data_provider.ftp.FTPInTaskStaging',
'parsl.data_provider.http.HTTPInTaskStaging',
'parsl.data_provider.file_noop.NoOpFileStaging',
'parsl.monitoring.monitoring.MonitoringHub']}
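For readers who want a feel for the traversal, here is a minimal sketch under stated assumptions, not the code in this PR. The `Component` marker class, the `qualified_name` helper, the `get_usage_information` method name, and the demo classes are all hypothetical; only the `'c'` and `'mpi'` keys and the "walk the constructor arguments, skipping self" idea are taken from this conversation.

```python
import inspect
from abc import ABC, abstractmethod


class Component:
    """Hypothetical marker base class: only Component instances are reported."""


class UsageInformation(ABC):
    """Sketch of an abstract class for objects that report extra usage details."""

    @abstractmethod
    def get_usage_information(self) -> dict:
        ...


def qualified_name(obj):
    cls = type(obj)
    return f"{cls.__module__}.{cls.__name__}"


def get_parsl_usage(obj):
    """Return a flat list describing obj and everything reachable via its constructor args."""
    if isinstance(obj, (list, tuple)):
        found = []
        for item in obj:
            found += get_parsl_usage(item)
        return found
    if not isinstance(obj, Component):
        return []  # plain values such as booleans or strings are not reported

    if isinstance(obj, UsageInformation):
        entry = {"c": qualified_name(obj)}
        entry.update(obj.get_usage_information())
        me = [entry]
    else:
        me = [qualified_name(obj)]

    argspec = inspect.getfullargspec(type(obj).__init__)
    for arg in argspec.args[1:]:  # skip first arg, self
        # Assumes each constructor argument is stored as a same-named attribute.
        me += get_parsl_usage(getattr(obj, arg, None))
    return me


# Hypothetical stand-ins for real configuration classes.
class Launcher(Component):
    def __init__(self):
        pass


class Provider(Component):
    def __init__(self, launcher):
        self.launcher = launcher


class Executor(Component, UsageInformation):
    def __init__(self, provider, enable_mpi_mode=False):
        self.provider = provider
        self.enable_mpi_mode = enable_mpi_mode

    def get_usage_information(self):
        return {"mpi": self.enable_mpi_mode}


class Config(Component):
    def __init__(self, executors):
        self.executors = executors


components = get_parsl_usage(Config(executors=[Executor(Provider(Launcher()))]))
```

In this sketch the executor appears as a dict carrying its class name plus the `mpi` flag, while the other components appear as plain class-name strings, mirroring the shape of the example message.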
…info of Config object
Looks good; a couple of minor comments/suggestions but no blockers.
for arg in argspec.args[1:]:  # skip first arg, self
    arg_value = getattr(obj, arg)
    d = get_parsl_usage(arg_value)
    me += d
Stylistic alternative (but by no means a blocker):
me.extend(get_parsl_usage(getattr(obj, arg)) for arg in argspec.args[1:])
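One caveat when comparing the two forms, assuming get_parsl_usage returns a list (as the `me += d` line suggests): `extend` fed a generator of lists appends each list as a single element, so the result is nested rather than flat. The sketch below uses a hypothetical stand-in for get_parsl_usage to show the difference, along with a flattening variant built on `itertools.chain.from_iterable`.

```python
from itertools import chain


def get_parsl_usage(value):
    """Hypothetical stand-in: returns a one-element list naming the value's type."""
    return [type(value).__name__]


values = [1, "a", 2.0]

# Loop form from the diff: += on a list extends it element by element.
me = []
for v in values:
    me += get_parsl_usage(v)

# One-line form with a generator of lists: each list becomes one element.
nested = []
nested.extend(get_parsl_usage(v) for v in values)

# Flattening variant that matches the loop form.
flat = []
flat.extend(chain.from_iterable(get_parsl_usage(v) for v in values))
```

Here `me` and `flat` hold the same flat list of names, while `nested` holds three one-element lists.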
I converted this to draft status, because others are working on usage tracking now (which @kylechard and @yadudoc are driving I think?) - so this can be merged if they want it, otherwise we can close it.
Merge changes from Ben related to usage tracking update Parsl#3229
This PR introduces three usage-reporting levels that users can choose between, according to how much they are willing to report. It introduces updates on top of #3229.
Tracking Levels
• Level 1: Python version, Parsl version, operating system details.
• Level 2: configuration details + Level 1
• Level 3: total apps run, total failed apps, execution time + Level 2
If usage tracking is currently enabled, it will default to Level 1.
Usage Data sent at launch (Levels 1 and 2)
• Capture Parsl version, Python version, and environment details at startup.
• Configuration Reporting: Log details about providers, launchers, executors, channels, and storage access methods used.
Usage Data sent on closure (Level 3 only)
• Number of apps run
• Number of failed apps
• Total time elapsed
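A rough sketch of how the levels could gate what gets sent, written as an illustration rather than the implementation in that PR. The function names and the closure-message keys (`apps_run`, `apps_failed`, `elapsed_seconds`) are hypothetical; the `parsl_v`, `python_v`, `platform.system` and `components` keys follow the example start message in #3229.

```python
import platform


def launch_message(level, parsl_version, components):
    """Build the message sent at launch for tracking level 1, 2 or 3."""
    if level not in (1, 2, 3):
        raise ValueError(f"unknown tracking level: {level}")
    message = {
        "parsl_v": parsl_version,
        "python_v": platform.python_version(),
        "platform.system": platform.system(),
    }
    if level >= 2:
        # Level 2 and above add configuration details on top of level 1.
        message["components"] = components
    return message


def closure_message(level, apps_run, apps_failed, elapsed_seconds):
    """Build the message sent on closure; only level 3 sends one in this sketch."""
    if level < 3:
        return None
    return {
        "apps_run": apps_run,
        "apps_failed": apps_failed,
        "elapsed_seconds": elapsed_seconds,
    }


level1 = launch_message(1, "1.3.0-dev", ["parsl.config.Config"])
level2 = launch_message(2, "1.3.0-dev", ["parsl.config.Config"])
end_at_level2 = closure_message(2, apps_run=10, apps_failed=1, elapsed_seconds=42.0)
end_at_level3 = closure_message(3, apps_run=10, apps_failed=1, elapsed_seconds=42.0)
```

Each level is a superset of the one below it, so a single integer comparison is enough to decide which fields to include.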
This PR introduces code to traverse the configuration object (in a similar manner to the RepresentationMixin style of logging the supplied configuration object) with the intention of giving each object a chance to report its own usage information.
The protocol now reports configured objects either as a JSON string class name, or as a JSON object containing the class name and any additional information that class wishes to report for usage (via the UsageInformation abstract class).
This PR modifies the HighThroughputExecutor to use this API to report richer usage information: a specific usage tracking query is to ask about use of the enable_mpi_mode parameter, and so the HighThroughputExecutor will now report the boolean value of that parameter.
Beware that this reports on configuration, not use, of components: for example, configurations by default will include three staging providers, even though I believe it is extremely rare that either the FTP or HTTP staging providers are actually used to stage data. The UsageInformation API is intended to support reporting whether these staging providers actually stage anything, but this PR does not implement that in those staging provider components.
To support UsageInformation instances which report on usage during a run, the component tree is traversed both for the start message and the end message, and may result in different information in each message - for example, the DFK report occurs only in the end message.
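To illustrate why the start and end messages can differ, here is a small sketch of a component that reports usage accumulated during a run. Everything in it is hypothetical: the `CountingStaging` class, the `staged` key, the `get_usage_information` method name, and the `report` helper, which walks a flat list instead of a real component tree. This PR does not add such reporting to the staging providers.

```python
class CountingStaging:
    """Hypothetical staging provider that counts how many files it staged."""

    def __init__(self):
        self.files_staged = 0

    def stage_in(self, path):
        # A real provider would transfer the file; the sketch only counts.
        self.files_staged += 1

    def get_usage_information(self):
        return {"staged": self.files_staged}


def report(components):
    """Describe each component as a class name, or as a dict if it reports extra usage."""
    described = []
    for component in components:
        cls = type(component)
        name = f"{cls.__module__}.{cls.__name__}"
        if hasattr(component, "get_usage_information"):
            described.append({"c": name, **component.get_usage_information()})
        else:
            described.append(name)
    return described


staging = CountingStaging()
start_components = report([staging])  # gathered for the start message
staging.stage_in("input.dat")         # work happens during the run
end_components = report([staging])    # gathered again for the end message
```

Because the same objects are queried twice, anything they accumulate between the two traversals shows up only in the end message.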
An example start message looks like this: (pretty-formatted)
Changed Behaviour
Different information will be reported via usage tracking - anyone processing that usage data will need to adapt their code.
Type of change