Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add analytics logging to MosaicMLLogger #3106

Open
wants to merge 45 commits into
base: main
Choose a base branch
from

Conversation

angel-ruiz7
Copy link

@angel-ruiz7 angel-ruiz7 commented Mar 12, 2024

Add log_analytics function to MosaicMLLogger

This adds a log_analytics function to the MosaicMLLogger, which records these metrics about composer runs on the Mosaic platform:

  • fsdp_config
    • sharding_strategy
    • state_dict_type
    • activation_checkpointing
    • forward_prefetch
    • backward_prefetch
    • mixed_precision
    • device_mesh
  • autoresume
  • algorithms
  • cloud_providers
    • cloud_provider_data
    • cloud_provider_checkpoints
  • save_interval
  • loggers
  • precision
  • optimizer

By logging these fields from the MosaicMLLogger into a run's metadata, we can then parse the metadata in SQL and add these metrics into UC. This will enable anyone on the genai team to create dashboards / queries for analytical purposes and view the patterns in these metrics across runs on the platform. (see the genai_runs table for more info)

Output of the Quick Start Example

Screenshot 2024-03-21 at 1 40 20 PM

@angel-ruiz7 angel-ruiz7 marked this pull request as ready for review March 14, 2024 21:49
@angel-ruiz7 angel-ruiz7 requested a review from a team as a code owner March 14, 2024 21:49
Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a brief description of what we are trying to enable?

composer/trainer/trainer.py Outdated Show resolved Hide resolved
composer/core/engine.py Outdated Show resolved Hide resolved
composer/utils/analytics_helpers.py Outdated Show resolved Hide resolved
Copy link
Contributor

@snarayan21 snarayan21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments!

composer/core/engine.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
@angel-ruiz7
Copy link
Author

angel-ruiz7 commented Mar 21, 2024

Added some tests and adjusted the way we display optimizers. If algorithms is also super verbose I can do something similar for that as well 👍

Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly lgtm! a few minor follow ups.

composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/trainer/trainer.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/utils/analytics_helpers.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
backend, _, _ = parse_uri(self.analytics_data.save_folder)
metrics['composer/cloud_provided_save_folder'] = backend if backend else 'local'

# Save interval can be passed in w/ multiple types. If the type is a function, then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some idea for utility of analytics on save interval? Nothing is coming to mind for me

Copy link
Author

@angel-ruiz7 angel-ruiz7 Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was requested in a Slack thread a while back. this included some less-helpful metrics (i.e. num_workers) so if it isn't useful i'll take it out. @dakinggg

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think just drop this one

composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
composer/loggers/mosaicml_logger.py Outdated Show resolved Hide resolved
tests/loggers/test_mosaicml_logger.py Outdated Show resolved Hide resolved
@@ -96,10 +115,57 @@ def log_hyperparameters(self, hyperparameters: Dict[str, Any]):
def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
self._log_metadata(metrics)

def log_analytics(self, state: State, loggers: Tuple[LoggerDestination, ...]) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a callback instead? It seems like all loggers and experiment tracking tools would benefit from having these extra information for navigation or reproducibility purposes.

def init(self, state: State, logger: Logger) -> None:
try:
self.log_analytics(state, logger.destinations)
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we log the exception here? We should also probably only log once. if its failing consistently, we don't want to spam the run with warnings

Copy link
Contributor

@dakinggg dakinggg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Sorry, lost track of this PR

Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just log this all callbacks? Make it a separate callback?

@dakinggg
Copy link
Contributor

dakinggg commented Jun 6, 2024

@mvpatel2000 are you suggesting a new logger function log_analytics that broadcasts to all LoggerDestinations? not sure how useful that is. In foundry for example, we already log the full config to the ones where it makes sense (wandb, mlflow).

@snarayan21
Copy link
Contributor

@mvpatel2000 @angel-ruiz7 @dakinggg is this one basically done? or are there still concerns?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants