
Add analytics logging to MosaicMLLogger #3106

Open
wants to merge 45 commits into base: main
Changes from 20 commits (45 commits total)
f2e5537
add `log_analytics` function to `MosaicMLLogger`
angel-ruiz7 Mar 9, 2024
87f3d09
Merge branch 'dev' of github.com:mosaicml/composer into angel/add-dat…
angel-ruiz7 Mar 9, 2024
55e738f
add `optimizers`, `loggers`, `algorithms`, `device_mesh`, and `save_i…
angel-ruiz7 Mar 12, 2024
da1a179
fix pyright tests + formatting
angel-ruiz7 Mar 12, 2024
e0b559d
Merge branch 'dev' of github.com:mosaicml/composer into angel/add-dat…
angel-ruiz7 Mar 12, 2024
44cb283
log cloud providers from `load_path` / `save_folder`
angel-ruiz7 Mar 12, 2024
681e166
run formatter
angel-ruiz7 Mar 12, 2024
362f9ba
get rid of circular imports
angel-ruiz7 Mar 12, 2024
21cce59
access mosaicml_logger in a different way that doesn't affect tests
angel-ruiz7 Mar 12, 2024
be30004
smol improvements to style
angel-ruiz7 Mar 13, 2024
77d2c25
oops get rid of more circular imports 0_0
angel-ruiz7 Mar 13, 2024
82d771a
log `train_loader_workers` and `eval_loaders` to analytics
angel-ruiz7 Mar 13, 2024
b9aa219
fix type checks / access for `torch.utils.data.DataLoader`
angel-ruiz7 Mar 13, 2024
250cfff
Merge branch 'dev' into angel/add-data-to-metadata-for-analytics
angel-ruiz7 Mar 13, 2024
d841895
remove unnecessary comment
angel-ruiz7 Mar 14, 2024
67fd0d8
merge + resolve conflicts
angel-ruiz7 Mar 15, 2024
4a9da31
log analytics on `EVENT.INIT`
angel-ruiz7 Mar 15, 2024
8a5d9df
comment adjustment
angel-ruiz7 Mar 15, 2024
b8032c5
make sure `Logger.destinations` is iterable
angel-ruiz7 Mar 15, 2024
bca595b
Merge branch 'dev' into angel/add-data-to-metadata-for-analytics
angel-ruiz7 Mar 18, 2024
52db068
update default for `backward_prefetch` and move analytics logging to …
angel-ruiz7 Mar 19, 2024
f44ac72
Merge branch 'angel/add-data-to-metadata-for-analytics' of github.com…
angel-ruiz7 Mar 19, 2024
b499ab4
more formatting 🤡
angel-ruiz7 Mar 20, 2024
45b1f8b
get rid of unnecessary diff
angel-ruiz7 Mar 20, 2024
06cd615
make tests for `get_logger_type`
angel-ruiz7 Mar 20, 2024
53ec76c
Merge branch 'dev' of github.com:mosaicml/composer into angel/add-dat…
angel-ruiz7 Mar 20, 2024
a08bc51
add analytics metadata test, log `optimizer` and `algorithms` using `…
angel-ruiz7 Mar 21, 2024
192fb4b
run formatters
angel-ruiz7 Mar 21, 2024
0c6439e
adjust type hint for `get_logger_type` and delete test `Exception`
angel-ruiz7 Mar 21, 2024
6abd957
fix formatting on docstring
angel-ruiz7 Mar 21, 2024
f10e8a9
remove indent in comment
angel-ruiz7 Mar 21, 2024
cee3876
Merge branch 'dev' into angel/add-data-to-metadata-for-analytics
angel-ruiz7 Mar 21, 2024
11eb853
remove underscored fields and `param_groups` from `composer/optimizer…
angel-ruiz7 Mar 21, 2024
51a0dd0
Merge branch 'angel/add-data-to-metadata-for-analytics' of github.com…
angel-ruiz7 Mar 21, 2024
59dea8f
display name and data in one field for `optimizers` and `algorithms`
angel-ruiz7 Mar 21, 2024
5037ffb
fix docstring
angel-ruiz7 Mar 22, 2024
ea2eabc
Merge branch 'dev' into angel/add-data-to-metadata-for-analytics
angel-ruiz7 Mar 22, 2024
8c867bc
Make `MosaicAnalyticsData` class, change cloud path names, and log `f…
angel-ruiz7 Mar 26, 2024
5753020
just log algorithm names for analytics
angel-ruiz7 Mar 26, 2024
a51d5d9
Merge branch 'dev' into angel/add-data-to-metadata-for-analytics
angel-ruiz7 Mar 26, 2024
1083ae6
just pass `evaluator.label` for `eval_loaders`
angel-ruiz7 Mar 26, 2024
25e3ceb
Merge branch 'angel/add-data-to-metadata-for-analytics' of github.com…
angel-ruiz7 Mar 26, 2024
bf02105
fix `fsdp_config`, `eval_loaders`, and `loggers`. also `warn` when an…
angel-ruiz7 Mar 26, 2024
3be10ed
update docstring
angel-ruiz7 Mar 26, 2024
ba907b7
Merge branch 'dev' into angel/add-data-to-metadata-for-analytics
angel-ruiz7 Apr 1, 2024
5 changes: 5 additions & 0 deletions composer/core/engine.py
@@ -83,6 +83,7 @@ def run_last(algorithms: Sequence[Algorithm], event: Event) -> Sequence[Algorith
from composer.core.event import Event
from composer.core.state import State
from composer.loggers import Logger, LoggerDestination
from composer.loggers.mosaicml_logger import log_run_analytics
from composer.profiler import ProfilerAction
from composer.utils import ensure_tuple

@@ -293,6 +294,10 @@ def run_event(
self._run_loggers(event)
self._run_nonlogger_callbacks(event)
traces = self._run_algorithms(event)

# If a MosaicMLLogger is present, log analytics for the run to metadata.
log_run_analytics(self.logger.destinations)

else:
traces = self._run_algorithms(event)
# Run callbacks first, so any log calls from a callback that are executed lazily
105 changes: 103 additions & 2 deletions composer/loggers/mosaicml_logger.py
@@ -15,16 +15,20 @@
import warnings
from concurrent.futures import wait
from functools import reduce
from typing import TYPE_CHECKING, Any, Dict, List, Optional
from typing import TYPE_CHECKING, Any, Callable, Dict, Iterable, List, Optional, Tuple, Union

import mcli
import torch
import torch.utils.data

from composer.core.time import TimeUnit
from composer.core.event import Event
from composer.core.time import Time, TimeUnit
from composer.loggers import Logger
from composer.loggers.logger_destination import LoggerDestination
from composer.loggers.wandb_logger import WandBLogger
from composer.utils import dist
from composer.utils.analytics_helpers import get_logger_type
from composer.utils.file_helpers import parse_uri

if TYPE_CHECKING:
from composer.core import State
@@ -69,10 +73,12 @@ def __init__(
log_interval: int = 60,
ignore_keys: Optional[List[str]] = None,
ignore_exceptions: bool = False,
analytics_data: Optional[Dict[str, Any]] = None,
) -> None:
self.log_interval = log_interval
self.ignore_keys = ignore_keys
self.ignore_exceptions = ignore_exceptions
self.analytics_data = analytics_data
self._enabled = dist.get_global_rank() == 0
if self._enabled:
self.time_last_logged = 0
@@ -96,6 +102,90 @@ def log_hyperparameters(self, hyperparameters: Dict[str, Any]):
def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
self._log_metadata(metrics)

def log_analytics(self,) -> None:
if self.analytics_data is None:
return

# Fetch / cast metrics that we want to log from self.analytics_data
autoresume: bool = self.analytics_data['autoresume']
trainer_state: State = self.analytics_data['state']
save_interval: Union[str, int, Time, Callable[[State, Event], bool]] = self.analytics_data['save_interval']
loggers: List[LoggerDestination] = self.analytics_data['loggers']
load_path: Union[str, None] = self.analytics_data['load_path']
save_folder: Union[str, None] = self.analytics_data['save_folder']

metrics: Dict[str, Any] = {'composer/autoresume': autoresume, 'composer/precision': trainer_state.precision}

train_dataloader = trainer_state.train_dataloader
if train_dataloader is not None and isinstance(train_dataloader, torch.utils.data.DataLoader):
metrics['composer/train_loader_workers'] = train_dataloader.num_workers

metrics['composer/eval_loaders'] = []
for evaluator in trainer_state.evaluators:
dataloader = evaluator.dataloader.dataloader
if isinstance(dataloader, torch.utils.data.DataLoader):
metrics['composer/eval_loaders'].append(
json.dumps({
'label': evaluator.label,
'num_workers': dataloader.num_workers,
}),
)

metrics['composer/optimizers'] = [
json.dumps(optimizer.state_dict(), sort_keys=True) for optimizer in trainer_state.optimizers
]
metrics['composer/algorithms'] = [
json.dumps(algorithm.state_dict(), sort_keys=True) for algorithm in trainer_state.algorithms
]
metrics['composer/loggers'] = [
get_logger_type(logger) if not isinstance(logger, MosaicMLLogger) else 'MosaicMLLogger'
for logger in loggers
]

# Take the service provider out of the URI and log it to metadata. If no service provider
# is found (i.e. backend = ''), then we assume 'local' for the cloud provider.
if load_path is not None:
backend, _, _ = parse_uri(load_path)
metrics['composer/cloud_provider_data'] = backend if backend else 'local'
if save_folder is not None:
backend, _, _ = parse_uri(save_folder)
metrics['composer/cloud_provider_checkpoints'] = backend if backend else 'local'
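The scheme-stripping step above can be sketched as follows. Note that `parse_uri` here is a self-contained stand-in for `composer.utils.file_helpers.parse_uri` (assumed to return a `(backend, bucket, path)` tuple), mimicked with `urllib.parse` for illustration:

```python
from urllib.parse import urlparse


def parse_uri(uri: str):
    """Stand-in for composer.utils.file_helpers.parse_uri."""
    parsed = urlparse(uri)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip('/')


def cloud_provider(path: str) -> str:
    # Empty scheme (no 's3://', 'oci://', etc.) means a local filesystem path.
    backend, _, _ = parse_uri(path)
    return backend if backend else 'local'


print(cloud_provider('s3://my-bucket/checkpoints/ep1.pt'))  # s3
print(cloud_provider('/tmp/checkpoints/ep1.pt'))            # local
```

This is why a plain filesystem `save_folder` shows up in metadata as `'local'` rather than an empty string.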

# Save interval can be passed in w/ multiple types. If the type is a function, then
# we log 'callable' as the save_interval value for analytics.
if isinstance(save_interval, (str, int)):
save_interval_str = str(save_interval)
elif isinstance(save_interval, Time):
save_interval_str = f'{save_interval._value}{save_interval._unit}'
else:
save_interval_str = 'callable'
metrics['composer/save_interval'] = save_interval_str

Review thread on this block:
Contributor: Is there some idea for utility of analytics on save interval? Nothing is coming to mind for me.
angel-ruiz7 (Author, Mar 26, 2024): this was requested in a Slack thread a while back. this included some less-helpful metrics (i.e. num_workers) so if it isn't useful i'll take it out. @dakinggg
Contributor: yeah i think just drop this one
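The normalization above can be sketched in a self-contained form. The `Time` dataclass below is a stub standing in for `composer.core.time.Time` (whose private `_value`/`_unit` fields the diff reads); the stub exposes them as public fields for clarity:

```python
from dataclasses import dataclass


@dataclass
class Time:
    """Stub for composer.core.time.Time: a value plus a unit suffix."""
    value: int
    unit: str  # e.g. 'ep' (epochs), 'ba' (batches)


def save_interval_str(save_interval) -> str:
    # Strings and ints stringify directly; Time objects render as '<value><unit>';
    # anything else (a callable) is only flagged, never serialized.
    if isinstance(save_interval, (str, int)):
        return str(save_interval)
    if isinstance(save_interval, Time):
        return f'{save_interval.value}{save_interval.unit}'
    return 'callable'


print(save_interval_str('1ep'))                      # 1ep
print(save_interval_str(Time(500, 'ba')))            # 500ba
print(save_interval_str(lambda state, event: True))  # callable
```

Note that `isinstance(x, (str, int))` with a tuple of types works on all supported Python versions, whereas passing a `typing.Union` to `isinstance` only works on Python 3.10+.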

if trainer_state.fsdp_config:
metrics['composer/sharding_strategy'] = trainer_state.fsdp_config.get('sharding_strategy', None)
metrics['composer/activation_checkpointing'] = trainer_state.fsdp_config.get(
'activation_checkpointing',
False,
)
metrics['composer/forward_prefetch'] = trainer_state.fsdp_config.get('forward_prefetch', False)
metrics['composer/backward_prefetch'] = trainer_state.fsdp_config.get('backward_prefetch', None)

# Get device_mesh from config so it is in list form and JSON parsable
metrics['composer/device_mesh'] = trainer_state.fsdp_config.get('device_mesh', [])

mixed_precision = trainer_state.fsdp_config.get('mixed_precision', None)
if mixed_precision is not None and isinstance(mixed_precision, dict):
# Sorting the keys allows us to parse this dict value as JSON in a SQL query if needed
metrics['composer/mixed_precision'] = json.dumps(mixed_precision, sort_keys=True)
else:
metrics['composer/mixed_precision'] = mixed_precision

if trainer_state.fsdp_state_dict_type is not None:
metrics['composer/state_dict_type'] = trainer_state.fsdp_state_dict_type

self.log_metrics(metrics)
self._flush_metadata(force_flush=True)

def log_exception(self, exception: Exception):
self._log_metadata({'exception': exception_to_json_serializable_dict(exception)})
self._flush_metadata(force_flush=True)
@@ -284,3 +374,14 @@ def exception_to_json_serializable_dict(exc: Exception):
except AttributeError:
pass
return exc_data


def log_run_analytics(loggers: Tuple[LoggerDestination, ...]):
"""Log run analytics to metadata if a MosaicMLLogger is available in the list."""
# Avoids a casting bug during testing
if not isinstance(loggers, Iterable):
return

mosaicml_logger = next((logger for logger in loggers if isinstance(logger, MosaicMLLogger)), None)
if mosaicml_logger is not None:
mosaicml_logger.log_analytics()
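The destination scan in `log_run_analytics` can be sketched with stub classes standing in for composer's `LoggerDestination` hierarchy; only the scan-and-dispatch pattern is real, the classes here are illustrative:

```python
from collections.abc import Iterable


class LoggerDestination:
    pass


class FileLogger(LoggerDestination):
    pass


class MosaicMLLogger(LoggerDestination):
    def __init__(self):
        self.called = False

    def log_analytics(self):
        self.called = True


def log_run_analytics(loggers):
    # Mirror of the guard above: bail out on non-iterables (e.g. mocks in tests).
    if not isinstance(loggers, Iterable):
        return
    # Find the first MosaicMLLogger, if any, and dispatch to it.
    mosaicml_logger = next((l for l in loggers if isinstance(l, MosaicMLLogger)), None)
    if mosaicml_logger is not None:
        mosaicml_logger.log_analytics()


ml = MosaicMLLogger()
log_run_analytics((FileLogger(), ml))
print(ml.called)  # True
```

Keeping this as a free function lets `Engine.run_event` trigger analytics without holding a direct reference to any particular logger.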
11 changes: 10 additions & 1 deletion composer/trainer/trainer.py
@@ -1284,7 +1284,16 @@ def __init__(
MOSAICML_ACCESS_TOKEN_ENV_VAR,
) is not None and not any(isinstance(x, MosaicMLLogger) for x in loggers):
log.info('Detected run on MosaicML platform. Adding MosaicMLLogger to loggers.')
mosaicml_logger = MosaicMLLogger()
mosaicml_logger = MosaicMLLogger(
analytics_data={
'autoresume': autoresume,
'state': self.state,
'save_interval': save_interval,
'loggers': loggers,
'load_path': load_path,
'save_folder': save_folder,
},
)
loggers.append(mosaicml_logger)

# Remote Uploader Downloader
42 changes: 42 additions & 0 deletions composer/utils/analytics_helpers.py
@@ -0,0 +1,42 @@
# Copyright 2024 MosaicML Composer authors
# SPDX-License-Identifier: Apache-2.0

"""Helpers for logging analytics with the MosaicMLLogger."""

from typing import Any

from composer.loggers.cometml_logger import CometMLLogger
from composer.loggers.console_logger import ConsoleLogger
from composer.loggers.file_logger import FileLogger
from composer.loggers.in_memory_logger import InMemoryLogger
from composer.loggers.logger_destination import LoggerDestination
from composer.loggers.mlflow_logger import MLFlowLogger
from composer.loggers.neptune_logger import NeptuneLogger
from composer.loggers.progress_bar_logger import ProgressBarLogger
from composer.loggers.remote_uploader_downloader import RemoteUploaderDownloader
from composer.loggers.slack_logger import SlackLogger
from composer.loggers.tensorboard_logger import TensorboardLogger
from composer.loggers.wandb_logger import WandBLogger

LOGGER_TYPES = [
FileLogger,
SlackLogger,
WandBLogger,
MLFlowLogger,
NeptuneLogger,
ConsoleLogger,
CometMLLogger,
InMemoryLogger,
TensorboardLogger,
ProgressBarLogger,
RemoteUploaderDownloader,
LoggerDestination,
]


def get_logger_type(logger: Any) -> str:
"""Returns the type of a logger as a string. If the logger is not a known type, returns 'Custom'."""
for logger_type in LOGGER_TYPES:
if isinstance(logger, logger_type):
return logger_type.__name__
return 'Custom'
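The first-match lookup above relies on list order: `LoggerDestination` is the common base class, so it must come last or every logger would match it immediately. A minimal sketch with stub classes standing in for composer's logger types:

```python
class LoggerDestination:
    pass


class WandBLogger(LoggerDestination):
    pass


# Base class last, so concrete subclasses match before the generic fallback.
LOGGER_TYPES = [WandBLogger, LoggerDestination]


def get_logger_type(logger) -> str:
    for logger_type in LOGGER_TYPES:
        if isinstance(logger, logger_type):
            return logger_type.__name__
    return 'Custom'


print(get_logger_type(WandBLogger()))        # WandBLogger
print(get_logger_type(LoggerDestination()))  # LoggerDestination
print(get_logger_type(object()))             # Custom
```

Anything outside the hierarchy falls through to `'Custom'`, so third-party destinations still produce a loggable label.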