# New PyTorch Logging System

## **Summary**
Create a message logging system for PyTorch with the following requirements:

### Consistency

* The C++ and Python APIs should match each other as closely as possible.

* All errors, warnings, and other messages generated by PyTorch should be
  emitted using the logging system API.


### Severity level and message classes

* Offer different message severity levels, including at least the following:

  - **Info**: Emits a message without creating a warning or error. By default,
    this gets printed to stdout.

  - **Warning**: Emits a message as a warning. If a warning is never caught,
    it gets printed to stderr by default.

  - **Error**: Emits a message as an error. If an error is never caught, the
    application will print the error to stderr and quit.

* Offer different message classes under each severity level.

  - Every message is emitted as an instance of a message class.

  - Each message class has both a C++ class and a Python class, and when a
    C++ message is propagated to Python, it is converted to its corresponding
    Python class.

  - Whenever it makes sense, the Python class should be one of the builtin
    Python error/warning classes. For instance, currently in PyTorch, the C++
    error class `c10::Error` gets converted to the Python `RuntimeError` class.

* Adding new message classes and severity levels should be easy.

### Configurability and filtering

* Ability to turn warnings into errors. This is already possible with the
  Python `warnings` module filter, but the PyTorch docs should mention it and
  we should probably have unit tests for it.
  See [documentation](https://docs.python.org/3/library/warnings.html#the-warnings-filter).

* Settings to disable specific **Warning** or **Info** classes.

  - Disabling warnings in Python is already possible with the `warnings`
    module filter. See [documentation](https://docs.python.org/3/library/warnings.html#the-warnings-filter).
    There is no similar system in C++ at the moment, and building one is probably
    low priority.

  - Filtering out **Info** messages would be nice to have because excessive
    printouts can degrade the user experience. Related to issue
    [#68768](https://github.com/pytorch/pytorch/issues/68768).

* Settings to enable/disable emitting duplicate messages generated by multiple
  `torch.distributed` ranks. Related to issue
  [#68768](https://github.com/pytorch/pytorch/issues/68768).

* Ability to make a particular **Warning** or **Info** message only emit once.
  Warn-once should be the default for most warnings.

  - Currently `TORCH_WARN_ONCE` does this in C++, but there is no Python
    equivalent.

  - Offer a filter to override warn-once and log-once, so that they always emit.
    The filter could work similarly to the Python `warnings` filter. This is
    a low priority feature.

  - TODO: `torch.set_warn_always()` currently controls some warnings (maybe
    only the ones from C++? I need to find out for sure.)

* Settings can be changed from Python, C++, or environment variables.

  - Filtering warnings with Python command line arguments should
    remain possible.
    For instance, the following turns a `DeprecationWarning`
    into an error: `python -W error::DeprecationWarning your_script.py`

### Compatibility

* Should integrate with Meta's internal logging system, which is
  [glog](https://github.com/google/glog).

  - TODO: What are all the requirements that define "integrating with glog"?

* Must be OSS-friendly, so it shouldn't require libraries (like glog) which may
  cause incompatibility issues for projects that use PyTorch.

### Other requirements

* Continue using warning/error APIs and message classes that currently exist in
  PyTorch wherever possible. For instance, `TORCH_CHECK`, `TORCH_WARN`, and
  `TORCH_WARN_ONCE` should continue to be used in C++.

* TODO: Determine the requirements for the following concepts:

  - Log files? (default behavior and any settings)


## **Motivation**

Original issue: [link](https://github.com/pytorch/pytorch/issues/72948)

Currently, it is challenging for PyTorch developers to provide messages that
act consistently between Python and C++.

It is also challenging for PyTorch users to manage the messages that PyTorch
emits. For instance, if a PyTorch user happens to be calling PyTorch functions
that emit lots of messages, it can be difficult for them to filter out those
messages so that their project's users don't get bombarded with warnings and
printouts that they don't need to see.


## **Proposed Implementation**

### Message classes

At least the following message classes should be available. In each entry
below, the C++ class name appears first, followed by the corresponding Python
class.

Each severity level has a default class. All other classes within a given
severity level inherit from the corresponding default class.

NOTE: Most of the error classes below already exist in PyTorch. However,
info classes do not currently exist.
Also, only one type of warning currently
exists in C++, and it is not implemented as a C++ class that can be inherited
(as far as I understand).

#### Error message classes:

* **`c10::Error`** - Python `RuntimeError`
  - Default error class. Other error classes inherit from it.

* **`c10::IndexError`** - Python `IndexError`
  - Emitted when attempting to access an element that is not present in
    a list-like object.

* **`c10::ValueError`** - Python `ValueError`
  - Emitted when a function receives an argument with correct type but
    incorrect value.

* **`c10::TypeError`** - Python `TypeError`
  - Emitted when a function receives an argument with incorrect type.

* **`c10::NotImplementedError`** - Python `NotImplementedError`
  - Emitted when a feature that is not implemented is called.

* **`c10::LinAlgError`** - Python `torch.linalg.LinAlgError`
  - Emitted from the `torch.linalg` module when there is a numerical error.

* **`c10::NondeterministicError`** - Python `torch.NondeterministicError`
  - Emitted when `torch.use_deterministic_algorithms(True)` and
    `torch.set_deterministic_debug_mode('error')` are set, and a
    nondeterministic operation is called.


#### Warning message classes:

* **`c10::UserWarning`** - Python `UserWarning`
  - Default warning class. Other warning classes inherit from it.

* **`c10::BetaWarning`** - Python `torch.BetaWarning`
  - Emitted when a beta feature is called. See
    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/).
  - TODO: This warning type might not be very useful. Find out if we really
    want it.

* **`c10::PrototypeWarning`** - Python `torch.PrototypeWarning`
  - Emitted when a prototype feature is called. See
    [PyTorch feature classifications](https://pytorch.org/blog/pytorch-feature-classification-changes/).
  - TODO: This warning type might not be very useful. Find out if we really
    want it.

* **`c10::NondeterministicWarning`** - Python `torch.NondeterministicWarning`
  - Emitted when `torch.use_deterministic_algorithms(True)` and
    `torch.set_deterministic_debug_mode('warn')` are set, and a
    nondeterministic operation is called.

* **`c10::DeprecationWarning`** - Python `DeprecationWarning`
  - Emitted when a deprecated function is called.
  - TODO: By default, Python ignores `DeprecationWarning`s unless they are
    triggered by code in `__main__`, so we may actually want to use a
    different Python class for this.


#### Info message classes:

* **`c10::Info`** - Python `torch.Info`
  - Default info class. Other info classes inherit from it.


### Message APIs

In order to emit messages, developers can use the APIs defined in this section.

These APIs all have a variable-length argument list, `...` in C++ and `*args`
in Python. When a message is emitted, these arguments are concatenated into
a string, and the string becomes the body of the message.

In C++, every argument in `...` must have a `std::ostream& operator<<`
overload defined so that it can be concatenated.

In Python, each element in `*args` must either have a `__str__` method or be
a callable that, when called, returns an object that has a `__str__` method.
Providing the body of a message as a callable can improve performance when
the message is not emitted. For instance, in
`torch._check(cond, lambda: expensive_function())`, `expensive_function()` is
only called if `cond` is false.


#### Error APIs

The APIs for raising errors all check a boolean condition, the `cond` argument
in the following signatures, and throw an error if that condition is false.

The error APIs are listed below, with the C++ signature on the left and the
corresponding Python signature on the right.
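
The condition check and the lazy evaluation of callable message arguments described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not PyTorch's implementation. `check`, `check_with`, and `build_message` are hypothetical names standing in for `torch._check` and `torch._check_with`, and the sketch does not handle a one-element boolean tensor passed as `cond`.

```python
from typing import Any, Type

# Hypothetical sketch: a message argument is either a plain object or a
# zero-argument callable whose return value is used instead.
MessageArg = Any


def build_message(*args: MessageArg) -> str:
    """Concatenate message arguments with no separator, calling callables."""
    parts = []
    for arg in args:
        value = arg() if callable(arg) else arg
        parts.append(str(value))
    return "".join(parts)


def check_with(error_type: Type[Exception], cond: bool, *args: MessageArg) -> None:
    """Hypothetical analogue of TORCH_CHECK_WITH: raise `error_type` if `cond` is false."""
    if not cond:
        # The message is only built on the failure path, so expensive
        # callables cost nothing when the check passes.
        raise error_type(build_message(*args))


def check(cond: bool, *args: MessageArg) -> None:
    """Hypothetical analogue of TORCH_CHECK, whose Python error is RuntimeError."""
    check_with(RuntimeError, cond, *args)


if __name__ == "__main__":
    # Passes: the lambda is never called, so no ZeroDivisionError occurs.
    check(True, "unreachable: ", lambda: 1 / 0)
    try:
        check_with(ValueError, False, "expected a positive value, got ", -3)
    except ValueError as err:
        print(err)  # expected a positive value, got -3
```

Building the message only on the failure path mirrors the C++ macros, which also evaluate their message arguments only when `cond` is false. Supporting a one-element boolean tensor for `cond` would additionally need a conversion such as `bool(tensor)`, which this sketch leaves out.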

**`TORCH_CHECK(cond, ...)`** - `torch._check(cond, *args)`
  - C++ error: `c10::Error`
  - Python error: `RuntimeError`

**`TORCH_CHECK_INDEX(cond, ...)`** - `torch._check_index(cond, *args)`
  - C++ error: `c10::IndexError`
  - Python error: `IndexError`

**`TORCH_CHECK_VALUE(cond, ...)`** - `torch._check_value(cond, *args)`
  - C++ error: `c10::ValueError`
  - Python error: `ValueError`

**`TORCH_CHECK_TYPE(cond, ...)`** - `torch._check_type(cond, *args)`
  - C++ error: `c10::TypeError`
  - Python error: `TypeError`

**`TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)`** - `torch._check_not_implemented(cond, *args)`
  - C++ error: `c10::NotImplementedError`
  - Python error: `NotImplementedError`

**`TORCH_CHECK_WITH(error_t, cond, ...)`** - `torch._check_with(error_type, cond, *args)`
  - C++ error: Specified by `error_t` argument
  - Python error: Specified by `error_type` argument

Additionally, `cond` for the Python functions is allowed to be a boolean tensor
with one element.


#### Warning APIs

**`TORCH_WARN(...)`** - `torch._warn(*args)`
  - C++ warning: `c10::UserWarning`
  - Python warning: `UserWarning`

**`TORCH_WARN_ONCE(...)`** - `torch._warn_once(*args)`
  - C++ warning: `c10::UserWarning`
  - Python warning: `UserWarning`
  - For a given callsite, the warning is emitted only the first time that
    callsite is reached.

**`TORCH_WARN_DEPRECATION(...)`** - `torch._warn_deprecation(*args)`
  - C++ warning: `c10::DeprecationWarning`
  - Python warning: `DeprecationWarning`

**`TORCH_WARN_DEPRECATION_ONCE(...)`** - `torch._warn_deprecation_once(*args)`
  - C++ warning: `c10::DeprecationWarning`
  - Python warning: `DeprecationWarning`
  - For a given callsite, the warning is emitted only the first time that
    callsite is reached.

**`TORCH_WARN_WITH(warning_t, ...)`** - `torch._warn_with(warning_type, *args)`
  - C++ warning: Specified by `warning_t` argument
  - Python warning: Specified by `warning_type` argument

**`TORCH_WARN_ONCE_WITH(warning_t, ...)`** - `torch._warn_once_with(warning_type, *args)`
  - C++ warning: Specified by `warning_t` argument
  - Python warning: Specified by `warning_type` argument
  - For a given callsite, the warning is emitted only the first time that
    callsite is reached.

TODO: In C++, `TORCH_WARN_ONCE` is implemented as a macro that defines a local
static variable to track whether the warning has been emitted from each
callsite. It is not possible to implement it this way in Python, so we need to
think of some other way to do it. The Python `warnings` module's
[`"default"` filter](https://docs.python.org/3/library/warnings.html#the-warnings-filter)
does prevent duplicate warnings from being emitted, but it acts a little
differently: if two warning messages emitted from the same location differ even
slightly (for instance, if the value of some variable is included in the
message and that value differs between two different `warnings.warn` calls),
then both warnings are emitted. `TORCH_WARN_ONCE` does not check whether
messages differ. But we could probably implement `torch._warn_once` in a similar
way to how the `warnings` module filter is implemented.


#### Info APIs

Just like the error and warning APIs, the info APIs each have a variable-length
argument list, `...` in C++ and `*args` in Python. These arguments are
concatenated into the info message.

**`TORCH_LOG_INFO(...)`** - `torch._log_info(*args)`
  - C++ info class: `c10::Info`
  - Python info class: `torch.Info`
  - TODO: Is there a better name than `log_info`? I didn't want to call it
    `torch.info`, because
    [`numpy.info`](https://numpy.org/doc/stable/reference/generated/numpy.info.html)
    has a completely different functionality.
    And obviously
    [`torch.log`](https://pytorch.org/docs/stable/generated/torch.log.html?highlight=torch%20log#torch.log)
    is already taken.

**`TORCH_LOG_INFO_WITH(info_t, ...)`** - `torch._log_info_with(info_type, *args)`
  - C++ info class: Specified by `info_t` argument
  - Python info class: Specified by `info_type` argument


### Multi-process messaging APIs

Currently, when running subprocesses that use PyTorch, some messages are
emitted by every running subprocess. See
[issue #68768](https://github.com/pytorch/pytorch/issues/68768) for specific
examples. By default, emitting each of these messages once rather than once per
subprocess would give a better user experience.

In issue #68768, the duplicate messages related to `cpp_extension.load` can be
modified so that only rank 0 emits them, simply by checking the process's rank
first. For instance, a `warnings.warn(...)` call can be replaced with:

```python
import warnings

if rank == 0:  # `rank`: this process's rank, e.g. torch.distributed.get_rank()
    warnings.warn(...)
```

This successfully avoids duplicate warnings. A few concrete examples can be
seen in [this draft PR](https://github.com/pytorch/pytorch/pull/79288).

However, implementing the duplicate filter like this is not ideal. It would be
better to have dedicated message system API calls for this. In the case of
warnings, the following signature could be used:

**`torch._warn_rank(my_rank, *args, warn_rank=0)`**
  * Args:
    - `my_rank` - Rank of the subprocess calling this function
    - `args` - Warning message
    - `warn_rank` - Rank that should emit the message
  * The warning is only emitted if `my_rank == warn_rank`.

TODO: Add APIs for the rest of the message classes, like
`torch._log_info_rank()`, etc.

TODO: There should also be a global setting to enable emitting the duplicates.
`torch._warn_rank` could check the setting, and if it's turned on, then it would
emit the warning for all ranks.

TODO: Should we have a `TORCH_WARN_RANK` (and others) in C++ as well?
Is there an existing use case for it?


# PyTorch's current messaging API

The rest of this document describes the current messaging API in
PyTorch. It is included to give better context about what will change and
what will stay the same in the new messaging system.

At the moment, PyTorch has some APIs in place that make many aspects of
message logging easy from the perspective of a developer working on PyTorch.
Messages can be printouts, warnings, or errors.

Errors are created with the standard `raise` statement in Python
([documentation](https://docs.python.org/3/tutorial/errors.html#raising-exceptions)).
In C++, PyTorch offers macros for creating errors (which are listed later in
this document). When a C++ function is called from Python, any error it raises
is converted to a Python error.

Warnings are created with `warnings.warn` in Python
([documentation](https://docs.python.org/3/library/warnings.html)). In C++,
PyTorch offers macros for creating warnings (which are listed later in this
document). When a C++ function is called from Python, any warnings it
generates are converted to Python warnings.

Printouts (what the new system calls "Info" severity messages) are
created with `print` in Python and `std::cout` in C++.

PyTorch's C++ warning/error macros are declared in
[`c10/util/Exception.h`](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h).

## PyTorch C++ Errors

In C++, there are several different types of errors that can be used, but
PyTorch developers typically don't deal with these error classes directly.
Instead, they use macros that offer a concise interface for raising different
error classes.

### C++ error macros

Each of the error macros evaluates a boolean conditional expression, `cond`.
If +the condition is false, the error is raised, and whatever extra arguments are +in `...` get concatenated into the error message with `operator<<`. + +| Macro | C++ Error class | +| ---------------------------------------- | ------------------------------ | +| `TORCH_CHECK(cond, ...)` | `c10::Error` | +| `TORCH_CHECK_WITH(error_t, cond, ...)` | caller specifies `error_t` arg | +| `TORCH_CHECK_LINALG(cond, ...)` | `c10::LinAlgError` | +| `TORCH_CHECK_INDEX(cond, ...)` | `c10::IndexError` | +| `TORCH_CHECK_VALUE(cond, ...)` | `c10::ValueError` | +| `TORCH_CHECK_TYPE(cond, ...)` | `c10::TypeError` | +| `TORCH_CHECK_NOT_IMPLEMENTED(cond, ...)` | `c10::NotImplementedError` | + +There is some documentation on error macros [here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L344-L362) + +The reason why C++ preprocessor macros are used, rather than function calls, is +to ensure that the compiler can optimize for the `cond == true` branch. In +other words, if an error does not get raised, overhead is minimized. + +### C++ error classes + +The primary error class in C++ is `c10::Error`. Documentation and declaration +are +[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L21-L28). +`c10::Error` is a subclass of `std::exception`. + +There are other error classes which are child classes of `c10::Error`, defined +[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L195-L236). 

When these errors propagate to Python, each one is converted to a corresponding
Python error class:

| C++ error class                 | Python error class         |
| ------------------------------- | -------------------------- |
| `std::exception`                | `RuntimeError`             |
| `c10::Error`                    | `RuntimeError`             |
| `c10::IndexError`               | `IndexError`               |
| `c10::ValueError`               | `ValueError`               |
| `c10::TypeError`                | `TypeError`                |
| `c10::NotImplementedError`      | `NotImplementedError`      |
| `c10::EnforceFiniteError`       | `ExitException`            |
| `c10::OnnxfiBackendSystemError` | `ExitException`            |
| `c10::LinAlgError`              | `torch.linalg.LinAlgError` |


## PyTorch C++ Warnings

When warnings propagate from C++ to Python, they are converted to a Python
`UserWarning`. Whatever is in `...` is concatenated into the warning
message using `operator<<`.

* `TORCH_WARN(...)`
  - [Definition](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L515-L530)

* `TORCH_WARN_ONCE(...)`
  - [Definition](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/c10/util/Exception.h#L557-L562)
  - This macro only generates a warning the first time it is encountered at
    run time.


## Implementation details

### C++ to Python Error Translation

`c10::Error` and its subclasses are translated into their corresponding Python
errors [in `CATCH_CORE_ERRORS`](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/torch/csrc/Exceptions.h#L54-L100).

However, not all of the `c10::Error` subclasses in the table above appear there,
which could just be an oversight.

`CATCH_CORE_ERRORS` is included within the `END_HANDLE_TH_ERRORS` macro that
most Python-bound C++ functions use for handling errors. For instance,
`THPVariable__is_view` uses the error handling macro
[here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/tools/autograd/templates/python_variable_methods.cpp#L76).
There is also a similar `END_HANDLE_TH_ERRORS_PYBIND` macro that is used for
pybind-based bindings.


#### `torch::PyTorchError`

There's also an extra error class in `CATCH_CORE_ERRORS`,
`torch::PyTorchError`. I'm not sure yet why it exists and how it differs from
`c10::Error`. `torch::PyTorchError` has several subclasses:

* `torch::IndexError`
* `torch::TypeError`
* `torch::ValueError`
* `torch::NotImplementedError`
* `torch::AttributeError`
* `torch::LinAlgError`


### C++ to Python Warning Translation

The conversion of warnings from C++ to Python is described [here](https://github.com/pytorch/pytorch/blob/72e4aab74b927c1ba5c3963cb17b4c0dce6e56bf/torch/csrc/Exceptions.h#L25-L48).


## Misc Notes

[PyTorch Developer Podcast - Python exceptions](https://pytorch-dev-podcast.simplecast.com/episodes/python-exceptions)
explains how C++ errors/warnings are converted to Python. TODO: listen to it
again and take notes.