Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(docs): Add ADR on structured logging. #1018

Merged
merged 17 commits into from
Oct 20, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .config/dictionaries/project.dic
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@ icudtl
ideascale
idents
ilap
IMEI
Instantitation
integ
Intellij
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
---
title: 0008 Structured Logging
adr:
author: Steven Johnson
created: 17-Oct-2024
status: proposed
tags:
- logging
---

<!-- cspell: words Sematext,Stackify -->

## Context

Both Backend and Frontend components of Catalyst Voices produce log messages.
These log message help in development, but also in fault-finding in production.

Structured logging is a way of logging which separates the log text message from the data pertinent to the logged event.

## Assumptions

That both the Frontend and Backend are using logging libraries capable of structured logging.

## Decision

* We will use structured logging for all log messages that are to be sent to a log collection system.
* We will not use string formatting within structured log messages, but embed all data within fields in the log message.
* All log levels *MUST* use structured logging.
* It is possible that **DEBUG** or **TRACE** level logs can selectively be enabled in production to help investigate issues.
* Log text messages should be specific enough to identify the actual log messages when searching or summarizing logs.
* Avoid using the same log message multiple times in the same code.
* As far as we are able, we should be consistent with field names.
* For example, we always use "error" for error values, not "Error" or "err" or "e".
* Fields do not need to have the same name as the variable, what is important is the field name be unambiguous and consistent.
* Logs *MUST* include the date and time of the event being logged.
* Date and Time *MUST* always be referenced to UTC, and not Local Time.
* Date and Time *SHOULD* be formatted according to [RFC3339] (An open formalized subset of [ISO8601]).
* If [RFC3339] is not possible to use, then [ISO8601] format *MAY* be used instead.
* Logs *SHOULD*, if possible, include the filename, and line of the log message in the code.
* Loggers *SHOULD* be configured to automatically include log fields required for every log message.
* All logs *MUST* obfuscate a client's private information before logging.
* A non-exhaustive list of private information includes:
* IP Addresses
* User-Agent Strings
* Location Information (Longitude/Latitude)
* Device identifiers (IMEI, MAC Address, etc.)
* Browser Fingerprints
* For data like this, that uniquely identifies a client or session, the obfuscation of that data should be consistent.
* For example, `ip addresses` can be hashed and turned into a [UUID].
* The [UUID] in this case will always be the same for an `ip address`.
* The privacy of the client is protected because it is not easily possible to derive the `ip address` from the [UUID].
* This gives us the ability to cluster multiple interactions from a common connection without revealing the users' identity.
* It also allows the Logs to help provide support to a user if they supply their IP Address. (Client Directed Unmasking).
* *NEVER* log Security sensitive information either in the clear or obfuscated, such as (non-exhaustive):
* Usernames
* Email Addresses
* Passwords
* API Keys
* Private Encryption Keys

*NOTE :* **IF AND ONLY IF** log messages are a developer convenience,
**AND** will never be collected for online diagnostic purposes,
**THEN** they do not need to follow this ADR.
*IF in doubt, use structured logging.*

### Log Levels

The Standard Abstract Log levels are defined (from the highest priority to lowest) as:

* **CRITICAL** *(Optional)* - When a system has completely failed.
* **ERROR** can also be used for this.
* **ERROR** - Something has failed which normally should not.
* Does not have to be a fatal failure.
* The Log message should clearly show if the error is fatal or unrecoverable.
* For example, a failed DB connection is an *ERROR* but is not fatal.
* Errors should be considered not fatal, unless they explicitly state otherwise.
* **WARNING** - Something which can happen, but normally shouldn't.
* **INFO** - Important System informational messages.
* **DEBUG** - Highest level detailed system operation logs.
* **TRACE** - Verbose and detailed system operation logs.

Individual languages may use different terms for these, and may have more or less defined levels.
Use the language's native log levels which map closest to this list.
For example, some languages do not have *CRITICAL*, we would map that level to *ERROR*.
In all such cases, use the next most appropriate level which matches closest with the above definitions.

#### Logging in Libraries

* Libraries *SHOULD NOT* use *INFO* level logs.
* this level of log should be used only by applications or services.
* Libraries *MAY* use any other log level.

### Log Verbosity

Logs should aim to provide exactly the right amount of information, and no more.

* All **ERROR** logs should be logged in the most appropriate location.
* **ERROR** or **WARNING** logs, *SHOULD* include enough context to identify the actual source of the error.
* Info level logs should only be used to provide ***IMPORTANT*** operational data.
* Such as:
* Build or configuration information.
* The start of internal persistent services
* **INFO** logs should not typically continuously stream as a result of normal system use.
* API Endpoint statistics logs are an example of an exception to this guidance.
* However, they *SHOULD* also have a method of selectively enabling/disabling these *INFO* logs.
* The same principle *SHOULD* apply to any other regularly logged **INFO** level logs that may be required.
* **DEBUG** level logs should be assumed to be enabled selectively in production.
* Their purpose *SHOULD* be aimed to helping to diagnose faults.
* Detailed and Verbose or Streaming Debug level logs should be at **TRACE** level.

### Example of Unstructured Logging to avoid

An example of unstructured logging in rust, which is not to be used:

```rust
error!("Hello {name} An error occurred in {thing}, doing {something:?}: {err}");
```

The problems this example exhibits are:

1. This is difficult to process with automation.
2. It would need to be searched with a regex.
3. The fixed words probably appear in similar patterns in other log messages, making it difficult to discern what the log is about.
4. It may be easy for a developer to read.
However, it is not easy to read when there are tens of thousands of logs from multiple instances of a service.

### Example of Structured Logging to utilize

Instead, we should use:

```rust
error!(name=name, thing=%thing, something=?something, error=?err, "An error occurred processing named updates to the database");
```

* Each piece of dynamic data is a field, including the string representation of the error.
* The message explains what the error is and helps locate the log both in the log messages itself, and in the code.

## Risks

The only risk relates to not doing structured logging.
It will make the system harder to manage for Operations and Support staff.

***ALWAYS REMEMBER:
Log messages are intended to be primarily read by operations and support staff.
Do not assume they know how the code works, just because you might.
Attempt to make log messages helpful to the people who will work with the system.***

## Consequences

Structured logging makes searching, summarizing and parsing log messages significantly easier.
Without it, log messages quickly become difficult to use, which defeats the purpose of generating them.

## More Information

* [Sematext: What is structured logging](https://sematext.com/glossary/structured-logging/)
* [Stackify: What Is Structured Logging and Why Developers Need It](https://stackify.com/what-is-structured-logging-and-why-developers-need-it/)

[RFC3339]: <https://www.rfc-editor.org/rfc/rfc3339>
[ISO8601]: <https://www.iso.org/iso-8601-date-and-time-format.html>
[UUID]: <https://www.rfc-editor.org/rfc/rfc9562.html>
Loading