Skip to content

Commit

Permalink
feat: Detect and log anomalous traces in DatadogDiagnosticMiddleware
Browse files Browse the repository at this point in the history
Manual testing:

1. Install in devstack, add middleware to lms/envs/devstack.py, and
   enable Waffle flag.
2. Make some requests and observe lack of new logging.
3. Add `self.dd_tracer.current_root_span().finish()` to start of
   `process_view` (faking an anomalous trace) and reinstall in devstack
4. Make some more requests and observe log output like the following:
   `Anomalous Datadog local root span (duration already set): id = 66abf66b000000001168265a391d04a3; duration = 0.108 sec; worker age = 1.560 sec`
  • Loading branch information
timmc-edx committed Aug 1, 2024
1 parent c7ca515 commit 8e1ea7e
Show file tree
Hide file tree
Showing 4 changed files with 39 additions and 9 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,12 @@ Change Log
Unreleased
~~~~~~~~~~

[3.7.0] - 2024-08-02
~~~~~~~~~~~~~~~~~~~~
Added
-----
* ``DatadogDiagnosticMiddleware`` can now detect and log anomalous traces, enabled by Waffle flag ``datadog.diagnostics.detect_anomalous_trace``

[3.6.0] - 2024-07-24
~~~~~~~~~~~~~~~~~~~~
Added
Expand Down
2 changes: 1 addition & 1 deletion edx_arch_experiments/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,4 @@
A plugin to include applications under development by the architecture team at 2U.
"""

__version__ = '3.6.0'
__version__ = '3.7.0'
7 changes: 0 additions & 7 deletions edx_arch_experiments/datadog_diagnostics/README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,3 @@ Datadog Diagnostics
When installed in the LMS as a plugin app, the ``datadog_diagnostics`` app adds additional logging for debugging our Datadog integration.

This is intended as a temporary situation while we debug the `trace concatenation issue <https://github.com/edx/edx-arch-experiments/issues/692>`_.

Usage
*****

In LMS:

- Install ``edx-arch-experiments`` as a dependency
33 changes: 32 additions & 1 deletion edx_arch_experiments/datadog_diagnostics/middleware.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,31 @@
"""
Diagnostic middleware for Datadog.
To use, install edx-arch-experiments and add
``edx_arch_experiments.datadog_diagnostics.middleware.DatadogDiagnosticMiddleware``
to ``MIDDLEWARE``, then set the below settings as needed.
"""

import logging
import time

from django.core.exceptions import MiddlewareNotUsed
from edx_toggles.toggles import WaffleFlag

log = logging.getLogger(__name__)

# .. toggle_name: datadog.diagnostics.detect_anomalous_trace
# .. toggle_implementation: WaffleFlag
# .. toggle_default: False
# .. toggle_description: Enables logging of anomalous Datadog traces for web requests.
# .. toggle_warning: This is a noisy feature and so it should only be enabled
# for a percentage of requests.
# .. toggle_use_cases: temporary
# .. toggle_creation_date: 2024-08-01
# .. toggle_target_removal_date: 2024-11-01
# .. toggle_tickets: https://github.com/edx/edx-arch-experiments/issues/692
DETECT_ANOMALOUS_TRACE = WaffleFlag('datadog.diagnostics.detect_anomalous_trace', module_name=__name__)

# .. toggle_name: datadog.diagnostics.log_root_span
# .. toggle_implementation: WaffleFlag
# .. toggle_default: False
Expand Down Expand Up @@ -43,6 +60,8 @@ def __init__(self, get_response):
# If import fails, don't even load this middleware.
raise MiddlewareNotUsed # pylint: disable=raise-missing-from

self.worker_start_epoch = time.time()

def __call__(self, request):
return self.get_response(request)

Expand All @@ -63,9 +82,21 @@ def log_diagnostics(self, request):
"""
Contains all the actual logging logic.
"""
local_root_span = self.dd_tracer.current_root_span()

if DETECT_ANOMALOUS_TRACE.is_enabled():
root_duration_s = local_root_span.duration
if root_duration_s is not None:
worker_run_time_s = time.time() - self.worker_start_epoch
log.warning(
f"Anomalous Datadog local root span (duration already set): "
f"id = {local_root_span.trace_id:x}; "
f"duration = {root_duration_s:0.3f} sec; "
f"worker age = {worker_run_time_s:0.3f} sec"
)

if LOG_ROOT_SPAN.is_enabled():
route_pattern = getattr(request.resolver_match, 'route', None)
local_root_span = self.dd_tracer.current_root_span()
current_span = self.dd_tracer.current_span()
# pylint: disable=protected-access
log.info(
Expand Down

0 comments on commit 8e1ea7e

Please sign in to comment.