⭐ [Enhancement]: Improve Health Endpoint #2366

Open
JerryNixon opened this issue Sep 6, 2024 · 2 comments

@JerryNixon
Contributor

JerryNixon commented Sep 6, 2024

What is it?

  • Add configuration information to health endpoint
  • Add endpoint basics to health endpoint
  • Add thresholds to health endpoint

Health as a standard

There is no official industry standard for a health endpoint. By convention, /health or a variation such as /_health is common. ASP.NET Core uses Microsoft.Extensions.Diagnostics.HealthChecks. Data API builder will keep serving health at the root path /, where it is already exposed.

Useful for automation

For example, Azure App Service & Azure Kubernetes Service (AKS) support health probes to monitor the health of your application. If a service fails health checks, Azure can automatically restart it or redirect traffic to healthy instances.

Similarly, if Data API builder fails health checks beyond a threshold the customer chooses, the customer can recycle the container or send an alert to engineers.

| Term | Description |
| --- | --- |
| Health Endpoint | The URL (e.g., /) exposed as JSON. |
| Check | A specific diagnostic test (e.g., database, API). |
| Status | The result of a check. |
| Status.Healthy | The system is functioning correctly. |
| Status.Unhealthy | The system has a critical failure or issue. |

Overall health calculation

| Healthy | Unhealthy | Global Status |
| --- | --- | --- |
| - | 0 | Healthy |
| - | ≥ 1 | Unhealthy |

The global health status is determined as follows:

  • Healthy: All checks are healthy.
  • Unhealthy: At least one check is unhealthy.
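
The roll-up rule above could be sketched as follows. This is a minimal illustration, not DAB code: the function name `global_status` is hypothetical, and treating a missing or unrecognized status as unhealthy is an assumption the proposal does not state.

```python
from typing import Mapping

HEALTHY = "Healthy"
UNHEALTHY = "Unhealthy"


def global_status(checks: Mapping[str, Mapping[str, str]]) -> str:
    """Roll individual check statuses up into one global status."""
    # Assumption: anything that is not explicitly "Healthy" counts as unhealthy.
    unhealthy = sum(1 for check in checks.values() if check.get("status") != HEALTHY)
    # Zero unhealthy checks means Healthy; one or more means Unhealthy.
    return HEALTHY if unhealthy == 0 else UNHEALTHY
```

With no checks configured, this sketch reports Healthy, which matches the "0 unhealthy" row of the table.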

Output standard schema

Health check responses follow a common convention rather than a strict standard. The typical pattern involves a "checks" property for individual components' statuses (e.g., database, memory), with each status rolling up to an overall "status" at the top level.

Basic format

{
  "status": "Healthy",
  "checks": {
    "check-name-1": { "status": "Healthy" },
    "check-name-2": { "status": "Healthy" }
  }
}

Example

{
  "status": "Healthy",
  "checks": {
    "database": { "status": "Healthy" },
    "memory": { "status": "Healthy" }
  }
}
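
For the automation scenario described earlier, a consumer of this response shape might look roughly like the sketch below. The function names and the `max_failures` policy are hypothetical; the proposal leaves the actual threshold to the customer.

```python
import json


def failing_checks(payload: str) -> list[str]:
    """Return the names of checks whose status is not "Healthy"."""
    report = json.loads(payload)
    checks = report.get("checks", {})
    return [name for name, check in checks.items() if check.get("status") != "Healthy"]


def should_recycle(payload: str, max_failures: int = 0) -> bool:
    """Hypothetical policy: recycle once more than max_failures checks fail."""
    return len(failing_checks(payload)) > max_failures
```

A probe script would fetch the health endpoint, pass the body to `should_recycle`, and then restart the container or raise an alert depending on the result.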

Other common fields

Fields like description, tags, data, exception, and reason provide additional metadata.

1. Description

A textual explanation of what the health check is doing or testing.

{
  "status": "Healthy",
  "description": "Checks database connection and query speed."
}

2. Tags

Labels or categories that group or identify related health checks.

{
  "status": "Healthy",
  "tags": ["database", "critical"]
}

3. Data

Any additional information collected during the health check, often technical metrics or diagnostics.

{
  "status": "Degraded",
  "data": {
    "responseTime": "250ms",
    "maxAllowedResponseTime": "100ms"
  }
}

4. Exception

Information about any error or failure encountered during the health check.

{
  "status": "Unhealthy",
  "exception": "TimeoutException: Database query timed out."
}

(Additive) Data API builder config

The convention allows additional fields, so DAB could add its own configuration data.

{
  "status": "Healthy",
  "version": "1.2.10",
  "app-name": "dab_oss_1.2.10",
  "dab-configuration": {
    "http": true,
    "https": true,
    "rest": true,
    "graphql": true,
    "telemetry": true,
    "caching": true,
    "mode": "development",
    "dab-configs": [
      "/App/dab-config.json (mssql)"
    ],
    "dab-schemas": [
      "/App/schema.json"
    ]
  }
}

Configuration changes

{
  "runtime": {
    "health": {
      "enabled": true,                         // default: true
      "cache-ttl": 5,                          // optional, default: 5
      "max-dop": 5,                            // optional, default: 1
      "roles": ["anonymous", "authenticated"]  // optional, default: *
    }
  }
}
{
  "data-source": {
    "health": {
      "moniker": "sqlserver",     // optional, default: null; not required, on purpose, since most configs have just one data source
      "enabled": true,            // default: true
      "query": "SELECT TOP 1 1",  // optional
      "threshold-ms": 100         // optional, default: 10000
    }
  }
}
{
  "<entity-name>": {
    "health": {
      "enabled": true,      // default: true
      "filter": "Id eq 1",  // optional, default: null
      "first": 1,           // optional, default: 1
      "threshold-ms": 100   // optional, default: 10000
    },
    ...
  }
}

Because we have the configuration, we know if this is a stored procedure or table/view endpoint. We want to allow the developer to influence how the checks work against the endpoint/entity.
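
One way to picture this is a small planner that reads an entity's configuration and decides which health settings apply. This is only a sketch: `plan_entity_check` is a hypothetical name, it assumes the entity's `source` is either a plain string or an object with a `type` of `table`, `view`, or `stored-procedure`, and it assumes `filter` and `first` are meaningful only for tables and views.

```python
from typing import Any, Optional


def plan_entity_check(name: str, entity: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Describe the health check to run for one entity, or None if disabled."""
    health = entity.get("health", {})
    if not health.get("enabled", True):
        return None  # health checks are switched off for this entity

    source = entity.get("source", {})
    # Assumption: a plain-string source is treated as a table.
    kind = source.get("type", "table") if isinstance(source, dict) else "table"

    plan = {
        "entity": name,
        "kind": kind,
        "threshold-ms": health.get("threshold-ms", 10000),
    }
    if kind in ("table", "view"):
        # Assumption: filter and first only apply to table/view endpoints.
        plan["filter"] = health.get("filter")
        plan["first"] = health.get("first", 1)
    return plan
```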

runtime.health Configuration

| Property | Data Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | Boolean | No | true | Enables or disables health checks at the runtime level. |
| cache-ttl | Integer | No | 5 | Time-to-live (in seconds) for caching health check results. |
| max-dop | Integer | No | 1 | Maximum Degree of Parallelism for running health checks. |
| roles | Array | No | * | Roles allowed to access the health endpoint (e.g., anonymous, authenticated). |
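
The effect of cache-ttl could be sketched like this: the checks run at most once per TTL window, and requests in between reuse the last report. The class name `CachedHealth` and its injectable `clock` parameter are hypothetical and exist only to make the idea testable.

```python
import time
from typing import Callable, Optional


class CachedHealth:
    """Reuse the last health report until cache-ttl seconds have passed."""

    def __init__(
        self,
        run_checks: Callable[[], dict],
        ttl_seconds: float = 5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_checks = run_checks
        self._ttl = ttl_seconds
        self._clock = clock
        self._report: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Re-run the checks only when nothing is cached or the entry has expired.
        if self._report is None or now >= self._expires_at:
            self._report = self._run_checks()
            self._expires_at = now + self._ttl
        return self._report
```

Caching keeps frequent probes from turning into frequent database queries, at the cost of a report that can be up to cache-ttl seconds old.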

data-source.health Configuration

| Property | Data Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| moniker | String | No | NULL | Identifier for the data source; useful when multiple data sources exist. |
| enabled | Boolean | No | true | Enables or disables health checks for the data source. |
| query | String | No | N/A | Custom SQL query used to perform the health check. |
| threshold-ms | Integer | No | 10000 | Threshold in milliseconds for the query response time before the check is considered degraded. |
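
A data-source check along these lines might time the configured query and compare the elapsed time with threshold-ms. The sketch below uses an in-memory SQLite connection purely as a stand-in for a real data source, and `check_data_source` is a hypothetical name. Following the output sample in this proposal, a response time equal to the threshold counts as healthy, a slower one is reported as Unhealthy, and a failed query reports a null response time plus the exception text.

```python
import sqlite3
import time


def check_data_source(connection, query: str = "SELECT 1", threshold_ms: int = 10000) -> dict:
    """Run the health query and report status, timing, and any exception."""
    started = time.perf_counter()
    try:
        connection.execute(query).fetchall()
    except Exception as error:
        # The query failed, so there is no meaningful response time to report.
        return {
            "status": "Unhealthy",
            "data": {"responseTimeMs": None, "maxAllowedResponseTimeMs": threshold_ms},
            "exception": f"{type(error).__name__}: {error}",
        }
    elapsed_ms = round((time.perf_counter() - started) * 1000)
    return {
        "status": "Healthy" if elapsed_ms <= threshold_ms else "Unhealthy",
        "data": {"responseTimeMs": elapsed_ms, "maxAllowedResponseTimeMs": threshold_ms},
    }


if __name__ == "__main__":
    # SQLite stands in for the configured database in this illustration.
    print(check_data_source(sqlite3.connect(":memory:"), "SELECT 1", threshold_ms=100))
```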

<entity-name>.health Configuration

| Property | Data Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | Boolean | No | true | Enables or disables health checks for the specific entity. |
| filter | String | No | null | Filter condition applied to the health check query (e.g., "Id eq 1"). |
| first | Integer | No | 1 | Number of records to query during the health check. |
| threshold-ms | Integer | No | 10000 | Threshold in milliseconds for the query response time before the check is considered degraded. |
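
If the entity check is issued through the REST endpoint, filter and first would presumably map onto the `$filter` and `$first` query parameters. The sketch below builds such a request URL. The function name, the base URL, and the idea that the check goes through REST at all are assumptions, not something the proposal specifies.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def entity_health_url(
    base_url: str,
    entity_path: str,
    first: int = 1,
    filter_expr: Optional[str] = None,
) -> str:
    """Build the REST request a health check could send for one entity."""
    params = {"$first": str(first)}
    if filter_expr:
        params["$filter"] = filter_expr
    # Keep "$" readable in parameter names and encode spaces as %20.
    query = urlencode(params, safe="$", quote_via=quote)
    return f"{base_url.rstrip('/')}/{entity_path.strip('/')}?{query}"


if __name__ == "__main__":
    # Hypothetical host and entity path, for illustration only.
    print(entity_health_url("http://localhost:5000/api", "Book", 1, "Id eq 1"))
```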

Output Sample

{
  "status": "Unhealthy",
  "version": "1.2.10",
  "app-name": "dab_oss_1.2.10",
  "dab-configuration": {
    "http": true,
    "https": true,
    "rest": true,
    "graphql": true,
    "telemetry": true,
    "caching": true,
    "mode": "development",
    "dab-configs": [
      "/App/dab-config.json ({data-source-moniker})",
      "/App/dab-config-2.json ({data-source-moniker})"
    ],
    "dab-schemas": [
      "/App/schema.json"
    ]
  },
  "checks": {
    "database-moniker-1": {
      "status": "Healthy",
      "tags": ["database", "performance"],
      "description": "Checks if the database is responding within an acceptable timeframe.",
      "data": {
        "responseTimeMs": 10,
        "maxAllowedResponseTimeMs": 10
      }
    },
    "database-moniker-2": {
      "status": "Unhealthy",
      "tags": ["database", "performance"],
      "description": "Checks if the database is responding within an acceptable timeframe.",
      "data": {
        "responseTimeMs": 20,
        "maxAllowedResponseTimeMs": 10
      }
    },
    "database-moniker-3": {
      "status": "Unhealthy",
      "tags": ["database", "performance"],
      "description": "Checks if the database is responding within an acceptable timeframe.",
      "data": {
        "responseTimeMs": null,
        "maxAllowedResponseTimeMs": 10
      },
      "exception": "TimeoutException: Database query timed out."
    },
    "<entity-name-1>": {
      "status": "Healthy",
      "description": "Checks if the endpoint is responding within an acceptable timeframe.",
      "tags": ["endpoint", "performance"],
      "data": {
        "responseTimeMs": 10,
        "maxAllowedResponseTimeMs": 10
      }
    },
    "<entity-name-2>": {
      "status": "Unhealthy",
      "description": "Checks if the endpoint is responding within an acceptable timeframe.",
      "tags": ["endpoint", "performance"],
      "data": {
        "responseTimeMs": 20,
        "maxAllowedResponseTimeMs": 10
      }
    },
    "<entity-name-3>": {
      "status": "Unhealthy",
      "description": "Checks if the endpoint is responding within an acceptable timeframe.",
      "tags": ["endpoint", "performance"],
      "data": {
        "responseTimeMs": 20,
        "maxAllowedResponseTimeMs": 10
      },
      "exception": "{exception-message-here}"
    }
  }
}
@JerryNixon JerryNixon added the enhancement New feature or request label Sep 6, 2024
@JerryNixon JerryNixon self-assigned this Sep 6, 2024
@seantleonard seantleonard added this to the October2024-March2025 milestone Sep 11, 2024
@JerryNixon JerryNixon pinned this issue Sep 26, 2024
@seantleonard
Contributor

Another healthcheck example: DabHealthCheck.cs

internal class DabHealthCheck : IHealthCheck

@JerryNixon JerryNixon unpinned this issue Oct 3, 2024
@aaronpowell
Contributor

It looks like https://github.com/Xabaril/AspNetCore.Diagnostics.HealthChecks has support across the four supported data sources for DAB. Would it be easier to add those internally to surface up the health checks, or at least treat them as additive to DAB-specific ones?

Once there is some native health check info surfaced by DAB, I'd love to get it integrated in the .NET Aspire Community Toolkit integration (tracking via CommunityToolkit/Aspire#190).
