Add resilience info (#245)
SeBBBe authored Oct 22, 2024
1 parent 00ad41c commit 0029e25
Showing 3 changed files with 66 additions and 5 deletions.
1 change: 1 addition & 0 deletions modules/ROOT/content-nav.adoc
@@ -3,6 +3,7 @@
* Introduction
** xref:introduction/overview.adoc[]
** xref:introduction/architecture.adoc[]
** xref:introduction/resilience.adoc[]
** xref:introduction/deployment.adoc[]
** xref:introduction/versions.adoc[]
60 changes: 60 additions & 0 deletions modules/ROOT/pages/introduction/resilience.adoc
@@ -0,0 +1,60 @@
= Resilience
:description: This section describes the resilience features of Ops Manager.

Neo4j Ops Manager includes several features that keep the service resilient under heavy load and during interruptions in agent-server communication.

== Rate limiting
To prevent one or a few clients from overloading the server, a rate limit is applied per IP address. By capping the load that any single client IP address can place on the system, the rate limit helps keep the server available to respond to other requests.

The default configuration can be changed with the following server configuration parameters.

[cols="<,<,<, <",options="header"]
|===
| Command line argument
| Environment variable name
| Description
| Default value

| `ratelimiter.period`
| `RATELIMITER_PERIOD`
| The period (an ISO-8601 duration) after which the rate limiter resets the request count.
| PT20S

| `ratelimiter.limit_for_period`
| `RATELIMITER_LIMIT_FOR_PERIOD`
| Number of requests permitted (per IP) within the period.
| 200

| `ratelimiter.timeout_duration`
| `RATELIMITER_TIMEOUT_DURATION`
| When the limit is hit, wait this amount of time and check again.
| PT10S
|===
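The following Python sketch simulates the per-IP behavior described in the table, using the default values. It is not the NOM server's implementation: the fixed-window counting, the single retry after `ratelimiter.timeout_duration`, and the `PerIpRateLimiter` and `parse_duration` names are assumptions made for illustration. With the defaults, a single IP address can sustain on average 200 requests / 20 s = 10 requests per second.

```python
import re

# Hypothetical helper: parses simple ISO-8601 durations such as "PT20S" or
# "PT1M30S" into seconds. Only hour, minute, and second components are handled.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text: str) -> int:
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


class PerIpRateLimiter:
    """Sketch of a fixed-window rate limiter keyed by client IP address."""

    def __init__(self, period="PT20S", limit_for_period=200, timeout_duration="PT10S"):
        self.period = parse_duration(period)             # ratelimiter.period
        self.limit = limit_for_period                    # ratelimiter.limit_for_period
        self.timeout = parse_duration(timeout_duration)  # ratelimiter.timeout_duration
        self.windows = {}  # ip -> (window start time, requests counted in window)

    def _try_acquire(self, ip: str, now: float) -> bool:
        start, count = self.windows.get(ip, (now, 0))
        if now - start >= self.period:
            start, count = now, 0  # period elapsed: reset this IP's request count
        if count >= self.limit:
            return False
        self.windows[ip] = (start, count + 1)
        return True

    def acquire(self, ip: str, now: float):
        """Returns the time at which the request is admitted, or None if rejected.

        Assumption: a limited request waits `timeout_duration` and is checked once more.
        """
        if self._try_acquire(ip, now):
            return now
        retry_at = now + self.timeout
        if self._try_acquire(ip, retry_at):
            return retry_at
        return None
```

In this model, a request that arrives after an IP address has used up its 200 requests is admitted only if the 20-second period has elapsed by the time of the retry 10 seconds later; requests from other IP addresses are unaffected.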

== Circuit breaker

Because query log capture can produce a large volume of data, depending on the workload and agent configuration, a circuit breaker governs the reception of log data on the server side. If the volume of incoming logs is large enough to degrade the handling of other messages, the circuit breaker temporarily stops processing query logs and gives full priority to other messages. Query log processing resumes automatically after some time has passed.

The circuit breaker is configured automatically and cannot be disabled. If query log processing is frequently interrupted, reduce the volume of logs sent by each agent. See *xref:../addition/agent-installation/self-registered.adoc#querylog[Query log collection configuration]*.

[NOTE]
====
As a best practice, use a minimum duration filter, which greatly reduces the volume of logs to be processed while preserving the queries of interest. The built-in obfuscation functionality also helps by reducing the cardinality of query text.
====
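As a toy model of the behavior described above, the following Python sketch trips a breaker when too much query-log data arrives and then handles only other messages until a cooldown has elapsed. The trip condition (more than a fixed number of query-log messages in one batch), the cooldown length, and all names (`QueryLogCircuitBreaker`, `Message`, `max_query_logs_per_batch`, `cooldown_seconds`) are hypothetical; NOM configures its circuit breaker automatically and does not expose these values. The sketch also does not model whether skipped query logs are later replayed or discarded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    kind: str     # hypothetical message kinds, e.g. "metrics" or "querylog"
    payload: str


class QueryLogCircuitBreaker:
    """Toy circuit breaker: trips when a batch carries too many query-log
    messages, then skips query logs until the cooldown has elapsed."""

    def __init__(self, max_query_logs_per_batch: int = 1000, cooldown_seconds: float = 30.0):
        self.max_query_logs = max_query_logs_per_batch  # hypothetical trip threshold
        self.cooldown = cooldown_seconds                # hypothetical cooldown
        self.open_until = None  # time at which query-log processing resumes

    def is_open(self, now: float) -> bool:
        if self.open_until is not None and now >= self.open_until:
            self.open_until = None  # cooldown elapsed: close the breaker again
        return self.open_until is not None

    def handle_batch(self, batch: list[Message], now: float) -> tuple[list[Message], list[Message]]:
        """Returns (processed, skipped). Other messages are always handled first."""
        query_logs = [m for m in batch if m.kind == "querylog"]
        others = [m for m in batch if m.kind != "querylog"]
        if len(query_logs) > self.max_query_logs:
            self.open_until = now + self.cooldown  # trip the breaker
        if self.is_open(now):
            return others, query_logs
        return others + query_logs, []
```

Once tripped, the breaker keeps skipping query logs for the whole cooldown, even if later batches are small, which mirrors the automatic resumption "after some time has passed" rather than an immediate reaction to each batch.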

== Data caching

If communication between the agent and the server is interrupted, a limited amount of data is cached on the agent side and retransmitted once the connection is re-established.

[cols="<,<",options="header"]
|===
| Type of data
| Cache size

| Metrics
| *Up to* 50 minutes (18 minutes if the query cache is full)

| Query logs
| 32 minutes or 32,768 unique queries, whichever limit is reached first.

|===
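The Python sketch below illustrates the query-log limits from the table: entries are kept until they are 32 minutes old or until 32,768 unique queries are cached, whichever limit is reached first, and are handed back for retransmission on reconnect. Keying the cache on query text, evicting the oldest entries when a limit is exceeded, and the `QueryLogCache` name and its methods are assumptions for illustration; the metrics cache is not modeled.

```python
from collections import OrderedDict

MAX_AGE_SECONDS = 32 * 60    # "32 minutes" from the table
MAX_UNIQUE_QUERIES = 32_768  # "32,768 unique queries" from the table


class QueryLogCache:
    """Agent-side buffer sketch: query text -> (first seen, occurrence count)."""

    def __init__(self, max_age: float = MAX_AGE_SECONDS, max_unique: int = MAX_UNIQUE_QUERIES):
        self.max_age = max_age
        self.max_unique = max_unique
        self.entries = OrderedDict()  # insertion order == oldest first

    def add(self, query: str, now: float) -> None:
        if query in self.entries:
            first_seen, count = self.entries[query]
            self.entries[query] = (first_seen, count + 1)  # keeps its original position
        else:
            self.entries[query] = (now, 1)
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Assumption: once either limit is exceeded, the oldest entries are dropped.
        while self.entries:
            first_seen, _ = next(iter(self.entries.values()))
            if now - first_seen > self.max_age or len(self.entries) > self.max_unique:
                self.entries.popitem(last=False)
            else:
                break

    def drain(self) -> list:
        """Called once the connection is re-established: returns the cached
        entries for retransmission, oldest first, and empties the cache."""
        cached = list(self.entries.items())
        self.entries.clear()
        return cached
```

Counting repeated queries against a single entry is what makes the unique-query limit meaningful, and it is also why query text obfuscation, which lowers the number of distinct query texts, lets the cache cover a longer outage.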
10 changes: 5 additions & 5 deletions modules/ROOT/pages/operation/upgrade-manager.adoc
@@ -15,8 +15,8 @@ They are not a replacement for reading the documentation, upgrade-planning, and
image::upgrades.png[width=800]
+

. Select a version and request a step by step tailored plan to guide you through the upgrade of the managed DBMS from it’s current version to the selected version.
Default settings (e.g. backup directory) can be overridden to suit your deploymemt environment and preferences.
. Select a version and request a step-by-step tailored plan to guide you through the upgrade of the managed DBMS from its current version to the selected version.
Default settings (e.g. backup directory) can be overridden to suit your deployment environment and preferences.
+
image::upgrades-choose-plan.png[width=800]
+
@@ -31,7 +31,7 @@ image::upgrades-plan-in-progress.png[width=800]
+


== Keeping verison information up to date
== Keeping version information up to date

=== Auto-refresh of available Neo4j and NOM versions.
NOM server periodically attempts an auto-refresh of available Neo4j and NOM versions by fetching version information from a central location.
@@ -40,9 +40,9 @@ As this requires access to external web resources, if NOM server is running behi
For details, see xref:installation/server.adoc#behind_proxy[Running NOM server behind proxy].

=== Manual upload of version information
If it is not feasible for NOM server to be configured to access extrernal web resources then versions can be refreshed by:
If it is not feasible for NOM server to be configured to access external web resources, then the available versions can be refreshed by:

. Downloading versions file from https://storage.googleapis.com/production-ops-manager-frontend-bucket/supported-neo4j-versions.yml
. Downloading the latest version manifest file from https://storage.googleapis.com/production-ops-manager-frontend-bucket/supported-neo4j-versions.yml
. Uploading the file to NOM manually

image::version-fetch-error.png[width=800]
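On a machine that does have internet access, the download step above can be scripted, for example with the following Python sketch that uses only the standard library. The function name `download_versions_file` and the empty-file check are illustrative assumptions; the upload to NOM is still done manually.

```python
import shutil
import urllib.request
from pathlib import Path

# URL from the manual upload steps above.
VERSIONS_URL = (
    "https://storage.googleapis.com/production-ops-manager-frontend-bucket/"
    "supported-neo4j-versions.yml"
)


def download_versions_file(dest: Path, url: str = VERSIONS_URL, timeout: float = 30.0) -> Path:
    """Downloads the version manifest to `dest` so it can be uploaded to NOM by hand."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the response body to disk
    if dest.stat().st_size == 0:
        raise ValueError(f"downloaded file is empty: {url}")
    return dest
```

For example, `download_versions_file(Path("supported-neo4j-versions.yml"))` saves the file in the current directory, from where it can be copied to a machine that can reach the NOM UI.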