-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add monitoring instrumentation around dhcp-disable
state.
#133
Open
davidhankins
wants to merge
7
commits into
isc-projects:master
Choose a base branch
from
davidhankins:disable-dhcp-monitoring
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Add monitoring instrumentation around dhcp-disable
state.
#133
davidhankins
wants to merge
7
commits into
isc-projects:master
from
davidhankins:disable-dhcp-monitoring
Commits on Jan 19, 2024
-
Add monitoring instrumentation around
dhcp-disable
state.PROBLEM ======= We're tracking reported issues in our environment where Kea servers are acting like they are being left disabled by unkonwn causes. Discovering the root cause of that problem has been hampered by poor visibility; * It is normal for servers to go without receiving packets. When the `pkt?-received` counter stops incrementing, we cannot determine reliably if this is due to e.g. DHCP relay agent misconfiguration or a Kea service issue. It frustrates debugging. * Kea does not increment any counter to indicate it is disabled, providing us with no clear monitoring predicate to detect the situation early. * It is consequently hours or days before this is noticed, by which time debug logs have rotated. This results in the situation of there being no outward indication of disabled state. SOLUTION ======== In this change, we resolve our visibility problems; * A new `pkt?-raw-received` counter is incremented at the earliest point of execution we know a packet has been received (in `runOne()`). * A new `pkt?-dhcp-disabled` counter counts packets dropped due to the service being in a disabled state at the time of reception. * A WARN level log is emitted for packets dropped while disabled. To alleviate spam concerns, a 1-in-100 sample is used. * To facilitate this change, the `StatsMgr` now provides a `getInteger()` method. * `NetworkState` internal boolean constructs are exposed as monitoring metrics. This allows us to observe the normal process of disabling service during e.g. HA reconnects and recovery, and separately detect extended unexpected disabled states prior to sufficient packet level impacts (consider e.g. DHCP anycast servers which may not receive packets to count as disabled for extended periods - we want to know it had been disabled for several days before the anycast route changes, not after). This "belt and suspenders" approach should close our monitoring gaps and provide attention to faulty services as they present. DISCUSSION ========== The `pkt?-raw-received` counters are a glaring admission of guilt, and we should use `pkt?-received` in these places. The exsisting `pkt?-received` counters should be renamed to `pkt?-processed`, as they indicate execution of the respective `processPacket()` function - still a useful monitoring indicator that e.g. the thread handoff succeeded. Probably there should also be a counter for the case where the callback addition failed instead of the present debug log. **HOWEVER**, existing Kea users may rely on the present behavior and names of metrics. A monitoring predicate using the current `pkt?-received` metrics may trigger automatic remediation processes for Kea servers unintentionally left disabled for too long; these predicate configurations would have to change or else the remediation would be defeated. This may not be an ideal user experience. Consequently, I've offered the unfortunate solution you see here; `pkt?-raw-received`, but I'm extremely unhappy with it; I'd be happy to modify the situation to whatever ISC prefers to offer their users in this case.
Configuration menu - View commit details
-
Copy full SHA for 93b65a1 - Browse repository at this point
Copy the full SHA 93b65a1View commit details
Commits on Jan 22, 2024
-
Configuration menu - View commit details
-
Copy full SHA for e0392c4 - Browse repository at this point
Copy the full SHA e0392c4View commit details
Commits on Jan 25, 2024
-
Use
std::unordered_set<>.count()
instead of.contains()
.The contains() method was added in C++20, while count() will always return 0 or 1 and goes back to C++11.
Configuration menu - View commit details
-
Copy full SHA for 323435d - Browse repository at this point
Copy the full SHA 323435dView commit details -
Configuration menu - View commit details
-
Copy full SHA for 34e3453 - Browse repository at this point
Copy the full SHA 34e3453View commit details
Commits on Jan 30, 2024
-
Configuration menu - View commit details
-
Copy full SHA for e726921 - Browse repository at this point
Copy the full SHA e726921View commit details -
Configuration menu - View commit details
-
Copy full SHA for 6e0ce9d - Browse repository at this point
Copy the full SHA 6e0ce9dView commit details -
Configuration menu - View commit details
-
Copy full SHA for e534bc9 - Browse repository at this point
Copy the full SHA e534bc9View commit details
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.