Feature: Station behavior anomaly-detection policy #1314
Labels
💟 Community involvement
A feature that the community is involved with
Feature Request
New feature or request
good first issue
Good for newcomers
Description
Hey,
In several cases, data stopped being produced to or consumed from a Memphis station for various reasons.
Sometimes the cause was a bug; other times it was a client coding issue. In neither case did anything crash, so the clients wrote no logs. They still appeared connected to Memphis, and Memphis itself did not run into any problem, so nothing was reported.
To catch such cases and provide better observability and protection, I suggest letting users define a per-station policy. The policy would state the expected range of messages per second produced to or consumed from the station, plus a deviation threshold in percent. For example, "the number of messages produced per second is 50% lower than expected" would indicate a problem, and a notification should be sent.
The policy should be defined entirely by users, per station. Memphis should not build in any assumptions of its own.
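To make the proposal concrete, here is a minimal sketch of how such a policy check might look. Everything in it is hypothetical: `StationPolicy`, `check_rate`, and the field names are illustrative and not part of any Memphis API. It also assumes one possible reading of the threshold, namely that a notification fires when the observed rate falls more than the threshold percentage below the lower bound of the range, or rises more than that above the upper bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StationPolicy:
    """Hypothetical per-station policy; every field is user-defined."""
    min_rate: float       # expected lower bound, messages per second
    max_rate: float       # expected upper bound, messages per second
    threshold_pct: float  # tolerated deviation outside the range, in percent


def check_rate(policy: StationPolicy, observed_rate: float) -> Optional[str]:
    """Return a description of the anomaly, or None if the rate is acceptable."""
    # Widen the expected range by the tolerated deviation on both sides.
    low = policy.min_rate * (1 - policy.threshold_pct / 100)
    high = policy.max_rate * (1 + policy.threshold_pct / 100)
    if observed_rate < low:
        return f"rate {observed_rate:.1f} msg/s is below {low:.1f} msg/s"
    if observed_rate > high:
        return f"rate {observed_rate:.1f} msg/s is above {high:.1f} msg/s"
    return None


# Example: expect 100-200 msg/s, notify on a deviation of more than 50%.
policy = StationPolicy(min_rate=100, max_rate=200, threshold_pct=50)
alert = check_rate(policy, observed_rate=40)
if alert is not None:
    print(f"notify: {alert}")
```

In this sketch the same check would be run separately for the produce rate and the consume rate of each station. No defaults are provided, in keeping with the requirement that the policy is entirely user-defined.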
Involved components
Additional context
No response