Feature: Station behavior anomaly-detection policy #1314

yanivbh1 · 2023-09-13T09:05:54Z

Description

Hey,
In several cases, data stopped being produced to or consumed from a Memphis station for various reasons.
Sometimes the cause was a bug, and other times it was a client coding issue. In both cases nothing crashed, so the clients wrote no logs. They appeared connected to Memphis, and Memphis itself did not run into a problem. Therefore, nothing was reported.

To handle such scenarios and provide a higher level of observability and protection, I suggest adding a per-station policy. The policy would state the expected range of messages per second produced to or consumed from the station, plus a difference threshold in percent. For example, "if 50% fewer messages are produced in a second than expected", we assume there is an issue and a notification should be sent.

That policy should be defined entirely by the users, per station. No assumptions should be made in advance.
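
To make the proposal concrete, here is a minimal sketch of what such a policy check could look like. It is not an existing Memphis API: `StationPolicy`, `check_rate`, and all field names are hypothetical. It also assumes the percentage threshold is measured against the nearest bound of the expected range, which is only one possible reading of the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StationPolicy:
    """Hypothetical user-defined policy for one station (not a Memphis API)."""
    station: str
    min_msgs_per_sec: float      # lower bound of the expected rate
    max_msgs_per_sec: float      # upper bound of the expected rate
    diff_threshold_pct: float    # e.g. 50 means "alert at 50% outside the range"

    def __post_init__(self) -> None:
        # Reject ranges that would make the percentage math meaningless.
        if not 0 < self.min_msgs_per_sec <= self.max_msgs_per_sec:
            raise ValueError("expected 0 < min_msgs_per_sec <= max_msgs_per_sec")


def check_rate(policy: StationPolicy, observed: float) -> Optional[str]:
    """Return an alert message if the observed rate violates the policy, else None."""
    if observed < policy.min_msgs_per_sec:
        # Assumption: deviation is relative to the lower bound of the range.
        deviation = (policy.min_msgs_per_sec - observed) / policy.min_msgs_per_sec * 100
        direction = "below"
    elif observed > policy.max_msgs_per_sec:
        # Assumption: deviation is relative to the upper bound of the range.
        deviation = (observed - policy.max_msgs_per_sec) / policy.max_msgs_per_sec * 100
        direction = "above"
    else:
        return None  # inside the expected range

    if deviation >= policy.diff_threshold_pct:
        return (f"station {policy.station}: {observed:g} msg/s is "
                f"{deviation:.0f}% {direction} the expected range")
    return None


policy = StationPolicy("orders", min_msgs_per_sec=1000, max_msgs_per_sec=2000,
                       diff_threshold_pct=50)
print(check_rate(policy, 400))   # 60% below the lower bound, so an alert string
print(check_rate(policy, 800))   # only 20% below, so None
```

Keeping the policy as plain per-station data means nothing is assumed by default: a station without a policy is simply never checked.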

Involved components

GUI
SDKs
Broker
Notifications channels/notifications integrations

Additional context

No response

Code of Conduct

I agree to follow this project's Code of Conduct

itajenglish · 2023-09-15T16:47:48Z

@yanivbh1 I think this is a great idea! I think there is even some potential to use machine learning on the historical throughput of a station, alerting in conjunction with the manually set policy. Maybe automatic anomaly detection could be a cloud feature 👀

g41797 · 2023-11-07T12:54:13Z

A simple "ping/pong", a periodic exchange with an adapter, would be good enough.
The adapter should run as a regular client: external, not part of the multi-container setup.

yanivbh1 · 2023-11-08T06:20:40Z

@g41797, that doesn't address the challenge.
The scenario I want to tackle here is, for example: in a certain station, every 24 hours, there should be at least 100GB of produced data and 300GB of consumed data, and all of a sudden, there was only 20GB in and 50GB out.
It might be nothing, but it could also be a sign that something is not working. Btw, this came from one of our customers.

ping/pong won't be good in such a scenario.
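
The 24-hour scenario above can be sketched as a sliding-window volume check. This is only an illustration under stated assumptions: `VolumeWindow` and its methods are hypothetical names, timestamps are plain seconds, and the byte counts are assumed to be reported by whatever component tracks station traffic.

```python
from collections import deque
from typing import Deque, Tuple

GB = 10**9
DAY_SECONDS = 24 * 60 * 60


class VolumeWindow:
    """Hypothetical sliding-window byte counter with a minimum-volume floor."""

    def __init__(self, window_seconds: int, min_bytes: int) -> None:
        self.window_seconds = window_seconds
        self.min_bytes = min_bytes
        self._samples: Deque[Tuple[float, int]] = deque()  # (timestamp, bytes)
        self._total = 0

    def record(self, timestamp: float, num_bytes: int) -> None:
        """Add a traffic sample; timestamps are assumed to be non-decreasing."""
        self._samples.append((timestamp, num_bytes))
        self._total += num_bytes

    def total(self, now: float) -> int:
        """Bytes seen in the last window_seconds, after evicting older samples."""
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            _, old_bytes = self._samples.popleft()
            self._total -= old_bytes
        return self._total

    def below_floor(self, now: float) -> bool:
        """True when the windowed volume is under the user-defined minimum."""
        return self.total(now) < self.min_bytes


# The numbers from the comment: expect >= 100GB in and >= 300GB out per 24 hours.
produced = VolumeWindow(DAY_SECONDS, min_bytes=100 * GB)
consumed = VolumeWindow(DAY_SECONDS, min_bytes=300 * GB)

produced.record(timestamp=3600, num_bytes=20 * GB)
consumed.record(timestamp=7200, num_bytes=50 * GB)

now = DAY_SECONDS
if produced.below_floor(now) or consumed.below_floor(now):
    print("station volume is below the policy floor, send a notification")
```

A connected but idle client would pass a ping/pong check and still trip this one, since the check looks at data volume and not at connectivity. A real implementation would also need a warm-up rule so that a newly created station does not alert before its first full window has elapsed.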

yanivbh1 added the Feature Request, good first issue, and 💟 Community involvement labels Sep 13, 2023

yanivbh1 assigned avrhamNeeman and idanasulin2706 Sep 13, 2023
