High CPU on Postgres database when using coldchain monitoring #297

Open
jmbrunskill opened this issue Nov 7, 2024 · 3 comments
Labels
bug Something isn't working Team Piwakawaka

Comments

@jmbrunskill (Collaborator) commented Nov 7, 2024

Describe the bug

Notify ties up a lot of resources on the Postgres server querying for cold chain data.
Querying each sensor individually has a lot of overhead (many queries to process) compared with a process that queried all relevant sensors at once and then did the processing in Notify.

Notify should also probably throttle itself so that it waits a little longer between query attempts.
This could be user configurable, so that if you want more real-time alerts you can poll frequently (e.g. every 10 s), but on a big system you might want to check for alerts only every 5 or 10 minutes.

To Reproduce

Steps to reproduce the behaviour:

  1. Setup cold chain alerts for a significant number of sensors
  2. Check CPU usage for Postgres

Expected behaviour

Additional context

@jmbrunskill jmbrunskill added the bug Something isn't working label Nov 7, 2024
@kahn-msupply commented
Hey team - just a couple of videos showing the CPU spiking, which happens roughly every 10 seconds. The spiking is causing some slowness for mSupply users.

The ideal situation would be to migrate Postgres to a separate server so it does not compete with mSupply for CPU. I understand the CPU spiking is fairly normal, but if possible it would be great if the query could be optimised.

Screen.Recording.2024-11-08.at.9.07.36.AM.mov
Screen.Recording.2024-11-08.at.9.07.01.AM.mov

@kahn-msupply commented
Hi team - is there an update on this issue by chance? It seems the spiking has gotten worse recently and mSupply is grinding to a halt.

Let me know if further details are required from me.

@andreievg (Contributor) commented

I had a look (couldn't resist). This query was taking 2-3 seconds for the sensor with the most logs:

SELECT
    id,
    sensor_id,
    CONCAT(TO_CHAR(date, 'YYYY-MM-DD'), ' ', TO_CHAR(time, 'HH24:MI:SS'))::timestamp AS log_datetime,
    temperature
FROM temperature_log
WHERE sensor_id = '2d3fd6c5-c2a1-4af4-a7c6-9933dd5eaa26'
    AND temperature < 55 -- ignore any obviously bad data, see: https://github.com/msupply-foundation/notify/issues/283
ORDER BY date DESC, time DESC
LIMIT 1

There were no indexes on the temperature_log table, so I added the following (probably overkill):

CREATE INDEX temperature_log_sensor_id ON temperature_log (sensor_id);
CREATE INDEX temperature_log_date ON temperature_log (date);
CREATE INDEX temperature_log_time ON temperature_log (time);
CREATE INDEX temperature_log_all ON temperature_log (sensor_id, date, time);

The query above now takes 80-100 ms, and the server seems to be doing better.
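(Not from the thread, just a standard Postgres technique: to confirm the new indexes are actually being used, the same query can be run under EXPLAIN ANALYZE and the plan inspected.)

```sql
-- Sketch only: run against the same temperature_log table to check the plan.
-- With the composite (sensor_id, date, time) index in place, the plan should
-- show an index scan (e.g. on temperature_log_all) rather than a Seq Scan.
EXPLAIN ANALYZE
SELECT id, sensor_id, temperature
FROM temperature_log
WHERE sensor_id = '2d3fd6c5-c2a1-4af4-a7c6-9933dd5eaa26'
  AND temperature < 55
ORDER BY date DESC, time DESC
LIMIT 1;
```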

I was wondering if we should add those indexes by default to our dashboard? @EthanMcQ-TMF?

Maybe we can drop the priority a little? I do agree it's better to get data for all sensors at once (that would probably need max by date, then max by time within that date, then fetching that log for each sensor).
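The "all sensors at once" approach described above could look something like the following sketch (hypothetical, not from the Notify codebase), using Postgres's DISTINCT ON to pick the latest log per sensor in a single query:

```sql
-- Hypothetical sketch of one batched query replacing the per-sensor loop.
-- DISTINCT ON keeps the first row per sensor_id under the ORDER BY below,
-- i.e. the log with the latest (date, time) for each sensor; Notify would
-- then evaluate alerts over this result set in a single pass.
SELECT DISTINCT ON (sensor_id)
    id,
    sensor_id,
    CONCAT(TO_CHAR(date, 'YYYY-MM-DD'), ' ', TO_CHAR(time, 'HH24:MI:SS'))::timestamp AS log_datetime,
    temperature
FROM temperature_log
WHERE temperature < 55 -- ignore obviously bad data, as in the per-sensor query
ORDER BY sensor_id, date DESC, time DESC;
```

The composite (sensor_id, date, time) index added above should also serve this query well.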
