High CPU on Postgres database when using coldchain monitoring #297

Open
jmbrunskill opened this issue Nov 7, 2024 · 3 comments
Labels
bug Something isn't working Team Piwakawaka

Comments

@jmbrunskill (Collaborator) commented Nov 7, 2024

Describe the bug

Notify ties up a lot of resources on the Postgres server querying for cold chain data.
Querying each sensor individually has a lot of overhead (many queries to process) compared with a process that queried all relevant sensors at once and then did the processing in Notify.

Notify should also probably throttle itself so that it waits a little longer between query attempts.
This could be user configurable, so that if you want more real-time alerts you can poll frequently (e.g. every 10 s), but on a big system you might want to check for alerts only every 5 or 10 minutes.

To Reproduce

Steps to reproduce the behaviour:

  1. Setup cold chain alerts for a significant number of sensors
  2. Check CPU usage for Postgres

Expected behaviour

Additional context

@jmbrunskill jmbrunskill added the bug Something isn't working label Nov 7, 2024
@kahn-msupply commented
Hey team - just a couple of videos showing the CPU spiking, which happens roughly every 10 seconds. The spiking is causing some slowness for mSupply users.

The ideal situation would be to migrate Postgres to a separate server so it does not compete with mSupply for CPU. I understand the CPU spiking is fairly normal, but if possible it would be great if the query could be optimised.

Screen.Recording.2024-11-08.at.9.07.36.AM.mov
Screen.Recording.2024-11-08.at.9.07.01.AM.mov

@kahn-msupply commented
Hi team - is there an update on this issue by chance? It seems the spiking has gotten worse recently and mSupply is grinding to a halt.

Let me know if further details are required from me.

@andreievg (Contributor) commented

I had a look (couldn't resist). This query was taking 2-3 seconds for the sensor with the most logs:

SELECT
    id,
    sensor_id,
    CONCAT(TO_CHAR(date, 'YYYY-MM-DD'), ' ', TO_CHAR(time, 'HH24:MI:SS'))::timestamp AS log_datetime,
    temperature
FROM temperature_log
WHERE sensor_id = '2d3fd6c5-c2a1-4af4-a7c6-9933dd5eaa26'
    AND temperature < 55 -- ignore any obviously bad data, see: https://github.com/msupply-foundation/notify/issues/283
ORDER BY date DESC, time DESC
LIMIT 1

There were no indexes on the temperature_log table, so I added the following (probably overkill):

CREATE INDEX temperature_log_sensor_id ON temperature_log (sensor_id);
CREATE INDEX temperature_log_date ON temperature_log (date);
CREATE INDEX temperature_log_time ON temperature_log (time);
CREATE INDEX temperature_log_all ON temperature_log (sensor_id, date, time);

The query above now takes 80-100 ms, and the server seems to be doing better.
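(Not from the thread, just a standard Postgres technique: to confirm the new indexes are actually being used, the same query can be run under EXPLAIN ANALYZE and the plan inspected.)

```sql
-- Sketch only: run against the same temperature_log table to check the plan.
-- With the composite (sensor_id, date, time) index in place, the plan should
-- show an index scan (e.g. on temperature_log_all) rather than a Seq Scan.
EXPLAIN ANALYZE
SELECT id, sensor_id, temperature
FROM temperature_log
WHERE sensor_id = '2d3fd6c5-c2a1-4af4-a7c6-9933dd5eaa26'
  AND temperature < 55
ORDER BY date DESC, time DESC
LIMIT 1;
```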

I was wondering if we should add those indexes by default to our dashboard? @EthanMcQ-TMF?

Maybe we can drop the priority a little? I do agree it's better to get data for all sensors at once (that would probably need max by date, then max by time within that date, then fetching that log for each sensor).
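The "all sensors at once" approach described above could look something like the following sketch (hypothetical, not from the Notify codebase), using Postgres's DISTINCT ON to pick the latest log per sensor in a single query:

```sql
-- Hypothetical sketch of one batched query replacing the per-sensor loop.
-- DISTINCT ON keeps the first row per sensor_id under the ORDER BY below,
-- i.e. the log with the latest (date, time) for each sensor; Notify would
-- then evaluate alerts over this result set in a single pass.
SELECT DISTINCT ON (sensor_id)
    id,
    sensor_id,
    CONCAT(TO_CHAR(date, 'YYYY-MM-DD'), ' ', TO_CHAR(time, 'HH24:MI:SS'))::timestamp AS log_datetime,
    temperature
FROM temperature_log
WHERE temperature < 55 -- ignore obviously bad data, as in the per-sensor query
ORDER BY sensor_id, date DESC, time DESC;
```

The composite (sensor_id, date, time) index added above should also serve this query well.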
