Add metric timestamps and metric timeout (fixes #64, #67) #69

Open · wants to merge 1 commit into master
Conversation

@tris (Collaborator) commented Jul 23, 2024

Some context -- it appears that EcoFlow is trickling updates at a slower rate than previously, which was invoking the "erase everything" logic whenever 10 seconds (COLLECTING_INTERVAL) would pass between messages.

Now, we expire each metric individually (METRIC_TIMEOUT, default 60 seconds). We also add a timestamp to each metric to expose its staleness.

In order to keep the timestamp accurate, we now process messages immediately rather than in batches (no more COLLECTING_INTERVAL).

Finally, we introduce DEVICE_TIMEOUT (30 seconds), which controls whether a device is considered "online". Each message from MQTT (even unprocessed) resets this timer.
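Conceptually, the new behavior looks something like this minimal sketch (assuming a prometheus_client custom collector; EcoflowCollector and the metric names are illustrative, not this PR's actual code, and incoming names are assumed already sanitized to valid Prometheus identifiers):

import time
from prometheus_client.core import GaugeMetricFamily

METRIC_TIMEOUT = 60   # seconds before an individual metric expires
DEVICE_TIMEOUT = 30   # seconds without any MQTT message => device offline

class EcoflowCollector:
    def __init__(self):
        self.metrics = {}        # metric name -> (value, unix timestamp)
        self.last_message = 0.0  # reset by every MQTT message, even unprocessed ones

    def on_message(self, name, value):
        now = time.time()
        self.last_message = now
        self.metrics[name] = (value, now)   # processed immediately, no batching

    def collect(self):
        now = time.time()
        online = GaugeMetricFamily('ecoflow_online', 'Whether the device is online')
        online.add_metric([], 1.0 if now - self.last_message < DEVICE_TIMEOUT else 0.0)
        yield online
        for name, (value, ts) in list(self.metrics.items()):
            if now - ts > METRIC_TIMEOUT:
                del self.metrics[name]      # expire each metric individually
                continue
            g = GaugeMetricFamily(name, 'EcoFlow metric')
            g.add_metric([], value, timestamp=ts)   # timestamp exposes staleness
            yield g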

@tris (Collaborator, Author) commented Jul 23, 2024

Some further testing may be warranted to determine the appropriate value for METRIC_TIMEOUT. In particular, I notice the update interval is all over the place for this one:

tristan@pts/9.camel/9:08PM% docker compose logs ecoflow_delta2max | grep 'Set mppt.inAmp'
prom-ecoflow_delta2max-1  | 2024-07-23 03:45:38,847 DEBUG   Set mppt.inAmp = 178
prom-ecoflow_delta2max-1  | 2024-07-23 03:46:10,024 DEBUG   Set mppt.inAmp = 230
prom-ecoflow_delta2max-1  | 2024-07-23 03:46:41,251 DEBUG   Set mppt.inAmp = 0
prom-ecoflow_delta2max-1  | 2024-07-23 03:48:40,962 DEBUG   Set mppt.inAmp = 0
prom-ecoflow_delta2max-1  | 2024-07-23 03:50:14,352 DEBUG   Set mppt.inAmp = 148
prom-ecoflow_delta2max-1  | 2024-07-23 03:50:45,580 DEBUG   Set mppt.inAmp = 0
prom-ecoflow_delta2max-1  | 2024-07-23 04:03:41,774 DEBUG   Set mppt.inAmp = 0

(But this is at least an improvement -- most of the important panels, such as battery SoC, work OK.)
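For anyone who wants to measure this themselves, here is a rough sketch that reads log lines like the above from stdin and prints the spread of update intervals (the timestamp format is assumed to match the docker compose output):

import re
import sys
from datetime import datetime

# Pipe in: docker compose logs ecoflow_delta2max | grep 'Set mppt.inAmp'
times = []
for line in sys.stdin:
    m = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}', line)
    if m:
        times.append(datetime.strptime(m.group(0), '%Y-%m-%d %H:%M:%S,%f'))

gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
if gaps:
    print(f'{len(gaps)} intervals, min {min(gaps):.0f}s, max {max(gaps):.0f}s')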

@tris (Collaborator, Author) commented Jul 24, 2024

Further observation reveals:

  • Maximum time between metric updates is 15 minutes
  • Setting METRIC_TIMEOUT=900 makes no difference to the charts (there are still gaps)
  • Adding --query.lookback-delta=15m to the Prometheus commandline args does connect the dots (at some detriment to the rest of the system)

It's not a pretty solution. I hope someone finds #70 and can shed some light on how to fix this properly.
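For reference, the flag just goes on the Prometheus command line; a minimal compose sketch, with the service name and config path assumed:

prometheus:
  image: prom/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--query.lookback-delta=15m'   # default is 5m; 15m bridges the update gaps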

EcoFlow does have an official API now (which includes MQTT!), so maybe that'll be a viable direction. (I requested a developer account.) There's a fairly active Facebook group for said API, even including one EcoFlow employee... fingers crossed that they're actually listening to what we need.

@aauren (Collaborator) commented Jul 25, 2024

Wow, you've really gone in depth here @tris!

Thanks for all of the work you put into understanding this. If I'm reading everything correctly, it seems like even this PR may not fix all of the issues you've seen with message slowness. Is that accurate?

If so, it seems like the official API might be the best path forward, as from #70 it doesn't seem like we can forge the right identity to receive messages the way the official apps do. Does that seem accurate as well?

Also, if we go the official API route, does that mean that each user of the exporter will need to request private developer API credentials in order to run the exporter?

@tris (Collaborator, Author) commented Jul 28, 2024

If I'm reading everything correctly, it seems like even this PR may not fix all of the issues you've seen with message slowness. Is that accurate?

Yeah, that's accurate. The only real fix here is that we no longer wipe metrics possibly before they even get scraped, so there's a slightly better chance of the metric actually making it to Prometheus (though lowering the scrape_interval would alleviate that -- and/or raising the COLLECTING_INTERVAL).

The other "fix" is more a nice-to-have than anything -- putting timestamps on the metrics prevents storing stale or duplicate data in Prometheus, and ensures the stats you're looking at are from the correct time. (I feel this becomes a bit more important in this case where we're only getting updates every 15 minutes.)
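For anyone unfamiliar: in the Prometheus text exposition format, the timestamp is an optional trailing field in milliseconds since the Unix epoch, so a timestamped sample looks roughly like the line below (the metric name is illustrative):

# the trailing integer is the sample time in milliseconds since the epoch
ecoflow_mppt_in_amp 178 1721706338847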

If so, it seems like the official API might be the best path forward, as from #70 it doesn't seem like we can forge the right identity to receive messages the way the official apps do. Does that seem accurate as well?

Yep, unless someone figures out the signature -- maybe over in tolwi/hassio-ecoflow-cloud or v1ckxy/ecoflow-withoutflow...

Also, if we go the official API route, does that mean that each user of the exporter will need to request private developer API credentials in order to run the exporter?

Yep; this is what I see at https://developer.ecoflow.com/us/verify:

Under review
After becoming a developer, you will be able to customize functions, view, operate, and monitor your own equipment. The staff will complete the review within 5 working days

...so, not like some other developer platforms which issue one token to identify the app/developer while using a separate mechanism to authenticate the user -- they're catering to the hobbyist use case only. (But then, why the manual review for hobbyist accounts?)

@bass63 commented Jul 29, 2024

I’m sorry, I’m just a docker image user, how do I implement this fix right now? Can someone help?

@tris (Collaborator, Author) commented Jul 29, 2024

I’m sorry, I’m just a docker image user, how do I implement this fix right now? Can someone help?

Easiest for right now is probably:

git clone https://github.com/tris/ecoflow_exporter
cd ecoflow_exporter
docker build -t ecoflow_exporter .

and then use that local image (ecoflow_exporter) instead of the one from ghcr.io. But just as important is adding --query.lookback-delta=15m to your Prometheus args.

@bass63 commented Aug 4, 2024

Easiest for right now is probably: … use that local image (ecoflow_exporter) instead of the one from ghcr.io. But just as important is adding --query.lookback-delta=15m to your Prometheus args.

Thanks a lot! I did all this, and it worked perfectly for a week. But now the gaps are back again. Maybe the timeouts should be extended further? By the way, when will this be committed to the main image?

@michikrug commented Aug 19, 2024

Just FYI:
I completely rewrote the MQTT part to use the official HTTP API from EcoFlow to get all the data (only for the PowerStream, as that is the only device I own) ... and refactored the rest.
Of course, you will need a developer account at EcoFlow to get the keys, but other than that this solution has been working nicely for at least three weeks.
In addition, I now also get proper temperature values for the PowerStream. :)
P.S. I also use metric-based timeouts to avoid writing too much data that is not changing.

https://github.com/michikrug/ecoflow_exporter/blob/api/ecoflow_exporter.py
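The metric-based timeout idea -- only republishing a sample when its value changes or a refresh interval elapses -- could be sketched like this (a minimal illustration; DedupWriter and REFRESH_INTERVAL are made-up names, not the actual code in that file):

import time

REFRESH_INTERVAL = 300   # re-emit an unchanged value at most every 5 minutes

class DedupWriter:
    def __init__(self, publish):
        self.publish = publish   # callable taking (name, value)
        self.last = {}           # name -> (value, last emit time)

    def write(self, name, value):
        now = time.time()
        prev = self.last.get(name)
        if prev and prev[0] == value and now - prev[1] < REFRESH_INTERVAL:
            return               # unchanged and still fresh: skip the write
        self.last[name] = (value, now)
        self.publish(name, value)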



@dmitry-semenov

I rewrote the exporter to use public API authentication and MQTT. It has worked well for me over the past two days.

I created a fork for this so users with public API access can use it immediately. If you want to migrate from username/password authentication, I'd be happy for you to adopt my changes in the original repository.

https://github.com/dmitry-semenov/ecoflow_exporter/blob/master/ecoflow_exporter.py

@AndriiMalyna commented Oct 19, 2024

Easiest for right now is probably: … use that local image (ecoflow_exporter) instead of the one from ghcr.io. But just as important is adding --query.lookback-delta=15m to your Prometheus args.

Hello @tris. First of all, thank you for your work; please let me know if I'm doing something wrong.
I followed your instructions on building the image and ran it, then added the specified command to compose.yaml:

container_name: prometheus
command:
  - '--config.file=/etc/prometheus/prometheus.yml'
  - '--query.lookback-delta=15m'

But I receive an error from the ecoflow_exporter container if the EcoFlow Android application is closed.
I also noticed that if you do not close the EcoFlow application on the phone, but simply minimize it, everything works, at least until the phone unloads the application from RAM.

Successfully merging this pull request may close these issues:

  • Stops receiving metrics from queue until open Ecoflow App
  • Device offline half of the time, nothing changed