Add metric timestamps and metric timeout (fixes #64, #67) #69

Open · wants to merge 1 commit into master
Conversation

@tris (Collaborator) commented Jul 23, 2024

Some context -- it appears that EcoFlow is trickling updates at a slower rate than previously, which was invoking the "erase everything" logic whenever 10 seconds (COLLECTING_INTERVAL) would pass between messages.

Now, we expire each metric individually (METRIC_TIMEOUT, default 60 seconds). We also add a timestamp to each metric to expose its staleness.

In order to keep the timestamp accurate, we now process messages immediately rather than in batches (no more COLLECTING_INTERVAL).

Finally, we introduce DEVICE_TIMEOUT (30 seconds), which controls whether a device is considered "online". Each message from MQTT (even unprocessed) resets this timer.
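Conceptually, the new behavior looks something like this minimal sketch (assuming a prometheus_client custom collector; EcoflowCollector and the metric names are illustrative, not this PR's actual code, and incoming names are assumed already sanitized to valid Prometheus identifiers):

import time
from prometheus_client.core import GaugeMetricFamily

METRIC_TIMEOUT = 60   # seconds before an individual metric expires
DEVICE_TIMEOUT = 30   # seconds without any MQTT message => device offline

class EcoflowCollector:
    def __init__(self):
        self.metrics = {}        # metric name -> (value, unix timestamp)
        self.last_message = 0.0  # reset by every MQTT message, even unprocessed ones

    def on_message(self, name, value):
        now = time.time()
        self.last_message = now
        self.metrics[name] = (value, now)   # processed immediately, no batching

    def collect(self):
        now = time.time()
        online = GaugeMetricFamily('ecoflow_online', 'Whether the device is online')
        online.add_metric([], 1.0 if now - self.last_message < DEVICE_TIMEOUT else 0.0)
        yield online
        for name, (value, ts) in list(self.metrics.items()):
            if now - ts > METRIC_TIMEOUT:
                del self.metrics[name]      # expire each metric individually
                continue
            g = GaugeMetricFamily(name, 'EcoFlow metric')
            g.add_metric([], value, timestamp=ts)   # timestamp exposes staleness
            yield g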

@tris (Collaborator, Author) commented Jul 23, 2024

Some further testing may be warranted to determine the appropriate value for METRIC_TIMEOUT. In particular, I notice the update interval is all over the place for this one:

tristan@pts/9.camel/9:08PM% docker compose logs ecoflow_delta2max | grep 'Set mppt.inAmp'
prom-ecoflow_delta2max-1  | 2024-07-23 03:45:38,847 DEBUG   Set mppt.inAmp = 178
prom-ecoflow_delta2max-1  | 2024-07-23 03:46:10,024 DEBUG   Set mppt.inAmp = 230
prom-ecoflow_delta2max-1  | 2024-07-23 03:46:41,251 DEBUG   Set mppt.inAmp = 0
prom-ecoflow_delta2max-1  | 2024-07-23 03:48:40,962 DEBUG   Set mppt.inAmp = 0
prom-ecoflow_delta2max-1  | 2024-07-23 03:50:14,352 DEBUG   Set mppt.inAmp = 148
prom-ecoflow_delta2max-1  | 2024-07-23 03:50:45,580 DEBUG   Set mppt.inAmp = 0
prom-ecoflow_delta2max-1  | 2024-07-23 04:03:41,774 DEBUG   Set mppt.inAmp = 0

(But this is at least an improvement -- most of the important panels, such as battery SoC, work OK.)
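For anyone who wants to measure this themselves, here is a rough sketch that reads log lines like the above from stdin and prints the spread of update intervals (the timestamp format is assumed to match the docker compose output):

import re
import sys
from datetime import datetime

# Pipe in: docker compose logs ecoflow_delta2max | grep 'Set mppt.inAmp'
times = []
for line in sys.stdin:
    m = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}', line)
    if m:
        times.append(datetime.strptime(m.group(0), '%Y-%m-%d %H:%M:%S,%f'))

gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
if gaps:
    print(f'{len(gaps)} intervals, min {min(gaps):.0f}s, max {max(gaps):.0f}s')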

@tris (Collaborator, Author) commented Jul 24, 2024

Further observation reveals:

  • Maximum time between metric updates is 15 minutes
  • Setting METRIC_TIMEOUT=900 makes no difference to the charts (there are still gaps)
  • Adding --query.lookback-delta=15m to the Prometheus commandline args does connect the dots (at some detriment to the rest of the system)

It's not a pretty solution. I hope someone finds #70 and can shed some light on how to fix this properly.
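For reference, the flag just goes on the Prometheus command line; a minimal compose sketch, with the service name and config path assumed:

prometheus:
  image: prom/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--query.lookback-delta=15m'   # default is 5m; 15m bridges the update gaps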

EcoFlow does have an official API now (which includes MQTT!), so maybe that'll be a viable direction. (I requested a developer account.) There's a fairly active Facebook group for said API, even including one EcoFlow employee... fingers crossed that they're actually listening to what we need.

@aauren (Collaborator) commented Jul 25, 2024

Wow, you've really gone in depth here @tris!

Thanks for all of the work you put into understanding this. If I'm reading everything correctly, it seems like even this PR may not fix all of the issues you've seen with message slowness. Is that accurate?

If so, it seems like the official API might be the best path forward, as from #70 it doesn't seem like we can forge the right identity to receive messages the way the official apps do. Does that seem accurate as well?

Also, if we go the official API route, does that mean that each user of the exporter will need to request private developer API credentials in order to run the exporter?

@tris (Collaborator, Author) commented Jul 28, 2024

If I'm reading everything correctly, it seems like even this PR may not fix all of the issues you've seen with message slowness. Is that accurate?

Yeah, that's accurate. The only real fix here is that we no longer wipe metrics possibly before they even get scraped, so there's a slightly better chance of the metric actually making it to Prometheus (though lowering the scrape_interval would alleviate that -- and/or raising the COLLECTING_INTERVAL).

The other "fix" is more a nice-to-have than anything -- putting timestamps on the metrics prevents storing stale or duplicate data in Prometheus, and ensures the stats you're looking at are from the correct time. (I feel this becomes a bit more important in this case where we're only getting updates every 15 minutes.)
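For anyone unfamiliar: in the Prometheus text exposition format, the timestamp is an optional trailing field in milliseconds since the Unix epoch, so a timestamped sample looks roughly like the line below (the metric name is illustrative):

# the trailing integer is the sample time in milliseconds since the epoch
ecoflow_mppt_in_amp 178 1721706338847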

If so, it seems like the official API might be the best path forward, as from #70 it doesn't seem like we can forge the right identity to receive messages the way the official apps do. Does that seem accurate as well?

Yep, unless someone figures out the signature -- maybe over in tolwi/hassio-ecoflow-cloud or v1ckxy/ecoflow-withoutflow...

Also, if we go the official API route, does that mean that each user of the exporter will need to request private developer API credentials in order to run the exporter?

Yep; this is what I see at https://developer.ecoflow.com/us/verify:

Under review
After becoming a developer, you will be able to customize functions, view, operate, and monitor your own equipment. The staff will complete the review within 5 working days

...so, not like some other developer platforms which issue one token to identify the app/developer while using a separate mechanism to authenticate the user -- they're catering to the hobbyist use case only. (But then, why the manual review for hobbyist accounts?)

@bass63 commented Jul 29, 2024

I’m sorry, I’m just a docker image user, how do I implement this fix right now? Can someone help?

@tris (Collaborator, Author) commented Jul 29, 2024

I’m sorry, I’m just a docker image user, how do I implement this fix right now? Can someone help?

Easiest for right now is probably:

git clone https://github.com/tris/ecoflow_exporter
cd ecoflow_exporter
docker build -t ecoflow_exporter .

and then use that local image (ecoflow_exporter) instead of the one from ghcr.io. But just as important is adding --query.lookback-delta=15m to your Prometheus args.

@bass63 commented Aug 4, 2024

Easiest for right now is probably: … use that local image (ecoflow_exporter) instead of the one from ghcr.io. But just as important is adding --query.lookback-delta=15m to your Prometheus args.

Thanks a lot! I did all this, and it worked perfectly for a week. But now the gaps are back again. Maybe the timeouts should be extended further? By the way, when will this be committed to the main image?

@michikrug commented Aug 19, 2024

Just FYI:
I completely rewrote the MQTT part to use the official HTTP API from EcoFlow to get all the data (only for the PowerStream, as that is the only device I own) ... and refactored the rest.
Of course, you will need a developer account at EcoFlow to get the keys, but other than that this solution has been working nicely for at least three weeks.
In addition, I now also get proper temperature values for the PowerStream. :)
P.S. I also use metric-based timeouts to avoid writing too much data that is not changing.

https://github.com/michikrug/ecoflow_exporter/blob/api/ecoflow_exporter.py
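The metric-based timeout idea -- only republishing a sample when its value changes or a refresh interval elapses -- could be sketched like this (a minimal illustration; DedupWriter and REFRESH_INTERVAL are made-up names, not the actual code in that file):

import time

REFRESH_INTERVAL = 300   # re-emit an unchanged value at most every 5 minutes

class DedupWriter:
    def __init__(self, publish):
        self.publish = publish   # callable taking (name, value)
        self.last = {}           # name -> (value, last emit time)

    def write(self, name, value):
        now = time.time()
        prev = self.last.get(name)
        if prev and prev[0] == value and now - prev[1] < REFRESH_INTERVAL:
            return               # unchanged and still fresh: skip the write
        self.last[name] = (value, now)
        self.publish(name, value)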



@dmitry-semenov

I rewrote the exporter to use public API authentication and MQTT. It has worked well for me over the past two days.

I created a fork for this so users with public API access can use it immediately. If you want to migrate from username/password authentication, I'd be happy for you to adopt my changes in the original repository.

https://github.com/dmitry-semenov/ecoflow_exporter/blob/master/ecoflow_exporter.py

@AndriiMalyna commented Oct 19, 2024

Easiest for right now is probably: … use that local image (ecoflow_exporter) instead of the one from ghcr.io. But just as important is adding --query.lookback-delta=15m to your Prometheus args.

Hello @tris. First of all, thank you for your work; please let me know if I'm doing something wrong.
I followed your instructions on building the image and ran it, then added the specified command to compose.yaml:

container_name: prometheus
command:
  - '--config.file=/etc/prometheus/prometheus.yml'
  - '--query.lookback-delta=15m'

But I receive an error from the ecoflow_exporter container if the EcoFlow Android application is closed.
I also noticed that if you do not close the EcoFlow application on the phone, but simply minimize it, everything works, at least until the phone unloads the application from RAM.

Successfully merging this pull request may close these issues:

  • Stops receiving metrics from queue until open Ecoflow App
  • Device offline half of the time, nothing changed