Track replication lag in otel metrics and log warning when it gets too high #2031
@kevin-dp and I have been discussing the best way to measure the "replication lag". There are multiple options, e.g. tracking the number of WAL bytes Electric has not yet processed, or the time between a transaction's commit in Postgres and Electric writing it to the shape log.
Regarding the warning, can that be set up in Honeycomb? Or do we want a configuration option for Electric to specify when it needs to log? I imagine the threshold will be different for different clients based on their database write rate and other parameters.
Why not do both? I think it is useful to know both the number of bytes pending in the WAL and the latency for Electric to write a transaction into the log.
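For reference, the bytes-pending figure maps onto Postgres's own replication-slot bookkeeping. A minimal sketch in Python using psycopg2; the slot name and connection string are illustrative placeholders, not necessarily what Electric uses:

```python
import psycopg2

# Hypothetical slot name for illustration; substitute the slot Electric actually creates.
SLOT_NAME = "electric_slot_default"

LAG_BYTES_SQL = """
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name = %s;
"""

def replication_lag_bytes(conn) -> int:
    """WAL bytes written by Postgres but not yet confirmed on the replication slot."""
    with conn.cursor() as cur:
        cur.execute(LAG_BYTES_SQL, (SLOT_NAME,))
        row = cur.fetchone()
        return int(row[0]) if row and row[0] is not None else 0

if __name__ == "__main__":
    conn = psycopg2.connect("postgresql://postgres:password@localhost:5432/postgres")
    print(f"replication lag: {replication_lag_bytes(conn)} bytes")
```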
Yeah, both sound great. I didn't know we could get the actual time diff, so I didn't mention it, but that would be great to have as well. Warnings from your otel collector are a lot more flexible: people can already set warnings on Postgres data directly. We can potentially add warnings directly in the Electric logs as well, but a good sequencing of work is to first gather the data and then decide exactly how to communicate it.
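To make the "gather the data first, decide how to alert later" sequencing concrete, here is a hedged sketch of exporting the lag as an OpenTelemetry observable gauge with an optional in-process warning, using the OTel Python SDK. The metric name, threshold, and `replication_lag_bytes` stub are illustrative (not what Electric ships), and it assumes a MeterProvider with a Prometheus/OTLP exporter is configured elsewhere:

```python
import logging
from opentelemetry.metrics import CallbackOptions, Observation, get_meter

logger = logging.getLogger("electric.replication")

# Illustrative threshold; in practice this would be configurable, since a sensible
# value depends on each deployment's database write rate.
LAG_WARN_BYTES = 50 * 1024 * 1024  # 50 MiB

def replication_lag_bytes() -> int:
    """Placeholder: return the WAL-bytes-pending figure (see the previous sketch)."""
    return 0

def _observe_lag(options: CallbackOptions):
    # Called by the SDK on every metric collection cycle.
    lag = replication_lag_bytes()
    if lag > LAG_WARN_BYTES:
        logger.warning("replication lag is %d bytes (threshold %d)", lag, LAG_WARN_BYTES)
    yield Observation(lag)

meter = get_meter("electric.replication")
meter.create_observable_gauge(
    "electric.postgres.replication_lag",  # illustrative metric name
    callbacks=[_observe_lag],
    unit="By",
    description="WAL bytes not yet confirmed by Electric's replication slot",
)
```

With the gauge exported, the same threshold logic could instead live entirely in Honeycomb or another collector-side alerting rule, which keeps Electric itself free of alerting policy.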
Fixes #2031.

- exports the replication lag in bytes as a metric to Prometheus
- also creates a span including the replication lag in milliseconds for every transaction

### Note on clock drift

The replication lag in milliseconds may be affected by clock drift between Electric and Postgres. This can occur because Electric and Postgres may run on different machines, and we compare the transaction's commit timestamp (generated by Postgres) with Electric's timestamp at the time the transaction is written to the shape log.
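A rough sketch of the per-transaction timing described above, using the OpenTelemetry tracing API in Python; Electric itself is not written in Python, so the span and attribute names are purely illustrative. The clock-drift caveat shows up directly in the code: `commit_ts` comes from Postgres's clock, while `now` comes from Electric's.

```python
from datetime import datetime, timezone
from opentelemetry import trace

tracer = trace.get_tracer("electric.replication")

def record_transaction(commit_ts: datetime) -> None:
    """Write a transaction to the shape log and record its replication lag in ms.

    commit_ts is the commit timestamp Postgres reports for the transaction (UTC).
    Because it is generated by Postgres's clock and `now` by Electric's, any drift
    between the two machines leaks into the measurement.
    """
    with tracer.start_as_current_span("shape_log.write_transaction") as span:
        # ... append the transaction to the shape log here ...
        now = datetime.now(timezone.utc)
        lag_ms = (now - commit_ts).total_seconds() * 1000.0
        span.set_attribute("replication.lag_ms", lag_ms)  # illustrative attribute name
```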
We really need visibility into this.