Duplicate events coming from lambda workers #140

Open
cursedquail opened this issue Mar 1, 2024 · 1 comment
Labels
type: bug Something isn't working

Comments

@cursedquail
Contributor

Versions

  • Lambda runtime: al2023
  • Lambda extension: 11.2.3, arm64

Steps to reproduce

I have no steps to reproduce, only solid evidence that this is happening.

image

Invoke is the top-level span name for a lambda invocation - while there may be many of them for a given trace, there should only be one for a given span :). If I zoom into any of these, I can see that they appear to be exactly the same event - same timestamp, columns, data, etc.

Our lambda pipeline should be pretty standard. We're:

  • configuring beeline to write events to stdout
  • running the lambda extension, which points directly at the honeycomb api

Additional context

I suspect that what's happening here is the following:

  • logs api yeets some logs at the extension process
  • extension process receives them, turns them into events, and gives them to libhoney
  • libhoney queues the events up to be sent to honeycomb, and makes the http request to the api
  • because libhoney is acting asynchronously to the logs http server, the logs http server returns
  • lambda freezes the extension while the events are in flight
  • lambda wakes up the extension some time later
  • from the extension's POV, the upstream API just timed out (it has no way of knowing that it was frozen). So it retries!
  • Some of those retries work, and we get duplicate events.
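
To make the suspected sequence concrete, here is a minimal simulation of it. Everything in it is hypothetical: `FakeClock`, `FakeApi`, `send_with_retry`, the 10 second timeout and the 300 second freeze are stand-ins for illustration, not the extension's or libhoney's actual code or settings. It only shows how a wall-clock timeout plus a freeze can turn one accepted event into two.

```python
from dataclasses import dataclass, field


@dataclass
class FakeClock:
    """Wall clock that keeps advancing while the process is frozen."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class FakeApi:
    """Stand-in for the upstream events API; records every accepted event."""
    received: list = field(default_factory=list)

    def accept(self, event: dict) -> None:
        self.received.append(event)


def send_with_retry(event, api, clock, timeout=10.0, frozen_for=0.0, max_attempts=2):
    """Send one event and retry if the wall-clock deadline has passed.

    `frozen_for` models the sandbox being frozen after the request reached
    the API but before the sender observed the response (assumed behavior).
    Returns the number of attempts made.
    """
    for attempt in range(max_attempts):
        started = clock.now
        api.accept(event)              # the API stores the event
        clock.advance(0.05)            # ordinary round-trip time
        if attempt == 0:
            clock.advance(frozen_for)  # frozen while the response is in flight
        if clock.now - started <= timeout:
            return attempt + 1         # response observed within the deadline
        # From the sender's side this looks like a timeout, so it retries.
    return max_attempts


clock = FakeClock()
api = FakeApi()
event = {"name": "Invoke", "span_id": "abc123"}
attempts = send_with_retry(event, api, clock, frozen_for=300.0)
print(attempts, len(api.received))  # 2 2
```

With `frozen_for=0.0` the same call makes one attempt and the API sees one event, which matches the "only sometimes" nature of the duplicates.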

If I'm right, then I think what's needed is to synchronously flush the events after each "batch", thus ensuring libhoney does its work while the logs server is processing the event.
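
A rough sketch of the flush-per-batch idea, under the assumption that the sender exposes some blocking flush. `BufferedSender` and `handle_log_batch` are hypothetical names made up for this example; they are not libhoney's API or the extension's handler. The point is only the ordering: the handler does not return its response until the queue is empty.

```python
import queue


class BufferedSender:
    """Hypothetical sender: enqueue now, transmit when flushed."""

    def __init__(self, transmit):
        self._pending = queue.Queue()
        self._transmit = transmit  # callable that delivers one event

    def enqueue(self, event):
        self._pending.put(event)

    def flush(self):
        """Block until every queued event has been handed to `transmit`."""
        sent = 0
        while True:
            try:
                event = self._pending.get_nowait()
            except queue.Empty:
                return sent
            self._transmit(event)
            sent += 1

    def pending(self):
        return self._pending.qsize()


def handle_log_batch(batch, sender):
    """Handle one batch from the logs API and flush before responding."""
    for record in batch:
        sender.enqueue({"data": record})
    sender.flush()  # nothing is left in flight once we return
    return 200      # status returned to the logs API


delivered = []
sender = BufferedSender(delivered.append)
status = handle_log_batch(["a", "b", "c"], sender)
print(status, len(delivered), sender.pending())  # 200 3 0
```

Because the response is only returned after the flush, a freeze that follows the response would find no requests outstanding to time out and retry.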

@cursedquail cursedquail added the type: bug Something isn't working label Mar 1, 2024
@robbkidd robbkidd added the status: oncall Flagged for awareness from Honeycomb Telemetry Oncall label Mar 1, 2024
@JamieDanielson
Contributor

ℹ️ FYI for oncall folks: there's a bit of additional context in the internal Slack channel.

@MikeGoldsmith MikeGoldsmith removed the status: oncall Flagged for awareness from Honeycomb Telemetry Oncall label Mar 27, 2024