Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEVPROD-4511 Remove db logging from honeycomb #7940

Closed
wants to merge 2 commits into from

Conversation

bynn
Copy link
Contributor

@bynn bynn commented May 30, 2024

DEVPROD-4511

Description

don't log db statements in honeycomb because it could leak secrets

What removing WithCommandAttributeDisabled() does is set our monitor to the default behavior of no longer adding MongoDB commands as attributes

According to @ybrill , that was the only reason we use a copy of the open telemetry library in anser and we could go back to using the original if we don't plan on using this feature again. I don't believe that is necessary at the moment so no follow up tickets have been made. Similarly, no ticket has been created to re-enable this span after sanitizing our commands because it may not be necessary.

Testing

no longer logs db.statement in staging
image

Copy link
Contributor

@ablack12 ablack12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about honeycomb; would you mind putting some words about how this prevents logging db.statement? (Also, I think a comment would be good, so we don't later think this change was just an oversight)

@bynn bynn requested a review from ablack12 May 31, 2024 23:21
@bynn
Copy link
Contributor Author

bynn commented May 31, 2024

Added some comments and updated the description

@khelif96
Copy link
Contributor

khelif96 commented Jun 1, 2024

Sorry for the drive by review. But is it possible to reconsider this or keep it enabled for traces stemming from a graphql query? Having the db statement is invaluable for debugging performance issues or debugging problematic queries that can't be easily replicated. I was under the impression that refinery instance already redacts secrets from db queries? Cc @ybrill

@ybrill
Copy link
Contributor

ybrill commented Jun 3, 2024

The application omits db.statement for certain collections and then refinery does some rudimentary scrubbing.

@@ -368,7 +368,8 @@ func (e *envState) initDB(ctx context.Context, settings DBSettings, tracer trace
opts := options.Client().ApplyURI(settings.Url).SetWriteConcern(settings.WriteConcernSettings.Resolve()).
SetReadConcern(settings.ReadConcernSettings.Resolve()).
SetConnectTimeout(5 * time.Second).
SetMonitor(apm.NewMonitor(apm.WithCommandAttributeDisabled(false), apm.WithCommandAttributeTransformer(redactSensitiveCollections)))
// Removing WithCommandAttributeDisabled(false) to disable db command monitoring.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should be a bit more general -- this is a bit more info about what happened in the PR, but in the future someone will look at this comment and say, "huh"? Perhaps "Command monitoring not disabled to avoid logging redacted secrets" or etc.

@ybrill do you have any suggestions for how to do what Mohamed is proposing, or how to securely log db.statement? We may need some advice on this, since this relates to honeycomb setup.

Copy link
Contributor

@ybrill ybrill Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I once thought a tenable middle ground could be to have the logging parse the db.statement as json and strip the leaf values (like the strings and whatever that don't have children). I never attempted to implement it, though.

Keeping it for just graphql traces could also work, though that would have to be in the collector, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if db.statement without any values would still be useful though?
I'm not super sure on what you mean by "would have to be in the collector" could I ask you to explain that part

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true: the query plan could be different with different values.

Sorry, for not being clear in the first place 😅. In the application, when we're making the span for the database call the only thing we know about our lineage is id of our parent and the overall trace id. So we don't have a way to know whether the trace we're part of is a graphql one. On the other hand, if we wanted to drop the attribute once the span arrives at the collector we could have the application add a flag attribute to all the child spans of a graphql request and when we see a db span with that flag then we'll know to keep the db.statement.

@bynn
Copy link
Contributor Author

bynn commented Jun 4, 2024

Upon discussing this ticket further with @ybrill and seeing that @khelif96 both have recently used this span, I decided to close this ticket as won't do. The data sanitization work done previously seems to mitigate most security risks.

@ablack12 let me know if you have any opinions

@bynn bynn closed this Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants