-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace non-utf8 characters in bodies #228
Changes from 1 commit
fd65cf0
b5d0a5b
29a0e55
104cd9d
5999ec5
bba5bc1
a4838c5
37c9426
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,10 +3,13 @@ package bodyattribute // import "github.com/hypertrace/goagent/sdk/instrumentati | |
import ( | ||
"encoding/base64" | ||
"fmt" | ||
|
||
"github.com/hypertrace/goagent/sdk" | ||
"strings" | ||
"unicode/utf8" | ||
) | ||
|
||
const utf8Replacement = "�" | ||
|
||
// SetTruncatedBodyAttribute truncates the body and sets the body as a span attribute. | ||
// When body is being truncated, we also add a second attribute suffixed by `.truncated` to | ||
// make it clear to the user, body has been modified. | ||
|
@@ -48,7 +51,12 @@ func SetBodyAttribute(attrName string, body []byte, truncated bool, span sdk.Spa | |
return | ||
} | ||
|
||
span.SetAttribute(attrName, string(body)) | ||
bodyStr := string(body) | ||
if !utf8.ValidString(bodyStr) { | ||
bodyStr = strings.ToValidUTF8(bodyStr, utf8Replacement) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a string conversion from bytes and then a utf8 replace. Might become an expensive one. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll check perf There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Validation should atleast use the right method without the conversion. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, updating to use utf8.Valid method shows similar perf. (unfortunately not good in either case)
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried this, the non-conversion one is still faster if we just look at valid utf's case. I think due the number of replacements needed in case of invalid one's, we do not see any advantage. This is on arm though.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm thinking about this now, I don't like the perf implication that we need to do the validation at the cost of few cases where invalid bytes are there. @prodion23 let's just try to fix this at the source in case we somehow corrupted it ourselves. Truncation is one part and seems like otel SDK has a way to do UTF8 aware truncation now: open-telemetry/opentelemetry-go#3156. Did you also observe that for valid UTF-8 chars (with foreign or multibyte chars) we're also seeing this error like @tim-mwangi says? |
||
} | ||
|
||
span.SetAttribute(attrName, bodyStr) | ||
// if already truncated then set attribute | ||
if truncated { | ||
span.SetAttribute(fmt.Sprintf("%s.truncated", attrName), true) | ||
|
@@ -63,7 +71,12 @@ func SetEncodedBodyAttribute(attrName string, body []byte, truncated bool, span | |
return | ||
} | ||
|
||
span.SetAttribute(attrName+".base64", base64.RawStdEncoding.EncodeToString(body)) | ||
bodyStr := string(body) | ||
if !utf8.ValidString(bodyStr) { | ||
bodyStr = strings.ToValidUTF8(bodyStr, utf8Replacement) | ||
} | ||
|
||
span.SetAttribute(attrName+".base64", base64.RawStdEncoding.EncodeToString([]byte(bodyStr))) | ||
// if already truncated then set attribute | ||
if truncated { | ||
span.SetAttribute(fmt.Sprintf("%s.truncated", attrName), true) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I am not sure if substituting this sequence of bytes is sufficient to fix these errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it should resolve any errors from invalid bytes being in the request or response body from the protobuf perspective? (at least I see this sample export successfully now)
Or are you thinking there are other cases where the replace wouldn't actually replace all invalid byte sequence?
If they appear in headers we'd still run into exception
(but not sure how likely that is for us to be concerned of right now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand what causes this invalid byte sequence.