Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds custom marshal/unmarshal to Go Statement struct, fixes #363, see sigstore/sigstore-go#326 #403

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

gillisandrew
Copy link

@gillisandrew gillisandrew commented Nov 1, 2024

This PR adds custom marshaler and unmarshaler to the statement struct to help users avoid incorrectly serializing in-toto statements. This fixes #363 and stems from a discussion in sigstore/sigstore-go#326

@gillisandrew gillisandrew requested a review from a team as a code owner November 1, 2024 15:04
Signed-off-by: Andrew Gillis <[email protected]>
@codysoyland
Copy link

This gets a +1 from me!

@adityasaky
Copy link
Member

cc @pxp928 @TomHennen @marcelamelara

Copy link
Contributor

@marcelamelara marcelamelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending this @gillisandrew . It seems like the need for these custom functions stems from the use of the soon-to-be deprecated in-toto-golang data structures (hence why #363 was closed).

Are these added functions needed for backward compatibility with pre-SLSA v1 Provenance attestations?

@codysoyland
Copy link

I think #363 was closed erroneously, as the issue here is mostly unrelated to the in-toto-golang data structures. We encountered the same bug in sigstore-go after migrating to the data structures in this repo. The issue is specifically about how the new data structures serialize to JSON. When using the new data structures, when you pass them to json.Marshal, the output has field names that are not consistent with the in-toto specification (e.g. predicate_type instead of predicateType). When using protobuf types, the default serialization always assumes snake_case instead of camelCase. This PR works around that fact by explicitly using protojson for marshal/unmarshal, which produces the correct JSON format that is consistent with all other tooling that works with in-toto.

@gillisandrew
Copy link
Author

@marcelamelara as cody noted, the structs in this repo are the issue, the field names emitted in the json struct tags are incorrect.

There's some background and alternative solutions in golang/protobuf#52

@SantiagoTorres
Copy link
Member

ugh, CI complains about golang 1.21, but I can confirm that wfm locally...

@adityasaky
Copy link
Member

we should update the run-go-tests.yml to use a newer version of the toolchain.

@codysoyland
Copy link

we should update the run-go-tests.yml to use a newer version of the toolchain.

That can be updated to always use what's in go.mod using the go-version-file input.

@marcelamelara
Copy link
Contributor

@codysoyland and @gillisandrew thanks for the clarifications.

I'm fine with the tests you've added, thank you! But my preference would be to find an alternative way to emit structs with the right json tags from the start, rather than adding custom marshalling/unmarshalling functions because these will be needed in more places than statement.go.

@marcelamelara
Copy link
Contributor

Does this seem like a reasonable alternative? https://github.com/favadi/protoc-go-inject-tag

@codysoyland
Copy link

This has unfortunately been an issue with protoc-gen-go for years (see golang/protobuf#52).

Looking at other workarounds... maybe this package could help?

@marcelamelara
Copy link
Contributor

Looking at other workarounds... maybe this package could help?

Thanks for the suggestion @codysoyland . My only concern with this package is their "we don't accept issues with bug reports or PRs with bug fixes" clause at the bottom of their README. I worry this package will be quite brittle compared to one that has a bigger community around it.

@codysoyland
Copy link

Looking at other workarounds... maybe this package could help?

Thanks for the suggestion @codysoyland . My only concern with this package is their "we don't accept issues with bug reports or PRs with bug fixes" clause at the bottom of their README. I worry this package will be quite brittle compared to one that has a bigger community around it.

Good callout, the one you posted looks promising!

@marcelamelara
Copy link
Contributor

I'd be a bit more comfortable with a plugin like this one, or the tool I shared earlier.

@gillisandrew
Copy link
Author

@marcelamelara given that encoding/json isn't necessarily compatible with protocol buffer messages1, updating the field names in json tags isn't enough. In fact, suppressing the json tags entirely could help by making the bug less subtle.

Protojson should be used, be it by the end users or included in a custom marshal/unmarshal method. Given the limited number of structs in the library, I feel manually implementing the latter isn't especially onerous and avoids requiring additional tooling and dependencies.

Footnotes

  1. https://github.com/golang/protobuf/issues/1285#issuecomment-791987062

@gillisandrew
Copy link
Author

@SantiagoTorres should I include the CI fix in the PR?

@marcelamelara let me know if I should remove the implementation so the updated tests can be merged, perhaps moving the discussion to an issue.

@marcelamelara
Copy link
Contributor

Protojson should be used, be it by the end users or included in a custom marshal/unmarshal method. Given the limited number of structs in the library, I feel manually implementing the latter isn't especially onerous and avoids requiring additional tooling and dependencies.

Concerns about whether or not re-tagging the json field name in the Go bindings fully resolves the issues you are currently encountering aside, we (as the upstream) repo can certainly take on one additional tool/dependency if it means that "it will just work" for downstream. (I get that it's never quite so simple ;) )

At the same time, I don't see a problem with prescribing the use of protojson for our Go bindings, and we can offer custom marshalling functions for v1 structs as a quick workaround.

So really, I'd like to ultimately see both changes (what you've sent here + the corresponding functions in resource_descriptor.go) and the proto-gen-go changes (possibly in a different PR).

And yes, if you have a fix for the CI, please feel free to include that in this PR.

@TomHennen @pxp928 please chime in.

@gillisandrew
Copy link
Author

gillisandrew commented Nov 20, 2024

@marcelamelara Yes, ideally downstream users won't have to worry about this too much. Having delved into ProtoJSON a bit further it seems the unmarshal may be too permissive here given that both proto field names and lowerCamelCase (or json_name) names are accepted. Overly permissive given that by the time the statement is unmarshaled the original field name isn't accessible for validation meaning invalid statements (i.e. ones using type instead of _type) are passing.

@gillisandrew gillisandrew force-pushed the feat-go-statement-json-marshaler branch from 8f2ad55 to a64ec07 Compare November 21, 2024 14:28
@marcelamelara
Copy link
Contributor

Having delved into ProtoJSON a bit further it seems the unmarshal may be too permissive here given that both proto field names and lowerCamelCase (or json_name) names are accepted. Overly permissive given that by the time the statement is unmarshaled the original field name isn't accessible for validation meaning invalid statements (i.e. ones using type instead of _type) are passing.

@gillisandrew Interesting. So what you're saying is that protojson unmarshalling only behaves how we need it to if we have marshalled the struct with protojson as well. Or is there more to it? I'm thinking we may want to document a warning about this at the very least.

@gillisandrew
Copy link
Author

@marcelamelara Sorry I could have explained that a little more clearly. The issue with using protojson to unmarshal (json -> struct) before validation is that it will accept either the JSON name (lowerCamelCase of field name unless json_name is specified) or field name (matching proto field name), but not both

That behavior is specified in the standard:

Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name.

This is an issue because the in-toto spec doesn't accept both (yet the generated libs do). Suppose a user parses an validates a struct with the following:

func parseStatementJSON(jsonText []byte) (*v1.Statement, error) {
	s := &v1.Statement{}
	// Step 1: Unmarshal
	err := protojson.Unmarshal(jsonText, s)
	if err != nil {
		return nil, fmt.Errorf("failed to unmarshal statement: %w", err)
	}
	// Step 2: Validate
	err = s.Validate()
	if err != nil {
		return nil, fmt.Errorf("statement validation failed: %w", err)
	}
	// Step 3: Profit???
	return s, nil
}

They would (rightfully) expect that the statement satisfies the statement layer spec, but since protojson is used invalid statements that use proto field names pass validation e.g.

{
  "type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "sub name",
      "digest": {
        "sha256": "2ce2e480e3c3f7ca0af83418d3ebaeedacee135dbac94bd946d7d84edabcdb64"
      }
    }
  ],
  "predicate_type": "https://example.com/unknownPred2",
  "predicate": {
    "foo": {
      "bar": "baz"
    }
  }
}

Because s.Validate() doesn't have access to the original json key name, there is no way to validate it at that point.

Further, since protojson doesn't allow both the JSON name and the field name, it means any proto field name is essentially "reserved".

Perhaps a good solution, killing two birds with one stone, is to provide a method to validate and unmarshal the json. Reflection can identify field name that don't match the JSON name and return an error when one is encountered. Or, going full circle, updating the json struct tags to match the spec and using encoding/json to handle the unmarshal (ensuring the cases it can't handle aren't included in protos).

@marcelamelara
Copy link
Contributor

marcelamelara commented Nov 25, 2024

@gillisandrew I appreciate the more detailed explanation.

Perhaps a good solution, killing two birds with one stone, is to provide a method to validate and unmarshal the json. Reflection can identify field name that don't match the JSON name and return an error when one is encountered. Or, going full circle, updating the json struct tags to match the spec and using encoding/json to handle the unmarshal (ensuring the cases it can't handle aren't included in protos).

I will summarize what I think you're suggesting here:

  1. Provide unmarshal functions that use encoding/json (rather than the overly permissive protojson).
  2. Add a ValidateJson() method for Statements and RDs that handles the unmarshalling and checks that the schema is followed.
  3. Generate the language bindings with the right json struct tags.

For 1), I think it's not unreasonable to add this functionality to the existing Validate() methods, but these suggestions make sense to me otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support predicateType: Field predicateType Renamed to predicate_type in Statement Struct
5 participants