Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate package metadata #22

Open
ericmj opened this issue May 29, 2018 · 23 comments
Open

Validate package metadata #22

ericmj opened this issue May 29, 2018 · 23 comments

Comments

@ericmj
Copy link
Member

ericmj commented May 29, 2018

Validate package metadata against spec errors, do not include superfluous fields and do not allow type errors.

We can also add separate, optional functions to validate package metadata for warnings, we can pass it a list of policies that can be used, like (hexpm/public/private). We could handle the tarball size validation the same way to not be locked into hexpm specifics.

NOTE: Do not include maintainers in the package metadata.

@jadeallenx
Copy link

What are superfluous fields? Like if I add <<"dog">> => <<"cat">> to the map in the .app.src file, you're saying prune that field out?

@ericmj
Copy link
Member Author

ericmj commented May 30, 2018

Correct.

@jadeallenx
Copy link

Do you mean modify the underlying file as it's stored in hex or just ignore it for purposes of hex?

@ericmj
Copy link
Member Author

ericmj commented May 30, 2018

I am talking about the package metadata. The package metadata is built from app.src, but we don't care about the actual contents of app.src inside the package files. When we build the package metadata from the configuration in either mix.exs or app.src we should include relevant fields for the metadata.

@jadeallenx
Copy link

ОК, thank you for clarifying!

@wojtekmach
Copy link
Member

When we have basic validations (required fields, type checks etc) we could add more validations on top:

  • check that links are valid URLs
  • check that version follows semver.org
  • check that included license is a valid spdx.org license identifier (proof of concept)

etc. We should work on these separately, just mentioning these as examples of things we may want to consider.

@wojtekmach
Copy link
Member

Another validation:

@starbelly
Copy link
Contributor

Aye, so the validation code that just made it into rebar3_hex (erlef/rebar3_hex#85) really should probably be in hex_core. If agreed, I'll put something together in a few days.

@ericmj
Copy link
Member Author

ericmj commented Jan 4, 2019

It would be great if you could contribute this to hex_core as well. This is what hex validates [1], I haven't looked at the rebar3_hex PR yet.

[1] https://github.com/hexpm/hex/blob/master/lib/mix/tasks/hex.build.ex#L165

@wojtekmach
Copy link
Member

Note, these are two aspects of validating package metadata:

  • when creating the tarball
  • when unpacking the tarball

Let's focus on validations on creation in this issue for now.

@starbelly
Copy link
Contributor

I just released 0.1.0 of https://github.com/starbelly/verl . For the moment, it's just semver parsing, but will be adding the rest later. Whether hex_core wants to vendor this in or use as dep or has other plans is the question. After that decision is made I can do a PR that validates the version.

@starbelly
Copy link
Contributor

starbelly commented Jan 9, 2019

semver aside... what should we do when validation errors have accumulated prior to creating a tarball? I see @ericmj was saying deprecated metadata should be ignored, as it is server side. So it sounds like we don't want to end in error if the maintainers field exists and everything else is ok? If so, I'm thinking we should return

 #{
    errors => [{type, MetaData}, {type, MetaData}], 
    warnings => [{type, MetaData}, {type, MetaData}]
 }

... when they exist and the caller can choose to treat the warnings as errors or not, etc. Then the caller can possibly pass the whole result back to hex_core for formatting errors and warnings using a format_error/1 or could iterate the lists themselves and call format_error/2 per error/warning, possibly selectively.

@ericmj
Copy link
Member Author

ericmj commented Jan 9, 2019

That sounds like a good idea. That way we can also better control the deprecation process.

@wojtekmach
Copy link
Member

wojtekmach commented Jan 9, 2019

I think we have two options:

  1. a separate validate functions that returns the map above. This means that the caller would first have to validate metadata prior to calling hex_tarball:create

  2. run validations on create. This means that on success we return the tarball and warnings (but not errors) and on failure we return warnings and errors.

    Maybe something like this?

    -spec create(metadata(), files()) ->
      {ok, {tarball(), checksum(), Warnings :: [{atom(), term()}]} |
      {error, #{errors => [{binary(), term()}], warnings => [{binary(), term()}]

    Examples:

    > hex_tarball:create(Metadata, Files).
    {ok, {Tarball, Checksum, [{<<"maintainers">>, deprecated}]}}
    
    > hex_tarball:create(#{<<"foo">> => <<"bar">>}, Files).
    {ok, {Tarball, Checksum, [{<<"foo">>, unknown_field}]}}
    
    > hex_tarball:create(#{<<"version">> => <<"1.0.bad">>}}, Files).
    {error, #{errors => [{<<"version">>, invalid}], warnings => []}}
    
    > hex_tarball:create(#{<<"licenses">> => <<"">>}}, Files).
    {error, #{errors => [{<<"licenses">>, invalid}], warnings => []}}

I lean towards option 2. What do you think?

Something to keep in mind is that later we'd want to handle hex_tarball:unpack metadata errors in the same way. There are also non-metadata errors that can be returned, e.g. we call: hex_tarball:create(Metadata, [{'foo.erl', '/non_existing'}]) - should it return a different error tuple? (Currently it crashes.)

@ericmj
Copy link
Member Author

ericmj commented Jan 9, 2019

Yeah, 2. sounds even better.

@starbelly
Copy link
Contributor

@wojtekmach Fully agreed. Will get started on this.

@jadeallenx
Copy link

I think unknown fields should just be treated as non-fatal warnings. There are cases where repo maintainers who aren't hex might want to put in their own fields for various reasons. I think rejecting them outright as invalid is unhelpful.

@starbelly
Copy link
Contributor

starbelly commented Jan 9, 2019

I think that's what we're agreeing on. Treating unknown or deprecated fields as fatal will result in a bad UX. Though, we don't want to dictate what users of hex_core do ... and really can not. rebar3_hex is currently treating some as fatal, and once this is in hex_core I will most definitely do a PR to change that.

@wojtekmach
Copy link
Member

@mrallen1 yup, that was the plan. I now see that I gave incorrect example for this scenario, I've updated the 2nd example in code snippet above to make it clear that it's a successful response albeit with a warning.

@jadeallenx
Copy link

@wojtekmach Thanks! That's what I was confused about.

@ericmj
Copy link
Member Author

ericmj commented Jan 10, 2019

Deprecated fields should not produce errors until they are hard-deprecated, which may be never. Note that we don't even warn about the deprecated maintainers field yet.

I think unknown fields should hard error or warn and be filtered before they are added to the metadata. Repo maintainers shouldn't use new fields to the package metadata because there is no guarantee any clients will support it, which means they need to use their own repos and their own clients at which point I am not sure it is Hex anymore.

If you want to add custom metadata we have the extra metadata field for this reason https://github.com/hexpm/specifications/blob/master/package_metadata.md#format.

@jadeallenx
Copy link

@ericmj I understand that perspective but I don't think people are going to "write their own clients" -they're going to piggyback on the infrastructure that's built into consumers of this library like rebar3. It's possible that people might create a service which mimics some endpoints of hex but isn't hex for their own use "behind the firewall" or for private deployment scenarios.

If hard errors are the default, please allow people to turn that behavior off.

@ericmj
Copy link
Member Author

ericmj commented Jan 10, 2019

If they are going to use existing clients then there are no guarantees custom metadata fields is going to work since it's not following the spec. Why not use the extra field that was actually built for this purpose?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants