Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralize and Improve Metadata Validation in the Slice Parser #3165

Merged

Conversation

InsertCreativityHere
Copy link
Member

@InsertCreativityHere InsertCreativityHere commented Nov 18, 2024

This PR introduces all the heavy machinery discussed in #3067, which it is a replacement for.

It still accomplishes the same goal of centralizing the validation of Slice metadata into the Slice library (the parser), instead of making each language do it all from scratch.

But, whereas #3067 used inheritance and method overriding to do the validation, this PR uses a more declarative approach.
It adds a new MetadataInfo struct (to the MetadataValidator header file) which fully describes the information needed to validate a piece of metadata. Our metadata is pretty flexible, so this struct needed quite a couple fields in it to be complete.


The general idea is there is a single visitor in the parser named MetadataValidator.
It takes a map<string, MetadataInfo> (where the key is the directive's name) at construction.

And when the validator is ran, if visits each Slice element the parser has found, goes through their lists of metadata and checks them against the map. If there is no matching key, we report a warning (unless it's for another language of course). If we did find an entry in the map, it will automatically perform all the validation using the information in MetadataInfo.

For some weirder metadata, there is no straightforward way to declaratively state how to validate a piece of metadata, so the struct also has a special std::function extraValidation field, for running truly custom validation after all the normal validation has finished.

I think I was pretty good with writing comments, so I'll stop rambling and let those do the explaining instead.
I would recommend you first review the MetadataValidator.h header file before anything else.

<ClCompile Include="..\Parser.cpp" />
<ClCompile Include="..\Preprocessor.cpp" />
<ClCompile Include="..\Scanner.cpp" />
<ClCompile Include="..\SliceUtil.cpp" />
<ClCompile Include="..\StringLiteralUtil.cpp" />
<ClCompile Include="..\FileTracker.cpp" />
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these files were either duplicates or references to files that no longer existed.
I'm not sure how this happened, or why they haven't been automatically removed.
So I removed them manually.

@@ -136,11 +136,11 @@ namespace
{
string_view directive = meta->directive();
string_view arguments = meta->arguments();
if (directive == "cpp:view-type" && !arguments.empty())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should no longer be performing "on-the-spot" metadata validation like this.

We trust that the metadata validator has done it's thing and any invalid metadata was removed.
No need to double check it's work.

@@ -772,44 +773,19 @@ Slice::Gen::generate(const UnitPtr& p)
}

// Emit #include statements for any 'cpp:include' metadata directives in the top-level Slice file.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea, don't double check the metadata validator.
This also has the added benefit of greatly simplifying the code.

@@ -898,289 +874,184 @@ Slice::Gen::writeExtraHeaders(IceInternal::Output& out)
void
Slice::Gen::validateMetadata(const UnitPtr& u)
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, instead of creating it's own visitor, validateMetadata just creates a map detailing the metadata we care about,
and let's the core MetadataValidator handle the rest.

};
metadataInfo.emplace("cpp:view-type", std::move(viewTypeInfo));

// "cpp:type"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only "messy" thing in this PR is trying to validate the cpp:type metadata.
But it's more a product of the metadata being messy itself I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered splitting the type metadata is 3 separate metadata:
cpp:type:string - does not accept arguments
cpp:type:wstring- does not accept arguments
cpp:type:... - accept a single argument?

with possibly some priority/ordering in the way metadata are validated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did spend some time thinking about it, but decided against adding any mechanisms to handle it because:

A) I believe that this is the only metadata with two disjoint uses-cases like this.
So it's not worth complicating everything else to handle a single strange case.

B) I don't want to encourage this kind of metadata re-use. Realistically, these should just be two separate directives.
But they aren't, and we shouldn't break existing Slice definitions just to make the validation logic cleaner IMO.

The truly painful part is that the string flavors can be placed on containers, but the non-string flavors cannot.
Compare this to cs:generic. Yes cs:generic has a couple 'special' values like cpp:type does, but there's still consistent rules for where/when cs:generic can be used. Unlike cpp:type which has 2 separate rule sets.

}
else if (dynamic_pointer_cast<Sequence>(appliedTo) || dynamic_pointer_cast<Dictionary>(appliedTo) || dynamic_pointer_cast<ClassDecl>(appliedTo))
{
return nullopt; // TODO: I see no reason to support 'cpp:type' on class declarations.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wanted to avoid changing any behavior in this PR, since it also changes alot of other stuff.
After this PR is merged, I'll open a separate PR to remove this, and any of the other weird metadata things that are easier to see now.

@@ -291,12 +291,6 @@ Slice::DefinitionContext::initSuppressedWarnings()
{
_suppressedWarnings.insert(InvalidComment);
}
else
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now handled by the metadata validator.

const list<const type_info*>& validOn = info.validOn;
auto found =
std::find_if(validOn.begin(), validOn.end(), [&](const type_info* t) { return *t == typeid(*appliedTo); });
if (!validOn.empty() && found == validOn.end())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check validOn.empty() before with if (appliedTo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!
We do the check up-front now.
I also moved the find_if in the if statement now that there was extra room.

{
string msg = '\'' + directive + "' metadata cannot be applied to definitions and declarations" +
", it can only be applied directly to types";
p->unit()->warning(metadata->file(), metadata->line(), InvalidMetadata, msg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this warning required, wouldn't this be handled below by appliedTo checks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked this whole section of checks.

}

// Finally, if a custom validation function is specified for this metadata, we run it.
if (info.extraValidation.has_value())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a nullable types like std::function I'lll just use nullptr as not set instead of an optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other languages have been moving away from using null and preferring explicit optional types.
C++ is not one of them I guess! I removed the optional wrapper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the same in C# a nullable type ? is just the raw reference. The issue I see with using optional for pointer types, is that you still have to check the pointer is not nullptr. C++ doesn't has a non nullable pointer type.

SingleArgument,
/// This metadata can have any number of arguments [0, ∞).
AnyNumberOfArguments,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would insert a blank line between each enumerator, or, alternatively, no blank line. Not a mix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I think having these visual groups helps readability, so you can tell that they kind of go together in a certain way.
In lieu of blank lines, should I add a comment above each group of 3 like:

        // Something something arguments

        /// The metadata takes no arguments.
        NoArguments,

        /// The metadata must have exactly 1 argument.
        SingleArgument,

        /// This metadata can have any number of arguments [0, ∞).
        AnyNumberOfArguments,

        // Something something text

        /// This metadata must have an argument, but that argument is raw text and not validated further.
        RequiredTextArgument,

        /// This metadata can optionally have an argument, but that argument is raw text and not validated further.
        OptionalTextArgument,

Or what's the correct way/style to group related enumerators in C++?

OptionalTextArgument,
};

/// Typedef for functions which can provide additional validation to certain metadata directives.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would name it "alias", since it's definitely not a typedef.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

MetadataArgumentKind acceptedArguments;

/// This field stores the specific values that can be provided as arguments to this metadata.
/// If this field is unset, then we perform no validation of the arguments (ie. arguments can have any value).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the formally correct spelling.
I've always spelt it the colloquial way of ie..
So I went ahead and changed all the ie.s I found in the Slice compiler's files to i.e. to be fully consistent.

WarningInvalidMetadata.ice:48: warning: ignoring unknown metadata: 'bad:unknown'
WarningInvalidMetadata.ice:56: warning: the 'protected' metadata does not take any arguments
WarningInvalidMetadata.ice:56: warning: the 'cpp:array' metadata does not take any arguments
WarningInvalidMetadata.ice:62: warning: 'cpp:type' metadata cannot be applied operations with void return type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message needs to be grammatically correct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good catch. And another one fixed.

WarningInvalidMetadata.ice:56: warning: the 'cpp:array' metadata does not take any arguments
WarningInvalidMetadata.ice:62: warning: 'cpp:type' metadata cannot be applied operations with void return type
WarningInvalidMetadata.ice:65: warning: 'cpp:array' metadata cannot be applied operations with void return type
WarningInvalidMetadata.ice:68: warning: invalid argument 'my_string' supplied to 'cpp:type' metadata in this context.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning ends with period or not? Need to be consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They shouldn't end with periods, I inserted them by habit I guess.
Good catch. I removed this one, and there were other places where I also accidentally added them (also removed).

ValidationFunc extraValidation = nullptr;
};

class MetadataValidator final : public ParserVisitor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct API for validation. This visitor is an implementation detail that should not be exposed to "clients" (slice2cpp, slice2java).

A much better API is a simple validateMetadata function on Unit:

// That's what slice2cpp calls:
unit->validateMetadata(...);

Then this MetadataValidator is used in the implementation of Unit::validateMetadata.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I ended up doing was making the visitor inheritance private. It is an implementation detail, and indeed shouldn't be exposed. In it's place I added a single "entry-point" function, but instead of Unit::validateMetadata(MetadataValidator), I added MetadataValidator::validateMetadataWithin(Unit).

For two reasons:

  1. "Calling a function on a validator, and passing in the thing to validate" feels more natural than
    "calling a function on the thing you want to validate, and passing in the validator".
  2. The includes are more logical this way. Currently, MetadataValidator.h includes Parser.h, and that's it (to get all the grammar types). Implementing your suggestions means that we'd have a circular includes (since Unit lives in Parser.h and now needs to take a MetadataValidator parameter). There's ways to address this, but I think keeping the including in a single direction is nicer.

/// If this list is empty we don't perform any automatic checking of whether this metadata is validly applied.
/// Usually, this is because determining validity isn't as straightforward as matching against a list,
/// and requires a more complex approach, which is achieved through providing an `extraValidation` function.
std::list<const std::type_info*> validOn;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the usage, I think it would be better to not use pointers but a std::list<std::reference_wrapper<const std::type_info>>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would save us the & address operator from all the lists.
Changed as suggested.

};
metadataInfo.emplace("cpp:view-type", std::move(viewTypeInfo));

// "cpp:type"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered splitting the type metadata is 3 separate metadata:
cpp:type:string - does not accept arguments
cpp:type:wstring- does not accept arguments
cpp:type:... - accept a single argument?

with possibly some priority/ordering in the way metadata are validated?

using namespace Slice;

Slice::MetadataValidator::MetadataValidator(string language, map<string, MetadataInfo> metadataInfo)
: _language(language),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: _language(language),
: _language(std::move(language)),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good catch, fixed!

cpp/src/Slice/MetadataValidator.cpp Outdated Show resolved Hide resolved
cpp/src/Slice/MetadataValidator.cpp Outdated Show resolved Hide resolved
Comment on lines 338 to 339
string msg = '\'' + directive + "' metadata cannot be applied to definitions and declarations" +
", it can only be applied directly to types";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think users will find this message very confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked this entire section, along with the messages.

}
else if (dynamic_pointer_cast<Builtin>(p))
{
message += "applied to primitive types";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should refer to these as builtin types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

bool seenHeaderExtension = false;
bool seenSourceExtension = false;
bool seenDllExport = false;
map<string, MetadataInfo> metadataInfo;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the variable is poorly chosen. A variable name should match a type name only when the variable is of that type (with possibly qualification/pointer and the like).

Here it should be something more like validationMap.

Copy link
Member Author

@InsertCreativityHere InsertCreativityHere Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to metadataSpecification, because you declaritively specify how the metadata is allowed to be used.

validationMap also felt a little weak to me, because it's just restating the type (ie, this is a map).
It feels more like my original approach, where I had a map of validation functions.

As always, I am open to other names as well though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-renamed it to the simpler knownMetadata.

@externl externl requested a review from Copilot November 19, 2024 20:51

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 14 changed files in this pull request and generated no suggestions.

Files not reviewed (12)
  • cpp/src/Slice/MetadataValidator.h: Language not supported
  • cpp/src/Slice/Parser.cpp: Language not supported
  • cpp/src/Slice/Parser.h: Language not supported
  • cpp/src/Slice/SliceUtil.cpp: Language not supported
  • cpp/src/Slice/Util.h: Language not supported
  • cpp/src/Slice/msbuild/slice.vcxproj: Language not supported
  • cpp/src/Slice/msbuild/slice.vcxproj.filters: Language not supported
  • cpp/src/slice2cpp/CPlusPlusUtil.cpp: Language not supported
  • cpp/src/slice2cpp/Gen.h: Language not supported
  • cpp/test/Slice/errorDetection/WarningInvalidMetadata.err: Language not supported
  • cpp/test/Slice/errorDetection/WarningInvalidMetadata.ice: Language not supported
  • cpp/test/Slice/errorDetection/WarningSuppressInvalidMetadata.ice: Language not supported
bool wasInserted = _seenDirectives.insert(directive).second;
if (!wasInserted)
{
string msg = "ignoring duplicate metadata: '" + directive + "' has already been applied in this context";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this behavior new? Does 3.7 ignore duplicates, or let them override existing entries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the original findMetadata function from 3.7, and it just returns the first instance of metadata it finds, effectively ignoring any duplicates. Throughout the compilers we don't always use this function, instead we'll do "manual lookups". To my knowledge, most of these still have the same logic of picking the first match, but I haven't looked at all of them on 3.7, there may a few strange edge cases that don't do this.

ice/cpp/src/Slice/Parser.cpp

Lines 999 to 1010 in d53f0a9

Slice::Contained::findMetaData(const string& prefix, string& meta) const
{
for(list<string>::const_iterator p = _metaData.begin(); p != _metaData.end(); ++p)
{
if(p->find(prefix) == 0)
{
meta = *p;
return true;
}
}
return false;

@externl externl requested review from externl and removed request for externl November 24, 2024 00:46

/// This field stores the specific values that can be provided as arguments to this metadata.
/// If this field is unset, then we perform no validation of the arguments (i.e. arguments can have any value).
std::optional<StringList> validArgumentValues = std::nullopt;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify that validArgumentValues has an effect only when acceptedArguments is SingleArgument or AnyNumberOfArguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

std::list<std::reference_wrapper<const std::type_info>> validOn;

/// Specifies how many, and what kinds of arguments, this metadata accepts.
MetadataArgumentKind acceptedArguments;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would pick a field name closer to the field type, e.g. argumentKind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed!

Definitions,

/// The metadata can be applied to definitions of types and to where those types are referenced.
/// Ex: both of these are valid: `["java:buffer"] sequence<string> S;` and `["java:buffer"] StringSeq op();`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should include an example for fields, since that's an example of what's beyond Definitions and ParameterTypeReferences.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I changed my operation example to showcase a field instead.

/// (i.e. parser metadata) is always checked.
/// @param knownMetadata A map containing the directives that should be validated, and information describing
/// the various constraints and conditions that should be upheld for it and its arguments.
void validateMetadata(const UnitPtr& p, std::string language, std::map<std::string, MetadataInfo> knownMetadata);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const& since the implementation only reads these parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the language to a string_view (since it's preferred over const string&)

But the knownMetadata map was left as-is, since the implementation does actually modify it, and I'd like to avoid making a working copy. Especially since the caller has no use for the map; it exists solely to be consumed by this function.

And for completeness, the function as a whole shouldn't be const because it has side-effects (invalid metadata is filtered out, so the code-gen is assured it will only encounter valid metadata).

@InsertCreativityHere InsertCreativityHere merged commit 4c34c02 into zeroc-ice:main Nov 27, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants