-
Notifications
You must be signed in to change notification settings - Fork 591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Centralize and Improve Metadata Validation in the Slice Parser #3165
Centralize and Improve Metadata Validation in the Slice Parser #3165
Conversation
Co-authored-by: Jose <[email protected]>
<ClCompile Include="..\Parser.cpp" /> | ||
<ClCompile Include="..\Preprocessor.cpp" /> | ||
<ClCompile Include="..\Scanner.cpp" /> | ||
<ClCompile Include="..\SliceUtil.cpp" /> | ||
<ClCompile Include="..\StringLiteralUtil.cpp" /> | ||
<ClCompile Include="..\FileTracker.cpp" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these files were either duplicates or references to files that no longer existed.
I'm not sure how this happened, or why they haven't been automatically removed.
So I removed them manually.
@@ -136,11 +136,11 @@ namespace | |||
{ | |||
string_view directive = meta->directive(); | |||
string_view arguments = meta->arguments(); | |||
if (directive == "cpp:view-type" && !arguments.empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should no longer be performing "on-the-spot" metadata validation like this.
We trust that the metadata validator has done it's thing and any invalid metadata was removed.
No need to double check it's work.
@@ -772,44 +773,19 @@ Slice::Gen::generate(const UnitPtr& p) | |||
} | |||
|
|||
// Emit #include statements for any 'cpp:include' metadata directives in the top-level Slice file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same idea, don't double check the metadata validator.
This also has the added benefit of greatly simplifying the code.
@@ -898,289 +874,184 @@ Slice::Gen::writeExtraHeaders(IceInternal::Output& out) | |||
void | |||
Slice::Gen::validateMetadata(const UnitPtr& u) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, instead of creating it's own visitor, validateMetadata
just creates a map detailing the metadata we care about,
and let's the core MetadataValidator
handle the rest.
}; | ||
metadataInfo.emplace("cpp:view-type", std::move(viewTypeInfo)); | ||
|
||
// "cpp:type" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only "messy" thing in this PR is trying to validate the cpp:type
metadata.
But it's more a product of the metadata being messy itself I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered splitting the type metadata is 3 separate metadata:
cpp:type:string
- does not accept arguments
cpp:type:wstring
- does not accept arguments
cpp:type:...
- accept a single argument?
with possibly some priority/ordering in the way metadata are validated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did spend some time thinking about it, but decided against adding any mechanisms to handle it because:
A) I believe that this is the only metadata with two disjoint uses-cases like this.
So it's not worth complicating everything else to handle a single strange case.
B) I don't want to encourage this kind of metadata re-use. Realistically, these should just be two separate directives.
But they aren't, and we shouldn't break existing Slice definitions just to make the validation logic cleaner IMO.
The truly painful part is that the string flavors can be placed on containers, but the non-string flavors cannot.
Compare this to cs:generic
. Yes cs:generic
has a couple 'special' values like cpp:type
does, but there's still consistent rules for where/when cs:generic
can be used. Unlike cpp:type
which has 2 separate rule sets.
} | ||
else if (dynamic_pointer_cast<Sequence>(appliedTo) || dynamic_pointer_cast<Dictionary>(appliedTo) || dynamic_pointer_cast<ClassDecl>(appliedTo)) | ||
{ | ||
return nullopt; // TODO: I see no reason to support 'cpp:type' on class declarations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wanted to avoid changing any behavior in this PR, since it also changes alot of other stuff.
After this PR is merged, I'll open a separate PR to remove this, and any of the other weird metadata things that are easier to see now.
@@ -291,12 +291,6 @@ Slice::DefinitionContext::initSuppressedWarnings() | |||
{ | |||
_suppressedWarnings.insert(InvalidComment); | |||
} | |||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now handled by the metadata validator.
cpp/src/Slice/MetadataValidator.cpp
Outdated
const list<const type_info*>& validOn = info.validOn; | ||
auto found = | ||
std::find_if(validOn.begin(), validOn.end(), [&](const type_info* t) { return *t == typeid(*appliedTo); }); | ||
if (!validOn.empty() && found == validOn.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check validOn.empty()
before with if (appliedTo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
We do the check up-front now.
I also moved the find_if
in the if
statement now that there was extra room.
cpp/src/Slice/MetadataValidator.cpp
Outdated
{ | ||
string msg = '\'' + directive + "' metadata cannot be applied to definitions and declarations" + | ||
", it can only be applied directly to types"; | ||
p->unit()->warning(metadata->file(), metadata->line(), InvalidMetadata, msg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this warning required, wouldn't this be handled below by appliedTo checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked this whole section of checks.
cpp/src/Slice/MetadataValidator.cpp
Outdated
} | ||
|
||
// Finally, if a custom validation function is specified for this metadata, we run it. | ||
if (info.extraValidation.has_value()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a nullable types like std::function
I'lll just use nullptr
as not set instead of an optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other languages have been moving away from using null
and preferring explicit optional
types.
C++ is not one of them I guess! I removed the optional
wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the same in C# a nullable type ?
is just the raw reference. The issue I see with using optional for pointer types, is that you still have to check the pointer is not nullptr. C++ doesn't has a non nullable pointer type.
cpp/src/Slice/MetadataValidator.h
Outdated
SingleArgument, | ||
/// This metadata can have any number of arguments [0, ∞). | ||
AnyNumberOfArguments, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would insert a blank line between each enumerator, or, alternatively, no blank line. Not a mix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I think having these visual groups helps readability, so you can tell that they kind of go together in a certain way.
In lieu of blank lines, should I add a comment above each group of 3 like:
// Something something arguments
/// The metadata takes no arguments.
NoArguments,
/// The metadata must have exactly 1 argument.
SingleArgument,
/// This metadata can have any number of arguments [0, ∞).
AnyNumberOfArguments,
// Something something text
/// This metadata must have an argument, but that argument is raw text and not validated further.
RequiredTextArgument,
/// This metadata can optionally have an argument, but that argument is raw text and not validated further.
OptionalTextArgument,
Or what's the correct way/style to group related enumerators in C++?
cpp/src/Slice/MetadataValidator.h
Outdated
OptionalTextArgument, | ||
}; | ||
|
||
/// Typedef for functions which can provide additional validation to certain metadata directives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would name it "alias", since it's definitely not a typedef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
cpp/src/Slice/MetadataValidator.h
Outdated
MetadataArgumentKind acceptedArguments; | ||
|
||
/// This field stores the specific values that can be provided as arguments to this metadata. | ||
/// If this field is unset, then we perform no validation of the arguments (ie. arguments can have any value). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the formally correct spelling.
I've always spelt it the colloquial way of ie.
.
So I went ahead and changed all the ie.
s I found in the Slice compiler's files to i.e.
to be fully consistent.
WarningInvalidMetadata.ice:48: warning: ignoring unknown metadata: 'bad:unknown' | ||
WarningInvalidMetadata.ice:56: warning: the 'protected' metadata does not take any arguments | ||
WarningInvalidMetadata.ice:56: warning: the 'cpp:array' metadata does not take any arguments | ||
WarningInvalidMetadata.ice:62: warning: 'cpp:type' metadata cannot be applied operations with void return type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message needs to be grammatically correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another good catch. And another one fixed.
WarningInvalidMetadata.ice:56: warning: the 'cpp:array' metadata does not take any arguments | ||
WarningInvalidMetadata.ice:62: warning: 'cpp:type' metadata cannot be applied operations with void return type | ||
WarningInvalidMetadata.ice:65: warning: 'cpp:array' metadata cannot be applied operations with void return type | ||
WarningInvalidMetadata.ice:68: warning: invalid argument 'my_string' supplied to 'cpp:type' metadata in this context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning ends with period or not? Need to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They shouldn't end with periods, I inserted them by habit I guess.
Good catch. I removed this one, and there were other places where I also accidentally added them (also removed).
cpp/src/Slice/MetadataValidator.h
Outdated
ValidationFunc extraValidation = nullptr; | ||
}; | ||
|
||
class MetadataValidator final : public ParserVisitor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the correct API for validation. This visitor is an implementation detail that should not be exposed to "clients" (slice2cpp, slice2java).
A much better API is a simple validateMetadata function on Unit:
// That's what slice2cpp calls:
unit->validateMetadata(...);
Then this MetadataValidator is used in the implementation of Unit::validateMetadata
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I ended up doing was making the visitor inheritance private. It is an implementation detail, and indeed shouldn't be exposed. In it's place I added a single "entry-point" function, but instead of Unit::validateMetadata(MetadataValidator)
, I added MetadataValidator::validateMetadataWithin(Unit)
.
For two reasons:
- "Calling a function on a validator, and passing in the thing to validate" feels more natural than
"calling a function on the thing you want to validate, and passing in the validator". - The includes are more logical this way. Currently,
MetadataValidator.h
includesParser.h
, and that's it (to get all the grammar types). Implementing your suggestions means that we'd have a circular includes (sinceUnit
lives inParser.h
and now needs to take aMetadataValidator
parameter). There's ways to address this, but I think keeping the including in a single direction is nicer.
cpp/src/Slice/MetadataValidator.h
Outdated
/// If this list is empty we don't perform any automatic checking of whether this metadata is validly applied. | ||
/// Usually, this is because determining validity isn't as straightforward as matching against a list, | ||
/// and requires a more complex approach, which is achieved through providing an `extraValidation` function. | ||
std::list<const std::type_info*> validOn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the usage, I think it would be better to not use pointers but a std::list<std::reference_wrapper<const std::type_info>>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this would save us the &
address operator from all the lists.
Changed as suggested.
}; | ||
metadataInfo.emplace("cpp:view-type", std::move(viewTypeInfo)); | ||
|
||
// "cpp:type" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered splitting the type metadata is 3 separate metadata:
cpp:type:string
- does not accept arguments
cpp:type:wstring
- does not accept arguments
cpp:type:...
- accept a single argument?
with possibly some priority/ordering in the way metadata are validated?
cpp/src/Slice/MetadataValidator.cpp
Outdated
using namespace Slice; | ||
|
||
Slice::MetadataValidator::MetadataValidator(string language, map<string, MetadataInfo> metadataInfo) | ||
: _language(language), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: _language(language), | |
: _language(std::move(language)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch, fixed!
cpp/src/Slice/MetadataValidator.cpp
Outdated
string msg = '\'' + directive + "' metadata cannot be applied to definitions and declarations" + | ||
", it can only be applied directly to types"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think users will find this message very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked this entire section, along with the messages.
cpp/src/Slice/MetadataValidator.cpp
Outdated
} | ||
else if (dynamic_pointer_cast<Builtin>(p)) | ||
{ | ||
message += "applied to primitive types"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should refer to these as builtin types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
cpp/src/slice2cpp/Gen.cpp
Outdated
bool seenHeaderExtension = false; | ||
bool seenSourceExtension = false; | ||
bool seenDllExport = false; | ||
map<string, MetadataInfo> metadataInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the variable is poorly chosen. A variable name should match a type name only when the variable is of that type (with possibly qualification/pointer and the like).
Here it should be something more like validationMap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to metadataSpecification
, because you declaritively specify how the metadata is allowed to be used.
validationMap
also felt a little weak to me, because it's just restating the type (ie, this is a map).
It feels more like my original approach, where I had a map of validation functions.
As always, I am open to other names as well though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-renamed it to the simpler knownMetadata
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 2 out of 14 changed files in this pull request and generated no suggestions.
Files not reviewed (12)
- cpp/src/Slice/MetadataValidator.h: Language not supported
- cpp/src/Slice/Parser.cpp: Language not supported
- cpp/src/Slice/Parser.h: Language not supported
- cpp/src/Slice/SliceUtil.cpp: Language not supported
- cpp/src/Slice/Util.h: Language not supported
- cpp/src/Slice/msbuild/slice.vcxproj: Language not supported
- cpp/src/Slice/msbuild/slice.vcxproj.filters: Language not supported
- cpp/src/slice2cpp/CPlusPlusUtil.cpp: Language not supported
- cpp/src/slice2cpp/Gen.h: Language not supported
- cpp/test/Slice/errorDetection/WarningInvalidMetadata.err: Language not supported
- cpp/test/Slice/errorDetection/WarningInvalidMetadata.ice: Language not supported
- cpp/test/Slice/errorDetection/WarningSuppressInvalidMetadata.ice: Language not supported
cpp/src/Slice/MetadataValidator.cpp
Outdated
bool wasInserted = _seenDirectives.insert(directive).second; | ||
if (!wasInserted) | ||
{ | ||
string msg = "ignoring duplicate metadata: '" + directive + "' has already been applied in this context"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this behavior new? Does 3.7 ignore duplicates, or let them override existing entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the original findMetadata
function from 3.7, and it just returns the first instance of metadata it finds, effectively ignoring any duplicates. Throughout the compilers we don't always use this function, instead we'll do "manual lookups". To my knowledge, most of these still have the same logic of picking the first match, but I haven't looked at all of them on 3.7, there may a few strange edge cases that don't do this.
Lines 999 to 1010 in d53f0a9
Slice::Contained::findMetaData(const string& prefix, string& meta) const | |
{ | |
for(list<string>::const_iterator p = _metaData.begin(); p != _metaData.end(); ++p) | |
{ | |
if(p->find(prefix) == 0) | |
{ | |
meta = *p; | |
return true; | |
} | |
} | |
return false; |
5113b75
to
ae6c99f
Compare
fb5e346
to
5c14563
Compare
|
||
/// This field stores the specific values that can be provided as arguments to this metadata. | ||
/// If this field is unset, then we perform no validation of the arguments (i.e. arguments can have any value). | ||
std::optional<StringList> validArgumentValues = std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify that validArgumentValues has an effect only when acceptedArguments is SingleArgument or AnyNumberOfArguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
cpp/src/Slice/MetadataValidation.h
Outdated
std::list<std::reference_wrapper<const std::type_info>> validOn; | ||
|
||
/// Specifies how many, and what kinds of arguments, this metadata accepts. | ||
MetadataArgumentKind acceptedArguments; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would pick a field name closer to the field type, e.g. argumentKind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also fixed!
cpp/src/Slice/MetadataValidation.h
Outdated
Definitions, | ||
|
||
/// The metadata can be applied to definitions of types and to where those types are referenced. | ||
/// Ex: both of these are valid: `["java:buffer"] sequence<string> S;` and `["java:buffer"] StringSeq op();`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should include an example for fields, since that's an example of what's beyond Definitions and ParameterTypeReferences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I changed my operation example to showcase a field instead.
cpp/src/Slice/MetadataValidation.h
Outdated
/// (i.e. parser metadata) is always checked. | ||
/// @param knownMetadata A map containing the directives that should be validated, and information describing | ||
/// the various constraints and conditions that should be upheld for it and its arguments. | ||
void validateMetadata(const UnitPtr& p, std::string language, std::map<std::string, MetadataInfo> knownMetadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const& since the implementation only reads these parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the language
to a string_view
(since it's preferred over const string&
)
But the knownMetadata
map was left as-is, since the implementation does actually modify it, and I'd like to avoid making a working copy. Especially since the caller has no use for the map; it exists solely to be consumed by this function.
And for completeness, the function as a whole shouldn't be const
because it has side-effects (invalid metadata is filtered out, so the code-gen is assured it will only encounter valid metadata).
This PR introduces all the heavy machinery discussed in #3067, which it is a replacement for.
It still accomplishes the same goal of centralizing the validation of Slice metadata into the Slice library (the parser), instead of making each language do it all from scratch.
But, whereas #3067 used inheritance and method overriding to do the validation, this PR uses a more declarative approach.
It adds a new
MetadataInfo
struct (to theMetadataValidator
header file) which fully describes the information needed to validate a piece of metadata. Our metadata is pretty flexible, so this struct needed quite a couple fields in it to be complete.The general idea is there is a single visitor in the parser named
MetadataValidator
.It takes a
map<string, MetadataInfo>
(where the key is the directive's name) at construction.And when the validator is ran, if visits each Slice element the parser has found, goes through their lists of metadata and checks them against the map. If there is no matching key, we report a warning (unless it's for another language of course). If we did find an entry in the map, it will automatically perform all the validation using the information in
MetadataInfo
.For some weirder metadata, there is no straightforward way to declaratively state how to validate a piece of metadata, so the struct also has a special
std::function extraValidation
field, for running truly custom validation after all the normal validation has finished.I think I was pretty good with writing comments, so I'll stop rambling and let those do the explaining instead.
I would recommend you first review the
MetadataValidator.h
header file before anything else.