Validators Extensions
Discussion related to https://github.com/SNIA/CDMI-spec/issues/291.
1.1 Overview
Some CDMI systems allow validation to be performed against CDMI objects. In such a system, multiple validations may be performed simultaneously against the same object or against multiple objects. On object creation or modification, validation success results in the normal HTTP response being returned, while validation failure is indicated either by a failed HTTP response or by an annotation in the object's metadata.
This extension proposes a new type of data object to define validations on object creation and modification. The validation data object (extended in a similar manner as a query queue object) may be used to define validations independently from the objects on which the validator is acting.
Validating existing objects is performed using CDMI jobs.
1.2 Instructions to the Editor
To merge this extension into the CDMI 2.0.0 specification, make the following changes:
- Insert into preamble/terms.txt, as follows:
x.x validator |br| a data object with specific metadata that defines and manages validation operations performed against matching newly created and updated CDMI objects (validation targets) |br|
x.x validation operation |br| the process of evaluating a validation schema against a validation target |br|
x.x validation schema |br| metadata that describes the organization and format of CDMI objects |br|
x.x validation scope |br| metadata that defines which validation targets validation operations are performed against |br|
x.x validation targets |br| the set of CDMI objects against which validation operations are performed |br|
- Add an entry to the end of the table starting on line 135 of cdmi_advanced/cdmi_capability_object.txt, as follows:
Table 1: System-wide capabilities
Capability name | Type | Definition
cdmi_validators | JSON string | If present and “true”, the CDMI server supports validation data objects.
cdmi_validators_global_container | JSON string | If present, contains the URI for the container for all validator data objects in the CDMI server.
- Add an entry to the end of the table starting on line 612 of cdmi_advanced/cdmi_capability_object.txt, as follows:
Table 2: Capabilities for data objects
Capability name | Type | Definition
cdmi_validator_schema_formats | JSON string | If present, contains a list of schema formats that may be specified in validator data objects. Schema formats are media types as specified in RFC 6838. Currently defined schema formats include: application/schema+json
- Add an entry to the end of the table starting on line 662 of cdmi_advanced/cdmi_capability_object.txt, as follows:
Table 3: Capabilities for container objects
Capability name | Type | Definition
cdmi_create_validator_container | JSON string | If present and “true”, indicates that the container allows the creation of validator container objects.
cdmi_create_validator_dataobject | JSON string | If present and “true”, indicates that the container allows the creation of validator data objects.
- Add an entry to the end of the table starting on line 216 of cdmi_advanced/cdmi_metadata.txt, as follows:
Table 4: Data system metadata
Metadata name | Type | Definition
cdmi_validation_schema | JSON array of JSON strings | Contains one or more validation schemas that are to be applied against the object.
- Add an entry to the end of the table starting on line 533 of cdmi_advanced/cdmi_metadata.txt, as follows:
Table 5: Data system metadata
Metadata name | Type | Definition
cdmi_validation_schema_provided | JSON array of JSON strings | For schemas specified in cdmi_validation_schema, contains the JSON path to the validation schema. For schemas specified in a validator object, contains the URI to the validator data object where the validation schema is specified.
cdmi_validation_result_provided | JSON array of JSON strings | Contains an indication of the validation result for each validation schema that is applied against the object. Supported values are “passed”, “failed”, “skipped”, and “unsupported”.
- Create new clause, “cdmi_validators.txt” after existing clause 25 “Data Object Versions”, as follows.
Clause 2
A cloud storage system may optionally implement object validation functionality. Validator implementation is indicated by the presence of the cloud storage system-wide capabilities for validators, and requires support for CDMI data objects.
Validators allow the evaluation of schemas on object creation and modification. In addition, multiple validators may perform validation actions against a single CDMI object. By creating a well-defined “validator” object, clients may define validators, specify the schema to be used to perform the validation, and specify which objects the validation is to be performed against.
Validators may be stored in container objects or may exist as standalone data objects with no parent container.
Cloud storage systems should consider implementing support for validator data objects when the system supports the following types of client-controlled activities:
- Data format consistency: If the user requires CDMI objects to conform to a given schema in order to ensure data consistency, the user may define a validator to prevent non-conformant objects. For example, this allows the user to specify that created data objects shall have a value that validates against a given schema.
- Metadata presence and values: If the user requires CDMI objects’ metadata to conform to a given schema in order to specify metadata constraints, the user may define a validator to prevent non-conformant objects. For example, this allows the user to specify that created data objects shall have a metadata value greater than one for the cdmi_data_redundancy data system metadata.
- Limiting object types: If a user requires the limitation of what types of objects can be created, the user may define a validator to prevent the creation of non-conformant objects. For example, this allows the user to specify that created data objects shall have a mimetype that validates against a given schema.
- Limiting use of CDMI features: If a user requires the limitation of which CDMI creation and modification features are to be exposed, the user may define a validator to prevent the specification of non-desired CDMI features. For example, this allows the user to specify that created data objects cannot specify deserialization sources.
When a client wishes to create a validator data object, it may first check whether the system provides validation functionality by checking for the presence of the cdmi_validators capability in the root container capabilities. If this capability is not present, creating a validator data object shall succeed, but no validation operations shall be performed.
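The capability check described above can be sketched as follows. This is a minimal sketch, assuming the client has already retrieved the system-wide capabilities object and parsed it into a dictionary with capability values under a "capabilities" key. The function name supports_validators and the sample dictionaries are hypothetical, and the capability name is taken from Table 1.

```python
def supports_validators(capabilities_object: dict) -> bool:
    """Return True if the parsed capabilities object advertises validators.

    Assumption: capability values are JSON strings such as "true", stored
    under a top-level "capabilities" key.
    """
    capabilities = capabilities_object.get("capabilities", {})
    return capabilities.get("cdmi_validators") == "true"


# Hypothetical, hand-written capabilities objects for illustration only.
with_validators = {"capabilities": {"cdmi_validators": "true"}}
without_validators = {"capabilities": {"cdmi_domains": "true"}}

print(supports_validators(with_validators))
print(supports_validators(without_validators))
```

A client that gets False here can still create the validator data object, but should not expect any validation operations to run.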
Validators may be created by CDMI clients and CDMI server internal processes.
Examples of validators created by CDMI clients and internal system processes may include:
- Validating supported data object media types
- Validating supported data object value contents
- Validating presence of and supported metadata values
- Validating CDMI options specified on create or update
CDMI clients may create validators through a variety of methods:
A client may create a validator data object without specifying the location by performing a POST operation. In this case, the system shall create the validator in a validator container and return an HTTP response code of 202 Accepted. The URI for the newly created validator data object shall be returned in an HTTP Location response header.
A client may create a validator data object at a specific location by performing a PUT operation. Only containers with the cdmi_create_validator_dataobject capability shall allow validator data objects to be created. The semantics are the same as for other data objects.
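As an illustration of the PUT method, the request could be assembled as shown below. This is a sketch only: the host name, the object path, the helper name build_validator_put, and the trivial schema string are all hypothetical, and the use of the application/cdmi-object media type for the request body is an assumption based on how other CDMI data objects are created. The request is built but never sent.

```python
import json
import urllib.request


def build_validator_put(base_url: str, path: str, schemas: list) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates a validator data object."""
    # The presence of cdmi_validation_schema is what marks the object as a validator.
    body = {"metadata": {"cdmi_validation_schema": schemas}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            # Assumed media type, as used for other CDMI data objects.
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
        },
    )


# Hypothetical server, path, and schema, for illustration only.
request = build_validator_put(
    "https://cdmi.example.com",
    "/sandbox/my_validator",
    ['{"type": "object"}'],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```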
A client may view and access validators created by internal system processes through the validator container. To get a list of system-created validators, clients may list the children of the container.
When a client creates a validator data object, the presence of the metadata item cdmi_validation_schema indicates that the data object represents a validator.
Metadata, including the cdmi_validation_schema metadata item, may be changed by a client. If the cdmi_validation_schema metadata item is removed, the validator data object shall no longer result in validations being performed; instead, the CDMI server shall treat it as a regular CDMI data object.
The metadata items for a validator data object are shown in Table 6:
Table 6: Validator data object metadata
Metadata name | Type | Definition | Requirement
cdmi_validation_schema | JSON array of JSON strings | Contains one or more self-contained validation schemas that are to be applied against the object. Each of these schemas must be in a format specified by the cdmi_validator_schema_formats data object capability. | Mandatory
cdmi_validation_mark | JSON string | If “true”, indicates that validation failures shall be permitted, with the validation result recorded in the object's metadata. | Optional
cdmi_scope_specification | JSON string | The scope specification determines which objects validation operations are performed against. This scope specification is similar to a “WHERE” clause in SQL-like languages. To match all objects, specify an empty JSON array. See Clause 19 for how to construct a scope specification. | Optional
A validator object that includes a cdmi_scope_specification shall not have a value.
A validator object that does not include a cdmi_scope_specification shall have the specified cdmi_validation_schema applied against the object, including the value.
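The two rules above, together with the rule that cdmi_validation_schema marks an object as a validator, can be sketched as a small classification helper. This is an illustrative sketch, not a normative algorithm. The function names and the role labels "regular", "self", and "scoped" are hypothetical, and treating a non-empty "value" field as "having a value" is an assumption.

```python
def validator_role(cdmi_object: dict) -> str:
    """Classify a parsed CDMI data object by how its schema would be applied."""
    metadata = cdmi_object.get("metadata", {})
    if "cdmi_validation_schema" not in metadata:
        return "regular"  # no schema: treated as an ordinary data object
    if "cdmi_scope_specification" in metadata:
        return "scoped"   # schema is applied against objects matching the scope
    return "self"         # schema is applied against this object, including its value


def has_forbidden_value(cdmi_object: dict) -> bool:
    """Return True if a scoped validator carries a value, which is not allowed."""
    # Assumption: an absent or empty "value" field counts as having no value.
    return validator_role(cdmi_object) == "scoped" and bool(cdmi_object.get("value"))


scoped = {"metadata": {"cdmi_validation_schema": ["..."], "cdmi_scope_specification": []}}
print(validator_role(scoped), has_forbidden_value(scoped))
```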
EXAMPLE 1: A CDMI object that includes a validator:
{
"metadata": {
"cdmi_validation_schema" : "..."
}
}
This validation schema will be evaluated against the object when it is created, and each time it is updated.
EXAMPLE 2: A CDMI object that acts as a validator against all matching objects:
{
"metadata": {
"cdmi_validation_schema" : "...",
"cdmi_scope_specification" : [
{
"parentURI" : "starts /sandbox/"
}
]
}
}
This validation schema will be evaluated against all objects that match the scope specification.
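How a server might decide whether an object falls within such a scope can be sketched as follows. This is a simplified illustration under several assumptions that go beyond the text: only the "starts" and "==" operators are handled, only top-level string fields are matched, the entries of the array are combined with a logical OR, and the fields within one entry are combined with a logical AND. The function names are hypothetical.

```python
def field_matches(expression: str, actual: str) -> bool:
    """Evaluate one "<operator> <operand>" expression against a field value."""
    operator, _, operand = expression.partition(" ")
    if operator == "starts":
        return actual.startswith(operand)
    if operator == "==":
        return actual == operand
    raise ValueError(f"operator not handled in this sketch: {operator!r}")


def in_scope(scope_specification: list, cdmi_object: dict) -> bool:
    """Return True if the object matches the scope specification."""
    if not scope_specification:
        return True  # an empty JSON array matches all objects
    return any(
        all(
            field in cdmi_object and field_matches(expression, cdmi_object[field])
            for field, expression in clause.items()
        )
        for clause in scope_specification
    )


scope = [{"parentURI": "starts /sandbox/"}]
print(in_scope(scope, {"parentURI": "/sandbox/"}))
print(in_scope(scope, {"parentURI": "/archive/"}))
```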
EXAMPLE 3: A CDMI object that acts as a validator against all matching objects, and marks objects that don't validate successfully:
{
"metadata": {
"cdmi_validation_schema" : "...",
"cdmi_validation_mark" : "true",
"cdmi_scope_specification" : [
{
"parentURI" : "starts /sandbox/"
}
]
}
}
This validation schema will be evaluated against all objects that match the scope specification, and each object will be marked with an indication of the validation result, rather than the create or update operation being allowed or denied.
Example of validation pass:
{
"objectType" : "application/cdmi-object",
"objectID" : "00007ED90010D891022876A8DE0BC0FD",
"objectName" : "myDataObject.txt",
"parentURI" : "/sandbox/",
"parentID" : "00007E7F00102E230ED82694DAA975D2",
"domainURI" : "/cdmi_domains/MyDomain/",
"capabilitiesURI" : "/cdmi_capabilities/dataobject/",
"completionStatus" : "Complete",
"mimetype" : "text/plain",
"metadata": {
"cdmi_size" : "37",
"cdmi_validation_schema_provided" : [
"/cdmi_objectid/00007E7F00102E230ED82694DAA975D2"
],
"cdmi_validation_result_provided" : [
"passed"
]
}
}
Example of validation failure:
{
"objectType" : "application/cdmi-object",
"objectID" : "00007ED90010D891022876A8DE0BC0FD",
"objectName" : "myDataObject.txt",
"parentURI" : "/sandbox/",
"parentID" : "00007E7F00102E230ED82694DAA975D2",
"domainURI" : "/cdmi_domains/MyDomain/",
"capabilitiesURI" : "/cdmi_capabilities/dataobject/",
"completionStatus" : "Complete",
"mimetype" : "text/plain",
"metadata": {
"cdmi_size" : "37",
"cdmi_validation_schema_provided" : [
"/cdmi_objectid/00007E7F00102E230ED82694DAA975D2"
],
"cdmi_validation_result_provided" : [
"failed"
],
"cdmi_validation_result_details_provided" : [
"<schema standard-specific output>"
]
}
}
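A client reading these annotations could pair each result with the schema that produced it. The sketch below assumes that cdmi_validation_schema_provided and cdmi_validation_result_provided are parallel arrays, so that the entry at a given index in one corresponds to the entry at the same index in the other; the text implies this but does not state it. The function name failed_validations is hypothetical.

```python
def failed_validations(cdmi_object: dict) -> list:
    """Return the schema references whose validation result is "failed"."""
    metadata = cdmi_object.get("metadata", {})
    schemas = metadata.get("cdmi_validation_schema_provided", [])
    results = metadata.get("cdmi_validation_result_provided", [])
    # Assumption: the two arrays are index-aligned.
    return [schema for schema, result in zip(schemas, results) if result == "failed"]


# Metadata taken from the failure example above.
failed_object = {
    "metadata": {
        "cdmi_validation_schema_provided": [
            "/cdmi_objectid/00007E7F00102E230ED82694DAA975D2"
        ],
        "cdmi_validation_result_provided": ["failed"],
    }
}
print(failed_validations(failed_object))
```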
Should you be able to specify separate creation and modification schemas?
Need to add mechanism that validator specifies if the create/modify operation is failed at the HTTP level, or annotated with metadata.
Need to include an example of how to not apply an existing validation to a bulk update without having to delete the validator (e.g. normal transactions continue to be validated, but creates with a metadata flag aren’t validated because we “know” they are already validated).
Media types for schemas seem to be missing. Should we be requesting these be added to the IANA registry? https://www.iana.org/assignments/media-types/media-types.xhtml
application/schema+xml application/json+xml See https://github.com/orgs/json-schema-org/discussions/198
Do we need containers, where the contents of the container is based on a query scope? E.g. show me all jobs?
Provide some guidance around the two ways of allowing validators to be created (mirroring what is currently done for queries):
- Specify, via capabilities, a container where validators are allowed to be created.
- Allow validators to be created inline in the namespace in any location.
If a validator matches against external objects (e.g. via path) and that path is deleted, the validator will not match against any objects.
Queries have been implemented in-namespace.
But most implementations of jobs are in a dedicated namespace. I would expect the same for validators and transforms.
We should write some implementation guidance for special objects (query, jobs, validators, etc). We may want to provide some guidance for how to map these into jobs.
Also should write about using CDMI queues as source points and sink points for streaming processing, e.g. Kafka, mqtt, etc. Those require exports.
Add note about using query queues to have a running list of all objects that fail validation. Provide example.
E.g. Web form submit to container, validator checks form against schema, query queue finds all objects where validation fails, export on query queue to Kafka, Kafka endpoint sends e-mail telling submitter to fix their submission.