Add data-classification.md extension #1317

Open
wants to merge 6 commits into
base: main

rob-sessink

Provides an extension where an event source can annotate an event with
information around data classification of an event and its payload. CloudEvents
may contain payload which is subjected to data protection regulations like GDPR
or HIPAA. For intermediaries and consumers knowing how event payload is
classified enables compliant processing of an event.

Adds an extension with attributes:

  • dataclassification (Required). Data classification level of an event and
    payload within the context of a data protection regulation.
  • dataregulation (Optional). Applicable data protection regulation.
  • datacategory (Optional). Data category of the event payload within the
    context of data classification and data protection regulation.
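
As a rough illustration of the three attributes listed above, the sketch below builds a CloudEvent as a plain Python dictionary and checks that the required `dataclassification` attribute is present. The `type`, `source`, `id`, and `datacategory` values and the helper name `missing_required` are made up for the example; only the three extension attribute names come from this PR.

```python
# Hypothetical event; only the three data* attribute names come from the PR.
event = {
    "specversion": "1.0",
    "type": "com.example.patient.updated",  # made-up type
    "source": "/example/records",           # made-up source
    "id": "A234-1234-1234",                 # made-up id
    "dataclassification": "restricted",     # required by the extension
    "dataregulation": "GDPR",               # optional
    "datacategory": "sensitive",            # optional; illustrative value
}

REQUIRED = ("dataclassification",)


def missing_required(ev: dict) -> list[str]:
    """Return the required extension attributes that are absent or empty."""
    return [name for name in REQUIRED if not ev.get(name)]


print(missing_required(event))  # []
```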

`confidential`, `restricted`.
- Constraints:
- REQUIRED
- SHOULD be applicable to data protection regulation.
Collaborator

Can you elaborate on what this "SHOULD" means? What does someone need to do (from a coding perspective) to adhere to this "SHOULD"?

Author

The SHOULD statement is merely meant as an indication to event producers that the data classification label should originate from the applicable data regulation. But maybe this is stating the obvious, and it is not relevant from a coding perspective. Since it is already stated in the description, it does not add value. I will remove it.

`datacategory` attributes MAY be set to provide additional details on the
classification context.

Intermediaries and consumers SHOULD take these attributes into account and act
Collaborator

I wonder if this "SHOULD" should be a "MUST" instead? Should a consumer reject a request if it can't meet the data regulation requirements? Are clients expecting some kind of guarantee? Meaning, a non-error means "yup, got it and it'll be protected appropriately". Although, extensions can be ignored... maybe it would need to be worded like: "If an implementation supports this extension, then it MUST reject the event if it can not adhere to the requirements of the specified data classification attributes" ??

Contributor

This raises an interesting possibility, which is too late for v1 but could be interesting in a future version: if an event could say "consumers/intermediaries must understand extensions x, y and z, and must otherwise reject/ignore the event" then we could be stricter. (So that would be an attribute that's part of the main spec, but the values of which would be names of extension attributes.)

Author
The reason will be displayed to describe this comment to others. Learn more.

Yes, changing this section to be more prescriptive towards consumers is warranted. When an implementation supports this extension, an event MUST be handled in a compliant manner or otherwise MUST be rejected/ignored.

I will adjust the phrasing.

Collaborator

duglin commented Nov 13, 2024

Can you update the README in the "extensions" dir too?

- Type: `String`
- Description: Data classification level for the event payload within the
context of a `dataregulation`. Typical labels are: `public`, `internal`,
`confidential`, `restricted`.
Collaborator

I suspect these values are probably defined by the data regulations being adhered to, but since dataregulation is optional, should this spec define some recommended values for cases where it's missing to provide some consistency?

Author

Yes, I feel that is a good approach. I did not want to make the dataregulation attribute required, as I feel it is supporting information and not strictly necessary for processing. My intent is that usage of this extension should be as light as possible, meaning as few required attributes as possible.

What do you think of:

Description: Data classification level for the event payload within the context of a dataregulation. In a situation where dataregulation is undefined, recommended labels are: public, internal, confidential, or restricted.
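
One possible reading of the proposed wording, sketched in Python: when `dataregulation` is absent, a producer-side check could flag labels outside the four recommended ones. Because the labels are only recommended, the hypothetical helper `label_is_recommended` returns a boolean (for logging, say) rather than raising an error. Treating a present `dataregulation` as "the label is defined elsewhere" is an assumption of this sketch, not something the PR states.

```python
from typing import Optional

RECOMMENDED_LABELS = {"public", "internal", "confidential", "restricted"}


def label_is_recommended(dataclassification: str,
                         dataregulation: Optional[str] = None) -> bool:
    """True if the label needs no further scrutiny under this sketch's rules."""
    if dataregulation:
        # Assumption: the regulation defines its own labels, so no check here.
        return True
    return dataclassification in RECOMMENDED_LABELS
```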

@@ -0,0 +1,89 @@
# Data Classification Extension

CloudEvents might contain payload which is subjected to data protection
Collaborator

duglin commented Nov 20, 2024

s/payload/payloads/ & s/is/are/
or
s/payload/a payload/
but I prefer the former

CloudEvents might contain payload which is subjected to data protection
regulations like GDPR or HIPAA. For intermediaries and consumers knowing how
event payload is classified, which data protection regulation applies and how
payload is categorized, enables compliant processing of an event.
Collaborator

s/payload is/payloads are/

- Description: Data classification level for the event payload within the
context of a `dataregulation`. In situations where `dataregulation` is
undefined or the data protection regulation does not define any labels, then
recommended labels are: `public`, `internal`, `confidential`, or
Collaborator

s/recommended/RECOMMENDED/

For example: `GDPR`, `HIPAA`, `PCI-DSS`, `ISO-27001`, `NIST-800-53`, `CCPA`.
- Constraints:
- OPTIONAL
- if present, MUST be a non-empty string
Collaborator

anal but... since it's a string with commas that need to be parsed, we may want to add something like "leading and trailing spaces around each entry MUST be ignored. Spaces within an entry MAY exist but MUST be reduced down to a single space for comparison purposes". Are spaces allowed in the entries?

Correct, spaces are not allowed.
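
To make the exchange above concrete, here is a minimal Python sketch of parsing `dataregulation` under the rules as discussed: a non-empty, comma-delimited string whose entries contain no spaces. The function name `parse_dataregulation`, the choice to raise `ValueError`, and the rejection of empty entries are assumptions; the PR does not say how an invalid value should be reported.

```python
def parse_dataregulation(value: str) -> list[str]:
    """Split a comma-delimited dataregulation value into its entries."""
    if not value:
        raise ValueError("dataregulation, if present, must be non-empty")
    entries = value.split(",")
    for entry in entries:
        # Assumption: empty entries (e.g. "GDPR,,CCPA") are also invalid.
        if not entry or any(ch.isspace() for ch in entry):
            raise ValueError(f"invalid dataregulation entry: {entry!r}")
    return entries


print(parse_dataregulation("GDPR,HIPAA"))  # ['GDPR', 'HIPAA']
```

Rejecting whitespace outright, instead of trimming it, follows the author's answer that spaces are not allowed; a more lenient consumer could trim before comparing.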

Examples where data classification of events can be useful are:

- When an event contains PII or restricted information and therefore processing
by intermediaries or consumers MUST adhere to certain policies. For example
Collaborator

s/MUST/need to/ since this isn't a normative section, it's just examples.

Collaborator

duglin commented Nov 20, 2024

Not an expert in this space but it LGTM with the minor edits I just commented on.
Thanks for the work on this!

Collaborator

duglin commented Nov 20, 2024

@rob-sessink while not 100% necessary, can you rebase this on the latest 'main' branch so that the tests will run successfully for you?

Rob Sessink added 4 commits November 20, 2024 16:42
…README.md and usage of MUST keyword in example use case

-

Signed-off-by: Rob Sessink <[email protected]>
…bels, remove 'applicability constraints', extend usage section.

-

Signed-off-by: Rob Sessink <[email protected]>
Collaborator

duglin commented Nov 20, 2024

woo hoo - tests again! thanks for the rebase.
still LGTM

Ping @jskeet for another look

Signed-off-by: Rob Sessink <[email protected]>

- Type: `String`
- Description: A comma-delimited list of applicable data protection regulations.
For example: `GDPR`, `HIPAA`, `PCI-DSS`, `ISO-27001`, `NIST-800-53`, `CCPA`.
Contributor

I realize this potentially goes down a rabbit hole of trying to maintain catalogs, but is there value in formalizing some of the regulation codes or referencing some well-known external catalog (if one exists)?

In addition, does the applicability of some of these regulations vary by jurisdiction? If so, does that need to be represented in some fashion?

Contributor

jskeet left a comment

I think Jem's concerns are reasonable, but I don't know an appropriate resolution. (It may be that there's already a standards body defining these.)

When an implementation supports this extension, then intermediaries and
consumers MUST take these attributes into account and act accordingly to data
regulations and/or internal policies in processing the event and payload. If
intermediaries or consumers cannot meet such requirements, they MUST reject or
Contributor

What should a client do if they "support" this extension, but see values they don't know about (e.g. a new dataregulation value)? We may want to adjust this to say "if you don't know you can meet requirements, you should assume you can't".
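
The "assume you can't" suggestion amounts to failing closed. A minimal Python sketch of that idea follows, assuming a consumer keeps an explicit set of (regulation, classification) pairs it knows it can handle. The names `SUPPORTED` and `can_process`, and the pairs themselves, are hypothetical and not part of the PR.

```python
# Hypothetical policy table: the combinations this consumer is known to handle.
SUPPORTED = {
    ("GDPR", "confidential"),
    ("GDPR", "restricted"),
    (None, "public"),  # None: no dataregulation set on the event
}


def can_process(event: dict) -> bool:
    """Fail closed: anything not explicitly supported is rejected."""
    classification = event.get("dataclassification")
    raw = event.get("dataregulation")
    regulations = raw.split(",") if raw else [None]
    # Every listed regulation must be one the consumer knows it can satisfy.
    return all((reg, classification) in SUPPORTED for reg in regulations)
```

With this shape, a new or unrecognized `dataregulation` value needs no special case: it is simply absent from the table, so the event is rejected.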

5 participants