Stix Difficulties: Generate Object ID from hashed Object contents #62

Open
terrymacdonald opened this issue Dec 2, 2015 · 3 comments

@terrymacdonald

PROBLEM

A few people have mentioned that they would like to create Object IDs from a hash of the Object contents. The argument is that this would help when deduplicating CybOX Objects: if multiple identical Objects were detected, their content (and therefore their IDs) would be the same, so duplicates would be easy to spot.
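A minimal sketch of the deduplication idea, assuming objects are plain JSON-like dictionaries. The `content_id` and `deduplicate` helpers, the `object-` prefix, the field names, and the choice of SHA-256 are all illustrative assumptions, not anything defined by STIX or CybOX.

```python
import hashlib
import json


def content_id(obj: dict) -> str:
    """Return a hypothetical content-derived ID for a JSON-like object."""
    # Sorted keys and fixed separators make equal dicts serialize identically.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return "object-" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(objects: list) -> dict:
    """Keep the first object seen for each content-derived ID."""
    unique = {}
    for obj in objects:
        unique.setdefault(content_id(obj), obj)
    return unique


# Three sightings; the first two carry the same content in a different key order.
sightings = [
    {"type": "FileObject", "file_name": "evil.exe", "size_in_bytes": 1024},
    {"size_in_bytes": 1024, "file_name": "evil.exe", "type": "FileObject"},
    {"type": "FileObject", "file_name": "other.exe", "size_in_bytes": 2048},
]
unique = deduplicate(sightings)
print(len(unique))
```

With this scheme, two producers that observe the same content independently arrive at the same ID without coordinating.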

POTENTIAL ANSWER

Current list consensus seems to be that this should be permitted as one way of generating the ID, but that it shouldn’t be mandated as the only way. It was posited that some lower-powered devices may not have enough processing power to generate a hash, and therefore mandating hash generation of the data would exclude them.

I believe that this should be mandated, as it provides a quick way of determining whether the content was inadvertently modified in transit. As the hash is not an HMAC, it does not detect malicious tampering (although this change would allow that to be supported in the future).
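The difference between a plain hash and an HMAC can be sketched as follows. This is only an illustration: the shared key, the sample payloads, and the helper names `plain_id` and `keyed_tag` are assumptions made up for the example.

```python
import hashlib
import hmac


def plain_id(content: bytes) -> str:
    """Unkeyed digest: anyone can recompute it for any content."""
    return hashlib.sha256(content).hexdigest()


def keyed_tag(key: bytes, content: bytes) -> str:
    """HMAC-SHA256 tag: only holders of the key can produce a valid one."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


shared_key = b"example-shared-secret"  # assumption: sender and receiver share this
original = b'{"type":"AddressObject","value":"198.51.100.7"}'
tampered = b'{"type":"AddressObject","value":"203.0.113.9"}'

# Accidental corruption: the content changes but the ID sent with it does not,
# so recomputing the hash exposes the mismatch.
accidental_change_detected = plain_id(tampered) != plain_id(original)

# Deliberate tampering: the attacker also replaces the ID, so a plain-hash
# check on the receiving side passes.
forged_id = plain_id(tampered)
plain_check_passes = hmac.compare_digest(forged_id, plain_id(tampered))

# With an HMAC the attacker lacks the key, so the forged tag does not verify.
forged_tag = keyed_tag(b"attacker-guess", tampered)
hmac_check_passes = hmac.compare_digest(forged_tag, keyed_tag(shared_key, tampered))
```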

@jmgnc

jmgnc commented Dec 3, 2015

Are you saying that it should be mandated that the object id be a hash of the data?

I disagree that embedded devices don't have enough power to generate a hash. If the device is sub-100MHz, it is unlikely to be speaking STIX directly, and even if it is, hashing the object contents isn't that expensive.

The hardest part of this is defining the correct serialization method for the data being hashed (due to whitespace issues, etc.), so that the hash does not change when the data gets reformatted.
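To illustrate the serialization problem: the two JSON documents below mean the same thing but differ in whitespace and key order, so hashing the raw text gives different results. Parsing and re-serializing in one fixed form is one possible workaround. The sorted-keys, compact-separator form used here is an assumption for the example, not a canonicalization scheme chosen by this project, and it ignores harder cases such as number formatting and Unicode normalization.

```python
import hashlib
import json

# Same meaning, different formatting and key order.
raw_a = '{"type": "AddressObject", "value": "198.51.100.7"}'
raw_b = '{\n  "value": "198.51.100.7",\n  "type": "AddressObject"\n}'


def raw_hash(text: str) -> str:
    """Hash the serialized text exactly as received."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonical_hash(text: str) -> str:
    """Parse, then re-serialize in one fixed form before hashing."""
    parsed = json.loads(text)
    canonical = json.dumps(
        parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(raw_hash(raw_a) == raw_hash(raw_b))              # False
print(canonical_hash(raw_a) == canonical_hash(raw_b))  # True
```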

Requiring the object id be the hash of the contents seems to break the ability to update some of the higher level objects w/o having to go regenerate all the lower level objects, which could create a massive cascading issue of updates.
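One way to read the cascading concern: if objects refer to each other by content-derived ID, then changing a referenced object changes its ID, and every object that refers to it (directly or indirectly) has to be reissued with a new ID as well. The three-level chain below is a made-up illustration; the object types, the `*_ref` field names, and the `content_id` helper are assumptions and do not model real STIX or CybOX structures.

```python
import hashlib
import json


def content_id(obj: dict) -> str:
    """Hypothetical content-derived ID (truncated SHA-256 for readability)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def build_chain(address: str) -> tuple:
    """Build observable -> indicator -> report, each referring to the previous by ID."""
    observable = {"type": "observable", "value": address}
    observable_id = content_id(observable)
    indicator = {"type": "indicator", "observable_ref": observable_id}
    indicator_id = content_id(indicator)
    report = {"type": "report", "indicator_ref": indicator_id}
    report_id = content_id(report)
    return observable_id, indicator_id, report_id


before = build_chain("198.51.100.7")
after = build_chain("198.51.100.8")  # only the referenced value changed

# Every ID in the chain changes, not just the one for the edited object.
changed = [old != new for old, new in zip(before, after)]
print(changed)  # [True, True, True]
```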

@terrymacdonald

Hi John,

It only breaks updates of objects if we still allow use of the Incremental
Update mechanism. The Incremental Update mechanism requires the Object IDs
to stay the same, and the timestamp to change. I've proposed in Issue #64
that we only allow Major Updates, and stop using Incremental Updates. This
will ensure that all updates explicitly relate themselves to the previous
version of the object, removing all ambiguity, and allowing us to generate
the ID based on the content. This will also allow us to use the Object ID
as a form of checksum to make sure the data within the Object maps to the
ID it has. It will mean that no one can modify the data within an Object in
transit without the Object ID ceasing to match it.

In other words, if every shared object is immutable (because its ID is derived
from its content) then we can avoid some of the problems we currently have.
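A rough sketch of what immutable, content-addressed objects with explicit version links could look like. The `previous_id` field and the `seal`, `new_version`, and `verify` helpers are hypothetical names invented for this example; the actual mechanism would be whatever the Major Update proposal defines.

```python
import hashlib
import json


def content_id(body: dict) -> str:
    """Hypothetical content-derived ID over everything except the ID itself."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def seal(body: dict) -> dict:
    """Return a copy of the body with its content-derived ID attached."""
    sealed = dict(body)
    sealed["id"] = content_id(body)
    return sealed


def new_version(previous: dict, **changes) -> dict:
    """Issue a new object that explicitly points at the version it replaces."""
    body = {key: value for key, value in previous.items() if key != "id"}
    body.update(changes)
    body["previous_id"] = previous["id"]  # hypothetical link field
    return seal(body)


def verify(obj: dict) -> bool:
    """Check that the ID still matches the rest of the object."""
    body = {key: value for key, value in obj.items() if key != "id"}
    return obj["id"] == content_id(body)


v1 = seal({"type": "indicator", "description": "Suspicious address"})
v2 = new_version(v1, description="Confirmed malicious address")

# Changing a field without reissuing the ID is detectable by the receiver.
altered = dict(v2, description="Benign address")
print(verify(v1), verify(v2), verify(altered))  # True True False
```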

Cheers
Terry MacDonald

@jmgnc

jmgnc commented Dec 11, 2015

There is still the question of how to specify the format of the data that gets hashed. We'd have to normalize timestamps to UTC, or objects that differ only by time zone offset (but represent the same UTC time) would have different hashes. And you can't just feed JSON or XML into a hash function, because both formats allow whitespace in locations where it does not affect the meaning. Specifying and implementing this is a huge pain.

For example, look at the XML signing tools to see how painful it is to get hashing right.
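The timestamp part of the problem can be sketched like this. The `normalize_timestamp` helper and its fixed microsecond-precision output format are assumptions for illustration; a real specification would also have to decide how to treat timestamps without an offset and what precision to keep.

```python
from datetime import datetime, timezone


def normalize_timestamp(text: str) -> str:
    """Convert an ISO 8601 timestamp with an explicit offset to a fixed UTC form."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Without an offset the instant is ambiguous, so refuse to guess.
        raise ValueError("timestamp must carry a UTC offset")
    # Fixed precision keeps the output width the same for every input.
    return parsed.astimezone(timezone.utc).isoformat(timespec="microseconds")


# The same instant written with two different offsets.
auckland = "2015-12-11T09:30:00+13:00"
new_york = "2015-12-10T15:30:00-05:00"

print(auckland == new_york)  # False
print(normalize_timestamp(auckland) == normalize_timestamp(new_york))  # True
```

Hashing the normalized string rather than the original one would give both objects the same timestamp contribution to the hash.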
