Use Cases: Files #34

Open
coderpatros opened this issue Jun 3, 2021 · 7 comments

Comments

@coderpatros
Member

We should provide guidance and an example of describing components down to the file level.

In some cases it is possible to determine a file version, e.g. for DLLs. But for a lot of file types this isn't possible, and I suggest using a hash as the version for those files.

@coderpatros
Member Author

OK, I'm currently suggesting guidance along these lines:

If including files that have been generated as part of a build process, like DLLs, and a version can be reliably determined, the recommended option is to use the actual file version.

For other files, where there is no reliable way to determine the version, the recommended approach is to use 0.0.0-SHORTHASH, where SHORTHASH is a truncated SHA-1 hash, as used by Git.

@nscuro
Member

nscuro commented Jun 4, 2021

Here's an example SBOM that I generated with a prototype of cyclonedx-gomod, using the 0.0.0-SHORTHASH versioning scheme. The hash itself is SHA-1, and I took the first 12 characters.
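
For illustration, the scheme described here can be sketched in a few lines of Python. This is only a sketch under assumptions, not the cyclonedx-gomod implementation: the function name `short_hash_version` and its parameters are hypothetical, and it assumes the SHA-1 is taken over the raw file bytes. Note that this differs from Git's object ID for the same file, because Git hashes a `blob <size>` header together with the content.

```python
import hashlib
from typing import Optional


def short_hash_version(path: str, known_version: Optional[str] = None, length: int = 12) -> str:
    """Return the real file version if one is known, else 0.0.0-SHORTHASH.

    SHORTHASH is the first `length` hex characters of the SHA-1 of the
    raw file content (an assumption; the thread does not pin this down).
    """
    if known_version:
        # A reliably determined version (e.g. from a DLL) takes precedence.
        return known_version
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return "0.0.0-" + digest.hexdigest()[:length]
```

Calling `short_hash_version("go.mod")` would yield a string of the form `0.0.0-` followed by 12 hex characters, while passing `known_version` returns that value unchanged.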


We should probably include guidance on the length of those short hashes as well. Git doesn't use a fixed length, so "just do it like Git" may be ambiguous:

> Git can figure out a short, unique abbreviation for your SHA-1 values. If you pass --abbrev-commit to the git log command, the output will use shorter values but keep them unique; it defaults to using seven characters but makes them longer if necessary to keep the SHA-1 unambiguous

See https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

I don't think we have to care about the short hashes of different files colliding, like Git has to, because the version alone does not define the identity of a file.

@coderpatros
Member Author

Yeah, collision is also less likely on the same file that has been modified. I wonder what the length should be though? 12 seems reasonable to me.
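
As a rough sanity check on the length question, the birthday approximation gives a feel for how likely an accidental collision is among a number of distinct file contents. The sketch below assumes truncated hashes behave like uniformly random values; the function name `collision_probability` is hypothetical and not part of any CycloneDX tooling.

```python
import math


def collision_probability(num_files: int, hex_chars: int) -> float:
    """Approximate chance that at least two of `num_files` distinct contents
    share the same truncated hash of `hex_chars` hex characters.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2N)),
    where N = 16 ** hex_chars is the number of possible short hashes.
    """
    space = 16 ** hex_chars
    exponent = -num_files * (num_files - 1) / (2 * space)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(exponent)


for length in (7, 12):
    print(length, collision_probability(10_000, length))
```

Under these assumptions, 12 characters (48 bits) keeps the probability far below one in a million for ten thousand files, whereas 7 characters does not.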

@stevespringett
Member

Do we have agreement on recommending 0.0.0-SHORTHASH?

Do we have agreement on the use of SHA-1?

Should we recommend SHA-256 instead?

Do we have agreement on recommending use of the first 12 characters of the hash?

@coderpatros
Member Author

Another idea: should 0.0.0-BUILDNUMBER be a recommended option for commercial closed-source software?

The 0.0.0-SHORTHASH option is really good for recreating a BOM for open source software. But that is, perhaps, not as relevant for commercial software.

Regarding SHA-1 vs SHA-256: the version string shouldn't be used for integrity use cases, and the hash is truncated anyway, so I don't think the strength of the hash is as relevant here. But it would be good to align with existing identification use cases, like the short hash in Git. Are there other existing ecosystem scenarios similar to this that we should consider?

@nscuro
Member

nscuro commented Jun 7, 2021

> Another idea: should 0.0.0-BUILDNUMBER be a recommended option for commercial closed-source software?

Curious, would the build number be something that vendors typically include in their BOM? While sometimes you see build numbers being part of the version, most of the time vendors don't expose that info IME.

> The 0.0.0-SHORTHASH option is really good for recreating a BOM for open source software.

Not only is it fairly easy to recreate, it's also obvious what it means, especially if the full hashes can be found in the component's hashes field. With things like build numbers, it's not obvious unless the vendor explains somewhere what their version suffix means.
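
To make the relationship between the short version and the hashes field concrete, here is a minimal sketch of a file-level component as a CycloneDX-style JSON object. The helper `file_component` and the sample file name are hypothetical; the `type`, `name`, `version` and `hashes` (`alg`/`content`) fields follow the CycloneDX component schema as I understand it, so check the spec before relying on this.

```python
import hashlib
import json


def file_component(name: str, content: bytes, length: int = 12) -> dict:
    """Build a file component whose version is derived from its SHA-1."""
    full = hashlib.sha1(content).hexdigest()
    return {
        "type": "file",
        "name": name,
        # Short, human-readable version derived from the full hash.
        "version": f"0.0.0-{full[:length]}",
        # Full hash, so the suffix in the version can be traced back to it.
        "hashes": [{"alg": "SHA-1", "content": full}],
    }


print(json.dumps(file_component("README.md", b"abc"), indent=2))
```

A reader of the BOM can then see that the version suffix is simply a prefix of the SHA-1 listed under hashes.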

Additionally, with build numbers, if you were to diff two BOMs from different builds, you'd get a lot of changes without the files actually being different.
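
The diffing point can be illustrated with a small sketch. The data below is made up, and `changed_components` is a hypothetical helper that compares two name-to-version mappings; it is not how any CycloneDX tool diffs BOMs.

```python
from typing import Dict, Set


def changed_components(old: Dict[str, str], new: Dict[str, str]) -> Set[str]:
    """Names that were added, removed, or whose version string differs."""
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}


# Hypothetical builds in which only main.go actually changed.
hash_build_1 = {"LICENSE": "0.0.0-aaaaaaaaaaaa", "main.go": "0.0.0-bbbbbbbbbbbb"}
hash_build_2 = {"LICENSE": "0.0.0-aaaaaaaaaaaa", "main.go": "0.0.0-cccccccccccc"}
num_build_1 = {"LICENSE": "0.0.0-101", "main.go": "0.0.0-101"}
num_build_2 = {"LICENSE": "0.0.0-102", "main.go": "0.0.0-102"}

print(sorted(changed_components(hash_build_1, hash_build_2)))  # ['main.go']
print(sorted(changed_components(num_build_1, num_build_2)))    # ['LICENSE', 'main.go']
```

With hash-based versions only the modified file shows up in the diff, whereas a build-number suffix marks every file as changed.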

After all, everyone is and should be free to do what they feel is best. But the standard / best practice should be to choose a simple, reproducible and obvious way IMO.

@coderpatros
Member Author

Yeah, I agree @nscuro. Ignore the build number idea.

madpah added a commit to CycloneDX/cyclonedx-python-lib that referenced this issue Oct 11, 2021
3 participants