I spent the last 3 days finding out why the LICENSE.txt file of `javax.activation:javax.activation-api:1.2.0` was reported with ASLv2, CDDL 1.0, and CDDL 2.0 as found licenses.
I have a significant amount of customization, including my own normalizer bundle that does not reuse the built-in one (I will probably create a PR with the improvements at some point), so I initially thought it was something I had broken.
I was even more confused when removing the normalizer rule

```json
{
  "bundleName": "CDDL-1.0",
  "licenseNamePattern": "Common Development and Distribution License( \\(CDDL\\),?)? (version )?(.?\\s?)?1\\.0"
}
```

suddenly made the CDDL 1.0 disappear, even though this rule only has a `licenseNamePattern` and should not look at the license file contents at all.
It turned out that the logic in `com.github.jk1.license.reader.LicenseFilesReader#createFileDetails` is highly questionable imho.
When creating the `LicenseFileDetails` instance, it inspects the text content, and the first matching check wins:

- If the text contains a certain EPLv2 string representation, the file is assumed to be EPLv2.
- If not, but the text literally contains "CDDL", the file is assumed to be CDDL 1.0.
- If not, but the text contains a certain ASLv1.1 string representation, the file is assumed to be ASLv1.1.
- If not, but the text contains a certain ASLv2 string representation, the file is assumed to be ASLv2.
- If not, but the text contains another EPLv2 string representation, the file is assumed to be EPLv2, but with slightly different license name and URL values than in the first case.
This is highly problematic per se, as the inspected license files could, for example, contain multiple licenses, and the CDDL case is especially bad.
What happened with the `javax.activation:javax.activation-api:1.2.0` license file is that it did not contain the first (most significant) EPLv2 string, but it did contain a CDDL 1.1 license. As the method just looks for "CDDL", it was recognized as CDDL 1.0.
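To make the first-match-wins behaviour concrete, here is a minimal Java sketch of the cascade as described above. It is not the plugin's code: the marker strings and returned labels are placeholders made up for illustration, and only the order of the checks and the bare `"CDDL"` substring test follow the description.

```java
import java.util.Optional;

/**
 * Hypothetical re-creation of the detection cascade described above.
 * Marker strings and labels are placeholders, NOT the plugin's real values.
 */
final class CascadeSketch {

    // Placeholder markers (assumptions), standing in for the real search strings.
    static final String EPL2_PRIMARY_MARKER = "Eclipse Public License - v 2.0";
    static final String ASL11_MARKER = "Apache License, Version 1.1";
    static final String ASL2_MARKER = "Apache License, Version 2.0";
    static final String EPL2_SECONDARY_MARKER = "EPL-2.0";

    /** Returns the label of the first matching check, or empty if nothing matches. */
    static Optional<String> detect(String text) {
        if (text.contains(EPL2_PRIMARY_MARKER)) {
            return Optional.of("EPL-2.0 (primary)");
        } else if (text.contains("CDDL")) {
            // Any occurrence of "CDDL" lands here, including a CDDL 1.1 text.
            return Optional.of("CDDL-1.0");
        } else if (text.contains(ASL11_MARKER)) {
            return Optional.of("Apache-1.1");
        } else if (text.contains(ASL2_MARKER)) {
            return Optional.of("Apache-2.0");
        } else if (text.contains(EPL2_SECONDARY_MARKER)) {
            // Same license, but different placeholder values than the primary branch.
            return Optional.of("EPL-2.0 (secondary)");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String cddl11 = "COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.1";
        System.out.println(detect(cddl11)); // prints Optional[CDDL-1.0]

        // A file with several licenses still yields only the first match.
        System.out.println(detect(cddl11 + "\n...\nApache License, Version 2.0")); // prints Optional[CDDL-1.0]
    }
}
```

Both calls end up as CDDL 1.0: the version is never checked, and a second license in the same file is never reached.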
The bundle normalizer in `com.github.jk1.license.filter.LicenseBundleNormalizer#normalizeLicenseFileDetails` then passes not only the license text, but also the prematurely "detected" license name and URL to the transformation rules. After the rules are applied, the previous information is discarded and the outcome of the normalizer rules is used instead. So without that rule, no rule matches the wrongly pre-detected CDDL 1.0 license and it is thrown out. With the rule, it survives the bundle normalizer and is wrongly reported by any reporter that uses the licenses detected in license files.
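The following Java 16+ sketch models that interaction and explains why removing a purely name-based rule changed what was found in the file. It is a simplified model, not the plugin's implementation: `FileDetails`, `NameRule`, and `normalize` are hypothetical names, the pre-detected name string is made up, and it assumes a rule matches when the whole name matches its `licenseNamePattern`. Whether the real normalizer uses full or partial matching is not shown here. The pattern itself is the one from the rule quoted earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical model of how pre-detected license file details pass through name-only rules. */
final class NormalizerSketch {

    /** Simplified pre-detected details as handed over by the reader. */
    record FileDetails(String text, String licenseName, String licenseUrl) {}

    /** A rule that only has a bundle name and a license name pattern. */
    record NameRule(String bundleName, Pattern licenseNamePattern) {}

    static List<String> normalize(FileDetails details, List<NameRule> rules) {
        List<String> result = new ArrayList<>();
        for (NameRule rule : rules) {
            // The rule never inspects details.text(); it only sees the pre-detected name.
            // Assumption: full-string match via Matcher.matches().
            if (details.licenseName() != null
                    && rule.licenseNamePattern().matcher(details.licenseName()).matches()) {
                result.add(rule.bundleName());
            }
        }
        // The pre-detected name/URL are discarded; only rule outcomes survive.
        return result;
    }

    public static void main(String[] args) {
        NameRule cddl10 = new NameRule("CDDL-1.0", Pattern.compile(
                "Common Development and Distribution License( \\(CDDL\\),?)? (version )?(.?\\s?)?1\\.0"));

        // Hypothetical pre-detected name for a file that really contains CDDL 1.1.
        FileDetails misdetected = new FileDetails(
                "... CDDL 1.1 license text ...",
                "Common Development and Distribution License 1.0",
                null);

        System.out.println(normalize(misdetected, List.of(cddl10))); // prints [CDDL-1.0]
        System.out.println(normalize(misdetected, List.of()));       // prints []
    }
}
```

With the rule configured, the wrong pre-detected name is "confirmed" and reported. Without it, nothing matches and the entry disappears, even though neither case ever looked at the actual text.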
While the EPLv2, ASLv1.1, and ASLv2 checks in that method are fairly unlikely to produce false positives, the CDDL 1.0 check is quite likely to do so, as we have seen.
My question is why this premature lookup is done at all, except maybe for legacy reasons.
Well, true, the license bundle normalizer has to be configured manually before it does any work.
But especially with license files, which can contain arbitrary text and multiple licenses, not using the bundle normalizer anyway is a major flaw imho.
And the default HTML reports do not render the licenses found in license files at all, so currently not much would be lost by removing these detections entirely.
If they stay, then at least the two EPLv2 checks should be changed to produce a consistent license name and URL,
and the CDDL check needs to be made more specific or removed.
Alternatively, a CDDL 1.1 check with higher precedence (i.e. evaluated before the generic "CDDL" check in the if-block cascade) needs to be added, so that CDDL 1.1 is less likely to be misrecognized as CDDL 1.0, as happens right now.
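A rough Java sketch of what these suggestions could look like if the detections stay. It is only an illustration under stated assumptions. The marker strings are placeholders. The version-specific CDDL regexes are one possible way to make the check "more specific". The SPDX identifiers and URLs are used as example values, not as what the plugin emits. Like any single-result cascade, it still reports only one license per file.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the suggested changes: both EPLv2 branches return the
 * same value, and a CDDL 1.1 check runs before a stricter CDDL 1.0 check.
 * All marker strings are placeholders, not the plugin's real search strings.
 */
final class FixedCascadeSketch {

    record Detected(String spdxId, String url) {}

    // One constant shared by both EPLv2 branches, so their name/URL cannot diverge.
    static final Detected EPL_2_0 = new Detected("EPL-2.0", "https://spdx.org/licenses/EPL-2.0.html");
    static final Detected CDDL_1_1 = new Detected("CDDL-1.1", "https://spdx.org/licenses/CDDL-1.1.html");
    static final Detected CDDL_1_0 = new Detected("CDDL-1.0", "https://spdx.org/licenses/CDDL-1.0.html");
    static final Detected APACHE_1_1 = new Detected("Apache-1.1", "https://spdx.org/licenses/Apache-1.1.html");
    static final Detected APACHE_2_0 = new Detected("Apache-2.0", "https://spdx.org/licenses/Apache-2.0.html");

    // Placeholder markers (assumptions).
    static final String EPL2_PRIMARY_MARKER = "Eclipse Public License - v 2.0";
    static final String EPL2_SECONDARY_MARKER = "EPL-2.0";
    static final String ASL11_MARKER = "Apache License, Version 1.1";
    static final String ASL2_MARKER = "Apache License, Version 2.0";

    // Version-specific CDDL patterns instead of a bare "CDDL" substring test.
    static final Pattern CDDL_1_1_MARKER =
            Pattern.compile("\\(CDDL\\)\\s*,?\\s*Version\\s+1\\.1\\b", Pattern.CASE_INSENSITIVE);
    static final Pattern CDDL_1_0_MARKER =
            Pattern.compile("\\(CDDL\\)\\s*,?\\s*Version\\s+1\\.0\\b", Pattern.CASE_INSENSITIVE);

    static Optional<Detected> detect(String text) {
        if (text.contains(EPL2_PRIMARY_MARKER)) {
            return Optional.of(EPL_2_0);
        }
        // Checked before CDDL 1.0, so CDDL 1.1 takes precedence if a text mentions both.
        if (CDDL_1_1_MARKER.matcher(text).find()) {
            return Optional.of(CDDL_1_1);
        }
        if (CDDL_1_0_MARKER.matcher(text).find()) {
            return Optional.of(CDDL_1_0);
        }
        if (text.contains(ASL11_MARKER)) {
            return Optional.of(APACHE_1_1);
        }
        if (text.contains(ASL2_MARKER)) {
            return Optional.of(APACHE_2_0);
        }
        if (text.contains(EPL2_SECONDARY_MARKER)) {
            return Optional.of(EPL_2_0); // same value as the primary EPLv2 branch
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(detect("COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.1"));
        System.out.println(detect("dual-licensed under CDDL or GPL"));
    }
}
```

With these changes, the `javax.activation` text would come out as CDDL 1.1. A file that merely mentions "CDDL" without a recognizable version header is no longer classified at all, and is left to the normalizer instead.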