Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Official GPL-2.0 license text not recognized #245

Open
sdheh opened this issue Jul 22, 2024 · 8 comments
Open

Official GPL-2.0 license text not recognized #245

sdheh opened this issue Jul 22, 2024 · 8 comments

Comments

@sdheh
Copy link

sdheh commented Jul 22, 2024

For the license text https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt I get the following:

System.out.println(Arrays.toString(LicenseCompareHelper.matchingStandardLicenseIds(licenseText)));
System.out.println(LicenseCompareHelper.matchingStandardLicenseIdsWithinText(licenseText));

outputs

[]
[GPL-2.0, GPL-2.0-or-later, GPL-2.0-only]

The two outputs should be the same since the GPL-2.0 license spans the whole file.
Tested with version 1.1.11
This problem is similar to 217

@pmonks
Copy link
Collaborator

pmonks commented Jul 22, 2024

PR #236 includes unit tests that reproduce this problem, albeit with other license texts - the issue is not limited to just GPL-2.0, or indeed even just GPL family licenses.

See also #234.

@sdheh
Copy link
Author

sdheh commented Jul 23, 2024

I figured out a problem that could explain this case. I think the tokenization does not work properly.
Example:

String license1 = "<one";
String template1 = "<<beginOptional>><<<endOptional>>one";
String license2 = "< one";
System.out.println("template1, license1: " + LicenseCompareHelper.isTextMatchingTemplate(template1, license1).getDifferenceMessage());
System.out.println("template1, license2: " + LicenseCompareHelper.isTextMatchingTemplate(template1, license2).getDifferenceMessage());

Returns

template1, license1: Normal text of license does not match at end of text when comparing to template text "one
".  Last optional text was not found due to the optional difference: 
	Normal text of license does not match at end of text when comparing to template text "<"
template1, license2: No difference found

When I debug I see that for the first case in org.spdx.utility.compare.CompareTemplateOutputHandler.compareText the matchTokens parameter is ["<one"]. I think it should instead be ["<", "one"] like in the second case.

Also if I remove all < and > from the https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt text (
gpl-2.0-removed-angle-brackets.txt
) or if I add a space before and after every < and > (
gpl-2.0-spaces-between-angle-brackets-and-text.txt
) I get the following result for the code in the issue description:

[GPL-2.0, GPL-2.0-only]
[GPL-2.0, GPL-2.0-or-later, GPL-2.0-only]

@goneall
Copy link
Member

goneall commented Jul 23, 2024

Thanks @sdheh for the analysis! I agree, the tokenization is the issue. I'm still working on the 3.0 update, so I won't have much time over the next week or so to look for a fix, but if you want to create a pull request I can review / merge.

@douglasclarke
Copy link

I ran into this issue as well and believe it affects several templates where the optional text would not be tokenized separately. For the above case I did experiment with the following change but am still learning the code base and am unclear on the total impact:

+++ b/src/main/java/org/spdx/utility/compare/CompareTemplateOutputHandler.java
@@ -247,7 +247,13 @@ public class CompareTemplateOutputHandler implements
                                                nextToken = sub.match(matchTokens, nextToken, endToken, originalText, 
                                                                optionalDifference, tokenToLocation);
                                                if (nextToken < 0) {
-                                                       if (!ignoreOptionalDifferences) {
+                                                       // Handle optional text tokenized with license text to handle single char optional text
+                                                       String optionalText = sub.getText();
+                                                       String matchText = LicenseTextHelper.getTokenAt(matchTokens, startToken);
+                                                       if (matchText != null && optionalText != null && matchText.length() > optionalText.length() && matchText.startsWith(optionalText)) {
+                                                               // remove optional text from start of match token
+                                                               matchTokens[startToken] = matchText.substring(optionalText.length());
+                                                       } else if (!ignoreOptionalDifferences) {
                                                                setLastOptionalDifference(optionalDifference);
                                                        }                                       
                                                        return startToken;      // the optional text didn't match, just return the start token

Modifying the parsed tokens feels wrong but could not sort out an easy way to adjust the tokenization of the license text being matched to the template.

@pmonks
Copy link
Collaborator

pmonks commented Oct 2, 2024

@douglasclarke this was fixed in PR #249, which is merged but awaiting release.

@douglasclarke
Copy link

@pmonks thanks. I believe I am testing with the latest code on master and still failing to get this case to work:

		String template = "<<beginOptional>><<<endOptional>>one";
		String sample = "<one";
		
		DifferenceDescription diff = LicenseCompareHelper.isTextMatchingTemplate(template, sample);

@goneall
Copy link
Member

goneall commented Oct 2, 2024

@pmonks thanks. I believe I am testing with the latest code on master and still failing to get this case to work:

		String template = "<<beginOptional>><<<endOptional>>one";
		String sample = "<one";
		
		DifferenceDescription diff = LicenseCompareHelper.isTextMatchingTemplate(template, sample);

Thanks @douglasclarke for reporting this along with the specific example.

This may be a separate issue - let's open a new issue and close this one as resolved since the prior examples seem to be fixed with the latest.

@pmonks
Copy link
Collaborator

pmonks commented Oct 7, 2024

Just a head's up that this license list XML issue will also cause matching of (recent) GPL-2.0 texts to fail, but not because of anything wrong in Spdx-Java-Library. I only mention it as sometimes it can be time-consuming to discern whether a matching bug is in the code, the templates, or (as in this case) a breaking change to the official license text by the license publisher.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants