Fortify SCA SARIF inaccuracy causing poor GitHub Code Scanning experience #72
@simon-engledew, thanks for reporting this. I'm aware that the output doesn't strictly follow expectations about what kind of data is supplied in the rules. The primary cause is not necessarily in FortifyVulnerabilityExporter itself, but in how Fortify SSC/FoD expose vulnerability data in the REST API that is used by this integration. At the time I originally developed this, generating a separate rule for every finding seemed like the best solution and didn't appear to have any significant impact on user experience. Can you give some specific examples / screenshots of how the current approach is negatively impacting user experience? Based on that, I can research whether there may be better ways to represent rules and results. However, again, this is dependent on the data provided by the SSC/FoD REST API. I don't think the current APIs return generic, issue-agnostic Fortify category descriptions, so if we are to generate issue-agnostic rules, we would need to get those descriptions from somewhere else.
Thanks for the explanation! That totally tracks with what I am seeing. 👍 For context, I work at GitHub, hi! 👋 😅 – we have various internal limits to stop Code Scanning tools from breaking UI elements / internal endpoints, and Fortify SCA is tripping them as it generates so many more rules than anything else 😬 😂. Things don't look too broken, but we are discarding rules to keep pages loading. 🙇 In the short term, if you could drop `fullDescription`/`help`/`properties` from rules and put some or all of that information in the result's `message` field, that would avoid creating so many unique rules. ❤️
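For illustration, a slimmed-down rule along the lines of this suggestion might look something like the sketch below. The rule id, category name, message text, and portal URL are invented for the example; they are not taken from actual FortifyVulnerabilityExporter output.

```json
{
  "version": "2.1.0",
  "runs": [ {
    "tool": {
      "driver": {
        "name": "Fortify SCA",
        "rules": [ {
          "id": "fortify/sql-injection",
          "shortDescription": { "text": "SQL Injection" }
        } ]
      }
    },
    "results": [ {
      "ruleId": "fortify/sql-injection",
      "ruleIndex": 0,
      "message": {
        "text": "User-controlled data from the 'id' request parameter reaches a SQL query in OrderDao.java without validation. See the Fortify portal for full issue details: https://ssc.example.com/issue/12345"
      }
    } ]
  } ]
}
```

Here the rule carries only the generic vulnerability category, so the same rule can be reused for every finding of that type, while everything specific to an individual finding lives in the result's `message.text`.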
Thanks for the additional info. So, if you're discarding rules, I guess users may not see the bottom block (tool/rule id and description) from the screenshot below if we report many issues?

At what point does GitHub start discarding rules?

If I reduce/de-duplicate the rules by removing issue-specific information, the rule IDs for existing issues will obviously change. Will this have any impact on issues/rules reported by previous runs of this utility on an existing repository, other than having a different layout?

As in the example below, the text that's currently displayed in the rule block may be fairly long, so I don't think it would provide a good user experience if we moved all of this information to the results message field, as that's shown inline with code snippets. Does the text block that shows the results message field also show a 'Show more' link if the text is too long, or will it always show the full text?

If there's no text to be shown in the rule block, will GitHub still show that block? I think it would look a bit weird to show an empty rule block.
> So, if you're discarding rules, I guess users may not see the bottom block (tool/rule id and description) from the screenshot below if we report many issues?

Funnily enough, that's not actually a problem! 😂 It's more the ancillary things like API calls and rule selector drop-downs - we also have some internal indexing that's returning truncated results.

> At what point does GitHub start discarding rules?

IIRC we have an internal limit of 1,000. Fortify SCA is on about 300,600. 😂

> Will this have any impact on issues/rules reported by previous runs of this utility on an existing repository, other than having a different layout?

The rule ID does need to line up with the previous analysis in order for the alert to be classified as the same, yeah. 😞

> If there's no text to be shown in the rule block, will GitHub still show that block?

It will show the block with the text "No rule help". Not ideal. 😞 The main problem is the large number of unique SARIF identifiers. 😱 Ultimately it is our responsibility to make this work regardless of what SARIF we receive, but at some point we may have to start cleaning up data or redacting rules. 🤔 It sounds like there isn't much you can do given the output the tool gives you?
@simon-engledew, thanks again for the info. Given that number of 300,600, I guess this means that you're storing rules across repositories rather than per repository? I've been looking in more detail at the data that's being returned by our product REST endpoints:
Ideally, I'd like to keep the SARIF output for both integrations the same, though. Even if we were able to generate issue-agnostic SARIF rules, there's also a lot of issue-specific information that we're currently showing. I'm not sure whether it's viable to move all of this information to the results message field, given that this is shown inline with the code on GitHub. Do you have any suggestions as to how best to handle this? Side note: I'm migrating this functionality to another multi-purpose CLI tool, and eventually FortifyVulnerabilityExporter will be deprecated. So, if I were to implement any changes based on this issue, they will probably go into that other tool (fcli).
Yeah, it's a lot of data so we have to deduplicate the storage of rule text.
No, sorry; I think fundamentally SARIF is quite tailored to the way CodeQL wants to send results. 😞 Your best bet is to take a look at how CodeQL structures its rules and messages and try to mimic that as much as possible. 😬 Here is an example of CodeQL's rule help for an integer overflow issue: https://github.com/github/codeql/blob/a9bab18804e28159c81f6ab7b083df53b58f367f/go/ql/src/Security/CWE-681/IncorrectIntegerConversionQuery.qhelp#L5

That ends up with a nice (fairly general) help document that holds for all occurrences of an alert, and a fairly short result message.
That's helpful context, thanks. ❤
Thanks again for the info. In theory, we should be able to generate similar issue-agnostic rule descriptions, again similar to what's shown here, for example. As mentioned, the problem is that the products that FortifyVulnerabilityExporter integrates with don't expose these issue-agnostic descriptions (not at all for SSC, somewhat difficult to extract for FoD). I've already raised this concern with the respective product managers, but even if they agree to implement features to support this use case, it will probably take a considerable amount of time before this becomes available. As an alternative, I'll investigate whether it's possible to get the issue-agnostic rule descriptions from another source like VulnCat (or the data behind VulnCat), but currently VulnCat doesn't offer any API for this purpose. Until then, I only see two options: leave things as they are, or use empty rule descriptions (at least for the SSC integration). The latter would, however, have a significant impact on user experience. I'll keep you posted on any progress.

Edit: side question, how does GitHub handle changes to rule descriptions over time? Does GitHub potentially store multiple descriptions for the same rule id (showing the description that matches the contents of the SARIF file for a particular repository), or will it only store the first/last submitted rule description?
We store the changes over time. 👍
Hi @simon-engledew, I'm still researching options to improve our SARIF output, and have one additional question for now. Our issue-specific rule descriptions currently include an issue-specific link back to our portal, allowing users to view additional details for an individual issue. If we were to generate issue-agnostic rules (as they're meant to be 😉), where should this link go? Will GitHub properly render any links included in the result message field? Thanks,
@simon-engledew, thanks! According to https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#result-object, only
I think the SARIF spec is right, so it's quite likely I've either a) made a mistake or b) we support it as a happy accident. 😅 Looking through a big old folder of example SARIF files, I don't see any that use markdown messages. That said, I definitely do see the link being rendered when I send some markdown, so maybe both things are true. 😂
@simon-engledew Not sure I understand, do you mean that links are rendered even in plain `text` properties? Could you share an example?
Sure! I've modified a CodeQL SARIF file and added a link into `results[].message.text`.

```json
{
  "$schema" : "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version" : "2.1.0",
  "runs" : [ {
    "tool" : {
      "driver" : {
        "name" : "CodeQL",
        "organization" : "GitHub",
        "semanticVersion" : "2.0.0",
        "rules" : [ {
          "id" : "js/unused-local-variable",
          "name" : "js/unused-local-variable",
          "shortDescription" : {
            "text" : "Unused variable, import, function or class"
          },
          "fullDescription" : {
            "text" : "Unused variables, imports, functions or classes may be a symptom of a bug and should be examined carefully."
          },
          "defaultConfiguration" : {
            "level" : "note"
          },
          "properties" : {
            "tags" : [ "maintainability" ],
            "kind" : "problem",
            "precision" : "very-high",
            "name" : "Unused variable, import, function or class",
            "description" : "Unused variables, imports, functions or classes may be a symptom of a bug\n and should be examined carefully.",
            "id" : "js/unused-local-variable",
            "problem.severity" : "recommendation"
          }
        } ]
      }
    },
    "results" : [ {
      "ruleId" : "js/unused-local-variable",
      "ruleIndex" : 0,
      "message" : {
        "text" : "Unused variable [foo](https://github.com/)."
      },
      "locations" : [ {
        "physicalLocation" : {
          "artifactLocation" : {
            "uri" : "main.js",
            "uriBaseId" : "%SRCROOT%",
            "index" : 0
          },
          "region" : {
            "startLine" : 2,
            "startColumn" : 7,
            "endColumn" : 10
          }
        }
      } ],
      "partialFingerprints" : {
        "primaryLocationLineHash" : "39fa2ee980eb94b0:1",
        "primaryLocationStartColumnFingerprint" : "4"
      }
    } ],
    "newlineSequences" : [ "\r\n", "\n", "\u2028", "\u2029" ],
    "columnKind" : "utf16CodeUnits",
    "properties" : {
      "semmle.formatSpecifier" : "sarif-latest"
    }
  } ]
}
```

Specifically:

```json
"message" : {
  "text" : "Unused variable [foo](https://github.com/)."
},
```

I've uploaded it to a test repository and I can use the link.
Great, thanks! According to the SARIF specification,
Hi @simon-engledew, can you please review the SARIF files in this repository (and associated GitHub Code Scanning alerts if you have access) to verify that these meet GitHub's expectations? Note that different Fortify rules may generate the same types of results, so there may be some duplication in rule descriptions.

As mentioned before, I won't be updating the SARIF output for FortifyVulnerabilityExporter, as we plan on deprecating this tool (and a proper fix would be difficult to implement given the current tool architecture). We'll advise customers to migrate to fcli once we release similar export functionality. For reference, the SARIF files referenced above were generated using an fcli development version, based on these yaml files:

Side question 1: GitHub previously allowed importing a maximum of 1000 results (and would throw an error if a SARIF file contained more than 1000 results), but it looks like more results are now supported: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#validating-your-sarif-file. Can you please confirm?

Side question 2: You mentioned that rule descriptions get deduplicated across all repositories on GitHub. Based on some quick tests, I assume you deduplicate based on description hashes, and not based on other SARIF properties like rule id, correct? Or, in other words, using a SARIF file like this one, it wouldn't be possible to 'hijack' rule descriptions (with the same rule id and tool properties) in other repositories, correct?
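To make the duplication point concrete, here is a small, purely illustrative fragment of a `tool.driver` object (the rule ids and wording are invented, not taken from the files mentioned above) in which two distinct rule ids carry identical description text; if GitHub deduplicates on description hashes rather than rule ids, that shared text would presumably be stored only once:

```json
{
  "rules": [
    {
      "id": "fortify/xss-reflected-jsp",
      "shortDescription": { "text": "Cross-Site Scripting: Reflected" },
      "fullDescription": { "text": "The application sends unvalidated data to a web browser, which can result in the browser executing malicious code." }
    },
    {
      "id": "fortify/xss-reflected-servlet",
      "shortDescription": { "text": "Cross-Site Scripting: Reflected" },
      "fullDescription": { "text": "The application sends unvalidated data to a web browser, which can result in the browser executing malicious code." }
    }
  ]
}
```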
Sorry for taking so long to get back to you! 🙇 😓
I can already see that you only use 45 rules for ~800 alerts, so that looks like a big improvement 😂 I guess the question is, do you generate the same rules between multiple uploads? It looks like there have been 77 uploads (2 of which have alerts) and 45 unique rules across 818 alerts, but there is no overlap at all between the two analyses. I would expect both analyses to share the same rules if they are describing the same alerts. (This might just be because it's test data, though.) Also, it looks like your 'more information' link points to localhost? I figure you already know that, but I thought I would mention it. 😁
Yup! Looks like the default limit is currently
Yes 👍
👋 Hello! Not sure if this is the right place to raise this issue, but we've noticed that the way Fortify SCA is generating SARIF documents is causing a bad user experience with GitHub Code Scanning.
Code Scanning expects that rule metadata will be shared many times between different runs of a Code Scanning tool. A rule should represent a capability of the tool, not information about any specific finding. Information that is scan-specific should be included in the result's message field instead (e.g. file paths, container checksums, etc.).
Fortify SCA appears to be generating large numbers of rules, each with unique, alert-specific information in the help text, possibly due to the configuration in these files?
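For illustration, the pattern being described looks roughly like the following sketch: the rule id corresponds to a single finding, and the description and help text embed details (file paths, line numbers, portal links) that belong to that one finding, so every finding yields a brand-new rule. The id, paths, and URL here are invented for the example, not copied from real Fortify output.

```json
{
  "rules": [ {
    "id": "FF9A1BC2-0000-1111-2222-333344445555",
    "shortDescription": { "text": "SQL Injection" },
    "fullDescription": {
      "text": "The query built in src/main/java/com/example/OrderDao.java at line 42 uses unvalidated input from the 'id' request parameter."
    },
    "help": {
      "text": "View this specific issue in SSC: https://ssc.example.com/html/ssc/issue/98765"
    }
  } ]
}
```

Contrast this with the CodeQL example earlier in the thread, where one generic rule is shared by every occurrence and only the result message varies per finding.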