Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to config to result[*] rule for return array type in taint analysis #98

Open
Raul1718 opened this issue Apr 6, 2024 · 10 comments
Open

Comments

@Raul1718
Copy link

Raul1718 commented Apr 6, 2024

Description

Hi,

When I test some cases that return type is array and as transfer, such as String.split. I doubt how to correct config the rule.

My test sample:

class ArgToResultStringSplit {
    public static void main(String[] args) {
        String taint = SourceSink.source();
        String[] taints = taint.split(",");
        String s2 = taints[1]; // no taint now!
        SourceSink.sink(s2); // taint
    }
} 

The transfer rule configured below.

- { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: result, type: "java.lang.String[]" }
could transfer to "String[] taints", but var s2 is not tainted after get taints[1].

or

- { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: "result[*]", type: "java.lang.String[]" }

I also tested, but could not transfer to "String[] taints".

Could you provide guidance on how to configure correctly to detect this ArgToResultStringSplit case.
Thanks!

@zhangt2333
Copy link
Member

TL;DR:
Use - { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: "result[*]", type: "java.lang.String[]" }, and set pointer analysis option to only-app:false;.


For your given code snippet, with a static analysis perspective, the expectation is as follows:

flowchart LR
    A[taints]
    B["NewObj ... newarray java.lang.String[...]"]
    C["NewObj ... newarray java.lang.String[...][*]"]
    D["TaintObj"]

    A --> |points-to| B
    C --> |points-to| D
Loading

Intuitively, configuring the pointer analysis option with only-app:true; results in the method split not being processed. As a result, the arrayObj NewObj...newarray java.lang.String[...] will not be pointed to by the taints variable.

@q983868425
Copy link

TL;DR: Use - { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: "result[*]", type: "java.lang.String[]" }, and set pointer analysis option to only-app:false;.

For your given code snippet, with a static analysis perspective, the expectation is as follows:

flowchart LR
    A[taints]
    B["NewObj ... newarray java.lang.String[...]"]
    C["NewObj ... newarray java.lang.String[...][*]"]
    D["TaintObj"]

    A --> |points-to| B
    C --> |points-to| D
Loading

Intuitively, configuring the pointer analysis option with only-app:true; results in the method split not being processed. As a result, the arrayObj NewObj...newarray java.lang.String[...] will not be pointed to by the taints variable.

I configured the propagation rule as you said and set only-app=false, but the taint propagation path is still missing in the final taint-flow-graph.dot taint flow graph.

@zhangt2333
Copy link
Member

Could you please share your entire Tai-e project with me, either via file upload or by providing a link to your GitHub repository? This will allow me to reproduce the issue with one-click.

@q983868425
Copy link

Could you please share your entire Tai-e project with me, either via file upload or by providing a link to your GitHub repository? This will allow me to reproduce the issue with one-click.

我调试了下,发现返回值类型为数组时,pfg将taint变量直接指向了taints变量,但是在处理ArrayLoad的时候是从taints的数组指针指向LValue,导致传播中断了,我改为了new String[]{taint}之后,IR中会出现ArrayStore,所以会将taint指向new String[]的数组指针,后续load就会连续,所以这种情况应该怎么处理

@zhangt2333
Copy link
Member

Thanks for writing this issue in Chinese, it makes it very easy for a few of us developers to read. However, Tai-e is a project with an international focus, and we aim to keep our community discussions in as widely accessible a language as possible, which is English. This helps us share our open-source progress with interested individuals from all over the world. Kindly consider using English for communication in the future. We appreciate your understanding. Thank you.

Again, could you please share your entire Tai-e project with me, either via file upload or by providing a link to your GitHub repository? This will allow me to reproduce the issue with one-click.

@q983868425
Copy link

@zhangt2333
Copy link
Member

zhangt2333 commented Aug 9, 2024

Thank you ❤️ for uploading the modified files separately, but for convenience, please package the entire project so that I can execute it directly, such as https://github.com/Tai-e/Tai-e-Examples/tree/master/MinimalReproducibleExample-0069.

Besides, I have checked the repo, and found that the Tai-e's arguments is missing.

@q983868425
Copy link

Thank you very much for your patience. When I looked at the example you gave, I found that my problem is the same as the problem in # 69 of the example, and I saw that you explained in # 69 that you need to modify the TFGBuilder.buildComplete method implementation by yourself. But I'm going to try to modify the TransferHandler to add the taint variable to the edge of the array ArrayIndexNode on the PFG. I wonder if I'm doing this right? Thanks again.

@zhangt2333
Copy link
Member

ArrayIndexNode has always existed in TFG, but by default, Tai-e only outputs nodes within the application scope.


To address this, I spent my time implementing a prototype that demonstrates the solution of both Issue 69 and this current issue.

In this commit Tai-e/Tai-e-Examples@8ec1d9a, I modified Tai-e's class implementation to output the complete TFG as well as one of the shortest paths for each taint flow.

After running this example, three key files will be generated in the output directory:

  • taint-flow-graph.dot: This file contains the original Taint Flow Graph (TFG) outputted by Tai-e, which includes only nodes within the application scope.
  • complete-taint-flow-graph.dot: Generated by the modified code. The complete TFG (Taint Flow Graph), which includes all nodes, will show the paths like x1 -> ArrayNode -> x2, but this graph is typically too large to be easily viewed by humans.
  • taint-flow-path-1.dot: Generated by the modified code, this file represents one of the shortest paths for taint flow 1. The shortest path is calculated using the BFS algorithm (serves as a prototype).
    image

@MatthewXY01
Copy link

TL;DR: Use - { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: "result[*]", type: "java.lang.String[]" }, and set pointer analysis option to only-app:false;.

For your given code snippet, with a static analysis perspective, the expectation is as follows:

flowchart LR
    A[taints]
    B["NewObj ... newarray java.lang.String[...]"]
    C["NewObj ... newarray java.lang.String[...][*]"]
    D["TaintObj"]

    A --> |points-to| B
    C --> |points-to| D
Loading

Intuitively, configuring the pointer analysis option with only-app:true; results in the method split not being processed. As a result, the arrayObj NewObj...newarray java.lang.String[...] will not be pointed to by the taints variable.

Is type supposed to be java.lang.String? If I understand correctly, what's being asserted here is the type of the TaintObj we want the "to pointer (an ArrayIndex)" to point to after the propagation is complete, which in this case would be the type of the element (String) and not the type of the array variable (String[]).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants