Nested functions are considered heap writes in the ModRef analysis #284

Open
khatchad opened this issue Nov 8, 2023 · 10 comments
Labels: blocked, bug (Something isn't working), python (Pull requests that update Python code)

Comments

khatchad (Member) commented Nov 8, 2023

Consider the test added in 438007e. Currently, the ModRef analysis lists the inner function as a heap write of the outer function. Why?

Next Steps

Dump the call graph for the code in this test.

khatchad added the bug (Something isn't working), python (Pull requests that update Python code), and question (Further information is requested) labels Nov 8, 2023
khatchad self-assigned this Nov 8, 2023
khatchad changed the title from "Embedded functions are conidered heap writes in the ModRef analysis" to "Embedded functions are considered heap writes in the ModRef analysis" Nov 8, 2023
khatchad (Member, Author) commented Nov 8, 2023

Here's the IR of f():

callees of node f : [g]

IR of node 3, context CallStringContext: [ script A.py.do()LRoot;@96 ]
<Code body of function Lscript A.py/f>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..3]
    -> BB2
    -> BB3
BB2[4..7]
    -> BB3
BB3[-1..-2]
Instructions:
BB0
BB1
0   v2 = new <PythonLoader,Lscript A.py/f/g>@0<no information> [2=[g]]
1   global:global script A.py/f/g = v2       <no information> [2=[g]]
2   putfield v1.< PythonLoader, LRoot, g, <PythonLoader,LRoot> > = v2<no information> [1=[the function]2=[g]]
3   v5 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v2 @3 exception:v6A.py [6:8] -> [6:11] [5=[a]2=[g]]
BB2
6   v9 = binaryop(eq) v5 , v7:#5             A.py [7:11] -> [7:17] [5=[a]7=[cmp0]]
7   assert v9 (fromSpec: true)               A.py [7:4] -> [7:17]
BB3

Step-by-step:

0   v2 = new <PythonLoader,Lscript A.py/f/g>@0<no information> [2=[g]]
1   global:global script A.py/f/g = v2       <no information> [2=[g]]
2   putfield v1.< PythonLoader, LRoot, g, <PythonLoader,LRoot> > = v2<no information> [1=[the function]2=[g]]

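The function object for g() is allocated into v2, which is then written both to the global script A.py/f/g and to field g of v1 (the enclosing function). But the same thing also happens for f() in the script: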

callees of node Lscript A.py : [f]

IR of node 2, context CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
<Code body of function Lscript A.py>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..95]
    -> BB2
    -> BB4
BB2[96..96]
    -> BB3
    -> BB4
BB3[97..97]
    -> BB4
BB4[-1..-2]
Instructions:
...
90   v241 = new <PythonLoader,Lscript A.py/f>@90<no information> [241=[f]]
91   global:global script A.py/f = v241      <no information> [241=[f]]
92   putfield v1.< PythonLoader, LRoot, f, <PythonLoader,LRoot> > = v241<no information> [241=[f]]

So this is not unusual: nothing special seems to be going on for embedded functions, since the same pattern occurs even at the (global) script level.
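To make the shared pattern concrete, here is a toy Python model of the three instructions that appear both for g inside f (instructions 0-2) and for f inside the script (instructions 90-92): allocate a function object, write it to a global, then putfield it onto the enclosing function object (v1). FunctionObject, global_fields, and define_function are hypothetical names for illustration only, not WALA classes or APIs; this is a sketch of the IR's effect, not of how WALA implements it.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FunctionObject:
    """Toy stand-in for an allocated function object (hypothetical, not a WALA class)."""
    name: str
    # Instance fields written by putfield, keyed by the short function name.
    fields: Dict[str, "FunctionObject"] = field(default_factory=dict)


# Toy stand-in for the static "global" fields written by global:global ... = v.
global_fields: Dict[str, FunctionObject] = {}


def define_function(enclosing: FunctionObject, qualified_name: str, short_name: str) -> FunctionObject:
    """Mimic the IR pattern: allocate, write a global, then putfield on the enclosing function."""
    value = FunctionObject(qualified_name)             # allocation (v2 = new ...)
    global_fields["global " + qualified_name] = value  # write to the global (static) field
    enclosing.fields[short_name] = value               # putfield v1.<short_name> = v2
    return value


script = FunctionObject("script A.py")
f = define_function(script, "script A.py/f", "f")  # script-level IR, instructions 90-92
g = define_function(f, "script A.py/f/g", "g")     # IR of f, instructions 0-2
```

Under this model, defining g inside f performs two writes, one to the global "global script A.py/f/g" and one to field g of f's function object, so it is plausible that a ModRef analysis reports at least one of them as a heap write of f.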

khatchad (Member, Author) commented Nov 8, 2023

It seems, then, that both scripts and functions that define functions have a field whose value is the defined function, but I'm unsure why. The field is never read in this IR: the function object is stored, and the call goes through the stored value (v2) rather than through the field. Maybe the field is used when an embedded function is called from a function other than its outer function (is that possible?). Can you call f.g() from the script level of A.py?

khatchad (Member, Author) commented Nov 8, 2023

That does not appear to be possible, so the reason for the field remains unclear.
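Plain Python confirms this. The test's source is not reproduced in this thread, so the f below is a hypothetical reconstruction inferred from the IR (g is called, the result is assigned to a, then assert a == 5); g's body is an assumption chosen so the assertion holds. A nested def binds g as a local variable of f's frame, not as an attribute of the function object f.

```python
def f():
    def g():
        return 5  # hypothetical body, chosen so the assertion below holds

    a = g()
    assert a == 5
    return a


f()

# g is a local variable of f, not an attribute of the function object f.
print(hasattr(f, "g"))                # False
print("g" in f.__code__.co_varnames)  # True

try:
    f.g()
except AttributeError as err:
    print("f.g() fails:", err)
```

So at the Python level, nothing outside f can reach g through f, which is consistent with the field written by putfield never being read.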

khatchad (Member, Author) commented Nov 8, 2023

The question now is whether this store should be considered a "mod" in the ModRef analysis.

khatchad (Member, Author) commented Nov 8, 2023

In the added test, the points-to set of the global field holding the nested function is non-empty, while in mead-baseline it is empty.

In the test:

pointerKey	StaticFieldKey  (id=167)	
[<field global script A.py/f/g>]

pointsToSet	OrdinalSet<T>  (id=224)	
[SITE_IN_NODE{<Code body of function Lscript A.py/f>:Lscript A.py/f/g in CallStringContext: [ script A.py.do()LRoot;@96 ]}]

This is why we're not filtering out this location.
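The behavior described here amounts to a filter that drops a modified location when its points-to set is empty. Below is a minimal Python sketch of that assumed filter, using the two pointer keys quoted in this thread. The name filter_mods, the string encoding of keys, and the abbreviated allocation-site label are hypothetical; this is not the project's actual implementation.

```python
from typing import Dict, FrozenSet, Iterable, Set


def filter_mods(mods: Iterable[str], points_to: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Keep only modified locations whose points-to set is non-empty.

    Hypothetical filter inferred from the discussion, not the project's code.
    """
    # A missing key (None) or an empty frozenset is falsy, so both are dropped.
    return {key for key in mods if points_to.get(key)}


test_key = "<field global script A.py/f/g>"
baseline_key = "<field global script pretrain_paired_tf.py/main/_distributed_train_step>"

points_to = {
    # Non-empty in the test: the allocation site of g inside f (abbreviated label).
    test_key: frozenset({"allocation site of Lscript A.py/f/g in f"}),
    # Empty, as observed in mead-baseline.
    baseline_key: frozenset(),
}

print(filter_mods({test_key, baseline_key}, points_to))
```

Under this assumption, only the test's key survives, which matches the observation that the nested function is reported as a heap write in the test but not in mead-baseline.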

khatchad (Member, Author) commented Nov 8, 2023

I wonder why the points-to set is empty in mead-baseline, and why an empty points-to set matters here at all.

khatchad (Member, Author) commented Nov 8, 2023

In the test's pointer analysis:

[<field global script A.py/f/g>] --> [SITE_IN_NODE{<Code body of function Lscript A.py/f>:Lscript A.py/f/g in CallStringContext: [ script A.py.do()LRoot;@96 ]}] 

In mead-baseline:

[<field global script pretrain_paired_tf.py/main/_distributed_train_step>] --> []

khatchad (Member, Author) commented Nov 8, 2023

Other nested functions in mead-baseline do have non-empty points-to sets, e.g.:

[<field global script pretrain_paired_tf.py/main/_replicated_train_step>] --> [SMIK:SITE_IN_NODE{<Code body of function Lscript pretrain_paired_tf.py/main>:Lscript pretrain_paired_tf.py/main/_replicated_train_step in CallStringContext: [ script pretrain_paired_tf.py.do()LRoot;@303 ]}@creator:Node: <Code body of function Lscript pretrain_paired_tf.py/main> Context: CallStringContext: [ script pretrain_paired_tf.py.do()LRoot;@303 ]]

I now suspect that this problem is related to wala/ML#91, because the functions with empty points-to sets are decorated. Moreover, they are decorated with an unusual decorator that cannot be resolved.
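Decorators matter here because of Python's binding rule: writing @d above def h binds the name h to whatever d(h) returns, not to the function object created by the def. The sketch below shows this rebinding inside a nested scope; opaque_decorator, main, plain_step, and decorated_step are hypothetical names, not the decorator or functions from mead-baseline. One plausible but unconfirmed mechanism for the empty points-to sets is that, when the analysis cannot resolve the decorator, it has no model of the decorator's return value, so nothing flows into the field for the decorated name.

```python
def opaque_decorator(fn):
    """Stand-in for a decorator the analysis cannot resolve (hypothetical)."""
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


def main():
    def plain_step():
        return 1

    @opaque_decorator  # equivalent to: decorated_step = opaque_decorator(decorated_step)
    def decorated_step():
        return 2

    return plain_step, decorated_step


plain, decorated = main()
print(plain.__name__)      # plain_step
print(decorated.__name__)  # wrapper
print(decorated())         # 2
```

The undecorated name refers to the function defined by its def, while the decorated name refers to the decorator's wrapper, so an analysis that cannot see inside the decorator has nothing to attach to that name.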

khatchad (Member, Author) commented Nov 9, 2023

Indeed, this is the case. If you comment out the decorator, the problem doesn't happen.

khatchad removed their assignment Nov 9, 2023
khatchad removed the question (Further information is requested) label Nov 9, 2023
khatchad (Member, Author) commented Nov 9, 2023

Blocked by wala/ML#91.

khatchad changed the title from "Embedded functions are considered heap writes in the ModRef analysis" to "Nested functions are considered heap writes in the ModRef analysis" Nov 9, 2023