Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal with multiple possible callables #208

Merged
merged 4 commits into from
Jul 26, 2024

Conversation

khatchad
Copy link
Collaborator

khatchad added 2 commits July 21, 2024 13:01
* Return `null` when there are multiple possible callables.

* Add test to exercise call string imprecision.

Based on the call string length. See wala/WALA#1417 (reply in thread).

* Expect the test to fail.

In the past, we could add 0's to the parameters, but since we are not
enforcing the existing of the node in the CG, we can no longer do that.
Still, this test should now fail if wala#207 is fixed.
We don't the instance key that produces the callable right now.
@khatchad khatchad added enhancement New feature or request callables Callable objects in Python. labels Jul 21, 2024
@khatchad khatchad requested a review from msridhar July 21, 2024 17:04
@khatchad khatchad enabled auto-merge (squash) July 21, 2024 17:04
Copy link
Member

@msridhar msridhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble following the overall flow here. It seems the method trampoline target selector looks at points-to sets to figure out a receiver, and then somewhere this receiver is passed to the superclass target selector. This doesn't fit my standard model of how things work. I would expect the pointer analysis to be iterating through possible receiver objects and then calling into the target selector to get the target (possibly null) for each one. Do you understand how this all fits together, and if so, could you explain it a bit?

@khatchad
Copy link
Collaborator Author

I'm having trouble following the overall flow here. It seems the method trampoline target selector looks at points-to sets to figure out a receiver, and then somewhere this receiver is passed to the superclass target selector. This doesn't fit my standard model of how things work. I would expect the pointer analysis to be iterating through possible receiver objects and then calling into the target selector to get the target (possibly null) for each one. Do you understand how this all fits together, and if so, could you explain it a bit?

That is sort of what happens, except that we don't call into a target selector. First, we detect whether the given receiver is a "callable" object:

Then, we swap the receiver:

// It's a callable. Change the receiver.
receiver = getCallable(caller, receiver.getClassHierarchy(), call);

The new receiver will actually represent the callable method:

/**
* Returns the callable method of the receiver of the given {@link PythonInvokeInstruction}.
*
* @param caller The {@link CGNode} representing the caller of the given {@link
* PythonInvokeInstruction}.
* @param cha The receiver's {@link IClassHierarchy}.
* @param call The {@link PythonInvokeInstruction} in question.
* @return The callable method the given {@link PythonInvokeInstruction}'s receiver.
*/
private IClass getCallable(CGNode caller, IClassHierarchy cha, PythonInvokeInstruction call) {

The points-to set iteration happens here:

// Lookup the callable method.
PointerKeyFactory pkf = builder.getPointerKeyFactory();
PointerKey receiver = pkf.getPointerKeyForLocal(caller, call.getUse(0));
OrdinalSet<InstanceKey> objs = builder.getPointerAnalysis().getPointsToSet(receiver);
for (InstanceKey o : objs) {
AllocationSiteInNode instanceKey = getAllocationSiteInNode(o);
if (instanceKey != null) {
CGNode node = instanceKey.getNode();
IMethod method = node.getMethod();
IClass declaringClass = method.getDeclaringClass();
final ClassLoaderReference classLoaderReference =
declaringClass.getClassLoader().getReference();
TypeName declaringClassName = declaringClass.getName();
final String packageName = "$" + declaringClassName.toString().substring(1);
IClass callable =
cha.lookupClass(
TypeReference.findOrCreateClass(
classLoaderReference, packageName, CALLABLE_METHOD_NAME));

That is where we find the __call__() method for the class. Note that when this happens, the original receiver is just Lobject (that is how we know that the "function call" is on an object rather than a function. Does that answer your question?

@msridhar
Copy link
Member

Thanks, but I'm wondering, where is the target selector for call graph construction getting called in the first place? That's the place where I would expect the points-to set iteration to be happening. Like here is code in the WALA Java bytecode analysis:

https://github.com/wala/WALA/blob/dc3ad5e3740364a3a97786549f16a8808a84811a/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L2074-L2081

The flow is a little complex, but basically it gets the receiver type from an instance key, and only then calls into the MethodTargetSelector (gets called in getTargetForCall here).

@khatchad
Copy link
Collaborator Author

Thanks, but I'm wondering, where is the target selector for call graph construction getting called in the first place? That's the place where I would expect the points-to set iteration to be happening. Like here is code in the WALA Java bytecode analysis:

https://github.com/wala/WALA/blob/dc3ad5e3740364a3a97786549f16a8808a84811a/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L2074-L2081

Ariadne also has this as the starting point for getting the target selector for call graph construction.

The flow is a little complex, but basically it gets the receiver type from an instance key, and only then calls into the MethodTargetSelector (gets called in getTargetForCall here).

I see what you mean. So, instead of swapping the receiver inside the instance method trampoline target selector, we'd swap the receiver at the caller further up the call stack. I think that would work also.

I think a bigger issue perhaps is #207, i.e., I'm unsure whether using the points-to analysis in the first place is the right way to go. Does #207 make sense? Otherwise, I don't see why we can't move the points-to set iteration.

@msridhar
Copy link
Member

Ok, unfortunately I don't have time to dig deep here to understand what is going on. Something does seem off. For now, maybe you can add a TODO comment in the code linking #207 and saying we need a deeper investigation to at least document how this is working? I don't want to block you so we can go ahead and land this PR once we have that.

@khatchad
Copy link
Collaborator Author

Yes, also short on time here. To summarize:

  1. The current points-to set traversal can be moved. Instead of internally swapping the receiver, a different receiver can be sent in the first place as part of the outer call.
  2. While we agree on (1) above, a question is open (Callable identification should be further up the analysis #207) as to whether using a points-to analysis in the first place makes sense.

@msridhar msridhar disabled auto-merge July 26, 2024 15:47
Copy link
Member

@msridhar msridhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved but please add discussed code comment before merging

@khatchad khatchad enabled auto-merge (squash) July 26, 2024 18:21
@khatchad khatchad merged commit 948e6bf into wala:master Jul 26, 2024
1 check passed
@khatchad khatchad deleted the contrib_callables_multiple branch July 26, 2024 18:22
khatchad added a commit to ponder-lab/ML that referenced this pull request Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
callables Callable objects in Python. enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants