Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OPENJPA-2817] dropping _transforming from PCClassFileTransformer and replace it by a minimal exclusion list #65

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rmannibucau
Copy link
Contributor

Follow up PR of #64

Goal is to let people test this proposed fix to ensure we didn't miss anything in the analyzis.

@jgrassel
Copy link
Contributor

jgrassel commented Jul 14, 2020

I've already stated I'm not a fan of the exclusion list, for something that's going to be called once for every class that has to be checked to be enhanced, let alone be enhanced, this is going to be very costly.

Might as well make the transform() method itself synchronized.

@rmannibucau
Copy link
Contributor Author

rmannibucau commented Jul 14, 2020

@jgrassel commons is excluded and the jdk as well, just to skip reading the bytecode (which is more costly than this exclusion list if you benchmark both options so overall the fact it is slower or faster depends the classpath size but it is unlikely you notice the difference, in particular with a custom classloader or a default temporary classloader usage which is way slower than that). Also note that the threadlocal option does not work as mentionned so I'm fully open to have another solution but it is better to have a functional solution than nothing I think and I don't have another proposal for now.
About locking in transform: you can't do that since you would deadlock with the classloader and it is not needed, this is not the issue you hit btw.

@jgrassel
Copy link
Contributor

Yeah, I noticed that commons was excluded after taking a second look through the change set, my comment update wasn't fast enough. =). I'm not sure I'm clear on why the ThreadLocal approach does not work?

@rmannibucau
Copy link
Contributor Author

@jgrassel The threadlocal - as the boolean today - just skips the transformer, so it means that if a class must be enhanced and is loaded while another is enhanced then it is skipped and runtime is likely broken. It is not highly highly likely but not uncommon too. In the common cases I see, the case tmpLoader=mainloader (this is a common optimization which saves a lot of time, in particular with CDS on or too wide temp loader impls. In this last case, if the agent is set on the JVM for runtime enhancement, it should work whereas the classloader will guarantee a lock per class and will enable to load a class during a loadclass and the second one will bypass the transformer (wrongly). Also note that the enhancement, using openjpaconf, can load any plugin working on the class graph (thinking to MDR extensions to be concrete) so we don't fully control what can be loaded by openjpa itself during a transform0. So at the end, the only thing we are sure is that we can't transform concurrently a single class but anything else can happen (concurrent calls to transform, nested calls etc). However the main case is still that the "subloading" will be about classes of the enhancer graph so the openjpa+serp+asm (i will add this one ;), thanks for reminding it to me) explicit exclusion is a good default I think.

@jgrassel
Copy link
Contributor

@rmannibucau That optimization sounds like a violation of the PersistenceUnitInfo contract:

/**
* Return a new instance of a ClassLoader that the provider may
* use to temporarily load any classes, resources, or open
* URLs. The scope and classpath of this loader is exactly the
* same as that of the loader returned by
* PersistenceUnitInfo.getClassLoader. None of the classes loaded 
* by this class loader will be visible to application
* components. The provider may only use this ClassLoader within 
* the scope of the createContainerEntityManagerFactory call.
* @return temporary ClassLoader with same visibility as current 
* loader
*/
public ClassLoader getNewTempClassLoader();

It specifically states None of the classes loaded by this class loader will be visible to application components. -- that is impossible if the Temporary CL is the same object instance as the App CL.

Anyways, the _transforming boolean was put in there to catch a reentrant call triggered during enhancement, which could only occur if the JPA impl classes are loaded by the same CL as the entity classes. See this flow:

1.  MyEntity is loaded by the ClassLoader for the first time.  Bytecode is fetched, and passed to the enhancer for transformation (if necessary)
2. Control enters the OpenJPA Enhancer code. (_transforming is set true)
3. While in the OpenJPA Enhancer code, one of the OpenJPA impl types needs to be loaded by the ClassLoader.  This triggers the ClassLoader (which this thread has a lock on, as back then CLer access was synchronized) to load that OpenJPA impl type's bytecode and calls the enhancer (the CL just knows this is a class that it is loading, it cannot distinguish an entity class from a non-entity class)
4. Control re-enters the OpenJPA Enhancer code
5. Because this reentrancy can only happen if a class is loaded while processing the enhancement logic (step 3), the class being requested for transformation could only be an openjpa type, which is never going to be enhanced.  So return null (no transform) -- this condition is detected by the `_transforming` field set in (step 2).
6. Transformer returns with no change, CLer finishes loading the OpenJPA type
7. Control returns from the CLer back to the transformer
8. Transformer finishes transforming the class, and sets _transforming to false
9. Control returns to ClassLoader, which loads the transformed bytecode
10. Done, profit!

Now, because ClassLoaders now permit concurrent access among multiple threads, that _transforming boolean has to be thread context aware -- it should allow any number of threads to use the transformer, but the same thread would not be calling a classload and transform for any type other than a JPA provider impl type.

@rmannibucau
Copy link
Contributor Author

@jgrassel yes it violates but it is important to have it when startup time is needed otherwise you basically potentially load all your app twice for nothing (or load it all in mem which is not better). It is a common "workaround" for a poorly written spec if you want but it is used. Also note that strictly speaking it is compliant with the spec if you use a javaagent (even if I can agree it was maybe not the original intent but technically it is and it fully match the original goal of ensuring runtime enhancement works :).

About the loading: we just don't know as explained in previous message so we can't just have a skip flag (per thread or not). My PR works well while you use openjpa provided classes, with extensions it can requires some more work in integration code (wrap the transformer basically to prefilter known extensions) or we should add a config for the exclusion list in the pu (think it would be good anyway).

// it is faster to use substring when multiple tests are needed
if (className.startsWith("org/apache/")) {
final String sub = className.substring("org/apache/".length());
if (sub.startsWith("openjpa/") ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would likely trash our test classes too? They as well start with org.apache.openjpa...

Copy link
Contributor

@dazey3 dazey3 Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like code changes like this. Referencing hardcoded strings and doing numerous string comparisons/manipulation is kinda gross/hacky. All this complex string comparison is going to be pretty bad for performance when you have 100 persistence classes to transform and the have to be checked repeatedly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@struberg yes but it works cause we use build time enhancement - also probably why we didn't notice this issue earlier
@dazey3 as mentionned please benchmark it on the different kind of mentionned apps (no deps - jpa in the container, no deps + no tempclassloader - means temploader=apploader, lot of deps, lot of deps without temp loader, a few deps, a few deps without temp loader and you do the test without the javaagent or with it for all these cases). In practise this change is faster than most temploader usage and most bytecode reading so the assumption it is slower is not justified in the general case.

@struberg
Copy link
Member

I still wonder if this whole _transforming boolean is a bit gone wrong?

A.) it is set in a different method than cleared! For no apparent reason. This is not per se wrong, but a sign that the 'design' was not really clear.

B.) There is already a method needsEnhance in transform0. This alone would probably be enough to guard us?
The _transform flag blocks ANY recursive definition. That includes any other entity class you hit while transforming this class.
We get likely saved by all those classes will be enhanced later on anyway. Is this intentionally or by accident?

C.) Can't we get into troubles also in other cases in parallel CL mode?

I also don't really like the string matching. But I likely need one more day to get my head around the full complexity.

Btw, fun to still see things like OpenJpaException ke where ke most likely stands for 'KodoException' :)

@rmannibucau
Copy link
Contributor Author

@struberg agree on A, B is true but likely slower than this exclusion list (even if not sexy it is faster than a hashset in most cases and consumes way less constant mem ;)), C normally no since classloading concurrency is protected by the classloader by spec so only thing we care is being able to load a whole subgraph of openjpa from transform without paying all the self instrumentation cost if not already loaded (unlikely but can happen).

And I repeat what I wrote in the other PR since it was maybe not read by everyone: I'm happy to drop the string list (and likely the new config enabling to control it I didnt add yet in the PR) if we find a solution which works in all cases, I just fail to see a better compromise today - but see that a toggle does not work in a few cases :(.

@dazey3
Copy link
Contributor

dazey3 commented Jul 22, 2020

Let's keep in mind that the issue we are trying to determine a fix for here is that PCClassFileTransformer._transforming does not support concurrent transformation access, correct?

Current behavior:

@struberg was correct in pointing out that the current, simple boolean implementation locks a Class transformer instance from being used by ANY Thread concurrently, including itself. All reentrant access to the same instance will just return NULL; no blocking, locking, or anything.

In an environment where many Threads concurrently use the same Class transformer instance, a race condition occurs. Whichever Thread sets the boolean lock first, all other Threads seeking concurrent access return NULL and no transformation occurs. This is the behavior that needs to change. The boolean implementation needs to change behavior only to consider concurrent Thread access.

Proposed fixes:

It would appear that the reason for this current implementation was to prevent reentry for OpenJPA libraries/packages, and the implementation appears to do that. However, I think the original reason is irrelevant and should not be discussed in this fix. If further enhancements to this code want to be pursued later, under a separate issue, that's fine. The issue is NOT that there is an issue with recursive transformation on specific packages. What is important is preserving the current behavior as much as possible. The current implementation blocks ALL reentrance and current works fine; minus the issue with concurrent threads.

With the code change to a ThreadLocal boolean, the current behavior is preserved, but widened to support concurrent access. Rentrance would be allowed, but only on separate Threads. Even on the same Thread, recursive access would be blocks just like the current implementation. I think the only question is: Is concurrent transformation thread safe (for instance, MetaDataRepository)?

The proposed code change to use an exclusion list feels like a re-implementation of the original fix that introduced the PCClassFileTransformer._transforming boolean. I think that is the incorrect path to take here because we are only guessing at the purpose of the original problem ( like what packages we need to prevent reentry for). What matters is what the code DOES right now and preserving that behavior, as much as possible, is the safest path

@rmannibucau
Copy link
Contributor Author

@dazey3 not really, the concurrency is an issue on top of an original issue. The toggle never worked reliably whereas the exclusion list always worked until you plug custom impls under openjpa SPI (used by enhancement) or use a tmploader=apploader (it is done, good or not). This is why I jumped into this issue and proposed a fix version even if not 100% satisfying, it is saner than a toggle which blindly excludes classes and can miss enhancements.

@struberg
Copy link
Member

hi @dazey3 I took about whole evenings for about a week to dig into the code and all it's constellation I could think of. I think we now understand the problem really well. The whole recursive situation usually does not occur if the agent is attached dynamically, it does not occur if the PCClassFileTransformer is used programmatically etc.

It will mostly happen if the transformer is used as javaagent and then only for all the classes used in the PCClassFileTransformer code itself. Like org.apache.openjpa.meta.MetaDataRepository or serp.bytecode.lowlevel.ConstantPoolTable.

It will also not happen for any classes referenced inside an entity. When you touch the first entity the ClassFileTransformer#transform will return a byte[] which will be analysed by the JVM. And only after that it will load the classes found therein (and invoke the Transformer again subsequently).

Now both the boolean and the exclude list from @rmannibucau serve the same needs. Both are an exit criteria in the classloader iteration for internal classes.
Both have their pros and cons.

I personally like the ThreadLocal<Boolean> better as it needs less maintenance.
But actually there is one more thing we should check (later?): once the PCClassFileTransformer did successfully run for the first time and triggered all the classloading, do we need all this guarding at all? Or can we assume that all the necessary class loading did get triggered the first time already?

@rmannibucau
Copy link
Contributor Author

Hmm, a kind of double lock check on a volatile boolean? Think it makes more sense actually. Judt needs to bench the exclude list vs isEnhanced speed at runtime - would need to rerun it but think exclude list is faster - and we are good.

@struberg
Copy link
Member

While talking with @elexx about this issue I had another thought: probably we only need to guard against classes which are touched in PCClassFileTransformer#needsEnhance? because once this runs through the needsEnhance would return null also for OpenJPAs internal classes. Would need to dig down the rabbit hole...

@rmannibucau
Copy link
Contributor Author

rmannibucau commented Jul 23, 2020

@struberg right, this is where the exclude list is likely faster than the end of the method but functionally it should just be fine.

Side note: in agents it is common to exclude the agent classes + some jvm classes (ex: https://github.com/rmannibucau/sirona/blob/trunk/agent/javaagent/src/main/java/com/github/rmannibucau/sirona/javaagent/SironaTransformer.java#L194)

@dazey3
Copy link
Contributor

dazey3 commented Jul 23, 2020

@rmannibucau

The toggle never worked reliably

Other than with concurrency, which is the reason we are pursuing a fix in the first place, do you have examples of how this toggle doesnt work reliably?

it is saner than a toggle which blindly excludes classes and can miss enhancements.

Do you have examples of missed enhancements by the transformer due to this boolean toggle? As I am aware, recursive calls are not used to do transformation/enhancement.
This toggle is absolute in it's exclusion, it excludes all reentrance, however I do not know of an issue that demonstrates that to be an incorrect behavior. Is there an open defect to fix that issue if it exists? Can that issue not be fixed separately?

@struberg

The whole recursive situation usually does not occur if the agent is attached dynamically, it does not occur if the PCClassFileTransformer is used programmatically etc
...
It will mostly happen if the transformer is used as javaagent and then only for all the classes used in the PCClassFileTransformer code itself.

Thank you for verifying this. In a server environment, we are seeing that no transformation recursion is happening as a product of classloading/transforming a class. The current implementation is written in such a way that ALL recursion would be blocked and I do not know of any outstanding issues with that behavior.

Now both the boolean and the exclude list from @rmannibucau serve the same needs. Both are an exit criteria in the classloader iteration for internal classes.

I agree that both implementations serve the same needs. However, I prefer the boolean implementation because it maintains the original behavior of excluding ALL possible recursive calls, on the same Thread. If there is an issue with this implementation, I don't see why that can't be fixed in a separate issue that documents THAT issue.

@rmannibucau
Copy link
Contributor Author

@dazey3 yep, the one mentionned on the list and the PRs: tmploader==apploader. The toggle just disable enhancement under some setups (close to openliberty ones where the load class goes into the apploader which implements itself the transformer). That said you have a very good point that JVM instrumentation does not need that since it skips by itself the nested enhancement so seems it is only appearing on custom classloaders in app servers (like tomcat or openliberty).

So overall it seems it means that it is an application server bug that this boolean is needed.

Regarding the exclude list vs boolean: the exclude list makes the behavior deterministic vs the boolean depends which classes where touched or not before which is way less deterministic and error prone so between both I prefer the exclude list to at least guarantee the understanding is unique.

However, in regards to previous point it seems we should just drop this as a default behavior and use a config to have a nested exclusion for buggy servers for compatibility only.

Concretely it would be something like: if "openjpa.agent.preventNestedTransformations=true" then we wouldnt use PCClassFileTransformer but HarnessedPCClassFileTransformer which would fully delegate to PCClassFileTransformer but transform methods would be wrapped in with this boolean:

if (transforming.get() != null) return null;
transforming.set(true);
try {
    delegate.transform();
} finally {
    transforming.remove();
}

This way we handle buggy impls with a flag but our default is sane and aligned on the JVM.

Long term we can report the bugs in the app servers and make them properly fix it (it is not limited to openjpa).

@dazey3
Copy link
Contributor

dazey3 commented Jul 23, 2020

@rmannibucau

The toggle just disable enhancement under some setups

Perhaps you can explain what you mean here. How does the toggle disable enhancement under "some" setups? The toggle only disables reentrance, but it does it indiscriminately.

That said you have a very good point that JVM instrumentation does not need that since it skips by itself the nested enhancement so seems it is only appearing on custom classloaders in app servers

Please explain.

So overall it seems it means that it is an application server bug that this boolean is needed.

Not sure if I understand what you mean here. What is the server bug?

@rmannibucau
Copy link
Contributor Author

@dazey3 oki, the fact you can't be in transform from transform is only true for a javaagent called from defineClassX, not from a custom classloader handling the transformation itself so if you use such boolean then you just disable the enhancement for potentially other classes usince container entity manager (which are used in standalone too btw). Letting tranform trigger another transform can be considered - or not by implementors - as a bug, but it also means we must support it so not use a toggle. So overall I'm certain exclude list works while a SPI is not "unexpected" or you use plain OpenJPA but there are several cases the toggle can break the environment in hybrid - SE/EE-light but using container JPAAPI - but existing (common actually) envs.

@asfgit asfgit force-pushed the OPENJPA-2817_PCClassFileTransformer-exclusions branch from 9e665c0 to 1c14bc2 Compare July 23, 2020 19:17
@dazey3
Copy link
Contributor

dazey3 commented Jul 27, 2020

oki, the fact you can't be in transform from transform is only true for a javaagent called from defineClassX, not from a custom classloader handling the transformation itself so if you use such boolean then you just disable the enhancement for potentially other classes usince container entity manager (which are used in standalone too btw).

I am not following what your talking about. The current "toggle" implementation blocks reentrant transformation (transform triggering another transform) on the same org.apache.openjpa.enhance.PCClassFileTransformer instance. I don't see how the ClassLoader is involved in this function of the java.lang.instrument.ClassFileTransformer. What do you mean by "a custom classloader handling the transformation itself"?

Letting transform trigger another transform can be considered - or not by implementors - as a bug, but it also means we must support it so not use a toggle.

I don't feel like you have explained why "we must support" reentrance nor why "we must support it so not use a toggle". Is there an existing bug that describes reentrance occurring and breaking some usecase? The toggle appears to block ALL reentrance so I don't see how such a bug could exist, but if if there is, please direct me to it or explain it in detail

@rmannibucau
Copy link
Contributor Author

Transformers only make sense for classloaders and classloaders are the one calling transformers - either in loadclass for custom classloaders and defineclass in jvm one with agent.

Issue is when loading transformer classes from the transform method (will end in classnotfound or noclassdeferror if not excluded) so one or two classes must be excluded. Everything else is a perf optim.

@rmannibucau
Copy link
Contributor Author

up, i'd like it to be fixed for 3.1.3

current options are:

  1. a threadlocal toggle which is likely to work has the drawback to make the impl context dependent
  2. exclusions which always work as soon as configurable (defaults working for 99% of deployments) and are not really slower in most cases (it is only slower if entities are not listed so it concerns only standalone case where build time enhancement is more than highly recommended)

I'm still in favor of 2 since 1 is hard to justify in all cases I can see - indeed config to add but I can handle that.

@rmannibucau
Copy link
Contributor Author

up, forgot to mention option 3. don't protect over this since it shouldn't happen in several PCClassFileTransformer cases (all but agent one?). Otherwise if protection is still very desired I'm still in favor of 2 which is overall saner IMHO and align on the classfiletransformer behavior of the system it runs into instead of assuming it (it is not the same for custom classloaders and agent from what we reviewed).

@rmannibucau
Copy link
Contributor Author

Up, still an issue with 3.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants