-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OPENJPA-2817] dropping _transforming from PCClassFileTransformer and replace it by a minimal exclusion list #65
base: master
Are you sure you want to change the base?
Conversation
I've already stated I'm not a fan of the exclusion list, for something that's going to be called once for every class that has to be checked to be enhanced, let alone be enhanced, this is going to be very costly. Might as well make the transform() method itself synchronized. |
@jgrassel commons is excluded and the jdk as well, just to skip reading the bytecode (which is more costly than this exclusion list if you benchmark both options so overall the fact it is slower or faster depends the classpath size but it is unlikely you notice the difference, in particular with a custom classloader or a default temporary classloader usage which is way slower than that). Also note that the threadlocal option does not work as mentionned so I'm fully open to have another solution but it is better to have a functional solution than nothing I think and I don't have another proposal for now. |
Yeah, I noticed that commons was excluded after taking a second look through the change set, my comment update wasn't fast enough. =). I'm not sure I'm clear on why the ThreadLocal approach does not work? |
@jgrassel The threadlocal - as the boolean today - just skips the transformer, so it means that if a class must be enhanced and is loaded while another is enhanced then it is skipped and runtime is likely broken. It is not highly highly likely but not uncommon too. In the common cases I see, the case tmpLoader=mainloader (this is a common optimization which saves a lot of time, in particular with CDS on or too wide temp loader impls. In this last case, if the agent is set on the JVM for runtime enhancement, it should work whereas the classloader will guarantee a lock per class and will enable to load a class during a loadclass and the second one will bypass the transformer (wrongly). Also note that the enhancement, using openjpaconf, can load any plugin working on the class graph (thinking to MDR extensions to be concrete) so we don't fully control what can be loaded by openjpa itself during a transform0. So at the end, the only thing we are sure is that we can't transform concurrently a single class but anything else can happen (concurrent calls to transform, nested calls etc). However the main case is still that the "subloading" will be about classes of the enhancer graph so the openjpa+serp+asm (i will add this one ;), thanks for reminding it to me) explicit exclusion is a good default I think. |
@rmannibucau That optimization sounds like a violation of the PersistenceUnitInfo contract:
It specifically states Anyways, the
Now, because ClassLoaders now permit concurrent access among multiple threads, that _transforming boolean has to be thread context aware -- it should allow any number of threads to use the transformer, but the same thread would not be calling a classload and transform for any type other than a JPA provider impl type. |
@jgrassel yes it violates but it is important to have it when startup time is needed otherwise you basically potentially load all your app twice for nothing (or load it all in mem which is not better). It is a common "workaround" for a poorly written spec if you want but it is used. Also note that strictly speaking it is compliant with the spec if you use a javaagent (even if I can agree it was maybe not the original intent but technically it is and it fully match the original goal of ensuring runtime enhancement works :). About the loading: we just don't know as explained in previous message so we can't just have a skip flag (per thread or not). My PR works well while you use openjpa provided classes, with extensions it can requires some more work in integration code (wrap the transformer basically to prefilter known extensions) or we should add a config for the exclusion list in the pu (think it would be good anyway). |
// it is faster to use substring when multiple tests are needed | ||
if (className.startsWith("org/apache/")) { | ||
final String sub = className.substring("org/apache/".length()); | ||
if (sub.startsWith("openjpa/") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would likely trash our test classes too? They as well start with org.apache.openjpa...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like code changes like this. Referencing hardcoded strings and doing numerous string comparisons/manipulation is kinda gross/hacky. All this complex string comparison is going to be pretty bad for performance when you have 100 persistence classes to transform and the have to be checked repeatedly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@struberg yes but it works cause we use build time enhancement - also probably why we didn't notice this issue earlier
@dazey3 as mentionned please benchmark it on the different kind of mentionned apps (no deps - jpa in the container, no deps + no tempclassloader - means temploader=apploader, lot of deps, lot of deps without temp loader, a few deps, a few deps without temp loader and you do the test without the javaagent or with it for all these cases). In practise this change is faster than most temploader usage and most bytecode reading so the assumption it is slower is not justified in the general case.
I still wonder if this whole A.) it is set in a different method than cleared! For no apparent reason. This is not per se wrong, but a sign that the 'design' was not really clear. B.) There is already a method C.) Can't we get into troubles also in other cases in parallel CL mode? I also don't really like the string matching. But I likely need one more day to get my head around the full complexity. Btw, fun to still see things like |
@struberg agree on A, B is true but likely slower than this exclusion list (even if not sexy it is faster than a hashset in most cases and consumes way less constant mem ;)), C normally no since classloading concurrency is protected by the classloader by spec so only thing we care is being able to load a whole subgraph of openjpa from transform without paying all the self instrumentation cost if not already loaded (unlikely but can happen). And I repeat what I wrote in the other PR since it was maybe not read by everyone: I'm happy to drop the string list (and likely the new config enabling to control it I didnt add yet in the PR) if we find a solution which works in all cases, I just fail to see a better compromise today - but see that a toggle does not work in a few cases :(. |
Let's keep in mind that the issue we are trying to determine a fix for here is that Current behavior:@struberg was correct in pointing out that the current, simple boolean implementation locks a Class transformer instance from being used by ANY Thread concurrently, including itself. All reentrant access to the same instance will just return NULL; no blocking, locking, or anything. In an environment where many Threads concurrently use the same Class transformer instance, a race condition occurs. Whichever Thread sets the boolean lock first, all other Threads seeking concurrent access return NULL and no transformation occurs. This is the behavior that needs to change. The boolean implementation needs to change behavior only to consider concurrent Thread access. Proposed fixes:It would appear that the reason for this current implementation was to prevent reentry for OpenJPA libraries/packages, and the implementation appears to do that. However, I think the original reason is irrelevant and should not be discussed in this fix. If further enhancements to this code want to be pursued later, under a separate issue, that's fine. The issue is NOT that there is an issue with recursive transformation on specific packages. What is important is preserving the current behavior as much as possible. The current implementation blocks ALL reentrance and current works fine; minus the issue with concurrent threads. With the code change to a ThreadLocal boolean, the current behavior is preserved, but widened to support concurrent access. Rentrance would be allowed, but only on separate Threads. Even on the same Thread, recursive access would be blocks just like the current implementation. I think the only question is: Is concurrent transformation thread safe (for instance, MetaDataRepository)? The proposed code change to use an exclusion list feels like a re-implementation of the original fix that introduced the |
@dazey3 not really, the concurrency is an issue on top of an original issue. The toggle never worked reliably whereas the exclusion list always worked until you plug custom impls under openjpa SPI (used by enhancement) or use a tmploader=apploader (it is done, good or not). This is why I jumped into this issue and proposed a fix version even if not 100% satisfying, it is saner than a toggle which blindly excludes classes and can miss enhancements. |
hi @dazey3 I took about whole evenings for about a week to dig into the code and all it's constellation I could think of. I think we now understand the problem really well. The whole recursive situation usually does not occur if the agent is attached dynamically, it does not occur if the PCClassFileTransformer is used programmatically etc. It will mostly happen if the transformer is used as javaagent and then only for all the classes used in the PCClassFileTransformer code itself. Like It will also not happen for any classes referenced inside an entity. When you touch the first entity the Now both the boolean and the exclude list from @rmannibucau serve the same needs. Both are an exit criteria in the classloader iteration for internal classes. I personally like the |
Hmm, a kind of double lock check on a volatile boolean? Think it makes more sense actually. Judt needs to bench the exclude list vs isEnhanced speed at runtime - would need to rerun it but think exclude list is faster - and we are good. |
While talking with @elexx about this issue I had another thought: probably we only need to guard against classes which are touched in |
@struberg right, this is where the exclude list is likely faster than the end of the method but functionally it should just be fine. Side note: in agents it is common to exclude the agent classes + some jvm classes (ex: https://github.com/rmannibucau/sirona/blob/trunk/agent/javaagent/src/main/java/com/github/rmannibucau/sirona/javaagent/SironaTransformer.java#L194) |
Other than with concurrency, which is the reason we are pursuing a fix in the first place, do you have examples of how this toggle doesnt work reliably?
Do you have examples of missed enhancements by the transformer due to this boolean toggle? As I am aware, recursive calls are not used to do transformation/enhancement.
Thank you for verifying this. In a server environment, we are seeing that no transformation recursion is happening as a product of classloading/transforming a class. The current implementation is written in such a way that ALL recursion would be blocked and I do not know of any outstanding issues with that behavior.
I agree that both implementations serve the same needs. However, I prefer the boolean implementation because it maintains the original behavior of excluding ALL possible recursive calls, on the same Thread. If there is an issue with this implementation, I don't see why that can't be fixed in a separate issue that documents THAT issue. |
@dazey3 yep, the one mentionned on the list and the PRs: tmploader==apploader. The toggle just disable enhancement under some setups (close to openliberty ones where the load class goes into the apploader which implements itself the transformer). That said you have a very good point that JVM instrumentation does not need that since it skips by itself the nested enhancement so seems it is only appearing on custom classloaders in app servers (like tomcat or openliberty). So overall it seems it means that it is an application server bug that this boolean is needed. Regarding the exclude list vs boolean: the exclude list makes the behavior deterministic vs the boolean depends which classes where touched or not before which is way less deterministic and error prone so between both I prefer the exclude list to at least guarantee the understanding is unique. However, in regards to previous point it seems we should just drop this as a default behavior and use a config to have a nested exclusion for buggy servers for compatibility only. Concretely it would be something like: if "openjpa.agent.preventNestedTransformations=true" then we wouldnt use PCClassFileTransformer but HarnessedPCClassFileTransformer which would fully delegate to PCClassFileTransformer but transform methods would be wrapped in with this boolean:
This way we handle buggy impls with a flag but our default is sane and aligned on the JVM. Long term we can report the bugs in the app servers and make them properly fix it (it is not limited to openjpa). |
Perhaps you can explain what you mean here. How does the toggle disable enhancement under "some" setups? The toggle only disables reentrance, but it does it indiscriminately.
Please explain.
Not sure if I understand what you mean here. What is the server bug? |
@dazey3 oki, the fact you can't be in transform from transform is only true for a javaagent called from defineClassX, not from a custom classloader handling the transformation itself so if you use such boolean then you just disable the enhancement for potentially other classes usince container entity manager (which are used in standalone too btw). Letting tranform trigger another transform can be considered - or not by implementors - as a bug, but it also means we must support it so not use a toggle. So overall I'm certain exclude list works while a SPI is not "unexpected" or you use plain OpenJPA but there are several cases the toggle can break the environment in hybrid - SE/EE-light but using container JPAAPI - but existing (common actually) envs. |
…eplace it by a minimal exclusion list
9e665c0
to
1c14bc2
Compare
I am not following what your talking about. The current "toggle" implementation blocks reentrant transformation (transform triggering another transform) on the same org.apache.openjpa.enhance.PCClassFileTransformer instance. I don't see how the ClassLoader is involved in this function of the
I don't feel like you have explained why "we must support" reentrance nor why "we must support it so not use a toggle". Is there an existing bug that describes reentrance occurring and breaking some usecase? The toggle appears to block ALL reentrance so I don't see how such a bug could exist, but if if there is, please direct me to it or explain it in detail |
Transformers only make sense for classloaders and classloaders are the one calling transformers - either in loadclass for custom classloaders and defineclass in jvm one with agent. Issue is when loading transformer classes from the transform method (will end in classnotfound or noclassdeferror if not excluded) so one or two classes must be excluded. Everything else is a perf optim. |
up, i'd like it to be fixed for 3.1.3 current options are:
I'm still in favor of 2 since 1 is hard to justify in all cases I can see - indeed config to add but I can handle that. |
up, forgot to mention option 3. don't protect over this since it shouldn't happen in several PCClassFileTransformer cases (all but agent one?). Otherwise if protection is still very desired I'm still in favor of 2 which is overall saner IMHO and align on the classfiletransformer behavior of the system it runs into instead of assuming it (it is not the same for custom classloaders and agent from what we reviewed). |
Up, still an issue with 3.2.0 |
Follow up PR of #64
Goal is to let people test this proposed fix to ensure we didn't miss anything in the analyzis.