Classpath stability for tests #261

seancorfield · 2022-11-14T22:42:15Z

Describe the bug
The classpath that poly test calculates is not stable/reproducible across machines, leading to unpredictable behavior of tests across multiple machines.

To Reproduce
I added debug output to the test orchestrator to print all-paths so that I could see the order being different on multiple machines. The Jetty libraries have changed their group/artifact over time but retained some of the same class names which means that a non-deterministic ordering of the classpath means that you can have tests pass on one machine but fail on another due to that ordering cause a different artifact to be found "first" on the classpath and an incompatible class used.

Expected behavior
tools.deps.alpha uses a specific sorting algorithm to ensure a reproducible classpath across all environments -- the ideal would be Polylith producing that exact same order (this is tricky because poly test has to construct a projection of the :test alias across all the bricks in order to construct the "basis"). Not so ideal but acceptable would at least be for the order of the classpath produced by poly test to be deterministic. That means not using sets or hash maps to store elements of the classpath.

The text was updated successfully, but these errors were encountered:

tengstrand · 2022-12-08T13:21:30Z

This should be solved by #263 @seancorfield I think.

seancorfield · 2022-12-08T16:31:12Z

No. This is not fixed. Please reopen it.

tengstrand · 2023-03-30T12:14:18Z

I now sort the libs (line 17) in the issue-261 branch:

polylith/components/test-runner-orchestrator/src/polylith/clj/core/test_runner_orchestrator/core.clj

Lines 15 to 18 in ab63192

    
           (defn resolve-deps [{:keys [name] :as project} settings is-verbose color-mode] 
        
             (try 
        
               (sort (into #{} (mapcat #(-> % second :paths)) 
        
                               (deps/resolve-deps project settings is-verbose)))

I'm not sure if that solves your problem @seancorfield. I couldn't figure out how to let the poly tool produce the exact same order as tools.deps (even though you tried to explain it).

seancorfield · 2023-03-30T16:44:57Z

That may make things worse -- but it does at least make things predictable (even if it ends up being predictably wrong).

I want to keep this open until I can spend some time looking into this problem. It's not going to affect most people, with smaller projects, but this sort of thing can have a really bad impact on projects of the scale we have at work.

ieugen · 2023-05-05T11:29:31Z

I think I've also encountered this bug.

In my case I get this error while running clojure -M:poly test

Caused by: java.sql.SQLException: No suitable driver
 at java.sql.DriverManager.getDriver (DriverManager.java:299)
    com.zaxxer.hikari.util.DriverDataSource.<init> (DriverDataSource.java:106)
    com.zaxxer.hikari.pool.PoolBase.initializeDataSource (PoolBase.java:326)
    com.zaxxer.hikari.pool.PoolBase.<init> (PoolBase.java:112)
    com.zaxxer.hikari.pool.HikariPool.<init> (HikariPool.java:93)
    com.zaxxer.hikari.HikariDataSource.getConnection (HikariDataSource.java:112)
    next.jdbc.connection$make_connection.invokeStatic (connection.clj:452)
    next.jdbc.connection$make_connection.invoke (connection.clj:436)
    next.jdbc.connection$eval22475$fn__22476.invoke (connection.clj:481)
    next.jdbc.protocols$eval22205$fn__22206$G__22196__22213.invoke (protocols.clj:25)
    next.jdbc$get_connection.invokeStatic (jdbc.clj:168)
    next.jdbc$get_connection.invoke (jdbc.clj:147)
    dre.semmed_ds.interface_test$fn__93117.invokeStatic (interface_test.clj:17)

I do have com.h2database/h2 {:mvn/version "2.1.214"} added to component class path, :test classpath, dev project classpath and :test classpath and also project test classpath .

I still get the above error.

Inside dev repl I can create the component that creates a hikari datasource using next jdbc component .
I can also run the code for the test and it works.

tengstrand · 2023-05-05T16:52:33Z

Is it possible to reproduce the problem in a smaller public repo and share it with me @ieugen?

ieugen · 2023-05-08T14:54:48Z

@tengstrand : I added a repro in this PR ieugen/poly-rcf#1 .

I run clojure -M:poly test .
I have component added to 2 projects.
First project tests are ran ok.
Second project fails with missing class.

Big questions are:

Why are tests ran 2 times for the same component?
Why is the classpath different for second run?

wkok · 2024-03-02T13:32:24Z

Any update on this issue?

We're encountering the same problem on some machines when running poly tests and at first we thought it's related to a specific dependency's problem as we've logged here:
https://discuss.xtdb.com/t/arrow-vector-conflicting-dependency/380/2

but then I discovered this issue which might explain better what we're seeing.

tengstrand · 2024-03-02T18:24:41Z

I have some ideas on how to solve this, so will have a look.

seancorfield · 2024-03-02T18:58:01Z

The reason we haven't spent any time on this at work (since we were the most affected and the first to be affected) is that we switched to the external test runner and, combined with a small change to our dependencies (which I no longer remember, sorry), the problem went away.

I am concerned that any changes to the classpath order at this point could break someone's setup, i.e., making it worse for some people, even if it makes it better for others.

Part of the problem here is that poly test uses the projects deps.edn but then augments the :test alias with parts of the :test alias from each of the bricks' deps.edn -- this is inherently different to how tools.deps does things and so there's no "global" view of dependencies. Artificially ordering the classpath is almost certainly the wrong approach (effectively what the suggested change of March 30th, 2023 proposed).

Polylith's decision to have :test aliases in bricks with each brick's test-only dependencies is part of the cause here but there's no reliable way to combine those into the projects dependencies (and then use tools.deps just once on the result) and guarantee that you'll a sane behavior, unfortunately.

I've continued to think about this problem on and off over the past year and I still don't have a good answer. The closest I've come is to make a fork of tools.deps that supports transitive alias usage across :local/root dependencies, since that's really the only way to guarantee the behavior that poly test is going to need (and that's non-trivial -- I've taken a few runs at it, unsuccessfully).

tengstrand · 2024-03-03T18:34:56Z

I have copied (and slightly changed) the poly-rcf example to examples/poly-rcf in the polylith repo, that also includes all the instructions on how to test the code and what changes that have been made. I also wanted to see if the code worked as a monolith, so I created a monolith directory that contains the same code and libraries.

I still don't understand why the default test runner doesn't behave exactly the same on all machines. I get that it doesn't behave the same as tools.deps, but it should at least behave the same every time. So if it works on one machine, it should also work on another machine? Maybe you have some ideas?

tengstrand · 2024-03-06T07:11:26Z

The main problem was that the tests didn't run in sufficient isolation, but when I switched to the external test runner it worked! I have summarized what I did here.

seancorfield · 2024-03-06T17:13:35Z

Good to know that the external test runner is a potential solution here.

One extra thing it has allowed us to do (and a blog post is coming on this at some point): merge our poly test into our build.clj process. Previously (0.2.18), we used to spawn a new Java process in our build that ran poly test as a command -- we'd originally done this to isolate our tests from our build deps -- then with the external test runner, we ended up with three processes running: build, poly test, and the tests themselves. With 0.2.19, we've added the poly dependency to our :build alias and the poly API allows us to kick off poly test (essentially) in the same process as build, and then our tests run in a completely isolated process.

The new poly API also allows us to compute changed projects in build instead of spawning a Java process to run the poly command so that has speeded things up a bit too.

issue #261 * Added examples/poly-rcf. * 0.2.20-SNAPSHOT #6.

tengstrand · 2024-03-07T05:12:08Z

I was thinking that it could be time to get rid of these isolation problems. My idea is to copy the external test runner code into the polylith repo (and make necessary modifications). What do you think @seancorfield? Then we could either make this the default test runner (in that case we need to discuss this with the community). Or we could set it in the global test config when a workspace is created. When we implement a test runner for ClojureScript (when we have support for cljs) we could make it isolated too, and everything would be consistent. @furkan3ayraktar @imrekoszo

seancorfield · 2024-03-07T05:22:58Z

I'm torn. I think encouraging multiple test runner implementations is a good thing. I don't think favoring one over another is ideal.

For most projects, Polylith's in-process test runner is just fine. It was fine for us for quite a while, even at the scale (complexity) we reached.

Some people are going to want to use Kaocha -- and maybe they'd want the option of an in-process vs external test running (the latter is not available as I understand it?).

There are also all sorts of controls folks might want to apply to their test running process: should it include src as well as test (I recently created an issue about that in my external test runner), should tests be filtered by metadata somehow, e.g,. the ability to mark tests as "integration" (vs unit) so you could ask poly test to only run certain types of tests.

Basically, there are all sorts of "knobs and dials" that test runners provide that -- currently -- poly test has no way to tap into. Look at the options for Cognitect's test runner and Kaocha. Somehow, we should provide a way for those sorts of configuration to be easily available to any test runner.

I'd prefer to stay with the current overall machinery but improve the test runner "API" and make it easier for poly test to pass additional (command-line) options through to the underlying test runner. It would be nice if Polylith's documentation around testing talked about how to use the two available test runners, which might encourage others to write their own.

tengstrand · 2024-03-07T06:27:31Z

Good points!

imrekoszo · 2024-03-07T10:27:41Z

I agree with Sean's last comment here, here are my thoughts:

Kaocha -- and maybe they'd want the option of an in-process vs external test running (the latter is not available as I understand it?)

Yep, I have been wanting to add support for this to polylith-kaocha but I haven't got to it so far.

there are all sorts of "knobs and dials" that test runners provide

This is very true. There is a good blog post from back in 2018 about the various responsibilities around running tests, and how Kaocha chose to address a subset of those.

I had made a related comment on another issue here in 2021 and I still agree with the main points in that:

poly should be responsible for
- calculating project classpaths
- determining which projects/bricks are affected by changes since the stable point
test runners should be responsible for
- discovering/running/reporting etc. tests given a classpath

A default, built-in test runner helps a lot with getting started with a workspace (and also by serving as an example for 3rd party runners). Whether that's an in-process or external (or there are built-in options for both) doesn't really matter IMO.

BTW, following up on Sean's suggestion about perhaps forking tools.deps for classpath calculation, there's a lib for classloader management - perhaps it can be helpful: https://github.com/lambdaisland/classpath

Edit: now that I look at the api of that lib, it might not be that useful after all

tengstrand · 2024-03-07T15:26:15Z

Good input @imrekoszo.

tengstrand · 2024-03-07T15:38:01Z

I managed to get the poly-rcf example repo to work (see the comments in the readme). Have you had time to look at you problems, and take similar actions that I did @ieugen @wkok ?

wkok · 2024-03-07T17:39:03Z

I managed to get the poly-rcf example repo to work (see the comments in the readme). Have you had time to look at you problems, and take similar actions that I did @ieugen @wkok ?

Hey @tengstrand we have since swapped out dependencies in our project (due to other reasons) and are not seeing this problem anymore with our current set of dependencies when running poly tests.

ieugen · 2024-03-07T21:53:30Z

Have you had time to look at you problems, and take similar actions that I did

Not yet. I am busy with some other work. I might not be able to do that in the near future.
I will post here when I do.

tengstrand · 2024-03-08T06:38:57Z

I created issue 448. Please take a look and see if it looks reasonable @seancorfield @ieugen.

seancorfield · 2024-03-08T18:26:27Z

I've responded there but I'm not sure why you've created that issue since it doesn't appear to solve any issue raised here (or anywhere else)?

tengstrand · 2024-03-09T06:22:10Z

It's mentioned by @ieugen here:
"Polylith does not pass source paths to tests so to make tests unde src/ work, you will have to add them to your :extra-paths in :test alias. Hopefully this issue is going to be fixed and RCF will work out of the box, just by providing the JVM option.".

tengstrand · 2024-03-09T07:38:24Z

I added the link to issue 448 because it was mentioned in the workspace that reproduced this issue. But yes, you are right Sean that it wasn't directly related to this issue.

tengstrand closed this as completed Dec 8, 2022

furkan3ayraktar reopened this Dec 8, 2022

tengstrand added the improvement Not a bug but an improvement to overall user experience label Mar 13, 2023

ieugen added a commit to ieugen/poly-rcf that referenced this issue May 8, 2023

Added sample project, base component to reproduce polyfy/polylith#261

4990590

ieugen added a commit to ieugen/poly-rcf that referenced this issue May 8, 2023

Added sample project, base component to reproduce polyfy/polylith#261

6d5fe28

ieugen mentioned this issue May 8, 2023

Poly 261 - Classpath stability for tests ieugen/poly-rcf#1

Merged

tengstrand mentioned this issue Mar 6, 2024

Investigated issue 261 #447

Merged

tengstrand added a commit that referenced this issue Mar 6, 2024

Investigated issue 261 (#447)

408ef27

issue #261 * Added examples/poly-rcf. * 0.2.20-SNAPSHOT #6.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Classpath stability for tests #261

Classpath stability for tests #261

seancorfield commented Nov 14, 2022

tengstrand commented Dec 8, 2022

seancorfield commented Dec 8, 2022

tengstrand commented Mar 30, 2023

seancorfield commented Mar 30, 2023

ieugen commented May 5, 2023

tengstrand commented May 5, 2023

ieugen commented May 8, 2023

wkok commented Mar 2, 2024

tengstrand commented Mar 2, 2024

seancorfield commented Mar 2, 2024 •

edited

Loading

tengstrand commented Mar 3, 2024

tengstrand commented Mar 6, 2024

seancorfield commented Mar 6, 2024

tengstrand commented Mar 7, 2024 •

edited

Loading

seancorfield commented Mar 7, 2024

tengstrand commented Mar 7, 2024

imrekoszo commented Mar 7, 2024 •

edited

Loading

tengstrand commented Mar 7, 2024

tengstrand commented Mar 7, 2024

wkok commented Mar 7, 2024

ieugen commented Mar 7, 2024

tengstrand commented Mar 8, 2024

seancorfield commented Mar 8, 2024

tengstrand commented Mar 9, 2024

tengstrand commented Mar 9, 2024

Classpath stability for tests #261

Classpath stability for tests #261

Comments

seancorfield commented Nov 14, 2022

tengstrand commented Dec 8, 2022

seancorfield commented Dec 8, 2022

tengstrand commented Mar 30, 2023

seancorfield commented Mar 30, 2023

ieugen commented May 5, 2023

tengstrand commented May 5, 2023

ieugen commented May 8, 2023

wkok commented Mar 2, 2024

tengstrand commented Mar 2, 2024

seancorfield commented Mar 2, 2024 • edited Loading

tengstrand commented Mar 3, 2024

tengstrand commented Mar 6, 2024

seancorfield commented Mar 6, 2024

tengstrand commented Mar 7, 2024 • edited Loading

seancorfield commented Mar 7, 2024

tengstrand commented Mar 7, 2024

imrekoszo commented Mar 7, 2024 • edited Loading

tengstrand commented Mar 7, 2024

tengstrand commented Mar 7, 2024

wkok commented Mar 7, 2024

ieugen commented Mar 7, 2024

tengstrand commented Mar 8, 2024

seancorfield commented Mar 8, 2024

tengstrand commented Mar 9, 2024

tengstrand commented Mar 9, 2024

seancorfield commented Mar 2, 2024 •

edited

Loading

tengstrand commented Mar 7, 2024 •

edited

Loading

imrekoszo commented Mar 7, 2024 •

edited

Loading