Mull cannot run CTest's tests #794

Closed
ligurio opened this issue Dec 19, 2020 · 19 comments

@ligurio
Contributor

ligurio commented Dec 19, 2020

CMake can generate test-driver binaries from C functions, and these binaries support a special protocol for running test cases [1].
To run a single test case, execute the binary and enter the number of the test case:

(venv) sergeyb@pony:~/sources/FreeRDP/build$ ./Testing/TestPipe 
Available tests:
  0. TestPipeCreatePipe
  1. TestPipeCreateNamedPipe
  2. TestPipeCreateNamedPipeOverlapped
To run a test, enter the test number: 1
Server ReadFile: 32 bytes
Client ReadFile: 32 bytes
(venv) sergeyb@pony:~/sources/FreeRDP/build$ 

or specify the name of the test case to run:

(venv) sergeyb@pony:~/sources/FreeRDP/build$ ./Testing/TestPipe -R TestPipeCreateNamedPipe
Server ReadFile: 32 bytes
Client ReadFile: 32 bytes
(venv) sergeyb@pony:~/sources/FreeRDP/build$ 

There was also an MR to run all test cases at once [2], but it was not merged into master.

The FreeRDP project actively uses tests written to run with CTest. Under Mull the test fails with a timeout, because Mull doesn't know that the test binary expects the number of a test case as input.

$ git clone https://github.com/FreeRDP/FreeRDP/
$ cmake -DCMAKE_C_FLAGS="-fembed-bitcode -g -O0" -DCMAKE_CXX_FLAGS="-fembed-bitcode -g -O0" -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON  -DCMAKE_C_COMPILER="/usr/bin/clang" -DCMAKE_CXX_COMPILER="/usr/bin/clang++" -DBUILD_TESTING=ON ..
$ make -j
$ mull-cxx -test-framework=CustomTest --ld-search-path=/lib/x86_64-linux-gnu --compdb-path=compile_commands.json  --reporters=Elements ./Testing/TestAsn1  
[info] Extracting bitcode from executable (threads: 1)
       [################################] 1/1. Finished in 2ms
[info] Loading bitcode files (threads: 8)
       [################################] 13/13. Finished in 11ms
[info] Compiling instrumented code (threads: 8)
       [################################] 13/13. Finished in 10ms
[info] Loading dynamic libraries (threads: 1)
       [################################] 1/1. Finished in 1ms
[info] Searching tests (threads: 1)
       [################################] 1/1. Finished in 0ms
[info] Preparing original test run (threads: 1)
       [################################] 1/1. Finished in 2ms
[info] Running original tests (threads: 1)
[warning] Original test failed
test: main
status: Timedout
stdout: 'Available tests:
  0. TestAsn1Module
  1. TestAsn1Encoder
  2. TestAsn1Decoder
  3. TestAsn1Encode
  4. TestAsn1Decode
  5. TestAsn1String
  6. TestAsn1Integer
  7. TestAsn1Compare
  8. TestAsn1BerEnc
  9. TestAsn1BerDec
 10. TestAsn1DerEnc
 11. TestAsn1DerDec
To run a test, enter the test number: '
stderr: ''

       [################################] 1/1. Finished in 3014ms
[info] No mutants found. Mutation score: infinitely high
[info] Total execution time: 3077ms
[1] https://gitlab.kitware.com/cmake/community/-/wikis/doc/ctest/Testing-With-CTest#running-individual-tests
[2] https://gitlab.kitware.com/cmake/cmake/-/merge_requests/3661#note_617780
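
A minimal sketch of driving such a binary without the interactive prompt, assuming the driver reads the test number from standard input as the transcript above suggests. `run_case` is a hypothetical helper name, not part of CMake or Mull, and whether a wrapper like this is enough to make Mull's original test run pass has not been verified here.

```shell
# Run one test case of a CMake-generated test driver non-interactively
# by piping the test number to the prompt.
# run_case is a hypothetical helper, not part of CMake or Mull.
run_case() {
    driver=$1   # path to the driver binary, e.g. ./Testing/TestPipe
    number=$2   # test number as listed under "Available tests:"
    printf '%s\n' "$number" | "$driver"
}

# Example (assumes the FreeRDP build tree from the report above):
#   run_case ./Testing/TestPipe 1
```

The helper returns the exit status of the driver, so it can be used directly in a script or CI step.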
@ligurio
Contributor Author

ligurio commented Apr 29, 2021

There was also an MR to run all test cases at once [2], but it was not merged into master.

In the upcoming CMake release it will be possible to run all test cases in a single test run: Kitware/CMake@3f6ff4b

(I think this will be helpful if Mull's authors decide to support tests generated by CMake.)

@correaa

correaa commented Nov 24, 2024

What is the correct way to use mull with cmake/ctest?

For example, is this OK?

$ CXX=clang++-11 cmake .. -DCMAKE_CXX_FLAGS="-O1 -fexperimental-new-pass-manager -fpass-plugin=/usr/lib/mull-ir-frontend-11 -g -grecord-command-line" -DCMAKE_TEST_LAUNCHER="mull-runner-11;--strict;--ld-search-path=/lib/x86_64-linux-gnu"

It is not clear whether the subsequent call to ctest actually does anything with Mull:

ctest -j 2 --output-on-failure --verbose

(Mull is either not run or nothing is detected.)

Nor do I understand whether the reports are relevant when I run Mull on individual files, since only one test at a time is checked instead of all of them together.

https://gitlab.com/correaa/boost-multi/-/jobs/8455957522#L1519

In particular, I don't understand what happens when the tests for a library are distributed across several .cpp/executable files.
Does Mull combine the tests when checking whether a mutant survives? If not, a single small test will never cover all the possible mutations of a library.
Please advise.

@AlexDenisov
Member

Hi @correaa, thank you for the feedback and questions!

What is the correct way to use mull with cmake/ctest?

There is no easy way to integrate Mull into ctest seamlessly.

It is not clear if the subsequent call to ctest actually does something with mull

No, at this point ctest runs the test targets just as normal programs and Mull is not involved in any way.


I just implemented a standalone reporter tool which works with an SQLite report, and which enables use cases like ctest and multiple test targets.

You can find a usage example in #1073.

I'll release a new version with proper packages later this week, but if you want to give it a try you can pick one of the nightly builds.

The general idea is as follows:

  1. Run mull against each test target and generate an SQLite report, e.g. mull-runner-15 -allow-surviving -reporters SQLite -report-name mull_report
  2. Run reporter against the sqlite report, e.g. mull-reporter-15 -reporters IDE mull_report.sqlite

Each mull-runner result is accumulated in mull_report.sqlite; mull-reporter then merges/deduplicates the mutants and produces the full report.
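
The two steps above can be sketched as a small shell function. This is only an illustration built from the commands quoted in the list: `aggregate_mull_report`, `MULL_RUNNER`, and `MULL_REPORTER` are hypothetical names, and passing the test binary as the last argument to the runner is an assumption based on the other invocations in this thread.

```shell
# Tool names are overridable so that another Mull version can be swapped in.
# MULL_RUNNER, MULL_REPORTER and aggregate_mull_report are hypothetical names.
MULL_RUNNER=${MULL_RUNNER:-mull-runner-15}
MULL_REPORTER=${MULL_REPORTER:-mull-reporter-15}

aggregate_mull_report() {
    # Step 1: run Mull against each test target; every run adds its
    # results to the same mull_report.sqlite file.
    for target in "$@"; do
        "$MULL_RUNNER" -allow-surviving -reporters SQLite \
            -report-name mull_report "$target" || return 1
    done
    # Step 2: let the reporter merge the accumulated results.
    "$MULL_REPORTER" -reporters IDE mull_report.sqlite
}

# Example (assumes the test binaries live in test/ and end in .x):
#   aggregate_mull_report test/*.x
```

Because the report accumulates across runs, it is probably worth deleting a stale mull_report.sqlite before starting a fresh analysis.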

I hope it helps, please do not hesitate to reach out if you have any questions/feedback.

@AlexDenisov
Member

Added a short tutorial here https://mull.readthedocs.io/en/0.24.0/tutorials/CTestIntegration.html

Feel free to reopen or create a new issue if there is a better way to integrate Mull into CTest.

@correaa

correaa commented Dec 19, 2024

Thanks Alex for adding this material.

I am wondering if there is a flaw in the way you show how to integrate with CTest.
I think that, for each executable, all the tests have to be run as well.

I have a template library, and it seems that the mutations generated for each executable (each test) depend only on the part of the code that is being tested.
This means that there will be a lot of surviving mutants that survive simply because not all the tests are run against the mutations derived from a given test.

I am very confused, but this is what I am doing now and it seems to be working:
https://gitlab.com/correaa/boost-multi/-/jobs/8683692588#L1516

This is probably doing more work than needed (the number of executions is n^2), but I think it covers more (all?) mutations. A key point is to execute ctest with --stop-on-failure.

Maybe I am compiling in the wrong way, or maybe there is something else to take into account for libraries that use template code?

@AlexDenisov
Member

I haven't looked into your setup deeply yet, but it sounds like your understanding and observations are correct.

I am very confused

Could you please expand a bit on which part is confusing?

@correaa

correaa commented Dec 19, 2024

I was just trying to say that I wasn't sure whether I was doing something technically wrong (command line options) or whether my logic was flawed.

Thinking in terms of mutation testing is not yet intuitive to me; it is like talking in "double negatives" :)

@correaa

correaa commented Dec 19, 2024

This is the full setup:

    - CXX=clang++-16 cmake .. -DCMAKE_CXX_FLAGS="-O1 -fpass-plugin=/usr/lib/mull-ir-frontend-16 -g -grecord-command-line -fprofile-instr-generate -fcoverage-mapping"
    - cmake --build . --parallel 2 || cmake --build . --parallel 1 --verbose
    - ctest -j 2 --output-on-failure --verbose
    - cd test
    - ls *.x | xargs -n 1 sh -c 'echo $0 && ((mull-runner-16 --ld-search-path=/usr/lib/x86_64-linux-gnu $0 -test-program=ctest -- -j2 --stop-on-failure) || exit 255)'

https://gitlab.com/correaa/boost-multi/-/blob/clang17-mull/.gitlab-ci.yml?ref_type=heads#L228-232

@AlexDenisov
Member

Yeah, it makes sense this way. Can you run an individual test binary instead of using ctest?

Something like mull-runner-16 --ld-search-path=/usr/lib/x86_64-linux-gnu $0 should work just fine I think, unless the test suite can only be run under ctest of course.

@correaa

correaa commented Dec 19, 2024

ok, I am doing that already. What about the rest of the line? (ctest)

mull-runner-16 --ld-search-path=/usr/lib/x86_64-linux-gnu $0 -test-program=ctest -- -j2 --stop-on-failure

@AlexDenisov
Member

You don't need ctest in this case if each test can run as a standalone executable.

It should work with ctest just fine as well, but, as you mentioned, it would be redundant and quadratic.

I hope it makes sense, sorry if that's still confusing 🙌
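
To make the "quadratic" remark concrete, here is a rough count of test-binary executions under the two approaches. The numbers are purely hypothetical (10 test executables, 27 mutants found per executable, assumed uniform), and `run_counts` is an illustrative helper, not a Mull feature. The ctest figure is an upper bound, since --stop-on-failure ends a run as soon as one test kills the mutant.

```shell
# run_counts N M: N test executables, M mutants per executable (hypothetical, uniform).
run_counts() {
    n=$1
    m=$2
    # Standalone: each mutant is checked by running only its own executable.
    echo "standalone: $((n * m)) test-binary runs"
    # Via ctest: each mutant triggers up to all N executables.
    echo "via ctest: $((n * m * n)) test-binary runs"
}

run_counts 10 27
# standalone: 270 test-binary runs
# via ctest: 2700 test-binary runs
```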

@correaa

correaa commented Dec 19, 2024

What I am finding is that if I don't run ctest for each executable, I get a lot of surviving mutants. I think this is because each executable on its own doesn't test the whole library.

Can you help me confirm this?

Maybe there is something wrong with my setup also.

@correaa

correaa commented Dec 21, 2024

The diagnosis is fairly simple. With my library, if I run this naive line, I get surviving mutants:

(ls test/*.x | xargs -n 1 sh -c 'echo $0 && ((mull-runner-16 $0) || exit 255)')
[info] Running mutants (threads: 12)
       [################################] 27/27. Finished in 12334ms
[info] Survived mutants (7/27):
/home/correaa/boost-multi/include/boost/multi/array_ref.hpp:770:28: warning: Survived: Replaced + with - [cxx_add_to_sub]
                ns_ = xs_.from_linear(nn + n);
                                         ^

But if I run this line instead there are no survivors,

(ls test/*.x | xargs -n 1 sh -c 'echo $0 && ((mull-runner-16 $0 -test-program=ctest -- -j2 --stop-on-failure) || exit 255)')

My interpretation is that the runner informs each test executable of only some of the mutants, but all the tests still need to be run for each set of mutations associated with a single executable, and no single executable is testing all the mutants.

This is the full configuration:

git clone https://github.com/correaa/boost-multi.git
cd boost-multi

mkdir -p .build.clang++.mull
cd .build.clang++.mull

CXX=clang++-16 cmake .. -DCMAKE_CXX_FLAGS="-O1 -fpass-plugin=/usr/lib/mull-ir-frontend-16 -g -grecord-command-line -fprofile-instr-generate -fcoverage-mapping"
cmake --build . 

ls test/*.x | xargs -n 1 sh -c 'echo $0 && ((mull-runner-16 $0 -test-program=ctest -- -j2 --stop-on-failure) || exit 255)'

Please advise.

@AlexDenisov
Member

and no single executable is testing all the mutants

@correaa, this is precisely what's happening there. You can still run things separately, collect intermediate results, and then run analysis, like in this example https://mull.readthedocs.io/en/0.24.0/tutorials/MultipleTestTargets.html

@correaa

correaa commented Dec 29, 2024

ok, I am going to try it.

What I am trying to figure out is how to "decorate" the run commands executed by ctest so that each one is invoked through mull-runner.

I couldn't make that work. It is not simple: https://stackoverflow.com/questions/79207944/adding-a-prefix-command-to-all-ctest-executions

@correaa

correaa commented Dec 30, 2024

I am afraid I am not using the database method correctly; I still get surviving mutants, even after aggregating the results:

https://gitlab.com/correaa/boost-multi/-/jobs/8743460604#L5351

This is the old method, which works (although it is N^2 and slower):

https://gitlab.com/correaa/boost-multi/-/jobs/8743460603

Could the fact that the code makes heavy use of templates be the cause of this problem?

@AlexDenisov
Member

I am afraid I am not doing things correctly to use the database method, I still get surviving mutants, even after aggregating the results:

https://gitlab.com/correaa/boost-multi/-/jobs/8743460604#L5351

This is the old method that works (although it is N^2 and slower)

https://gitlab.com/correaa/boost-multi/-/jobs/8743460603

Could the fact that the code makes heavy use of templates be the cause of this problem?

This certainly deserves a closer look; I'll dig into it to see where the problem is coming from. It could very well be a bug in Mull: aggregating results is a very new feature, so it hasn't been heavily exercised yet.

@correaa thank you for all the feedback, this is greatly appreciated!

@AlexDenisov
Member

I just picked a random surviving mutant from https://gitlab.com/correaa/boost-multi/-/jobs/8743460604#L5414, applied it manually, and all the tests pass:

diff --git a/include/boost/multi/detail/layout.hpp b/include/boost/multi/detail/layout.hpp
index 5aa94da0..0faa9bfb 100644
--- a/include/boost/multi/detail/layout.hpp
+++ b/include/boost/multi/detail/layout.hpp
@@ -377,7 +377,7 @@ template<> struct extensions_t<1> : tuple<multi::index_extension> {
 			idx = get<0>(this->base()).back();
 			return true;
 		}
-		--idx;
+		++idx;
 		return false;
 	}

It could be that the ctest runs show the wrong results and the mutants you are seeing now really do survive (are not detected).
I'm looking at the other run you shared (https://gitlab.com/correaa/boost-multi/-/jobs/8743460603), and it shows that the original test run also failed due to a timeout. That means the mutant runs are failing as well, so none of the mutants is reported as survived.

Original test failed
status: Timedout

Perhaps I should turn the warning into an actual error to prevent this confusion, sorry about that! 😅
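
Until the warning becomes an error, a CI job could guard against this itself. The sketch below assumes Mull's output has been saved to a log file and simply looks for the "Original test failed" message quoted above; `check_original_run` is a hypothetical helper, not a Mull feature.

```shell
# Fail if a saved Mull log shows that the original (unmutated) test run failed,
# since the mutant results are then not trustworthy.
# check_original_run is a hypothetical helper name.
check_original_run() {
    log=$1   # file containing the captured mull-runner output
    if grep -q 'Original test failed' "$log"; then
        echo "Mull: original test run failed; mutant results are unreliable" >&2
        return 1
    fi
    return 0
}

# Example (the log file name is an assumption):
#   mull-runner-16 test/some_test.x > mull.log 2>&1
#   check_original_run mull.log || exit 1
```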

@correaa

correaa commented Dec 30, 2024

Oops, I shared the CI link before it finished.

Now neither of the two methods works.

It worked before: https://gitlab.com/correaa/boost-multi/-/jobs/8734666344

I am afraid there is nondeterminism in these tests, maybe because of the timeouts.
