Description
In the CI Benchmark workflow, the benchmark step always reports success, even when the benchmarks fail.
This happens because asv run … always writes to both standard output and standard error, so the mere presence of stderr output would make the CI job look like a failure regardless of whether the benchmarks passed or failed. For this reason, the return value from the ASV application (0: success, non-zero: failure) is ignored; instead, the output is stored in a log, which is then searched to decide whether some error occurred.
Evidence
https://github.com/tardis-sn/tardis/actions/runs/9033563191/job/24824186527
The workflow step “Run benchmarks for last 5 commits if not PR” passed, even though none of the benchmarks were executed and the run reported an error.
· Discovering benchmarks
·· Uninstalling from mamba-py3.12
·· Building 7e7069a6 <master> for mamba-py3.12
·· Installing 7e7069a6 <master> into mamba-py3.12
…
File "/home/runner/work/tardis/tardis/.asv/env/0a7f40a14f159f43256c541ac3f740f8/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 995, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/home/runner/work/tardis/tardis/benchmarks/benchmark_base.py", line 18, in <module>
from tardis.transport.montecarlo import NumbaModel, opacity_state_initialize
ImportError: cannot import name 'NumbaModel' from 'tardis.transport.montecarlo' (/home/runner/work/tardis/tardis/.asv/env/0a7f40a14f159f43256c541ac3f740f8/lib/python3.12/site-packages/tardis/transport/montecarlo/__init__.py)
·· Failed to build the project and import the benchmark suite.
Actual source code
.github/workflows/benchmarks.yml
- name: Run benchmarks for last 5 commits if not PR
  if: github.event_name != 'pull_request_target'
  run: |
    git log -n 5 --pretty=format:"%H" >> tag_commits.txt
    asv run HASHFILE:tag_commits.txt | tee asv-output.log
    if grep -q failed asv-output.log; then
      echo "Some benchmarks have failed!"
      exit 1
    fi
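A note on this step: because the output of asv run is piped into tee, the exit status the shell reports for that line is normally tee's, not asv's (unless pipefail is enabled), which is another way the return value gets lost. A minimal bash sketch, for illustration only, of how asv's own exit code could still be captured:

```bash
# Illustration only: recover asv's exit code even though its output is piped
# into tee. PIPESTATUS is bash-specific; asv-output.log is the file name
# already used in the workflow step above.
asv run HASHFILE:tag_commits.txt | tee asv-output.log
ASV_EXIT_CODE="${PIPESTATUS[0]}"   # exit code of asv run, not of tee
echo "asv exited with ${ASV_EXIT_CODE}"
```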
Conclusion / Proposal
Some of these proposals could be combined; others could be applied separately.
Improve the grep command grep -q failed asv-output.log (a possible variant is sketched after these proposals).
Research whether adding the ignore-case flag resolves the problem.
-i, --ignore-case: ignore case distinctions in patterns and data.
Add more search words that match other messages reflecting an error.
Research, in the ASV documentation or source code, whether we can catch a specific return code for failing benchmarks.
In my brief experiments with ASV, the return codes are:
Executed: time asv run --verbose --show-stderr master^\!; echo "${?}".
0: Success.
1: Fatal error; ASV could not build the benchmarks.
2: Error during the execution of the benchmarks.
I don't have more ideas; so far, this looks like a good start.
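For the grep-related proposals above, a possible variant is sketched below. It is untested, and the extra pattern "error" is an assumption about what ASV prints, not a verified list of its messages.

```bash
# Sketch only: broaden the existing check by ignoring case and matching
# more than one error word. The word list is an assumption, not taken
# from ASV's documented output.
if grep -i -q -E "failed|error" asv-output.log; then
    echo "Some benchmarks have failed!"
    exit 1
fi
```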
Erratum
If all the benchmarks run well, the returned error code is 0 (success). For this reason, I think we don't need the condition if grep -q failed asv-output.log; then. But we need to explore more cases to find ways to break it.
Update 01 [2024-05-10 13:29 CST]
Executed: time asv run --verbose --show-stderr --bench transport_montecarlo_opacities HASHFILE:tag_commits.txt; echo "${?}". It returned code 2 (runtime error).
A possible solution is to check whether there are results to publish in the folder .asv/results/MACHINE_ID/. On my computer, I got this:
Command: time asv run --verbose --show-stderr --bench transport_montecarlo_opacities HASHFILE:tag_commits.txt; echo "${?}"
Result:
ls -lh .asv/results/9fb9a8132f7e/
-rw-r--r-- 1 root root 12213 May 10 13:01 303b0d39-mamba-py3.12.json
-rw-r--r-- 1 root root 1610 May 10 13:01 328ec77d-mamba-py3.12.json
-rw-r--r-- 1 root root 20967 May 10 12:57 7e7069a6-mamba-py3.12.json
-rw-r--r-- 1 root root 12219 May 10 12:58 8d70aaa5-mamba-py3.12.json
-rw-r--r-- 1 root root 12204 May 10 13:00 b668802a-mamba-py3.12.json
-rw-r--r-- 1 root root 205 May 10 12:56 machine.json
If I remove the .asv folder and run ASV with a failing build, the result is:
Command: time asv run --verbose --show-stderr --bench transport_montecarlo_opacities HASHFILE:tag_commits.txt; echo "${?}"
Result:
ls -lh .asv/results/9fb9a8132f7e/
-rw-r--r-- 1 root root 205 May 10 13:23 machine.json
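Based on those two listings, a minimal sketch of a "were results generated?" check, assuming the machine directory can be searched automatically and that machine.json should not count as a result:

```bash
# Sketch only: count result files under .asv/results/, excluding the
# machine.json metadata file seen in the listings above.
RESULT_COUNT=$(find .asv/results -name "*.json" ! -name "machine.json" | wc -l)
if [ "${RESULT_COUNT}" -eq 0 ]; then
    echo "No benchmark results were generated!"
    exit 1
fi
```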
Conclusion
With this information and these experiments, the best solution looks like this:
Create a condition that checks the returned error code. If it is 1 (build failure), mark this job in the workflow as failed.
If the error code is 2 (runtime error), check whether any benchmark results were generated; if none were, return exit 1 to fail the job.
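Putting both conditions together, a minimal sketch of what the run block of the workflow step could become. The exit-code meanings are the ones observed in the experiments above; the PIPESTATUS capture and the find-based results check are the hypothetical snippets sketched earlier, not ASV-provided features.

```bash
# Sketch only, not a tested replacement for the current workflow step.
git log -n 5 --pretty=format:"%H" >> tag_commits.txt

asv run HASHFILE:tag_commits.txt | tee asv-output.log
ASV_EXIT_CODE="${PIPESTATUS[0]}"   # asv's own exit code (bash-specific)

if [ "${ASV_EXIT_CODE}" -eq 1 ]; then
    # 1: fatal error, ASV could not build the benchmarks
    echo "ASV failed to build the benchmarks!"
    exit 1
elif [ "${ASV_EXIT_CODE}" -eq 2 ]; then
    # 2: runtime error; fail only if no results were produced at all
    RESULT_COUNT=$(find .asv/results -name "*.json" ! -name "machine.json" | wc -l)
    if [ "${RESULT_COUNT}" -eq 0 ]; then
        echo "No benchmark results were generated!"
        exit 1
    fi
fi
```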