server : replace behave with pytest #10416

Merged 17 commits on Nov 26, 2024

Collaborator

@ngxson ngxson commented Nov 19, 2024

Motivation

We already have a test suite for the server that uses the behave framework. While it works well for common use cases, it has some problems:

  • It introduces a lot of boilerplate code. For example, every action or definition must be defined in 2 places: once in steps.py and once in a *.feature file
  • It's complicated to implement typing
  • Poor auto-completion (i.e. copilot can't help in most cases)

Proposed solution

My proposed solution is to switch to pytest, which is a more "mainstream" framework. This reduces the complexity for future contributors, since pytest is quite straightforward to use. Indeed, many parts of this PR were written by VS Code GitHub Copilot.
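As a rough illustration of the point above (a sketch, not code from this PR): in pytest the setup, the request, and the assertion can all live in one plain Python function, instead of being split between steps.py and a *.feature file. The stub server, the /health route, its response body, and the helper name start_stub_server are assumptions made only so the example is self-contained.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real server: answers GET /health only."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_stub_server():
    """Start the stub on a free port and return (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    # Non-daemon thread: it ends once server.shutdown() is called.
    threading.Thread(target=server.serve_forever).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def test_health():
    # pytest collects any function named test_*; no feature file is needed.
    server, base_url = start_stub_server()
    try:
        with urllib.request.urlopen(f"{base_url}/health") as res:
            assert res.status == 200
            assert json.load(res) == {"status": "ok"}
    finally:
        server.shutdown()
        server.server_close()
```

Because the test is ordinary Python, editors can type-check it and offer completion, which is the benefit the motivation section asks for.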

TODO:

  • Migrate all sequential tests
  • Migrate "parallel" tests (i.e. multiple completions)
  • Remove *.feature files
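One possible shape for a "parallel" test, sketched under assumptions: several completion requests are issued concurrently from a thread pool and the results are checked in input order. fake_completion is a hypothetical stand-in for a blocking HTTP request; the PR's actual helpers are not shown in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_completion(prompt: str) -> dict:
    """Hypothetical stand-in for one blocking completion request."""
    return {"prompt": prompt, "content": prompt.upper()}


def run_parallel(prompts: list[str], max_workers: int = 4) -> list[dict]:
    """Issue all requests concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_completion, prompts))


def test_parallel_completions():
    prompts = ["hello", "world", "foo", "bar"]
    results = run_parallel(prompts)
    assert len(results) == len(prompts)
    for prompt, res in zip(prompts, results):
        assert res["prompt"] == prompt
        assert res["content"] == prompt.upper()
```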

Test cases to be added in the future:

  • test KV cache reuse: test with and w/o cache_prompt and different values of n_cache_reuse
  • test with incorrect input for various endpoints (embd, completions, rerank, etc). Examples of incorrect cases: empty string, null, incorrect type, etc
  • loading split model
  • speculative decoding, related to "server : add speculative decoding support" (#10455)
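The incorrect-input cases listed above lend themselves to a table-driven test. The sketch below is only an illustration: validate_prompt is a hypothetical validator, and the 400 status is an assumption about how an endpoint might reject bad input, not the server's documented behaviour.

```python
def validate_prompt(prompt) -> tuple[int, str]:
    """Hypothetical validator; returns (http_status, message)."""
    if prompt is None:
        return 400, "prompt must not be null"
    if not isinstance(prompt, str):
        return 400, "prompt must be a string"
    if prompt == "":
        return 400, "prompt must not be empty"
    return 200, "ok"


# One row per incorrect case: (description, input value).
INVALID_PROMPTS = [
    ("empty string", ""),
    ("null", None),
    ("incorrect type: int", 123),
    ("incorrect type: dict", {"text": "hi"}),
]


def test_invalid_prompts_are_rejected():
    for description, value in INVALID_PROMPTS:
        status, _message = validate_prompt(value)
        assert status == 400, f"case not rejected: {description}"


def test_valid_prompt_is_accepted():
    assert validate_prompt("hello") == (200, "ok")
```

In a pytest suite the same table could be passed to pytest.mark.parametrize so that each case is reported as a separate test.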

@github-actions github-actions bot added examples python python script changes server labels Nov 19, 2024
@github-actions github-actions bot added the devops improvements to build systems and github actions label Nov 19, 2024
@github-actions github-actions bot added the nix Issues specific to consuming flake.nix, or generally concerned with ❄ Nix-based llama.cpp deployment label Nov 20, 2024
@ngxson ngxson requested a review from Copilot November 20, 2024 16:58
@Copilot Copilot AI left a comment

Copilot reviewed 5 out of 20 changed files in this pull request and generated 1 suggestion.

Files not reviewed (15)
  • .devops/nix/python-scripts.nix: Language not supported
  • examples/server/tests/.gitignore: Language not supported
  • examples/server/tests/requirements.txt: Language not supported
  • examples/server/tests/tests.sh: Language not supported
  • examples/server/tests/unit/test_ctx_shift.py: Evaluated as low risk
  • examples/server/tests/unit/test_basic.py: Evaluated as low risk
  • examples/server/tests/unit/test_chat_completion.py: Evaluated as low risk
  • examples/server/tests/features/steps/steps.py: Evaluated as low risk
  • examples/server/tests/README.md: Evaluated as low risk
  • .github/workflows/server.yml: Evaluated as low risk
  • examples/server/tests/utils.py: Evaluated as low risk
  • examples/server/tests/unit/test_completion.py: Evaluated as low risk
  • examples/server/tests/unit/test_rerank.py: Evaluated as low risk
  • examples/server/tests/unit/test_slot_save.py: Evaluated as low risk
  • examples/server/tests/unit/test_infill.py: Evaluated as low risk

examples/server/tests/conftest.py (resolved)
@ngxson ngxson requested a review from Copilot November 20, 2024 21:29

Copilot reviewed 18 out of 34 changed files in this pull request and generated no suggestions.

Files not reviewed (16)
  • .devops/nix/python-scripts.nix: Language not supported
  • examples/server/tests/.gitignore: Language not supported
  • examples/server/tests/features/ctx_shift.feature: Language not supported
  • examples/server/tests/features/embeddings.feature: Language not supported
  • examples/server/tests/features/infill.feature: Language not supported
  • examples/server/tests/features/issues.feature: Language not supported
  • examples/server/tests/features/lora.feature: Language not supported
  • examples/server/tests/features/parallel.feature: Language not supported
  • examples/server/tests/features/passkey.feature: Language not supported
  • examples/server/tests/features/rerank.feature: Language not supported
  • examples/server/tests/features/results.feature: Language not supported
  • examples/server/tests/features/security.feature: Language not supported
  • examples/server/tests/features/server.feature: Language not supported
  • examples/server/tests/features/slotsave.feature: Language not supported
  • examples/server/tests/features/wrong_usages.feature: Language not supported
  • examples/server/tests/requirements.txt: Language not supported
Comments skipped due to low confidence (1)

examples/server/tests/conftest.py:12

  • The variable 'server_instances' is used but not defined in this snippet. Ensure it is defined elsewhere in the codebase.
server_instances
@ngxson ngxson marked this pull request as ready for review November 20, 2024 22:05
@ngxson ngxson requested a review from ggerganov November 20, 2024 22:11
@ngxson
Collaborator Author

ngxson commented Nov 20, 2024

@ggerganov I'm glad to say that (more than) half of the code in this PR was written by Copilot. This will be very useful in the future, when contributors only need to write the docs and the AI writes all the test cases.

@ngxson
Collaborator Author

ngxson commented Nov 26, 2024

@ggerganov Sorry for pinging, but can you review this PR soon? I plan to add more test cases in the near future (listed in the updated description).

@ggerganov
Owner

I've missed this PR somehow - sorry for the delay and don't hesitate to ping me. Will take a look now.

@ngxson
Collaborator Author

ngxson commented Nov 26, 2024

It seems that test_consistent_result_same_seed fails with the latest master branch. The test runs 4 completions with the same seed=42 and temperature=1.0. However, I can't reproduce the error on my Mac M3. I'm spinning up a Linux VM to have a look.

@ggerganov Do you have any clue why it fails?

@ggerganov
Owner

#10501 is likely the reason. We should add cache_prompt: false to the test and see if that fixes it.
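A minimal sketch of what such a test could look like with cache_prompt disabled. The seed, temperature, and cache_prompt values come from this thread; the prompt text, the n_predict field, and fake_completion (a deterministic stand-in for a real request) are assumptions made for illustration.

```python
import random


def fake_completion(payload: dict) -> str:
    """Hypothetical stand-in for a completion request.

    The output depends only on the seed, which is the property the test checks.
    """
    rng = random.Random(payload["seed"])
    words = ["alpha", "beta", "gamma", "delta"]
    return " ".join(rng.choice(words) for _ in range(payload["n_predict"]))


def test_consistent_result_same_seed():
    payload = {
        "prompt": "I believe the meaning of life is",  # assumed prompt
        "seed": 42,
        "temperature": 1.0,
        "n_predict": 8,         # assumed field
        "cache_prompt": False,  # the change suggested above
    }
    # Run 4 completions with identical parameters and compare the outputs.
    results = [fake_completion(dict(payload)) for _ in range(4)]
    assert all(r == results[0] for r in results)
```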

@ngxson
Collaborator Author

ngxson commented Nov 26, 2024

Thanks, that fixes the issue.

I added a test case for cached vs non-cached prompts, but it is marked as skip for now (it will need to be fixed in the future). The whole test case was written by Copilot:

[Image: Screenshot 2024-11-26 at 16 02 31]

@slaren
Collaborator

slaren commented Nov 28, 2024

This may be causing intermittent CI failures:

Traceback (most recent call last):
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x00007fc352376358)

https://github.com/ggerganov/llama.cpp/actions/runs/12069327600/job/33656351121

From what I could gather, this may be caused by the logging threads not being finished before exiting.

@ngxson
Collaborator Author

ngxson commented Nov 28, 2024

Thanks for reporting. Yeah, the daemon=True thread indeed fails to exit. I'll fix it in another PR.
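One common way to avoid this class of shutdown error, sketched under assumptions (the follow-up fix itself is not shown in this thread): run the log reader as a regular, non-daemon thread and join it explicitly when the process is stopped. LoggedProcess and its method names are hypothetical.

```python
import subprocess
import sys
import threading


class LoggedProcess:
    """Hypothetical wrapper: runs a child process and collects its output on a thread."""

    def __init__(self, args: list[str]):
        self.lines: list[str] = []
        self.proc = subprocess.Popen(
            args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        # Non-daemon thread: joined in stop() rather than abandoned
        # at interpreter shutdown.
        self.thread = threading.Thread(target=self._pump)
        self.thread.start()

    def _pump(self):
        # The loop ends when the child closes its stdout, i.e. when it exits.
        for line in self.proc.stdout:
            self.lines.append(line.rstrip("\n"))

    def stop(self, timeout: float = 10.0):
        if self.proc.poll() is None:
            self.proc.terminate()
        self.proc.wait(timeout=timeout)
        self.thread.join(timeout=timeout)  # reader finishes before we return
        self.proc.stdout.close()


def test_log_thread_is_joined():
    p = LoggedProcess([sys.executable, "-c", "print('ready')"])
    p.proc.wait()
    p.stop()
    assert not p.thread.is_alive()
    assert p.lines == ["ready"]
```

Joining in stop() means no thread is still writing to stderr or stdout while the interpreter finalizes.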

@ngxson ngxson mentioned this pull request Nov 28, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* server : replace behave with pytest

* fix test on windows

* misc

* add more tests

* more tests

* styling

* log less, fix embd test

* added all sequential tests

* fix coding style

* fix save slot test

* add parallel completion test

* fix parallel test

* remove feature files

* update test docs

* no cache_prompt for some tests

* add test_cache_vs_nocache_prompt