feat(eval): SWE-Bench stability improvement and add utils #6177

Open
wants to merge 1 commit into
base: main
from

xingyaoww
Collaborator

@xingyaoww xingyaoww commented Jan 9, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

  • Move runtime creation in the SWE-Bench eval_infer.py inside the try-except block, so the runtime is cleaned up properly if an error occurs
  • Add utility scripts for combining completions, useful for SWE-Bench evaluation and SWE-Gym rollouts
  • Improve the memory efficiency of the SWE-Bench update output script so it can load larger files
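
The first item above can be illustrated with a minimal sketch. It is not the code in eval_infer.py: `FakeRuntime`, `process_instance`, and `runtime_factory` are hypothetical names used only for illustration, and whether the PR relies on a `finally` clause or on the `except` handler for cleanup is an assumption here. The point is only that creating the runtime inside the `try` block keeps the cleanup path reachable when startup itself fails.

```python
class FakeRuntime:
    """Hypothetical stand-in for a sandbox runtime; not the OpenHands API."""

    def __init__(self, fail_on_connect: bool = False):
        self.closed = False
        self.fail_on_connect = fail_on_connect

    def connect(self) -> None:
        # Simulate a runtime that fails while starting up.
        if self.fail_on_connect:
            raise RuntimeError("could not start runtime")

    def close(self) -> None:
        self.closed = True


def process_instance(runtime_factory) -> str:
    runtime = None
    try:
        # Creation happens inside the try block, so a failure while creating
        # or connecting the runtime still reaches the cleanup below.
        runtime = runtime_factory()
        runtime.connect()
        return "ok"
    except Exception as exc:
        return f"error: {exc}"
    finally:
        # Close only if creation got far enough to return an object.
        if runtime is not None:
            runtime.close()
```

If the runtime were created before the `try`, an exception raised during creation would bypass both the handler and the cleanup, which is the situation the first item describes avoiding.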
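
One common way to make an update script handle larger files is to stream records instead of loading the whole file at once. The sketch below assumes the output is stored as JSON Lines (one JSON object per line) and that records carry an `instance_id` field; both are assumptions, as are the function names `iter_jsonl` and `update_output` and the `report` field. It is not the PR's actual implementation.

```python
import json
from typing import Dict, Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record at a time so the whole file is never in memory."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def update_output(in_path: str, out_path: str, reports: Dict[str, dict]) -> int:
    """Attach a report to each record and write it out immediately.

    `reports` maps a hypothetical `instance_id` to its evaluation report.
    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for record in iter_jsonl(in_path):
            record["report"] = reports.get(record["instance_id"], {})
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

With this shape, peak memory is governed by the largest single record plus the `reports` mapping, not by the size of the input file.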

Link of any specific issues this addresses


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:b58d8f7-nikolaik \
  --name openhands-app-b58d8f7 \
  docker.all-hands.dev/all-hands-ai/openhands:b58d8f7

@xingyaoww xingyaoww requested review from enyst and neubig January 9, 2025 19:48
@xingyaoww xingyaoww marked this pull request as ready for review January 9, 2025 19:52