
[NOT FOR MERGE] Adjust prompt to use view command #5506

Draft · wants to merge 9 commits into main

Conversation

@enyst (Collaborator) commented Dec 10, 2024


This is the prompt adjustment I used, with the goal of getting the LLM to prefer the `view` command for directories, which is part of its `file_editor` tool, over the alternatives (`ls -R /workspace` or `ls -la /workspace`).

Directory listing like this is used a lot during the initial exploration of the repository.

I think it would be interesting to evaluate this after Ryan's fix is merged into main.

Edited to add: in the event stream of the CodeAct 2.2 SWE-bench run, the agent uses `ls -R /workspace` a lot, and `ls -la` sometimes.
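
For reference, this is the call shape the adjusted prompt steers the agent toward, contrasted with the shell commands it currently favors. A minimal sketch: the `file_editor` call is copied from the transcript below and only runs inside the agent's IPython environment, where that tool is defined.

```python
# Encouraged: the `view` command of the file_editor tool, which lists a
# directory only up to 2 levels deep (call shape from the transcript below).
print(file_editor(**{'command': 'view', 'path': '/workspace'}))

# Discouraged alternatives the agent tends to use for the same purpose:
#   ls -R /workspace    # recursive listing of the entire tree
#   ls -la /workspace   # detailed listing of the top level only
```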


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:44a6bd9-nikolaik \
  --name openhands-app-44a6bd9 \
  docker.all-hands.dev/all-hands-ai/openhands:44a6bd9

@enyst marked this pull request as draft December 10, 2024 07:12
@enyst (Collaborator, Author) commented Dec 10, 2024

There we go:

THOUGHT: I'll help you implement the necessary changes to fix the issue with `django.utils.http.parse_http_date`. Let's follow the steps you outlined.

1. First, let's explore the repository structure:
CODE:
print(file_editor(**{'command': 'view', 'path': '/workspace/django__django__3.1'}))

OBSERVATION
[Agent Controller default] **IPythonRunCellObservation**
Here's the files and directories up to 2 levels deep in /workspace/django__django__3.1, excluding hidden items:
/workspace/django__django__3.1
/workspace/django__django__3.1/AUTHORS
/workspace/django__django__3.1/CONTRIBUTING.rst
/workspace/django__django__3.1/FUNDING.yml
/workspace/django__django__3.1/Gruntfile.js
/workspace/django__django__3.1/INSTALL
/workspace/django__django__3.1/LICENSE
/workspace/django__django__3.1/LICENSE.python
...

@enyst (Collaborator, Author) commented Dec 10, 2024

I ran 13 instances that are unresolved (0/13) in the CodeAct 2.2 results. They're all Django instances, and all in the intersection of SWE-bench Lite and Verified.

CodeAct 2.2: 0/13
Branch: 1/13

Too little to matter, but FWIW! @xingyaoww

@ryanhoangt (Contributor) commented Dec 11, 2024

I'm thinking about whether we should still make this change in the prompt, as encouraging the agent to use `view` over `ls -R` can save tokens, allowing the agent to execute more steps before reaching the context limit 🤔
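
For a rough sense of the savings, here is a back-of-envelope sketch. It assumes a checkout at the path shown in the transcript above, uses the 2-level cutoff that `view` reports, and treats entry counts as a crude proxy for tokens:

```python
import os

def count_entries(root, max_depth=None):
    """Count files and directories under root, optionally only max_depth levels deep."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune: don't descend below the cutoff
            continue          # and don't count entries past it either
        total += len(dirnames) + len(filenames)
    return total

repo = "/workspace/django__django__3.1"      # path from the transcript above
full = count_entries(repo)                   # roughly what `ls -R` surfaces
shallow = count_entries(repo, max_depth=2)   # the 2-level listing `view` shows
print(f"ls -R entries: {full}; view (2 levels deep): {shallow}")
```

The gap between the two numbers is the token saving the prompt change is trying to capture.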

@enyst added the run-eval-m Runs evaluation with 30 instances label Dec 13, 2024
@enyst added run-eval-m Runs evaluation with 30 instances and removed run-eval-m Runs evaluation with 30 instances labels Dec 13, 2024
Running evaluation on the PR. Once eval is done, the results will be posted.

@openhands-agent (Contributor) commented Dec 13, 2024

Evaluation results:

## Summary

  • submitted instances: 30
  • empty patch instances: 12
  • resolved instances: 8
  • unresolved instances: 22
  • error instances: 0

The empty patches came from a litellm proxy error:

2024-12-13 11:47:01,561 - ERROR - [Agent Controller default] Error while running the agent: litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - {'error': {'message': 'litellm.NotFoundError: AnthropicException - {"type":"error","error":{"type":"not_found_error","message":"model: *"}}\nReceived Model Group=claude-3-5-sonnet-20241022....... 'code': '404'}}

@mamoodi (Collaborator) commented Dec 13, 2024

Haven't automated this part yet so here ya go:
evaluation.zip

@enyst (Collaborator, Author) commented Jan 5, 2025

@openhands-agent Your last attempt to fix the conflicts didn't work. Please do this again: pull main into this branch and fix the conflicts.

@openhands-agent (Contributor)
OpenHands started fixing the PR! You can monitor the progress here.

@All-Hands-AI deleted a comment from openhands-agent Jan 5, 2025
@All-Hands-AI deleted a comment from openhands-agent Jan 5, 2025
@enyst added the lint-fix label and removed the run-eval-m Runs evaluation with 30 instances label Jan 5, 2025