
How to have a conversation with the llama-2-7B-chat model #846

Open
Harsh-raj opened this issue Oct 10, 2023 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

Harsh-raj commented Oct 10, 2023

Hey, hope you're doing well. I am able to run inference on the llama-2-7B-chat model successfully with the example Python script provided. I am new to working and experimenting with large language models. I wanted to know how I can have a conversation with the model, where the model also considers the previous user prompts and its own completions as context when answering the next user prompt. I am currently experimenting with the dialogue list present in the example Python script, but it seems I will have to go through all of the code and make changes to it. Any guidance is much appreciated. Thank you!

@subramen (Contributor)

We don't have a full chat program example in the repo, but you can adapt the example to build one. Take a look at this thread for a related conversation: #162
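For instance, a minimal sketch of such an adaptation could look like the following. This is only illustrative, not part of the repo's example: the checkpoint/tokenizer paths, sampling parameters, and the interactive loop are assumptions, and you would launch it with torchrun just like example_chat_completion.py.

```python
# Rough sketch of an interactive chat loop adapted from example_chat_completion.py.
# Checkpoint/tokenizer paths and sampling parameters below are placeholders.
# Launch with torchrun, e.g.: torchrun --nproc_per_node 1 chat.py
from llama import Llama


def main():
    generator = Llama.build(
        ckpt_dir="llama-2-7b-chat/",       # placeholder checkpoint directory
        tokenizer_path="tokenizer.model",  # placeholder tokenizer path
        max_seq_len=4096,
        max_batch_size=1,
    )

    dialog = []  # running history of {"role": ..., "content": ...} turns

    while True:
        user_msg = input("You: ").strip()
        if user_msg.lower() in {"exit", "quit"}:
            break
        dialog.append({"role": "user", "content": user_msg})

        # chat_completion expects a batch of dialogs; we pass a batch of one.
        results = generator.chat_completion(
            [dialog],
            max_gen_len=256,
            temperature=0.6,
            top_p=0.9,
        )
        reply = results[0]["generation"]["content"]
        dialog.append({"role": "assistant", "content": reply})
        print(f"Assistant: {reply}")


if __name__ == "__main__":
    main()
```

Note that the dialog list grows with every turn, so a long conversation will eventually exceed max_seq_len unless older turns are dropped.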

@jeffxtang (Contributor)

@Harsh-raj You can use LangChain's ConversationalRetrievalChain example, or the ConversationChain with ConversationBufferMemory example.
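For example, the ConversationChain + ConversationBufferMemory pattern looks roughly like this. It is only a sketch: the LlamaCpp wrapper and the model path are assumptions, and any LangChain LLM wrapper that serves llama-2-7b-chat (for example HuggingFacePipeline) could be substituted.

```python
# Sketch: ConversationChain keeps the running history in ConversationBufferMemory
# and prepends it to every prompt, so each turn sees the earlier exchanges.
from langchain.chains import ConversationChain
from langchain.llms import LlamaCpp
from langchain.memory import ConversationBufferMemory

# Placeholder model path; point this at your local llama-2-7b-chat weights.
llm = LlamaCpp(model_path="path/to/llama-2-7b-chat.gguf", n_ctx=4096)

conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(conversation.predict(input="In one sentence, what is a context window?"))
# The second turn can refer back to the first because the memory is replayed.
print(conversation.predict(input="Now give an example based on your previous answer."))
```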

@Harsh-raj (Author)

I am now able to have a conversation with the llama-2-7b-chat model. But when the maximum prompt length exceeds the maximum sequence length, the conversation abruptly terminates. I want to remove the oldest context of the conversation from the model's memory to make space for the next user prompt. Is this possible?
The line `assert max_prompt_len <= params.max_seq_len` in the generate method of the Llama class is what terminates the conversation.

Harsh-raj (Author) commented Oct 12, 2023

Also, when running inference with the llama-2-7b-chat model, torchrun is somehow not able to find the fire module (used for CLI argument parsing), but when I used python -m torch.distributed.run it ran just fine, as intended. Is there something I am missing in the setup for model inference?

@jeffxtang (Contributor)


Are you using ConversationChain or ConversationalRetrievalChain for the conversation? You can then remove earlier Q/A pairs from the chat_history list so the prompt length does not exceed 4096.
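Something along these lines could work as a rough sketch. The count_tokens helper below is just a whitespace approximation for illustration; swap in the tokenizer you actually use for an accurate count.

```python
# Sketch: keep the running history under the context budget by dropping the
# oldest (question, answer) pairs first.

def count_tokens(text):
    # Rough approximation; replace with the real tokenizer for accuracy.
    return len(text.split())


def trim_history(chat_history, count_tokens, max_tokens=4096, reserve_for_reply=512):
    """Drop the oldest (question, answer) pairs until the history fits the budget."""
    def history_len(history):
        return sum(count_tokens(q) + count_tokens(a) for q, a in history)

    while chat_history and history_len(chat_history) > max_tokens - reserve_for_reply:
        chat_history.pop(0)  # oldest pair goes first
    return chat_history


# Tiny demo with an artificially small budget: the oldest pair gets dropped.
history = [("What is the capital of France?", "Paris."),
           ("And of Italy?", "Rome.")]
print(trim_history(history, count_tokens, max_tokens=8, reserve_for_reply=2))
```

With ConversationalRetrievalChain, chat_history is the list of (question, answer) tuples you pass on each call, so trimming it like this before every turn keeps the assembled prompt under max_seq_len.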
