
How to have a conversation with the llama-2-7B-chat model #846

Open
Harsh-raj opened this issue Oct 10, 2023 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

Harsh-raj commented Oct 10, 2023

Hey, hope you're doing well. I am able to run inference on the llama-2-7B-chat model successfully with the example Python script provided. I am new to working and experimenting with large language models. I wanted to know how I can have a conversation with the model, where the model also considers the previous user prompts and its own completions as context when answering the next user prompt. I am currently experimenting with the dialogue list present in the example Python script, but it seems I will have to go through all of the code and make changes to it. Any guidance is much appreciated. Thank you!

@subramen (Contributor)

We don't have a full chat program example in the repo, but you can adapt the example to build one. Take a look at this thread for a related conversation: #162
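For instance, a minimal sketch of such an adaptation could look like the following. This is only illustrative, not part of the repo's example: the checkpoint/tokenizer paths, sampling parameters, and the interactive loop are assumptions, and you would launch it with torchrun just like example_chat_completion.py.

```python
# Rough sketch of an interactive chat loop adapted from example_chat_completion.py.
# Checkpoint/tokenizer paths and sampling parameters below are placeholders.
# Launch with torchrun, e.g.: torchrun --nproc_per_node 1 chat.py
from llama import Llama


def main():
    generator = Llama.build(
        ckpt_dir="llama-2-7b-chat/",       # placeholder checkpoint directory
        tokenizer_path="tokenizer.model",  # placeholder tokenizer path
        max_seq_len=4096,
        max_batch_size=1,
    )

    dialog = []  # running history of {"role": ..., "content": ...} turns

    while True:
        user_msg = input("You: ").strip()
        if user_msg.lower() in {"exit", "quit"}:
            break
        dialog.append({"role": "user", "content": user_msg})

        # chat_completion expects a batch of dialogs; we pass a batch of one.
        results = generator.chat_completion(
            [dialog],
            max_gen_len=256,
            temperature=0.6,
            top_p=0.9,
        )
        reply = results[0]["generation"]["content"]
        dialog.append({"role": "assistant", "content": reply})
        print(f"Assistant: {reply}")


if __name__ == "__main__":
    main()
```

Note that the dialog list grows with every turn, so a long conversation will eventually exceed max_seq_len unless older turns are dropped.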

@jeffxtang (Contributor)

@Harsh-raj You can use LangChain's ConversationalRetrievalChain example, or the ConversationChain with ConversationBufferMemory example.
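For example, the ConversationChain + ConversationBufferMemory pattern looks roughly like this. It is only a sketch: the LlamaCpp wrapper and the model path are assumptions, and any LangChain LLM wrapper that serves llama-2-7b-chat (for example HuggingFacePipeline) could be substituted.

```python
# Sketch: ConversationChain keeps the running history in ConversationBufferMemory
# and prepends it to every prompt, so each turn sees the earlier exchanges.
from langchain.chains import ConversationChain
from langchain.llms import LlamaCpp
from langchain.memory import ConversationBufferMemory

# Placeholder model path; point this at your local llama-2-7b-chat weights.
llm = LlamaCpp(model_path="path/to/llama-2-7b-chat.gguf", n_ctx=4096)

conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(conversation.predict(input="In one sentence, what is a context window?"))
# The second turn can refer back to the first because the memory is replayed.
print(conversation.predict(input="Now give an example based on your previous answer."))
```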

@Harsh-raj (Author)

I am now able to have a conversation with the llama-2-7b-chat model. But when the maximum prompt length exceeds the maximum sequence length, the conversation abruptly terminates. I want to remove the oldest context of the conversation from the model's memory to make space for the next user prompt. Is this possible?
The line `assert max_prompt_len <= params.max_seq_len` in the generate method of the Llama class is what terminates the conversation.

Harsh-raj (Author) commented Oct 12, 2023

Also, when running inference with the llama-2-7b-chat model, torchrun is somehow not able to find the fire module (used for CLI argument parsing), but when I used python -m torch.distributed.run it ran just fine, as intended. Is there something I am missing in the setup for model inference?

@jeffxtang (Contributor)


Are you using ConversationChain or ConversationalRetrievalChain for the conversation? You can then remove earlier Q/A pairs from the chat_history list so the prompt length does not exceed 4096.
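Something along these lines could work as a rough sketch. The count_tokens helper below is just a whitespace approximation for illustration; swap in the tokenizer you actually use for an accurate count.

```python
# Sketch: keep the running history under the context budget by dropping the
# oldest (question, answer) pairs first.

def count_tokens(text):
    # Rough approximation; replace with the real tokenizer for accuracy.
    return len(text.split())


def trim_history(chat_history, count_tokens, max_tokens=4096, reserve_for_reply=512):
    """Drop the oldest (question, answer) pairs until the history fits the budget."""
    def history_len(history):
        return sum(count_tokens(q) + count_tokens(a) for q, a in history)

    while chat_history and history_len(chat_history) > max_tokens - reserve_for_reply:
        chat_history.pop(0)  # oldest pair goes first
    return chat_history


# Tiny demo with an artificially small budget: the oldest pair gets dropped.
history = [("What is the capital of France?", "Paris."),
           ("And of Italy?", "Rome.")]
print(trim_history(history, count_tokens, max_tokens=8, reserve_for_reply=2))
```

With ConversationalRetrievalChain, chat_history is the list of (question, answer) tuples you pass on each call, so trimming it like this before every turn keeps the assembled prompt under max_seq_len.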
