An attempt to make LLaMA act like ChatGPT - success! Amazing result from scratch! #162
Comments
I am running these chats with the 30B model on a typical modern PC: 12700K / 3070 Ti 8 GB / 128 GB RAM. For those who wish to reproduce this and chat with LLaMA, I made a repo: https://github.com/randaller/llama-chat Feel free to post your coolest generations here or there, so the community can check them out and also have some fun.
@randaller How did you get the 30B model to run on a single GPU? Is that done by the merge_weight step? I tried to run the official example.py with the 13B model on a single GPU but got an error when I set the MP value to 2.
@windameister By feeding the model to the GPU layer by layer using pyarrow. The credit is not mine; it all goes to @venuatu's repo. I am also running the 65B model, it just inferences slowly because it uses swap heavily.
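For context, here is a minimal sketch of the layer-by-layer offloading idea described above. This is a hypothetical illustration of the approach, not @venuatu's actual pyarrow-based code: each block stays on the CPU and is moved onto the GPU only for its own forward pass, which is how a 30B model can fit in ~8 GB of VRAM at the cost of speed.

```python
# Hypothetical sketch of layer-by-layer GPU offloading (not the actual repo code).
import torch

def forward_offloaded(layers, hidden_states, device="cuda"):
    for layer in layers:
        layer.to(device)                      # move one block onto the GPU
        hidden_states = layer(hidden_states)  # run its forward pass
        layer.to("cpu")                       # move it back to free VRAM
    torch.cuda.empty_cache()
    return hidden_states
```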
@randaller Hello, your code actually does not work; I tried it on 7B and it did not work as expected.
@jinfagang
@randaller Indeed, your repo, but it does NOT work, please try it yourself; your code doesn't even have a print, how could it work?
@jinfagang As I already said, the print is located in another place. Please note that the /llama folder of this repo is different. Perhaps you have been using the /llama folder of another repo? Everything works; of course I tested it before making it public, and I'm chatting with it right now.
Oh, I didn't notice the generation code changed,
@jinfagang all the files changed. Please start from scratch, cloning the repo and following the readme steps, and you'll be happy :)
great
Good job! It is doing quite well, considering your English is not quite perfect. 😉 I feel sorry for it that you got angry with it 😭
great
Finally I was able to generate code, thanks to the prompt example in a neighboring issue. PHP code generates as well. Model: 30B, prompt:
generation
More examples here: randaller/llama-chat#7
Finally I was able to generate prompts for Stable Diffusion with the LLaMA 30B model! randaller/llama-chat#7 (comment)
Honestly, this seems less capable than a medium-sized GPT-3 model
Good job! But when I use
Yes, you need to send this to all the processes. There is a broadcast primitive to do this:
Note how the list must be the same length on every rank (hence the empty string appended in the rank != 0 case; if you don't have it, the sync fails and there is no error).
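A minimal sketch of that pattern, assuming torch.distributed is already initialized as in the official example.py (the variable names here are illustrative, not the exact code from the thread):

```python
import torch.distributed as dist

prompts = []
if dist.get_rank() == 0:
    prompts.append(input("User: "))  # only rank 0 reads user input
else:
    prompts.append("")               # same list length on every rank, or the sync fails silently

# src=0 names the "master" rank; its list contents are copied to all other ranks.
dist.broadcast_object_list(prompts, src=0)
# After this call, prompts[0] holds the user's input on every rank.
```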
@verygreen
This IS on multiple nodes. See how the input only happens for rank 0 (whatever node that is), but there's no such condition for the broadcast, so it runs in all ranks on all nodes. The second parameter just tells it which rank is the "master", and all the data from that rank is copied to all the other ranks.
@verygreen Hello, when I run on two nodes, the error message is as follows:
But when I run on a single node, it doesn't report an error. I've tried
Well, this seems to be somewhere in the bowels of PyTorch, so you might want to ask them. Is communication between your Docker containers on different nodes perhaps somehow restricted?
OK, thanks a lot. I do communicate between the two Docker containers. But there is no problem when running
What is the science behind the context? How do I make it act like, for example, Neuro-sama from Twitch?
Closing, as this is no longer an issue with the Llama 2 chat launch. Please re-open as needed.
Fantastic work. Many thanks. I have it working great on a Dell Precision 7780 with a small 6 GB Nvidia GPU. Could you please tell me how to disable the forward and flayers progress bars? I am sure they are slowing it down a lot! Thanks!
It's OK, found it. Just disable the tqdm.
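For reference, tqdm bars can be suppressed with its standard `disable` flag; a minimal sketch (the loop body and "flayers" label are illustrative, not the repo's exact code):

```python
from tqdm import tqdm

# Passing disable=True keeps the loop intact but suppresses the progress-bar output.
for i in tqdm(range(10), desc="flayers", disable=True):
    pass  # ... per-layer work would go here ...
```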
I made a simple modification to make LLaMA act like ChatGPT. It keeps 2048 bytes of context, and it does it pretty well!!!
I am running a sliding chat window, keeping 1920 bytes of context whenever it grows longer than 2048 bytes.
Leaving only 128 bytes for the AI reply is probably not ideal, but it's really enough to be amazed.
I terminate generation by counting "\n" characters in the output: one extra newline means, for me, that the AI has finished its answer :)
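A minimal sketch of that sliding-window-plus-newline-stop logic, using the numbers from the description above (the function and variable names are mine, not the actual repo code):

```python
MAX_CONTEXT = 2048   # total bytes kept in the prompt
KEEP_CONTEXT = 1920  # bytes kept after trimming, leaving ~128 bytes for the reply

def trim_context(history: str) -> str:
    """Slide the chat window: once history exceeds MAX_CONTEXT, keep only the tail."""
    if len(history) > MAX_CONTEXT:
        return history[-KEEP_CONTEXT:]
    return history

def reply_finished(generated: str, prompt_newlines: int) -> bool:
    """Stop when the output contains one more newline than the prompt had."""
    return generated.count("\n") > prompt_newlines
```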
Here are examples of chats with the 30B model (screenshots):
It is capable of arguing!
Sometimes it gets stuck.
It died from hunger, uhh.
It handles Cyrillic as well.
It argues too much with my current prompts :)
Still no success asking it for a Stable Diffusion prompt.