-
This is just an issue that current LLMs have. I know you said it seemed to start happening in the last 1-2 weeks, but it's been true for everyone since way back. Bigger models seem to be less susceptible to the issue, and the longer the context, the more likely it seems to be that it starts happening. There are people who have been investigating how to stop it, like in this Reddit thread:
-
I'm with you on this, something changed significantly for me as well. I'm looking into a way to compare different builds and/or llama-cpp-python versions, and will try to show some examples as well.
-
I'm likely going to post this as an actual bug report with actual logs and examples, in a more professional way with less rambling. For now, though, I'm just going to ramble. Only read on if you're interested or having the same issue.
I have a problem I've been dealing with for about 1-2 weeks, ever since a recent update. I've updated to the latest version as of posting this and it's not resolved. I'm using Mixtral 8x7B Instruct; the issue is also present in nous-hermes-2-mixtral-8x7b-dpo.Q5_K_M.
Basically, 80% of the time everything goes fairly well. I do notice longer prompts producing more repetitive or redundant answers in general, but it's particularly bad when asking it to write a story. It starts off great, and then eventually gets stuck in a rut.
It will start just describing someone's thoughts, like: he was upset, he was seeing red, he was angry, he was not okay... bla bla bla... and sometimes it gets so bad it literally repeats the last 2-3 words over and over. So yeah, in general I feel the quality of all responses has gone down a tiny bit, but it's most noticeable when writing a story, usually in longer conversations.
Even if I redo the answer, it usually breaks down about the same number of tokens into whatever post started to mess up. But if I copy the last answer, trim it to just the good part before it starts to mess up, then paste it and hit continue, it sometimes is able to go on from there and rewrite the ending without the issue. Other times it immediately starts doing it again.
So yes, I would describe the issue as longer and more redundant answers to most or all questions, and about 20% of the time a long story will just start to melt down. It almost feels like the AI wants to stop typing but a stop token or something is missing, so it feels it has to go on, and then it melts down. Either that, or it's forgetting what it just typed, like its memory is being consumed.
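If it helps anyone compare runs, here's a rough sketch of how you could detect where an answer starts looping on the same few words, like the "not okay, not okay" meltdowns above. The function name and thresholds are just my own illustration, not anything from llama.cpp:

```python
def find_loop_start(tokens, n=3, max_repeats=4):
    """Return the index where the sequence starts repeating the same
    n-gram back-to-back at least max_repeats times, or None if no
    such loop is found."""
    for i in range(len(tokens) - n * max_repeats + 1):
        gram = tokens[i:i + n]
        # does this n-gram repeat back-to-back max_repeats times?
        if all(tokens[i + k * n : i + (k + 1) * n] == gram
               for k in range(max_repeats)):
            return i
    return None

words = ("He was upset. He was seeing red. "
         "not okay, not okay, not okay, not okay, not okay,").split()
print(find_loop_start(words, n=2, max_repeats=4))  # → 7 (first "not")
```

You could use something like this to automatically trim an answer back to the last good token before hitting continue.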
I use the "Simple1" preset, but I've tried others and none of it really helps. I've changed settings and may have "fixed" it once without knowing what I did, but it could have just been luck too.
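One of the knobs those presets expose is `repeat_penalty`. As I understand it, llama.cpp applies a CTRL-style penalty: the logits of recently generated tokens are pushed down before sampling, making an immediate repeat less likely. A minimal sketch of that idea (my own simplified version, not the actual llama.cpp code, which also has frequency/presence penalties and a repeat window):

```python
def apply_repeat_penalty(logits, recent_tokens, penalty=1.1):
    """Penalize tokens that appeared recently: positive logits are
    divided by the penalty, negative logits are multiplied by it,
    so either way the token becomes less probable."""
    out = list(logits)
    for tok in set(recent_tokens):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink a positive logit
        else:
            out[tok] *= penalty   # make a negative logit more negative
    return out

print(apply_repeat_penalty([2.0, -1.0, 0.5], recent_tokens=[0, 1],
                           penalty=2.0))  # → [1.0, -2.0, 0.5]
```

If the meltdown really is a sampling issue, nudging `repeat_penalty` up slightly is the obvious thing to try, though in my experience it trades repetition for other artifacts.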