[QUALITY] (Ycros Airoboros v1.4.1 L1-33b 16k) - Hallucinated numbers and narrative. #22
Comments
Here is my other sample and the output. The Chroma Trooper sample often has the AI forgetting one of the characters. The starting conditions are "5-1 = 4", but in effect the output is "5-1 = 3". The missing person in the output below is the Pink Boomerang.
I'm hoping that the new "detailed_writing" instructor, as well as instruction stuffing during training, will help address some of these details, but it's hard to say. I'm also doing a full fine-tune of the 33b this time, vs. qlora, so that may improve the quality. That said, I don't know what effect the 16k scaling has on quality/instruction following, so that may be a different issue entirely.
Ycros's editions of Airoboros 65b-8k and 33b-16k are GGML versions of Bhenrym's extended Airoboros. I personally found 33b-16k to be extremely solid and reliable, and I recommend checking it out. Maybe you can ask Bhenrym about things in detail? Going by the perplexity charts on their latest Airo-Dolphin work, perplexity is 55+ for all models once they go out of bounds, but the 16k edition manages a context of at least 12,000 at a perplexity of only about 5.
@SabinStargem Hi, I randomly spotted this issue - how did you arrive at ROPE 0.5 80000? Both of the longer-context models you're referring to were finetuned against linear rope scaling only, which is the first parameter; the second parameter is intended to remain at its default value of 10000 with these models. Bhenrym's PI 16k models are finetuned with a linear rope scale of 0.125, and the PI 8k finetunes (including mine) were done with a linear rope scale of 0.25. You're specifying 0.5, which could be a source of quality issues, and I'm also not sure how the setting of 80000 interacts with this - did you derive these settings via a bunch of testing? (For other people's reference, these particular ROPE parameters apply to llama.cpp and its derivatives; other implementations would use compress_pos_emb values of 8 for 16k and 4 for 8k, though I think transformers now has its own way of encoding linear rope scaling.)
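For reference, the linear scale values above follow directly from the ratio of the model's original training context to the target context. A minimal sketch, assuming a LLaMA-1 base context of 2048 (an assumption on my part; it isn't stated in this thread):

```python
# Minimal sketch (assumption: LLaMA-1 base context of 2048) showing where the
# linear rope scale values mentioned above come from.

BASE_CTX = 2048  # original training context of the base model (assumed)

def linear_rope_scale(target_ctx: int, base_ctx: int = BASE_CTX) -> float:
    """Linear (PI) rope scale, i.e. the first llama.cpp-style ROPE parameter."""
    return base_ctx / target_ctx

def compress_pos_emb(target_ctx: int, base_ctx: int = BASE_CTX) -> float:
    """Reciprocal form used by implementations that take compress_pos_emb."""
    return target_ctx / base_ctx

for ctx in (8192, 16384):
    print(ctx, linear_rope_scale(ctx), compress_pos_emb(ctx))
# 8192  -> 0.25  (compress_pos_emb 4)
# 16384 -> 0.125 (compress_pos_emb 8)
```

The rope frequency base (the second parameter) stays at its default of 10000 for these linearly scaled finetunes, as noted above.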
Note: most of my ROPE efforts were for pure CPU+RAM inference with 64gb of RAM. I recently upgraded to a 3060 GPU and 128gb of RAM, but I hadn't experimented with ROPE since then. I don't know if ROPE changes when a GPU is involved, but things feel more stable. There were no ROPE instructions on your repository, so I had to guess. I personally found many of the ROPE values supplied by The Bloke to have issues with assorted models, and the general documentation of ROPE is really bad for a casual like me. Following various GitHub discussions on ROPE, I tried the things people talked about; the rest was just lots of trial and much error. 0.5 and 80000 just happened to be what worked for Ycros-Boros.

I also tried out funky ideas, such as multiplying KoboldCPP's context with Kobold Lite's, and then with tokegen, to arrive at a ROPE number (which sometimes worked beautifully for one-shots, but was unstable). Sometimes a working configuration broke after changing tokegen, so I figured it had a role in a "magical" formula, if such a thing existed. I suspect that Kobold Lite might have a bug with recent changes to token numbers, but I have no evidence for that.

On occasion I would find a ROPE that worked for several sessions after rebooting KoboldCPP, but interacting with Silly could make that ROPE stop working - or, more frustratingly, it would simply stop operating after about half a day. At the time I had only 64gb of RAM, so those ROPEs might have been interacting in odd ways with the page file when I hit 99% RAM usage. That is my utterly uneducated guess, anyhow. The "rules" that I could cobble together from various sources were as follows:
Question: Is there an NTK scaling for 16k and 32k? I would like to try that, if such values exist. I tried to find the proper values myself, but no dice. I don't understand the basis on which NTK scaling works, so I can't homebrew a valid answer.
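As a hedged aside on the NTK question: the commonly cited NTK-aware rule of thumb (an assumption here, not confirmed in this thread or the model cards) is to leave the linear scale at 1.0 and raise the rope frequency base instead, by a factor of alpha^(dim/(dim-2)), where alpha is the desired context multiplier and dim is the head dimension (128 for LLaMA). A rough sketch:

```python
# Rough sketch of the NTK-aware rule of thumb (an assumption, not from this thread):
# keep the linear scale at 1.0 and raise the rope frequency base instead.
# base' = base * alpha ** (dim / (dim - 2)), with dim = head dimension (128 for LLaMA).

BASE = 10000.0   # default rope frequency base
HEAD_DIM = 128   # LLaMA head dimension

def ntk_rope_base(alpha: float, base: float = BASE, dim: int = HEAD_DIM) -> float:
    """NTK-aware frequency base for a context extension factor `alpha`."""
    return base * alpha ** (dim / (dim - 2))

print(round(ntk_rope_base(8)))   # 16k from a 2k base context: ~82700
print(round(ntk_rope_base(16)))  # 32k from a 2k base context: ~167000
```

If that formula applies, the alpha=8 result lands in the same neighbourhood as the 80000 base found by trial and error above, but that is speculation rather than a documented setting.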
I tend to feed Airoboros a test sample in order to determine the quality of a preset. Over hundreds of generations, I noticed a particular tendency of the model to hallucinate or forget a person. Both of my primary samples involve the commander of a squad dying at the beginning, but the AI often adds a person or forgets one. The commonality here is that a "-1" seems to throw off Airoboros when it builds a narrative. I am making this post in the hope that Airoboros v2.1 could include measures to ensure that mathematics doesn't harm the output.
After the "escaped..." line is where the AI writes its interpretation of the sample. This sample's starting condition is basically "4-1 = 3". The addition of an extra person is common, resulting in "4-1 = 4".
Note that I am using the highest quality preset that I could find, so the mistakes are reduced. With a bad preset, the contradictions are much more obvious. I will mark parts of the narrative where things don't make sense.
The Fallen Commander sample was used through KoboldCPP, Coherent Creativity preset, ROPE 0.5 80000.