I need your help to develop a matching procedure that will find the best match of a source element to a single target element out of 1000 targets.
I can't define a prompt that contains all the targets because that would exceed the context size. But I can iterate over the targets and compare the source element to a target element. The prompt contains three parts: the actual instruction on how to compare and what the output is, the source element, and the target element.
If I did this, llama.cpp would have to tokenize and decode the first two parts over and over. Is there a way to keep the cache for the first two parts fixed and swap only the third part?
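To make the idea concrete, here is a toy sketch of the prefix reuse being asked about. It does not call llama.cpp at all: `toy_tokenize` and `ToyPrefixCache` are hypothetical stand-ins, and the "cache" only counts how many tokens would need fresh processing if everything up to the first differing token could be kept.

```python
from typing import List


def toy_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (whitespace split); a real model tokenizer differs."""
    return text.split()


class ToyPrefixCache:
    """Remembers the last evaluated token sequence and counts new work."""

    def __init__(self) -> None:
        self.cached: List[str] = []
        self.evaluated = 0  # total tokens that had to be (re)processed

    def evaluate(self, tokens: List[str]) -> int:
        # Count how many leading tokens match what is already cached.
        shared = 0
        for old, new in zip(self.cached, tokens):
            if old != new:
                break
            shared += 1
        fresh = len(tokens) - shared  # only the differing tail is new work
        self.evaluated += fresh
        self.cached = list(tokens)
        return fresh


instruction = "Compare the source with the target and answer yes or no ."
source = "Source: red apple"
targets = ["Target: green pear", "Target: red apple fruit", "Target: blue car"]

# Parts one and two are tokenized once and shared by every prompt.
prefix = toy_tokenize(instruction) + toy_tokenize(source)

cache = ToyPrefixCache()
work = [cache.evaluate(prefix + toy_tokenize(t)) for t in targets]
print(work, cache.evaluated)
```

In this sketch only the first prompt pays for the instruction and the source; each later prompt pays only for the tokens that differ in the target. Whether and how a real llama.cpp setup can keep the first two parts cached is exactly the open question here.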
A second possible improvement: could I preprocess each target element once and insert it into the prompt already tokenized, or even already embedded?
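Roughly what I have in mind for the preprocessing, as a sketch only: every target is tokenized once up front and stored, and a prompt is then built by appending the stored tokens to the fixed prefix tokens. `toy_tokenize`, `pretokenize`, and `build_prompt` are hypothetical names, and the character-level tokenizer is an assumption chosen so that concatenating token lists gives the same result as tokenizing the joined text. With a real subword tokenizer that may not hold at the boundary between prefix and target.

```python
from typing import Dict, List


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one token (code point) per character."""
    return [ord(ch) for ch in text]


def pretokenize(targets: Dict[str, str]) -> Dict[str, List[int]]:
    """Tokenize every target once so the loop over targets can skip this step."""
    return {target_id: toy_tokenize(text) for target_id, text in targets.items()}


def build_prompt(prefix_tokens: List[int], target_tokens: List[int]) -> List[int]:
    """Append pre-tokenized target tokens to the fixed prefix tokens."""
    return prefix_tokens + target_tokens


prefix_text = "Instruction: compare.\nSource: red apple\n"
targets = {"t1": "Target: green pear", "t2": "Target: red apple fruit"}

prefix_tokens = toy_tokenize(prefix_text)  # done once
store = pretokenize(targets)               # done once per target

# Per-target work is now a list concatenation, with no tokenizer call.
prompts = {tid: build_prompt(prefix_tokens, toks) for tid, toks in store.items()}
```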
I would be very grateful if you could point me in the right direction or even show me some sample code.
Thank you!