This is a back-port of a feature I created in my own fork (which has since been heavily refactored), but it was working so well that I felt I had to clean it up and offer it back to the main repo.
I've been working a lot with multi-vector embeddings lately.
One issue I've been having is what I've termed "vector cross-talk". This is where you might be training an embedding on a character with blue eyes and red hair, but the AI sometimes generates imagery of the character with red eyes and blue hair, or blue eyes and blue hair, etc.
I suspected that the cross-talk happens because the vectors encode the individual colors of the character's features, and the AI sometimes just free-associates them with other features of the embedding and/or prompt. So, what if the AI could not rely on vector ordering as much when learning?
This introduces a feature called vector shuffling, where the individual vectors of the embedding are shuffled randomly when they're inserted into the prompt during training. The idea is that this forces it to encode sharper, more focused concepts into each vector. No more `blue AND eyes AND red AND hair`. Because the ordering is scrambled, it can only learn the concept correctly if it encodes the entirety of the concept in one embedding, i.e. `blue eyes AND red hair`.

This is probably why using only 1 vector got the best results in the study; it had no choice but to encode a focused embedding when it had only 1 vector. This feature tries to bridge the gap and give the best of both worlds: more storage for more complex subjects, and focused vectors that resist leaking into other aspects of the prompt.
That's the theory, anyway. I'm no data scientist, so someone who is can take this and test these claims, but doing this seemed to cut down on vector cross-talk in my tests. It can still happen, but it significantly reduced occurrences and their severity when using the trained embedding.
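To make the mechanism concrete, here is a minimal sketch of the core operation, assuming PyTorch; the function and variable names are illustrative, not the actual code in this PR:

```python
import torch

def get_placeholder_vectors(vectors: torch.Tensor, shuffle: bool) -> torch.Tensor:
    """Return one placeholder's vectors in the order they'll be inserted
    into the prompt for this training step.

    vectors: (num_vectors, embedding_dim) tensor for the placeholder token.
    With shuffling on, the order is re-randomized every step, so no vector
    can rely on its position to carry only a fragment of the concept.
    """
    if shuffle:
        return vectors[torch.randperm(vectors.shape[0])]
    return vectors
```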
You can enable this feature by setting `model.params.personalization_config.params.shuffle_mode` to `true` in your project's config YAML file, and it will shuffle all vectors.

Having done a lot of experimentation with the concept, it also supports the following additional options (see the sketch after this list for how they map to behavior). Experiment with them and see what works best for your particular subject:
- `all`, `on`, or `true` - Shuffles all vectors.
- `off` or `false` - Disables shuffling; the default if it is not specified in your config.
- `trailing` - With 3 or more vectors, shuffles all vectors after the first. This provides a stable "intro" vector.
- `leading` - With 3 or more vectors, shuffles all vectors before the last. This provides a stable "outro" vector.
- `between` - With 4 or more vectors, shuffles all vectors between the first and last.
- `progressive` - A special mode for `progressive_words`. Like `between`, it also establishes stable "intro" and "outro" vectors, but it ensures that the first and last vectors are kept the same throughout training as more vectors are added. It requires `num_vectors_per_token` to be at least 3 to have any notable effect, and it only shuffles like `between` once there are enough vectors unlocked to do so (meaning at least `num_vectors_per_token * 2000` steps of training need to have occurred); otherwise some of the middle vectors may still be the initialization word's embedding.
- The "just make it work no matter what" option that favors stability. This tries to always shuffle the vectors but also tries to establish stable intro and outro vectors when the number of vectors permits it. What shuffle mode it ultimately uses differs based on how many vectors there are.off
.all
.trailing
to at least establish a stable intro vector.between
to establish both intro and outro vectors.If the number of vectors is below the supported number for an option, it acts the same as
off
unless otherwise noted.The
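To tie the options together, here's a rough sketch of how the modes could map onto index permutations. `resolve_mode` and `shuffle_for_mode` are hypothetical names, the `dynamic` thresholds follow the list above, and `progressive`'s unlock bookkeeping is omitted:

```python
import torch

def resolve_mode(mode, n: int) -> str:
    """Normalize the shuffle_mode aliases and apply the dynamic/fallback
    rules for a placeholder that currently has n vectors."""
    if mode in (True, "true", "on"):
        mode = "all"
    elif mode in (False, "false", None):
        mode = "off"
    if mode == "dynamic":
        if n <= 1:
            return "off"       # nothing to shuffle
        if n == 2:
            return "all"
        if n == 3:
            return "trailing"  # stable intro vector
        return "between"       # stable intro and outro vectors
    # Below the supported vector count, a mode acts the same as off.
    minimum = {"all": 2, "trailing": 3, "leading": 3, "between": 4}
    return mode if n >= minimum.get(mode, 1) else "off"

def shuffle_for_mode(vectors: torch.Tensor, mode) -> torch.Tensor:
    """Shuffle the (n, dim) vectors of one placeholder according to mode.
    ("progressive" additionally needs to know how many vectors
    progressive_words has unlocked so far; that is omitted here.)"""
    n = vectors.shape[0]
    spans = {"all": (0, n), "trailing": (1, n),
             "leading": (0, n - 1), "between": (1, n - 1)}
    lo, hi = spans.get(resolve_mode(mode, n), (0, 0))
    if hi - lo > 1:
        idx = torch.arange(n)
        idx[lo:hi] = idx[lo:hi][torch.randperm(hi - lo)]
        vectors = vectors[idx]
    return vectors
```

Something like this would run once per placeholder each time the prompt's token embeddings are assembled, so the order is re-randomized every training step.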
The `dynamic` setting exists because some of my experiments allowed for different numbers of vectors for different placeholders when training multi-placeholder embeddings (like in the `per_image_tokens` mode). It's also just a good option to suggest when someone isn't sure which to use. It will try to provide as much benefit as it can regardless of what `num_vectors_per_token` is set to.