Adds a new method to shuffle/swap values #167

Open. Ar57m wants to merge 12 commits into base: main

Ar57m commented Feb 15, 2024

This PR adds a new method to shuffle/swap values in each tensor.
I'm bad at describing what it does, so here's the exact function in a runnable script that serves as an example/demo.

import torch

x = torch.arange(2, 2*6*4 + 2, 2).view(6, 4)


base = torch.arange(1, 2*6*4, 2).view(6, 4)


print("Tensor x:")
print(x)

print("\nTensor base:")
print(base)

def swap_values(shape, n, base, x):
    # Keep x where the mask is True and take base everywhere else.
    if x.dim() == 2:
        rows, cols = shape
        rows_range = torch.arange(rows).view(-1, 1)
        cols_range = torch.arange(cols).view(1, -1)
        # True on every n-th diagonal; n=2 gives a chessboard pattern.
        mask = (rows_range + cols_range) % n == 0
        x = torch.where(mask, x, base)
        print("\nMask:\n", mask)
    else:
        # 1-D tensors: True at every n-th index.
        rows_range = torch.arange(shape[0])
        mask = rows_range % n == 0
        x = torch.where(mask, x, base)
    return x

print("\nTensor Mask and Swapped(n=2):")
swapped = swap_values(x.shape,2,base, x)
print(swapped)


print("\nTensor Mask and Swapped with inverted offset(n=2):")
offset = swap_values(x.shape,2,x, base)
print(offset)



print("\nTensor Mask and Swapped(n=3):")
swapped = swap_values(x.shape,3,base, x)
print(swapped)


print("\nTensor Mask and Swapped with inverted offset(n=3):")
offset = swap_values(x.shape,3,x, base)
print(offset)

I didn't touch legacy.py (I don't know whether anything needs to change there).
Feel free to do anything you want with it; I went as far as I could to improve and optimize my shenanigans 🙂

You can use task_swapping, task_swapping_ties or task_swapping_dare_ties.

Here's a YAML example:

merge_method: task_swapping
base_model: NousResearch/Yarn-Mistral-7b-128k
models:
  - model: senseable/WestLake-7B-v2
    parameters:
      weight: 0.666
      diagonal_offset: 2     # 2 gives a chessboard-style mask
      invert_offset: False    # optional; defaults to False

# I'm not sure, but I think that if you add another model with the same parameters here,
# only one of them will end up being used, so I recommend giving the other models different parameters.

dtype: bfloat16
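
A minimal sketch of how the two parameters above appear to shape the mask, written with plain Python lists so it runs without PyTorch. It assumes diagonal_offset plays the role of n in the demo and that invert_offset swaps the roles of base and x, as the demo's "inverted offset" calls do. The helpers diagonal_mask and swap_values below are illustrative, not the PR's code.

```python
def diagonal_mask(rows, cols, n):
    """True where (row + col) is a multiple of n, as in the demo's 2-D branch."""
    return [[(r + c) % n == 0 for c in range(cols)] for r in range(rows)]


def swap_values(base, x, n, invert=False):
    """Keep x where the mask is True and base elsewhere.

    `invert` is a hypothetical stand-in for invert_offset: like the demo,
    it simply swaps the roles of base and x.
    """
    if invert:
        base, x = x, base
    mask = diagonal_mask(len(x), len(x[0]), n)
    return [
        [xv if m else bv for m, xv, bv in zip(m_row, x_row, b_row)]
        for m_row, x_row, b_row in zip(mask, x, base)
    ]


# Same toy data as the demo, shrunk to 3x4: x holds even numbers, base odd ones.
x = [[2, 4, 6, 8], [10, 12, 14, 16], [18, 20, 22, 24]]
base = [[1, 3, 5, 7], [9, 11, 13, 15], [17, 19, 21, 23]]

for n in (2, 3):
    print(f"n={n}")
    for row in swap_values(base, x, n):
        print(row)
# n=2 -> [2, 3, 6, 7], [9, 12, 13, 16], [18, 19, 22, 23]
# n=3 -> [2, 3, 5, 8], [9, 11, 14, 15], [17, 20, 21, 23]
```

With n=2 half of the values come from x; with n=3 only about a third do, so a larger offset keeps more of the other tensor.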

cg123 (Collaborator) commented Feb 21, 2024

This is really interesting! I definitely have to spend some time to get a better feel for how this is working.

From my first read through it looks like this might be effectively negating the delta values in a checkerboard pattern. Does that jive with your thoughts?

Thanks again for the PR, I'm looking forward to playing with it. :)

Ar57m (Author) commented Feb 21, 2024

> From my first read through it looks like this might be effectively negating the delta values in a checkerboard pattern. Does that jive with your thoughts?

Yep, I think that's correct. Thanks for replying 🤗
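
To make the "negating the delta" reading concrete, here is a small sketch in plain Python. It compares two assumptions, neither confirmed by this thread: (1) base and the model exchange values outside the mask, and (2) only the model tensor is overwritten, as in the standalone demo. The helper names are hypothetical.

```python
def checkerboard(rows, cols, n=2):
    """True where (row + col) is a multiple of n."""
    return [[(r + c) % n == 0 for c in range(cols)] for r in range(rows)]


def where(mask, a, b):
    """Elementwise: take a where mask is True, b elsewhere (like torch.where)."""
    return [
        [av if m else bv for m, av, bv in zip(m_row, a_row, b_row)]
        for m_row, a_row, b_row in zip(mask, a, b)
    ]


def delta(base, x):
    """Task vector: model weights minus base weights."""
    return [[xv - bv for xv, bv in zip(x_row, b_row)] for x_row, b_row in zip(x, base)]


base = [[1, 3], [5, 7]]
x = [[2, 5], [8, 11]]
mask = checkerboard(2, 2)

# Assumption 1: both tensors exchange values where the mask is False.
exchanged = delta(where(mask, base, x), where(mask, x, base))
# Assumption 2: only x is overwritten, as in the demo; base stays untouched.
one_sided = delta(base, where(mask, x, base))

print(delta(base, x))  # [[1, 2], [3, 4]]
print(exchanged)       # [[1, -2], [-3, 4]]
print(one_sided)       # [[1, 0], [0, 4]]
```

Under the first assumption the delta flips sign off the mask, which matches the checkerboard-negation description; under the second it is zeroed there instead.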

Ar57m (Author) commented Mar 1, 2024

I fixed a few things that were inverted incorrectly.
I also added an optional random mask to merge random parts of the tensor.
Here's what it does:

import torch

x = torch.arange(2, 2*10*5 + 2, 2).view(10, 5)


base = torch.arange(1, 2*10*5, 2).view(10, 5)

print(x, '\n', base)

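def rand_mask(base, x, percent, seed=None):
    # Use a local generator so the global RNG state is left untouched
    # (torch.seed() would reseed the global RNG instead of just reading it).
    gen = torch.Generator()
    if seed is not None:
        gen.manual_seed(seed)
    else:
        gen.seed()  # non-deterministic seed, so the mask differs on every call

    # Each element comes from x with probability `percent` (a fraction in [0, 1]).
    mask = torch.rand(base.shape, generator=gen) <= percent
    print('\n', mask, '\n', 100 * mask.sum().item() / mask.numel(), '% of the base swapped\n')
    return torch.where(mask, x, base)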

result = rand_mask(base, x, 0.2, seed=1337)

print(result)

And here is a YAML usage example:

merge_method: task_swapping
base_model: NeuralNovel/Senzu-7B-v0.1-DPO
models:
  - model: senseable/WestLake-7B-v2
    parameters:
      weight: 0.75
      diagonal_offset: 2    # has no effect when random_mask is used
      random_mask: 0.3333  # if it's 0.0 or absent, the normal chessboard-like mask is used
      random_mask_seed: 98557 # if absent, the mask will probably differ on every run (not reproducible)
dtype: bfloat16
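
The precedence described in the YAML comments can be sketched as follows. This is a hypothetical helper, not the PR's implementation: build_mask and its defaults are assumptions, and it uses Python's random module rather than torch.rand, so a given seed will not reproduce the masks the PR generates.

```python
import random


def build_mask(rows, cols, diagonal_offset=2, random_mask=0.0, random_mask_seed=None):
    """Pick the mask type the way the YAML comments describe (assumed behavior)."""
    if random_mask > 0.0:
        # A private RNG keeps the global random state untouched.
        # With seed=None the mask is different on every call.
        rng = random.Random(random_mask_seed)
        return [[rng.random() <= random_mask for _ in range(cols)] for _ in range(rows)]
    # Fallback: the diagonal (chessboard-like) mask; diagonal_offset only matters here.
    return [[(r + c) % diagonal_offset == 0 for c in range(cols)] for r in range(rows)]


seeded_a = build_mask(10, 5, random_mask=0.3333, random_mask_seed=98557)
seeded_b = build_mask(10, 5, diagonal_offset=3, random_mask=0.3333, random_mask_seed=98557)
print(seeded_a == seeded_b)  # True: same seed, and diagonal_offset is ignored

for row in build_mask(2, 4):
    print(row)
# [True, False, True, False]
# [False, True, False, True]
```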

I merged a model with the config above ⬆️

I hope I didn't mess up this time 😅

linux-leo commented

Hey @Ar57m, this method is really cool! Do you know how it compares to slerp interpolation or standard task arithmetic?

Ar57m (Author) commented Mar 7, 2024

> Hey @Ar57m, this method is really cool! Do you know how it compares to slerp interpolation or standard task arithmetic?

Thanks @linux-leo. I don't know enough calculus to understand exactly what Slerp does, but afaik my method is much simpler and unrelated to it.
I built it on top of generalized_task_arithmetic.py, so after the swapping, task arithmetic is applied as usual.

Imagine you have two sorted decks of cards, base and X, with the same number of cards, and each card is numbered from 0 to n. Each card stands for one element (value) in a tensor. You take the base deck and replace its even-numbered cards with the even-numbered cards from X. That's the normal behavior.
With the random mask, random cards from the base deck are replaced by the corresponding cards from X instead.
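
The deck analogy translates almost directly into code. The sketch below uses plain Python lists as the "decks"; the function names are made up for illustration, and the random variant uses Python's random module rather than whatever the PR uses internally.

```python
import random


def swap_even_cards(base_deck, x_deck):
    """Normal behavior: cards at even positions are taken from X."""
    return [xc if i % 2 == 0 else bc for i, (bc, xc) in enumerate(zip(base_deck, x_deck))]


def swap_random_cards(base_deck, x_deck, fraction, seed=None):
    """Random mask: each position takes X's card with probability `fraction`."""
    rng = random.Random(seed)  # seed=None gives a different pick on every call
    return [xc if rng.random() <= fraction else bc for bc, xc in zip(base_deck, x_deck)]


base_deck = [f"B{i}" for i in range(8)]
x_deck = [f"X{i}" for i in range(8)]

print(swap_even_cards(base_deck, x_deck))
# ['X0', 'B1', 'X2', 'B3', 'X4', 'B5', 'X6', 'B7']
print(swap_random_cards(base_deck, x_deck, 0.3, seed=1337))
```

Either way, every position still holds the card with the same number; only the deck it came from changes.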

Ar57m reopened this May 7, 2024

3 participants