Skip to content

Commit

Permalink
updating README
Browse files Browse the repository at this point in the history
  • Loading branch information
Tej-Deep committed Jul 17, 2024
1 parent 0031a40 commit 6acdb23
Showing 1 changed file with 11 additions and 1 deletion.
12 changes: 11 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,8 @@ A quick overview of the currently supported merge methods:
| [Model Breadcrumbs](https://arxiv.org/abs/2312.06795) | `breadcrumbs` |||
| [Model Breadcrumbs](https://arxiv.org/abs/2312.06795) + [TIES](https://arxiv.org/abs/2306.01708) | `breadcrumbs_ties` |||
| [Model Stock](https://arxiv.org/abs/2403.19522) | `model_stock` |||

| [DELLA](https://arxiv.org/abs/2406.11617) | `della` |||
| [DELLA](https://arxiv.org/abs/2406.11617) [Task Arithmetic](https://arxiv.org/abs/2212.04089) | `della_linear` |||
### Linear

The classic merge method - a simple weighted average.
Expand Down Expand Up @@ -189,6 +190,15 @@ Parameters:

- `filter_wise`: if true, weight calculation will be per-row rather than per-tensor. Not recommended.

### [DELLA](https://arxiv.org/abs/2406.11617)

Building upon DARE, DELLA uses adaptive pruning based on parameter magnitudes. DELLA first ranks parameters in each row of delta parameters and assigns drop probabilities inversely proportional to their magnitudes. This allows it to retain more important changes while reducing interference. After pruning, it rescales the remaining parameters similar to [DARE](#dare). DELLA can be used with (`della`) or without (`della_linear`) the sign elect step of TIES

Parameters: same as [Linear](#linear), plus:
- `density` - fraction of weights in differences from the base model to retain
- `epsilon` - maximum change in drop probability based on magnitude. Drop probabilities assigned will range from `density - epsilon` to `density + epsilon`. (When selecting values for `density` and `epsilon`, ensure that the range of probabilities falls within 0 to 1)
- `lambda` - scaling factor for the final merged delta parameters before merging with the base parameters.

## LoRA extraction

Mergekit allows extracting PEFT-compatible low-rank approximations of finetuned models.
Expand Down

0 comments on commit 6acdb23

Please sign in to comment.