docs/merger_2_model_modes.md
+38 -3 (38 additions & 3 deletions)
@@ -23,12 +23,47 @@
 ---
 
 ## Power-Up (DARE)
-> Adds the unique capabilities of Model B to Model A using the Drop and Rescale (DARE) technique, which often preserves the knowledge of the base model better than simple additions.
+> Adds the unique capabilities of Model B to Model A using the Drop and Rescale (DARE) technique. This implementation handles shape mismatches between models by padding and uses a randomized dropout mask.
 
 **Models Used:** A, B
 **Parameters:**
-- **Alpha:** The dropout rate. This is the proportion of unique weights from Model B that are *dropped* before merging. A higher value means less of B is merged.
+- **Alpha:** The dropout rate ($p$). This is the proportion of delta parameters from Model B that are randomly set to zero.
 - **Beta:** A final multiplier for the rescaled difference before it's added to Model A.
+- **Rescaling Logic:** Remaining weights are automatically rescaled by $1/(1-p)$ as per the DARE paper to approximate the original embeddings.
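
For reviewers, the drop-and-rescale step described in the Power-Up (DARE) section can be sketched roughly as follows. This is a minimal illustration on flat Python lists, not the code in this PR: the function name `dare_merge`, the `seed` argument, and the choice of zero-padding the shorter tensor are all assumptions made for the example.

```python
import random


def dare_merge(a, b, alpha, beta=1.0, seed=0):
    """Hypothetical sketch of a DARE-style merge on flat lists of floats.

    alpha is the drop rate p, beta is the final multiplier on the rescaled delta.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha (drop rate p) must be in [0, 1)")
    rng = random.Random(seed)

    # Assumption: a shape mismatch is resolved by zero-padding the shorter input.
    n = max(len(a), len(b))
    a_pad = list(a) + [0.0] * (n - len(a))
    b_pad = list(b) + [0.0] * (n - len(b))

    scale = 1.0 / (1.0 - alpha)  # rescale survivors by 1/(1-p)
    merged = []
    for wa, wb in zip(a_pad, b_pad):
        delta = wb - wa
        keep = rng.random() >= alpha  # each delta is dropped with probability alpha
        merged.append(wa + beta * delta * scale if keep else wa)
    return merged
```

With `alpha=0.0` nothing is dropped and the result equals the (padded) Model B when `beta=1.0`; as `alpha` grows, fewer deltas survive but each surviving delta is scaled up, so the expected size of the update stays the same.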
+
+---
+
+## Enhanced Man Interp
+> Sophisticated interpolation between values from A and B depending on their difference relative to other values, with manual threshold control.
+
+**Models Used:** A, B
+**Parameters:**
+- **Alpha:** Interpolation strength.
+- **Beta:** Lower mean threshold for filtering differences.
+- **Gamma:** Upper mean threshold for filtering differences.
+- **Delta:** Smoothness factor (mix between randomized mask and powered differences).
+
+---
+
+## Enhanced Auto Interp
+> Automated version of the enhanced interpolation mode that dynamically calculates thresholds based on mean differences.
+
+**Models Used:** A, B
+**Parameters:**
+- **Alpha:** Interpolation strength.
+- **Beta:** Threshold adjustment factor.
+- **Gamma:** Smoothness factor.
+
+---
+
+## Weight-Sum Cutoff
+> A linear interpolation mode that only applies the merge to weights whose differences fall within a specific threshold range.
+
+**Models Used:** A, B
+**Parameters:**
+- **Alpha:** Interpolation weight (multiplier for the difference).
+- **Beta:** Upper threshold for the difference cutoff.
+- **Gamma:** Lower threshold for the difference cutoff.
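
One way to read the Weight-Sum Cutoff description is sketched below. The docs do not say whether the cutoff is applied to the signed or absolute difference, or whether the bounds are inclusive, so this example assumes the absolute difference with inclusive bounds. The function name `weight_sum_cutoff` is hypothetical.

```python
def weight_sum_cutoff(a, b, alpha, beta, gamma):
    """Hypothetical sketch: interpolate only where gamma <= |b - a| <= beta."""
    merged = []
    for wa, wb in zip(a, b):
        diff = wb - wa
        if gamma <= abs(diff) <= beta:
            # Inside the threshold band: plain linear interpolation toward B.
            merged.append(wa + alpha * diff)
        else:
            # Outside the band: keep Model A's weight untouched.
            merged.append(wa)
    return merged


# Only the middle weight has a difference inside [0.25, 1.0].
print(weight_sum_cutoff([0.0, 0.0, 0.0], [0.1, 0.5, 2.0], alpha=0.5, beta=1.0, gamma=0.25))
# → [0.0, 0.25, 0.0]
```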
 
 ---
 
@@ -84,4 +119,4 @@ Layers matching any pattern will be **removed entirely** from the output.
 |`text_model`| All text encoder layers |
 |`\.norm`| All normalization layers |
 |`attn\.(q\|k\|v)`| Query, key, value attention weights |
docs/merger_3_model_modes.md
+4 -4 (4 additions & 4 deletions)
@@ -22,19 +22,19 @@
 ---
 
 ## Extract-Features
-> A powerful mode that identifies features present in both `(B - A)` and `(C - A)` and adds them to A. Allows for fine-grained control over combining aspects based on their similarity.
+> A powerful mode that identifies features present in both `(B - A)` and `(C - A)` and adds them to A. It uses per-vector cosine similarity to decide how much of each feature to keep, allowing for fine-grained control over combining aspects.
 
 **Models Used:** A, B, C
 **Parameters:**
 - **Alpha:** Weights the merge between Model B (`0.0`) and Model C (`1.0`).
 - **Beta:** Controls the focus on similarity (`0.0`) versus dissimilarity (`1.0`).
-- **Gamma:** A bias exponent for similarity. Higher values increase the bias.
+- **Gamma:** A bias exponent for similarity calculation.
 - **Delta:** A final multiplier for the extracted features before they are added to Model A.
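
The "per-vector cosine similarity" mentioned in the new Extract-Features description could look something like the sketch below. It only shows how the similarity between the rows of `(B - A)` and `(C - A)` might be computed. How Alpha, Beta, Gamma, and Delta are then applied is not specified in this diff and is left out. The helper names and the treatment of each row as one "vector" are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def delta_similarities(a, b, c):
    """Hypothetical helper: one similarity per row between (B - A) and (C - A)."""
    sims = []
    for row_a, row_b, row_c in zip(a, b, c):
        delta_b = [y - x for x, y in zip(row_a, row_b)]
        delta_c = [y - x for x, y in zip(row_a, row_c)]
        sims.append(cosine_similarity(delta_b, delta_c))
    return sims
```

A similarity near `1.0` means B and C moved that vector away from A in the same direction, while a value near `0.0` or below means they disagree.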
 
 ---
 
 ## Add-Dissimilarities
-> Identifies features that are dissimilar between Model B and Model C and adds them to Model A. Useful for combining unique aspects of two different models.
+> Identifies features that are dissimilar between Model B and Model C (relative to A) and adds them to Model A. Useful for combining unique aspects of two different models.
 
 **Models Used:** A, B, C
 **Parameters:**
@@ -80,4 +80,4 @@ Layers matching any pattern will be **removed entirely** from the output.
 **Pattern format:**
 - Whitespace-separated regex patterns
 - Patterns use **substring matching** (not full match)
-- Example: `text_model lora` matches any key containing "text_model" OR "lora"
+- Example: `text_model lora` matches any key containing "text_model" OR "lora"
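
The pattern format documented above (whitespace-separated regexes, substring matching, OR semantics) corresponds to `re.search` rather than `re.fullmatch`. A minimal sketch, with a hypothetical function name and made-up layer keys:

```python
import re


def keys_to_remove(keys, pattern_text):
    """Return the keys matched by any whitespace-separated regex (substring match)."""
    patterns = [re.compile(p) for p in pattern_text.split()]
    # re.search matches anywhere in the key, which gives substring semantics.
    return [k for k in keys if any(p.search(k) for p in patterns)]


keys = [
    "text_model.encoder.layers.0.weight",
    "unet.lora_down.weight",
    "unet.conv_in.weight",
]
print(keys_to_remove(keys, "text_model lora"))
# → ['text_model.encoder.layers.0.weight', 'unet.lora_down.weight']
```

Because patterns are split on whitespace, a single pattern cannot itself contain a space. Note also that the `\|` seen in the docs table is only Markdown table escaping; the regex itself would be written `attn\.(q|k|v)`.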