Commit 0c117ff

author
HackTricks News Bot
committed
Add content from: Research Update Enhanced src/AI/AI-llm-architecture/3.-token...
1 parent 0d77c05 commit 0c117ff

1 file changed

Lines changed: 25 additions & 0 deletions

src/AI/AI-llm-architecture/3.-token-embeddings.md

@@ -173,6 +173,29 @@ Combined Embedding = Token Embedding + Positional Embedding
- **Contextual Awareness:** The model can differentiate between tokens based on their positions.
- **Sequence Understanding:** Enables the model to understand grammar, syntax, and context-dependent meanings.

## **Positional Embeddings in Modern LLMs**

### **Rotary Positional Embeddings (RoPE)**

RoPE encodes position by rotating pairs of dimensions in the query and key vectors by an angle proportional to the token's position. Because the dot product of two rotated vectors depends only on the difference between their angles, attention scores depend on the relative offset between tokens rather than on their absolute positions. This keeps the embedding dimensionality unchanged, and RoPE is widely used in recent decoder-only LLMs.

For how token and positional embeddings are combined inside the model, see [the LLM architecture page](5.-llm-architecture.md).
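
To make the rotation idea concrete, here is a minimal pure-Python sketch. The function names `rope_rotate` and `dot` are hypothetical helpers, not part of any library. The sketch assumes the interleaved pairing `(x[2i], x[2i+1])` and the commonly used base of 10000; real implementations often pair dimensions differently and operate on batched tensors.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)  # pairs with larger i rotate more slowly
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (illustrative values only)
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# The same relative offset (3) at two different absolute positions
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
print(score_a, score_b)  # the two scores agree up to floating-point rounding
```

The output vector has the same length as the input, which is why RoPE does not change the embedding dimensionality.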

### **Extending Context Windows in RoPE-Based Models**

The token embedding matrix is indexed by token ID rather than by position, so it does not cap sequence length. In RoPE-based models, context length is instead often limited by the positional encoding scheme: positions beyond the training range produce rotation angles the model has not seen.

- **Position Interpolation (PI):** Rescales position indices so that longer sequences map into the position range seen during training, enabling context extension with minimal fine-tuning. Example:
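
```python
# Position Interpolation (PI) intuition
orig_ctx = 2048   # context length seen during training
new_ctx = 8192    # target extended context length
pos = 6000        # example position index beyond the original window
scaled_pos = pos * (orig_ctx / new_ctx)  # 1500.0, back inside [0, orig_ctx)
```
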
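- **YaRN:** A compute-efficient RoPE extension method that scales RoPE frequencies non-uniformly (low-frequency dimensions are interpolated, high-frequency dimensions are left largely unchanged) and adjusts the attention temperature, reaching longer contexts with fewer additional training steps than earlier approaches such as PI.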
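
The difference between the two strategies can be sketched in terms of the per-pair RoPE frequencies. This is an illustrative approximation, not the reference implementation of either paper: the helper names are hypothetical, the `alpha` and `beta` thresholds are assumed values, and YaRN's attention-temperature adjustment is omitted entirely.

```python
import math

def rope_inv_freq(dim, base=10000.0):
    """Per-pair rotation frequencies used by standard RoPE."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def pi_inv_freq(inv_freq, scale):
    """PI: dividing every frequency by `scale` is equivalent to
    multiplying every position index by 1 / scale."""
    return [f / scale for f in inv_freq]

def by_parts_inv_freq(inv_freq, scale, orig_ctx, alpha=1.0, beta=32.0):
    """YaRN-style blend (assumed thresholds): interpolate slow pairs,
    leave fast pairs untouched, and mix linearly in between."""
    out = []
    for f in inv_freq:
        rotations = orig_ctx * f / (2 * math.pi)  # full turns within orig_ctx
        ramp = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        out.append((1 - ramp) * f / scale + ramp * f)
    return out

orig_ctx, new_ctx = 2048, 8192
scale = new_ctx / orig_ctx

base_freq = rope_inv_freq(dim=64)
pi_freq = pi_inv_freq(base_freq, scale)
yarn_like = by_parts_inv_freq(base_freq, scale, orig_ctx)

# Fastest pair: unchanged by the blend, but slowed down by PI
print(base_freq[0], pi_freq[0], yarn_like[0])
# Slowest pair: both approaches interpolate it fully
print(base_freq[-1], pi_freq[-1], yarn_like[-1])
```

PI compresses every frequency by the same factor, which also squeezes the fast-rotating pairs that encode fine-grained local order. The blended variant leaves those pairs alone and only interpolates pairs that complete less than roughly one full turn over the original context.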
## Code Example

Continuing with the code example from [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb):
@@ -219,4 +242,6 @@ print(input_embeddings.shape) # torch.Size([8, 4, 256])
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
- [https://arxiv.org/abs/2306.15595](https://arxiv.org/abs/2306.15595)
- [https://arxiv.org/abs/2309.00071](https://arxiv.org/abs/2309.00071)

{{#include ../../banners/hacktricks-training.md}}
