Add content from: Research Update Enhanced src/AI/AI-llm-architecture/3.-token...

HackTricks News Bot · HackTricks News Bot · commit 0c117ff9d5c2 · 2026-03-13T02:40:15.000Z
diff --git a/src/AI/AI-llm-architecture/3.-token-embeddings.md b/src/AI/AI-llm-architecture/3.-token-embeddings.md
@@ -173,6 +173,29 @@ Combined Embedding = Token Embedding + Positional Embedding
 - **Contextual Awareness:** The model can differentiate between tokens based on their positions.
 - **Sequence Understanding:** Enables the model to understand grammar, syntax, and context-dependent meanings.
 
+## **Positional Embeddings in Modern LLMs**
+
+### **Rotary Positional Embeddings (RoPE)**
+
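+RoPE encodes position by rotating each pair of dimensions in the query and key vectors by an angle proportional to the token's position, using a different rotation frequency for each pair. Unlike the absolute positional embeddings described above, RoPE is not added to the token embeddings. Instead, it is applied to the queries and keys inside each attention layer. The query at position m and the key at position n are rotated by angles proportional to m and n, so their dot product depends on the positions only through the offset m − n. As a result, attention scores carry relative position information and the embedding dimensionality stays unchanged. RoPE is widely used in recent decoder-only LLMs such as the LLaMA family.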
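+
+The following is a minimal PyTorch sketch of the rotation. It assumes the "interleaved pairs" layout, where dimensions 0/1, 2/3, ... form the rotated pairs, and the common base of 10000. Some implementations instead pair dimension i with i + d/2. `rope_angles` and `apply_rope` are hypothetical helper names, not a library API. The sketch checks the property that matters: shifting the query and key positions by the same amount leaves their attention score unchanged.
+
+```python
+# Minimal RoPE sketch (assumptions: interleaved pairs, base = 10000)
+import torch
+
+def rope_angles(positions, dim, base=10000.0):
+    # One rotation frequency per dimension pair: theta_i = base^(-2i/dim)
+    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
+    return positions.float()[:, None] * inv_freq[None, :]  # (seq_len, dim/2)
+
+def apply_rope(x, positions, base=10000.0):
+    # x: (seq_len, dim) queries or keys; dim must be even
+    angles = rope_angles(positions, x.shape[-1], base)
+    cos, sin = angles.cos(), angles.sin()
+    x1, x2 = x[..., 0::2], x[..., 1::2]   # the two coordinates of each pair
+    out = torch.empty_like(x)
+    out[..., 0::2] = x1 * cos - x2 * sin  # standard 2D rotation of each pair
+    out[..., 1::2] = x1 * sin + x2 * cos
+    return out
+
+torch.manual_seed(0)
+dim = 8
+q = torch.randn(1, dim)
+k = torch.randn(1, dim)
+
+# Same relative offset (3) at two different absolute positions
+s1 = apply_rope(q, torch.tensor([5])) @ apply_rope(k, torch.tensor([2])).T
+s2 = apply_rope(q, torch.tensor([105])) @ apply_rope(k, torch.tensor([102])).T
+print(s1.item(), s2.item())  # equal up to floating-point error
+```
+
+Each pair is only rotated, so vector norms are preserved and RoPE adds no trainable parameters. In a real model, a function like `apply_rope` would be called on the per-head queries and keys right before the attention scores are computed.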
+
+For how token and positional embeddings are combined inside the model, see [the LLM architecture page](5.-llm-architecture.md).
+
+### **Extending Context Windows in RoPE-Based Models**
+
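+In RoPE-based models, the usable context length is often limited by the positional encoding scheme rather than by the token embedding matrix, whose size does not depend on sequence length. Positions beyond the training context produce rotation angles the model never saw during training, and quality can degrade sharply if the model is simply run on longer inputs.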
+
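+- **Position Interpolation (PI):** Linearly down-scales position indices by `orig_ctx / new_ctx`. Positions in a longer sequence then map into the range seen during pre-training instead of extrapolating to unseen rotation angles. The PI paper reports extending LLaMA models to context windows of up to 32768 tokens with minimal fine-tuning (within 1000 steps). Example: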
+
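+```python
+# Position Interpolation (PI) intuition: linearly down-scale position indices
+import torch
+
+orig_ctx = 2048  # context length used during pre-training
+new_ctx = 8192   # target (extended) context length
+pos = torch.arange(new_ctx, dtype=torch.float32)  # positions 0 .. new_ctx - 1
+scaled_pos = pos * (orig_ctx / new_ctx)           # all positions now fall in [0, orig_ctx)
+```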
+
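+- **YaRN:** A compute-efficient RoPE extension method. PI scales every dimension uniformly. YaRN instead interpolates only the low-frequency (long-wavelength) RoPE dimensions, leaves the high-frequency dimensions untouched, and blends linearly in between ("NTK-by-parts" interpolation). It also adds a temperature factor to the attention logits. The YaRN paper reports reaching longer contexts with about 10x fewer training tokens and 2.5x fewer training steps than previous methods.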
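+
+The following is a minimal sketch of YaRN's per-dimension frequency scaling ("NTK-by-parts" interpolation) and its attention temperature. It assumes a RoPE base of 10000 and the ramp bounds alpha = 1, beta = 32 that the YaRN paper suggests for LLaMA models. `rope_inv_freq`, `yarn_inv_freq` and `yarn_attention_scale` are hypothetical helper names, not a library API.
+
+```python
+# YaRN-style RoPE frequency scaling (sketch; base, alpha and beta values are assumptions)
+import math
+import torch
+
+def rope_inv_freq(dim, base=10000.0):
+    # Standard RoPE frequencies, one per dimension pair
+    return base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
+
+def yarn_inv_freq(dim, orig_ctx, scale, base=10000.0, alpha=1.0, beta=32.0):
+    inv_freq = rope_inv_freq(dim, base)
+    wavelength = 2 * math.pi / inv_freq  # tokens per full rotation of each pair
+    r = orig_ctx / wavelength            # rotations completed within the original context
+    # gamma = 0 -> divide frequency by scale (as in PI), gamma = 1 -> keep it unchanged
+    gamma = ((r - alpha) / (beta - alpha)).clamp(0.0, 1.0)
+    return (1 - gamma) * inv_freq / scale + gamma * inv_freq
+
+def yarn_attention_scale(scale):
+    # sqrt(1/t) = 0.1 * ln(s) + 1; multiplying both q and k by it scales the logits by 1/t
+    return 0.1 * math.log(scale) + 1.0
+
+dim, orig_ctx, new_ctx = 128, 2048, 8192
+scale = new_ctx / orig_ctx  # s = 4.0
+base_freq = rope_inv_freq(dim)
+new_freq = yarn_inv_freq(dim, orig_ctx, scale)
+print((new_freq[0] / base_freq[0]).item())    # highest frequency: ratio 1.0 (kept)
+print((new_freq[-1] / base_freq[-1]).item())  # lowest frequency: ratio 0.25 (1/scale, as in PI)
+print(yarn_attention_scale(scale))
+```
+
+The intuition behind this design is that high-frequency pairs complete many rotations inside the original context and have therefore already seen every angle during training, so they are left alone. Low-frequency pairs would encounter unseen angles at longer positions, so they are interpolated exactly as in PI.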
+
 ## Code Example
 
 Following with the code example from [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb):
@@ -219,4 +242,6 @@ print(input_embeddings.shape) # torch.Size([8, 4, 256])
 - [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
 
 
+- [https://arxiv.org/abs/2306.15595](https://arxiv.org/abs/2306.15595)
+- [https://arxiv.org/abs/2309.00071](https://arxiv.org/abs/2309.00071)
 {{#include ../../banners/hacktricks-training.md}}