```mermaid
graph LR
labml_nn_transformers_gpt["labml_nn.transformers.gpt"]
labml_nn_transformers_models["labml_nn.transformers.models"]
labml_nn_transformers_mha["labml_nn.transformers.mha"]
labml_nn_transformers_positional_encoding["labml_nn.transformers.positional_encoding"]
labml_nn_transformers_rope["labml_nn.transformers.rope"]
labml_nn_transformers_configs["labml_nn.transformers.configs"]
labml_nn_transformers_label_smoothing_loss["labml_nn.transformers.label_smoothing_loss"]
labml_nn_transformers_utils["labml_nn.transformers.utils"]
labml_nn_transformers_gpt -- "uses" --> labml_nn_transformers_mha
labml_nn_transformers_gpt -- "uses" --> labml_nn_transformers_positional_encoding
labml_nn_transformers_gpt -- "uses" --> labml_nn_transformers_rope
labml_nn_transformers_gpt -- "receives configurations from" --> labml_nn_transformers_configs
labml_nn_transformers_gpt -- "utilizes" --> labml_nn_transformers_label_smoothing_loss
labml_nn_transformers_gpt -- "relies on" --> labml_nn_transformers_utils
labml_nn_transformers_models -- "uses" --> labml_nn_transformers_mha
labml_nn_transformers_models -- "uses" --> labml_nn_transformers_positional_encoding
labml_nn_transformers_models -- "uses" --> labml_nn_transformers_rope
labml_nn_transformers_models -- "receives configurations from" --> labml_nn_transformers_configs
labml_nn_transformers_models -- "relies on" --> labml_nn_transformers_utils
labml_nn_transformers_mha -- "is a core building block for" --> labml_nn_transformers_gpt
labml_nn_transformers_mha -- "is a core building block for" --> labml_nn_transformers_models
labml_nn_transformers_mha -- "may use" --> labml_nn_transformers_utils
labml_nn_transformers_positional_encoding -- "provides positional information to" --> labml_nn_transformers_gpt
labml_nn_transformers_positional_encoding -- "provides positional information to" --> labml_nn_transformers_models
labml_nn_transformers_rope -- "provides alternative positional embedding to" --> labml_nn_transformers_gpt
labml_nn_transformers_rope -- "provides alternative positional embedding to" --> labml_nn_transformers_models
labml_nn_transformers_configs -- "provides configuration settings to" --> labml_nn_transformers_gpt
labml_nn_transformers_configs -- "provides configuration settings to" --> labml_nn_transformers_models
labml_nn_transformers_utils -- "provides utility functions to" --> labml_nn_transformers_mha
labml_nn_transformers_utils -- "provides utility functions to" --> labml_nn_transformers_gpt
labml_nn_transformers_utils -- "provides utility functions to" --> labml_nn_transformers_models
```
The Transformer Model Implementations subsystem lives primarily in the labml_nn.transformers package. It provides the core building blocks (attention, positional encodings, configuration, loss, and utilities) together with complete implementations of transformer architectures built from them.
labml_nn.transformers.gpt implements the Generative Pre-trained Transformer (GPT) architecture, an autoregressive model for sequence generation. It defines the overall model structure: the stacked layers, the attention mechanism, and the forward pass.
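To make "autoregressive" concrete, here is a minimal sketch of greedy decoding in plain Python. It is not the labml_nn GPT code: `greedy_generate` and `toy_model` are hypothetical names, and `toy_model` is only a stand-in for a real forward pass that returns next-token logits.

```python
def greedy_generate(next_token_logits, prompt, max_new_tokens, eos_token=None):
    """Extend `prompt` one token at a time, always picking the highest-scoring token.

    `next_token_logits` is any callable mapping the tokens generated so far to a
    list of scores over the vocabulary (a stand-in for a model forward pass).
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        # Greedy choice: index of the largest logit.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
        if eos_token is not None and next_token == eos_token:
            break
    return tokens


def toy_model(tokens, vocab_size=5):
    """Hypothetical model: always favours (last token + 1) modulo the vocabulary size."""
    logits = [0.0] * vocab_size
    logits[(tokens[-1] + 1) % vocab_size] = 1.0
    return logits


generated = greedy_generate(toy_model, prompt=[0], max_new_tokens=4)
```

Each step feeds the whole sequence so far back into the model, which is what distinguishes autoregressive generation from predicting all positions at once. Sampling strategies other than greedy selection would replace only the `max(...)` line.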
Related Classes/Methods:
labml_nn.transformers.models provides a generic, reusable framework for constructing transformer models, covering both encoders and decoders. It serves as a flexible base from which different architectures are composed out of the core components.
Related Classes/Methods:
labml_nn.transformers.mha implements multi-head attention, the mechanism that captures dependencies between sequence positions in several representation subspaces at once. It computes attention scores per head and combines the outputs of all heads into a single, richer representation.
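The computation can be sketched in plain Python as scaled dot-product attention applied to slices of the feature dimension. This is an illustrative simplification, not the labml_nn implementation: the function names are hypothetical, the learned query, key, value, and output projections are omitted, and lists of floats stand in for tensors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V. Returns (outputs, weights)."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors, one output feature at a time.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


def multi_head_attention(queries, keys, values, n_heads):
    """Split features into n_heads slices, attend within each slice, concatenate.

    Assumption: learned projections are left out to keep the sketch short.
    """
    d_model = len(queries[0])
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads
    per_head = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        out, _ = scaled_dot_product_attention(
            [q[sl] for q in queries], [k[sl] for k in keys], [v[sl] for v in values])
        per_head.append(out)
    # Concatenate the head outputs for each query position.
    return [[x for h in range(n_heads) for x in per_head[h][i]]
            for i in range(len(queries))]


x = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
self_attended = multi_head_attention(x, x, x, n_heads=2)
```

Because each head sees only its own slice of the features, different heads can weight the same positions differently, which is the "multiple subspaces" idea described above.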
Related Classes/Methods:
labml_nn.transformers.positional_encoding generates sinusoidal positional encodings and adds them to the token embeddings. Self-attention on its own has no notion of token order, so these encodings supply the absolute position of each token in the sequence.
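As an illustration, the standard sinusoidal scheme uses sin(pos / 10000^(2i/d_model)) for even feature indices and the matching cosine for odd ones. The sketch below builds that table in plain Python; the function names are hypothetical and the module itself may organize the computation differently (for example with tensors and dropout).

```python
import math


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    assert d_model % 2 == 0, "this sketch assumes an even model dimension"
    table = []
    for pos in range(max_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # i is the even feature index, so i / d_model equals 2k / d_model.
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


def add_positional_encoding(embeddings, table):
    """Add the encoding for position p to the embedding at position p."""
    return [[e + p for e, p in zip(emb, table[pos])]
            for pos, emb in enumerate(embeddings)]


table = sinusoidal_positional_encoding(max_len=4, d_model=4)
encoded = add_positional_encoding([[0.0] * 4, [1.0] * 4], table)
```

Position 0 always maps to alternating 0 and 1 values, and lower feature indices oscillate faster than higher ones, so each position receives a distinct pattern.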
Related Classes/Methods:
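labml_nn.transformers.rope implements Rotary Positional Embeddings (RoPE), an alternative to sinusoidal encodings that builds relative positional information directly into the attention computation. Rather than adding a vector to the token embeddings, it rotates the query and key vectors by position-dependent angles, so that their dot product depends on the offset between the two positions.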
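The relative-position property can be checked with a small plain-Python sketch. It assumes adjacent features are paired (index 2k with 2k+1); implementations differ on the pairing convention, and the function names here are hypothetical rather than taken from labml_nn.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate each adjacent feature pair of `vec` by an angle proportional to `position`."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of features"
    out = [0.0] * d
    for i in range(0, d, 2):
        # Pair k = i / 2 rotates with frequency base^(-2k/d) = base^(-i/d).
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
# Same offset (2) at different absolute positions gives the same attention score.
score_near = dot(rope_rotate(q, 3), rope_rotate(k, 1))
score_far = dot(rope_rotate(q, 7), rope_rotate(k, 5))
```

Since rotations preserve length, the transformation changes only how queries and keys align with each other, not their magnitudes.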
Related Classes/Methods:
labml_nn.transformers.configs defines and manages the configuration settings for transformer models and their sub-components. It gives models and building blocks a standardized, flexible way to be parameterized.
Related Classes/Methods:
labml_nn.transformers.label_smoothing_loss provides label smoothing, a regularization technique commonly used to improve the generalization and calibration of deep learning models, particularly in sequence-to-sequence tasks. It computes a modified cross-entropy loss in which part of the target probability mass is moved away from the correct class.
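A minimal sketch of the idea, using the common formulation that mixes the one-hot target with a uniform distribution over all classes. This is an assumption for illustration: the module may distribute the smoothing mass differently (for instance by excluding padding tokens) or use a different loss function, and the names below are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothing_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against (1 - smoothing) * one_hot + smoothing * uniform."""
    n_classes = len(logits)
    log_probs = log_softmax(logits)
    off_value = smoothing / n_classes          # mass given to every class
    on_value = 1.0 - smoothing + off_value     # mass left on the true class
    return -sum((on_value if c == target else off_value) * lp
                for c, lp in enumerate(log_probs))


confident = [10.0, 0.0, 0.0]
plain_loss = label_smoothing_cross_entropy(confident, target=0, smoothing=0.0)
smoothed_loss = label_smoothing_cross_entropy(confident, target=0, smoothing=0.1)
```

With `smoothing=0.0` the function reduces to ordinary cross-entropy. With smoothing enabled, a very confident prediction is penalized more than before, which discourages the model from pushing logits to extremes.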
Related Classes/Methods:
labml_nn.transformers.utils offers general utility functions shared by the transformer implementations, such as attention-mask generation and tensor manipulation helpers.
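Mask generation is the easiest of these helpers to illustrate. The sketch below builds a causal (subsequent) mask and applies it to a matrix of attention scores; it uses nested lists instead of tensors, and the function names and the True-means-visible convention are assumptions rather than the module's actual API.

```python
def subsequent_mask(seq_len):
    """mask[i][j] is True when query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def apply_mask(scores, mask, fill=float("-inf")):
    """Replace masked-out scores with `fill` so softmax gives them zero weight."""
    return [[s if keep else fill for s, keep in zip(row, mask_row)]
            for row, mask_row in zip(scores, mask)]


mask = subsequent_mask(3)
masked_scores = apply_mask([[0.1, 0.2, 0.3]] * 3, mask)
```

The lower-triangular pattern prevents each position from seeing later tokens, which is what an autoregressive decoder requires during training.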
Related Classes/Methods: