Is your feature request related to a problem? Please describe.
When working with long-context models, memory and communication overhead quickly become a bottleneck.
Context Parallelism (CP) is a useful approach to scale sequence length by partitioning activations across GPUs.
In the NeMo framework, CP is supported via the Megatron strategy (e.g., setting context_parallel_size > 1), which can significantly improve long-context training efficiency (per the NVIDIA NeMo documentation).
However, it is unclear whether this is supported for Gemma 4 models, especially given that some variants (e.g., MoE) may have additional constraints.
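For reference, this is a minimal sketch of how CP is typically enabled in NeMo 2.x through MegatronStrategy. The parallelism arguments are the documented NeMo knobs; whether they are actually honored for Gemma 4 (and its MoE variants) is exactly what this issue is asking.

```python
# Sketch: enabling Context Parallelism in NeMo 2.x via MegatronStrategy.
# Parallel sizes are illustrative; devices must be divisible by TP * CP.
from nemo import lightning as nl

strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=2,    # TP degree
    pipeline_model_parallel_size=1,  # PP degree
    context_parallel_size=2,         # CP degree: shards the sequence dimension across GPUs
    sequence_parallel=True,          # optional, complements TP for activation memory
)

trainer = nl.Trainer(
    devices=8,
    num_nodes=1,
    strategy=strategy,
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
)
```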
Describe the solution you'd like
I would like to understand:
- Does Gemma 4 support Context Parallelism (CP) in NeMo / AutoModel?
- If not, are there plans to support it in the future?
- Are there any limitations (e.g., MoE variants not supporting CP)?
Describe alternatives you've considered
Currently, alternatives include:
- Tensor Parallelism / Pipeline Parallelism
- Sequence Parallelism
However, these approaches are not as effective as CP for scaling long context lengths.
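For completeness, this is roughly the fallback configuration we use today without CP (values are illustrative, not a recommendation): tensor, pipeline, and sequence parallelism only.

```python
# Sketch of the current fallback (no CP): TP + PP + SP via MegatronStrategy.
from nemo import lightning as nl

fallback_strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=4,    # shard attention/MLP weights across 4 GPUs
    pipeline_model_parallel_size=2,  # split layers into 2 pipeline stages
    sequence_parallel=True,          # shard LayerNorm/dropout activations along TP ranks
    context_parallel_size=1,         # CP disabled; this is the setting we would like to raise for Gemma 4
)
```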