
feat: Support LoRA incremental weight synchronization on disk for FSDP and SGLang #1233

Open
TaoZex wants to merge 25 commits into inclusionAI:main from TaoZex:lora_incre

Conversation

@TaoZex (Collaborator) commented Apr 23, 2026

Description

Implement disk-based LoRA delta weight synchronization for FSDP training engine and SGLang inference backend. When lora_delta_sync is enabled, the first weight sync transmits the full base model via /update_weights_from_disk, and subsequent syncs only transmit LoRA adapter weights via /load_lora_adapter, significantly reducing communication overhead.

Key changes:

  • Disk-based delta sync path: Base model weights are saved as HuggingFace safetensors and loaded by SGLang via /update_weights_from_disk; adapter weights are saved as PEFT format and loaded via /load_lora_adapter.
  • Dispatch logic: When lora_delta_sync=True, the delta sync path is used regardless of weight_update_mode (disk or xccl), eliminating the need for an NCCL-based distributed weight update (see the sketch after this list).
  • Entropy regularization: Add entropy_coeff parameter to PPOActorConfig and grpo_loss_fn, supporting entropy bonus in the loss function.
  • Shared filesystem support: Add delta_sync_dir config field for multi-node setups, defaulting to ~/.cache/areal/ for single-node.
  • Documentation: Update both Chinese and English LoRA reference docs to reflect disk-based sync flow, new parameters, and multi-node filesystem guidance.
  • Tests: Add TestDeltaSyncDispatchLogic, test_entropy_coeff_default, test_entropy_coeff_can_set unit tests.
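
A minimal sketch of this dispatch flow, assuming a PEFT-wrapped model and an SGLang server URL. Only the /update_weights_from_disk and /load_lora_adapter endpoint names come from this PR; the function name, payload keys, and directory handling below are illustrative assumptions, not the actual implementation.

```python
# Illustrative sketch only: function name, payload keys, and defaults are assumptions.
import os

import requests


def sync_weights_to_sglang(peft_model, step, server_url, delta_sync_dir=None):
    """Disk-based LoRA delta sync: full base model once, adapter-only afterwards."""
    sync_dir = delta_sync_dir or os.path.expanduser("~/.cache/areal/")  # single-node default
    os.makedirs(sync_dir, exist_ok=True)

    if step == 1:
        # First sync: save the full base model as HuggingFace safetensors and
        # have SGLang reload it from disk.
        base_dir = os.path.join(sync_dir, "base_model")
        peft_model.get_base_model().save_pretrained(base_dir, safe_serialization=True)
        resp = requests.post(
            f"{server_url}/update_weights_from_disk",
            json={"model_path": base_dir},  # payload key is an assumption
        )
    else:
        # Subsequent syncs: save only the PEFT adapter and hot-load it.
        adapter_dir = os.path.join(sync_dir, f"adapter_step_{step}")
        peft_model.save_pretrained(adapter_dir)  # PEFT save writes adapter weights only
        resp = requests.post(
            f"{server_url}/load_lora_adapter",
            json={"lora_name": f"step_{step}", "lora_path": adapter_dir},  # assumed keys
        )
    resp.raise_for_status()
```

On multi-node setups, delta_sync_dir must point to storage reachable by both the trainer and the SGLang server, which is what the new delta_sync_dir config field provides.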

Related Issue

Fixes #(issue)

Type of Change

  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details (if applicable):

Additional Context

Files changed:

| File | Change |
| --- | --- |
| areal/api/cli_args.py | Add entropy_coeff and delta_sync_dir fields; update lora_delta_sync help text |
| areal/engine/fsdp_engine.py | Disk-based delta sync (_update_weights_delta_sync_disk, _save_base_model_for_delta_sync) |
| areal/engine/sglang_remote.py | Update docstrings for disk-based sync |
| areal/trainer/ppo/actor.py | Add entropy_coeff parameter and entropy bonus logic |
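
The entropy bonus in areal/trainer/ppo/actor.py amounts to subtracting a scaled mean token entropy from the policy loss. A minimal sketch, assuming token-level logits and a loss mask; the function and argument names are illustrative, not the PR's exact signatures.

```python
import torch


def add_entropy_bonus(policy_loss, logits, loss_mask, entropy_coeff=0.0):
    """Subtract a scaled entropy term from the policy loss (illustrative sketch)."""
    if entropy_coeff == 0.0:
        return policy_loss

    # Token-level entropy of the policy: H = -sum_v p(v) * log p(v).
    log_probs = torch.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)

    # Average over valid (non-padded) tokens only.
    mean_entropy = (entropy * loss_mask).sum() / loss_mask.sum().clamp(min=1)

    # A positive entropy_coeff rewards higher entropy, encouraging exploration.
    return policy_loss - entropy_coeff * mean_entropy
```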

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces "LoRA Delta Sync," an incremental weight update mechanism for FSDP and SGLang that reduces communication overhead by transmitting only adapter weights after an initial full sync. It also adds entropy regularization to the PPO actor. Feedback focuses on potential memory issues when collecting full model parameters on a single rank, the need for shared storage in multi-node setups for adapter files, and minor code cleanups regarding imports and logic simplification.

@TaoZex (Collaborator, Author) commented Apr 23, 2026

Test Content

Tested the code and collected the results below.

  1. tests/test_lora_delta_sync.py [test output screenshot]
  2. tests/test_lora_delta_sync_e2e.py (invokes tests/torchrun/run_lora_delta_sync.py) [test output screenshot]

@TaoZex (Collaborator, Author) commented Apr 23, 2026

Before this optimization, both the base model and adapter parameters had to be synchronized at every weight update. After the optimization, from Step > 1 onward only the adapter parameters are transmitted.

1. Task Reward

With the incremental disk weight update, LoRA training remains stable and the reward grows continuously. [reward curve screenshot]

2. Weight Synchronization Data Volume

  • Base model synchronization: 2944.40M
  • Adapter model synchronization: 35.21M

Adapter-only synchronization is 35.21 / (35.21 + 2944.40) ≈ 1.18% of the original volume, i.e. the overall parameter transmission volume is reduced by 98.82%.

3. Weight Synchronization Latency

  • Step 1 (Base + Adapter): 6.74s
  • Step > 1 (Adapter only): 0.35s (average shown in the chart)

The latency drops from 6.74s to 0.35s, a 94.8% reduction in weight update time.

@TaoZex TaoZex marked this pull request as ready for review April 23, 2026 10:21
@TaoZex (Collaborator, Author) commented Apr 23, 2026

@rchardx This idea is inspired by incremental weight updates. If you're interested, would you mind reviewing it or sharing your suggestions when you have time? Looking forward to your reply. Thanks~

@garrett4wade (Collaborator) left a comment

Hi @TaoZex, LoRA updates are expected to go through the "disk" update mode, which already implements the /load_lora_adapter path. The critical issue we should fix is that the FSDP engine always saves the full parameters rather than only the LoRA weights.
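
For context on the issue above, a minimal sketch of extracting only the LoRA tensors instead of the full state dict; the helper function and the "lora_" name filter are assumptions about one way to address it, not AReaL's code.

```python
from peft import PeftModel, get_peft_model_state_dict


def extract_lora_weights(model, full_state_dict=None):
    """Return only the LoRA adapter tensors rather than all parameters (illustrative)."""
    if isinstance(model, PeftModel):
        # PEFT already knows which tensors belong to the active adapter.
        return get_peft_model_state_dict(model, state_dict=full_state_dict)
    # Fallback: LoRA tensors carry "lora_" in their parameter names.
    state_dict = full_state_dict if full_state_dict is not None else model.state_dict()
    return {k: v for k, v in state_dict.items() if "lora_" in k}
```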

