
feat(pt): add descriptor name & parameter numbers output & gpu name (only for cuda) & Capitalise some infos #5140

Closed

OutisLi wants to merge 3 commits into deepmodeling:master from OutisLi:pr/display

Conversation

@OutisLi
Collaborator

@OutisLi OutisLi commented Jan 9, 2026

Summary by CodeRabbit

  • New Features
    • Model summary logging added at training start: shows descriptor type and parameter counts for single-model and multi-task runs.
    • Runtime capture of GPU device name when running with the PyTorch backend and CUDA available, added to build info.


Copilot AI review requested due to automatic review settings January 9, 2026 04:37
@github-actions github-actions Bot added the Python label Jan 9, 2026
@dosubot dosubot Bot added the new feature label Jan 9, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jan 9, 2026

📝 Walkthrough

Adds a private Trainer method _log_model_summary that logs descriptor type and parameter counts at rank 0 during initialization (supports single-model and multi-task). Also augments build_info to include the CUDA device name when Backend is PyTorch and CUDA is available.

Changes

  • Trainer logging enhancement (deepmd/pt/train/training.py): Added private method _log_model_summary() to determine the descriptor type (via get_descriptor_type) and count model parameters (count_parameters(model)); invoked on rank 0 during Trainer initialization after profiling attributes are set. Handles both single-model and multi-task model_key iteration.
  • Build info GPU capture (deepmd/utils/summary.py): When Backend == "PyTorch" and CUDA is available, imports torch and sets build_info["device name"] to the name of device 0, adding runtime GPU device info to the build metadata.
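As a rough illustration of the build-info change described above, the sketch below lazily imports torch and records the device-0 name. The collect_build_info function and its backend argument are hypothetical stand-ins for this PR's actual code path in deepmd/utils/summary.py; torch.cuda.is_available() and torch.cuda.get_device_name() are the real PyTorch calls.

```python
def collect_build_info(backend: str) -> dict:
    """Hypothetical sketch of the summary.py change: record the CUDA
    device name only when the PyTorch backend is active."""
    build_info = {"backend": backend}
    if backend == "PyTorch":
        try:
            # Imported lazily so non-PyTorch backends need no torch install.
            import torch

            if torch.cuda.is_available():
                # Human-readable name of device 0.
                build_info["device name"] = torch.cuda.get_device_name(0)
        except ImportError:
            pass  # torch missing: leave build info without a GPU entry
    return build_info
```

On CPU-only hosts, or when torch is not installed, the "device name" key is simply absent, matching the guarded behaviour the summary describes.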

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 83.33%, above the required threshold of 80.00%.
  • Title Check: ✅ Passed. The title accurately reflects the main changes: adding descriptor name output, parameter numbers output, GPU name logging for CUDA, and capitalising some info messages.



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Actionable comment (deepmd/pt/train/training.py, around lines 731-744): get_descriptor_type can raise IndexError, AttributeError, or KeyError when indexing model.atomic_model.models[0] or accessing serialize()["type"]. Add a bounds check that model.atomic_model.models is a non-empty sequence before indexing, and wrap the serialize() calls and the ["type"] access in a try/except that falls back to "UNKNOWN" (returning the serialized type plus " (with ZBL)" on success). Also verify that dp_model has a descriptor attribute and that its serialize() returns a mapping with a "type" key, so no unhandled exception can escape.
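A hardened helper along the lines the comment suggests could look like the sketch below. The attribute names (atomic_model, models, descriptor, serialize) follow the snippet under review; the overall shape is illustrative, not the PR's final code.

```python
from typing import Any


def get_descriptor_type(model: Any) -> str:
    """Sketch of the hardened helper: never raises, falls back to UNKNOWN."""
    try:
        if hasattr(model, "get_descriptor"):
            return model.get_descriptor().serialize()["type"].upper()
        # ZBL models keep the DP model at atomic_model.models[0]; confirm the
        # sequence is non-empty before indexing into it.
        models = getattr(getattr(model, "atomic_model", None), "models", None)
        if models:
            dp_model = models[0]
            if hasattr(dp_model, "descriptor"):
                return dp_model.descriptor.serialize()["type"].upper() + " (with ZBL)"
    except (AttributeError, IndexError, KeyError, TypeError):
        pass  # malformed descriptor or serialize() output falls through
    return "UNKNOWN"
```

Any model lacking the expected attributes, or whose serialize() returns something without a "type" key, now yields "UNKNOWN" instead of a traceback.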
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f73113 and d6fa9cb.

📒 Files selected for processing (1)
  • deepmd/pt/train/training.py
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/pt/train/training.py (2)
deepmd/pt/model/model/dp_model.py (1)
  • get_descriptor (52-54)
deepmd/pt/model/atomic_model/dp_atomic_model.py (1)
  • serialize (168-180)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (42)
  • 42 checks skipped, spanning the Agent job, CodeQL/Analyze jobs, the Python test matrix (3.10 and 3.13), wheel builds, C++ builds and tests, and the C library build.
🔇 Additional comments (3)
deepmd/pt/train/training.py (3)

724-726: LGTM!

The placement and conditional execution on rank 0 are appropriate for logging model summary information after initialization.


746-748: Verify whether all parameters or only trainable parameters should be counted.

The current implementation counts all parameters, including non-trainable ones. If the intent is to count only trainable parameters, consider filtering by p.requires_grad.

Alternative implementation for trainable parameters only

If trainable parameters are intended:

```diff
 def count_parameters(model: Any) -> int:
     """Count the total number of trainable parameters."""
-    return sum(p.numel() for p in model.parameters())
+    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```
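To make the all-vs-trainable distinction concrete without pulling in torch, the toy sketch below mimics parameter objects. FakeParam is a stand-in for torch.nn.Parameter (only the numel() method and requires_grad attribute are mirrored), and the optional trainable_only flag demonstrates the reviewer's suggested filter.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter; only the two members the
    counting helper touches are mirrored."""
    size: int
    requires_grad: bool = True

    def numel(self) -> int:
        return self.size


def count_parameters(params, trainable_only: bool = False) -> int:
    """Count parameters, optionally skipping frozen (requires_grad=False) ones."""
    return sum(p.numel() for p in params if not trainable_only or p.requires_grad)


# One trainable 10-element parameter plus one frozen 5-element parameter.
params = [FakeParam(10), FakeParam(5, requires_grad=False)]
```

With the frozen 5-element parameter excluded, the count drops from 15 to 10, which is exactly the discrepancy the review comment asks the author to decide on.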

750-761: LGTM!

The logging logic correctly handles both single-task and multi-task models, formatting the output appropriately for each case.

Comment thread deepmd/pt/train/training.py
Contributor

Copilot AI left a comment


Pull request overview

This PR adds logging functionality to output model summary information during training initialization, specifically the descriptor type and parameter count. This provides useful diagnostic information about the model being trained.

Key changes:

  • Added _log_model_summary() method to the Trainer class that logs descriptor type and total parameter count
  • Supports both single-task and multi-task training scenarios
  • Handles both standard models and ZBL models with appropriate descriptor type detection


Comment thread deepmd/pt/train/training.py
Comment on lines +728 to +761
```python
def _log_model_summary(self) -> None:
    """Log model summary information including descriptor type and parameter count."""

    def get_descriptor_type(model: Any) -> str:
        """Get the descriptor type name from model."""
        # Standard models have get_descriptor method
        if hasattr(model, "get_descriptor"):
            descriptor = model.get_descriptor()
            return descriptor.serialize()["type"].upper()
        # ZBL models: descriptor is in atomic_model.models[0]
        if hasattr(model, "atomic_model") and hasattr(model.atomic_model, "models"):
            dp_model = model.atomic_model.models[0]
            if hasattr(dp_model, "descriptor"):
                return (
                    dp_model.descriptor.serialize()["type"].upper() + " (with ZBL)"
                )
        return "UNKNOWN"

    def count_parameters(model: Any) -> int:
        """Count the total number of trainable parameters."""
        return sum(p.numel() for p in model.parameters())

    if not self.multi_task:
        desc_type = get_descriptor_type(self.model)
        num_params = count_parameters(self.model)
        log.info(f"Descriptor: {desc_type}")
        log.info(f"Model params: {num_params / 1e6:.3f} M")
    else:
        # For multi-task, log each model's info
        for model_key in self.model_keys:
            desc_type = get_descriptor_type(self.model[model_key])
            num_params = count_parameters(self.model[model_key])
            log.info(f"Descriptor [{model_key}]: {desc_type}")
            log.info(f"Model params [{model_key}]: {num_params / 1e6:.3f} M")
```

Copilot AI Jan 9, 2026


The new _log_model_summary() method that logs descriptor type and parameter count lacks explicit test coverage. While existing training tests will execute this code path, consider adding a dedicated test to verify that the descriptor type is correctly detected for different model types (standard models, ZBL models, multi-task models) and that the parameter counting logic works as expected. This would help catch potential issues with model structure assumptions.
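A dedicated test along those lines could exercise the detection logic with lightweight dummies, as in the sketch below. Everything here is illustrative: the dummy classes are not the real deepmd models, and the helper is copied from the snippet under review so the test is self-contained.

```python
from typing import Any


def get_descriptor_type(model: Any) -> str:
    """Copy of the helper under review, inlined for a self-contained test."""
    if hasattr(model, "get_descriptor"):
        return model.get_descriptor().serialize()["type"].upper()
    if hasattr(model, "atomic_model") and hasattr(model.atomic_model, "models"):
        dp_model = model.atomic_model.models[0]
        if hasattr(dp_model, "descriptor"):
            return dp_model.descriptor.serialize()["type"].upper() + " (with ZBL)"
    return "UNKNOWN"


class DummyDescriptor:
    def serialize(self):
        return {"type": "se_e2_a"}


class StandardModel:
    """Mimics a standard model exposing get_descriptor()."""
    def get_descriptor(self):
        return DummyDescriptor()


class ZBLDPModel:
    descriptor = DummyDescriptor()


class ZBLAtomicModel:
    models = [ZBLDPModel()]


class ZBLModel:
    """Mimics a ZBL model: descriptor lives at atomic_model.models[0]."""
    atomic_model = ZBLAtomicModel()


def test_descriptor_type_detection() -> None:
    assert get_descriptor_type(StandardModel()) == "SE_E2_A"
    assert get_descriptor_type(ZBLModel()) == "SE_E2_A (with ZBL)"
    assert get_descriptor_type(object()) == "UNKNOWN"
```

Each of the three branches (standard, ZBL, fallback) gets one assertion, which would catch future changes to the model structure assumptions the comment warns about.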

@OutisLi OutisLi changed the title feat(pt): add descriptor name and paramter numbers output feat(pt): add descriptor name & parameter numbers output & gpu name (only for cuda) Jan 9, 2026
@OutisLi OutisLi changed the title feat(pt): add descriptor name & parameter numbers output & gpu name (only for cuda) feat(pt): add descriptor name & parameter numbers output & gpu name (only for cuda) & Capitalise some infos Jan 9, 2026
@OutisLi OutisLi closed this Jan 9, 2026
@codecov

codecov Bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.94%. Comparing base (5f73113) to head (50f90b4).
⚠️ Report is 143 commits behind head on master.

Files with missing lines:
  • deepmd/pt/train/training.py: 95.83% patch coverage, 1 line missing ⚠️
  • deepmd/utils/summary.py: 75.00% patch coverage, 1 line missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5140   +/-   ##
=======================================
  Coverage   81.94%   81.94%           
=======================================
  Files         712      712           
  Lines       72887    72915   +28     
  Branches     3616     3617    +1     
=======================================
+ Hits        59725    59751   +26     
- Misses      11998    12000    +2     
  Partials     1164     1164           

☔ View full report in Codecov by Sentry.

