| 1 | +# Coding Principles |
| 2 | + |
| 3 | +Guidelines for production code in ModelOpt. Key values: simplicity, modularity, |
| 4 | +and conciseness. |
| 5 | + |
| 6 | +## Principles |
| 7 | + |
| 8 | +- **Prefer simple, surgical changes.** Touch only what the task requires. Avoid speculative |
| 9 | + refactors, broad rewrites, and "while we're here" cleanups. |
| 10 | +- **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain. |
| 11 | + Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers, |
| 12 | + and treat heavy branching as a signal to reconsider the design. |
| 13 | +- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding. |
| 14 | + Use existing extension points when they fit. If none fit, add a simple, focused helper, |
| 15 | + class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases. |
| 16 | +- **Respect inheritance boundaries.** Parent abstractions should define shared contracts and |
| 17 | + shared behavior, not child-specific special cases. |
| 18 | +- **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API, |
| 19 | + or abstraction when doing so keeps the design simpler. Avoid duplication that can drift out of sync. |
| 20 | +- **Comment cautiously.** Comments should add context, not translate code into English. |
| 21 | + Prefer making the code self-explanatory first. Use comments only for non-obvious |
| 22 | + intent or constraints that remain unclear from the code. Apply this guidance to new |
| 23 | + comments only; do not rewrite or delete existing comments just for style. |
| 24 | +- **Document public APIs.** Public and higher-level APIs should have docstrings, including examples when useful. |
| 25 | + Internal helpers should usually be self-documenting through clear names and structure. |
| 26 | +- **Fix the root cause, not the symptom.** For bug fixes, find and fix the root cause instead of patching its side effects. |
| 27 | +- **Validate external input once.** Check types and values at the interface boundary. Internal code can trust those |
| 28 | + checks and avoid redundant assertions. |
| 29 | +- **Remove dead code.** Delete unused imports, unreachable branches, and obsolete helpers. |
| 30 | +- **Use relative paths** from the repo root in commands and file references. |
| 31 | + |
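The "read top to bottom" and "validate external input once" principles can be illustrated with a minimal sketch. The function names (`quantize_unit_weights`, `_num_levels`, `_snap`) and the value ranges are hypothetical and chosen only for this example; they are not part of ModelOpt.

```python
def quantize_unit_weights(weights, num_bits):
    """Public entry point (hypothetical): high-level behavior comes first.

    All validation of external input happens here, once.
    """
    weights = list(weights)
    if isinstance(num_bits, bool) or not isinstance(num_bits, int):
        raise TypeError("num_bits must be an int")
    if not 1 <= num_bits <= 16:
        raise ValueError("num_bits must be in [1, 16]")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")

    levels = _num_levels(num_bits)
    return [_snap(w, levels) for w in weights]


def _num_levels(num_bits):
    # Internal helper: trusts the boundary check, so no repeated assertions.
    return 2**num_bits - 1


def _snap(weight, levels):
    # Internal helper: rounds a value in [0, 1] to the nearest grid point.
    return round(weight * levels) / levels


quantized = quantize_unit_weights([0.0, 1.0], num_bits=8)
```

The public function reads as a summary of the behavior, while the private helpers stay small because they rely on checks already made at the boundary.
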
| 32 | +## Testing |
| 33 | + |
| 34 | +- **Develop with focused tests.** During development, write as many focused |
| 35 | + tests as needed, including lower-level unit tests or internal probes, to |
| 36 | + understand and harden behavior. |
| 37 | +- **Curate production tests and keep them lean.** Before staging or committing, |
| 38 | + decide which tests should be checked in. Checked-in tests should document |
| 39 | + expected behavior, protect against regressions, or flag backward-incompatible |
| 40 | + behavior changes. Remove redundant lower-level tests when a higher-level test |
| 41 | + already covers the same behavior, keeping CI/CD fast and lean. |
| 42 | + |
| 43 | +## Performant AI Code |
| 44 | + |
| 45 | +- **Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs.** Reading metadata such as `tensor.shape` is fine. |
| 46 | + Avoid Python scalar extraction and built-ins such as `tensor.item()`, `float(tensor)`, or `min(tensor)` because they |
| 47 | + can trigger CPU-GPU syncs. Use PyTorch tensor ops such as `tensor.min()` by default, and only extract Python scalars |
| 48 | + when the CPU needs the value. Tensor-value-based Python branching can also break CUDA graphs. |
| 49 | +- **Develop with distributed processing in mind.** For example, use `print_rank_0` or `warn_rank_0` |
| 50 | + when possible to avoid noisy logs. Guard shared side effects, such as |
| 51 | + file writes or shared state updates, against race conditions between ranks. |
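
The sync guidance above can be sketched as follows. Both function names are hypothetical, and the snippet falls back to the CPU when no GPU is present, in which case it only demonstrates that the two versions agree.

```python
import torch


def normalize_with_sync(x: torch.Tensor) -> torch.Tensor:
    # Anti-pattern: .item() copies the value to the host, which blocks until
    # the GPU has finished, and the Python `if` then branches on that value.
    amax = x.abs().max().item()
    if amax == 0.0:
        return x
    return x / amax


def normalize_on_device(x: torch.Tensor) -> torch.Tensor:
    # amax stays a 0-dim tensor on x's device; torch.where replaces the
    # Python branch, so no value has to travel to the CPU.
    amax = x.abs().max()
    safe_amax = torch.where(amax > 0, amax, torch.ones_like(amax))
    return x / safe_amax


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([-2.0, 0.5, 4.0], device=device)
same = torch.allclose(normalize_with_sync(x), normalize_on_device(x))
```

The second version expresses the zero check as a tensor op, so it has no data-dependent Python control flow.
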
| 52 | + |
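One way to guard a shared file write between ranks, as a rough sketch: `is_rank_0` and `write_text_once` are hypothetical helpers written for this example, not ModelOpt utilities, and the snippet also runs in a single process where `torch.distributed` is not initialized.

```python
import os
import tempfile

import torch.distributed as dist


def is_rank_0() -> bool:
    # Hypothetical helper: treat a non-distributed run as rank 0.
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank() == 0
    return True


def write_text_once(path: str, text: str) -> None:
    # Only rank 0 writes. Writing to a temporary file and renaming it means
    # other ranks never observe a half-written file.
    if is_rank_0():
        tmp_path = f"{path}.tmp"
        with open(tmp_path, "w") as f:
            f.write(text)
        os.replace(tmp_path, path)
    # Every rank waits here, so the file exists before anyone reads it.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()


out_path = os.path.join(tempfile.mkdtemp(), "summary.txt")
write_text_once(out_path, "calibration done\n")
```

Because the barrier is a collective call, every rank has to call `write_text_once`; calling it from rank 0 alone would hang in a distributed run.
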
| 53 | +## Compatibility |
| 54 | + |
| 55 | +- **Preserve config and checkpoint backward compatibility.** ModelOpt checkpoints include serialized |
| 56 | + `ModeloptBaseConfig` instances such as `QuantizeConfig`. If these Pydantic-based configs change |
| 57 | + without backward compatibility handling, older checkpoints may no longer load. Make breaking changes |
| 58 | + explicit and intentional. |
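
As an illustration of backward compatibility handling, the sketch below keeps an older serialized config loadable after a field rename. It assumes Pydantic v2; `ExampleQuantConfig` and the `bits` to `num_bits` rename are hypothetical and do not describe how `ModeloptBaseConfig` or `QuantizeConfig` actually handle migrations.

```python
from pydantic import BaseModel, model_validator


class ExampleQuantConfig(BaseModel):
    # Hypothetical config: the field was called `bits` in older checkpoints.
    num_bits: int = 8

    @model_validator(mode="before")
    @classmethod
    def _migrate_legacy_keys(cls, data):
        # Map the old key to the new one before field validation runs.
        if isinstance(data, dict) and "bits" in data and "num_bits" not in data:
            data = dict(data)  # copy so the caller's dict is not mutated
            data["num_bits"] = data.pop("bits")
        return data


old_style = ExampleQuantConfig.model_validate({"bits": 4})
new_style = ExampleQuantConfig.model_validate({"num_bits": 4})
```

Both payloads load to the same config, and the migration lives in one place so the intent of the rename stays explicit.
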