Add --dry-run to DE CLI, fix docs for v0.7.0 API (#7)

katosh · web-flow · commit 9d1d69101710 · 2026-04-28T16:20:00.000-07:00
* Add smooth command to CLI docs, fix double-backslash rendering

- Add kompot smooth section with usage, options, and examples
- Add smooth to overview, quick start workflow, config templates, help
- Fix all \\ to single \ in code-block directives (RST renders literally)

* Remove redundant Quick Start examples from CLI docs

The per-command basic examples repeated the same info as the workflow
block above and the detailed command sections below.

* Fix Sphinx docs: exclude deprecated APIs, add smooth module, update labels

- Exclude compute_differential_abundance/expression from automodule
- Add kompot.anndata.smooth automodule for smooth_expression (exclude deprecated compute_smoothed_expression)
- Add RunInfo.to_settings() and call_args() to documented members
- Rename "Gene Expression Imputation" to "Gene Expression Smoothing" in toctree

* Add --dry-run to DE CLI with JSON output for pipeline integration

- Add --dry-run flag to kompot de: estimates resource requirements
  without running the analysis
- JSON output to stdout (or -o file), human-readable report to stderr
- Exit code 0 if feasible, 1 if not
- Add ResourcePlan.to_dict() for machine-parseable serialization
- Add configure_logging() to redirect kompot logger stream
- CLI now logs to stderr by default (keeps stdout clean for
  machine-parseable output like --dry-run and --table-output)
- Document --dry-run in CLI docs with pipeline examples
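
The stdout/stderr split described above can be sketched with the standard
library alone. This is a minimal illustration, not kompot's code:
fake_estimate and emit_plan are hypothetical stand-ins, the plan is a plain
dict rather than a ResourcePlan, and contextlib.redirect_stdout is used here
in place of a manual sys.stdout swap.

```python
import contextlib
import io
import json
import sys


def fake_estimate():
    """Hypothetical estimator that prints a report as a side effect."""
    print("Estimated memory: 1.5 GB")  # human-readable report
    return {"memory": {"total_human": "1.5 GB"}, "is_feasible": True}


def emit_plan(estimate, out=None, err=None):
    """Run `estimate`, sending its printed report to `err` and JSON to `out`."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    # Anything the estimator prints is diverted to the error stream,
    # so the output stream carries nothing but the JSON document.
    with contextlib.redirect_stdout(err):
        plan = estimate()
    json.dump(plan, out, indent=2)
    out.write("\n")  # trailing newline
    # Exit-code convention from this PR: 0 if feasible, 1 if not.
    return 0 if plan["is_feasible"] else 1


# Capture both streams in memory to show where each piece lands.
out_buf, err_buf = io.StringIO(), io.StringIO()
exit_code = emit_plan(fake_estimate, out=out_buf, err=err_buf)
```

With the defaults (no out/err arguments) the JSON goes to the real stdout and
the report to the real stderr, which is the split a shell pipeline relies on.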

* Remove -o file option from --dry-run to prevent accidental overwrites

Users would likely reuse the same args as the real run, which would
overwrite their h5ad output with a JSON file. Stdout-only is safer.
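
Because the plan is stdout-only, a pipeline step can read it straight from the
child process. The sketch below shows one way a caller might do that in
Python. run_dry_run is a hypothetical helper, and the child command is a
stand-in that imitates the documented behavior (report on stderr, JSON on
stdout, exit code 1 when infeasible) so the example runs without kompot
installed. The memory.total_human field is taken from the jq example in the
CLI docs; no other fields are assumed.

```python
import json
import subprocess
import sys


def run_dry_run(cmd):
    """Run a dry-run command and return (feasible, plan)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if not proc.stdout.strip():
        # A crash before the plan is printed leaves stdout empty.
        raise RuntimeError(f"no JSON on stdout; stderr was:\n{proc.stderr}")
    plan = json.loads(proc.stdout)
    # Exit code 0 means feasible, 1 means not.
    return proc.returncode == 0, plan


# Stand-in child process. A real call would pass the kompot command line,
# e.g. ["kompot", "de", "input.h5ad", "--dry-run", "--groupby", "cond", ...].
fake_cmd = [
    sys.executable,
    "-c",
    "import json, sys; print('report', file=sys.stderr); "
    "json.dump({'memory': {'total_human': '2 GB'}}, sys.stdout); sys.exit(1)",
]
feasible, plan = run_dry_run(fake_cmd)
```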

* Add Unreleased changelog section for post-v0.7.0 changes

* Fix DE test failures and update CI actions to Node.js 24

- Add dry_run=False to all 8 DE test Namespace objects (run_de now
  accesses args.dry_run from the --dry-run flag added in this PR)
- Update actions/checkout v4→v6 and actions/setup-python v5→v6 to
  resolve Node.js 20 deprecation warnings
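
Why the test Namespace objects needed the extra attribute can be shown with a
small argparse sketch. The parser and outputs_valid below are hypothetical
simplifications of the DE command's output check, not the actual run_de code.

```python
import argparse


def build_parser():
    """Tiny parser with only the three options relevant to the check."""
    parser = argparse.ArgumentParser(prog="demo-de")
    parser.add_argument("-o", "--output")
    parser.add_argument("--table-output")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def outputs_valid(args):
    """An output target is required unless this is a dry run."""
    return bool(args.dry_run or args.output or args.table_output)


# A hand-built Namespace without dry_run raises AttributeError on access,
# which is the failure mode the updated tests avoid.
legacy = argparse.Namespace(output="out.h5ad", table_output=None)
fixed = argparse.Namespace(output="out.h5ad", table_output=None, dry_run=False)

try:
    outputs_valid(legacy)
    legacy_error = None
except AttributeError as exc:
    legacy_error = exc
```

Namespaces produced by the parser itself always carry dry_run, because
store_true defaults to False; only hand-built ones need it added.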

* Add dry-run and configure_logging tests, fix CI coverage reporting

- Add TestCLIDryRun: covers --dry-run JSON output, infeasible exit
  code 1, and output validation skip when dry_run=True
- Add TestConfigureLogging: covers stream redirection and default
- Fix codecov action: file→files (v5 API change)
- Add --cov-report=term-missing:skip-covered to pytest so coverage
  stats appear in CI logs
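
The stream-redirection behavior these tests cover can be illustrated on a
throwaway logger. This sketch follows the handler-stream reassignment approach
of configure_logging but runs against a hypothetical "demo" logger, so it does
not depend on kompot's logging configuration.

```python
import io
import logging
import sys

demo_logger = logging.getLogger("demo")  # hypothetical logger, not kompot's
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep records away from the root logger
demo_logger.addHandler(logging.StreamHandler(sys.stdout))


def configure_demo_logging(stream=None):
    """Point every stream-based handler on demo_logger at `stream`."""
    if stream is None:
        stream = sys.stdout
    for handler in demo_logger.handlers:
        if hasattr(handler, "stream"):
            handler.stream = stream


# Redirect to an in-memory buffer and log one record.
buffer = io.StringIO()
configure_demo_logging(buffer)
demo_logger.info("redirected")

# Calling with no argument restores the default stream.
configure_demo_logging()
```

logging.StreamHandler also provides setStream(), which flushes the old stream
before switching; direct assignment is shown here only to stay close to the
approach in this PR.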

* Fix missing sys import in configure_logging tests
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -14,9 +14,9 @@ jobs:
         python-version: ['3.10', '3.11', '3.12']
 
     steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@v6
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v5
+      uses: actions/setup-python@v6
       with:
         python-version: ${{ matrix.python-version }}
         cache: 'pip'
@@ -35,11 +35,11 @@ jobs:
     
     - name: Test with pytest and generate coverage report
       run: |
-        pytest --cov=kompot --cov-report=xml
+        pytest --cov=kompot --cov-report=xml --cov-report=term-missing:skip-covered
     
     - name: Upload coverage to Codecov
       uses: codecov/codecov-action@v5
       with:
-        file: ./coverage.xml
+        files: ./coverage.xml
         fail_ci_if_error: false
         token: ${{ secrets.CODECOV_TOKEN }}
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,23 @@
 
 All notable changes to this project will be documented in this file.
 
+## [Unreleased]
+
+### New features
+
+ - **`--dry-run` flag for `kompot de` CLI**: estimates memory, disk, and output field requirements without running the analysis. Outputs machine-parseable JSON to stdout and a human-readable report to stderr. Exit code is 0 if the plan is feasible, 1 if not.
+ - **`kompot.configure_logging(stream)`**: reconfigure the kompot logger output stream. The CLI now logs to stderr by default, keeping stdout clean for machine-parseable output (dry-run JSON, table output).
+
+### Improvements
+
+ - **CLI logs to stderr**: all `kompot` CLI commands now write log messages to stderr instead of stdout, so stdout is reserved for data output.
+ - **`kompot smooth` documented in CLI guide**: added full command reference, options, and examples to the Sphinx CLI docs.
+ - Fix double-backslash rendering in all CLI doc code blocks.
+ - Exclude deprecated `compute_differential_*` functions from Sphinx automodule output.
+ - Add the `kompot.anndata.smooth` module (`smooth_expression()`) to Sphinx API docs.
+ - Add `RunInfo.to_settings()` and `call_args()` to documented members.
+ - Fix "Gene Expression Imputation" → "Gene Expression Smoothing" in docs toctree.
+
 ## [0.7.0] - 2026-04-13
 
 ### Breaking changes
diff --git a/docs/source/anndata.rst b/docs/source/anndata.rst
@@ -22,6 +22,7 @@ Differential Abundance
    :members:
    :undoc-members:
    :show-inheritance:
+   :exclude-members: compute_differential_abundance
 
 Differential Expression
 -----------------------
@@ -30,6 +31,16 @@ Differential Expression
    :members:
    :undoc-members:
    :show-inheritance:
+   :exclude-members: compute_differential_expression
+
+Smooth Expression
+-----------------
+
+.. automodule:: kompot.anndata.smooth
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :exclude-members: compute_smoothed_expression
 
 Resource Estimation
 -------------------
@@ -71,7 +82,7 @@ Utilities
 ---------
 
 .. autoclass:: kompot.anndata.utils.RunInfo
-   :members: __init__, get_summary, get_data, compare_with
+   :members: __init__, get_summary, get_data, compare_with, to_settings, call_args
    :show-inheritance:
 
 Cleanup Utilities
diff --git a/docs/source/cli.rst b/docs/source/cli.rst
@@ -40,9 +40,6 @@ All commands support:
 Quick Start
 -----------
 
-Complete Workflow
-^^^^^^^^^^^^^^^^^
-
 .. code-block:: bash
 
    # 1. Compute diffusion maps (preprocessing)
@@ -54,54 +51,18 @@ Complete Workflow
    kompot de input_with_dm.h5ad -o de_results.h5ad \
      --groupby condition \
      --condition1 control \
-     --condition2 treatment \
-     --obsm-key DM_EigenVectors
+     --condition2 treatment
 
    # 3. Run differential abundance
    kompot da input_with_dm.h5ad -o da_results.h5ad \
      --groupby condition \
      --condition1 control \
-     --condition2 treatment \
-     --obsm-key DM_EigenVectors
+     --condition2 treatment
 
    # 4. Smooth gene expression for a single condition
    kompot smooth input_with_dm.h5ad -o smoothed.h5ad \
      --groupby condition \
-     --condition treatment \
-     --obsm-key DM_EigenVectors
-
-Diffusion Maps (Preprocessing)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code-block:: bash
-
-   kompot dm input.h5ad -o output.h5ad \
-     --pca-key X_pca \
-     --n-components 10 \
-     --knn 30
-
-Differential Expression (Basic)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code-block:: bash
-
-   kompot de input.h5ad -o output.h5ad \
-     --groupby condition \
-     --condition1 control \
-     --condition2 treatment \
-     --obsm-key X_pca \
-     --layer logged_counts
-
-Differential Abundance (Basic)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code-block:: bash
-
-   kompot da input.h5ad -o output.h5ad \
-     --groupby condition \
-     --condition1 control \
-     --condition2 treatment \
-     --obsm-key X_pca
+     --condition treatment
 
 Using Config Files
 ^^^^^^^^^^^^^^^^^^
@@ -247,6 +208,7 @@ Boolean Flags
 
 .. code-block:: text
 
+   --dry-run                  # Estimate resources, print plan, exit (no analysis)
    --no-progress             # Disable progress bars
    --store-landmarks           # Store landmarks for reuse
    --store-additional-stats    # Store extra statistics
@@ -287,6 +249,31 @@ Example: Complete Analysis
      --null-genes 2000 \
      --store-additional-stats
 
+Example: Dry Run (Resource Estimation)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Use ``--dry-run`` to check memory and disk requirements before committing
+to a long analysis. JSON is written to stdout; the human-readable report
+goes to stderr. Use the same arguments as the real run (``-o`` is ignored).
+
+.. code-block:: bash
+
+   # Check resources (human report on stderr, JSON on stdout)
+   kompot de bone_marrow.h5ad --dry-run \
+     --groupby Age \
+     --condition1 Young \
+     --condition2 Old
+
+   # Save JSON plan and check feasibility in a pipeline
+   kompot de input.h5ad --dry-run --groupby cond --condition1 A --condition2 B \
+     > plan.json && echo "feasible"
+
+   # Parse specific fields with jq
+   kompot de input.h5ad --dry-run --groupby cond --condition1 A --condition2 B \
+     2>/dev/null | jq '.memory.total_human'
+
+The exit code is ``0`` if the plan is feasible, ``1`` if not.
+
 Differential Abundance Command
 -------------------------------
 
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -26,7 +26,7 @@
    Getting Started <notebooks/01_getting_started.ipynb>
    Advanced Differential Expression <notebooks/02_differential_expression_detailed.ipynb>
    Sample Variance Analysis <notebooks/03_sample_variance.ipynb>
-   Gene Expression Imputation <notebooks/04_expression_model.ipynb>
+   Gene Expression Smoothing <notebooks/04_expression_model.ipynb>
 
 .. |doi| image:: https://zenodo.org/badge/944121568.svg
    :target: https://zenodo.org/badge/latestdoi/944121568
diff --git a/kompot/__init__.py b/kompot/__init__.py
@@ -99,6 +99,23 @@
 logging.config.dictConfig(LOGGING_CONFIG)
 logger = logging.getLogger("kompot")
 
+
+def configure_logging(stream=None):
+    """Reconfigure the kompot logger to write to a different stream.
+
+    Parameters
+    ----------
+    stream : file-like, optional
+        Output stream for log messages. Defaults to ``sys.stdout``.
+        CLI tools typically pass ``sys.stderr`` so stdout stays clean
+        for machine-parseable output.
+    """
+    if stream is None:
+        stream = sys.stdout
+    for handler in logger.handlers:
+        if hasattr(handler, "stream"):
+            handler.stream = stream
+
 __all__ = [
     # Version
     "__version__",
@@ -122,6 +139,8 @@
     "StorageSettings",
     "OutputSettings",
     "ModelSettings",
+    # Configuration
+    "configure_logging",
     # Utility functions
     "compute_mahalanobis_distance",
     "find_landmarks",
diff --git a/kompot/cli/de.py b/kompot/cli/de.py
@@ -1,6 +1,7 @@
 """CLI command for differential expression analysis."""
 
 import argparse
+import json
 import sys
 from pathlib import Path
 import logging
@@ -22,6 +23,19 @@
 logger = logging.getLogger("kompot.cli")
 
 
+def _json_default(obj):
+    """JSON serializer for numpy types."""
+    import numpy as np
+
+    if isinstance(obj, (np.integer,)):
+        return int(obj)
+    if isinstance(obj, (np.floating,)):
+        return float(obj)
+    if isinstance(obj, np.ndarray):
+        return obj.tolist()
+    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
+
+
 def add_de_parser(subparsers) -> argparse.ArgumentParser:
     """
     Add differential expression subcommand parser.
@@ -165,6 +179,14 @@ def add_de_parser(subparsers) -> argparse.ArgumentParser:
         help="Estimate per-gene heteroscedastic noise from squared residuals to deflate significance for high-noise genes",
     )
 
+    # Dry run
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="Estimate resource requirements instead of running the analysis. "
+        "JSON to stdout, human-readable report to stderr. -o/--output is ignored.",
+    )
+
     # Compute configuration
     parser.add_argument(
         "--use-gpu",
@@ -192,8 +214,8 @@ def run_de(args):
     args
         Parsed arguments from argparse
     """
-    # Validate output arguments
-    if not args.output and not args.table_output:
+    # Validate output arguments (not required for dry-run)
+    if not args.dry_run and not args.output and not args.table_output:
         logger.error("Either --output or --table-output must be specified")
         sys.exit(1)
 
@@ -245,6 +267,7 @@ def run_de(args):
             "command",
             "use_gpu",
             "threads",
+            "dry_run",
         ]
     }
 
@@ -355,23 +378,45 @@ def run_de(args):
         output_kwargs["return_full_results"] = True
     output = OutputSettings(**output_kwargs) if output_kwargs else None
 
+    # Build shared call kwargs
+    call_kwargs = dict(
+        groupby=groupby,
+        condition1=condition1,
+        condition2=condition2,
+        obsm_key=obsm_key,
+        layer=layer,
+        sample_col=sample_col,
+        gp=gp,
+        fdr=fdr,
+        filter=filter_settings,
+        storage=storage,
+        output=output,
+        **params,  # remaining params forwarded as function_kwargs
+    )
+
+    # Dry run: estimate resources, output JSON to stdout, report to stderr
+    if args.dry_run:
+        import io
+
+        try:
+            old_stdout = sys.stdout
+            sys.stdout = sys.stderr  # send de()'s print(plan.format_report()) to stderr
+            try:
+                plan = de(adata, dry_run=True, **call_kwargs)
+            finally:
+                sys.stdout = old_stdout
+        except Exception as e:
+            logger.error(f"Dry run failed: {str(e)}")
+            raise
+
+        # Machine-parseable JSON to stdout
+        json.dump(plan.to_dict(), sys.stdout, default=_json_default, indent=2)
+        print(file=sys.stdout)  # trailing newline
+        sys.exit(0 if plan.is_feasible else 1)
+
     # Run analysis
     try:
-        result_dict = de(
-            adata,
-            groupby=groupby,
-            condition1=condition1,
-            condition2=condition2,
-            obsm_key=obsm_key,
-            layer=layer,
-            sample_col=sample_col,
-            gp=gp,
-            fdr=fdr,
-            filter=filter_settings,
-            storage=storage,
-            output=output,
-            **params,  # remaining params forwarded as function_kwargs
-        )
+        result_dict = de(adata, **call_kwargs)
     except Exception as e:
         logger.error(f"Analysis failed: {str(e)}")
         raise
diff --git a/kompot/cli/main.py b/kompot/cli/main.py
@@ -95,8 +95,12 @@ def main():
     # Parse arguments
     args = parser.parse_args()
 
-    # Setup logging
+    # Setup logging — CLI always logs to stderr so stdout stays clean
+    # for machine-parseable output (e.g., --dry-run JSON, --table-output)
     setup_logging(args.verbose)
+    from .. import configure_logging
+
+    configure_logging(stream=sys.stderr)
 
     # If no command provided, print help
     if not args.command:
diff --git a/kompot/resource_estimation.py b/kompot/resource_estimation.py
diff --git a/tests/test_cli_compute_config.py b/tests/test_cli_compute_config.py