HOPSE implementation #338

Merged
gbg141 merged 667 commits into main from hopse
Jun 1, 2026

@gbg141
Collaborator

@gbg141 gbg141 commented May 27, 2026

HOPSE branch merge

This release merges the hopse branch into main. It adds the HOPSE model, several new datasets, a suite of feature/positional/structural encoding transforms, and a number of internal infrastructure improvements. Most user-visible behaviour is backwards compatible; see "Behaviour changes" below for the few exceptions.

Added

  • HOPSE model (Higher-Order Positional and Structural Encodings on combinatorial complexes):
    • topobench.nn.backbones.combinatorial.hopse.HOPSE and HOPSELayer.
    • topobench.nn.encoders.hopse_encoder.HOPSEFeatureEncoder, with optional SimpleAtomEncoder / SimpleBondEncoder for OGB-style molecule inputs.
    • topobench.nn.wrappers.combinatorial.hopse_wrapper.HOPSEWrapper.
    • topobench.nn.readouts.hopse.HOPSEReadout (supports both graph- and node-level tasks).
    • 8 experiment configs under configs/experiment/hopse_*.yaml; several of them still reference Hydra config groups that have not been added yet (tracked via xfail in test/pipeline/test_hopse_pipeline.py).
  • New dataset loaders:
    • topobench.data.loaders.graph.adme_datasets._ADMEDataset + ADMEDatasetLoader (TDC ADME family, both classification and regression splits).
    • topobench.data.loaders.graph.graph_universe_loader.GraphUniverseDatasetLoader (synthetic graphs via the graph_universe package; configs at configs/dataset/graph/graphuniverse_*.yaml).
    • MoleculeDatasetLoader._collapse_qm9_targets, a collapsing path for selecting individual QM9 targets.
  • New encoding / data-manipulation transforms under topobench/transforms/data_manipulations/:
    • Heat-kernel feature/structural encodings (hk_feature_encodings.HKFE, hkdiag_encodings.HKdiagSE),
    • K-hop feature encodings (khop_feature_encodings.KHopFE, precompute_khop_features),
    • Electrostatic positional encodings (electrostatic_encodings.ElectrostaticPE),
    • Refreshed random-walk / Laplacian / PPR encodings,
    • HOPSE pipeline glue: hopse_ps_information.HOPSE_PE_Information, combine_hopse2cell_transform.HOPSE2CellFeatures,
    • GPSE-aware add-on (add_gpse_information),
    • Utility transforms: rename_fields.RenameFields, barycentric_subdivision.
  • 13 new Hydra resolvers in topobench.utils.config_resolvers driving HOPSE configuration: get_routes_from_neighborhoods, get_pse_dimensions, get_fes_dimensions, get_all_encoding_dimensions, check_pses_in_transforms, check_fes_in_transforms, infer_in_khop_feature_dim, infer_in_hasse_graph_agg_dim, infer_list_length, infer_list_length_plus_one, infer_topotune_num_cell_dimensions, set_preserve_edge_attr, get_list_element.
  • Multi-output classification evaluator / loss path (for tasks such as Mantra Betti numbers): TBEvaluator now understands task="multioutput classification" with per-output metric naming (e.g. accuracy-0, f1-1); DatasetLoss accepts MSE on multi-output classification.
  • Tests: end-to-end smoke tests for HOPSE / SANN experiment configs, plus unit tests for the new HOPSE core modules, transforms, dataset loaders, and resolvers.
  • CI gates: lint.yml now runs ruff format --check and a full pre-commit job (numpydoc-validation, end-of-file/trailing-whitespace, etc.); codecov.yml adds an 80% patch-coverage target and a 1% project-coverage threshold; test.yml flips fail_ci_if_error: true on the Codecov upload.
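
To illustrate the per-output metric naming mentioned above (accuracy-0, f1-1, ...), here is a minimal standalone sketch. It is not the TBEvaluator implementation: the function name `per_output_accuracy` and the list-of-lists inputs are assumptions made for illustration, and only accuracy is shown.

```python
def per_output_accuracy(preds, targets):
    """Return {'accuracy-<k>': value} with one entry per output column k.

    Hypothetical helper: `preds` and `targets` are lists of rows, where
    each row holds one predicted / true class label per output.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same number of samples")
    if not preds:
        return {}
    num_outputs = len(preds[0])
    metrics = {}
    for k in range(num_outputs):
        # Count the samples whose k-th output was predicted correctly.
        correct = sum(1 for p, t in zip(preds, targets) if p[k] == t[k])
        metrics[f"accuracy-{k}"] = correct / len(preds)
    return metrics


# Four samples, three outputs each (e.g. three Betti numbers per complex).
preds = [[0, 1, 0], [1, 1, 0], [0, 2, 1], [0, 1, 0]]
targets = [[0, 1, 0], [1, 0, 0], [0, 2, 1], [1, 0, 0]]
metrics = per_output_accuracy(preds, targets)
print(metrics)
```

Each output is scored independently, so a model can be perfect on one output and poor on another without the two being averaged together.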

Changed

  • pyproject.toml: replaced the legacy rdkit-pypi mirror with the canonical rdkit wheel.
  • topobench/run.py:
    • cfg.transforms is now Hydra-instantiated before being passed to the PreProcessor (the preprocessor still accepts the raw config for compatibility).
    • Wandb loggers are populated with a preprocessor_time metric when preprocessing runs.
    • Trainer is now constructed with log_every_n_steps=1 (Lightning requires ≥1; previously 0, which was fragile).
    • New delete_checkpoint_after_test config flag (default off) to clean up checkpoint files at the end of a rerun.
  • topobench/data/preprocessor/preprocessor.py:
    • FileLock around the processed-data directory makes concurrent preprocessing safe.
    • tqdm progress bar over the transform pipeline.
    • New preprocessing_time attribute and propagation of split_idx_list.
    • Each transform may opt into GPU execution via a new per-transform preprocessor_device: cuda kwarg (defaults to CPU; CPU↔GPU ToDevice ops are inserted automatically and the pipeline always returns to CPU before saving).
  • HOPSEFeatureEncoder now uses the non-deprecated norm="batch_norm" argument and casts proj_dropout to float for torch_geometric.nn.models.MLP.
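
The per-transform device opt-in described above can be pictured with the following sketch. It is a simplified model rather than the PreProcessor code: `plan_device_moves`, the `("to_device", ...)` / `("apply", ...)` tuples, and the transform names are all hypothetical, and the data is assumed to start on the CPU.

```python
def plan_device_moves(transforms, default_device="cpu"):
    """Interleave device moves with transform applications.

    `transforms` is a list of (name, kwargs) pairs. A transform opts into
    a device through kwargs["preprocessor_device"]; otherwise it runs on
    `default_device`. The plan always ends with the data back on the CPU.
    """
    plan = []
    current = default_device  # assumption: data starts on the default device
    for name, kwargs in transforms:
        wanted = kwargs.get("preprocessor_device", default_device)
        if wanted != current:
            # Insert a move only when the device actually changes.
            plan.append(("to_device", wanted))
            current = wanted
        plan.append(("apply", name))
    if current != "cpu":
        # Return to the CPU before the processed data would be saved.
        plan.append(("to_device", "cpu"))
    return plan


pipeline = [
    ("lifting", {}),
    ("encodings", {"preprocessor_device": "cuda"}),
    ("rename_fields", {}),
    ("khop", {"preprocessor_device": "cuda"}),
]
plan = plan_device_moves(pipeline)
for step in plan:
    print(step)
```

In this model, consecutive transforms that share a device incur no extra move, and the final step returns the data to the CPU whenever the last transform ran elsewhere.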

Behaviour changes (non-backwards-compatible)

  • PreProcessor.processed_dir no longer appends /processed when transforms are applied; the property simply returns self.root, which already encodes the full preprocessing-specific path (<data_dir>/<repo_name>/<params_hash>). Cached datasets generated by main will therefore not be re-used – delete the old preprocessed cache after upgrade.
  • CellCycleLifting now sorts the cycles it produces deterministically. The numerical incidence_2 values are unchanged but the column ordering may differ from main-branch outputs (column-order-invariant comparison is required if you cached fixtures).
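
For cached fixtures affected by the CellCycleLifting change, a column-order-invariant comparison could look like the sketch below. It assumes the incidence matrix has been converted to a dense list of rows; `same_columns` is a hypothetical helper, not part of topobench.

```python
from collections import Counter


def same_columns(a, b):
    """Return True if dense matrices a and b have the same columns up to order.

    Both matrices are lists of rows. Columns are compared as a multiset,
    so repeated columns must appear the same number of times in each.
    """
    if len(a) != len(b):
        return False
    # zip(*m) transposes the matrix, yielding one tuple per column.
    return Counter(zip(*a)) == Counter(zip(*b))


# Same two cycles listed in a different order.
old = [[1, 0], [1, 1], [0, 1]]
new = [[0, 1], [1, 1], [1, 0]]
# A genuinely different second column.
changed = [[1, 0], [1, 1], [1, 1]]

print(same_columns(old, new))
print(same_columns(old, changed))
```

Comparing columns as a multiset avoids having to pick a canonical column order in the test itself.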

Dependencies

  • Added: yacs==0.1.8, PyTDC==1.1.15, rdkit (replaces rdkit-pypi), setuptools>=69,<82 (PyTDC still imports pkg_resources, which setuptools>=82 removed), filelock.
  • Python: still >=3.11, <3.12.


@codecov

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 94.51039% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.06%. Comparing base (5bbcb2b) to head (76c82e2).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...nsforms/data_manipulations/add_gpse_information.py | 84.28% | 22 Missing ⚠️ |
| topobench/data/preprocessor/preprocessor.py | 89.18% | 4 Missing ⚠️ |
| topobench/data/loaders/graph/molecule_datasets.py | 81.25% | 3 Missing ⚠️ |
| topobench/data/utils/utils.py | 71.42% | 2 Missing ⚠️ |
| topobench/nn/backbones/simplicial/sccnn.py | 0.00% | 2 Missing ⚠️ |
| ...orms/data_manipulations/barycentric_subdivision.py | 96.15% | 2 Missing ⚠️ |
| ...h/data/loaders/simplicial/mantra_dataset_loader.py | 87.50% | 1 Missing ⚠️ |
| topobench/data/utils/io_utils.py | 85.71% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #338      +/-   ##
==========================================
- Coverage   94.18%   94.06%   -0.12%     
==========================================
  Files         185      207      +22     
  Lines        6690     8730    +2040     
==========================================
+ Hits         6301     8212    +1911     
- Misses        389      518     +129     
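
The percentages in the diff above follow from the hit and line counts. The check below reproduces them, assuming the report truncates (rather than rounds) to two decimals; that assumption is inferred from the numbers and is not taken from Codecov documentation.

```python
def coverage_pct(hits, lines):
    """Coverage percentage truncated to two decimals via integer arithmetic."""
    return (10000 * hits // lines) / 100


base = coverage_pct(6301, 6690)  # main
head = coverage_pct(8212, 8730)  # this PR
delta = round(head - base, 2)

# The per-file "Missing" counts from the table should add up to the
# 37 uncovered patch lines quoted in the report.
missing_per_file = [22, 4, 3, 2, 2, 2, 1, 1]

print(base, head, delta, sum(missing_per_file))
```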

Coerulatus and others added 18 commits May 27, 2026 14:04
…ectDestinationFEs

- Add TestSelectDestinationFEs to test_CombinedFEs.py (forward, __call__,
  missing-key error, None x)
- Add test_PrecomputeKHopFeatures.py covering repr, max_hop decrement,
  dim-1 and dim-2 forward paths, use_initial_features, and output type
- Add TestTBEvaluatorMultioutput to test_evaluator.py: init, update/compute,
  perfect predictions, reset
- Add TestDatasetLossMultioutput to test_dataset_loss.py: init, repr,
  forward_criterion, forward with model_out dict
- Extend test_adme_loader.py with mocked TDC/smiles2graph tests covering
  classification labels, regression labels, node/edge features, and
  split-index partitioning

Co-authored-by: Cursor <cursoragent@cursor.com>
…dices

- Add test_DifferentFeatureTransforms.py covering DifferentGausFeatures,
  DifferentGausFeaturesSANN, and DifferentZeroFeaturesSANN (__init__,
  __repr__, forward, __call__, shape checks)
- Add test_KeepSelectedTargetIndices.py covering __init__, __repr__,
  forward column selection and __call__
- Extend test_CombinedPSEs.py with TestCombinedPSEsEdgeCases covering
  unsupported-encoding ValueError, CUDA fallback to CPU, and
  preprocessor_device override
- Fix __repr__ bug in DifferentGausFeaturesSANN and DifferentZeroFeaturesSANN:
  list() cannot wrap an int; use self.dimensions directly

Co-authored-by: Cursor <cursoragent@cursor.com>
@gbg141 gbg141 merged commit 21650bd into main Jun 1, 2026
6 checks passed
9 participants