HOPSE implementation by gbg141 · Pull Request #338 · geometric-intelligence/topobench

gbg141 · 2026-05-27T01:20:45Z

HOPSE branch merge

This release merges the hopse branch into main. It adds the HOPSE model, several new datasets, a suite of feature/positional/structural encoding transforms, and a number of internal infrastructure improvements. Most user-visible behaviour is backwards compatible; see "Behaviour changes" below for the few exceptions.

Added

HOPSE model (Higher-Order Positional and Structural Encodings on combinatorial complexes):
- topobench.nn.backbones.combinatorial.hopse.HOPSE and HOPSELayer.
- topobench.nn.encoders.hopse_encoder.HOPSEFeatureEncoder, with optional SimpleAtomEncoder / SimpleBondEncoder for OGB-style molecule inputs.
- topobench.nn.wrappers.combinatorial.hopse_wrapper.HOPSEWrapper.
- topobench.nn.readouts.hopse.HOPSEReadout (supports both graph- and node-level tasks).
- 8 ready-to-run experiment configs under configs/experiment/hopse_*.yaml; several of them reference Hydra config groups that still need to be added (tracked via xfail in test/pipeline/test_hopse_pipeline.py).
New dataset loaders:
- topobench.data.loaders.graph.adme_datasets._ADMEDataset + ADMEDatasetLoader (TDC ADME family, both classification and regression splits).
- topobench.data.loaders.graph.graph_universe_loader.GraphUniverseDatasetLoader (synthetic graphs via the graph_universe package; configs at configs/dataset/graph/graphuniverse_*.yaml).
- MoleculeDatasetLoader._collapse_qm9_targets: a target-collapsing path for selecting individual QM9 targets.
New encoding / data-manipulation transforms under topobench/transforms/data_manipulations/:
- Heat-kernel feature/structural encodings (hk_feature_encodings.HKFE, hkdiag_encodings.HKdiagSE)
- K-hop feature encodings (khop_feature_encodings.KHopFE, precompute_khop_features)
- Electrostatic positional encodings (electrostatic_encodings.ElectrostaticPE)
- Refreshed random-walk / Laplacian / PPR encodings
- HOPSE pipeline glue: hopse_ps_information.HOPSE_PE_Information, combine_hopse2cell_transform.HOPSE2CellFeatures
- GPSE-aware add-on (add_gpse_information)
- Utility transforms: rename_fields.RenameFields, barycentric_subdivision
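The heat-kernel and k-hop encodings listed above build on well-known constructions. The sketch below is a minimal, dependency-free illustration that assumes the common textbook definitions. The heat-kernel diagonal structural encoding takes diag(exp(-tL)) of the combinatorial Laplacian L = D - A for a list of diffusion times t. The k-hop feature encoding stacks A^k X for k = 1..K with plain sum aggregation. The function names (heat_kernel_diag_se, khop_features) and the absence of any normalisation are assumptions; this is not the code of HKdiagSE or KHopFE.

```python
"""Sketch of heat-kernel diagonal SEs and k-hop feature encodings (pure Python)."""
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def laplacian(adj: Matrix) -> Matrix:
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def expm(m: Matrix, terms: int = 20) -> Matrix:
    """Matrix exponential via scaling-and-squaring plus a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)  # infinity norm
    squarings = 0
    while norm > 0.5:  # scale down so the Taylor series converges quickly
        norm /= 2.0
        squarings += 1
    scaled = [[x / 2.0**squarings for x in row] for row in m]
    result, term = identity(n), identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):  # undo the scaling: exp(M) = exp(M / 2^s)^(2^s)
        result = matmul(result, result)
    return result


def heat_kernel_diag_se(adj: Matrix, times: List[float]) -> Matrix:
    """Per-node encoding [exp(-t1 L)_ii, exp(-t2 L)_ii, ...]."""
    lap = laplacian(adj)
    per_time = []
    for t in times:
        kernel = expm([[-t * x for x in row] for row in lap])
        per_time.append([kernel[i][i] for i in range(len(adj))])
    return [list(node) for node in zip(*per_time)]  # node-major layout


def khop_features(adj: Matrix, x: Matrix, max_hop: int) -> List[Matrix]:
    """Return [A X, A^2 X, ..., A^K X] using sum aggregation."""
    hops, current = [], x
    for _ in range(max_hop):
        current = matmul(adj, current)
        hops.append(current)
    return hops
```

As a sanity check, a single edge 0-1 has Laplacian eigenvalues 0 and 2, so every diagonal entry of exp(-tL) equals (1 + exp(-2t)) / 2. At t = 0 the kernel is the identity.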
New Hydra resolvers in topobench.utils.config_resolvers driving HOPSE configuration: get_routes_from_neighborhoods, get_pse_dimensions, get_fes_dimensions, get_all_encoding_dimensions, check_pses_in_transforms, check_fes_in_transforms, infer_in_khop_feature_dim, infer_in_hasse_graph_agg_dim, infer_list_length, infer_list_length_plus_one, infer_topotune_num_cell_dimensions, set_preserve_edge_attr, get_list_element.
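To show how list-based resolvers like these plug into Hydra/OmegaConf, here is a hypothetical sketch. The function bodies are assumptions inferred only from the resolver names. OmegaConf.register_new_resolver is the real registration API, and its import is deferred so the helpers can be exercised without omegaconf installed.

```python
from typing import Any, Callable, Dict, Sequence


def infer_list_length(values: Sequence[Any]) -> int:
    """Number of entries in a config list (hypothetical body)."""
    return len(values)


def infer_list_length_plus_one(values: Sequence[Any]) -> int:
    """List length plus one (hypothetical body)."""
    return len(values) + 1


def get_list_element(values: Sequence[Any], index: int) -> Any:
    """Element of a config list by position (hypothetical body)."""
    return values[index]


RESOLVERS: Dict[str, Callable[..., Any]] = {
    "infer_list_length": infer_list_length,
    "infer_list_length_plus_one": infer_list_length_plus_one,
    "get_list_element": get_list_element,
}


def register_resolvers() -> None:
    """Register the sketch resolvers with OmegaConf (needs the omegaconf package)."""
    from omegaconf import OmegaConf  # deferred so the helpers work without omegaconf

    for name, fn in RESOLVERS.items():
        OmegaConf.register_new_resolver(name, fn, replace=True)
```

After registration, a config could write something like num_layers: ${infer_list_length:${model.hidden_dims}} (the key names here are hypothetical). The interpolated ListConfig supports len() and indexing, so the same bodies work on it unchanged.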
Multi-output classification evaluator / loss path (for tasks such as Mantra Betti numbers): TBEvaluator now understands task="multioutput classification" with per-output metric naming (e.g. accuracy-0, f1-1); DatasetLoss accepts MSE on multi-output classification.
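The per-output metric naming can be illustrated in plain Python. This is a hypothetical sketch of the accuracy-0, accuracy-1, ... scheme only. It assumes integer class predictions and targets laid out as [num_samples, num_outputs], for example one column per Betti number, and it is not the TBEvaluator implementation.

```python
from typing import Dict, Sequence


def per_output_accuracy(
    preds: Sequence[Sequence[int]], targets: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Accuracy computed independently for each output column, keyed 'accuracy-<i>'."""
    if not targets or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and have the same length")
    num_outputs = len(targets[0])
    scores: Dict[str, float] = {}
    for i in range(num_outputs):
        correct = sum(p[i] == t[i] for p, t in zip(preds, targets))
        scores[f"accuracy-{i}"] = correct / len(targets)
    return scores
```

Scoring each column separately keeps a model that gets one output right and another wrong from being reported as a single blended number.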
Tests: end-to-end smoke tests for HOPSE / SANN experiment configs, plus unit tests for the new HOPSE core modules, transforms, dataset loaders, and resolvers.
CI gates: lint.yml now runs ruff format --check and a full pre-commit job (numpydoc-validation, end-of-file/trailing-whitespace, etc.); codecov.yml adds an 80% patch-coverage target and a 1% project-coverage threshold; test.yml flips fail_ci_if_error: true on the Codecov upload.

Changed

pyproject.toml: replaced the legacy rdkit-pypi mirror with the canonical rdkit wheel.
topobench/run.py:
- cfg.transforms is now Hydra-instantiated before being passed to the PreProcessor (the preprocessor still accepts the raw config for compatibility).
- Wandb loggers are populated with a preprocessor_time metric when preprocessing runs.
- Trainer is now constructed with log_every_n_steps=1 (Lightning requires ≥1; previously 0, which was fragile).
- New delete_checkpoint_after_test config flag (default off) to clean up checkpoint files at the end of a rerun.
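A minimal sketch of what the delete_checkpoint_after_test flag amounts to. It assumes the path of the checkpoint to remove is already known once testing finishes, for instance from a checkpoint callback. The helper name maybe_delete_checkpoint is hypothetical and not part of run.py.

```python
from pathlib import Path
from typing import Optional


def maybe_delete_checkpoint(
    ckpt_path: Optional[str], delete_checkpoint_after_test: bool = False
) -> bool:
    """Remove the checkpoint file if the flag is set; return True if a file was deleted."""
    if not delete_checkpoint_after_test or not ckpt_path:
        return False  # default off: keep checkpoints
    path = Path(ckpt_path)
    if path.is_file():
        path.unlink()
        return True
    return False  # nothing to delete (already removed or never written)
```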
topobench/data/preprocessor/preprocessor.py:
- FileLock around the processed-data directory makes concurrent preprocessing safe.
- tqdm progress bar over the transform pipeline.
- New preprocessing_time attribute and propagation of split_idx_list.
- Each transform may opt into GPU execution via a new per-transform preprocessor_device: cuda kwarg (defaults to CPU; CPU↔GPU ToDevice ops are inserted automatically and the pipeline always returns to CPU before saving).
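The automatic device placement can be pictured as a planning pass over the transform list. The sketch below is hypothetical: TransformSpec, plan_pipeline and the ("ToDevice", device) tuples are illustrative stand-ins, not PreProcessor internals. It assumes each transform carries an optional preprocessor_device kwarg defaulting to "cpu". A move is inserted whenever the requested device changes, and a final move back to CPU is appended so the result can be saved.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class TransformSpec:
    """Hypothetical description of one configured transform."""

    name: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


def plan_pipeline(transforms: List[TransformSpec]) -> List[Tuple[str, str]]:
    """Ordered (step, device) pairs with ToDevice moves inserted where needed."""
    plan: List[Tuple[str, str]] = []
    current = "cpu"  # data starts on CPU after loading
    for spec in transforms:
        device = spec.kwargs.get("preprocessor_device", "cpu")
        if device != current:
            plan.append(("ToDevice", device))
            current = device
        plan.append((spec.name, device))
    if current != "cpu":  # always finish on CPU before saving to disk
        plan.append(("ToDevice", "cpu"))
    return plan
```

Consecutive GPU transforms share a single transfer, so the data does not bounce between devices on every step.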
HOPSEFeatureEncoder now uses the non-deprecated norm="batch_norm" argument and casts proj_dropout to float for torch_geometric.nn.models.MLP.

Behaviour changes (non-backwards-compatible)

PreProcessor.processed_dir no longer appends /processed when transforms are applied; the property simply returns self.root, which already encodes the full preprocessing-specific path (<data_dir>/<repo_name>/<params_hash>). Cached datasets generated by main will therefore not be re-used – delete the old preprocessed cache after upgrade.
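The effect on the cache layout can be sketched as follows. The hashing scheme (a truncated SHA-256 of the key-sorted JSON transform config) and the helper names are assumptions for illustration. Only the <data_dir>/<repo_name>/<params_hash> layout and the dropped /processed suffix come from the description above.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict


def params_hash(transform_params: Dict[str, Any]) -> str:
    """Stable short hash of a transform config (hypothetical scheme)."""
    blob = json.dumps(transform_params, sort_keys=True, default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]


def preprocessed_root(data_dir: str, repo_name: str, transform_params: Dict[str, Any]) -> Path:
    """<data_dir>/<repo_name>/<params_hash>"""
    return Path(data_dir) / repo_name / params_hash(transform_params)


def legacy_processed_dir(root: Path) -> Path:
    return root / "processed"  # old behaviour when transforms were applied


def processed_dir(root: Path) -> Path:
    return root  # new behaviour: the root already identifies the preprocessing
```

A dataset cached by main sits one level deeper, under .../processed, so the new lookup never finds it. This is why the old cache has to be deleted.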
CellCycleLifting now sorts the cycles it produces deterministically. The numerical incidence_2 values are unchanged but the column ordering may differ from main-branch outputs (column-order-invariant comparison is required if you cached fixtures).
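If cached fixtures store a dense incidence_2 matrix, a column-order-invariant check can be as simple as sorting the columns before comparing. This is a hypothetical test helper. For a sparse torch tensor, one would first convert it with .to_dense().tolist().

```python
from typing import List, Sequence, Tuple


def canonical_columns(matrix: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Columns as tuples, sorted so that their original order is irrelevant."""
    return sorted(zip(*matrix))


def same_up_to_column_order(
    a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]
) -> bool:
    """True if b is a column permutation of a (row order must match)."""
    return len(a) == len(b) and canonical_columns(a) == canonical_columns(b)
```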

Dependencies

Added: yacs==0.1.8, PyTDC==1.1.15, rdkit (replaces rdkit-pypi), setuptools>=69,<82 (PyTDC still imports pkg_resources which setuptools>=82 removed), filelock.
Python: still >=3.11, <3.12.

review-notebook-app · 2026-05-27T01:20:53Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

codecov · 2026-05-27T19:20:52Z

Codecov Report

❌ Patch coverage is 94.51039% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.06%. Comparing base (5bbcb2b) to head (76c82e2).

Files with missing lines	Patch %	Lines
...nsforms/data_manipulations/add_gpse_information.py	84.28%	22 Missing ⚠️
topobench/data/preprocessor/preprocessor.py	89.18%	4 Missing ⚠️
topobench/data/loaders/graph/molecule_datasets.py	81.25%	3 Missing ⚠️
topobench/data/utils/utils.py	71.42%	2 Missing ⚠️
topobench/nn/backbones/simplicial/sccnn.py	0.00%	2 Missing ⚠️
...orms/data_manipulations/barycentric_subdivision.py	96.15%	2 Missing ⚠️
...h/data/loaders/simplicial/mantra_dataset_loader.py	87.50%	1 Missing ⚠️
topobench/data/utils/io_utils.py	85.71%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #338      +/-   ##
==========================================
- Coverage   94.18%   94.06%   -0.12%     
==========================================
  Files         185      207      +22     
  Lines        6690     8730    +2040     
==========================================
+ Hits         6301     8212    +1911     
- Misses        389      518     +129

…ectDestinationFEs
- Add TestSelectDestinationFEs to test_CombinedFEs.py (forward, __call__, missing-key error, None x)
- Add test_PrecomputeKHopFeatures.py covering repr, max_hop decrement, dim-1 and dim-2 forward paths, use_initial_features, and output type
- Add TestTBEvaluatorMultioutput to test_evaluator.py: init, update/compute, perfect predictions, reset
- Add TestDatasetLossMultioutput to test_dataset_loss.py: init, repr, forward_criterion, forward with model_out dict
- Extend test_adme_loader.py with mocked TDC/smiles2graph tests covering classification labels, regression labels, node/edge features, and split-index partitioning

Co-authored-by: Cursor <cursoragent@cursor.com>

…dices
- Add test_DifferentFeatureTransforms.py covering DifferentGausFeatures, DifferentGausFeaturesSANN, and DifferentZeroFeaturesSANN (__init__, __repr__, forward, __call__, shape checks)
- Add test_KeepSelectedTargetIndices.py covering __init__, __repr__, forward column selection and __call__
- Extend test_CombinedPSEs.py with TestCombinedPSEsEdgeCases covering unsupported-encoding ValueError, CUDA fallback to CPU, and preprocessor_device override
- Fix __repr__ bug in DifferentGausFeaturesSANN and DifferentZeroFeaturesSANN: list() cannot wrap an int; use self.dimensions directly

Co-authored-by: Cursor <cursoragent@cursor.com>
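The __repr__ fix in this commit is easy to reproduce in isolation. The class below is a hypothetical stand-in, not the real DifferentGausFeaturesSANN or DifferentZeroFeaturesSANN, and it assumes dimensions is stored as a plain int.

```python
class FeatureTransformSketch:
    """Hypothetical stand-in; only the __repr__ logic mirrors the fix."""

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions

    def buggy_repr(self) -> str:
        # list() needs an iterable, so list(3) raises TypeError
        return f"{type(self).__name__}(dimensions={list(self.dimensions)})"

    def __repr__(self) -> str:
        # Fixed: format the int attribute directly
        return f"{type(self).__name__}(dimensions={self.dimensions})"
```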

martin-carrasco and others added 30 commits July 26, 2025 23:57

FIX: activation before LN

de96c2f

FIX: activation before LN

352565b

mantra scripts

0e0aae9

Merge branch 'gpse' of https://github.com/geometric-intelligence/TopoBenchmark into gpse

b2e4272

scripts

1790404

Fix bugs

fd7a6c8

Merge branch 'gpse' of https://github.com/geometric-intelligence/TopoBench into gpse

664a745

Update env_setup.sh

99a6a0c

ADD: All the processing files

7516876

FIX: Resort to previous encoder

0b99aeb

ADD: Scripts for ablation on HOPSE_M PEs

0fe9713

FIX: Error with script naming

8d4a29d

run_cell

4c5d9d4

Merge branch 'gpse' of https://github.com/geometric-intelligence/TopoBenchmark into gpse

90ef85d

ADD: Kfold verified

c9a0b18

ADD: processing

34ff943

fix

eab8ce3

Merge branch 'gpse' of https://github.com/geometric-intelligence/TopoBenchmark into gpse

3665cb8

simplicial

22cabd6

Adapt scripts to Bobby

92c284f

Ablation MANTRA

cca4497

updated scripts

ffa1a56

FIX: bug with LapPE

b060fe9

Merge branch 'gpse' of https://github.com/geometric-intelligence/TopoBenchmark into gpse

b9185aa

FIX: hopse one-off index and molhiv

a510da7

FIX: test simplicial

8fb2a4f

ADD: Scripts for molhiv

2f8ee00

FIX: GIN MOLHIV

9d01488

updated scripts

f92227a

Merge branch 'gpse' of https://github.com/geometric-intelligence/TopoBenchmark into gpse

f4720d0

LouisVanLangendonck and others added 6 commits May 6, 2026 13:50

Update re-run to include all

565b399

Merge branch 'hopse_sparse' of https://github.com/geometric-intelligence/TopoBench into hopse_sparse

a5b1112

Merge branch 'hopse_sparse' of https://github.com/geometric-intelligence/TopoBench into hopse

2c67bcd

Merge branch 'main' of https://github.com/geometric-intelligence/TopoBench into hopse

fa715b0

Fix test_preprocesor

e38c614

Add tests and polish code

b142324

Coerulatus added 3 commits May 27, 2026 11:20

ruff formatted with updated lib

b0aaa95

fixed test_pipeline and test_cycle_lifting, removed temporary tutorials

f051bb5

readded ruff ignore up038

b8c0dc3

Coerulatus and others added 18 commits May 27, 2026 14:04

added testing debug=True

3a338de

gpu compatibility and removed test warning (too much memory)

4a731c9

test gpseinformation

f3bc1a5

coverage

88cc4d1

check transforms type

5a5ee73

removed deprecated TBModelT

4fa9f90

removed unused code

f516edd

coverage

5cdd2b3

linting

9c9c57b

default values for num_classes is now 1

0c00d80

Refactor scripts folder

d709f7c

Cleaning

76b76c0

Reproducibility

dd1dbae

Merge branch 'main' into hopse

e9e7612

Remove hardcoded wandb entity in scripts

f79381b

Merge branch 'hopse' of https://github.com/geometric-intelligence/TopoBench into hopse

76c82e2

gbg141 merged commit 21650bd into main Jun 1, 2026
6 checks passed

gbg141 merged 667 commits into main from hopse

9 participants
