
Commit 86f11a5

README.md:
- Remove duplicate [Research & Improvement Roadmap] link from the Theory & Interpretability section; the single canonical reference is in the Development Roadmap section.
- Fold the standalone "Future Directions & Clinical Collaborations" section into the Development Roadmap's Longer-term bullet list and attach the collaborator callout there, eliminating the separate section and the content overlap.

docs/SC_BEST_PRACTICES.md:
- Simplify the two-point numbered intro to a single sentence.
- Collapse six functional ### category headers into three groups (Vocabulary & Preprocessing / Training & Supervision / Evaluation, Scale & Tooling) so items of related priority sit together instead of being scattered across thin categories.
- Remove the #### "Architectural direction" sub-sub-header; promote it to a bold inline lead paragraph so it reads as context rather than a nested section.
- Remove the #### "Coverage trade-offs" sub-sub-header; replace it with a plain lead sentence before the table.
- Update items 7–14 and the quick-reference table to reflect the decoupleR + PROGENy reframing agreed in the March 2026 research session (pathway activity as primary task, gene expression as secondary head).

Signed-off-by: BenjaminIsaac0111 <12176376+BenjaminIsaac0111@users.noreply.github.com>
1 parent 472bd5f commit 86f11a5

2 files changed

Lines changed: 383 additions & 39 deletions

README.md

Lines changed: 22 additions & 4 deletions
@@ -113,7 +113,6 @@ Visualization plots and spatial expression maps will be saved to the `./results`
113113
- **[Pathway Mapping](docs/PATHWAY_MAPPING.md)**: Clinical interpretability, pathway bottleneck design, and MSigDB integration.
114114
- **[Gene Analysis](docs/GENE_ANALYSIS.md)**: Modeling strategies for mapping morphology to high-dimensional gene spaces.
115115
- **[Data Structure](docs/DATA_STRUCTURE.md)**: Detailed breakdown of the HEST data structure on disk, metadata conventions, and preprocessing invariants.
116-
- **[Single-cell Best Practices](docs/SC_BEST_PRACTICES.md)**: Gap analysis and roadmap for alignment with standard recommendations.
117116

118117
## Development
119118

@@ -124,12 +123,31 @@ Visualization plots and spatial expression maps will be saved to the `./results`
124123
.\test.ps1
125124
```
126125

127-
## Future Directions & Clinical Collaborations
126+
## Development Roadmap
128127

129-
A major future direction for **SpatialTranscriptFormer** is to integrate this architecture into an **end-to-end pipeline for patient risk assessment** and prognosis tracking. By leveraging the model's predicted expression and pathway activations, we aim to build a downstream risk prediction module that allows users to directly evaluate how spatially-resolved expression relates to patient survival.
128+
Active research and development is tracked in the **[Research & Improvement Roadmap](docs/SC_BEST_PRACTICES.md)**. Key directions are summarised below.
129+
130+
### Near-term
131+
132+
- **Vocabulary quality** — mitochondrial gene filtering (`MT-*` exclusion) and a rebuild of the gene vocabulary using SVG-weighted ranking (Moran's I), ensuring training targets are spatially informative rather than dominated by housekeeping genes.
133+
- **Moran's I weighted loss** — weight each gene's contribution to the training loss by its spatial variability score, so that the gradient is driven by spatially coherent genes rather than high-expression noise.
134+
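As a rough illustration of the two Near-term items above (this is not code from the commit), the sketch below drops `MT-*` genes, scores each remaining gene with global Moran's I, and uses the clipped scores as per-gene loss weights. The function names, the dense spatial weights matrix, and the choice to clip negative scores to zero and normalise the weights to mean 1 are all assumptions made for the example; the repository may compute and apply the weights differently.

```python
import numpy as np


def drop_mitochondrial(genes):
    """Return the gene names that do not match the MT-* pattern."""
    return [g for g in genes if not g.upper().startswith("MT-")]


def morans_i(values, weights):
    """Global Moran's I for one gene.

    values:  (n_spots,) expression vector.
    weights: (n_spots, n_spots) spatial weights with a zero diagonal
             (hypothetical input, e.g. a neighbour adjacency matrix).
    """
    z = values - values.mean()
    denom = (z ** 2).sum()
    total_w = weights.sum()
    if denom == 0 or total_w == 0:
        return 0.0  # constant gene or no neighbours: no spatial signal
    n = values.shape[0]
    return float(n / total_w * (z @ weights @ z) / denom)


def gene_weights(expr, weights):
    """Per-gene loss weights from Moran's I.

    expr: (n_spots, n_genes). Negative scores are clipped to zero and the
    result is rescaled to mean 1 (an assumption of this sketch).
    """
    scores = np.array(
        [morans_i(expr[:, g], weights) for g in range(expr.shape[1])]
    )
    scores = np.clip(scores, 0.0, None)
    total = scores.sum()
    if total == 0:
        return np.ones_like(scores)
    return scores * len(scores) / total


def weighted_mse(pred, target, w):
    """Mean squared error per gene, averaged with weights w."""
    per_gene = ((pred - target) ** 2).mean(axis=0)
    return float((w * per_gene).sum() / w.sum())
```

On a four-spot chain, a gene with values `[0, 0, 1, 1]` receives a positive score and an alternating gene `[0, 1, 0, 1]` a negative one, so after clipping only the spatially coherent gene contributes to the loss.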
135+
### Medium-term: Architectural Reframing
136+
137+
The current model predicts ~1000 individual gene expression values as its primary task, with pathway activity as a secondary interpretability output. Based on a review of the ST literature and the [Saezlab ecosystem](https://saezlab.org) (PROGENy, decoupleR, LIANA+), we are shifting toward:
138+
139+
- **Pathway activity as the primary prediction target.** Spatial pathway activity maps pre-computed offline via [decoupleR](https://decoupler-py.readthedocs.io) + [PROGENy](https://saezlab.github.io/progeny/) are spatially cleaner, clinically interpretable, and directly supervised — avoiding the circular regularisation issue of the current `AuxiliaryPathwayLoss`.
140+
- **Gene expression as a secondary imputation head**, weighted by Moran's I.
141+
- **Pluggable prior knowledge.** The offline preprocessing step accepts any biological network (PROGENy signalling pathways, MSigDB Hallmarks, LIANA+ ligand-receptor pairs, CollecTRI TF regulons) without changing the model architecture.
142+
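To make the "pluggable prior knowledge" idea in the Medium-term items above concrete, here is a minimal sketch of an offline scoring step that turns any long-format network of `(pathway, gene, weight)` triples into per-spot activity targets with a plain weighted sum. It deliberately does not call decoupleR, and the function names, gene and pathway labels, and weights are hypothetical; decoupleR's own estimators (for example its linear-model methods) would give different, better-calibrated scores.

```python
import numpy as np


def build_weight_matrix(network, genes):
    """Convert (pathway, gene, weight) triples into a genes x pathways matrix.

    Genes in the network that are missing from the vocabulary are skipped,
    so the same model vocabulary can be paired with different networks.
    """
    network = list(network)  # allow generators; we iterate twice
    pathways = sorted({p for p, _, _ in network})
    gene_idx = {g: i for i, g in enumerate(genes)}
    path_idx = {p: j for j, p in enumerate(pathways)}
    mat = np.zeros((len(genes), len(pathways)))
    for p, g, w in network:
        if g in gene_idx:
            mat[gene_idx[g], path_idx[p]] = w
    return pathways, mat


def pathway_activity(expr, mat):
    """Weighted-sum activity: (n_spots, n_genes) @ (n_genes, n_pathways)."""
    return expr @ mat


# Toy example with made-up labels and weights.
genes = ["A", "B", "C"]
network = [
    ("P1", "A", 1.0),
    ("P1", "B", -1.0),
    ("P2", "C", 2.0),
    ("P2", "D", 5.0),  # gene "D" is not in the vocabulary and is ignored
]
expr = np.array([[1.0, 2.0, 3.0],
                 [0.0, 1.0, 0.0]])
pathways, mat = build_weight_matrix(network, genes)
activity = pathway_activity(expr, mat)  # shape (2 spots, 2 pathways)
```

Because only the triples change between PROGENy, MSigDB Hallmarks, or a regulon collection, the model's output head just needs one unit per column of `activity`.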
143+
### Longer-term
144+
145+
- Evaluation on the 2025 Nat. Comms. benchmark suite (11 methods, 28 metrics, 5 datasets).
146+
- Support for higher-resolution platforms (Visium HD, Xenium) — architecturally trivial, blocked only by data availability.
147+
- **Clinical integration** — using predicted spatial pathway activation maps as features for patient risk assessment and prognosis tracking in an end-to-end pipeline.
130148

131149
> [!NOTE]
132-
> **Call for Collaborators:** Rigorous risk assessment models require vast datasets of clinical metadata and survival outcomes, which we currently lack access to. We are open to investigating *any* disease of interest! If you have access to large clinical cohorts and are interested in exploring how spatial pathway activation correlates with patient prognosis, we would love to partner with you.
150+
> **Call for Collaborators:** Rigorous risk assessment models require large clinical cohorts with spatial transcriptomics and survival outcomes, which we currently lack access to. We are open to investigating *any* disease of interest. If you have access to such cohorts and are interested in exploring how spatially-resolved pathway activation correlates with patient prognosis, we would love to partner with you.
133151
134152
## Contributing
135153
