- Remove duplicate [Research & Improvement Roadmap] link from the
Theory & Interpretability section; the single canonical reference
is in the Development Roadmap section.
- Fold the standalone "Future Directions & Clinical Collaborations"
section into the Development Roadmap's Longer-term bullet list and
attach the collaborator callout there, eliminating the separate
section and the content overlap.
docs/SC_BEST_PRACTICES.md:
- Simplify the two-point numbered intro to a single sentence.
- Collapse six functional ### category headers into three groups
(Vocabulary & Preprocessing / Training & Supervision /
Evaluation, Scale & Tooling) so items of related priority sit
together instead of being scattered across thin categories.
- Remove the #### "Architectural direction" sub-sub-header; promote
it to a bold inline lead paragraph so it reads as context rather
than a nested section.
- Remove the #### "Coverage trade-offs" sub-sub-header; replace it with
a plain lead sentence before the table.
- Update items 7–14 and the quick-reference table to reflect the
decoupleR + PROGENy reframing agreed in the March 2026 research
session (pathway activity as primary task, gene expression as
secondary head).
Signed-off-by: BenjaminIsaac0111 <12176376+BenjaminIsaac0111@users.noreply.github.com>
README.md (22 additions & 4 deletions)
@@ -113,7 +113,6 @@ Visualization plots and spatial expression maps will be saved to the `./results`
 - **[Pathway Mapping](docs/PATHWAY_MAPPING.md)**: Clinical interpretability, pathway bottleneck design, and MSigDB integration.
 - **[Gene Analysis](docs/GENE_ANALYSIS.md)**: Modeling strategies for mapping morphology to high-dimensional gene spaces.
 - **[Data Structure](docs/DATA_STRUCTURE.md)**: Detailed breakdown of the HEST data structure on disk, metadata conventions, and preprocessing invariants.
-- **[Single-cell Best Practices](docs/SC_BEST_PRACTICES.md)**: Gap analysis and roadmap for alignment with standard recommendations.
 
 ## Development
 
@@ -124,12 +123,31 @@ Visualization plots and spatial expression maps will be saved to the `./results`
 .\test.ps1
 ```
 
-## Future Directions & Clinical Collaborations
+## Development Roadmap
 
-A major future direction for **SpatialTranscriptFormer** is to integrate this architecture into an **end-to-end pipeline for patient risk assessment** and prognosis tracking. By leveraging the model's predicted expression and pathway activations, we aim to build a downstream risk prediction module that allows users to directly evaluate how spatially-resolved expression relates to patient survival.
+Active research and development is tracked in the **[Research & Improvement Roadmap](docs/SC_BEST_PRACTICES.md)**. Key directions are summarised below.
+
+### Near-term
+
+- **Vocabulary quality** — mitochondrial gene filtering (`MT-*` exclusion) and a rebuild of the gene vocabulary using SVG-weighted ranking (Moran's I), ensuring training targets are spatially informative rather than dominated by housekeeping genes.
+- **Moran's I weighted loss** — weight each gene's contribution to the training loss by its spatial variability score, so that the gradient is driven by spatially coherent genes rather than high-expression noise.
+
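The two Near-term items above can be illustrated with a minimal NumPy sketch. It is not code from this repository: the function names (`filter_mitochondrial`, `morans_i`, `morans_weighted_mse`), the dense spot-by-spot weight matrix, and the choice to clip negative Moran's I scores to zero before normalising are all assumptions made for illustration.

```python
import numpy as np


def filter_mitochondrial(gene_names):
    """Return indices of genes whose symbol does not match `MT-*` (case-insensitive)."""
    return [i for i, g in enumerate(gene_names) if not g.upper().startswith("MT-")]


def morans_i(expr, weights):
    """Moran's I for every gene.

    expr:    (n_spots, n_genes) expression matrix
    weights: (n_spots, n_spots) spatial weight matrix with a zero diagonal
    """
    n_spots = expr.shape[0]
    z = expr - expr.mean(axis=0, keepdims=True)       # centre each gene
    numerator = (z * (weights @ z)).sum(axis=0)       # sum_ij w_ij * z_i * z_j
    denominator = (z ** 2).sum(axis=0)                # sum_i z_i^2
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = (n_spots / weights.sum()) * numerator / denominator
    # Constant genes have zero variance; give them a score of 0.
    return np.nan_to_num(scores, nan=0.0, posinf=0.0, neginf=0.0)


def morans_weighted_mse(pred, target, scores):
    """Per-gene MSE weighted by non-negative, normalised Moran's I scores.

    Clipping negative scores to zero is an assumption of this sketch.
    """
    w = np.clip(scores, 0.0, None)
    if w.sum() == 0:
        w = np.ones_like(w)                           # fall back to uniform weights
    w = w / w.sum()
    per_gene_mse = ((pred - target) ** 2).mean(axis=0)
    return float((w * per_gene_mse).sum())
```

With this weighting, a gene whose expression alternates between neighbouring spots (negative Moran's I) contributes nothing to the loss, while a smoothly varying gene dominates it. Whether a hard clip or a softer transform of the score is preferable is a design choice the README text leaves open.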
+### Medium-term: Architectural Reframing
+
+The current model predicts ~1000 individual gene expression values as its primary task, with pathway activity as a secondary interpretability output. Based on a review of the ST literature and the [Saezlab ecosystem](https://saezlab.org) (PROGENy, decoupleR, LIANA+), we are shifting toward:
+
+- **Pathway activity as the primary prediction target.** Spatial pathway activity maps pre-computed offline via [decoupleR](https://decoupler-py.readthedocs.io) + [PROGENy](https://saezlab.github.io/progeny/) are spatially cleaner, clinically interpretable, and directly supervised — avoiding the circular regularisation issue of the current `AuxiliaryPathwayLoss`.
+- **Gene expression as a secondary imputation head**, weighted by Moran's I.
+- **Pluggable prior knowledge.** The offline preprocessing step accepts any biological network (PROGENy signalling pathways, MSigDB Hallmarks, LIANA+ ligand-receptor pairs, CollecTRI TF regulons) without changing the model architecture.
+
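The reframing above can be sketched in a few lines of NumPy. This is a hedged illustration, not the project's implementation and not a call into decoupleR: a plain weighted sum stands in for the enrichment step, the prior-knowledge network is assumed to be a list of `(source, target, weight)` triples, and `build_weight_matrix`, `pathway_activity`, `two_head_loss`, and the `gene_weight` value are hypothetical names and defaults.

```python
import numpy as np


def build_weight_matrix(net, gene_names):
    """Turn (source, target, weight) triples into a (n_genes, n_sources) matrix.

    Targets missing from the model's gene vocabulary are skipped, so any
    network can be plugged in without changing downstream shapes.
    """
    gene_idx = {g: i for i, g in enumerate(gene_names)}
    sources = sorted({source for source, _, _ in net})
    source_idx = {s: j for j, s in enumerate(sources)}
    matrix = np.zeros((len(gene_names), len(sources)))
    for source, target, weight in net:
        if target in gene_idx:
            matrix[gene_idx[target], source_idx[source]] = weight
    return sources, matrix


def pathway_activity(expr, matrix):
    """Weighted-sum activity per spot: (n_spots, n_genes) @ (n_genes, n_sources)."""
    return expr @ matrix


def two_head_loss(path_pred, path_true, gene_pred, gene_true, gene_weight=0.1):
    """Primary pathway-activity MSE plus a down-weighted gene-expression MSE."""
    primary = ((path_pred - path_true) ** 2).mean()
    secondary = ((gene_pred - gene_true) ** 2).mean()
    return float(primary + gene_weight * secondary)
```

Because the targets are computed offline from the network, swapping PROGENy for another resource only changes the triples passed to `build_weight_matrix`; the model's output width follows from the number of sources.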
+### Longer-term
+
+- Evaluation on the 2025 Nat. Comms. benchmark suite (11 methods, 28 metrics, 5 datasets).
+- Support for higher-resolution platforms (Visium HD, Xenium) — architecturally trivial, blocked only by data availability.
+- **Clinical integration** — using predicted spatial pathway activation maps as features for patient risk assessment and prognosis tracking in an end-to-end pipeline.
 
 > [!NOTE]
-> **Call for Collaborators:** Rigorous risk assessment models require vast datasets of clinical metadata and survival outcomes, which we currently lack access to. We are open to investigating *any* disease of interest! If you have access to large clinical cohorts and are interested in exploring how spatial pathway activation correlates with patient prognosis, we would love to partner with you.
+> **Call for Collaborators:** Rigorous risk assessment models require large clinical cohorts with spatial transcriptomics and survival outcomes, which we currently lack access to. We are open to investigating *any* disease of interest. If you have access to such cohorts and are interested in exploring how spatially-resolved pathway activation correlates with patient prognosis, we would love to partner with you.