
Commit 8593a34

update geneenrichment to be less strict (#1529)
### Description

<!-- Provide a detailed description of the changes in this PR -->

#### Usage

<!--- How does a user interact with the changed code -->

```python
TODO: Add code snippet
```

### Type of changes

<!-- Mark the relevant option with an [x] -->

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

### CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

- [ciflow:skip](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:skip) - Skip all CI tests for this PR
- [ciflow:notebooks](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:notebooks) - Run Jupyter notebook execution tests for bionemo2
- [ciflow:slow](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:slow) - Run slow single-GPU integration tests marked as `@pytest.mark.slow` for bionemo2
- [ciflow:all](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all) - Run all tests (unit tests, slow tests, and notebooks) for bionemo2. This label can be used to force running tests for all of bionemo2.
- [ciflow:all-recipes](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/main/contributing/contributing.md#ciflow:all-recipes) - Run tests for all recipes (under bionemo-recipes). This label can be used to force running tests for all recipes.

Unit tests marked as `@pytest.mark.multi_gpu` or `@pytest.mark.distributed` are not run in the PR pipeline.

For more details, see [CONTRIBUTING](CONTRIBUTING.md).

> [!NOTE]
> By default, only basic unit tests are run. Add appropriate labels to enable additional test coverage.

#### Authorizing CI Runs

We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources.

- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a `pull-request/`-prefixed branch in the source repository (e.g. `pull-request/123`).
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This needs to be done for each new commit.

#### Triggering CodeRabbit AI Review

To trigger a code review from CodeRabbit, comment on a pull request with one of these commands:

- `@coderabbitai review` - Triggers a standard review
- `@coderabbitai full review` - Triggers a comprehensive review

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

### Pre-submit Checklist

<!--- Ensure all items are completed before submitting -->

- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully

---------

Signed-off-by: jwilber <jwilber@nvidia.com>
1 parent 1faa6e8 commit 8593a34

14 files changed

Lines changed: 916 additions & 84 deletions


bionemo-recipes/interpretability/sparse_autoencoders/recipes/codonfm/codon_dashboard/src/App.jsx

Lines changed: 213 additions & 42 deletions
Large diffs are not rendered by default.

bionemo-recipes/interpretability/sparse_autoencoders/recipes/codonfm/codon_dashboard/src/EmbeddingView.jsx

Lines changed: 15 additions & 3 deletions

```diff
@@ -64,7 +64,7 @@ class FeatureTooltip {
   }
 }
 
-export default function EmbeddingView({ brush, categoryColumn, categoryColumns, onFeatureClick, highlightedFeatureId, viewportState, onViewportChange, labels, features, selectedCategory, darkMode }) {
+export default function EmbeddingView({ brush, categoryColumn, categoryColumns, onFeatureClick, highlightedFeatureId, viewportState, onViewportChange, labels, features, selectedCategory, darkMode, hiddenCategories }) {
   const containerRef = useRef(null)
   const viewRef = useRef(null)
   const onFeatureClickRef = useRef(onFeatureClick)
```

```diff
@@ -267,7 +267,8 @@ export default function EmbeddingView({ brush, categoryColumn, categoryColumns,
     if (!viewRef.current) return
 
     let categoryColName = null
-    let colors = Array(50).fill(DEFAULT_COLOR)
+    const HIDDEN_COLOR = darkMode ? "#0a0a0a" : "#fafafa"
+    let colors = Array(50).fill(HIDDEN_COLOR)
 
     if (categoryColumn && categoryColumn !== "none") {
       const colInfo = categoryColumns?.find(c => c.name === categoryColumn)
```

```diff
@@ -278,6 +279,17 @@ export default function EmbeddingView({ brush, categoryColumn, categoryColumns,
       } else if (colInfo.type === 'string') {
         categoryColName = `${categoryColumn}_cat`
         colors = CATEGORY_COLORS.slice(0, Math.max(colInfo.nUnique, 10))
+        // Map colors to match DENSE_RANK order, dim non-selected when filtering
+        if (hiddenCategories && hiddenCategories.size > 0 && features) {
+          const allCatNames = [...new Set(
+            features.map(f => f[categoryColumn]).filter(v => v != null)
+          )].sort()
+          colors = colors.map((c, i) => {
+            const name = allCatNames[i]
+            if (!name) return c
+            return !hiddenCategories.has(name) ? HIDDEN_COLOR : c
+          })
+        }
       } else {
         categoryColName = categoryColumn
         colors = CATEGORY_COLORS.slice(0, Math.max(colInfo.nUnique, 10))
```

```diff
@@ -291,7 +303,7 @@ export default function EmbeddingView({ brush, categoryColumn, categoryColumns,
       selection: null,
       tooltip: null,
     })
-  }, [categoryColumn, categoryColumns])
+  }, [categoryColumn, categoryColumns, hiddenCategories])
 
   // Handle resize
   useEffect(() => {
```
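The string-column branch relies on an ordering assumption: palette slot `i` must correspond to the category whose `${categoryColumn}_cat` code is `i`, which the inline comment describes as DENSE_RANK order. Note also that, despite its name, `hiddenCategories` acts as the set of categories that keep their color; everything outside it is painted `HIDDEN_COLOR`. The sketch below pulls that mapping into a standalone function so it can be checked in isolation. It assumes the rank order equals JavaScript's default `.sort()` order of the distinct non-null names; `dimColors` and the sample data are hypothetical.

```javascript
// Standalone sketch of the palette-dimming logic added to EmbeddingView.jsx.
// `dimColors` is a hypothetical helper name; the component inlines this logic.
// Assumption: category codes follow DENSE_RANK over the sorted distinct names,
// so palette slot i belongs to the i-th name in sorted order.
function dimColors(palette, features, column, keptCategories, hiddenColor) {
  if (!keptCategories || keptCategories.size === 0 || !features) {
    return palette // no filter active: leave the palette untouched
  }
  const sortedNames = [...new Set(
    features.map(f => f[column]).filter(v => v != null)
  )].sort()
  return palette.map((color, i) => {
    const name = sortedNames[i]
    if (!name) return color // palette slot with no matching category
    // Categories in the set keep their color; all others are dimmed.
    return keptCategories.has(name) ? color : hiddenColor
  })
}

// Usage with made-up data:
const palette = ["#e41a1c", "#377eb8", "#4daf4a"]
const features = [
  { family: "kinase" }, { family: "atpase" }, { family: "kinase" }, { family: null },
]
const result = dimColors(palette, features, "family", new Set(["kinase"]), "#fafafa")
// result → ["#fafafa", "#377eb8", "#4daf4a"]
```

If the backend ranks names with a different collation (for example, case-insensitively), palette slots and names could drift apart, so the sort order on both sides would need to agree.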

bionemo-recipes/interpretability/sparse_autoencoders/recipes/codonfm/codon_dashboard/src/FeatureCard.jsx

Lines changed: 38 additions & 0 deletions

```diff
@@ -349,6 +349,24 @@ const FeatureCard = forwardRef(function FeatureCard({ feature, isHighlighted, fo
       lines.push('')
     }
 
+    // GSEA enrichment section
+    const gseaCsvFields = [
+      { key: 'gsea_overall_best', label: 'GSEA Overall Best' },
+      { key: 'gsea_GO_Biological_Process', label: 'GSEA GO Biological Process' },
+      { key: 'gsea_GO_Molecular_Function', label: 'GSEA GO Molecular Function' },
+      { key: 'gsea_GO_Cellular_Component', label: 'GSEA GO Cellular Component' },
+      { key: 'gsea_InterPro_Domains', label: 'GSEA InterPro Domains' },
+      { key: 'gsea_GO_Slim', label: 'GSEA GO Slim' },
+    ]
+    const gseaLines = gseaCsvFields
+      .filter(({ key }) => feature[key] && feature[key] !== 'unlabeled')
+      .map(({ key, label }) => `${label},${feature[key]}`)
+    if (gseaLines.length > 0) {
+      lines.push('=== GSEA ENRICHMENT ===')
+      gseaLines.forEach(l => lines.push(l))
+      lines.push('')
+    }
+
     // Examples section
     if (examples && examples.length > 0) {
       lines.push('=== ACTIVATION EXAMPLES ===')
```
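The new CSV block only exports fields whose value is present and not `'unlabeled'`, and it skips the `=== GSEA ENRICHMENT ===` header entirely when nothing qualifies. A minimal sketch of the same logic as a pure function, with a hypothetical name `gseaCsvLines` and made-up field values:

```javascript
// Sketch of the GSEA block that FeatureCard.jsx appends to its CSV export.
// `gseaCsvLines` is a hypothetical helper; the component pushes into a
// shared `lines` array instead of returning a new one.
const GSEA_CSV_FIELDS = [
  { key: "gsea_overall_best", label: "GSEA Overall Best" },
  { key: "gsea_GO_Biological_Process", label: "GSEA GO Biological Process" },
  { key: "gsea_GO_Molecular_Function", label: "GSEA GO Molecular Function" },
  { key: "gsea_GO_Cellular_Component", label: "GSEA GO Cellular Component" },
  { key: "gsea_InterPro_Domains", label: "GSEA InterPro Domains" },
  { key: "gsea_GO_Slim", label: "GSEA GO Slim" },
]

function gseaCsvLines(feature) {
  const rows = GSEA_CSV_FIELDS
    // Only 'unlabeled' is skipped here; 'other' is still exported.
    .filter(({ key }) => feature[key] && feature[key] !== "unlabeled")
    .map(({ key, label }) => `${label},${feature[key]}`)
  // Emit nothing (not even the section header) when no field qualifies.
  if (rows.length === 0) return []
  return ["=== GSEA ENRICHMENT ===", ...rows, ""]
}

// Usage with made-up values:
const lines = gseaCsvLines({
  gsea_overall_best: "ribosome biogenesis",
  gsea_GO_Slim: "other",
  gsea_InterPro_Domains: "unlabeled",
})
// lines → ["=== GSEA ENRICHMENT ===", "GSEA Overall Best,ribosome biogenesis", "GSEA GO Slim,other", ""]
```

Unlike the tag and detail views, this export keeps values of `'other'`. Values are also interpolated without quoting, so a term name containing a comma would spill into an extra column if the file is parsed as strict CSV.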

```diff
@@ -657,6 +675,26 @@ const FeatureCard = forwardRef(function FeatureCard({ feature, isHighlighted, fo
     if (ann.cpg) tags.push({ label: `CpG enriched`, color: '#fce4ec' })
     if (ann.position) tags.push({ label: `N-terminal`, color: '#e8f5e9' })
 
+    // GSEA enrichment tags
+    const gseaFields = [
+      { key: 'gsea_GO_Biological_Process', prefix: 'GO:BP', color: '#e8eaf6' },
+      { key: 'gsea_GO_Molecular_Function', prefix: 'GO:MF', color: '#ede7f6' },
+      { key: 'gsea_GO_Cellular_Component', prefix: 'GO:CC', color: '#e0f2f1' },
+      { key: 'gsea_InterPro_Domains', prefix: 'InterPro', color: '#fff8e1' },
+      { key: 'gsea_GO_Slim', prefix: 'GO Slim', color: '#f1f8e9' },
+    ]
+    for (const { key, prefix, color } of gseaFields) {
+      const val = feature[key]
+      if (val && val !== 'unlabeled' && val !== 'other') {
+        tags.push({ label: `${prefix}: ${val}`, color })
+      }
+    }
+
+    // Codon optimality metrics from annotations
+    if (ann.cai != null) tags.push({ label: `CAI: ${ann.cai.toFixed(3)}`, color: '#e0f7fa' })
+    if (ann.tai != null) tags.push({ label: `tAI: ${ann.tai.toFixed(3)}`, color: '#e0f7fa' })
+    if (ann.rscu != null) tags.push({ label: `RSCU: ${ann.rscu.toFixed(2)}`, color: '#e0f7fa' })
+
     if (tags.length === 0) return null
     return (
       <div style={{ display: 'flex', flexWrap: 'wrap', gap: '4px', marginBottom: '10px' }}>
```
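The tag hunk applies two different presence checks: GSEA fields use truthiness and also drop the placeholder values `'unlabeled'` and `'other'`, while the codon metrics use `!= null`, so a metric that is exactly `0` still produces a tag (a truthiness check such as `if (ann.cpg)` would skip it). A sketch of just the new tags, assuming `cai`, `tai`, and `rscu` arrive as numbers (a string value would make `.toFixed` throw); `gseaAndCodonTags` and the inputs are hypothetical:

```javascript
// Sketch of the tag-building logic added to FeatureCard.jsx.
// `gseaAndCodonTags` is a hypothetical helper name.
const GSEA_TAG_FIELDS = [
  { key: "gsea_GO_Biological_Process", prefix: "GO:BP", color: "#e8eaf6" },
  { key: "gsea_GO_Molecular_Function", prefix: "GO:MF", color: "#ede7f6" },
  { key: "gsea_GO_Cellular_Component", prefix: "GO:CC", color: "#e0f2f1" },
  { key: "gsea_InterPro_Domains", prefix: "InterPro", color: "#fff8e1" },
  { key: "gsea_GO_Slim", prefix: "GO Slim", color: "#f1f8e9" },
]

function gseaAndCodonTags(feature, ann) {
  const tags = []
  for (const { key, prefix, color } of GSEA_TAG_FIELDS) {
    const val = feature[key]
    // Missing values and the placeholders 'unlabeled' / 'other' get no tag.
    if (val && val !== "unlabeled" && val !== "other") {
      tags.push({ label: `${prefix}: ${val}`, color })
    }
  }
  // `!= null` rejects null and undefined but keeps a legitimate 0.
  if (ann.cai != null) tags.push({ label: `CAI: ${ann.cai.toFixed(3)}`, color: "#e0f7fa" })
  if (ann.tai != null) tags.push({ label: `tAI: ${ann.tai.toFixed(3)}`, color: "#e0f7fa" })
  if (ann.rscu != null) tags.push({ label: `RSCU: ${ann.rscu.toFixed(2)}`, color: "#e0f7fa" })
  return tags
}

// Usage with made-up values:
const labels = gseaAndCodonTags(
  { gsea_GO_Biological_Process: "translation", gsea_GO_Slim: "other" },
  { cai: 0.8123, tai: 0, rscu: undefined },
).map(t => t.label)
// labels → ["GO:BP: translation", "CAI: 0.812", "tAI: 0.000"]
```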

bionemo-recipes/interpretability/sparse_autoencoders/recipes/codonfm/codon_dashboard/src/FeatureDetailPage.jsx

Lines changed: 47 additions & 0 deletions

```diff
@@ -369,6 +369,53 @@ export default function FeatureDetailPage({ feature, examples, vocabLogits, feat
         <VocabLogitChart logits={logits} />
       </div>
 
+      {/* Gene-Level GSEA Enrichment */}
+      {(() => {
+        const gseaFields = [
+          { key: 'gsea_GO_Biological_Process', label: 'GO Biological Process' },
+          { key: 'gsea_GO_Molecular_Function', label: 'GO Molecular Function' },
+          { key: 'gsea_GO_Cellular_Component', label: 'GO Cellular Component' },
+          { key: 'gsea_InterPro_Domains', label: 'InterPro Domains' },
+          { key: 'gsea_GO_Slim', label: 'GO Slim' },
+        ]
+        const gseaEntries = gseaFields
+          .map(({ key, label }) => ({ label, value: feature[key] }))
+          .filter(e => e.value && e.value !== 'unlabeled' && e.value !== 'other')
+        const overallBest = feature.gsea_overall_best
+        if (gseaEntries.length === 0 && (!overallBest || overallBest === 'unlabeled')) return null
+        return (
+          <div style={styles.section}>
+            <div style={styles.sectionTitle}>Gene-Level Enrichment (GSEA)</div>
+            <div style={styles.sectionSubtitle}>
+              Genes ranked by activation strength, tested against GO and InterPro databases.
+            </div>
+            {overallBest && overallBest !== 'unlabeled' && (
+              <div style={{
+                padding: '8px 12px', marginBottom: '8px', borderRadius: '6px',
+                background: 'var(--bg-card-expanded)', border: '1px solid var(--accent)',
+                fontSize: '13px', fontWeight: '600', color: 'var(--text-heading)',
+              }}>
+                Best: {overallBest}
+              </div>
+            )}
+            <div style={{ display: 'grid', gridTemplateColumns: '1fr 1fr', gap: '6px' }}>
+              {gseaEntries.map(({ label, value }) => (
+                <div key={label} style={{
+                  padding: '6px 10px', borderRadius: '4px',
+                  background: 'var(--bg-card)', border: '1px solid var(--border-card)',
+                  fontSize: '11px',
+                }}>
+                  <div style={{ color: 'var(--text-muted)', fontSize: '9px', fontWeight: '600', marginBottom: '2px' }}>
+                    {label}
+                  </div>
+                  <div style={{ color: 'var(--text-primary)' }}>{value}</div>
+                </div>
+              ))}
+            </div>
+          </div>
+        )
+      })()}
+
       {/* Codon Annotations */}
       <div style={styles.section}>
         <div style={styles.sectionTitle}>Codon-Level Annotations</div>
```
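The detail-page section renders when at least one category entry survives the filter or when `gsea_overall_best` is set to something other than `'unlabeled'`. Because the overall-best check does not exclude `'other'`, a feature whose only GSEA value is `'other'` still gets the section, with a `Best: other` banner and an empty grid. The sketch below returns a plain object instead of JSX so the decision can be tested directly; `gseaSectionModel` is a hypothetical name.

```javascript
// Sketch of the render decision in FeatureDetailPage.jsx's GSEA section.
// `gseaSectionModel` is a hypothetical helper; the component inlines this
// logic in an immediately invoked arrow function and returns JSX.
const GSEA_DETAIL_FIELDS = [
  { key: "gsea_GO_Biological_Process", label: "GO Biological Process" },
  { key: "gsea_GO_Molecular_Function", label: "GO Molecular Function" },
  { key: "gsea_GO_Cellular_Component", label: "GO Cellular Component" },
  { key: "gsea_InterPro_Domains", label: "InterPro Domains" },
  { key: "gsea_GO_Slim", label: "GO Slim" },
]

function gseaSectionModel(feature) {
  const entries = GSEA_DETAIL_FIELDS
    .map(({ key, label }) => ({ label, value: feature[key] }))
    .filter(e => e.value && e.value !== "unlabeled" && e.value !== "other")
  const overallBest = feature.gsea_overall_best
  // Mirrors `overallBest && overallBest !== 'unlabeled'` in the JSX.
  const showBest = Boolean(overallBest) && overallBest !== "unlabeled"
  // The component returns null (renders nothing) in this case.
  if (entries.length === 0 && !showBest) return null
  return { best: showBest ? overallBest : null, entries }
}

// Usage with made-up features:
const empty = gseaSectionModel({}) // → null
const bestOnly = gseaSectionModel({ gsea_overall_best: "other" })
// bestOnly → { best: "other", entries: [] }
```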

bionemo-recipes/interpretability/sparse_autoencoders/recipes/codonfm/run.py

Lines changed: 8 additions & 0 deletions

```diff
@@ -133,6 +133,14 @@ def run_train(cfg: DictConfig, cache_dir: Path, output_dir: Path) -> None: # no
         cmd.append("--normalize-input")
     if t.get("max_grad_norm"):
         cmd.extend(["--max-grad-norm", str(t.max_grad_norm)])
+    if t.get("lr_schedule", "constant") != "constant":
+        cmd.extend(["--lr-schedule", str(t.lr_schedule)])
+    if t.get("lr_min", 0.0) != 0.0:
+        cmd.extend(["--lr-min", str(t.lr_min)])
+    if t.get("lr_decay_steps"):
+        cmd.extend(["--lr-decay-steps", str(t.lr_decay_steps)])
+    if t.get("warmup_steps", 0) > 0:
+        cmd.extend(["--warmup-steps", str(t.warmup_steps)])
 
     if t.wandb_enabled:
         cmd.append("--wandb")
```

bionemo-recipes/interpretability/sparse_autoencoders/recipes/codonfm/run_configs/config.yaml

Lines changed: 4 additions & 0 deletions

```diff
@@ -42,6 +42,10 @@ train:
   wandb_enabled: false
   wandb_project: sae_codonfm_recipe
   max_grad_norm: null
+  lr_schedule: constant
+  lr_min: 0.0
+  lr_decay_steps: null
+  warmup_steps: 0
 
 # Eval
 eval:
```
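Together, the `config.yaml` defaults and the new `run.py` conditionals keep the training command unchanged unless a learning-rate option is overridden: each flag is appended only when its value differs from the "off" default. The sketch below ports those conditionals to JavaScript purely for illustration; `lrFlags` and `getOr` are hypothetical, and `"cosine"` is a made-up schedule name, since the schedules the training script accepts are not shown in this commit.

```javascript
// Illustrative JavaScript port of the learning-rate flag logic added to
// run.py's run_train(); `lrFlags` and `getOr` are hypothetical names.
// `getOr` mirrors Python dict.get: the fallback applies only when the key
// is absent, not when it is present with a null value.
function getOr(obj, key, fallback) {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : fallback
}

function lrFlags(t) {
  const cmd = []
  // Each flag is emitted only when its value differs from the "off" default.
  if (getOr(t, "lr_schedule", "constant") !== "constant") {
    cmd.push("--lr-schedule", String(t.lr_schedule))
  }
  if (getOr(t, "lr_min", 0.0) !== 0.0) {
    cmd.push("--lr-min", String(t.lr_min))
  }
  if (getOr(t, "lr_decay_steps", undefined)) {
    cmd.push("--lr-decay-steps", String(t.lr_decay_steps))
  }
  if (getOr(t, "warmup_steps", 0) > 0) {
    cmd.push("--warmup-steps", String(t.warmup_steps))
  }
  return cmd
}

// Defaults copied from the config.yaml hunk: no extra flags are produced.
const defaults = { lr_schedule: "constant", lr_min: 0.0, lr_decay_steps: null, warmup_steps: 0 }
// Hypothetical overrides; "cosine" is a made-up schedule name.
const overridden = { ...defaults, lr_schedule: "cosine", lr_min: 0.0001, lr_decay_steps: 10000, warmup_steps: 500 }
// lrFlags(defaults)   → []
// lrFlags(overridden) → ["--lr-schedule", "cosine", "--lr-min", "0.0001", "--lr-decay-steps", "10000", "--warmup-steps", "500"]
```

One caveat of the port: `String()` and Python's `str()` format small floats differently (`0.00001` versus `1e-05`), so the generated strings are not byte-identical to `run.py`'s for every value.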
