# AI-Assisted Biological Interpretation
Traditional enrichment analysis typically results in a list of significant pathways or GO terms. While statistically sound, these lists often leave researchers asking, "**So what?**" What is the underlying biological mechanism? Who are the key drivers? Is this a pro-survival or pro-death signal?
To bridge the gap between statistical results and biological insights, `clusterProfiler` introduces an AI-powered interpretation module. By leveraging Large Language Models (LLMs) and a multi-agent system, `clusterProfiler` can now act as a virtual bioinformatician, converting dry enrichment lists into coherent, evidence-based biological narratives.
## The `interpret` Function
The core function for this feature is `interpret()`. It accepts enrichment results (e.g., from `enrichKEGG`, `enrichGO`, or `compareCluster`) and uses an LLM to generate a structured report.
To use this feature, you need to configure an API key for a provider supported by `aisdk`. Model selection is now delegated to `aisdk`: if `model` is not specified, `interpret()` uses the current default model managed by `aisdk::get_model()` / `aisdk::set_model()`, while an explicit `model = ...` argument overrides that default for a single call. In practice, this usually means loading credentials from your `.env` file or setting the relevant environment variables before calling `set_model()`. For provider configuration details, see the `aisdk` documentation.
```{r}
#| eval: false
library(aisdk)
library(dotenv)
# Load API credentials from .env (or configure them in your environment)
dotenv::load_dot_env()
# Optional: set a session-wide default model after credentials are available
set_model("stepfun:step-3.5-flash")
# Check the current default model
get_model()
```
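Because a missing credential usually surfaces only when the first LLM request is made, it can help to fail early. The following is a minimal sketch in base R; `require_env()` is a hypothetical helper and `MY_PROVIDER_API_KEY` is a placeholder name, not a variable defined by `aisdk` (check the `aisdk` documentation for the name your provider expects).

```r
# Hypothetical helper: stop early if a credential variable is empty.
# The variable name used below is a placeholder, not an aisdk convention.
require_env <- function(var) {
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", var, "' is not set; ",
         "add it to your .env file or export it before calling set_model().",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Demonstration with a dummy value set for this session only
Sys.setenv(MY_PROVIDER_API_KEY = "dummy-value")
require_env("MY_PROVIDER_API_KEY")
```

Calling the helper right after `dotenv::load_dot_env()` gives a clear error message instead of a failed request later on.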
```{r}
#| label: interpret-basic
#| eval: false
library(clusterProfiler)
# Basic usage
# 'edo' is your enrichment result object
res <- interpret(edo)
print(res)
```
### Tasks and Inputs
`interpret()` is not just for explaining enrichment results. It breaks down LLM capabilities into three distinct tasks:
+ **`task = "interpretation"`**: (Default) Converts enrichment results into a mechanistic narrative suitable for publication (What -> So What).
+ **`task = "annotation"`**: Performs cell type annotation for single-cell clusters using both marker genes and enrichment terms as evidence.
+ **`task = "phenotyping"`**: Assigns a "state/phenotype label" to a group (e.g., "Pro-inflammatory" or "Senescent-like").
To strengthen the evidence, `interpret()` supports "evidence synthesis" from multiple sources:
+ **Single Object**: `enrichResult`, `gseaResult`, or `compareClusterResult`.
+ **Multiple Objects**: A `list()` of results (e.g., `list(kegg_res, go_res)` or `list(cellmarker_res, go_res)`).
+ **Batch Processing**: If the input is a `compareCluster` result, it automatically splits by cluster and generates a report for each.
The key features of `interpret()` include:
+ **Prompt Skeleton**: A fixed structure to guide the LLM.
+ **Structured Output**: Enforced structure for parsing, comparison, and batch processing.
+ **Reasoning First**: Encourages "deduction before writing" to avoid merely listing pathway names.
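The batch behaviour described for `compareCluster` input can be pictured as a split-then-summarize step. The sketch below is illustrative only and is not the package's internal code: `toy` is a made-up table that mimics the shape of `as.data.frame()` applied to a `compareCluster` result (one row per term, with a `Cluster` column).

```r
# Toy stand-in for as.data.frame(<compareCluster result>); the values are
# invented purely to show the shape of the data.
toy <- data.frame(
  Cluster     = c("0", "0", "1", "1", "1"),
  Description = c("Naive T cell", "T cell",
                  "Classical monocyte", "Myeloid cell", "Macrophage"),
  p.adjust    = c(1e-8, 1e-6, 1e-7, 1e-9, 1e-5),
  stringsAsFactors = FALSE
)

# One table per cluster, each ordered by adjusted p-value
per_cluster <- lapply(split(toy, toy$Cluster),
                      function(d) d[order(d$p.adjust), ])

# Each element would then be summarized and interpreted separately;
# here we only pull out the top-ranked term per cluster
top_terms <- vapply(per_cluster, function(d) d$Description[1], character(1))
print(top_terms)
```

Keeping one table per cluster is what makes a per-cluster report possible: each report only sees the evidence for its own cluster.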
### Cell Type Annotation
For example, we can use `Seurat` to identify marker genes for each cluster in a single-cell RNA-seq dataset. Then we can use `compareCluster` to perform enrichment analysis for each cluster. Finally, we can use `interpret` to annotate cell types based on the enrichment results and marker genes.
```{r}
#| eval: false
#| label: interpret-annotation
library(Seurat)
library(dplyr)
library(clusterProfiler)

# Load the 10x PBMC 3k dataset and apply standard QC filters
dir <- "data/filtered_gene_bc_matrices/hg19"
pbmc.data <- Read10X(data.dir = dir)
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k",
                           min.cells = 3, min.features = 200)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc,
  subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5
)

# Normalize, reduce dimensionality, and cluster
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize",
                      scale.factor = 10000)
pbmc <- ScaleData(pbmc)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst",
                             nfeatures = 2000)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- RunUMAP(pbmc, dims = 1:10)
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)

# Find positive markers and keep the top n per cluster
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE)
topN_marker <- function(markers, n) {
  markers %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1) %>%
    slice_head(n = n) %>%
    ungroup()
}
top20 <- topN_marker(pbmc.markers, 20)

# downloaded from: http://www.bio-bigdata.center/CellMarker_download_files/file/Cell_marker_Human.xlsx
cm <- rio::import("Cell_marker_Human.xlsx")

# Enrich each cluster's top markers against CellMarker, then annotate
x <- compareCluster(gene ~ cluster, data = top20, fun = enricher,
                    TERM2GENE = cm[, c("cell_name", "marker")])
y <- interpret(x, task = "annotation")
```
The output `y` is a list of interpretation results, one for each cluster. We can extract the inferred cell types.
```
> sapply(y, \(x) x$cell_type)
0
"Naive T Cell"
1
"Classical Monocyte"
2
"CD4+ T cell"
3
"Follicular B cell"
4
"CD8+ Cytotoxic T Cell"
5
"CD16+ monocyte (Non-classical monocyte)"
6
"Natural Killer (NK) cell"
7
"Plasmacytoid Dendritic Cell (pDC)"
8
"Megakaryocyte"
```
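This result is largely consistent with the manual annotation from the [Seurat pbmc3k tutorial](https://satijalab.org/seurat/articles/pbmc3k_tutorial.html#assigning-cell-type-identity-to-clusters). The main differences are in granularity or naming: for example, cluster 7 is labeled as a plasmacytoid DC rather than simply DC, and cluster 8 as Megakaryocyte rather than Platelet.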
| Cluster ID | Markers | Cell Type |
| :--- | :--- | :--- |
| 0 | IL7R, CCR7 | Naive CD4+ T |
| 1 | CD14, LYZ | CD14+ Mono |
| 2 | IL7R, S100A4 | Memory CD4+ |
| 3 | MS4A1 | B |
| 4 | CD8A | CD8+ T |
| 5 | FCGR3A, MS4A7 | FCGR3A+ Mono |
| 6 | GNLY, NKG7 | NK |
| 7 | FCER1A, CST3 | DC |
| 8 | PPBP | Platelet |
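To make the comparison easier to scan, the two sets of labels can be placed side by side. This is a minimal sketch in base R: `predicted` is copied from the output shown earlier and `manual` from the tutorial table; in a live session `predicted` would instead come from `sapply(y, \(x) x$cell_type)`.

```r
# Predicted labels, copied from the interpret() output shown earlier
predicted <- c(
  "0" = "Naive T Cell",
  "1" = "Classical Monocyte",
  "2" = "CD4+ T cell",
  "3" = "Follicular B cell",
  "4" = "CD8+ Cytotoxic T Cell",
  "5" = "CD16+ monocyte (Non-classical monocyte)",
  "6" = "Natural Killer (NK) cell",
  "7" = "Plasmacytoid Dendritic Cell (pDC)",
  "8" = "Megakaryocyte"
)

# Manual labels from the Seurat pbmc3k tutorial table
manual <- c(
  "0" = "Naive CD4+ T", "1" = "CD14+ Mono", "2" = "Memory CD4+",
  "3" = "B", "4" = "CD8+ T", "5" = "FCGR3A+ Mono",
  "6" = "NK", "7" = "DC", "8" = "Platelet"
)

# Align by cluster ID rather than by position
comparison <- data.frame(
  cluster   = names(predicted),
  predicted = unname(predicted),
  manual    = unname(manual[names(predicted)]),
  stringsAsFactors = FALSE
)
print(comparison)
```

Aligning by cluster name rather than by position guards against the two vectors being stored in different orders.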
The full report provides detailed reasoning, confidence levels, and supporting evidence (markers/pathways) for each cluster assignment, offering transparency and explainability that simple label transfer methods lack.
```{r}
#| eval: false
#| label: print-annotation-result
print(y)
```
::: {.callout-note icon=false}
# Enrichment Interpretation / Annotation Report {.unnumbered .unlisted}
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 0 {.unnumbered .unlisted}
**Cell Type:** Naive T Cell
**Confidence:** High
**Reasoning:**
The cluster is definitively identified as a naive T cell based on the co-expression of canonical pan-T cell markers (CD3D, CD3E) and the master regulator of naive T cell identity, TCF7 (cited in top terms: Naive CD8+ T cell, Naive CD4+ T cell, etc.). The high expression of CCR7, a critical homing receptor for naive and central memory T cells, and LEF1, another Wnt-pathway TF co-operating with TCF7, further solidifies this identity. The enrichment list is dominated by naive and central memory T cell subtypes, with effector/cytotoxic terms ranking lower and lacking their specific markers (e.g., GZMB, PRF1). The presence of both CD4+ and CD8+ associated terms suggests a mixed population or a shared naive state before lineage commitment, but the core identity is naive T cell.
**Supporting Markers/Pathways:**
- CD3D
- CD3E
- TCF7
- CCR7
- LEF1
- NOSIP
- MAL
---
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 1 {.unnumbered .unlisted}
**Cell Type:** Classical Monocyte
**Confidence:** High
**Reasoning:**
The enrichment list contains many related myeloid cell types, but the specific marker gene profile is definitive. The cluster expresses the core classical monocyte signature: high expression of CD14, S100A8, S100A9, FCN1, and LYZ (Top Specific/Marker Genes). While 'Myeloid cell' and 'Macrophage' are top-ranked by p-value, they are broad categories. The specific 'Classical monocyte' term (GeneRatio: 5/20, p.adjust: 1.637326e-07) is strongly supported by its gene list (S100A9/FCN1/CD14/S100A8/LYZ), which perfectly matches the top markers. The cluster lacks definitive markers to distinguish it as a Dendritic Cell (e.g., no FLT3, CD1C, CLEC9A), Macrophage (e.g., low/absent MRC1/CD163), or Neutrophil (e.g., absent MPO, ELANE). The presence of FCN1 and CD14 together is a hallmark of classical monocytes, and the absence of FCGR3A (CD16) argues against non-classical monocytes.
**Supporting Markers/Pathways:**
- CD14
- S100A8
- S100A9
- FCN1
- LYZ
- CST3
- TYROBP
- MS4A6A
---
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 2 {.unnumbered .unlisted}
**Cell Type:** CD4+ T cell
**Confidence:** High
**Reasoning:**
The cluster is definitively a T cell, as the top enriched term is 'T cell' (p.adjust: 2.86e-17) and the marker list includes the core T-cell receptor complex genes CD3D, CD3E, CD3G, and CD247 (LAT). Among T-cell subtypes, the evidence strongly favors a CD4+ lineage over CD8+. The second most significant term is 'CD4+ T cell' (p.adjust: 2.40e-14), and its gene list (IL32, CD3E, IL7R, CD27, CD3D, TNFRSF4, MAL, CD2, LTB, CD40LG, CD3G) is almost entirely contained within the top 'T cell' markers. Key CD4+ T-cell markers IL7R and CD27 are among the top specific genes. While 'CD8+ T cell' is also enriched, its signature genes (like AQP3) are present but lower in the marker list, and definitive cytotoxic CD8+ markers (e.g., GZMB, PRF1) are absent. The presence of CD40LG and TNFRSF4 (OX40), which are associated with CD4+ T helper and regulatory functions, further supports this assignment. The cluster lacks exclusive markers for NK cells (e.g., NCAM1, KLR genes) or Tregs (FOXP3), though it shows some regulatory association.
**Supporting Markers/Pathways:**
- CD3D
- CD3E
- CD3G
- IL7R
- CD27
- CD40LG
- IL32
- LTB
- TNFRSF4
---
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 3 {.unnumbered .unlisted}
**Cell Type:** Follicular B cell
**Confidence:** High
**Reasoning:**
The top enriched term is 'Follicular B cell' (p.adjust: 2.95e-16), and its gene list contains definitive B cell lineage markers (CD79A, CD79B, MS4A1, BANK1, FCER2, TCL1A) that are also present in the cluster's top specific genes. While other top terms like 'Secretory cell' and 'Classical monocyte' are enriched, they are driven almost exclusively by MHC Class II genes (HLA-DRA, HLA-DRB1, etc.), which are not specific to those cell types but are also expressed by antigen-presenting B cells. The presence of core B cell receptor components (CD79A/B) and mature B cell markers (MS4A1, FCER2, TCL1A) that are absent from monocyte/dendritic cell definitions, combined with the lack of specific monocyte (e.g., CD14, FCGR3A) or secretory cell markers, confirms the identity as a Follicular B cell.
**Supporting Markers/Pathways:**
- CD79A
- MS4A1
- CD79B
- TCL1A
- FCER2
- BANK1
- CD37
- HLA-DRA
- HLA-DRB1
- CD74
---
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 4 {.unnumbered .unlisted}
**Cell Type:** CD8+ Cytotoxic T Cell
**Confidence:** High
**Reasoning:**
The top enriched terms are a mixture of 'Natural killer cell' and various T cell subtypes, indicating shared cytotoxic function. However, the specific marker gene list is definitive. It includes the core T cell receptor complex genes CD3D, CD8A, and CD8B (present in 'CD8+ T cell' and 'Cytotoxic T cell' enrichments), which are lineage-defining for CD8+ T cells and absent from NK cells. While NKG7, PRF1, GZMA, GZMK, and GZMH are shared cytotoxic molecules, the co-expression of CD3D with CD8A/CD8B specifically identifies a cytotoxic T cell lineage. The absence of definitive NK-specific markers (e.g., NCAM1/CD56, KLRD1/CD94, FCGR3A/CD16) from the top marker list, and the presence of the T cell-specific signaling adaptor HCST (DAP10), supports a T cell identity. The 'Cytotoxic CD8+ T cell' enrichment term (GeneRatio: 8/20) provides the most precise functional and lineage match.
**Supporting Markers/Pathways:**
- CD3D
- CD8A
- CD8B
- GZMK
- NKG7
- CCL5
- PRF1
- GZMA
- GZMH
- CST7
- LAG3
- KLRG1
---
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 5 {.unnumbered .unlisted}
**Cell Type:** CD16+ monocyte (Non-classical monocyte)
**Confidence:** High
**Reasoning:**
The top enriched term is 'CD1C-CD141- dendritic cell' (p.adjust: 3.68e-23), but this is likely a misannotation due to shared myeloid markers. The gene list for this term (LST1, CKB, HCK, CSF1R, IFITM3, MS4A7, SERPINA1, LILRB1, CDKN1C, PILRA, FCGR3A, HMOX1, RHOC, LRRC25, SIGLEC10, MS4A4A) is a composite of pan-myeloid and monocyte-specific genes, and lacks definitive dendritic cell markers (e.g., CD1C, CLEC9A, BATF3). The cluster's specific marker list is dominated by canonical markers for non-classical monocytes: FCGR3A (CD16) is the defining marker, supported by MS4A7, CSF1R, and HES4 (enriched in 'CD16+ monocyte' and 'Non-classical monocyte' terms). The absence of T/NK/B cell markers and the presence of macrophage/pan-myeloid genes (e.g., CST3, CTSL) confirm a myeloid lineage, while the high expression of FCGR3A, MS4A7, and HES4 specifically distinguishes the non-classical monocyte subset from classical monocytes, macrophages, and dendritic cells.
**Supporting Markers/Pathways:**
- FCGR3A (CD16)
- MS4A7
- HES4
- CSF1R
- LST1
- IFITM3
- SIGLEC10
- PILRA
---
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 6 {.unnumbered .unlisted}
**Cell Type:** Natural Killer (NK) cell
**Confidence:** High
**Reasoning:**
The top enriched terms include both 'Natural killer cell' (17/20 genes, p=2.79e-27) and 'Cytotoxic CD4+ T cell' (14/20 genes, p=5.24e-28). While the latter has a slightly better p-value, the discriminatory marker analysis strongly favors NK cells. The cluster's top specific genes include canonical NK markers FGFBP2, SPON2, XCL2, XCL1, SH2D1B, and FCGR3A (CD16a), which are not specific to T cells. Critically, the cluster lacks definitive T-cell lineage markers (e.g., CD3D, CD3E, CD4, CD8A). The shared cytotoxic genes (PRF1, GNLY, GZMB, GZMA, NKG7) are expressed by both NK cells and cytotoxic T cells, but the presence of NK-specific markers and absence of T-cell receptor complex genes confirms NK cell identity.
**Supporting Markers/Pathways:**
- FGFBP2
- SPON2
- XCL2
- XCL1
- SH2D1B
- FCGR3A (CD16a)
- KLRD1 (CD94)
- GNLY
- PRF1
- GZMB
- GZMA
- NKG7
---
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 7 {.unnumbered .unlisted}
**Cell Type:** Plasmacytoid Dendritic Cell (pDC)
**Confidence:** High
**Reasoning:**
The enrichment list contains two distinct dendritic cell lineages: conventional/myeloid DCs (cDC2, CD1C+ DCs) and plasmacytoid DCs (pDC). While the top-ranked term is 'CD1C+_A dendritic cell' (a cDC2 subtype), the cluster's specific marker genes are definitive for pDC identity. The pDC-specific markers LILRA4, CLEC4C (BDCA-2), SERPINF1, and P2RY6 are present in the top marker list and are the defining genes for the 'Plasmacytoid dendritic cell(pDC)' and 'Plasmacytoid dendritic cell' enrichment terms. Critically, the cluster lacks the core, non-overlapping markers for cDC2s: while it expresses CD1C and FCER1A (which can be expressed at low levels in some pDCs), it does NOT express the definitive cDC2 markers CLEC10A (in the cDC2 enrichment term but not in the pDC-specific marker set from the top genes) and CD1C at high specificity relative to pDC markers. The rule of exclusion applies: the top term is a cDC2 type, but the specific marker gene list is dominated by pDC markers and lacks exclusive cDC2 commitment.
**Supporting Markers/Pathways:**
- LILRA4 (ILT7)
- CLEC4C (BDCA-2)
- SERPINF1
- P2RY6
- CLIC2
- SCT
- LRRC26
---
## Cell Type Annotation {.unnumbered .unlisted}
### Cluster: 8 {.unnumbered .unlisted}
**Cell Type:** Megakaryocyte
**Confidence:** High
**Reasoning:**
The assignment is based on the definitive convergence of enrichment terms and marker genes. The top enriched term is 'Megakaryocyte' (p.adjust: 3.58e-10) with genes SPARC, GNG11, PF4, GP9, ITGA2B, GP1BA. The related terms 'Platelet' and 'Progenitor cell' are lower-ranked and share subsets of these genes (e.g., PF4, GP9, ITGA2B), which is expected as platelets are anucleate fragments of megakaryocytes. The top specific marker gene list is dominated by canonical megakaryocyte/platelet markers (GP9, ITGA2B, GP1BA, PF4, ITGB3, SPARC, GNG11) and lacks definitive markers for other hematopoietic lineages that could challenge this identity (e.g., no CD3, CD19, CD14, ELANE). The 'Progenitor cell' term is likely reflective of the shared SPARC and PF4 expression in some progenitor states, but the presence of terminal differentiation markers like GP1BA and ITGA2B confirms a mature megakaryocyte identity.
**Supporting Markers/Pathways:**
- GP9
- ITGA2B
- GP1BA
- PF4
- SPARC
- GNG11
- ITGB3
- CLDN5
- CMTM5
- SDPR
---
:::
## The Multi-Agent System
Instead of relying on a single prompt, `clusterProfiler` employs a **Multi-Agent System (MAS)** to ensure accuracy and depth. This system consists of three specialized agents that work in a pipeline:
1. **Agent Cleaner**: Acts as a curator. It filters out "housekeeping" pathways (e.g., Ribosome, Spliceosome) that may be statistically significant but irrelevant to the specific biological context (e.g., tumor immunology), reducing noise.
2. **Agent Detective**: Acts as a systems biologist. It analyzes the gene list, looks for **Hub Genes** in Protein-Protein Interaction (PPI) networks, and combines this with Fold Change data to identify **Key Drivers** and infer regulatory mechanisms.
3. **Agent Storyteller**: Acts as a scientific writer. It synthesizes the findings from the Cleaner and Detective into a logical narrative, distinguishing between observations ("What"), mechanisms ("How"), and implications ("So What").
You can activate this deep mode using the `interpret_agent()` function.
```{r}
#| label: interpret-agent
#| eval: false
# Provide biological context to help Agent Cleaner
context <- "scRNA-seq analysis of CD8+ T cells in Tumor Microenvironment, comparing Exhausted vs. Naive states."
res <- interpret_agent(edo, context = context)
```
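The division of labour between the three agents can be pictured as three functions applied in sequence. The sketch below is purely illustrative: the real agents call an LLM, and none of these function names (`clean_terms()`, `rank_drivers()`, `tell_story()`) exist in `clusterProfiler`. The PPI degrees are made up, and the scoring rule (degree times absolute fold change) is an assumption chosen only to show how hub status and expression change might be combined.

```r
# Illustrative only: plain functions chained in the same order as the agents.
housekeeping <- c("Ribosome", "Spliceosome")

# "Cleaner": drop terms that are usually uninformative for the context
clean_terms <- function(terms) setdiff(terms, housekeeping)

# "Detective": rank genes by network degree weighted by |fold change|
rank_drivers <- function(degree, fold_change) {
  genes <- intersect(names(degree), names(fold_change))
  score <- degree[genes] * abs(fold_change[genes])
  names(sort(score, decreasing = TRUE))
}

# "Storyteller": assemble the findings into a short narrative
tell_story <- function(terms, drivers) {
  paste0("What: enriched terms are ", paste(terms, collapse = ", "),
         ". How: candidate drivers are ", paste(drivers, collapse = ", "), ".")
}

terms  <- c("Ribosome", "T cell receptor signaling pathway", "Spliceosome")
degree <- c(CD8A = 3, PDCD1 = 5, GZMB = 2)        # made-up PPI degrees
fc     <- c(CD8A = 2.5, PDCD1 = 1.8, GZMB = 3.2)  # example fold changes

kept    <- clean_terms(terms)
drivers <- rank_drivers(degree, fc)
cat(tell_story(kept, drivers), "\n")
```

The point of the pipeline shape is that each stage narrows or enriches the evidence before the next one sees it, so the final narrative is written from filtered terms and ranked genes rather than from the raw enrichment table.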
## Knowledge-Guided Interpretation
Enrichment analysis often treats genes as a "bag of words," ignoring their interactions and expression changes. The **Knowledge-Guided Interpretation** mode injects external knowledge to empower the AI's reasoning.
### 1. PPI Networks and Hub Genes
By setting `add_ppi = TRUE`, the system fetches protein-protein interaction data (from STRING). The AI can then identify functional modules (e.g., a TCR signaling complex) rather than just isolated genes.
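One simple way to think about hub genes is node degree in the interaction network. The following sketch counts degrees from a small hand-written edge list; it stands in for whatever interactions would be retrieved from STRING and does not reflect how `interpret()` processes them internally.

```r
# Toy edge list standing in for interactions retrieved from STRING
edges <- data.frame(
  from = c("CD3E", "CD3E", "CD3E", "LCK",   "ZAP70"),
  to   = c("CD3D", "LCK",  "ZAP70", "ZAP70", "LAT"),
  stringsAsFactors = FALSE
)

# Degree = number of edges touching each gene
degree <- sort(table(c(edges$from, edges$to)), decreasing = TRUE)

# Genes tied for the highest degree are treated as hubs
hubs <- names(degree)[as.vector(degree) == max(degree)]
print(degree)
print(hubs)
```

Genes that share many partners within the input list tend to mark a functional module, which is the kind of signal the PPI option is meant to expose to the model.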
### 2. Expression Trends
By providing `gene_fold_change`, the AI can infer the direction of pathway activity. For example, if the "Apoptosis" pathway is enriched but pro-apoptotic genes are downregulated while anti-apoptotic genes are upregulated, the AI can interpret this as "Apoptosis Resistance" rather than as active cell death.
```{r}
#| label: interpret-knowledge-guided
#| eval: false
# Prepare a named vector of fold changes
gene_list <- c("CD8A" = 2.5, "PDCD1" = 1.8, "GZMB" = 3.2)
res <- interpret(edo,
  task = "annotation",
  add_ppi = TRUE,               # Enable PPI network analysis
  gene_fold_change = gene_list  # Inject expression data
)
```
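The apoptosis example can be written out as arithmetic to show why fold changes alter the conclusion. This is a hand-rolled sketch, not the logic used by `interpret()`: the role annotations are assigned by hand for a few well-known apoptosis genes, and the log2 fold changes are made up.

```r
# Hand-assigned roles for a few apoptosis genes and made-up log2 fold changes
role   <- c(BAX = "pro", BAK1 = "pro", CASP3 = "pro", BCL2 = "anti", MCL1 = "anti")
log2fc <- c(BAX = -1.2, BAK1 = -0.8, CASP3 = -1.5, BCL2 = 2.0, MCL1 = 1.4)

# Flip the sign for anti-apoptotic genes so that a positive value always
# means "pushes toward apoptosis"
signed <- ifelse(role == "pro", 1, -1) * log2fc[names(role)]

direction <- if (mean(signed) < 0) "apoptosis resistance" else "apoptosis activation"
print(direction)
```

An enrichment test alone would report "Apoptosis" in both scenarios; only the signed expression data separates activation from resistance.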
## Reference-Guided Interpretation
For cell type annotation, LLMs can sometimes hallucinate labels that the data do not support. To reduce this risk, `clusterProfiler` supports **Reference-Guided Interpretation**.
### Prior Knowledge Injection
You can provide "prior knowledge" (e.g., results from SingleR, scGPT, or manual rough annotation) to the AI. The AI acts as a validator and refiner:
* **Validation**: Checks if pathway evidence supports the prior label.
* **Refinement**: Refines a broad label (e.g., "T cell") into a specific state (e.g., "Proliferating CD8+ T cell") based on pathway activity.
* **Correction**: Flags potential misannotations if the evidence contradicts the prior.
```{r}
#| label: interpret-prior-knowledge
#| eval: false
# Prior knowledge from SingleR
my_priors <- c("Cluster1" = "T cells")
res <- interpret(edo, prior = my_priors, task = "annotation")
```
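Reference-based tools typically return one label per cell, whereas the `prior` argument in the example above is a named vector with one label per cluster. A minimal sketch of collapsing per-cell labels by majority vote follows; the input vectors are made up, and the assumption that the names must match the cluster names in the enrichment result should be checked against the package documentation.

```r
# Toy per-cell data: cluster membership and a reference-based label per cell
# (for instance, what a tool such as SingleR would return); values are made up
cell_cluster <- c("Cluster1", "Cluster1", "Cluster1", "Cluster2", "Cluster2")
cell_label   <- c("T cells",  "T cells",  "NK cells", "B cells",  "B cells")

# Most frequent label within each cluster
majority <- function(x) names(which.max(table(x)))
votes <- tapply(cell_label, cell_cluster, majority)

# Convert the 1-d array returned by tapply() into a plain named vector
my_priors <- setNames(as.character(votes), names(votes))
print(my_priors)
```

Majority voting discards minority labels, so clusters with a close vote may deserve a broader prior (for example "Lymphocyte") than the winning label suggests.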
### Hierarchical Interpretation
For complex datasets, `interpret_hierarchical()` mimics the human thought process of annotating major lineages first (e.g., Myeloid) and then subtypes (e.g., M1 Macrophage). It enforces lineage constraints to prevent impossible annotations (e.g., a T cell subtype appearing within a Myeloid cluster).
This approach is also highly applicable to **Single-cell Trajectory Inference**. Developmental processes inherently follow a hierarchical structure (e.g., Stem Cell -> Progenitor -> Terminally Differentiated Cell). By utilizing this hierarchical relationship, `interpret_hierarchical()` can provide context-aware interpretations that respect the biological differentiation path, ensuring that downstream states are interpreted within the context of their upstream progenitors.
```{r}
#| eval: false
# Mapping between minor and major clusters
cluster_mapping <- c(
"SubCluster1_1" = "MajorCluster1",
"SubCluster1_2" = "MajorCluster1",
"SubCluster2_1" = "MajorCluster2"
)
# Hierarchical interpretation
res_hier <- interpret_hierarchical(
x_minor = enrich_minor,
x_major = enrich_major,
mapping = cluster_mapping
)
```
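Since the mapping drives the lineage constraints, it is worth validating before any model call is made. The helper below is a hypothetical pre-flight check written in base R, not part of `clusterProfiler`; it assumes the minor and major cluster names are available as character vectors.

```r
cluster_mapping <- c(
  "SubCluster1_1" = "MajorCluster1",
  "SubCluster1_2" = "MajorCluster1",
  "SubCluster2_1" = "MajorCluster2"
)
minor_clusters <- c("SubCluster1_1", "SubCluster1_2", "SubCluster2_1")
major_clusters <- c("MajorCluster1", "MajorCluster2")

# Hypothetical check: every minor cluster has a parent, and every parent exists
check_mapping <- function(mapping, minor, major) {
  unmapped <- setdiff(minor, names(mapping))
  unknown  <- setdiff(unname(mapping), major)
  if (length(unmapped) > 0) {
    stop("Minor clusters without a parent: ", paste(unmapped, collapse = ", "))
  }
  if (length(unknown) > 0) {
    stop("Parents not found among major clusters: ", paste(unknown, collapse = ", "))
  }
  # Return the children grouped by parent for inspection
  split(names(mapping), unname(mapping))
}

children <- check_mapping(cluster_mapping, minor_clusters, major_clusters)
print(children)
```

A sub-cluster with no parent, or a parent that is missing from the major-level result, would leave the subtype without a lineage to be constrained by.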
## Gene-Based Fallback Mode
In real-world research, enrichment analysis sometimes fails to return significant pathways due to small gene sets or background noise.
`clusterProfiler` introduces a **Gene-Based Fallback Mode**. When no enriched pathways are found, the agent does not simply error out. Instead, it:
1. Directly analyzes the function of the input genes.
2. Retrieves PPI networks for these genes.
3. Infers biological function based on gene connectivity and function, providing a "Medium Confidence" report instead of an empty result.
```{r}
#| eval: false
# When enrichment fails, this still works
res <- interpret_agent(tough_genes, add_ppi = TRUE)
```
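The switch between the two modes amounts to a check on whether any term survives the significance cutoff. The sketch below illustrates that decision on plain data frames; `choose_mode()` and the 0.05 cutoff are assumptions made for illustration and do not describe the package's internal rule.

```r
# Hypothetical decision rule, not the package's internal code
choose_mode <- function(enrich_table, cutoff = 0.05) {
  n_sig <- sum(enrich_table$p.adjust < cutoff, na.rm = TRUE)
  if (n_sig > 0) "pathway-based" else "gene-based fallback"
}

# Three toy enrichment tables: no rows, nothing significant, one significant
empty  <- data.frame(Description = character(0), p.adjust = numeric(0))
weak   <- data.frame(Description = "Some pathway", p.adjust = 0.4)
strong <- data.frame(Description = "Some pathway", p.adjust = 0.001)

print(c(empty  = choose_mode(empty),
        weak   = choose_mode(weak),
        strong = choose_mode(strong)))
```

Both an empty table and a table with only non-significant terms lead to the fallback, which matches the situation described above where enrichment returns nothing usable.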
## Visualization: From Story to Figure
Text reports are great, but graphical representations are often preferred for presentations. The `plot()` method for interpretation results uses `ggtangle` (a grammar of graphics for networks) to visualize the AI-inferred regulatory network.
The resulting plot highlights:
* **Key Drivers**: Central nodes identified by the AI.
* **Activation (Green)** / **Inhibition (Red)**: Regulatory relationships inferred from data.
* **Interactions (Grey)**: Physical associations.
```{r}
#| eval: false
# Visualize the interpretation result
plot(res)
```
This feature creates a closed loop: from **Enrichment** (Statistics) to **Interpretation** (Insight) and finally to **Visualization** (Communication).