There is an issue in the example for section 5.2, over-representation analysis:
Example: Suppose we have 17,980 genes detected in a Microarray study and 57 genes were differentially expressed. Among the differentially expressed genes, 28 are annotated to a gene set^[example adopted from https://guangchuangyu.github.io/2012/04/enrichment-analysis/].
d <- data.frame(gene.not.interest=c(2613, 15310), gene.in.interest=c(28, 29))
row.names(d) <- c("In_category", "not_in_category")
d
Whether the overlap(s) of 25 genes are significantly over represented in the gene set can be assessed using a hypergeometric distribution. This corresponds to a one-sided version of Fisher's exact test.
fisher.test(d, alternative = "greater")
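In Over-Representation Analysis, the question is "what is the probability of observing at least this many DE genes from the ontology?"
However, the alternative argument of fisher.test can be read as "which alternative values of the top-left cell of the provided 2x2 matrix should be considered". With the columns in their current order, the top-left cell holds the 2613 genes that are in the category but not of interest, so alternative = "greater" does not ask about the 28 overlapping genes.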
The data.frame should thus be (columns are permuted):
| | gene in interest | gene not in interest |
| --- | --- | --- |
| in_category | 28 | 2613 |
| not_in_category | 29 | 15310 |
Correct code:
Example: Suppose we have 17,980 genes detected in a Microarray study and 57 genes were differentially expressed. Among the differentially expressed genes, 28 are annotated to a gene set^[example adopted from https://guangchuangyu.github.io/2012/04/enrichment-analysis/].
d <- data.frame(gene.in.interest=c(28, 29), gene.not.interest=c(2613, 15310))
row.names(d) <- c("In_category", "not_in_category")
d
Whether the overlap of 28 genes is significantly over-represented in the gene set can be assessed using a hypergeometric distribution. This corresponds to a one-sided version of Fisher's exact test.
fisher.test(d, alternative = "greater")
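As a sanity check on the statement that this is a one-sided hypergeometric test, the same p-value can be computed directly with phyper. This is only a sketch: the variable names (n_category, n_not_category, n_de, overlap) are mine and do not come from the book, and the matrix simply restates the counts of the corrected data.frame.

```r
# Counts from the example: rows = in / not in category, columns = DE / not DE.
d <- matrix(c(28, 29, 2613, 15310), nrow = 2,
            dimnames = list(c("In_category", "not_in_category"),
                            c("gene.in.interest", "gene.not.interest")))

n_category     <- sum(d["In_category", ])        # genes annotated to the gene set
n_not_category <- sum(d["not_in_category", ])    # genes not annotated to it
n_de           <- sum(d[, "gene.in.interest"])   # differentially expressed genes
overlap        <- d["In_category", "gene.in.interest"]  # DE genes in the set

# P(X >= overlap): upper tail of the hypergeometric distribution.
# phyper(q, ...) with lower.tail = FALSE gives P(X > q), hence overlap - 1.
p_hyper <- phyper(overlap - 1, n_category, n_not_category, n_de,
                  lower.tail = FALSE)

# One-sided Fisher's exact test on the corrected table.
p_fisher <- fisher.test(d, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

Both values should agree with the p-value reported by fisher.test on the corrected table, which is what "at least as many" means once the overlap sits in the top-left cell.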
Alternatively, keeping the data.frame as it currently is in the book, one can use alternative = "less", but I find that harder to link to the original question ("at least as many"). Both versions give the same p-value:
d1 <- data.frame(gene.in.interest=c(28, 29), gene.not.interest=c(2613, 15310))
row.names(d1) <- c("In_category", "not_in_category")
fisher.test(d1, alternative = "greater")
Fisher's Exact Test for Count Data
data: d1
p-value = 7.879e-10
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
3.524092 Inf
sample estimates:
odds ratio
5.65631
d2 <- data.frame(gene.not.interest=c(2613, 15310), gene.in.interest=c(28, 29) )
row.names(d2) <- c("In_category", "not_in_category")
fisher.test(d2, alternative = "less")
Fisher's Exact Test for Count Data
data: d2
p-value = 7.879e-10
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
0.000000 0.283761
sample estimates:
odds ratio
0.1767937
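To make the problem with the book's current code concrete, here is a small sketch comparing the three combinations of column order and alternative. The names d_book, d_fixed, p_book, p_fixed and p_less are illustrative only.

```r
# Column order as currently printed in the book.
d_book <- data.frame(gene.not.interest = c(2613, 15310),
                     gene.in.interest  = c(28, 29),
                     row.names = c("In_category", "not_in_category"))

# Same counts with the columns permuted, as proposed above.
d_fixed <- d_book[, c("gene.in.interest", "gene.not.interest")]

# Book's code: "greater" now refers to the 2613 cell, i.e. it tests depletion.
p_book  <- fisher.test(d_book,  alternative = "greater")$p.value
# Proposed fix: "greater" refers to the 28 overlapping genes.
p_fixed <- fisher.test(d_fixed, alternative = "greater")$p.value
# Alternative fix: keep the book's column order and flip the alternative.
p_less  <- fisher.test(d_book,  alternative = "less")$p.value

print(c(book = p_book, fixed = p_fixed, less = p_less))
```

With the book's column order and alternative = "greater", the p-value comes out close to 1, whereas the two corrected variants agree with each other.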