Example in Section 5.2 - Over-representation analysis

There is an issue in the example for section 5.2, over-representation analysis:


> **Example:** Suppose we have 17,980 genes detected in a Microarray study and 57 genes were differentially expressed. Among the differentially expressed genes, 28 are annotated to a gene set^[example adopted from <https://guangchuangyu.github.io/2012/04/enrichment-analysis/>]. 
> 
> 
> ```{r}
> d <- data.frame(gene.not.interest=c(2613, 15310), gene.in.interest=c(28, 29))
> row.names(d) <- c("In_category", "not_in_category")
> d
> ```                        
> 
> 
> Whether the overlap(s) of 25 genes are significantly over represented in the gene set can be assessed using a hypergeometric distribution. This corresponds to a one-sided version of Fisher's exact test.
> 
> ```{r}
> fisher.test(d, alternative = "greater")
> ```


In over-representation analysis, the question is: "what is the probability of observing at least as many DE genes in the gene set (ontology term) as we actually observed?"
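
Written out, this probability is the upper tail of a hypergeometric distribution. In the formula below, N is the number of detected genes, K the number of genes annotated to the gene set, n the number of DE genes and k the number of DE genes in the gene set. These symbols are introduced here for illustration. With the counts from the example, N = 17,980, K = 28 + 2,613 = 2,641, n = 57 and k = 28:

$$
p = P(X \ge k) = \sum_{i=k}^{\min(n,\,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}, \qquad N = 17980,\ K = 2641,\ n = 57,\ k = 28.
$$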

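However, the `alternative` argument of `fisher.test` can be read as "which values of the **top-left** cell of the supplied 2x2 table count as more extreme". Under the null hypothesis, `alternative = "greater"` sums the probabilities of top-left counts at least as large as the observed one. In the current data.frame, the top-left cell holds the 2,613 genes that are in the category but *not* DE, so the test looks in the wrong direction.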

The data.frame should therefore have its columns swapped so that the DE genes come first:

| | gene in interest | gene not in interest |
|---|---|---|
| in_category | 28 | 2613 |
| not_in_category | 29 | 15310 |
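
The direction problem can be sketched in R with the counts from the example. The object names `d_book` and `d_fixed` are made up for this illustration. With the book's column order, `alternative = "greater"` should return a p-value close to 1. With the columns swapped, it should return a very small p-value.

```r
# Column order used in the book: top-left cell = in category but NOT DE (2613)
d_book <- data.frame(gene.not.interest = c(2613, 15310), gene.in.interest = c(28, 29))
row.names(d_book) <- c("In_category", "not_in_category")

# Proposed column order: top-left cell = in category AND DE (28)
d_fixed <- d_book[, c("gene.in.interest", "gene.not.interest")]

# fisher.test() converts the data.frame to a 2x2 matrix; "greater" refers to the
# odds ratio x[1,1] * x[2,2] / (x[1,2] * x[2,1]), i.e. to large top-left counts
p_book  <- fisher.test(d_book,  alternative = "greater")$p.value
p_fixed <- fisher.test(d_fixed, alternative = "greater")$p.value

print(c(book_order = p_book, fixed_order = p_fixed))
```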



Corrected example:
> **Example:** Suppose we have 17,980 genes detected in a microarray study and 57 genes were differentially expressed. Among the differentially expressed genes, 28 are annotated to a gene set^[example adapted from <https://guangchuangyu.github.io/2012/04/enrichment-analysis/>]. 
> 
> 
> ```{r}
> d <- data.frame(gene.in.interest=c(28, 29), gene.not.interest=c(2613, 15310))
> row.names(d) <- c("In_category", "not_in_category")
> d
> ```                        
> 
> 
> Whether the overlap of 28 genes is significantly over-represented in the gene set can be assessed using a hypergeometric distribution. This corresponds to a one-sided version of Fisher's exact test.
> 
> ```{r}
> fisher.test(d, alternative = "greater")
> ```
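
To check that the one-sided Fisher test is the hypergeometric upper tail, you can recompute the p-value directly with `phyper()`. This is a minimal sketch, and the variable names are illustrative. `phyper(q, m, n, k, lower.tail = FALSE)` returns P(X > q), so `q = 28 - 1` gives P(X ≥ 28).

```r
# Counts from the example (variable names are illustrative)
n_total   <- 17980       # genes detected
n_set     <- 28 + 2613   # genes annotated to the gene set (2641)
n_de      <- 57          # differentially expressed genes
n_overlap <- 28          # DE genes annotated to the gene set

# Upper tail P(X >= n_overlap) of the hypergeometric distribution
p_hyper <- phyper(n_overlap - 1, m = n_set, n = n_total - n_set, k = n_de,
                  lower.tail = FALSE)

# One-sided Fisher's exact test on the corrected table
d <- data.frame(gene.in.interest = c(28, 29), gene.not.interest = c(2613, 15310))
row.names(d) <- c("In_category", "not_in_category")
p_fisher <- fisher.test(d, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
all.equal(p_hyper, p_fisher)
```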



-----------

Alternatively, one can keep the current column order and use `alternative = "less"`. Both calls give the same p-value, but I find "less" harder to relate to the original question ("at least as many"):

> ```
> d1 <- data.frame(gene.in.interest=c(28, 29), gene.not.interest=c(2613, 15310))
> row.names(d1) <- c("In_category", "not_in_category")
> fisher.test(d1, alternative = "greater")
> ```

```
	Fisher's Exact Test for Count Data

data:  d1
p-value = 7.879e-10
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 3.524092      Inf
sample estimates:
odds ratio
   5.65631 
```

> ```
> d2 <- data.frame(gene.not.interest=c(2613, 15310), gene.in.interest=c(28, 29) )
> row.names(d2) <- c("In_category", "not_in_category")
> fisher.test(d2, alternative = "less")
> ```
```
	Fisher's Exact Test for Count Data

data:  d2
p-value = 7.879e-10
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
 0.000000 0.283761
sample estimates:
odds ratio 
 0.1767937
```
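
The equivalence of the two calls can also be checked programmatically rather than by comparing printed output. In this sketch, the object names are illustrative. The one-sided p-values should agree. Swapping the columns inverts the odds ratio, so the two conditional MLE estimates should multiply to approximately 1. They are found by numerical root finding, so the product is not exactly 1.

```r
d1 <- data.frame(gene.in.interest = c(28, 29), gene.not.interest = c(2613, 15310))
d2 <- data.frame(gene.not.interest = c(2613, 15310), gene.in.interest = c(28, 29))

res_greater <- fisher.test(d1, alternative = "greater")
res_less    <- fisher.test(d2, alternative = "less")

# Same one-sided p-value, computed from opposite tails of the same distribution
stopifnot(isTRUE(all.equal(res_greater$p.value, res_less$p.value)))

# Reciprocal odds-ratio estimates: the product should be close to 1
print(unname(res_greater$estimate * res_less$estimate))
```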

Example in Section 5.2 - Over-representation analysis #37