Commit 8423151

Author: William Ryan

Add README with instructions

Bring configuration to TLD for ease-of-use
Update to 2025 GO libraries for enrichR

1 parent c5bea17 commit 8423151

6 files changed

Lines changed: 364 additions & 50 deletions

3PodR.R

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ list.files(here::here("R"), full.names = TRUE) %>%
   purrr::walk(source)

 # Setup report state
-configuration_yml <- here::here("extdata", "configuration.yml")
+configuration_yml <- here::here("configuration.yml")

 if(!file.exists(configuration_yml)) {
   stop("Configuration file not found")
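
Review note: this hunk changes the only location `3PodR.R` checks, so a checkout that still keeps the file under `extdata/` would now reach the `stop()` call. The following is a minimal sketch of a lookup that tries both locations. It assumes a hypothetical helper named `find_configuration()`, which is not part of 3PodR, and it uses base R `file.path()` instead of `here::here()` so that it runs without extra packages.

```r
# Hypothetical helper (not part of 3PodR): look for configuration.yml at the
# project root first, then fall back to the pre-commit location in extdata/.
find_configuration <- function(root = ".") {
  candidates <- c(
    file.path(root, "configuration.yml"),            # location after this commit
    file.path(root, "extdata", "configuration.yml")  # location before this commit
  )
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    stop("Configuration file not found. Looked in: ",
         paste(candidates, collapse = ", "))
  }
  # Prefer the first match, i.e. the top-level file when both exist.
  found[[1]]
}
```

Listing the searched paths in the error message is a design choice of this sketch; the committed code only reports that the file was not found.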

R/do_enrichr.R

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@ do_enrichr <- function(symbols, alpha = 0.05) {

   # TODO: re-implement enrichR locally using the GMT file?

-  dbs = c("GO_Biological_Process_2023",
-          "GO_Molecular_Function_2023",
-          "GO_Cellular_Component_2023")
+  dbs = c("GO_Biological_Process_2025",
+          "GO_Molecular_Function_2025",
+          "GO_Cellular_Component_2025")

   columns = c("Biological_Process",
               "Molecular_Function",

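Review note: the hunk above hard-codes new library names, so a typo or a library that Enrichr does not serve would only surface when the query runs. The following is a minimal sketch of an early check. It assumes a hypothetical helper named `check_enrichr_dbs()`, which is not part of 3PodR or enrichR, and it uses a hard-coded stand-in for the list of available libraries so that it runs offline.

```r
# Hypothetical helper (not part of 3PodR or enrichR): fail early when a
# requested Enrichr library name is not in the list of available libraries.
check_enrichr_dbs <- function(dbs, available) {
  missing <- setdiff(dbs, available)
  if (length(missing) > 0) {
    stop("Enrichr libraries not available: ", paste(missing, collapse = ", "))
  }
  invisible(dbs)
}

dbs <- c("GO_Biological_Process_2025",
         "GO_Molecular_Function_2025",
         "GO_Cellular_Component_2025")

# Stand-in list for illustration only. In a live session the names could
# instead come from enrichR, e.g.
#   available <- enrichR::listEnrichrDbs()$libraryName
# which needs network access.
available <- c(dbs, "GO_Biological_Process_2023")

check_enrichr_dbs(dbs, available)
```
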
README.Rmd

Lines changed: 144 additions & 5 deletions
@@ -1,20 +1,159 @@
 ---
-output: github_document
+title: README
+subtitle: 3PodR
+author: William G. Ryan V
+output:
+  github_document:
+    html_preview: false
+    toc: true
 ---

 <!-- README.md is generated from README.Rmd. Please edit that file -->

-```{r, include = FALSE}
+```{r, include=FALSE}
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"
 )
 ```

-# 3PodR
-
 <!-- badges: start -->
+[![R Markdown](https://img.shields.io/badge/RMarkdown-Analysis-blue.svg)](https://rmarkdown.rstudio.com/)
 <!-- badges: end -->

-3PodR.
+# Introduction
+
+3PodR is an R bookdown site for performing comprehensive differential gene expression analysis.
+
+This template processes differential gene expression data from transcriptomic studies and generates several key outputs, including Gene Set Enrichment Analysis (GSEA), Over-representation Analysis (ORA) and drug prediction analysis (iLINCS).
+
+# Installation & Usage
+
+Follow these step-by-step instructions to install and run 3PodR:
+
+## Clone the Repository
+
+Open a terminal and execute:
+
+```bash
+git clone https://github.com/willgryan/3PodR_bookdown.git
+```
+
+## Open the Project
+
+Change into the repository directory:
+
+```bash
+cd 3PodR_bookdown
+```
+
+Alternatively, open `3PodR_bookdown.Rproj` in [RStudio](https://posit.co/download/rstudio-desktop/).
+
+## Restore the Environment
+
+In R, run:
+
+```r
+renv::restore()
+```
+
+This command ensures that all required packages are installed.
+
+## Prepare Your Input Data
+
+3PodR takes one or more CSV files containing the results of testing for differential gene expression.
+
+The first three columns of each file must appear in the following order.
+
+**CSV Format Requirements:**
+
+- **Column 1:** Gene Symbols (character vector)
+- **Column 2:** Log2 Fold Change (numeric vector)
+- **Column 3:** Unadjusted P-values (numeric vector)
+
+> **Important:** Gene symbols must conform to the annotation standard for the species used (HGNC for human, RGD for rat, or MGI for mouse). Other identifier types (e.g., Ensembl or Entrez IDs) will cause errors.
+
+**Place Your CSV File:**
+
+Copy your differential gene expression CSV file into the `extdata/` folder (e.g., `extdata/YOUR_DGE_FILE.csv`).
+
+**Optional: Sample Metadata and Gene Counts**
+If you have sample metadata (e.g., group information) and gene counts, place them in the `extdata/` folder.
+
+
+Sample metadata should be in a CSV file named `design.csv`. The file should contain the following columns:
+
+**CSV Format Requirements:**
+
+- **Column 1:** Sample (character vector)
+- **Column 2:** Group (character vector)
+
+Gene counts in log units should be in a CSV file named `counts.csv`. The file should contain the following columns:
+
+**CSV Format Requirements:**
+
+- **Column 1:** Gene Symbols (character vector)
+- **Columns 2-N:** Sample Names (numeric vector)
+
+> **Important:** Ensure that the gene symbols in the `counts.csv` file match those in the differential gene expression file. The sample names in the `design.csv` file must match the column names in `counts.csv`. The group names in the `design.csv` file must match those in the report configuration file detailed below.
+
+## Configure the Analysis
+
+Edit the `configuration.yml` file in the top-level directory as follows:
+
+**Species:**
+
+- By default, the species is set to human. To use rat or mouse data, set the `species` variable to the appropriate species (e.g., `species: human`, `species: rat`, or `species: mouse`). You must also update the species indicated in the GMT file name by changing `Human` to `Rat` or `Mouse`.
+
+> **Important:** The species in the configuration file must be lowercase, whereas the species in the GMT file name must have its first letter capitalized (e.g., `Human`, `Rat`, or `Mouse`).
+
+**For Count and Expression Data:**
+
+- If you are not using count data, you **must** comment out the lines related to `design.csv` and `counts.csv` to avoid errors.
+
+**File Name Update:**
+
+- Change the `file` variable (default is `file: DAvsCA.csv` or `file: DBvsCB.csv`) to match your CSV file (e.g., `file: YOUR_INPUT_FILE.csv`).
+
+## Render the Report
+
+Generate the report by running the following in R:
+
+```r
+rmarkdown::render_site(encoding = 'UTF-8')
+```
+
+Alternatively, in RStudio, click the **Build Book** or **Build Website** button (a wrench) in the Build pane, typically in the top right of the window.
+
# Troubleshooting
129+
130+
If you encounter any issues, follow these troubleshooting steps:
131+
132+
## Verify Installation
133+
134+
- Run the example files provided in the `extdata/` folder.
135+
- Ensure all dependencies specified in the `renv.lock` file are installed.
136+
137+
## Verify Species
138+
139+
- Confirm that the species in the configuration file matches the species in the gmt file name.
140+
- Ensure that the species in the configuration file is lowercase and the species in the gmt file name is capitalized.
141+
142+
## Validate Input File Formats
143+
144+
- Confirm your differential gene expression CSV files are formatted correctly shown above.
145+
- If you are using count and expression data, ensure the `design.csv` and `counts.csv` files are correctly formatted as shown above.
146+
147+
## Check Gene Annotation
148+
149+
- Make sure gene symbols are in the proper format (HGNC for human, RGD for rat, or MGI for mouse).
150+
- Incorrect gene identifier formats (e.g., Ensembl or Entrez IDs) will lead to errors.
151+
152+
## Review Configuration
153+
154+
- Verify that the `extdata/configuration.yml` file is correctly set up, including file names and group names if using count data.
155+
- You can refer to the example datasets provided in the `extdata/` folder for guidance on formatting.
156+
157+
For additional help, open an issue for support.
20158

159+
Happy 3PodR'ing!
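
Review note: the README states the three-column input format but the diff shows no code that checks it. The following is a minimal sketch of such a check, assuming a hypothetical helper named `validate_dge()` that is not part of 3PodR. The column names and gene rows in the example table are made up for illustration, and the Ensembl pattern is only a rough heuristic.

```r
# Hypothetical helper (not part of 3PodR): check the first three columns of a
# differential expression table against the format described in the README.
# Returns a character vector of problems; empty means no problem was found.
validate_dge <- function(dge) {
  stopifnot(is.data.frame(dge), ncol(dge) >= 3)
  symbols <- dge[[1]]
  log2fc  <- dge[[2]]
  pvalue  <- dge[[3]]
  problems <- character(0)

  if (!is.character(symbols)) {
    # Purely numeric IDs (e.g. Entrez) are read as numbers and land here.
    problems <- c(problems, "column 1 (gene symbols) is not character")
  } else if (any(grepl("^ENS[A-Z]*G[0-9]+", symbols))) {
    problems <- c(problems, "column 1 contains Ensembl-style gene IDs")
  }
  if (!is.numeric(log2fc)) {
    problems <- c(problems, "column 2 (log2 fold change) is not numeric")
  }
  if (!is.numeric(pvalue)) {
    problems <- c(problems, "column 3 (p-values) is not numeric")
  } else if (any(pvalue < 0 | pvalue > 1, na.rm = TRUE)) {
    problems <- c(problems, "column 3 has p-values outside [0, 1]")
  }
  problems
}

# Illustrative table with made-up values.
dge <- read.csv(text = "Symbol,log2FoldChange,pvalue
TP53,1.2,0.001
BRCA1,-0.8,0.04", stringsAsFactors = FALSE)

validate_dge(dge)
```

Returning the list of problems instead of stopping at the first one lets a user fix a file in one pass; that is a choice made for this sketch only.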

README.md

Lines changed: 192 additions & 4 deletions
@@ -1,9 +1,197 @@
+README
+================
+William G. Ryan V

-<!-- README.md is generated from README.Rmd. Please edit that file -->
-
-# 3PodR
+- [Introduction](#introduction)
+- [Installation & Usage](#installation--usage)
+  - [Clone the Repository](#clone-the-repository)
+  - [Open the Project](#open-the-project)
+  - [Restore the Environment](#restore-the-environment)
+  - [Prepare Your Input Data](#prepare-your-input-data)
+  - [Configure the Analysis](#configure-the-analysis)
+  - [Render the Report](#render-the-report)
+- [Troubleshooting](#troubleshooting)
+  - [Verify Installation](#verify-installation)
+  - [Verify Species](#verify-species)
+  - [Validate Input File Formats](#validate-input-file-formats)
+  - [Check Gene Annotation](#check-gene-annotation)
+  - [Review Configuration](#review-configuration)

+<!-- README.md is generated from README.Rmd. Please edit that file -->
 <!-- badges: start -->
+
+[![R
+Markdown](https://img.shields.io/badge/RMarkdown-Analysis-blue.svg)](https://rmarkdown.rstudio.com/)
 <!-- badges: end -->

-3PodR.
+# Introduction
+
+3PodR is an R bookdown site for performing comprehensive differential
+gene expression analysis.
+
+This template processes differential gene expression data from
+transcriptomic studies and generates several key outputs, including Gene
+Set Enrichment Analysis (GSEA), Over-representation Analysis (ORA) and
+drug prediction analysis (iLINCS).
+
+# Installation & Usage
+
+Follow these step-by-step instructions to install and run 3PodR:
+
+## Clone the Repository
+
+Open a terminal and execute:
+
+``` bash
+git clone https://github.com/willgryan/3PodR_bookdown.git
+```
+
+## Open the Project
+
+Change into the repository directory:
+
+``` bash
+cd 3PodR_bookdown
+```
+
+Alternatively, open `3PodR_bookdown.Rproj` in
+[RStudio](https://posit.co/download/rstudio-desktop/).
+
+## Restore the Environment
+
+In R, run:
+
+``` r
+renv::restore()
+```
+
+This command ensures that all required packages are installed.
+
+## Prepare Your Input Data
+
+3PodR takes one or more CSV files containing the results of testing for
+differential gene expression.
+
+The first three columns of each file must appear in the following order.
+
+**CSV Format Requirements:**
+
+- **Column 1:** Gene Symbols (character vector)
+- **Column 2:** Log2 Fold Change (numeric vector)
+- **Column 3:** Unadjusted P-values (numeric vector)
+
+> **Important:** Gene symbols must conform to the annotation standard
+> for the species used (HGNC for human, RGD for rat, or MGI for mouse).
+> Other identifier types (e.g., Ensembl or Entrez IDs) will cause errors.
+
+**Place Your CSV File:**
+
+Copy your differential gene expression CSV file into the `extdata/`
+folder (e.g., `extdata/YOUR_DGE_FILE.csv`).
+
+**Optional: Sample Metadata and Gene Counts**
+If you have sample metadata (e.g., group information) and gene counts,
+place them in the `extdata/` folder.
+
+Sample metadata should be in a CSV file named `design.csv`. The file
+should contain the following columns:
+
+**CSV Format Requirements:**
+
+- **Column 1:** Sample (character vector)
+- **Column 2:** Group (character vector)
+
+Gene counts in log units should be in a CSV file named `counts.csv`. The
+file should contain the following columns:
+
+**CSV Format Requirements:**
+
+- **Column 1:** Gene Symbols (character vector)
+- **Columns 2-N:** Sample Names (numeric vector)
+
+> **Important:** Ensure that the gene symbols in the `counts.csv` file
+> match those in the differential gene expression file. The sample
+> names in the `design.csv` file must match the column names in
+> `counts.csv`. The group names in the `design.csv` file must match
+> those in the report configuration file detailed below.
+
+## Configure the Analysis
+
+Edit the `configuration.yml` file in the top-level directory as follows:
+
+**Species:**
+
+- By default, the species is set to human. To use rat or mouse data, set
+  the `species` variable to the appropriate species (e.g.,
+  `species: human`, `species: rat`, or `species: mouse`). You must also
+  update the species indicated in the GMT file name by changing `Human`
+  to `Rat` or `Mouse`.
+
+> **Important:** The species in the configuration file must be
+> lowercase, whereas the species in the GMT file name must have its
+> first letter capitalized (e.g., `Human`, `Rat`, or `Mouse`).
+
+**For Count and Expression Data:**
+
+- If you are not using count data, you **must** comment out the lines
+  related to `design.csv` and `counts.csv` to avoid errors.
+
+**File Name Update:**
+
+- Change the `file` variable (default is `file: DAvsCA.csv` or
+  `file: DBvsCB.csv`) to match your CSV file (e.g.,
+  `file: YOUR_INPUT_FILE.csv`).
+
+## Render the Report
+
+Generate the report by running the following in R:
+
+``` r
+rmarkdown::render_site(encoding = 'UTF-8')
+```
+
+Alternatively, in RStudio, click the **Build Book** or **Build Website**
+button (a wrench) in the Build pane, typically in the top right of the
+window.
+
+# Troubleshooting
+
+If you encounter any issues, follow these troubleshooting steps:
+
+## Verify Installation
+
+- Run the analysis on the example files provided in the `extdata/` folder.
+- Ensure all dependencies specified in the `renv.lock` file are
+  installed.
+
+## Verify Species
+
+- Confirm that the species in the configuration file matches the species
+  in the GMT file name.
+- Ensure that the species in the configuration file is lowercase and the
+  species in the GMT file name is capitalized.
+
+## Validate Input File Formats
+
+- Confirm your differential gene expression CSV files are formatted
+  correctly as shown above.
+- If you are using count and expression data, ensure the `design.csv`
+  and `counts.csv` files are correctly formatted as shown above.
+
+## Check Gene Annotation
+
+- Make sure gene symbols are in the proper format (HGNC for human, RGD
+  for rat, or MGI for mouse).
+- Incorrect gene identifier formats (e.g., Ensembl or Entrez IDs) will
+  lead to errors.
+
+## Review Configuration
+
+- Verify that the `configuration.yml` file is correctly set up,
+  including file names and group names if using count data.
+- You can refer to the example datasets provided in the `extdata/`
+  folder for guidance on formatting.
+
+For additional help, open an issue in the GitHub repository.
+
+Happy 3PodR’ing!
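
Review note: the README requires the optional `design.csv` and `counts.csv` inputs to agree with each other, with the differential expression file, and with the group names in the configuration. The following is a minimal sketch of that cross-check. It assumes a hypothetical helper named `check_design_counts()`, which is not part of 3PodR, and it uses made-up sample, group, and gene values. How group names are stored in `configuration.yml` is not shown in this commit, so they are passed in as a plain character vector.

```r
# Hypothetical helper (not part of 3PodR): cross-check the optional inputs.
# Each element of the result lists the names that fail one requirement.
check_design_counts <- function(design, counts, dge_symbols, config_groups) {
  sample_cols <- names(counts)[-1]  # columns 2..N of counts.csv are samples
  list(
    samples_missing_from_counts = setdiff(design[[1]], sample_cols),
    samples_missing_from_design = setdiff(sample_cols, design[[1]]),
    groups_missing_from_config  = setdiff(unique(design[[2]]), config_groups),
    genes_missing_from_dge      = setdiff(counts[[1]], dge_symbols)
  )
}

# Illustrative inputs with made-up values.
design <- data.frame(Sample = c("S1", "S2"),
                     Group  = c("Control", "Treated"),
                     stringsAsFactors = FALSE)
counts <- data.frame(Symbol = c("TP53", "BRCA1"),
                     S1 = c(5.1, 7.3),
                     S2 = c(6.0, 7.1),
                     stringsAsFactors = FALSE)

issues <- check_design_counts(design, counts,
                              dge_symbols   = c("TP53", "BRCA1"),
                              config_groups = c("Control", "Treated"))
lengths(issues)
```

When the inputs are consistent, every element of `issues` is empty; any non-empty element names the entries to fix.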
