A comprehensive bioinformatics pipeline for analyzing Affymetrix microarray gene expression data, demonstrating end-to-end workflows from raw data processing to differential expression analysis and pathway enrichment.
This project reanalyzes Affymetrix microarray data from *The genomic response of the retinal pigment epithelium to light damage and retinal detachment* (Rattner & Bhanu, 2008). The analysis investigates transcriptional changes in Retinal Pigment Epithelium (RPE) cells following ocular light damage.
Key Finding: Quality control uncovered previously unreported RNA degradation in two samples (DARCR1, LDRR1) that affected the original study's reproducibility, which underscores the importance of comprehensive QC in microarray analysis.
| Category | Technologies |
|---|---|
| Programming | R, R Markdown |
| Bioinformatics | Affymetrix .CEL file processing, Gene expression analysis, Pathway enrichment |
| Statistical Methods | Linear models, Multiple testing correction (FDR), Principal Component Analysis |
| R/Bioconductor Packages | affy, limma, simpleaffy, Biobase, clusterProfiler, GEOquery, mouse4302.db |
| Data Visualization | ggplot2, Base R graphics, Volcano plots, Heatmaps, PCA plots |
| Reproducibility | renv (package management), R Markdown (literate programming) |
| Data Sources | GEO (Gene Expression Omnibus), MSigDB Hallmark gene sets |
Raw .CEL Files (6 samples)
│
▼
┌───────────────────────────┐
│ Quality Control (QC) │
│ • Density plots │
│ • Box plots │
│ • MA plots │
│ • RNA degradation │
└───────────────────────────┘
│
▼
┌───────────────────────────┐
│ RMA Normalization │
│ • Background correction │
│ • Quantile normalization │
│ • Log2 transformation │
└───────────────────────────┘
│
▼
┌───────────────────────────┐
│ Exploratory Analysis │
│ • Hierarchical clustering│
│ • PCA visualization │
└───────────────────────────┘
│
▼
┌───────────────────────────┐
│ Differential Expression │
│ • limma linear models │
│ • eBayes moderation │
│ • Volcano plots │
└───────────────────────────┘
│
▼
┌───────────────────────────┐
│ Pathway Enrichment │
│ • CAMERA gene set test │
│ • Hallmark pathways │
└───────────────────────────┘
├── analysis.rmd # Main R Markdown analysis script
├── GSE13074_targets.txt # Sample metadata
├── *.CEL # Raw Affymetrix data files (6 samples)
├── Mm.h.all.v7.1.entrez.rds # MSigDB Hallmark gene sets
├── plots/ # Generated visualizations
│ ├── PreNormDensity.png # Pre-normalization density plot
│ ├── PreNormBox.png # Pre-normalization box plot
│ ├── PostNormBox.png # Post-normalization box plot
│ ├── RNAdeg.png # RNA degradation analysis
│ ├── QCStats.png # Quality control statistics
│ ├── Dendrogram.png # Hierarchical clustering
│ ├── PCA_Plot.png # Principal component analysis
│ └── VolcanoDEG.png # Differential expression volcano plot
├── output_files/ # Analysis results
│ ├── topTableResults.txt # Full differential expression results
│ ├── FoldChangeTable.txt # Fold change calculations
│ └── SortedCameraEnrichment.csv # Pathway enrichment results
└── renv/ # Package version management
- R (version 4.0.x recommended for full compatibility)
- RStudio (optional but recommended)
- Clone the repository:

      git clone https://github.com/yourusername/affymetrix-microarray-data-analysis.git
      cd affymetrix-microarray-data-analysis

- Restore the R environment:

      # Install renv if not already installed
      install.packages("renv")
      # Restore project dependencies
      renv::restore()

- Open `analysis.rmd` in RStudio and run the analysis.
Note: The `simpleaffy` package was removed from Bioconductor 3.13. For full QC functionality, use R 4.0.x with an earlier Bioconductor version. The R Installation Manager (rig) can help manage multiple R versions.
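
A quick way to tell whether the current R library can run the simpleaffy-based QC steps is to test for the package before calling it. The snippet below is a minimal sketch in base R; the variable name `has_simpleaffy` and the messages are illustrative and are not taken from `analysis.rmd`.

```r
# Check whether simpleaffy is installed in the active R library (base R only).
# `has_simpleaffy` is a hypothetical helper variable, not part of analysis.rmd.
has_simpleaffy <- requireNamespace("simpleaffy", quietly = TRUE)

if (has_simpleaffy) {
  message("simpleaffy found: simpleaffy-based QC steps can run.")
} else {
  message("simpleaffy not found (running ", R.version.string,
          "): skip simpleaffy-based QC or switch to an R 4.0.x library.")
}
```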
Ideal Affymetrix samples show similar log intensity peaks. Two samples (DARCR1, LDRR1) exhibit significantly different peak heights, indicating potential quality issues.
Expression distributions reveal that DARCR1 and LDRR1 have similar means despite belonging to different experimental groups, an unexpected pattern that suggests sample quality problems.
After RMA normalization, samples show comparable medians, enabling unbiased differential expression analysis.
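
To illustrate why normalized samples end up with comparable medians, here is a simplified base R sketch of the quantile normalization step. It runs on simulated values, not on the GSE13074 data, and it is not a substitute for `affy::rma()`, which also performs background correction and probe-set summarization. The function name `quantile_normalize` is hypothetical.

```r
# Toy log2 intensities: 100 probes x 6 samples with deliberately shifted
# sample means (simulated values, NOT data from GSE13074).
set.seed(1)
samples <- c("DARCR1", "DARCR2", "DARCR3", "LDRR1", "LDRR2", "LDRR3")
toy <- matrix(
  rnorm(600, mean = rep(c(7.0, 7.5, 8.0, 6.5, 7.2, 7.8), each = 100)),
  nrow = 100,
  dimnames = list(paste0("probe", 1:100), samples)
)

quantile_normalize <- function(x) {
  # Reference distribution: mean of the sorted values across samples.
  reference <- rowMeans(apply(x, 2, sort))
  # Give each value the reference quantile matching its within-sample rank.
  normalized <- apply(x, 2, function(col) {
    reference[rank(col, ties.method = "first")]
  })
  dimnames(normalized) <- dimnames(x)
  normalized
}

normalized <- quantile_normalize(toy)
round(apply(toy, 2, median), 2)         # medians differ before normalization
round(apply(normalized, 2, median), 2)  # medians coincide afterwards
```

Because every column is mapped onto the same reference distribution, the samples share identical quantiles after this step, which is what the post-normalization box plot reflects.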
PCA reveals that DARCR1 and LDRR1 cluster together rather than with their respective groups, confirming within-group variance exceeds between-group variance for these samples.
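
A sample-level PCA of this kind can be sketched with base R's `prcomp()`. The example below uses simulated expression values (not the study data), and the object names `expr` and `scores` are placeholders; the actual call in `analysis.rmd` may differ.

```r
# Simulated log2 expression: 200 probes x 6 samples (NOT data from GSE13074).
set.seed(42)
samples <- c("DARCR1", "DARCR2", "DARCR3", "LDRR1", "LDRR2", "LDRR3")
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(paste0("probe", 1:200), samples))
# Shift the first 50 probes upward in the treatment columns.
expr[1:50, 4:6] <- expr[1:50, 4:6] + 3

# prcomp() expects observations in rows, so transpose to samples x probes.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

scores <- data.frame(
  sample = samples,
  group  = rep(c("Control", "Treatment"), each = 3),
  PC1    = pca$x[, 1],
  PC2    = pca$x[, 2]
)
print(scores)
round(var_explained[1:2], 3)  # proportion of variance on PC1 and PC2
```

Plotting `PC1` against `PC2` and coloring by `group` shows whether samples sit with their own group or, as reported for DARCR1 and LDRR1, with each other.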
Hierarchical clustering confirms the anomalous behavior of DARCR1 and LDRR1, prompting deeper QC investigation.
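
The clustering step can be sketched in base R as follows. The data are simulated, and the Euclidean distance and average linkage are assumptions made for illustration; the README does not state which settings the analysis uses.

```r
# Simulated log2 expression: 200 probes x 6 samples (NOT data from GSE13074).
set.seed(7)
samples <- c("DARCR1", "DARCR2", "DARCR3", "LDRR1", "LDRR2", "LDRR3")
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(paste0("probe", 1:200), samples))
expr[1:50, 4:6] <- expr[1:50, 4:6] + 3  # simulated treatment effect

# Distance between samples (rows of the transposed matrix), then clustering.
sample_dist <- dist(t(expr))                      # Euclidean (assumed)
tree <- hclust(sample_dist, method = "average")   # linkage is an assumption

# Cut the tree into two clusters and compare against the known groups.
clusters <- cutree(tree, k = 2)
table(cluster = clusters,
      group = rep(c("Control", "Treatment"), each = 3))
# plot(tree) draws the dendrogram.
```

A sample that lands in the other group's cluster in this cross-tabulation is a candidate for the closer QC inspection described here.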
All samples show RNA degradation (steep slopes), but DARCR1 and LDRR1 exhibit more severe degradation patterns.
Present call percentages are low across all samples, with DARCR1 and LDRR1 showing substantially lower values than their replicate group members.
Conclusion: Heavy RNA degradation in DARCR1 and LDRR1 resulted in similar transcriptional profiles despite different experimental conditions. This finding was not reported in the original publication and may explain partial irreproducibility of results.
| Gene | Log2 Fold Change | Adjusted P-value | Function |
|---|---|---|---|
| Mmp3 | 5.61 | 0.0038 | Matrix metalloprotease |
| Serpina3n | 4.47 | 0.0029 | Serine protease inhibitor |
| Serpinb1a | 4.23 | 0.0028 | Serine protease inhibitor |
| Ms4a4c | 2.99 | 0.0028 | Membrane-spanning protein |
Validation: 3 of 4 genes reported in the original study (Mmp3, Serpina3n, Serpinb1a) were successfully reproduced with adjusted p-value < 0.05.
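
For readers unfamiliar with adjusted p-values, the sketch below shows the Benjamini-Hochberg (FDR) adjustment in base R, both by hand and via `p.adjust()`. BH is the default adjustment in limma's `topTable()`; that this analysis kept the default is an assumption. The p-values are made up for illustration.

```r
# Illustrative raw p-values (made up, not results from this analysis).
p <- c(0.001, 0.008, 0.039, 0.041, 0.2)

# Built-in Benjamini-Hochberg adjustment.
adj_builtin <- p.adjust(p, method = "BH")

# The same adjustment by hand: scale each p-value by n / rank, then enforce
# monotonicity by taking the running minimum from the largest p downward.
n <- length(p)
ord <- order(p, decreasing = TRUE)
adj_manual <- numeric(n)
adj_manual[ord] <- pmin(1, cummin(p[ord] * n / (n:1)))

data.frame(p = p, adj_builtin = adj_builtin, adj_manual = adj_manual)
```

A gene is called significant at a 5% false discovery rate when its adjusted p-value is below 0.05, which is the threshold used for the validation above.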
| Pathway | P-value | FDR | Direction |
|---|---|---|---|
| IL6_JAK_STAT3_SIGNALING | 6.44e-18 | 3.22e-16 | Up |
| INTERFERON_GAMMA_RESPONSE | 1.70e-16 | 3.22e-15 | Up |
| INTERFERON_ALPHA_RESPONSE | 1.93e-16 | 3.22e-15 | Up |
| TNFA_SIGNALING_VIA_NFKB | 8.45e-08 | 7.04e-07 | Up |
| INFLAMMATORY_RESPONSE | 7.35e-05 | 4.09e-04 | Up |
Biological Interpretation: Strong enrichment of interferon signaling and inflammatory pathways aligns with expected immune responses to light-induced retinal damage.
| Sample | Group | Description |
|---|---|---|
| DARCR1, DARCR2, DARCR3 | Control | Dark ambient conditions (n=3) |
| LDRR1, LDRR2, LDRR3 | Treatment | Light damage + retinal detachment (n=3) |
Platform: Affymetrix Mouse430-2 GeneChip

Data Source: GEO Series GSE13074
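
The two-group layout above maps onto a design matrix for a linear model. The following base R sketch shows one common parameterization (intercept plus treatment effect); the exact design used in `analysis.rmd` is not shown in this README, so treat the factor levels and column names here as assumptions.

```r
# Sample names and groups as listed in the experimental design table.
samples <- c("DARCR1", "DARCR2", "DARCR3", "LDRR1", "LDRR2", "LDRR3")
group <- factor(rep(c("Control", "Treatment"), each = 3),
                levels = c("Control", "Treatment"))

# Intercept = Control mean; groupTreatment = Treatment minus Control (log2 scale).
design <- model.matrix(~ group)
rownames(design) <- samples
design
```

A matrix of this shape is what `limma::lmFit()` takes as its `design` argument; with this parameterization, the `groupTreatment` coefficient is the log2 fold change of treatment over control.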
- Quality control is essential: Comprehensive QC revealed sample quality issues not reported in the original publication
- RNA degradation affects reproducibility: Degraded samples from different groups clustered together, confounding biological signal
- Results partially validated: 3 of 4 key genes from the original study were reproduced despite sample quality issues
- Pathway analysis confirms biology: Enriched inflammatory and interferon pathways are consistent with light damage response
- Rattner, A., & Bhanu, N. (2008). The genomic response of the retinal pigment epithelium to light damage and retinal detachment. Journal of Neuroscience, 28(39), 9880-9889. DOI: 10.1523/JNEUROSCI.2401-08.2008
This project is for educational and research purposes.







