GenTech2025/affymetrix-microarray-data-analysis

Affymetrix Microarray Data Analysis Pipeline

A comprehensive bioinformatics pipeline for analyzing Affymetrix microarray gene expression data, demonstrating end-to-end workflows from raw data processing to differential expression analysis and pathway enrichment.

Project Summary

This project reanalyzes Affymetrix microarray data from "The genomic response of the retinal pigment epithelium to light damage and retinal detachment" (Rattner & Bhanu, 2008). The analysis investigates transcriptional changes in Retinal Pigment Epithelium (RPE) cells following ocular light damage.

Key Finding: Rigorous quality control uncovered previously unreported RNA degradation in two samples (DARCR1, LDRR1) that affected the original study's reproducibility. This demonstrates the importance of comprehensive QC in microarray analysis.


Skills & Technologies

| Category | Technologies |
|---|---|
| Programming | R, R Markdown |
| Bioinformatics | Affymetrix .CEL file processing, Gene expression analysis, Pathway enrichment |
| Statistical Methods | Linear models, Multiple testing correction (FDR), Principal Component Analysis |
| R/Bioconductor Packages | affy, limma, simpleaffy, Biobase, clusterProfiler, GEOquery, mouse4302.db |
| Data Visualization | ggplot2, Base R graphics, Volcano plots, Heatmaps, PCA plots |
| Reproducibility | renv (package management), R Markdown (literate programming) |
| Data Sources | GEO (Gene Expression Omnibus), MSigDB Hallmark gene sets |

Analysis Pipeline

Raw .CEL Files (6 samples)
        │
        ▼
┌───────────────────────────┐
│   Quality Control (QC)    │
│  • Density plots          │
│  • Box plots              │
│  • MA plots               │
│  • RNA degradation        │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│   RMA Normalization       │
│  • Background correction  │
│  • Quantile normalization │
│  • Log2 transformation    │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│  Exploratory Analysis     │
│  • Hierarchical clustering│
│  • PCA visualization      │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│  Differential Expression  │
│  • limma linear models    │
│  • eBayes moderation      │
│  • Volcano plots          │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│  Pathway Enrichment       │
│  • CAMERA gene set test   │
│  • Hallmark pathways      │
└───────────────────────────┘
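
The quantile normalization step listed in the RMA box can be illustrated on a toy matrix. This is a minimal base R sketch, not the code used in analysis.rmd: the function name `quantile_normalize` and the input values are made up for illustration, and ties are handled naively. In a real run, normalization would normally be left to a Bioconductor routine such as `affy::rma()`.

```r
# Toy sketch of quantile normalization, base R only.
# Rows are probes, columns are arrays; the values are invented.
quantile_normalize <- function(x) {
  # Rank of every value within its own column (array)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns, rank by rank
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

toy <- cbind(A = c(5, 2, 3, 4), B = c(4, 1, 4, 2), C = c(3, 4, 6, 8))
normalized <- quantile_normalize(toy)

# After normalization every array shares the same set of values,
# so box plots of the columns line up
apply(normalized, 2, sort)
```

Because each column is mapped onto the same reference distribution, the sorted columns of `normalized` are identical, which is why post-normalization box plots show matching quartiles.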

Project Structure

├── analysis.rmd                    # Main R Markdown analysis script
├── GSE13074_targets.txt            # Sample metadata
├── *.CEL                           # Raw Affymetrix data files (6 samples)
├── Mm.h.all.v7.1.entrez.rds        # MSigDB Hallmark gene sets
├── plots/                          # Generated visualizations
│   ├── PreNormDensity.png          # Pre-normalization density plot
│   ├── PreNormBox.png              # Pre-normalization box plot
│   ├── PostNormBox.png             # Post-normalization box plot
│   ├── RNAdeg.png                  # RNA degradation analysis
│   ├── QCStats.png                 # Quality control statistics
│   ├── Dendrogram.png              # Hierarchical clustering
│   ├── PCA_Plot.png                # Principal component analysis
│   └── VolcanoDEG.png              # Differential expression volcano plot
├── output_files/                   # Analysis results
│   ├── topTableResults.txt         # Full differential expression results
│   ├── FoldChangeTable.txt         # Fold change calculations
│   └── SortedCameraEnrichment.csv  # Pathway enrichment results
└── renv/                           # Package version management

Getting Started

Prerequisites

  • R (version 4.0.x recommended for full compatibility)
  • RStudio (optional but recommended)

Installation

  1. Clone the repository:

         git clone https://github.com/yourusername/affymetrix-microarray-data-analysis.git
         cd affymetrix-microarray-data-analysis

  2. Restore the R environment:

         # Install renv if not already installed
         install.packages("renv")

         # Restore project dependencies
         renv::restore()

  3. Open analysis.rmd in RStudio and run the analysis.

Note: The simpleaffy package was removed from Bioconductor in release 3.13. For full QC functionality, use R 4.0.x with Bioconductor 3.12 or earlier. The R Installation Manager (rig) can help manage multiple R versions.
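
Before running the QC statistics step, it may help to check whether the current R session can load simpleaffy at all. The snippet below is a small sketch using only base R; the variable names are illustrative, and it only reports status without installing anything.

```r
# Report the running R version and whether simpleaffy is installed.
r_version <- as.character(getRversion())
has_simpleaffy <- requireNamespace("simpleaffy", quietly = TRUE)

if (has_simpleaffy) {
  message("R ", r_version, ": simpleaffy is installed; the QC statistics step should run.")
} else {
  message("R ", r_version, ": simpleaffy is not installed; the QC statistics step will be unavailable.")
}
```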


Results

Quality Control Findings

Pre-Normalization Density Plot

Ideal Affymetrix samples show similar log intensity peaks. Two samples (DARCR1, LDRR1) exhibit markedly different peak heights, indicating potential quality issues.

Pre-Normalization Box Plot

Expression distributions reveal that DARCR1 and LDRR1 have similar means despite belonging to different experimental groups. This unexpected pattern suggests sample quality problems.

Post-Normalization Box Plot

After RMA normalization, samples show comparable medians, enabling unbiased differential expression analysis.


Exploratory Analysis

Principal Component Analysis

PCA shows that DARCR1 and LDRR1 cluster together rather than with their respective groups, indicating that within-group variance exceeds between-group variance for these samples.
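
A sample-level PCA of this kind can be sketched with base R's `prcomp()`. The expression matrix below is simulated, not the GSE13074 data, so it does not reproduce the clustering described above; only the sample names are taken from this README, and the centering and scaling choices are assumptions.

```r
# Sketch: PCA of samples with base R on a simulated expression matrix.
# Rows are probes, columns are samples.
set.seed(1)
samples <- c("DARCR1", "DARCR2", "DARCR3", "LDRR1", "LDRR2", "LDRR3")
expr <- matrix(rnorm(200 * 6, mean = 8, sd = 1), nrow = 200,
               dimnames = list(NULL, samples))

# prcomp() expects observations in rows, so transpose to samples x probes
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components, ready for plotting
scores <- pca$x[, 1:2]
```

Plotting `scores` with points labelled by sample name gives the usual PC1 versus PC2 view, where samples from the same group are expected to sit close together.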

Hierarchical Clustering

Hierarchical clustering confirms the anomalous behavior of DARCR1 and LDRR1, prompting deeper QC investigation.
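
For reference, a sample dendrogram can be built in base R with `dist()` and `hclust()`. This sketch again uses simulated values in place of the real expression matrix, and the Euclidean distance and average linkage are assumptions, since the README does not state which were used.

```r
# Sketch: hierarchical clustering of samples on simulated data.
set.seed(1)
samples <- c("DARCR1", "DARCR2", "DARCR3", "LDRR1", "LDRR2", "LDRR3")
expr <- matrix(rnorm(200 * 6, mean = 8, sd = 1), nrow = 200,
               dimnames = list(NULL, samples))

# Pairwise distances between samples (columns), hence the transpose
sample_dist <- dist(t(expr), method = "euclidean")

# Agglomerative clustering; plot(hc) would draw the dendrogram
hc <- hclust(sample_dist, method = "average")

# Cut the tree into two clusters to compare against the two groups
clusters <- cutree(hc, k = 2)
```

Comparing `clusters` with the known Control and Treatment labels is a quick way to spot samples that fall on the wrong branch.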


RNA Degradation Analysis

RNA Degradation Plot

All samples show RNA degradation (steep slopes), but DARCR1 and LDRR1 exhibit more severe degradation patterns.

QC Statistics

Present call percentages are low across all samples, with DARCR1 and LDRR1 showing substantially lower values than their replicate group members.

Conclusion: Heavy RNA degradation in DARCR1 and LDRR1 resulted in similar transcriptional profiles despite different experimental conditions. This finding was not reported in the original publication and may explain partial irreproducibility of results.


Differential Expression Results

Volcano Plot

Top Differentially Expressed Genes

| Gene | Log2 Fold Change | Adjusted P-value | Function |
|---|---|---|---|
| Mmp3 | 5.61 | 0.0038 | Matrix metalloprotease |
| Serpina3n | 4.47 | 0.0029 | Serine protease inhibitor |
| Serpinb1a | 4.23 | 0.0028 | Serine protease inhibitor |
| Ms4a4c | 2.99 | 0.0028 | Membrane-spanning protein |

Validation: 3 of 4 genes reported in the original study (Mmp3, Serpina3n, Serpinb1a) were successfully reproduced with adjusted p-value < 0.05.
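
To read the log2 fold changes on a linear scale, each value can be exponentiated with base 2. The short sketch below uses the four values from the table above; for example, a log2 fold change of 5.61 for Mmp3 corresponds to roughly a 49-fold increase.

```r
# Convert the reported log2 fold changes to linear fold changes.
log2fc <- c(Mmp3 = 5.61, Serpina3n = 4.47, Serpinb1a = 4.23, Ms4a4c = 2.99)
fold_change <- 2^log2fc

# Rounded for display
round(fold_change, 1)
```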


Pathway Enrichment (CAMERA)

| Pathway | P-value | FDR | Direction |
|---|---|---|---|
| IL6_JAK_STAT3_SIGNALING | 6.44e-18 | 3.22e-16 | Up |
| INTERFERON_GAMMA_RESPONSE | 1.70e-16 | 3.22e-15 | Up |
| INTERFERON_ALPHA_RESPONSE | 1.93e-16 | 3.22e-15 | Up |
| TNFA_SIGNALING_VIA_NFKB | 8.45e-08 | 7.04e-07 | Up |
| INFLAMMATORY_RESPONSE | 7.35e-05 | 4.09e-04 | Up |

Biological Interpretation: Strong enrichment of interferon signaling and inflammatory pathways aligns with expected immune responses to light-induced retinal damage.
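
The FDR column can be sanity-checked with base R's `p.adjust()`. The sketch below rests on two assumptions that are not stated in this README: that the Benjamini-Hochberg method was used, and that 50 Hallmark gene sets were tested. It also treats the three pathways shown as the three smallest p-values overall.

```r
# Sketch: Benjamini-Hochberg adjustment for the three smallest p-values.
p_top3 <- c(IL6_JAK_STAT3_SIGNALING   = 6.44e-18,
            INTERFERON_GAMMA_RESPONSE = 1.70e-16,
            INTERFERON_ALPHA_RESPONSE = 1.93e-16)

n_sets <- 50  # assumption: number of Hallmark gene sets tested

# n tells p.adjust() how many tests were performed in total
fdr_top3 <- p.adjust(p_top3, method = "BH", n = n_sets)
signif(fdr_top3, 3)
```

Under these assumptions the adjusted values agree with the table to three significant figures. The second and third pathways share one FDR value because the BH procedure enforces monotonicity: the rank 2 value (1.70e-16 × 50 / 2) is capped at the smaller rank 3 value (1.93e-16 × 50 / 3).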


Experimental Design

| Sample | Group | Description |
|---|---|---|
| DARCR1, DARCR2, DARCR3 | Control | Dark ambient conditions (n=3) |
| LDRR1, LDRR2, LDRR3 | Treatment | Light damage + retinal detachment (n=3) |

Platform: Affymetrix Mouse430-2 GeneChip
Data Source: GEO Series GSE13074
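
A two-group design like this one is usually encoded as a design matrix before fitting linear models. The sketch below builds one with base R's `model.matrix()`. The `targets` data frame and its column names are hypothetical and may not match the layout of GSE13074_targets.txt.

```r
# Sketch: design matrix for the Control vs Treatment comparison.
# Column names of this data frame are illustrative.
targets <- data.frame(
  Sample = c("DARCR1", "DARCR2", "DARCR3", "LDRR1", "LDRR2", "LDRR3"),
  Group  = factor(rep(c("Control", "Treatment"), each = 3),
                  levels = c("Control", "Treatment"))
)

# Intercept = Control mean; GroupTreatment = Treatment minus Control
design <- model.matrix(~ Group, data = targets)
rownames(design) <- targets$Sample
design

# With limma installed, this matrix could be passed on as
# limma::lmFit(expression_matrix, design), followed by limma::eBayes().
```

With Control as the reference level, the `GroupTreatment` coefficient is the log2 fold change of Treatment relative to Control, so positive values indicate upregulation after light damage.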


Key Takeaways

  1. Quality control is essential: Comprehensive QC revealed sample quality issues not reported in the original publication
  2. RNA degradation affects reproducibility: Degraded samples from different groups clustered together, confounding biological signal
  3. Results partially validated: 3 of 4 key genes from the original study were reproduced despite sample quality issues
  4. Pathway analysis confirms biology: Enriched inflammatory and interferon pathways are consistent with light damage response

References

  • Rattner, A., & Bhanu, N. (2008). The genomic response of the retinal pigment epithelium to light damage and retinal detachment. Journal of Neuroscience, 28(39), 9880-9889. DOI: 10.1523/JNEUROSCI.2401-08.2008

License

This project is for educational and research purposes.

About

Part of Functional Genomics Technology Course
