This repository contains the code required to reproduce the end-to-end analyses of HTG EdgeSeq (bulk RNA profiling) and GeoMx DSP (spatial transcriptomics) data from the publication:
Oterino-Sogo, S. & Naji, F. et al. Spatial and bulk transcriptomic profiling defines the molecular evolution of cutaneous squamous cell carcinoma and reveals stage-specific biomarkers of clinical relevance. bioRxiv (2026). https://doi.org/10.64898/2026.04.30.721943
The analyses rely on processed expression matrices derived from the following GEO datasets:
The repository contains two main directories:
scripts/ - Differential expression, clustering, and visualization analyses
gprofiler2_ORSUM/ - Pathway enrichment and ORSUM summarization workflows
This section describes how to reproduce:
- Differential expression analyses (LRT and Wald tests by stage; DESeq2-based)
- LRT clustering analysis using DEGreport
- We provide a link to download the input files from our local storage to facilitate reproducibility. You may also download the processed datasets from GEO (see Data Availability).
- Place all required input files in the directory specified by path_to_input.
- Download the file Heatmaps_gene_list.xlsx and place it in the same input directory.
git clone https://github.com/cbib/cSCC_continuum_analyses
cd cSCC_continuum_analyses
This folder contains the processed count matrices and the Heatmaps_gene_list.xlsx file.
wget --no-check-certificate -r -np -nH --cut-dirs=1 -R "index.html*" http://services.cbib.u-bordeaux.fr/cSCC_gene_tables/data/
conda env create --file geomx.yml
conda activate geomx_env
Edit the following scripts:
- GeoMx_lrt_reproduce.R
- HTGseq_lrt_reproduce.R
Update the variables:
- path_to_input
- path_to_output
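Before running the scripts, it can help to confirm that the input directory is in place. The following is a minimal sketch, not part of the repository: check_inputs is a hypothetical helper, and it only tests for the directory itself and for Heatmaps_gene_list.xlsx, since the names of the count matrix files are not listed here.

```bash
# Sketch: check that an input directory exists and holds the gene list file.
# check_inputs is an illustrative helper name (an assumption); pass it the
# same directory you set as path_to_input in the R scripts.
check_inputs() {
  input_dir="$1"
  if [ ! -d "$input_dir" ]; then
    echo "Missing input directory: $input_dir" >&2
    return 1
  fi
  if [ ! -f "$input_dir/Heatmaps_gene_list.xlsx" ]; then
    echo "Heatmaps_gene_list.xlsx not found in $input_dir" >&2
    return 1
  fi
  echo "Inputs look complete in $input_dir"
}
```

For example, `check_inputs ./data` returns a non-zero status and prints a message to stderr if either check fails.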
Rscript GeoMx_lrt_reproduce.R
Rscript HTGseq_lrt_reproduce.R
All results will be written to the directory specified in path_to_output, with the following structure:
DE_genes_by_stage/ - DESeq2 pairwise comparisons for HTG-seq, GeoMx Macrophages, and GeoMx PanCK.
Heatmap_plots/ - Heatmaps summarizing pathway enrichment results obtained by manually curating gprofiler2 and ORSUM output.
Single_gene_plots/ - Strip plots for candidate stage-specific cSCC biomarkers identified in the study.
Additionally, the following files are saved directly in path_to_output:
- .png Likelihood Ratio Test (LRT) plots
- .csv files containing gene groups
If you wish to reproduce gprofiler2 pathway enrichment and ORSUM summarization, follow the steps below.
conda env create --file gprofiler2_orsum.yml
conda activate gprof_orsum_env
The script gprofiler2_ORSUM/gprof_orsum.sh expects a .tsv file as input and calls gprofiler2_ORSUM/gprofiler_standalone.R.
Notes:
- LRT and DEGreport outputs are saved as .csv. They need to be converted to .tsv.
- For GeoMx PanCK results, only a subset of gene groups was used for pathway analysis.
- ORSUM requires .gmt files, which can be downloaded here.
- Before running gprof_orsum.sh, rename file prefixes by replacing : with _ (example: GO:BP → GO_BP).
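The two file-preparation notes above (converting .csv to .tsv and replacing : with _ in file names) could be handled along the following lines. This is a rough sketch, not the authors' procedure: csv_to_tsv and rename_colon_prefixes are hypothetical helper names, and the conversion assumes simple CSV files in which no field contains an embedded comma, tab, or quote character.

```bash
# Sketch only: both helper names are illustrative assumptions.

# Convert a simple CSV file to TSV.
# Assumes no field contains an embedded comma, tab, or quote character.
csv_to_tsv() {
  csv="$1"
  tsv="$2"
  # Drop any double quotes around fields, then turn commas into tabs.
  sed 's/"//g' "$csv" | tr ',' '\t' > "$tsv"
}

# Replace every ':' with '_' in the names of the files in a directory
# (for example, a name containing GO:BP becomes one containing GO_BP).
rename_colon_prefixes() {
  dir="$1"
  for f in "$dir"/*:*; do
    [ -e "$f" ] || continue   # nothing matched: the glob stayed literal
    new_name="$(basename "$f" | tr ':' '_')"
    mv -- "$f" "$dir/$new_name"
  done
}
```

A possible use, with placeholder paths: `csv_to_tsv gene_groups.csv gene_groups.tsv` followed by `rename_colon_prefixes /path/to/gmt_files`. If your .csv files contain quoted fields with commas, a CSV-aware tool (for instance R's read.csv and write.table) is the safer choice.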
cd gprofiler2_ORSUM/
sbatch gprof_orsum.sh -i Publication_HTG_Edgeseq_gene_groups.tsv -o /results -gmt /gprofiler/gmt/hsapiens -os '500' -org 'hsapiens'
sbatch gprof_orsum.sh -i Publication_PanCK_gene_groups.tsv -o /results -gmt /gprofiler/gmt/hsapiens -os '500' -org 'hsapiens'
sbatch gprof_orsum.sh -i Publication_Macrophages_gene_groups.tsv -o /results -gmt /gprofiler/gmt/hsapiens -os '500' -org 'hsapiens'
Parameters:
- -i: input gene group file
- -o: output directory
- -gmt: path to GMT files
- -os: ORSUM size threshold
- -org: organism (e.g., hsapiens)
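Since the three submissions differ only in the input file, they could also be issued from a single loop. The sketch below reuses the exact arguments from the commands above; submit_all and the DRY_RUN switch are hypothetical additions for illustration and are not part of gprof_orsum.sh.

```bash
# Sketch: submit the three enrichment jobs in one loop.
# Set DRY_RUN=1 to print the sbatch commands instead of submitting them.
# submit_all and DRY_RUN are illustrative names, not part of the repository.
submit_all() {
  for group in HTG_Edgeseq PanCK Macrophages; do
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
    ${DRY_RUN:+echo} sbatch gprof_orsum.sh \
      -i "Publication_${group}_gene_groups.tsv" \
      -o /results \
      -gmt /gprofiler/gmt/hsapiens \
      -os '500' \
      -org 'hsapiens'
  done
}
```

Running `DRY_RUN=1; submit_all` from gprofiler2_ORSUM/ lets you inspect the commands first; unset DRY_RUN to submit for real.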
The code in this repository was developed by Sergio Oterino Sogo.
LinkedIn: Sergio Oterino Sogo, PhD.
For reproducibility issues, please open a GitHub issue.