Commit ba107d0

Update README.md
1 parent 7e5d775 commit ba107d0

1 file changed

Lines changed: 168 additions & 90 deletions

README.md

@@ -1,114 +1,192 @@
1-
# GEMCAT: Gene Expression-based Metabolite Centrality Analyses Tool
2-
A computational toolbox associated with the manuscript entitled _GEMCAT — A new algorithm for gene expression-based prediction of metabolic alterations_.
3-
Cite using: https://doi.org/10.1093/nargab/lqaf003
1+
# GEMCAT: Gene Expression-based Metabolite Centrality Analyses Tool
42

5-
Note: We are still refining the tool. Particularly, GEMCAT does not yet provide guidance for significance of predicted changes or any other measure of prediction quality. We suggest filtering the predictions for consistency. We do not recommend pre-filtering the transcriptomics and proteomics data based on significance as this is affecting the network coverage which might negatively impact the prediction quality as genes/proteins not present in the dataset should be unchanged.
3+
GEMCAT is a computational toolbox designed to predict metabolic alterations based on gene expression data. It's the
4+
accompanying software for our manuscript, "_GEMCAT — A new algorithm for gene expression-based prediction of metabolic alterations_."
65

7-
## Compatibility
8-
We tested the package for compatibility with Python >= 3.10 on Ubuntu and Windows.
9-
10-
## Installation
11-
Install from pip:
12-
13-
```pip install gemcat```
14-
15-
Or clone the repository and install GEMCAT from there using:
6+
## Quick Links
7+
* **How to Cite:** [https://doi.org/10.1093/nargab/lqaf003](https://doi.org/10.1093/nargab/lqaf003)
8+
* **PyPI:** [https://pypi.org/project/gemcat/](https://pypi.org/project/gemcat/)
9+
* **Source Code (GitHub):** [https://github.com/MolecularBioinformatics/GEMCAT](https://github.com/MolecularBioinformatics/GEMCAT)
1610

17-
```pip install .```
11+
## Important Considerations
1812

13+
* **Prediction Quality:** GEMCAT is still under refinement. It doesn't yet provide guidance on the
14+
statistical significance of predicted changes or any other measure of prediction quality. We
15+
recommend **filtering predictions for consistency** based on your domain knowledge.
16+
* **Data Pre-filtering:** We **don't recommend pre-filtering transcriptomics and proteomics data based
17+
on significance**. This can negatively impact network coverage, as genes/proteins not present in the
18+
filtered dataset are implicitly considered "unchanged" by GEMCAT.
19+
* **Graphical User Interface (GUI):** We are actively developing a user-friendly GUI for GEMCAT,
20+
which will be released soon. Stay tuned for updates on our GitHub repository and PyPI page! A **development version**
21+
of the GUI is currently hosted in a private repository; if you're interested in gaining early access,
22+
please contact **suraj.sharma@uib.no**.
1923

20-
## Usage
24+
---
2125

22-
### Standard workflow from the Command-Line Interface (CLI)
26+
## Compatibility
27+
GEMCAT has been tested with **Python >= 3.10** on Ubuntu and Windows.
2328

24-
Use a single file containing per-gene fold-changes to calculate the resulting differential centralities:
25-
``` gemcat <./expression_file.csv> <./model_file.xml> -e <column_name> -o <result_file.csv>```
26-
Make sure the .csv file is either comma- or tab-delimited.
27-
`column_name` is the name of the column in the file containing the fold-change.
29+
## Installation
30+
You can install GEMCAT in two ways:
2831

29-
Alternatively, use two files (or one file) with expression values for condition and baseline:
30-
``` gemcat <./condition_file.csv> <./model_file.xml> -e <condition_column_name> -b <./baseline_file> -c <baseline_column_name> -o <result_file.csv>```
32+
1. **Using pip (recommended):**
33+
```bash
34+
pip install gemcat
35+
```
36+
2. **From source (for developers or specific versions):**
37+
First, clone the repository, then install:
38+
```bash
39+
git clone https://github.com/MolecularBioinformatics/GEMCAT.git
40+
cd GEMCAT
41+
pip install .
42+
```
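To confirm that the environment meets the requirements above, a short check along these lines may help. This is a minimal sketch using only the Python standard library; the helper names `installed_version` and `python_is_supported` are illustrative and not part of GEMCAT.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """Compare the running interpreter against the minimum tested version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("gemcat version:", installed_version("gemcat") or "not installed")
```

If the second line reports "not installed", the `pip install` step most likely ran in a different environment than the one you are checking.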
43+
---
3144

32-
If you do not have a model file ready, some models can be automatically accessed using their names:
33-
``` gemcat ./expression_file.csv <model_name> -e column_name -o <result_file.csv>```
45+
## How to Use GEMCAT
3446

35-
Model names currently supported are:
36-
- ```recon3d```: [Recon3D](http://bigg.ucsd.edu/models/Recon3D)
37-
- ```ratgem```: [Rat-GEM](https://github.com/SysBioChalmers/Rat-GEM)
47+
GEMCAT offers both a Python API for flexible, programmatic access and a command-line interface (CLI) for straightforward, scriptable use.
3848

49+
### Python Workflow with CobraPy
3950

40-
Currently, GEMCAT supports models in SBML, JSON, and MAT formats.
51+
For more control and integration into existing Python projects, use the `workflow_standard` function:
4152

42-
Important points to remember:
43-
Your gene or protein identifiers should be the first column of the expression file.
44-
Make sure the gene or protein identifiers in your expression data file exactly match those in the model.
45-
A results list of all 1.0 is a sure sign of no identifier matching.
53+
```python
54+
import gemcat as gc
55+
import cobra # Assuming cobrapy is installed for model handling
56+
import pandas as pd # For pd.Series
4657
47-
Positional arguments:
48-
- expression file path
49-
- model file path
58+
# Example usage (replace with your actual data and model)
59+
# Make sure your mapped_genes_baseline and mapped_genes_comparison are pandas Series
60+
# with gene/protein identifiers as the index.
5061
51-
All parameters:
52-
`-e --expressioncolumn` name of column containing condition expression data
53-
`-b BASELINE, --baseline` file containing baseline expression data
54-
`-c BASELINECOLUMN, --baselinecolumn` name of column containing baseline expression data
55-
`-v VERBOSE, --verbose` enables verbose output
56-
`-o OUTFILE, --outfile` write output to this file
57-
`-l LOGFILE, --logfile` write logs to this file
62+
# Example: Load a CobraPy model
63+
# model = cobra.io.read_sbml_model("your_model.xml")
5864
65+
# Example: Create dummy mapped gene series
66+
# mapped_genes_baseline = pd.Series([10, 20, 30], index=['geneA', 'geneB', 'geneC'])
67+
# mapped_genes_comparison = pd.Series([15, 25, 35], index=['geneA', 'geneB', 'geneC'])
5968
60-
### Standard workflow in Python using a CobraPy model
61-
```
62-
import gemcat as gc
6369
results = gc.workflows.workflow_standard(
64-
cobra_model: cobra.Model,
65-
mapped_genes_baseline: pd.Series,
66-
mapped_genes_comparison: pd.Series,
67-
adjacency = gc.adjacency_transformation.ATPureAdjacency,
68-
ranking = gc.ranking.PagerankNX,
69-
gene_fill = 1.0
70+
cobra_model=model, # Your loaded cobra.Model object (see the commented example above)
71+
mapped_genes_baseline=mapped_genes_baseline, # pd.Series of baseline expression
72+
mapped_genes_comparison=mapped_genes_comparison, # pd.Series of comparison expression
73+
adjacency=gc.adjacency_transformation.ATPureAdjacency, # Optional: Customize adjacency method
74+
ranking=gc.ranking.PagerankNX, # Optional: Customize ranking algorithm
75+
gene_fill=1.0 # Value to fill for genes not present in mapped_genes_comparison
7076
)
71-
```
72-
This will return the changes in centrality relative to the baseline in a Pandas Series.
73-
When using fold-changes as the mapped expression, use a vector of all ones as a comparison.
74-
75-
## Modularity and Configuration
76-
GEMCAT is modular, and its central components can easily be swapped out or appended by other components
77-
adhering to the specifications laid out in the module base classes (primarily adjacency transformation, expression integration, and ranking components).
78-
All classes inheriting from the abstract base classes laid out in the modules are exchangeable.
79-
80-
## Core modules
81-
### Model
82-
The core of the package is the GEMCAT model structure that contains the model data, integrates the workflow, and calculates the results.
83-
### adjacency_transformation
84-
Different approaches can be used to calculate adjacency in the networks.
85-
We offer alternatives and a platform to create custom algorithms for the model.
86-
### expression
87-
Module covering the mapping of gene values onto reactions in the model via gene product rules.
88-
Providing different algorithms along with a platform to create alternatives.
89-
### ranking
90-
Module providing ranking algorithms for the models along with a platform to include custom algorithms.
91-
### workflows
92-
The workflow module contains example workflows.
93-
To customize the workflow to your needs simply copy the provided functions and switch out the desired steps.
94-
### cli
95-
Command-line interface for GEMCAT.
96-
### io
97-
Input and output functions that create GEMCAT models from different sources.
98-
### utils
99-
Contains common utility functions used throughout the package.
100-
### verification
101-
Functions to verify data integrity.
102-
### model_manager
103-
Functionality for automatic downloading, storing, and retrieving of common models.
10477
78+
print(results)
79+
```
80+
This function returns the changes in centrality relative to the baseline as a Pandas Series. If you're
81+
using fold-changes as your `mapped_genes_comparison`, provide a vector of all 1.0s as `mapped_genes_baseline`.
82+
83+
For further examples using genome-scale metabolic networks from two different organisms, refer to these notebooks:
84+
[An engineered human cell line with a functional deletion of the mitochondrial NAD transporter](https://github.com/MolecularBioinformatics/prm_manuscript/blob/main/jupyter_notebooks/pr_SLC25A51ko.ipynb),
85+
[Patients with inflammatory bowel disease](https://github.com/MolecularBioinformatics/prm_manuscript/blob/main/jupyter_notebooks/pr_UC.ipynb),
86+
and [Training-induced metabolic changes in rats](https://github.com/MolecularBioinformatics/prm_manuscript/blob/main/jupyter_notebooks/pr_rats.ipynb).
87+
88+
### Command-Line Interface (CLI)
89+
90+
The CLI allows you to calculate differential centralities using gene expression data.
91+
92+
**Key Requirements for Input Files:**
93+
94+
* Your gene or protein identifiers **must be in the first column** of your expression file.
95+
* These identifiers **must exactly match** those in your metabolic model. If you see a results list of all 1.0, it's
96+
a strong indicator of an identifier mismatch.
97+
* Expression `.csv` files can be either comma- or tab-delimited.
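
Because an identifier mismatch only shows up as a result list of all 1.0, it can be worth checking the overlap before running GEMCAT. The sketch below uses only the standard library; the function names and the toy data are made up for illustration. With a CobraPy model, the set of model identifiers could be built as `{gene.id for gene in model.genes}`.

```python
import csv
import io
from typing import List, Set


def read_identifiers(text: str) -> List[str]:
    """Return the first-column values of a comma- or tab-delimited table, skipping the header."""
    if not text.strip():
        return []
    header = text.splitlines()[0]
    # Assumption: a tab in the header means the file is tab-delimited.
    delimiter = "\t" if "\t" in header else ","
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(rows)  # skip the header row
    return [row[0] for row in rows if row]


def matched_fraction(expression_ids: List[str], model_gene_ids: Set[str]) -> float:
    """Fraction of expression identifiers that also occur in the model (exact match)."""
    if not expression_ids:
        return 0.0
    hits = sum(1 for gene_id in expression_ids if gene_id in model_gene_ids)
    return hits / len(expression_ids)


# Toy data for illustration only.
expression_csv = "gene_id,fold_change\ngeneA,1.5\ngeneB,0.8\ngeneX,2.0\n"
model_genes = {"geneA", "geneB", "geneC"}

ids = read_identifiers(expression_csv)
print(f"{matched_fraction(ids, model_genes):.0%} of identifiers match the model")
```

A fraction near zero suggests that the expression file and the model use different identifier namespaces.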
98+
99+
**Common Workflows:**
100+
101+
1. **Using a single file with pre-calculated fold-changes:**
102+
```bash
103+
gemcat <expression_file.csv> <model_file.xml> -e <column_name> -o <result_file.csv>
104+
```
105+
* `<expression_file.csv>`: Path to your input file.
106+
* `<model_file.xml>`: Path to your metabolic model file (SBML, JSON, or MAT format).
107+
* `<column_name>`: The name of the column in your CSV containing the fold-change values.
108+
* `<result_file.csv>`: The desired output file path.
109+
110+
2. **Using two files (or one) with condition and baseline expression values:**
111+
```bash
112+
gemcat <condition_file.csv> <model_file.xml> -e <condition_column_name> -b <baseline_file.csv> -c <baseline_column_name> -o <result_file.csv>
113+
```
114+
* `<condition_file.csv>`: Path to the file with expression values for your experimental condition.
115+
* `<baseline_file.csv>`: Path to the file with baseline expression values. If the condition and baseline columns are in the same file, this can point to the same file as `<condition_file.csv>`.
116+
* `<condition_column_name>`: Name of the column with condition expression data.
117+
* `<baseline_column_name>`: Name of the column with baseline expression data.
118+
119+
3. **Using built-in models:**
120+
If you don't have a model file, GEMCAT can automatically access some common models by name:
121+
```bash
122+
gemcat <expression_file.csv> <model_name> -e <column_name> -o <result_file.csv>
123+
```
124+
Currently supported model names:
125+
* `recon3d`: [Recon3D](http://bigg.ucsd.edu/models/Recon3D)
126+
* `ratgem`: [Rat-GEM](https://github.com/SysBioChalmers/Rat-GEM)
127+
128+
**All CLI Parameters:**
129+
130+
* **Positional Arguments:**
131+
* `expression_file_path`: Path to your expression data file.
132+
* `model_file_path`: Path to your metabolic model file (or model name).
133+
* **Optional Arguments:**
134+
* `-e --expressioncolumn`: Name of the column containing condition expression data (required for expression files).
135+
* `-b BASELINE, --baseline`: Path to the file containing baseline expression data.
136+
* `-c BASELINECOLUMN, --baselinecolumn`: Name of the column containing baseline expression data.
137+
* `-o OUTFILE, --outfile`: Path to write the output results.
138+
* `-v VERBOSE, --verbose`: Enables verbose output for detailed execution information.
139+
* `-l LOGFILE, --logfile`: Path to write logs.
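
When GEMCAT is called from a script, assembling the argument list in one place can help avoid quoting mistakes. The following is a minimal sketch that uses only the flags listed above; `build_gemcat_command` and the file and column names are hypothetical, and nothing is executed unless you pass the list to `subprocess.run` yourself.

```python
import shlex
from typing import List, Optional


def build_gemcat_command(
    expression_file: str,
    model: str,
    expression_column: str,
    outfile: str,
    baseline_file: Optional[str] = None,
    baseline_column: Optional[str] = None,
) -> List[str]:
    """Assemble a `gemcat` argument list from the documented options."""
    command = ["gemcat", expression_file, model, "-e", expression_column]
    if baseline_file is not None:
        command += ["-b", baseline_file]
    if baseline_column is not None:
        command += ["-c", baseline_column]
    command += ["-o", outfile]
    return command


# Placeholder file and column names for illustration.
command = build_gemcat_command(
    "condition.csv",
    "recon3d",
    "treated",
    "results.csv",
    baseline_file="baseline.csv",
    baseline_column="control",
)
print(shlex.join(command))
# To run it: subprocess.run(command, check=True)
```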
140+
141+
---
142+
143+
## Modularity and Customization
144+
145+
GEMCAT is designed with a modular architecture, allowing you to easily swap out or append central components
146+
to customize its behavior. This is achieved by adhering to specifications laid out in the module base classes, particularly for:
147+
148+
* **Adjacency Transformation:** Defines how network adjacencies are calculated.
149+
* **Expression Integration:** Handles mapping gene expression values onto reactions.
150+
* **Ranking Components:** Implements different centrality ranking algorithms.
151+
152+
Any class inheriting from the abstract base classes in these modules can be exchanged.
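
The general pattern can be sketched as follows. The classes here are hypothetical stand-ins that only illustrate how components sharing an abstract base class become interchangeable; they are not GEMCAT's actual base classes or method signatures, which are defined in the respective modules.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Graph = Dict[str, List[str]]  # node -> list of successor nodes


class RankingBase(ABC):
    """Hypothetical ranking interface (not GEMCAT's real base class)."""

    @abstractmethod
    def rank(self, adjacency: Graph) -> Dict[str, float]:
        """Return one score per node."""


class OutDegreeRanking(RankingBase):
    def rank(self, adjacency: Graph) -> Dict[str, float]:
        # Score each node by its number of outgoing edges.
        return {node: float(len(succ)) for node, succ in adjacency.items()}


class InDegreeRanking(RankingBase):
    def rank(self, adjacency: Graph) -> Dict[str, float]:
        # Score each node by its number of incoming edges.
        scores = {node: 0.0 for node in adjacency}
        for successors in adjacency.values():
            for target in successors:
                scores[target] = scores.get(target, 0.0) + 1.0
        return scores


def run_workflow(adjacency: Graph, ranking: RankingBase) -> Dict[str, float]:
    """The workflow depends only on the base-class interface."""
    return ranking.rank(adjacency)


toy_graph = {"a": ["b", "c"], "b": ["c"], "c": []}
print(run_workflow(toy_graph, OutDegreeRanking()))
print(run_workflow(toy_graph, InDegreeRanking()))
```

Because `run_workflow` relies only on the shared interface, either ranking can be passed in without changing the workflow code.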
153+
154+
---
155+
156+
## Core Modules Overview
157+
158+
* **`model`**: The central GEMCAT model structure, responsible for integrating workflows and calculating results.
159+
* **`adjacency_transformation`**: Provides various approaches for calculating network adjacency and a platform for custom algorithms.
160+
* **`expression`**: Manages the mapping of gene values onto reactions in the model via gene product rules, offering different algorithms along with a platform to create alternatives.
161+
* **`ranking`**: Offers various ranking algorithms for the models along with a platform to include custom algorithms.
162+
* **`workflows`**: Contains example workflows. To customize the workflow to your needs simply copy the provided functions and switch out the desired steps.
163+
* **`cli`**: Command-line interface for GEMCAT.
164+
* **`io`**: Input and output functions that create GEMCAT models from different sources.
165+
* **`utils`**: Contains common utility functions used throughout the package.
166+
* **`verification`**: Functions to verify data integrity.
167+
* **`model_manager`**: Functionality for automatic downloading, storing, and retrieving of common models.
168+
169+
---
105170
106171
## Development
107-
You can run all local tests with `pytest .`. Default behavior is to also run integration tests, which takes time.
108-
You can exclude slow running tests by using `pytest . -m "not slow"`.
109-
These slow running tests are integration tests with _real world data_ and will take 10-30s each according to your hardware.
110172
111-
To run tests, make sure you have [git lfs](https://git-lfs.com/) installed and all the Tests are running.
112-
Make sure to run `isort` and `black` to have properly formatted code.
173+
If you're contributing to GEMCAT:
174+
175+
* **Running Tests:**
176+
* Run all local tests with `pytest .`.
177+
* You can exclude slow-running tests by using `pytest . -m "not slow"`. These slow-running tests are
178+
integration tests with *real-world data* and will take 10-30 seconds each depending on your hardware.
179+
* **Prerequisites:** Ensure you have [git lfs](https://git-lfs.com/) installed for tests that rely on large files.
180+
* **Code Formatting:** Before committing, make sure your code is properly formatted using `isort` and `black`.
181+
* **CI Pipeline:** The GitHub CI pipeline automatically checks for `isort`, `black`, and `pytest` compliance.
182+
183+
---
184+
185+
## Contact and Support
186+
187+
For questions, bug reports, or support, please open an issue on the
188+
[GitHub Issues page](https://github.com/MolecularBioinformatics/GEMCAT/issues). We will do our best to respond promptly.
189+
190+
For direct inquiries about the **development version of the GEMCAT GUI** or other specific questions, you can also contact:
113191

114-
The CI pipeline in GitHub will check with isort, black, and pytest.
192+
* **Suraj Sharma:** suraj.sharma@uib.no
