Commit 4335d1b

Merge branch 'master' into vitkl-obs-exclusion
2 parents fb6715f + ebd9248 commit 4335d1b

52 files changed

Lines changed: 5485 additions & 7571 deletions

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,9 @@
11
blank_issues_enabled: false
22
contact_links:
3-
- name: Cell2location Community Discussions
4-
url: https://github.com/BayraktarLab/cell2location/discussions
5-
about: Ask how to solve your problem using cell2location.
3+
- name: scverse Discourse
4+
url: https://discourse.scverse.org/c/ecosytem/cell2location/
5+
about: Ask usage questions, how to solve your problems using cell2location and other scvi-tools packages.
6+
7+
- name: cell2location Community Discussions [deprecated]
8+
url: https://discourse.scverse.org/c/ecosytem/cell2location/
9+
about: Find previous answers/issues. For new questions please use the link above.

.github/ISSUE_TEMPLATE/question.md

Lines changed: 4 additions & 2 deletions
@@ -1,11 +1,13 @@
11
---
22
name: Usage Question
3-
about: Ask how to solve your problem using cell2location.
3+
about: Template for posting a question to scverse Discourse.
44
title: ''
55
labels: question
66
assignees: ''
77
---
88

9+
## Please use the template below to post a question to https://discourse.scverse.org/c/ecosytem/cell2location/.
10+
911
### Problem
1012

1113
<!-- Please describe your problem below: -->
@@ -14,7 +16,7 @@ assignees: ''
1416
- [ ] I follow the instructions from the [cell2location tutorial (using on scvi-tools)](https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html).
1517
- [ ] I have adjusted required hyperparameters to my dataset and tissue `N_cells_per_location` and `detection_alpha`.
1618
- [ ] I have provided 10X reaction/inlet as `batch_key` for reference NB regression.
17-
- [ ] I have checked [Cell2location Community Forum](https://github.com/BayraktarLab/cell2location/discussions), [scvi-tools forum](https://discourse.scvi-tools.org/) and did not find a solution
19+
- [ ] I have checked [scverse Discourse](https://discourse.scverse.org/c/ecosytem/cell2location/) and [old Cell2location Community Forum](https://github.com/BayraktarLab/cell2location/discussions), and did not find a solution.
1820

1921

2022
### Description of the data input and hyperparameters

README.md

Lines changed: 143 additions & 8 deletions
@@ -15,6 +15,8 @@ If you use cell2location please cite our paper:
1515
Kleshchevnikov, V., Shmatko, A., Dann, E. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01139-4
1616
https://www.nature.com/articles/s41587-021-01139-4
1717

18+
Please note that cell2location requires 2 user-provided hyperparameters (`N_cells_per_location` and `detection_alpha`). For detailed guidance on setting these hyperparameters and their impact, see [the flow diagram and the note](https://github.com/BayraktarLab/cell2location/blob/master/docs/images/Note_on_selecting_hyperparameters.pdf). Many real datasets (especially human) show within-slide variability in RNA detection sensitivity, so you should try both recommended settings of the `detection_alpha` parameter: `detection_alpha=200` for low within-slide technical variability and `detection_alpha=20` for high within-slide technical variability.
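As a rough illustration of trying both recommended `detection_alpha` settings, the sketch below builds one set of keyword arguments per setting. The helper name `hyperparameter_grid` and the value `30` for `N_cells_per_location` are placeholders introduced here, not part of the package; the commented-out model call follows the constructor arguments shown in the cell2location tutorial, and `adata_vis` and `inf_aver` are hypothetical variable names.

```python
# Minimal sketch: one keyword-argument dict per recommended detection_alpha.
# `hyperparameter_grid` is a hypothetical helper, not a cell2location function.

RECOMMENDED_DETECTION_ALPHA = (200, 20)  # low / high within-slide technical variability


def hyperparameter_grid(n_cells_per_location):
    """Return one dict of model keyword arguments per recommended detection_alpha."""
    return [
        {"N_cells_per_location": n_cells_per_location, "detection_alpha": alpha}
        for alpha in RECOMMENDED_DETECTION_ALPHA
    ]


# 30 is only a placeholder; estimate N_cells_per_location for your own tissue.
for kwargs in hyperparameter_grid(n_cells_per_location=30):
    # With real data one would train a separate model per setting, e.g.:
    # model = cell2location.models.Cell2location(adata_vis, cell_state_df=inf_aver, **kwargs)
    print(kwargs)
```

Training one model per setting and comparing the resulting maps is the comparison the note asks for; the sketch only prepares the arguments.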
19+
1820
Cell2location is a principled Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. This is achieved by estimating which combination of cell types in which cell abundance could have given the mRNA counts in the spatial data, while modelling technical effects (platform/technology effect, contaminating RNA, unexplained variance).
1921

2022
<p align="center">
@@ -24,11 +26,9 @@ Overview of the spatial mapping approach and the workflow enabled by cell2locati
2426

2527
## Usage and Tutorials
2628

27-
The tutorial covering the estimation of expresson signatures of reference cell types, spatial mapping with cell2location and the downstream analysis can be found here: https://cell2location.readthedocs.io/en/latest/
28-
29-
You can also try cell2location on [Google Colab](https://colab.research.google.com/github/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb) on a smaller data subset containing somatosensory cortex.
29+
The tutorial covering the estimation of expression signatures of reference cell types, spatial mapping with cell2location and the downstream analysis can be found at https://cell2location.readthedocs.io/en/latest/ and tried on [Google Colab](https://colab.research.google.com/github/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb).
3030

31-
Please report bugs via https://github.com/BayraktarLab/cell2location/issues and ask any usage questions in https://github.com/BayraktarLab/cell2location/discussions.
31+
Please report bugs via https://github.com/BayraktarLab/cell2location/issues and ask any usage questions about [cell2location](https://discourse.scverse.org/c/ecosytem/cell2location/42), [scvi-tools](https://discourse.scverse.org/c/help/scvi-tools/7) or [Visium data](https://discourse.scverse.org/c/general/visium/32) on the scverse community Discourse.
3232

3333
Cell2location package is implemented in a general way (using https://pyro.ai/ and https://scvi-tools.org/) to support multiple related models - both for spatial mapping, estimating reference cell type signatures and downstream analysis.
3434

@@ -61,10 +61,10 @@ bash Miniconda3-latest-Linux-x86_64.sh
6161
# use prefix /path/to/software/miniconda3
6262
```
6363

64-
Before installing cell2location and it's dependencies, it could be necessary to make sure that you are creating a fully isolated conda environment by telling python to NOT use user site for installing packages, ideally by adding this line to your `~/.bashrc` file , but this would also work during a terminal session:
64+
Before installing cell2location and its dependencies, you may need to make sure that you are creating a fully isolated conda environment by telling Python NOT to use the user site for installing packages. To do so, run this line before creating the conda environment and every time before activating the conda environment in a new terminal session:
6565

6666
```bash
67-
export PYTHONNOUSERSITE="someletters"
67+
export PYTHONNOUSERSITE="literallyanyletters"
6868
```
6969

7070

@@ -79,12 +79,147 @@ Cell2location architecture is designed to simplify extended versions of the mode
7979
We thank all paper authors for their contributions:
8080
Vitalii Kleshchevnikov, Artem Shmatko, Emma Dann, Alexander Aivazidis, Hamish W King, Tong Li, Artem Lomakin, Veronika Kedlian, Mika Sarkin Jain, Jun Sung Park, Lauma Ramona, Liz Tuck, Anna Arutyunyan, Roser Vento-Tormo, Moritz Gerstung, Louisa James, Oliver Stegle, Omer Ali Bayraktar
8181

82-
We also thank Krzysztof Polanski, Luz Garcia Alonso, Carlos Talavera-Lopez, Ni Huang for feedback on the package, Martin Prete for dockerising cell2location and other software support.
82+
We also thank Pyro developers (Fritz Obermeyer, Martin Jankowiak), Krzysztof Polanski, Luz Garcia Alonso, Carlos Talavera-Lopez, Ni Huang for feedback on the package, Martin Prete for dockerising cell2location and other software support.
8383

8484
## FAQ
8585

8686
See https://github.com/BayraktarLab/cell2location/discussions
8787

8888
## Future development and experimental features
89+
Future developments of cell2location are focused on 1) scalability to 100k-mln+ locations using amortised inference of cell abundance (same ideas as used in VAE), 2) extending cell2location to related spatial analysis tasks that require modification of the model (such as using cell type hierarchy information), and 3) incorporating features presented by more recently proposed methods (such as CAR spatial proximity modelling). We are also experimenting with Numpyro and JAX (https://github.com/vitkl/cell2location_numpyro).
90+
91+
## Tips
92+
93+
### Conda environment for A100 GPUs
94+
95+
```bash
export PYTHONNOUSERSITE="literallyanyletters"
conda create -y -n test_scvi16_cuda113 python=3.9
conda activate test_scvi16_cuda113
conda install -y -c anaconda hdf5 pytables git
pip install scvi-tools
pip install git+https://github.com/BayraktarLab/cell2location.git#egg=cell2location[tutorials]
pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 -f https://download.pytorch.org/whl/torch_stable.html
conda activate test_scvi16_cuda113
python -m ipykernel install --user --name=test_scvi16_cuda113 --display-name='Environment (test_scvi16_cuda113)'
```
106+
107+
### Issues with package version mismatches often originate from python user site rather than conda environment being used to install a subset of packages
108+
109+
Before installing cell2location and its dependencies, you may need to make sure that you are creating a fully isolated conda environment by telling Python NOT to use the user site for installing packages. To do so, run this line before creating the conda environment and every time before activating the conda environment in a new terminal session:
89110

90-
We also provide an experimental numpyro translation of the model which has improved memory efficiency (allowing analysis of multiple Visium samples on Google Colab) and minor improvements in speed - https://github.com/vitkl/cell2location_numpyro. You can try it on Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vitkl/cell2location_numpyro/blob/main/docs/notebooks/cell2location_short_demo_colab.ipynb) - however note that both numpyro itself and cell2location_numpyro are in very active development. Numpyro+JAX are being introduces into scvi-tools so follow updates on that.
111+
```bash
export PYTHONNOUSERSITE="literallyanyletters"
```
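To check that the variable has the intended effect, one option is to start a fresh interpreter and inspect `sys.flags.no_user_site`. The sketch below does this from Python using only the standard library; the function name `user_site_disabled` is introduced here for illustration.

```python
import os
import subprocess
import sys


def user_site_disabled(env_value=None):
    """Start a fresh interpreter and report whether it ignores the user site directory.

    env_value: value to export as PYTHONNOUSERSITE, or None to leave it unset.
    """
    env = dict(os.environ)
    env.pop("PYTHONNOUSERSITE", None)
    if env_value is not None:
        env["PYTHONNOUSERSITE"] = env_value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(int(bool(sys.flags.no_user_site)))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "1"


# Any non-empty string switches the user site off, hence "literallyanyletters".
print(user_site_disabled("literallyanyletters"))
```

Running the same one-line check inside the activated conda environment tells you whether the current terminal session is isolated.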
114+
115+
### Useful code for reading and combining multiple Visium sections
116+
117+
Keeping info on distinct sections in a csv file (Google Sheet).
118+
119+
```python
import os
from glob import glob
from re import sub

import pandas as pd

# NOTE: `sp_data_folder` (path prefix of the Visium output folders) must be defined beforehand
sample_annot = pd.read_csv('./sample_annot.csv')

# map each Sample_ID to its folder by stripping the folder-name prefix and suffix
paths = glob(f'{sp_data_folder}*')
sample_annot['path'] = pd.Series(
    paths,
    index=[sub('^.+WTSI_', '', sub('_GRCh38-2020-A$', '', i)) for i in paths]
)[sample_annot['Sample_ID']].values
sample_annot['file'] = [os.path.basename(i) for i in sample_annot['path']]

sample_annot['Sample_ID'].unique()
```
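The two nested `sub` calls above turn a folder name into a sample ID by removing a fixed suffix and then everything up to and including `WTSI_`. A small standalone sketch of that step follows; the folder name is made up for illustration and the helper `folder_to_sample_id` is not part of the README.

```python
from re import sub


def folder_to_sample_id(folder):
    """Strip the '_GRCh38-2020-A' suffix, then everything up to and including 'WTSI_'."""
    without_suffix = sub('_GRCh38-2020-A$', '', folder)
    return sub('^.+WTSI_', '', without_suffix)


# Hypothetical folder name, used only to show the transformation.
example_folder = '/data/visium/spaceranger_WTSI_sampleA_GRCh38-2020-A'
print(folder_to_sample_id(example_folder))
```

A folder name that matches neither pattern is returned unchanged, so it is worth checking that every `Sample_ID` in the annotation table is found in the resulting index before subsetting.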
132+
133+
Reading and concatenating samples.
134+
135+
```python
import scanpy as sc


def read_and_qc(sample_name, file, path=sp_data_folder,
                count_file='filtered_feature_bc_matrix.h5'):
    """
    Read one Visium file and add minimum metadata and QC metrics to adata.obs
    NOTE: var_names is ENSEMBL ID as it should be, you can always plot with sc.pl.scatter(gene_symbols='SYMBOL')
    """

    adata = sc.read_visium(path + str(file) + '/',
                           count_file=count_file,
                           load_images=True)
    adata.obs['sample'] = sample_name
    adata.var['SYMBOL'] = adata.var_names
    adata.var.rename(columns={'gene_ids': 'ENSEMBL'}, inplace=True)
    adata.var_names = adata.var['ENSEMBL']
    adata.var.drop(columns='ENSEMBL', inplace=True)

    # just in case there are non-unique ENSEMBL IDs
    adata.var_names_make_unique()

    # Calculate QC metrics
    sc.pp.calculate_qc_metrics(adata, inplace=True)
    # case-insensitive match covers both human ('MT-') and mouse ('mt-') gene symbols
    adata.var['mt'] = [gene.upper().startswith('MT-') for gene in adata.var['SYMBOL']]
    adata.obs['mt_frac'] = adata[:, adata.var['mt'].tolist()].X.sum(1).A.squeeze()/adata.obs['total_counts']

    # add sample name to obs names
    adata.obs["sample"] = [str(i) for i in adata.obs['sample']]
    adata.obs_names = 's' + adata.obs["sample"] \
                      + '_' + adata.obs_names
    adata.obs.index.name = 'spot_id'

    # store the image data under the sample name instead of the library ID
    library_id = list(adata.uns['spatial'].keys())[0]
    if library_id != sample_name:
        adata.uns['spatial'][sample_name] = adata.uns['spatial'][library_id].copy()
        del adata.uns['spatial'][library_id]
    print(adata.uns['spatial'].keys())

    return adata


def read_all_and_qc(
    sample_annot, Sample_ID_col, file_col, sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
):
    """
    Read and concatenate all Visium files.
    """
    # read first sample
    adata = read_and_qc(
        sample_annot[Sample_ID_col].iloc[0], sample_annot[file_col].iloc[0],
        path=sp_data_folder, count_file=count_file,
    )

    # read the remaining samples, pairing each sample ID with its own file
    slides = {}
    for s, f in zip(sample_annot[Sample_ID_col].iloc[1:], sample_annot[file_col].iloc[1:]):
        slides[str(s)] = read_and_qc(s, f, path=sp_data_folder, count_file=count_file)

    # combine individual samples
    # (AnnData.concatenate is deprecated in newer anndata releases in favour of anndata.concat)
    adata = adata.concatenate(
        list(slides.values()),
        batch_key="sample",
        uns_merge="unique",
        batch_categories=list(sample_annot[Sample_ID_col]),
        index_unique=None
    )

    # work on a copy so the caller's table keeps its original index
    annot = sample_annot.copy()
    annot.index = annot[Sample_ID_col]
    for c in annot.columns:
        annot.loc[:, c] = annot[c].astype(str)
    adata.obs[annot.columns] = annot.reindex(index=adata.obs['sample']).values

    return adata


adata = read_all_and_qc(
    sample_annot=sample_annot,
    Sample_ID_col='Sample_ID',
    file_col='file',
    sp_data_folder=sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
)

adata_incl_nontissue = read_all_and_qc(
    sample_annot=sample_annot,
    Sample_ID_col='Sample_ID',
    file_col='file',
    sp_data_folder=sp_data_folder,
    count_file='raw_feature_bc_matrix.h5',
)
```

cell2location/__init__.py

Lines changed: 2 additions & 4 deletions
@@ -2,13 +2,11 @@
22
from pyro.distributions.transforms import SoftplusTransform
33
from torch.distributions import biject_to, transform_to
44

5-
from .run_c2l import run_cell2location
5+
from . import models
66
from .run_colocation import run_colocation
7-
from .run_regression import run_regression
87

98
__all__ = [
10-
"run_cell2location",
11-
"run_regression",
9+
"models",
1210
"run_colocation",
1311
]
1412

cell2location/cluster_averages/cluster_averages.py

Lines changed: 38 additions & 15 deletions
@@ -14,9 +14,9 @@ def compute_cluster_averages(adata, labels, use_raw=True, layer=None):
1414
labels
1515
Name of adata.obs column containing cluster labels
1616
use_raw
17-
Use raw slow in adata?
17+
Use the raw slot of adata (adata.raw).
1818
layer
19-
use layer in adata? provide layer name
19+
Use layer in adata, provide layer name.
2020
2121
Returns
2222
-------
@@ -38,7 +38,7 @@ def compute_cluster_averages(adata, labels, use_raw=True, layer=None):
3838
var_names = adata.raw.var_names
3939

4040
if sum(adata.obs.columns == labels) != 1:
41-
raise ValueError("cluster_col is absent in adata_ref.obs or not unique")
41+
raise ValueError("`labels` is absent in adata.obs or not unique")
4242

4343
all_clusters = np.unique(adata.obs[labels])
4444
averages_mat = np.zeros((1, x.shape[1]))
@@ -53,29 +53,52 @@ def compute_cluster_averages(adata, labels, use_raw=True, layer=None):
5353
return averages_df
5454

5555

56-
def get_cluster_variances(adata_ref, cluster_col):
56+
def get_cluster_variances(adata, labels, use_raw=True, layer=None):
5757
"""
58-
:param adata_ref: AnnData object of reference single-cell dataset
59-
:param cluster_col: Name of adata_ref.obs column containing cluster labels
60-
:returns: pd.DataFrame of within cluster variance of each gene
58+
Compute variance of each gene in each cluster
59+
60+
Parameters
61+
----------
62+
63+
labels
64+
Name of adata.obs column containing cluster labels
65+
use_raw
66+
Use raw slow in adata.
67+
layer
68+
Use layer in adata, provide layer name.
69+
70+
Returns
71+
-------
72+
pd.DataFrame of within cluster variance of each gene
6173
"""
62-
if not adata_ref.raw:
63-
raise ValueError("AnnData object has no raw data")
64-
if sum(adata_ref.obs.columns == cluster_col) != 1:
65-
raise ValueError("cluster_col is absent in adata_ref.obs or not unique")
74+
if layer is not None:
75+
x = adata.layers[layer]
76+
var_names = adata.var_names
77+
else:
78+
if not use_raw:
79+
x = adata.X
80+
var_names = adata.var_names
81+
else:
82+
if not adata.raw:
83+
raise ValueError("AnnData object has no raw data, change `use_raw=True, layer=None` or fix your object")
84+
x = adata.raw.X
85+
var_names = adata.raw.var_names
86+
87+
if sum(adata.obs.columns == labels) != 1:
88+
raise ValueError("`labels` is absent in adata.obs or not unique")
6689

67-
all_clusters = np.unique(adata_ref.obs[cluster_col])
68-
var_mat = np.zeros((1, adata_ref.raw.X.shape[1]))
90+
all_clusters = np.unique(adata.obs[labels])
91+
var_mat = np.zeros((1, x.shape[1]))
6992

7093
for c in all_clusters:
71-
sparse_subset = csr_matrix(adata_ref.raw.X[np.isin(adata_ref.obs[cluster_col], c), :])
94+
sparse_subset = csr_matrix(x[np.isin(adata.obs[labels], c), :])
7295
c = sparse_subset.copy()
7396
c.data **= 2
7497
var = c.mean(0) - (np.array(sparse_subset.mean(0)) ** 2)
7598
del c
7699
var_mat = np.concatenate((var_mat, var))
77100
var_mat = var_mat[1:, :].T
78-
var_df = pd.DataFrame(data=var_mat, index=adata_ref.raw.var_names, columns=all_clusters)
101+
var_df = pd.DataFrame(data=var_mat, index=var_names, columns=all_clusters)
79102

80103
return var_df
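
The loop in `get_cluster_variances` relies on the identity Var(x) = E[x²] − (E[x])², evaluated per gene within each cluster, which gives the population variance (no `n − 1` correction). The sketch below reproduces the same computation on a small dense NumPy array so the result can be checked against `np.var`. It is an illustration under simplifying assumptions (dense input, made-up numbers, a hypothetical function name), not the package implementation, which operates on `csr_matrix` subsets.

```python
import numpy as np


def cluster_variances(x, labels):
    """Per-gene population variance within each cluster via E[x^2] - (E[x])^2.

    x: 2D array, cells in rows and genes in columns.
    labels: 1D array with one cluster label per cell.
    Returns (clusters, variances) where variances has shape (n_genes, n_clusters).
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    variances = np.empty((x.shape[1], clusters.size))
    for j, cluster in enumerate(clusters):
        subset = x[labels == cluster]
        variances[:, j] = (subset ** 2).mean(axis=0) - subset.mean(axis=0) ** 2
    return clusters, variances


# Toy counts: 4 cells, 2 genes, 2 clusters (values invented for illustration).
counts = np.array([[1, 0], [3, 0], [2, 4], [2, 8]])
cell_labels = np.array(["a", "a", "b", "b"])
clusters, variances = cluster_variances(counts, cell_labels)
print(clusters)
print(variances)
```

Squaring only the stored non-zero entries, as the source does with `c.data **= 2`, is what makes this identity convenient for sparse count matrices: zeros stay zero and the matrix never has to be densified.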
81104
