
Commit 3f7a580

Merge branch 'master' into vitkl-obs-exclusion

2 parents 4335d1b + 18036ef

38 files changed: 1577 additions & 1491 deletions

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 4 additions & 0 deletions

```diff
@@ -3,6 +3,10 @@ contact_links:
   - name: scverse Discourse
     url: https://discourse.scverse.org/c/ecosytem/cell2location/
     about: Ask usage questions, how to solve your problems using cell2location and other scvi-tools packages.
+
+  - name: Frequently asked questions
+    url: https://github.com/BayraktarLab/cell2location/issues?q=is%3Aissue+is%3Aopen+label%3AFAQ
+    about: Before asking a question, please check this list (issues with the FAQ tag).

   - name: cell2location Community Discussions [deprecated]
     url: https://discourse.scverse.org/c/ecosytem/cell2location/
```

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions

```diff
@@ -1,14 +1,14 @@
 repos:
-  - repo: https://github.com/python/black
-    rev: '22.3.0'
+  - repo: https://github.com/psf/black
+    rev: '23.3.0'
     hooks:
       - id: black
-  - repo: https://gitlab.com/pycqa/flake8
-    rev: 3.8.4
+  - repo: https://github.com/PyCQA/flake8
+    rev: 6.0.0
     hooks:
       - id: flake8
   - repo: https://github.com/pycqa/isort
-    rev: 5.7.0
+    rev: 5.12.0
     hooks:
       - id: isort
         name: isort (python)
```

.readthedocs.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -13,7 +13,7 @@ conda:
   environment: docs/environment.yml

 python:
-  version: "3.7"
+  version: "3.8"
   install:
     - method: pip
       path: .
```

README.md

Lines changed: 50 additions & 1 deletion

````diff
@@ -7,6 +7,7 @@
 [![Stars](https://img.shields.io/github/stars/BayraktarLab/cell2location?logo=GitHub&color=yellow)](https://github.com/BayraktarLab/cell2location/stargazers)
 ![Build Status](https://github.com/BayraktarLab/cell2location/actions/workflows/test.yml/badge.svg?event=push)
 [![Documentation Status](https://readthedocs.org/projects/cell2location/badge/?version=latest)](https://cell2location.readthedocs.io/en/stable/?badge=latest)
+[![Downloads](https://pepy.tech/badge/cell2location)](https://pepy.tech/project/cell2location)
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb)
 [![Docker image on quay.io](https://img.shields.io/badge/container-quay.io/vitkl/cell2location-brightgreen "Docker image on quay.io")](https://quay.io/vitkl/cell2location)
@@ -42,7 +43,7 @@ Create conda environment and install `cell2location` package
 conda create -y -n cell2loc_env python=3.9

 conda activate cell2loc_env
-pip install git+https://github.com/BayraktarLab/cell2location.git#egg=cell2location[tutorials]
+pip install cell2location[tutorials]
 ```

 Finally, to use this environment in jupyter notebook, add jupyter kernel for this environment:
@@ -223,3 +224,51 @@ adata_incl_nontissue = read_all_and_qc(
     count_file='raw_feature_bc_matrix.h5',
 )
 ```
+
+Since anndata version 0.9.0 (released 2023-04-11), `AnnData.concatenate()` has been deprecated in favour of `anndata.concat()` as per the official release notes ([Reference](https://anndata.readthedocs.io/en/latest/release-notes/index.html#id4)). Here is the updated `read_all_and_qc`:
+
+```python
+from anndata import concat
+
+def read_all_and_qc(
+    sample_annot, Sample_ID_col, file_col, sp_data_folder,
+    count_file='filtered_feature_bc_matrix.h5',
+):
+    """Read and concatenate all Visium files."""
+    # read all samples and store them in a list
+    adatas = []
+    for i, s in enumerate(sample_annot[Sample_ID_col]):
+        adata_i = read_and_qc(s, sample_annot[file_col][i], path=sp_data_folder)
+        adatas.append(adata_i)
+    # combine individual samples
+    adata = concat(
+        adatas,
+        merge="unique",
+        uns_merge="unique",
+        label="batch",
+        keys=sample_annot[Sample_ID_col].tolist(),
+        index_unique=None,
+    )
+
+    sample_annot.index = sample_annot[Sample_ID_col]
+    for c in sample_annot.columns:
+        sample_annot.loc[:, c] = sample_annot[c].astype(str)
+    adata.obs[sample_annot.columns] = sample_annot.reindex(index=adata.obs['sample']).values
+
+    return adata
+
+adata_vis = read_all_and_qc(
+    sample_annot=sample_annot,
+    Sample_ID_col='Sample_ID',
+    file_col='file',
+    sp_data_folder=sp_data_folder,
+    count_file='filtered_feature_bc_matrix.h5',
+)
+
+cell2location.models.Cell2location.setup_anndata(
+    adata=adata_vis,
+    batch_key="batch",
+)
+```
````
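The batch labelling that `anndata.concat(label="batch", keys=...)` performs can be illustrated with a plain-pandas analogue. This is a sketch with hypothetical toy tables, not cell2location code: `pd.concat` with `keys` plays the role of `keys`/`label` in `anndata.concat`, recording which input each row came from.

```python
import pandas as pd

# Toy stand-ins for two per-sample tables; anndata.concat applies the same
# idea to AnnData objects: `keys` supplies per-sample values and
# `label="batch"` records which input each observation came from.
s1 = pd.DataFrame({"counts": [1, 2]}, index=["AAAC", "AAAG"])
s2 = pd.DataFrame({"counts": [3]}, index=["AACC"])

combined = pd.concat([s1, s2], keys=["sample_A", "sample_B"], names=["batch", None])
# move the sample label out of the index, mirroring obs["batch"]
combined = combined.reset_index(level="batch")

print(combined["batch"].tolist())  # ['sample_A', 'sample_A', 'sample_B']
```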

cell2location/__init__.py

Lines changed: 38 additions & 4 deletions

```diff
@@ -1,17 +1,51 @@
+import logging
+
 from pyro.distributions import constraints
 from pyro.distributions.transforms import SoftplusTransform
+from rich.console import Console
+from rich.logging import RichHandler
 from torch.distributions import biject_to, transform_to

 from . import models
+from .cell_comm.around_target import compute_weighted_average_around_target
 from .run_colocation import run_colocation

-__all__ = [
-    "models",
-    "run_colocation",
-]
+# https://github.com/python-poetry/poetry/pull/2366#issuecomment-652418094
+# https://github.com/python-poetry/poetry/issues/144#issuecomment-623927302
+try:
+    import importlib.metadata as importlib_metadata
+except ModuleNotFoundError:
+    import importlib_metadata


+# define custom distribution constraints
 @biject_to.register(constraints.positive)
 @transform_to.register(constraints.positive)
 def _transform_to_positive(constraint):
     return SoftplusTransform()
+
+
+package_name = "cell2location"
+__version__ = importlib_metadata.version(package_name)
+
+logger = logging.getLogger(__name__)
+# set the logging level
+logger.setLevel(logging.INFO)
+
+# nice logging outputs
+console = Console(force_terminal=True)
+if console.is_jupyter is True:
+    console.is_jupyter = False
+ch = RichHandler(show_path=False, console=console, show_time=False)
+formatter = logging.Formatter("cell2location: %(message)s")
+ch.setFormatter(formatter)
+logger.addHandler(ch)
+
+# this prevents double outputs
+logger.propagate = False
+
+__all__ = [
+    "models",
+    "run_colocation",
+    "compute_weighted_average_around_target",
+]
```
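The logging setup in this diff can be reproduced with the standard library alone. Below is a minimal sketch that swaps `rich.logging.RichHandler` for a plain `logging.StreamHandler`; the logger name is hypothetical, and only the formatter prefix and the `propagate = False` trick mirror the diff itself.

```python
import logging
import sys

# Version lookup with the same fallback as the diff: importlib.metadata is
# stdlib on Python >= 3.8, the importlib_metadata backport covers older
# interpreters.
try:
    import importlib.metadata as importlib_metadata
except ModuleNotFoundError:
    import importlib_metadata

logger = logging.getLogger("cell2location_sketch")  # hypothetical name
logger.setLevel(logging.INFO)

ch = logging.StreamHandler(sys.stderr)  # stands in for RichHandler
ch.setFormatter(logging.Formatter("cell2location: %(message)s"))
logger.addHandler(ch)

# this prevents double outputs via the root logger's handlers
logger.propagate = False

logger.info("training started")  # emits "cell2location: training started"
```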

cell2location/cell_comm/__init__.py

Whitespace-only changes.

cell2location/cell_comm/around_target.py (new file)

Lines changed: 215 additions & 0 deletions

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cdist


def compute_weighted_average_around_target(
    adata,
    target_cell_type_quantile: float = 0.995,
    source_cell_type_quantile: float = 0.95,
    normalisation_quantile: float = 0.95,
    distance_bin: list = None,
    sample_key: str = "sample",
    genes_to_use_as_source: list = None,
    gene_symbols: str = None,
    obsm_spatial_key: str = "X_spatial",
    normalisation_key: str = None,
    layer: str = None,
    cell_abundance_key: str = "cell_abundance_w_sf",
    cell_abundance_quantile_key: str = "q05",
):
    """
    Compute the average abundance of source cell types or genes around each target cell type.

    Parameters
    ----------
    adata
        AnnData object of a spatial dataset with cell2location results.
    target_cell_type_quantile
        Quantile of target cell type abundance used to define the set of locations
        with the highest abundance of target cell types. Cell abundance below this
        threshold is set to 0.
    source_cell_type_quantile
        Quantile of source cell type abundance used to define the set of locations
        with the highest abundance of source cell types. Cell abundance (or RNA
        abundance for genes) below this threshold is set to 0.
    normalisation_quantile
        Quantile of source cell type or source RNA abundance used as a normalising
        constant. This step can be seen as scaling that puts all source cell types
        or genes on the same scale.
    distance_bin
        If using concentric bins, a list with two elements specifying the inner and
        outer edge of the bin. Distances are specified in the coordinates of
        `obsm_spatial_key`.
    sample_key
        `adata.obs` column key specifying distinct sections across which distance
        bin computation is invalid.
    genes_to_use_as_source
        To request RNA abundance of genes around target cells, provide a list of
        var_names or gene SYMBOLs.
    gene_symbols
        `adata.var` column key containing gene symbols.
    obsm_spatial_key
        `adata.obsm` key containing spatial coordinates (2D, 3D or N-D).
    normalisation_key
        RNA abundance must be normalised using the y_s technical effect term
        estimated by cell2location. Provide the `adata.obsm` key containing this
        normalisation term.
    layer
        `adata.layers` key to use for RNA abundance. Default: `adata.X`.
    cell_abundance_key
        Which cell2location variable to use as cell abundance.
    cell_abundance_quantile_key
        Which quantile of cell abundance to use.

    Returns
    -------
    pd.DataFrame of the average abundance of source cell types (or RNA abundance
    of requested genes) around target cell types.
    """
    # save initial names
    if genes_to_use_as_source is None:
        source_names = adata.uns["mod"]["factor_names"]
    else:
        source_names = genes_to_use_as_source

    cell_abundance_key_ = cell_abundance_quantile_key + cell_abundance_key
    cell_abundance_key = cell_abundance_quantile_key + "_" + cell_abundance_key

    # create result data frame to be completed
    weighted_avg = pd.DataFrame(
        index=[f"target {ct}" for ct in adata.uns["mod"]["factor_names"]],
        columns=source_names,
    )
    if genes_to_use_as_source is None:
        # pick locations where source cell type abundance is above source_cell_type_quantile
        source_cell_type_filter = adata.obsm[cell_abundance_key] > adata.obsm[cell_abundance_key].quantile(
            source_cell_type_quantile
        )
        # zero out source cell abundance below the selected quantile
        source_cell_type_data = adata.obsm[cell_abundance_key] * source_cell_type_filter
        # get normalising quantile values
        source_normalisation_quantile = adata.obsm[cell_abundance_key].quantile(normalisation_quantile, axis=0)
        # compute average abundance above this quantile
        source_normalisation_quantile = np.average(
            adata.obsm[cell_abundance_key],
            weights=adata.obsm[cell_abundance_key] > source_normalisation_quantile,
            axis=0,
        )
    else:
        # if using gene symbols, get var names
        if gene_symbols is not None:
            genes_to_use_as_source = adata.var_names[adata.var[gene_symbols].isin(genes_to_use_as_source)]
        # get RNA abundance data
        if layer is None:
            source_cell_type_data = adata[:, genes_to_use_as_source].X.toarray()
        else:
            source_cell_type_data = adata[:, genes_to_use_as_source].layers[layer].toarray()
        # apply technical across-location normalisation
        if normalisation_key:
            source_cell_type_data = source_cell_type_data / adata.obsm[normalisation_key]
        # pick locations where source RNA abundance is above source_cell_type_quantile
        source_cell_type_filter = source_cell_type_data > np.quantile(
            source_cell_type_data, q=source_cell_type_quantile, axis=0
        )
        # zero out source RNA abundance below the selected quantile
        source_cell_type_data = source_cell_type_data * source_cell_type_filter
        # create a dataframe with initial source RNA abundance
        source_cell_type_data = pd.DataFrame(
            source_cell_type_data,
            index=adata.obs_names,
            columns=source_names,
        )
        # get normalising quantile values
        source_normalisation_quantile = source_cell_type_data.quantile(normalisation_quantile, axis=0)
        # compute average abundance above this quantile
        source_normalisation_quantile = np.average(
            source_cell_type_data,
            weights=source_cell_type_data > source_normalisation_quantile,
            axis=0,
        )

    # [optional] compute average source_cell_type_data across the closest locations (concentric circles)
    if distance_bin is not None:
        # iterate over samples of connected locations from the same sections
        # or independent chunks of registered 3D data
        for s in adata.obs[sample_key].unique():
            # get sample observations
            sample_ind = adata.obs[sample_key].isin([s])

            # compute distances between locations
            distances = cdist(adata[sample_ind, :].obsm[obsm_spatial_key], adata[sample_ind, :].obsm[obsm_spatial_key])
            # select locations in the distance bin
            binary_distance = csr_matrix((distances > distance_bin[0]) & (distances <= distance_bin[1]))
            # compute average abundance across locations within the bin
            data_ = (
                (binary_distance @ csr_matrix(source_cell_type_data.loc[sample_ind, :].values))
                .multiply(1 / binary_distance.sum(1))
                .toarray()
            )
            # account for locations with no neighbours within the bin (sum == 0)
            data_[np.isnan(data_)] = 0
            # complete the average for a given sample
            source_cell_type_data.loc[sample_ind, :] = data_
    # normalise data by the normalising quantile (global value across distance bins)
    source_cell_type_data = source_cell_type_data / source_normalisation_quantile
    # account for cases of undetected signal
    source_cell_type_data[source_cell_type_data.isna()] = 0

    # compute the average for each target cell type
    for ct in adata.uns["mod"]["factor_names"]:
        # find locations containing a high abundance of the target cell type
        target_cell_type_filter = adata.obsm[cell_abundance_key][f"{cell_abundance_key_}_{ct}"] > adata.obsm[
            cell_abundance_key
        ][f"{cell_abundance_key_}_{ct}"].quantile(target_cell_type_quantile)
        # use the thresholded abundance of the target cell type as a weight
        weights = adata.obsm[cell_abundance_key][f"{cell_abundance_key_}_{ct}"] * target_cell_type_filter
        # normalise for target cell type abundance
        target_quantile = adata.obsm[cell_abundance_key][f"{cell_abundance_key_}_{ct}"].quantile(normalisation_quantile)
        target_quantile = np.average(
            adata.obsm[cell_abundance_key][f"{cell_abundance_key_}_{ct}"].values,
            weights=adata.obsm[cell_abundance_key][f"{cell_abundance_key_}_{ct}"].values > target_quantile,
        ).flatten()
        assert target_quantile.shape == (1,), target_quantile.shape
        weights = weights / target_quantile
        # compute the final weighted average
        weighted_avg_ = np.average(
            source_cell_type_data,
            weights=weights,
            axis=0,
        )
        weighted_avg_ = pd.Series(weighted_avg_, name=ct, index=source_names)

        # hack to make self-interactions less apparent
        weighted_avg_[ct] = weighted_avg_[~weighted_avg_.index.isin([ct])].max() + 0.02
        # complete the results dataframe
        weighted_avg.loc[f"target {ct}", :] = weighted_avg_

    return weighted_avg.astype("float32")
```
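The quantile logic in `compute_weighted_average_around_target` repeats one core pattern: threshold values at a quantile, then use the boolean mask as weights in `np.average` to obtain the mean of the top tail, which serves as a normalising constant. A self-contained numpy sketch with made-up abundances (not cell2location data):

```python
import numpy as np

# hypothetical per-location abundances of one source cell type
abundance = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# zero out abundance below the source quantile (cf. source_cell_type_filter)
keep = abundance > np.quantile(abundance, 0.5)   # median = 3.0
thresholded = abundance * keep                   # [0, 0, 0, 4, 10]

# normalising constant: mean of the values above the normalisation quantile,
# computed via np.average with a boolean mask as weights
q = np.quantile(abundance, 0.8)                  # 5.2
norm = np.average(abundance, weights=abundance > q)  # only 10.0 survives

scaled = thresholded / norm                      # [0, 0, 0, 0.4, 1.0]
```

Using the mean of the top tail rather than the raw quantile makes the constant robust to how sharply abundance peaks above the cutoff.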
```python
def melt_data_frame_per_signal(weighted_avg_dict, source_var, distance_bins):
    source_var_1 = pd.DataFrame(
        np.array([weighted_avg_dict[str(distance_bin)][source_var].values for distance_bin in distance_bins]),
        columns=weighted_avg_dict[str(distance_bins[0])].index,
        index=[np.mean(distance_bin) for distance_bin in distance_bins],
    ).T

    source_var_1 = source_var_1.melt(
        value_name="Abundance",
        var_name="Distance bin",
        ignore_index=False,
    )
    source_var_1["Target"] = source_var_1.index
    source_var_1["Signal"] = source_var
    return source_var_1


def melt_signal_target_data_frame(weighted_avg_dict, distance_bins):
    source_vars = weighted_avg_dict[str(distance_bins[0])].columns

    return pd.concat(
        [melt_data_frame_per_signal(weighted_avg_dict, source_var, distance_bins) for source_var in source_vars]
    )
```
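The concentric-bin averaging inside `compute_weighted_average_around_target` reduces to a sparse matrix product: a binary neighbour matrix times the data, divided by each row's neighbour count. A minimal sketch with three hypothetical locations on a line (toy coordinates and values, not cell2location data):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cdist

# three locations at x = 0, 1, 2 and one signal value per location
coords = np.array([[0.0], [1.0], [2.0]])
values = np.array([[1.0], [2.0], [3.0]])

# binary matrix of neighbours falling inside the (0.5, 1.5] distance bin
distances = cdist(coords, coords)
binary_distance = csr_matrix((distances > 0.5) & (distances <= 1.5))

# average the signal over each location's in-bin neighbours
data_ = (
    (binary_distance @ csr_matrix(values))
    .multiply(1 / binary_distance.sum(1))
    .toarray()
)
# locations with no in-bin neighbours divide by zero; zero them out,
# as the original function does
data_[np.isnan(data_)] = 0

print(data_.ravel())  # [2. 2. 2.]
```

The middle location averages its two neighbours (1 and 3), the end locations each see only the middle one (2), so every entry comes out as 2.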
