[main] Gene dropout

Dear Segger team,

Hi @EliHei2! Hope you have been well.
I managed to run segger v2 from the "main" branch and got all the output.

When inspecting `segger_anndata.h5ad`, I found that, depending on the Xenium 5k slide, some genes are dropped compared with the 10x default segmentation.

For example:

`adata = sc.read_h5ad("/project/IBD_CD4/jpark/segger_main/IBD_CD4/RUNfistula/SLIDE1_recover_rare/segger_anndata.h5ad")`
`bdata = sc.read_h5ad("/project/IBD_CD4/jpark/segger_main/IBD_CD4/RUNfistula/SLIDE1/sections/xenium/section_00_ROI1.h5ad")`

Genes in adata:           5093
Genes in bdata:           5100
Overlapping genes:        5076
Only in adata (segger):   17
Only in bdata (xenium):   24

=== Genes dropped by segger (in xenium but not segger) ===
['AHSG', 'AMH', 'CACNG5', 'CMTM5', 'CYP2A6', 'F2', 'FEZF2', 'GDF3', 'HTR1A', 'IGFBP1', 'KLK4', 'LBP', 'MAGEA1', 'MAGEC1', 'MAT1A', 'MUC16', 'MYOD1', 'NTSR2', 'PROKR2', 'RXFP3', 'SLC17A6', 'SRY', 'TBL1Y', 'UPK2']

=== Genes only in segger (not in xenium) ===
['Intergenic_Region_10000', 'Intergenic_Region_14000', 'Intergenic_Region_16000', 'Intergenic_Region_17000', 'Intergenic_Region_21000', 'Intergenic_Region_22000', 'Intergenic_Region_23000', 'Intergenic_Region_24000', 'Intergenic_Region_25000', 'Intergenic_Region_26000', 'Intergenic_Region_27000', 'Intergenic_Region_28000', 'Intergenic_Region_29000', 'Intergenic_Region_4000', 'Intergenic_Region_6000', 'Intergenic_Region_7000', 'Intergenic_Region_9000']
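For reference, the summary above comes down to set operations on the two `var_names` indexes. A minimal sketch, assuming `adata` and `bdata` are loaded as in the two `sc.read_h5ad` calls; `compare_gene_sets` is a hypothetical helper name, not part of segger or scanpy:

```python
def compare_gene_sets(segger_genes, xenium_genes):
    """Summarize the overlap between two collections of gene names."""
    segger, xenium = set(segger_genes), set(xenium_genes)
    return {
        "n_segger": len(segger),
        "n_xenium": len(xenium),
        "overlap": sorted(segger & xenium),
        "only_segger": sorted(segger - xenium),
        "only_xenium": sorted(xenium - segger),
    }


# Toy example; with real data, pass adata.var_names and bdata.var_names.
summary = compare_gene_sets(
    ["ACTB", "CD4", "Intergenic_Region_4000"],
    ["ACTB", "CD4", "SRY"],
)
print(summary["only_xenium"])  # genes present in xenium but missing from segger
```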

=== QC stats for dropped genes ===
         mean_counts  total_counts  n_cells_by_counts
CYP2A6      0.000004           1.0                  1
KLK4        0.000024           6.0                  6
AMH         0.000024           6.0                  6
F2          0.000024           6.0                  6
CMTM5       0.000032           8.0                  8
MAT1A       0.000044          11.0                 11
AHSG        0.000057          14.0                 14
SLC17A6     0.000065          16.0                 16
FEZF2       0.000065          16.0                 16
IGFBP1      0.000065          16.0                 16
MAGEA1      0.000065          16.0                 15
HTR1A       0.000069          17.0                 17
SRY         0.000073          18.0                 18
UPK2        0.000077          19.0                 19
PROKR2      0.000081          20.0                 20
MYOD1       0.000081          20.0                 19
NTSR2       0.000093          23.0                 21
MAGEC1      0.000101          25.0                 25
TBL1Y       0.000113          28.0                 28
RXFP3       0.000117          29.0                 29
CACNG5      0.000121          30.0                 30
GDF3        0.000146          36.0                 36
LBP         0.000182          45.0                 42
MUC16       0.000218          54.0                 51
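The column names in this table match those reported by `sc.pp.calculate_qc_metrics`. A minimal NumPy sketch of how the three columns are defined, assuming a cells x genes count matrix; `gene_qc_stats` is a hypothetical helper and the toy values are made up for illustration:

```python
import numpy as np


def gene_qc_stats(counts, gene_names):
    """Per-gene QC stats for a cells x genes count matrix (dense or scipy sparse)."""
    n_cells = counts.shape[0]
    # Column sums give the total counts per gene.
    total = np.asarray(counts.sum(axis=0)).ravel()
    # Number of cells with at least one count for each gene.
    detected = np.asarray((counts > 0).sum(axis=0)).ravel()
    return {
        gene: {
            "mean_counts": float(total[i]) / n_cells,
            "total_counts": float(total[i]),
            "n_cells_by_counts": int(detected[i]),
        }
        for i, gene in enumerate(gene_names)
    }


# Toy matrix: 4 cells x 2 genes.
toy = np.array([[0, 2], [1, 0], [0, 0], [0, 1]])
stats = gene_qc_stats(toy, ["GENE_A", "GENE_B"])
print(stats["GENE_B"])
```

On the real data, the same helper could be applied to the xenium matrix restricted to the dropped genes, e.g. `bdata[:, dropped].X` with `dropped` being the list of 24 genes above.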

I tried to rescue these dropped genes using the following extra parameters, but the result was the same:

$SEGGER segment \
    -i /project/simmons_hts/shared/07_04_2025_FISTULA_REVISION_XENIUM_PRIME/output-XETG00283__0060232__Region_1__20250402__131909 \
    -o /project/IBD_CD4/jpark/segger_main/IBD_CD4/RUNfistula/SLIDE1_recover_rare \
    --tiling-margin-training 5.0 \
    --tiling-margin-prediction 5.0 \
    --cells-min-counts 1 \
    --transcripts-max-dist 15.0 \
    --transcripts-max-k 10 

Since these transcripts are relatively rare per cell, one might ignore them, but I wonder: is there any way to force segger v2 to keep all 5100 genes? Or do you think forcing this would sacrifice segmentation accuracy?

Best wishes,
Jun


[main] Gene dropout #33

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[main] Gene dropout #33

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions