Dear Segger team,
Hi, @EliHei2! Hope you've been well.
I managed to run segger v2 from the "main" branch and got all of the output.
When I inspected "segger_anndata.h5ad", I found that, depending on the Xenium 5k slide, some genes were dropped compared with the 10x default segmentation:
E.g.:
import scanpy as sc
adata = sc.read_h5ad("/project/IBD_CD4/jpark/segger_main/IBD_CD4/RUNfistula/SLIDE1_recover_rare/segger_anndata.h5ad")
bdata = sc.read_h5ad("/project/IBD_CD4/jpark/segger_main/IBD_CD4/RUNfistula/SLIDE1/sections/xenium/section_00_ROI1.h5ad")
Genes in adata: 5093
Genes in bdata: 5100
Overlapping genes: 5076
Only in adata (segger): 17
Only in bdata (xenium): 24
=== Genes dropped by segger (in xenium but not segger) ===
['AHSG', 'AMH', 'CACNG5', 'CMTM5', 'CYP2A6', 'F2', 'FEZF2', 'GDF3', 'HTR1A', 'IGFBP1', 'KLK4', 'LBP', 'MAGEA1', 'MAGEC1', 'MAT1A', 'MUC16', 'MYOD1', 'NTSR2', 'PROKR2', 'RXFP3', 'SLC17A6', 'SRY', 'TBL1Y', 'UPK2']
=== Genes only in segger (not in xenium) ===
['Intergenic_Region_10000', 'Intergenic_Region_14000', 'Intergenic_Region_16000', 'Intergenic_Region_17000', 'Intergenic_Region_21000', 'Intergenic_Region_22000', 'Intergenic_Region_23000', 'Intergenic_Region_24000', 'Intergenic_Region_25000', 'Intergenic_Region_26000', 'Intergenic_Region_27000', 'Intergenic_Region_28000', 'Intergenic_Region_29000', 'Intergenic_Region_4000', 'Intergenic_Region_6000', 'Intergenic_Region_7000', 'Intergenic_Region_9000']
=== QC stats for dropped genes ===
         mean_counts  total_counts  n_cells_by_counts
CYP2A6      0.000004           1.0                  1
KLK4        0.000024           6.0                  6
AMH         0.000024           6.0                  6
F2          0.000024           6.0                  6
CMTM5       0.000032           8.0                  8
MAT1A       0.000044          11.0                 11
AHSG        0.000057          14.0                 14
SLC17A6     0.000065          16.0                 16
FEZF2       0.000065          16.0                 16
IGFBP1      0.000065          16.0                 16
MAGEA1      0.000065          16.0                 15
HTR1A       0.000069          17.0                 17
SRY         0.000073          18.0                 18
UPK2        0.000077          19.0                 19
PROKR2      0.000081          20.0                 20
MYOD1       0.000081          20.0                 19
NTSR2       0.000093          23.0                 21
MAGEC1      0.000101          25.0                 25
TBL1Y       0.000113          28.0                 28
RXFP3       0.000117          29.0                 29
CACNG5      0.000121          30.0                 30
GDF3        0.000146          36.0                 36
LBP         0.000182          45.0                 42
MUC16       0.000218          54.0                 51
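For reference, the gene-set comparison above boils down to set differences on the two gene-name lists. Here is a minimal sketch of that logic; the helper name `compare_gene_sets` and the toy gene names are only for illustration and are not part of segger or scanpy:

```python
def compare_gene_sets(segger_genes, xenium_genes):
    """Return the overlap and both set differences of two gene-name collections."""
    segger = set(segger_genes)
    xenium = set(xenium_genes)
    return {
        "overlap": sorted(segger & xenium),       # genes present in both outputs
        "only_segger": sorted(segger - xenium),   # genes only in the segger output
        "only_xenium": sorted(xenium - segger),   # genes missing from the segger output
    }


# Toy input (made-up names) standing in for the two var_names lists
result = compare_gene_sets(
    ["A", "B", "Intergenic_Region_1"],
    ["A", "B", "SRY"],
)
print(len(result["overlap"]), result["only_segger"], result["only_xenium"])
```

With the objects loaded above, this would be called as `compare_gene_sets(adata.var_names, bdata.var_names)`. The counts are consistent with each other: 5076 + 17 = 5093 genes in adata and 5076 + 24 = 5100 genes in bdata.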
I tried to rescue these dropped genes using the following extra parameters, but the result was the same:
$SEGGER segment \
  -i /project/simmons_hts/shared/07_04_2025_FISTULA_REVISION_XENIUM_PRIME/output-XETG00283__0060232__Region_1__20250402__131909 \
  -o /project/IBD_CD4/jpark/segger_main/IBD_CD4/RUNfistula/SLIDE1_recover_rare \
  --tiling-margin-training 5.0 \
  --tiling-margin-prediction 5.0 \
  --cells-min-counts 1 \
  --transcripts-max-dist 15.0 \
  --transcripts-max-k 10
Since these genes are rare (at most 54 transcripts each across the slide), one might ignore them, but I wonder whether there is any way to force segger v2 to keep all 5100 genes. Or do you think forcing this might sacrifice segmentation accuracy?
Best wishes,
Jun