Releases: CellProfiling/pipex
PIPEX 2.0
Changelog
Version 2.0
Segmentation
{marker}_triangle_score continuous score
Signed distance of each cell's normalized mean intensity from the global Triangle threshold, computed per marker. Follows the same scale and sign convention as the existing _otsu3 score and is stored as a new column in cell_data.csv. Particularly useful for markers with strongly right-skewed distributions, where the Triangle method tends to place a higher threshold than Otsu; comparing the two scores can flag markers that are genuinely difficult to threshold.
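To make the score concrete, here is a minimal pure-Python sketch under stated assumptions: intensities are already normalized to [0, 1], the threshold is computed on a 256-bin histogram with the textbook Triangle construction (a line from the histogram peak to the far end of the longer tail, with the threshold at the bin lying farthest below that line), and the score is the plain difference between a cell's mean and the threshold. PIPEX's actual normalization and the scaling that matches _otsu3 are not reproduced, and triangle_threshold / triangle_scores are hypothetical names.

```python
def triangle_threshold(values, nbins=256):
    """Triangle threshold for intensities normalized to [0, 1]."""
    hist = [0] * nbins
    for v in values:
        hist[min(max(int(v * nbins), 0), nbins - 1)] += 1
    nonzero = [i for i, h in enumerate(hist) if h]
    low, high = nonzero[0], nonzero[-1]
    peak = max(range(nbins), key=hist.__getitem__)
    height = hist[peak]
    # The tail end is whichever non-empty extreme lies farther from the peak.
    end = high if (high - peak) > (peak - low) else low
    width = abs(peak - end)
    if width == 0:  # all values fall into a single bin
        return (peak + 0.5) / nbins
    step = 1 if peak > end else -1
    best_bin, best_dist = end, float("-inf")
    for t in range(width):
        i = end + step * t
        # Proportional to the perpendicular distance below the line
        # running from (end, 0) to (peak, height).
        dist = height * t - width * hist[i]
        if dist > best_dist:
            best_bin, best_dist = i, dist
    return (best_bin + 0.5) / nbins  # centre of the chosen bin


def triangle_scores(cell_means, values):
    """Signed distance of each cell mean from the Triangle threshold of `values`."""
    threshold = triangle_threshold(values)
    return [m - threshold for m in cell_means]
```

Positive scores mean the cell sits above the threshold, negative scores below it. On NumPy arrays, skimage.filters.threshold_triangle computes the same kind of threshold.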
{marker}_gmm_prob continuous score
Posterior probability (0–1) that a cell belongs to the positive population, derived from a two-component Gaussian Mixture Model fit to the normalized marker image. Unlike the distance-based scores, this captures the shape and width of both populations, so a cell sitting at the boundary gets ~0.5 regardless of how spread out the distributions are. Stored as a new column in cell_data.csv; set to NaN when the GMM fit is not trusted (see -gmm_min_separation).
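The posterior is Bayes' rule over the two fitted components. The sketch below fits a 1-D two-component mixture with plain expectation-maximization (a library such as scikit-learn's GaussianMixture would normally do this) and evaluates P(positive | x), treating the higher-mean component as the positive one. It is an illustration under assumptions, not PIPEX's code: the initialization, iteration count and variance floor are arbitrary choices, and fit_gmm2 / gmm_prob are hypothetical names.

```python
import math


def _log_weighted_pdf(x, w, mu, var):
    """log(w * Normal(x; mu, var))."""
    return math.log(w) - (x - mu) ** 2 / (2 * var) - 0.5 * math.log(2 * math.pi * var)


def _posterior_upper(x, w, mu, var):
    """P(component 1 | x), computed in log space so it stays finite when densities underflow."""
    d = _log_weighted_pdf(x, w[0], mu[0], var[0]) - _log_weighted_pdf(x, w[1], mu[1], var[1])
    if d > 0:  # 1 / (1 + e^d), rearranged to avoid overflow
        e = math.exp(-d)
        return e / (1 + e)
    return 1 / (1 + math.exp(d))


def fit_gmm2(xs, iters=200, min_var=1e-6):
    """Fit a 1-D two-component Gaussian mixture by expectation-maximization."""
    xs = sorted(xs)
    n, half = len(xs), len(xs) // 2
    # Initialise the means from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    mean = sum(xs) / n
    v = max(sum((x - mean) ** 2 for x in xs) / n, min_var)
    w, var = [0.5, 0.5], [v, v]
    for _ in range(iters):
        # E-step: responsibility of the upper component for every value.
        resp = [_posterior_upper(x, w, mu, var) for x in xs]
        # M-step: re-estimate weights, means and (floored) variances.
        r1 = max(sum(resp), 1e-12)
        r0 = max(n - sum(resp), 1e-12)
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / r0,
              sum(r * x for r, x in zip(resp, xs)) / r1]
        var = [max(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / r0, min_var),
               max(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / r1, min_var)]
        w = [r0 / n, r1 / n]
    if mu[0] > mu[1]:  # make component 1 the higher-intensity ("positive") one
        w, mu, var = w[::-1], mu[::-1], var[::-1]
    return w, mu, var


def gmm_prob(x, params):
    """Posterior probability that value x belongs to the positive component."""
    w, mu, var = params
    return _posterior_upper(x, w, mu, var)
```

With two equally weighted, equally wide components, a value halfway between the means gets a posterior of 0.5, which is the boundary behaviour described above.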
-gmm_min_separation parameter
Controls the minimum separation between the two GMM components (in combined standard deviation units) required to trust the fit. When the marker distribution is not clearly bimodal and the separation falls below this value, _gmm_prob is set to NaN for all cells of that marker rather than reporting a potentially misleading probability. Default is 0.5; increase it to demand a cleaner bimodal shape before trusting the score.
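A minimal sketch of the gating rule. The notes do not define "combined standard deviation", so this assumes the separation is |mu1 - mu0| / sqrt(var0 + var1); a pooled form such as sqrt((var0 + var1) / 2) would be an equally plausible reading. component_separation and gate_gmm_probs are hypothetical names.

```python
import math


def component_separation(mu, var):
    """|mu1 - mu0| in units of the combined SD, assumed here to be sqrt(var0 + var1)."""
    return abs(mu[1] - mu[0]) / math.sqrt(var[0] + var[1])


def gate_gmm_probs(probs, mu, var, min_separation=0.5):
    """Keep the per-cell probabilities only if the two components are separated enough."""
    if component_separation(mu, var) < min_separation:
        return [math.nan] * len(probs)  # untrusted fit: NaN for every cell of the marker
    return list(probs)
```

Raising min_separation makes the gate stricter, so only markers with a clearly bimodal intensity distribution keep their probabilities.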
Preprocessing
-tophat_radius parameter
Applies a morphological top-hat background subtraction to the marker image before segmentation, using a circular structuring element of the specified radius. This removes slowly-varying background signal such as autofluorescence and uneven staining while preserving cell-scale structures. Set to 0 (default) to disable; a good starting point is roughly 1.5× the expected cell radius in pixels.
Analysis
-leiden_res parameter
Exposes the Leiden clustering resolution as a user-configurable parameter. Higher values produce more, finer-grained clusters; lower values produce fewer, broader ones. The typical range for spatial proteomics panels of 10–40 markers is 0.1–2.0, with 0.3–0.5 recommended as a starting point. Default is 0.5.
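Why the resolution changes the number of clusters can be seen from the quality function. Assuming the Leiden step optimizes modularity with a resolution parameter gamma (as in leidenalg's RBConfigurationVertexPartition), the score of a partition is Q = sum over communities c of [e_c / m - gamma * (K_c / (2m))^2], where m is the number of edges, e_c the number of edges inside community c and K_c the summed degree of its nodes. A larger gamma penalizes big communities more, so splitting pays off sooner. The toy sketch below only evaluates Q for fixed partitions of a small graph; it does not implement Leiden itself, and modularity is a hypothetical helper.

```python
def modularity(edges, partition, gamma=1.0):
    """Q = sum_c [e_c / m - gamma * (K_c / 2m)^2] for an undirected, unweighted graph."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        for node in (u, v):
            c = partition[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1  # each endpoint adds 1 to K_c
        if partition[u] == partition[v]:
            c = partition[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - gamma * (k / (2 * m)) ** 2
               for c, k in degree_sum.items())
```

For two triangles joined by a single edge, keeping everything in one community scores higher at gamma = 0.1, while splitting into the two triangles scores higher at gamma = 1.0 (the crossover is at gamma = 2/7).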
-k_estimation parameter
When kmeans clustering is enabled, runs one fit per k value (k = 1 to 19) and produces three diagnostic plots (distortion for the elbow method, inertia, and silhouette score per k) to help choose the number of clusters before committing to a final run. Higher silhouette scores indicate better-separated clusters.
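The sweep can be sketched without dependencies as below. Assumptions: "distortion" is taken as the mean Euclidean distance from each point to its assigned centre and "inertia" as the sum of squared distances (the conventions of scipy.cluster.vq.kmeans and scikit-learn's KMeans.inertia_, respectively); centres are initialized by farthest-point traversal rather than k-means++ to keep the example deterministic; and the silhouette is left undefined (None) unless the number of clusters is between 2 and n - 1, which excludes k = 1. Function names are hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def assign(cs):
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster ends up empty.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(centers), centers


def silhouette(points, labels):
    """Mean silhouette coefficient (points in singleton clusters score 0)."""
    clusters = set(labels)
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(own) / len(own)
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
                / labels.count(c)
                for c in clusters if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def k_sweep(points, k_max=19):
    """Return (k, distortion, inertia, silhouette) for k = 1..k_max."""
    rows = []
    for k in range(1, min(k_max, len(points)) + 1):
        labels, centers = kmeans(points, k)
        d = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
        n_clusters = len(set(labels))
        sil = silhouette(points, labels) if 2 <= n_clusters <= len(points) - 1 else None
        rows.append((k, sum(d) / len(d), sum(x * x for x in d), sil))
    return rows
```

On three well-separated blobs, the silhouette column peaks at k = 3, while distortion and inertia show the elbow at the same k.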
Multiple binarized suffixes via -use_bin
Previously, -use_bin accepted a single column suffix and substituted it for all marker intensity inputs to clustering. It now accepts a comma-separated list of suffixes, applying each one to its corresponding marker. This allows mixing score types across markers in the same analysis run.
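One plausible way to map the list onto markers is sketched below. Column names follow the {marker}_suffix pattern used by the continuous scores; the fallback that repeats a single suffix for every marker (the old behaviour) and the error on a length mismatch are assumptions, and bin_columns and the marker names are hypothetical.

```python
def bin_columns(markers, use_bin):
    """Map each marker to '<marker><suffix>' from a comma-separated suffix list."""
    suffixes = [s.strip() for s in use_bin.split(",") if s.strip()]
    if len(suffixes) == 1:
        # Assumed backward-compatible case: one suffix applies to every marker.
        suffixes = suffixes * len(markers)
    if len(suffixes) != len(markers):
        raise ValueError(f"expected {len(markers)} suffixes, got {len(suffixes)}")
    return [m + s for m, s in zip(markers, suffixes)]
```

For example, markers CD3, CD8 and PANCK with "_gmm_prob,_otsu3,_triangle_score" would read CD3_gmm_prob, CD8_otsu3 and PANCK_triangle_score.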
Multiple parallel cluster refinements
The cluster refinement step (cell_types.csv) now supports multiple independent refinements in a single run by assigning distinct ref_id values to groups of rules. Each group produces its own output column and JSON report, runs in parallel against the same clustering results, and does not filter the output of previous groups. A typical use is a first strict pass for well-defined populations and a second looser pass for ambiguous clusters.
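The key property is that every ref_id group reads the same clustering result rather than the previous group's output. The sketch below models a rule as a hypothetical {ref_id, cluster, cell_type} record; the real rule format in cell_types.csv is richer and is not reproduced here, and the output column names are also hypothetical.

```python
from collections import defaultdict


def apply_refinements(cell_clusters, rules, default="unassigned"):
    """Apply each ref_id group of rules independently to the same cluster labels."""
    groups = defaultdict(dict)
    for rule in rules:
        groups[rule["ref_id"]][rule["cluster"]] = rule["cell_type"]
    columns = {}
    for ref_id, mapping in sorted(groups.items()):
        # Each group reads the original clusters, never another group's output,
        # so the groups could equally well be evaluated concurrently.
        columns[f"cell_type_ref{ref_id}"] = [mapping.get(c, default) for c in cell_clusters]
    return columns
```

A cluster left unassigned by a strict first group can still be labelled by a looser second group, because neither group sees the other's column.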
Neighborhood analysis
New full-featured spatial analysis activated via -neigh_cluster_id. For each k value in -neigh_k_values (default: 1, 5, 10), PIPEX computes the cell type composition of the k nearest neighbors of every cell, producing a heatmap and stacked bar chart per k. Independently, DBSCAN clustering is run on each cell type's spatial coordinates to classify it as scattered sparsely, clustered sparsely, or clustered densely, with the density threshold controlled by -neigh_density_threshold. Results are saved to cell_data.csv and a dedicated spatial distribution CSV. See Annex 5 in the README for full details.
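The neighbour-composition part can be sketched as follows: for each cell, take its k nearest neighbours (excluding the cell itself, an assumption), count their cell types, then average those fractions per cell type to get one heatmap row per type. Brute-force distances keep the example self-contained; a KD-tree such as scipy.spatial.cKDTree would be the usual choice for real data. The DBSCAN classification is not shown, and all names are hypothetical.

```python
import math
from collections import Counter


def neighbor_composition(coords, cell_types, k):
    """Per cell: fraction of each cell type among its k nearest other cells."""
    types = sorted(set(cell_types))
    result = []
    for i, p in enumerate(coords):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)[:k]
        counts = Counter(cell_types[j] for _, j in nearest)
        result.append({t: counts[t] / len(nearest) for t in types})
    return result


def mean_composition_by_type(compositions, cell_types):
    """Average the per-cell compositions of each cell type (one heatmap row per type)."""
    rows = {}
    for comp, t in zip(compositions, cell_types):
        rows.setdefault(t, []).append(comp)
    return {t: {u: sum(c[u] for c in comps) / len(comps) for u in comps[0]}
            for t, comps in rows.items()}
```

Running this once per value in the k list gives one composition table per k, which is what the per-k heatmaps and stacked bar charts summarize.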
Filter masks
LMD export
New output mode for generate_filtered_masks.py that produces an XML cutting file compatible with Leica's Laser Microdissection software. Four parameters control the output geometry: -shape_dilation expands each cell outline by a given number of pixels, -convolution_smoothing controls contour smoothness, -path_optimization selects the cutting path order strategy (none, Hilbert, or greedy), and -distance_heuristic merges nearby shapes into a single cutting group to reduce stage movements.
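As an illustration of the greedy path option, the sketch below orders shapes by repeatedly jumping to the nearest uncut centroid. The starting stage position (0, 0), the use of centroids rather than full contours, and the function names are assumptions; it does not write Leica XML and does not cover the Hilbert option.

```python
import math


def greedy_cut_order(centroids, start=(0.0, 0.0)):
    """Order shapes by always moving to the nearest not-yet-cut centroid."""
    remaining = list(range(len(centroids)))
    order, pos = [], start
    while remaining:
        nxt = min(remaining, key=lambda i: math.dist(pos, centroids[i]))
        order.append(nxt)
        remaining.remove(nxt)
        pos = centroids[nxt]
    return order


def path_length(centroids, order, start=(0.0, 0.0)):
    """Total stage travel when visiting the centroids in the given order."""
    pts = [start] + [centroids[i] for i in order]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
```

Greedy ordering is not optimal in general, but it is cheap and usually cuts stage travel substantially compared with the raw shape order.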
TissUUmaps export
-launch parameter for generate_tissuumaps.py
When set to yes, automatically starts a local HTTP server serving the TissUUmaps_webexport folder after export completes and opens the result in the default browser. The server runs on the first available port starting at 8080 and keeps running until the process is interrupted with Ctrl+C. Requires include_html=yes to have generated the webexport folder first.
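Equivalent behaviour can be reproduced with the Python standard library alone, as in the sketch below. The bind-and-release port probe, the loopback host and the 100-port search window are assumptions; the probe is also racy in principle, since another process could take the port between the check and the server start. first_free_port and serve_folder are hypothetical names.

```python
import functools
import http.server
import socket
import webbrowser


def first_free_port(start=8080, attempts=100, host="127.0.0.1"):
    """Return the first port >= start that can currently be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port in use, try the next one
        return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


def serve_folder(folder, start_port=8080):
    """Serve `folder` over HTTP, open it in the browser, and block until Ctrl+C."""
    port = first_free_port(start_port)
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=folder)
    with http.server.ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        webbrowser.open(f"http://127.0.0.1:{port}/")
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:  # Ctrl+C stops the server cleanly
            pass
```

Calling serve_folder on the exported TissUUmaps_webexport directory would serve it until interrupted; where that directory sits inside the PIPEX output is not specified here.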
Extra scripts
extra/ folder
New folder with standalone Python scripts for post-processing tasks outside the core pipeline. Scripts work directly from cell_data.csv (no images required) and are designed as ready-to-adapt examples.
extra/custom_binarization_example.py
Re-applies marker +/- binarization to an existing cell_data.csv using the continuous scores already computed by segmentation (_gmm_prob, _triangle_score, _otsu3). Useful to re-tune thresholds or regenerate binarization columns without re-running the full pipeline.
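A minimal sketch of such a re-binarization using only the csv module. It assumes score columns named {marker}_gmm_prob, a threshold of 0.5 and a hypothetical output suffix _bin; missing or NaN scores (for example when the GMM fit was rejected) are written as empty cells rather than forced to 0 or 1. It is not the content of extra/custom_binarization_example.py, and rebinarize is a hypothetical name.

```python
import csv
import io
import math


def rebinarize(csv_text, marker, score_suffix="_gmm_prob", threshold=0.5, out_suffix="_bin"):
    """Return the CSV text with a 0/1 column added for one marker."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    score_col, out_col = marker + score_suffix, marker + out_suffix
    for row in rows:
        raw = (row.get(score_col) or "").strip()
        value = float(raw) if raw else math.nan  # empty cell means a missing score
        # Leave the call empty when the score is missing or NaN.
        row[out_col] = "" if math.isnan(value) else int(value >= threshold)
    fieldnames = list(reader.fieldnames)
    if out_col not in fieldnames:
        fieldnames.append(out_col)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For files on disk, read the table with open(path, newline="") and write the returned text back the same way; changing threshold is all that is needed to re-tune a marker.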
extra/match_segmentation_ids.py
Matches cells between two cell_data.csv files from different segmentation runs on the same image (e.g. whole-cell vs. nuclear) by nearest-neighbour spatial proximity. Writes a matched_cell_id column into the target file so the two tables can be joined downstream.
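The matching reduces to a nearest-neighbour lookup between two sets of centroids. The sketch below assumes each table has been reduced to (cell_id, x, y) tuples and adds an optional, hypothetical max_dist cut-off so that cells without a plausible partner stay unmatched; brute force is used for clarity, where scipy.spatial.cKDTree.query would scale better. It illustrates the idea and is not the script's implementation.

```python
import math


def match_ids(source, target, max_dist=None):
    """For each target cell, return the id of the nearest source cell (or None).

    source, target: lists of (cell_id, x, y) tuples.
    """
    matched = []
    for _, tx, ty in target:
        best_id, best_d = None, math.inf
        for sid, sx, sy in source:
            d = math.hypot(tx - sx, ty - sy)
            if d < best_d:
                best_id, best_d = sid, d
        if max_dist is not None and best_d > max_dist:
            best_id = None  # too far away to be the same cell
        matched.append(best_id)
    return matched
```

The returned list lines up with the target rows and can be written out as the matched_cell_id column. Several target cells may map to the same source cell, so one-to-one pairing would need an extra step.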
Documentation
Annex 3: Continuous marker scores
New annex in the README documenting all five per-cell continuous scores (_local_90, _ratio_pixels, _otsu3, _triangle_score, _gmm_prob), including their scale, interpretation, and guidance on which score to use for different marker types and downstream tasks.
Annex 5: Neighborhood cell type analysis
New annex in the README (renumbered from the former Annex 4 slot) providing a full explanation of the neighbor composition analysis and the spatial distribution classification methodology, including the DBSCAN approach and the density threshold definition.
Initial public release
This is the first public release of PIPEX. The software has been running for several years in Emma Lundberg's lab and has gone through several cycles of polishing and feature improvements.