Commit 85a7b69

Improvements and feature additions for version 2.0
1 parent 16020ee commit 85a7b69

9 files changed

Lines changed: 108 additions & 32 deletions

README.md

Lines changed: 13 additions & 10 deletions
@@ -191,6 +191,7 @@ There are currently available the following commands:
 - `-include_geojson=<yes or no to include cell segmentation as regions>` : example -> -include_geojson=yes. **OBS**: this includes the cell segmentation as regions in the TissUUmaps project. If this is not set, no regions will be included
 - `-compress_geojson=<yes or no to compress geojson regions into pbf>` : example -> -compress_geojson=yes. **OBS**: this includes the cell segmentation regions as a compressed pbf file
 - `-include_html=<yes or no to export html page for sharing the TissUUmaps project on the web>` : example -> -include_html=yes. **OBS**: this includes the html page for sharing the TissUUmaps project on the web. A web server is needed to visualize the exported web page
+- `-launch=<yes or no to automatically open the exported web page in a browser>` : example -> -launch=yes. **OBS**: requires `-include_html=yes`. Starts a local HTTP server on the first available port (starting at 8080) serving the `TissUUmaps_webexport` folder and opens it in the default browser. The server keeps running until the process is stopped with Ctrl+C
 
 *Example 2*: contents of `pipex_batch_list.txt` for the images from *example 1*
 <code>
@@ -291,7 +292,9 @@ If you add the `generate_tissuumaps` command to PIPEX command list a `anndata_Ti
 - Install TissUUmaps (https://tissuumaps.github.io/TissUUmaps-docs/docs/intro/installation.html)
 - Load the `anndata_TissUUmaps.h5ad` file in TissUUmaps
 
-If you add the `include_html=yes` parameter to the `generate_tissuumaps` command, a `TissUUmaps_webexport` folder will be generated in your analysis/downstream sub-folder. You can share this file on a web server, and access it from any web browser.
+If you add the `include_html=yes` parameter to the `generate_tissuumaps` command, a `TissUUmaps_webexport` folder will be generated in your analysis/downstream sub-folder. You can share this folder on a web server, and access it from any web browser.
+
+If you also add `launch=yes`, PIPEX will automatically start a local HTTP server and open the result in your default browser once the export is complete; no separate web server setup is needed. The server runs on the first available port starting at 8080 and can be stopped with `Ctrl+C`.
 
 **NOTE**: TissUUmaps requires your images to be in `TIFF` format and be named exactly as your markers (for example: `DAPI.tif`, `CPEP.tif`, etc...)
 
@@ -475,7 +478,7 @@ Annex 4: Cluster refinement procedure
 
 PIPEX's analysis step includes the possibility to perform multiple refinements of the unsupervised clustering results (leiden and/or kmeans). This can help you with the manual annotation and merging of the clusters automatically discovered.
 
-The idea behind the cluster refinement algorithm is to explore the ranked genes associated to each cluster and try to match them with rules stated by the user. The algorithm then assigns a confidence score per cluster and rule, depending how close its ranked genes are to the rule/s definition/s. Finally, the refinement picks per cluster the annotated cluster with higher confidence (ties are solved by row order).
+The idea behind the cluster refinement algorithm is to explore the ranked genes associated with each cluster and try to match them against rules stated by the user. The algorithm then assigns a confidence score per cluster and rule, depending on how close its ranked genes are to the rule definitions. Finally, the refinement picks per cluster the first annotation whose confidence meets or exceeds its threshold (ties are solved by row order).
 
 To use the cluster refinement, you have to create a `cell_types.csv` file with rows containing the following information:
 - `ref_id`: used as a suffix for the manually annotated cluster name. The final cluster name will be `leiden_ref[ref_id]` or `kmeans_ref[ref_id]`. Each unique `ref_id` group is an independent parallel refinement — it produces its own output column and JSON report, it does not filter the results of a previous ref_id. A typical use is a first ref_id with strict rules (`high` level, higher `min_confidence`) for well-defined populations, and a second ref_id with looser rules to catch remaining ambiguous clusters.
@@ -490,14 +493,14 @@ Here's an example of how a `cell_types.csv` file usually looks:
 <code>
 
 ref_id,cell_group,cell_type,cell_subtype,rank_filter,min_confidence,marker1,rule1,marker2,rule2,marker3,rule3
-1,artifact,fold,unknown,all,10,CBS,high,CHGA,high,AMY2B,high
-1,endocrine,islet,all,positive_only,10,CHGA,high,CPEP,high,AMY2B,low
-1,exocrine,acinar,unknown1,all,10,CBS,high,AMY2B,high
-1,endothelial,vessels,all,positive_only,30,CD31,high,aSMA,high
-1,epithelial,ductal,unknown,all,10,KRT19,high,PANCK,high
-1,immune,potential,artifact,all,10,HLADR,high,NPDC1,high,aSMA,low
-2,immune,potential,artifact,all,0,HLADR,medium,NPDC1,medium
-2,epithelial,ductal,unknown,all,0,KRT19,medium,PANCK,medium
+1,artifact,fold,unknown,all,25,CBS,high,CHGA,high,AMY2B,high
+1,endocrine,islet,all,positive_only,25,CHGA,high,CPEP,high,AMY2B,low
+1,exocrine,acinar,unknown1,all,25,CBS,high,AMY2B,high,,
+1,endothelial,vessels,all,positive_only,30,CD31,high,aSMA,high,,
+1,epithelial,ductal,unknown,all,25,KRT19,high,PANCK,high,,
+1,immune,potential,artifact,all,25,HLADR,high,NPDC1,high,aSMA,low
+2,immune,potential,artifact,all,10,HLADR,medium,NPDC1,medium,,
+2,epithelial,ductal,unknown,all,10,KRT19,medium,PANCK,medium,,
 
 </code>
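The scoring behind these rows is implemented in `analysis.py`'s `refine_clustering`. As a rough illustration only (the function below is a simplified stand-in written for this note, not PIPEX's exact algorithm), a row's (marker, rule) pairs can be matched against a cluster's ranked genes like this:

```python
# Hypothetical, simplified sketch of rule matching: each (marker, level) pair is
# checked against a cluster's ranked genes, and the fraction of satisfied pairs
# stands in for PIPEX's confidence score. Not the exact PIPEX implementation.

def rule_confidence(ranked_genes, rules, top_n=5):
    """ranked_genes: markers ordered by rank score, highest first.
    rules: list of (marker, level) pairs, level in {'high', 'medium', 'low'}."""
    top = set(ranked_genes[:top_n])
    bottom = set(ranked_genes[-top_n:])
    hits = 0
    for marker, level in rules:
        if level == 'high' and marker in top:
            hits += 1
        elif level == 'low' and marker in bottom:
            hits += 1
        elif level == 'medium' and marker not in top and marker not in bottom:
            hits += 1
    return 100 * hits // len(rules)  # integer percentage, like min_confidence

ranked = ['CHGA', 'CPEP', 'KRT19', 'CD31', 'aSMA', 'HLADR', 'CBS', 'AMY2B']
print(rule_confidence(ranked, [('CHGA', 'high'), ('CPEP', 'high'), ('AMY2B', 'low')], top_n=3))  # → 100
```

A cluster ranking CHGA and CPEP at the top and AMY2B at the bottom fully satisfies the islet row above, so it would clear any `min_confidence` up to 100.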

analysis.py

Lines changed: 24 additions & 12 deletions
@@ -332,13 +332,13 @@ def calculate_cluster_info(adata, cluster_type, markers):
         sq.pl.nhood_enrichment(adata, cluster_key=cluster_type, method="single", show=False,
                                save='nhood_enrichment_' + cluster_type + '.jpg')
     except Exception as e:
-        log('Neighborhood calculations failed for cluster ' + cluster_type)
+        log('Neighborhood calculations failed for cluster ' + cluster_type + ': ' + str(e))
 
     try:
         sq.gr.interaction_matrix(adata, cluster_key=cluster_type)
         sq.pl.interaction_matrix(adata, cluster_key=cluster_type, show=False, save='interaction_matrix_' + cluster_type + '.jpg')
     except Exception as e:
-        log('Interaction matrix analysis failed for cluster ' + cluster_type)
+        log('Interaction matrix analysis failed for cluster ' + cluster_type + ': ' + str(e))
 
     try:
         sc.tl.rank_genes_groups(adata, cluster_type, method='t-test')
@@ -350,6 +350,19 @@ def calculate_cluster_info(adata, cluster_type, markers):
                                 save='_' + cluster_type)
 
 
+def _sort_json_keys(obj):
+    if isinstance(obj, dict):
+        def _key(k):
+            try:
+                return (0, int(k), '')
+            except (ValueError, TypeError):
+                return (1, 0, str(k))
+        return {k: _sort_json_keys(v) for k, v in sorted(obj.items(), key=lambda x: _key(x[0]))}
+    if isinstance(obj, list):
+        return [_sort_json_keys(i) for i in obj]
+    return obj
+
+
 def refine_clustering(adata, cluster_type, curr_ref_id, cell_types_ref):
     clustering_merge_data = {}
     clustering_merge_data['scores'] = {}
@@ -406,20 +419,18 @@ def refine_clustering(adata, cluster_type, curr_ref_id, cell_types_ref):
         best_candidate = None
         best_real_confidence = 0
         for curr_cell_type in clustering_merge_data['cell_types'][cluster_id]:
-            if (best_candidate is None or best_candidate['prob'] < curr_cell_type['prob']) and curr_cell_type['prob'] >= int(curr_cell_type['confidence_threshold']):
-                best_candidate = { 'cell_type': curr_cell_type['cell_type'], 'prob' : curr_cell_type['prob'] / 100 } #, 'real_confidence' : '{:.1%}'.format(curr_cell_type['prob'])} # * len(clustering_merge_data['cell_types'][cluster_id])) / 100.0) }
+            if curr_cell_type['prob'] >= int(curr_cell_type['confidence_threshold']):
+                best_candidate = { 'cell_type': curr_cell_type['cell_type'], 'prob' : curr_cell_type['prob'] / 100 }
                 best_real_confidence = curr_cell_type['prob']
+                break
 
         if best_real_confidence > 0:
             clustering_merge_data['candidates'][cluster_id] = best_candidate
             adata.obs.loc[adata.obs[cluster_type + "_ref" + curr_ref_id] == cluster_id, cluster_type + "_ref" + curr_ref_id] = best_candidate['cell_type']
             adata.obs.loc[adata.obs[cluster_type + "_ref" + curr_ref_id + "_p"] == cluster_id, cluster_type + "_ref" + curr_ref_id + "_p"] = '{:.1%}'.format(best_candidate['prob']) #best_candidate['real_confidence'][:-1]
 
-    clustering_merge_data["scores"] = OrderedDict(sorted(clustering_merge_data["scores"].items()))
-    clustering_merge_data["cell_types"] = OrderedDict(sorted(clustering_merge_data["cell_types"].items()))
-    clustering_merge_data["candidates"] = OrderedDict(sorted(clustering_merge_data["candidates"].items()))
     with open(os.path.join(data_folder, 'analysis', 'downstream', 'cell_types_result_' + cluster_type + curr_ref_id + '.json'), 'w') as outfile:
-        json.dump(clustering_merge_data, outfile, indent = 4)
+        json.dump(_sort_json_keys(clustering_merge_data), outfile, indent = 4)
 
 
 #Function to perform different cluster methods
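The loop change above switches the selection from "highest probability that clears its threshold" to "first row that clears its threshold", which is what makes row order in `cell_types.csv` act as a priority. The new behavior in isolation (the flat dict structure here is a simplified assumption for the demo):

```python
def pick_first_match(candidates):
    # New behavior: candidates are tried in CSV row order, and the first one
    # whose probability meets its own confidence threshold wins, so row order
    # is a priority; the old code kept scanning for the highest probability.
    for c in candidates:
        if c['prob'] >= int(c['confidence_threshold']):
            return c['cell_type']
    return None

candidates = [
    {'cell_type': 'islet', 'prob': 40, 'confidence_threshold': '25'},
    {'cell_type': 'acinar', 'prob': 90, 'confidence_threshold': '25'},
]
print(pick_first_match(candidates))  # → islet (the old best-probability logic would pick acinar)
```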
@@ -449,7 +460,7 @@ def clustering(df_norm, markers):
         log("Dataset too big to create spatial plots per marker")
 
     #We calculate PCA, neighbors and UMAP for the anndata
-    sc.pp.pca(adata, n_comps=min(len(markers), 50))
+    sc.pp.pca(adata, n_comps=min(len(markers), 50, adata.n_obs - 1, adata.n_vars - 1))
 
     pca_loadings = adata.varm['PCs']
     loadings_df = pd.DataFrame(pca_loadings, index=adata.var_names, columns=[f'PC{i + 1}' for i in range(pca_loadings.shape[1])])
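The extra `adata.n_obs - 1` / `adata.n_vars - 1` terms keep `n_comps` valid on small datasets, since PCA cannot produce more components than `min(n_obs, n_vars)`. The clamp in isolation:

```python
def clamp_n_comps(n_markers, n_obs, n_vars, cap=50):
    # PCA yields at most min(n_obs, n_vars) components; staying one below that
    # bound (and under the cap) avoids solver errors on tiny test datasets
    return min(n_markers, cap, n_obs - 1, n_vars - 1)

print(clamp_n_comps(n_markers=30, n_obs=12, n_vars=30))  # → 11
```

With only 12 cells, the old `min(len(markers), 50)` would have requested 30 components and failed; the clamp reduces it to 11.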
@@ -474,6 +485,9 @@ def clustering(df_norm, markers):
     sc.pp.neighbors(adata, n_neighbors=num_neighbors)
     log("Neighbors graph calculated")
 
+    sq.gr.spatial_neighbors(adata, coord_type="generic", n_neighs=num_neighbors)
+    log("Spatial neighbors graph calculated")
+
     sc.tl.umap(adata)
     log("UMAP calculated")
     sc.pl.umap(adata, show=False, save='_base')
@@ -583,8 +597,6 @@
     if neigh_cluster_id not in adata.obs:
         adata.obs[neigh_cluster_id] = df_norm[neigh_cluster_id].astype('category')
     try:
-        sq.gr.spatial_neighbors(adata, coord_type="generic", n_neighs=num_neighbors)
-        log("Spatial neighbors graph calculated")
         sq.gr.centrality_scores(adata, neigh_cluster_id)
         sq.pl.centrality_scores(adata, neigh_cluster_id, save=(neigh_cluster_id + "_centrality_scores.jpg"))
         log("Neighborhood centrality scores calculated")
@@ -672,7 +684,7 @@
 def neighborhood_cell_type_analysis(adata, neigh_cluster_id, k_values, density_threshold, data_folder, image_size):
     k_values = sorted(set(k_values))[:3]
     cell_types = adata.obs[neigh_cluster_id].astype(str).values
-    unique_types = sorted(set(cell_types))
+    unique_types = sorted(set(cell_types), key=lambda x: (0, int(x), '') if x.lstrip('-').isdigit() else (1, 0, x))
     n_types = len(unique_types)
     type_to_idx = {t: i for i, t in enumerate(unique_types)}
     cell_type_idx = np.array([type_to_idx[t] for t in cell_types])
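A caveat on sorting labels with a key that returns `int` for numeric strings and the bare `str` otherwise: Python 3 cannot order mixed `int`/`str` keys, so a label set mixing cluster numbers with annotated names would raise `TypeError`. A tuple-shaped key (the same trick `_sort_json_keys` uses) sorts numeric labels first, numerically, and is safe for mixed sets:

```python
def natural_label_key(label):
    # numeric labels sort first and numerically; any other label sorts
    # alphabetically after them, so int and str are never compared directly
    return (0, int(label), '') if label.lstrip('-').isdigit() else (1, 0, label)

print(sorted({'10', '2', 'islet', 'acinar'}, key=natural_label_key))  # → ['2', '10', 'acinar', 'islet']
```

This matters here because `neigh_cluster_id` may point at a refined cluster column whose values are cell-type names rather than numbers.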

changelog.md

Lines changed: 5 additions & 0 deletions
@@ -41,6 +41,11 @@ Changelog
 - **LMD export**
   New output mode for `generate_filtered_masks.py` that produces an XML cutting file compatible with Leica's Laser Microdissection software. Four parameters control the output geometry: `-shape_dilation` expands each cell outline by a given number of pixels, `-convolution_smoothing` controls contour smoothness, `-path_optimization` selects the cutting path order strategy (none, Hilbert, or greedy), and `-distance_heuristic` merges nearby shapes into a single cutting group to reduce stage movements.
 
+#### TissUUmaps export
+
+- **`-launch` parameter for `generate_tissuumaps.py`**
+  When set to `yes`, automatically starts a local HTTP server serving the `TissUUmaps_webexport` folder after export completes and opens the result in the default browser. The server runs on the first available port starting at 8080 and keeps running until the process is interrupted with `Ctrl+C`. Requires `include_html=yes` to have generated the webexport folder first.
+
 #### Extra scripts
 
 - **`extra/` folder**

generate_filtered_masks.py

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 import os
 import argparse
 import datetime
+import matplotlib
+matplotlib.use('Agg')
 import pandas as pd
 import numpy as np
 from skimage.io import imsave

generate_geojson.py

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ def options(argv):
     for marker in markers:
         cell_data["properties"]["measurements"].append({
             "name" : marker,
-            "value" : float(cell[marker])
+            "value" : float(cell[marker]) if pd.notna(cell[marker]) else 0.0
         })
 
     #if cluster_id parameter is selected, add cluster_id and cluster_color
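The guard matters because `float('nan')` survives Python's `json.dump` as the literal token `NaN`, which is not valid JSON and which strict parsers (including browsers' `JSON.parse`) reject. The same guard with only the standard library, `math.isnan` standing in for `pd.notna`:

```python
import json
import math

def safe_float(value):
    # mirror of the pd.notna guard above: NaN (or unconvertible) measurement
    # values become 0.0 so the serialized geojson stays parseable everywhere
    try:
        v = float(value)
    except (TypeError, ValueError):
        return 0.0
    return 0.0 if math.isnan(v) else v

measurements = [{"name": m, "value": safe_float(v)}
                for m, v in [("DAPI", 12.5), ("CPEP", float("nan"))]]
print(json.dumps(measurements))  # → [{"name": "DAPI", "value": 12.5}, {"name": "CPEP", "value": 0.0}]
```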

generate_tissuumaps.py

Lines changed: 40 additions & 1 deletion
@@ -1,12 +1,17 @@
+import os
+os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'
 import scanpy as sc
 import scipy
-import os
 import fnmatch
 import json
 import copy
 import datetime
 import sys
 import argparse
+import threading
+import webbrowser
+import http.server
+import socketserver
 from skimage.measure import approximate_polygon
 import numpy as np
 from pipex_utils import log
@@ -16,6 +21,7 @@
 include_geojson = "no"
 compress_geojson = "no"
 include_html = "no"
+launch = "no"
 
 def find_marker_file(folder, marker):
     """Return the filename of the first image in folder whose name ends with marker."""
@@ -173,6 +179,8 @@ def options(argv):
                         help='compress geojson regions into pbf : example -> -compress_geojson=yes')
     parser.add_argument('--include_html', choices=['yes', 'no'], default='no',
                         help='export html page for web sharing : example -> -include_html=yes')
+    parser.add_argument('--launch', choices=['yes', 'no'], default='no',
+                        help='launch local web server and open browser after export : example -> -launch=yes')
     if not argv:
         parser.print_help()
         sys.exit()
@@ -185,6 +193,7 @@ def options(argv):
     include_geojson = args.include_geojson
     compress_geojson = args.compress_geojson
     include_html = args.include_html
+    launch = args.launch
 
     pidfile_filename = './RUNNING'
     if "PIPEX_WORK" in os.environ:
@@ -199,4 +208,34 @@ def options(argv):
 
     exporting_tissuumaps()
 
+    if launch == "yes":
+        webexport_path = os.path.join(data_folder, 'analysis', 'downstream', 'TissUUmaps_webexport')
+        if not os.path.isdir(webexport_path):
+            print(">>> WARNING: TissUUmaps webexport directory not found, cannot launch", flush=True)
+        else:
+            port = 8080
+            while port < 8200:
+                try:
+                    handler = http.server.SimpleHTTPRequestHandler
+                    handler.log_message = lambda *args: None
+                    httpd = socketserver.TCPServer(("", port), handler)
+                    break
+                except OSError:
+                    port += 1
+            else:
+                print(">>> WARNING: could not find a free port to launch TissUUmaps", flush=True)
+                httpd = None
+            if httpd:
+                os.chdir(webexport_path)
+                thread = threading.Thread(target=httpd.serve_forever, daemon=True)
+                thread.start()
+                url = f"http://localhost:{port}"
+                print(f">>> TissUUmaps running at {url} — press Ctrl+C to stop", flush=True)
+                webbrowser.open(url)
+                try:
+                    thread.join()
+                except KeyboardInterrupt:
+                    httpd.shutdown()
+                    print(">>> TissUUmaps server stopped", flush=True)
+
     log("End time exporting tissuumaps")
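The launch block's port scan leans on Python's `while ... else`: the `else` branch runs only when the loop exhausts its range without hitting `break`. The same scan in isolation, probing with a bare socket instead of constructing the server:

```python
import socket

def find_free_port(start=8080, end=8200):
    # first port we can bind wins; None signals the whole range is taken
    # (same shape as the loop/else in options(): else fires only without break)
    for port in range(start, end):
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.bind(("", port))
            return port
        except OSError:
            continue
    return None
```

One difference worth noting: probing and then binding later is racy, since another process can grab the port in between; constructing the `TCPServer` directly on each candidate, as the commit does, claims the port atomically.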
