Commit 0048402

spec: add scatter-embedding specification (#5261)
## New Specification: `scatter-embedding`

Related to #5236

---

**Next:** Add `approved` label to the issue to merge this PR.

---

:robot: *[spec-create workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/24290812404)*

---------

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
1 parent d6c578c commit 0048402

4 files changed

Lines changed: 55 additions & 0 deletions

plots/scatter-embedding/implementations/.gitkeep

Whitespace-only changes.

plots/scatter-embedding/metadata/.gitkeep

Whitespace-only changes.
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# scatter-embedding: t-SNE and UMAP Embedding Visualization

## Description

A scatter plot displaying high-dimensional data projected into 2D space using non-linear dimensionality reduction techniques such as t-SNE or UMAP. Points are colored by cluster or class label, revealing groupings and latent structure in the data. This is a standard visualization in machine learning for exploring embeddings, single-cell RNA-seq data, and NLP document clustering, helping practitioners verify that learned representations capture meaningful distinctions.

## Applications

- Visualizing cell-type clusters in single-cell RNA-seq data after dimensionality reduction in bioinformatics workflows
- Exploring word or document embeddings from NLP models to verify semantic groupings and detect outliers
- Inspecting latent space structure of autoencoders or variational autoencoders (VAEs) to assess representation quality
- Quality-checking clustering results from K-means or DBSCAN by overlaying cluster assignments on the 2D projection

## Data

- `x` (float) — First embedding dimension (e.g., t-SNE 1 or UMAP 1)
- `y` (float) — Second embedding dimension (e.g., t-SNE 2 or UMAP 2)
- `label` (categorical) — Cluster or class assignment for coloring points
- Size: 500–5000 points typical
- Example: Synthetic clustered data with 5–10 groups projected via t-SNE or UMAP
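
The data described in this list could be produced along the following lines. This is a minimal sketch, not the project's reference generator: it assumes scikit-learn and NumPy are installed, and the point count (600), group count (6), feature count (20), and the `cluster N` label naming are illustrative choices that merely fall inside the ranges the spec gives.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Illustrative sizes (assumptions): within the spec's 500-5000 points and 5-10 groups.
N_POINTS, N_GROUPS, N_FEATURES = 600, 6, 20

# Synthetic high-dimensional data with known group membership.
features, group_ids = make_blobs(
    n_samples=N_POINTS,
    n_features=N_FEATURES,
    centers=N_GROUPS,
    cluster_std=2.0,
    random_state=42,
)

# Non-linear projection to 2D; perplexity=30 matches the example subtitle in the spec.
embedding = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(features)

# Columns in the shape the spec describes: two float dimensions and a categorical label.
x = embedding[:, 0]
y = embedding[:, 1]
label = np.array([f"cluster {g}" for g in group_ids])
```

Swapping in UMAP would only change the projection step (for example via the separate `umap-learn` package, which is not used here); the `x`, `y`, `label` columns stay the same.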

## Notes

- Color each cluster/class with a distinct, colorblind-accessible color and include a legend mapping colors to labels
- Optionally annotate cluster centroids with the cluster label text
- Use moderate point size with slight transparency (alpha) to handle overlapping points in dense regions
- Include a subtitle noting the algorithm and key parameter (e.g., "t-SNE (perplexity=30)" or "UMAP (n_neighbors=15)")
- Axes represent embedding dimensions and typically should not have tick labels, as the coordinates are not directly interpretable
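
The notes above might translate into plotting code roughly as follows. This is a hedged sketch assuming matplotlib and NumPy; `plot_embedding` is a hypothetical helper name, the Okabe-Ito palette is one possible colorblind-safe choice rather than something the spec mandates, and the points are random 2D blobs standing in for a real embedding, so the subtitle string is illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Okabe-Ito colors (assumption: any colorblind-safe palette would do).
# Seven colors only, so more than seven groups would cycle and repeat.
PALETTE = ["#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7"]


def plot_embedding(x, y, labels, subtitle):
    """Hypothetical helper: scatter an embedding following the spec's notes."""
    fig, ax = plt.subplots(figsize=(8, 6))
    for i, name in enumerate(np.unique(labels)):
        mask = labels == name
        # Moderate point size with slight transparency for dense regions.
        ax.scatter(x[mask], y[mask], s=20, alpha=0.6,
                   color=PALETTE[i % len(PALETTE)], label=str(name))
        # Optional centroid annotation with the cluster label text.
        ax.annotate(str(name), (x[mask].mean(), y[mask].mean()),
                    ha="center", va="center", fontweight="bold")
    fig.suptitle("scatter-embedding")
    ax.set_title(subtitle, fontsize=10)  # algorithm and key parameter
    ax.set_xlabel("t-SNE 1")
    ax.set_ylabel("t-SNE 2")
    # Embedding coordinates are not directly interpretable, so drop the ticks.
    ax.set_xticks([])
    ax.set_yticks([])
    ax.legend(title="Cluster")
    return fig, ax


# Stand-in data (assumption): 5 Gaussian blobs of 200 points, not a real t-SNE output.
rng = np.random.default_rng(0)
centers = rng.uniform(-40, 40, size=(5, 2))
points = np.concatenate([c + rng.normal(scale=4.0, size=(200, 2)) for c in centers])
labels = np.repeat([f"cluster {i}" for i in range(5)], 200)

fig, ax = plot_embedding(points[:, 0], points[:, 1], labels, "t-SNE (perplexity=30)")
fig.savefig("scatter_embedding_sketch.png", dpi=150)
```

Drawing one `scatter` call per label keeps the legend mapping explicit, at the cost of a loop; a single call with a color array would also work but needs a manually built legend.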
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# Specification-level metadata for scatter-embedding
# Auto-synced to PostgreSQL on push to main

spec_id: scatter-embedding
title: t-SNE and UMAP Embedding Visualization

# Specification tracking
created: "2026-04-11T20:22:05Z"
updated: null
issue: 5236
suggested: MarkusNeusinger

# Classification tags (applies to all library implementations)
# See docs/reference/tagging-system.md for detailed guidelines
tags:
  plot_type:
    - scatter
  data_type:
    - numeric
    - categorical
  domain:
    - machine-learning
    - science
  features:
    - color-mapped
    - annotated
    - 2d
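
A metadata file of this shape could be sanity-checked before syncing with something like the sketch below. It uses only the standard library and works on an already-parsed dictionary (loading the YAML itself would need a parser, which is left out). `check_metadata` and every rule it enforces (kebab-case `spec_id`, UTC timestamp format, integer `issue`, non-empty tag lists) are assumptions made for illustration; the repository's actual validation, if any, is not shown in this commit.

```python
import re
from datetime import datetime

# Assumed convention: lowercase words or digits joined by single hyphens.
SPEC_ID_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")

# The metadata from this commit, written out as a parsed dictionary.
metadata = {
    "spec_id": "scatter-embedding",
    "title": "t-SNE and UMAP Embedding Visualization",
    "created": "2026-04-11T20:22:05Z",
    "updated": None,
    "issue": 5236,
    "suggested": "MarkusNeusinger",
    "tags": {
        "plot_type": ["scatter"],
        "data_type": ["numeric", "categorical"],
        "domain": ["machine-learning", "science"],
        "features": ["color-mapped", "annotated", "2d"],
    },
}


def check_metadata(meta):
    """Hypothetical checker: return a list of problems, empty if none were found."""
    problems = []
    if not SPEC_ID_PATTERN.fullmatch(str(meta.get("spec_id", ""))):
        problems.append("spec_id is not kebab-case")
    try:
        datetime.strptime(meta.get("created") or "", "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        problems.append("created is not a UTC ISO 8601 timestamp")
    if not isinstance(meta.get("issue"), int):
        problems.append("issue is not an integer")
    for category, values in meta.get("tags", {}).items():
        if not (isinstance(values, list) and values):
            problems.append(f"tags.{category} is not a non-empty list")
    return problems


problems = check_metadata(metadata)
```

Returning a list of messages rather than raising on the first failure lets a caller report every issue in one pass.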
