# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.17.0
#   kernelspec:
#     display_name: default
#     language: python
#     name: python3
# ---
# %% [markdown]
# # Example 1: Exploring Shape
# In this notebook, we use the **Mapper algorithm** to analyze a toy dataset
# composed of two concentric circles. This dataset is a classic example in
# topology and machine learning, and because it is synthetic and well
# understood, it is well suited to building intuition for how Mapper detects
# underlying **topological structures**: in this case, two distinct loops. The
# resulting Mapper graph should ideally consist of two connected components,
# one for each circle.
# %% [markdown]
# ### Mapper pipeline
# %% [markdown]
# We generate a synthetic dataset using `make_circles`, which creates two
# concentric circles in 2D space. To prepare the data for Mapper, we apply
# **Principal Component Analysis (PCA)** and keep the top two components.
# These serve as our **lens function**, the map whose image Mapper covers with
# overlapping sets. Because the dataset is already 2D, PCA here only re-centers
# and re-orients the points, but it keeps the pipeline consistent with
# higher-dimensional problems, where the lens is a genuine reduction.
# %%
import numpy as np
from matplotlib import pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA
from tdamapper.cover import CubicalCover
from tdamapper.learn import MapperAlgorithm
from tdamapper.plot import MapperPlot

# Two noisy concentric circles; `labels` marks the circle of each sample.
X, labels = make_circles(n_samples=5000, noise=0.05, factor=0.3, random_state=42)

# Scatter plot of the raw data, colored by circle.
fig = plt.figure(figsize=(5, 5), dpi=100)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=0.25, cmap="jet")
plt.axis("off")
plt.show()
# fig.savefig("circles_dataset.png", dpi=100)

# Lens: the two principal components of the data.
y = PCA(2, random_state=42).fit_transform(X)
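# %% [markdown]
# For two-dimensional input, a two-component PCA only centers the points and
# rotates (or reflects) them onto the principal axes, so pairwise distances are
# unchanged. The sketch below checks this on a few made-up points using only
# the standard library. The helper `pca_2d` and the `toy_*` names are
# illustrative assumptions, not part of scikit-learn or tda-mapper.
# %%
import math


def pca_2d(points):
    """Center 2D points and rotate them onto their principal axes."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(px - mean_x, py - mean_y) for px, py in points]
    # Entries of the (unnormalized) covariance matrix.
    sxx = sum(px * px for px, _ in centered)
    syy = sum(py * py for _, py in centered)
    sxy = sum(px * py for px, py in centered)
    # Angle of the direction of maximum variance (first principal axis).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate so that the first principal axis becomes the first coordinate.
    return [(c * px + s * py, -s * px + c * py) for px, py in centered]


toy_points = [(0.0, 0.0), (2.0, 1.0), (4.0, 2.5), (1.0, 3.0)]
toy_lens = pca_2d(toy_points)

# The distance between two points is the same before and after the transform.
dist_before = math.dist(toy_points[0], toy_points[2])
dist_after = math.dist(toy_lens[0], toy_lens[2])
print(f"before: {dist_before:.6f}, after: {dist_after:.6f}")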
# %% [markdown]
# We now build the Mapper graph using the PCA output as the lens. Mapper
# requires two key components:
#
# - A **cover** algorithm that groups the data into overlapping sets along
#   the lens.
# - A **clustering algorithm** that splits the points in each set of the
#   cover into clusters.
#
# In this example, we use a **cubical cover** with 10 intervals and 30%
# overlap, and we apply **DBSCAN** (with scikit-learn's default parameters) for
# clustering, since it can find clusters of arbitrary shape. Choosing these
# parameters often involves some trial and error, depending on the dataset and
# the desired resolution of the Mapper graph.
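# %% [markdown]
# To make the cover concrete, here is a minimal one-dimensional sketch: the
# lens range is cut into equal intervals, and each interval is widened so that
# neighbours overlap. The padding rule below is an assumption chosen for
# illustration; the exact convention `CubicalCover` uses for `overlap_frac` may
# differ. `interval_cover` and `covering_indices` are hypothetical helpers.
# %%
def interval_cover(lo, hi, n_intervals, overlap_frac):
    """Return n_intervals overlapping (start, end) intervals covering [lo, hi]."""
    width = (hi - lo) / n_intervals
    # Assumed convention: pad both sides of each interval equally.
    pad = overlap_frac * width / 2.0
    return [
        (lo + i * width - pad, lo + (i + 1) * width + pad)
        for i in range(n_intervals)
    ]


def covering_indices(value, intervals):
    """Indices of all intervals that contain value."""
    return [i for i, (a, b) in enumerate(intervals) if a <= value <= b]


toy_intervals = interval_cover(0.0, 1.0, n_intervals=10, overlap_frac=0.3)

# A value near an interval boundary falls into two neighbouring intervals,
# which is what later lets Mapper connect clusters from adjacent cover sets.
print(covering_indices(0.05, toy_intervals))
print(covering_indices(0.10, toy_intervals))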
# %%
mapper = MapperAlgorithm(
    cover=CubicalCover(n_intervals=10, overlap_frac=0.3), clustering=DBSCAN()
)
graph = mapper.fit_transform(X, y)
print(f"nodes: {len(graph.nodes())}, edges: {len(graph.edges())}")
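# %% [markdown]
# In a Mapper graph, each node is a cluster, and two nodes are joined by an
# edge when their clusters share at least one sample, which can happen because
# the cover sets overlap. The sketch below builds such edges for a handful of
# made-up clusters and counts the connected components. It is independent of
# tda-mapper's internal data structures; `toy_nodes` and `count_components`
# are hypothetical names.
# %%
from itertools import combinations

# Hypothetical clusters: node id -> indices of the samples it contains.
toy_nodes = {
    0: {0, 1, 2},
    1: {2, 3},
    2: {3, 4},
    3: {5, 6},
    4: {6, 7},
}

# Two nodes are linked when their clusters share at least one sample.
toy_edges = [
    (a, b)
    for a, b in combinations(sorted(toy_nodes), 2)
    if toy_nodes[a] & toy_nodes[b]
]


def count_components(nodes, edges):
    """Count connected components with a simple depth-first search."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
    return components


toy_components = count_components(toy_nodes, toy_edges)
print(f"edges: {toy_edges}, components: {toy_components}")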
# %% [markdown]
# ### Visualization
# %% [markdown]
# We visualize the Mapper graph by coloring each node according to the **mean**
# class label (0 or 1) of the samples it contains. Since the dataset has two
# classes, one for each circle, this coloring lets us verify whether the graph
# structure aligns with the true geometry of the data. Ideally, nodes from the
# inner and outer circles show clearly separated colors and fall into two
# distinct connected components of the graph.
# %%
plot = MapperPlot(graph, dim=2, iterations=60, seed=42)
fig = plot.plot_plotly(
    colors=labels,
    cmap=["jet", "viridis", "cividis"],
    agg=np.nanmean,
    width=600,
    height=600,
)
fig.show(config={"scrollZoom": True}, renderer="notebook_connected")
# fig.write_image("circles_mean.png", width=500, height=500)
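# %% [markdown]
# The node coloring is a per-node aggregation of the sample labels. As a
# minimal sketch with made-up data, the mean of 0/1 labels in a node is simply
# the fraction of its samples that belong to class 1. The names `toy_labels`
# and `toy_members` are hypothetical and do not come from tda-mapper.
# %%
from statistics import fmean

# Hypothetical labels for seven samples and the samples held by three nodes.
toy_labels = [0, 0, 0, 1, 1, 1, 1]
toy_members = {"a": [0, 1, 2], "b": [2, 3], "c": [4, 5, 6]}

# Mean label per node: 0 for a pure class-0 node, 1 for a pure class-1 node,
# and an intermediate value for a node that mixes both classes.
node_mean = {
    node: fmean(toy_labels[i] for i in members)
    for node, members in toy_members.items()
}
print(node_mean)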
# %% [markdown]
# To explore areas where the two classes might overlap or be hard to
# distinguish, we color each node by the **standard deviation** of the class
# labels of its samples. A standard deviation of 0 means that all samples in a
# node belong to the same class, while higher values (up to 0.5 for binary
# labels) indicate label ambiguity within the node. This highlights
# transitional regions where class boundaries are less sharp, which is useful
# when analyzing real-world data where such ambiguity is common.
# %%
fig = plot.plot_plotly(
    colors=labels,
    cmap=["jet", "viridis", "cividis"],
    agg=np.nanstd,
    width=600,
    height=600,
)
fig.show(config={"scrollZoom": True}, renderer="notebook_connected")
# fig.write_image("circles_std.png", width=500, height=500)
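# %% [markdown]
# For 0/1 labels, the population standard deviation of a node depends only on
# the fraction `p` of its samples in class 1: it equals `sqrt(p * (1 - p))`,
# which is 0 for a pure node and peaks at 0.5 for an even mix. The sketch below
# checks this on made-up label lists, using `statistics.pstdev` as a
# standard-library stand-in for a population standard deviation; the `toy_*`
# names are hypothetical.
# %%
import math
from statistics import pstdev

# Hypothetical label lists for a pure node, a mostly pure node, and an even mix.
toy_node_labels = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]]

toy_std = [pstdev(node) for node in toy_node_labels]
toy_formula = []
for node in toy_node_labels:
    p = sum(node) / len(node)  # fraction of class-1 samples in the node
    toy_formula.append(math.sqrt(p * (1 - p)))

for std, expected in zip(toy_std, toy_formula):
    print(f"pstdev: {std:.4f}, sqrt(p * (1 - p)): {expected:.4f}")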
# %% [markdown]
# ### Conclusions
# This simple example demonstrates how the Mapper algorithm can uncover
# meaningful topological structure, even in a basic synthetic dataset. By
# combining a PCA lens, an overlapping cubical cover, and DBSCAN clustering,
# Mapper captures the two-loop shape of the concentric circles, and coloring
# the nodes by the mean and standard deviation of the labels shows where the
# classes are consistent or ambiguous. This forms a solid foundation for
# applying Mapper to more complex, real-world datasets.