WIP Add module on dimensionality reduction by ArturoAmorQ · Pull Request #876 · INRIA/scikit-learn-mooc

ArturoAmorQ · 2026-05-04T16:19:02Z

Adds module on PCA. Adds the wiki_news dataset, notebooks and exercises.
Probably a good idea to merge this after the clustering module in #836.

Note: This is still WIP. We are missing at least one quiz and the wrap-up quiz.

fritshermans · 2026-05-15T14:22:33Z

+ax.bar_label(bars)
+ax.set_xlim([0, 14])
+ax.set_yticks([1, 2], labels=["PC1", "PC2"])
+ax.set_xlabel("eigenvalues")


My suggestion would be to use "explained variance" in the plot, as the term "eigenvalues" has not been introduced (yet):

Suggested change

-ax.set_xlabel("eigenvalues")
+ax.set_xlabel("Explained variance")
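
For context on why the two labels describe the same bars: the variance explained by each principal component equals the corresponding eigenvalue of the data's covariance matrix, and scikit-learn's `PCA` exposes these values as `explained_variance_`. A minimal NumPy sketch that checks the equality, using made-up 2D data (an assumption for illustration, not the data plotted in this PR):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2D data (not the dataset from the notebook).
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

X_centered = X - X.mean(axis=0)
# Sample covariance matrix (n_samples - 1 in the denominator).
cov = np.cov(X_centered, rowvar=False)

# eigh returns eigenvalues in ascending order; reverse so PC1 comes first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]

# Variance of the data once projected onto each principal direction.
projected = X_centered @ eigenvectors
explained_variance = projected.var(axis=0, ddof=1)

print("eigenvalues:       ", eigenvalues)
print("explained variance:", explained_variance)
print("equal:", np.allclose(eigenvalues, explained_variance))
```

Since the numbers are identical, the choice between the two axis labels is about which term learners have already met, not about what is plotted.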

fritshermans · 2026-05-15T14:40:59Z

+# with each other. Two strongly correlated features will jointly define a
+# direction with much higher variance than either one alone, and the explained
+# variance ratios across components will still be very unequal. Scaling removes
+# the unit bias; it does not make all directions equally important.


I would say that skipping scaling introduces a magnitude bias more than a unit bias:

Suggested change

-# the unit bias; it does not make all directions equally important.
+# the magnitude bias; it does not make all directions equally important.
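
A small sketch of both points in the quoted comment, assuming three synthetic features (hypothetical, not taken from the notebook): without scaling, the feature with the largest numeric magnitude dominates the first component; after standardization, the two correlated features still share a direction with more variance than the independent one, so the explained variance ratios stay unequal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: a and b are strongly correlated, c is independent.
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)
c = rng.normal(size=n)
# Put c on a much larger numeric scale than a and b.
X = np.column_stack([a, b, 1000.0 * c])


def explained_variance_ratio(data):
    """Covariance eigenvalues, sorted in descending order, normalized to sum to 1."""
    eigenvalues = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
    return eigenvalues / eigenvalues.sum()


ratio_raw = explained_variance_ratio(X)

# Standardize each feature to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
ratio_scaled = explained_variance_ratio(X_scaled)

print("raw:   ", ratio_raw.round(3))
print("scaled:", ratio_scaled.round(3))
```

On the raw data almost all variance falls on the direction of the large-magnitude feature. On the scaled data the first component, driven by the correlated pair, still explains roughly twice as much as the second.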

fritshermans · 2026-05-15T14:44:05Z

+# ---
+
+# %% [markdown]
+# # Solution for Exercise M8.01


Suggested change

-# # Solution for Exercise M8.01
+# # Exercise M8.01

fritshermans · 2026-05-15T14:45:54Z

+# variance, keeps even more: all 900 components we computed pass it, meaning the
+# true cutoff lies beyond what we measured.
+#
+# For text data, a common practice is to fix the number of components to be 100


Out of curiosity: where do these values come from? Why not between 100 and 300?
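
One way to probe a question like this empirically is to check how the number of retained components moves with the chosen cutoff. The sketch below uses a cumulative explained variance threshold, which is one common heuristic and not necessarily the rule used in the notebook. The eigenvalue spectrum and the helper `n_components_for` are hypothetical, chosen only to illustrate a slowly decaying spectrum and the case where the computed components never reach the threshold.

```python
import numpy as np

# Hypothetical, slowly decaying eigenvalue spectrum (assumption for illustration).
eigenvalues = 1.0 / np.arange(1, 2001)

ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(ratio)


def n_components_for(threshold, cumulative_ratio):
    """Smallest number of components whose cumulative ratio reaches `threshold`.

    Returns None when the computed components never reach it, meaning the
    cutoff lies beyond what was measured.
    """
    if cumulative_ratio[-1] < threshold:
        return None
    # searchsorted gives the first index whose value is >= threshold.
    return int(np.searchsorted(cumulative_ratio, threshold) + 1)


# Full spectrum available: a cutoff always exists for thresholds below 1.
k_full = n_components_for(0.95, cumulative)

# Only the first 900 components computed: the threshold may be out of reach.
k_truncated = n_components_for(0.95, cumulative[:900])

print("full spectrum:", k_full)
print("first 900 only:", k_truncated)
```

With a spectrum this flat, the answer is very sensitive to the threshold, which is one reason a fixed number of components is sometimes preferred over a variance-based rule.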

WIP Add module on dimensionality reduction

ccf3814

This was referenced May 5, 2026

WIP Add module on dimensionality reduction probabl-ai/scikit-learn-course#37

Closed

WIP Add module on dimensionality reduction probabl-ai/scikit-learn-course#38

Merged

fritshermans reviewed May 15, 2026

ArturoAmorQ wants to merge 1 commit into INRIA:main from ArturoAmorQ:dimred_module
