WIP Add module on dimensionality reduction #876
Open
ArturoAmorQ wants to merge 1 commit into
This was referenced May 5, 2026
    ax.bar_label(bars)
    ax.set_xlim([0, 14])
    ax.set_yticks([1, 2], labels=["PC1", "PC2"])
    ax.set_xlabel("eigenvalues")
Contributor
My suggestion would be to use explained variance in the plot, as the term eigenvalues is not used (yet):
Suggested change
    - ax.set_xlabel("eigenvalues")
    + ax.set_xlabel("Explained variance")
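For context on why both labels describe the same numbers: the eigenvalues of the sample covariance matrix equal the variance of the data projected onto each principal direction, which is the quantity scikit-learn's `PCA` exposes as `explained_variance_`. A minimal NumPy sketch, assuming a hypothetical two-feature dataset (not the one used in this PR):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: two correlated features (assumption, for illustration only).
x = rng.normal(size=500)
X = np.column_stack([x, 0.5 * x + rng.normal(scale=0.3, size=500)])
X_centered = X - X.mean(axis=0)

# Eigendecomposition of the sample covariance matrix (np.cov uses ddof=1).
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort them in descending order.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Variance of the data projected onto each principal direction.
projected = X_centered @ eigenvectors
explained_variance = projected.var(axis=0, ddof=1)

print(eigenvalues)
print(explained_variance)
```

The two printed arrays agree up to floating-point error, so relabeling the axis changes the vocabulary and not the plotted values.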
    # with each other. Two strongly correlated features will jointly define a
    # direction with much higher variance than either one alone, and the explained
    # variance ratios across components will still be very unequal. Scaling removes
    # the unit bias; it does not make all directions equally important.
Contributor
I would say that omitting scaling introduces a magnitude bias more than a unit bias:
Suggested change
    - # the unit bias; it does not make all directions equally important.
    + # the magnitude bias; it does not make all directions equally important.
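To illustrate the point made in the notebook comment, here is a minimal sketch on synthetic data (the arrays and the `explained_variance_ratio` helper are hypothetical, not taken from the PR). Inflating the magnitude of one feature lets it dominate the first component; standardizing removes that effect, yet the two correlated features still yield very unequal ratios.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of total variance carried by each principal direction (descending)."""
    X_centered = X - X.mean(axis=0)
    # eigvalsh returns ascending eigenvalues; reverse to get descending order.
    eigvals = np.linalg.eigvalsh(np.cov(X_centered, rowvar=False))[::-1]
    return eigvals / eigvals.sum()


rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = 0.9 * a + rng.normal(scale=0.3, size=1000)  # strongly correlated with a

# Same information, but the second feature has a much larger magnitude.
X = np.column_stack([a, 1000 * b])

raw_ratio = explained_variance_ratio(X)

# Standardize each feature to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
scaled_ratio = explained_variance_ratio(X_scaled)

print("without scaling:", raw_ratio)
print("with scaling:   ", scaled_ratio)
```

Without scaling, the first ratio is essentially 1 because of the inflated feature alone. With scaling, the first ratio is still far above 0.5, and that remaining imbalance comes from the correlation between the features, not from their magnitudes.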
    # ---

    # %% [markdown]
    # # Solution for Exercise M8.01
Contributor
Suggested change
    - # # Solution for Exercise M8.01
    + # # Exercise M8.01
    # variance, keeps even more: all 900 components we computed pass it, meaning the
    # true cutoff lies beyond what we measured.
    #
    # For text data, a common practice is to fix the number of components to be 100
Contributor
Out of curiosity: where do these values come from? Why not between 100 and 300?
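One data-driven alternative to a fixed range would be to read the number of components off the cumulative explained variance. The sketch below is only an illustration of that idea under stated assumptions: the `n_components_for` helper, the example ratios, and the thresholds are hypothetical, and this is not necessarily the criterion the notebook uses.

```python
import numpy as np


def n_components_for(ratios, threshold):
    """Smallest number of components whose cumulative ratio reaches `threshold`.

    Returns None when the threshold is not reached by the computed components,
    i.e. the cutoff lies beyond what was measured.
    """
    cumulative = np.cumsum(ratios)
    # Index of the first cumulative value that is >= threshold.
    idx = np.searchsorted(cumulative, threshold)
    return int(idx) + 1 if idx < len(cumulative) else None


# Hypothetical explained variance ratios, sorted in descending order.
ratios = np.array([0.5, 0.2, 0.1, 0.05])

print(n_components_for(ratios, 0.75))  # three components reach 75%
print(n_components_for(ratios, 0.90))  # None: 90% is not reached by these four
```

A `None` result corresponds to the situation described in the notebook text, where every computed component is kept and the true cutoff is not observed.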
Adds module on PCA. Adds the wiki_news dataset, notebooks and exercises.
Probably a good idea to merge this after the clustering module in #836.
Note: This is still WIP. We are missing at least one quiz and the wrap-up quiz.