2-Regression/2-Data/README.md
Lines changed: 76 additions & 1 deletion
@@ -16,6 +16,7 @@ In this lesson, you will learn:
- How to prepare your data for model-building.
- How to use Matplotlib for data visualization.
+
- How to use Seaborn for more expressive data visualization.
## Asking the right question of your data
@@ -194,11 +195,85 @@ To get charts to display useful data, you usually need to group the data somehow
This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?
+
## Exercise - experiment with Seaborn
+
+
Matplotlib is powerful, but it can take a lot of code to produce a polished chart. [Seaborn](https://seaborn.pydata.org/) is a library built _on top of_ Matplotlib that is designed for statistical data visualization. It works directly with Pandas dataframes, applies attractive default styles, and lets you create informative plots with far less code. Because Seaborn draws its plots on Matplotlib figures and axes, you can still use everything you already know about Matplotlib to fine-tune the result.
+
+
> If you don't already have Seaborn installed, install it with `pip install seaborn`.
+
+
1. Import Seaborn at the top of the notebook, under the other imports. It is conventionally imported as `sns`:
+
+
```python
import seaborn as sns
```
+
+
### Scatter plots to show relationships
+
+
A big part of exploring data before building a model is looking for _relationships_ between variables. A [scatter plot](https://en.wikipedia.org/wiki/Scatter_plot) is one of the best tools for this: if the points seem to follow a line, the two variables may be correlated, which is a good sign that a linear regression model could work.
+
+
1. Recreate the price-to-month scatter plot from before, this time using Seaborn's [`relplot()`](https://seaborn.pydata.org/generated/seaborn.relplot.html) (relational plot), which works directly with your dataframe columns:
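
A minimal sketch of what such a call can look like. The `pumpkins` dataframe and its values are made up for illustration and stand in for the lesson's dataset; the column names `Month` and `Price` are assumptions.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the lesson's pumpkin dataframe:
# one row per sale, with the month number and the price.
pumpkins = pd.DataFrame({
    "Month": [8, 9, 9, 10, 10, 11, 12],
    "Price": [12.0, 15.5, 17.0, 16.0, 18.5, 13.0, 11.5],
})

# relplot() is a figure-level function: it creates its own figure
# and returns a FacetGrid. kind="scatter" is the default, spelled
# out here for clarity.
grid = sns.relplot(data=pumpkins, x="Month", y="Price", kind="scatter")
grid.set_axis_labels("Month", "Price")
```

In a notebook the figure renders automatically; in a plain script you would typically call `matplotlib.pyplot.show()` afterwards.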

+
+
This particular data is quite noisy, so a line plot isn't the clearest choice here — but it shows how easily you can change chart types in Seaborn.
+
+
### Bar charts to show distributions
+
+
Earlier you grouped the data by hand to create a bar chart with Matplotlib. Seaborn's [`catplot()`](https://seaborn.pydata.org/generated/seaborn.catplot.html) (categorical plot) can do the grouping and aggregation for you. With `kind="bar"`, each bar shows the mean of its category by default, and the line drawn on top of each bar is an error bar marking the 95% confidence interval for that mean.
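
A minimal sketch of a categorical bar chart, using hypothetical sample data in place of the lesson's dataframe. The column names `Month` and `Price` are assumptions.

```python
import pandas as pd
import seaborn as sns

# Hypothetical sample data with two sales per month.
pumpkins = pd.DataFrame({
    "Month": [8, 8, 9, 9, 10, 10, 11, 11],
    "Price": [12.0, 13.0, 15.5, 17.0, 16.0, 18.5, 13.0, 12.0],
})

# No groupby() needed: catplot() treats each distinct Month as a
# category, averages Price within it, and draws one bar per month
# with an error bar on top.
grid = sns.catplot(data=pumpkins, x="Month", y="Price", kind="bar")
```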

+
+
This confirms what you saw with Matplotlib — prices peak around September and October — but Seaborn also visualizes how much the price _varies_ within each month.
+
+
### Heatmaps to show correlations
+
+
Scatter plots compare two variables at a time. When you have several numeric columns, a [heatmap](https://en.wikipedia.org/wiki/Heat_map) lets you view the strength of the relationship between _every_ pair of columns at once. This is a common way to spot which features are most correlated before choosing what to feed into a model (and the same kind of chart is later used to display confusion matrices in classification).
+
+
1. Build a correlation matrix with Pandas, then draw it with Seaborn's [`heatmap()`](https://seaborn.pydata.org/generated/seaborn.heatmap.html). The `annot=True` option prints the correlation values on each cell:
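
One way this step might look, sketched on hypothetical sample data. The column names follow the ones mentioned in this lesson (`Month`, `Low Price`, `High Price`); the `vmin`, `vmax`, and `cmap` settings are optional choices made here, not something the lesson prescribes.

```python
import pandas as pd
import seaborn as sns

# Hypothetical sample data; High Price tracks Low Price closely.
pumpkins = pd.DataFrame({
    "Month": [8, 9, 9, 10, 10, 11, 12],
    "Low Price": [10.0, 14.0, 15.0, 15.0, 17.0, 12.0, 10.0],
    "High Price": [12.0, 16.0, 17.5, 17.0, 19.0, 14.0, 12.5],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = pumpkins.select_dtypes("number").corr()

# annot=True writes each correlation value into its cell. Fixing the
# colour scale to [-1, 1] keeps colours comparable between datasets.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```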

+
+
Values close to `1` (or `-1`) mean the columns are strongly _linearly_ correlated. Notice how `Low Price` and `High Price` are almost perfectly correlated. `Month`, on the other hand, shows only a weak linear correlation with price, even though the bar chart above revealed a clear seasonal peak in September and October. That's an important lesson: the correlation coefficient only measures _straight-line_ relationships, so it can miss seasonal or otherwise non-linear patterns.

✅ Why is it useful to look at both a heatmap *and* charts like the bar chart before deciding which columns to use?
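
To see how a clear seasonal pattern can produce a correlation near zero, consider this small sketch. The prices are invented: they rise to a mid-year peak and fall symmetrically, which is a simplification and not the lesson's pumpkin data.

```python
import pandas as pd

# Hypothetical prices that peak mid-year and fall off symmetrically.
months = pd.Series(range(1, 13), name="Month")
price = (20 - (months - 6.5) ** 2 / 2).rename("Price")

# Pearson correlation only captures the straight-line trend. The rise
# in the first half of the year cancels the fall in the second half,
# so r comes out effectively zero despite the obvious pattern.
r = months.corr(price)
print(f"Pearson r between Month and Price: {r:.3f}")
```

Plotting `price` against `months` would show the peak immediately, which is why a chart and a correlation matrix complement each other.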
+
+
### Matplotlib or Seaborn?
+
+
Both libraries are worth knowing:
+
+
- **Matplotlib** gives you fine-grained control over every element of a chart and is the foundation almost every other Python plotting library builds on.
+
- **Seaborn** provides higher-level functions and attractive defaults for statistical charts, works directly with dataframes, and is often quicker for exploratory data analysis.
+
+
A common workflow is to reach for Seaborn to explore your data quickly, then drop down to Matplotlib when you need to customize the details.
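
That workflow could look something like the sketch below: Seaborn draws the chart onto a Matplotlib `Axes`, and ordinary Matplotlib calls then adjust the details. The data, title, and label text are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data with two sales per month.
pumpkins = pd.DataFrame({
    "Month": [8, 8, 9, 9, 10, 10, 11, 11],
    "Price": [12.0, 13.0, 15.5, 17.0, 16.0, 18.5, 13.0, 12.0],
})

# Create the figure with Matplotlib, then let Seaborn draw onto it.
# Axes-level functions such as barplot() accept an `ax` argument.
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=pumpkins, x="Month", y="Price", ax=ax)

# From here on it is plain Matplotlib.
ax.set_title("Average pumpkin price by month")
ax.set_ylabel("Average price")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```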
+
---
## 🚀Challenge
-
Explore the different types of visualization that Matplotlib offers. Which types are most appropriate for regression problems?
+
Explore the different types of visualization that Matplotlib and Seaborn offer. Which types are most appropriate for regression problems?
"[Seaborn](https://seaborn.pydata.org/) is built on top of Matplotlib and works directly with dataframes, making it quick to create attractive statistical plots with very little code."