
Commit 23c7c3f

update pandas to 2.2.3 and other minor fixes
1 parent: 6431dbc; commit: 23c7c3f

6 files changed

Lines changed: 22 additions & 21 deletions

environment.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ channels:
 dependencies:
 - python>=3.11,<3.13
 - altair-all=5.5.*
-- pandas=1.5.*
+- pandas=2.2.*
 - scipy
 - matplotlib
 - jupyter
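The pin `pandas=2.2.*` accepts any 2.2.x release, including the 2.2.3 named in the commit message. If a notebook should warn when it runs in an older environment (for example one still on pandas 1.5), a small guard like the one below could run before the lessons. This is a sketch only: `in_release_series` and `warn_if_outside_pin` are hypothetical helpers, not part of pandas or conda, and they compare only the leading dotted version components.

```python
import warnings


def in_release_series(version: str, series: str) -> bool:
    """Hypothetical helper: True if `version` starts with the dotted
    components of `series`, e.g. '2.2.3' is in the '2.2' series."""
    wanted = series.split(".")
    actual = version.split(".")
    # Compare whole components so that '2.20.0' is NOT treated as '2.2'.
    return actual[: len(wanted)] == wanted


def warn_if_outside_pin(version: str, series: str = "2.2") -> None:
    """Hypothetical guard: warn when the running version is outside the pin."""
    if not in_release_series(version, series):
        warnings.warn(
            f"pandas {version} is outside the pinned {series}.* series; "
            "lesson outputs may differ."
        )
```

Usage would look like `warn_if_outside_pin(pd.__version__)` after `import pandas as pd`.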

modules/module2/module2-14-column_arithmetic_questions.qmd

Lines changed: 4 additions & 4 deletions
@@ -38,11 +38,11 @@ df[['Column_A']] * df[['Column_B']]
 'Question 2',
 'What is the correct syntax to multiply <code>Column_A</code> and <code>Column_B</code> from dataframe <code>df</code> and save it as a new column named <code>new_column</code>?',
 {
-'<code>df = df.assign(new_column=df[Column_A] * df[Column_B])</code></code>': 'Do you need to put your new column name in between quotations?',
-'<code>df = df.assign(new_column=df[Column_A] * df[Column_B])</code>': 'You must have been paying attention.',
-'<code>df = df.assign[new_column=df(Column_A) * df(Column_B)]</code>': 'Are you sure that you are using the correct parentheses for this?',
+'<code>df = df.assign(\'new_column\'=df[\'Column_A\'] * df[\'Column_B\'])</code></code>': 'Do you need to put your new column name in between quotations?',
+'<code>df = df.assign(new_column=df[\'Column_A\'] * df[\'Column_B\'])</code>': 'You must have been paying attention.',
+'<code>df = df.assign[new_column=df(\'Column_A\') * df(\'Column_B\')]</code>': 'Are you sure that you are using the correct parentheses for this?',
 },
-'<code>df = df.assign(new_column=df[Column_A] * df[Column_B])</code>',
+'<code>df = df.assign(new_column=df[\'Column_A\'] * df[\'Column_B\'])</code>',
 );
 </script>
 
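The corrected answer quotes the column labels because `df['Column_A']` looks a column up by its string label, whereas an unquoted `Column_A` is an ordinary Python name and raises `NameError` unless such a variable happens to exist. The name on the left of `=` inside `.assign()` is different: it is a keyword argument, so it stays unquoted (writing `'new_column'=...` is a `SyntaxError`). A minimal sketch with a made-up three-row `df`:

```python
import pandas as pd

# Made-up values; only the column names come from the quiz question.
df = pd.DataFrame({"Column_A": [1, 2, 3], "Column_B": [4, 5, 6]})

# Column labels are strings, so they are quoted inside [].
# The keyword argument new_column is a Python identifier, so it is not quoted.
df = df.assign(new_column=df["Column_A"] * df["Column_B"])
print(df)
```

`.assign()` returns a new DataFrame rather than modifying `df` in place, which is why the result is assigned back to `df`.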

modules/module2/module2-17-filtering_question.qmd

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ df['location'] == 'Canada'
 is
 
 ```out
-[ True, False, False, True]
+[True, False, False, True]
 ```
 
 <br>
@@ -218,7 +218,7 @@ mighty_pokemon
 generateQuiz(
 'mcq3',
 'Question',
-'Which type has the most Pokemon with attack and defense scores greater than 100? <i>(Hint: Think about how we counted the frequency of categorical columns in module 1)</i>',
+'Which type has the most Pokemon with attack and defense scores greater than 100? <i>(Hint: Think about how we counted the frequency of categorical columns in module 1).</i>',
 {
 'Rock and Bug': 'Well done!',
 'Water and Rock': 'You can use <code>mighty_pokemon[\'type\'].value_counts()</code> to find out.',
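Both ideas touched in this file, a column compared with a scalar and the frequency count from the quiz hint, can be sketched on made-up data. The rows below are illustrative only and are not taken from the course's Pokemon dataset.

```python
import pandas as pd

# Made-up rows; only the column name mirrors the lesson.
df = pd.DataFrame({"location": ["Canada", "USA", "Mexico", "Canada"]})

# Comparing a column with a scalar gives a boolean Series, one value per row.
is_canada = df["location"] == "Canada"
print(is_canada.tolist())  # [True, False, False, True]

pokemon = pd.DataFrame({
    "type": ["rock", "bug", "water", "rock", "bug"],
    "attack": [110, 120, 90, 130, 105],
    "defense": [130, 150, 80, 101, 90],
})

# Keep rows where BOTH scores exceed 100; each condition needs its own parentheses.
mighty_pokemon = pokemon[(pokemon["attack"] > 100) & (pokemon["defense"] > 100)]

# Frequency of each type among the filtered rows, most common first.
type_counts = mighty_pokemon["type"].value_counts()
print(type_counts)
```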

modules/module2/module2-30-plotting_a_groupby_object.qmd

Lines changed: 6 additions & 6 deletions
@@ -32,9 +32,9 @@ Create a plot by chaining the following actions.
 import pandas as pd
 import altair as alt
 
-pokemon = pd.read_csv('data/pokemon.csv').drop("name", axis=1)
+pokemon = pd.read_csv('data/pokemon.csv')
 
-____ = pd.DataFrame(____.____('____').____().____[:, '____'])
+____ = pd.DataFrame(____.____('____').____(numeric_only=True).____[:, '____'])
 
 ____ = ____.____()
 
@@ -52,8 +52,8 @@ ____ = ____.____()
 #| check: true
 from src.utils import assert_chart_equal, remove_keys_inplace
 
-pokemon = pd.read_csv('data/pokemon.csv').drop("name", axis=1)
-pokemon_type = pd.DataFrame(pokemon.groupby('type').mean().loc[:, 'attack']).reset_index()
+pokemon = pd.read_csv('data/pokemon.csv')
+pokemon_type = pd.DataFrame(pokemon.groupby('type').mean(numeric_only=True).loc[:, 'attack']).reset_index()
 solution = alt.Chart(pokemon_type, width=500,
 height=300).mark_bar().encode(x=alt.X('type:N', sort='-y',
 title='Pokemon type'), y=alt.Y('attack:Q',
@@ -89,9 +89,9 @@ assert_chart_equal(solution, result)
 import pandas as pd
 import altair as alt
 
-pokemon = pd.read_csv('data/pokemon.csv').drop("name", axis=1)
+pokemon = pd.read_csv('data/pokemon.csv')
 
-pokemon_type = pd.DataFrame(pokemon.groupby('type').mean().loc[:, 'attack'])
+pokemon_type = pd.DataFrame(pokemon.groupby('type').mean(numeric_only=True).loc[:, 'attack'])
 
 pokemon_type = pokemon_type.reset_index()
 
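These edits remove the `.drop("name", axis=1)` workaround and pass `numeric_only=True` instead. In pandas 2.x, `numeric_only` defaults to `False` for DataFrame and groupby reductions such as `.mean()`, so a leftover text column like `name` makes the call raise a `TypeError`; with `numeric_only=True` the non-numeric columns are skipped. A sketch with a hypothetical four-row stand-in for `data/pokemon.csv` (names and scores are invented):

```python
import pandas as pd

# Hypothetical stand-in for data/pokemon.csv: a text column ('name'),
# the grouping column ('type') and a numeric column ('attack').
pokemon = pd.DataFrame({
    "name": ["Pokemon A", "Pokemon B", "Pokemon C", "Pokemon D"],
    "type": ["rock", "rock", "bug", "bug"],
    "attack": [80, 45, 30, 35],
})

# numeric_only=True skips 'name'; the 'type' key becomes the index.
pokemon_type = pd.DataFrame(
    pokemon.groupby("type").mean(numeric_only=True).loc[:, "attack"]
)

# Move 'type' back into a regular column so a chart can encode it.
pokemon_type = pokemon_type.reset_index()
print(pokemon_type)
```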

modules/module2/slides/module2_25.qmd

Lines changed: 8 additions & 7 deletions
@@ -37,15 +37,15 @@ We found in Module 1 using `.value_counts()` that there are 7 different manufact
 Let's start with "K":
 
 ```{python}
-cereal[cereal['mfr'] == 'K'].mean()[['sugars']]
+cereal[cereal['mfr'] == 'K'].mean(numeric_only=True)[['sugars']]
 ```
 
 <br>
 
 Next "G":
 
 ```{python}
-cereal[cereal['mfr'] == 'G'].mean()[['sugars']]
+cereal[cereal['mfr'] == 'G'].mean(numeric_only=True)[['sugars']]
 ```
 
 
@@ -154,11 +154,12 @@ Similarly to how we made frequency tables using `.value_counts()`, we can now us
 ## Summary Statistics with Groups
 
 ```{python}
-# | inlcude: false
+# | include: false
 pd.set_option('display.max_rows', 4)
 ```
 
 ```{python}
+mfr_group = cereal.drop(columns=["name", "type"]).groupby(by='mfr')
 mfr_group.mean()
 ```
 
@@ -190,18 +191,18 @@ Of course, using groups is not limited to finding only the mean. We can do the s
 ## Aggregating dataframes
 
 ```{python}
-# | inlcude: false
+# | include: false
 pd.set_option('display.max_rows', 6)
 ```
 
 ```{python}
-cereal.agg('mean')
+cereal.select_dtypes(include=np.number).agg('mean')
 ```
 
 <br>
 
 ```{python}
-cereal.mean()
+cereal.mean(numeric_only=True)
 ```
 
 
@@ -216,7 +217,7 @@ Using `.agg()` with only a `mean` input is essentially the same thing as calling
 ---
 
 ```{python}
-cereal.agg(['max', 'min', 'median'])
+cereal.select_dtypes(include=np.number).agg(['max', 'min', 'median'])
 ```
 
 
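This file works around the same pandas 2.x default in two ways: filtering to numeric columns with `select_dtypes(include=np.number)` before `.agg()`, or passing `numeric_only=True` to `.mean()`. The diff uses `np` without showing an import, so the sketch below imports NumPy explicitly; whether the slides already do so is not visible in this hunk. The cereal rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up rows; column names follow the cereal examples in the slides.
cereal = pd.DataFrame({
    "name": ["Cereal A", "Cereal B", "Cereal C"],
    "mfr": ["K", "K", "G"],
    "sugars": [6, 12, 3],
    "calories": [110, 130, 100],
})

# Keep only numeric columns, so the text columns 'name' and 'mfr' are excluded.
numeric = cereal.select_dtypes(include=np.number)

# Several aggregations at once: one row per function, one column per feature.
summary = numeric.agg(["max", "min", "median"])

# Same means as numeric.agg('mean'), obtained by skipping non-numeric columns.
means = cereal.mean(numeric_only=True)

# Mean sugars for a single manufacturer, as in the "K" and "G" slides.
k_sugars = cereal[cereal["mfr"] == "K"].mean(numeric_only=True)[["sugars"]]
print(summary)
print(k_sugars)
```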

modules/module2/slides/module2_29.qmd

Lines changed: 1 addition & 1 deletion
@@ -312,7 +312,7 @@ This is a big help for the clarity of our analysis.
 ---
 
 ```{python}
-mfr_mean = cereal.groupby(by='mfr').mean()
+mfr_mean = cereal.groupby(by='mfr').mean(numeric_only=True)
 mfr_mean
 ```
 
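Across the two slide files the commit handles grouped means in two ways: `module2_25.qmd` drops the `name` and `type` columns before grouping, while `module2_29.qmd` keeps every column and passes `numeric_only=True`. When those are the only non-numeric columns besides the `mfr` key, both routes give the same table. A sketch on made-up rows, assuming `type` holds text as the `.drop()` call suggests:

```python
import pandas as pd

# Made-up rows; 'name' and 'type' are text, 'mfr' is the grouping key.
cereal = pd.DataFrame({
    "name": ["Cereal A", "Cereal B", "Cereal C"],
    "mfr": ["K", "K", "G"],
    "type": ["C", "C", "H"],
    "sugars": [6, 12, 3],
    "calories": [110, 130, 100],
})

# Route 1 (module2_25 style): remove the text columns, then group.
mfr_group = cereal.drop(columns=["name", "type"]).groupby(by="mfr")
dropped = mfr_group.mean()

# Route 2 (module2_29 style): keep all columns and let numeric_only skip text ones.
mfr_mean = cereal.groupby(by="mfr").mean(numeric_only=True)
print(mfr_mean)
```

Passing `numeric_only=True` is arguably the sturdier choice if text columns are added later, since any extra text column would make the `.drop()` route raise a `TypeError` under pandas 2.x.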
