`projects/analyze-baseball-stats-with-pandas-and-matplotlib/analyze-baseball-stats-with-pandas-and-matplotlib.mdx`: 26 additions, 19 deletions
````diff
@@ -346,27 +346,27 @@ We can now find the "value" of a player by calculating their OBP divided by thei
 
 ```py
 value_df = batting_with_salary[
-    (batting_with_salary["salary"] > 0) &
-    (batting_with_salary["OBP"] > 0) &
-    (batting_with_salary["AB"] >= 200)
+    (batting_with_salary['salary'] > 0) &
+    (batting_with_salary['OBP'] > 0) &
+    (batting_with_salary['AB'] >= 200)
 ].copy()
 ```
 
 We now have a DataFrame named `value_df` that contains only the rows we're interested in. Let's calculate each player's value and sort by the highest value players! We'll display only the columns that are relevant to us.
 
 ```py
 value_df_sorted = value_df.sort_values(
-    by="OBP_per_dollar",
-    ascending=False
+    by='OBP_per_dollar',
+    ascending=False
 )
 
 value_df_sorted [[
-    "playerID",
-    "yearID",
-    "teamID",
-    "OBP",
-    "salary",
-    "OBP_per_dollar"
+    'playerID',
+    'yearID',
+    'teamID',
+    'OBP',
+    'salary',
+    'OBP_per_dollar'
 ]].head()
 ```
 
````
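The hunk above sorts by an `OBP_per_dollar` column, but none of the code in this hunk creates it; per the hunk's context line, a player's "value" is their OBP divided by their salary. If that column was not already added earlier in the tutorial, the whole step might look like the sketch below. It assumes `OBP_per_dollar` is simply `OBP / salary`, and the small `batting_with_salary` stand-in (player IDs, team IDs, and all numbers) is invented for illustration, not taken from the real dataset. It follows the commit's switch to single-quoted strings.

```python
import pandas as pd

# Hypothetical stand-in for batting_with_salary; every value is made up.
batting_with_salary = pd.DataFrame({
    'playerID': ['playera01', 'playerb01', 'playerc01', 'playerd01'],
    'yearID': [2010, 2010, 2010, 2010],
    'teamID': ['AAA', 'BBB', 'CCC', 'DDD'],
    'AB': [520, 150, 480, 600],
    'OBP': [0.390, 0.300, 0.340, 0.360],
    'salary': [400_000, 500_000, 0, 9_000_000],
})

# Boolean mask: positive salary (so the division is meaningful), positive OBP,
# and at least 200 at-bats. .copy() gives an independent DataFrame, so adding
# a column below does not trigger SettingWithCopyWarning.
value_df = batting_with_salary[
    (batting_with_salary['salary'] > 0) &
    (batting_with_salary['OBP'] > 0) &
    (batting_with_salary['AB'] >= 200)
].copy()

# Assumed definition of "value": on-base percentage per dollar of salary.
value_df['OBP_per_dollar'] = value_df['OBP'] / value_df['salary']

# Highest value first, showing only the relevant columns.
value_df_sorted = value_df.sort_values(by='OBP_per_dollar', ascending=False)
print(value_df_sorted[[
    'playerID', 'yearID', 'teamID', 'OBP', 'salary', 'OBP_per_dollar'
]].head())
```

Here `playerb01` is dropped for having fewer than 200 at-bats and `playerc01` for a zero salary, which is exactly the case the `salary > 0` condition guards against (dividing by zero would produce `inf` and float to the top of the sort).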
````diff
@@ -376,21 +376,25 @@ If we wanted to see if our calculations were working correctly, we could choose
-I went to look up `heywaja01` on baseball-reference.com, and it turns out that this data was from a player named Jason Heyward. In 2010, he was an All Star and got 2nd place in voting for Rookie of the Year! It certainly sounds like a player that was high value. You can also confirm that our calculation of OBP was correct!
+I went to look up `heywaja01` on [baseball-reference.com](https://baseball-reference.com), and it turns out that this data was from a player named Jason Heyward. In 2010, he was an All Star and got 2nd place in voting for Rookie of the Year! It certainly sounds like a player that was high value. You can also confirm that our calculation of OBP was correct!
 
 
 ## Recap
 
-Clearly there is a _ton_ that you can do with this dataset. In this project, we practiced the following skills in Pandas:
+Congrats on making it to the end!
+
+Clearly there is a _ton_ that you can do with this dataset.
+
+To recap, in this project tutorial, we practiced the following skills in Pandas:
 
 - Initial data exploration using `.describe()` to see summary statistics.
 - Filtering the dataset by boolean values (for example, `[value_df_sorted['yearID'] == 2010`)
````
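The changed paragraph suggests confirming the OBP calculation against baseball-reference.com, and the recap lists boolean filtering. A hedged sketch of that check in pandas: narrow the data to a single player-season with a boolean mask, then recompute OBP from counting stats using the standard formula OBP = (H + BB + HBP) / (AB + BB + HBP + SF). The Lahman-style column names are an assumption, this diff does not show how the tutorial itself computed OBP, and the rows below are invented (they are not Jason Heyward's real numbers).

```python
import pandas as pd

# Hypothetical batting rows; all counting stats are made up.
batting = pd.DataFrame({
    'playerID': ['playera01', 'playera01', 'playerb01'],
    'yearID': [2009, 2010, 2010],
    'AB': [400, 500, 450],
    'H': [110, 140, 120],
    'BB': [40, 80, 30],
    'HBP': [5, 10, 2],
    'SF': [3, 5, 4],
})

# Standard on-base percentage: times on base divided by the plate
# appearances that count toward OBP.
batting['OBP'] = (batting['H'] + batting['BB'] + batting['HBP']) / (
    batting['AB'] + batting['BB'] + batting['HBP'] + batting['SF']
)

# Combine two boolean masks with & (each wrapped in parentheses) to isolate
# one player-season that can be checked by hand against an external source.
row = batting[(batting['playerID'] == 'playera01') & (batting['yearID'] == 2010)]
print(row[['playerID', 'yearID', 'H', 'BB', 'HBP', 'AB', 'SF', 'OBP']])
```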
````diff
@@ -399,3 +403,6 @@ Clearly there is a _ton_ that you can do with this dataset. In this project, we
 - Using `.merge()` to join two tables together.
 
 Do you have any favorite players or teams? Shohei Ohtani? Aaron Judge? The Chicago Cubs? We hope that you come up with your own questions about baseball and use your Python and Pandas skills to answer those questions!
````
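The recap bullets name `.describe()` and `.merge()`, but the calls themselves fall outside this diff. The following is only a sketch: the two tiny tables are invented, and joining on `playerID`, `yearID`, and `teamID` is an assumption about how `batting_with_salary` was built, not something this diff confirms.

```python
import pandas as pd

# Hypothetical miniature batting and salary tables (values invented).
batting = pd.DataFrame({
    'playerID': ['playera01', 'playerb01', 'playerc01'],
    'yearID': [2010, 2010, 2010],
    'teamID': ['AAA', 'BBB', 'CCC'],
    'AB': [500, 450, 220],
})
salaries = pd.DataFrame({
    'playerID': ['playera01', 'playerb01'],
    'yearID': [2010, 2010],
    'teamID': ['AAA', 'BBB'],
    'salary': [400_000, 3_000_000],
})

# Inner join on the assumed key columns: only player-seasons present in
# both tables are kept, so playerc01 (no salary row) is dropped.
batting_with_salary = batting.merge(
    salaries, on=['playerID', 'yearID', 'teamID'], how='inner'
)

# Summary statistics (count, mean, std, min, quartiles, max) for the
# numeric columns: yearID, AB, and salary.
print(batting_with_salary.describe())
```

`how='inner'` is also pandas' default for `merge`. With `how='left'`, `playerc01` would be kept with a `NaN` salary; a later `salary > 0` filter would still drop that row, because comparisons with `NaN` evaluate to `False`.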