|
| 1 | +--- |
| 2 | +title: "Can Midseason Records Predict WNBA Playoff Teams?" |
| 3 | +date: "2026-05-27" |
| 4 | +author: |
| 5 | + - name: Kristen Varin |
| 6 | + affiliation: |
| 7 | + - id: slu |
| 8 | + name: St. Lawrence University |
| 9 | + - name: Ivan Ramler |
| 10 | + email: iramler@stlawu.edu |
| 11 | + affiliation: |
| 12 | + - ref: slu |
| 13 | +description: "Clean WNBA team-game data and investigate whether teams ranked in the top eight midway through the season tend to make the playoffs." |
| 14 | + |
| 15 | +editor: |
| 16 | + canonical: true |
| 17 | +categories: |
| 18 | + - Data Cleaning |
| 19 | + - Missing Data |
| 20 | + - Frequency Tables |
| 21 | +software: |
| 22 | + - R |
| 23 | + - Python |
| 24 | +image: WNBA_Barnstar.png |
| 25 | +--- |
| 26 | + |
| 27 | +## Module |
| 28 | + |
| 29 | +Please note that these materials have not yet completed the required pedagogical and industry peer reviews to become a published module on the SCORE Network. However, instructors are still welcome to use them if they are so inclined. |
| 30 | + |
| 31 | +### Introduction |
| 32 | + |
| 33 | +The WNBA worksheet introduces the idea of using team records from earlier in the season to predict which teams will make the playoffs. Most of the time, we would expect teams with better records halfway through the season to have a higher chance of making the playoffs, but is this always the case? Just as importantly, how do we know whether the data we are using are complete and reliable enough for this analysis? By completing this worksheet, you will work through several data cleaning steps involving WNBA team-game data and then use the cleaned data to investigate the relationship between midseason performance and playoff outcomes. |
| 34 | + |
| 35 | + |
| 36 | +::: {.callout-note collapse="true" title="Background on the WNBA" appearance="minimal"} |
| 37 | +The Women's National Basketball Association (WNBA) is a professional basketball league in the United States that was founded in 1996. It was established by the NBA as a counterpart to promote women's basketball at the professional level. The league's inaugural season began in 1997 with eight teams. |
| 38 | + |
| 39 | +Throughout its history, the WNBA has been a pioneering force in women's sports, providing a platform for talented athletes to showcase their skills and inspire fans globally. The league has expanded and contracted over the years, typically consisting of 12 or 13 teams. As of 2026, the league consists of 15 teams, with plans to expand further. The table below shows the size of the league since its inception. |
| 40 | + |
| 41 | +<details> |
| 42 | + |
| 43 | +<summary><b> WNBA expansion and contraction</b></summary> |
| 44 | + |
| 45 | +| Season(s) | No. of teams | |
| 46 | +|-------------|--------------| |
| 47 | +| 1997 | 8 | |
| 48 | +| 1998 | 10 | |
| 49 | +| 1999 | 12 | |
| 50 | +| 2000–2002 | 16 | |
| 51 | +| 2003 | 14 | |
| 52 | +| 2004–2005 | 13 | |
| 53 | +| 2006 | 14 | |
| 54 | +| 2007 | 13 | |
| 55 | +| 2008 | 14 | |
| 56 | +| 2009 | 13 | |
| 57 | +| 2010–2024 | 12 | |
| 58 | +| 2025 | 13 | |
| 59 | +| 2026 | 15 | |
| 60 | + |
| 61 | +</details> |
| 62 | + |
| 63 | +The WNBA's playoff structure typically features the top eight teams from the regular season standings advancing to the postseason. The format of the rounds has changed over the years; in some seasons the early rounds were single-elimination games. The playoffs culminate in the WNBA Finals, where the last two teams standing compete in a series (best-of-five for most of the seasons in this data set) to determine the league champion. |
| 64 | + |
| 65 | +Beyond its competitive play, the WNBA has been a leader in promoting social justice initiatives and advocating for equality both on and off the court. It continues to grow in popularity and influence, contributing significantly to the growth of women's basketball worldwide. |
| 66 | + |
| 67 | +For additional background about the WNBA, please watch the following video: |
| 68 | + |
| 69 | +::: {.callout-note collapse="true" title="WNBA Format Video" appearance="minimal"} |
| 70 | +The video below provides a brief overview of the WNBA league structure and playoff format. |
| 71 | + |
| 72 | +<iframe width="560" height="315" src="https://www.youtube.com/embed/L7xwOsGFga8?si=NzZkTZcUk4kEGZ3m" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> |
| 73 | +::: |
| 74 | + |
| 75 | +::: |
| 76 | + |
| 77 | +::: {.callout-note collapse="true" title="Activity Length" appearance="minimal"} |
| 78 | +Depending on students' backgrounds, this activity is designed to take approximately **50–75 minutes** of class time, or it can be assigned as an out-of-class activity. |
| 79 | +::: |
| 80 | + |
| 81 | + |
| 82 | +::: {.callout-note collapse="true" title="Learning Objectives" appearance="minimal"} |
| 83 | +By the end of this activity, you will be able to: |
| 84 | + |
| 85 | +- Use various `dplyr`, `tidyr`, and `lubridate` package functions to clean a data set for further use. |
| 86 | + |
| 87 | +- Identify data quality issues, such as inconsistent team names and missing game records. |
| 88 | + |
| 89 | +- Explain how missing or incomplete data can affect an analysis. |
| 90 | + |
| 91 | +- Create and interpret a two-way table that uses custom-built variables. |
| 92 | +::: |
| 93 | + |
| 94 | + |
| 95 | + |
| 96 | +::: {.callout-note collapse="true" title="Methods" appearance="minimal"} |
| 97 | +Technology requirement: |
| 98 | + |
| 99 | +- `R` version: The activity handout requires knowledge of Quarto and the following tidyverse packages: `dplyr`, `tidyr`, `lubridate`, and `ggplot2`. |
| 100 | + |
| 101 | +- `Python` version: The activity handout requires knowledge of Jupyter notebooks and the following packages: `pandas`, `numpy`, `plotnine`, and `statsmodels`. |
| 102 | +::: |
| 103 | + |
| 104 | +### Data |
| 105 | + |
| 106 | +The `wnba_data` data set contains 8920 rows and 9 columns. Each row represents a game played by a WNBA team in one of the 2003 to 2022 regular seasons. Thus, each game is associated with two rows: one for each team. The columns are as follows: |
| 107 | + |
| 108 | +<details> |
| 109 | + |
| 110 | +<summary><b> Data: Variable Descriptions</b></summary> |
| 111 | + |
| 112 | +| **Variable** | **Description** | |
| 113 | +|-------------------|---------------------------------------------------------------| |
| 114 | +| game_id | game id number | |
| 115 | +| season | season number | |
| 116 | season_type | game type code; 2 if regular season game; 3 if playoff game |
| 117 | +| game_date | date of the game | |
| 118 | +| team_id | team id number | |
| 119 | +| team_display_name | full team name (name and city) | |
| 120 | +| team_winner | Boolean; True if the team won the game | |
| 121 | +| opponent_team_id | id number of the opponent | |
| 122 | +| team_home_away | Where the game was played; either "home" or "away" | |
| 123 | + |
| 124 | +</details> |
| 125 | + |
| 126 | +Download data: [wnba_data.csv](wnba_data.csv) |
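
Because each game should contribute two rows (one per team), a quick sanity check is to confirm that every `game_id` appears exactly twice, with exactly one winner. The sketch below is only an illustration under stated assumptions: it uses a few made-up rows with placeholder team names rather than the real file, the helper name `find_unpaired_games` is hypothetical, and it relies on the Python standard library instead of the `pandas` workflow used in the worksheet.

```python
from collections import defaultdict

# Toy rows that mimic some of the documented columns.
# The values are invented for illustration and are not real WNBA results.
rows = [
    {"game_id": 1, "season": 2019, "team_display_name": "Team A", "team_winner": True},
    {"game_id": 1, "season": 2019, "team_display_name": "Team B", "team_winner": False},
    {"game_id": 2, "season": 2019, "team_display_name": "Team A", "team_winner": False},
    {"game_id": 2, "season": 2019, "team_display_name": "Team C", "team_winner": True},
    # game 3 has only one row, as if the opponent's record were missing
    {"game_id": 3, "season": 2019, "team_display_name": "Team B", "team_winner": True},
]


def find_unpaired_games(rows):
    """Return the game_ids that lack exactly two rows with exactly one winner."""
    winners_by_game = defaultdict(list)
    for row in rows:
        winners_by_game[row["game_id"]].append(row["team_winner"])
    return sorted(
        game_id
        for game_id, winners in winners_by_game.items()
        if len(winners) != 2 or sum(winners) != 1
    )


unpaired = find_unpaired_games(rows)
print(unpaired)  # game_ids that need a closer look
```

Games flagged this way are not necessarily errors, but they are a reasonable place to start looking for missing or duplicated records.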
| 127 | + |
| 128 | +#### Data Source |
| 129 | + |
| 130 | +Gilani S, Hutchinson G (2022). *wehoop: Access Women's Basketball Play by Play Data*. R package version 1.5.0, <https://CRAN.R-project.org/package=wehoop>. |
| 131 | + |
| 132 | +### Materials |
| 133 | + |
| 134 | +We offer worksheets (and their solutions) in Quarto (using `R`) and Jupyter Notebook (using `Python`) formats. |
| 135 | + |
| 136 | +**R versions** |
| 137 | + |
| 138 | +[Class handout - Quarto](wnba_boxscore_worksheet-R.qmd) |
| 139 | + |
| 140 | +[Class handout - Quarto - with solutions](wnba_boxscore_worksheet_key-R.qmd) |
| 141 | + |
| 142 | + |
| 143 | +**Python versions** |
| 144 | + |
| 145 | +[Class handout - Jupyter Notebook](wnba_boxscore_worksheet-Python.ipynb) |
| 146 | + |
| 147 | +[Class handout - Jupyter Notebook - with solutions](wnba_boxscore_worksheet_key-Python.ipynb) |
| 148 | + |
| 149 | + |
| 150 | +::: {.callout-note collapse="true" title="Conclusion" appearance="minimal"} |
| 151 | +Exploration of the WNBA data revealed that some seasons had incomplete game records (i.e., some games were missing). We identified this issue by tallying the number of games recorded for each team within each season and noticing that the totals were occasionally inconsistent across teams. |
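
The tallying step described above might look roughly like the following. This is a minimal sketch, not the worksheet's solution: the records are made-up `(season, team)` pairs with placeholder team names, the function name `seasons_with_uneven_totals` is hypothetical, and it assumes that every team in a complete season should have the same number of recorded games.

```python
from collections import Counter, defaultdict

# One (season, team) pair per team-game row; values are invented for illustration.
records = (
    [(2019, "Team A")] * 3
    + [(2019, "Team B")] * 3
    + [(2020, "Team A")] * 3
    + [(2020, "Team B")] * 2  # one game fewer than Team A in the same season
)


def seasons_with_uneven_totals(records):
    """Return seasons in which teams do not all have the same number of games."""
    games_per_team = Counter(records)  # (season, team) -> number of rows
    totals_by_season = defaultdict(set)
    for (season, _team), n_games in games_per_team.items():
        totals_by_season[season].add(n_games)
    # More than one distinct total within a season suggests missing records.
    return sorted(
        season for season, totals in totals_by_season.items() if len(totals) > 1
    )


flagged = seasons_with_uneven_totals(records)
print(flagged)
```

A season flagged here only signals that something is inconsistent; deciding which games are missing still requires comparing against an outside source.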
| 152 | + |
| 153 | +After identifying these data quality issues, we considered several possible ways to continue the analysis. For example, we could remove seasons with incomplete records, manually fill in missing games using outside sources, or use a different data source. Each approach involves tradeoffs between simplicity, accuracy, and the amount of additional work required. |
| 154 | + |
| 155 | +After choosing a data-cleaning strategy, we used the cleaned data to create a two-way table comparing whether teams were ranked in the top eight at midseason with whether they ultimately made the playoffs. This analysis illustrates how data cleaning decisions can directly affect the conclusions we draw from sports data. |
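
As a rough illustration of the two-way table described above, the sketch below counts team-seasons by two custom-built Boolean variables. Everything here is assumed for the example: the data are invented, and the names `team_seasons` and `two_way_table` are hypothetical rather than taken from the worksheet, which builds the table with `dplyr` or `pandas`.

```python
from collections import Counter

# Each pair is (top_eight_at_midseason, made_playoffs) for one team-season.
# The values are invented for illustration only.
team_seasons = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]


def two_way_table(pairs):
    """Cross-tabulate pairs of Booleans into a nested dict of counts."""
    counts = Counter(pairs)
    return {
        row: {col: counts[(row, col)] for col in (True, False)}
        for row in (True, False)
    }


table = two_way_table(team_seasons)

# Print the table with midseason status as rows and playoff outcome as columns.
print("top eight at midseason | made playoffs | missed playoffs")
for row in (True, False):
    print(f"{row!s:>22} | {table[row][True]:>13} | {table[row][False]:>15}")

# Share of midseason top-eight teams that went on to make the playoffs.
top_eight_total = sum(table[True].values())
share = table[True][True] / top_eight_total
```

Comparing the row proportions, rather than the raw counts, is usually the clearer way to judge how well midseason standing tracks playoff qualification.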
| 156 | +::: |
| 157 | + |
| 158 | +### Acknowledgements |
| 159 | + |
| 160 | +Thumbnail image: “WNBA Barnstar.png” by Mungo Kitsch, licensed under CC BY-SA 4.0 via [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:WNBA_Barnstar.png){target="_blank"}. The image incorporates a WNBA logo element; use does not imply endorsement by the WNBA. |
| 161 | + |