---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# PremPredict <a href="https://p0bs.github.io/PremPredict/"><img src="man/figures/logo.png" align="right" height="104" alt="PremPredict website" /></a>

<!-- badges: start -->
  [![R-CMD-check](https://github.com/p0bs/PremPredict/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/p0bs/PremPredict/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/p0bs/PremPredict/graph/badge.svg)](https://app.codecov.io/gh/p0bs/PremPredict)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->

The `PremPredict` package helps you to generate sensible predictions for individual games or an entire season of the Premier League.

My automatically updated Premier League predictor, which uses this codebase, is on [the landing page of its repo](https://github.com/p0bs/PL-scan?tab=readme-ov-file#predicting-this-seasons-premier-league).

<br/>

## Installation

You can install the development version of PremPredict from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("p0bs/PremPredict")
```

<br/>

## Approach

I use a simplified version of [David Firth's approach](https://github.com/DavidFirth/alt3code) and data from the [Open Football repo](https://github.com/openfootball/football.json) on GitHub to predict the outcome of this season's Premier League.

The predictions are based on each team's strength, as estimated from its recent results. But how should we define 'recent'? To sidestep this question, you can run the analysis over several different lookback periods. Please also see the further disclaimers in the notes for my automatically updated Premier League predictor (linked above).
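As a rough illustration of how team strengths can translate into match probabilities, here is a minimal sketch of a Davidson-style extension of the Bradley-Terry model, which is one standard way to allow for draws. It is not taken from PremPredict: the helper `outcome_probs()`, the draw parameter `nu` and the strength values are all hypothetical, and the package's actual model may differ.

```r
# Hypothetical helper (not part of PremPredict): outcome probabilities for one
# game under a Davidson-style model. Strengths must be positive; nu >= 0
# controls how likely a draw is.
outcome_probs <- function(strength_home, strength_away, nu = 1) {
  stopifnot(strength_home > 0, strength_away > 0, nu >= 0)
  draw_term <- nu * sqrt(strength_home * strength_away)
  total <- strength_home + strength_away + draw_term
  c(
    home_win = strength_home / total,
    draw     = draw_term / total,
    away_win = strength_away / total
  )
}

# Made-up strengths, purely for illustration
probs <- outcome_probs(strength_home = 2, strength_away = 1, nu = 0.8)
round(probs, 3)
```

The three probabilities always sum to one, and a stronger home side gets a larger share of the win probability.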

<br/>

## Example

Here is an example analysis, using data collected towards the end of the 2025/26 season.

First, we collect, combine and tidy the results data.

```{r}
library(PremPredict)
data("example_thisSeason")

results_combined <- get_results(
  results_thisSeason = example_thisSeason,
  seasons = 1L
  )

dim(results_combined)
```

Note that we want to look back across this season (so far) and its predecessor. This is why the lookback below is set to 76 rounds, i.e. two 38-round seasons.

```{r}
game_latest <- calc_game_latest(results = results_combined)

results_filtered <- get_results_filtered(
  results = results_combined,
  index_game_latest = game_latest,
  lookback_rounds = 76L
  )

dplyr::glimpse(results_filtered)
```

For reference, here is the current league table.

```{r, message=FALSE}
data_table_current <- example_thisSeason |>
  calc_table_current()

data_table_current |>
  print_table_current()
```

We can now model the strengths of the sides at home and away.

```{r}
data_model <- results_filtered |>
  model_prepare_frame() |>
  model_run()

data_model
```

Next, we use these team strengths to model future games across the season.

```{r}
data_parameters_unplayed <- data_model |>
  model_extract_parameters()

data_model_parameters_unplayed <- model_parameters_unplayed(
  results = results_filtered,
  model_parameters = data_parameters_unplayed
  )

data_points_expected_remaining <- data_model_parameters_unplayed |>
  calc_points_expected_remaining()

calc_points_expected_total(
  table_current = data_table_current,
  points_expected = data_points_expected_remaining
  ) |>
  knitr::kable()
```

On this basis, we can see which teams look like favourites to win the league.
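The arithmetic behind an expected-points table like the one above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the teams, the fixture probabilities and the current points are made up, and a win is worth three points and a draw one.

```r
# Made-up remaining fixtures with assumed outcome probabilities
fixtures <- data.frame(
  home   = c("A", "B", "C"),
  away   = c("B", "C", "A"),
  p_home = c(0.5, 0.4, 0.3),
  p_draw = c(0.3, 0.3, 0.3),
  p_away = c(0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Expected points per game: 3 for a win, 1 for a draw
expected_home <- 3 * fixtures$p_home + fixtures$p_draw
expected_away <- 3 * fixtures$p_away + fixtures$p_draw

# Sum each team's expected points over its home and away games
expected_remaining <- tapply(
  c(expected_home, expected_away),
  c(fixtures$home, fixtures$away),
  sum
)

# Add the (made-up) points already won
current_points <- c(A = 60, B = 58, C = 55)
expected_total <- current_points + expected_remaining[names(current_points)]
sort(expected_total, decreasing = TRUE)
```

Expected points rank the teams, but they say nothing about how often each team finishes top.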

To estimate how likely these favourites are to become champions, though, we need to simulate many possible outcomes.

```{r}
number_simulations <- 100000

data_simulate_games <- simulate_games(
  data_model_parameters_unplayed = data_model_parameters_unplayed,
  value_number_sims = number_simulations,
  value_seed = 2602L
  )

data_simulate_standings <- simulate_standings(
  data_game_simulations = data_simulate_games,
  data_table_latest = data_table_current
  )

simulate_outcomes(
  data_standings_simulations = data_simulate_standings,
  value_number_sims = number_simulations
  ) |>
  knitr::kable()
```

Alternatively, you can generate this table without calculating the intermediate steps by running `run_simulations()`.
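To show the idea behind the simulation step, here is a toy Monte Carlo sketch in base R. It does not reproduce the package's functions: the teams, probabilities and points are made up, and ties on points are broken at random (a simplification, since the real league uses goal difference first).

```r
set.seed(2602)  # fixed seed so that the run is reproducible

# Made-up current points and remaining fixtures with assumed probabilities
current_points <- c(A = 60, B = 58, C = 55)
fixtures <- data.frame(
  home   = c("A", "B", "C"),
  away   = c("B", "C", "A"),
  p_home = c(0.5, 0.4, 0.3),
  p_draw = c(0.3, 0.3, 0.3),
  p_away = c(0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

n_sims <- 10000
title_counts <- setNames(numeric(length(current_points)), names(current_points))

for (sim in seq_len(n_sims)) {
  points <- current_points
  for (g in seq_len(nrow(fixtures))) {
    # Draw one outcome for this game from its assumed probabilities
    outcome <- sample(
      c("home", "draw", "away"),
      size = 1,
      prob = c(fixtures$p_home[g], fixtures$p_draw[g], fixtures$p_away[g])
    )
    home <- fixtures$home[g]
    away <- fixtures$away[g]
    if (outcome == "home") {
      points[home] <- points[home] + 3
    } else if (outcome == "away") {
      points[away] <- points[away] + 3
    } else {
      points[home] <- points[home] + 1
      points[away] <- points[away] + 1
    }
  }
  # Champion is the team with the most points; ties are broken at random
  leaders <- names(points)[points == max(points)]
  champion <- leaders[sample.int(length(leaders), 1)]
  title_counts[champion] <- title_counts[champion] + 1
}

# Share of simulated seasons in which each team finishes top
title_probs <- title_counts / n_sims
title_probs
```

Each simulated season plays out the remaining fixtures once, so the share of simulations that a team wins approximates its title probability under the assumed match probabilities.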