Commit 6f782f5

mountain deaths
1 parent 5664a13 commit 6f782f5

1 file changed

Lines changed: 180 additions & 0 deletions

@@ -0,0 +1,180 @@
---
title: "Death on the Mountains | TidyTuesday"
author: "Mitchell Harrison"
date: "01/21/2025"
categories:
  - "Data Viz"
  - "TidyTuesday"
image: "../../images/thumbnails/projects/tidytuesday/01212025.jpg"
---

# Introduction

Hello all! Welcome to TidyTuesday! This week, journalist Elizabeth Hawley
provides us with data that documents mountaineering expeditions in the Nepal
Himalaya, the mountain range that includes Mount Everest. There were a few
variables that piqued my interest, so I thought we could build a model to see
if any of those variables are related to fatalities during expeditions. Let's
read and clean the data!

```{r}
#| label: read-libs-and-data

library(ggthemes)
library(gt)
library(tidyverse)

exped <- read_csv(
  paste0(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
    "main/data/2025/2025-01-21/exped_tidy.csv"
  )
) |>
  janitor::clean_names()

peaks <- read_csv(
  paste0(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
    "main/data/2025/2025-01-21/peaks_tidy.csv"
  )
) |>
  janitor::clean_names()

# join peak details onto each expedition, then derive the response variable
climbs <- exped |>
  left_join(peaks) |>
  mutate(
    deaths = mdeaths + hdeaths,
    # TRUE when at least one death was recorded on the expedition
    is_fatal = deaths > 0,
    season_factor = factor(
      season_factor,
      levels = c("Winter", "Spring", "Summer", "Autumn")
    ),
    agency = factor(agency)
  ) |>
  drop_na(is_fatal)
```

# Exploratory Analysis

Of course, it wouldn't be TidyTuesday without a little bit of data viz. I'm
thinking that oxygen use might predict deaths, although I have absolutely no
domain knowledge in the field of mountaineering. My concern is that oxygen is
only used above a certain altitude, so oxygen use and mountain height may be
highly correlated. Let's see if that's the case, adding in the number of days
that an expedition takes as a second axis.

```{r}
#| label: visualize-oxygen-use

climbs |>
  mutate(o2used = if_else(o2used, "O2 used", "No O2 used")) |>
  ggplot(aes(x = highpoint, y = totdays, color = o2used, shape = o2used)) +
  geom_jitter(size = 2.5) +
  theme_fivethirtyeight() +
  scale_color_fivethirtyeight() +
  labs(
    title = "Higher altitude means oxygen use",
    subtitle = "Summit height, length of journey, and oxygen",
    x = "High point (m)",
    y = "Total days"
  ) +
  theme(
    axis.title = element_text(),
    legend.title = element_blank()
  )
```

So my assumption was correct: oxygen is mostly used at altitudes over 7,500
meters. I'll add both oxygen use and the high point of the expedition to the
model. Including the high point lets us hold it constant when analyzing the
effect of oxygen use, and it shows whether one is more significant than the
other. I'm also curious to see if winter expeditions are more lethal, so we'll
use winter as the baseline season against which we will compare all other
seasons.

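One way to put a number on the overlap seen in the plot is to tabulate the share of expeditions using oxygen within altitude bands. The sketch below uses a small made-up data frame (`toy`, with columns named after `highpoint` and `o2used` in `climbs`); the values and the 7,500 m cut point are illustrative assumptions, not figures from the dataset.

```{r}
#| label: toy-oxygen-by-altitude

# Hypothetical toy data standing in for `climbs`; all values are invented.
toy <- data.frame(
  highpoint = c(6200, 6800, 7100, 7400, 7600, 8000, 8500, 8849),
  o2used    = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Split expeditions into two altitude bands at an assumed 7,500 m cut point.
toy$band <- cut(
  toy$highpoint,
  breaks = c(-Inf, 7500, Inf),
  labels = c("Below 7,500 m", "7,500 m and above"),
  right  = FALSE
)

# Share of expeditions using oxygen within each band.
o2_share <- tapply(toy$o2used, toy$band, mean)
print(o2_share)

# Point-biserial correlation between altitude and oxygen use.
o2_cor <- cor(toy$highpoint, as.numeric(toy$o2used))
print(o2_cor)
```

Swapping `toy` for `climbs` would give the same summary on the real data. A large gap between the two bands, or a correlation near 1, would suggest the model will have trouble separating the two effects.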
Before we build our model, let's see how many of our expeditions proved fatal.

# The Model

```{r}
#| label: visualize-fatalities

climbs |>
  mutate(is_fatal = if_else(is_fatal, "Fatal", "Non-Fatal")) |>
  group_by(is_fatal) |>
  summarise(prop = n() / nrow(climbs)) |>
  rename(Fatality = "is_fatal", Proportion = "prop") |>
  gt() |>
  fmt_number(columns = Proportion, decimals = 3) |>
  tab_header(md("**Expedition Fatalities**"))
```

Uh-oh, only 4% of the expeditions were fatal. With such an imbalance between
the binary outcomes, ordinary logistic regression can give biased coefficient
estimates because fatal expeditions are so rare. Instead, we will use Firth's
logistic regression, which is designed to reduce that bias. It penalizes the
likelihood function using a penalty term related to the
[Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior). Long story
short, it will help correct for the class imbalance issue. Let's build the model
and see what we get!

*Note: For ease of interpretation, I have exponentiated the coefficients.*

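For reference, Firth's penalty is usually written as below, where $\ell(\beta)$ is the ordinary log-likelihood and $I(\beta)$ is the Fisher information matrix. The notation is introduced here for illustration and does not come from the code in this post.

$$
\ell^{*}(\beta) = \ell(\beta) + \frac{1}{2} \log \left| I(\beta) \right|
$$

The added term is the log of the Jeffreys prior density, which is proportional to $\left| I(\beta) \right|^{1/2}$, so maximizing $\ell^{*}$ amounts to finding the posterior mode under that prior.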
```{r}
#| label: build-model
#| results: hide

# using Firth's logistic regression to account for few TRUE response values
model <- logistf::logistf(
  is_fatal ~ highpoint + season_factor + o2used + totdays,
  data = climbs
)

model_summary <- summary(model)
# exponentiate the log-odds coefficients so they read as odds ratios
results <- tibble(
  term = names(model_summary$coefficients),
  coef = exp(model_summary$coefficients),
  p_value = model_summary$prob
)

# readable labels, listed in the same order as the model terms
results$term <- c("Intercept", "High Point", "Season | Spring",
                  "Season | Summer", "Season | Autumn", "O2 Used",
                  "Total Days")
colnames(results) <- c("Variable", "Coefficients", "P-value")
```

```{r}
#| label: display-model

results |>
  filter(Variable != "Intercept") |>
  gt() |>
  fmt_number(columns = c("Coefficients", "P-value"), decimals = 3) |>
  tab_header(md("**Model Results**"))
```

# Results

Interpreting logistic regression can be awkward, so let's start with the
p-values. At the $\alpha = 0.05$ level, only the use of oxygen shows a
significant association with the fatality of an expedition. Holding season,
high point, and total days constant, the use of oxygen is associated with a
4.78-fold increase in the odds of a fatality. Interestingly, season has no
statistically significant association with fatality when holding other
variables constant. I assumed that the baseline season (winter) would be
significantly more fatal, but that's not the case.

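Because an exponentiated logistic regression coefficient is an odds ratio, a 4.78-fold change in odds is not the same as a 4.78-fold change in probability. Here is a minimal sketch of the conversion. It assumes, purely for illustration, a baseline fatality probability of 4% (the overall rate reported earlier, not the model's estimate for expeditions without oxygen).

```{r}
#| label: odds-vs-probability

# Assumed baseline probability (illustrative; borrowed from the overall 4% rate).
p_base <- 0.04
# Odds ratio for oxygen use reported in the model results.
odds_ratio <- 4.78

# Convert probability to odds, apply the odds ratio, then convert back.
odds_base <- p_base / (1 - p_base)
odds_o2   <- odds_base * odds_ratio
p_o2      <- odds_o2 / (1 + odds_o2)

# Ratio of probabilities implied by the odds ratio at this baseline.
prob_ratio <- p_o2 / p_base

print(round(c(p_o2 = p_o2, prob_ratio = prob_ratio), 3))
```

At this assumed baseline, the probability rises from 4% to roughly 17%, which is about a 4.2-fold increase. The gap between the odds ratio and the probability ratio widens as the baseline probability grows.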
However, it's important to recognize the limitations of our model. Oxygen use
can be caused by many factors. We have already shown that height predicts
oxygen use, but so can medical emergencies or other variables that can appear
during an expedition. It would be worth doing more analysis on oxygen use
specifically to look for a causal relationship, rather than simple association.

# Conclusion

Thanks for reading! Firth's logistic regression was new to me, so I got
to learn from the modeling process *and* from the model itself. Lucky me! I hope
you got something out of this little analysis project, and if you'd like to ask
me any questions, feel free to send me a connect request on
[LinkedIn](https://linkedin.com/in/harrisonme)!

Thanks for your attention, and I'll see you next time!
