-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathexercises.qmd
More file actions
247 lines (182 loc) · 7.78 KB
/
exercises.qmd
File metadata and controls
247 lines (182 loc) · 7.78 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
---
title: "Types and Functions"
format: html
---
```{r}
#| label: setup
library(tidyverse)
library(gapminder)
```
## Your Turn 1
1. Create a character vector of colors using `c()`. Use the colors "grey90" and "steelblue". Assign the vector to a name.
2. Use the vector you just created to change the colors in the plot below using `scale_color_manual()`. Pass it using the `values` argument.
```{r}
cols <- ________
gapminder |>
mutate(rwanda = ifelse(country == "Rwanda", TRUE, FALSE)) |>
ggplot(aes(year, lifeExp, color = rwanda, group = country)) +
geom_line() +
____________ +
theme_minimal()
```
## Your Turn 2
1. Create a numeric vector that has the following values: 3, 5, NA, 2, and NA.
2. Try using `sum()`. Then add `na.rm = TRUE`.
3. Check which values are missing with `is.na()`; save the results to a new object and take a look
4. Change all missing values of `x` to -9999
5. Try `sum()` again without `na.rm = TRUE`
```{r}
x <- _____________
___(x)
___(x, ___)
x_missing <- _______
x_missing
_________ <- -9999
x
```
## Your Turn 3
1. Create a function called `sim_data` that doesn't take any arguments.
2. In the function body, we'll return a `tibble`.
3. For `x`, have `rnorm()` return 50 random numbers.
4. For `sex`, use `rep()` to create 50 values of "male" and "female". Hint: You'll have to give `rep()` a character vector for the first argument. The `times` argument is how many times `rep()` should repeat the first argument, so make sure you account for that.
5. For `age` use the `sample()` function to sample 50 numbers from 25 to 50 with replacement.
6. Call `sim_data()`
```{r}
______ ___ ______() {
tibble(
x = ______(50),
sex = ______(___, times = ___),
age = ______(___, size = 50, replace = TRUE)
)
}
_________
```
## Your Turn 3: Bonus!
Did you finish early? Try this extra exercise
1. Set a seed for the random number generator using `set.seed()`. Check the help page if you're not sure how. Hint: use `set.seed()` before using `tibble()`. You can put it before the function or within it.
2. Call `sim_data()`
3. Press the green play button several times. What's happening here?
```{r}
```
## Your Turn 4
1. Write a function to calculate an E-Value given an RR. Call the function `evalue` and give it an argument called `estimate`. In the body of the function, calculate the E-Value using `estimate + sqrt(estimate * (estimate - 1))`
2. Call `evalue()` for a risk ratio of 3.9
```{r}
evalue(___)
```
# Your Turn 5
1. Use `if ()` together with `is.numeric()` to make sure `estimate` is a number. Remember to use `!` for not.
2. If the estimate is less than 1, set `estimate` to be equal to `1 / estimate`.
3. Call `evalue()` for a risk ratio of 3.9. Then try 0.80. Then try a character value.
```{r}
evalue <- function(estimate) {
if (_________) stop("`estimate` must be numeric")
if (_________) _________ <- _________
estimate + sqrt(estimate * (estimate - 1))
}
evalue(3.9)
evalue(.80)
evalue("3.9")
```
# Your Turn 6
1. Add a new argument called `type`. Set the default value to "rr"
2. Check if `type` is equal to "or". If it is, set the value of `estimate` to be `sqrt(estimate)`
3. Call `evalue()` for a risk ratio of 3.9. Then try it again with `type = "or"`.
```{r}
evalue <- function(estimate, _______) {
if (!is.numeric(estimate)) stop("`estimate` must be numeric")
if (_______) _______ <- _______
if (estimate < 1) estimate <- 1 / estimate
estimate + sqrt(estimate * (estimate - 1))
}
evalue(3.9)
evalue(3.9, type = "or")
```
# Your Turn 7: Challenge!
1. Create a new function called `transform_to_rr` with arguments `estimate` and `type`.
2. Use the same code above to check if `type == "or"` and transform if so. Add another line that checks if `type == "hr"`. If it does, transform the estimate using this formula: `(1 - 0.5^sqrt(estimate)) / (1 - 0.5^sqrt(1 / estimate))`.
3. Move the code that checks if `estimate < 1` to `transform_to_rr` (below the OR and HR transformations)
4. Return `estimate`
5. In `evalue()`, change the default argument of `type` to be a character vector containing "rr", "or", and "hr".
6. Get and validate the value of `type` using `match.arg()`. Follow the pattern `argument_name <- match.arg(argument_name)`
7. Transform `estimate` using `transform_to_rr()`. Don't forget to pass it both `estimate` and `type`!
```{r}
evalue <- function(estimate, type = c("rr", "or", "hr")) {
# validate arguments
if (!is.numeric(estimate)) stop("`estimate` must be numeric")
____ <- ___________
# calculate evalue
estimate <- __________
estimate + sqrt(estimate * (estimate - 1))
}
evalue(3.9)
evalue(3.9, type = "or")
evalue(3.9, type = "hr")
evalue(3.9, type = "rd")
```
## Your Turn 8
1. Use `...` to pass the arguments of your function, `filter_summarize()`, to `filter()`.
2. In summarize, get the n and mean life expectancy for the data set
3. Check `filter_summarize()` with `year == 1952`.
4. Try `filter_summarize()` again for 2002, but also filter countries that have the word " and " in the country name (e.g., it should detect "Trinidad and Tobago" but not "Iceland"). Use `str_detect()` from the stringr package.
```{r}
filter_summarize <- function(___) {
gapminder |>
filter(___) |>
summarize(n = ___, mean_lifeExp = ________)
}
filter_summarize(_______)
filter_summarize(_______, _______(country, " and "))
```
## Your turn 9
1. Filter gapminder by `year` using the value of `.year` (notice the period before hand!). You do NOT need curly-curly for this. (Why is that?)
2. Arrange it by the variable. This time, *do* wrap it in curly-curly!
3. Make a scatter plot. Use `variable` for x. For y, we'll use `country`, but to keep it in the order we arranged it by, we'll turn it into a factor. Wrap the the `factor()` call with `fct_inorder()`. Check the help page if you want to know more about what this is doing.
```{r}
top_scatter_plot <- function(variable, .year) {
gapminder |>
___________ |>
arrange(desc(______)) |>
# take the 10 lowest values
tail(10) |>
ggplot(aes(x = ______, y = ______(factor(country)))) +
______() +
theme_minimal()
}
top_scatter_plot(lifeExp, 2002)
# add more ggplot layers just like any other plot!
top_scatter_plot(lifeExp, 2002) +
labs(x = "Life Expectancy", y = "Country")
```
## Your turn 9: Bonus!
Finish early? Try this exercise.
1. Filter by `year` by the value of `.year`
2. Group by `continent`
3. Summarize the number of countries per continent. Call the new variable `n_countries`
4. Arrange the data `n_countries`
5. Make a bar plot with `y` set to `n_countries` and `x` set to `continent.` Make `continent` an in-order factor using the pattern above (`fct_inorder(factor(var))`). Use `geom_col()` instead of `geom_bar()` since you are specifying two variables.
6. Set the theme to `theme_minimal()`.
7. Set the axis title for y to `element_blank()` using `theme()`
8. Use `paste()` to make a label for the y axis. (Wait, didn't we just turn that off? Hang on...). For instance, for 2002, the axis should read something like "n Countries in 2002".
9. Rotate the plot with `coord_flip()`
10. Finally, plot the number of countries per continent for 2002
```{r}
plot_n_countries <- function(.year) {
gapminder |>
}
plot_n_countries(2002)
```
***
# Take aways
* Vectors in R each have a type. These include numeric, integer, character, factor, logical, and date.
* You can subset vectors or modify them using brackets like `x[1]` and `x[[1]]`
* Writing functions in R follows this pattern:
```
function_name <- function(arg1, arg2 = NULL) {
# compute something to return
# return at the last line
# or anywhere using `return()`
}
```
* Functions are objects in R like everything else!
* To program with tidyverse functions, use `...` or curly-curly in the pattern of `{{ x }}`. This lets us use bare names like we normally do with these functions.