---
title: "candy_data_analysis"
author: "VM"
date: "11/11/2020"
output:
  html_document:
    number_sections: yes
    toc: yes
    toc_float: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Load libraries and read the file
```{r}
library(tidyverse)
full_candy_data <- read_csv("~/dirty_data_project/task4/clean_data/full_clean_candy_data.csv")
```
# Q1
## What is the total number of candy ratings given across the three years? (Count candy ratings, not raters, and do not count missing values.)
```{r}
# Count rows where both the year and the rating are present
full_candy_data %>%
  select(year, comments) %>%
  drop_na() %>%
  nrow()
```
```{r}
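# Count only rows whose rating is one of the three valid labels;
# %in% tests each value against the whole set, whereas == would recycle it
full_candy_data %>%
  select(year, comments) %>%
  filter(comments %in% c("despair", "joy", "meh")) %>%
  nrow()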
```
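
Why `%in%` rather than `==`: comparing a column with a vector using `==` recycles the vector element by element, so each row is tested against only one of the three labels. A toy vector, invented for illustration and not taken from the candy data, shows the difference:

```{r}
# Toy comments vector, invented for illustration (not from the candy data)
toy_comments <- c("joy", "joy", "meh", "despair", "meh", "joy")
valid_ratings <- c("despair", "joy", "meh")

# `==` recycles `valid_ratings`: row 1 is compared with "despair",
# row 2 with "joy", row 3 with "meh", row 4 with "despair", and so on
matches_eq <- toy_comments == valid_ratings

# `%in%` checks every element against the whole set of labels
matches_in <- toy_comments %in% valid_ratings

sum(matches_eq) # 3
sum(matches_in) # 6
```

Every toy value is a valid label, yet `==` keeps only half of them.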
# Q2
## What was the average age of people who are going out trick or treating, and the average age of people not going trick or treating?
```{r}
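# Average age of people going trick or treating (missing ages ignored)
full_candy_data %>%
  select(age, going_or_not) %>%
  filter(going_or_not == "yes") %>%
  summarise(avg_age_going = round(mean(age, na.rm = TRUE)))

# Average age of people not going trick or treating (missing ages ignored)
full_candy_data %>%
  select(age, going_or_not) %>%
  filter(going_or_not == "no") %>%
  summarise(avg_age_not_going = round(mean(age, na.rm = TRUE)))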
```
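
Both averages can also come from a single grouped summary. The sketch below uses a made-up tibble and assumes a hypothetical respondent identifier `rater_id`, which may not exist in `full_candy_data`. If each row is one candy rating (as the Q1 row count assumes), a plain row mean weights each respondent by how many candies they rated; keeping one row per respondent first gives a per-person average instead.

```{r}
library(dplyr)
library(tibble)

# Toy data, invented for illustration: one row per rating, so rater 1
# appears three times. `rater_id` is a hypothetical column.
toy_age_data <- tibble(
  rater_id     = c(1, 1, 1, 2, 3, 4),
  age          = c(20, 20, 20, 40, 30, NA),
  going_or_not = c("yes", "yes", "yes", "yes", "no", "no")
)

# Row-weighted averages: one mean per group, missing ages ignored
toy_row_avg <- toy_age_data %>%
  group_by(going_or_not) %>%
  summarise(avg_age = mean(age, na.rm = TRUE))

# Person-weighted averages: keep one row per rater before averaging
toy_person_avg <- toy_age_data %>%
  distinct(rater_id, .keep_all = TRUE) %>%
  group_by(going_or_not) %>%
  summarise(avg_age = mean(age, na.rm = TRUE))

toy_row_avg    # "yes" averages 25 because rater 1 counts three times
toy_person_avg # "yes" averages 30 with each rater counted once
```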
# Q3
## For each of joy, despair and meh, which candy bar received the most of these ratings?
```{r}
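frequency_candies <- full_candy_data %>%
  select(candies, comments) %>%
  filter(comments %in% c("despair", "joy", "meh")) %>%
  # one row per rating/candy pair, with the tally in column `n`
  count(comments, candies)

# candy with the highest tally within each rating (ties are all kept)
frequency_candies %>%
  group_by(comments) %>%
  slice_max(order_by = n)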
```
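
The same count-then-pick-the-top pattern on a small made-up tibble, so the intermediate tally is easy to check by hand. The candy names and ratings below are invented for illustration.

```{r}
library(dplyr)
library(tibble)

# Toy ratings, invented for illustration
toy_candy_ratings <- tibble(
  comments = c("joy", "joy", "joy", "despair", "despair", "meh"),
  candies  = c("twix", "twix", "kitkat", "candy_corn", "candy_corn", "kitkat")
)

toy_top_candies <- toy_candy_ratings %>%
  # tally each comments/candies combination into a column named `n`
  count(comments, candies) %>%
  group_by(comments) %>%
  # keep the most frequent candy within each rating; ties would all be kept
  slice_max(order_by = n) %>%
  ungroup()

toy_top_candies
```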
# Q4
## How many people rated Starburst as despair?
```{r}
full_candy_data %>%
  filter(candies == "starburst") %>%
  filter(comments == "despair") %>%
  nrow()
```
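
Conditions given to `filter()` separated by commas are joined with AND, and the same count can be obtained by summing a logical vector. A toy sketch, with data invented for illustration, checks that both routes agree and that both skip a missing rating rather than counting it:

```{r}
library(dplyr)
library(tibble)

# Toy data with a missing rating, invented for illustration
toy_starburst <- tibble(
  candies  = c("starburst", "starburst", "starburst", "twix"),
  comments = c("despair", "joy", NA, "despair")
)

# filter() drops rows where the condition evaluates to NA
n_via_filter <- toy_starburst %>%
  filter(candies == "starburst", comments == "despair") %>%
  nrow()

# Summing a logical vector counts the TRUEs; na.rm = TRUE skips the NA row
n_via_sum <- sum(toy_starburst$candies == "starburst" &
                   toy_starburst$comments == "despair", na.rm = TRUE)

c(n_via_filter, n_via_sum) # both 1
```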
# Q5
## For the next three questions, count despair as -1, joy as +1 and meh as 0.
```{r}
rating_data <- full_candy_data %>%
  filter(comments %in% c("despair", "joy", "meh")) %>%
  # score each rating: despair = -1, joy = +1, meh = 0
  mutate(rating_count = case_when(
    comments == "despair" ~ -1,
    comments == "joy" ~ 1,
    comments == "meh" ~ 0
  ))
rating_data
```
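
Because the three labels map one-to-one onto scores, a named vector can act as a lookup table instead of `case_when()`. The sketch below runs both on made-up ratings and checks they agree; `score_lookup` is a name introduced here for illustration. In both versions a label outside the three would score `NA`.

```{r}
library(dplyr)
library(tibble)

# Toy ratings, invented for illustration
toy_scores <- tibble(comments = c("despair", "joy", "meh", "joy"))

# Hypothetical lookup table mapping each label to its score
score_lookup <- c(despair = -1, joy = 1, meh = 0)

toy_scores <- toy_scores %>%
  mutate(
    score_case = case_when(
      comments == "despair" ~ -1,
      comments == "joy" ~ 1,
      comments == "meh" ~ 0
    ),
    # Indexing a named vector by the label returns the matching score
    score_named = unname(score_lookup[comments])
  )

toy_scores
```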
## What was the most popular candy bar by this rating system for each gender in the dataset?
```{r}
rating_data %>%
  select(gender, candies, rating_count) %>%
  filter(gender %in% c("male", "female")) %>%
  group_by(gender, candies) %>%
  summarise(total = sum(rating_count)) %>%
  slice_max(total)
```
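
The gender and year answers rely on a dplyr detail: `summarise()` removes only the last grouping variable, so the result stays grouped by the first one and `slice_max()` then works within each group. Passing `.groups = "drop_last"` makes this explicit and silences the grouping message. A toy sketch with invented data:

```{r}
library(dplyr)
library(tibble)

# Toy scored ratings, invented for illustration
toy_gender_scores <- tibble(
  gender       = c("male", "male", "male", "female", "female", "female"),
  candies      = c("twix", "twix", "kitkat", "kitkat", "kitkat", "twix"),
  rating_count = c(1, 1, 1, 1, 1, -1)
)

toy_best_by_gender <- toy_gender_scores %>%
  group_by(gender, candies) %>%
  # "drop_last" peels off `candies`, leaving the result grouped by gender
  summarise(total = sum(rating_count), .groups = "drop_last") %>%
  # so slice_max() picks the top candy within each gender
  slice_max(total) %>%
  ungroup()

toy_best_by_gender
```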
## What was the most popular candy bar in each year?
```{r}
rating_data %>%
  select(year, rating_count, candies) %>%
  group_by(year, candies) %>%
  summarise(total = sum(rating_count)) %>%
  slice_max(total)
```
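
The `slice_max()` calls in this analysis differ in how they treat ties: the default `with_ties = TRUE` returns every candy sharing the top total, while the per-country chunk passes `with_ties = FALSE` to keep a single row. A toy sketch on invented yearly totals shows the difference; which behaviour is preferable is a judgement call.

```{r}
library(dplyr)
library(tibble)

# Toy yearly totals with a tie in 2017, invented for illustration
toy_year_totals <- tibble(
  year    = c(2016, 2016, 2017, 2017),
  candies = c("twix", "kitkat", "twix", "kitkat"),
  total   = c(5, 3, 4, 4)
) %>%
  group_by(year)

# Default with_ties = TRUE keeps both tied candies for 2017
all_ties <- slice_max(toy_year_totals, total)

# with_ties = FALSE keeps exactly one row per year
one_per_year <- slice_max(toy_year_totals, total, with_ties = FALSE)

all_ties
one_per_year
```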
## What was the most popular candy bar by this rating for people in US, Canada, UK and all other countries?
```{r}
rating_data %>%
  select(country, candies, rating_count) %>%
  drop_na(country) %>%
  group_by(country, candies) %>%
  summarise(total = sum(rating_count)) %>%
  slice_max(order_by = total, with_ties = FALSE)
```
```{r}
rating_data %>%
  select(country, candies, rating_count) %>%
  # keep the three named countries and lump everything else into "other"
  mutate(country = case_when(
    country == "uk" ~ "uk",
    country == "usa" ~ "usa",
    country == "canada" ~ "canada",
    TRUE ~ "other"
  )) %>%
  group_by(country, candies) %>%
  summarise(total = sum(rating_count)) %>%
  slice_max(total)
```
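
In the collapsed version, rows with a missing country are counted as "other" rather than dropped, unlike the per-country chunk, which calls `drop_na(country)` first. A toy sketch on invented values confirms this and shows an equivalent `%in%` form; adding `drop_na(country)` before the `mutate()` would exclude those rows if that is preferred.

```{r}
library(dplyr)

# Toy country values, invented for illustration (includes a missing one)
toy_countries <- c("usa", "uk", "canada", "france", NA)

# A comparison with NA is NA, which case_when() treats as not matched,
# so missing countries fall through to the catch-all "other"
lumped <- case_when(
  toy_countries == "uk" ~ "uk",
  toy_countries == "usa" ~ "usa",
  toy_countries == "canada" ~ "canada",
  TRUE ~ "other"
)

# Equivalent form: NA %in% c(...) is FALSE, so NA also becomes "other"
lumped_alt <- if_else(toy_countries %in% c("uk", "usa", "canada"),
                      toy_countries, "other")

lumped
```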