-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathDS_midterm_part1.Rmd
More file actions
113 lines (71 loc) · 1.9 KB
/
DS_midterm_part1.Rmd
File metadata and controls
113 lines (71 loc) · 1.9 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
```{r}
dt = read.csv("C:\\Users\\91799\\Downloads\\college.csv")
```
```{r}
View(dt)
```
```{r}
summary(dt)
```
```{r}
dt$Private <- ifelse(dt$Private == "Yes",1,0)
pairs(dt[,2:11])
```
```{r}
par(mfrow=c(4,4), mar=c(2, 2, 1, 0))
for (i in 2:17)
boxplot(dt$Outstate ~ dt$Private, main=colnames(dt[i]))
```
```{r}
dt$Elite <- dt$Top10perc > 50
dt$Elite <- ifelse(dt$Elite == "TRUE",1,0)
```
```{r}
summary(dt[, c("Top10perc", "Elite")])
boxplot(Outstate ~ Elite, data=dt)
```
```{r}
par(mfrow=c(4,4))
for (i in 2:17)
hist(dt[,i], xlab = colnames(dt)[i], breaks = 10)
```
#Exploring which universities have the most and least Top25perc, Most S.F. Ratio and plotiing Graduation rate and Top10perc to get an idea how they co-relate
```{r}
row.names(dt)[which.max(dt$Top25perc)]
row.names(dt)[which.min(dt$Top25perc)]
plot(dt$Top10perc, dt$Grad.Rate)
row.names(dt)[which.max(dt$S.F.Ratio)]
```
```{r}
auto = read.csv("C:\\Users\\91799\\Downloads\\auto.csv")
auto<- na.omit(auto)
dim(auto)
summary(auto)
View(auto)
```
```{r}
sapply(auto,class)
auto$origin <- factor(auto$origin, levels=1:3, labels=c("U.S.", "Europe", "Japan"))
sapply(auto,class)
quant <- sapply(auto, is.numeric)
quant
```
```{r}
sapply(auto[,quant], range)
```
```{r}
sapply(auto[,quant], function(x) round(c(mean(x), sd(x)), 3))
```
```{r}
result<-sapply(auto[-10:-85,quant], function(x) round(c(range(x), mean(x), sd(x)), 3))
result
```
We can see heavier weight corelates with lower mpg, more cylinders less mpg and also over time cars become more effecient
```{r}
pairs(auto[,1:7])
```
#All predictors show some kind of corelation with mpg. It is difficult to generalize but we can see that years, origin and other variables affect mpg
```{r}
with(auto, plot(mpg, year))
with(auto, plot(origin, mpg), ylab= "mpg")
```