-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathSI_CP1.RMD
More file actions
115 lines (92 loc) · 3.2 KB
/
Copy pathSI_CP1.RMD
File metadata and controls
115 lines (92 loc) · 3.2 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
---
title: "Statistical Inference CP1"
author: "Dan K. Hansen"
date: "31 May 2016"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Overview:
We will investigate the exponential distribution in R and compare it with the Central Limit Theorem (CLT).
## Assumptions:
The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter.
* The mean of exponential distribution is 1/lambda *(theoMean)*
* The standard deviation is also 1/lambda. *(stdDev)*
* Set lambda = 0.2 for all of the simulations.
* Investigate the distribution of averages of 40 exponentials. *(n)*
* We will need to do a thousand simulations. *(iterates)*
### Tasks to do
1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
3. Show that the distribution is approximately normal.
## Data Processing
First we set the constants and variables
```{r}
lambda <- 0.2
iterates <- 1000
n <- 40
theoMean <- 1/lambda
stdDev <- 1/lambda
```
Create an exponential distribution: *rexp(n,lambda)*
take the mean of this distribution: *mean(rexp(n,lambda))*
Repeat 1000 times, and put it in a vector of means *(vecMeans)*
***Initialise vecMeans:***
```{r}
vecMeans = NULL
```
As we are generating random numbers, we set the seed so we can replicate the analysis later on
```{r}
set.seed(123456)
```
***Run the simulation:***
```{r}
for (i in 1 : iterates) vecMeans = c(vecMeans, mean(rexp(n,lambda)))
```
Draw a histogram showing the density (total area is equal to one).
This will make it more logical when comparing to a normal distribution later on.
```{r}
hist(vecMeans,freq = FALSE)
```
**Question 1: Show the sample mean and compare it to the theoretical mean of the distribution**
***Sample mean (or mean of means):***
```{r}
mean(vecMeans)
```
***The theoretical mean:*** 1/lambda
```{r}
1/lambda
```
***Comparison:***
```{r}
mean(vecMeans) - 1/lambda
```
It is pretty clear that the theoretical mean compares quite nice to the simulated mean.
**Question 2: Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution:**
***The sample variance:***
```{r}
var(vecMeans)
```
***Theoretical variance:*** stdDev^2 / n
```{r}
stdDev^2 / n
```
***Comparison:***
```{r}
var(vecMeans) - stdDev^2 / n
```
Again, it is pretty clear that the theoretical variance compares quite nice to the simulated variance
**Question 3: Show that the distribution is approximately normal:**
CLT says: the arithmetic mean of a sufficiently large number of iterates of
independent random variables will be approximately normally distributed.
We do that by overlaying a normal distribution with the same mean and ranges
as the histogram:
```{r}
hist(vecMeans,freq = FALSE)
xrange <- seq(from=min(vecMeans), to=max(vecMeans), length.out=n)
yrange <- dnorm(x=xrange, mean=theoMean, sd=stdDev/sqrt(n))
lines(x=xrange, y=yrange, lty=1, col="red", lwd=2)
```
## Conclusion:
Comparing the normal distribution "bell-curve" with the density histogram of the "mean of means", shows quite illustrative the proof of the Central Limit Theorem (CLT).