R/README.md: 53 additions, 0 deletions
@@ -473,3 +473,56 @@ legend("topright",
col= c("blue", "red"),
lty=1)
```
## Express a Statistical Model with `R`

- Use `factor` or `ordered` to represent categorical variables
- Use `lm` for linear regression, `glm` for generalized linear models
- `~` operator means "is modeled by": `Gas ~ Temp`
- `+` operator means "add another term": `Gas ~ Temp + Insul`
- `:` operator means "add an interaction term": `Gas ~ Temp + Insul + Temp:Insul`
- `*` operator means "include both main effects and their interaction": `Gas ~ Temp * Insul` is equivalent to `Gas ~ Temp + Insul + Temp:Insul`
- `-` operator means "remove or exclude a term"; most commonly `- 1` removes the intercept: `Gas ~ Temp + Insul - 1`
- `^` operator means "limit depth of interaction": `Gas ~ (Temp + Insul)^2` is equivalent to `Gas ~ Temp + Insul + Temp:Insul`
- `%in%` operator means "nesting": `effect ~ school + teacher %in% school` means that the effect is modeled by school and by teacher within school, i.e., teachers are nested within schools
- `/` operator means "main effect and nesting": `effect ~ school/teacher` is equivalent to `effect ~ school + teacher %in% school`
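
One way to check how these operators expand is to call `terms()` on a formula, which works without any data. The sketch below is illustrative only: the helper name `expand_terms` is made up for this example, and the variable names are simply those used in the formulas above.

```r
# Hypothetical helper: list the terms R derives from a formula.
expand_terms <- function(f) attr(terms(f), "term.labels")

star  <- expand_terms(Gas ~ Temp * Insul)
power <- expand_terms(Gas ~ (Temp + Insul)^2)
full  <- expand_terms(Gas ~ Temp + Insul + Temp:Insul)
print(star)

# All three spellings describe the same set of terms.
print(identical(star, power) && identical(star, full))

# Subtracting 1 drops the intercept (0 = no intercept, 1 = intercept).
no_int <- attr(terms(Gas ~ Temp + Insul - 1), "intercept")
print(no_int)
```

Inspecting `term.labels` before fitting is a cheap way to confirm that a formula says what was intended.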

## Common Arguments to Modeling Functions

- `data` to specify the data frame containing the variables
- L.H.S of `~`: dependent variable (response variable)
- R.H.S of `~`: independent variables (predictor variables)
- `.`: include all other variables in the `data.frame` as predictors
  - `lm(Gas ~ ., data = whiteside)`
  - `lm(Gas ~ .^2, data = whiteside)` (all main effects plus all two-way interactions)
- `subset` to fit the model on a subset of rows, e.g., `subset = Gas > 2 & Gas < 5`
- `weights` to weight observations, e.g., `weights = 1 / (Temp^2)` gives more weight to observations whose `Temp` is closer to zero
- `na.action` to specify how to handle missing values, e.g., `na.omit` to exclude rows with missing values
  - `na.fail` to throw an error if there are missing values
  - `na.exclude` to drop rows with missing values from the fit but pad the residuals and fitted values with `NA` at those positions
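  - `na.include` to include missing values in the analysis, treating them as a separate category; note that this function is not part of base R, where `na.pass` keeps missing values unchanged and `addNA()` adds `NA` as a factor level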
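
The arguments above can be tried out on the `whiteside` data used in the examples. This is a minimal sketch, assuming the `MASS` package (which provides `whiteside`) is installed; the object names (`fit_all`, `ws`, and so on) and the artificially inserted `NA` are made up for illustration.

```r
# Assumption: MASS is installed; it ships the `whiteside` data frame.
data(whiteside, package = "MASS")

# `.` uses every other column (Insul and Temp) as a predictor.
fit_all <- lm(Gas ~ ., data = whiteside)

# `subset` restricts the fit to rows satisfying the condition.
fit_sub <- lm(Gas ~ Temp, data = whiteside, subset = Gas > 2 & Gas < 5)

# Copy of the data with one artificial missing response value.
ws <- whiteside
ws$Gas[1] <- NA

fit_omit <- lm(Gas ~ Temp, data = ws, na.action = na.omit)
fit_excl <- lm(Gas ~ Temp, data = ws, na.action = na.exclude)

# na.omit drops the row entirely; na.exclude pads the result with NA.
n_omit <- length(residuals(fit_omit))
n_excl <- length(residuals(fit_excl))
print(c(rows = nrow(ws), na.omit = n_omit, na.exclude = n_excl))

# na.fail refuses to fit when any value is missing.
res <- try(lm(Gas ~ Temp, data = ws, na.action = na.fail), silent = TRUE)
failed <- inherits(res, "try-error")
print(failed)
```

`na.exclude` is convenient when residuals need to be aligned with the rows of the original data frame.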

## Random Number Generation (RNG)

- `set.seed` to set the seed for reproducibility
- A good-quality RNG is a key element of any Monte Carlo simulation
  - More specifically, one needs a good uniform or normal RNG
  - Almost all other distributions can be generated by using a uniform RNG as the source
- How to generate a random number without a computer (pre-computer age)?
  - Lottery-style draws (e.g., Lotto 6/49)
  - A book with pre-printed "random numbers"
  - Drawbacks: slow, limited quantity, not reproducible
- "True" RNG
  - Quantum mechanics: quantum unpredictability leads to true RNG
  - Physical phenomena without quantum mechanics
    - Thermal noise from resistors; Intel hardware from 1999 onward has included such a circuit
    - Atmospheric noise detected by a radio receiver
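
Two of the points above can be illustrated in a few lines: `set.seed` makes a random stream repeatable, and a uniform RNG can serve as the source for another distribution. The sketch below uses inverse-transform sampling for an exponential distribution as one example of the second point; the rate, the sample size, and the variable names are arbitrary choices for illustration.

```r
# Same seed, same stream: the two draws are identical.
set.seed(42)
a <- runif(5)
set.seed(42)
b <- runif(5)
print(identical(a, b))

# Inverse-transform sampling: if U ~ Uniform(0, 1), then
# -log(1 - U) / rate follows an Exponential(rate) distribution.
rate <- 2
u <- runif(10000)
x <- -log(1 - u) / rate

# The sample mean should be close to the theoretical mean 1 / rate.
print(c(sample_mean = mean(x), theoretical = 1 / rate))
```

In practice `rexp()` would be used directly; the manual version is only meant to show the uniform-to-other-distribution idea.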

### Mersenne-Twister RNG (R’s default RNG)
- Pseudorandom number generator developed in 1997 by Matsumoto and Nishimura
- Period: 2^19937 − 1
- Seed: a 624-dimensional set of 32-bit integers plus a current position in that set
- The Mersenne Twister is designed with Monte Carlo simulations and other statistical simulations in mind
- For non-parallel RNG, this is probably the best RNG
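
The seed structure described above can be inspected from R itself. The following is a small sketch, not a recommended workflow: it selects the Mersenne-Twister generator explicitly (in case a session uses a different one) and looks at `.Random.seed`; the variable names are made up for illustration.

```r
# Select the generator explicitly so the sketch does not depend on session defaults.
RNGkind("Mersenne-Twister")
set.seed(1)

# .Random.seed holds 1 generator code + 1 position + 624 state words.
seed_len <- length(.Random.seed)
print(seed_len)

# Saving and restoring the state replays the same random numbers.
state <- .Random.seed
x1 <- rnorm(3)
assign(".Random.seed", state, envir = globalenv())
x2 <- rnorm(3)
print(identical(x1, x2))
```

Restoring `.Random.seed` by hand is mainly useful for debugging; for ordinary reproducibility, calling `set.seed` at the start of a script is usually enough.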