Commit a12d9f4

XuhuaHuang and Copilot committed
Study modeling and RNG
Co-authored-by: Copilot <copilot@github.com>
1 parent a90603e commit a12d9f4

1 file changed

Lines changed: 53 additions & 0 deletions

R/README.md

@@ -473,3 +473,56 @@ legend("topright",
       col = c("blue", "red"),
       lty = 1)
```

## Express a Statistical Model with `R`

- Use `factor` or `ordered` to represent categorical variables
- Use `lm` for linear regression, `glm` for generalized linear models
- `~` operator means "is modeled by": `Gas ~ Temp`
- `+` operator means "add another term": `Gas ~ Temp + Insul`
- `:` operator means "add an interaction term": `Gas ~ Temp + Insul + Temp:Insul`
- `*` operator means "include both main effects and their interaction": `Gas ~ Temp * Insul` is equivalent to `Gas ~ Temp + Insul + Temp:Insul`
- `-` operator means "remove or exclude a term": `Gas ~ Temp + Insul - 1` removes the intercept
- `^` operator means "limit depth of interaction": `Gas ~ (Temp + Insul)^2` is equivalent to `Gas ~ Temp + Insul + Temp:Insul`
- `%in%` operator means "nesting": `effect ~ teacher + school + teacher %in% school` means that the effect is modeled by teacher and school, with teachers nested within schools
- `/` operator means "main effect and nesting": `effect ~ school/teacher` is equivalent to `effect ~ school + teacher %in% school`

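The operator equivalences listed above can be checked without fitting anything, because `terms()` expands a formula symbolically. The following is a minimal sketch; the helper name `term_labels` is made up for illustration and is not part of the README.

```r
# Hypothetical helper: list the terms a formula expands to.
# terms() works on the formula alone, so no data frame is needed here.
term_labels <- function(f) attr(terms(f), "term.labels")

spelled_out          <- term_labels(Gas ~ Temp + Insul + Temp:Insul)
main_and_interaction <- term_labels(Gas ~ Temp * Insul)
power_form           <- term_labels(Gas ~ (Temp + Insul)^2)

# `*` and `^2` both expand to the two main effects plus their interaction.
identical(main_and_interaction, spelled_out)
identical(power_form, spelled_out)

# `- 1` drops the intercept: the "intercept" attribute becomes 0.
no_intercept <- attr(terms(Gas ~ Temp + Insul - 1), "intercept")

# Nesting with `/`: print the expansion to see how R labels the nested term.
print(term_labels(effect ~ school/teacher))
```

Inspecting `term.labels` this way is a cheap sanity check before passing a complicated formula to `lm` or `glm`.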
## Common Arguments to Modeling Functions

- `data` to specify the data frame containing the variables
- Left-hand side (LHS) of `~`: dependent variable (response variable)
- Right-hand side (RHS) of `~`: independent variables (predictor variables)
- `.`: include all other variables in the `data.frame` as predictors
  - `lm(Gas ~ ., data = whiteside)`
  - `lm(Gas ~ .^2, data = whiteside)`
- `subset` to fit the model on selected rows only, e.g., `subset = Gas > 2 & Gas < 5`
- `weights` to weight observations, e.g., `weights = 1 / (Temp^2)` gives more weight to observations whose `Temp` is closer to zero
- `na.action` to specify how to handle missing values
  - `na.omit` to exclude rows with missing values
  - `na.fail` to throw an error if there are missing values
  - `na.exclude` to exclude rows with missing values from the fit but pad the residuals and fitted values with `NA` at those positions
  - `na.include` to include missing values in the analysis, treating them as a separate category (not available in base R, where `na.pass` passes missing values through unchanged)

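To make the difference between `na.omit` and `na.exclude` concrete, here is a small sketch. The `toy` data frame is made up for illustration (it is not the `whiteside` data), and the `subset` and `weights` expressions only mirror the shape of the examples above.

```r
# Toy data, invented for this example; one response value is missing.
toy <- data.frame(
  Temp = c(1, 2, 3, 4, 5, 6),
  Gas  = c(6.1, 5.8, NA, 4.9, 4.4, 4.0)
)

fit_omit    <- lm(Gas ~ Temp, data = toy, na.action = na.omit)
fit_exclude <- lm(Gas ~ Temp, data = toy, na.action = na.exclude)

# Both fits use the same 5 complete rows, so the coefficients agree.
all.equal(coef(fit_omit), coef(fit_exclude))

# na.omit drops the incomplete row from the residuals;
# na.exclude keeps one residual per input row and pads with NA.
n_omit    <- length(residuals(fit_omit))     # 5
n_exclude <- length(residuals(fit_exclude))  # 6
res_row3  <- residuals(fit_exclude)[3]       # NA

# subset and weights are evaluated using the columns of `data`.
fit_sub <- lm(Gas ~ Temp, data = toy,
              subset = Gas > 4 & Gas < 6,
              weights = 1 / (Temp^2))
```

`na.exclude` is convenient when residuals need to be attached back to the original data frame, since the lengths line up.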
## Random Number Generation (RNG)

- `set.seed` to set the seed for reproducibility
- A good-quality RNG is a key element of a Monte Carlo simulation
  - More specifically, one needs a good uniform or normal RNG
  - Almost all other distributions can be implemented by using a uniform RNG as a source
- How to generate a random number without a computer (pre-computer age)?
  - Lottery draws (e.g., Lotto 6/49)
  - A book with pre-printed "random numbers"
  - Drawbacks: slow, limited quantity, not reproducible
- "True" RNG
  - Quantum mechanics: quantum unpredictability leads to true RNG
  - Physical phenomena without quantum mechanics
    - Thermal noise from resistors; Intel hardware from 1999 onward includes such a circuit
    - Atmospheric noise detected by a radio receiver

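Two of the points above, reproducibility via `set.seed` and building other distributions from a uniform source, can be sketched as follows. The exponential distribution and the inverse-transform method are chosen here purely as an illustration; the seed value and `rate` are arbitrary.

```r
# Reproducibility: the same seed yields the same draws.
set.seed(42)
first <- runif(5)
set.seed(42)
second <- runif(5)
identical(first, second)

# Inverse-transform sampling: if U ~ Uniform(0, 1), then
# -log(1 - U) / rate follows an Exponential(rate) distribution.
rate <- 2
u <- runif(10000)
x <- -log(1 - u) / rate

# The sample mean should be close to the theoretical mean 1 / rate.
sample_mean <- mean(x)
```

In practice one would call `rexp(10000, rate)` directly; the manual version only shows how a uniform generator can serve as the source.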
### Mersenne-Twister RNG (R’s default RNG)

- Pseudorandom number generator developed in 1997 by Matsumoto and Nishimura
- Period: 2^19937 − 1
- Seed: a 624-dimensional set of 32-bit integers plus a current position in that set
- The Mersenne Twister is designed with Monte Carlo simulations and other statistical simulations in mind
- For non-parallel simulations, this is probably the best RNG
- http://en.wikipedia.org/wiki/Mersenne_twister
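
The seed description above can be inspected from within R. This is a small sketch: it selects the generator explicitly rather than assuming the session default, and the seed value is arbitrary.

```r
# Select the Mersenne Twister explicitly, then seed it.
RNGkind("Mersenne-Twister")
set.seed(1)

# First element of RNGkind() names the uniform generator in use.
kind <- RNGkind()[1]

# .Random.seed lives in the global environment. For the Mersenne Twister
# it holds 626 integers: 1 code for the RNG kind, 1 current position,
# and the 624 32-bit state words.
state <- get(".Random.seed", envir = globalenv())
state_len <- length(state)
```

`.Random.seed` is meant for inspection or for saving and restoring state; it should not be edited by hand.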
