|
1 | 1 | # Statistical Programming |
2 | 2 |
|
3 | | -## Setup: Installing R and Tools |
| 3 | +## Setup: Installing `R` and Tools |
4 | 4 |
|
5 | 5 | ```r |
6 | 6 | install.packages("ggplot2") |
@@ -68,14 +68,14 @@ A complete expression is any typed expression that falls into one of the followi |
68 | 68 | - Control flow statements |
69 | 69 | - Grouping statements |
70 | 70 |
|
71 | | -## R Sessions |
| 71 | +## `R` Sessions |
72 | 72 | `setwd` and `getwd` |
73 | 73 |
|
74 | 74 | Use `?Syntax` to find operator syntax |
75 | 75 |
|
76 | 76 | If unsure of operator precedence, use parentheses. |
77 | 77 |
|
78 | | -## R Objects |
| 78 | +## `R` Objects |
79 | 79 |
|
80 | 80 | All objects created at the command line (`RStudio` calls it the console) are saved |
81 | 81 | in `.GlobalEnv`. |
@@ -110,3 +110,112 @@ attributes(cars) |
110 | 110 | names(cars) |
111 | 111 | dim(cars) |
112 | 112 | ``` |
| 113 | + |
| 114 | +## `R` Data Objects |
| 115 | + |
| 116 | +- Vectors |
| 117 | + - **Simplest** data object starting at index `1` |
| 118 | + - **Ordered** set of values of a single type (numeric, character, or logical) |
| 119 | + - `scan`, `c`, `rep`, `:`, `seq` |
| 120 | + - `length`, `mode`, `class`, `names` |
| 121 | + - conversion operations: `as.integer`, `as.double` |
| 122 | + - verification operations: `is.integer`, `is.double`, `is.character` |
| 123 | + - statistics: `max`, `min`, `mean`, `var`, `sd` |
| 124 | + - mathematics: `sum`, `rank`, `order`, `round`, `floor`, `ceiling`, `abs`, `sqrt`, `exp`, `sin`, `sign`, `log`, `prod` |
| 125 | + |
| 126 | +- Matrices |
| 127 | + - **Two-dimensional** data object |
| 128 | + - `matrix`, `rbind`, `cbind` |
| 129 | + - `length`, `dim`, `dimnames`, `nrow`, `ncol` |
| 130 | + - indexing: `A[i, j]`, `x[1, 2:3]`, `x[1:2, 3]`, `x[1,]`, `x[, 2]` |
| 131 | + - arithmetic: `+`, `-`, `*`, `/` |
| 132 | + - matrix algebra: `%*%`, `t`, `solve` |
| 133 | + |
| 134 | +- `data.frame` |
| 135 | + - **Tabular** data object (all columns have the same length) |
| 136 | + - `data.frame`, `read.table`, `read.csv`, `as.data.frame`, `cbind`, `rbind`, `merge` |
| 137 | + - `nrow`, `ncol`, `names`, `str` |
| 138 | + - `is.data.frame`, `is.matrix` |
| 139 | + - `length`, `mode`, `class`, `names`, `attributes`, `row.names` |
| 140 | + - indexing: `df[i, j]`, `df$colname`, `df[["colname"]]` |
| 141 | + |
| 142 | +- Arrays |
| 143 | + - **Multi-dimensional** homogeneous data object (all elements have the same type) |
| 144 | + - `list` |
| 145 | + - `array`, `as.array` |
| 146 | + - `dim`, `dimnames` |
| 147 | + - indexing: `A[i, j, k]` |
| 148 | + |
| 149 | +- Lists |
| 150 | + - **Heterogeneous** data object (different types of objects) |
| 151 | + - `list`, `as.list` |
| 152 | + - `length`, `mode`, `class`, `attributes` |
| 153 | + - `names`, `str` |
| 154 | + - indexing: `L[[i]]`, `L$name` |
| 155 | + |
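The five data objects listed above can be compared side by side in a short sketch. The object names (`v`, `m`, `df`, `a`, `lst`) and the values are illustrative assumptions, not part of the original notes.

```r
# Vector: ordered values of a single type, indexed from 1
v <- seq(2, 10, by = 2)
v[1]                               # first element: 2

# Matrix: two-dimensional, filled column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)                             # 2 3
m[1, 2:3]                          # row 1, columns 2 and 3: 3 5

# Data frame: columns of equal length, possibly of different types
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
df$label                           # column access by name
df[["id"]]                         # same idea with double brackets

# Array: like a matrix, but with any number of dimensions
a <- array(1:24, dim = c(2, 3, 4))
a[2, 3, 4]                         # last element: 24

# List: components may differ in type and length
lst <- list(numbers = v, table = df)
lst[["numbers"]]                   # component by name
lst$table                          # same component access with $
```

Calling `str` on any of these objects prints a compact summary of its structure, which is a quick way to check which kind of object you are holding.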
| 156 | +## Recycling Rule |
| 157 | + |
| 158 | +When an operation involves vectors of different lengths, `R` recycles the shorter vector until it matches the length of the longer one. If the longer length is not a multiple of the shorter length, `R` still returns a result but issues a warning. |
| 159 | + |
| 160 | +```r |
| 161 | +x <- c(1, 2, 3) |
| 162 | +y <- c(10, 20) |
| 163 | +z <- x + y # z is c(11, 22, 13), with a warning because 3 is not a multiple of 2 |
| 164 | +``` |
| 165 | + |
| 166 | +## Matrix Algebra |
| 167 | + |
| 168 | +- `+`, `-`, `*`, `/`: element-wise operations |
| 169 | +- `^`: element-wise exponentiation |
| 170 | +- modulo and integer division: `%%`, `%/%` |
| 171 | +- `%*%`: matrix multiplication, `crossprod` |
| 172 | +- outer product: `outer` |
| 173 | +- `t`: transpose |
| 174 | +- `solve`: inverse of a matrix, or solution of linear equations |
| 175 | + |
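A minimal sketch of the operators listed above, assuming a small invertible matrix `A` and a right-hand side `b` chosen purely for illustration:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # 2 x 2 matrix, filled by column
b <- c(1, 2)

A * A              # element-wise product
A ^ 2              # element-wise exponentiation (same values as A * A)
A %*% A            # matrix product
t(A)               # transpose
crossprod(A)       # equivalent to t(A) %*% A
outer(1:2, 1:3)    # 2 x 3 matrix of all pairwise products

A_inv <- solve(A)  # inverse of A
x <- solve(A, b)   # solution of A %*% x = b

5 %% 2             # remainder: 1
5 %/% 2            # integer quotient: 2
```

When only the solution of a linear system is needed, `solve(A, b)` is generally preferable to computing `solve(A) %*% b`, since it avoids forming the inverse explicitly.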
| 176 | +### Indexing and Subsetting |
| 177 | + |
| 178 | +- `data[["name"]] <- NULL` to remove a column from a `data.frame` |
| 179 | +- `data$colname` to access a column in a `data.frame` |
| 180 | + |
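The two operations above can be tried on a small data frame. The data frame `data` and its column names are hypothetical, made up for this sketch.

```r
# Hypothetical data frame for illustration
data <- data.frame(id = 1:3,
                   score = c(90, 85, 70),
                   note = c("x", "y", "z"))

data$score                 # access a column by name: 90 85 70
data[data$score > 80, ]    # keep only rows where score exceeds 80

data[["note"]] <- NULL     # remove the `note` column
names(data)                # "id" "score"
```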
| 181 | +### Logical indexing |
| 182 | + |
| 183 | +```r |
| 184 | +x <- c(1, 2, 3, 4, 5) |
| 185 | +x[x > 3] # returns c(4, 5) |
| 186 | +x[c(TRUE, FALSE, TRUE)] # index is recycled to length 5; returns c(1, 3, 4) |
| 187 | +``` |
| 188 | + |
| 189 | +### Missing values |
| 190 | + |
| 191 | +- `NA` represents missing values |
| 192 | +- `is.na(x)` to check for missing values |
| 193 | +- `x[!is.na(x)]` to remove missing values |
| 194 | +- `na.omit(x)` to remove missing values from a vector or data frame |
| 195 | +- `x[is.na(x)] <- 0` to replace missing values with a specific value (e.g., `0`) |
| 196 | + |
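The operations above can be sketched on a small vector with two missing entries. The values are illustrative, and the `na.rm` argument of `mean` is shown as an additional option that the list above does not mention.

```r
x <- c(4, NA, 7, NA, 1)

is.na(x)                 # FALSE TRUE FALSE TRUE FALSE
x[!is.na(x)]             # 4 7 1

mean(x)                  # NA, because a missing value propagates
mean(x, na.rm = TRUE)    # 4, computed from the non-missing values only

y <- x                   # work on a copy so x keeps its NAs
y[is.na(y)] <- 0         # y is now 4 0 7 0 1
```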
| 197 | +### Find Index(es) of a Specific Value within a Vector |
| 198 | + |
| 199 | +```r |
| 200 | +x <- c(1, 2, 3, 4, 5) |
| 201 | +which(x == 3) # returns 3 |
| 202 | +match(3, x) # returns 3 |
| 203 | +which(x %in% c(2, 4)) # returns c(2, 4); any vector of lookup values works here |
| 204 | +which(x > 3) # returns c(4, 5) |
| 205 | +``` |
| 206 | + |
| 207 | +## Reading and Writing Datasets |
| 208 | + |
| 209 | +- `read.table`, `read.csv` to read data from files |
| 210 | +- `write.table`, `write.csv` to write data to files |
| 211 | +- `dump` and `source` to write `R` objects as text and read them back |
| 212 | +- `save` and `load` to save and load `R` objects in binary format |
| 213 | +- `sink("output.txt")` to redirect output to a file, and `sink()` to stop redirecting |
| 214 | + |
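A round trip through each pair of functions above might look like the following sketch. It assumes the code runs at the top level of a session, writes only to temporary files, and uses a hypothetical data frame `scores`.

```r
# Hypothetical data frame for illustration
scores <- data.frame(id = 1:3, label = c("a", "b", "c"),
                     stringsAsFactors = FALSE)

# Text round trip with write.csv / read.csv
csv_file <- tempfile(fileext = ".csv")
write.csv(scores, csv_file, row.names = FALSE)
scores_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

# Binary round trip with save / load
rda_file <- tempfile(fileext = ".RData")
save(scores, file = rda_file)
rm(scores)
load(rda_file)                    # restores `scores` under its original name

# R source round trip with dump / source
r_file <- tempfile(fileext = ".R")
dump("scores", file = r_file)     # writes R code that recreates `scores`
source(r_file)

# Redirect printed output to a file, then restore the console
out_file <- tempfile(fileext = ".txt")
sink(out_file)
print(scores)
sink()
captured <- readLines(out_file)   # the printed table, one line per element
```

Note that `load` restores objects under the names they were saved with, so it can silently overwrite an existing object of the same name.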
| 215 | +## `R` Functions |
| 216 | + |
| 217 | +Vectorized logical operations and logical indexes are extremely useful for comparing, finding, and replacing data elements. |
| 218 | + |
| 219 | +- Domain: argument list |
| 220 | +- Function body: expressions that define the function |
| 221 | +- Range: return values |
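
The three parts above can be seen in a small function that relies on a vectorized logical comparison. The function name `count_above` and its `threshold` argument are hypothetical, chosen only for this sketch.

```r
# Hypothetical helper: count the elements of x that exceed a threshold
count_above <- function(x, threshold = 0) {   # argument list (domain)
  above <- x > threshold                      # body: vectorized comparison
  sum(above, na.rm = TRUE)                    # last value is returned (range)
}

count_above(c(1, 5, NA, 8), threshold = 3)    # 2
count_above(c(-1, 2, 3))                      # default threshold of 0: 2
```

Because `x > threshold` is vectorized, the function needs no explicit loop, and summing the logical vector counts the `TRUE` values.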