Commit 4d46366

XuhuaHuang and Copilot committed
Study R datasets and functions
Co-authored-by: Copilot <copilot@github.com>
1 parent 1b504a2 commit 4d46366

1 file changed

Lines changed: 112 additions & 3 deletions

R/README.md

@@ -1,6 +1,6 @@
11
# Statistical Programming
22

3-
## Setup: Installing R and Tools
3+
## Setup: Installing `R` and Tools
44

55
```r
66
install.packages("ggplot2")
@@ -68,14 +68,14 @@ A complete expression is any typed expression that falls into one of the followi
6868
- Control flow statements
6969
- Grouping statements
7070

71-
## R Sessions
71+
## `R` Sessions
7272
`setwd` and `getwd`
7373

7474
Use `?Syntax` to find operator syntax
7575

7676
If unsure of operator precedence, use parentheses.
7777

78-
## R Objects
78+
## `R` Objects
7979

8080
All objects created at the command line (`RStudio` calls it the console) are saved
8181
in `.GlobalEnv`.
@@ -110,3 +110,112 @@ attributes(cars)
110110
names(cars)
111111
dim(cars)
112112
```
113+
114+
## `R` Data Objects
115+
116+
- Vectors
117+
- **Simplest** data object; indexing starts at `1`
118+
- **Ordered** set of values of a single type (numeric, character, or logical)
119+
- `scan`, `c`, `rep`, `:`, `seq`
120+
- `length`, `mode`, `class`, `names`
121+
- conversion operations: `as.integer`, `as.double`
122+
- verification operations: `is.integer`, `is.double`, `is.character`
123+
- statistics: `max`, `min`, `mean`, `var`, `sd`
124+
- mathematics: `sum`, `rank`, `order`, `round`, `floor`, `ceiling`, `abs`, `sqrt`, `exp`, `sin`, `sign`, `log`, `prod`
125+
126+
- Matrices
127+
- **Two-dimensional** data object
128+
- `matrix`, `rbind`, `cbind`
129+
- `length`, `dim`, `dimnames`, `nrow`, `ncol`
130+
- indexing: `A[i, j]`, `x[1, 2:3]`, `x[1:2, 3]`, `x[1,]`, `x[, 2]`
131+
- arithmetic: `+`, `-`, `*`, `/`
132+
- matrix algebra: `%*%`, `t`, `solve`
133+
134+
- `data.frame`
135+
- **Tabular** data object (all columns have the same length)
136+
- `data.frame`, `read.table`, `read.csv`, `as.data.frame`, `cbind`, `rbind`, `merge`
137+
- `nrow`, `ncol`, `names`, `str`
138+
- `is.data.frame`, `is.matrix`
139+
- `length`, `mode`, `class`, `names`, `attributes`, `row.names`
140+
- indexing: `df[i, j]`, `df$colname`, `df[["colname"]]`
141+
142+
- Arrays
143+
- **Multi-dimensional** homogeneous data object (all elements have the same type)
144+
- `list`
145+
- `array`, `as.array`
146+
- `dim`, `dimnames`
147+
- indexing: `A[i, j, k]`
148+
149+
- Lists
150+
- **Heterogeneous** data object (different types of objects)
151+
- `list`, `as.list`
152+
- `length`, `mode`, `class`, `attributes`
153+
- `names`, `str`
154+
- indexing: `L[[i]]`, `L$name`
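
The five data objects listed above can be compared side by side. The following is a minimal sketch, not part of the original notes; the object names (`v`, `m`, `df`, `a`, `l`) and the sample values are made up for illustration.

```r
# Vector: ordered values of one type, indexed from 1
v <- seq(2, 10, by = 2)          # 2 4 6 8 10
v[1]                             # first element: 2

# Matrix: two-dimensional, filled column by column by default
m <- matrix(1:6, nrow = 2)       # 2 rows, 3 columns
m[1, 2:3]                        # row 1, columns 2 and 3: 3 5

# data.frame: columns of equal length, possibly of different types
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
df$name                          # same column as df[["name"]]

# Array: like a matrix, but with any number of dimensions
a <- array(1:24, dim = c(2, 3, 4))
a[2, 3, 4]                       # last element: 24

# List: elements may differ in type and length
l <- list(num = 1:3, txt = "hello")
l[[1]]                           # same element as l$num
```

Note the difference between `l[[1]]`, which extracts the element itself, and `l[1]`, which returns a list of length one.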
155+
156+
## Recycling Rule
157+
158+
When performing operations on vectors of different lengths, `R` recycles the shorter vector until it matches the length of the longer vector. If the length of the longer vector is not a multiple of the length of the shorter vector, `R` still returns a result but issues a warning.
159+
160+
```r
x <- c(1, 2, 3)
y <- c(10, 20)
z <- x + y # z will be c(11, 22, 13), with a warning because 3 is not a multiple of 2
```
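
For contrast with the example above, here is a short sketch of two cases where recycling happens without a warning. The variable names `even_case` and `scalar_case` are illustrative only.

```r
# Lengths 4 and 2: 4 is a multiple of 2, so recycling is silent
even_case <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# A single number is a vector of length 1, recycled across every element
scalar_case <- c(1, 2, 3) + 10           # 11 12 13
```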
165+
166+
## Matrix Algebra
167+
168+
- `+`, `-`, `*`, `/`: element-wise operations
169+
- `^`: element-wise exponentiation
170+
- modulo and integer division: `%%`, `%/%`
171+
- `%*%`: matrix multiplication; `crossprod(A, B)` computes `t(A) %*% B`
172+
- outer product: `outer`
173+
- `t`: transpose
174+
- `solve`: inverse of a matrix, or solution of linear equations
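
The operators in this list are easy to confuse, so a small worked example may help. This is a sketch on an arbitrary 2 x 2 matrix; `A`, `b`, and the result names are assumptions made for illustration.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # filled by column: rows are (2, 1) and (1, 3)
b <- c(1, 2)

elementwise <- A * A        # element-wise: rows (4, 1) and (1, 9)
product     <- A %*% A      # matrix product: rows (5, 5) and (5, 10)
gram        <- crossprod(A) # same values as t(A) %*% A

A_inv <- solve(A)           # inverse of A
x     <- solve(A, b)        # solves A %*% x = b, giving 0.2 and 0.6

7 %% 3                      # remainder: 1
7 %/% 3                     # integer quotient: 2
outer(1:2, 1:3)             # 2 x 3 table of the products i * j
```

When only the solution of a linear system is needed, `solve(A, b)` avoids forming the inverse explicitly.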
175+
176+
### Indexing and Subsetting
177+
178+
- `data[["name"]] <- NULL` to remove a column from a `data.frame`
179+
- `data$colname` to access a column in a `data.frame`
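
As an illustration of the two operations above, the sketch below builds a small `data.frame`, reads one column, and drops another. The column names `id`, `score`, and `tmp` are hypothetical.

```r
df <- data.frame(id = 1:3, score = c(90, 85, 70), tmp = NA)

scores <- df$score            # access a column as a plain vector
df[["tmp"]] <- NULL           # remove the tmp column
names(df)                     # "id" "score"

high <- df[df$score > 80, ]   # keep the rows whose score exceeds 80
```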
180+
181+
### Logical indexing
182+
183+
```r
x <- c(1, 2, 3, 4, 5)
x[x > 3] # returns c(4, 5)
x[c(TRUE, FALSE, TRUE)] # logical index is recycled; returns c(1, 3, 4)
```
188+
189+
### Missing values
190+
191+
- `NA` represents missing values
192+
- `is.na(x)` to check for missing values
193+
- `x[!is.na(x)]` to remove missing values
194+
- `na.omit(x)` to remove missing values from a vector or data frame
195+
- `x[is.na(x)] <- 0` to replace missing values with a specific value (e.g., `0`)
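
The idioms above can be tried on a small vector. This sketch also uses the `na.rm` argument of `mean`, which the list does not mention; the names `kept` and `filled` are illustrative.

```r
x <- c(4, NA, 7, NA)

is.na(x)                     # FALSE TRUE FALSE TRUE
x == NA                      # all NA: comparing with NA never yields TRUE
mean(x)                      # NA, because missing values propagate
mean(x, na.rm = TRUE)        # 5.5, computed from 4 and 7 only

kept   <- x[!is.na(x)]       # 4 7
filled <- x
filled[is.na(filled)] <- 0   # 4 0 7 0
```

This is why `is.na(x)` is used for the test rather than `x == NA`.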
196+
197+
### Find Index(es) of a Specific Value within a Vector
198+
199+
```r
x <- c(1, 2, 3, 4, 5)
which(x == 3) # returns 3
match(3, x) # returns 3
vals <- c(2, 4) # example lookup values
which(x %in% vals) # returns c(2, 4)
which(x > 3) # returns c(4, 5)
```
206+
207+
## Reading and Writing Datasets
208+
209+
- `read.table`, `read.csv` to read data from files
210+
- `write.table`, `write.csv` to write data to files
211+
- `source` and `dump` to read and write `R` objects as plain-text `R` code
212+
- `save` and `load` to save and load `R` objects in binary format
213+
- `sink("output.txt")` and `sink()` to redirect output to a file and stop redirecting output
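
A round trip through a file shows how the reading and writing functions pair up. The sketch below writes to temporary files so that it leaves nothing behind in the working directory; the data and the names `csv_path`, `rda_path`, and `df2` are made up for illustration.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Text format: write a CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)  # skip the row-name column
df2 <- read.csv(csv_path)

# Binary format: save the object, remove it, then restore it by name
rda_path <- tempfile(fileext = ".RData")
save(df, file = rda_path)
rm(df)
load(rda_path)                              # recreates df in the current environment
```

`load` restores objects under their original names, whereas `read.csv` returns a value that has to be assigned.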
214+
215+
## `R` Functions
216+
217+
Vectorized logical operations and logical indexes are extremely useful for comparing, finding, and replacing data elements.
218+
219+
- Domain: argument list
220+
- Function body: expressions that define the function
221+
- Range: return values
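
The three parts named above map directly onto a function definition. The following is a minimal sketch; `replace_na` is a hypothetical helper written for this example, not a function from base `R`.

```r
# Domain: the argument list (x, plus value with a default of 0)
replace_na <- function(x, value = 0) {
  # Body: vectorized logical indexing replaces every NA in one step
  x[is.na(x)] <- value
  # Range: the value of the last expression is returned
  x
}

replace_na(c(1, NA, 3))              # 1 0 3
replace_na(c(NA, 2), value = -1)     # -1 2
```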
