Commit 2ae9716

XuhuaHuang and Copilot committed
Some statistics function impl in R
Co-authored-by: Copilot <copilot@github.com>
1 parent 296e936 commit 2ae9716

1 file changed

Lines changed: 131 additions & 0 deletions

File tree

R/README.md

@@ -255,3 +255,134 @@ If the end of the function body is not an explicit `return` statement, the value
- `warning("warning message")` to throw a warning and continue execution
- `try(expr)` to execute an expression and catch any errors without stopping execution
- `tryCatch(expr, error = function(e) { ... })` to execute an expression and handle errors with a custom function

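The three condition-handling calls listed above can be compared side by side. This is a minimal sketch; `abs_with_warning` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: warns on negative input but still returns a value.
abs_with_warning <- function(x) {
  if (x < 0) warning("negative input")
  abs(x)
}

# Execution continues past the warning, so a value is still returned.
value <- suppressWarnings(abs_with_warning(-2))

# try() returns an object of class "try-error" instead of stopping.
res_try <- try(log("a"), silent = TRUE)
failed <- inherits(res_try, "try-error")

# tryCatch() hands the error to a custom handler; here it falls back to NA.
res_tc <- tryCatch(log("a"), error = function(e) NA_real_)
```

Returning `NA_real_` from the handler is only one possible fallback; rethrowing with a clearer message is equally reasonable.
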
## Wichmann-Hill Pseudo-Random Number Generator

```r
# One step of the Wichmann-Hill generator.
# `seed` is a numeric vector of three positive integers; the result is a
# single value rescaled from [0, 1) to [start, end).
# Note: the updated seed is not returned, so the caller must advance it.
wh <- function(seed, start = 0, end = 1) {
  seed[1] <- (171 * seed[1]) %% 30269
  seed[2] <- (172 * seed[2]) %% 30307
  seed[3] <- (170 * seed[3]) %% 30323
  x <- (seed[1] / 30269 + seed[2] / 30307 + seed[3] / 30323) %% 1
  start + (end - start) * x
}

# Vectorized version of the same step.
whv <- function(seed, start = 0, end = 1) {
  y <- c(171, 172, 170)        # multipliers
  z <- c(30269, 30307, 30323)  # moduli
  seed <- (y * seed) %% z
  x <- sum(seed / z) %% 1
  start + (end - start) * x
}
```

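Because `wh` and `whv` return only the generated value, repeated calls with the same seed give the same number. One way to draw a sequence is to return the updated seed alongside the value and thread it through a loop. The sketch below assumes that design; `wh_next` and `wh_sample` are hypothetical names, not part of the original code.

```r
# Hypothetical variant that returns both the value and the advanced seed.
wh_next <- function(seed) {
  mult <- c(171, 172, 170)
  mod  <- c(30269, 30307, 30323)
  seed <- (mult * seed) %% mod
  list(value = sum(seed / mod) %% 1, seed = seed)
}

# Draw n values by feeding each updated seed into the next step.
wh_sample <- function(n, seed) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    step <- wh_next(seed)
    out[i] <- step$value
    seed <- step$seed
  }
  out
}

u <- wh_sample(5, seed = c(1, 2, 3))
```

The loop is kept here because each step depends on the previous seed, so the recurrence cannot be vectorized directly.
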
## Programming Style

- Use meaningful variable and function names
- Modularize code into functions
- Document code with comments and function documentation
- Use consistent indentation and spacing
- Use existing functions and libraries when possible
- Use parentheses to make grouping and operator precedence clear
- Avoid unnecessary loops and use vectorized operations when possible
- Avoid using recursive functions when iterative solutions are more efficient

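As a small illustration of the vectorization guideline above, the following sketch computes a sum of squares with an explicit loop and with a vectorized expression. The data are made up for the example.

```r
x <- c(1.5, 2.0, 3.5, 4.0)  # illustrative data

# Loop version: accumulates one element at a time.
sum_sq_loop <- 0
for (value in x) {
  sum_sq_loop <- sum_sq_loop + value^2
}

# Vectorized version: squares the whole vector, then sums it.
sum_sq_vec <- sum(x^2)
```

Both versions give the same result; the vectorized form is shorter and is usually faster on long vectors.
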
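## IQR

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)  # illustrative data
IQR(x, na.rm = FALSE)               # built-in
# Manual computation from the quartiles
q <- quantile(x, probs = c(0.25, 0.75))
iqr <- unname(q[2] - q[1])  # drop the "75%" name inherited from q
iqr
```
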
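The `na.rm` argument matters as soon as the data contain missing values: with the default `na.rm = FALSE`, `IQR()` stops with an error. A minimal sketch with made-up data:

```r
x_na <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)  # illustrative data with one NA

# With the default na.rm = FALSE the call fails; catch the error to show it.
res <- tryCatch(IQR(x_na), error = function(e) "error")

# Dropping the missing value first gives the IQR of 1..9.
iqr_clean <- IQR(x_na, na.rm = TRUE)
```
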
## Monte Carlo Demonstration of `CLT` in `R`

```r
set.seed(42)

# Population: Exponential distribution (skewed)
n <- 30           # sample size
num_sim <- 10000  # number of Monte Carlo repetitions

# True parameters
mu <- 1     # mean of Exp(1)
sigma <- 1  # std dev of Exp(1)

z_values <- replicate(num_sim, {
  x <- rexp(n, rate = 1)  # draw sample
  x_bar <- mean(x)        # sample mean
  sqrt(n) * (x_bar - mu)  # scaled deviation, approximately N(0, sigma^2)
})

hist(z_values, probability = TRUE, breaks = 50,
     main = "CLT Demonstration (Monte Carlo)",
     xlab = expression(sqrt(n) * (bar(X) - mu)))

# Overlay theoretical normal curve
curve(dnorm(x, mean = 0, sd = sigma),
      col = "red", lwd = 2, add = TRUE)
```

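The histogram can be complemented with a numeric check: if the normal approximation holds, the simulated values of `sqrt(n) * (x_bar - mu)` should have mean close to 0 and standard deviation close to `sigma = 1`. The sketch below repeats the simulation without plotting, under the same Exp(1) assumptions.

```r
set.seed(42)

n <- 30
num_sim <- 10000

# Same scaled deviation as in the demonstration, with mu = 1 for Exp(1).
z <- replicate(num_sim, sqrt(n) * (mean(rexp(n, rate = 1)) - 1))

# Summary statistics to compare against the N(0, 1) limit.
summary_stats <- c(mean = mean(z), sd = sd(z))
summary_stats
```

With `n = 30` the distribution is still slightly right-skewed, so the agreement is approximate rather than exact.
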
## Alpha-trimmed Mean

```r
alpha_trimmed_mean_safe <- function(x, alpha) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }

  # sort() silently drops NAs, which would make the trimming asymmetric
  if (anyNA(x)) {
    stop("x must not contain missing values")
  }

  if (alpha < 0 || alpha >= 0.5) {
    stop("alpha must be in [0, 0.5)")
  }

  n <- length(x)
  k <- floor(alpha * n)  # number of observations trimmed from each tail

  x_sorted <- sort(x)

  # Only reachable for empty input, since alpha < 0.5 implies 2 * k < n
  if (2 * k >= n) {
    stop("Trimming removes all data")
  }

  mean(x_sorted[(k + 1):(n - k)])
}
```

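For a worked example, the same trimming rule (drop `floor(alpha * n)` values from each tail) can be written out by hand and compared with base R's `mean(x, trim = alpha)`. The data below are made up, with one large outlier.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 100)  # illustrative data with an outlier
alpha <- 0.25

# Manual trimming: k = 2 values are dropped from each tail.
k <- floor(alpha * length(x))
manual <- mean(sort(x)[(k + 1):(length(x) - k)])

# Base R applies the same rule through the `trim` argument.
builtin <- mean(x, trim = alpha)

# The untrimmed mean is pulled up by the outlier.
plain <- mean(x)
```

Here the trimmed mean is computed from the values 3, 4, 5 and 6, so the outlier no longer affects the result.
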
## Sample Correlation

```r
# Two-pass formula: center the data first, then accumulate the products.
sample_correlation <- function(x, y) {
  if (length(x) != length(y)) {
    stop("Vectors must have the same length")
  }

  x_bar <- mean(x)
  y_bar <- mean(y)

  numerator <- sum((x - x_bar) * (y - y_bar))
  denominator <- sqrt(sum((x - x_bar)^2)) * sqrt(sum((y - y_bar)^2))

  numerator / denominator
}

# Raw-moment (shortcut) formula. Despite the name, subtracting two large,
# nearly equal quantities makes this version prone to cancellation error
# when the means are large relative to the spread of the data.
sample_correlation_stable <- function(x, y) {
  if (length(x) != length(y)) {
    stop("Vectors must have the same length")
  }

  n <- length(x)

  cov_xy <- sum(x * y) / n - mean(x) * mean(y)
  sd_x <- sqrt(sum(x^2) / n - mean(x)^2)
  sd_y <- sqrt(sum(y^2) / n - mean(y)^2)

  cov_xy / (sd_x * sd_y)
}
```

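The difference between the two formulas shows up when the data sit far from zero. The sketch below uses made-up data shifted by `1e9` and compares the centered formula, the raw-moment formula, and base R's `cor()`. The raw-moment result depends on floating-point rounding, so no particular value is claimed for it; it may be far from the others or not a number at all.

```r
offset <- 1e9  # large shift, chosen to provoke cancellation
x <- offset + c(1, 2, 3, 4, 5)
y <- offset + c(2, 1, 4, 3, 5)
n <- length(x)

# Centered (two-pass) formula.
dx <- x - mean(x)
dy <- y - mean(y)
r_centered <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Raw-moment formula; warnings from sqrt() of a negative value are suppressed.
cov_xy <- sum(x * y) / n - mean(x) * mean(y)
var_x <- sum(x^2) / n - mean(x)^2
var_y <- sum(y^2) / n - mean(y)^2
r_raw <- suppressWarnings(cov_xy / sqrt(var_x * var_y))

# Reference value from base R.
r_builtin <- cor(x, y)

c(centered = r_centered, raw = r_raw, builtin = r_builtin)
```
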
## Debugging and Maintenance

- Error messages generated by `R` functions can be helpful for debugging.
- Use the `browser()` function to set breakpoints.
- Use `print()` or `cat()` to display intermediate values for debugging.
- Do not wrap the return value of a function in `print()`; use `return(return_object)` instead.
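
The debugging tips above can be combined in one small function. This is a sketch only: `scale_to_unit` and its `debug` flag are hypothetical, and the `browser()` call is guarded with `interactive()` so the code also runs in batch mode.

```r
# Hypothetical helper that rescales a numeric vector to [0, 1].
scale_to_unit <- function(x, debug = FALSE) {
  rng <- range(x)

  # cat() shows an intermediate value without affecting the result.
  if (debug) cat("range:", rng, "\n")

  # Pause here for inspection, but only in an interactive session.
  if (debug && interactive()) browser()

  scaled <- (x - rng[1]) / (rng[2] - rng[1])
  return(scaled)  # return the object itself, not print(scaled)
}

result <- scale_to_unit(c(2, 4, 6))
```

Returning the object rather than printing it keeps the function usable inside larger expressions.
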
