Commit 1ca33ea

Spelling and grammatical corrections
1 parent 41c767d commit 1ca33ea

11 files changed

Lines changed: 44 additions & 44 deletions

notebooks/00_getting_started.ipynb

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -316,7 +316,7 @@
316316
},
317317
{
318318
"cell_type": "code",
319-
"execution_count": null,
319+
"execution_count": 11,
320320
"id": "d51dfbec",
321321
"metadata": {
322322
"execution": {
@@ -478,7 +478,7 @@
478478
"Verify the path in `dll_path` is correct and points to a dll that exists. If no dll exists, be sure that you have built the Numerics solution.\n",
479479
"\n",
480480
"### 3. Loading DLL More Than Once\n",
481-
"When running multiple files at once, if they all load Numerics with ```clr.AddReference(str(dll_path))``` it may fail with an error that Numerics has already been loaded. If this happens, add the statement below to check the assemblies and only load Numerics if it is not already loaded. The code below doesn't output anything, but you'll know it worked if it runs without the failure error of loading multiple DLLS. This error shouldn't happened within these notebooks because they all have seperate kernels. It would happen if you were trying to run multiple Python files all at once where each one loads the same DLL. In each of these files you would replace how you were orginally loading the DLL with the chunk below.\n",
481+
"When running multiple files at once, if they all load Numerics with ```clr.AddReference(str(dll_path))``` it may fail with an error that Numerics has already been loaded. If this happens, add the statement below to check the assemblies and only load Numerics if it is not already loaded. The code below doesn't output anything, but you'll know it worked if it runs without the error caused by loading the DLL more than once. This error shouldn't happen within these notebooks because they all have separate kernels. It would happen if you were trying to run multiple Python files all at once where each one loads the same DLL. In each of these files you would replace how you were originally loading the DLL with the chunk below.\n",
482482
"\n",
483483
"```python\n",
484484
"# Needed to access the assemblies\n",

notebooks/01_distributions.ipynb

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -287,7 +287,7 @@
287287
"metadata": {},
288288
"source": [
289289
"## Continuous Distributions\n",
290-
"We will now explore some popular continous distributions in Numerics. We go through their mathematical defintion, before defining them In Numerics and taking a look at the PDF and CDF.\n",
290+
"We will now explore some popular continuous distributions in Numerics. We go through their mathematical definitions before defining them in Numerics and taking a look at the PDF and CDF.\n",
291291
"\n",
292292
"**Parameterization note:** Numerics uses its own parameter conventions (e.g., GEV uses $\\xi$, $\\alpha$, $\\kappa$). If you're comparing to textbooks or SciPy, double-check parameter definitions.\n",
293293
"\n",
@@ -518,12 +518,12 @@
518518
"\n",
519519
"- $\\kappa$ < 0: Weibull (bounded above)\n",
520520
"- $\\kappa$ = 0: Gumbel (unbounded)\n",
521-
"- $\\kappa$ > 0: Fr\\'echet (heavy-tailed)"
521+
"- $\\kappa$ > 0: Fréchet (heavy-tailed)"
522522
]
523523
},
524524
{
525525
"cell_type": "code",
526-
"execution_count": 5,
526+
"execution_count": null,
527527
"id": "8bbcdc55",
528528
"metadata": {
529529
"execution": {

notebooks/02_distribution_fitting.ipynb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@
2929
},
3030
{
3131
"cell_type": "code",
32-
"execution_count": 4,
32+
"execution_count": null,
3333
"id": "969dd8b5",
3434
"metadata": {
3535
"execution": {

notebooks/04_mcmc_bayesian_inference.ipynb

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -996,7 +996,7 @@
996996
"metadata": {},
997997
"source": [
998998
"## Linear Regression with Uncertainty\n",
999-
"We can estimate regression coefficients with MCMC and quantify uncertainty with credible intervals [[2]](#2). We give each coefficient of the linear regression equation a prior distrubtion and estimate a posterior distribution. Meaning in $y = a + bx + \\epsilon$, where $\\epsilon \\sim N(0,\\sigma^2)$, $a$, $b$, and $\\sigma$ all get their own prior distrubution. Then our sampler produces a posterior distribution for each of these parameters. Our final estimate for these parameters is the mean of the posterior that we compare to our observed data in our final fitted line.\n",
999+
"We can estimate regression coefficients with MCMC and quantify uncertainty with credible intervals [[2]](#2). We give each coefficient of the linear regression equation a prior distribution and estimate a posterior distribution. That is, in $y = a + bx + \\epsilon$, where $\\epsilon \\sim N(0,\\sigma^2)$, the parameters $a$, $b$, and $\\sigma$ each get their own prior distribution. Our sampler then produces a posterior distribution for each of these parameters. Our final estimate for each parameter is its posterior mean, which we use to draw the final fitted line against the observed data.\n",
10001000
"\n",
10011001
"We go into more detail on this topic in notebook 10!\n"
10021002
]
@@ -1181,7 +1181,7 @@
11811181
"metadata": {},
11821182
"source": [
11831183
"## Non-Adaptive vs Adaptive Samplers\n",
1184-
"Samplers are usually seperated by adaptive vs non-adaptive. The key idea behind adaptive MCMC is to learn the proposal covariance from the chain's own history, eliminating the need for manual tuning [[3]](#3). RWMH is a non-adaptive baseline, while DEMCzs is an adaptive method. Normally, we would not compare an adaptive method with a non-adaptive method, but here we do so to show the run time difference and improved accuracy. We will go more in depth on adaptive methods in notebook 05, but this is to get you thinking ahead!\n",
1184+
"Samplers are usually separated by adaptive vs non-adaptive. The key idea behind adaptive MCMC is to learn the proposal covariance from the chain's own history, eliminating the need for manual tuning [[3]](#3). RWMH is a non-adaptive baseline, while DEMCzs is an adaptive method. Normally, we would not compare an adaptive method with a non-adaptive method, but here we do so to show the run time difference and improved accuracy. We will go more in depth on adaptive methods in notebook 05, but this is to get you thinking ahead!\n",
11851185
"\n",
11861186
"DEMCzs is often more efficient than RWMH for:\n",
11871187
"- High-dimensional problems\n",
@@ -1430,7 +1430,7 @@
14301430
"$\\checkmark$ Fit multiple distributions with MCMC and compared posterior parameter estimates \n",
14311431
"$\\checkmark$ Performed posterior predictive checks to evaluate model realism \n",
14321432
"$\\checkmark$ Built a Bayesian linear regression with credible intervals on model parameters \n",
1433-
"$\\checkmark$ Learned the basics of the differences between adaptive and non-adpative samplers\n",
1433+
"$\\checkmark$ Learned the basics of the differences between adaptive and non-adaptive samplers\n",
14341434
"\n",
14351435
"Key takeaway: Bayesian workflows in Numerics provide practical uncertainty quantification while remaining competitive in performance.\n",
14361436
"\n",

notebooks/06_mcmc_diagnostics.ipynb

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -236,7 +236,7 @@
236236
"metadata": {},
237237
"source": [
238238
"## Multiple Chains for Convergence Assessment\n",
239-
"Running multiple chains helps diagnose convergence and assess mixing. We can run diagonsitics on and graph these chains to get a feel on how accurate the posterior is.\n",
239+
"Running multiple chains helps diagnose convergence and assess mixing. We can run diagnostics on these chains and graph them to get a feel for how accurate the posterior is.\n",
240240
"\n",
241241
"Benefits of multiple chains:\n",
242242
"1. Assess convergence via R-hat statistic\n",
@@ -337,7 +337,7 @@
337337
"\n",
338338
"### Common Causes of High R̂\n",
339339
"\n",
340-
"1. Insufficient warmup: Chains haven't reached stationarity\n",
340+
"1. Insufficient warmup: Chains haven't reached stationarity\n",
341341
"2. Poor mixing: Chains explore slowly\n",
342342
"3. Multimodal posterior: Chains stuck in different modes\n",
343343
"4. Bad initialization: Starting values too extreme\n",

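As a reading aid for the R-hat items in the hunks above, here is a minimal pure-Python sketch of the classic Gelman-Rubin statistic. The function name `gelman_rubin_rhat` is hypothetical and this is not the Numerics implementation, which may use a different variant (for example a split-chain version); the sketch assumes equal-length chains with nonzero within-chain variance.

```python
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for a list of equal-length chains (illustrative only)."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    # W: average of the within-chain sample variances.
    w = mean([variance(c) for c in chains])
    # B: between-chain variance of the chain means, scaled by chain length.
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


# Two chains exploring the same region versus two chains stuck in different modes.
overlapping = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
separated = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
print(gelman_rubin_rhat(overlapping))
print(gelman_rubin_rhat(separated))
```

Chains stuck in different modes inflate the between-chain term, which is why the separated pair yields a value far above 1 while the overlapping pair does not.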
notebooks/08_optimization.ipynb

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@
4343
"## When to Use Optimization vs MCMC\n",
4444
"\n",
4545
"Use Optimization when:\n",
46-
"- You need point estimatest only (no uncertainty)\n",
46+
"- You need point estimates only (no uncertainty)\n",
4747
"- Computational speed is critical\n",
4848
"- You have a well-defined objective function\n",
4949
"\n",
@@ -128,7 +128,7 @@
128128
"## Example: Rosenbrock Function\n",
129129
"Rosenbrock is a classic optimization test function: $f(x, y) = (a - x)^2 + b(y - x^2)^2$. With a=1 and b=100, the global minimum is at (1,1) with f=0. \n",
130130
"\n",
131-
"We will run all three methods on this function to find the minimum.\n"
131+
"We will run all three methods on this function to find the minimum."
132132
]
133133
},
134134
{
@@ -315,7 +315,7 @@
315315
"## Example: McCormick Function\n",
316316
"Another classic optimization test function is the McCormick function, $f(x, y) = \\sin(x + y) + (x - y)^2 - 1.5x + 2.5y + 1$. It has a minimum of $f(-0.54719, -1.54719) \\approx -1.9133$.\n",
317317
"\n",
318-
"Again we will run all three methods on this function and compare their results.\n"
318+
"Again we will run all three methods on this function and compare their results."
319319
]
320320
},
321321
{
@@ -694,7 +694,7 @@
694694
"id": "85ecf35d",
695695
"metadata": {},
696696
"source": [
697-
"Sticking with the Eggholder function we will take a look at the path the Differential Evolition solver takes when trying to find the minimum!\n"
697+
"Sticking with the Eggholder function we will take a look at the path the Differential Evolution solver takes when trying to find the minimum!\n"
698698
]
699699
},
700700
{
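The two test functions quoted in the hunks above can be checked by hand. The following is a minimal pure-Python sketch that evaluates each formula at the minimizer given in the notebook text; the function names `rosenbrock` and `mccormick` are illustrative and are not part of Numerics.

```python
import math


def rosenbrock(x, y, a=1.0, b=100.0):
    """Rosenbrock function f(x, y) = (a - x)^2 + b * (y - x^2)^2."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


def mccormick(x, y):
    """McCormick function f(x, y) = sin(x + y) + (x - y)^2 - 1.5x + 2.5y + 1."""
    return math.sin(x + y) + (x - y) ** 2 - 1.5 * x + 2.5 * y + 1.0


# Evaluate each function at the minimizer quoted in the notebook text.
rosenbrock_min = rosenbrock(1.0, 1.0)
mccormick_min = mccormick(-0.54719, -1.54719)

print(rosenbrock_min)
print(mccormick_min)
```

Comparing an optimizer's reported minimum against these closed-form values is a quick sanity check before comparing solvers with one another.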

notebooks/09_statistics.ipynb

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
"id": "50813932",
1515
"metadata": {},
1616
"source": [
17-
"# 07. Statistical Analysis in Numerics\n",
17+
"# 09. Statistical Analysis in Numerics\n",
1818
"This notebook covers statistical methods and hypothesis testing.\n",
1919
"\n",
2020
"## What You'll Learn\n",
@@ -1428,7 +1428,7 @@
14281428
"source": [
14291429
"## 8. Data Transformations\n",
14301430
"\n",
1431-
"The Box Cox and Yeo-Johnson tranformtions transform non-normal dependent varaibles into a normal shape. Box-Cox requires strictly positive data and finds a power, $\\lambda$, that stabilizes variance and reduces skew. Yeo-Johnson is similar but allows zero and negative values, making it safer for general datasets. Both are often used before regression or hypothesis tests that assume normality.\n"
1431+
"The Box Cox and Yeo-Johnson transformations transform non-normal dependent variables into a normal shape. Box-Cox requires strictly positive data and finds a power, $\\lambda$, that stabilizes variance and reduces skew. Yeo-Johnson is similar but allows zero and negative values, making it safer for general datasets. Both are often used before regression or hypothesis tests that assume normality.\n"
14321432
]
14331433
},
14341434
{
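To make the two transformations in the hunk above concrete, here is a minimal pure-Python sketch of the standard Box-Cox and Yeo-Johnson formulas for a fixed power $\lambda$. The function names are hypothetical and this is not the Numerics API; in practice $\lambda$ is usually estimated from the data (for example by maximum likelihood) rather than supplied by hand.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x for a given power lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Yeo-Johnson transform; defined for zero and negative values as well."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


positive_data = [0.5, 1.0, 4.0]
mixed_data = [-2.0, 0.0, 3.0]
print([box_cox(v, 0.5) for v in positive_data])
print([yeo_johnson(v, 0.5) for v in mixed_data])
```

With $\lambda = 1$ the Yeo-Johnson transform leaves every value unchanged, which makes it a convenient baseline when checking an implementation.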

notebooks/10_time_series.ipynb

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
"id": "e9c9ec0a",
1515
"metadata": {},
1616
"source": [
17-
"# 08. Time Series Analysis\n",
17+
"# 10. Time Series Analysis\n",
1818
"The Numerics library provides a comprehensive `TimeSeries` class for working with time-indexed data. This class supports regular and irregular time intervals, statistical operations, transformations, and analysis methods essential for hydrological and environmental data.\n",
1919
"\n",
2020
"\n",
@@ -118,7 +118,7 @@
118118
"We will construct a regular daily `TimeSeries` and inspect its basic statistical properties. There are a few different ways to construct our `TimeSeries` object\n",
119119
" - You can construct an empty time series, with or without a time interval (i.e. daily, monthly, etc)\n",
120120
" - Create a time series over a date range, given start and end dates\n",
121-
" - Contruct a time series from data. \n",
121+
" - Construct a time series from data. \n",
122122
" \n",
123123
"Below we construct one with data, giving it an interval, start date, and the generated data."
124124
]
@@ -259,8 +259,8 @@
259259
"id": "077ab9ac",
260260
"metadata": {},
261261
"source": [
262-
"### Acessing Data\n",
263-
"Acessing the data in a `TimeSeries` object is similar to how you access information in an array."
262+
"### Accessing Data\n",
263+
"Accessing the data in a `TimeSeries` object is similar to how you access information in an array."
264264
]
265265
},
266266
{
@@ -1281,9 +1281,9 @@
12811281
"metadata": {},
12821282
"source": [
12831283
"## Interpolation\n",
1284-
"Interpolate short gaps in the time series while preserving surrounding trends. The `TimeSeries` object in Numerics has its own built in interpolation method in addition to a seperate Interpolation class under the Data namespace of Numerics. This other class includes linear, cubic spline, polynomial, and bilinear interpolation methods. The TimeSeries interpolation method below uses linear interpolation.\n",
1284+
"Interpolate short gaps in the time series while preserving surrounding trends. The `TimeSeries` object in Numerics has its own built-in interpolation method in addition to a separate Interpolation class under the Data namespace of Numerics. This other class includes linear, cubic spline, polynomial, and bilinear interpolation methods. The TimeSeries interpolation method below uses linear interpolation.\n",
12851285
"\n",
1286-
"You can set a specificed maximum number of missing values so that only gaps smaller will be filled."
1286+
"You can set a maximum number of missing values so that only smaller gaps are filled."
12871287
]
12881288
},
12891289
{
@@ -1580,7 +1580,7 @@
15801580
"metadata": {},
15811581
"source": [
15821582
"## USGS Data Download Integration\n",
1583-
"Numrics offers a method to download observed streamflow data from USGS and immediately apply statistical analysis. We will walk through how to access this data through the site number alone."
1583+
"Numerics offers a method to download observed streamflow data from USGS and immediately apply statistical analysis. We will walk through how to access this data through the site number alone."
15841584
]
15851585
},
15861586
{
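The gap-limited interpolation described in the Interpolation hunk above can be sketched in plain Python. This is an illustration only: `interpolate_gaps` and `max_gap` are hypothetical names, not the Numerics `TimeSeries` API, and the sketch assumes that a run of missing values is filled when its length is at most `max_gap` and it has valid neighbours on both sides. How Numerics treats the boundary case is not stated in the notebook text.

```python
import math


def interpolate_gaps(values, max_gap):
    """Linearly fill interior runs of NaN whose length is at most max_gap."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if not math.isnan(filled[i]):
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < n and math.isnan(filled[j]):
            j += 1
        gap = j - i
        # Fill only short gaps that have a valid value on each side.
        if i > 0 and j < n and gap <= max_gap:
            left, right = filled[i - 1], filled[j]
            for k in range(i, j):
                frac = (k - i + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)
        i = j
    return filled


nan = float("nan")
series = [1.0, nan, 3.0, nan, nan, nan, 7.0]
print(interpolate_gaps(series, 2))
```

Leaving long gaps untouched avoids inventing a straight-line trend across stretches where the data gives no support for one.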

notebooks/11_machine_learning.ipynb

Lines changed: 18 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
"id": "24d2d933",
1515
"metadata": {},
1616
"source": [
17-
" # 09. Machine Learning with Numerics\n",
17+
" # 11. Machine Learning with Numerics\n",
1818
"This notebook explores machine learning capabilities in Numerics.\n",
1919
"\n",
2020
"## What You'll Learn\n",
@@ -552,7 +552,7 @@
552552
},
553553
{
554554
"cell_type": "code",
555-
"execution_count": 6,
555+
"execution_count": null,
556556
"id": "5ebaa4fd",
557557
"metadata": {},
558558
"outputs": [
@@ -752,7 +752,7 @@
752752
"sklearn_centers = kmeans_sklearn.cluster_centers_\n",
753753
"\n",
754754
"# Match cluster labels -- Numerics and sklearn return labels in a random order (so they will not automatically match)\n",
755-
"# Sqaured Euclidean distance between each pair of centers'\n",
755+
"# Euclidean distance between each pair of centers\n",
756756
"cost_matrix = np.linalg.norm(numerics_centers[:, None, :] - sklearn_centers[None, :, :], axis=2)\n",
757757
"# rowIndex = Numerics cluster index, colIndex = sklearn cluster index\n",
758758
"row_ind, col_ind = linear_sum_assignment(cost_matrix)\n",
@@ -904,7 +904,7 @@
904904
},
905905
{
906906
"cell_type": "code",
907-
"execution_count": 8,
907+
"execution_count": null,
908908
"id": "1c8a5985",
909909
"metadata": {},
910910
"outputs": [
@@ -998,7 +998,7 @@
998998
"# Fit mixture model and pull out labels (i.e. clusters)\n",
999999
"gmm_ms = timed_fit(lambda: gmm.Train(12345, True))\n",
10001000
"labels_gmm = np.array(list(gmm.Labels), dtype=int)\n",
1001-
"# sklearn Rand index adjusted for chance (computes similarity measure between two clusterings)\n",
1001+
"# sklearn Rand index adjusted for chance (computes similarity measure between two clusterings)\n",
10021002
"ari_gmm = adjusted_rand_score(y_iris, labels_gmm) # NOTE: Numerics has no ARI equivalent yet\n",
10031003
"\n",
10041004
"n, k = X_gmm.shape[0], 3\n",
@@ -1046,7 +1046,7 @@
10461046
},
10471047
{
10481048
"cell_type": "code",
1049-
"execution_count": 9,
1049+
"execution_count": null,
10501050
"id": "3e679f8c",
10511051
"metadata": {},
10521052
"outputs": [
@@ -1130,7 +1130,7 @@
11301130
"# Split data into training and testing sets\n",
11311131
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)\n",
11321132
"\n",
1133-
"# Standardize by removing the mean and scaling to unit varaiance\n",
1133+
"# Standardize by removing the mean and scaling to unit variance\n",
11341134
"scaler = StandardScaler()\n",
11351135
"X_train_s = scaler.fit_transform(X_train)\n",
11361136
"X_test_s = scaler.transform(X_test)\n",
@@ -1186,7 +1186,7 @@
11861186
},
11871187
{
11881188
"cell_type": "code",
1189-
"execution_count": 10,
1189+
"execution_count": null,
11901190
"id": "82e51962",
11911191
"metadata": {},
11921192
"outputs": [
@@ -1270,7 +1270,7 @@
12701270
"# Split into training and testing sets\n",
12711271
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)\n",
12721272
"\n",
1273-
"# Standardize by removing the mean and scaling to unit varaiance\n",
1273+
"# Standardize by removing the mean and scaling to unit variance\n",
12741274
"scaler = StandardScaler()\n",
12751275
"X_train_s = scaler.fit_transform(X_train)\n",
12761276
"X_test_s = scaler.transform(X_test)\n",
@@ -1285,7 +1285,7 @@
12851285
"\n",
12861286
"knn_reg_ms = timed_fit(lambda: knn_reg.Predict(Matrix(X_test_net)))\n",
12871287
"y_pred = np.array(list(knn_reg.Predict(Matrix(X_test_net))), dtype=float)\n",
1288-
"# Performace metrics\n",
1288+
"# Performance metrics\n",
12891289
"y_test_gof = convert_to_dotnet_array(y_test.astype(float))\n",
12901290
"y_pred_gof = convert_to_dotnet_array(y_pred.astype(float))\n",
12911291
"rmse = np.sqrt(GoodnessOfFit.MSE(y_test_gof, y_pred_gof))\n",
@@ -1334,7 +1334,7 @@
13341334
},
13351335
{
13361336
"cell_type": "code",
1337-
"execution_count": 11,
1337+
"execution_count": null,
13381338
"id": "4d31156b",
13391339
"metadata": {},
13401340
"outputs": [
@@ -1422,7 +1422,7 @@
14221422
"y_train_net = convert_to_dotnet_array(y_train)\n",
14231423
"X_test_net = convert_to_dotnet_2d_array(X_test)\n",
14241424
"\n",
1425-
"# Intializie\n",
1425+
"# Initialize\n",
14261426
"dt_clf = DecisionTree(Matrix(X_train_net), Vector(y_train_net), 12345)\n",
14271427
"dt_clf.IsRegression = False\n",
14281428
"# Depth control is very important for decision trees\n",
@@ -1476,7 +1476,7 @@
14761476
},
14771477
{
14781478
"cell_type": "code",
1479-
"execution_count": 12,
1479+
"execution_count": null,
14801480
"id": "4ea747d7",
14811481
"metadata": {},
14821482
"outputs": [
@@ -1574,7 +1574,7 @@
15741574
"dt_reg_ms = timed_fit(lambda: dt_reg.Train())\n",
15751575
"y_pred = np.array(list(dt_reg.Predict(Matrix(X_test_net))), dtype=float)\n",
15761576
"\n",
1577-
"# Performace metrics\n",
1577+
"# Performance metrics\n",
15781578
"y_test_gof = convert_to_dotnet_array(y_test.astype(float))\n",
15791579
"y_pred_gof = convert_to_dotnet_array(y_pred.astype(float))\n",
15801580
"rmse = np.sqrt(GoodnessOfFit.MSE(y_test_gof, y_pred_gof))\n",
@@ -1624,7 +1624,7 @@
16241624
},
16251625
{
16261626
"cell_type": "code",
1627-
"execution_count": 13,
1627+
"execution_count": null,
16281628
"id": "c4198906",
16291629
"metadata": {},
16301630
"outputs": [
@@ -1724,7 +1724,7 @@
17241724
"n = pred_raw.GetLength(0)\n",
17251725
"y_pred = np.array([pred_raw[i, 1] for i in range(n)], dtype=float)\n",
17261726
"\n",
1727-
"# Performace metrics\n",
1727+
"# Performance metrics\n",
17281728
"y_test_gof = convert_to_dotnet_array(y_test.astype(float))\n",
17291729
"y_pred_gof = convert_to_dotnet_array(y_pred.astype(float))\n",
17301730
"acc = GoodnessOfFit.Accuracy(y_test_gof, y_pred_gof)\n",
@@ -1757,7 +1757,7 @@
17571757
},
17581758
{
17591759
"cell_type": "code",
1760-
"execution_count": 14,
1760+
"execution_count": null,
17611761
"id": "03961157",
17621762
"metadata": {},
17631763
"outputs": [
@@ -1847,7 +1847,7 @@
18471847
"# We have to ensure the Random Forest regressor knows it's a regression task (not classification) since it can do both\n",
18481848
"rf_reg.IsRegression = True\n",
18491849
"# More trees generally improves accuracy but increases runtime, 1000 is used here to match Numerics default and scikit-learn settings\n",
1850-
"# Set tree count to match defau;t scikit-learn settings\n",
1850+
"# Set tree count to match default scikit-learn settings\n",
18511851
"rf_reg.NumberOfTrees = 100\n",
18521852
"# Max Depth helps control overfitting\n",
18531853
"# Random Forests can often benefit from deeper trees than single Decision Trees since they average many trees\n",

notebooks/12_linear_models.ipynb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
"id": "d430ddc3",
1515
"metadata": {},
1616
"source": [
17-
"# 10. Generalized Linear Models\n",
17+
"# 12. Generalized Linear Models\n",
1818
"\n",
1919
"This notebook demonstrates Generalized Linear Models (GLMs) - a flexible framework for regression beyond ordinary least squares.\n",
2020
"\n",
