| 1 | +# AGENTS.md |
| 2 | + |
| 3 | +Guidance for AI coding agents working in this repository. |
| 4 | + |
| 5 | +## What this repo is |
| 6 | + |
| 7 | +`FSharp.Stats` is an F# library implementing statistical and machine learning methods (descriptive statistics, distributions, hypothesis tests, regression, clustering, ML algorithms, etc.). |
| 8 | + |
| 9 | +This repo focuses on **statistical/ML methods**. The underlying numerical primitives — matrix math, linear algebra, vector operations, BLAS/LAPACK bindings — live in the reference library [**FsMath**](https://github.com/fslaborg/FsMath). When you need low-level numeric routines, prefer pulling them from FsMath rather than re-implementing them here. If something fundamental is missing in FsMath, raise it there instead of duplicating math primitives in this repo. |
| 10 | + |
| 11 | +Source layout: |
| 12 | +- [src/FSharp.Stats/](src/FSharp.Stats/) — main library |
| 13 | +- [src/FSharp.Stats.Interactive/](src/FSharp.Stats.Interactive/) — `dotnet interactive` integration |
| 14 | +- [tests/FSharp.Stats.Tests/](tests/FSharp.Stats.Tests/) — Expecto test suite |
| 15 | +- [docs/](docs/) — fsdocs tutorials and examples that should stay in sync with public API changes |
| 16 | +- [benchmarks/](benchmarks/) — BenchmarkDotNet benchmark projects and checked-in benchmark outputs |
| 17 | + |
| 18 | +## Building |
| 19 | + |
| 20 | +This repo uses a **FAKE** build project ([build/build.fsproj](build/build.fsproj), entrypoint [build/Build.fs](build/Build.fs)). Treat the FAKE targets as the build/test contract for final verification, CI parity, docs, packaging, and release work. |
| 21 | + |
| 22 | +For inner-loop iteration, narrowly scoped raw `dotnet build` / `dotnet test --no-build --filter ...` commands are acceptable as a local optimization. Do not stop there: before considering the work done or ready for a PR, run the full suite through the repository entrypoint with `./build.sh RunTests` (or `build.cmd RunTests` on Windows). |
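
The inner loop described above might look roughly like the following. This is a sketch, not part of the repository: the `inner_loop` function and the `DOTNET` override are hypothetical helpers, the filter value in the example is a placeholder, and whether a given filter expression selects the tests you expect depends on the test adapter in use.

```sh
# Inner-loop sketch: build and test only the test project.
# DOTNET can be overridden (for example DOTNET=echo) to preview the commands.
DOTNET="${DOTNET:-dotnet}"
TEST_PROJECT="tests/FSharp.Stats.Tests"

inner_loop() {
    # $1: expression handed to `dotnet test --filter` (placeholder value below)
    filter="$1"
    "$DOTNET" build "$TEST_PROJECT" || return 1
    "$DOTNET" test "$TEST_PROJECT" --no-build --filter "$filter"
}

# Example (hypothetical filter): inner_loop "FullyQualifiedName~Correlation"
# Final verification still goes through the repository entrypoint:
#   ./build.sh RunTests
```

Both commands use the default configuration, so `--no-build` picks up the binaries the preceding `dotnet build` produced.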
| 23 | + |
| 24 | +Entry points: |
| 25 | +- Windows: [build.cmd](build.cmd) |
| 26 | +- Unix: [build.sh](build.sh) |
| 27 | + |
| 28 | +Both forward arguments to `dotnet run --project ./build/build.fsproj <target>`. |
| 29 | + |
| 30 | +### Targets |
| 31 | + |
| 32 | +Defined across [build/Build.fs](build/Build.fs), [build/BasicTasks.fs](build/BasicTasks.fs), [build/TestTasks.fs](build/TestTasks.fs), [build/PackageTasks.fs](build/PackageTasks.fs), [build/DocumentationTasks.fs](build/DocumentationTasks.fs), [build/ReleaseTasks.fs](build/ReleaseTasks.fs), [build/ReleaseNotesTasks.fs](build/ReleaseNotesTasks.fs): |
| 33 | + |
| 34 | +| Target | Purpose | |
| 35 | +|---|---| |
| 36 | +| `Clean` | Remove `src/**/bin`, `src/**/obj`, `tests/**/bin`, `tests/**/obj`, `pkg`. | |
| 37 | +| `Build` *(default)* | `dotnet build` the solution (Release). Depends on `Clean`. | |
| 38 | +| `RunTests` | `dotnet test` the test project with detailed console logger. Depends on `Clean`, `Build`. | |
| 39 | +| `RunTestsWithCodeCov` | Same as `RunTests` plus AltCover Cobertura output to `codeCov.xml`. | |
| 40 | +| `Pack` / `PackPrerelease` | Produce NuGet packages into `pkg/`. Prompts interactively for confirmation. | |
| 41 | +| `BuildDocs` / `BuildDocsPrerelease` | `fsdocs build --eval --clean` against the project. | |
| 42 | +| `WatchDocs` / `WatchDocsPrerelease` | `fsdocs watch` for local doc preview. | |
| 43 | +| `SetPrereleaseTag` | Reads a prerelease suffix from stdin and sets package version metadata. | |
| 44 | +| `ReleaseDocs` / `PrereleaseDocs` | Push built docs. | |
| 45 | +| `CreateTag` / `CreatePrereleaseTag`, `PublishNuget` / `PublishNugetPrerelease` | Tag git, push package to NuGet. | |
| 46 | +| `UpdateReleaseNotes` | Regenerate `RELEASE_NOTES.md` from commits since the last release. | |
| 47 | +| `Release` | Aggregate: `Clean → Build → RunTests → Pack → BuildDocs → CreateTag → PublishNuget → ReleaseDocs`. | |
| 48 | +| `PreRelease` | Aggregate prerelease variant of `Release`. | |
| 49 | +| `ReleaseNoDocs` / `PreReleaseNoDocs` | Release aggregates without doc steps. | |
| 50 | + |
| 51 | +Common usage: |
| 52 | + |
| 53 | +```sh |
| 54 | +./build.sh # default: Build |
| 55 | +./build.sh RunTests |
| 56 | +./build.sh BuildDocs |
| 57 | +./build.sh WatchDocs |
| 58 | +``` |
| 59 | + |
| 60 | +`Pack` and the `Release*` targets are interactive (prompt for confirmation, prerelease suffix, etc.) — do not run them in non-interactive automation. |
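
One way to enforce this in automation is a small guard in front of the entrypoint. The sketch below is illustrative only: `is_interactive_target` and `safe_build` are hypothetical names, and the pattern list is an assumption derived from the targets table (including `SetPrereleaseTag`, which reads from stdin, and the `PreRelease*` aggregates), not something read from the build scripts.

```sh
# Returns 0 if the target is expected to prompt on stdin, 1 otherwise.
# The pattern list is an assumption based on the targets table.
is_interactive_target() {
    case "$1" in
        Pack|PackPrerelease|SetPrereleaseTag|Release*|PreRelease*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: refuse interactive targets when stdin is not a terminal.
safe_build() {
    target="${1:-Build}"
    if is_interactive_target "$target" && [ ! -t 0 ]; then
        echo "refusing to run interactive target '$target' without a TTY" >&2
        return 2
    fi
    ./build.sh "$@"
}
```

With this in place, `safe_build RunTests` forwards to `./build.sh RunTests`, while `safe_build Release` in a CI job returns status 2 without invoking the build.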
| 61 | + |
| 62 | +## F# project files and compile order |
| 63 | + |
| 64 | +F# file order is load-bearing in this repo. If you add, remove, rename, or move a `.fs` file, you must update the corresponding project file and place it in the correct compile order: |
| 65 | + |
| 66 | +- [src/FSharp.Stats/FSharp.Stats.fsproj](src/FSharp.Stats/FSharp.Stats.fsproj) — main library compile order |
| 67 | +- [tests/FSharp.Stats.Tests/FSharp.Stats.Tests.fsproj](tests/FSharp.Stats.Tests/FSharp.Stats.Tests.fsproj) — test compile order |
| 68 | + |
| 69 | +An otherwise correct code change can fail to compile if the new file is missing from the project file or inserted in the wrong slot. |
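
A quick presence check can catch the "missing from the project file" case before a full build. The helper below is a hypothetical sketch: it assumes entries are written as `Include="relative/path.fs"` with double quotes (either slash style), and it only checks that each file is listed, not that the order is correct.

```sh
# Prints every .fs file under the project directory that has no matching
# Include="..." entry in the project file. Prints nothing if all are listed.
# Checks presence only; it cannot tell whether the compile order is right.
missing_compile_entries() {
    proj_file="$1"
    proj_dir="$(dirname "$proj_file")"
    find "$proj_dir" -name '*.fs' ! -path '*/obj/*' ! -path '*/bin/*' | sort |
    while IFS= read -r file; do
        rel="${file#"$proj_dir"/}"
        # Project files may use backslashes in Include paths.
        rel_backslash="$(printf '%s' "$rel" | tr '/' '\\')"
        if ! grep -qF "Include=\"$rel\"" "$proj_file" &&
           ! grep -qF "Include=\"$rel_backslash\"" "$proj_file"; then
            printf '%s\n' "$rel"
        fi
    done
}
```

For example, `missing_compile_entries src/FSharp.Stats/FSharp.Stats.fsproj` would list any new source file that still needs a `Compile` entry; choosing the correct slot remains a manual step.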
| 70 | + |
| 71 | +## Adding a new statistical / ML method |
| 72 | + |
| 73 | +When a PR or commit introduces a new statistical method (test, estimator, distribution, ML algorithm, etc.), it is expected to cite a **reference implementation** so reviewers can validate numerics. The current repo is not fully uniform about this yet, but new method work should follow this rule — undocumented numeric code is effectively unreviewable. |
| 74 | + |
| 75 | +What to include: |
| 76 | + |
| 77 | +1. **Link to a canonical reference implementation** in the PR description and/or in a comment above the function. Acceptable references (in rough order of preference): |
| 78 | + - R (`stats`, `MASS`, CRAN packages) — link to source or function docs. |
| 79 | + - Python (`numpy`, `scipy.stats`, `scikit-learn`, `statsmodels`) — link to source on GitHub or stable docs. |
| 80 | + - A peer-reviewed paper (DOI) when no canonical implementation exists. |
| 81 | + |
| 82 | +2. **A small reproducible script** in the reference language that produces the expected numbers. Put it either: |
| 83 | + - Inline in the PR description (preferred for review), and/or |
| 84 | + - As a comment block above the corresponding test in [tests/FSharp.Stats.Tests/](tests/FSharp.Stats.Tests/), so the expected values in the test are traceable. |
| 85 | + |
| 86 | +3. **Tests that pin the numbers from that script.** The test should assert the same values the reference script produces (within an explicit tolerance), and the comment should make the provenance obvious. |
| 87 | + |
| 88 | +Example comment style for a test: |
| 89 | + |
| 90 | +```fsharp |
| 91 | +// Reference: scipy.stats.shapiro |
| 92 | +// https://github.com/scipy/scipy/blob/v1.13.0/scipy/stats/_morestats.py |
| 93 | +// |
| 94 | +// >>> from scipy import stats |
| 95 | +// >>> stats.shapiro([1.0, 2.0, 3.0, 4.0, 5.0]) |
| 96 | +// ShapiroResult(statistic=0.9868..., pvalue=0.9672...) |
| 97 | +let ``shapiro matches scipy on [1..5]`` () = ... |
| 98 | +``` |
| 99 | + |
| 100 | +If you cannot find a reference implementation, say so explicitly in the PR and propose how the numbers were validated (hand derivation, paper, cross-check against another method). Do not silently ship unverified numerics. |
| 101 | + |
| 102 | +## Conventions |
| 103 | + |
| 104 | +- Match the surrounding F# style; prefer adding to existing modules over creating new top-level ones. |
| 105 | + - The code style in this repo has changed over time. Follow the style of the area you are editing, not necessarily the style of the oldest code in the repo. |
| 106 | + - For older functional and nested-module style, see [src/FSharp.Stats/Correlation.fs](src/FSharp.Stats/Correlation.fs) and [src/FSharp.Stats/Quantile.fs](src/FSharp.Stats/Quantile.fs). |
| 107 | + - For newer ergonomic APIs with static members, overloads, and optional parameters, see [src/FSharp.Stats/Integration/Integration.fs](src/FSharp.Stats/Integration/Integration.fs), [src/FSharp.Stats/Signal/QQPlot.fs](src/FSharp.Stats/Signal/QQPlot.fs), [src/FSharp.Stats/Testing/ConfusionMatrix.fs](src/FSharp.Stats/Testing/ConfusionMatrix.fs), and [src/FSharp.Stats/Fitting/LinearRegression.fs](src/FSharp.Stats/Fitting/LinearRegression.fs). |
| 108 | + - When adding new ergonomic APIs, prefer a two-tier shape: a core implementation that takes all parameters explicitly, plus overloads or convenience entrypoints for common defaults. |
| 109 | +- If you change public API or user-facing behavior, update the relevant docs script in [docs/](docs/) and keep XML documentation comments in sync. Good public API examples to mirror include [src/FSharp.Stats/Integration/Integration.fs](src/FSharp.Stats/Integration/Integration.fs) and [src/FSharp.Stats/Fitting/LinearRegression.fs](src/FSharp.Stats/Fitting/LinearRegression.fs). |
| 110 | +- Run `./build.sh RunTests` before opening a PR. |
| 111 | +- Target the `developer` branch with PRs. |
| 112 | +- Avoid churning checked-in benchmark output under `benchmarks/**/BenchmarkDotNet.Artifacts` unless you are intentionally refreshing benchmark results. |
| 113 | +- Keep PRs focused — one method (or one tightly-related family) per PR makes the reference-implementation review tractable. |
| 114 | +- Every code change must come with tests, including regression tests for fixes, even if no reference implementation is available. If you add code, you must add tests that validate it. |