Update PKG-INFO to clarify repository purpose, structure, and NDA-safe context; enhance descriptions of evaluation artifacts and development practices.
src/math_econ_reasoning_portfolio_v3.egg-info/PKG-INFO: 95 additions & 24 deletions
@@ -14,70 +14,141 @@ Requires-Dist: matplotlib>=3.8; extra == "dev"
 Requires-Dist: notebook>=7.0; extra == "dev"
 Dynamic: license-file
 
-# Math + Econ Reasoning Portfolio (NDA-safe)
+# Math + Econ Reasoning Portfolio (NDA-safe excerpts)
 
-Public portfolio of **math/economics modeling tasks** designed to test **LLM reasoning** and demonstrate:
-- non-trivial numerical methods (bisection/root-finding, verification checks, Monte Carlo sanity checks)
-- reproducible reference solutions
-- validators (numeric checking with tolerances)
-- tests + CI (Python 3.10–3.12)
+This repository contains **carefully selected, NDA-safe excerpts** from a larger body of **math- and economics-based analytical work** used to design, evaluate, and verify **LLM reasoning and numerical reliability**.
 
-Tasks are **inspired by real work** but use **synthetic numbers** and are **not copied** from any private system.
+The materials here focus on the **final verification layer** of much broader analyses:
+- reduced-form problem statements
+- distilled numerical cores
+- deterministic validation logic
 
-## Structure
-- `problems/` — problem statements + failure modes
-- `src/econ_math_portfolio/models/` — model implementations (no code runs on import)
-- `validators/` — validators calling model code
-- `originals/` — your original task scripts kept for transparency (not used by imports)
-- `tests/` — pytest
-- `.github/workflows/ci.yml` — CI
+The original tasks were typically **more complex, data-driven, and multi-stage**, but are presented here in **simplified, synthetic form** to remain fully public and NDA-compliant.
+
+This is best viewed as a **portfolio of evaluation artefacts** rather than a full reproduction of the underlying research pipelines.
+
+---
+
+## What this repo demonstrates
+
+- **Non-trivial numerical methods**
+(bisection / root-finding, verification inequalities, Monte Carlo sanity checks)
+
+- **Reproducible reference solutions**
+with explicit tolerances and deterministic outputs
+
+- **Answer validation & scoring logic**
+similar to LLM evaluation / grading pipelines
+
+- **Failure-mode awareness**
+(bounds, monotonicity assumptions, bracketing errors, model misspecification)
+
+- **Clean Python engineering**
+(tests, CI, no side effects on import, CLI + JSON outputs)
+
+---
+
+## Important context (NDA-safe clarification)
+
+The problems in this repository are **not full research problems** and **not client deliverables**.
+
+They are:
+- **condensed representations** of larger analytical tasks
+- using **synthetic or normalized parameters**
+- stripped of proprietary data, domain specifics, and contextual complexity
+
+In practice, the original tasks:
+- involved richer stochastic structure or real datasets
+- required additional constraints, diagnostics, and robustness checks
+- were embedded in broader modeling or evaluation workflows
+
+What you see here corresponds to the **final reasoning and verification step** — the part most relevant for assessing **LLM numerical reasoning, correctness, and failure behavior**.
+
+---
+
+## Repository structure
+
+- `problems/` — problem statements + failure modes
+- `src/econ_math_portfolio/models/` — model implementations (no code runs on import)
+- `validators/` — validators calling model code
+- `originals/` — original standalone scripts kept for transparency (not imported)
+- `rubrics/` — scoring rules inspired by LLM evaluation setups
+- `tests/` — pytest
+- `.github/workflows/ci.yml` — CI (Python 3.10–3.12)
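
The README text in this diff describes a pattern of bisection-based reference solutions checked by tolerance-based validators that emit JSON. The diff does not show any of that code, so the following is only a rough sketch of what such a pairing might look like. The names `bisect_root`, `validate_answer`, and `npv`, the cash-flow numbers, and the tolerance values are all hypothetical and are not taken from the repository.

```python
import json
import math


def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f on [lo, hi] by bisection (hypothetical helper).

    Raises ValueError when the endpoints do not bracket a sign change,
    which is the "bracketing error" failure mode.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed: f(lo) and f(hi) share a sign")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop on an exact root or once the half-width is below tol.
        if f_mid == 0.0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def validate_answer(submitted, reference, rel_tol=1e-6, abs_tol=1e-9):
    """Compare a submitted number with the reference within tolerances."""
    ok = math.isclose(submitted, reference, rel_tol=rel_tol, abs_tol=abs_tol)
    return {"ok": ok, "submitted": submitted, "reference": reference}


def npv(rate):
    # Synthetic cash flows (assumed): pay 100 now, receive 60 in each of
    # the next two periods. The root of npv is the internal rate of return.
    return -100 + 60 / (1 + rate) + 60 / (1 + rate) ** 2


reference = bisect_root(npv, 0.0, 1.0)
result = validate_answer(0.1307, reference, rel_tol=1e-3)
print(json.dumps(result))
```

Keeping the solver and the validator as separate pure functions is one way to get the "no side effects on import" and "deterministic outputs" properties the README mentions; the module-level demo lines at the bottom would normally sit behind an `if __name__ == "__main__":` guard or a CLI entry point.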