Commit f925936

Author: Anjey
Update PKG-INFO to clarify repository purpose, structure, and NDA-safe context; enhance descriptions of evaluation artifacts and development practices.
Parent: 1e86290

2 files changed: 106 additions, 24 deletions

.gitignore (new file: 11 additions, 0 deletions)

@@ -0,0 +1,11 @@
+.venv/
+__pycache__/
+*.pyc
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+.ipynb_checkpoints/
+.dist-info/
+build/
+dist/
+*.egg-info/

src/math_econ_reasoning_portfolio_v3.egg-info/PKG-INFO (95 additions, 24 deletions)

@@ -14,70 +14,141 @@ Requires-Dist: matplotlib>=3.8; extra == "dev"
 Requires-Dist: notebook>=7.0; extra == "dev"
 Dynamic: license-file
 
-# Math + Econ Reasoning Portfolio (NDA-safe)
+# Math + Econ Reasoning Portfolio (NDA-safe excerpts)
 
-Public portfolio of **math/economics modeling tasks** designed to test **LLM reasoning** and demonstrate:
-- non-trivial numerical methods (bisection/root-finding, verification checks, Monte Carlo sanity checks)
-- reproducible reference solutions
-- validators (numeric checking with tolerances)
-- tests + CI (Python 3.10–3.12)
+This repository contains **carefully selected, NDA-safe excerpts** from a larger body of **math- and economics-based analytical work** used to design, evaluate, and verify **LLM reasoning and numerical reliability**.
 
-Tasks are **inspired by real work** but use **synthetic numbers** and are **not copied** from any private system.
+The materials here focus on the **final verification layer** of much broader analyses:
+- reduced-form problem statements
+- distilled numerical cores
+- deterministic validation logic
 
-## Structure
-- `problems/` — problem statements + failure modes
-- `src/econ_math_portfolio/models/` — model implementations (no code runs on import)
-- `validators/` — validators calling model code
-- `originals/` — your original task scripts kept for transparency (not used by imports)
-- `tests/` — pytest
-- `.github/workflows/ci.yml` — CI
+The original tasks were typically **more complex, data-driven, and multi-stage**, but are presented here in **simplified, synthetic form** to remain fully public and NDA-compliant.
+
+This is best viewed as a **portfolio of evaluation artefacts** rather than a full reproduction of the underlying research pipelines.
+
+---
+
+## What this repo demonstrates
+
+- **Non-trivial numerical methods**
+(bisection / root-finding, verification inequalities, Monte Carlo sanity checks)
+
+- **Reproducible reference solutions**
+with explicit tolerances and deterministic outputs
+
+- **Answer validation & scoring logic**
+similar to LLM evaluation / grading pipelines
+
+- **Failure-mode awareness**
+(bounds, monotonicity assumptions, bracketing errors, model misspecification)
+
+- **Clean Python engineering**
+(tests, CI, no side effects on import, CLI + JSON outputs)
+
+---
+
+## Important context (NDA-safe clarification)
+
+The problems in this repository are **not full research problems** and **not client deliverables**.
+
+They are:
+- **condensed representations** of larger analytical tasks
+- using **synthetic or normalized parameters**
+- stripped of proprietary data, domain specifics, and contextual complexity
+
+In practice, the original tasks:
+- involved richer stochastic structure or real datasets
+- required additional constraints, diagnostics, and robustness checks
+- were embedded in broader modeling or evaluation workflows
+
+What you see here corresponds to the **final reasoning and verification step** — the part most relevant for assessing **LLM numerical reasoning, correctness, and failure behavior**.
+
+---
+
+## Repository structure
+
+- `problems/` — problem statements + failure modes
+- `src/econ_math_portfolio/models/` — model implementations (no code runs on import)
+- `validators/` — validators calling model code
+- `originals/` — original standalone scripts kept for transparency (not imported)
+- `rubrics/` — scoring rules inspired by LLM evaluation setups
+- `tests/` — pytest
+- `.github/workflows/ci.yml` — CI (Python 3.10–3.12)
+
+---
 
 ## Quickstart
+
 ```bash
 python -m venv .venv
-source .venv/bin/activate # Windows: .venv\Scripts\activate
+source .venv/bin/activate
 pip install -e ".[dev]"
 pytest
 python -m econ_math_portfolio list
 ```
 
-## CLI
+## Development
+
+Run linting and tests:
+```bash
+ruff check .
+ruff format --check .
+pytest
+```
+
+---
+
+## CLI usage
+
 ```bash
 python -m econ_math_portfolio reference credit_var_quantile
 python -m econ_math_portfolio validate cpi_target_discount 0.26191
 ```
 
+---
+
 ## Notebook demo
-Run:
+
 ```bash
 jupyter notebook notebooks/demo.ipynb
 ```
 
+---
+
 ## JSON output (tool-calling friendly)
-All commands accept `--json`:
+
 ```bash
 python -m econ_math_portfolio list --json
 python -m econ_math_portfolio reference cpi_target_discount --json
 python -m econ_math_portfolio validate cpi_target_discount 0.26191 --json
 ```
 
+---
 
 ## Scoring rubric (LLM evaluation style)
 
-This repo includes a lightweight, NDA-safe evaluation layer:
-- `rubrics/rubric.json` defines format + numeric correctness + (optional) reasoning weighting
-- `score` command grades a submission JSON against the reference answer
-
-Example:
 ```bash
 python -m econ_math_portfolio score submissions/contract_good.json --json
 ```
 
-Submission JSON schema:
+Submission format:
+
 ```json
 {
 "task_id": "cpi_target_discount",
 "answer": 0.26191,
 "explanation": "optional short explanation"
 }
 ```
+
+---
+
+## How to interpret this portfolio
+
+This repository is a **curated slice of real analytical work**, intentionally focused on:
+- reasoning clarity
+- numerical correctness
+- verification and evaluation
+
+The goal is to show **how problems are checked**, not just how they are solved.
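
The README's "What this repo demonstrates" list pairs bisection/root-finding with verification inequalities. It also names bracketing errors and monotonicity assumptions as failure modes. Here is a minimal sketch of that pattern, assuming a continuous scalar function. `bisect_root` rejects an interval that has no sign change instead of silently returning a midpoint, and `verify_root` re-checks the result independently of the solver. Both function names and the internal-rate-of-return example are hypothetical illustrations, not code taken from `src/econ_math_portfolio/models/`.

```python
from typing import Callable


def bisect_root(
    f: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> float:
    """Return x in [lo, hi] with f(x) close to 0, using bisection.

    Raises ValueError when f(lo) and f(hi) share a sign, i.e. the
    interval does not bracket a root (a "bracketing error").
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if (f_lo < 0) == (f_hi < 0):
        raise ValueError(f"[{lo}, {hi}] does not bracket a root: f(lo)={f_lo}, f(hi)={f_hi}")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop on an exact zero or once the half-width is below tol.
        if f_mid == 0.0 or 0.5 * (hi - lo) < tol:
            return mid
        # Keep the half-interval that still contains a sign change.
        if (f_lo < 0) != (f_mid < 0):
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def verify_root(
    f: Callable[[float], float],
    x: float,
    lo: float,
    hi: float,
    residual_tol: float = 1e-8,
) -> bool:
    """Independent check: x lies in the bracket and the residual |f(x)| is small."""
    return lo <= x <= hi and abs(f(x)) <= residual_tol
```

Take cash flows of -100, 60 and 60. The function `npv = lambda r: -100 + 60 / (1 + r) + 60 / (1 + r) ** 2` is decreasing on [0, 1], so `bisect_root(npv, 0.0, 1.0)` returns a rate of roughly 0.1307. By contrast, `bisect_root(npv, 0.5, 1.0)` raises `ValueError` because NPV is negative at both endpoints. Checking the bracket up front turns a silent wrong answer into an explicit failure.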

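The scoring-rubric section describes grading a submission JSON with `task_id`, `answer` and an optional `explanation`. The grade combines a format component and a numeric-correctness component. The sketch below shows one way such a grader could work. The weights, tolerances and reference table are hypothetical stand-ins, not the contents of `rubrics/rubric.json` or the behavior of the repository's `score` command. The README's example value 0.26191 is reused as the reference purely for illustration.

```python
import json
import math

# Hypothetical rubric: illustrative weights and tolerances, not the
# contents of the repository's rubrics/rubric.json.
RUBRIC = {"format_weight": 0.2, "numeric_weight": 0.8, "rel_tol": 1e-4, "abs_tol": 1e-6}

# Hypothetical reference table; 0.26191 is borrowed from the README's CLI example.
REFERENCE = {"cpi_target_discount": 0.26191}


def score_submission(raw: str, rubric: dict = RUBRIC, reference: dict = REFERENCE) -> dict:
    """Grade a JSON submission of the form {"task_id", "answer", "explanation"?}."""
    result = {"format_ok": False, "numeric_ok": False, "score": 0.0}
    try:
        sub = json.loads(raw)
        task_id = sub["task_id"]
        answer = float(sub["answer"])
    except (ValueError, KeyError, TypeError):
        # json.JSONDecodeError is a ValueError subclass; any parse or schema error scores 0.
        return result
    if not isinstance(task_id, str) or task_id not in reference or not math.isfinite(answer):
        return result
    result["format_ok"] = True
    # math.isclose passes if |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol).
    result["numeric_ok"] = math.isclose(
        answer, reference[task_id], rel_tol=rubric["rel_tol"], abs_tol=rubric["abs_tol"]
    )
    result["score"] = rubric["format_weight"] + (
        rubric["numeric_weight"] if result["numeric_ok"] else 0.0
    )
    return result


if __name__ == "__main__":
    submission = '{"task_id": "cpi_target_discount", "answer": 0.26191, "explanation": "optional short explanation"}'
    print(json.dumps(score_submission(submission)))
```

Under these assumptions, an answer of 0.26191 or 0.26192 scores 1.0 because it falls inside the 1e-4 relative tolerance. An answer of 0.27 earns only the 0.2 format credit, and unparsable input scores 0. `math.isclose` applies whichever of the relative and absolute tolerances is looser, so reference values near zero still get a usable acceptance band.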

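The README also lists Monte Carlo sanity checks among the demonstrated methods, and `credit_var_quantile` is one of the CLI task ids. As an illustration only, assume a loss that is normally distributed with hypothetical parameters. A claimed value-at-risk can then be cross-checked against the empirical quantile of a seeded simulation. The task's actual model does not appear in this commit, and the function names below are hypothetical.

```python
import math
import random
from statistics import NormalDist


def mc_quantile(mu: float, sigma: float, alpha: float, n: int = 200_000, seed: int = 12345) -> float:
    """Empirical alpha-quantile of n simulated losses L ~ N(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed: the check is reproducible
    losses = sorted(rng.gauss(mu, sigma) for _ in range(n))
    # Order statistic of 1-based rank ceil(alpha * n), clamped to a valid index.
    k = min(n - 1, max(0, math.ceil(alpha * n) - 1))
    return losses[k]


def mc_sanity_check(
    claimed: float,
    mu: float,
    sigma: float,
    alpha: float,
    rel_tol: float = 0.02,
    n: int = 200_000,
    seed: int = 12345,
) -> bool:
    """Accept a claimed VaR only if it matches the simulated quantile within rel_tol."""
    estimate = mc_quantile(mu, sigma, alpha, n=n, seed=seed)
    return abs(claimed - estimate) <= rel_tol * abs(estimate)


if __name__ == "__main__":
    # Hypothetical parameters: loss ~ N(1.0, 2.0^2), 99% confidence level.
    dist = NormalDist(mu=1.0, sigma=2.0)
    right = dist.inv_cdf(0.99)
    wrong = dist.inv_cdf(0.95)  # a misspecified confidence level
    print(mc_sanity_check(right, 1.0, 2.0, 0.99), mc_sanity_check(wrong, 1.0, 2.0, 0.99))
```

A claimed 99% VaR of about 5.65, taken from the closed-form normal quantile, passes. Accidentally using the 95% level, about 4.29, fails. At n = 200,000 the 2% relative tolerance spans several Monte Carlo standard errors, and the seed is fixed, so the check is deterministic rather than flaky.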