Update PKG-INFO to clarify repository purpose, structure, and NDA-safe context; expand descriptions of evaluation artifacts and development practices; add .gitignore.

Anjey · Anjey · commit f925936ca4ef · 2025-12-23T00:13:08.000Z
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,11 @@
+.venv/
+__pycache__/
+*.pyc
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+.ipynb_checkpoints/
+*.dist-info/
+build/
+dist/
+*.egg-info/
diff --git a/src/math_econ_reasoning_portfolio_v3.egg-info/PKG-INFO b/src/math_econ_reasoning_portfolio_v3.egg-info/PKG-INFO
@@ -14,70 +14,141 @@ Requires-Dist: matplotlib>=3.8; extra == "dev"
 Requires-Dist: notebook>=7.0; extra == "dev"
 Dynamic: license-file
 
-# Math + Econ Reasoning Portfolio (NDA-safe)
+# Math + Econ Reasoning Portfolio (NDA-safe excerpts)
 
-Public portfolio of **math/economics modeling tasks** designed to test **LLM reasoning** and demonstrate:
-- non-trivial numerical methods (bisection/root-finding, verification checks, Monte Carlo sanity checks)
-- reproducible reference solutions
-- validators (numeric checking with tolerances)
-- tests + CI (Python 3.10–3.12)
+This repository contains **carefully selected, NDA-safe excerpts** from a larger body of **math- and economics-based analytical work** used to design, evaluate, and verify **LLM reasoning and numerical reliability**.
 
-Tasks are **inspired by real work** but use **synthetic numbers** and are **not copied** from any private system.
+The materials here focus on the **final verification layer** of much broader analyses:
+- reduced-form problem statements  
+- distilled numerical cores  
+- deterministic validation logic  
 
-## Structure
-- `problems/` — problem statements + failure modes
-- `src/econ_math_portfolio/models/` — model implementations (no code runs on import)
-- `validators/` — validators calling model code
-- `originals/` — your original task scripts kept for transparency (not used by imports)
-- `tests/` — pytest
-- `.github/workflows/ci.yml` — CI
+The original tasks were typically **more complex, data-driven, and multi-stage**, but are presented here in **simplified, synthetic form** to remain fully public and NDA-compliant.
+
+This is best viewed as a **portfolio of evaluation artifacts** rather than a full reproduction of the underlying research pipelines.
+
+---
+
+## What this repo demonstrates
+
+- **Non-trivial numerical methods**  
+  (bisection / root-finding, verification inequalities, Monte Carlo sanity checks)
+
+- **Reproducible reference solutions**  
+  with explicit tolerances and deterministic outputs
+
+- **Answer validation & scoring logic**  
+  similar to LLM evaluation / grading pipelines
+
+- **Failure-mode awareness**  
+  (bounds, monotonicity assumptions, bracketing errors, model misspecification)
+
+- **Clean Python engineering**  
+  (tests, CI, no side effects on import, CLI + JSON outputs)
+
+---
+
+## Important context (NDA-safe clarification)
+
+The problems in this repository are **not full research problems** and **not client deliverables**.
+
+They are:
+- **condensed representations** of larger analytical tasks  
+- built on **synthetic or normalized parameters**  
+- stripped of proprietary data, domain specifics, and contextual complexity  
+
+In practice, the original tasks:
+- involved richer stochastic structure or real datasets  
+- required additional constraints, diagnostics, and robustness checks  
+- were embedded in broader modeling or evaluation workflows  
+
+What you see here corresponds to the **final reasoning and verification step** — the part most relevant for assessing **LLM numerical reasoning, correctness, and failure behavior**.
+
+---
+
+## Repository structure
+
+- `problems/` — problem statements + failure modes  
+- `src/econ_math_portfolio/models/` — model implementations (no code runs on import)  
+- `validators/` — validators calling model code  
+- `originals/` — original standalone scripts kept for transparency (not imported)  
+- `rubrics/` — scoring rules inspired by LLM evaluation setups  
+- `tests/` — pytest  
+- `.github/workflows/ci.yml` — CI (Python 3.10–3.12)
+
+---
 
 ## Quickstart
+
 ```bash
 python -m venv .venv
-source .venv/bin/activate  # Windows: .venv\Scripts\activate
+source .venv/bin/activate
 pip install -e ".[dev]"
 pytest
 python -m econ_math_portfolio list
 ```
 
-## CLI
+## Development
+
+Run linting and tests:
+```bash
+ruff check .
+ruff format --check .
+pytest
+```
+
+---
+
+## CLI usage
+
 ```bash
 python -m econ_math_portfolio reference credit_var_quantile
 python -m econ_math_portfolio validate cpi_target_discount 0.26191
 ```
 
+---
+
 ## Notebook demo
-Run:
+
 ```bash
 jupyter notebook notebooks/demo.ipynb
 ```
 
+---
+
 ## JSON output (tool-calling friendly)
-All commands accept `--json`:
+
 ```bash
 python -m econ_math_portfolio list --json
 python -m econ_math_portfolio reference cpi_target_discount --json
 python -m econ_math_portfolio validate cpi_target_discount 0.26191 --json
 ```
 
+---
 
 ## Scoring rubric (LLM evaluation style)
 
-This repo includes a lightweight, NDA-safe evaluation layer:
-- `rubrics/rubric.json` defines format + numeric correctness + (optional) reasoning weighting
-- `score` command grades a submission JSON against the reference answer
-
-Example:
 ```bash
 python -m econ_math_portfolio score submissions/contract_good.json --json
 ```
 
-Submission JSON schema:
+Submission format:
+
 ```json
 {
   "task_id": "cpi_target_discount",
   "answer": 0.26191,
   "explanation": "optional short explanation"
 }
 ```
+
+---
+
+## How to interpret this portfolio
+
+This repository is a **curated slice of real analytical work**, intentionally focused on:
+- reasoning clarity  
+- numerical correctness  
+- verification and evaluation  
+
+The goal is to show **how problems are checked**, not just how they are solved.
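
The README text in this diff names bisection, explicit tolerances, and bracketing errors but does not show them. The following is a minimal Python sketch of that "solve, then check" pattern, under stated assumptions: `bisect`, `pv_gap`, and `validate` are hypothetical names, the present-value problem is a made-up toy with synthetic numbers, and none of it is taken from `econ_math_portfolio` or its validators.

```python
import math
from typing import Callable


def bisect(f: Callable[[float], float], lo: float, hi: float,
           tol: float = 1e-10, max_iter: int = 200) -> float:
    """Find a root of f in [lo, hi] by bisection; requires a sign change."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    # Bracketing check: without a sign change, bisection has no guarantee.
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed: f(lo) and f(hi) share a sign")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid     # root lies in [mid, hi]
    return 0.5 * (lo + hi)


def pv_gap(rate: float) -> float:
    """Toy problem: present value of three payments of 100, minus a price of 250."""
    return sum(100.0 / (1.0 + rate) ** t for t in (1, 2, 3)) - 250.0


def validate(answer: float, reference: float, abs_tol: float = 1e-5) -> bool:
    """Accept an answer only if it is finite and within abs_tol of the reference."""
    return math.isfinite(answer) and math.isclose(
        answer, reference, rel_tol=0.0, abs_tol=abs_tol
    )


# pv_gap is strictly decreasing in rate, so [0, 1] brackets exactly one root.
reference = bisect(pv_gap, 0.0, 1.0)
```

Raising on an unbracketed interval, instead of returning a midpoint, keeps a bracketing error from passing silently as a plausible number. A purely absolute tolerance suits this toy only because the rate is of order 0.1; a mixed absolute and relative tolerance may be the better choice when answers span several magnitudes.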