Edison Scientific deep-research integration via edison-client SDK #33
Merged
Adds a programmatic path to FutureHouse / Edison Scientific deep
research over CultureMech media records. Companion to (not replacement
for) scripts/research_media.py, which wraps deep-research-client — DRC
0.2.4 registers only `openai` and `cyberian` as providers, so Edison /
PaperQA isn't accessible through it. The edison-client SDK (already
in pyproject deps) is invoked directly here.
New: scripts/research_media_edison.py
Single recipe:
python scripts/research_media_edison.py --target <slug|id|path>
Batch (priority list):
python scripts/research_media_edison.py \
--batch data/import_tracking/reports/edison_batch.json --limit 5
--job: literature (paperqa3, default) | literature-high | precedent
| phoenix. Aliases: paperqa, paperqa-high.
--dry-run: render the query and print the plan; no API call, no
credits spent.
--start / --limit: cap or skip into a batch list.
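A minimal sketch of how --start / --limit could window a batch list. The helper name `window` and the exact semantics (skip `start` entries, then keep at most `limit`) are assumptions for illustration, not taken from the script:

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def window(targets: Sequence[T], start: int = 0, limit: Optional[int] = None) -> list[T]:
    """Skip `start` entries, then keep at most `limit` of the remainder."""
    if start < 0 or (limit is not None and limit < 0):
        raise ValueError("start and limit must be non-negative")
    # limit=None means "no cap": slice to the end of the list.
    end = None if limit is None else start + limit
    return list(targets[start:end])
```

Under these assumptions, `--start 2 --limit 3` over a ten-entry batch would submit entries 2, 3 and 4 only, which keeps credit spend bounded on first runs.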
Auth picks up EDISON_PLATFORM_API_KEY (SDK-native) or EDISON_API_KEY
(legacy alias used by research_media.py). A repo-root .env is
auto-loaded via python-dotenv. .env.example added.
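The key lookup described above might look roughly like the sketch below. The function name `find_api_key` and the precedence (SDK-native variable first, legacy alias second) are assumptions; the real script also loads the repo-root .env through python-dotenv before this point, which is omitted here to keep the example stdlib-only:

```python
import os
from typing import Mapping, Optional

# Assumed precedence: SDK-native name first, then the legacy alias.
KEY_ENV_VARS = ("EDISON_PLATFORM_API_KEY", "EDISON_API_KEY")


def find_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty Edison key found in the environment."""
    env = os.environ if env is None else env
    for name in KEY_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    # Fail before any job is submitted rather than mid-batch.
    raise SystemExit(f"No Edison API key found; set one of {', '.join(KEY_ENV_VARS)}")
```

Passing the environment as a mapping keeps the lookup testable without touching the real process environment.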
Reuses research_media.py's template_vars / load_media / resolve
helpers so the rendered query matches the existing DRC workflow.
Outputs land under research/media/{slug}-edison-{job}.md plus a
sibling -meta.yaml capturing task_id, total_cost, status, the
rendered template variables, and the prompt size for audit.
Batch resolution gotcha (and fix):
edison_batch.json carries `recipe_name` (slug derived from the YAML
name field) AND `file_path` (relative to data/normalized_yaml/).
research_media.py's resolve_media_file:
(a) treats string paths as relative to CWD (not normalized_yaml/),
so file_path entries miss; and
(b) raises ValueError when a slug matches multiple files (e.g.,
"dehalospirillum_medium" appears in 5 importer-flavor variants).
The Edison batch resolver now first tries
`data/normalized_yaml/<file_path>` verbatim — unambiguous — before
falling back to slug matching. Resolvability against the freshly-
regenerated 100-recipe batch is 100% (5/5 in --limit 5 smoke).
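The path-first resolution order can be sketched as follows. This is an illustration under assumptions, not the script's code: `resolve_batch_entry` is a hypothetical name, and the slug fallback here is a plain `<slug>.yaml` filename search standing in for research_media.py's resolve_media_file:

```python
from pathlib import Path


def resolve_batch_entry(entry: dict, normalized_dir: Path) -> Path:
    """Resolve one batch entry: exact file_path first, slug match second."""
    # 1. Try <normalized_dir>/<file_path> verbatim; this is unambiguous.
    file_path = entry.get("file_path")
    if file_path:
        candidate = normalized_dir / file_path
        if candidate.is_file():
            return candidate

    # 2. Fall back to slug matching (assumed here: filename stem == slug).
    slug = entry.get("recipe_name")
    if not slug:
        raise ValueError("entry has neither a usable file_path nor a recipe_name")
    matches = sorted(normalized_dir.rglob(f"{slug}.yaml"))
    if len(matches) == 1:
        return matches[0]
    raise ValueError(f"slug {slug!r} matched {len(matches)} files")
```

Trying the explicit path first means the ambiguous-slug error only surfaces for entries whose file_path is missing or stale.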
Justfile targets:
research-media-edison target *args="" # single
research-media-edison-batch batch *args="" # batch (pass --limit!)
Out of scope here: live API smoke test deferred until user confirms
.env has the new LBL key.
Pull request overview
Adds a new Edison Scientific / FutureHouse “deep research” entrypoint for CultureMech media records by calling the edison-client SDK directly (bypassing deep-research-client provider limitations), plus convenience just targets and updated reporting/docs artifacts.
Changes:
- Introduces `scripts/research_media_edison.py` to submit Edison jobs for single targets or batches, writing Markdown + YAML meta outputs.
- Adds `just research-media-edison` and `just research-media-edison-batch` targets to run the Edison script via `uv`.
- Updates the committed quality analysis report summary numbers and adds a `.env.example` template for Edison key setup.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| scripts/research_media_edison.py | New Edison SDK-backed research script (single + batch modes, output + meta writing). |
| project.justfile | Adds just targets to invoke the Edison research script. |
| data/import_tracking/reports/quality_analysis.md | Updates report metrics/formatting for the latest corpus analysis. |
| .env.example | Adds an example env file documenting Edison API key usage. |
Eight findings, all fixed:
1. research_media_edison.py — Path.relative_to(REPO_ROOT) crashed
--dry-run for --out-dir outside the repo. New _display_path()
helper falls back to absolute string when the path isn't under
REPO_ROOT.
2. Module docstring claimed --dry-run "prints the rendered query" but
only printed paths + query_chars. --dry-run now writes the full
meta yaml (including the rendered prompt) alongside the would-be
md path; docstring updated to match.
3. Meta dict didn't actually contain the query referenced in the
"prompt that was sent" doc claim. Added `query`, `media_id`, and
`template_path` fields. Live runs gain the same fields.
4. slug_for() used the CURIE local part (009674) which diverged from
research_media.py's stem-based naming. Switched to media_path.stem
(e.g. luria_bertani_lb_medium) so research outputs are sortable /
findable by recipe name. CURIE id captured in meta.media_id
instead.
5. Filename suffix used job.name.lower() which produced
"literature_high" while the CLI alias is --job literature-high.
New _short_job() helper normalizes _ -> - for consistency.
6. edison-client + python-dotenv were only transitive deps via
deep-research-client. Declared both explicitly under the dev
extra in pyproject.toml so fresh `uv run --extra dev ...` won't
break if the transitive ever drops out. Lockfile refreshed.
7. analyze_media_quality.py wrote the developer's absolute CWD into
the committed report header. Now writes a repo-relative
`Source dir: data/normalized_yaml` line; output_file used as the
anchor for the relative-path computation.
8. tests/test_research_media_edison.py covers:
- load_batch_targets returns recipe_name + file_path candidates
in fall-through order
- load_batch_targets rejects non-list JSON with SystemExit
- _short_job emits hyphens (literature-high)
- slug_for uses the YAML stem
- _display_path doesn't crash on paths outside REPO_ROOT
- resolve_job recognizes literature / paperqa / literature-high /
paperqa-high aliases and SystemExits on unknown jobs
7/7 passing.
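
For reference, findings 1 and 5 can be sketched roughly as below. These are illustrative stand-ins, not the merged code: the real helpers are `_display_path()` and `_short_job()`, and their signatures are assumed here (the root is passed explicitly, and the job is taken as a plain string):

```python
from pathlib import Path


def display_path(path: Path, root: Path) -> str:
    """Show `path` relative to `root` when possible, else as an absolute string."""
    try:
        return str(path.resolve().relative_to(root.resolve()))
    except ValueError:
        # relative_to() raises ValueError when path is not under root,
        # e.g. an --out-dir outside the repository.
        return str(path.resolve())


def short_job(job_name: str) -> str:
    """Normalize an enum-style job name to the hyphenated CLI spelling."""
    return job_name.lower().replace("_", "-")
```

With these, a job named `LITERATURE_HIGH` yields the suffix `literature-high`, matching the `--job literature-high` alias, and an out-of-repo output directory is printed as an absolute path instead of crashing the dry run.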
just validate-strict: 0 ERROR rows / 15,827 records (no schema or
recipe-data touched).
Summary
Wires a programmatic path to FutureHouse / Edison Scientific deep research over CultureMech media records.

The existing `scripts/research_media.py` wraps `deep-research-client`, but DRC 0.2.4 registers only `openai` and `cyberian` as providers, so Edison / PaperQA isn't accessible through it. The `edison-client` SDK is already in `pyproject` deps; this PR invokes it directly.

New file: `scripts/research_media_edison.py`
- Single recipe: `python scripts/research_media_edison.py --target <slug|id|path>`
- Batch: `python scripts/research_media_edison.py --batch data/import_tracking/reports/edison_batch.json --limit 5`
- `--job literature` (paperqa3, default) | `literature-high` | `precedent` | `phoenix`. Aliases: `paperqa`, `paperqa-high`.
- `--dry-run` renders the query and prints the plan without an API call (no credits).
- `--start` / `--limit` cap or skip into a batch list.
- Auth picks up `EDISON_PLATFORM_API_KEY` (SDK-native) or legacy `EDISON_API_KEY`; auto-loads repo-root `.env` via python-dotenv.
- Reuses `research_media.py`'s `template_vars` / `load_media` / `resolve_media_file` so the rendered query matches the existing DRC workflow.
- Outputs land under `research/media/{slug}-edison-{job}.md` + sibling `-meta.yaml` (task_id, total_cost, status, template_vars).

Batch resolution fix
The 100-recipe `edison_batch.json` priority list was failing to resolve because `research_media.py`'s `resolve_media_file`:
- treats `file_path`s as CWD-relative (not `data/normalized_yaml/`-relative), and
- fails with ValueError when a slug matches multiple files (`dehalospirillum_medium` exists in 5 importer-flavor variants).

The Edison batch resolver tries `data/normalized_yaml/<file_path>` verbatim first, which is unambiguous, then falls back to slug matching. Resolvability is 100% (5/5) in the smoke dry-run.

Justfile targets
- `just research-media-edison <target> [*args]`
- `just research-media-edison-batch <batch.json> [*args]` (always pass `--limit N` on first runs)

Also
- `.env.example` added (gitignored; `.env` already is).
- `data/import_tracking/reports/edison_batch.json` regenerated against current corpus via `analyze_media_quality.py` (the March-vintage file pre-dated the snake_case + orphan-page cleanups; few entries still resolved).

Out of scope here
Live API smoke test deferred until user confirms `.env` has the new LBL key (2.6k credits).

Test plan
- Single target (`--target luria_bertani_lb_medium`)
- Batch (`--limit 5`): 5/5 resolve cleanly
- `just --list` shows both new targets in the Research group
- `just validate-strict` clean (no schema/data touched)

🤖 Generated with Claude Code