add GRPO fine-tuning notebook for JSON invoice extraction using Fireworks Training API by namanvirk18 · Pull Request #243 · patchy631/ai-engineering-hub

namanvirk18 · 2026-06-01T07:08:15Z

Summary by CodeRabbit

New Features
- End-to-end GRPO fine-tuning workflow for structured JSON invoice extraction, including training, evaluation, and demo/run scripts
- Included training (200) and evaluation (50) example prompts plus sample invoices and CLI demo for batch inference
- Automated reward validation utility and notebook with evaluation charts and baseline comparison
Documentation
- Detailed README and skill documentation with setup, environment variables, usage steps, and contribution notes

… Fireworks Training API

coderabbitai · 2026-06-01T07:08:35Z

📝 Walkthrough

Walkthrough

This PR adds an end-to-end GRPO fine-tuning workflow for JSON invoice extraction (notebook, training/eval datasets, reward scorer), plus agent-skill docs, a reward validator, orchestration scripts, a CLI demo, and sample invoice inputs.

Changes

GRPO Training Pipeline

Layer / File(s)	Summary
Project setup & notebook init `grpo-finetuning-qwen3/.env.example`, `grpo-finetuning-qwen3/README.md`, `grpo-finetuning-qwen3/grpo_json_extraction.ipynb` (Step 0)	Environment example entries, README with prerequisites and run instructions, and notebook initialization (dependency install and API-key loading/assertions).
Reward function (score) implementation `grpo-finetuning-qwen3/grpo_json_extraction.ipynb` (Step 1)	JSON parsing + JSON-schema scoring function `score()` with 1.0/0.5/0.0 semantics and inline self-tests.
Training & evaluation datasets `grpo-finetuning-qwen3/train_prompts.jsonl`, `grpo-finetuning-qwen3/eval_prompts.jsonl`	200-line JSONL training dataset and 50-line eval JSONL with extraction prompts and ground-truth `vendor/date/amount/currency`.
Dataset upload & baseline evaluation `grpo-finetuning-qwen3/grpo_json_extraction.ipynb` (Steps 2–3)	Upload training dataset to Fireworks (poll until READY) and run baseline `run_eval()` over eval prompts using OpenRouter client and `score()`.
GRPO training execution `grpo-finetuning-qwen3/grpo_json_extraction.ipynb` (Step 4)	Wire and run Fireworks GRPO RL loop with `invoice_reward`, model/dataset/GRPO config, infra/deployment settings, and notebook-compatible asyncio patching.
Post-training evaluation & visualization `grpo-finetuning-qwen3/grpo_json_extraction.ipynb` (Steps 5–7)	Evaluate deployed fine-tuned model, evaluate `openai/gpt-4.1`, compute accuracies, generate `eval_chart.png`, and run single inference example with schema validation.

Agent Skill & Pipeline

Layer / File(s)	Summary
Skill documentation & demo `grpo-finetuning-qwen3/agent-skill/grpo-finetune/SKILL.md`, `grpo-finetuning-qwen3/agent-skill/grpo-finetune/agent_demo.py`	Skill README with trigger/workflow/commands and a CLI demo that calls a deployment, sanitizes/parses JSON, validates required fields, and prints per-invoice results.
Reward validator `grpo-finetuning-qwen3/agent-skill/grpo-finetune/generate_reward.py`	Validator that dynamically imports `reward.py`, ensures `score(completion, row=None)` exists, probes outputs are numeric ∈ [0,1], and optionally runs `SELF_TESTS`, returning process exit codes.
Run pipeline orchestration `grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py`	CLI orchestration: validate reward, upload or reuse dataset, run GRPO training via RL-loop wiring, poll readiness, evaluate deployment with `run_eval`, and optionally execute local demo using `invoices.txt`.
Demo inputs `grpo-finetuning-qwen3/invoices.txt`	Five sample invoice lines used for demo/inference runs.

Sequence Diagram

sequenceDiagram
  participant Notebook
  participant Fireworks as Fireworks (Training & Inference)
  participant OpenRouter as OpenRouter (Inference)
  participant Local as Local (Reward Scoring)
  
  Notebook->>Fireworks: Upload train_prompts.jsonl dataset
  Fireworks-->>Notebook: Dataset ready
  
  Notebook->>OpenRouter: Baseline eval on eval_prompts
  OpenRouter-->>Notebook: Base model outputs
  Notebook->>Local: Score outputs with reward function
  Local-->>Notebook: Schema-valid accuracy
  
  Notebook->>Fireworks: Execute GRPO training loop
  Fireworks->>Fireworks: Train with invoice_reward function
  Fireworks-->>Notebook: Deployment endpoint ready
  
  Notebook->>Fireworks: Eval fine-tuned model
  Fireworks-->>Notebook: Fine-tuned outputs
  Notebook->>Local: Score outputs
  Local-->>Notebook: Fine-tuned accuracy
  
  Notebook->>OpenRouter: Eval GPT-4.1 baseline
  OpenRouter-->>Notebook: GPT-4.1 outputs
  Notebook->>Local: Score outputs
  Local-->>Notebook: GPT-4.1 accuracy
  
  Notebook->>Notebook: Generate & save eval_chart.png

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through prompts and JSON lines,
Reward tests, uploads, and training times,
Baseline, tuned, and GPT side-by-side,
Charts saved, demos run, all verified,
A tiny rabbit cheers this pipeline's signs.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately summarizes the main addition: a GRPO fine-tuning notebook for JSON invoice extraction using the Fireworks Training API, which aligns with the primary file addition (grpo_json_extraction.ipynb) and core objective.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@grpo-finetuning-qwen3/eval_prompts.jsonl`:
- Around line 1-50: The eval_prompts.jsonl file contains corrupted ground_truth
labels that don't match the prompt text inside each "messages" array (e.g.,
amounts/currencies in the "user" content disagree with the "ground_truth"
object); update each record's ground_truth fields (vendor, date, amount,
currency) to exactly match the invoice text in the corresponding "messages" ->
user content (fix entries like the first record where "654 GBP" should be
amount: 654 and currency: "GBP", the sixth record that incorrectly adds "GBP",
and the 14th that lists 130 JPY but has 130.5), or remove/regenerate any rows
where you cannot confidently determine the correct label; ensure every JSON
object’s "ground_truth" matches the parsed values from its "messages" content
before running run_eval().

In `@grpo-finetuning-qwen3/grpo_json_extraction.ipynb`:
- Around line 564-567: The reward function invoice_reward currently ignores the
ground-truth row and only calls score(completion), so the policy can game output
shape; update invoice_reward to parse the completion JSON, extract fields
(vendor, date, amount, currency), compare them against the corresponding values
in row (normalizing formats for date/amount/currency), compute a numeric reward
that gives full credit for exact matches and partial credit for close/normalized
matches and penalties for missing/invalid fields, and return that float; replace
the current rl_loop.reward_fn binding with this new invoice_reward and ensure
parsing errors return a low reward rather than a crash.
- Around line 257-288: run_eval currently only measures schema validity (via
score/content) and never uses entry["ground_truth"]; update run_eval to compute
task accuracy against the true labels by parsing the model output JSON and
comparing it to entry["ground_truth"]. Specifically, inside run_eval (and where
you call score), first validate/parse content into a dict (handle the existing
</think> strip and JSON exceptions), then call a new or extended comparator
(e.g., score or compare_to_ground_truth) that accepts (parsed_output,
entry["ground_truth"]) and returns 1.0 for a correct match and 0.0 otherwise;
append that result to scores, keep schema-valid logging if desired, and change
the printed/returned accuracy to be based on matches to entry["ground_truth"]
rather than just schema validity. Ensure you reference run_eval, eval_data,
entry["ground_truth"], and score/compare_to_ground_truth when implementing.
- Around line 548-552: The notebook globally replaces asyncio.run with
_patched_run but never restores it if main(cfg) raises; wrap the patch and the
call to main(cfg) in a try/finally: save the original in _original_run, assign
asyncio.run = _patched_run, call main(cfg) inside try, and always restore
asyncio.run = _original_run in the finally block (referencing the existing
symbols _original_run, _patched_run, asyncio.run and the main(cfg) invocation)
so later cells see the original asyncio.run regardless of exceptions.
- Around line 84-88: The code currently assigns os.environ["FIREWORKS_API_KEY"]
= FIREWORKS_KEY before checking FIREWORKS_KEY is non-None, which can raise
TypeError; move or replace the assertion so that assert FIREWORKS_KEY, "Missing
FIREWORKS_API_KEY" (and similar asserts for OPENROUTER_KEY and ACCOUNT_ID) run
before any os.environ[...] = ... assignments, and ensure the checks verify the
values are non-empty strings (not None) before injecting them into environment
variables (refer to FIREWORKS_KEY, OPENROUTER_KEY, ACCOUNT_ID and the
os.environ[...] assignments).

In `@grpo-finetuning-qwen3/README.md`:
- Around line 19-40: Update the README to explicitly instruct users to start
Jupyter (or run the notebook) from the grpo-finetuning-qwen3 directory so the
notebook’s relative imports resolve; mention that grpo_json_extraction.ipynb
expects process-relative paths to ./cookbook/training, ./train_prompts.jsonl,
and ./eval_prompts.jsonl and provide the exact command to change into the
directory (e.g., cd grpo-finetuning-qwen3) or an alternative note about
launching Jupyter from the repo root after adjusting those paths in the
notebook.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef7d615e-aabd-43f1-be50-ed875fc1a589

📥 Commits

Reviewing files that changed from the base of the PR and between 42eedf8 and 0d30d20.

⛔ Files ignored due to path filters (1)

grpo-finetuning-qwen3/eval_chart.png is excluded by !**/*.png

📒 Files selected for processing (5)

grpo-finetuning-qwen3/.env.example
grpo-finetuning-qwen3/README.md
grpo-finetuning-qwen3/eval_prompts.jsonl
grpo-finetuning-qwen3/grpo_json_extraction.ipynb
grpo-finetuning-qwen3/train_prompts.jsonl

coderabbitai · 2026-06-01T07:16:01Z

+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Please find invoice from Acme Corp dated April 15, 2024. Grand total: 654 GBP.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Acme Corp", "date": "2024-04-15", "amount": 653.75, "currency": "GBP"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Outstanding payment to Crestline Distributors, due as of 05/02/2024, totalling \u20ac7,999.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Crestline Distributors", "date": "2024-05-02", "amount": 7999.0, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Received invoice from Redwood Staffing Inc on 09/17/2024. Amount due: 3,696 GBP.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Redwood Staffing Inc", "date": "2024-09-17", "amount": 3695.99, "currency": "GBP"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice details: Supplier \u2014 Nexus Solutions Pvt Ltd. Invoice date: Jan 10 2023. Total payable: $69,552.50.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Nexus Solutions Pvt Ltd", "date": "2023-01-10", "amount": 69552.5, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Please find invoice from Global Logistics Ltd dated Apr 6 2023. Grand total: $1,043.75.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Global Logistics Ltd", "date": "2023-04-06", "amount": 1043.75, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Accounts payable entry \u2014 Nexus Solutions Pvt Ltd, 16/10/2023, 16731.25.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Nexus Solutions Pvt Ltd", "date": "2023-10-16", "amount": 16731.25, "currency": "GBP"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Bill from Sterling Parts & Co, dated 22/02/2023, total 83,860 GBP.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Sterling Parts & Co", "date": "2023-02-22", "amount": 83860.0, "currency": "GBP"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Received invoice from Ironclad Manufacturing on 18/12/2024. Amount due: 825.99.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Ironclad Manufacturing", "date": "2024-12-18", "amount": 825.99, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice\nFrom: Clearwater Utilities\nIssued: Feb 7 2023\nAmount owed: 9,094 EUR\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Clearwater Utilities", "date": "2023-02-07", "amount": 9094.5, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Received invoice from Sterling Parts & Co on Dec 7 2023. Amount due: $55,158.00 USD.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Sterling Parts & Co", "date": "2023-12-07", "amount": 55158.0, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Please find invoice from Pinnacle Hardware Co dated 04/23/2023. Grand total: 78,517 JPY.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Pinnacle Hardware Co", "date": "2023-04-23", "amount": 78517.0, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "We have received a bill from Global Logistics Ltd. Date of service: 2023-02-05. Invoice total: 3,621.50.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Global Logistics Ltd", "date": "2023-02-05", "amount": 3621.5, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Outstanding payment to Apex Print Works, due as of 1 October 2024, totalling 69.50.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Apex Print Works", "date": "2024-10-01", "amount": 69.5, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Received invoice from Clearwater Utilities on 02/24/2024. Amount due: 130 JPY.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Clearwater Utilities", "date": "2024-02-24", "amount": 130.5, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Vendor: BlueSky Rentals\nDate: 10/06/2024\nTotal: \u00a3890.75\n\nExtract the fields and return valid JSON only."}], "ground_truth": {"vendor": "BlueSky Rentals", "date": "2024-06-10", "amount": 890.75, "currency": "GBP"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "We have received a bill from Apex Print Works. Date of service: 12/08/2023. Invoice total: 392 JPY.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Apex Print Works", "date": "2023-08-12", "amount": 391.99, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Payment request from Bright Media LLC, issue date 12/06/2023, for the amount of 580.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Bright Media LLC", "date": "2023-12-06", "amount": 580.0, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Accounts payable entry \u2014 Nexus Solutions Pvt Ltd, 10/23/2024, \u20ac6,147.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Nexus Solutions Pvt Ltd", "date": "2024-10-23", "amount": 6147.0, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Bill from Apex Print Works, dated 2024-06-25, total 7,211.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Apex Print Works", "date": "2024-06-25", "amount": 7211.0, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Vendor: Apex Print Works\nDate: 9 April 2023\nTotal: \u00a347,880.00\n\nExtract the fields and return valid JSON only."}], "ground_truth": {"vendor": "Apex Print Works", "date": "2023-04-09", "amount": 47880.0, "currency": "GBP"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Bill from Atlas Freight Services, dated 2023-04-10, total 85,914 JPY.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Atlas Freight Services", "date": "2023-04-10", "amount": 85914.5, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Received invoice from Pinnacle Hardware Co on December 19, 2023. Amount due: EUR 8,744.99.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Pinnacle Hardware Co", "date": "2023-12-19", "amount": 8744.99, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Vendor: BlueSky Rentals\nDate: 09/13/2024\nTotal: 309.00 JPY\n\nExtract the fields and return valid JSON only."}], "ground_truth": {"vendor": "BlueSky Rentals", "date": "2024-09-13", "amount": 309.0, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice\nFrom: Nexus Solutions Pvt Ltd\nIssued: 09/17/2024\nAmount owed: 215.99 JPY\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Nexus Solutions Pvt Ltd", "date": "2024-09-17", "amount": 215.99, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Please find invoice from Ironclad Manufacturing dated 2023-07-28. Grand total: $8,138.50.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Ironclad Manufacturing", "date": "2023-07-28", "amount": 8138.5, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice details: Supplier \u2014 Cascade Engineering. Invoice date: May 21, 2023. Total payable: 18,142 GBP.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Cascade Engineering", "date": "2023-05-21", "amount": 18142.0, "currency": "GBP"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Accounts payable entry \u2014 Cascade Engineering, Jan 3 2023, $706.99.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Cascade Engineering", "date": "2023-01-03", "amount": 706.99, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Payment request from Acme Corp, issue date July 2, 2024, for the amount of 6,927.75 EUR.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Acme Corp", "date": "2024-07-02", "amount": 6927.75, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "We have received a bill from BlueSky Rentals. Date of service: Jan 10 2024. Invoice total: JPY 447.50.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "BlueSky Rentals", "date": "2024-01-10", "amount": 447.5, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Payment request from Global Logistics Ltd, issue date 04/21/2024, for the amount of EUR 7,482.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Global Logistics Ltd", "date": "2024-04-21", "amount": 7482.0, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice details: Supplier \u2014 Ironclad Manufacturing. Invoice date: 4 April 2023. Total payable: 32,740.00 EUR.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Ironclad Manufacturing", "date": "2023-04-04", "amount": 32740.0, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Payment request from Acme Corp, issue date 10/24/2023, for the amount of JPY 40,162.99.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Acme Corp", "date": "2023-10-24", "amount": 40162.99, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Accounts payable entry \u2014 Horizon Digital Agency, 10/09/2024, 93,238 USD.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Horizon Digital Agency", "date": "2024-09-10", "amount": 93238.0, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Vendor: Meridian Office Supplies\nDate: 09/14/2023\nTotal: JPY 9,683.00\n\nExtract the fields and return valid JSON only."}], "ground_truth": {"vendor": "Meridian Office Supplies", "date": "2023-09-14", "amount": 9683.0, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Received invoice from Ironclad Manufacturing on 28 March 2023. Amount due: 629.75 EUR.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Ironclad Manufacturing", "date": "2023-03-28", "amount": 629.75, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice\nFrom: Cascade Engineering\nIssued: 21/12/2024\nAmount owed: 346.00\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Cascade Engineering", "date": "2024-12-21", "amount": 346.0, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice details: Supplier \u2014 Meridian Office Supplies. Invoice date: 12/10/2023. Total payable: $6,314.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Meridian Office Supplies", "date": "2023-12-10", "amount": 6314.0, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice details: Supplier \u2014 Global Logistics Ltd. Invoice date: February 17, 2024. Total payable: USD 727.50.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Global Logistics Ltd", "date": "2024-02-17", "amount": 727.5, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Received invoice from Vortex Software Ltd on Nov 4 2024. Amount due: JPY 779.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Vortex Software Ltd", "date": "2024-11-04", "amount": 779.0, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "We have received a bill from Ironclad Manufacturing. Date of service: Dec 28 2023. Invoice total: USD 2,374.25.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Ironclad Manufacturing", "date": "2023-12-28", "amount": 2374.25, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "We have received a bill from TechSupplies Inc. Date of service: 08/04/2024. Invoice total: 9807.25.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "TechSupplies Inc", "date": "2024-08-04", "amount": 9807.25, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Payment request from Summit Consulting Group, issue date 2023-12-10, for the amount of GBP 67,422.25.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Summit Consulting Group", "date": "2023-12-10", "amount": 67422.25, "currency": "GBP"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Accounts payable entry \u2014 Clearwater Utilities, 9 February 2023, 398 JPY.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Clearwater Utilities", "date": "2023-02-09", "amount": 398.0, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice details: Supplier \u2014 Cascade Engineering. Invoice date: September 24, 2023. Total payable: $985.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Cascade Engineering", "date": "2023-09-24", "amount": 985.0, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Bill from Orbit Telecom LLC, dated 11/25/2024, total \u20ac92,242.00.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Orbit Telecom LLC", "date": "2024-11-25", "amount": 92242.0, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Payment request from Meridian Office Supplies, issue date May 15 2023, for the amount of \u20ac2,586.25.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Meridian Office Supplies", "date": "2023-05-15", "amount": 2586.25, "currency": "EUR"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice details: Supplier \u2014 Nexus Solutions Pvt Ltd. Invoice date: April 22, 2024. Total payable: 56357.99.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Nexus Solutions Pvt Ltd", "date": "2024-04-22", "amount": 56357.99, "currency": "JPY"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Invoice details: Supplier \u2014 Summit Consulting Group. Invoice date: 2024-01-05. Total payable: 7001.50.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Summit Consulting Group", "date": "2024-01-05", "amount": 7001.5, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Payment request from Clearwater Utilities, issue date 2023-06-08, for the amount of USD 715.99.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Clearwater Utilities", "date": "2023-06-08", "amount": 715.99, "currency": "USD"}}
+{"messages": [{"role": "system", "content": "Extract the following fields from this invoice: vendor, date, amount, currency."}, {"role": "user", "content": "Received invoice from Apex Print Works on 20 March 2024. Amount due: 61,645.50 EUR.\n\nReturn valid JSON only."}], "ground_truth": {"vendor": "Apex Print Works", "date": "2024-03-20", "amount": 61645.5, "currency": "EUR"}}


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Several eval labels do not match the prompt text.

This held-out set has corrupted targets. For example, Line 1 says 654 GBP but ground_truth.amount is 653.75; Line 6 invents GBP even though the prompt gives no currency; Line 14 says 130 JPY but the label is 130.5. Once run_eval() starts comparing against ground_truth, these rows will penalize correct extraction.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/eval_prompts.jsonl` around lines 1 - 50, The eval_prompts.jsonl file contains corrupted ground_truth labels that don't match the prompt text inside each "messages" array (e.g., amounts/currencies in the "user" content disagree with the "ground_truth" object); update each record's ground_truth fields (vendor, date, amount, currency) to exactly match the invoice text in the corresponding "messages" -> user content (fix entries like the first record where "654 GBP" should be amount: 654 and currency: "GBP", the sixth record that incorrectly adds "GBP", and the 14th that lists 130 JPY but has 130.5), or remove/regenerate any rows where you cannot confidently determine the correct label; ensure every JSON object’s "ground_truth" matches the parsed values from its "messages" content before running run_eval().

coderabbitai · 2026-06-01T07:16:01Z

+    "os.environ[\"FIREWORKS_API_KEY\"] = FIREWORKS_KEY\n",
+    "\n",
+    "assert FIREWORKS_KEY,  \"Missing FIREWORKS_API_KEY\"\n",
+    "assert OPENROUTER_KEY, \"Missing OPENROUTER_API_KEY\"\n",
+    "assert ACCOUNT_ID,     \"Missing FIREWORKS_ACCOUNT_ID\"\n",


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash python - <<'PY' import os try: os.environ["FIREWORKS_API_KEY"] = None except Exception as exc: print(type(exc).__name__) print(exc) PY

Repository: patchy631/ai-engineering-hub

Length of output: 109

Assert FIREWORKS_KEY is a non-None string before assigning os.environ["FIREWORKS_API_KEY"]

When FIREWORKS_API_KEY is unset, FIREWORKS_KEY becomes None; writing it into os.environ throws TypeError: str expected, not NoneType before the notebook reaches assert FIREWORKS_KEY, "Missing FIREWORKS_API_KEY". Move/replace the assert so it runs before the os.environ[...] = FIREWORKS_KEY assignment (and similarly ensure other injected env values are not None).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/grpo_json_extraction.ipynb` around lines 84 - 88, The code currently assigns os.environ["FIREWORKS_API_KEY"] = FIREWORKS_KEY before checking FIREWORKS_KEY is non-None, which can raise TypeError; move or replace the assertion so that assert FIREWORKS_KEY, "Missing FIREWORKS_API_KEY" (and similar asserts for OPENROUTER_KEY and ACCOUNT_ID) run before any os.environ[...] = ... assignments, and ensure the checks verify the values are non-empty strings (not None) before injecting them into environment variables (refer to FIREWORKS_KEY, OPENROUTER_KEY, ACCOUNT_ID and the os.environ[...] assignments).

coderabbitai · 2026-06-01T07:16:01Z

+    "def run_eval(client, model_id: str, label: str) -> float:\n",
+    "    scores = []\n",
+    "    for i, entry in enumerate(eval_data):\n",
+    "        messages = [m for m in entry[\"messages\"] if m[\"role\"] != \"assistant\"]\n",
+    "        # Suppress thinking mode for structured output\n",
+    "        messages = [\n",
+    "            {**m, \"content\": m[\"content\"] + \" /no-think\"}\n",
+    "            if m[\"role\"] == \"system\" else m\n",
+    "            for m in messages\n",
+    "        ]\n",
+    "        try:\n",
+    "            response = client.chat.completions.create(\n",
+    "                model=model_id,\n",
+    "                messages=messages,\n",
+    "                temperature=0.0,\n",
+    "                max_tokens=512,\n",
+    "            )\n",
+    "            content = response.choices[0].message.content\n",
+    "            if \"</think>\" in content:\n",
+    "                content = content.split(\"</think>\")[-1].strip()\n",
+    "            s = score(content)\n",
+    "        except Exception as e:\n",
+    "            print(f\"  Error on prompt {i}: {e}\")\n",
+    "            s = 0.0\n",
+    "        scores.append(s)\n",
+    "        if (i + 1) % 10 == 0:\n",
+    "            acc = sum(1 for s in scores if s == 1.0) / len(scores)\n",
+    "            print(f\"  [{label}] {i+1}/{len(eval_data)} running: {acc:.1%}\")\n",
+    "\n",
+    "    accuracy = sum(1 for s in scores if s == 1.0) / len(scores)\n",
+    "    print(f\"\\n[{label}] Schema-valid: {accuracy:.1%} ({sum(1 for s in scores if s == 1.0)}/{len(scores)})\")\n",
+    "    return accuracy\n",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use ground_truth for task accuracy, not just schema validity.

run_eval() never reads entry["ground_truth"], so the reported 62%/82% numbers only measure whether the output is valid JSON for the schema. A model can return the same well-formed placeholder object for every prompt and still score perfectly here.

📌 Suggested direction

def run_eval(client, model_id: str, label: str) -> float: - scores = [] + schema_scores = [] + exact_matches = 0 for i, entry in enumerate(eval_data): messages = [m for m in entry["messages"] if m["role"] != "assistant"] # Suppress thinking mode for structured output messages = [ {**m, "content": m["content"] + " /no-think"} if m["role"] == "system" else m for m in messages ] try: response = client.chat.completions.create( model=model_id, messages=messages, temperature=0.0, max_tokens=512, ) content = response.choices[0].message.content if "</think>" in content: content = content.split("</think>")[-1].strip() - s = score(content) + s = score(content) + if s == 1.0 and json.loads(content) == entry["ground_truth"]: + exact_matches += 1 except Exception as e: print(f" Error on prompt {i}: {e}") s = 0.0 - scores.append(s) + schema_scores.append(s) if (i + 1) % 10 == 0: - acc = sum(1 for s in scores if s == 1.0) / len(scores) - print(f" [{label}] {i+1}/{len(eval_data)} running: {acc:.1%}") + acc = exact_matches / len(schema_scores) + print(f" [{label}] {i+1}/{len(eval_data)} running exact-match: {acc:.1%}") - accuracy = sum(1 for s in scores if s == 1.0) / len(scores) - print(f"\n[{label}] Schema-valid: {accuracy:.1%} ({sum(1 for s in scores if s == 1.0)}/{len(scores)})") - return accuracy + schema_valid = sum(1 for s in schema_scores if s == 1.0) / len(schema_scores) + exact_accuracy = exact_matches / len(schema_scores) + print(f"\n[{label}] Schema-valid: {schema_valid:.1%}") + print(f"[{label}] Exact-match: {exact_accuracy:.1%} ({exact_matches}/{len(schema_scores)})") + return exact_accuracy

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/grpo_json_extraction.ipynb` around lines 257 - 288, run_eval currently only measures schema validity (via score/content) and never uses entry["ground_truth"]; update run_eval to compute task accuracy against the true labels by parsing the model output JSON and comparing it to entry["ground_truth"]. Specifically, inside run_eval (and where you call score), first validate/parse content into a dict (handle the existing </think> strip and JSON exceptions), then call a new or extended comparator (e.g., score or compare_to_ground_truth) that accepts (parsed_output, entry["ground_truth"]) and returns 1.0 for a correct match and 0.0 otherwise; append that result to scores, keep schema-valid logging if desired, and change the printed/returned accuracy to be based on matches to entry["ground_truth"] rather than just schema validity. Ensure you reference run_eval, eval_data, entry["ground_truth"], and score/compare_to_ground_truth when implementing.

coderabbitai · 2026-06-01T07:16:01Z

+    "_original_run = asyncio.run\n",
+    "def _patched_run(coro, **kwargs):\n",
+    "    loop = asyncio.get_event_loop()\n",
+    "    return loop.run_until_complete(coro)\n",
+    "asyncio.run = _patched_run\n",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore asyncio.run in a finally block.

If main(cfg) throws, Line 601 never executes and the notebook leaves a globally patched asyncio.run behind for later cells. That makes subsequent failures hard to reason about.

#!/bin/bash python - <<'PY' import json from pathlib import Path nb = json.loads(Path("grpo-finetuning-qwen3/grpo_json_extraction.ipynb").read_text()) for cell in nb["cells"]: src = "".join(cell.get("source", [])) if "_original_run = asyncio.run" in src: print(src) break PY

Also applies to: 599-601

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/grpo_json_extraction.ipynb` around lines 548 - 552, The notebook globally replaces asyncio.run with _patched_run but never restores it if main(cfg) raises; wrap the patch and the call to main(cfg) in a try/finally: save the original in _original_run, assign asyncio.run = _patched_run, call main(cfg) inside try, and always restore asyncio.run = _original_run in the finally block (referencing the existing symbols _original_run, _patched_run, asyncio.run and the main(cfg) invocation) so later cells see the original asyncio.run regardless of exceptions.

coderabbitai · 2026-06-01T07:16:01Z

+    "def invoice_reward(completion: str, row: dict) -> float:\n",
+    "    return score(completion)\n",
+    "\n",
+    "rl_loop.reward_fn = invoice_reward\n",


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The GRPO reward never checks extraction correctness.

invoice_reward() ignores row and only rewards schema-conforming JSON. Combined with the current training rows, that means the policy can optimize for output shape alone instead of learning to extract the vendor, date, amount, and currency from the invoice text.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/grpo_json_extraction.ipynb` around lines 564 - 567, The reward function invoice_reward currently ignores the ground-truth row and only calls score(completion), so the policy can game output shape; update invoice_reward to parse the completion JSON, extract fields (vendor, date, amount, currency), compare them against the corresponding values in row (normalizing formats for date/amount/currency), compute a numeric reward that gives full credit for exact matches and partial credit for close/normalized matches and penalties for missing/invalid fields, and return that float; replace the current rl_loop.reward_fn binding with this new invoice_reward and ensure parsing errors return a low reward rather than a crash.

coderabbitai · 2026-06-01T07:16:01Z

+**Clone the Fireworks cookbook**:
+```bash
+git clone https://github.com/fw-ai/cookbook.git
+```
+
+**Install Dependencies**:
+
+Ensure you have Python 3.10 or later installed.
+
+```bash
+uv venv
+source .venv/bin/activate
+uv pip install python-dotenv jsonschema openai fireworks-ai matplotlib
+uv pip install -e "cookbook/training[training]"
+uv pip install eval-protocol nest_asyncio
+```
+
+Select the virtual environment as the kernel in the notebook.
+
+**Run the notebook**:
+
+Open and run `grpo_json_extraction.ipynb` end-to-end. The notebook covers:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Document the required notebook working directory.

These steps never say to start Jupyter from grpo-finetuning-qwen3, but the notebook resolves ./cookbook/training, ./train_prompts.jsonl, and ./eval_prompts.jsonl with process-relative paths. Launching the notebook from the repo root will make the setup and data-loading cells fail immediately.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/README.md` around lines 19 - 40, Update the README to explicitly instruct users to start Jupyter (or run the notebook) from the grpo-finetuning-qwen3 directory so the notebook’s relative imports resolve; mention that grpo_json_extraction.ipynb expects process-relative paths to ./cookbook/training, ./train_prompts.jsonl, and ./eval_prompts.jsonl and provide the exact command to change into the directory (e.g., cd grpo-finetuning-qwen3) or an alternative note about launching Jupyter from the repo root after adjusting those paths in the notebook.

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

grpo-finetuning-qwen3/README.md (1)

19-22: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Specify the cookbook clone location relative to the working directory.

Line 21 clones the cookbook repository, but doesn't specify where to clone it. Line 32 then references cookbook/training[training], assuming the cookbook directory exists in the current working directory.

If users clone the cookbook while in the repository root but run the notebook from grpo-finetuning-qwen3/, the relative path ./cookbook/training in the notebook won't resolve correctly.
📝 Suggested fix

Either instruct users to clone into the grpo-finetuning-qwen3 directory, or adjust the install command to reference the correct relative path:

Option 1: Clone inside grpo-finetuning-qwen3
 **Clone the Fireworks cookbook**:
 ```bash
+cd grpo-finetuning-qwen3
 git clone https://github.com/fw-ai/cookbook.git
**Option 2: Adjust the pip install path**
```diff
 ```bash
 uv venv
 source .venv/bin/activate
 uv pip install python-dotenv jsonschema openai fireworks-ai matplotlib
-uv pip install -e "cookbook/training[training]"
+uv pip install -e "../cookbook/training[training]"
 uv pip install eval-protocol nest_asyncio
</details>


Also applies to: 32-32

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @grpo-finetuning-qwen3/README.md around lines 19 - 22, The README instructs
cloning the cookbook without a specific target, causing the later pip install -e
"cookbook/training[training]" to fail if the repo is cloned elsewhere; update
instructions to either (A) tell users to cd into grpo-finetuning-qwen3 before
cloning so the cookbook lands at ./cookbook (e.g., run cd grpo-finetuning-qwen3
then git clone https://github.com/fw-ai/cookbook.git), or (B) change the
editable install path in the setup commands to point to the parent folder where
users are likely to clone (replace pip install -e "cookbook/training[training]"
with pip install -e "../cookbook/training[training]"); ensure the README clearly
states which option you choose so the path used by the pip install matches where
the cookbook was cloned.
</details>

</blockquote></details>

</blockquote></details>

♻️ Duplicate comments (1)

grpo-finetuning-qwen3/README.md (1)
19-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clarify the working directory for setup and notebook execution.

The setup steps don't specify where to run these commands. The notebook resolves paths like ./train_prompts.jsonl, ./eval_prompts.jsonl, and ./cookbook/training relative to the process working directory, but the instructions never tell users to cd grpo-finetuning-qwen3 first.

Without this clarification, users starting from the repository root will encounter import failures and missing file errors when running the notebook.
📝 Suggested fix

Add an explicit working directory instruction before the setup steps:
 ## Setup and installations
 
+**Navigate to the project directory**:
+```bash
+cd grpo-finetuning-qwen3
+```
+
 **Get API Keys**:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grpo-finetuning-qwen3/README.md` around lines 19 - 40, The README lacks an
explicit working-directory instruction so running the setup and the notebook
(which references files like ./train_prompts.jsonl, ./eval_prompts.jsonl and the
./cookbook/training package) from the repository root will fail; update
README.md to instruct users to first change into the project folder (e.g., cd
grpo-finetuning-qwen3) before running the listed setup commands and launching
grpo_json_extraction.ipynb so the relative paths and imports resolve correctly.

🧹 Nitpick comments (1)

grpo-finetuning-qwen3/README.md (1)

66-71: 💤 Low value

Consider providing a more concrete example for the --output-id parameter.

Line 70 uses <your-model-id> as a placeholder, which requires users to invent their own identifier. Providing a concrete example (e.g., invoice-extraction-v1) would make it easier for users to understand the expected format and get started quickly.

📝 Suggested enhancement

 python agent-skill/grpo-finetune/run_pipeline.py \
     --train ./train_prompts.jsonl \
     --eval  ./eval_prompts.jsonl \
     --task  invoice-extraction \
-    --output-id <your-model-id>
+    --output-id invoice-extraction-v1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grpo-finetuning-qwen3/README.md` around lines 66 - 71, Replace the
placeholder "<your-model-id>" in the example command that runs
agent-skill/grpo-finetune/run_pipeline.py with a concrete example model id
(e.g., "invoice-extraction-v1") so users see the expected format for the
--output-id parameter; update the README example invocation to: python
agent-skill/grpo-finetune/run_pipeline.py --train ./train_prompts.jsonl --eval
./eval_prompts.jsonl --task invoice-extraction --output-id invoice-extraction-v1
and keep the placeholder mention elsewhere if you still want to show users they
can choose their own id.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/agent_demo.py`:
- Around line 97-132: The loop can divide by zero when raw_docs is empty (total
== 0); update the logic after reading raw_docs (where total and passed are set)
to check if total == 0 and return/exit early with a clear message instead of
proceeding to the loop and final percentage computation—e.g., detect total == 0
and print a user-friendly "no documents found" message (referencing raw_docs,
total, passed, and filepath) and skip the pct calculation and final summary so
pct = round(passed / total * 100) is never executed on zero.
- Around line 83-84: The current validate(result: dict) uses REQUIRED_FIELDS and
only checks for non-empty values, so replace its body to call the canonical
schema validator used by the training/eval pipeline (i.e., reuse the same schema
validation function/class used for training/eval rather than ad-hoc
REQUIRED_FIELDS checks), import that validator into this module, and ensure
validate(result) returns the validator's boolean result so numeric 0 is allowed,
types are enforced, and extra/missing fields are handled consistently with the
reward contract.

In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/generate_reward.py`:
- Around line 43-53: The SELF_TESTS loop can raise on malformed entries or when
mod.score(...) throws, so change the loop over getattr(mod, "SELF_TESTS", []) in
generate_reward.py to catch exceptions per-test: inside the for i, (completion,
row, expected) in enumerate(...) wrap the unpack and call to
mod.score(completion, row) in a try/except Exception as e, mark the test as
failed (increment failures), and print a clear failure line that includes the
exception message/type instead of letting it traceback; keep the existing
numeric tolerance check when score returns normally and ensure the existing
failures check (printing "FAIL ... self-test(s) failed" and returning 1) still
runs after the loop.

In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py`:
- Around line 55-61: The completion content handling currently only strips
"</think>" before scoring; update the logic in run_pipeline.py where the
variable content is prepared (after resp.choices[0].message.content and before
calling score_fn) to also remove Markdown fenced code blocks by trimming a
leading triple-backtick fence (optionally followed by "json") and a trailing
triple-backtick from content.strip(), then pass the cleaned string to score_fn
so fenced JSON is scored correctly; refer to the variables/content handling
around model_id, resp, content and the call to score_fn when making the change.
- Around line 181-184: The subprocess call uses a hard-coded relative path;
change it to resolve agent_demo.py relative to this module's file: import Path
from pathlib, compute script_path = Path(__file__).resolve().parent /
"agent_demo.py" (since run_pipeline.py and agent_demo.py are siblings), and pass
str(script_path) to subprocess.run instead of the literal
"agent-skill/grpo-finetune/agent_demo.py"; keep the existing argv list
(sys.executable, str(invoices_path), "--deployment", deployment).
- Around line 107-124: The try/except around fw.datasets.get(...) currently
catches all exceptions and wrongly treats any error as "dataset not found";
change the except Exception to catch only the specific fireworks.NotFoundError
(e.g., except fireworks.NotFoundError as e:) so you only fall back to
fw.datasets.create(...) and fw.datasets.upload(...) when the SDK reports a 404,
and for any other exception re-raise or log and propagate it instead of
proceeding with create/upload; update references around fw.datasets.get,
fw.datasets.create, fw.datasets.upload and ensure the NotFoundError import or
fully-qualified name is used.

In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/SKILL.md`:
- Around line 94-95: The statement "The pipeline automatically runs the agent
demo on sample invoices at the end." is inaccurate; update the doc to state the
demo is conditional: describe that run_pipeline.py calls agent_demo.py only if
the invoices file exists and otherwise prints that the demo is skipped.
Reference the filenames run_pipeline.py and agent_demo.py and mention the
invoices file existence check so readers understand the exact conditional
behavior.

---

Outside diff comments:
In `@grpo-finetuning-qwen3/README.md`:
- Around line 19-22: The README instructs cloning the cookbook without a
specific target, causing the later pip install -e "cookbook/training[training]"
to fail if the repo is cloned elsewhere; update instructions to either (A) tell
users to cd into grpo-finetuning-qwen3 before cloning so the cookbook lands at
./cookbook (e.g., run cd grpo-finetuning-qwen3 then git clone
https://github.com/fw-ai/cookbook.git), or (B) change the editable install path
in the setup commands to point to the parent folder where users are likely to
clone (replace pip install -e "cookbook/training[training]" with pip install -e
"../cookbook/training[training]"); ensure the README clearly states which option
you choose so the path used by the pip install matches where the cookbook was
cloned.

---

Duplicate comments:
In `@grpo-finetuning-qwen3/README.md`:
- Around line 19-40: The README lacks an explicit working-directory instruction
so running the setup and the notebook (which references files like
./train_prompts.jsonl, ./eval_prompts.jsonl and the ./cookbook/training package)
from the repository root will fail; update README.md to instruct users to first
change into the project folder (e.g., cd grpo-finetuning-qwen3) before running
the listed setup commands and launching grpo_json_extraction.ipynb so the
relative paths and imports resolve correctly.

---

Nitpick comments:
In `@grpo-finetuning-qwen3/README.md`:
- Around line 66-71: Replace the placeholder "<your-model-id>" in the example
command that runs agent-skill/grpo-finetune/run_pipeline.py with a concrete
example model id (e.g., "invoice-extraction-v1") so users see the expected
format for the --output-id parameter; update the README example invocation to:
python agent-skill/grpo-finetune/run_pipeline.py --train ./train_prompts.jsonl
--eval ./eval_prompts.jsonl --task invoice-extraction --output-id
invoice-extraction-v1 and keep the placeholder mention elsewhere if you still
want to show users they can choose their own id.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 86f36cd1-c636-4c99-9412-dfcc70a4cbbb

📥 Commits

Reviewing files that changed from the base of the PR and between 0d30d20 and 897d1b3.

📒 Files selected for processing (6)

grpo-finetuning-qwen3/README.md
grpo-finetuning-qwen3/agent-skill/grpo-finetune/SKILL.md
grpo-finetuning-qwen3/agent-skill/grpo-finetune/agent_demo.py
grpo-finetuning-qwen3/agent-skill/grpo-finetune/generate_reward.py
grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py
grpo-finetuning-qwen3/invoices.txt

✅ Files skipped from review due to trivial changes (1)

grpo-finetuning-qwen3/invoices.txt

coderabbitai · 2026-06-03T05:32:39Z

+def validate(result: dict) -> bool:
+    return all(result.get(f) not in (None, "", 0) for f in REQUIRED_FIELDS)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make demo validation match the training/eval schema.

This only checks for non-empty keys, so it can mark wrong types or extra fields as Schema valid, and it rejects amount: 0 even though the notebook schema accepts any JSON number. Reuse the same schema validation here so the demo's pass/fail matches the reward contract.

Suggested fix

import argparse import json import os import re import sys import time from dotenv import load_dotenv +from jsonschema import ValidationError, validate as validate_json from openai import OpenAI REQUIRED_FIELDS = {"vendor", "date", "amount", "currency"} +SCHEMA = { + "type": "object", + "required": ["vendor", "date", "amount", "currency"], + "properties": { + "vendor": {"type": "string"}, + "date": {"type": "string"}, + "amount": {"type": "number"}, + "currency": {"type": "string"}, + }, + "additionalProperties": False, +} @@ def validate(result: dict) -> bool: - return all(result.get(f) not in (None, "", 0) for f in REQUIRED_FIELDS) + try: + validate_json(instance=result, schema=SCHEMA) + return True + except ValidationError: + return False

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/agent_demo.py` around lines 83 - 84, The current validate(result: dict) uses REQUIRED_FIELDS and only checks for non-empty values, so replace its body to call the canonical schema validator used by the training/eval pipeline (i.e., reuse the same schema validation function/class used for training/eval rather than ad-hoc REQUIRED_FIELDS checks), import that validator into this module, and ensure validate(result) returns the validator's boolean result so numeric 0 is allowed, types are enforced, and extra/missing fields are handled consistently with the reward contract.

coderabbitai · 2026-06-03T05:32:39Z

+    with open(filepath) as f:
+        raw_docs = [line.strip() for line in f if line.strip()]
+
+    total  = len(raw_docs)
+    passed = 0
+
+    for i, doc in enumerate(raw_docs, 1):
+        print(f"\n{GRAY}#{i} of {total}{RESET}")
+        print(f"{CYAN}{doc}{RESET}")
+        print(DIVIDER_MID)
+
+        t0 = time.time()
+        try:
+            result  = extract(client, deployment, doc)
+            elapsed = round(time.time() - t0, 2)
+            valid   = validate(result)
+            if valid:
+                passed += 1
+
+            for field in REQUIRED_FIELDS:
+                val = result.get(field, "—")
+                print(f"  {GRAY}{field:<10}{RESET}  {val}")
+
+            print()
+            if valid:
+                print(f"  {GREEN}✓  Schema valid{RESET}    {GRAY}{elapsed}s{RESET}")
+            else:
+                print(f"  {RED}✗  Schema mismatch{RESET}  {GRAY}{elapsed}s{RESET}")
+
+        except Exception as e:
+            print(f"  {RED}✗  Error: {e}{RESET}")
+
+        print(DIVIDER)
+
+    pct = round(passed / total * 100)
+    print(f"\n  {BOLD}Results{RESET}   {GREEN}{passed}/{total} valid{RESET}   Schema match: {GREEN}{pct}%{RESET}\n")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle an empty invoices.txt before computing percentages.

If the file is empty or only whitespace, total becomes 0 and Line 131 raises ZeroDivisionError. Return early with a clear message instead of crashing after the banner.

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 126-126: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/agent_demo.py` around lines 97 - 132, The loop can divide by zero when raw_docs is empty (total == 0); update the logic after reading raw_docs (where total and passed are set) to check if total == 0 and return/exit early with a clear message instead of proceeding to the loop and final percentage computation—e.g., detect total == 0 and print a user-friendly "no documents found" message (referencing raw_docs, total, passed, and filepath) and skip the pct calculation and final summary so pct = round(passed / total * 100) is never executed on zero.

coderabbitai · 2026-06-03T05:32:39Z

+    for i, (completion, row, expected) in enumerate(getattr(mod, "SELF_TESTS", [])):
+        got = mod.score(completion, row)
+        ok = abs(float(got) - float(expected)) < 1e-6
+        print(f"  [{'ok  ' if ok else 'FAIL'}] test {i}: got {got}  expected {expected}")
+        if not ok:
+            failures += 1
+    if failures:
+        print(f"FAIL  {failures} self-test(s) failed")
+        return 1
+    tests = len(getattr(mod, "SELF_TESTS", []))
+    print("PASS" + (f"  {tests} self-tests green" if tests else "  (add SELF_TESTS to pin the task)"))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail self-tests cleanly instead of letting them traceback.

A malformed SELF_TESTS entry, or a score() exception on one of those cases, currently escapes this validator as an unhandled exception. That defeats the point of validating reward.py before GPU spend; this path should print FAIL and return 1 just like the probe checks above.

Suggested fix

failures = 0 - for i, (completion, row, expected) in enumerate(getattr(mod, "SELF_TESTS", [])): - got = mod.score(completion, row) + for i, test_case in enumerate(getattr(mod, "SELF_TESTS", [])): + try: + completion, row, expected = test_case + got = mod.score(completion, row) + except Exception as e: + print(f"FAIL self-test {i} crashed: {e}") + return 1 ok = abs(float(got) - float(expected)) < 1e-6 print(f" [{'ok ' if ok else 'FAIL'}] test {i}: got {got} expected {expected}") if not ok: failures += 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/generate_reward.py` around lines 43 - 53, The SELF_TESTS loop can raise on malformed entries or when mod.score(...) throws, so change the loop over getattr(mod, "SELF_TESTS", []) in generate_reward.py to catch exceptions per-test: inside the for i, (completion, row, expected) in enumerate(...) wrap the unpack and call to mod.score(completion, row) in a try/except Exception as e, mark the test as failed (increment failures), and print a clear failure line that includes the exception message/type instead of letting it traceback; keep the existing numeric tolerance check when score returns normally and ensure the existing failures check (printing "FAIL ... self-test(s) failed" and returning 1) still runs after the loop.

coderabbitai · 2026-06-03T05:32:39Z

+        try:
+            resp = client.chat.completions.create(model=model_id, messages=msgs,
+                                                  temperature=0.0, max_tokens=512)
+            content = resp.choices[0].message.content
+            if "</think>" in content:
+                content = content.split("</think>")[-1].strip()
+            s = score_fn(content, row)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strip fenced JSON in eval before scoring it.

agent_demo.py already removes both </think> and Markdown ```json fences, but run_eval() only removes ``. A completion wrapped in fences gets scored as `0.0` here even when the JSON body is otherwise valid, so the reported accuracy can be artificially low.

Suggested fix

import argparse import json import os +import re import subprocess import sys import time @@ content = resp.choices[0].message.content if "</think>" in content: content = content.split("</think>")[-1].strip() + content = re.sub(r"^```(?:json)?\s*|\s*```$", "", content.strip()) s = score_fn(content, row)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py` around lines 55 - 61, The completion content handling currently only strips "</think>" before scoring; update the logic in run_pipeline.py where the variable content is prepared (after resp.choices[0].message.content and before calling score_fn) to also remove Markdown fenced code blocks by trimming a leading triple-backtick fence (optionally followed by "json") and a trailing triple-backtick from content.strip(), then pass the cleaned string to score_fn so fenced JSON is scored correctly; refer to the variables/content handling around model_id, resp, content and the call to score_fn when making the change.

coderabbitai · 2026-06-03T05:32:39Z

+    try:
+        ds = fw.datasets.get(dataset_id=dataset_id)
+        if ds.state == "READY":
+            checkpoint(f"Dataset ready  ·  {row_count} prompts")
+        else:
+            print("Waiting for dataset", end="", flush=True)
+            while True:
+                ds = fw.datasets.get(dataset_id=dataset_id)
+                if ds.state == "READY":
+                    print()
+                    break
+                print(".", end="", flush=True)
+                time.sleep(3)
+            checkpoint(f"Dataset ready  ·  {row_count} prompts")
+    except Exception:
+        print(f"Uploading {row_count} prompts...", flush=True)
+        fw.datasets.create(dataset_id=dataset_id, dataset={"exampleCount": str(row_count)})
+        fw.datasets.upload(dataset_id=dataset_id, file=train_path)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What exception does the Fireworks Python SDK raise when fw.datasets.get(dataset_id=...) is called for a nonexistent dataset, and how should callers distinguish that from auth or transport failures?

💡 Result:

When the Fireworks Python SDK encounters a nonexistent dataset during a call to fw.datasets.get(dataset_id=...), it raises a fireworks.NotFoundError exception [1][2]. To distinguish this from other failures, you should handle the hierarchical exception structure provided by the SDK [1][2]: 1. NotFoundError: This is a subclass of fireworks.APIStatusError that specifically indicates the requested resource was not found (HTTP 404) [1][2]. 2. APIStatusError: This class is raised for any non-success API response (4xx or 5xx) [1][2]. It contains status_code and response properties, which allow you to programmatically inspect the HTTP status code (e.g., 401 for authentication, 403 for permission denied) [1][2]. 3. APIConnectionError: This class is raised when the library cannot connect to the API, such as during network failures or timeouts [1][2]. All of these exceptions inherit from the base fireworks.APIError class, which can be used to catch any SDK-related error [1][2]. Example handling pattern: try: dataset = client.datasets.get(dataset_id=dataset_id) except fireworks.NotFoundError: print("The requested dataset does not exist.") except fireworks.APIStatusError as e: if e.status_code == 401: print("Authentication failure.") else: print(f"API request failed with status code {e.status_code}") except fireworks.APIConnectionError: print("Failed to connect to the Fireworks API.") except fireworks.APIError: print("An unexpected API error occurred.")

Citations:

1: https://github.com/fw-ai-external/python-sdk/blob/main/README.md

2: https://pypi.org/project/fireworks-ai/1.0.0a3/

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the referenced file around the snippet and its imports/SDK usage sed -n '1,220p' grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py | nl -ba

Repository: patchy631/ai-engineering-hub

Length of output: 115

🌐 Web query:

Fireworks Python SDK exception class for nonexistent dataset datasets.get NotFoundError APIStatusError APIConnectionError

💡 Result:

In the Fireworks Python SDK, when a requested resource—such as a dataset—does not exist, the API returns a 404 status code, which triggers a fireworks.NotFoundError exception [1][2][3]. The Fireworks Python SDK uses a structured hierarchy for error handling, where all exceptions inherit from the base fireworks.APIError class [1][2][3]. The specific exception classes relevant to your query are defined as follows: 1. NotFoundError: Raised when the API returns a 404 status code, indicating that the requested resource (e.g., a specific dataset) could not be found [1][2][4]. 2. APIStatusError: This is the base class for all non-success status codes (4xx or 5xx responses) [1][2][3]. It contains properties for status_code and response, allowing you to inspect the details of the error [1][2][3]. 3. APIConnectionError: This exception is raised when the library is unable to connect to the API, such as in cases of network connectivity issues or timeouts (where no HTTP status code is received) [1][2][4]. Other common exception classes include: - BadRequestError (400) [1][2] - AuthenticationError (401) [1][2] - PermissionDeniedError (403) [1][2] - RateLimitError (429) [1][2] - InternalServerError (>=500) [1][2] When writing code to handle these, you can catch them specifically to implement appropriate retry or logging logic [1][2]. For example: import fireworks try: dataset = client.datasets.get(dataset_id="nonexistent-id") except fireworks.NotFoundError: print("The requested dataset was not found.") except fireworks.APIConnectionError: print("Failed to connect to the Fireworks API.") except fireworks.APIStatusError as e: print(f"API returned status {e.status_code}: {e.response}")

Citations:

1: https://github.com/fw-ai-external/python-sdk/blob/main/README.md

2: https://pypi.org/project/fireworks-ai/1.2.0a67/

3: https://pypi.org/project/fireworks-ai/1.0.0a54/

4: https://pypi.org/project/fireworks-ai/1.0.0a59/

🏁 Script executed:

#!/bin/bash set -euo pipefail # Print the relevant section with line numbers (cat -n works without nl) cat -n grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py | sed -n '70,170p' # Check how `fw` is instantiated / what SDK is imported rg -n "import fireworks|fw\s*=|fireworks\." grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py || true

Repository: patchy631/ai-engineering-hub

Length of output: 4957

Limit the dataset-upload fallback to fireworks.NotFoundError (404)

In grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py (lines 107-124), the except Exception: around fw.datasets.get(dataset_id=...) will also swallow auth/permission failures and transport/transient API errors—then it incorrectly proceeds to fw.datasets.create(...) + fw.datasets.upload(...) as if the dataset were missing. The SDK raises fireworks.NotFoundError for a nonexistent dataset; narrow the fallback to that exception only and re-raise/log everything else.

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 121-121: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py` around lines 107 - 124, The try/except around fw.datasets.get(...) currently catches all exceptions and wrongly treats any error as "dataset not found"; change the except Exception to catch only the specific fireworks.NotFoundError (e.g., except fireworks.NotFoundError as e:) so you only fall back to fw.datasets.create(...) and fw.datasets.upload(...) when the SDK reports a 404, and for any other exception re-raise or log and propagate it instead of proceeding with create/upload; update references around fw.datasets.get, fw.datasets.create, fw.datasets.upload and ensure the NotFoundError import or fully-qualified name is used.

coderabbitai · 2026-06-03T05:32:39Z

+        subprocess.run([sys.executable,
+                        "agent-skill/grpo-finetune/agent_demo.py",
+                        str(invoices_path),
+                        "--deployment", deployment])


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve agent_demo.py from __file__, not the caller's working directory.

The subprocess target is hard-coded as agent-skill/grpo-finetune/agent_demo.py, which only works when the current working directory is grpo-finetuning-qwen3. Running this script from the repo root or from its own directory breaks the demo step.

Suggested fix

if invoices_path.exists(): banner("STEP 4 · sample inference") print("Running the fine-tuned model on sample invoices...\n", flush=True) time.sleep(1) + demo_script = Path(__file__).resolve().parent / "agent_demo.py" subprocess.run([sys.executable, - "agent-skill/grpo-finetune/agent_demo.py", + str(demo_script), str(invoices_path), "--deployment", deployment])

🧰 Tools

🪛 Ruff (0.15.15)

[error] 181-181: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/run_pipeline.py` around lines 181 - 184, The subprocess call uses a hard-coded relative path; change it to resolve agent_demo.py relative to this module's file: import Path from pathlib, compute script_path = Path(__file__).resolve().parent / "agent_demo.py" (since run_pipeline.py and agent_demo.py are siblings), and pass str(script_path) to subprocess.run instead of the literal "agent-skill/grpo-finetune/agent_demo.py"; keep the existing argv list (sys.executable, str(invoices_path), "--deployment", deployment).

coderabbitai · 2026-06-03T05:32:39Z

+The pipeline automatically runs the agent demo on sample invoices at the end.
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the demo step as conditional.

run_pipeline.py only invokes agent_demo.py when the invoices file exists; otherwise it prints that the demo is being skipped. This line currently overpromises the behavior the user will see.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grpo-finetuning-qwen3/agent-skill/grpo-finetune/SKILL.md` around lines 94 - 95, The statement "The pipeline automatically runs the agent demo on sample invoices at the end." is inaccurate; update the doc to state the demo is conditional: describe that run_pipeline.py calls agent_demo.py only if the invoices file exists and otherwise prints that the demo is skipped. Reference the filenames run_pipeline.py and agent_demo.py and mention the invoices file existence check so readers understand the exact conditional behavior.

feat: add GRPO fine-tuning notebook for JSON invoice extraction using…

0d30d20

… Fireworks Training API

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

add agent skill for GRPO fine-tune pipeline

897d1b3

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

		def validate(result: dict) -> bool:
		return all(result.get(f) not in (None, "", 0) for f in REQUIRED_FIELDS)

		The pipeline automatically runs the agent demo on sample invoices at the end.

Conversation

namanvirk18 commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

namanvirk18 commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading