Commit d00faa7
sdg_pipeline: multi-model profiling stage replaces single-model difficulty_estimation
Introduce a new `profiling` stage that runs per-model generate->judge->aggregate
chains in parallel for N models and merges results into a single `profiling`
array per problem:
    "profiling": [
        {"model": "ModelA", "pass_rate": 0.5, "pass_at_n": "2/4"},
        {"model": "ModelB", "pass_rate": 0.8, "pass_at_n": "4/5"}
    ]
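As a rough illustration of how one model's entry in that array could be derived, the sketch below counts correct judgements per `(id, problem)` key and emits `pass_rate` and `pass_at_n`. It is not the code from `aggregate_profiling_model.py`: the function name and the `is_correct` field are assumptions made for the example, and the real judgement field may differ.

```python
from collections import defaultdict


def aggregate_model(model, records):
    """Build one profiling entry per (id, problem) key for a single model.

    `records` is any iterable of per-seed judgement dicts. The boolean
    `is_correct` field is a hypothetical name used only in this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [num_correct, num_total]
    for rec in records:
        key = (rec.get("id"), rec.get("problem"))
        counts[key][0] += int(bool(rec["is_correct"]))
        counts[key][1] += 1
    return {
        key: {
            "model": model,
            "pass_rate": correct / total,
            "pass_at_n": f"{correct}/{total}",
        }
        for key, (correct, total) in counts.items()
    }
```

Because only two integers are kept per key, the input can be consumed as a stream (for example, line by line from each per-seed file) without holding full records in memory.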
Changes:
- `run_pipeline.profiling()` orchestrates: shared prepare -> per-model chains
(generate, judge, aggregate) in parallel -> final merge. Judge kwargs are
copied per-iteration so args don't leak across models; `num_random_seeds`
is inherited from generation if not explicitly set.
- `aggregate_profiling_model.py` (new): per-model aggregator over per-seed
`output-rs*.jsonl` files. Streams inputs — keeps only the BASE_FIELDS
projection + a small counters dict per `(id, problem)` key — so aggregation
fits in memory at 1M+ problem scale. Falls back to `(_lineno, line_number)`
when neither `id` nor `problem` is present on a record.
- `merge_profiling.py` (new): merges per-model result files. Asserts every
per-model file contains the same `(id, problem)` key set so row-alignment
mismatches fail loudly instead of silently dropping problems. After a
successful merge, removes the per-model `result.jsonl` intermediates
(folders — generation/, judgement/, logs/ — are retained for debugging).
- `filter_solutions.py`: replaces the scalar `difficulty_model_pass_rate`
bounds with a per-model dict `profiling_pass_rate_ranges:
{model_name: [min, max]}` (min exclusive, max inclusive).
- `validate_pipeline.py` and `scripts/utils/constants.py`: update stage-name
and field-set checks (`PROFILING_FIELDS`, required `profiling` key,
row-count equality for the new stage).
- Base + settings YAMLs: renamed stage (`difficulty_estimation` -> `profiling`)
and directory (`step-3-difficulty-estimation` -> `step-3-profiling`); new
`profiling.models: [...]` list with per-model `generation_kwargs` and an
optional per-model `judge_kwargs` override.
- SLURM test: references `stages.profiling.models.0.generation_kwargs...`
instead of the old top-level path.
- README: updates the stage list + filter-parameter description.
- `aggregate_difficulty.py` is removed (replaced by the new aggregator +
merger).
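
The merge and filter behaviour described in the list above can be sketched as follows. This is a minimal in-memory illustration, not the code in `merge_profiling.py` or `filter_solutions.py`: the function names and the dict-of-dicts input shape are assumptions, and treating a model that is absent from a problem's `profiling` array as a filter failure is also an assumption.

```python
def merge_profiling(per_model):
    """Merge {model_name: {key: entry}} into {key: [entry, ...]}.

    Stand-in for merging per-model result files; raises AssertionError
    if any model's key set differs from the first model's.
    """
    models = list(per_model)
    if not models:
        return {}
    expected = set(per_model[models[0]])
    for name in models[1:]:
        keys = set(per_model[name])
        assert keys == expected, (
            f"key set mismatch for {name}: {len(keys ^ expected)} differing keys"
        )
    return {key: [per_model[name][key] for name in models] for key in expected}


def passes_ranges(profiling, ranges):
    """Return True if every model in `ranges` satisfies min < pass_rate <= max."""
    rates = {entry["model"]: entry["pass_rate"] for entry in profiling}
    for model, (lo, hi) in ranges.items():
        rate = rates.get(model)
        # Assumption: a model with no profiling entry fails the filter.
        if rate is None or not (lo < rate <= hi):
            return False
    return True
```

With the example entries from the top of this message, `passes_ranges(profiling, {"ModelA": [0.0, 0.5]})` accepts the problem because the upper bound is inclusive, while `{"ModelA": [0.5, 1.0]}` rejects it because the lower bound is exclusive.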
Signed-off-by: Tatevik Ter-Hovhannisyan <tterhovhanni@nvidia.com>
1 parent 410b337 commit d00faa7
14 files changed
Lines changed: 355 additions & 181 deletions
File tree
- recipes/opensciencereasoning/sdg_pipeline
  - configs
    - pipelines
    - settings
  - scripts
    - utils
- tests/slurm-tests/stem_sdg_pipeline