core&ui&judge: allow run multiple pretests in single request #1098

Merged
undefined-moe merged 3 commits into master from run-multiple on Dec 27, 2025

@undefined-moe undefined-moe commented Dec 16, 2025

Member

Summary by CodeRabbit

  • New Features

    • Pretest submissions now accept multiple inputs (array) so a single submission can run against multiple test cases.
    • Judging and generator flows execute and report results per test case, providing per-case timings, memory, and messages.
  • Improvements

    • UI and management endpoints now submit pretest input as arrays.
    • Pretest submission now validates that input array is non-empty.

Note

Enable submitting multiple pretest inputs at once, with per-case execution in judge and end-to-end schema/handler/UI updates.

  • Core Types
    • Allow RecordPayload.input to be string | string[].
  • Judge
    • Refactor pretest runner in packages/hydrojudge/src/judge/run.ts to handle multiple inputs via runFlow; build subtasks from ctx.input, execute per-case, run analysis once, and truncate messages.
    • Update generator in packages/hydrojudge/src/judge/generate.ts to read per-case stdin from ctx.input[i-1].
    • Store JudgeTask.input as string[] in packages/hydrojudge/src/task.ts.
  • Backend Handlers/Model
    • problem.submit now accepts input: string[] and validates non-empty for pretest; pass array to record.add.
    • manage.script pretest wraps input as array.
    • RecordModel.add stores pretest input as string[].
  • UI
    • Scratchpad pretest request sends input: [pretestInput] in components/scratchpad/ScratchpadToolbarContainer.jsx.
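
The array-based request shape described above can be sketched as follows. This is a minimal illustration, not the PR's code: `PretestPayload` and `buildPretestBody` are hypothetical names, and only the `input: string[]` field and the single-element wrapping come from the summary.

```typescript
// Hypothetical request shape; only `input: string[]` is taken from the summary.
interface PretestPayload {
    lang: string;
    code: string;
    pretest: true;
    input: string[]; // one entry per pretest case
}

// Hypothetical helper: accepts the old single-string shape and the new array shape.
function buildPretestBody(lang: string, code: string, inputs: string | string[]): PretestPayload {
    const input = Array.isArray(inputs) ? inputs : [inputs];
    return { lang, code, pretest: true, input };
}

// A scratchpad-style caller with one input still produces a one-element array.
const body = buildPretestBody('cc', 'int main() {}', '1 2\n');
console.log(body.input.length);
```

A caller that already holds several inputs would pass the array directly, and each element would become one pretest case.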

Written by Cursor Bugbot for commit e78b7b5. This will update automatically on new commits.

coderabbitai Bot commented Dec 16, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This pull request changes pretest input handling from a single string to arrays across the codebase. The public type RecordPayload.input is widened to string | string[]. Backend updates include: JudgeTask and task normalization to string[]; judge entry refactored to per-case evaluation (new judgeCase) and integrated with runFlow/Context (signature change); generator invocation uses per-case input elements; handlers and model record storage updated to accept and store string[]; UI now sends pretest input as a single-element array.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: enabling multiple pretest inputs in a single request, which is the core objective reflected throughout the changeset. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e78b7b5 and 2084bde.

📒 Files selected for processing (4)
  • packages/hydrojudge/src/judge/generate.ts
  • packages/hydrojudge/src/judge/run.ts
  • packages/hydrojudge/src/task.ts
  • packages/hydrooj/src/handler/problem.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/hydrojudge/src/judge/generate.ts
  • packages/hydrooj/src/handler/problem.ts
🧰 Additional context used
🧬 Code graph analysis (1)
packages/hydrojudge/src/judge/run.ts (5)
packages/common/subtask.ts (1)
  • NormalizedCase (134-137)
packages/hydrojudge/src/judge/interface.ts (1)
  • Context (6-6)
packages/hydrojudge/src/sandbox.ts (1)
  • runQueued (202-216)
framework/utils/lib/common.ts (2)
  • parseTimeMS (164-170)
  • parseMemoryMB (172-178)
packages/hydrojudge/src/flow.ts (1)
  • runFlow (76-134)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: Analyze (javascript)
  • GitHub Check: build
🔇 Additional comments (7)
packages/hydrojudge/src/task.ts (3)

33-33: Type change aligns with PR objectives.

The change from string to string[] correctly supports multiple pretest inputs as intended by this PR.


70-70: Verify the semantic choice: [''] vs [] for undefined input.

The current logic produces [''] (array with empty string) when this.request.input is undefined. This is an improvement over the previous issue flagged in past reviews where [undefined] was produced. However, there's a semantic question: should undefined input produce:

  • [''] = run one pretest case with empty stdin
  • [] = run no pretest cases

The current behavior (['']) means a pretest will always run even when no input is provided, which may be intentional for backward compatibility. Please confirm this is the desired behavior.

70-70: Input normalization correctly handles undefined case.

The code now produces [''] (array with empty string) when this.request.input is undefined, which is an improvement over the previously flagged [undefined] issue. This is semantically correct: an empty string represents valid stdin with no content, ensuring pretests run even when no explicit input is provided, which aligns with typical pretest behavior.

Based on past review comments, the previous concern has been adequately addressed.
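
The difference between `['']` and `[]` can be made concrete with a small sketch. The actual expression in task.ts is not shown in this thread, so `normalizeJudgeInput` below is a hypothetical helper that only mirrors the behavior the review describes.

```typescript
// Hypothetical helper, not the code from packages/hydrojudge/src/task.ts.
function normalizeJudgeInput(input?: string | string[]): string[] {
    // An array is passed through unchanged, including an empty one.
    if (Array.isArray(input)) return input;
    // undefined and '' both become a single case with empty stdin.
    return [input || ''];
}

console.log(normalizeJudgeInput(undefined).length); // one case, empty stdin
console.log(normalizeJudgeInput([]).length);        // zero cases
```

Under these assumptions a missing input still runs the program once, while an explicitly empty array would run nothing, which is the case the handler-side non-empty check is meant to reject.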

packages/hydrojudge/src/judge/run.ts (4)

1-7: Imports align with the refactored architecture.

The new imports (NormalizedCase, runFlow, Context) correctly support the refactored pretest execution flow using the standardized runFlow mechanism.


51-51: Message truncation limit corrected.

The truncation limit has been fixed to 102400 characters (100 KB), which aligns with other judge modules like generate.ts and provides adequate output for debugging.
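
A truncation step with this limit might look like the sketch below. Only the value 102400 comes from the review; `MESSAGE_LIMIT` and `truncateMessage` are hypothetical names, and the real code in run.ts may truncate differently.

```typescript
// Limit quoted in the review; the constant name is an assumption.
const MESSAGE_LIMIT = 102400;

// Hypothetical helper: keep at most `limit` UTF-16 code units of a message.
function truncateMessage(message: string, limit: number = MESSAGE_LIMIT): string {
    return message.length > limit ? message.substring(0, limit) : message;
}

console.log(truncateMessage('x'.repeat(200000)).length);
```

Note that `String.prototype.length` counts UTF-16 code units rather than bytes, so "100 KB" is exact only for ASCII output.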

This addresses a previous review concern.


40-42: Analysis execution logic is correct.

The conditional analysis execution ensures runAnalysis is invoked only once per submission and only for relevant failure statuses (WRONG_ANSWER, RUNTIME_ERROR). The ctx.analysis flag correctly prevents redundant analysis runs across multiple test cases.
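
The run-once guard described here can be sketched as below. The status strings and the `maybeRunAnalysis` helper are illustrative stand-ins; only the `ctx.analysis` flag and the two triggering statuses are taken from the review.

```typescript
// Illustrative status values; the real STATUS constants are not reproduced here.
type CaseStatus = 'ACCEPTED' | 'WRONG_ANSWER' | 'RUNTIME_ERROR' | 'TIME_LIMIT_EXCEEDED';

interface AnalysisContext {
    analysis: boolean; // true once analysis has been triggered for this submission
}

// Hypothetical helper: runs `run` at most once, and only for WA or RE.
function maybeRunAnalysis(ctx: AnalysisContext, status: CaseStatus, run: () => void): boolean {
    const relevant = status === 'WRONG_ANSWER' || status === 'RUNTIME_ERROR';
    if (!relevant || ctx.analysis) return false;
    ctx.analysis = true; // set before running so later cases skip it
    run();
    return true;
}

const ctx: AnalysisContext = { analysis: false };
let runs = 0;
const statuses: CaseStatus[] = ['ACCEPTED', 'RUNTIME_ERROR', 'WRONG_ANSWER'];
for (const status of statuses) {
    maybeRunAnalysis(ctx, status, () => { runs += 1; });
}
console.log(runs);
```

Setting the flag before invoking the callback is a design choice in this sketch: it keeps the guard effective even if cases are judged concurrently or the analysis itself throws.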


72-77: runFlow integration follows correct pattern.

The refactored code appropriately delegates to runFlow, moving compilation into the compile hook and using judgeCase for per-case execution. This aligns with the standardized judge flow architecture.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea6bc0b and 0fa2b52.

📒 Files selected for processing (6)
  • packages/common/types.ts (1 hunks)
  • packages/hydrojudge/src/judge/run.ts (2 hunks)
  • packages/hydrojudge/src/task.ts (2 hunks)
  • packages/hydrooj/src/handler/problem.ts (2 hunks)
  • packages/hydrooj/src/model/record.ts (2 hunks)
  • packages/ui-default/components/scratchpad/ScratchpadToolbarContainer.jsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
packages/hydrooj/src/handler/problem.ts (2)
framework/framework/validator.ts (2)
  • Types (29-66)
  • Types (91-193)
framework/framework/error.ts (1)
  • ValidationError (54-63)
packages/hydrojudge/src/judge/run.ts (7)
packages/common/subtask.ts (1)
  • NormalizedCase (134-137)
packages/hydrojudge/src/judge/interface.ts (1)
  • Context (6-6)
packages/hydrooj/src/model/record.ts (1)
  • judge (82-130)
packages/hydrojudge/src/judge/objective.ts (1)
  • judge (7-81)
packages/hydrojudge/src/judge/hack.ts (1)
  • judge (8-86)
packages/hydrojudge/src/judge/generate.ts (1)
  • judge (13-146)
packages/hydrojudge/src/flow.ts (1)
  • runFlow (76-136)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: Analyze (javascript)
  • GitHub Check: build
🔇 Additional comments (8)
packages/ui-default/components/scratchpad/ScratchpadToolbarContainer.jsx (1)

38-49: LGTM! Input correctly wrapped in array for new API.

The change correctly adapts the UI to send input as an array, aligning with the updated backend contract. The UI currently supports single pretest input but the backend is now ready for multiple inputs when the UI is enhanced.

packages/common/types.ts (1)

109-110: LGTM! Type widened to support both string and array inputs.

The union type string | string[] maintains backward compatibility while enabling the new multiple pretest input feature. This is consistent with the gradual migration pattern across the codebase.

packages/hydrooj/src/model/record.ts (2)

137-137: LGTM! Args type updated for array-based input.


170-173: LGTM! Pretest path correctly handles array input with empty array default.

The default of args.input || [] ensures a safe fallback when no input is provided, consistent with the new array-based input handling.

packages/hydrooj/src/handler/problem.ts (2)

486-488: LGTM! Parameter type and validation correctly updated for array input.

The parameter declaration, default value, and method signature are all correctly updated to handle array-based pretest input. The Types.ArrayOf(Types.String) decorator will properly validate and transform incoming data.


501-501: LGTM! Validation correctly checks for non-empty array.

The validation !input.length properly ensures at least one pretest input is provided when pretest is true.
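
As a rough illustration of that check, the sketch below uses a local `ValidationError` class as a stand-in for the framework's error type, and a hypothetical `checkPretestInput` function; the real handler wiring in problem.ts is not shown in this thread.

```typescript
// Local stand-in for the framework's ValidationError; its real constructor may differ.
class ValidationError extends Error {
    field: string;

    constructor(field: string) {
        super(`Invalid field: ${field}`);
        this.name = 'ValidationError';
        this.field = field;
    }
}

// Hypothetical helper: a pretest needs at least one input; a normal submission does not.
function checkPretestInput(pretest: boolean, input: string[]): void {
    if (pretest && !input.length) throw new ValidationError('input');
}

checkPretestInput(true, ['']);  // ok: one case with empty stdin
checkPretestInput(false, []);   // ok: not a pretest
```

An array containing a single empty string passes, because the check looks at the number of cases and not at their content.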

packages/hydrojudge/src/judge/run.ts (2)

8-53: LGTM! Per-case judging implementation is well-structured.

The curried judgeCase function correctly:

  • Executes each test case with its own input
  • Handles time/memory limit detection with 2x allowance for debugging
  • Captures exit codes and signals for runtime errors
  • Triggers analysis on first WA/RE for debugging assistance
  • Returns properly structured case results

55-78: LGTM! Judge function correctly orchestrates per-case flow.

The implementation properly:

  • Creates a single subtask containing all input cases with uniform scoring
  • Maps each input string from ctx.input to a normalized case structure
  • Delegates compilation and execution to runFlow with the judgeCase handler

The output: '' for each case is intentional since pretest mode doesn't validate output against expected results—it just runs the code and reports the outcome.
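
The mapping from inputs to cases could be sketched as follows. The field names (`id`, `input`, `output`, `cases`) and the `buildPretestSubtask` helper are assumptions for illustration; the review confirms only a single subtask, one case per input, and `output: ''`.

```typescript
// Assumed shapes; the real NormalizedCase type in packages/common/subtask.ts may differ.
interface PretestCase {
    id: number;     // 1-based, so case i reads inputs[i - 1]
    input: string;  // stdin for this case
    output: string; // always '' because pretests have no expected output
}

interface PretestSubtask {
    id: number;
    type: 'sum';
    cases: PretestCase[];
}

// Hypothetical helper: one subtask holding one case per input string.
function buildPretestSubtask(inputs: string[]): PretestSubtask {
    return {
        id: 1,
        type: 'sum',
        cases: inputs.map((input, index) => ({ id: index + 1, input, output: '' })),
    };
}

console.log(buildPretestSubtask(['1 2\n', '3 4\n']).cases.map((c) => c.id));
```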

Comment thread packages/hydrojudge/src/task.ts Outdated

@cursor cursor Bot left a comment

This PR is being reviewed by Cursor Bugbot

subtaskId: 1,
status,
time: Math.floor(time * 1000000) / 1000000,
score: 1,

Bug: Score always 1 regardless of pass/fail status

The judgeCase function unconditionally returns score: 1 regardless of whether the test case passed or failed. Since the subtask uses type: 'sum', all case scores are summed together, meaning failed test cases (TLE, MLE, runtime error) still contribute to the total score. Looking at other judges like default.ts and all checkers, the score is conditionally set to 0 for non-ACCEPTED statuses. The score here likely needs to be conditional on status === STATUS.STATUS_ACCEPTED.
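
The suggested fix amounts to making the per-case score depend on the status. The sketch below uses illustrative status strings and hypothetical helpers (`caseScore`, `sumScores`); it is not the change that was eventually merged, which this thread does not show.

```typescript
// Illustrative status values; the real STATUS constants are not reproduced here.
type JudgeStatus = 'ACCEPTED' | 'TIME_LIMIT_EXCEEDED' | 'MEMORY_LIMIT_EXCEEDED' | 'RUNTIME_ERROR';

// Hypothetical helper: a case earns its score only when it is accepted.
function caseScore(status: JudgeStatus, fullScore: number = 1): number {
    return status === 'ACCEPTED' ? fullScore : 0;
}

// With a 'sum' subtask, the total is the sum of the per-case scores.
function sumScores(statuses: JudgeStatus[]): number {
    return statuses.reduce((total, status) => total + caseScore(status), 0);
}

// Two accepted cases and one TLE: an unconditional score of 1 would give 3, this gives 2.
console.log(sumScores(['ACCEPTED', 'TIME_LIMIT_EXCEEDED', 'ACCEPTED']));
```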

Comment thread packages/hydrojudge/src/task.ts Outdated
Comment thread packages/hydrojudge/src/judge/run.ts
Comment thread packages/hydrojudge/src/judge/run.ts Outdated
@undefined-moe undefined-moe merged commit 114be28 into master Dec 27, 2025
9 checks passed
@undefined-moe undefined-moe deleted the run-multiple branch December 27, 2025 22:07

1 participant