-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add golden artifact generation to nightly backend test suite #17663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 7 commits
de9c876
dc2c80d
57e93cf
4eaeabc
4226112
cfbbf25
7038965
ff2dbda
af61150
364c7f1
9756b37
64f5ae8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -85,7 +85,26 @@ else | |||||
| fi | ||||||
| CMAKE_ARGS="$EXTRA_BUILD_ARGS" ${CONDA_RUN_CMD} $SETUP_SCRIPT --build-tool cmake --build-mode Release --editable true | ||||||
|
|
||||||
| GOLDEN_DIR="${ARTIFACT_DIR}/golden-artifacts" | ||||||
| export GOLDEN_ARTIFACTS_DIR="${GOLDEN_DIR}" | ||||||
|
|
||||||
| EXIT_CODE=0 | ||||||
| ${CONDA_RUN_CMD} pytest -c /dev/nul -n auto backends/test/suite/$SUITE/ -m flow_$FLOW --json-report --json-report-file="$REPORT_FILE" || EXIT_CODE=$? | ||||||
|
||||||
| ${CONDA_RUN_CMD} pytest -c /dev/nul -n auto backends/test/suite/$SUITE/ -m flow_$FLOW --json-report --json-report-file="$REPORT_FILE" || EXIT_CODE=$? | |
| ${CONDA_RUN_CMD} pytest -c /dev/null -n auto backends/test/suite/$SUITE/ -m flow_$FLOW --json-report --json-report-file="$REPORT_FILE" || EXIT_CODE=$? |
Copilot
AI
Feb 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per-model zips are written to ${GOLDEN_DIR}/${model_name}_golden.zip (outside the per-flow directory). In the workflow matrix, multiple flows can produce the same model_name, which will silently overwrite zips from earlier flows. Include $FLOW in the zip filename or keep the per-model zips under ${GOLDEN_DIR}/${FLOW}/ to avoid collisions.
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -59,6 +59,51 @@ jobs: | |||||||||||||||||||||
|
|
||||||||||||||||||||||
| source .ci/scripts/test_backend.sh "${{ matrix.suite }}" "${{ matrix.flow }}" "${RUNNER_ARTIFACT_DIR}" | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| package-golden-artifacts: | ||||||||||||||||||||||
| if: ${{ inputs.run-linux }} | ||||||||||||||||||||||
| needs: test-backend-linux | ||||||||||||||||||||||
| runs-on: linux.2xlarge | ||||||||||||||||||||||
| steps: | ||||||||||||||||||||||
| - name: Download model test artifacts | ||||||||||||||||||||||
| uses: actions/download-artifact@v4 | ||||||||||||||||||||||
| with: | ||||||||||||||||||||||
| pattern: test-report-*-models | ||||||||||||||||||||||
|
||||||||||||||||||||||
| pattern: test-report-*-models | |
| pattern: test-report-* |
Copilot
AI
Feb 25, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description mentions "These artifacts are packaged into per-model zips and a combined golden_artifacts_yymmddhh.zip", but the implementation only creates a combined zip file (line 92). There are no per-model zips being created. Either update the PR description to match the implementation, or add the per-model zip creation step if it was intended.
Copilot
AI
Feb 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The packaging step flattens all .pte/.bin files from downloaded/ into a single golden_combined/ directory via cp. Since artifacts are produced per-flow, identical filenames across flows (same model/test name) will overwrite each other and the combined zip will silently drop files. Preserve directory structure (e.g. copy with --parents or zip from the original tree) or prefix filenames with flow/suite to keep them unique.
| -exec cp {} golden_combined/ \; | |
| if ls golden_combined/*.pte 1>/dev/null 2>&1; then | |
| (cd golden_combined && zip -r "../golden_artifacts_${TIMESTAMP}.zip" .) | |
| echo "Created golden_artifacts_${TIMESTAMP}.zip with $(ls golden_combined/*.pte | wc -l) models." | |
| -exec cp --parents {} golden_combined/ \; | |
| if find golden_combined -name '*.pte' -print -quit | grep -q .; then | |
| (cd golden_combined && zip -r "../golden_artifacts_${TIMESTAMP}.zip" .) | |
| echo "Created golden_artifacts_${TIMESTAMP}.zip with $(find golden_combined -name '*.pte' | wc -l) models." |
Copilot
AI
Feb 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition checks for the existence of golden_artifacts_*.zip files to determine whether to upload to S3, but this check happens in the step itself (line 98). If for some reason the file doesn't exist at that point, the step will be skipped silently. However, the step name suggests it should "Upload golden artifacts to S3" unconditionally if the package-golden-artifacts job succeeded. Consider whether the conditional should be on the job level (line 63) rather than the step level, or if the conditional logic needs adjustment to match the intended behavior.
| if: ${{ hashFiles('golden_artifacts_*.zip') != '' }} |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3,6 +3,8 @@ | |||||||||||||||||||||||||||||
| # This source code is licensed under the BSD-style license found in the | ||||||||||||||||||||||||||||||
| # LICENSE file in the root directory of this source tree. | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| import logging | ||||||||||||||||||||||||||||||
| import os | ||||||||||||||||||||||||||||||
| import random | ||||||||||||||||||||||||||||||
| from collections import Counter, OrderedDict | ||||||||||||||||||||||||||||||
| from typing import Any, Callable, Dict, List, Optional, Tuple | ||||||||||||||||||||||||||||||
|
|
@@ -317,11 +319,14 @@ def run_method_and_compare_outputs( | |||||||||||||||||||||||||||||
| rtol=1e-03, | ||||||||||||||||||||||||||||||
| qtol=0, | ||||||||||||||||||||||||||||||
| statistics_callback: Callable[[ErrorStatistics], None] | None = None, | ||||||||||||||||||||||||||||||
| artifact_dir: Optional[str] = None, | ||||||||||||||||||||||||||||||
| artifact_name: Optional[str] = None, | ||||||||||||||||||||||||||||||
| ): | ||||||||||||||||||||||||||||||
| number_of_runs = 1 if inputs is not None else num_runs | ||||||||||||||||||||||||||||||
| reference_stage = self.stages[StageType.EXPORT] | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| stage = stage or self.cur | ||||||||||||||||||||||||||||||
| artifacts_saved = False | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| for _ in range(number_of_runs): | ||||||||||||||||||||||||||||||
| inputs_to_run = inputs if inputs else next(self.generate_random_inputs()) | ||||||||||||||||||||||||||||||
|
|
@@ -346,8 +351,45 @@ def run_method_and_compare_outputs( | |||||||||||||||||||||||||||||
| statistics_callback, | ||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||
| if artifact_dir and artifact_name and not artifacts_saved: | ||||||||||||||||||||||||||||||
| self._dump_golden_artifacts( | ||||||||||||||||||||||||||||||
| artifact_dir, artifact_name, inputs_to_run, reference_output | ||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
| self._dump_golden_artifacts( | |
| artifact_dir, artifact_name, inputs_to_run, reference_output | |
| ) | |
| try: | |
| self._dump_golden_artifacts( | |
| artifact_dir, artifact_name, inputs_to_run, reference_output | |
| ) | |
| except Exception as e: | |
| logger = logging.getLogger(__name__) | |
| logger.warning( | |
| "Failed to dump golden artifacts for '%s': %s", | |
| artifact_name, | |
| e, | |
| ) |
Copilot
AI
Feb 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The artifact directory creation should be done earlier to catch errors during the actual test run rather than silently failing later. Currently, if os.makedirs fails, the exception is caught and logged as a warning, but the test continues. Since this is called after successful output comparison, there's a risk that test results could be marked as successful even though artifact generation failed. Consider whether artifact generation failures should be treated as test failures, or at minimum, ensure that the directory creation happens before the comparison so that filesystem issues are caught early.
Copilot
AI
Feb 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop only saves inputs that are torch.Tensor instances, silently skipping any non-tensor inputs. This could lead to incomplete golden artifact sets if models accept mixed tensor and non-tensor inputs (e.g., integers, floats, booleans). While this might be intentional for simplicity, it should be documented or a warning should be logged when non-tensor inputs are skipped, so that users are aware that the golden artifacts may not fully represent the test case.
Copilot
AI
Feb 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function does not handle the case where reference_output is already a tuple. According to the existing _compare_outputs method (lines 474-477), the code handles torch.Tensor and OrderedDict, but if reference_output is already a tuple (which is a valid case), it will not be normalized. This could lead to issues if the tuple contains non-tensor elements or needs further processing. Consider adding a check for tuple type or ensuring all possible output types are handled consistently.
| reference_output = tuple(reference_output.values()) | |
| reference_output = tuple(reference_output.values()) | |
| elif isinstance(reference_output, (list, tuple)): | |
| reference_output = tuple(reference_output) |
Copilot
AI
Feb 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the input handling, the loop only saves outputs that are torch.Tensor instances. If reference_output contains non-tensor elements after being converted to a tuple, those elements will be silently skipped. This could result in incomplete output files. Consider logging a warning when non-tensor outputs are encountered and skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GOLDEN_ARTIFACTS_DIRis exported unconditionally, so the operators suite will also generate golden inputs/outputs and.ptefiles even though the packaging job only collects*-modelsartifacts. This will increase artifact size and I/O for operators runs; consider only setting this env var (or only zipping) whenSUITE=models(or when a separate opt-in flag is set).