Commit 4afe6e2

alexfurmenkov, RamilCDISC, and gerrycampion authored
798: Action to check engine against published rules (#1700)
* WIP: action to check engine against published rules
* added workflow options to test it
* debug step
* set rules_2 as default branch for open-rules
* set rules_2 as default branch for open-rules
* report adjustments
* indentation fix
* indentation fix(2)
* indentation fix(3) -- heredoc in tmp file
* moved validation logic to python script
* removed trigger on feature branch push event
* fix action
* fixed naming in report
* temp allow to run on branch
* try to fix failure
* still trying
* got/actual update
* more actual got fix
* comment change
* got->actual
* core ids arg to limit number of rules run
* cross
* fix the cross
* change csv conversion to an engine output format
* let's run the entire suite
* add unit tests for csv reports
* fix regression test
* remove execution column and put exec fails in actual
* run all again
* remove todo's

---------

Co-authored-by: RamilCDISC <113539111+RamilCDISC@users.noreply.github.com>
Co-authored-by: gerrycampion <85252124+gerrycampion@users.noreply.github.com>
Co-authored-by: Gerry Campion <gcampion@cdisc.org>
1 parent 2cb62ee commit 4afe6e2

14 files changed

Lines changed: 782 additions & 17 deletions

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
+# ==============================================================================
+# This workflow:
+# 1. Checks out cdisc-rules-engine (the engine itself)
+# 2. Checks out cdisc-open-rules (rules + test data) into ./open-rules/
+# 3. Installs engine Python dependencies
+# 4. Iterates every Published/ rule from cdisc-open-rules
+# 5. Runs the engine against each test case
+# 6. Compares actual output with expected results.csv baseline
+# 7. Publishes a Markdown report to Job Summary and as an artifact
+# ==============================================================================
+name: Validate Published Rules
+
+on:
+  push:
+    branches:
+      - main
+  workflow_dispatch:
+    inputs:
+      rules_ref:
+        description: "Branch/tag/SHA of cdisc-open-rules to validate against"
+        required: false
+        default: "main"
+      core_ids:
+        description: "Space-separated list of rule IDs to validate (e.g. CORE-000001 CORE-000002). Leave blank to validate all."
+        required: false
+        default: ""
+
+jobs:
+  validate-published-rules:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+
+    steps:
+      # -----------------------------------------------------------------------
+      # 1. Checkout cdisc-rules-engine
+      # -----------------------------------------------------------------------
+      - name: Checkout cdisc-rules-engine
+        uses: actions/checkout@v6
+        with:
+          repository: cdisc-org/cdisc-rules-engine
+          path: engine
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      # -----------------------------------------------------------------------
+      # 2. Checkout cdisc-open-rules (rules + test data + helper scripts)
+      # -----------------------------------------------------------------------
+      - name: Checkout cdisc-open-rules
+        uses: actions/checkout@v6
+        with:
+          repository: cdisc-org/cdisc-open-rules
+          ref: ${{ inputs.rules_ref}}
+          path: open-rules
+
+      # -----------------------------------------------------------------------
+      # 2b. Debug — verify directory layout
+      # -----------------------------------------------------------------------
+      - name: Debug — list workspace layout
+        run: |
+          echo "=== Workspace root ==="
+          ls -la
+          echo "=== open-rules/ ==="
+          ls -la open-rules/ || echo "open-rules/ NOT FOUND"
+          echo "=== open-rules/Published/ (first 10) ==="
+          ls open-rules/Published/ 2>/dev/null | head -10 || echo "Published/ NOT FOUND"
+          echo "=== engine/ ==="
+          ls engine/ | head -10 || echo "engine/ NOT FOUND"
+
+      # -----------------------------------------------------------------------
+      # 3. Set up Python
+      # -----------------------------------------------------------------------
+      - name: Set up Python 3.12
+        uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+
+      # -----------------------------------------------------------------------
+      # 4. Install engine dependencies
+      # -----------------------------------------------------------------------
+      - name: Install engine dependencies
+        run: |
+          python -m venv venv
+          ./venv/bin/pip install --upgrade pip
+          ./venv/bin/pip install -r engine/requirements.txt
+
+      # -----------------------------------------------------------------------
+      # 5. Run validation for every Published rule
+      # -----------------------------------------------------------------------
+      - name: Run validation for all Published rules
+        id: validate
+        continue-on-error: true
+        run: |
+          chmod +x open-rules/.github/scripts/run_validation.sh
+
+          CORE_IDS_ARG=""
+          if [ -n "${{ inputs.core_ids }}" ]; then
+            CORE_IDS_ARG="--core-ids ${{ inputs.core_ids }}"
+          fi
+
+          ./venv/bin/python engine/scripts/validate_published_rules.py \
+            --rules-root "$(pwd)/open-rules" \
+            --engine-dir "$(pwd)/engine" \
+            --python-cmd "$(pwd)/venv/bin/python" \
+            --output-dir "$(pwd)" \
+            $CORE_IDS_ARG
+
+      # -----------------------------------------------------------------------
+      # 6. Upload both reports + raw results as artifacts
+      # -----------------------------------------------------------------------
+      - name: Upload validation artifacts
+        if: always()
+        uses: actions/upload-artifact@v6
+        with:
+          name: published-rules-validation-${{ github.run_id }}
+          path: |
+            open-rules/Published/**/results/results.csv
+            summary_table.md
+            detail_report.md
+          if-no-files-found: warn
+
+      # -----------------------------------------------------------------------
+      # 7. Write ONLY the summary table to GitHub Actions Job Summary
+      # -----------------------------------------------------------------------
+      - name: Write summary table to workflow summary
+        if: always()
+        run: |
+          [ -f summary_table.md ] && cat summary_table.md >> $GITHUB_STEP_SUMMARY || true
+
+      # -----------------------------------------------------------------------
+      # 8. Fail the job if any rule failed
+      # -----------------------------------------------------------------------
+      - name: Check overall status
+        if: steps.validate.outcome == 'failure'
+        run: |
+          echo "One or more published rules failed validation — see the artifacts for detail_report.md."
+          exit 1
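
Step 6 in the workflow's header comment (comparing actual output with the expected results.csv baseline) is carried out inside engine/scripts/validate_published_rules.py, which is not part of this view. As a rough illustration only, such a comparison could treat both files as multisets of rows so that row order does not matter. The helper name `compare_csv_text` and the sample rows are hypothetical and are not taken from the script.

```python
import csv
import io
from collections import Counter


def compare_csv_text(expected: str, actual: str) -> dict[str, list[tuple[str, ...]]]:
    """Compare two CSV documents as multisets of rows, ignoring row order.

    Hypothetical helper: the real validation script may compare differently.
    """

    def load(text: str) -> Counter:
        reader = csv.reader(io.StringIO(text))
        next(reader, None)  # skip the header row
        return Counter(tuple(row) for row in reader)

    exp, act = load(expected), load(actual)
    return {
        # rows the baseline has but the engine did not produce
        "missing": sorted((exp - act).elements()),
        # rows the engine produced but the baseline does not list
        "unexpected": sorted((act - exp).elements()),
    }


# Illustrative data only.
expected = "Dataset,Record,Variable,Value\nae,1,AESEQ,1\nae,2,AESEQ,2\n"
actual = "Dataset,Record,Variable,Value\nae,2,AESEQ,2\nae,3,AESEQ,3\n"
diff = compare_csv_text(expected, actual)
print(diff)
```

Using `Counter` rather than `set` keeps duplicate rows significant, which matters if the same issue is legitimately reported more than once.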

cdisc_rules_engine/constants/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 
 NULL_FLAVORS = ["", None, {}, {None}, [], [None], np.nan]
 
-KNOWN_REPORT_EXTENSIONS = [".json", ".xlsx", ".xls"]
+KNOWN_REPORT_EXTENSIONS = [".json", ".xlsx", ".xls", ".csv"]
 
 VALIDATION_FORMATS_MESSAGE = (
     "SAS V5 XPT, Dataset-JSON (JSON or NDJSON), or Excel (XLSX)"

cdisc_rules_engine/enums/report_types.py

Lines changed: 1 addition & 0 deletions
@@ -4,3 +4,4 @@
 class ReportTypes(BaseEnum):
     XLSX = "XLSX"
     JSON = "JSON"
+    CSV = "CSV"

cdisc_rules_engine/services/reporting/base_report_data.py

Lines changed: 9 additions & 1 deletion
@@ -1,4 +1,4 @@
-from abc import ABC
+from abc import ABC, abstractmethod
 from io import IOBase
 from typing import Iterable
 
@@ -53,3 +53,11 @@ def process_values(values: list[str]) -> list[str]:
             else:
                 processed_values.append(value)
         return processed_values
+
+    @abstractmethod
+    def get_csv_rows(self) -> tuple[list[str], list[list[str]]]:
+        """
+        Return (header, rows) for the CSV output format.
+        Each row is a list of string values matching the header columns.
+        """
+        pass
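
Marking `get_csv_rows` as abstract means that, assuming `BaseReportData` derives from `ABC` as the import suggests, a subclass that does not implement the method can no longer be instantiated. The sketch below shows that behaviour with stand-in classes; `ReportDataSketch`, `IncompleteData`, and `CompleteData` are hypothetical names, not classes from the engine.

```python
from abc import ABC, abstractmethod


class ReportDataSketch(ABC):
    """Hypothetical stand-in for BaseReportData, reduced to the new hook."""

    @abstractmethod
    def get_csv_rows(self) -> tuple[list[str], list[list[str]]]:
        """Return (header, rows) for the CSV output format."""


class IncompleteData(ReportDataSketch):
    """Does not implement get_csv_rows, so it stays abstract."""


class CompleteData(ReportDataSketch):
    def get_csv_rows(self) -> tuple[list[str], list[list[str]]]:
        return ["Dataset", "Record"], [["ae", "1"]]


try:
    IncompleteData()
    instantiated = True
except TypeError:
    # ABCMeta refuses to instantiate a class with unimplemented abstract methods
    instantiated = False

header, rows = CompleteData().get_csv_rows()
print(instantiated, header, rows)
```

The commit supplies implementations for the SDTM and USDM report data classes; any other concrete subclass, if one exists, would need its own.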
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+import csv
+import os
+from io import IOBase
+from typing import override
+
+from cdisc_rules_engine.enums.report_types import ReportTypes
+from cdisc_rules_engine.models.validation_args import Validation_args
+from cdisc_rules_engine.services.reporting.base_report_data import BaseReportData
+
+from .base_report import BaseReport
+
+
+class CsvReport(BaseReport):
+    """
+    Writes a results.csv file in the format defined by the report standard,
+    compatible with the cdisc-open-rules test harness baselines.
+    """
+
+    def __init__(
+        self,
+        report_standard: BaseReportData,
+        args: Validation_args,
+        template: IOBase | None = None,
+    ):
+        super().__init__(report_standard, args, template)
+
+    @property
+    @override
+    def _file_ext(self) -> str:
+        return ReportTypes.CSV.value.lower()
+
+    @override
+    def write_report(self) -> None:
+        output_dir = os.path.dirname(self._output_name)
+        if output_dir:
+            os.makedirs(output_dir, exist_ok=True)
+
+        header, rows = self._report_standard.get_csv_rows()
+
+        with open(self._output_name, "w", newline="", encoding="utf-8") as fh:
+            writer = csv.writer(fh)
+            writer.writerow(header)
+            writer.writerows(rows)
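
`write_report` relies on the default behaviour of `csv.writer`: fields containing commas or quotes are quoted, and rows end with `\r\n`, which is why the file is opened with `newline=""`. A small sketch of that behaviour, writing to an in-memory buffer instead of a file; the row values are invented for illustration.

```python
import csv
import io

header = ["Dataset", "Record", "Variable", "Value"]
rows = [
    ["ae", "1", "AETERM", "Headache, severe"],  # embedded comma forces quoting
    ["ae", "2", "AETERM", 'He said "ouch"'],  # embedded quotes are doubled
]

# newline="" stops the text layer from translating the writer's line endings
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
text = buffer.getvalue()

# Reading the text back recovers the original values unchanged.
round_trip = list(csv.reader(io.StringIO(text, newline="")))
print(text)
```

A baseline comparison that parses both files with `csv.reader` is therefore insensitive to quoting style, whereas a plain text diff would not be.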

cdisc_rules_engine/services/reporting/report_factory.py

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@
 from .base_report import BaseReport
 from .excel_report import ExcelReport
 from .json_report import JsonReport
+from .csv_report import CsvReport
 
 
 class ReportFactory:
@@ -46,6 +47,7 @@ def __init__(
         self._output_type_service_map: dict[str, Type[BaseReport]] = {
             ReportTypes.XLSX.value: ExcelReport,
             ReportTypes.JSON.value: JsonReport,
+            ReportTypes.CSV.value: CsvReport,
         }
         self._standard_type_map: dict[str, Type[BaseReportData]] = {
             "usdm": USDMReportData,
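
Registering `CsvReport` in `_output_type_service_map` is all the factory needs in order to recognise the new format, since the map is keyed by the enum's string value. How `ReportFactory` consumes the map is not shown in this diff; the sketch below is one plausible lookup, with empty stand-in classes and a hypothetical `reports_for` function.

```python
from enum import Enum


class ReportTypes(Enum):
    XLSX = "XLSX"
    JSON = "JSON"
    CSV = "CSV"


# Empty stand-ins for the engine's report classes.
class ExcelReport: ...
class JsonReport: ...
class CsvReport: ...


OUTPUT_TYPE_MAP: dict[str, type] = {
    ReportTypes.XLSX.value: ExcelReport,
    ReportTypes.JSON.value: JsonReport,
    ReportTypes.CSV.value: CsvReport,
}


def reports_for(formats: list[str]) -> list[type]:
    """Resolve requested format names to report classes (hypothetical helper)."""
    unknown = [name for name in formats if name not in OUTPUT_TYPE_MAP]
    if unknown:
        raise ValueError(f"Unsupported output format(s): {unknown}")
    return [OUTPUT_TYPE_MAP[name] for name in formats]


selected = reports_for(["JSON", "CSV"])
print([cls.__name__ for cls in selected])
```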

cdisc_rules_engine/services/reporting/sdtm_report_data.py

Lines changed: 12 additions & 0 deletions
@@ -347,6 +347,18 @@ def _generate_error_details(
             )
         return errors
 
+    def get_csv_rows(self) -> tuple[list[str], list[list[str]]]:
+        header = ["Dataset", "Record", "Variable", "Value"]
+        rows = []
+        for issue in self.data_sheets.get("Issue Details", []):
+            dataset = (issue.get("dataset") or "").removesuffix(".csv")
+            record = str(issue.get("row", ""))
+            variables = issue.get("variables") or []
+            values = issue.get("values") or []
+            for variable, value in zip(variables, values):
+                rows.append([dataset, record, variable, str(value)])
+        return header, rows
+
     def get_rules_report_data(self) -> list[dict]:
         """
         Generates the rules report data that goes into the excel export.
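
The SDTM `get_csv_rows` turns each issue into one CSV row per (variable, value) pair, with the dataset name stripped of a trailing `.csv`. The sketch below repeats the same logic for a single issue as a standalone function; the dictionary keys come from the diff, while `flatten_issue` and the sample values are made up for illustration.

```python
def flatten_issue(issue: dict) -> list[list[str]]:
    """Expand one issue into rows of [Dataset, Record, Variable, Value]."""
    dataset = (issue.get("dataset") or "").removesuffix(".csv")
    record = str(issue.get("row", ""))
    variables = issue.get("variables") or []
    values = issue.get("values") or []
    return [
        [dataset, record, variable, str(value)]
        for variable, value in zip(variables, values)
    ]


# Illustrative issue; real issues come from the "Issue Details" data sheet.
issue = {
    "dataset": "ae.csv",
    "row": 3,
    "variables": ["AESTDTC", "AEENDTC"],
    "values": ["2020-01-05", "2020-01-02"],
}
rows = flatten_issue(issue)
print(rows)
```

An issue with no variables yields no rows at all, so such an issue would not appear in the CSV output.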

cdisc_rules_engine/services/reporting/usdm_report_data.py

Lines changed: 11 additions & 0 deletions
@@ -245,6 +245,17 @@ def _generate_error_details(
             )
         return errors
 
+    def get_csv_rows(self) -> tuple[list[str], list[list[str]]]:
+        header = ["path", "attribute", "value"]
+        rows = []
+        for issue in self.data_sheets.get("Issue Details", []):
+            path = issue.get("path") or ""
+            attributes = issue.get("attributes") or []
+            values = issue.get("values") or []
+            for attribute, value in zip(attributes, values):
+                rows.append([path, attribute, str(value)])
+        return header, rows
+
     def get_rules_report_data(self) -> list[dict]:
         """
         Generates the rules report data that goes into the excel export.

docs/cli-reference.md

Lines changed: 8 additions & 8 deletions
@@ -69,14 +69,14 @@ python core.py validate --help
 
 ### Output
 
-| Flag | Description |
-| -------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
-| `-o, --output TEXT` | Output file path (without extension). Extension is added automatically based on format. |
-| `-of, --output-format [JSON\|XLSX]` | Output format. |
-| `-rr, --raw-report` | Raw output format (JSON only). |
-| `-mr, --max-report-rows INTEGER` | Max rows in the Issue Details tab of Excel output (default: 1000; 0 = unlimited). Also via `MAX_REPORT_ROWS` env var. |
-| `-me, --max-errors-per-rule INTEGER BOOLEAN` | Limit errors per rule. Format: `-me <limit> <per_dataset_flag>`. See below. |
-| `-rt, --report-template TEXT` | Path to a custom Excel report template. |
+| Flag | Description |
+| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `-o, --output TEXT` | Output file path (without extension). Extension is added automatically based on format. |
+| `-of, --output-format [JSON\|XLSX\|CSV]` | Output format. `CSV` writes issue rows directly (Dataset, Record, Variable, Value) compatible with the open-rules test harness. |
+| `-rr, --raw-report` | Raw output format (JSON only). |
+| `-mr, --max-report-rows INTEGER` | Max rows in the Issue Details tab of Excel output (default: 1000; 0 = unlimited). Also via `MAX_REPORT_ROWS` env var. |
+| `-me, --max-errors-per-rule INTEGER BOOLEAN` | Limit errors per rule. Format: `-me <limit> <per_dataset_flag>`. See below. |
+| `-rt, --report-template TEXT` | Path to a custom Excel report template. |
 
 #### `--max-errors-per-rule` Detail
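
For callers that drive the CLI from Python, the new format only changes the value passed to `-of`. The sketch below assembles the output-related part of a `core.py validate` command using only the flags documented in the table above; `build_validate_command` is a hypothetical helper, and the arguments that select the standard and the input data are deliberately left out because this diff does not show them.

```python
import sys


def build_validate_command(output_stem: str, output_format: str = "CSV") -> list[str]:
    """Build the output-related part of a validate command (hypothetical helper).

    output_stem carries no extension: per the docs, the CLI appends it
    based on the chosen format.
    """
    allowed = {"JSON", "XLSX", "CSV"}
    if output_format not in allowed:
        raise ValueError(f"output_format must be one of {sorted(allowed)}")
    return [
        sys.executable,
        "core.py",
        "validate",
        "-of",
        output_format,
        "-o",
        output_stem,
    ]


cmd = build_validate_command("results/results")
print(" ".join(cmd[1:]))
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems when paths contain spaces.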
