CNF-23534: Add automated OpenAPI schema generation for PolicyGenerator list merging#787
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds OpenAPI schema generation tooling, config and build wiring, generated schema artifacts, documentation updates, and a scheduled GitHub Actions workflow that regenerates schemas and opens an automated pull request. ChangesOpenAPI Schema Generation
Sequence Diagram(s)sequenceDiagram
participant GitHubActions
participant Makefile
participant GenerateConfig
participant ExtractSchema
participant CreatePullRequest
GitHubActions->>Makefile: run make generate-openapi-schemas
Makefile->>GenerateConfig: regenerate hack/crd-schema-config.json
Makefile->>ExtractSchema: extract telco-ran schema.openapi
Makefile->>ExtractSchema: extract telco-core schema.openapi
GitHubActions->>CreatePullRequest: open update-openapi-schemas PR
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@irinamihai: This pull request references CNF-23534 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/hold for further testing |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
.github/workflows/openapi-schema-check.yml (1)
25-25: ⚡ Quick winPin PyYAML to a specific version for reproducibility.
Installing PyYAML without a version constraint means the workflow will fetch whatever version is latest at runtime. If PyYAML releases breaking changes, the workflow could fail unpredictably.
📌 Proposed fix to pin PyYAML version
- pip install PyYAML + pip install PyYAML==6.0.1Update the version number as needed to match the version tested with your scripts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/openapi-schema-check.yml at line 25, Replace the unpinned installer command "pip install PyYAML" in the GitHub Actions job with a pinned version (e.g., "pip install PyYAML==<version>") to ensure reproducible runs; pick and lock the exact PyYAML version you tested against and update that version string in the workflow step that currently runs pip install PyYAML.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/openapi-schema-check.yml:
- Line 19: Replace semantic version tags for GitHub Actions with pinned commit
SHAs to satisfy supply-chain requirements: update usages of actions/checkout@v4
and peter-evans/create-pull-request@v7 to their corresponding full commit SHAs
(e.g., actions/checkout@<sha> and peter-evans/create-pull-request@<sha>), fetch
the latest stable commit SHAs from each action's repository first, and commit
the workflow changes so the workflow references the fixed SHAs instead of the
version tags.
- Around line 23-24: The workflow currently downloads yq_linux_amd64 and makes
it executable without integrity checks; update the steps around the curl and
chmod so you first fetch or define an expected checksum (e.g.,
YQ_CHECKSUM/YQ_VERSION), verify the downloaded file's SHA256 (or other selected
hash) against that expected value using a verification tool (sha256sum or shasum
-a 256) and only call chmod +x /usr/local/bin/yq and move it into the PATH if
the checksum matches; ensure the job fails if verification fails and document
updating YQ_VERSION and YQ_CHECKSUM to the intended release.
In `@hack/extract-schema.py`:
- Around line 236-240: The branch that handles "x-kubernetes-patch-strategy" can
KeyError on schema["x-kubernetes-patch-merge-key"]; update the logic in the
block that builds result (the code using the local variable schema and creating
result) to read the merge key via schema.get("x-kubernetes-patch-merge-key") and
only include the "x-kubernetes-patch-merge-key" entry in the result when that
get() returns a non-None value (keep the "x-kubernetes-patch-strategy" entry as
before and preserve the default for "type").
In `@hack/generate-schema-config.py`:
- Around line 66-70: The aggregation logic that populates results[pkg_name] with
"channel" and "components" silently ignores when the same spec.name (pkg_name)
appears with a different channel; update the block that checks pkg_name and
channel (where results is built and
results[pkg_name]["components"].add(component) is called) to detect if
results[pkg_name]["channel"] already exists and differs from the current
channel, and on that conflict raise an error (or exit non‑zero) with a clear
message referencing the spec.name/pkg_name and both channels; do not merge
components across differing channels.
---
Nitpick comments:
In @.github/workflows/openapi-schema-check.yml:
- Line 25: Replace the unpinned installer command "pip install PyYAML" in the
GitHub Actions job with a pinned version (e.g., "pip install PyYAML==<version>")
to ensure reproducible runs; pick and lock the exact PyYAML version you tested
against and update that version string in the workflow step that currently runs
pip install PyYAML.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 00891ba2-3175-4320-88cf-d34e51e42032
📒 Files selected for processing (11)
.github/workflows/openapi-schema-check.ymlMakefileVERSION_UPDATE_GUIDE.mdhack/crd-schema-config.jsonhack/extract-schema.pyhack/generate-schema-config.pytelco-core/configuration/schema.openapitelco-ran/configuration/argocd/example/acmpolicygenerator/README.mdtelco-ran/configuration/argocd/example/acmpolicygenerator/hub-side-templating/schema.openapitelco-ran/configuration/argocd/example/acmpolicygenerator/hub-side-templating/schema.openapitelco-ran/configuration/argocd/example/acmpolicygenerator/schema.openapi
| if pkg_name and channel: | ||
| if pkg_name not in results: | ||
| results[pkg_name] = {"channel": channel, "components": set()} | ||
| results[pkg_name]["components"].add(component) | ||
|
|
There was a problem hiding this comment.
Detect and fail on cross-component channel conflicts per package.
If the same spec.name is found with different channels, current logic silently keeps the first channel and merges components, which can generate wrong refs and schemas.
Proposed fix
if pkg_name and channel:
if pkg_name not in results:
results[pkg_name] = {"channel": channel, "components": set()}
+ elif results[pkg_name]["channel"] != channel:
+ raise RuntimeError(
+ f"Conflicting channels for package '{pkg_name}': "
+ f"'{results[pkg_name]['channel']}' vs '{channel}'"
+ )
results[pkg_name]["components"].add(component)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hack/generate-schema-config.py` around lines 66 - 70, The aggregation logic
that populates results[pkg_name] with "channel" and "components" silently
ignores when the same spec.name (pkg_name) appears with a different channel;
update the block that checks pkg_name and channel (where results is built and
results[pkg_name]["components"].add(component) is called) to detect if
results[pkg_name]["channel"] already exists and differs from the current
channel, and on that conflict raise an error (or exit non‑zero) with a clear
message referencing the spec.name/pkg_name and both channels; do not merge
components across differing channels.
|
/unhold |
| contents: write | ||
| pull-requests: write |
There was a problem hiding this comment.
Are there finer grained permissions that can be granted. These are pretty broad.
There was a problem hiding this comment.
These are already the minimum required permissions for peter-evans/create-pull-request - contents: write to push the commit and pull-requests: write to create the PR. This matches the permissions used by doc-updater.yml in this repo. There are no finer-grained alternatives for this use case where we want to automatically create PRs.
What restrictions/rules did you have in mind?
There was a problem hiding this comment.
Will PolicyGenerator follow the redirect path?
Also nit: newline
There was a problem hiding this comment.
PolicyGenerator rejects symlinks that resolve outside the directory tree. The file is kept as a regular copy of the parent schema.openapi. The following work to restructure the directories to resemble telco-core plus the removal of the non templated PolicyGenerator examples will consolidate this so there's a single schema.openapi co-located with the PolicyGenerator CRs.
For now, I'm going to leave the full content in this file and update the Makefile to copy the generated schema in here.
| "Static fields (source, merge_keys, version, package_name, ref_rule)", | ||
| "are manually maintained.", | ||
| "Dynamic fields (subscription_channel, components, source.github.ref)", | ||
| "are updated by: python3 hack/generate-schema-config.py", |
There was a problem hiding this comment.
Would it make sense to separate the static and dynamic/derived data into separate files? Perhaps yaml file listing the static (human generated) data.
Do we need persistence of the dynamic data, or is it for archival/informational purposes (ie a log of what was found)?
There was a problem hiding this comment.
The dynamic fields (subscription_channel, components, source.github.ref) are needed by extract-schema.py to know which GitHub branch to fetch CRDs from - they're not just informational. generate-schema-config.py updates them in-place while preserving the static fields (source.github.{owner,repo,path},merge_keys, version, ref_rule).
Splitting into two files would add complexity to the merge/read logic, since the round-trip update preserves static fields and git diff shows exactly what changed. The _comment block at the top documents which fields are static vs dynamic.
We can revisit if the config grows significantly, but for 6 CRD entries the single file keeps things simple.
WDYT?
…onfiguration
Description:
Add automated OpenAPI schema generation for PolicyGenerator list merging
Two-step pipeline to dynamically generate schema.openapi files from
upstream CRDs, replacing the previously hand-crafted schemas:
1. generate-schema-config.py scans Subscription CRs to update
crd-schema-config.json with current channels and git refs
2. extract-schema.py downloads CRDs and extracts minimal merge
directive schemas for ACM PolicyGenerator
Adds nightly GitHub Actions workflow to auto-PR when schemas drift.
Description: - check openapi schema workflow - security hardening - warning for channel clash across core vs ran - README updates
… restructure is complete
b549584 to
1a5246f
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
hack/generate-schema-config.py (1)
66-77: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winAbort on channel conflicts instead of warning and continuing.
This still collapses both components into one
results[pkg_name]entry after the warning.main()then stamps onesubscription_channeland derivedrefonto every matching config entry, so one component can silently regenerate schemas from the wrong branch. Also, the suggested “split the entry” workaround cannot work with the currentsubscriptions[pkg_name]lookup model.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hack/generate-schema-config.py` around lines 66 - 77, The channel conflict handling in generate-schema-config.py should abort instead of printing a warning and continuing, because the current results[pkg_name] merge still drives one subscription_channel and ref for all components. Update the logic around the results[pkg_name] / components accumulation so that a mismatched channel in main() or the package-processing loop raises an error and exits immediately, and make sure the subscriptions[pkg_name] lookup path does not silently reuse a single entry for conflicting components.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hack/extract-schema.py`:
- Around line 293-303: The version selection in the schema extraction flow is
too naive and can pick the wrong API version. Update the logic around the
preferred_version handling and the loop over versions so it follows the
documented CRD order: prefer the requested version, otherwise choose the storage
version, and then the first served version that has a schema; use the existing
versions list and schema lookup in extract-schema.py to filter out deprecated or
served=false entries before emitting directives.
- Around line 190-194: The map-list patch metadata in the schema extractor is
synthesizing an invalid comma-delimited `x-kubernetes-patch-merge-key`, which
only supports a single field name. Update the `extract-schema.py` logic around
the `is_map_list`/`map_keys` handling to either reject multi-key list-maps
explicitly or require an explicit `merge_keys` override, and only set
`x-kubernetes-patch-merge-key` when there is exactly one valid key.
In `@hack/generate-schema-config.py`:
- Around line 123-133: The CRD config update loop leaves stale dynamic fields
behind when a package_name no longer matches a Subscription. Update the config
processing in generate-schema-config.py so the per-entry fields populated from
sub are also removed or reset when sub is None, specifically clearing
subscription_channel and components before writing back the entry. Keep the
existing derive_ref and entry["source"]["github"]["ref"] update behavior intact,
but ensure removed operators no longer retain old component filters in the
generated JSON.
---
Duplicate comments:
In `@hack/generate-schema-config.py`:
- Around line 66-77: The channel conflict handling in generate-schema-config.py
should abort instead of printing a warning and continuing, because the current
results[pkg_name] merge still drives one subscription_channel and ref for all
components. Update the logic around the results[pkg_name] / components
accumulation so that a mismatched channel in main() or the package-processing
loop raises an error and exits immediately, and make sure the
subscriptions[pkg_name] lookup path does not silently reuse a single entry for
conflicting components.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 1c6d0c68-6b76-4385-988d-f00d45c6799e
📒 Files selected for processing (10)
.github/workflows/openapi-schema-check.ymlMakefileVERSION_UPDATE_GUIDE.mdhack/crd-schema-config.jsonhack/extract-schema.pyhack/generate-schema-config.pytelco-core/configuration/schema.openapitelco-ran/configuration/argocd/example/acmpolicygenerator/README.mdtelco-ran/configuration/argocd/example/acmpolicygenerator/hub-side-templating/schema.openapitelco-ran/configuration/argocd/example/acmpolicygenerator/schema.openapi
✅ Files skipped from review due to trivial changes (2)
- hack/crd-schema-config.json
- telco-ran/configuration/argocd/example/acmpolicygenerator/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- telco-core/configuration/schema.openapi
- .github/workflows/openapi-schema-check.yml
- telco-ran/configuration/argocd/example/acmpolicygenerator/hub-side-templating/schema.openapi
- telco-ran/configuration/argocd/example/acmpolicygenerator/schema.openapi
| """ | ||
| ref_rule = entry.get("ref_rule") | ||
| if not ref_rule: | ||
| return entry["source"]["github"].get("ref") |
There was a problem hiding this comment.
Add handling for missing keys (or use get() with default {}).
| with open(config_path) as f: | ||
| config = json.load(f) |
There was a problem hiding this comment.
Add FileNotFoundError handling
| req.add_header("Authorization", f"token {token}") | ||
|
|
||
| try: | ||
| resp = urllib.request.urlopen(req) |
There was a problem hiding this comment.
Claude suggests adding timeout:
| resp = urllib.request.urlopen(req) | |
| resp = urllib.request.urlopen(req, timeout=30) |
Description: - Reject multi-key list-maps in x-kubernetes-patch-merge-key (single key only) - Prefer storage version when selecting CRD API version - Clear stale subscription_channel and components when operator is removed - Use chained .get() for missing keys in derive_ref - Add FileNotFoundError handling for config file - Add timeout=30 to urlopen
6bfc7b1 to
e1e3b43
Compare
| "merge_keys": { | ||
| "spec.hugepages.pages": "size" |
There was a problem hiding this comment.
I don't think this is the correct merge key for hugepages. You can spec hugepages per numa node so there may be multiple entries for 1G. For this field I don't think we can do an intelligent merge, the user needs to provide the full set of content.
There was a problem hiding this comment.
Yes, good point. I'll leave this empty for now, to make it obvious that the PerformanceProfile CRD was considered, but no relevant merge keys were identified.
Description: - Per-NUMA hugepages entries can share the same size, so size is not a valid merge key. Users must provide the full pages list. - No relevant merge keys for PerformanceProfile as of yet, keeping it in the crd-schema-config.json for awareness that the CRD was considered in the current openAPI schema generation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: imiller0, irinamihai The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add automated OpenAPI schema generation for PolicyGenerator list merging
Two-step pipeline to dynamically generate schema.openapi files from upstream CRDs, replacing the previously hand-crafted schemas:
Adds nightly GitHub Actions workflow to auto-PR when schemas drift.