Add scanned_manifests_path metadata to snapshots#14406
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a new scanned_manifest_paths metadata field to dependency graph snapshots. This field is always populated (unlike the manifests property which can be empty on errors), allowing the downstream snapshot service to verify which manifests were scanned regardless of the snapshot's outcome (success, failure, degraded, or skipped).
Changes:
- Added a
scanned_manifest_pathsprivate method toDependencySubmissionthat returns an array containing the ecosystem and directory path of the scanned manifest. - Included the new
scanned_manifest_pathsfield in the snapshot'smetadatahash within thepayloadmethod. - Added test assertions in both
dependency_submission_spec.rbandupdate_graph_processor_spec.rbto verify the new metadata field across all snapshot status scenarios (success, failed, skipped, degraded, subdirectory).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
updater/lib/github_api/dependency_submission.rb |
Adds scanned_manifest_paths method and includes it in the payload metadata |
updater/spec/github_api/dependency_submission_spec.rb |
Adds test assertion for the new metadata field in the "generates submission metadata correctly" test |
updater/spec/dependabot/update_graph_processor_spec.rb |
Adds test assertions for the new metadata field across 4 snapshot status scenarios |
02f9c81 to
a799ef5
Compare
|
We are being strict about the validation of the metadata as just string keys, so I'm going to work within those constraints for now. This does lock us in a little one |
3ecef0a to
7af435d
Compare
| # Returns a synopsis of the scan performed in the format `ecosystem::manifest_path`, e.g. | ||
| # - `golang::/` | ||
| # - `rubygems::rails_app/` | ||
| # | ||
| sig do | ||
| returns(String) | ||
| end | ||
| def scanned_manifest_path | ||
| "#{GithubApi::EcosystemMapper.ecosystem_for(package_manager)}::#{File.dirname(manifest_file.path)}" | ||
| end |
There was a problem hiding this comment.
💭 We could use the job_correlator for this but it has some handling to ensure we deterministically hash long paths that I don't want to undo as it would change how historical snapshots are compared using these keys.
Rather than overload the responsibility of this value, which is firmly in the snapshot domain, with the requirement to track the job 'receipt' of a snapshot, let's keep these isolated.
| # TODO: Move use of metadata to a Dependabot-specific object | ||
| # | ||
| # We are using the existing job metadata as a bag-of-values for error handling | ||
| # and job tracking that is specific to Dependabot-created submissions. | ||
| # | ||
| # In future, we should extend the public API schema with a validated object to | ||
| # harden this contract. |
There was a problem hiding this comment.
We are tracking this internally, I'm just adding a note here to make sure we keep line of sight that using metadata as a bag of values is something we will need to migrate off of in future.
The values we store here are all of short-term relevant, e.g. they only matter to the actual POST request that submits the snapshot or for state reporting when this snapshot is on the tip of a branch so we don't need to worry about backfilling or handling historical values.
7af435d to
0c3042b
Compare
What are you trying to accomplish?
We have started to track the snapshots received vs those requested from Dependabot in the service using the
manifest[].ecosystemandmanifest[].file.source_locationbut this only allows us to track 'happy path' outcomes.In case where a manifest has been deleted or there was a fatal error processing the project, we will have an empty
manifestscollection in the snapshot.To solve this problem, let's start adding an explicit
scanned_manifest_pathinventory to the metadata that will always be populated regardless of the snapshot outcome.Anything you want to highlight for special attention from reviewers?
This is using the current pattern of attaching these values to the snapshots top-level
metadatakey, in future we will extend the JSON definition with a Dependabot-specific object to validate and track the values a little more concretely, but for now this is good enough.How will you know you've accomplished your goal?
I will see the new
scanned_manifest_pathappear in any snapshots I generate from my test repos.I will also see the smoke tests for graph jobs fail on this PR until I generate an updated test expectation.
Checklist