Store error taxonomy data in the analytics database instead of OpenSearch #698
mvandenburgh wants to merge 6 commits
Conversation
Since we'd also like to use this Django app for error taxonomy upload (and potentially any other analytics-related things beyond build timings in the future), I refactored this Docker image by:
- Renaming it to something more generic (`ci-analytics`)
- Removing the invocation of the build timing upload script from the Dockerfile, and instead specifying it in the job template.
I kept the old one there for now instead of deleting it. Once we confirm that the new analytics DB-based workflow works, I'll remove it.
Both of their job-template.yaml files got updated, so we need to bump these as well.
```python
for container in job_template["spec"]["template"]["spec"]["containers"]:
    container.setdefault("env", []).extend(
        [dict(name="JOB_INPUT_DATA", value=json.dumps(job_input_data))]
    )
for template in ("job-template.yaml", "job-template-old.yaml"):
```
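For context, the snippet above injects the serialized job payload into each container's environment, creating the `env` list if it doesn't exist yet. A minimal standalone sketch of that pattern, using a simplified stand-in for the parsed job template (the dict contents here are hypothetical, not from the real template):

```python
import json

# Hypothetical, simplified stand-in for the parsed job-template.yaml
job_template = {
    "spec": {"template": {"spec": {"containers": [{"name": "worker"}]}}}
}
job_input_data = {"job_id": 123}

# Same pattern as the PR: create "env" if absent, then append the payload
for container in job_template["spec"]["template"]["spec"]["containers"]:
    container.setdefault("env", []).extend(
        [dict(name="JOB_INPUT_DATA", value=json.dumps(job_input_data))]
    )

print(job_template["spec"]["template"]["spec"]["containers"][0]["env"])
# → [{'name': 'JOB_INPUT_DATA', 'value': '{"job_id": 123}'}]
```

`setdefault("env", [])` returns the existing `env` list when one is present, so containers that already declare environment variables keep them and simply gain the new entry.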
I think we should just switch over to the new job template. If it fails we'll see it in either sentry or just by monitoring the cluster. The worst case scenario is that we lose the error taxonomy for a few jobs, which isn't really a problem. And in a situation where we need to immediately remedy the situation (not sure what that would be), we can always revert this PR.
I don't have a Grafana dashboard ready for the new one yet, since I wanted to wait until we have some data in the database to make one. If we remove the OpenSearch job now, we'll lose access to the error taxonomy dashboard until I get the new one working.
Failed jobs don't have a `Job` record saved.
Is this PR obsolete now that #749 has been merged?
That PR consolidates the app that handles creating the OpenSearch record into the Django app, but the data is still going to OpenSearch. The goal of this PR is to store the error taxonomy classification itself in the analytics database, removing the need to push that data to OpenSearch at all.
Superseded by #761
@jjnesbitt I tried to describe my approach here in the commit messages. Let me know if you have any suggestions regarding the design I took to add in the error processor; I'm not 100% satisfied with it, but this was the best solution I managed to come up with given that we need everything to be packaged in docker images.
I also tried to structure this with an eye towards unifying the two "job processor" FastAPI servers in the future.