Skip to content

Store error taxonomy data in the analytics database instead of OpenSearch#698

Closed
mvandenburgh wants to merge 6 commits into
mainfrom
analytics-db-error-taxonomy
Closed

Store error taxonomy data in the analytics database instead of OpenSearch#698
mvandenburgh wants to merge 6 commits into
mainfrom
analytics-db-error-taxonomy

Conversation

@mvandenburgh
Copy link
Copy Markdown
Member

@mvandenburgh mvandenburgh commented Nov 10, 2023

@jjnesbitt I tried to describe my approach here in the commit messages. Let me know if you have any suggestions regarding the design I took to add in the error processor; I'm not 100% satisfied with it, but this was the best solution I managed to come up with given that we need everything to be packaged in docker images.

I also tried to structure this with an eye towards unifying the two "job processor" fastapi servers in the future.

Since we'd also like to use this Django app for error taxonomy upload
(and potentially, any other analytics-related things other than build
timings in the future), I refactored this docker image by
- Renaming it to have a more generic name (`ci-analytics`)
- Removing the invocation of the build timing upload script from the
Docker file, and instead specifying it in the job template.
I kept the old one there for now instead of deleting it. Once we
confirm that the new analytics DB-based workflow works, I'll remove it.
Both of their job-template.yaml files got updated, so we need to bump
these as well.
@mvandenburgh mvandenburgh force-pushed the analytics-db-error-taxonomy branch from 19288de to faf8b3d Compare November 10, 2023 16:04
jjnesbitt
jjnesbitt previously approved these changes Nov 13, 2023
Copy link
Copy Markdown
Collaborator

@jjnesbitt jjnesbitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just a couple questions/comments.

I agree that we should unify our two webhooks. I'd like to do that sooner rather than later, maybe even before #697 is merged.

for container in job_template["spec"]["template"]["spec"]["containers"]:
container.setdefault("env", []).extend(
[dict(name="JOB_INPUT_DATA", value=json.dumps(job_input_data))]
for template in ("job-template.yaml", "job-template-old.yaml"):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just switch over to the new job template. If it fails we'll see it in either sentry or just by monitoring the cluster. The worst case scenario is that we lose the error taxonomy for a few jobs, which isn't really a problem. And in a situation where we need to immediately remedy the situation (not sure what that would be), we can always revert this PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a grafana dashboard ready for the new one yet, since I wanted to wait until we have some data in the database to make one. If we remove the opensearch job now, we'll lose access to the error taxonomy dashboard until I get the new one working.

Comment thread analytics/analytics/tests/test_upload_error_taxonomy.py
@jjnesbitt jjnesbitt dismissed their stale review November 13, 2023 17:19

Didn't mean to hit "Approve"

Failed jobs don't have a `Job` record saved.
@zackgalbreath
Copy link
Copy Markdown
Collaborator

is this PR obsolete now that #749 has been merged?

@jjnesbitt
Copy link
Copy Markdown
Collaborator

is this PR obsolete now that #749 has been merged?

That PR consolidates the app that handles creating the opensearch record into the Django app, but the data is still going to opensearch. The goal of this PR is to store the error taxonomy classification itself into the analytics database, removing the need to push that data to opensearch at all.

@mvandenburgh
Copy link
Copy Markdown
Member Author

Superseded by #761

@mvandenburgh mvandenburgh deleted the analytics-db-error-taxonomy branch October 9, 2024 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants