Store error taxonomy data in the analytics database instead of OpenSearch by mvandenburgh · Pull Request #698 · spack/spack-infrastructure

mvandenburgh · 2023-11-10T15:57:03Z

@jjnesbitt I tried to describe my approach here in the commit messages. Let me know if you have any suggestions about the design I chose for adding the error processor; I'm not 100% satisfied with it, but it was the best solution I could come up with given that we need everything to be packaged in docker images.

I also tried to structure this with an eye towards unifying the two "job processor" fastapi servers in the future.

Since we'd also like to use this Django app for error taxonomy upload (and potentially, any other analytics-related things other than build timings in the future), I refactored this docker image by:

- Renaming it to have a more generic name (`ci-analytics`)
- Removing the invocation of the build timing upload script from the Dockerfile, and instead specifying it in the job template.
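As a rough illustration of moving the script invocation out of the image and into the job template, the sketch below builds Kubernetes Job manifests as plain Python dicts and sets the container `command` per job. The script names, job names, and image tag are hypothetical; the PR's actual `job-template.yaml` is not shown on this page.

```python
import copy

# Hypothetical base manifest: one generic image with no script baked into it.
BASE_JOB = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "placeholder"},
    "spec": {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [
                    {"name": "ci-analytics", "image": "ci-analytics:latest"}
                ],
            }
        }
    },
}


def job_for_script(job_name, command):
    """Return a copy of BASE_JOB whose container runs the given command."""
    job = copy.deepcopy(BASE_JOB)  # leave the shared base manifest untouched
    job["metadata"]["name"] = job_name
    container = job["spec"]["template"]["spec"]["containers"][0]
    container["command"] = list(command)
    return job


# Hypothetical script names: the same image serves both upload jobs.
timing_job = job_for_script("build-timing", ["python", "upload_build_timings.py"])
taxonomy_job = job_for_script("error-taxonomy", ["python", "upload_error_taxonomy.py"])
```

With this shape, adding another analytics job means adding a template entry rather than building another image.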

I kept the old one there for now instead of deleting it. Once we confirm that the new analytics DB-based workflow works, I'll remove it.

Both of their job-template.yaml files got updated, so we need to bump these as well.

jjnesbitt

Overall looks good, just a couple questions/comments.

I agree that we should unify our two webhooks. I'd like to do that sooner rather than later, maybe even before #697 is merged.

jjnesbitt · 2023-11-13T15:47:33Z

-    for container in job_template["spec"]["template"]["spec"]["containers"]:
-        container.setdefault("env", []).extend(
-            [dict(name="JOB_INPUT_DATA", value=json.dumps(job_input_data))]
+    for template in ("job-template.yaml", "job-template-old.yaml"):


I think we should just switch over to the new job template. If it fails we'll see it in either sentry or just by monitoring the cluster. The worst case scenario is that we lose the error taxonomy for a few jobs, which isn't really a problem. And in a situation where we need to immediately remedy the situation (not sure what that would be), we can always revert this PR.

I don't have a grafana dashboard ready for the new one yet, since I wanted to wait until we have some data in the database to make one. If we remove the opensearch job now, we'll lose access to the error taxonomy dashboard until I get the new one working.
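For reference, the dual-template approach under discussion can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the templates are hypothetical in-memory dicts standing in for the parsed `job-template.yaml` and `job-template-old.yaml` files, and `build_jobs` is an assumed helper name.

```python
import copy
import json

# Hypothetical stand-ins for the two parsed YAML templates.
TEMPLATES = {
    "job-template.yaml": {
        "spec": {"template": {"spec": {"containers": [{"name": "analytics-db"}]}}}
    },
    "job-template-old.yaml": {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "opensearch",
                            "env": [{"name": "EXISTING", "value": "1"}],
                        }
                    ]
                }
            }
        }
    },
}


def build_jobs(job_input_data, templates=TEMPLATES):
    """Return one job manifest per template, each carrying JOB_INPUT_DATA."""
    jobs = {}
    for name, template in templates.items():
        job = copy.deepcopy(template)  # never mutate the shared template
        for container in job["spec"]["template"]["spec"]["containers"]:
            # Append to any existing env list, creating it if absent.
            container.setdefault("env", []).append(
                {"name": "JOB_INPUT_DATA", "value": json.dumps(job_input_data)}
            )
        jobs[name] = job
    return jobs
```

Dropping the old workflow would then amount to removing the `job-template-old.yaml` entry from the mapping.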

Didn't mean to hit "Approve"

Failed jobs don't have a `Job` record saved.

zackgalbreath · 2024-02-05T15:03:08Z

is this PR obsolete now that #749 has been merged?

jjnesbitt · 2024-02-14T15:34:48Z

> is this PR obsolete now that #749 has been merged?

That PR consolidates the app that handles creating the opensearch record into the Django app, but the data is still going to opensearch. The goal of this PR is to store the error taxonomy classification itself in the analytics database, removing the need to push that data to opensearch at all.

mvandenburgh · 2024-10-09T18:11:58Z

Superseded by #761

mvandenburgh added 2 commits November 10, 2023 10:50

Add error taxonomy script to analytics app

269d5aa

mvandenburgh requested a review from jjnesbitt November 10, 2023 15:57

mvandenburgh added 2 commits November 10, 2023 11:04

Add new job template for analytics db error processor

42844d3

I kept the old one there for now instead of deleting it. Once we confirm that the new analytics DB-based workflow works, I'll remove it.

Bump job processor images

faf8b3d

Both of their job-template.yaml files got updated, so we need to bump these as well.

mvandenburgh force-pushed the analytics-db-error-taxonomy branch from 19288de to faf8b3d Compare November 10, 2023 16:04

jjnesbitt previously approved these changes Nov 13, 2023


Run analytics tests in github actions

156f91e

mvandenburgh requested a review from jjnesbitt November 14, 2023 16:13

Make job id an int instead of foreign key

e42e1f6

Failed jobs don't have a `Job` record saved.
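The reasoning behind that commit can be illustrated with a small sketch. It uses an in-memory SQLite database from the Python standard library as a stand-in for the analytics database, and the table and column names are hypothetical, not the app's actual schema. A column with a foreign key constraint rejects a row for a failed job because no parent `job` row exists, while a plain integer column accepts it.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY)")
    # Variant 1: job_id is constrained to reference an existing job row.
    conn.execute(
        "CREATE TABLE error_fk ("
        "job_id INTEGER NOT NULL UNIQUE REFERENCES job(id), taxon TEXT)"
    )
    # Variant 2: job_id is a plain integer with no reference to the job table.
    conn.execute(
        "CREATE TABLE error_int (job_id INTEGER NOT NULL UNIQUE, taxon TEXT)"
    )

    failed_job_id = 12345  # hypothetical failed job: no row in the job table

    try:
        conn.execute("INSERT INTO error_fk VALUES (?, ?)", (failed_job_id, "oom"))
        fk_insert_ok = True
    except sqlite3.IntegrityError:
        fk_insert_ok = False  # rejected: the referenced job row does not exist

    conn.execute("INSERT INTO error_int VALUES (?, ?)", (failed_job_id, "oom"))
    int_rows = conn.execute("SELECT COUNT(*) FROM error_int").fetchone()[0]
    conn.close()
    return fk_insert_ok, int_rows
```

The trade-off of the integer variant is that the database no longer guarantees the id points at a real job record, so any join against job data has to tolerate missing rows.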

mvandenburgh marked this pull request as draft December 5, 2023 19:41

mvandenburgh mentioned this pull request Jan 5, 2024

Make job_id an integer field instead of one-to-one #720

Merged

mvandenburgh closed this Oct 9, 2024

mvandenburgh deleted the analytics-db-error-taxonomy branch October 9, 2024 18:12

mvandenburgh wants to merge 6 commits into main from analytics-db-error-taxonomy
