-
-
Notifications
You must be signed in to change notification settings - Fork 4.7k
feat(seer): Add lightweight RCA clustering endpoint integration #112229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
1b7c082
98a5f16
a3bbef2
6f9d2d1
4905a86
c984fcf
2b84529
2b9b855
87ecf74
ce103c2
aa18de9
4270ced
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1343,6 +1343,19 @@ | |
| flags=FLAG_MODIFIABLE_RATE | FLAG_AUTOMATOR_MODIFIABLE, | ||
| ) | ||
|
|
||
| # Supergroups / Lightweight RCA | ||
| register( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does it make sense to duplicate the options between here and seer? i thought the original plan was to have seer do this check
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yea I was conflicted about it, I think that now I got options in Seer I can have protections there, but I also dont want to queue tasks for all issues for nothing, seems very wasteful... and now I can basically have both be controlled by the same repo so I think its ok to protect from both sides using same config
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we could just do the check only in sentry the? not sure it makes sense to duplicate the options, especially if the options have exactly the same name + purpose |
||
| "supergroups.active-rca-source", | ||
| default="explorer", | ||
| flags=FLAG_AUTOMATOR_MODIFIABLE, | ||
| ) | ||
|
cursor[bot] marked this conversation as resolved.
Outdated
|
||
| register( | ||
| "supergroups.lightweight-enabled-orgs", | ||
| type=Sequence, | ||
| default=[], | ||
| flags=FLAG_ALLOW_EMPTY | FLAG_AUTOMATOR_MODIFIABLE, | ||
| ) | ||
|
|
||
| # ## sentry.killswitches | ||
| # | ||
| # The following options are documented in sentry.killswitches in more detail | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -259,6 +259,14 @@ class SupergroupsEmbeddingRequest(TypedDict): | |||||
| artifact_data: dict[str, Any] | ||||||
|
|
||||||
|
|
||||||
| class LightweightRCAClusterRequest(TypedDict): | ||||||
| group_id: int | ||||||
| issue: dict[str, Any] | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seer has this thing where issue is the word used in APIs, I am mimicking IssueSummary endpoint here, and its in other places as well, the model used in code there is IssueDetails, and so theres this weird thing of like Issue is the model, group_id is the number it gets... |
||||||
| organization_slug: str | ||||||
| organization_id: int | ||||||
| project_id: int | ||||||
|
|
||||||
|
|
||||||
| class SupergroupsListRequest(TypedDict): | ||||||
| organization_id: int | ||||||
| offset: NotRequired[int | None] | ||||||
|
|
@@ -375,6 +383,20 @@ def make_supergroups_embedding_request( | |||||
| ) | ||||||
|
|
||||||
|
|
||||||
| def make_lightweight_rca_cluster_request( | ||||||
| body: LightweightRCAClusterRequest, | ||||||
| timeout: int | float | None = None, | ||||||
| viewer_context: SeerViewerContext | None = None, | ||||||
| ) -> BaseHTTPResponse: | ||||||
| return make_signed_seer_api_request( | ||||||
| seer_autofix_default_connection_pool, | ||||||
| "/v0/issues/supergroups/cluster-lightweight", | ||||||
| body=orjson.dumps(body, option=orjson.OPT_NON_STR_KEYS), | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. curious, what is this orjson option?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. its to allow dict keys that are non string, in this case integers - from what I understand when we send event data it contains these kind of keys and its requried that we allow it, the issue summary endpoint does the same thing for the same reason I believe |
||||||
| timeout=timeout, | ||||||
| viewer_context=viewer_context, | ||||||
| ) | ||||||
|
|
||||||
|
|
||||||
| def make_supergroups_list_request( | ||||||
| body: SupergroupsListRequest, | ||||||
| viewer_context: SeerViewerContext, | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import logging | ||
|
|
||
| from sentry import options | ||
| from sentry.api.serializers import EventSerializer, serialize | ||
| from sentry.eventstore import backend as eventstore | ||
| from sentry.models.group import Group | ||
| from sentry.seer.models import SeerApiError | ||
|
sentry[bot] marked this conversation as resolved.
|
||
| from sentry.seer.signed_seer_api import ( | ||
| LightweightRCAClusterRequest, | ||
| SeerViewerContext, | ||
| make_lightweight_rca_cluster_request, | ||
| ) | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def trigger_lightweight_rca_cluster(group: Group) -> None: | ||
| """ | ||
| Call Seer's lightweight RCA clustering endpoint for the given group. | ||
|
|
||
| Sends issue event data to Seer, which generates a lightweight root cause analysis | ||
| and clusters the issue into supergroups based on embedding similarity. | ||
| """ | ||
| enabled_orgs: list[int] = options.get("supergroups.lightweight-enabled-orgs") | ||
| if group.organization.id not in enabled_orgs: | ||
| return | ||
|
cursor[bot] marked this conversation as resolved.
Outdated
|
||
|
|
||
| event = group.get_recommended_event_for_environments() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. shouldn't we always only have one event for this group, since we run this when the issue is created? just confused on any situation where we'd have >1 event
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yea probably no need for this protection, will change to get_latest_event() |
||
| if not event: | ||
| event = group.get_latest_event() | ||
|
|
||
| if not event: | ||
| logger.info( | ||
| "lightweight_rca_cluster.no_event", | ||
| extra={"group_id": group.id}, | ||
| ) | ||
| return | ||
|
|
||
| ready_event = eventstore.get_event_by_id(group.project.id, event.event_id, group_id=group.id) | ||
| if not ready_event: | ||
| logger.info( | ||
| "lightweight_rca_cluster.event_not_ready", | ||
| extra={"group_id": group.id, "event_id": event.event_id}, | ||
| ) | ||
| return | ||
|
|
||
| serialized_event = serialize(ready_event, None, EventSerializer()) | ||
|
|
||
| body = LightweightRCAClusterRequest( | ||
| group_id=group.id, | ||
| issue={ | ||
| "id": group.id, | ||
| "title": group.title, | ||
| "short_id": group.qualified_short_id, | ||
| "events": [serialized_event], | ||
| }, | ||
| organization_slug=group.organization.slug, | ||
| organization_id=group.organization.id, | ||
| project_id=group.project.id, | ||
| ) | ||
| viewer_context = SeerViewerContext(organization_id=group.organization.id) | ||
|
sentry-warden[bot] marked this conversation as resolved.
|
||
|
|
||
| response = make_lightweight_rca_cluster_request(body, timeout=30, viewer_context=viewer_context) | ||
| if response.status >= 400: | ||
| raise SeerApiError("Lightweight RCA cluster request failed", response.status) | ||
|
|
||
| logger.info( | ||
| "lightweight_rca_cluster.success", | ||
| extra={ | ||
| "group_id": group.id, | ||
| "project_id": group.project.id, | ||
| "organization_id": group.organization.id, | ||
| }, | ||
| ) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| import logging | ||
|
|
||
| from sentry.models.group import Group | ||
| from sentry.tasks.base import instrumented_task | ||
| from sentry.taskworker.namespaces import issues_tasks | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| @instrumented_task( | ||
| name="sentry.tasks.seer.lightweight_rca_cluster.trigger_lightweight_rca_cluster_task", | ||
| queue="default", | ||
| max_retries=0, | ||
| taskworker_namespace=issues_tasks, | ||
| ) | ||
| def trigger_lightweight_rca_cluster_task(group_id: int, **kwargs) -> None: | ||
| from sentry.seer.supergroups.lightweight_rca_cluster import trigger_lightweight_rca_cluster | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does this import need to be in the func?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. are you sure?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh yea youre right, I had in mind like its in a different place, removing it |
||
|
|
||
| try: | ||
| group = Group.objects.get(id=group_id) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't we have the group in
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. its standard for these tasks, you cant pass a model into them, they gotta refetch |
||
| except Group.DoesNotExist: | ||
| logger.info( | ||
| "lightweight_rca_cluster_task.group_not_found", | ||
| extra={"group_id": group_id}, | ||
| ) | ||
| return | ||
|
|
||
| try: | ||
| trigger_lightweight_rca_cluster(group) | ||
| except Exception: | ||
| logger.exception( | ||
| "lightweight_rca_cluster_task.failed", | ||
| extra={"group_id": group_id}, | ||
| ) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably name this lightweight_rca_embedding or just lightweight rca?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its like the command - to trigger clustering, because I basically treat the task as not a task to generate lgithweight-rca and a side effect of clustering, but instead of purposely triggering clustering, because before we didnt even save the lightweight-rca.
The endpoint I added is even called
/cluster-lightweight, so thats like the command here, so I think the name fits. Does that make sense? I dont feel strongly about it though, just want it all to be coherentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it should be more explicit like just "trigger_supergroup_clustering_lightweight"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i also don't feel too strongly, i think a slightly more consistent name would be 'lightweight rca embedding' / 'lightweight rca generation' but i think cluster also fits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer cluster to both of these I think, like embedding is kinda technical, its not what the caller really intends, and generation is sort of inaccurate because of the way we set it up where the point is to cluster by lightweightRCA - not to generate it, we didnt even save the generated RCA until we realized we need it for resummarization, so its like a side effect now.
Just so were all on the same page - I am just trying to be consistent with the way I phrased and treated it up until now, I even leaned in the direction of making it all about lightweight RCA generation and the clustering being a side effect, but went the other way around in the API and flow, so I think we should stay consistent.