
feat(seer): Add lightweight RCA clustering endpoint integration#112229

Merged
yuvmen merged 12 commits into master from yuvmen/feat/lightweight-rca-cluster
Apr 7, 2026

Conversation

Member

@yuvmen yuvmen commented Apr 3, 2026

Integrate Seer's new /v0/issues/supergroups/cluster-lightweight endpoint for lightweight root cause analysis and supergroup clustering.

When a new error issue is created, if the org is in the supergroups.lightweight-enabled-orgs sentry-option, we send the issue's event data to Seer. Seer generates a lightweight RCA via a single LLM call and clusters the issue into supergroups based on embedding similarity. This is separate from the existing Explorer-based agentic RCA flow.

Changes:

  • Register supergroups.active-rca-source and supergroups.lightweight-enabled-orgs sentry-options
  • Add LightweightRCAClusterRequest type and make_lightweight_rca_cluster_request() API function
  • Add trigger_lightweight_rca_cluster() core function and Celery task
  • Add kick_off_lightweight_rca_cluster pipeline step in post_process for new error issues
  • Rename existing lightweight_rca.py → explorer_lightweight_rca.py to clarify it's the Explorer-based flow
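
The gating described above (new issue, org opted in) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `FakeTask`, `should_dispatch`, `kick_off`, and the `enabled_org_ids` argument are hypothetical stand-ins for the task, the pipeline step, and the `supergroups.lightweight-enabled-orgs` option lookup.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTask:
    """Hypothetical stand-in for the task; records ids passed to delay()."""

    calls: list[int] = field(default_factory=list)

    def delay(self, group_id: int) -> None:
        self.calls.append(group_id)


def should_dispatch(org_id: int, is_new: bool, enabled_org_ids: set[int]) -> bool:
    # Only brand-new issues in opted-in orgs are sent for clustering.
    return is_new and org_id in enabled_org_ids


def kick_off(task: FakeTask, group_id: int, org_id: int, is_new: bool,
             enabled_org_ids: set[int]) -> None:
    # Check eligibility before enqueueing so no task is queued for nothing.
    if should_dispatch(org_id, is_new, enabled_org_ids):
        task.delay(group_id)


task = FakeTask()
kick_off(task, group_id=1, org_id=10, is_new=True, enabled_org_ids={10})
kick_off(task, group_id=2, org_id=10, is_new=False, enabled_org_ids={10})
kick_off(task, group_id=3, org_id=99, is_new=True, enabled_org_ids={10})
print(task.calls)
```

Only the first call enqueues work; the existing issue and the org outside the allowlist are skipped before anything reaches the queue.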

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 3, 2026
Call Seer's new /v0/issues/supergroups/cluster-lightweight endpoint
on new issue creation, gated per-org via sentry-options. This sends
issue event data to Seer for lightweight root cause analysis and
clustering into supergroups.

Also renames the existing explorer-based lightweight RCA files to
explorer_lightweight_rca to avoid confusion with the new direct
endpoint-based clustering approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Serialized event data from EventSerializer can contain non-string dict
keys (integer keys in _meta.entries). Without this option orjson.dumps
raises TypeError.
Contributor

github-actions bot commented Apr 3, 2026

Backend Test Failures

Failures on 35a6636 in this run:

tests/sentry/taskworker/test_config.py::test_all_instrumented_tasks_registered
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/taskworker/test_config.py:120: in test_all_instrumented_tasks_registered
    raise AssertionError(
E   AssertionError: Found 1 module(s) with @instrumented_task that are NOT registered in TASKWORKER_IMPORTS.
E   These tasks will not be discovered by the taskworker in production!
E   
E   Missing modules:
E     - sentry.tasks.seer.lightweight_rca_cluster
E   
E   Add these to TASKWORKER_IMPORTS in src/sentry/conf/server.py

Without this registration the task won't be discovered by the
taskworker in production.
@yuvmen yuvmen marked this pull request as ready for review April 3, 2026 21:20
@yuvmen yuvmen requested review from a team as code owners April 3, 2026 21:20
Contributor

github-actions bot commented Apr 3, 2026

Backend Test Failures

Failures on b442df5 in this run:

tests/sentry/taskworker/test_config.py::test_import_paths
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/taskworker/test_config.py:25: in test_import_paths
    __import__(path)
src/sentry/tasks/seer/lightweight_rca_cluster.py:10: in <module>
    @instrumented_task(
E   TypeError: instrumented_task() missing 1 required positional argument: 'namespace'
tests/sentry/test_devimports.py::test_startup_imports[sentry]
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/test_devimports.py:121: in test_startup_imports
    validate_package(pkg, EXCLUDED, XFAIL)
tests/sentry/test_devimports.py:116: in validate_package
    raise AssertionError(ret.stdout)
E   AssertionError: /home/runner/work/sentry/sentry/src/sentry/spans/consumers/process_segments/enrichment.py:38: DeprecationWarning: ATTRIBUTE_NAMES.SENTRY_BROWSER_NAME is deprecated.
E     ATTRIBUTE_NAMES.SENTRY_BROWSER_NAME,
E   error importing sentry.tasks.seer.lightweight_rca_cluster:
E   
E   Traceback (most recent call last):
E     File "<string>", line 35, in <module>
E       __import__(name)
E       ~~~~~~~~~~^^^^^^
E     File "<string>", line 15, in _import
E       return orig(name, globals=globals, locals=locals, fromlist=fromlist, level=level)
E     File "/home/runner/work/sentry/sentry/src/sentry/tasks/seer/lightweight_rca_cluster.py", line 10, in <module>
E       @instrumented_task(
E        ~~~~~~~~~~~~~~~~~^
E           name="sentry.tasks.seer.lightweight_rca_cluster.trigger_lightweight_rca_cluster_task",
E           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       ...<2 lines>...
E           taskworker_namespace=issues_tasks,
E           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       )
E       ^
E   TypeError: instrumented_task() missing 1 required positional argument: 'namespace'

)

# Supergroups / Lightweight RCA
register(
Contributor

does it make sense to duplicate the options between here and seer? i thought the original plan was to have seer do this check

Member Author

yea I was conflicted about it. Now that I have options in Seer I can have protections there, but I also don't want to queue tasks for all issues for nothing, that seems very wasteful. And now I can basically have both be controlled by the same repo, so I think it's ok to protect from both sides using the same config

Contributor

we could just do the check only in sentry then? not sure it makes sense to duplicate the options, especially if the options have exactly the same name + purpose

This option is used on the Seer side, not in Sentry. Remove it until
it's actually needed here.
Contributor

github-actions bot commented Apr 3, 2026

Backend Test Failures

Failures on c02c388 in this run:

tests/sentry/taskworker/test_config.py::test_import_paths
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/taskworker/test_config.py:25: in test_import_paths
    __import__(path)
src/sentry/tasks/seer/lightweight_rca_cluster.py:10: in <module>
    @instrumented_task(
E   TypeError: instrumented_task() missing 1 required positional argument: 'namespace'
tests/sentry/test_devimports.py::test_startup_imports[sentry]
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/test_devimports.py:121: in test_startup_imports
    validate_package(pkg, EXCLUDED, XFAIL)
tests/sentry/test_devimports.py:116: in validate_package
    raise AssertionError(ret.stdout)
E   AssertionError: /home/runner/work/sentry/sentry/src/sentry/spans/consumers/process_segments/enrichment.py:38: DeprecationWarning: ATTRIBUTE_NAMES.SENTRY_BROWSER_NAME is deprecated.
E     ATTRIBUTE_NAMES.SENTRY_BROWSER_NAME,
E   error importing sentry.tasks.seer.lightweight_rca_cluster:
E   
E   Traceback (most recent call last):
E     File "<string>", line 35, in <module>
E       __import__(name)
E       ~~~~~~~~~~^^^^^^
E     File "<string>", line 15, in _import
E       return orig(name, globals=globals, locals=locals, fromlist=fromlist, level=level)
E     File "/home/runner/work/sentry/sentry/src/sentry/tasks/seer/lightweight_rca_cluster.py", line 10, in <module>
E       @instrumented_task(
E        ~~~~~~~~~~~~~~~~~^
E           name="sentry.tasks.seer.lightweight_rca_cluster.trigger_lightweight_rca_cluster_task",
E           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       ...<2 lines>...
E           taskworker_namespace=issues_tasks,
E           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       )
E       ^
E   TypeError: instrumented_task() missing 1 required positional argument: 'namespace'

The instrumented_task decorator requires `namespace` not
`taskworker_namespace`, and doesn't accept `queue` or `max_retries`.
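
The `TypeError` in the CI logs is what Python reports when a required parameter is never supplied. A toy reproduction, assuming a decorator factory with the signature `(name, namespace, **kwargs)`; this `instrumented_task` is a hypothetical stand-in written for illustration, not Sentry's real decorator:

```python
def instrumented_task(name, namespace, **kwargs):
    """Hypothetical stand-in: `namespace` is required, extras land in kwargs."""

    def decorator(func):
        func.task_name = name
        func.namespace = namespace
        return func

    return decorator


try:
    # Misnamed keyword: `taskworker_namespace` is absorbed by **kwargs,
    # so the required `namespace` parameter is still missing.
    instrumented_task(name="demo.task", taskworker_namespace="issues")
except TypeError as exc:
    message = str(exc)
    print(message)


# With the correct keyword the decorator applies cleanly.
@instrumented_task(name="demo.task", namespace="issues")
def demo_task(group_id: int) -> None:
    pass
```

Because the failure happens while the decorator expression is evaluated, it surfaces at import time, which matches both failing tests being import checks.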
GroupEvent.group is typed as non-optional, so the None check is
unreachable and mypy flags it.
…uster

The org eligibility check is already done in the pipeline step before
scheduling the task, so there's no need to check again in the function
itself.

class LightweightRCAClusterRequest(TypedDict):
    group_id: int
    issue: dict[str, Any]
Contributor

Suggested change
issue: dict[str, Any]
group: dict[str, Any]

Member Author

Seer has this thing where issue is the word used in APIs. I am mimicking the IssueSummary endpoint here, and it's in other places as well. The model used in code there is IssueDetails, so there's this weird thing where Issue is the model and group_id is the number it gets...

    namespace=issues_tasks,
)
def trigger_lightweight_rca_cluster_task(group_id: int, **kwargs) -> None:
    from sentry.seer.supergroups.lightweight_rca_cluster import trigger_lightweight_rca_cluster
Contributor

does this import need to be in the func?

Member Author

kick_off_seer_automation does it as well, I think it's to avoid circular imports

Contributor

are you sure? kick_off_seer_automation is defined in post process and everything there has imports inside the functions, but all the other tasks in this folder don't follow that pattern

Member Author

oh yea you're right, I had in mind that it's in a different place, removing it

trigger_lightweight_rca_cluster(self.group)

@patch("sentry.seer.supergroups.lightweight_rca_cluster.make_lightweight_rca_cluster_request")
def test_passes_viewer_context(self, mock_request):
Contributor

do we have this type of test on the other seer endpoints?

Member Author

hmm not really, guess it doesn't make sense, I'll remove

from sentry.testutils.cases import TestCase


class TriggerLightweightRCAClusterTest(TestCase):
Contributor

shouldn't we have a test for the end to end actual flow? e.g. that triggering an event that creates a new issue will cause trigger_lightweight_rca_cluster to be sent?

Member Author

agreed, will add

Member Author

looked into it a bit, none of the other tests actually do this, as it forces you to run the task synchronously and no tests do that right now. gonna follow the established pattern of asserting on the delay call being made, and letting the task tests verify that it does what it needs
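
The pattern of asserting on the `delay` call can be sketched with `unittest.mock` alone. The names here (`FakeTask`, `kick_off`) are hypothetical and stand in for the real task and pipeline step; the point is only that the test checks dispatch without ever running the task body.

```python
from unittest.mock import patch


class FakeTask:
    """Hypothetical stand-in for a task object exposing .delay()."""

    def delay(self, group_id: int) -> None:
        raise RuntimeError("would enqueue real work")


task = FakeTask()


def kick_off(group_id: int, is_new: bool) -> None:
    # Dispatch only for newly created issues.
    if is_new:
        task.delay(group_id)


# New issue: the task is dispatched exactly once with the group id.
with patch.object(task, "delay") as mock_delay:
    kick_off(group_id=7, is_new=True)
    mock_delay.assert_called_once_with(7)

# Existing issue: nothing is dispatched.
with patch.object(task, "delay") as mock_delay_skipped:
    kick_off(group_id=8, is_new=False)
    mock_delay_skipped.assert_not_called()
```

What the task does once it runs is left to its own unit tests, which is the split described in the comment above.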

"sentry.workflow_engine.tasks.cleanup",
"sentry.tasks.seer.explorer_index",
"sentry.tasks.seer.context_engine_index",
"sentry.tasks.seer.lightweight_rca_cluster",
Contributor

we should probably name this lightweight_rca_embedding or just lightweight rca?

Member Author

it's like the command: to trigger clustering. I basically treat the task not as a task to generate lightweight-rca with clustering as a side effect, but as one that purposely triggers clustering, because before we didn't even save the lightweight-rca.
The endpoint I added is even called /cluster-lightweight, so that's like the command here, so I think the name fits. Does that make sense? I don't feel strongly about it though, just want it all to be coherent

Member Author

maybe it should be more explicit like just "trigger_supergroup_clustering_lightweight"

Contributor

i also don't feel too strongly, i think a slightly more consistent name would be 'lightweight rca embedding' / 'lightweight rca generation' but i think cluster also fits

Member Author

I prefer cluster to both of these I think. embedding is kinda technical, it's not what the caller really intends, and generation is sort of inaccurate because of the way we set it up, where the point is to cluster by lightweight RCA, not to generate it. we didn't even save the generated RCA until we realized we need it for resummarization, so it's like a side effect now.

Just so we're all on the same page: I am just trying to be consistent with the way I phrased and treated it up until now. I even leaned in the direction of making it all about lightweight RCA generation with the clustering being a side effect, but went the other way around in the API and flow, so I think we should stay consistent.

Sends issue event data to Seer, which generates a lightweight root cause analysis
and clusters the issue into supergroups based on embedding similarity.
"""
event = group.get_recommended_event_for_environments()
Contributor

shouldn't we always only have one event for this group, since we run this when the issue is created? just confused on any situation where we'd have >1 event

Member Author

yea probably no need for this protection, will change to get_latest_event()

    from sentry.seer.supergroups.lightweight_rca_cluster import trigger_lightweight_rca_cluster

    try:
        group = Group.objects.get(id=group_id)
Contributor

don't we have the group in post_process when we pass it here? do we need another fetch?

Member Author

it's standard for these tasks, you can't pass a model into them, they gotta refetch
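
A rough sketch of the refetch pattern, using an in-memory dict in place of the database. Everything here (`GROUPS`, `get_group`, `DoesNotExist`, `cluster_task`) is hypothetical, and returning early when the group is gone is an assumption about how a missing row might be handled, not something this PR confirms.

```python
class DoesNotExist(Exception):
    """Hypothetical stand-in for a model's DoesNotExist error."""


# Hypothetical in-memory "table" keyed by group id.
GROUPS = {1: {"id": 1, "title": "TypeError in checkout"}}


def get_group(group_id: int) -> dict:
    try:
        return GROUPS[group_id]
    except KeyError:
        raise DoesNotExist(group_id) from None


def cluster_task(group_id: int) -> str:
    # Only the id crosses the queue; the task loads a fresh copy itself.
    try:
        group = get_group(group_id)
    except DoesNotExist:
        # The group may have been deleted between dispatch and execution.
        return "skipped"
    return f"clustered {group['id']}"


print(cluster_task(1))
print(cluster_task(404))
```

Passing only the id keeps the task arguments small and serializable, and the task always works from the current state of the row rather than a stale in-memory copy.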


@instrumented_task(
    name="sentry.tasks.seer.lightweight_rca_cluster.trigger_lightweight_rca_cluster_task",
    namespace=issues_tasks,
Contributor

not necessarily blocking, is this the right namespace? i see that issues_tasks is technically labeled for 'issueplatform', while ingest_errors_postprocess_tasks exists and might fit slightly better

yuvmen added 2 commits April 6, 2026 17:05
- Remove viewer_context test (not tested on other seer endpoints)
- Switch to get_latest_event() since this runs on new issue creation
- Change task namespace to ingest_errors_postprocess_tasks
- Add post_process pipeline tests verifying task dispatch gating:
  dispatched when org enabled + new issue, skipped otherwise
No circular dependency exists, so the function-level import is
unnecessary.
Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 87ecf74.


@patch("sentry.tasks.seer.lightweight_rca_cluster.trigger_lightweight_rca_cluster_task.delay")
def test_kick_off_lightweight_rca_cluster_skips_when_not_new(self, mock_task):
event = self.create_event(
Contributor

i'm not sure, but isn't there a helper for creating an event and triggering the entire event lifecycle, including post process? basically, is there a way we remove the call_post_process_group while having the option enabled and have this test pass?

Member Author

if there is, I (and claude) couldn't find it. there's this, which saves a new event and triggers some stuff related to the group save but not post processing.
From what I can tell call_post_process_group is the way it's done in other tests

assert len(body["issue"]["events"]) == 1

@patch("sentry.seer.supergroups.lightweight_rca_cluster.make_lightweight_rca_cluster_request")
def test_raises_on_seer_error(self, mock_request):
Contributor

similar to other test, do seer callers have this sort of test?

Member Author

hmm actually they don't. this was generated as a response to some check I made, though it's sort of a good verification that if the request fails we throw something. however it did make me notice that the task itself just does logger.exception and swallows the error, which I think I will change: I want the task to fail outright, I think we agreed on this already in some other place.
So I am adding a raise there. regarding this test I feel like it's harmless, don't mind removing

return make_signed_seer_api_request(
    seer_autofix_default_connection_pool,
    "/v0/issues/supergroups/cluster-lightweight",
    body=orjson.dumps(body, option=orjson.OPT_NON_STR_KEYS),
Contributor

curious, what is this orjson option?

Member Author

it's to allow dict keys that are non-string, in this case integers. from what I understand, when we send event data it contains these kinds of keys and it's required that we allow it. the issue summary endpoint does the same thing for the same reason I believe

Log the exception for Sentry visibility, then re-raise so the task
is marked as failed in monitoring.
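
A minimal sketch of the log-then-re-raise shape this commit describes. `SeerError`, `call_seer`, and `cluster_task` are hypothetical names; only the `logging` calls are real API.

```python
import logging

logger = logging.getLogger(__name__)


class SeerError(Exception):
    """Hypothetical error standing in for a failed Seer request."""


def call_seer(group_id: int) -> None:
    raise SeerError("Seer returned 500")


def cluster_task(group_id: int) -> None:
    try:
        call_seer(group_id)
    except Exception:
        # Record the traceback for visibility, then propagate so the
        # task runner sees a failure instead of a silent success.
        logger.exception("lightweight_rca_cluster.failed", extra={"group_id": group_id})
        raise


failed = False
try:
    cluster_task(1)
except SeerError:
    failed = True
print(failed)
```

A bare `raise` inside the `except` block re-raises the original exception with its traceback intact, so nothing is lost by logging first.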
…ight-rca-cluster

# Conflicts:
#	src/sentry/seer/signed_seer_api.py
This task makes an external API call with a 30s timeout, so it
shouldn't run on the postprocess worker pool. Use ingest_errors_tasks
to match generate_summary_and_run_automation which has the same
dispatch pattern.
@yuvmen yuvmen merged commit 615581d into master Apr 7, 2026
77 checks passed
@yuvmen yuvmen deleted the yuvmen/feat/lightweight-rca-cluster branch April 7, 2026 22:35
yuvmen added a commit that referenced this pull request Apr 8, 2026
…ting (#112436)

Add `rca_source` parameter to supergroup query APIs so Seer knows which
embedding space (explorer vs lightweight) to query from. The source is
determined by the `organizations:supergroups-lightweight-rca-clustering`
feature flag.

Also replaces the `supergroups.lightweight-enabled-orgs` sentry-option
with this feature flag for both:
- **Write path**: post_process task dispatch for lightweight RCA
clustering
- **Read path**: supergroup query endpoints (details + by-group)

This is consistent with how all other supergroup features are gated (via
feature flags, not options).

Depends on #112229 (merged).
george-sentry pushed a commit that referenced this pull request Apr 9, 2026
)

george-sentry pushed a commit that referenced this pull request Apr 9, 2026
…ting (#112436)
