Surface non-lock and self-blocking reports in sp_HumanEventsBlockViewer rollup (#806) #811
Merged
… rollup (#806)

The blocked process monitor fires for any task waiting longer than the configured blocked process threshold, not only lock waits. Non-lock waits (memory grants, parallelism, etc.) and self-referential reports (a session reported as blocking itself) therefore surface as blocked process reports, but were counted alongside genuine lock contention in the findings rollup with no way to tell them apart.

Add finding check_id 10 ("Non-Lock and Self Blocking") that counts these reports per database, filtered on resource_owner_type <> 'LOCK'. LOCK is the only lock value in the resource_owner_type XE map; every other value is a non-lock wait, and a self-referential report is always non-lock. Counts distinct events via event_time, since transaction_id is unreliable (often 0) for non-lock waits and each report yields both a blocking and a blocked row.

The reports already flow through the main result set; this only adds the rollup signal. Existing lock findings and @version/@version_date are unchanged.

Validated on SQL 2017 and SQL 2025 (case-sensitive collation): the new finding fires for a non-lock self-wait and excludes a normal lock block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Checks 1 (Database Locks) and 2 (Object Locks) counted every #blocks row, so non-lock and self-referential reports inflated them alongside genuine lock contention. Filter both to resource_owner_type = 'LOCK' (or NULL, to keep unknown-type rows rather than silently drop them) so they report real lock contention only; the non-lock reports remain counted in check_id 10.

Wait-time totals (check_id 1000/1001) and Login, App, and Host (check_id 8) are intentionally left inclusive: a non-lock wait is still real wait time and real activity.

Validated on SQL 2017 and SQL 2025 (case-sensitive collation): with one lock and one non-lock event staged, checks 1 and 2 now read 1 (was 2), check 10 reads 1, and the wait-time totals stay at the combined value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Makes sp_HumanEventsBlockViewer distinguish non-lock and self-referential blocked process reports from genuine lock contention. Two parts:
- Adds a finding (check_id = 10, group "Non-Lock and Self Blocking") that counts these reports per database.
- Filters them out of the existing lock findings (check_id 1 Database Locks, check_id 2 Object Locks) so those read as real lock contention only.

Closes #806.
Why
The blocked process monitor fires for any task that waits longer than blocked process threshold, not just lock waits. A long RESOURCE_SEMAPHORE (memory grant), parallelism, or other non-lock wait surfaces as a blocked process report, frequently as a session reported as blocking itself (blocked SPID:ECID == blocking SPID:ECID).

These rows already appear in the main result set, but they were indistinguishable from real lock blocking in the analysis rollup. Every finding reads from #blocks, and the count/sum-style findings applied no lock-type filter, so a non-lock report was counted as a "blocking session" in Database/Object Locks and added to the wait-time totals.

Note: contrary to the original issue, these reports are not silently dropped to zero rows for well-formed event XML (verified by repro). The cycle guard only suppresses the blocking-tree depth, not the row itself. The one shape that does produce zero rows is staged XML lacking the <event name="blocked_process_report"> envelope, which is a caller-side staging concern, not this code path.

How
resource_owner_type is authoritative: LOCK is the only lock value in the server's resource_owner_type XE map; GENERIC / EXCHANGE / THREAD / etc. are all non-lock, and a self-referential report is always non-lock.
- Non-lock reports (check_id 10): resource_owner_type <> N'LOCK'.
- Lock findings (check_id 1 and 2): (resource_owner_type = N'LOCK' OR resource_owner_type IS NULL). This is the complement, with IS NULL retained so unknown-type rows stay in the lock bucket rather than being silently dropped. Every row lands in exactly one place.

Event count in check_id 10 uses COUNT_BIG(DISTINCT b.event_time) because transaction_id is unreliable (often 0) for non-lock waits, and each report yields both a blocking- and blocked-perspective row in #blocks.

Left intentionally inclusive: wait-time totals (check_id 1000/1001) and Login/App/Host (check_id 8). A non-lock wait is still real wait time and real activity.

@version / @version_date left for release.

Test plan
Staged two blocked process reports in a table and ran with @target_type = N'table':
- A non-lock self-wait (resource_owner_type = GENERIC, lock_mode = NL), spid 61 blocked by itself (the issue's example capture).
- A normal lock block (resource_owner_type = LOCK, X mode), distinct sessions 55 <- 60.

Result on SQL 2017 and SQL 2025 (case-sensitive collation):
- 2 total events parsed.
- check_id 1 (Database Locks) and check_id 2 (Object Locks) read 1 (the real lock block only; was 2 before netting).
- check_id 10 fires once, counting 1 report (the non-lock self-wait only).
- check_id 1000/1001 total wait time stays at the combined 14s (9s non-lock + 5s lock).
- check_id 8 still lists both sessions.

Books balance: 2 events = 1 lock + 1 non-lock, no row double-counted or dropped.
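The bucketing described above can be modeled outside SQL Server as a rough sketch. This is not the procedure's code: it uses Python's built-in sqlite3 module, a hypothetical `blocks` table carrying only a few of the columns in #blocks, COUNT in place of COUNT_BIG, and invented timestamps and database name. Counting every bucket by distinct event_time and summing wait time over the blocked-perspective rows only are assumptions made for illustration. It does show the two predicates partitioning the staged events, and why IS NULL has to be spelled out in the lock bucket: in SQL, NULL <> 'LOCK' is not true, so a NULL-typed row would otherwise land in neither bucket.

```python
import sqlite3

# Hypothetical, simplified stand-in for the procedure's #blocks temp table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blocks ("
    " event_time TEXT, database_name TEXT, activity TEXT,"
    " resource_owner_type TEXT, wait_time_ms INTEGER)"
)

INSERT = "INSERT INTO blocks VALUES (?, ?, ?, ?, ?)"

# Each report yields a blocked-perspective and a blocking-perspective row.
# Timestamps and database name are illustrative, not from the PR.
conn.executemany(INSERT, [
    ("2025-01-01T10:00:00", "demo_db", "blocked", "GENERIC", 9000),
    ("2025-01-01T10:00:00", "demo_db", "blocking", "GENERIC", 9000),
    ("2025-01-01T10:05:00", "demo_db", "blocked", "LOCK", 5000),
    ("2025-01-01T10:05:00", "demo_db", "blocking", "LOCK", 5000),
])

# The two predicates from the PR description (without the N'' prefix).
NON_LOCK_FILTER = "resource_owner_type <> 'LOCK'"
LOCK_FILTER = "(resource_owner_type = 'LOCK' OR resource_owner_type IS NULL)"


def count_events(where="1 = 1"):
    """Count distinct events so the two perspective rows count once."""
    sql = f"SELECT COUNT(DISTINCT event_time) FROM blocks WHERE {where}"
    return conn.execute(sql).fetchone()[0]


def summarize():
    wait_ms = conn.execute(
        "SELECT SUM(wait_time_ms) FROM blocks WHERE activity = 'blocked'"
    ).fetchone()[0]
    return {
        "all_events": count_events(),
        "lock_events": count_events(LOCK_FILTER),
        "non_lock_events": count_events(NON_LOCK_FILTER),
        "wait_ms": wait_ms,  # left inclusive of non-lock waits
    }


before = summarize()
print(before)
# {'all_events': 2, 'lock_events': 1, 'non_lock_events': 1, 'wait_ms': 14000}

# Stage a third, unknown-type report (resource_owner_type is NULL).
conn.executemany(INSERT, [
    ("2025-01-01T10:10:00", "demo_db", "blocked", None, 1000),
    ("2025-01-01T10:10:00", "demo_db", "blocking", None, 1000),
])

after = summarize()
print(after)
# {'all_events': 3, 'lock_events': 2, 'non_lock_events': 1, 'wait_ms': 15000}
```

With the two staged reports, the counts match the test plan (1 lock, 1 non-lock, 14s combined wait). The extra NULL-typed report is counted in the lock bucket and nowhere else, so the lock and non-lock counts still sum to the total.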
🤖 Generated with Claude Code