fix(user-interviews): guard null emails in behavioral targeting templates #67286
Merged
pauldambra merged 1 commit on Jul 1, 2026
…ates

The three "Finding users by behavior" SQL templates in the planning-voice-agent-user-interviews skill grouped events by a single `<id>` placeholder that agents were told to fill with `person.properties.email`. With no null guard, all emailless (anonymous) traffic collapsed into one `None` residual row that sorts to the top under `ORDER BY count() DESC`, eating a slot of the 20-row interview sample and risking a literal `None` being passed downstream as an interviewee email. The prose also said to "keep both kinds of rows" (email and distinct_id), but a single `<id>` column can't do that in one run.

Rewrite the templates to group by `coalesce(person.properties.email, distinct_id) AS id` and select an explicit `email` column. This keeps both kinds of rows in a single query: emailed people group under their email, and emailless people fall back to their own distinct_id as separate rows instead of one junk bucket. The `email` column gives an unambiguous routing rule (non-null → `interviewee_emails`, null → `interviewee_distinct_ids`), mirroring the null guard the same file already uses in its cohort recipe.

Also update Step 5's CSV guidance to prefer the existing `user-interview-topics-interviewees-bulk-create` tool over one create call per row.

Generated-By: PostHog Code
Task-Id: 8e4de988-d9c9-4628-9d1d-7cb1d7e350a3
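The failure mode and the fix described in the commit message can be reproduced on toy data. The sketch below is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for HogQL, a flat `email` column in place of `person.properties.email`, and a hypothetical `events` table with made-up rows. It relies on SQLite placing all NULL values in a single group, which is assumed here to match the `None` residual row reported above.

```python
import sqlite3

# Hypothetical toy data: (distinct_id, email). None marks an emailless person.
EVENTS = [
    ("d1", "ana@example.com"),
    ("d1", "ana@example.com"),
    ("d2", "bo@example.com"),
    ("d3", None),
    ("d3", None),
    ("d4", None),
    ("d5", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (distinct_id TEXT, email TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", EVENTS)

# Old shape: group by email alone. Every emailless event lands in one
# NULL group, which then sorts to the top by count.
old_rows = conn.execute(
    "SELECT email AS id, count(*) AS n FROM events "
    "GROUP BY email ORDER BY n DESC, id"
).fetchall()

# New shape: fall back to distinct_id when email is NULL, and keep email
# as its own column so each row can be routed later.
new_rows = conn.execute(
    "SELECT coalesce(email, distinct_id) AS id, email, count(*) AS n "
    "FROM events GROUP BY coalesce(email, distinct_id), email "
    "ORDER BY n DESC, id"
).fetchall()

print(old_rows[0])  # the junk bucket: (None, 4)
for row in new_rows:
    print(row)
```

With this data the old query's first row is the NULL bucket holding four events, while the new query returns five rows, none of which has a null `id`.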
pauldambra
approved these changes
Jul 1, 2026
Problem
The `planning-voice-agent-user-interviews` skill's preferred "find users by behavior" path handed agents three SQL templates that group events by a single `<id>` placeholder, which the prose tells them to fill with `person.properties.email`. With no null guard, every emailless (anonymous) event collapses into one `None` residual row. Under `ORDER BY count() DESC` that junk row sorts to the very top of the interview-candidate sample, eating a slot of the 20-row result and risking a literal `None` being passed downstream as an interviewee email.

The same section's prose also said to "keep both kinds of rows" (email and distinct_id), but a single `<id>` column can't produce both in one run. The file already gets this right one section up: the cohort recipe filters with `AND properties.email IS NOT NULL`, so this was an internal inconsistency.

This came from PostHog inbox report `019edc91-a574-70d2-9cce-82288f48cf3f`.

Changes
Rewrite the three behavioral templates to group by `coalesce(person.properties.email, distinct_id) AS id` and select an explicit `email` column. This keeps both kinds of rows in a single query: emailed people group under their email, emailless people fall back to their own distinct_id as separate rows instead of one giant junk bucket. The `email` column then gives an unambiguous routing rule (non-null goes to `interviewee_emails`, null goes to `interviewee_distinct_ids`), and the prose is updated to match.

Also updated Step 5's CSV guidance to prefer the existing `user-interview-topics-interviewees-bulk-create` tool over one create call per row.

How did you test this code?
Documentation-only change to a skill file, so no automated tests. I ran the rewritten heavy-users template verbatim against project 2 via the PostHog MCP `execute-sql` tool (`event = '$pageview'`, 60-day window). It is valid HogQL and the top 20 rows are now all real interviewable emails with no `None` residual row, confirming the fix.

🤖 Agent context
Autonomy: Human-driven (agent-assisted)
`AND person.properties.email IS NOT NULL` (mirroring the cohort recipe), or switch to `coalesce(person.properties.email, distinct_id)`. I chose coalesce because it resolves both flagged issues at once: it drops the `None` residual and genuinely "keeps both kinds of rows" in one query, whereas the plain null guard would only apply when `<id>` was an email and would leave the prose contradiction unresolved. I added an explicit `email` column so the email/distinct_id routing is unambiguous rather than heuristic.
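The routing rule that the explicit `email` column enables can be sketched in a few lines. This is an assumption-laden illustration, not the skill's actual code: the `route_rows` helper, the `(id, email)` row shape, and the returned dictionary are all hypothetical, and only the two key names `interviewee_emails` and `interviewee_distinct_ids` come from the PR text.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row shape: (id, email), where id is
# coalesce(email, distinct_id) and email may be None.
Row = Tuple[str, Optional[str]]


def route_rows(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Split query rows by the rule: non-null email -> emails, null -> distinct_ids."""
    emails: List[str] = []
    distinct_ids: List[str] = []
    for row_id, email in rows:
        if email is not None:
            emails.append(email)
        else:
            # With a null email, id already holds the person's distinct_id.
            distinct_ids.append(row_id)
    return {
        "interviewee_emails": emails,
        "interviewee_distinct_ids": distinct_ids,
    }


routed = route_rows([
    ("ana@example.com", "ana@example.com"),
    ("d3", None),
])
print(routed)
```

Because the decision reads only the `email` column, the helper never has to guess whether an `id` string looks like an email address, and a literal `None` cannot end up in the email list.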