feat: infer memberOrganization stint dates from work-email activities (CM-1105)#4054
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Pull request overview
Adds infrastructure to infer and persist memberOrganizations stint dates from verified work-email activities, so email-domain affiliations become timeline-aware and can compete with enrichment on overlaps.
Changes:
- Extend affiliation resolution to bias toward email-domain rows when a verified email domain is present, and add a source-priority tier in `decidePrimaryOrganizationId`.
- Buffer `(memberId, orgId, YYYY-MM-DD)` activity evidence in Redis on the ingestion hot path and introduce a cron job to infer stint insert/update operations from buffered dates.
- Add a partial Postgres index to speed up per-member fetches of `email-domain` `memberOrganizations`; remove legacy mapping scripts and rename the shared member-organization service file.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| services/libs/types/src/organizations.ts | Adds shared types for buffered org-dates and inferred stint changes. |
| services/libs/data-access-layer/src/old/apps/data_sink_worker/repo/memberAffiliation.data.ts | Extends work-experience data shape to include source. |
| services/libs/data-access-layer/src/members/segments.ts | Adds optional email-domain candidate inclusion in findMemberWorkExperience. |
| services/libs/data-access-layer/src/members/organizations.ts | Adds fetchMemberOrganizationsBySource for cron’s targeted reads. |
| services/libs/common_services/src/services/memberOrganization.ts | Deleted (renamed). |
| services/libs/common_services/src/services/member/unmerge.ts | Updates import to new member-organization module path. |
| services/libs/common_services/src/services/member-organization.ts | New module: keeps unmerge helpers and adds stint inference logic + Redis key constants. |
| services/libs/common_services/src/services/index.ts | Re-exports renamed member-organization module. |
| services/libs/common_services/src/services/common.member.service.ts | Threads emailDomain through findAffiliation and adds source-priority selection logic. |
| services/apps/data_sink_worker/src/service/member.service.ts | Buffers per-member per-org activity dates in Redis and enqueues member IDs for cron. |
| services/apps/data_sink_worker/src/service/activity.service.ts | Extracts verified email domain from activity payload and passes it into affiliation lookup. |
| services/apps/data_sink_worker/src/bin/map-tenant-members-to-org.ts | Removed outdated script. |
| services/apps/data_sink_worker/src/bin/map-member-to-org.ts | Removed outdated script. |
| services/apps/data_sink_worker/package.json | Removes script entries for deleted bin scripts. |
| services/apps/cron_service/src/jobs/inferMemberOrganizationStintChanges.job.ts | New cron job to drain Redis buffers and compute stint changes (currently dry-run). |
| backend/src/database/migrations/V1776931245__member-organizations-email-domain-partial-index.sql | Adds partial index to support efficient per-member email-domain org reads. |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 7b7e744.
Copilot reviewed 14 out of 16 changed files in this pull request and generated 5 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Copilot reviewed 14 out of 16 changed files in this pull request and generated 2 comments.
    if (activityDates.length > 0) {
      // 3. Compare with DB and calculate delta
      const existingOrgs = await fetchMemberOrganizationsBySource(
        qx,
        memberId,
        OrganizationSource.EMAIL_DOMAIN,
      )

      const changes = inferMemberOrganizationStintChanges(memberId, existingOrgs, activityDates)

      if (changes.length > 0) {
        ctx.log.info({ memberId, count: changes.length }, 'Stint changes identified.')
        stats.inserts += changes.filter((c) => c.type === 'insert').length
        stats.updates += changes.filter((c) => c.type === 'update').length
      }
    }

The cron job computes changes via inferMemberOrganizationStintChanges(...) but never applies them to Postgres (no INSERT/UPDATE of memberOrganizations). As written, it will log and then delete the buffered Redis dates, effectively dropping the inference signal. Persist the computed changes (ideally in a transaction per member) before cleaning up Redis.
    // Filter by source priority
    // Source rank: ui > email-domain > enrichment-* > Other
    const rankSource = (source?: string) => {
      if (source === OrganizationSource.UI) return 0
      if (source === OrganizationSource.EMAIL_DOMAIN) return 1
      if (source?.startsWith('enrichment-')) return 2
      return 3
    }

rankSource() duplicates the source-tier logic already implemented as getMemberOrganizationSourceRank() in @crowd/common (services/libs/common/src/member.ts). Duplicating this ranking risks divergence between affiliation resolution paths; consider importing and reusing the shared helper instead of re-implementing the mapping here.
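
As a sketch of what a single shared ranking buys, assuming the tier order in the quoted snippet (ui, then email-domain, then `enrichment-*`, then everything else): the stand-in `sourceRank` below mirrors that order, and `filterByBestSourceRank` is a hypothetical caller. In the real code the shared `getMemberOrganizationSourceRank()` would presumably take the place of `sourceRank`; its exact signature is not shown in this thread, and the `'ui'` literal is an assumption about the value of `OrganizationSource.UI`.

```typescript
// Stand-in for the shared helper. The literal source strings are assumptions
// based on the ranking comment (ui > email-domain > enrichment-* > other).
function sourceRank(source?: string): number {
  if (source === 'ui') return 0
  if (source === 'email-domain') return 1
  if (source !== undefined && source.startsWith('enrichment-')) return 2
  return 3
}

// Hypothetical candidate shape, for illustration only.
type OrgCandidate = { organizationId: string; source?: string }

// Keep only the candidates that sit in the best (lowest-numbered) tier present.
function filterByBestSourceRank(candidates: OrgCandidate[]): OrgCandidate[] {
  if (candidates.length === 0) return []
  const best = Math.min(...candidates.map((c) => sourceRank(c.source)))
  return candidates.filter((c) => sourceRank(c.source) === best)
}

// Three overlapping candidates: only the email-domain one survives the filter.
const overlapping: OrgCandidate[] = [
  { organizationId: 'org-enriched', source: 'enrichment-example' },
  { organizationId: 'org-email', source: 'email-domain' },
  { organizationId: 'org-other' },
]
const winners = filterByBestSourceRank(overlapping)
```

Every resolution path that calls the same ranking function gets the same tie-break, which is the divergence risk the comment is pointing at.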
…and reuse Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>

Context
When an activity comes in with a verified work email like `jbeulich@suse.com`, we already create a `memberOrganizations` row linking the member to SUSE, but with NULL `dateStart`/`dateEnd`. That causes two problems, including that an `@suse.com` activity can lose to an unrelated dated enrichment row, because `findAffiliation` treats undated rows as a last-resort fallback.

We want to use the activity timestamp as evidence that the person was at that company at that moment, and write that into `dateStart`/`dateEnd`. The catch: we can't write on every activity (active maintainers generate hundreds per day), we can't collapse a real multi-stint history like Google → Apple → Google into one wrong range, and we can't override user edits or enrichment data.

How it works
Hot path (`data_sink_worker`): When an activity arrives, we just buffer `(org, date)` in a Redis hash keyed by member. This involves two Redis ops, no Postgres, and no rule evaluation. Hundreds of same-day activities collapse to a single entry.

Cron (`cron_service`): Every 5 min, the service pops up to 500 pending members, atomically drains each member's hash, loads their existing email-domain rows, and walks the buffered dates chronologically applying 4 rules, including:
- Date after `dateEnd` → extend forward, with a 30-day debounce and a multi-stint guard (if another org holds a 30+ day stint in the gap, insert a fresh stint instead of bridging)
- Date before `dateStart` → extend backward, same multi-stint guard, no debounce (rare, re-ingestion only)

Walking all orgs together in chronological order is what makes multi-stint detection work: by the time the 2008 Google event checks its gap, the 2005-2007 Apple events are already in the working copy and the guard fires correctly.
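
The chronological walk and the multi-stint guard can be illustrated with a small, self-contained sketch. It is not the PR's `inferMemberOrganizationStintChanges`: the types and the names `inferStints` and `gapIsOccupied` are hypothetical, it returns the final working copy rather than insert/update deltas, and it assumes that the debounce means "skip the forward extension unless the new date is at least 30 days past `dateEnd`", that a date inside an existing stint is a no-op, and that a date with no bridgeable stint starts a new single-day stint.

```typescript
// Illustrative types; dates are YYYY-MM-DD strings, so string comparison orders them.
type Stint = { orgId: string; dateStart: string; dateEnd: string }
type Evidence = { orgId: string; date: string }

const DAY_MS = 24 * 60 * 60 * 1000
const THRESHOLD_DAYS = 30

// Whole days from a to b (negative when b is earlier than a).
function daysBetween(a: string, b: string): number {
  return Math.round((Date.parse(b) - Date.parse(a)) / DAY_MS)
}

// Multi-stint guard: does another org hold a 30+ day stint inside the gap?
function gapIsOccupied(working: Stint[], orgId: string, gapStart: string, gapEnd: string): boolean {
  return working.some((s) => {
    if (s.orgId === orgId) return false
    const from = s.dateStart > gapStart ? s.dateStart : gapStart
    const to = s.dateEnd < gapEnd ? s.dateEnd : gapEnd
    return daysBetween(from, to) >= THRESHOLD_DAYS
  })
}

function inferStints(existing: Stint[], evidence: Evidence[]): Stint[] {
  const working = existing.map((s) => ({ ...s })) // working copy, mutated during the walk
  const sorted = [...evidence].sort((a, b) => a.date.localeCompare(b.date))

  for (const ev of sorted) {
    const own = working.filter((s) => s.orgId === ev.orgId)

    // Assumed rule: a date already inside one of the org's stints changes nothing.
    if (own.some((s) => s.dateStart <= ev.date && ev.date <= s.dateEnd)) continue

    // Closest stint of the same org ending before / starting after the date.
    const prev: Stint | undefined = own
      .filter((s) => s.dateEnd < ev.date)
      .sort((a, b) => b.dateEnd.localeCompare(a.dateEnd))[0]
    const next: Stint | undefined = own
      .filter((s) => s.dateStart > ev.date)
      .sort((a, b) => a.dateStart.localeCompare(b.dateStart))[0]

    if (prev && !gapIsOccupied(working, ev.orgId, prev.dateEnd, ev.date)) {
      // Extend forward, but only once the date is 30+ days past dateEnd (debounce).
      if (daysBetween(prev.dateEnd, ev.date) >= THRESHOLD_DAYS) prev.dateEnd = ev.date
      continue
    }
    if (next && !gapIsOccupied(working, ev.orgId, ev.date, next.dateStart)) {
      next.dateStart = ev.date // extend backward, no debounce
      continue
    }
    // Assumed rule: nothing to bridge to, so start a fresh single-day stint.
    working.push({ orgId: ev.orgId, dateStart: ev.date, dateEnd: ev.date })
  }
  return working
}

// Google -> Apple -> Google: the 2008 Google date must not bridge over Apple.
const multiStint = inferStints([], [
  { orgId: 'google', date: '2004-01-01' },
  { orgId: 'google', date: '2004-06-01' },
  { orgId: 'apple', date: '2005-01-01' },
  { orgId: 'apple', date: '2007-01-01' },
  { orgId: 'google', date: '2008-01-01' },
])
```

In this run `multiStint` holds three stints, two of them for Google: by the time the 2008 date is processed, the Apple stint already sits in the working copy, so the guard blocks the bridge and a fresh stint is inserted instead.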
`findAffiliation` gets two changes so email-domain rows start contributing meaningfully:
- A source-priority tier (`ui > email-domain > enrichment-* > other`) inserted into `decidePrimaryOrganizationId`. Once email-domain rows have dates, they beat enrichment on overlaps.
- `findMemberWorkExperience` pulls in the matching email-domain row as a candidate even when undated. This is the user-visible win that lands immediately: work-email activities resolve to the right org inline, without waiting for the cron to stamp dates.

Partial index on `memberOrganizations("memberId") WHERE source='email-domain' AND deletedAt IS NULL` backs the cron's per-member fetch so it's a single index seek.

Cleanup
- Removed two legacy mapping scripts (`map-member-to-org.ts`, `map-tenant-members-to-org.ts`).
- Renamed `memberOrganization.ts` → `member-organization.ts` to match the folder's kebab-case convention.

Note
Medium Risk
Introduces new background processing that buffers activity-derived dates in Redis and later infers/updates `memberOrganizations` stints, plus changes affiliation selection logic; errors or edge cases could misattribute organizations or create incorrect date ranges.

Overview
Adds a Redis-buffer + cron pipeline to infer `memberOrganizations` stint `dateStart`/`dateEnd` for `email-domain` sources from activity timestamps: the data sink worker now queues per-(member, org) activity dates, and a new cron job periodically reads and computes insert/update deltas via `inferMemberOrganizationStintChanges`.

Updates affiliation resolution to account for verified work-email domains and to prefer higher-priority org sources when multiple stints overlap, and adds a partial Postgres index plus a DAL helper (`fetchMemberOrganizationsBySource`) to make per-member email-domain lookups efficient. Also removes two legacy mapping scripts and renames `memberOrganization` utilities to `member-organization` while extending types to model stint-change inputs/outputs.

Reviewed by Cursor Bugbot for commit 39fe82b.
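
The buffering half of the pipeline (the worker queueing per-(member, org) activity dates and the cron draining them) can be pictured with an in-memory stand-in. This is a sketch under stated assumptions, not the PR's code: a `Map` of `Set`s replaces the per-member Redis hash, a `Set` replaces the queue of pending member IDs, and the `orgId:YYYY-MM-DD` field format and the names `bufferActivityDate` and `drainMember` are hypothetical.

```typescript
// In-memory stand-ins for Redis; key and field formats are assumptions.
const pendingMembers = new Set<string>() // stand-in for the queue of member IDs awaiting the cron
const dateBuffers = new Map<string, Set<string>>() // stand-in for one Redis hash per member

// Truncate a timestamp to its UTC calendar day (YYYY-MM-DD).
function toDay(timestamp: string): string {
  return new Date(timestamp).toISOString().slice(0, 10)
}

// Hot path: two cheap writes, no database access, no rule evaluation.
function bufferActivityDate(memberId: string, orgId: string, timestamp: string): void {
  const field = `${orgId}:${toDay(timestamp)}`
  let fields = dateBuffers.get(memberId)
  if (!fields) {
    fields = new Set<string>()
    dateBuffers.set(memberId, fields)
  }
  fields.add(field) // write 1: idempotent, so same-day activities collapse to one entry
  pendingMembers.add(memberId) // write 2: mark the member for the next cron run
}

// Cron side: take everything buffered for a member and clear it in one step.
function drainMember(memberId: string): { orgId: string; date: string }[] {
  const fields = dateBuffers.get(memberId) ?? new Set<string>()
  dateBuffers.delete(memberId)
  pendingMembers.delete(memberId)
  return Array.from(fields)
    .map((f) => {
      const i = f.lastIndexOf(':') // the date part contains no colon
      return { orgId: f.slice(0, i), date: f.slice(i + 1) }
    })
    .sort((a, b) => a.date.localeCompare(b.date))
}

// Three activities on two distinct days leave two buffered entries.
bufferActivityDate('member-1', 'org-suse', '2024-03-05T09:15:00Z')
bufferActivityDate('member-1', 'org-suse', '2024-03-05T17:40:00Z')
bufferActivityDate('member-1', 'org-suse', '2024-03-06T08:00:00Z')
const drained = drainMember('member-1')
```

Because the buffered value is a calendar day rather than a timestamp, the write is naturally idempotent, which is what keeps the hot path cheap for very active members.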