Improve entity identification extraction and association in memory extractor #175

@NOVA-Openclaw

Problem

The memory extraction pipeline currently does a poor job of detecting entity identification information (phone numbers, Signal IDs, Discord IDs/usernames, email addresses, GitHub handles, etc.) and associating it with the correct entity record.

Example

During Neva's onboarding, the following information was available across Signal and Discord conversations but was never extracted or associated with her entity record (entity_id 24):

  • Signal phone number (+18086498444)
  • Signal UUID (bf688e71-9267-4b72-ba9e-21a506fbf190)
  • Email (heyninarei@gmail.com)
  • GitHub username (Pr1ncessN1na)
  • Discord ID (603715052435931147)
  • Discord username (ninarei)
  • Discord display name (Nina Rei)
  • Timezone (America/Chicago)
  • Nickname (Princess)

All of this had to be manually inserted after the fact.

Desired Behavior

The memory extractor should:

  1. Detect identity-class information in conversations — phone numbers, email addresses, usernames, platform IDs, UUIDs, timezones, pronouns, real names, nicknames, etc.
  2. Associate detected facts with the correct entity — using conversational context, sender metadata, and existing entity records to determine who the information belongs to.
  3. Store detected facts as entity_facts rows with data_type = 'identity', and assign high confidence when the information is self-reported.
  4. Cross-reference platform identifiers — when a Discord user mentions their Signal number or email, link those to the same entity.
  5. Leverage inbound message metadata — sender IDs, phone numbers, and usernames from message envelopes should be automatically captured and associated.
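
As a rough illustration of the detection step (item 1), the sketch below scans message text with regular expressions. It is only a starting point, not the extractor's actual logic: the function name `detect_identity_facts`, the fact keys, and the patterns themselves are assumptions made for this example. In particular, a bare 17 to 20 digit number is only a candidate Discord ID and would still need context to confirm.

```python
import re
from typing import List, NamedTuple


class IdentityFact(NamedTuple):
    key: str    # hypothetical fact key, e.g. "email"
    value: str  # the matched text


# Hypothetical patterns; real-world input will need more careful handling.
_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # E.164-style number: "+" followed by 8 to 15 digits.
    "phone_e164": re.compile(r"(?<!\w)\+[1-9]\d{7,14}\b"),
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b"
    ),
    # Candidate only: Discord IDs are long decimal numbers.
    "discord_id_candidate": re.compile(r"\b\d{17,20}\b"),
    # IANA-style "Area/City" names for a few common areas.
    "timezone": re.compile(
        r"\b(?:Africa|America|Antarctica|Asia|Atlantic|Australia|Europe|"
        r"Indian|Pacific)/[A-Za-z_]+(?:/[A-Za-z_]+)?\b"
    ),
}


def detect_identity_facts(text: str) -> List[IdentityFact]:
    """Return every pattern match in `text` as a (key, value) fact."""
    facts = []
    for key, pattern in _PATTERNS.items():
        for match in pattern.finditer(text):
            facts.append(IdentityFact(key, match.group(0)))
    return facts
```

Regexes only cover the explicit, well-formed cases. Usernames, nicknames, and pronouns have no fixed shape, so they would likely need the extractor's existing language-model pass rather than pattern matching.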

Scope

  • Focus on the memory extraction hooks/pipeline (not the embedding layer).
  • Consider both explicit self-reporting ("my email is...") and implicit metadata (Signal sender phone number in message envelope).
  • Should handle the common platforms: Signal, Discord, Telegram, Slack, email, GitHub, X/Twitter.
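
For the implicit-metadata case, one possible shape is a per-platform mapping from envelope fields to fact keys. The envelope layout used here (`platform` plus a `sender` dict), the field names, and the fact keys are all hypothetical, since the issue does not describe the real envelope format.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping: platform -> {envelope sender field -> fact key}.
_ENVELOPE_FIELDS: Dict[str, Dict[str, str]] = {
    "signal": {"phone": "signal_phone", "uuid": "signal_uuid"},
    "discord": {
        "id": "discord_id",
        "username": "discord_username",
        "display_name": "discord_display_name",
    },
}


def facts_from_envelope(envelope: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn sender metadata from a message envelope into (key, value) facts.

    Unknown platforms and missing or empty fields produce no facts.
    """
    platform = envelope.get("platform", "")
    sender = envelope.get("sender") or {}
    mapping = _ENVELOPE_FIELDS.get(platform, {})
    facts = []
    for field, key in mapping.items():
        value = sender.get(field)
        if value:
            facts.append((key, str(value)))
    return facts
```

Other platforms in scope (Telegram, Slack, email, GitHub, X/Twitter) would each add an entry to the mapping once their envelope fields are known.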

Related

  • Entity facts table: entity_facts (key/value with entity_id FK)
  • Memory extraction hook: memory-extract in hooks config
  • The User Onboarding workflow also touches this
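
To make the storage and cross-referencing steps concrete, here is a minimal sketch against an in-memory SQLite database. The issue only says that entity_facts is key/value with an entity_id FK, so the full column list, the uniqueness constraint, the confidence values, and the helper names below are assumptions, and SQLite stands in for whatever database the project actually uses.

```python
import sqlite3
from typing import Optional


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with an assumed entity_facts schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE entity_facts (
            entity_id  INTEGER NOT NULL REFERENCES entities(id),
            key        TEXT NOT NULL,
            value      TEXT NOT NULL,
            data_type  TEXT NOT NULL,
            confidence REAL NOT NULL,
            UNIQUE (entity_id, key, value)
        );
        """
    )
    return conn


def store_identity_fact(conn, entity_id: int, key: str, value: str,
                        self_reported: bool) -> None:
    """Insert an identity fact; duplicates are silently ignored."""
    # Assumed confidence levels: higher when the person stated it themselves.
    confidence = 0.95 if self_reported else 0.6
    conn.execute(
        "INSERT OR IGNORE INTO entity_facts "
        "(entity_id, key, value, data_type, confidence) "
        "VALUES (?, ?, ?, 'identity', ?)",
        (entity_id, key, value, confidence),
    )


def find_entity_by_identifier(conn, value: str) -> Optional[int]:
    """Look up which entity already owns a given identifier value."""
    row = conn.execute(
        "SELECT entity_id FROM entity_facts "
        "WHERE data_type = 'identity' AND value = ? LIMIT 1",
        (value,),
    ).fetchone()
    return row[0] if row else None
```

With helpers like these, a message from a known Discord ID can be resolved to its entity first, and any email or phone number mentioned in that message can then be stored against the same entity_id.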


Labels: enhancement
