
Fix DataFilter false positives for UUIDs and MongoDB ObjectIds #518

Merged
neSpecc merged 5 commits into master from copilot/fix-falsy-filtered-values on Jan 31, 2026


Copilot AI commented Jan 30, 2026

Fix DataFilter False Positives for UUIDs and ObjectIds ✅

Problem

The DataFilter incorrectly flagged UUIDs and MongoDB ObjectIds as credit card numbers (PANs), replacing them with [filtered]. This happened when:

  • The value contained mostly digits
  • After removing non-digit characters, it was 13-16 digits long
  • The stripped digits started with a prefix matching a card pattern (2, 4, 5, or certain 3/6 prefixes)
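A minimal TypeScript sketch of the pre-fix heuristic, modelling only the length and prefix conditions listed above. The real PAN regexes in DataFilter are not quoted in this PR, so `PAN_PREFIX`, `digitsOnly`, and `looksLikePan` are hypothetical names and the prefix pattern is a simplified assumption:

```typescript
// Hypothetical, simplified card prefixes: Visa (4), Mastercard (51-55, 22-27 simplified),
// Amex (34, 37), Discover (6011, 65). Not the exact patterns used by DataFilter.
const PAN_PREFIX = /^(?:4|5[1-5]|2[2-7]|3[47]|6(?:011|5))/;

// Strip every non-digit character, as the pre-fix filter did.
function digitsOnly(value: string): string {
  return value.replace(/\D/g, '');
}

// Pre-fix behaviour (sketch): flag any value whose remaining digits
// are 13-16 long and start with a card-like prefix.
function looksLikePan(value: string): boolean {
  const digits = digitsOnly(value);

  return digits.length >= 13 && digits.length <= 16 && PAN_PREFIX.test(digits);
}

console.log(looksLikePan('4a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d')); // true: 16 digits starting with 4
console.log(looksLikePan('550e8400-e29b-41d4-a716-446655440000')); // false: 27 digits
```

Because the hex letters and dashes are discarded before the check, a UUID is judged only by whichever digits it happens to contain.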

Root Cause

The filterPanNumbers method stripped all non-digit characters and then checked against PAN regex patterns, which caused false positives for valid identifiers.

Solution

Added UUID and ObjectId detection before PAN checking:

  1. MongoDB ObjectId Regex: Matches 24 hexadecimal characters
  2. UUID Regex: Matches 32 hex characters with enforced dash consistency (all dashes in 8-4-4-4-12 format OR no dashes)
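One way to express the two patterns described above, as a sketch: the PR description names `objectIdRegex` and `uuidRegex` but does not quote their literals, so the exact regexes below are assumptions.

```typescript
// Assumed shape: MongoDB ObjectId is exactly 24 hexadecimal characters, case-insensitive.
const objectIdRegex = /^[0-9a-f]{24}$/i;

// Assumed shape: UUID is 32 hex characters, either fully dashed (8-4-4-4-12) or undashed.
// Mixed forms such as "550e8400-e29b41d4a716-..." match neither branch.
const uuidRegex = /^(?:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}|[0-9a-f]{32})$/i;

console.log(objectIdRegex.test('5111111111111111ABCDEFAB')); // true
console.log(uuidRegex.test('550e8400e29b41d4a716446655440000')); // true: no dashes
console.log(uuidRegex.test('550e8400-e29b41d4a716-446655440000')); // false: inconsistent dashes
```

Putting both forms in one alternation with anchors is what enforces dash consistency: a string is accepted only if it matches one form in full.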

The filterPanNumbers method now:

  1. First checks if value is an ObjectId → keep it
  2. Then checks if value is a UUID → keep it
  3. Only then performs PAN detection → filter if match
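The check order above can be sketched as a standalone function. In DataFilter `filterPanNumbers` is a method whose exact signature is not shown here; the regex literals, `PAN_PREFIX`, and the `'[filtered]'` return path below are assumptions for illustration:

```typescript
// Assumed patterns (see the PR's objectIdRegex / uuidRegex; literals are not quoted there).
const objectIdRegex = /^[0-9a-f]{24}$/i;
const uuidRegex = /^(?:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}|[0-9a-f]{32})$/i;
// Hypothetical, simplified card prefixes.
const PAN_PREFIX = /^(?:4|5[1-5]|2[2-7]|3[47]|6(?:011|5))/;
const FILTERED = '[filtered]';

// Standalone sketch of the post-fix check order.
function filterPanNumbers(value: string): string {
  // 1. MongoDB ObjectId: keep as-is.
  if (objectIdRegex.test(value)) {
    return value;
  }

  // 2. UUID (dashed or undashed): keep as-is.
  if (uuidRegex.test(value)) {
    return value;
  }

  // 3. Only now run the digit-stripping PAN heuristic.
  const digits = value.replace(/\D/g, '');
  const isPanLike = digits.length >= 13 && digits.length <= 16 && PAN_PREFIX.test(digits);

  return isPanLike ? FILTERED : value;
}

console.log(filterPanNumbers('4a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d')); // UUID returned unchanged
console.log(filterPanNumbers('4111 1111 1111 1111')); // "[filtered]"
```

Checking identifier formats first leaves the PAN heuristic untouched for every other value. The trade-off is that a card number hidden inside a 24- or 32-character hex string would now be kept.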

Changes Made

  • Analyzed DataFilter implementation and identified false positive patterns
  • Added objectIdRegex to detect MongoDB ObjectIds
  • Added uuidRegex to detect UUIDs (enforcing consistent dash usage)
  • Updated filterPanNumbers to check UUID/ObjectId patterns first
  • Fixed tests to use values that would actually fail without the fix
    • UUIDs with exactly 16 digits that match PAN patterns when cleaned
    • ObjectIds with 16 digits that match PAN patterns when cleaned
  • Addressed code review feedback
  • Verified that linting and build pass (remaining errors are pre-existing and unrelated to these changes)
  • Ran CodeQL security scan (0 vulnerabilities)

Test Coverage

Tests now use problematic values that would be incorrectly filtered without the fix:

  • ✅ UUID 4a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d (16 digits → Visa pattern)
  • ✅ UUID 5A1B2C3D-4E5F-6A7B-8C9D-0E1F2A3B4C5D (16 digits → Mastercard pattern)
  • ✅ ObjectId 4111111111111111abcdefab (16 digits → Visa pattern)
  • ✅ ObjectId 5111111111111111ABCDEFAB (16 digits → Mastercard pattern)

All of these tests FAIL if the UUID/ObjectId checks (lines 109-123 of the DataFilter source) are commented out.
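Why these particular values trigger the old behaviour can be seen by stripping non-digits from each one, as the pre-fix filter did. A small illustrative sketch (`TEST_VALUES` is a hypothetical name holding the four identifiers listed above):

```typescript
// The four identifiers from the test list above.
const TEST_VALUES: string[] = [
  '4a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d',
  '5A1B2C3D-4E5F-6A7B-8C9D-0E1F2A3B4C5D',
  '4111111111111111abcdefab',
  '5111111111111111ABCDEFAB',
];

for (const value of TEST_VALUES) {
  // Same digit stripping as the pre-fix filterPanNumbers.
  const digits = value.replace(/\D/g, '');

  // Each value leaves exactly 16 digits, starting with 4 (Visa) or 5 (Mastercard).
  console.log(`${value} -> ${digits} (${digits.length} digits)`);
}
```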

Security Summary

  • No security vulnerabilities introduced (CodeQL scan: 0 alerts)
  • Credit card filtering still works correctly
  • Sensitive key filtering still works correctly
  • Fix only prevents false positives for valid identifiers
Original prompt

This section details the original issue you should resolve

<issue_title>grouper(data-filter):  falsy filtered values</issue_title>
<issue_description>Sometimes some values replaced by [filtered] by mistake. Probably, uuid or ObjectId could be mistaken for PAN.

Image

We need to:

  1. write tests that reproduce the problem
    1.1) ensure the new tests fail
  2. fix DataFilter
  3. ensure the tests pass

Additional info:

  1. In the example above, the falsely filtered value is stored in the root-level projectId key of the GroupWorkerTask, which is inserted by the Collector:

https://github.com/codex-team/hawk.collector/blob/0b11313918afba7e94028589bd1c3b3da0a7eb6c/pkg/server/errorshandler/handler.go#L97

I'm not sure whether the bug affects only this field.

  2. We also have a DataFilter class in the Hawk.Laravel catcher.

https://github.com/codex-team/hawk.laravel/blob/27d8c9f542819db3aad67ed5bdefaa0061732b38/src/Services/DataFilter.php#L10-L15

I'm not sure whether it is caused by Hawk.Laravel. Possibly, but based on (1) it does not look like it.

  3. I tried to write these tests but did not manage to reproduce the problem:
    test('should not filter UUID values', async () => {
      const uuidV4 = '550e8400-e29b-41d4-a716-446655440000';
      const uuidV4Upper = '550E8400-E29B-41D4-A716-446655440000';
      const uuidWithoutDashes = '550e8400e29b41d4a716446655440000';

      const event = generateEvent({
        context: {
          userId: uuidV4,
          sessionId: uuidV4Upper,
          transactionId: uuidWithoutDashes,
          requestId: uuidV4,
        },
        addons: {
          vue: {
            props: {
              componentId: uuidV4,
            },
          },
        },
      });

      dataFilter.processEvent(event);

      expect(event.context['userId']).toBe(uuidV4);
      expect(event.context['sessionId']).toBe(uuidV4Upper);
      expect(event.context['transactionId']).toBe(uuidWithoutDashes);
      expect(event.context['requestId']).toBe(uuidV4);
      expect(event.addons['vue']['props']['componentId']).toBe(uuidV4);
    });

    test('should not filter MongoDB ObjectId at root level of event', async () => {
      const objectId = '507f1f77bcf86cd799439011';
      /** ObjectId that contains only digits (could be mistaken for PAN) */
      const numericObjectId = '672808419583041003090824';

      const event = {
        projectId: objectId,
        numericProjectId: numericObjectId,
        catcherType: 'errors/php',
        title: 'Test event',
        backtrace: [],
        timestamp: 1769802857,
      };

      dataFilter.processEvent(event as any);

      expect(event.projectId).toBe(objectId);
      expect(event.numericProjectId).toBe(numericObjectId);
    });

    test('should not filter MongoDB ObjectId values in context and addons', async () => {
      const objectId = '507f1f77bcf86cd799439011';
      const objectIdUpper = '507F1F77BCF86CD799439011';

      const event = generateEvent({
        context: {
          projectId: objectId,
          workspaceId: objectIdUpper,
        },
        addons: {
          hawk: {
            projectId: objectId,
          },
        },
      });

      dataFilter.processEvent(event);

      expect(event.context['projectId']).toBe(objectId);
      expect(event.context['workspaceId']).toBe(objectIdUpper);
      expect(event.addons['hawk']['projectId']).toBe(objectId);
    });

</issue_description>
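Stripping non-digits from the values used in the issue's tests suggests why they did not reproduce the problem, assuming the 13-16 digit length check described in the PR: each value leaves more than 16 digits. A small illustrative sketch (`ISSUE_VALUES` is a hypothetical name):

```typescript
// Values from the issue's tests, keyed by their variable names there.
const ISSUE_VALUES: Record<string, string> = {
  uuidV4: '550e8400-e29b-41d4-a716-446655440000',
  objectId: '507f1f77bcf86cd799439011',
  numericObjectId: '672808419583041003090824',
};

for (const [name, value] of Object.entries(ISSUE_VALUES)) {
  // Same digit stripping as the pre-fix filter.
  const digitCount = value.replace(/\D/g, '').length;

  // 27, 17 and 24 digits respectively: all above 16, so never PAN-like.
  console.log(`${name}: ${digitCount} digits`);
}
```

The upper-case and undashed UUID variants contain the same digits as uuidV4 (27). The PR's replacement tests instead use identifiers that leave exactly 16 digits.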

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits January 30, 2026 22:12
Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>
Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix falsy filtered values in DataFilter" to "Fix DataFilter false positives for UUIDs and MongoDB ObjectIds" Jan 30, 2026
Copilot AI requested a review from neSpecc January 30, 2026 22:19
…d detection

Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>
Copilot AI requested a review from neSpecc January 30, 2026 23:26
@codex-assistant codex-assistant bot marked this pull request as ready for review January 31, 2026 00:20
@codex-assistant

Thanks for adding a description — the PR is now marked as Ready for Review.

@neSpecc neSpecc merged commit 4189b0a into master Jan 31, 2026
5 checks passed
@neSpecc neSpecc deleted the copilot/fix-falsy-filtered-values branch January 31, 2026 00:24
e11sy added a commit that referenced this pull request Feb 9, 2026
* chore(sentry): add some debug logs (#500)

* add logs to sentry worker

* tune logs

* Update index.ts

* fix tests

* chore(event-email): event email notification now contains link to a particular repetition  (#499)

* Add repetitionId to event notification flow

Introduces the repetitionId field to event notification data structures and templates, allowing emails and notifications to reference specific event repetitions. Updates TypeScript interfaces, worker logic, and email templates to support and display repetitionId where applicable.

* fix grouper test

* fix(sentry): replay skipping improved (#503)

* fix sentry replay skipping

* lint code

* Update index.test.ts

* Log Sentry client_report items for debugging (#505)

Added handling for 'client_report' items in Sentry envelopes to log their internals for easier debugging of dropped events and SDK/reporting issues. Decodes payloads as needed and logs errors if decoding fails.

* bug(sentry): Flatten nested objects in backtrace frame arguments using dot notation (#509)

* Initial plan

* Implement dot notation for nested objects in backtrace frame arguments

Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>

* Fix empty array handling to be consistent with empty objects

Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>

* Remove unrelated file changes that should not be affected by the solution

Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>

* chore(setup): migrate from node 16 to 24 (#512)

* Upgrade to Node.js 24 and update TypeScript config

Update all workflows, Dockerfiles, and .nvmrc to use Node.js 24 for consistency and latest features. Enable 'moduleResolution: node' and 'skipLibCheck' in tsconfig.json to improve TypeScript compatibility and speed up builds.

* upd packages

* Update jest.setup.redis-mock.js

* Update jest.global-teardown.js

* fix tests

* Update jest.setup.redis-mock.js

* disable redis tests

* rm skip

* Update jest.config.js

* rm skips

* lint code

* Task manager (#511)

* Add Task Manager worker for auto GitHub issue creation

Introduces a new worker at workers/task-manager that automatically creates GitHub issues for events meeting a threshold, with daily rate limiting and atomic usage tracking. Includes environment setup, documentation, and updates @hawk.so/types to v0.5.3.

* Integrate GitHub issue creation and Copilot assignment

Added GitHubService for authenticating as a GitHub App and creating issues via the GitHub API. Implemented formatting of issue data from events, including stacktrace and source code snippets. Updated TaskManagerWorker to use real GitHub issue creation and Copilot assignment, replacing previous mocked logic. Added environment variables for GitHub App configuration and updated documentation. Included tests for issue formatting.

* Refactor GitHub key handling and improve Copilot assignment

Extracted GitHub App private key normalization to a utility for better reliability and CI compatibility. Enhanced Copilot assignment to use the GraphQL API and improved error handling. Refactored task creation flow to increment usage only after successful issue creation, updated dependencies, and fixed import paths.

* use pat

* Add delegated user OAuth support and token refresh for GitHub integration

Introduces delegated user-to-server OAuth support for GitHub App integration in the task manager worker. Adds logic for handling delegated user tokens, including automatic refresh and fallback to installation tokens, and updates environment/configuration to support GitHub App OAuth credentials. Updates dependencies to include @octokit/oauth-methods and related packages.

* Refactor GitHub issue creation and Copilot assignment

Separated GitHub issue creation and Copilot agent assignment into distinct steps. The issue is now always created using the GitHub App installation token, and Copilot is assigned afterward using a user-to-server OAuth token if enabled. Updated the TaskManagerWorker logic to reflect this change, improved error handling, and updated the event saving logic to accurately reflect Copilot assignment status.

* Update GithubService.ts

* lint code

* lint and tests

* Fix task manager env parsing and event timestamp filter

Replaces Number() with parseInt() for MAX_AUTO_TASKS_PER_DAY to ensure correct parsing. Fixes event query to filter by timestamp using connectedAt, and enables the super.start() call. Also corrects a typo in a comment in GrouperWorker.

* Update package.json

* update issue format

* lint

* Add PR Assistant workflow configuration (#507)

Co-authored-by: Peter <specc.dev@gmail.com>

* Update package.json

* Fix DataFilter false positives for UUIDs and MongoDB ObjectIds (#518)

* Initial plan

* Fix DataFilter to not filter UUIDs and MongoDB ObjectIds

Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>

* Address code review feedback: improve UUID regex and test coverage

Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>

* Fix tests to use values that would actually fail without UUID/ObjectId detection

Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>

* upd version

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>
Co-authored-by: Peter Savchenko <specc.dev@gmail.com>

* fix(grouper): filter oldPassword and newPassword in event payload (#516)

* fix(grouper): filter oldPassword and newPassword in event payload

* chore: add more keys and update tests

* chore: lint fix

* fix(tests): rename sessionId to requestId in data-filter tests for clarity

* chore(grouper): add counters to the grouper worker

* chore(): eslint fix

* chore(): clean up

* chore(grouper): remove redundant rate-limit increment logic

* chore(grouper): remove redundant mocks

* chore(): eslint fix

* chore(): change metric type

* Update workers/grouper/src/index.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* imp(): use lua for create if not exists, to avoid race-cond

---------

Co-authored-by: Peter <specc.dev@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: neSpecc <3684889+neSpecc@users.noreply.github.com>
Co-authored-by: Kuchizu <70284260+Kuchizu@users.noreply.github.com>
Co-authored-by: Dobrunia Kostrigin <48620984+Dobrunia@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

grouper(data-filter):  falsy filtered values

2 participants