Technical Design: Charity Data Audit & Update System #39

@oraweb

Overview

This document details the architecture for the Charity Data Audit & Update System. The system is designed to:

  • Store original and new charity data (with immutability and timestamping)
  • Evaluate data reliability for possible live updates
  • Flag charities with significant or recent updates
  • Maintain linkage to internal and source references

Each requirement maps to a dedicated service with its own boundary: ingestion, storage, change detection, reliability scoring, API update, and audit logging.


Functional Scope & Key Features

  • Data Ingestion: Ingest newly recommended charity data from external sources.
  • Data Store: Persist both original and updated charity data records with timestamps and immutability guarantees.
  • Change Detection & Flagging: Automatically compare new data to originals, flag significant/new changes, and log audit details.
  • Reliability Scoring: Automated evaluators score the reliability of each record.
  • Automated API Push: Push only records whose reliability score meets the threshold to the live API, log the results, and handle errors.
  • Reference Linking: Store and verify URLs to internal site/org pages and external sources within each data record.
  • Admin Dashboard Interface: Backend filter and retrieval of flagged changes, audit logs, etc.

Architectural Pattern

  • Service-Oriented Architecture (SOA) or microservices, as appropriate for modularity, audit, and scalability.
  • Separate modules/services for data ingestion, reliability scoring, flagging, storage, and API pushing.

Major Components and Services

  • Charity Data Ingestion Service: Handles import of new recommended data from external sources.
  • Charity Data Store: Immutable database/storage bucket for charity data snapshots (original & updated), including:
    • Charity ID, record version, timestamps
    • Data payload (fields, links)
    • Reference URLs validated for accessibility
  • Change Detection Service: Logic to compare data sets, flag deltas, and annotate change nature.
  • Reliability Scoring Engine: Automated evaluation of updated records against criteria. Outputs a score and triggers downstream actions if the score is at or above the threshold.
  • API Update Service: Calls the updateSiteFromAI mutation when a record meets the reliability threshold, confirms success, and logs results.
  • Audit Log Service: Stores audit records for data changes, reliability assessments, API pushes, and link verifications.
  • Reference Link Validation Service: Tests stored internal and external reference URLs for accessibility; failures are flagged and logged.
  • Admin Dashboard/API: Secure backend for querying flagged changes and audit logs.

Assumptions

  • Charity records have unique IDs.
  • Data specification guidelines are already established and validated at ingestion.
  • All links (internal and external) are HTTP(S) accessible.
  • The reliability scoring algorithm and criteria are supplied by the product team.
  • The updateSiteFromAI mutation is documented and reachable over an HTTP API with a known schema.
  • Admin dashboard/API is not a public interface.

Step-by-Step Implementation Plan

1. Data Flow & Storage

  • Every charity record (original and newly recommended) is saved to a dedicated storage bucket/DB.
    • All writes are append-only; updating or ingesting new data creates a new immutable record with timestamp.
    • Data includes: Charity ID, full data payload, site/org URL, external source URL, timestamp, record type (original/recommended).
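The append-only rule can be illustrated with a minimal in-memory sketch. It is not the intended storage backend: the names CharityRecord and AppendOnlyStore, the snake_case field names, and the integer version counter are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class CharityRecord:
    """One immutable snapshot of a charity record (field names are illustrative)."""
    charity_id: str
    version: int
    record_type: str  # "original" or "recommended"
    payload: Mapping[str, Any]
    site_url: str
    source_url: str
    timestamp: datetime


class AppendOnlyStore:
    """In-memory stand-in for the bucket/DB; it exposes no update or delete."""

    def __init__(self) -> None:
        self._history: Dict[str, List[CharityRecord]] = {}

    def append(self, charity_id: str, record_type: str, payload: Mapping[str, Any],
               site_url: str, source_url: str) -> CharityRecord:
        if record_type not in ("original", "recommended"):
            raise ValueError(f"unknown record type: {record_type!r}")
        versions = self._history.setdefault(charity_id, [])
        record = CharityRecord(
            charity_id=charity_id,
            version=len(versions) + 1,  # every write creates a new version
            record_type=record_type,
            payload=MappingProxyType(dict(payload)),  # read-only shallow copy
            site_url=site_url,
            source_url=source_url,
            timestamp=datetime.now(timezone.utc),
        )
        versions.append(record)
        return record

    def history(self, charity_id: str) -> Tuple[CharityRecord, ...]:
        """Return all snapshots for a charity, oldest first."""
        return tuple(self._history.get(charity_id, ()))
```

The read-only payload copy is shallow, so nested values would need their own protection in a real store.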

2. Change Detection & Flagging

  • Every ingestion triggers comparison logic:
    • If a significant or recent field change is detected, flag the record and generate a change summary (nature, affected fields).
    • Create audit entry for flagging.
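A field-level comparison of this kind might look like the sketch below. Which fields count as "significant" is not defined in this document, so SIGNIFICANT_FIELDS is a hypothetical placeholder, and the recency check is left out.

```python
from typing import Any, Dict, FrozenSet, Mapping

# Hypothetical: the real list of significant fields would come from the data specification.
SIGNIFICANT_FIELDS: FrozenSet[str] = frozenset({"name", "registration_number", "website"})


def summarize_changes(original: Mapping[str, Any], recommended: Mapping[str, Any],
                      significant_fields: FrozenSet[str] = SIGNIFICANT_FIELDS) -> Dict[str, Any]:
    """Compare two payloads field by field and build a change summary."""
    change_nature: Dict[str, str] = {}
    for field in sorted(set(original) | set(recommended)):
        if field not in original:
            change_nature[field] = "added"
        elif field not in recommended:
            change_nature[field] = "removed"
        elif original[field] != recommended[field]:
            change_nature[field] = "modified"
    significant = [f for f in change_nature if f in significant_fields]
    return {
        "flagged": bool(significant),           # flag only when a significant field changed
        "modified_fields": list(change_nature),  # affected fields, sorted by name
        "change_nature": change_nature,
        "significant_fields": significant,
    }
```

The returned dictionary carries what an audit entry for the flag would need: the affected fields and the nature of each change.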

3. Reliability Scoring

  • Updated/recommended records are scored using the defined algorithm.
  • If score >= threshold, trigger the API Update Service; otherwise, log the record as not-live and add it to the pending review list.
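The threshold rule reduces to a small routing function. In this sketch the score range of 0.0 to 1.0 and the threshold value of 0.9 are assumptions; the actual scale and cut-off are to be supplied by the product team.

```python
from enum import Enum


class Route(Enum):
    API_UPDATE = "api_update"
    PENDING_REVIEW = "pending_review"


# Hypothetical value; the real threshold comes with the scoring criteria.
RELIABILITY_THRESHOLD = 0.9


def route_record(score: float, threshold: float = RELIABILITY_THRESHOLD) -> Route:
    """Send a scored record to the API update path or to pending review."""
    if not 0.0 <= score <= 1.0:  # also rejects NaN
        raise ValueError(f"score out of range: {score!r}")
    return Route.API_UPDATE if score >= threshold else Route.PENDING_REVIEW
```

A score exactly equal to the threshold goes live, matching the ">=" comparison in the step above.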

4. API Mutation Call

  • For records meeting the reliability threshold, the API Update Service calls the updateSiteFromAI mutation with the new data payload, references, and timestamp.
  • On success, log update in audit service and flag in DB as 'live'; on failure, log error and mark for manual review.
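The success and failure handling could be sketched as follows. The schema of updateSiteFromAI is not given in this document, so the variable names (charityId, payload, references, timestamp) are hypothetical, and the actual HTTP call is hidden behind an injected send callable that returns True on success.

```python
from typing import Any, Callable, Dict, List, Mapping


def push_record(record: Mapping[str, Any],
                send: Callable[[Dict[str, Any]], bool],
                audit_log: List[Dict[str, Any]]) -> str:
    """Call the mutation through `send`, log the outcome, and return the new status."""
    # Hypothetical variable names; the real ones must follow the mutation's schema.
    variables = {
        "charityId": record["charity_id"],
        "payload": record["payload"],
        "references": {"site": record["site_url"], "source": record["source_url"]},
        "timestamp": record["timestamp"],
    }
    try:
        ok = send(variables)
        error = None if ok else "mutation reported failure"
    except Exception as exc:  # any transport or API error sends the record to manual review
        ok, error = False, str(exc)
    status = "live" if ok else "manual_review"
    audit_log.append({
        "charity_id": record["charity_id"],
        "version": record["version"],
        "action": "api_push",
        "status": status,
        "error": error,
    })
    return status
```

Injecting send keeps the sketch independent of any particular HTTP or GraphQL client.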

5. Reference Link Validation

  • Each record includes validated URLs (site/org + external source).
  • Links are tested for accessibility; failures are flagged and logged.
  • Downstream reports/export API include full reference context.
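A minimal sketch of the accessibility test, using only the Python standard library, is shown below. The function names, the use of a HEAD request, the five-second timeout, and the rule that any status of 400 or above counts as a failure are all assumptions rather than requirements stated here.

```python
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a HEAD request, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses still carry a status code
        return exc.code
    except (OSError, ValueError):  # DNS failure, refused connection, timeout, bad URL
        return None


def validate_link(url: str,
                  fetch_status: Callable[[str], Optional[int]] = head_status) -> Dict[str, Any]:
    """Check that a reference URL is absolute HTTP(S) and currently accessible."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return {"url": url, "ok": False, "reason": "malformed URL"}
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return {"url": url, "ok": False, "reason": "not an absolute HTTP(S) URL"}
    status = fetch_status(url)
    if status is None:
        return {"url": url, "ok": False, "reason": "unreachable"}
    if status >= 400:
        return {"url": url, "ok": False, "reason": f"HTTP {status}"}
    return {"url": url, "ok": True, "reason": None}
```

Some servers reject HEAD requests, so a production check might fall back to GET; the fetch_status parameter allows that swap and makes the function testable without network access.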

6. Audit & Admin Access

  • Changes, flags, reliability results, and API pushes are recorded in an immutable audit log, with each entry referencing the charity record version it concerns.
  • Admin dashboard enables querying flagged changes, audit logs, and pending/manual review items.
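The kind of filtering the admin backend needs can be sketched over an in-memory list of audit entries. The AuditEntry fields and the action labels ("flagged", "scored", "api_push", "link_check") are illustrative assumptions; a real implementation would run the equivalent query against the audit store.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AuditEntry:
    """Immutable audit record tied to a specific charity record version."""
    audit_id: int
    charity_id: str
    record_version: int
    action: str  # hypothetical labels: "flagged", "scored", "api_push", "link_check"
    timestamp: datetime


def query_audit(entries: Iterable[AuditEntry],
                charity_id: Optional[str] = None,
                action: Optional[str] = None,
                since: Optional[datetime] = None) -> List[AuditEntry]:
    """Return entries matching every filter that is set, oldest first."""
    matches = [
        entry for entry in entries
        if (charity_id is None or entry.charity_id == charity_id)
        and (action is None or entry.action == action)
        and (since is None or entry.timestamp >= since)
    ]
    return sorted(matches, key=lambda entry: entry.timestamp)
```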

System Context Diagram

%%{init: {'theme':'dark'}}%%
graph TD
    DMI[Charity Data Ingestion Service] --> CDS[Charity Data Store]
    DMI --> CDS2[Audit Log Service]
    CDS --> CDS2
    CDS --> FDS[Change Detection & Flagging Service]
    FDS --> CDS2
    FDS --> AD[Admin Dashboard/API]
    CDS --> RSE[Reliability Scoring Engine]
    RSE --> CDS2
    RSE -- meets threshold --> APIU[API Update Service]
    APIU --> CDS2
    RSE -- below threshold --> AD
    CDS --> RLVS[Reference Link Validation Service]
    RLVS --> CDS2
    RLVS -- Fail --> FDS
    APIU --> LA[Live API updateSiteFromAI]

    classDef primary fill:#0059b3,stroke:#fff,stroke-width:2px
    classDef secondary fill:#1a7dd7,stroke:#fff,stroke-width:1px
    classDef storage fill:#ffe872,stroke:#fff,stroke-width:2px
    classDef action fill:#4caf50,stroke:#fff,stroke-width:2px
    classDef admin fill:#965cdb,stroke:#fff,stroke-width:2px
    classDef live fill:#e91e63,stroke:#fff,stroke-width:2px

    class DMI,FDS,RSE primary
    class CDS2,RLVS secondary
    class CDS storage
    class APIU action
    class LA live
    class AD admin

Description: This diagram shows the flow and relation between services. Data moves from ingestion, to storage, then is processed for flagging and reliability, which may result in an API call to go live, with full audit logging at every step. Link validation and change detection ensure data traceability and compliance.


Data Storage Model Overview

erDiagram
    CHARITYRECORDS ||--o{ AUDITLOGS : has
    CHARITYRECORDS {
        string CharityID
        string RecordType
        object DataPayload
        string InternalSiteURL
        string SourceURL
        datetime Timestamp
        string Version
        string ReliabilityScore
        string FlagStatus
    }
    AUDITLOGS {
        string AuditID
        string CharityID
        string ChangeNature
        list ModifiedFields
        string PreviousVersion
        string NewVersion
        datetime FlagTimestamp
        string AuditAction
    }

Description: This depicts the relationship between charity record snapshots (original/recommended) and change/audit logs. Each immutable record is versioned and linked to its audit trail.


Trust Boundary Diagram

%%{init: {'theme':'dark'}}%%
graph LR
    ExternalSource((External Data Source)):::ext -->|Data| DMI[Data Ingestion Service]
    DMI -.->|Validation & Scoring| InternalBoundary((Internal System Boundary)):::trust
    InternalBoundary --> CDS[Charity Data Store]
    InternalBoundary --> CDS2[Audit Log Service]
    InternalBoundary --> RLVS[Reference Link Validation]
    InternalBoundary --> APIU[API Update Service]
    APIU -->|updateSiteFromAI| ExternalAPI((Live Charity API)):::extapi

    classDef ext fill:#e57373,stroke:#fff,stroke-width:2px
    classDef extapi fill:#ffb300,stroke:#fff,stroke-width:2px
    classDef trust fill:#388e3c,stroke:#fff,stroke-width:3px

Description: This shows the trust boundaries: only validated and scored charity data crosses from external sources into the internal system. Only highly reliable records are pushed from internal boundary to live API. All changes are logged and auditable within the internal boundary.


Implementation Notes

  • Storage solution must support append-only, immutable design (e.g., S3 with versioning, or audit-enabled relational DB).
  • Services must validate every field against the data specification; a validation error must never result in an update going live.
  • All links stored must be regularly tested and revalidated; any downstream consumer must receive references.
  • API mutation calls must be idempotent and fully auditable, with retry and error logging.
  • Admin dashboard endpoints must be secure, with filtering for flagged/changed records and audit logs.
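One way to make retried mutation calls idempotent is to derive a stable key from the record version and send it with every attempt. This sketch assumes the live API deduplicates on such a key, which this document does not confirm; the function names, the SHA-256 key format, and the exponential backoff values are likewise illustrative.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, List, Mapping, Optional


def idempotency_key(charity_id: str, version: int, payload: Mapping[str, Any]) -> str:
    """Derive a key that is identical for every retry of the same record version."""
    canonical = json.dumps(
        {"charity_id": charity_id, "version": version, "payload": dict(payload)},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def call_with_retry(call: Callable[[str], Any], key: str, attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep,
                    error_log: Optional[List[Dict[str, Any]]] = None) -> Any:
    """Invoke `call(key)`, retrying with exponential backoff and logging each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call(key)
        except Exception as exc:
            if error_log is not None:
                error_log.append({"key": key, "attempt": attempt, "error": str(exc)})
            if attempt == attempts:
                raise  # retries exhausted; the caller marks the record for manual review
            sleep(base_delay * 2 ** (attempt - 1))
```

Because the key depends only on the charity ID, version, and payload, a retry after a timeout cannot apply the same update twice if the receiving side honours the key.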


Generated from Architecture analysis of #68
