📊 Family B — Data Provenance, Cross-Reference Mapping, and Manifest Generation
🎯 manifest.json · cross-reference-map.md · data-download-manifest.md · Citation Integrity
📋 Document Owner: CEO | 📄 Version: 1.1 | 📅 Last Updated: 2026-04-25 (UTC) | 🔄 Review Cycle: Quarterly | ⏰ Next Review: 2026-07-31 | 🏢 Owner: Hack23 AB (Org.nr 5595347807) | 🏷️ Classification: Public
| Element | Value | Reference |
|---|---|---|
| F3EAD Stage | EXPLOIT — extract and structure metadata from raw data sources | Provenance mapping occurs during data exploitation phase |
| PIRs Served | Enables audit trail for all PIR answers — every fact traceable to source | See political-style-guide.md §PIR/EEI |
| Admiralty Floor | Manifest records source grade for each data item; cross-reference inherits source grades | See political-style-guide.md §Admiralty |
| WEP Requirement | Not applicable — Family B is metadata, not estimative | N/A |
| ICD 203 Gate | Standard 1 (sourcing), Standard 6 (traceable reasoning) | See political-style-guide.md §ICD 203 |
| SAT(s) | Network Analysis, Chronological Review, Link Analysis | See political-style-guide.md §SATs |
Family B provides the provenance infrastructure that makes every analytical claim auditable. It produces three output types:
| Output | Purpose | Consumption |
|---|---|---|
| `manifest.json` | Machine-readable inventory of every data item in the run | Validation scripts, downstream workflows, audit logs |
| `cross-reference-map.md` | Human-readable link graph showing document-to-document relationships | Analysts tracing claim provenance, ACH evidence sourcing |
| `data-download-manifest.md` | Catalogue of all EP MCP tool calls, parameters, and retrieved datasets | Reproducibility, debugging, GDPR Article 30 compliance |
- Auditability — Every claim in Family A/C/D artifacts traces back to a manifest entry
- Reproducibility — Anyone can re-run the same MCP calls and verify data
- Integrity — SHA-256 hashes detect data tampering or drift
- GDPR Compliance — Article 30 requires records of processing activities
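The integrity property can be illustrated with a minimal sketch built on Node's `node:crypto` module. The `sha256:` prefix mirrors the `contentHash` format used in the manifest. The helper names `contentHash` and `hasDrifted` are hypothetical and not part of the project's scripts, and the sketch assumes hashes are computed over the raw UTF-8 payload.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: hash a retrieved payload in the "sha256:<hex>" form
// used by contentHash fields. Assumes hashing over the raw UTF-8 text.
function contentHash(payload) {
  return "sha256:" + createHash("sha256").update(payload, "utf8").digest("hex");
}

// Hypothetical helper: true when a re-fetched payload no longer matches
// the hash recorded in the manifest (tampering or upstream drift).
function hasDrifted(recordedHash, payload) {
  return contentHash(payload) !== recordedHash;
}

const recorded = contentHash("Resolution text as retrieved");
console.log(hasDrifted(recorded, "Resolution text as retrieved")); // false
console.log(hasDrifted(recorded, "Resolution text, later edited")); // true
```

Storing the algorithm name in the prefix keeps the field self-describing if a different digest is ever adopted.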
%%{init: {"theme":"dark","themeVariables":{"primaryColor":"#1565C0","primaryTextColor":"#ffffff","primaryBorderColor":"#0A3F7F","lineColor":"#90CAF9","secondaryColor":"#2E7D32","secondaryTextColor":"#ffffff","tertiaryColor":"#FF9800","tertiaryTextColor":"#000000","mainBkg":"#1565C0","secondBkg":"#2E7D32","tertiaryBkg":"#FF9800","noteBkgColor":"#FFC107","noteTextColor":"#000000","errorBkgColor":"#D32F2F","fontFamily":"Inter, Helvetica, Arial, sans-serif"}}}%%
flowchart TB
classDef input fill:#1565C0,stroke:#0D47A1,color:#FFFFFF
classDef proc fill:#4CAF50,stroke:#1B5E20,color:#FFFFFF
classDef meta fill:#FFC107,stroke:#F57F17,color:#3E2723
classDef out fill:#7B1FA2,stroke:#4A148C,color:#FFFFFF
subgraph STAGE_A["Stage A — Data Collection"]
MCP1[EP MCP get_adopted_texts]:::input
MCP2[EP MCP get_procedures]:::input
MCP3[EP MCP get_meps]:::input
MCP4[EP MCP get_voting_records]:::input
MCP5[IMF imf-fetch-data<br/>primary economic]:::input
MCP6[World Bank get_social_data<br/>non-economic]:::input
end
subgraph FAMILY_B["Family B — Provenance Layer"]
M[manifest.json<br/>Machine inventory]:::meta
X[cross-reference-map.md<br/>Human link graph]:::meta
D[data-download-manifest.md<br/>Call catalogue]:::meta
end
subgraph DOWNSTREAM["Family A/C/D/E"]
A[synthesis-summary.md]:::out
C[devils-advocate-analysis.md]:::out
E[per-document-analysis/*.md]:::out
end
MCP1 --> M
MCP2 --> M
MCP3 --> M
MCP4 --> M
MCP5 --> M
MCP6 --> M
M --> X
M --> D
X --> A
X --> C
M --> E
{
"$schema": "https://euparliamentmonitor.com/schemas/manifest-v2.json",
"version": "2.0",
"runId": "breaking-2026-04-21-run01",
"articleType": "breaking",
"generatedAt": "2026-04-21T14:32:17Z",
"workflow": "news-breaking-analysis.md",
"history": [],
"data": { ... },
"artifacts": { ... },
"statistics": { ... }
}

"data": {
"epMcp": {
"adoptedTexts": [
{
"docId": "TA(2026)0123",
"title": "Resolution on Digital Services Act implementation",
"retrievedAt": "2026-04-21T14:00:05Z",
"endpoint": "get_adopted_texts",
"params": { "year": 2026, "limit": 50 },
"contentHash": "sha256:abc123...",
"sourceGrade": "A1",
"wordCount": 4520
}
],
"procedures": [ ... ],
"meps": [ ... ],
"votingRecords": [ ... ],
"committees": [ ... ],
"questions": [ ... ]
},
"imf": {
"vintage": "WEO-April-2026",
"economicIndicators": [
{
"databaseId": "WEO",
"indicator": "NGDP_RPCH",
"countries": ["DEU", "FRA", "ITA", "ESP", "POL"],
"years": 10,
"retrievedAt": "2026-04-21T14:01:12Z",
"endpoint": "imf-fetch-data"
}
]
},
"worldBank": {
"nonEconomicIndicators": [
{
"indicator": "HEALTH_EXPENDITURE",
"countries": ["DE", "FR", "IT", "ES", "PL"],
"years": 10,
"retrievedAt": "2026-04-21T14:01:12Z",
"endpoint": "get_health_data"
}
]
},
"externalSources": [
{
"url": "https://data.consilium.europa.eu/...",
"retrievedAt": "2026-04-21T14:02:30Z",
"contentHash": "sha256:def456...",
"sourceGrade": "B2"
}
]
}

"artifacts": {
"intelligence": [
{
"filename": "synthesis-summary.md",
"lineCount": 245,
"minLines": 205,
"status": "pass",
"generatedAt": "2026-04-21T15:45:22Z"
},
{ ... }
],
"classification": [ ... ],
"risk-scoring": [ ... ],
"threat-assessment": [ ... ],
"documents": [ ... ]
}

"statistics": {
"totalDataItems": 156,
"totalArtifacts": 34,
"artifactsPassing": 34,
"artifactsFailing": 0,
"sourceDistribution": {
"A1": 45,
"A2": 32,
"B1": 28,
"B2": 31,
"B3": 15,
"C2": 5
},
"politicalGroupCoverage": {
"EPP": { "mentions": 34, "depthScore": 1.2 },
"S&D": { "mentions": 28, "depthScore": 1.0 },
"Renew": { "mentions": 22, "depthScore": 0.95 },
"Greens/EFA": { "mentions": 18, "depthScore": 0.88 },
"ECR": { "mentions": 20, "depthScore": 0.92 },
"PfE": { "mentions": 16, "depthScore": 0.85 },
"The Left": { "mentions": 14, "depthScore": 0.82 },
"NI": { "mentions": 8, "depthScore": 0.70 }
}
}

"history": [
{
"runId": "breaking-2026-04-21-run01",
"timestamp": "2026-04-21T15:50:00Z",
"artifactsUpgraded": ["pestle-analysis.md"],
"artifactsNew": ["executive-brief.md"],
"artifactsUnchanged": 32
}
]

`cross-reference-map.md` is a human-readable document that maps which artifacts cite which data sources and which documents reference each other. It lets analysts trace any claim back to its origin.
## Artifact → Source Matrix
| Artifact | Primary Sources | Secondary Sources | Source Count |
|----------|-----------------|-------------------|--------------|
| synthesis-summary.md | TA(2026)0123, TA(2026)0124 | QE-001234, A9-0045/2026 | 8 |
| stakeholder-map.md | get_meps(country=DE), get_voting_records | MEP press statements | 12 |
| scenario-forecast.md | get_procedures, get_adopted_texts | IMF WEO GDP projections (primary) | 6 |

## Document → Document Links
| Document A | Relationship | Document B | Evidence |
|------------|--------------|------------|----------|
| COM(2025)0456 | Amends | DIR 2019/1024 | Art. 12 explicit reference |
| TA(2026)0123 | Adopts | A9-0045/2026 | Committee report |
| A9-0045/2026 | References | QE-001234 | Recital 15 citation |

%%{init: {"theme":"dark","themeVariables":{"primaryColor":"#1565C0","primaryTextColor":"#ffffff"}}}%%
graph TB
classDef com fill:#1565C0,stroke:#0D47A1,color:#FFFFFF
classDef ta fill:#4CAF50,stroke:#1B5E20,color:#FFFFFF
classDef report fill:#FF9800,stroke:#E65100,color:#FFFFFF
classDef qa fill:#7B1FA2,stroke:#4A148C,color:#FFFFFF
COM["COM(2025)0456<br/>Commission Proposal"]:::com
TA["TA(2026)0123<br/>Adopted Text"]:::ta
A9["A9-0045/2026<br/>Committee Report"]:::report
QE["QE-001234<br/>Written Question"]:::qa
COM -->|initiates| A9
A9 -->|adopted as| TA
QE -->|cited in| A9
COM -->|amends| DIR["DIR 2019/1024"]
## Citation Integrity
| Artifact | Total Claims | Claims Sourced | Unsourced Claims | Integrity % |
|----------|--------------|----------------|------------------|-------------|
| synthesis-summary.md | 24 | 24 | 0 | 100% |
| stakeholder-map.md | 18 | 17 | 1 | 94% |
| devils-advocate-analysis.md | 32 | 32 | 0 | 100% |

`data-download-manifest.md` is a comprehensive log of every EP MCP tool call made during the workflow run. It supports reproducibility and GDPR Article 30 compliance.
## EP MCP Tool Calls
| # | Tool | Parameters | Timestamp | Items Retrieved | Duration |
|---|------|------------|-----------|-----------------|----------|
| 1 | `get_adopted_texts` | year=2026, limit=50 | 2026-04-21T14:00:05Z | 47 | 1.2s |
| 2 | `get_procedures` | limit=100, offset=0 | 2026-04-21T14:00:08Z | 100 | 2.1s |
| 3 | `get_meps` | country=DE, active=true | 2026-04-21T14:00:15Z | 96 | 0.8s |
| 4 | `get_voting_records` | dateFrom=2026-04-01 | 2026-04-21T14:00:20Z | 23 | 1.5s |
| 5 | `analyze_coalition_dynamics` | — | 2026-04-21T14:00:25Z | 1 | 3.2s |

## Data Volume Summary
| Data Type | Items | Total Size | Avg Size/Item |
|-----------|-------|------------|---------------|
| Adopted Texts | 47 | 2.3 MB | 49 KB |
| Procedures | 100 | 1.8 MB | 18 KB |
| MEP Records | 96 | 0.4 MB | 4 KB |
| Voting Records | 23 | 0.9 MB | 39 KB |
| **Total** | **266** | **5.4 MB** | **20 KB** |

## API Response Codes
| Tool | HTTP 200 | HTTP 4xx | HTTP 5xx | Retries |
|------|----------|----------|----------|---------|
| get_adopted_texts | 1 | 0 | 0 | 0 |
| get_procedures | 1 | 0 | 0 | 0 |
| get_meps | 3 | 0 | 1 | 1 |
| get_voting_records | 1 | 0 | 0 | 0 |

## Content Hash Inventory
| Item ID | SHA-256 (first 16 chars) | Verification |
|---------|--------------------------|--------------|
| TA(2026)0123 | abc123def456... | ✅ Match |
| TA(2026)0124 | 789xyz012abc... | ✅ Match |
| A9-0045/2026 | fedcba987654... | ✅ Match |

The `scripts/validate-analysis-completeness.js` CLI (`npm run validate-analysis -- <runDir>`) reads `manifest.json` to:
- Verify all required artifacts exist
- Check line counts against `reference-quality-thresholds.json`
- Validate source grade distribution
- Generate pass/fail report
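The checks listed above might look roughly like the following. This is a sketch under stated assumptions, not the actual `validate-analysis-completeness.js` implementation. The function name `validateManifest` is hypothetical, artifact existence is approximated by a non-zero `lineCount` rather than a file system check, thresholds are read from each artifact's `minLines` field, and the source grade check assumes the distribution should sum to `statistics.totalDataItems`.

```javascript
// Hypothetical validator over an already-parsed manifest object.
function validateManifest(manifest) {
  const failures = [];

  // Existence and line-count checks for every artifact in every category.
  for (const [category, artifacts] of Object.entries(manifest.artifacts)) {
    for (const a of artifacts) {
      if (!(a.lineCount > 0)) {
        failures.push(`${category}/${a.filename}: missing or empty`);
      } else if (a.lineCount < a.minLines) {
        failures.push(`${category}/${a.filename}: ${a.lineCount} < ${a.minLines} lines`);
      }
    }
  }

  // Assumption: the source grade distribution accounts for every data item.
  const stats = manifest.statistics;
  const graded = Object.values(stats.sourceDistribution).reduce((sum, n) => sum + n, 0);
  if (graded !== stats.totalDataItems) {
    failures.push(`sourceDistribution sums to ${graded}, expected ${stats.totalDataItems}`);
  }

  return { status: failures.length === 0 ? "pass" : "fail", failures };
}

const report = validateManifest({
  artifacts: {
    intelligence: [{ filename: "synthesis-summary.md", lineCount: 245, minLines: 205 }],
  },
  statistics: {
    totalDataItems: 156,
    sourceDistribution: { A1: 45, A2: 32, B1: 28, B2: 31, B3: 15, C2: 5 },
  },
});
console.log(report.status); // "pass"
```

Collecting failures in a list instead of throwing on the first one lets a single run report every problem at once.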
The devils-advocate-analysis.md workflow reads cross-reference-map.md to:
- Source ACH evidence from verified citations
- Identify document clusters for hypothesis testing
- Map claim provenance for red-team challenges
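Claim provenance tracing over the document-to-document links can be sketched as a graph walk. The edge list below reuses the example rows from the links table. The `traceOrigins` function and the `{ from, relationship, to }` edge shape are assumptions made for illustration, not the workflow's actual data model.

```javascript
// Example rows from the Document → Document Links table.
const links = [
  { from: "COM(2025)0456", relationship: "Amends", to: "DIR 2019/1024" },
  { from: "TA(2026)0123", relationship: "Adopts", to: "A9-0045/2026" },
  { from: "A9-0045/2026", relationship: "References", to: "QE-001234" },
];

// Hypothetical helper: breadth-first walk from one document to every
// document it points at, directly or transitively.
function traceOrigins(start, edges) {
  const seen = new Set([start]);
  const queue = [start];
  const chain = [];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const edge of edges) {
      if (edge.from === current && !seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push(edge.to);
        chain.push(edge.to);
      }
    }
  }
  return chain;
}

console.log(traceOrigins("TA(2026)0123", links));
// [ 'A9-0045/2026', 'QE-001234' ]
```

The `seen` set guards against cycles, which can occur when documents reference each other.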
The data-download-manifest.md provides:
- GDPR Article 30 compliance documentation
- Debugging information for failed workflows
- Reproducibility instructions for verification
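One way to keep the call catalogue reproducible is to record each tool call as structured data and render the table from it. The sketch below assumes a hypothetical record shape (`tool`, `params`, `timestamp`, `items`, `durationMs`) and a hypothetical `toTableRow` helper. Neither is taken from the project's code.

```javascript
// Hypothetical records of MCP tool calls captured during a run.
const calls = [
  { tool: "get_adopted_texts", params: { year: 2026, limit: 50 },
    timestamp: "2026-04-21T14:00:05Z", items: 47, durationMs: 1200 },
  { tool: "get_meps", params: { country: "DE", active: true },
    timestamp: "2026-04-21T14:00:15Z", items: 96, durationMs: 800 },
];

// Render one call as a row of the tool call table.
function toTableRow(call, index) {
  const params = Object.entries(call.params)
    .map(([key, value]) => `${key}=${value}`)
    .join(", ");
  const cells = [
    index + 1,
    "`" + call.tool + "`",
    params || "-", // placeholder for calls without parameters
    call.timestamp,
    call.items,
    (call.durationMs / 1000).toFixed(1) + "s",
  ];
  return "| " + cells.join(" | ") + " |";
}

console.log(calls.map(toTableRow).join("\n"));
```

Because the parameters are kept as data, the same records could later be replayed against the tools to verify the retrieved datasets.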
- Version 2.0 schema compliant
- All `data.epMcp` sections populated
- All `artifacts` have `lineCount` and `status`
- `statistics.sourceDistribution` sums to `statistics.totalDataItems`
- `statistics.politicalGroupCoverage` includes all 8 groups
- `history` array includes current run entry
- Artifact → Source matrix complete
- Document → Document links section present
- Link network Mermaid included
- Citation integrity ≥95% for all artifacts
- Unsourced claims flagged and explained
- All EP MCP tool calls logged with timestamps
- Data volume summary populated
- API response codes section present
- Content hash inventory complete
- No HTTP 5xx errors unresolved
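The citation integrity threshold from the checklist can be checked mechanically. The sketch below assumes integrity is sourced claims divided by total claims, which is consistent with the example table (17 of 18 rounds to 94%). The names `integrityPercent` and `belowThreshold` are hypothetical.

```javascript
const THRESHOLD = 95; // checklist requirement: citation integrity >= 95%

// Unrounded integrity percentage for one artifact row.
function integrityPercent(row) {
  return (row.sourced / row.total) * 100;
}

// Names of artifacts whose integrity falls below the threshold.
function belowThreshold(rows) {
  return rows
    .filter((row) => integrityPercent(row) < THRESHOLD)
    .map((row) => row.artifact);
}

// Values taken from the example Citation Integrity table.
const rows = [
  { artifact: "synthesis-summary.md", total: 24, sourced: 24 },
  { artifact: "stakeholder-map.md", total: 18, sourced: 17 },
  { artifact: "devils-advocate-analysis.md", total: 32, sourced: 32 },
];

console.log(belowThreshold(rows)); // [ 'stakeholder-map.md' ]
```

Comparing the unrounded value avoids a borderline artifact passing only because of rounding.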
| Control | How this methodology satisfies it |
|---|---|
| ISO 27001 A.5.12 (Classification of information) | Manifest records source grades per ICD 203 |
| ISO 27001 A.8.3 (Access control) | Audit trail for all data access via manifest |
| NIST CSF ID.AM-3 (Data flows mapped) | Cross-reference map visualizes data flow |
| NIST CSF PR.DS-6 (Integrity checking) | SHA-256 hashes enable integrity verification |
| CIS 8.1 (Audit log management) | Data download manifest provides audit log |
| GDPR Art. 30 (Records of processing) | Data manifest satisfies processing record requirement |
| NIS2 Art. 21 (Risk management) | Provenance enables risk tracing |

Owner: CEO (Intelligence Program) · Reviewer: Chief Analyst + CISO · Review Cycle: Quarterly · Next Review: 2026-07-31 · Related: ai-driven-analysis-guide.md, per-document-methodology.md, artifact-catalog.md
Generated following EU Parliament Monitor Structural Metadata Methodology v1.0 — Family B Provenance & Linkage Layer.