Data Machine Architecture

Data Machine is an AI-first WordPress plugin that uses a Pipeline+Flow architecture for automated content processing and publication. It provides multi-provider AI integration with tool-first design patterns, multi-agent support, and a layered memory system. All of this is built around a reliability-first Single Item Execution Model.

Core Components

Pipeline+Flow System

Pipelines: Reusable templates containing step configurations
Flows: Configured instances of pipelines with scheduling
Jobs: Individual executions of flows with status tracking, each processing exactly one item. Jobs support parent-child relationships for batch execution via parent_job_id.

Execution Engine

The engine uses a services-layer architecture with direct method calls for performance. It implements a four-action execution cycle that processes exactly one item per job. Each item therefore runs in isolation, and a failure affects only its own job.
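The sketch below is self-contained PHP. It shows how splitting a batch into single-item child jobs linked by parent_job_id keeps one failing item from affecting its siblings. The `Job` class, `fan_out_batch()`, the sequential ID assignment and the status strings are all hypothetical stand-ins. They are not Data Machine's job classes, and the sketch does not model the four-action cycle itself.

```php
<?php
// Hypothetical sketch: a batch becomes one child job per item, each linked
// to its parent via parent_job_id. Not Data Machine's real job classes.

final class Job
{
    public string $status = 'pending';

    public function __construct(
        public int $job_id,
        public ?int $parent_job_id,
        public mixed $item = null
    ) {}
}

/**
 * Create one child job per item and process each in isolation.
 *
 * @param callable $process Receives one item; throwing marks only that job failed.
 * @return Job[] The child jobs, in item order.
 */
function fan_out_batch(int $parent_id, array $items, callable $process): array
{
    $children = [];
    $next_id  = $parent_id + 1; // illustrative ID assignment

    foreach ($items as $item) {
        $job = new Job($next_id++, $parent_id, $item);
        try {
            $process($job->item);      // exactly one item per job
            $job->status = 'completed';
        } catch (Throwable $e) {
            $job->status = 'failed';   // failure stays scoped to this job
        }
        $children[] = $job;
    }

    return $children;
}

$jobs = fan_out_batch(100, ['a', 'bad', 'c'], function (string $item): void {
    if ($item === 'bad') {
        throw new RuntimeException('cannot process item');
    }
});

foreach ($jobs as $job) {
    echo "{$job->job_id} (parent {$job->parent_job_id}): {$job->status}\n";
}
```

In this sketch the middle item fails, but the first and third child jobs still complete. That is the isolation property the single-item model is designed to provide.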

Database Schema

Eight core tables:

Table	Purpose
`wp_datamachine_pipelines`	Pipeline templates (reusable), with `user_id` and `agent_id`
`wp_datamachine_flows`	Flow instances (scheduled + configured), with `user_id` and `agent_id`
`wp_datamachine_jobs`	Job execution records, with `user_id`, `agent_id`, `parent_job_id`, `source`, `label`
`wp_datamachine_processed_items`	Deduplication tracking per execution
`wp_datamachine_chat_sessions`	Persistent conversation state, with `agent_id`, `title`, `context`
`wp_datamachine_agents`	Agent registry (slug, name, owner, config, status)
`wp_datamachine_agent_access`	Role-based access control (viewer, operator, admin)
`wp_datamachine_logs`	Centralized system logs with agent scoping

See Database Schema for full table definitions and relationships.

Multi-Agent Architecture

Data Machine supports multiple agents on a single WordPress installation (@since v0.36.1):

┌─────────────────────────────────────────────────┐
│                WordPress Site                    │
│                                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│  │ Agent A  │  │ Agent B  │  │ Agent C  │      │
│  │          │  │          │  │          │      │
│  │ SOUL.md  │  │ SOUL.md  │  │ SOUL.md  │      │
│  │ MEMORY.md│  │ MEMORY.md│  │ MEMORY.md│      │
│  │ daily/   │  │ daily/   │  │ daily/   │      │
│  │          │  │          │  │          │      │
│  │ pipelines│  │ pipelines│  │ pipelines│      │
│  │ flows    │  │ flows    │  │ flows    │      │
│  │ jobs     │  │ jobs     │  │ jobs     │      │
│  │ chat     │  │ chat     │  │ chat     │      │
│  └──────────┘  └──────────┘  └──────────┘      │
│                                                  │
│  ┌──────────────────────────────────────────┐   │
│  │ Shared Layer: SITE.md, RULES.md          │   │
│  └──────────────────────────────────────────┘   │
│                                                  │
│  ┌──────────┐  ┌──────────┐                     │
│  │ User 1   │  │ User 2   │                     │
│  │ USER.md  │  │ USER.md  │                     │
│  └──────────┘  └──────────┘                     │
└─────────────────────────────────────────────────┘

Key components:

Agent Registry (datamachine_agents): Each agent has a unique slug, owner, and configuration
Access Control (datamachine_agent_access): Role-based sharing (viewer < operator < admin)
Resource Scoping: Agents have both agent_id and agent_slug; storage tables use IDs while portable runtime/export contexts may carry slugs
Filesystem Isolation: Each agent gets agents/{slug}/ for identity files and daily memory
Permission Helper: PermissionHelper resolves agent context and enforces access checks
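The viewer < operator < admin ordering implies a simple rank comparison. The sketch below is hypothetical: `AgentRole`, `grants()` and `can_access()` are illustrative names, not the PermissionHelper API. It also omits the manage_options mapping that lets site administrators through. It needs PHP 8.1+ for enums.

```php
<?php
// Hypothetical sketch of the viewer < operator < admin ordering.
// PermissionHelper's real API may differ.

enum AgentRole: int
{
    case Viewer   = 1;
    case Operator = 2;
    case Admin    = 3;

    /** A granted role satisfies any requirement at or below its rank. */
    public function grants(AgentRole $required): bool
    {
        return $this->value >= $required->value;
    }
}

/**
 * @param array<int, AgentRole> $grants Map of user_id => role for one agent.
 */
function can_access(array $grants, int $user_id, AgentRole $required): bool
{
    $role = $grants[$user_id] ?? null; // no grant means no access
    return $role !== null && $role->grants($required);
}

$agent_grants = [
    7 => AgentRole::Viewer,
    9 => AgentRole::Admin,
];

var_dump(can_access($agent_grants, 7, AgentRole::Operator)); // bool(false)
var_dump(can_access($agent_grants, 9, AgentRole::Operator)); // bool(true)
var_dump(can_access($agent_grants, 3, AgentRole::Viewer));   // bool(false)
```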

Layered Memory Architecture

Agent memory is organized in a three-layer directory system under Data Machine's files root inside the WordPress uploads directory:

datamachine-files/
├── shared/              # Site-wide (all agents)
│   ├── SITE.md
│   └── RULES.md
├── agents/              # Per-agent identity
│   ├── agent-a/
│   │   ├── SOUL.md
│   │   ├── MEMORY.md
│   │   └── daily/
│   │       └── 2026/
│   │           └── 03/
│   │               ├── 15.md
│   │               └── 16.md
│   └── agent-b/
│       ├── SOUL.md
│       └── MEMORY.md
├── users/               # Per-user preferences
│   ├── 1/
│   │   ├── USER.md
│   │   └── MEMORY.md
│   └── 2/
│       └── USER.md

CoreMemoryFilesDirective (Priority 20) loads files from layers in order:

SITE.md → RULES.md from the shared layer
SOUL.md → MEMORY.md from the agent layer
USER.md → MEMORY.md from the user layer
Custom files from MemoryFileRegistry (extensions), including files selected by pipelines and flows
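The load order above can be written as a function that returns relative paths, layer by layer. This sketch rests on several assumptions:

- The function names are hypothetical.
- Custom registry files are passed in as a plain list and appended last.
- Missing files are simply skipped.

The `STYLE.md` entry in the usage line is an invented example of a custom file.

```php
<?php
// Hypothetical sketch of CoreMemoryFilesDirective's layer order.
// Paths are relative to the datamachine-files/ root.

function memory_load_order(string $agent_slug, int $user_id, array $custom_files = []): array
{
    $ordered = [
        'shared/SITE.md',
        'shared/RULES.md',
        "agents/{$agent_slug}/SOUL.md",
        "agents/{$agent_slug}/MEMORY.md",
        "users/{$user_id}/USER.md",
        "users/{$user_id}/MEMORY.md",
    ];

    // Extension files from the registry come last, in registration order.
    return array_merge($ordered, array_values($custom_files));
}

/** Keep only files that exist under $root, preserving layer order. */
function existing_memory_files(string $root, array $relative_paths): array
{
    return array_values(array_filter(
        $relative_paths,
        fn (string $path): bool => is_file($root . '/' . $path)
    ));
}

// 'agents/agent-a/STYLE.md' is a hypothetical custom file for illustration.
print_r(memory_load_order('agent-a', 1, ['agents/agent-a/STYLE.md']));
```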

See WordPress as Agent Memory for full memory documentation.

Daily Memory System

Temporal knowledge management via date-organized files:

DailyMemory: File operations at agents/{slug}/daily/YYYY/MM/DD.md
DailyMemoryTask: System task that maintains MEMORY.md with deterministic overflow handling, same-day activity_section context, the single daily_memory prompt, and conservation checks.
AgentDailyMemoryDirective (Priority 35): Opt-in recent daily memory injection for chat and pipeline requests via agent_config.daily_memory.
AgentDailyMemory tool: Policy-gated on-demand reads, writes, lists, and searches for exact dates, ranges, and historical lookups.
DailyMemoryAbilities: CRUD + search via Abilities API with multi-agent scoping
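The date-organized layout maps each day to a single path. The helpers below are hypothetical and only derive paths. The DailyMemory class handles the actual file operations. The slug validation pattern is an assumption added to keep path segments safe.

```php
<?php
// Hypothetical helpers mirroring the agents/{slug}/daily/YYYY/MM/DD.md layout.

function daily_memory_path(string $agent_slug, DateTimeInterface $date): string
{
    // Assumed slug format; rejects path traversal such as "../x".
    if (!preg_match('/^[a-z0-9-]+$/', $agent_slug)) {
        throw new InvalidArgumentException("Unexpected agent slug: {$agent_slug}");
    }
    return sprintf('agents/%s/daily/%s.md', $agent_slug, $date->format('Y/m/d'));
}

/** List daily file paths for an inclusive date range, oldest first. */
function daily_memory_range(string $agent_slug, DateTimeImmutable $from, DateTimeImmutable $to): array
{
    $paths = [];
    for ($day = $from; $day <= $to; $day = $day->modify('+1 day')) {
        $paths[] = daily_memory_path($agent_slug, $day);
    }
    return $paths;
}

echo daily_memory_path('agent-a', new DateTimeImmutable('2026-03-15')), "\n";
// agents/agent-a/daily/2026/03/15.md
```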

See Daily Memory System for the current task lifecycle and artifact behavior.

System Tasks Framework

Background AI operations that run outside the normal pipeline model:

┌─────────────────────┐
│     SystemTask      │  (abstract base)
│                     │
│ execute()           │  ← Task-specific logic
│ completeJob()       │  ← Mark done + store engine_data
│ failJob()           │  ← Record failure
│ reschedule()        │  ← Retry with backoff (max 24)
│ supportsUndo()      │  ← Opt-in undo support
│ undo()              │  ← Reverse recorded effects
│ getPromptDefs()     │  ← Editable AI prompts
│ resolveSystemModel()│  ← Agent-aware model selection
└─────────────────────┘
         ▲
         │ extends
    ┌────┴────────────────────────────────────┐
    │                                          │
ImageGenerationTask  AltTextTask  DailyMemoryTask
ImageOptimizationTask  InternalLinkingTask
AgentCallTask  MetaDescriptionTask  RetentionTask
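The base-class contract can be sketched as an abstract PHP class. Only the method names and the 24-attempt limit come from the diagram above. The following are illustrative assumptions:

- the signatures
- the public state properties
- the exponential backoff schedule (60 s, doubling, capped at one hour)

`AltTextTaskSketch` is a contrived subclass used only to exercise the base class.

```php
<?php
// Hypothetical sketch of the SystemTask contract. Method names follow the
// diagram; signatures and the backoff schedule are illustrative assumptions.

abstract class SystemTaskSketch
{
    public const MAX_ATTEMPTS = 24;

    public string $status = 'pending';
    public int $attempts = 0;
    public array $engine_data = [];
    public ?int $next_run_delay = null;

    abstract public function execute(array $params): void;

    public function supportsUndo(): bool
    {
        return false; // opt-in: subclasses override
    }

    protected function completeJob(array $engine_data): void
    {
        $this->status      = 'completed';
        $this->engine_data = $engine_data;
    }

    protected function failJob(string $reason): void
    {
        $this->status               = 'failed';
        $this->engine_data['error'] = $reason;
    }

    /** Schedule a retry, or fail once the attempt budget is spent. */
    protected function reschedule(): void
    {
        $this->attempts++;
        if ($this->attempts >= self::MAX_ATTEMPTS) {
            $this->failJob('max attempts reached');
            return;
        }
        // Assumed backoff: 60s, 120s, 240s, ... capped at 3600s.
        $this->next_run_delay = min(3600, 60 * (2 ** ($this->attempts - 1)));
        $this->status         = 'rescheduled';
    }
}

// Illustrative subclass only; the real AltTextTask does far more.
final class AltTextTaskSketch extends SystemTaskSketch
{
    public function execute(array $params): void
    {
        if (empty($params['attachment_id'])) {
            $this->reschedule(); // e.g. a dependency is not ready yet
            return;
        }
        $this->completeJob(['alt_text_set' => (int) $params['attachment_id']]);
    }
}
```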

Undo System: Tasks that record effects in engine_data can be reversed:

post_content_modified → restore WordPress revision
post_meta_set → restore previous value
attachment_created → delete attachment
featured_image_set → restore/remove thumbnail
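Reversal can be sketched as walking the recorded effects newest-first and dispatching on each effect's type. The following are hypothetical stand-ins for WordPress revisions, post meta, attachments and thumbnails:

- the effect record shape (`type` plus a payload)
- the `previous_*` keys
- the in-memory `$site` array

```php
<?php
// Hypothetical sketch: effects recorded in engine_data are undone newest-first.
// $site is an in-memory stand-in for WordPress state.

function undo_effects(array $effects, array $site): array
{
    foreach (array_reverse($effects) as $effect) {
        switch ($effect['type']) {
            case 'post_content_modified':
                // Real code would restore a WordPress revision.
                $site['posts'][$effect['post_id']] = $effect['previous_content'];
                break;
            case 'post_meta_set':
                $key = $effect['meta_key'];
                if ($effect['previous_value'] === null) {
                    unset($site['meta'][$effect['post_id']][$key]); // meta did not exist before
                } else {
                    $site['meta'][$effect['post_id']][$key] = $effect['previous_value'];
                }
                break;
            case 'attachment_created':
                unset($site['attachments'][$effect['attachment_id']]);
                break;
            case 'featured_image_set':
                if ($effect['previous_thumbnail_id'] === null) {
                    unset($site['thumbnails'][$effect['post_id']]); // remove
                } else {
                    $site['thumbnails'][$effect['post_id']] = $effect['previous_thumbnail_id']; // restore
                }
                break;
            default:
                throw new UnexpectedValueException("Unknown effect: {$effect['type']}");
        }
    }
    return $site;
}

$site = [
    'posts'       => [1 => 'new content'],
    'meta'        => [1 => ['desc' => 'generated']],
    'attachments' => [50 => 'image.png'],
    'thumbnails'  => [1 => 50],
];
$effects = [
    ['type' => 'post_content_modified', 'post_id' => 1, 'previous_content' => 'old content'],
    ['type' => 'post_meta_set', 'post_id' => 1, 'meta_key' => 'desc', 'previous_value' => null],
    ['type' => 'attachment_created', 'attachment_id' => 50],
    ['type' => 'featured_image_set', 'post_id' => 1, 'previous_thumbnail_id' => null],
];
$restored = undo_effects($effects, $site);
```

Undoing in reverse order matters when effects depend on each other. For example, the featured image is unset before the attachment it points to is deleted.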

Workspace System

Secure file management outside the web root for agent operations lives in the data-machine-code extension plugin, not Data Machine core.

Location: Managed by data-machine-code workspace settings
Git-aware: Clone, status, pull, add, commit, push, log, diff
File ops: Read (with pagination), write, edit (find-replace), list directory
Security: Outside web root; mutating ops are CLI-only (not REST-exposed)
CLI: wp datamachine-code workspace {path,list,clone,remove,show,read,ls,write,edit,git,worktree}

Engine Data Architecture

Clean Data Separation: AI agents receive clean data packets without URLs, while handlers access engine parameters through a centralized filter pattern.

Enhanced Database Storage + Filter Access: Fetch handlers store engine parameters (source_url, image_url) in the database. Steps retrieve them through the centralized datamachine_engine_data filter, which detects storage or retrieval mode so both operations share one interface.

Core Pattern:

```php
// Fetch handlers store via centralized filter (array storage)
if ($job_id) {
    apply_filters('datamachine_engine_data', null, $job_id, [
        'source_url' => $source_url,
        'image_url' => $image_url
    ]);
}

// Steps retrieve via centralized filter (EngineData.php)
$engine_data = apply_filters('datamachine_engine_data', [], $job_id);
$source_url = $engine_data['source_url'] ?? null;
$image_url = $engine_data['image_url'] ?? null;
```
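The storage/retrieval mode detection can be modeled without WordPress. In this hypothetical sketch:

- Passing an array as the third argument means "store", and the array is merged into any existing data for the job. The merge is an assumption.
- Omitting the third argument means "retrieve".
- `EngineDataSketch` and its static array stand in for the datamachine_engine_data filter and its database storage.

```php
<?php
// Hypothetical in-memory model of the datamachine_engine_data filter's
// two modes. Real storage is the database, not a static array.

final class EngineDataSketch
{
    /** @var array<int, array<string, mixed>> */
    private static array $store = [];

    /**
     * Store mode: $data is an array, merged into the job's engine data.
     * Retrieve mode: $data is null, returns stored data or $default.
     */
    public static function handle(mixed $default, int $job_id, ?array $data = null): mixed
    {
        if ($data !== null) {
            self::$store[$job_id] = array_merge(self::$store[$job_id] ?? [], $data);
            return $default;
        }
        return self::$store[$job_id] ?? $default;
    }
}

// Fetch handler side: persist URLs that the AI never sees.
EngineDataSketch::handle(null, 42, [
    'source_url' => 'https://example.com/post',
    'image_url'  => 'https://example.com/image.jpg',
]);

// Publish step side: read back only what it needs.
$engine_data = EngineDataSketch::handle([], 42);
$source_url  = $engine_data['source_url'] ?? null;
```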

Benefits:

Clean AI Data: AI processes content without URLs for better model performance
Centralized Access: Single filter interface for all engine data retrieval
Filter Consistency: Maintains architectural pattern of filter-based service discovery
Flexible Storage: Steps access only what they need via filter call

Abilities-First Architecture (@since v0.11.7)

Ability Classes: Direct ability classes have fully replaced the older filter-based action and service-manager layers. REST endpoints, WP-CLI commands, and chat tools all delegate to WordPress Abilities API registrations under inc/Abilities/.

Ability domains (business logic):

Flow abilities - Flow CRUD, duplication, pause/resume, scheduling, webhooks, and queue management
Pipeline abilities - Pipeline CRUD, import/export, and pipeline-step template management
Flow step abilities - Individual flow step configuration and handler management
Job abilities - Workflow execution, retry/fail/delete/recovery, flow health, and summaries
Processed item abilities - Deduplication tracking across workflows
Agent abilities - Agent CRUD, access grants, tokens, remote calls, memory, and daily memory
File abilities - Agent files, flow uploads, cleanup, and memory scaffolding

Coding workspace note: Git-aware workspace and GitHub coding operations live in the data-machine-code extension plugin. Data Machine core no longer registers WorkspaceAbilities or GitHub issue abilities.

Benefits:

3x Performance Improvement: Direct method calls eliminate filter indirection
Centralized Business Logic: Consistent validation and error handling
Reduced Database Queries: Optimized data access patterns
Clean Architecture: Single responsibility per ability class
Backward Compatibility: Maintains WordPress hook integration
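The delegation pattern can be sketched with a small registry. REST, WP-CLI and chat tools become thin adapters that all run one ability. `AbilityRegistrySketch` and the `datamachine/pause-flow` name are hypothetical. In Data Machine the registration goes through the WordPress Abilities API rather than a class like this.

```php
<?php
// Hypothetical sketch: one ability holds the business logic; each entry
// point (REST, CLI, chat tool) is a thin adapter that delegates to it.

final class AbilityRegistrySketch
{
    /** @var array<string, array{permission: callable, execute: callable}> */
    private array $abilities = [];

    public function register(string $name, callable $permission, callable $execute): void
    {
        $this->abilities[$name] = ['permission' => $permission, 'execute' => $execute];
    }

    public function run(string $name, array $input): array
    {
        $ability = $this->abilities[$name] ?? null;
        if ($ability === null) {
            return ['success' => false, 'error' => "Unknown ability: {$name}"];
        }
        if (!($ability['permission'])($input)) {
            return ['success' => false, 'error' => 'Permission denied'];
        }
        return ($ability['execute'])($input);
    }
}

$registry = new AbilityRegistrySketch();
$registry->register(
    'datamachine/pause-flow', // hypothetical ability name
    fn (array $in): bool => ($in['user_can_manage'] ?? false) === true,
    fn (array $in): array => ['success' => true, 'flow_id' => (int) $in['flow_id'], 'status' => 'paused']
);

// Three adapters, one code path: validation and results stay consistent.
$rest_response = $registry->run('datamachine/pause-flow', ['flow_id' => '12', 'user_can_manage' => true]);
$cli_result    = $registry->run('datamachine/pause-flow', ['flow_id' => 12, 'user_can_manage' => true]);
$chat_result   = $registry->run('datamachine/pause-flow', ['flow_id' => 12, 'user_can_manage' => false]);
```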

Step Types

Fetch: Data retrieval with clean content processing (core handlers include Files, RSS, Email, WordPress Local, WordPress Media, and WordPress API; extension plugins can register more)
AI: Content processing with multi-provider support (OpenAI, Anthropic, Google, Grok)
Publish: Content distribution with modular handler architecture (core handlers include WordPress and Email; extension plugins can register social destinations)
Upsert: Content modification (WordPress posts/pages)
System Task: Execute system tasks within pipeline flows
Agent Ping: Outbound webhook notifications to external agents
Webhook Gate: Wait for inbound webhook before proceeding

Directive System

Priority-ordered context injection into every AI request:

Priority	Directive	Context	Purpose
20	`CoreMemoryFilesDirective`	All	Layer files + custom registry
22	`AgentModeDirective`	All	Mode-specific guidance for chat, pipeline, and system
25	`CallerContextDirective`	All, cross-site only	Authenticated A2A caller identity
35	`AgentDailyMemoryDirective`	Chat, pipeline	Recent daily archives when enabled
35	`ClientContextDirective`	All	Free-form client-reported context
40	`PipelineMemoryFilesDirective`	Pipeline	Per-pipeline memory files
45	`ChatPipelinesDirective`	Chat	Pipeline/flow context
45	`FlowMemoryFilesDirective`	Pipeline	Per-flow memory files
50	`PipelineSystemPromptDirective`	Pipeline	Workflow instructions

Directives implement DirectiveInterface and return arrays of typed outputs:

system_text — plain text content
system_json — labeled structured data
system_file — file path with MIME type
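The sketch below shows how priority ordering, context filtering and typed outputs could fit together. The `DirectiveSketch` interface, the context names, the output keys (`content`, `label`/`data`, `path`/`mime_type`) and the rendering format are all assumptions. The real DirectiveInterface may differ. Ties, such as the two priority-35 directives, keep registration order here because `usort` is stable as of PHP 8.0.

```php
<?php
// Hypothetical sketch of priority-ordered directive assembly.

interface DirectiveSketch
{
    public function priority(): int;
    /** @return string[] Contexts this directive applies to (chat, pipeline, system). */
    public function contexts(): array;
    /** @return array<int, array<string, mixed>> Typed outputs. */
    public function outputs(array $request): array;
}

final class StaticDirective implements DirectiveSketch
{
    public function __construct(
        private int $priority,
        private array $contexts,
        private array $outputs
    ) {}

    public function priority(): int { return $this->priority; }
    public function contexts(): array { return $this->contexts; }
    public function outputs(array $request): array { return $this->outputs; }
}

/** Collect outputs from applicable directives, lowest priority number first. */
function build_system_messages(array $directives, string $context, array $request): array
{
    $applicable = array_filter(
        $directives,
        fn (DirectiveSketch $d): bool => in_array($context, $d->contexts(), true)
    );
    usort($applicable, fn ($a, $b): int => $a->priority() <=> $b->priority());

    $messages = [];
    foreach ($applicable as $directive) {
        foreach ($directive->outputs($request) as $output) {
            // Unknown types throw UnhandledMatchError.
            $messages[] = match ($output['type']) {
                'system_text' => $output['content'],
                'system_json' => $output['label'] . ":\n" . json_encode($output['data'], JSON_PRETTY_PRINT),
                'system_file' => "[file {$output['path']} ({$output['mime_type']})]",
            };
        }
    }
    return $messages;
}

$directives = [
    new StaticDirective(50, ['pipeline'], [['type' => 'system_text', 'content' => 'Workflow instructions']]),
    new StaticDirective(20, ['chat', 'pipeline', 'system'], [['type' => 'system_text', 'content' => 'Memory files']]),
    new StaticDirective(45, ['chat'], [['type' => 'system_json', 'label' => 'Pipelines', 'data' => ['count' => 2]]]),
];
$pipeline_messages = build_system_messages($directives, 'pipeline', []);
```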

Authentication System

Base Authentication Provider Architecture (@since v0.2.6): An inheritance hierarchy that centralizes option storage and validation for all authentication providers.

Base Classes:

BaseAuthProvider (/inc/Core/OAuth/BaseAuthProvider.php): Abstract base for all authentication providers with unified option storage, callback URL generation, and authentication state checking
BaseOAuth1Provider (/inc/Core/OAuth/BaseOAuth1Provider.php): Base for extension-provided OAuth 1.0a providers
BaseOAuth2Provider (/inc/Core/OAuth/BaseOAuth2Provider.php): Base for core and extension OAuth 2.0 providers

OAuth Handlers:

OAuth1Handler (/inc/Core/OAuth/OAuth1Handler.php): Three-legged OAuth 1.0a flow implementation
OAuth2Handler (/inc/Core/OAuth/OAuth2Handler.php): Authorization code flow implementation

Authentication Providers:

Core ships base classes plus concrete providers next to the handlers that need them, such as Email auth.
Extension plugins register their own providers through datamachine_auth_providers.

OAuth2 Flow:

Create state nonce for CSRF protection
Build authorization URL with parameters
Handle callback: verify state, exchange code for token, retrieve account details, store credentials
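The state value is what protects the callback from CSRF. This minimal sketch covers the first step, the authorization URL, and the state check at the start of the callback. It makes these assumptions:

- State is stored per user with a 10-minute expiry.
- `OAuthStateSketch` is hypothetical, and its in-memory array stands in for whatever storage the base classes use.
- The provider URL, client ID and redirect URI are placeholders.

`random_bytes()`, `hash_equals()` and `http_build_query()` are standard PHP.

```php
<?php
// Hypothetical sketch of OAuth2 state handling; storage is in memory only.

final class OAuthStateSketch
{
    /** @var array<int, array{state: string, expires: int}> */
    private array $pending = [];

    public function __construct(private int $ttl_seconds = 600) {}

    /** Create an unguessable state value bound to the user. */
    public function create(int $user_id, int $now): string
    {
        $state = bin2hex(random_bytes(16));
        $this->pending[$user_id] = ['state' => $state, 'expires' => $now + $this->ttl_seconds];
        return $state;
    }

    /** Verify on callback: single use, unexpired, constant-time compare. */
    public function verify(int $user_id, string $returned_state, int $now): bool
    {
        $entry = $this->pending[$user_id] ?? null;
        unset($this->pending[$user_id]); // never accept the same state twice

        return $entry !== null
            && $now <= $entry['expires']
            && hash_equals($entry['state'], $returned_state);
    }
}

$states = new OAuthStateSketch();
$state  = $states->create(1, time());

// Placeholder provider endpoint and client values.
$auth_url = 'https://provider.example/authorize?' . http_build_query([
    'response_type' => 'code',
    'client_id'     => 'CLIENT_ID',
    'redirect_uri'  => 'https://site.example/callback',
    'state'         => $state,
]);
```

Only after `verify()` succeeds would the callback go on to exchange the code for a token.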

OAuth1 Flow:

Get request token
Build authorization URL
Handle callback: validate parameters, exchange for access token, store credentials

Benefits:

Eliminates duplicated storage logic across all providers (~60% code reduction per provider)
Standardized error handling and logging
Unified security implementation
Easy integration of new providers via base class extension

Universal Engine Architecture

Data Machine v0.2.0 introduced a universal Engine layer (/inc/Engine/AI/) that serves both Pipeline and Chat agents with shared AI infrastructure:

Core Engine Components:

AIConversationLoop: Multi-turn conversation execution with tool calling, completion detection, and state management
ToolExecutor: Universal tool discovery, enablement validation, and execution across agent types
WP_Agent_Tool_Parameters: Centralized parameter building for AI tools with data packet integration
ConversationManager: Message formatting and conversation state management
RequestBuilder: AI request construction with directive application and tool restructuring
ToolResultFinder: Utility for finding tool execution results in data packets
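The multi-turn cycle in AIConversationLoop can be sketched in four steps:

1. Send the messages.
2. Execute any requested tool calls.
3. Append the tool results to the conversation.
4. Stop when the model answers without tool calls or a turn limit is reached.

The sketch rests on these assumptions:

- The `$model` callable, the message shapes and the `max_turns` default are illustrative.
- The "tool not enabled" error is a simplified stand-in for ToolExecutor's enablement checks.

```php
<?php
// Hypothetical sketch of a tool-calling conversation loop.

/**
 * @param callable $model fn(array $messages): array{content: string, tool_calls: array}
 * @param array<string, callable> $tools tool name => fn(array $args): array
 */
function run_conversation(callable $model, array $tools, array $messages, int $max_turns = 10): array
{
    for ($turn = 1; $turn <= $max_turns; $turn++) {
        $reply      = $model($messages);
        $messages[] = ['role' => 'assistant', 'content' => $reply['content'], 'tool_calls' => $reply['tool_calls']];

        if ($reply['tool_calls'] === []) {
            return ['messages' => $messages, 'turns' => $turn, 'completed' => true];
        }

        foreach ($reply['tool_calls'] as $call) {
            $tool   = $tools[$call['name']] ?? null;
            $result = $tool !== null
                ? $tool($call['args'])
                : ['success' => false, 'error' => "Tool not enabled: {$call['name']}"];
            $messages[] = ['role' => 'tool', 'name' => $call['name'], 'content' => json_encode($result)];
        }
    }
    return ['messages' => $messages, 'turns' => $max_turns, 'completed' => false];
}

// Scripted model: first requests a search, then answers.
$script = [
    ['content' => '', 'tool_calls' => [['name' => 'local_search', 'args' => ['query' => 'pipelines']]]],
    ['content' => 'Found 2 posts.', 'tool_calls' => []],
];
$model = function (array $messages) use (&$script): array {
    return array_shift($script);
};
$tools = ['local_search' => fn (array $args): array => ['success' => true, 'count' => 2]];

$result = run_conversation($model, $tools, [['role' => 'user', 'content' => 'Search for pipelines']]);
```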

Tool Categories:

Handler-specific tools for publish/update operations
Global tools for search and analysis (LocalSearch, WebFetch, WordPressPostReader)
Coding workspace tools live in the data-machine-code extension plugin
Agent memory tools (AgentMemory, AgentDailyMemory) for runtime memory access
Chat-only tools for workflow building (@since v0.4.3):
- AddPipelineStep, ApiQuery, AuthenticateHandler, ConfigureFlowSteps, ConfigurePipelineStep, CopyFlow, CreateFlow, CreatePipeline, CreateTaxonomyTerm, ExecuteWorkflowTool, GetHandlerDefaults, ManageLogs, ReadLogs, RunFlow, SearchTaxonomyTerms, SetHandlerDefaults, UpdateFlow
Automatic tool discovery and three-layer enablement system

Filter-Based Discovery

All components self-register via WordPress filters:

datamachine_handlers - Register fetch/publish/upsert handlers
datamachine_tools - Register AI tools and capabilities (unified static + runtime handler tool registry)
datamachine_auth_providers - Register authentication providers
datamachine_step_types - Register custom step types
datamachine_directives - Register AI context directives
datamachine_get_oauth1_handler - OAuth 1.0a handler service discovery
datamachine_get_oauth2_handler - OAuth 2.0 handler service discovery
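Self-registration through filters means each component appends itself to a shared array, and core reads the final array when it needs it. The sketch below imitates that with a small priority-aware hook table. `HookSketch` is a hypothetical stand-in for WordPress's `add_filter()` and `apply_filters()`. The handler definition keys and the `my_social` extension are invented, and the real datamachine_handlers filter may receive extra arguments.

```php
<?php
// Hypothetical stand-in for WordPress filters, used to show self-registration.

final class HookSketch
{
    /** @var array<string, array<int, callable[]>> */
    private static array $filters = [];

    public static function add(string $hook, callable $callback, int $priority = 10): void
    {
        self::$filters[$hook][$priority][] = $callback;
    }

    public static function apply(string $hook, mixed $value, mixed ...$args): mixed
    {
        $by_priority = self::$filters[$hook] ?? [];
        ksort($by_priority); // lower priority numbers run first
        foreach ($by_priority as $callbacks) {
            foreach ($callbacks as $callback) {
                $value = $callback($value, ...$args);
            }
        }
        return $value;
    }
}

// A core handler registers itself...
HookSketch::add('datamachine_handlers', function (array $handlers): array {
    $handlers['rss'] = ['type' => 'fetch', 'label' => 'RSS'];
    return $handlers;
});

// ...and a hypothetical extension adds its own without touching core code.
HookSketch::add('datamachine_handlers', function (array $handlers): array {
    $handlers['my_social'] = ['type' => 'publish', 'label' => 'My Social Network'];
    return $handlers;
}, 20);

$handlers = HookSketch::apply('datamachine_handlers', []);
```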

Modular Component Architecture (@since v0.2.1)

Data Machine v0.2.1 introduced modular component systems for enhanced code organization and maintainability:

FilesRepository Components (/inc/Core/FilesRepository/):

DirectoryManager - Directory creation, path management, and three-layer resolution
FileStorage - File operations and flow-isolated storage
FileCleanup - Retention policy enforcement and cleanup
ImageValidator - Image validation and metadata extraction
VideoValidator - Video file validation
RemoteFileDownloader - Remote file downloading with validation
FileRetrieval - Data retrieval from file storage
DailyMemory - Daily memory file operations (read, write, append, search, list)

WordPress Shared Components (/inc/Core/WordPress/):

TaxonomyHandler - Taxonomy selection and term creation (skip, AI-decided, pre-selected modes)
WordPressSettingsHandler - Shared WordPress settings fields
WordPressFilters - Service discovery registration

EngineData (/inc/Core/EngineData.php):

Consolidated Operations - Featured image attachment, source URL attribution, and engine data access (@since v0.2.1, enhanced v0.2.6)
Unified Interface - Single class for all engine data operations (replaces FeaturedImageHandler and SourceUrlHandler in v0.2.6)

Engine Components (/inc/Engine/):

StepNavigator - Centralized step navigation logic for execution flow

Benefits:

Code Deduplication: Eliminates repetitive functionality across handlers
Single Responsibility: Each component has focused purpose
Maintainability: Centralized logic simplifies updates
Extensibility: Easy to add new functionality via composition

For detailed documentation:

FilesRepository Components
WordPress Shared Components
EngineData
StepNavigator

Centralized Handler Filter System

Unified Cross-Cutting Functionality: The engine provides centralized filters for shared functionality across multiple handlers, eliminating code duplication and ensuring consistency.

Core Centralized Filters:

datamachine_timeframe_limit: Shared timeframe parsing with discovery/conversion modes
- Discovery mode: Returns available timeframe options for UI dropdowns
- Conversion mode: Returns Unix timestamp for specified timeframe
- Used by: RSS, WordPress Local, WordPress Media, WordPress API, and extension fetch handlers
datamachine_keyword_search_match: Universal keyword matching with OR logic
- Case-insensitive Unicode-safe matching
- Comma-separated keyword support
- Used by: RSS, WordPress Local, WordPress Media, WordPress API, and extension fetch handlers
datamachine_data_packet: Standardized data packet creation and structure
- Ensures type and timestamp fields are present
- Maintains chronological ordering via array_unshift()
- Used by: All step types for consistent data flow

Implementation:

```php
// Timeframe parsing example: returns a Unix timestamp cutoff; a falsy result means no limit
$cutoff_timestamp = apply_filters('datamachine_timeframe_limit', null, '24_hours');
$date_query = $cutoff_timestamp ? ['after' => gmdate('Y-m-d H:i:s', $cutoff_timestamp)] : [];

// Keyword matching example (inside a fetch handler's item loop)
foreach ($items as $content) {
    $matches = apply_filters('datamachine_keyword_search_match', true, $content, $search_keywords);
    if (!$matches) {
        continue; // Skip non-matching items
    }
    // ...process the matching item
}

// Data packet creation example
$data = apply_filters('datamachine_data_packet', $data, $packet_data, $flow_step_id, $step_type);
```
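The keyword-matching and data-packet behaviors listed above can be approximated in plain PHP. These are hypothetical reimplementations, not the plugin's filter callbacks:

- Keyword matching uses a case-insensitive regular expression with the `u` (UTF-8) modifier and applies OR logic across comma-separated keywords.
- New packets are prepended, so index 0 is always the newest.

Treating an empty keyword list as "match everything" is an assumption. It is consistent with the filter's `true` default.

```php
<?php
// Hypothetical reimplementations of two shared filter behaviors.

/** OR logic: true if any comma-separated keyword appears (case-insensitive). */
function keyword_search_match(string $content, string $keywords): bool
{
    $terms = array_filter(array_map('trim', explode(',', $keywords)), 'strlen');
    if ($terms === []) {
        return true; // no keywords configured: everything matches
    }
    foreach ($terms as $term) {
        if (preg_match('/' . preg_quote($term, '/') . '/iu', $content) === 1) {
            return true;
        }
    }
    return false;
}

/** Prepend a packet so index 0 is the newest; ensure type and timestamp exist. */
function add_data_packet(array $data, array $packet, string $step_type): array
{
    $packet += ['type' => $step_type, 'timestamp' => time()];
    array_unshift($data, $packet);
    return $data;
}

$packets = add_data_packet([], ['title' => 'Fetched item'], 'fetch');
$packets = add_data_packet($packets, ['title' => 'AI draft'], 'ai');
```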

Benefits:

Code Consistency: Identical behavior across all handlers using shared filters
Maintainability: Single implementation location for shared functionality
Extensibility: New handlers automatically inherit shared capabilities
Performance: Optimized implementations used across all handlers

WordPress Publish Handler Architecture

Modular Component System: The WordPress publish handler uses specialized processing modules for enhanced maintainability and extensibility.

Core Components:

EngineData: Consolidated featured image attachment and source URL attribution with configuration hierarchy (system defaults override handler config) (@since v0.2.1, enhanced v0.2.6)
TaxonomyHandler: Configuration-based taxonomy processing with three selection modes (skip, AI-decided, pre-selected)
Direct Integration: WordPress handlers use EngineData and TaxonomyHandler directly for single source of truth data access

Configuration Hierarchy: System-wide defaults ALWAYS override handler-specific configuration when set, providing consistent behavior across all WordPress publish operations.
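The rule that system defaults override handler configuration when set is easy to get backwards, so here is a minimal sketch. The function name and setting keys are hypothetical. So is the definition of "set" used here: not null and not an empty string.

```php
<?php
// Hypothetical sketch: a system-wide default, when set, wins over the
// handler's own configuration; otherwise the handler value is used.

function resolve_publish_setting(string $key, array $system_defaults, array $handler_config, mixed $fallback = null): mixed
{
    $system_value = $system_defaults[$key] ?? null;
    if ($system_value !== null && $system_value !== '') {
        return $system_value; // system default is set: it always wins
    }
    return $handler_config[$key] ?? $fallback;
}

$system  = ['post_status' => 'draft', 'post_author' => ''];
$handler = ['post_status' => 'publish', 'post_author' => 3];

$status = resolve_publish_setting('post_status', $system, $handler); // 'draft'
$author = resolve_publish_setting('post_author', $system, $handler); // 3
```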

Features:

Specialized component isolation for maintainability
Configuration validation and error handling per component
WordPress native function integration for optimal performance
Comprehensive logging throughout all components
Unified engine data operations via EngineData class

File Management

Flow-isolated UUID storage with automatic cleanup:

Files organized by flow instance
Automatic purging on job completion
Support for local and remote file processing

HTTP Client

The centralized HttpClient class (/inc/Core/HttpClient.php) standardizes all outbound requests for fetch and publish handlers. It wraps the native WordPress HTTP helpers while:

exposing explicit methods (get, post, put, patch, delete) that accept consistent option bags
merging default headers (plugin DATAMACHINE_VERSION and optional browser-mode headers) with user-supplied headers
honoring timeout, body, and browser_mode options so handlers can simulate browser traffic when needed
validating success codes per method before returning parsed responses
logging WP_Error and non-success HTTP responses via datamachine_log and returning structured error payloads for downstream handling
extracting error metadata from JSON bodies to improve diagnostics
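Three of these responsibilities can be sketched without WordPress:

- merging default headers with caller headers
- per-method success-code validation
- pulling an error message out of a JSON body

The function names, the accepted status codes per method, the default `User-Agent` format and the browser-mode header values are all assumptions, not HttpClient's actual values. Header names are matched case-sensitively in this sketch.

```php
<?php
// Hypothetical sketch of three HttpClient behaviors.

const SUCCESS_CODES = [
    'GET'    => [200],
    'POST'   => [200, 201, 202],
    'PUT'    => [200, 201, 204],
    'PATCH'  => [200, 204],
    'DELETE' => [200, 202, 204],
];

/** Caller headers override defaults; browser mode swaps in browser-like headers. */
function build_headers(array $user_headers, string $plugin_version, bool $browser_mode = false): array
{
    $defaults = ['User-Agent' => "DataMachine/{$plugin_version}"];
    if ($browser_mode) {
        $defaults['User-Agent']      = 'Mozilla/5.0 (compatible)';
        $defaults['Accept']          = 'text/html,application/xhtml+xml,*/*;q=0.8';
        $defaults['Accept-Language'] = 'en-US,en;q=0.9';
    }
    return array_merge($defaults, $user_headers);
}

/** Return a structured result instead of throwing, so handlers can branch on it. */
function check_response(string $method, int $status, string $body): array
{
    $allowed = SUCCESS_CODES[strtoupper($method)] ?? [200];
    if (in_array($status, $allowed, true)) {
        return ['success' => true, 'status' => $status, 'data' => $body];
    }

    // Prefer a message from a JSON error body when one is present.
    $decoded = json_decode($body, true);
    $message = is_array($decoded) ? ($decoded['message'] ?? $decoded['error'] ?? null) : null;

    return [
        'success' => false,
        'status'  => $status,
        'error'   => $message ?? "HTTP {$status}",
    ];
}
```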

See HTTP Client for implementation details and usage guidance.

Admin Interface

Modern React Architecture: The entire Data Machine admin interface (Pipelines, Logs, Settings, Jobs, and Agents) uses a complete React implementation with zero jQuery or AJAX dependencies.

React Implementation:

A unified React-based admin UI built with @wordpress/components.
Specialized apps for each page (PipelinesApp, LogsApp, SettingsApp, JobsApp).
Modern state management using TanStack Query for server state (and a small Zustand store on the Pipelines page for UI state).
Complete REST API integration for all data operations.
Real-time updates via TanStack Query background refetching.
Optimistic UI updates for instant user feedback.

Component Architecture:

Core: Page-specific App containers; UI state is either local React state or (for Pipelines) a small Zustand store.
Modals: Centralized ModalManager and ModalSwitch for routing (Pipelines/Settings).
Queries/API: Standardized TanStack Query hooks and REST client modules.

Complete REST API Integration: All admin pages now use REST API architecture with zero jQuery/AJAX dependencies.

Security Model: Admin operations use scoped Data Machine capabilities through PermissionHelper plus WordPress nonce validation for browser requests. Administrators continue to pass through the manage_options capability mapping.

Extension Framework

Complete extension system for custom handlers and tools:

Filter-based registration
Template-driven development
Automatic discovery and validation
LLM-assisted development support

Key Features

AI Integration

Support for multiple AI providers (OpenAI, Anthropic, Google, and others)
Unified Directive System: Priority-based directive management via PromptBuilder:
- datamachine_directives - Centralized filter with priority ordering and agent targeting
Universal Engine Architecture: Shared AI infrastructure via /inc/Engine/AI/ components:
- AIConversationLoop for multi-turn conversation execution with automatic tool calling
- ToolExecutor for universal tool discovery and execution
- WP_Agent_Tool_Parameters for centralized parameter building (buildParameters() for standard tools, buildForHandlerTool() for handler tools with engine data)
- ConversationManager for message formatting and conversation utilities
- RequestBuilder for centralized AI request construction with directive application
- ToolResultFinder for universal tool result search in data packets
Site context injection with automatic cache invalidation (SiteContext::clear_cache())
Tool result formatting with success/failure messages
Clear tool result messaging enabling natural AI agent conversation termination

Data Processing

Explicit Data Separation Architecture: Clean data packets for AI processing vs engine parameters for handlers
Engine Data Filter Architecture: Fetch handlers store engine_data (source_url, image_url) in database; steps retrieve via centralized datamachine_engine_data filter
DataPacket structure for consistent data flow, presented to AI agents in chronological order:
- Root wrapper with data_packets array
- Index 0 = newest packet (chronological ordering)
- Type-specific fields (handler, attachments, tool_name)
- Workflow dynamics and turn-based updates
Deduplication tracking
Comprehensive logging

Scheduling

WordPress Action Scheduler integration
Configurable intervals
Manual execution support
System task scheduling (cron-based)
Job failure handling with retry support (max 24 attempts)

Security

Scoped Data Machine capabilities for admin and agent operations
Multi-agent access control (viewer, operator, admin roles)
CSRF protection via WordPress nonces
Input sanitization and validation
Secure OAuth implementation
Workspace outside web root
