The rule engine is the classification core of l3dg3rr. It applies a set of Rhai rule files to each ingested transaction to assign a tax category, confidence score, and optional review flag. Rules are authored as standalone .rhai files and executed at runtime without recompilation.
Rules execute in a waterfall model: each rule is tried in order, and the first rule that returns a category other than "Unclassified" wins. This keeps individual rules simple and composable while still handling complex multi-category edge cases.
// Multi-rule waterfall: first non-Unclassified result wins
fn load_rules() -> select_applicable
fn select_applicable() -> run_waterfall
fn run_waterfall() -> evaluate_confidence
fn evaluate_confidence() -> emit_result
The pipeline nodes map to Rust types:
| Pipeline step | Rust type | Status |
|---|---|---|
load_rules |
RuleRegistry::load_from_dir |
Implemented for transaction rules |
select_applicable |
RuleRegistry::select_rules_deterministic |
Implemented keyword fallback |
run_waterfall |
RuleRegistry::classify_waterfall |
Implemented first-match waterfall |
evaluate_confidence |
ClassificationOutcome.confidence |
Implemented |
emit_result |
ClassifiedTransaction / ReviewFlag |
Implemented |
Before running the waterfall, applicable rules are selected from the registry. A confidence gate then routes the result to either a commit or a manual review queue.
if confidence > 0.85 -> commit
if confidence > 0.60 -> review
if confidence <= 0.60 -> escalate
The thresholds are configurable per deployment. The review path creates a ReviewFlag in the workbook's Flags sheet. The escalate path promotes the flag to a high-priority manual item.
The diagram below shows the full classification pipeline from document ingestion through workbook commit. Each node is tagged with its current implementation status.
fn pdf_source() -> doctype_detection
fn doctype_detection() -> reqif_bridge
fn doctype_detection() -> rule_registry
fn reqif_bridge() -> rule_registry
fn rule_registry() -> rule_selection
fn rule_selection() -> classify_waterfall
fn classify_waterfall() -> high_confidence
fn classify_waterfall() -> medium_confidence
fn classify_waterfall() -> low_confidence
fn high_confidence() -> workbook_commit
fn medium_confidence() -> review_flag
fn low_confidence() -> review_flag
fn review_flag() -> workbook_commit
Rules are standalone .rhai files that implement a single fn classify(tx) function. The function receives a transaction map and returns a map with fields: category, confidence, review, reason.
See docs/rhai-rules.md for the complete rule authoring guide, including field contracts, available built-ins, and worked examples for foreign_income, self_employment, and fallback rules.
The baseline rules currently covered by the registry waterfall are:
rules/classify_foreign_income.rhai— matches foreign-sourced income transactionsrules/classify_self_employment.rhai— matches self-employment / Schedule C incomerules/classify_fallback.rhai— catch-all returningUnclassifiedwithreview: true
Additional Schedule, FBAR/FATCA, crypto, AU GST, and AU CGT transaction rules are also loaded. classify_document_shape.rhai is excluded because it exposes classify_document(doc), not the transaction classify(tx) entry point.
SemanticRuleSelector is the planned upgrade to the deterministic keyword-match selector. It will use vector embeddings to rank rules by semantic similarity to a transaction's description, drawing on ReqIfCandidate objects produced by the reqif-opa-mcp Python sidecar.
pub trait SemanticRuleSelector {
fn select_rules_semantic(&self, tx: &SampleTransaction, top_k: usize) -> Vec<PathBuf>;
fn build_embedding_index(&mut self) -> Result<(), RuleRegistryError>;
}This is unimplemented!() today because it requires:
- Embedding model — a local ONNX or
fastembed-rsmodel to encode transaction text - ReqIfCandidate index —
ReqIfCandidateobjects loaded from the Python sidecar - Vector similarity — cosine distance or HNSW index over candidate embeddings
Until those are wired, select_rules_deterministic provides a stable keyword-match fallback. See Document Ingestion for the full sidecar bridge plan.