Build a SQLite database that mimics a legacy Southeast Asia life insurance policy administration system (LifeAsia-style). The database supports two product types:
- Term Life (non-participating)
- Whole Life / Endowment (participating, with reversionary and terminal bonuses)
The database must be populated with synthetic data (~200–300 policy records) and contain 24 deliberately seeded defects across four risk tiers. These defects are the diagnostic targets for a separate risk assessment application.
All defects must be documented in a separate defect manifest (JSON file) so the diagnostic app can validate its findings against ground truth.
mock_pas/
├── schema.sql # CREATE TABLE statements
├── seed_data.sql # INSERT statements for clean reference data
├── policies.sql # INSERT statements for policy/life/benefit records
├── gen_policies.py # Python script that generates policies.sql (reproducible)
├── defects.sql # Injects D01–D24 (24 Tier 1–4 defects)
├── defects_v2.sql # Injects D25–D30 (6 Tier 5 business logic defects)
├── defects_v3.sql # Injects D31–D36 (6 Tier 5/6 calculation defects)
├── manifest_addendum.json # 6 new defect entries for D25–D30
├── manifest_v3_addendum.json # 6 new defect entries for D31–D36
├── verify_v2.sh # Spot-check queries for D25–D30
├── verify_v3.sh # Spot-check queries for D31–D36
├── build.sh # Shell script: runs all .sql files in order
├── run_all.sh # Runs diagnostic CLI tool and report generator
├── mock_pas.db # The compiled SQLite database (all 36 defects applied)
├── defect_manifest.json # Ground-truth catalogue of all 36 seeded defects
├── requirements.txt # V4 web interface dependencies: flask, anthropic
├── app.py # V4: Flask web application entry point
├── start_web.sh # V4: convenience start script (port 5001)
├── reference_data/
│ ├── regulatory_params.json # Mock MAS/SGP prescribed parameters
│ ├── mortality_cso2001.csv # CSO2001 reference qx values (232 rows, ages 18–75)
│ └── generate_reference_data.py # Script that produced the CSV
└── diagnostic/
├── checks.py # 45 check functions (Tiers 1–6)
├── run_diagnostics.py # Runner: executes all checks, writes findings.json
├── report.py # Reporter: reads findings.json, produces report
├── findings.json # Output: last run results
├── report.html # Output: HTML report
├── nl_query.py # V4: NL→SQL engine using Anthropic API
├── report_ai.py # V4: AI narrative report + drill interpretation
├── draft_checks.py # V4: draft check CRUD and Python codegen
├── draft_checks/ # V4: runtime directory for promoted NL check drafts
├── templates/ # V4: Jinja2 HTML templates
│ ├── base.html
│ ├── home.html
│ ├── select_checks.html
│ ├── dashboard.html
│ ├── nl_query.html
│ ├── review_checks.html
│ └── coming_soon.html
└── static/ # V4: CSS and JS assets
├── style.css
└── app.js
Create all tables in the order listed below to satisfy foreign key dependencies.
Enable foreign key enforcement at the top: PRAGMA foreign_keys = ON;
Note: SQLite does not enforce FK constraints unless explicitly enabled per connection. Include the PRAGMA in build.sh and note this in comments.
CREATE TABLE AGENT (
AGENT_CODE TEXT PRIMARY KEY,
AGENT_NAME TEXT NOT NULL,
AGENCY_CODE TEXT NOT NULL,
STATUS TEXT NOT NULL CHECK (STATUS IN ('ACTIVE','TERMINATED','SUSPENDED')),
APPOINTMENT_DATE DATE NOT NULL,
TERMINATION_DATE DATE,
CREATED_DATE DATE NOT NULL DEFAULT (DATE('now'))
);CREATE TABLE PRODUCT (
PLAN_CODE TEXT PRIMARY KEY,
PLAN_NAME TEXT NOT NULL,
PRODUCT_TYPE TEXT NOT NULL CHECK (PRODUCT_TYPE IN ('TERM','WL','ENDOW')),
PARTICIPATING_FLAG TEXT NOT NULL CHECK (PARTICIPATING_FLAG IN ('Y','N')),
MIN_ENTRY_AGE INTEGER NOT NULL,
MAX_ENTRY_AGE INTEGER NOT NULL,
MIN_POLICY_TERM INTEGER,
MAX_POLICY_TERM INTEGER,
MIN_SUM_ASSURED NUMERIC NOT NULL,
MAX_SUM_ASSURED NUMERIC,
PREM_PAYING_TERM TEXT NOT NULL CHECK (PREM_PAYING_TERM IN ('FULL','LIMITED','SINGLE')),
CURRENCY_CODE TEXT NOT NULL DEFAULT 'SGD',
STATUS TEXT NOT NULL CHECK (STATUS IN ('ACTIVE','CLOSED','WITHDRAWN')),
EFFECTIVE_DATE DATE NOT NULL,
WITHDRAWAL_DATE DATE,
CREATED_DATE DATE NOT NULL DEFAULT (DATE('now'))
);CREATE TABLE MORTALITY_TABLE (
ROW_ID INTEGER PRIMARY KEY AUTOINCREMENT,
TABLE_CODE TEXT NOT NULL,
TABLE_NAME TEXT NOT NULL,
GENDER TEXT NOT NULL CHECK (GENDER IN ('M','F')),
AGE INTEGER NOT NULL,
SMOKING_STATUS TEXT NOT NULL CHECK (SMOKING_STATUS IN ('S','NS','U')),
QX NUMERIC NOT NULL,
EFFECTIVE_DATE DATE NOT NULL,
EXPIRY_DATE DATE,
UNIQUE (TABLE_CODE, GENDER, AGE, SMOKING_STATUS, EFFECTIVE_DATE)
);CREATE TABLE VALUATION_BASIS (
BASIS_CODE TEXT PRIMARY KEY,
BASIS_NAME TEXT NOT NULL,
MORT_TABLE_CODE TEXT NOT NULL,
VAL_INT_RATE NUMERIC NOT NULL,
LAPSE_TABLE_CODE TEXT,
EXPENSE_BASIS TEXT,
EFFECTIVE_DATE DATE NOT NULL,
EXPIRY_DATE DATE,
CREATED_DATE DATE NOT NULL DEFAULT (DATE('now')),
LAST_UPD_DATE DATE,
LAST_UPD_BY TEXT
);CREATE TABLE SYSTEM_PARAMETER (
PARAM_ID INTEGER PRIMARY KEY AUTOINCREMENT,
PARAM_CATEGORY TEXT NOT NULL,
PARAM_CODE TEXT NOT NULL,
PARAM_VALUE TEXT NOT NULL,
PARAM_TYPE TEXT NOT NULL CHECK (PARAM_TYPE IN ('NUMERIC','TEXT','DATE','FLAG')),
DESCRIPTION TEXT,
EFFECTIVE_DATE DATE,
EXPIRY_DATE DATE,
LAST_UPD_DATE DATE,
LAST_UPD_BY TEXT,
UNIQUE (PARAM_CATEGORY, PARAM_CODE, EFFECTIVE_DATE)
);CREATE TABLE POLICY (
POLICY_NO TEXT PRIMARY KEY,
PLAN_CODE TEXT NOT NULL REFERENCES PRODUCT(PLAN_CODE),
STATUS_CODE TEXT NOT NULL CHECK (STATUS_CODE IN (
'IF', -- Inforce
'LP', -- Lapsed
'SR', -- Surrendered
'MT', -- Matured
'DC', -- Death Claim
'PU' -- Paid-up
)),
ISSUE_DATE DATE NOT NULL,
MATURITY_DATE DATE,
SUM_ASSURED NUMERIC,
PREM_MODE TEXT NOT NULL CHECK (PREM_MODE IN ('A','S','Q','M')),
PREM_AMOUNT NUMERIC,
NEXT_PREM_DUE_DATE DATE,
CURRENCY_CODE TEXT NOT NULL DEFAULT 'SGD',
OFFICE_CODE TEXT NOT NULL DEFAULT 'SG01',
AGENT_CODE TEXT REFERENCES AGENT(AGENT_CODE),
ISSUE_AGE INTEGER,
AGE_CALC_METHOD TEXT NOT NULL CHECK (AGE_CALC_METHOD IN ('ANB','ALB')),
BASIS_CODE TEXT REFERENCES VALUATION_BASIS(BASIS_CODE),
POLICY_TERM INTEGER,
PREM_PAYING_TERM INTEGER,
CREATED_DATE DATE NOT NULL DEFAULT (DATE('now')),
LAST_UPD_DATE DATE,
LAST_UPD_BY TEXT
);CREATE TABLE LIFE (
LIFE_ID INTEGER PRIMARY KEY AUTOINCREMENT,
POLICY_NO TEXT NOT NULL REFERENCES POLICY(POLICY_NO),
LIFE_TYPE TEXT NOT NULL CHECK (LIFE_TYPE IN ('LA','JL','PO')),
SALUTATION TEXT CHECK (SALUTATION IN ('MR','MRS','MS','DR','MDM')),
FULL_NAME TEXT NOT NULL,
ID_TYPE TEXT CHECK (ID_TYPE IN ('NRIC','FIN','PASSPORT','OTHERS')),
ID_NO TEXT,
DOB DATE,
GENDER TEXT CHECK (GENDER IN ('M','F')),
SMOKING_STATUS TEXT CHECK (SMOKING_STATUS IN ('S','NS','U')),
NATIONALITY TEXT,
OCCUPATION_CODE TEXT,
OCCUPATION_CLASS INTEGER CHECK (OCCUPATION_CLASS IN (1,2,3,4)),
UW_DECISION TEXT CHECK (UW_DECISION IN ('STD','SUB','EXCL','DEC')),
EXTRA_MORTALITY NUMERIC DEFAULT 0,
CREATED_DATE DATE NOT NULL DEFAULT (DATE('now'))
);CREATE TABLE BENEFIT (
BENEFIT_ID INTEGER PRIMARY KEY AUTOINCREMENT,
POLICY_NO TEXT NOT NULL REFERENCES POLICY(POLICY_NO),
BENEFIT_TYPE TEXT NOT NULL CHECK (BENEFIT_TYPE IN (
'DEATH','MATURITY','TPD','CI','WAIVER','PAYOR','ACCIDENT'
)),
BENEFIT_AMOUNT NUMERIC,
BENEFIT_BASIS TEXT CHECK (BENEFIT_BASIS IN ('SA','MULT_SA','FIXED')),
RIDER_PLAN_CODE TEXT,
COMMENCEMENT_DATE DATE,
EXPIRY_DATE DATE,
STATUS TEXT NOT NULL DEFAULT 'ACTIVE'
CHECK (STATUS IN ('ACTIVE','EXPIRED','WAIVED','CANCELLED')),
CREATED_DATE DATE NOT NULL DEFAULT (DATE('now'))
);CREATE TABLE PREMIUM_RATE (
RATE_ID INTEGER PRIMARY KEY AUTOINCREMENT,
PLAN_CODE TEXT NOT NULL REFERENCES PRODUCT(PLAN_CODE),
GENDER TEXT NOT NULL CHECK (GENDER IN ('M','F')),
SMOKING_STATUS TEXT NOT NULL CHECK (SMOKING_STATUS IN ('S','NS','U')),
ISSUE_AGE INTEGER NOT NULL,
POLICY_TERM INTEGER,
PREM_PAYING_TERM INTEGER,
RATE_PER_1000 NUMERIC NOT NULL,
EFFECTIVE_DATE DATE NOT NULL,
EXPIRY_DATE DATE,
CREATED_DATE DATE NOT NULL DEFAULT (DATE('now'))
);CREATE TABLE SURRENDER_VALUE_FACTOR (
SV_ID INTEGER PRIMARY KEY AUTOINCREMENT,
PLAN_CODE TEXT NOT NULL REFERENCES PRODUCT(PLAN_CODE),
POLICY_YEAR INTEGER NOT NULL,
SV_FACTOR NUMERIC NOT NULL,
EFFECTIVE_DATE DATE NOT NULL,
EXPIRY_DATE DATE,
UNIQUE (PLAN_CODE, POLICY_YEAR, EFFECTIVE_DATE)
);CREATE TABLE BONUS_DECLARATION (
DECL_ID INTEGER PRIMARY KEY AUTOINCREMENT,
PLAN_CODE TEXT NOT NULL REFERENCES PRODUCT(PLAN_CODE),
BONUS_TYPE TEXT NOT NULL CHECK (BONUS_TYPE IN ('RB','TB','CASH_DIV')),
POLICY_YEAR INTEGER NOT NULL,
BONUS_RATE NUMERIC NOT NULL,
DECLARATION_DATE DATE NOT NULL,
EFFECTIVE_DATE DATE NOT NULL
);CREATE TABLE POLICY_BONUS (
PB_ID INTEGER PRIMARY KEY AUTOINCREMENT,
POLICY_NO TEXT NOT NULL REFERENCES POLICY(POLICY_NO),
BONUS_TYPE TEXT NOT NULL CHECK (BONUS_TYPE IN ('RB','TB','CASH_DIV')),
BONUS_YEAR INTEGER NOT NULL,
BONUS_AMOUNT NUMERIC NOT NULL,
DECLARATION_DATE DATE NOT NULL,
STATUS TEXT NOT NULL DEFAULT 'ACCRUED'
CHECK (STATUS IN ('ACCRUED','PAID','REVERSED'))
);CREATE TABLE PREMIUM_HISTORY (
PREM_ID INTEGER PRIMARY KEY AUTOINCREMENT,
POLICY_NO TEXT NOT NULL REFERENCES POLICY(POLICY_NO),
DUE_DATE DATE NOT NULL,
PAID_DATE DATE,
PREMIUM_AMOUNT NUMERIC NOT NULL,
MODE_LOADING NUMERIC DEFAULT 0,
TOTAL_COLLECTED NUMERIC,
RECEIPT_NO TEXT,
PAYMENT_METHOD TEXT CHECK (PAYMENT_METHOD IN ('GIRO','CHEQUE','CASH','NETS','ONLINE')),
STATUS TEXT NOT NULL CHECK (STATUS IN ('PAID','DUE','OVERDUE','WAIVED','REVERSED'))
);CREATE TABLE TRANSACTION (
TXN_ID INTEGER PRIMARY KEY AUTOINCREMENT,
POLICY_NO TEXT NOT NULL REFERENCES POLICY(POLICY_NO),
TXN_TYPE TEXT NOT NULL CHECK (TXN_TYPE IN (
'PREM','CLAIM','SURR','BONUS','ALTER','LOAN','LOAN_REPAY','REINSTATE'
)),
TXN_DATE DATE NOT NULL,
EFFECTIVE_DATE DATE,
AMOUNT NUMERIC NOT NULL,
CURRENCY_CODE TEXT DEFAULT 'SGD',
REFERENCE_NO TEXT,
REMARKS TEXT,
PROCESSED_BY TEXT,
STATUS TEXT NOT NULL CHECK (STATUS IN ('POSTED','REVERSED','PENDING'))
);CREATE TABLE REINSURANCE (
RI_ID INTEGER PRIMARY KEY AUTOINCREMENT,
POLICY_NO TEXT NOT NULL REFERENCES POLICY(POLICY_NO),
RI_TREATY_CODE TEXT NOT NULL,
RI_COMPANY_CODE TEXT NOT NULL,
RI_SUM_AT_RISK NUMERIC NOT NULL,
RI_PREMIUM NUMERIC NOT NULL,
EFFECTIVE_DATE DATE NOT NULL,
EXPIRY_DATE DATE,
STATUS TEXT DEFAULT 'ACTIVE'
);Include a mix of ACTIVE and TERMINATED agents. Use realistic SEA names. Example agents:
| AGENT_CODE | STATUS | APPOINTMENT_DATE | TERMINATION_DATE |
|---|---|---|---|
| AGT001 | ACTIVE | 2010-03-15 | NULL |
| AGT002 | ACTIVE | 2012-07-01 | NULL |
| AGT003 | TERMINATED | 2008-01-10 | 2015-06-30 |
| AGT004 | ACTIVE | 2015-09-01 | NULL |
| AGT005 | TERMINATED | 2005-04-20 | 2018-12-31 |
| AGT006–AGT010 | ACTIVE | various | NULL |
| PLAN_CODE | PLAN_NAME | PRODUCT_TYPE | PARTICIPATING | MIN_AGE | MAX_AGE |
|---|---|---|---|---|---|
| TRM20 | 20-Year Term Life | TERM | N | 18 | 65 |
| TRM30 | 30-Year Term Life | TERM | N | 18 | 55 |
| WL99 | Whole Life 99 | WL | Y | 1 | 70 |
| END20 | 20-Year Endowment | ENDOW | Y | 18 | 60 |
| END25 | 25-Year Endowment | ENDOW | Y | 18 | 55 |
Insert qx values for ages 18–75, both genders, smoker/non-smoker. Use realistic Southeast Asian mortality rates (lower than Western tables). Example: qx for M, NS, age 30 ≈ 0.00085; age 50 ≈ 0.00450; age 65 ≈ 0.01800.
Use TABLE_CODE = 'CSO2001' with EFFECTIVE_DATE = '2001-01-01' and EXPIRY_DATE = NULL. Use TABLE_CODE = 'AM92' with EFFECTIVE_DATE = '1992-01-01' and EXPIRY_DATE = '2010-12-31'.
| BASIS_CODE | NAME | MORT_TABLE_CODE | VAL_INT_RATE | EFFECTIVE_DATE | EXPIRY_DATE |
|---|---|---|---|---|---|
| VB2001 | SGP Regulatory 2001 | CSO2001 | 0.0375 | 2001-01-01 | 2014-12-31 |
| VB2014 | SGP Regulatory 2014 | CSO2001 | 0.0325 | 2015-01-01 | NULL |
| VB_OLD | Pre-2001 Basis | AM92 | 0.0500 | 1995-01-01 | 2000-12-31 |
Include parameters covering: valuation interest rate, modal loading factors, AGE_CALC_METHOD default, minimum face amount, GST rate, reinsurance treaty codes. Use PARAM_TYPE = 'NUMERIC', 'TEXT', 'DATE', or 'FLAG' appropriately. Vary LAST_UPD_DATE — some recently updated, some stale (see defects section).
For TRM20 and TRM30: rates by GENDER (M/F), SMOKING_STATUS (S/NS), ISSUE_AGE (18–70), POLICY_TERM (matching product), RATE_PER_1000 increasing with age. Example TRM20 M/NS rates: age 25→0.82, age 35→1.45, age 45→3.20, age 55→7.80, age 65→18.50. Apply ~25% smoker loading.
For WL99, END20, END25: rates by GENDER, SMOKING_STATUS, ISSUE_AGE (18–65). These are higher than term rates as they include savings/endowment element.
For WL99, END20, END25: SV_FACTOR by POLICY_YEAR (1–40). Year 1: 0.00, Year 2: 0.00, Year 3: 0.25, Year 5: 0.45, Year 10: 0.62, Year 15: 0.72, Year 20: 0.82, Year 30: 0.91, Year 40: 1.00.
Reversionary bonus (RB): declare annually for policy years 1–30. Typical RB rate: 3.5–5.0 per 1000 SA, varying by year. Terminal bonus (TB): declare for policy years 10, 15, 20, 25, 30 only. Use DECLARATION_DATE ranging from 2000 to 2024.
Insert approximately 250 clean policy records before defects.sql introduces errors.
- 80 TRM20 policies
- 60 TRM30 policies
- 50 WL99 policies
- 35 END20 policies
- 25 END25 policies (some of these will be used for defect seeding)
- ~70% STATUS_CODE = 'IF' (inforce)
- ~12% STATUS_CODE = 'LP' (lapsed)
- ~8% STATUS_CODE = 'SR' (surrendered)
- ~6% STATUS_CODE = 'MT' (matured)
- ~4% STATUS_CODE = 'DC' (death claim)
Format: [OFFICE]-[YEAR]-[SEQ] e.g. SG01-2005-000001
Issue dates: spread across 2000–2022.
- 1 LIFE record (LIFE_TYPE = 'LA') — realistic SEA names and Singapore NRIC format
- 1–3 BENEFIT records (DEATH is always present; MATURITY for endowment; optional riders)
- PREMIUM_HISTORY records: insert paid records from issue to last anniversary (lapsed/surrendered policies stop at their exit date)
- 2–5 TRANSACTION records per policy
- POLICY_BONUS records for all WL/endowment inforce policies (using bonus declarations)
- REINSURANCE records for policies with SUM_ASSURED > SGD 500,000
Issue ages should be consistent with DOB. In ANB markets (default for SGD policies), issue age = age at next birthday. Ensure ISSUE_AGE field matches DOB and ISSUE_DATE.
All defects are introduced via targeted UPDATE or INSERT statements on specific named records. Each defect has a unique ID (D01–D24) referenced in defect_manifest.json.
D01 — NULL sum assured on inforce policy
UPDATE POLICY SET SUM_ASSURED = NULL
WHERE POLICY_NO = 'SG01-2010-000042';Expected detection: NULL check on POLICY.SUM_ASSURED for STATUS_CODE = 'IF'
D02 — Error artifact value in premium rate table
UPDATE PREMIUM_RATE SET RATE_PER_1000 = -999
WHERE PLAN_CODE = 'TRM20' AND GENDER = 'F' AND SMOKING_STATUS = 'S' AND ISSUE_AGE = 38;Expected detection: out-of-range / negative rate check on PREMIUM_RATE.RATE_PER_1000
D03 — Date of birth in the future (impossible value)
UPDATE LIFE SET DOB = '2091-06-15'
WHERE POLICY_NO = 'SG01-2015-000088' AND LIFE_TYPE = 'LA';Expected detection: DOB > CURRENT_DATE on LIFE table
D04 — Zero premium amount on active inforce policy
UPDATE POLICY SET PREM_AMOUNT = 0
WHERE POLICY_NO = 'SG01-2012-000031';Expected detection: PREM_AMOUNT <= 0 check for STATUS_CODE = 'IF'
D05 — Spreadsheet error string in ID field
UPDATE LIFE SET ID_NO = '#VALUE!'
WHERE POLICY_NO = 'SG01-2018-000105' AND LIFE_TYPE = 'LA';Expected detection: pattern match for '#VALUE!', '#N/A', '#REF!' in text fields
D06 — NULL currency code
UPDATE POLICY SET CURRENCY_CODE = NULL
WHERE POLICY_NO = 'SG01-2008-000019';Expected detection: NOT NULL check on POLICY.CURRENCY_CODE
D07 — Policy referencing a non-existent plan code (orphan FK)
INSERT INTO POLICY (POLICY_NO, PLAN_CODE, STATUS_CODE, ISSUE_DATE, MATURITY_DATE,
SUM_ASSURED, PREM_MODE, PREM_AMOUNT, CURRENCY_CODE, OFFICE_CODE,
AGENT_CODE, ISSUE_AGE, AGE_CALC_METHOD, POLICY_TERM)
VALUES ('SG01-2019-000201', 'TRM15', 'IF', '2019-03-01', '2039-03-01',
250000, 'A', 820.00, 'SGD', 'SG01',
'AGT004', 35, 'ANB', 20);Note: PLAN_CODE 'TRM15' does not exist in the PRODUCT table. Expected detection: FK integrity check — POLICY.PLAN_CODE → PRODUCT.PLAN_CODE
D08 — Duplicate life assured on same policy
INSERT INTO LIFE (POLICY_NO, LIFE_TYPE, SALUTATION, FULL_NAME, ID_TYPE, ID_NO,
DOB, GENDER, SMOKING_STATUS, NATIONALITY, UW_DECISION)
VALUES ('SG01-2011-000067', 'LA', 'MR', 'Tan Wei Ming', 'NRIC', 'S7812345A',
'1978-04-22', 'M', 'NS', 'Singaporean', 'STD');Note: Policy SG01-2011-000067 already has one LA record. This creates a second. Expected detection: count of LIFE_TYPE='LA' records per POLICY_NO > 1
D09 — Overlapping effective date ranges in premium rate table
INSERT INTO PREMIUM_RATE (PLAN_CODE, GENDER, SMOKING_STATUS, ISSUE_AGE, POLICY_TERM,
RATE_PER_1000, EFFECTIVE_DATE, EXPIRY_DATE)
VALUES ('WL99', 'M', 'NS', 40, NULL, 28.50, '2015-01-01', '2025-12-31'),
('WL99', 'M', 'NS', 40, NULL, 29.10, '2020-01-01', '2030-12-31');Note: Both records are for the same key combination; their date ranges overlap (2020–2025). Expected detection: overlapping [EFFECTIVE_DATE, EXPIRY_DATE] for same PLAN_CODE/GENDER/SMOKING_STATUS/ISSUE_AGE
D10 — Bonus records on a non-participating (term) policy
INSERT INTO POLICY_BONUS (POLICY_NO, BONUS_TYPE, BONUS_YEAR, BONUS_AMOUNT, DECLARATION_DATE)
VALUES ('SG01-2005-000012', 'RB', 5, 1250.00, '2010-12-01'),
('SG01-2005-000012', 'RB', 6, 1300.00, '2011-12-01');Note: Policy SG01-2005-000012 is a TRM20 (non-participating). Bonus records are nonsensical. Expected detection: cross-reference POLICY_BONUS → POLICY → PRODUCT.PARTICIPATING_FLAG = 'N'
D11 — Benefit record referencing a non-existent policy
INSERT INTO BENEFIT (POLICY_NO, BENEFIT_TYPE, BENEFIT_AMOUNT, BENEFIT_BASIS,
COMMENCEMENT_DATE, EXPIRY_DATE, STATUS)
VALUES ('SG01-GHOST-99999', 'DEATH', 100000, 'SA', '2020-01-01', '2040-01-01', 'ACTIVE');Expected detection: FK integrity — BENEFIT.POLICY_NO not found in POLICY table
D12 — Policy sold by an agent who was terminated before the policy was issued
UPDATE POLICY SET AGENT_CODE = 'AGT003', ISSUE_DATE = '2017-04-15'
WHERE POLICY_NO = 'SG01-2017-000144';Note: AGT003 was terminated on 2015-06-30. Policy issue date 2017-04-15 is after termination. Expected detection: JOIN POLICY → AGENT WHERE ISSUE_DATE > AGENT.TERMINATION_DATE
D13 — Lapsed policy with a future next premium due date
UPDATE POLICY SET STATUS_CODE = 'LP', NEXT_PREM_DUE_DATE = '2027-06-01'
WHERE POLICY_NO = 'SG01-2016-000091';Expected detection: STATUS_CODE = 'LP' AND NEXT_PREM_DUE_DATE > CURRENT_DATE
D14 — Issue date after maturity date (logically impossible)
UPDATE POLICY SET ISSUE_DATE = '2022-01-01', MATURITY_DATE = '2019-12-31'
WHERE POLICY_NO = 'SG01-2014-000073';Expected detection: ISSUE_DATE > MATURITY_DATE
D15 — Gap in rate table (missing age 50 entry) Delete the row for PLAN_CODE='TRM20', GENDER='M', SMOKING_STATUS='NS', ISSUE_AGE=50:
DELETE FROM PREMIUM_RATE
WHERE PLAN_CODE = 'TRM20' AND GENDER = 'M'
AND SMOKING_STATUS = 'NS' AND ISSUE_AGE = 50
AND EFFECTIVE_DATE = (SELECT MIN(EFFECTIVE_DATE) FROM PREMIUM_RATE
WHERE PLAN_CODE = 'TRM20' AND GENDER = 'M'
AND SMOKING_STATUS = 'NS' AND ISSUE_AGE = 50);Expected detection: missing age sequence in PREMIUM_RATE for a given PLAN_CODE/GENDER/SMOKING_STATUS combination
D16 — Non-monotonic premium rate (rate decreases with age — wrong direction for term)
UPDATE PREMIUM_RATE SET RATE_PER_1000 = 2.10
WHERE PLAN_CODE = 'TRM20' AND GENDER = 'M'
AND SMOKING_STATUS = 'NS' AND ISSUE_AGE = 55;Note: The rate at age 54 should be ~6.50 and age 56 ~8.20. Overwriting age 55 with 2.10 creates a non-monotonic dip. Term rates must be non-decreasing with age. Expected detection: rate[age N] < rate[age N-1] for term products
D17 — Policy issued at age outside product entry age limits
UPDATE POLICY SET ISSUE_AGE = 72
WHERE POLICY_NO = 'SG01-2020-000162';Note: The policy uses PLAN_CODE = 'TRM20' which has MAX_ENTRY_AGE = 65. Expected detection: JOIN POLICY → PRODUCT, check ISSUE_AGE > PRODUCT.MAX_ENTRY_AGE
D18 — Premium collected after the policy surrender date First, identify a surrendered policy (e.g. SG01-2009-000033, surrendered 2018-06-15). Then insert a premium payment after that date:
INSERT INTO PREMIUM_HISTORY (POLICY_NO, DUE_DATE, PAID_DATE, PREMIUM_AMOUNT,
TOTAL_COLLECTED, RECEIPT_NO, PAYMENT_METHOD, STATUS)
VALUES ('SG01-2009-000033', '2019-01-01', '2019-01-05', 1850.00,
1850.00, 'RCT-202234', 'GIRO', 'PAID');Expected detection: PREMIUM_HISTORY.PAID_DATE > effective surrender date for STATUS_CODE = 'SR' policies
D19 — Surrender value factor > 1.0 in early policy years
UPDATE SURRENDER_VALUE_FACTOR SET SV_FACTOR = 1.35
WHERE PLAN_CODE = 'END20' AND POLICY_YEAR = 2;Note: An SV factor > 1.0 means the policyholder would receive more than the sum assured on early surrender — economically impossible for a standard endowment product. Expected detection: SV_FACTOR > 1.0 for POLICY_YEAR <= 5 (flag for any policy year, critical for early years)
D20 — Valuation basis referencing an expired mortality table
UPDATE VALUATION_BASIS
SET MORT_TABLE_CODE = 'AM92',
LAST_UPD_DATE = '2024-01-10',
LAST_UPD_BY = 'SYSTEM'
WHERE BASIS_CODE = 'VB2014';Note: AM92 has EXPIRY_DATE = '2010-12-31'. The active valuation basis VB2014 now references an expired table — all current valuations use out-of-date mortality assumptions. Expected detection: JOIN VALUATION_BASIS → MORTALITY_TABLE, check EXPIRY_DATE < CURRENT_DATE
D21 — Stale hardcoded valuation interest rate parameter
INSERT INTO SYSTEM_PARAMETER (PARAM_CATEGORY, PARAM_CODE, PARAM_VALUE, PARAM_TYPE,
DESCRIPTION, EFFECTIVE_DATE, LAST_UPD_DATE, LAST_UPD_BY)
VALUES ('VALUATION', 'VAL_INT_RATE', '0.045', 'NUMERIC',
'Default valuation interest rate for participating products',
NULL, '2008-03-01', 'KWOK_CS');Note: No EFFECTIVE_DATE, no EXPIRY_DATE, last updated in 2008. This is a hardcoded parameter with no version control — the rate has not been reviewed in 16+ years. Expected detection: SYSTEM_PARAMETER with NULL EFFECTIVE_DATE OR LAST_UPD_DATE > 5 years ago for PARAM_CATEGORY = 'VALUATION'
D22 — Premium rate records with no expiry date (no version control)
UPDATE PREMIUM_RATE SET EXPIRY_DATE = NULL
WHERE PLAN_CODE = 'WL99' AND EFFECTIVE_DATE < '2010-01-01';Note: Rate records without EXPIRY_DATE cannot be properly version-controlled. If new rates are introduced, old records remain "open" with no clear supersession. Expected detection: count of PREMIUM_RATE records with NULL EXPIRY_DATE grouped by PLAN_CODE
D23 — System parameter type mismatch (NUMERIC field storing non-numeric text)
INSERT INTO SYSTEM_PARAMETER (PARAM_CATEGORY, PARAM_CODE, PARAM_VALUE, PARAM_TYPE,
DESCRIPTION, EFFECTIVE_DATE, LAST_UPD_DATE, LAST_UPD_BY)
VALUES ('SYSTEM', 'MAX_SA_LIMIT', 'TBD', 'NUMERIC',
'Maximum sum assured limit for auto-underwriting',
'2023-01-01', '2023-01-01', 'SYS_ADMIN');Note: PARAM_TYPE is 'NUMERIC' but PARAM_VALUE = 'TBD' — cannot be cast to a number. Any downstream process trying to use this limit as a numeric value will fail silently. Expected detection: PARAM_TYPE = 'NUMERIC' AND PARAM_VALUE NOT castable to number (e.g. CAST fails)
D24 — Overlapping effective date ranges in valuation basis for same product
INSERT INTO VALUATION_BASIS (BASIS_CODE, BASIS_NAME, MORT_TABLE_CODE, VAL_INT_RATE,
EFFECTIVE_DATE, EXPIRY_DATE, LAST_UPD_DATE, LAST_UPD_BY)
VALUES ('VB2014B', 'SGP Regulatory 2014 (Revised)', 'CSO2001', 0.0300,
'2014-01-01', '2020-12-31', '2024-06-01', 'ACTUARIAL_TEAM');Note: VB2014 is already active from 2015-01-01 onwards. VB2014B starts 2014-01-01 and overlaps with VB2014 for the period 2015-01-01 to 2020-12-31. It is ambiguous which basis applies during the overlap — a critical governance failure. Expected detection: overlapping [EFFECTIVE_DATE, EXPIRY_DATE] ranges across VALUATION_BASIS records
Produce a JSON file with the following structure for all 24 defects:
{
"version": "1.0",
"database": "mock_pas.db",
"generated": "2025-01-01",
"total_defects": 24,
"defects": [
{
"defect_id": "D01",
"tier": 1,
"tier_label": "Data Integrity",
"severity": "HIGH",
"table": "POLICY",
"field": "SUM_ASSURED",
"affected_key": "SG01-2010-000042",
"description": "NULL sum assured on an active inforce policy",
"detection_hint": "NULL check on SUM_ASSURED for STATUS_CODE = 'IF'",
"remediation": "Recover SUM_ASSURED from policy documents or TRANSACTION history"
}
// ... repeat for D02–D24
]
}Severity guidance:
- HIGH: defects with direct financial or customer impact (D01, D02, D07, D13, D14, D16, D17, D19, D20, D21)
- MEDIUM: defects with operational or governance impact (D03, D04, D08, D09, D10, D15, D18, D22, D24)
- LOW: defects detectable but lower immediate risk (D05, D06, D11, D12, D23)
#!/bin/bash
set -e
DB="mock_pas.db"
echo "Removing old database..."
rm -f $DB
echo "Creating schema..."
sqlite3 $DB < schema.sql
echo "Loading reference data..."
sqlite3 $DB < seed_data.sql
echo "Loading policy records..."
sqlite3 $DB < policies.sql
echo "Injecting seeded defects..."
sqlite3 $DB < defects.sql
echo "Verifying record counts..."
sqlite3 $DB "SELECT 'POLICY', COUNT(*) FROM POLICY
UNION ALL SELECT 'LIFE', COUNT(*) FROM LIFE
UNION ALL SELECT 'BENEFIT', COUNT(*) FROM BENEFIT
UNION ALL SELECT 'PREMIUM_HISTORY', COUNT(*) FROM PREMIUM_HISTORY
UNION ALL SELECT 'TRANSACTION', COUNT(*) FROM TRANSACTION
UNION ALL SELECT 'PREMIUM_RATE', COUNT(*) FROM PREMIUM_RATE;"
echo "Done. Database: $DB"- Age calculation default: ANB (Age Next Birthday) for all SGD policies
- Singapore NRIC format: S/T + 7 digits + check letter, e.g.
S8012345Z - Currency: SGD default; also include some MYR policies (OFFICE_CODE = 'MY01')
- Names: Use a mix of Chinese, Malay, and Indian names realistic for Singapore market
- Premium modes: Annual ('A') is most common (~55%), Monthly ('M') ~30%, others remainder
- Smoker loading: ~25% above non-smoker rates for term products
- Issue ages should cluster around 30–50 (most policies sold in this range)
- Sum assureds: range from SGD 50,000 to SGD 2,000,000; mode around SGD 300,000–500,000
- Policy terms: TRM20 = 20 years, TRM30 = 30 years, WL99 to age 99, END20/END25 as named
- Maturity dates = ISSUE_DATE + policy term in years
- NEXT_PREM_DUE_DATE: for inforce policies, set to next anniversary from a recent date
Defects D07 and D11 require inserting orphaned records. Temporarily disable FK enforcement around those inserts, or structure defects.sql to insert these before re-enabling. Comment this clearly in defects.sql.
-
sqlite3 mock_pas.db ".tables"shows all 15 tables - Total POLICY count is 250–260
-
sqlite3 mock_pas.db "SELECT COUNT(*) FROM POLICY WHERE SUM_ASSURED IS NULL"returns 1 (D01) -
sqlite3 mock_pas.db "SELECT COUNT(*) FROM LIFE WHERE DOB > DATE('now')"returns 1 (D03) -
sqlite3 mock_pas.db "SELECT COUNT(*) FROM POLICY WHERE ISSUE_DATE > MATURITY_DATE"returns 1 (D14) - defect_manifest.json is valid JSON and contains exactly 36 entries (D01–D36; V2 and V3 addenda merged in)
All files listed in §2 were produced.
| File | Status | Notes |
|---|---|---|
| schema.sql | Built | Two schema deviations — see §11 |
| seed_data.sql | Built | Rate and SV factor fixes applied during build — see §11 |
| policies.sql | Built (generated) | 1.1 MB; deterministic via random.seed(42) in gen_policies.py |
| gen_policies.py | Built | Python script that generates policies.sql; kept for reproducibility |
| defects.sql | Built | All 24 V1 defects (D01–D24); FK enforcement disabled around D07 and D11 |
| defects_v2.sql | Built | D25–D30 (6 Tier 5 business logic defects) |
| defects_v3.sql | Built | D31–D36 (6 Tier 5/6 calculation defects) — one deviation, see §11 |
| manifest_addendum.json | Built | 6 entries for D25–D30 |
| manifest_v3_addendum.json | Built | 6 entries for D31–D36 |
| verify_v2.sh | Built | All 6 checks pass |
| verify_v3.sh | Built | All 6 checks pass (D31 threshold adjusted to 0.95 — see §11) |
| build.sh | Built | Extended with 24 individual defect verification queries beyond §8 spec |
| run_all.sh | Built | Runs diagnostic with --ref-data reference_data/ |
| mock_pas.db | Built | 251 policies, all 36 defects applied and verified |
| defect_manifest.json | Built | 36 entries total; D19 severity corrected to HIGH — see §11 |
| reference_data/regulatory_params.json | Built | Mock MAS/SGP parameters (verbatim from reference_data_spec.md) |
| reference_data/mortality_cso2001.csv | Built | 232 data rows; all spot checks pass |
| reference_data/generate_reference_data.py | Built | Generates the CSV via log-linear interpolation |
| diagnostic/checks.py | Built | 45 check functions (Tiers 1–6) |
| diagnostic/run_diagnostics.py | Built | Supports --ref-data for Tier 6 checks; exports CHECK_REGISTRY and CHECK_METADATA for V4 Flask app |
| diagnostic/report.py | Built | Tier 5 teal accent, Tier 6 purple accent, tier legend in HTML header |
| requirements.txt | Built | flask>=2.3.0, anthropic>=0.25.0 |
| app.py | Built | Flask web application; runs on port 5001 (via PORT env var); all V4 routes implemented |
| start_web.sh | Built | Convenience script to start V4 web interface |
| diagnostic/nl_query.py | Built | NL→SQL engine; uses claude-sonnet-4-20250514; validates generated SQL for safety |
| diagnostic/report_ai.py | Built | AI narrative report generator + per-finding drill interpretation; uses same model |
| diagnostic/draft_checks.py | Built | Draft check CRUD: save, list, load, delete, generate Python function, commit to checks.py with rollback |
| diagnostic/draft_checks/ | Runtime | Directory created by app at startup; holds promoted NL check JSON files |
| diagnostic/templates/base.html | Built | Shared layout: nav bar, flash messages, API key status indicator, footer |
| diagnostic/templates/home.html | Built | Hub page: API key panel, three action cards, last run summary strip |
| diagnostic/templates/select_checks.html | Built | Check selector: tier-grouped checkboxes, live count, select-all/deselect-all |
| diagnostic/templates/dashboard.html | Built | Interactive results: summary cards, Chart.js charts, filterable/sortable findings table, AI report panel |
| diagnostic/templates/nl_query.html | Built | NL query interface: input, SQL/results display, promote-to-check form, query history |
| diagnostic/templates/review_checks.html | Built | Draft review queue: cards, detail panel, Python preview, commit/delete actions |
| diagnostic/templates/coming_soon.html | Built | Placeholder template for future pages (not in original V4 spec) |
| diagnostic/static/style.css | Built | Full CSS with tier colour variables, badge styles, card grid, responsive breakpoints |
| diagnostic/static/app.js | Built | Chart.js init, findings table filter/sort/expand, AJAX drill + report calls, clipboard utilities |
-
CURRENCY_CODENOT NULL removed — spec definesCURRENCY_CODE TEXT NOT NULL DEFAULT 'SGD'on POLICY. The NOT NULL constraint was removed (column isTEXT DEFAULT 'SGD') so that D06 (UPDATE POLICY SET CURRENCY_CODE = NULL) can execute without a constraint error. The defect is still detectable; the prototype checks for NULL values in data, not schema constraints. -
"TRANSACTION"quoted —TRANSACTIONis a reserved keyword in SQLite. The table name is quoted as"TRANSACTION"throughout schema.sql, seed_data.sql, defects.sql, policies.sql, and build.sh. Functionally identical at runtime.
-
WL99 SV factors years 28–29 corrected to 0.91 — Initial values (0.92, 0.93) created an unintended non-monotonic dip before year 30 (0.91), which would have caused false positives in clean data. Both changed to 0.91. The only remaining non-monotonic pair is END20 year 2→3 (1.35→0.25), which is the intentional D19 defect.
-
TRM20 F/S/38 premium rate row added at 1.64 — D02 targets
PLAN_CODE='TRM20', GENDER='F', SMOKING_STATUS='S', ISSUE_AGE=38. This row was missing from the initial seed; added to provide a target for the defect UPDATE. -
TRM20 M/S/38 rate corrected from -999 to 2.48 — The -999 error artifact was accidentally placed in the M/S/38 row during seed authoring. Corrected to the proper actuarial rate before defects.sql runs.
- D19 severity corrected MEDIUM → HIGH — Spec §7 severity guidance lists D19 as HIGH. Initial manifest had MEDIUM. Corrected to match spec.
- D23 detection logic —
CAST('TBD' AS REAL) IS NULLdoes not work in SQLite; SQLite returns0.0not NULL for a non-numeric cast. Detection query replaced with:PARAM_VALUE NOT GLOB '[0-9]*' AND PARAM_VALUE NOT GLOB '-[0-9]*'
-
D31 re-targeted (defects_v3.sql) — The spec sets
PREM_AMOUNT = ROUND(PREM_AMOUNT / 1.08, 2)on the 7th monthly inforce policy (OFFSET 6). In practice, the 7th monthly policy (SG01-2000-000044, age 29) had no matching row in PREMIUM_RATE (rates are stored at quinquennial ages only), so the defect was undetectable. The as-built approach re-targets the 7th monthly policy with a matching rate record (SG01-2002-000167, M/S/50/TRM20) and setsPREM_AMOUNT = ROUND(rate × SA / 1000 / 12, 2)directly, simulating the same loading-omission error. -
D31 detection threshold: 0.95 (not spec's 0.90) — Removing the 1.08 modal factor yields a premium that is 1/1.08 ≈ 92.6% of the correct value, which is above the spec's 90% floor. The threshold was raised to 0.95 so that 92.6% falls below it. This makes the check more sensitive, not less.
| Deviation | Why acceptable |
|---|---|
| Zero MT (matured) policies | No defect targets matured-status policies; spec distribution was advisory |
| CSO2001 Female/Smoker sparse (12 ages only) | No defect targets F/S mortality rows |
| D22 side-effect: END20/END25 also have NULL EXPIRY_DATE | A grouped scan returns 3 plans instead of 1; realistic behaviour for a prototype to handle |
| gen_policies.py extra file | No effect on the database content |
Counts reflect the final database state after build.sh + defects_v2.sql + defects_v3.sql are applied (i.e. all 36 defects).
| Table | Row count |
|---|---|
| POLICY | 251 |
| LIFE | 251 |
| BENEFIT | 365 |
| PREMIUM_HISTORY | 2,651 |
| TRANSACTION | 756 |
| PREMIUM_RATE | 199 |
- ✅
sqlite3 mock_pas.db ".tables"shows all 15 tables - ✅ Total POLICY count: 251 (within 250–260 target)
- ✅
SELECT COUNT(*) FROM POLICY WHERE SUM_ASSURED IS NULL→ 1 (D01) - ✅
SELECT COUNT(*) FROM LIFE WHERE DOB > DATE('now')→ 1 (D03) - ✅
SELECT COUNT(*) FROM POLICY WHERE ISSUE_DATE > MATURITY_DATE→ 1 (D14) - ✅ defect_manifest.json: valid JSON, exactly 36 entries (updated after V2/V3 merges)
All 24 V1 defects were independently verified by build.sh detection queries.
- ✅ All 6 checks in verify_v2.sh pass
- ✅ D25: TRM30 M/S/35 smoker rate (0.55) below NS rate
- ✅ D26: RI_SAR > SUM_ASSURED on first active RI record
- ✅ D27: POLICY_TERM=35 inconsistent with 20-year date range
- ✅ D28: PREM_PAYING_TERM=30 on END20 (POLICY_TERM=20)
- ✅ D29: BONUS_YEAR=25 on policy issued after 2021
- ✅ D30: Bonus amount ×1000 on first RB year-5 record
- ✅ All 6 checks in verify_v3.sh pass
- ✅ D31: Monthly premium missing 1.08 modal loading (detection threshold 0.95 — see §11)
- ✅ D32: SUB policy premium at base rate only (extra mortality loading absent)
- ✅ D33: Non-par death benefit set to 115% of SA
- ✅ D34: Endowment maturity benefit set to 85% of SA
- ✅ D35: VB2001 VAL_INT_RATE=0.0475 above regulatory max of 0.0425
- ✅ D36: CSO2001 M/NS/45 QX=0.02800 (10× decimal shift from reference 0.00280)
- ✅ 45 checks run, 36/36 seeded defects detected (100% coverage)
- ✅ Tier 6 checks gracefully skip when
--ref-datanot provided (34/36 detected)