AI Tech Lead — LLMs / NLP, evaluation & explainability, optimization. MSc CS. Building open, reproducible research.
- reasoning-faithfulness-eval — do reasoning models reach the right answer for the wrong reason? A clean / hinted / misleadingly-hinted benchmark that scores the answer and the chain-of-thought's faithfulness separately.
- agentic-compliance-eval — when an LLM agent says it will follow a rule, does it actually call rule-compliant tools? Stated-vs-enacted compliance evaluation.
- interp-probe-eval — evaluating interpretability probes: do they recover the concept, or a shortcut? Controllable synthetic ground truth + shortcut-controlled evaluation.
Explainability & interpretability of AI · operations research & optimization · NLP / LLMs: evaluation, governance, and reliability in regulated/enterprise settings.
"The answer to the Great Question, of Life, the Universe and Everything, is forty-two." — Douglas Adams, The Hitchhiker's Guide to the Galaxy
(Fitting, since my research asks whether models reach the right answer for the right reason, not just whether they reach it.)
