avilog

Avi Levin

AI Tech Lead — LLMs / NLP, evaluation & explainability, optimization. MSc CS. Building open, reproducible research.

Open research (EleutherAI SOAR 2026 prep)

reasoning-faithfulness-eval — do reasoning models reach the right answer for the wrong reason? A clean / hinted / misleadingly-hinted benchmark that scores the answer and the chain-of-thought's faithfulness separately.
agentic-compliance-eval — when an LLM agent says it will follow a rule, does it actually call rule-compliant tools? Stated-vs-enacted compliance evaluation.
interp-probe-eval — evaluating interpretability probes: do they recover the concept, or a shortcut? Controllable synthetic ground truth + shortcut-controlled evaluation.

Interests

Explainability & interpretability of AI · operations research & optimization · NLP / LLMs: evaluation, governance, and reliability in regulated/enterprise settings.

"The answer to the Great Question, of Life, the Universe and Everything, is forty-two." — Douglas Adams, The Hitchhiker's Guide to the Galaxy

(Fitting, since my research is on whether models reach the right answer for the right reason, not just the answer.)
