Skip to content
View avilog's full-sized avatar

Block or report avilog

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don’t include any personal information such as legal names or email addresses. Markdown is supported. This note will only be visible to you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
avilog/README.md

Avi Levin

AI Tech Lead — LLMs / NLP, evaluation & explainability, optimization. MSc CS. Building open, reproducible research.

Open research (EleutherAI SOAR 2026 prep)

  • reasoning-faithfulness-eval — do reasoning models reach the right answer for the wrong reason? A clean / hinted / misleadingly-hinted benchmark that scores the answer and the chain-of-thought's faithfulness separately.
  • agentic-compliance-eval — when an LLM agent says it will follow a rule, does it actually call rule-compliant tools? Stated-vs-enacted compliance evaluation.
  • interp-probe-eval — evaluating interpretability probes: do they recover the concept, or a shortcut? Controllable synthetic ground truth + shortcut-controlled evaluation.

Interests

Explainability & interpretability of AI · operations research & optimization· NLP / LLMs — evaluation, governance, and reliability in regulated/enterprise settings.


"The answer to the Great Question, of Life, the Universe and Everything, is forty-two." — Douglas Adams, The Hitchhiker's Guide to the Galaxy

(Fitting, since my research is on whether models reach the right answer for the right reason — not just the answer.)

Popular repositories Loading

  1. shap2llm shap2llm Public

    Python 2

  2. firebug firebug Public

    Forked from firebug/firebug

    Web Development Evolved - The Firebug you have known and loved

    JavaScript

  3. Grad_project Grad_project Public

  4. playground playground Public

    Makefile

  5. eoinfo eoinfo Public

  6. AnomalyDetection AnomalyDetection Public

    Forked from twitter/AnomalyDetection

    Anomaly Detection with R

    R