Skip to content

fix(probes): download wordnet lexicon when it is missing from the database#1820

Open
adityasingh2400 wants to merge 2 commits into
NVIDIA:mainfrom
adityasingh2400:fix-topic-wordnet-error
Open

fix(probes): download wordnet lexicon when it is missing from the database#1820
adityasingh2400 wants to merge 2 commits into
NVIDIA:mainfrom
adityasingh2400:fix-topic-wordnet-error

Conversation

@adityasingh2400
Copy link
Copy Markdown
Contributor

Fixes #1230

Root cause

The Wordnet topic probes point wn at a garak-specific cache directory (garak/probes/topic.py:104). On a clean cache the SQLite database does not exist yet, so wn.Wordnet(self.lexicon) raises sqlite3.OperationalError, which the constructor catches to trigger a one-off lexicon download and retry.

There is a second failure mode the constructor did not handle. When the database file already exists but the requested lexicon row has never been installed, wn raises wn.Error instead, for example no lexicon found with lang=None and lexicon=oewn:2023. That exception was not caught, so the probe failed to load and the run ended early. This is exactly the traceback reported in the issue. The wn maintainer confirmed on the issue thread that the underlying cause is the lexicon being absent from the database, and that reinstalling the package does not rebuild it. Since garak uses a custom data directory, the lexicon is frequently not present, so this path is easy to hit.

Fix

Widen the except clause in WordnetBlockedWords.__init__ to also catch wn.Error, so the same download-and-retry recovery runs whether the database is missing entirely or merely lacks the requested lexicon. A short comment documents which condition each exception represents. This keeps the change to a single, specific exception type as the contribution guide asks, and avoids a broad except.

Why this is not a duplicate

There is no open or closed PR addressing issue #1230. I checked with gh pr list --repo NVIDIA/garak --state all --search "1230 in:body", gh pr list --search "topic wordnet", and gh pr list --search "wordnet lexicon". The two prior cross-references (#1316 and #764) are unrelated merged changes to how wn is used and to the topic probe content, neither of which catches wn.Error.

Verification

Reproduced the exact exception against the pinned wn==0.9.5 by pointing wn at an empty data directory and calling wn.Wordnet("oewn:2023"), which raises wn.Error rather than sqlite3.OperationalError, confirming the gap.

Added a parametrized regression test that mocks wn.Wordnet to raise each error on first load and asserts the probe downloads the lexicon once, reloads it, and keeps the reloaded handle. The wn.Error case fails on current main and passes with this change. The sqlite3.OperationalError case passes in both, guarding the existing behaviour.

Commands run, on Python 3.13 with the test extras installed:

python -m pytest tests/probes/test_probes_topic.py -k downloads_missing_lexicon -q
  -> 2 passed (1 of the 2 fails on unpatched main, confirming the guard)

python -m pytest tests/probes/test_probes_topic.py -q
  -> 18 passed

python -m black --check garak/probes/topic.py tests/probes/test_probes_topic.py
  -> clean

python -m pylint --disable=all --enable=E garak/probes/topic.py
  -> no new errors (the 9 pre-existing E1101 dynamic-attribute warnings are unchanged from main)

Disclosure

AI assistance was used to help investigate and draft this change. I reviewed every changed line, confirmed the root cause against the pinned wn version, and ran the tests and linters above.

The Wordnet topic probes set wn's data directory to a garak-specific
cache location. On the first run that cache holds no lexicon, so the
constructor catches sqlite3.OperationalError and downloads the lexicon.

When the database file already exists but the requested lexicon row is
not installed, wn raises wn.Error rather than sqlite3.OperationalError.
That case was not caught, so the probe failed to load with
"no lexicon found with lang=None and lexicon='oewn:2023'", which is
exactly what users hit in issue NVIDIA#1230.

This widens the except clause to also catch wn.Error so the same
download-and-retry recovery runs in both situations. The fix matches the
wn maintainer's diagnosis on the issue that the lexicon was simply
absent from the database and a reinstall does not rebuild it.

Adds a parametrized test that mocks wn.Wordnet to raise each error on
first load and asserts the probe downloads the lexicon and keeps the
reloaded handle. The wn.Error case fails before this change and passes
after it.

Fixes NVIDIA#1230

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Aditya Singh <adisin650@gmail.com>
Copy link
Copy Markdown
Collaborator

@leondz leondz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally in the right place - minor queries

Comment thread tests/probes/test_probes_topic.py Outdated
Comment on lines +83 to +84
sqlite3.OperationalError("no such table: lexicons"),
wn.Error("no lexicon found with lang=None and lexicon='oewn:2023'"),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String-matched exceptions are brittle. Is this the best way of catching these? What if wn version becomes unpinned?

Comment thread tests/probes/test_probes_topic.py Outdated
import sqlite3
from unittest.mock import MagicMock, patch

import pytest
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports on L4-L8 should be in a single sorted block, with wn on its own

Comment thread tests/probes/test_probes_topic.py Outdated
patch.object(wn, "download", return_value=download_path) as mock_download,
patch.object(pathlib.Path, "rmdir"),
):
probe = garak.probes.topic.WordnetBlockedWords()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unconventional instantiation path; why not use garak._plugins.load_plugin()?

@adityasingh2400
Copy link
Copy Markdown
Contributor Author

Thanks for the review. Addressed all three in c46fb60:

  1. Brittle exception matching. Good point. The recovery path in topic.py keys off the exception type, except (sqlite3.OperationalError, wn.Error), not the message text, so the message strings in the test were only illustrative. I replaced them with generic messages and added a comment making that explicit, so an unpinned or updated wn that rewords its errors will not break the test.

  2. Imports. Regrouped them into a single sorted block with wn on its own, as suggested.

  3. Instantiation path. Switched to garak._plugins.load_plugin("probes.topic.WordnetBlockedWords") to match the other tests in this file rather than constructing the class directly.

Let me know if you would prefer the test exercised a different way.

- Group the new imports into a single sorted block and keep wn separate
- Use generic exception messages since the recovery path matches on the
  exception type, not the message text, so an unpinned wn version will not
  break the test
- Load the probe through garak._plugins.load_plugin like the other tests
  instead of instantiating the class directly

Signed-off-by: Aditya Singh <adisin650@gmail.com>
@adityasingh2400 adityasingh2400 force-pushed the fix-topic-wordnet-error branch from c46fb60 to b8e8335 Compare June 5, 2026 04:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Probes: rest probes error on topic

2 participants