Skip to content

tests: migrate to data-driven fixtures for feature presence#2986

Open
williballenthin wants to merge 16 commits intomasterfrom
data-fixtures-2
Open

tests: migrate to data-driven fixtures for feature presence#2986
williballenthin wants to merge 16 commits intomasterfrom
data-fixtures-2

Conversation

@williballenthin
Copy link
Copy Markdown
Collaborator

@williballenthin williballenthin commented Apr 2, 2026

closes #2743

This PR refactors our test suite with the primary goal of making the feature tests data-driven; that is, we now have a collection of JSON files that describe where backends should find various features. This enables non-Python implementations of capa to demonstrate that they extract the features we'd expect.

Along the way, I also made some other cleanups. In particular, I tried to simplify fixtures.py a bit. I removed a lot of the symbolic extractors (e.g., pma01_03_extractor) since many were used only once or twice in other modules; I inlined this logic with more explicit code. I figure, more obvious code -> easier to mechanically port to another language. Pytest fixtures as magic test function parameter names is not that easy to reason about...

To the reviewers, I request a thorough review of not just the code style, but of the underlying implementation, design, decisions, etc. I think you should set aside an hour for this.

Here's how I'd recommend that you review these changes:

  • read the new test_viv_extractor.py to see how a backend test now looks
  • read tests/fixtures/features/README.md and static.json to see how the features are declared
  • read fixtures.py for familiarity. don't read the diff, read the final.
  • read various diffs when they're reasonable, but i'm afraid there's a bunch of churn, so it may not be easy to parse. focus on the end result, imho, and possibly ask Gemini for any hotspots.

To be clear, there's moderate risk here: I've translated the existing feature test cases to a new format, and I might have missed or corrupted some of them. In some scenarios, I tried to consolidate feature test cases (like when IDA only worked on some samples, and Ghidra on a different set) and it's possible that I mangled the underlying intent. However, I did my best and the test suite probably high quality. I believe we should accept this risk, but you should think about this, too.

Checklist

  • This submission includes AI-generated code and I have provided details in the description.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

Comment thread tests/fixtures/features/static.json
Comment thread tests/fixtures.py Outdated
gemini-code-assist[bot]

This comment was marked as outdated.

@williballenthin williballenthin force-pushed the data-fixtures-2 branch 6 times, most recently from f8d4c8b to ad6aeb0 Compare April 15, 2026 10:13
@williballenthin

This comment was marked as resolved.

@github-actions github-actions Bot dismissed their stale review April 21, 2026 12:46

CHANGELOG updated or no update needed, thanks! 😄

@williballenthin williballenthin marked this pull request as ready for review April 21, 2026 13:50
Copy link
Copy Markdown
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work, great step into a data-driven test direction - a few comments to discuss inline

Comment thread capa/rules/__init__.py
Comment thread capa/rules/__init__.py
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: can you show that the changes here work with the current tests so we have some reassurance there?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. i can probably do this on a one-off, local basis and screenshot my work.

  • one-time regression check for rule parsing/handling

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that'd be great

Comment thread tests/fixtures/features/static.json
Comment thread tests/fixtures/features/static.json
Comment thread tests/fixtures/features/static.json
Comment thread tests/fixtures/features/vmray.json
Comment thread tests/data
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file not needed anymore?

Comment thread tests/test_dynamic_span_of_calls_scope.py
Comment on lines +29 to +30
if True:
import idapro # noqa: F401 [imported but unused]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this do here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idapro has to be imported first, because it messes with sys.path

but isort likes to reorder it
so i put it in a if True block to force it to show up first

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, maybe add a comment on this

Comment thread rules
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regression?

Remove 24 file entries from fixture manifests that were defined in
`files` but never referenced by any `features` entry (dead code).
Add a test that catches unreferenced file entries going forward.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

migrate feature tests fixtures to a JSON representation

2 participants