Summary
To be confident that our rule matcher works, we need a robust, data-driven way to test the rule logic evaluator against a set of features. We have some of this today. For example, a handful of tests hard-code Python representations of features and match them against embedded rules, like here:
    def test_match_simple():
        rule = textwrap.dedent("""
            rule:
                meta:
                    name: test rule
                    scopes:
                        static: function
                        dynamic: process
                    namespace: testns1/testns2
                features:
                    - number: 100
            """)
        r = capa.rules.Rule.from_yaml(rule)

        features, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
        assert "test rule" in matches
        assert MatchedRule("test rule") in features
        assert MatchedRule("testns1") in features
        assert MatchedRule("testns1/testns2") in features

This works well within the Python ecosystem. However, to support implementations of capa that are not written in Python, we need a language-neutral way to describe rules, feature sets, and the result we expect.
I'd like a JSON/YAML-like format, something like the following:
    - name: simple number feature
      rule:
        features:
          - number: 100
      features: |
        func: 0x4088A4
        bb: 0x4088A4
        insn: 0x4088A4: number(100)
      matches: true
      description: trivial feature matches
We could use the show-features.py DSL to represent features succinctly in the fixture data.
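To make the idea concrete, here is a minimal sketch of how a runner might consume such fixtures. Everything in it is an assumption for illustration: the fixtures are encoded as JSON so that only the standard library is needed, the line grammar for the feature DSL is guessed from the three example lines above, the second fixture is made up, and `rule_matches` is a toy evaluator that treats the feature list as an implicit AND. It is not capa's matcher and does not use capa's API.

```python
import json
import re

# Hypothetical fixtures in the proposed shape, encoded as JSON (not YAML)
# so this sketch depends only on the standard library.
FIXTURES = json.loads(r"""
[
  {
    "name": "simple number feature",
    "rule": {"features": [{"number": 100}]},
    "features": "func: 0x4088A4\nbb: 0x4088A4\ninsn: 0x4088A4: number(100)",
    "matches": true,
    "description": "trivial feature matches"
  },
  {
    "name": "number not present",
    "rule": {"features": [{"number": 101}]},
    "features": "func: 0x4088A4\nbb: 0x4088A4\ninsn: 0x4088A4: number(100)",
    "matches": false,
    "description": "illustrative negative case, not from the issue"
  }
]
""")

# Assumed grammar: "<scope>: <hex address>" optionally followed by
# ": <feature>(<value>)". Only the example lines were used to derive this.
LINE = re.compile(r"^(func|bb|insn):\s*(0x[0-9A-Fa-f]+)(?::\s*(\w+)\((.*)\))?$")


def parse_features(text):
    """Map (feature kind, value) to the set of addresses where it appears."""
    features = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognized feature line: {line!r}")
        _scope, addr, kind, value = m.groups()
        if kind is None:
            continue  # scope marker only, e.g. "func: 0x4088A4"
        if kind == "number":
            value = int(value, 0)
        features.setdefault((kind, value), set()).add(int(addr, 16))
    return features


def rule_matches(rule, features):
    """Toy evaluator: every listed feature must be present (implicit AND)."""
    for node in rule["features"]:
        [(kind, value)] = node.items()
        if (kind, value) not in features:
            return False
    return True


def run_fixtures(fixtures):
    """Return the names of fixtures whose result differs from `matches`."""
    failures = []
    for fx in fixtures:
        actual = rule_matches(fx["rule"], parse_features(fx["features"]))
        if actual != fx["matches"]:
            failures.append(fx["name"])
    return failures


failures = run_fixtures(FIXTURES)
```

Because the fixture file carries both the input and the expected outcome, an implementation in another language would only need its own versions of `parse_features` and the evaluator; the fixture data itself could be shared unchanged.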
The test_match_simple example is from capa/tests/test_match.py (lines 48 to 66 at commit 7a79f79).