
Commit 8893b1a

1ndahous3 and plusvic authored
feat: add external modules support (VirusTotal#655)
Adds a first-class API for registering YARA-X modules from external crates, including support for callable functions accessible from YARA-X rules.

# Why

Currently, adding a YARA-X module requires forking the repository and modifying internal code. This change exposes a stable external API so third-party crates can ship their own modules.

# Changes

The module registration mechanism has been refactored, the `#[module_main]` attribute has been removed in favor of a new `register_module!` macro.

---------

Co-authored-by: Victor M. Alvarez <vmalvarez@virustotal.com>
1 parent fe0ad48 commit 8893b1a

84 files changed

Lines changed: 3739 additions & 721 deletions


Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ members = [
2828
"py",
2929
"ls",
3030
]
31+
exclude = ["examples/custom-module"]
3132
resolver = "3"
3233

3334

docs/ModuleDeveloperGuide.md

Lines changed: 233 additions & 35 deletions
@@ -25,6 +25,13 @@ words
2525
- [Adding dependencies](#adding-dependencies)
2626
- [Using enums](#using-enums)
2727
- [Inline enums](#inline-enums)
28+
- [Creating a module in an external crate](#creating-a-module-in-an-external-crate)
29+
- [Setting up the crate](#setting-up-the-crate)
30+
- [Compiling the proto at build time](#compiling-the-proto-at-build-time)
31+
- [Registering the module](#registering-the-module)
32+
- [Adding callable functions to an external module](#adding-callable-functions-to-an-external-module)
33+
- [Overriding the module output at scan time](#overriding-the-module-output-at-scan-time)
34+
- [Ensuring the module is linked](#ensuring-the-module-is-linked)
2835
- [Tests](#tests)
2936
- [Structuring Testdata Input](#structuring-testdata-input)
3037
- [Linux](#linux)
@@ -121,7 +128,6 @@ Let's start with the interesting part:
121128
option (yara.module_options) = {
122129
name : "text"
123130
root_message: "text.Text"
124-
rust_module: "text"
125131
cargo_feature: "text-module"
126132
};
127133
```
@@ -133,10 +139,10 @@ file, but one describing a module. In fact, you can put any `.proto` file in the
133139
files is describing a YARA module. Only files containing a `yara.module_options`
134140
section will define a module.
135141

136-
Options `name` and `root_message` are required, while `rust_module` and
137-
`cargo_feature` are optional. The `name` option defines the module's name. This
138-
is the name that will be used for importing the module in a YARA rule, in this
139-
case our module will be imported with `import "text"`.
142+
Options `name` and `root_message` are required, while `cargo_feature` is optional.
143+
The `name` option defines the module's name. This is the name that will be used
144+
for importing the module in a YARA rule, in this case our module will be imported
145+
with `import "text"`.
140146

141147
The `cargo_feature` option indicates the name of the feature that controls
142148
whether
@@ -292,11 +298,6 @@ need write the logic that parses every scanned file and fills the module's
292298
structure with the data obtained from the file. This is done by implementing
293299
a function that will act as the entry point for your module.
294300

295-
This is where the `rust_module` option described in the previous section enters
296-
into play. This option is the name of the Rust module that contains the code
297-
for your module. In our `text.proto` file we have `rust_module: "text"`, which
298-
means that our Rust module must be named `text`.
299-
300301
There are two options for creating our `text` module:
301302

302303
* Creating a `text.rs` file in `lib/src/modules`.
@@ -309,24 +310,25 @@ second approach is the recommended one.
309310
So, let's create our `lib/src/modules/text.rs` file:
310311

311312
```rust
312-
use crate::modules::prelude::*;
313+
use crate::mods::prelude::*;
313314
use crate::modules::protos::text::*;
314315
315-
#[module_main]
316-
fn main(data: &[u8]) -> Text {
316+
fn main(data: &[u8], _meta: Option<&[u8]>) -> Result<Text, ModuleError> {
317317
let mut text_proto = Text::new();
318318
319319
// TODO: parse the data and populate text_proto.
320320
321-
text_proto
321+
Ok(text_proto)
322322
}
323+
324+
register_module!("text", Text, main);
323325
```
324326

325327
This is the simplest possible code for a YARA module, and it doesn't do anything
326328
special yet. Let's describe what it does in detail:
327329

328330
```rust
329-
use crate::modules::prelude::*;
331+
use crate::mods::prelude::*;
330332
```
331333

332334
This first line is very important as it imports all the dependencies required
@@ -348,38 +350,36 @@ will be `crate::modules::protos::foobar`
348350

349351
---
350352

351-
Next comes the module's main function:
353+
Next comes the module's main function and the module registration:
352354

353355
```rust
354-
#[module_main]
355-
fn main(data: &[u8]) -> Text {
356+
fn main(data: &[u8], _meta: Option<&[u8]>) -> Result<Text, ModuleError> {
356357
...
357358
}
359+
360+
register_module!("text", Text, main);
358361
```
359362

360363
The module's main function is called for every file scanned by YARA. This
361-
function receives a byte slice with the content of the file being scanned. It
362-
must return the `Text` structure that was generated from the `text.proto` file.
363-
The main function must have the `#[module_main]` attribute. Notice that the
364-
module's main function doesn't need to be called `main`, it can have any
365-
arbitrary name, as long as it has the `#[module_main]` attribute. Of course,
366-
this attribute can't be used with more than one function per module.
367-
368-
The main function usually consists in creating an instance of the protobuf
369-
you previously defined, and populating the protobuf with information extracted
370-
from
371-
the scanned file. Let's finish the implementation of the main function for our
372-
`text` module.
364+
function receives a byte slice with the content of the file being scanned and an
365+
optional byte slice with per-scan metadata, and it returns a `Result` containing the
366+
`Text` structure that was generated from the `text.proto` file (or a `ModuleError`).
367+
368+
Registering the module is as simple as calling the `register_module!` macro.
369+
It takes the name of the module (as used in YARA rules' `import` statements), the
370+
protobuf message type returned by the module, and the main function name. If the
371+
module is a data-only module with no main function, the third argument can be omitted.
372+
373+
Let's finish the implementation of the main function for our `text` module.
373374

374375
```rust
375-
use crate::modules::prelude::*;
376+
use crate::mods::prelude::*;
376377
use crate::modules::protos::text::*;
377378
378379
use std::io;
379380
use std::io::BufRead;
380381
381-
#[module_main]
382-
fn main(data: &[u8]) -> Text {
382+
fn main(data: &[u8], _meta: Option<&[u8]>) -> Result<Text, ModuleError> {
383383
// Create an empty instance of the Text protobuf.
384384
let mut text_proto = Text::new();
385385
@@ -396,7 +396,7 @@ fn main(data: &[u8]) -> Text {
396396
num_words += line.split_whitespace().count();
397397
num_lines += 1;
398398
}
399-
Err(_) => return text_proto,
399+
Err(_) => return Ok(text_proto),
400400
}
401401
}
402402
@@ -405,8 +405,10 @@ fn main(data: &[u8]) -> Text {
405405
text_proto.set_num_words(num_words as i64);
406406

407407
// Return the Text proto after filling the relevant fields.
408-
text_proto
408+
Ok(text_proto)
409409
}
410+
411+
register_module!("text", Text, main);
410412
```
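
The diff above elides the middle of the function, so the counting logic is easy to lose track of. The following is a minimal standalone sketch of that logic using only the standard library. The function name `count_lines_and_words` and the `Option` return type are assumptions made for illustration; the real module stores the counts in the `Text` protobuf and returns `Ok(text_proto)` instead.

```rust
use std::io::{self, BufRead};

/// Counts lines and whitespace-separated words in `data`.
///
/// Hypothetical helper, not part of yara-x. Returns `None` as soon as a
/// line cannot be decoded as UTF-8, which corresponds to the `Err(_)`
/// branch where the module returns early without setting its fields.
fn count_lines_and_words(data: &[u8]) -> Option<(i64, i64)> {
    // `Cursor` lets us read the byte slice through the `BufRead` trait.
    let cursor = io::Cursor::new(data);

    let mut num_lines: i64 = 0;
    let mut num_words: i64 = 0;

    // `lines()` yields `io::Result<String>`; invalid UTF-8 yields `Err`.
    for line in cursor.lines() {
        match line {
            Ok(line) => {
                num_words += line.split_whitespace().count() as i64;
                num_lines += 1;
            }
            Err(_) => return None,
        }
    }

    Some((num_lines, num_words))
}
```

Keeping the parsing in a plain function like this is one possible design choice: it can be unit-tested without a scanner, and the module's main function only has to copy the two counts into the protobuf.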
411413

412414
That's all you need for having a fully functional YARA module. Now, let's build
@@ -1009,6 +1011,202 @@ enum CPU_SUBTYPE_ARM {
10091011
With the enums above you can refer to `macho.CPU_TYPE_X86` and `macho.CPU_SUBTYPE_I386` instead of
10101012
`macho.CPU_TYPE.CPU_TYPE_X86` and `macho.CPU_SUBTYPE_INTEL.CPU_SUBTYPE_I386`.
10111013

1014+
## Creating a module in an external crate
1015+
1016+
Everything described so far assumes that your module lives inside the
1017+
`yara-x` repository itself. This is the right approach when you intend to
1018+
contribute the module upstream, but it requires modifying the `yara-x` source
1019+
tree and rebuilding the library. If you want to ship a module as part of your
1020+
own crate—without forking or patching `yara-x`—you can use the **custom
1021+
modules** API instead.
1022+
1023+
Custom modules are registered at link time through the
1024+
[inventory](https://docs.rs/inventory) crate. When your crate is linked into
1025+
a binary together with `yara-x`, the module is discovered automatically and
1026+
behaves exactly like a built-in module: it can be `import`-ed in rules, its
1027+
fields are accessible in conditions, and its functions are callable.
1028+
1029+
### Setting up the crate
1030+
1031+
Add `yara-x` and `protobuf` as dependencies:
1032+
1033+
```toml
1034+
[dependencies]
1035+
yara-x = { version = "..." }
1036+
protobuf = { version = "3" }
1037+
1038+
[build-dependencies]
1039+
protobuf-codegen = { version = "3" }
1040+
```
1041+
1042+
### Compiling the proto at build time
1043+
1044+
Just like built-in modules, an external module's structure is described by a
1045+
`.proto` file. The difference is that instead of placing it inside the
1046+
`yara-x` source tree and relying on `yara-x`'s own build script, you compile
1047+
it yourself in `build.rs` using `protobuf-codegen`:
1048+
1049+
```protobuf
1050+
// proto/foobar.proto
1051+
syntax = "proto2";
1052+
1053+
package foobar;
1054+
1055+
message Foobar {
1056+
optional uint64 count = 1;
1057+
optional string label = 2;
1058+
repeated string tags = 3;
1059+
}
1060+
```
1061+
1062+
```rust
1063+
// build.rs
1064+
fn main() {
1065+
println!("cargo:rerun-if-changed=build.rs");
1066+
println!("cargo:rerun-if-changed=proto");
1067+
1068+
protobuf_codegen::Codegen::new()
1069+
.pure()
1070+
.cargo_out_dir("protos")
1071+
.include("proto")
1072+
.input("proto/foobar.proto")
1073+
.run_from_script();
1074+
}
1075+
```
1076+
1077+
The `.pure()` call tells `protobuf-codegen` to use its built-in Rust parser,
1078+
so you do not need to have `protoc` installed. The generated code is placed
1079+
under `$OUT_DIR/protos/`, and you include it from your library like this:
1080+
1081+
```rust
1082+
pub mod proto {
1083+
include!(concat!(env!("OUT_DIR"), "/protos/mod.rs"));
1084+
}
1085+
pub use proto::foobar::Foobar;
1086+
```
1087+
1088+
### Registering the module
1089+
1090+
With the proto in place, registering the module requires two things: a main
1091+
function and a call to `yara_x::register_module!`. The main function follows the same
1092+
contract as in built-in modules—it receives the scanned data, populates the
1093+
protobuf, and returns it:
1094+
1095+
```rust
1096+
use yara_x::errors::ModuleError;
1097+
1098+
fn foobar_main(
1099+
data: &[u8],
1100+
_meta: Option<&[u8]>,
1101+
) -> Result<Foobar, ModuleError> {
1102+
let mut out = Foobar::new();
1103+
out.count = Some(data.len() as u64);
1104+
out.label = Some("foobar".to_owned());
1105+
Ok(out)
1106+
}
1107+
1108+
yara_x::register_module!("foobar", Foobar, foobar_main);
1109+
```
1110+
1111+
A few things to note here:
1112+
1113+
* The first argument is the string used in `import "foobar"` in YARA rules.
1114+
* The second argument is the root protobuf message type. The root descriptor is automatically obtained from this type.
1115+
* The third argument is the main function. If your module is data-only (the caller always
1116+
injects the output via `set_module_output`), it can be omitted.
1117+
* The macro automatically sets the Rust module name path via `module_path!()` so that YARA can find any callable functions.
1118+
1119+
If a custom module shares its name with a built-in module, the built-in one
1120+
takes precedence and your registration is silently ignored.
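
The precedence rule can be pictured with a small standard-library model. This is only a sketch of the described behavior, not the actual yara-x registry: the `Origin` enum and the `build_registry` function are hypothetical names introduced here.

```rust
use std::collections::HashMap;

/// Where a registered module name came from (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Origin {
    BuiltIn,
    Custom,
}

/// Builds a name -> origin table in which built-in modules win.
/// Assumption: built-ins are inserted first, and a custom module whose
/// name is already taken is dropped without reporting an error.
fn build_registry<'a>(
    built_in: &[&'a str],
    custom: &[&'a str],
) -> HashMap<&'a str, Origin> {
    let mut registry = HashMap::new();

    for name in built_in {
        registry.insert(*name, Origin::BuiltIn);
    }

    for name in custom {
        // `or_insert` keeps the existing entry, so a clashing custom
        // registration is silently ignored.
        registry.entry(*name).or_insert(Origin::Custom);
    }

    registry
}
```

In practice this means a custom module should pick a name that no built-in module uses, since a clash produces no diagnostic.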
1121+
1122+
### Adding callable functions to an external module
1123+
1124+
Custom modules can also export functions that are callable from YARA rules, the
1125+
same way built-in modules do with `#[module_export]`. From an external crate
1126+
you use the same attribute, but you must pass an extra `yara_x_crate`
1127+
argument so that the macro can generate fully-qualified type references:
1128+
1129+
```rust
1130+
pub mod fns {
1131+
use yara_x::ScanContext;
1132+
1133+
#[yara_x::module_export(yara_x_crate = "yara_x")]
1134+
pub fn add(_ctx: &ScanContext, a: i64, b: i64) -> i64 {
1135+
a + b
1136+
}
1137+
}
1138+
```
1139+
1140+
The `yara_x_crate = "yara_x"` argument is required whenever the macro is used
1141+
outside of the `yara-x` source tree. Without it the macro generates bare type
1142+
names (`Caller`, `ScanContext`, etc.) that are only in scope inside `yara-x`.
1143+
1144+
The function signature rules are exactly the same as for built-in modules—
1145+
see [Valid function arguments](#valid-function-arguments) and
1146+
[Valid return types](#valid-return-types).
1147+
1148+
After this, the `add` function is callable from YARA rules as `foobar.add(a, b)`:
1149+
1150+
```yara
1151+
import "foobar"
1152+
1153+
rule add_works {
1154+
condition:
1155+
foobar.add(3, 4) == 7
1156+
}
1157+
```
1158+
1159+
### Overriding the module output at scan time
1160+
1161+
Sometimes you already have the data your module would expose, and you don't
1162+
want to re-derive it inside `main_fn`. The `Scanner` API lets you inject a
1163+
pre-built protobuf directly, bypassing `main_fn` entirely for that scan:
1164+
1165+
```rust
1166+
let rules = yara_x::Compiler::new()
1167+
.add_source(r#"import "foobar" rule r { condition: foobar.count == 99 }"#)?
1168+
.build();
1169+
1170+
let mut out = Foobar::new();
1171+
out.count = Some(99);
1172+
1173+
let mut scanner = yara_x::Scanner::new(&rules);
1174+
scanner.set_module_output(Box::new(out))?;
1175+
scanner.scan(data)?;
1176+
```
1177+
1178+
`set_module_output` identifies the target module by the type of the message
1179+
you pass in. YARA matches it against the `root_descriptor` you registered, so
1180+
the type must be exactly the message type declared as your module's root.
1181+
1182+
### Ensuring the module is linked
1183+
1184+
Rust's linker may discard your crate's `inventory::submit!` initializer if
1185+
nothing in the final binary directly references a symbol from your crate.
1186+
The safest workaround is to expose a no-op function and call it from the
1187+
binary's entry point:
1188+
1189+
```rust
1190+
/// Call this from your binary's `main` (or from test setup) to ensure
1191+
/// the linker keeps this crate's module registration.
1192+
pub fn ensure_registered() {}
1193+
```
1194+
1195+
```rust
1196+
fn main() {
1197+
my_module_crate::ensure_registered();
1198+
// ...
1199+
}
1200+
```
1201+
1202+
You can verify that the registration worked by checking the module registry:
1203+
1204+
```rust
1205+
my_module_crate::ensure_registered();
1206+
let names: Vec<&str> = yara_x::mods::module_names().collect();
1207+
assert!(names.contains(&"foobar"));
1208+
```
1209+
10121210
## Tests
10131211

10141212
You'll notice that each module in `/lib/src/modules/` has a `tests/`
