YARA modules are the way in which you can extend YARA's capabilities by adding new data structures and functions that can be later used in your rules, making them more powerful and expressive. For example, a YARA module can parse a file format and expose to YARA any information extracted from the file that may be useful while creating YARA rules.
This document will guide you through the process of creating a YARA module.
For illustrative purposes we are going to create a text module that allows
creating YARA rules for plain-text files, based on the number of lines and
words
- Defining the module's structure
- Proto2 vs Proto3
- Tweaking the module's YAML output
- Implementing the module's main function
- Building your module
- Adding functions to your module
- String types in module functions
- Accessing the module's structure from a function
- Adding dependencies
- Using enums
- Creating a module in an external crate
- Tests
Most YARA modules define a data structure that is filled with information extracted from the scanned file and exposed to YARA with the purpose of being used in your rules. In YARA-X this structure is defined by using Protocol Buffers (protobuf from now on). The first thing you must do when creating your module is writing the protobuf that defines the module's structure. If you are not familiar with protobufs we recommend you to start by exploring its documentation and getting used to its syntax. You don't need to become an expert in protobufs for being able to define a YARA module, but some familiarity will certainly help.
Let's start defining the structure for our text module. For that, you must
create a text.proto file and put it into the lib/src/modules/protos
directory. As a starting point we can use the following file:
syntax = "proto2";
import "yara.proto";
package text;
option (yara.module_options) = {
name : "text"
root_message: "text.Text"
rust_module: "text"
cargo_feature: "text-module"
};
message Text {
optional int64 num_lines = 1;
optional int64 num_words = 2;
}This is a very simple module that exposes only two fields: num_lines and
num_words. Let's dissect the content of text.proto line by line:
syntax = "proto2";This indicates which version of the protobuf syntax you are going to use. There
are two possibilities here: proto2 and proto3. We are going to talk about
the differences between the two in Proto2 vs Proto3, but
for the time being let's use proto2. By the way, proto2 is the default
if you don't specify the syntax version, which means that the first line in
the file is actually optional. However, we recommend making your choice
explicit by always including this line at the beginning of your file. Let's
see the next line:
import "yara.proto";With this line we are importing certain YARA-specific definitions that will be
used later in the proto file. This line is required.
NOTE: If you are curious about the content of yara.proto you can find it
atyara-x-proto/src/yara.proto.
package text;The package statement is optional but highly recommended, as it prevents name
collisions between different YARA modules. If modules foo and bar both
define a structure named Foobar without specifying their own package names
it will confuse the protoc compiler when producing the Rust code for your
module. The resulting code can use foo.Foobar instead of bar.Foobar in
code generated for the bar module.
Let's start with the interesting part:
option (yara.module_options) = {
name : "text"
root_message: "text.Text"
cargo_feature: "text-module"
};The snippet above is also required for every .proto file that describes a
YARA module. This is what tells that this file is not an ordinary .proto
file, but one describing a module. In fact, you can put any .proto file in the
lib/src/modules/protos directory, and that doesn't mean that each of those
files is describing a YARA module. Only files containing a yara.module_options
section will define a module.
Options name and root_message are required, while cargo_feature is optional.
The name option defines the module's name. This is the name that will be used
for importing the module in a YARA rule, in this case our module will be imported
with import "text".
The cargo_feature option indicates the name of the feature that controls
whether
the module is built or not. If this option is not specified the module is always
built, but if you specify a feature name, this feature name must also be
included
in the Cargo.toml file, and the module will be built only when this cargo
feature is enabled.
The root_message option a very important option indicating which is the
module's
root structure, it must contain the name of some structure (a.k.a. message)
defined in the .proto file. In our case the value for root_message is
"text.Text" because we have defined our module's structure in a message named
Text, which is under package text. In general the value in this field will
have the form package.Message, except if the package statement is missing,
in which case it would be the name of the message alone (i.e: Text).
The root_message field is required because your .proto file can define
multiple messages, and YARA needs to know which of them is considered the root
message for the module. Without this field YARA can't know which message to
use.
And here is our root structure/message:
message Text {
optional int64 num_lines = 1;
optional int64 num_words = 2;
}This is a very simple structure with only two integer fields. Notice that the
numbers after the = signs are not the values for those fields, they are
actually field tags (i.e: a unique number identifying each field in a message).
This may be confusing if you are not familiar with protobuf's syntax, so again:
explore the
protobuf's documentation.
Also notice that we are defining our fields as optional. In proto2 fields
must be either optional or required, while in proto3 they are always
optional and can't be forced to be required. We are going to discuss this topic
more in-depth in the Proto2 vs Proto3 section.
With this we have defined a very simple module that could be used in a YARA rule like this:
import "text"
rule text_with_100_words {
condition:
text.num_words == 100
}Of course, we are not done yet. So far we have defined the structure of our
module, but we need to populate that structure with actual values extracted from
the scanned files. That's where the module's main function enters into play. But
before going into the details of how to implement this function, let's discuss
the differences between proto2 and proto3, as they may determine some
aspects
of our implementation.
When writing a YARA module you can choose between proto2 and proto3 for
writing the protobuf that describes your module's structure. Both of them
are very similar, but they also have some differences.
One of the most important differences is that in proto2 fields are either
optional or required, while in proto3 all fields are optional, and
therefore
the keywords optional and required are not accepted at all. But this
difference
goes beyond a simple syntactic matter, it also affects the way in which missing
fields are handled.
In proto2, when some structure contains a required field, this field must
be initialized to some value before the structure gets serialized. If you don't
provide a value for a required field, an error occurs while serializing the
structure. In the other hand, optional fields don't need to be initialized,
they simply won't be included in the serialized data, and they will be missing
after the data is deserialized. In fact, if you have a
field optional int64 foo
in your protobuf, the corresponding Rust structure will have a field
foo: Option<i64>. If foo is not initialized, its value after deserializing
the structure will be None. This means that in proto2 you can know whether
the field foo was originally initialized to some value, or it was simply left
uninitialized.
In proto3 things are a bit different. All fields in proto3 are optional,
but uninitialized fields will default to a value that depends on the type.
For example, for numeric fields the default value is zero, and for string fields
the default value is the empty string. This means that in proto3 a protobuf
field int64 foo is translated into a field foo: i64. If foo is not
initialized, it will behave as if it was initialized to 0. After deserializing
your structure, you won't be able to tell if foo was explicitly set to 0,
or if it was left uninitialized.
This subtle difference is important when creating a YARA module, because
YARA has the concept of undefined values. A value in YARA is undefined when
it hasn't been initialized to some meaningful value. When proto2 is used for
defining the structure of your module, uninitialized fields will have an
undefined value. In the other hand, when proto3 is used all fields will
have a value, even if you don't initialize it explicitly. If a field is not
initialized it will have its corresponding default value.
The bottom line is that with proto3 you won't be able to have fields with
undefined values. In some cases that may be what you need, and using proto3
is very useful in such cases, as you don't need to explicitly initialize all
the fields in your structure.
The yr dump command outputs the structure generated by one or more YARA
modules, presenting the information in either JSON or YAML format. The default
output format is YAML, because of its inherent human-friendly nature.
Nevertheless, you can help YARA to further enhance the quality of produced
YAML outputs.
Certain integer fields find a more intuitive representation in hexadecimal rather than decimal format. To communicate this preference to YARA, a dedicated configuration option can be employed. Consider the following illustrative example:
message Macho {
optional uint32 magic = 1 [(yara.field_options).fmt = "x"];
}
Here, [(yara.field_options).fmt = "x"] instructs YARA to portray the magic
field in hexadecimal format (i.e., "x"). Consequently, the output will display
magic: 0xfeedfacf instead of the less readable magic: 4277009103.
Supported format options also includes "t" for timestamps. For example:
optional uint32 my_timestamp = 1 [(yara.field_options).fmt = "t"];
In this scenario, the output would be rendered as follows:
my_timestamp: 999999999 # 2001-09-09 01:46:39 UTCOnce you have a .proto file that describes the structure of your module you
need write the logic that parses every scanned file and fills the module's
structure with the data obtained from the file. This is done by implementing
a function that will act as the entry point for your module.
There are two options for creating our text module:
- Creating a
text.rsfile inlib/src/modules. - Creating a directory
lib/src/modules/textcontaining amod.rsfile.
For simple modules that can be contained in a single .rs file the first
approach is enough. For more complex modules with multiple source files the
second approach is the recommended one.
So, let's create our lib/src/modules/text.rs file:
use crate::mods::prelude::*;
use crate::modules::protos::text::*;
fn main(data: &[u8], _meta: Option<&[u8]>) -> Result<Text, ModuleError> {
let mut text_proto = Text::new();
// TODO: parse the data and populate text_proto.
Ok(text_proto)
}
register_module!("text", Text, main);This is the simplest possible code for a YARA module, and it doesn't do anything special yet. Let's describe what it does in detail:
use crate::mods::prelude::*;This first line is very important as it imports all the dependencies required by a YARA module. All your modules must start by importing the module's prelude, otherwise compilation will fail.
use crate::modules::protos::text::*;Here we are importing all the Rust types automatically generated from our
text.proto file. This includes the Text structure, which represents the
Text message declared in the proto file.
NOTE: If your protobuf file is foobar.proto, the module created for it
will be crate::modules::protos::foobar
Next comes the module's main function and the module registration:
fn main(data: &[u8], _meta: Option<&[u8]>) -> Result<Text, ModuleError> {
...
}
register_module!("text", Text, main);The module's main function is called for every file scanned by YARA. This
function receives a byte slice with the content of the file being scanned and an
optional byte slice with per-scan metadata, and it returns a Result containing the
Text structure that was generated from the text.proto file (or a ModuleError).
Registering the module is as simple as calling the register_module! macro.
It takes the name of the module (as used in YARA rules' import statements), the
protobuf message type returned by the module, and the main function name. If the
module is a data-only module with no main function, the third argument can be omitted.
Let's finish the implementation of the main function for our text module.
use crate::mods::prelude::*;
use crate::modules::protos::text::*;
use std::io;
use std::io::BufRead;
fn main(data: &[u8], _meta: Option<&[u8]>) -> Result<Text, ModuleError> {
// Create an empty instance of the Text protobuf.
let mut text_proto = Text::new();
let mut num_lines = 0;
let mut num_words = 0;
// Create cursor for iterating over the lines.
let cursor = io::Cursor::new(data);
// Count the lines and words in the file.
for line in cursor.lines() {
match line {
Ok(line) => {
num_words += line.split_whitespace().count();
num_lines += 1;
}
Err(_) => return Ok(text_proto),
}
}
// Set the value for fields `num_lines` and `num_words` in the protobuf.
text_proto.set_num_lines(num_lines as i64);
text_proto.set_num_words(num_words as i64);
// Return the Text proto after filling the relevant fields.
Ok(text_proto)
}
register_module!("text", Text, main);That's all you need for having a fully functional YARA module. Now, let's build it!
After creating the files lib/src/modules/protos/text.proto and
lib/src/modules/text.rs we are almost ready for building the module into
YARA. But there are few more pending steps.
The first thing that you must know is that your module is behind a cargo
feature
flag. The module won't be built into YARA unless you tell cargo to do so. The
name of the feature controlling your module is text-module, and this feature
must be added to the [features] section in the lib/Cargo.toml file (not
the top-level Cargo.toml, but the one in the lib directory)
[features]
text-module = [] # Add this line to lib/Cargo.tomlAdditionally, if you want your module to be included by default in YARA, add
the feature name to the default list in the [features] section of
lib/Cargo.toml:
[features]
text-module = []
# Features that are enabled by default.
default = [
"constant-folding",
"test_proto2-module",
"test_proto3-module",
"text-module" # The text module will be included by default
]If the module's feature flag is not added to the default list, you must
explicitly tell cargo that you want to build YARA with your module:
cargo build --features=text-moduleNOTE: The name of the feature flag depends on the module's name. If the
module's name is foobar, the feature flag is foobar-module.
Congratulations! Now your text module can be used!
YARA modules not only expose structured data that can be used in rules. They
also allow to expose functions that can be called from YARA rules. Suppose that
you want to add a function to the text module that returns the N-th line in
the file.
Let's add a function called get_line to the lib/src/modules/text.rs
file. This function receives the line number and returns a string with the
corresponding line. Let's take a look at the implementation:
#[module_export]
fn get_line(ctx: &mut ScanContext, n: i64) -> Option<RuntimeString> {
let cursor = io::Cursor::new(ctx.scanned_data());
if let Some(Ok(line)) = cursor.lines().nth(n as usize) {
Some(RuntimeString::from_slice(ctx, line))
} else {
None
}
}The first thing you may have noticed is that the function has the
#[module_export] attribute. This is how you indicate that your module exports
that function to YARA rules. The function will appear as a field of the text
module, and will have the same name as in the Rust code, so you can call it
from a YARA rule like text.get_line(0).
The second important thing is that the function's first argument must be either
&mut ScanContext or &ScanContext. Of course this first argument won't be
part of the function's signature from the YARA rule standpoint. Any other
argument that you add to the function will be part of its signature when invoked
from YARA. In our example, the n: i64 argument is where we pass the line
number to the function.
As types in YARA are limited to integers, floats, booleans and strings, the types that you can use for function arguments or return types is limited to a handful of Rust types. The following tables summarizes the accepted argument and return types:
| Rust type | YARA type |
|---|---|
i32 |
integer |
i64 |
integer |
f32 |
float |
f64 |
float |
bool |
bool |
RuntimeString |
string |
| Rust type | YARA type |
|---|---|
i32 |
integer |
i64 |
integer |
f32 |
float |
f64 |
float |
bool |
bool |
RuntimeString |
string |
Option<i32> |
integer / undefined if None |
Option<i64> |
integer / undefined if None |
Option<f32> |
float / undefined if None |
Option<f64> |
float / undefined if None |
Option<bool> |
bool / undefined if None |
Option<RuntimeString> |
string / undefined if None |
One noticeable difference between arguments and return types is that in return
types you can use Option<T>, where T is one of the acceptable Rust types.
When a function returns None, that's translated to undefined in YARA's
world.
Also notice that string types are always passed around in the form of a
RuntimeString instance, which represents an arbitrary sequence of bytes. The
standard Rust types (String, &str, &[u8], etc) are not allowed. In the
next section were going to discuss more about how to receive and return strings
in module functions.
In our example, the get_line function is defined at the top level of the
module's namespace, which means that it will be invoked from YARA as
text.get_line(..). But sometimes we want our function to appear as a member
of some inner structure. For example, suppose that we have a module with the
following structure:
message SomeStruct {
optional int64 foo = 1;
optional int64 bar = 2;
}
message MyModule {
optional SomeStruct some_struct = 1;
}Here MyModule is the module's root structure, and it has a
field some_structure
described by SomeStruct. What if you want a function some_function that
appears to be a member of some_structure? In other words, you want to invoke
your function in YARA like this...
import "my_module"
rule my_rule {
condition:
my_module.some_structure.some_function()
}Well, in that case the function must be named explicitly, including in the
name the full path of the function relative to the module's main structure.
This is done by providing a name argument to the module_export attribute.
For example:
#[module_export(name = "some_structure.some_function")]
pub(crate) fn some_function(ctx: &mut ScanContext) {
// ...
}When a name argument is passed to module_export, the function's name in Rust
is ignored, and the explicitly provided name is used instead. This name is the
full path of the function, relative to the module.
This mechanism for choosing the name of your function explicitly also comes
handy
for function overloading (i.e: using the same name for functions that differ in
their signatures). For example, suppose that you want to implement a add
function that can receive either integer or floating point numbers. The Rust
language doesn't support function overloading, but you can provide two different
implementations and force them to have the same name in YARA.
#[module_export(name = "add")]
pub(crate) fn add_i64(ctx: &mut ScanContext, a: i64, b: i64) -> i64 {
a + b
}
#[module_export(name = "add")]
pub(crate) fn add_f64(ctx: &mut ScanContext, a: f64, b: f64) -> f64 {
a + b
}When you want to receive an argument or return a value of type string you must
use the RuntimeString type. This type is an enum with three variants:
RuntimeString::LiteralRuntimeString::ScannedDataSliceRuntimeString::Rc
RuntimeString::Literal is used when the string is a literal in the YARA rule.
For example, if your rule uses the
expression my_module.my_func("foo"), "foo"
is a literal and the function my_func will receive a RuntimeString::Literal
argument. Instances of RuntimeString::Literal are normally created by YARA and
passed as arguments to functions. Functions won't return instances of
RuntimeString::Literal created by themselves.
RuntimeString::ScannedDataSlice represents a string that is a slice of the
scanned data. This variant is useful when you want to return some string that
is part of the scanned data, without having to make a copy of it. Internally,
this variant simply contains the offset within the data where the string starts
and its length, so it's a very similar to Rust slices.
RuntimeString::Rc is a reference-counted string that is released when all
references are dropped. This is the variant used when the string you are
returning from your function is not part of the scanned data, and therefore
needs to reside in its own memory.
Regardless of the variant, RuntimeString has a as_bstr method that allows
you to obtain a reference to the actual string. This method receives
a &ScanContext
and returns a &BStr. The &BStr type is equivalent to &str, but it doesn't
require that the string must be a valid UTF-8, as &str does. Aside from that,
&BStr behaves almost exactly to &str and has the same methods. You can find
more information in the documentation for
the bstr
crate.
For creating an instance of RuntimeString you must either
use RuntimeString::new
or RuntimeString::from_slice. RuntimeString::new creates the runtime string
by taking ownership of a String, Vec<u8>, or any type that implements
Into<Vec<u8>.
In the other hand, RuntimeString::from_slice receives a &[u8]
and creates the runtime string by making a copy of the slice, except if the
slice lies within the boundaries of the scanned data, in which case the returned
variant is RuntimeString::ScannedDataSlice.
/// A function that always returns the string "foo".
#[module_export]
fn foo(ctx: &mut ScanContext) -> RuntimeString {
RuntimeString::from_slice("foo".as_bytes())
}/// A function that receives a string and returns it in uppercase.
#[module_export]
fn uppercase(ctx: &mut ScanContext, s: RuntimeString) -> RuntimeString {
// Obtain a &BStr pointing to the actual string content.
let s = s.as_bstr(ctx);
// &BStr has the same methods than &str, including to_uppercase.
let s = s.to_uppercase();
// Returns RuntimeString::Rc with the new string.
RuntimeString::new(s)
}/// A function that returns a string with the first n bytes of the scanned data.
///
/// If the data is smaller than n bytes the result will be `None`, which is
/// treated in YARA as `undefined`.
#[module_export]
fn head(ctx: &mut ScanContext, n: i64) -> Option<RuntimeString> {
// Get the first n bytes, or return None.
let head = ctx.scanned_data().get(0..n as usize)?;
// Returns RuntimeString::ScannedDataSlice, as the `head` slice is contained
// within the scanned data.
Some(RuntimeString::from_slice(ctx, head))
}When we discussed how to implement the module's main function, we saw that this function returns a protobuf structure with information that is usually extracted away from the data being scanned. The fields in this structure can be used directly in your YARA rules, but sometimes we need to use them from within module functions as well. This is useful for implementing functions that rely on information that was already extracted from the scanned data and stored in that protobuf message.
Let's go back to our text module example, which already exports two fields:
num_lines and num_words. Now suppose that you want to implement a function
avg_words_per_line, that returns the result of num_words / num_lines. Of
course, you could add a new field and initialize it with the correct value in
the main function, so this is not a very realistic example, but let's assume
that you don't want to resort to using a new field, but want a function instead.
How do you access the values of num_words and num_lines from the
avg_words_per_line function? Let's see it:
#[module_export]
fn avg_words_per_line(ctx: &mut ScanContext) -> Option<f64> {
// Obtain a reference to the `Text` protobuf that was returned by the
// module's main function.
let text = ctx.module_output::<Text>()?;
let num_lines = text.num_lines? as f64;
let num_words = text.num_words? as f64;
Some(num_words / num_lines)
}The secret is in this line:
let text = ctx.module_output::<Text>() ?;The ScanContext type has a module_output method that is generic over
type T,
while calling this method you must specify the type of the protobuf associated
to your module (Text in this case). This is done by using Rust's
turbofish syntax
(i.e: module_output::<T>()). Notice that this method returns Option<&T>.
Most of the time your module is going to depend on external crates. Let's say
we want to add a new feature to our text module that detects the language in
which the text is written. For that, we are going to rely on the existing crate
lingua.
The first step is adding the lingua to the [dependencies] section in
lib/Cargo.toml, as any other dependency for the project. For example:
[dependencies]
lingua = { version = "1.4.0", optional = true }Notice the use optional = true for preventing this crate from being compiled
by default. We want to build the lingua crate only when the text-module
feature flag is set. This done by adding "dep:lingua" to the feature as
shown below.
[features]
text-module = ["dep:lingua"]Now we can use the lingua crate in our text module, so let's add a function
for detecting the language:
use lingua::LanguageDetectorBuilder;
#[module_export]
fn language(ctx: &ScanContext) -> Option<i64> {
let data = ctx.scanned_data();
// Use `as_bstr()` for getting the scanned data as a `&BStr` instead of a
// a `&[u8]`. Then call `to_str` for converting the `&BStr` to `&str`. This
// operation can fail if the scanned data is not valid UTF-8, in that case
// returns `None`, which is interpreted as `undefined` in YARA.
let text = data.as_bstr().to_str().ok()?;
// Configure the languages that we want to detect.
let detector = LanguageDetectorBuilder::from_languages(&[
lingua::Language::English,
lingua::Language::French,
lingua::Language::German,
lingua::Language::Spanish,
])
.build();
// Detect the language. Returns `None` if the language cannot be reliably
// detected.
let language = detector.detect_language_of(text)?;
// `language` is an enum that has only unit variants, it can be casted to
// `i64` for getting the numeric value.
Some(language as i64)
}The code above has a problem, though. We are returning an i64 where each
value represents a language, but writing YARA conditions like
text.language() == 3 is not very readable. Which is language 3? This is
where enums come into play.
In the previous section we implemented the language function that returns a
numeric value indicating the language in which a text file is written. However,
using numeric literals like 1, 2 or 3 for identifying languages is not
readable, it would be much better if we could associate those literals to more
descriptive symbols like english, spanish and french. In cases like this
enums are really useful. Let's add this enum definition to the text.proto
file:
enum Language {
English = 1;
Spanish = 2;
French = 3;
German = 4;
}Now let's rewrite the language function:
#[module_export]
fn lang(ctx: &ScanContext) -> Option<i64> {
let data = ctx.scanned_data();
// Use `as_bstr()` for getting the scanned data as a `&BStr` instead of a
// a `&[u8]`. Then call `to_str` for converting the `&BStr` to `&str`. This
// operation can fail if the context is not valid UTF-8, in that case
// returns `None`, which is interpreted as `undefined` in YARA.
let text = data.as_bstr().to_str().ok()?;
let detector = LanguageDetectorBuilder::from_languages(&[
lingua::Language::English,
lingua::Language::French,
lingua::Language::German,
lingua::Language::Spanish,
])
.build();
// Detect the language. Convert the result returned by `lingua` to our
// own enum defined in the protobuf.
let language = match detector.detect_language_of(text)? {
lingua::Language::English => Language::English,
lingua::Language::French => Language::French,
lingua::Language::German => Language::German,
lingua::Language::Spanish => Language::Spanish,
_ => unreachable!(),
};
Some(language as i64)
}In the snippet above, constants Language::English, Language::French,
Language::German and Language::Spanish are the ones defined in the
text.proto file. They are accessible in our code because of the
use crate::modules::protos::text::*; statement at the beginning of the source
file, which imports all the types defined by the protobuf.
Also notice how the match statement is converting the result returned by
the lingua crate, which is one of the values in the lingua::Language enum,
to our own enum. This is because relying on the actual numeric value associated
to each alternative in lingua::Language (the discriminant in Rust terms) is
not a good idea. If the developers of lingua alter the order in which the
languages appear in the lingua::Language enum, this would also change their
respective numeric values (or discriminant).
After this change we can write a YARA rule like:
import "text"
rule text_in_english {
condition:
text.language() == text.Language.English
}In our example the values in the enum started at 1 and where consecutive. But this is not a requirement, you can associate arbitrary values to each enum item, and use hexadecimal numbers for better legibility. The following enum is also valid:
enum Threshold {
Low = 100;
Medium = 400;
High = 900;
}However, because tag numbers are of type i32, the range of possible values
goes from i32::MIN to i32::MAX. For larger values you need to use an
alternative approach:
enum MachO {
MAGIC = 0 [(yara.enum_value).i64 = 0xfeedface];
CIGAM = 1 [(yara.enum_value).i64 = 0xcefaedfe];
}In the enum above the values of MAGIC and CIGAM are not 0 and 1 but
0xfeedface and 0xcefaedfe respectively. Tag numbers are still present
because they are required in protobuf, however, their values are irrelevant.
The (yara.enum_value).i64 option has priority when assigning a value to each
enum item, and it allows setting values from i64::MIN to i64::MAX.
As you may have noticed in the examples above, the name path for accessing some
enum value in your YARA rules includes the name of the enum itself. For
instance,
if you want to use the Low value from the Threshold enum, declared in some
acme module, you would write acme.Threshold.Low.
The recommended way for naming enums and is that the enum's name itself should
describe the whole class of values it contains, and items in the enum should
describe specific items (i.e: acme.Threshold.Low is preferred over
acme.ThresholdLow or acme.Threshold_Low). However, some existing YARA
modules
don't follow this guideline, and have a multiple groups of constants defined at
the module's top level. For example, the macho module contains, among others,
the following constants (their values are included for reference):
macho.CPU_TYPE_MC680X0= 0x06macho.CPU_TYPE_X86= 0x07macho.CPU_TYPE_ARM= 0x0cmacho.CPU_SUBTYPE_386= 0x03macho.CPU_SUBTYPE_486= 0x04macho.CPU_SUBTYPE_586= 0x05macho.CPU_SUBTYPE_ARM_V4T= 0x05macho.CPU_SUBTYPE_ARM_V5= 0x07macho.CPU_SUBTYPE_ARM_V6= 0x06
All these constants can't be put together in a single enum, as some of them
share the same value, (e.g: both CPU_SUBTYPE_586 and CPU_SUBTYPE_ARM_V4T
are equal to 0x05), and protobuf enums can't have duplicated values. The
solution could be grouping them in different enums, like for example:
enum CPU_TYPE {
MC680X0 = 0x06;
X86 = 0x007;
ARM = 0x0c;
}
enum CPU_SUBTYPE_INTEL {
I386 = 0x03;
I486 = 0x04;
I586 = 0x05;
}
enum CPU_SUBTYPE_ARM {
V4T = 0x05;
V5 = 0x07;
V6 = 0x06;
}The problem with this approach is that you would need use expressions like these in your YARA rules:
macho.CPU_TYPE.ARMmacho.CPU_SUBTYPE_ARM.V5macho.CPU_SUBTYPE_INTEL.I386
This may be ok if you are writing a new module, but it's not enough if you are
porting an existing module like macho, where maintaining the same names for
constants is a must. What to do then? Inline enums comes to your rescue.
An inline enum is one that puts their items directly at the module's top level, or the structure where the enum is declared, without introducing an additional struct named after the enum itself. For instance:
enum CPU_TYPE {
option (yara.enum_options).inline = true;
CPU_TYPE_MC680X0 = 0x06;
CPU_TYPE_X86 = 0x007;
CPU_TYPE_ARM = 0x0c;
}
enum CPU_SUBTYPE_INTEL {
option (yara.enum_options).inline = true;
CPU_SUBTYPE_I386 = 0x03;
CPU_SUBTYPE_I486 = 0x04;
CPU_SUBTYPE_I586 = 0x05;
}
enum CPU_SUBTYPE_ARM {
option (yara.enum_options).inline = true;
CPU_SUBTYPE_V4T = 0x05;
CPU_SUBTYPE_V5 = 0x07;
CPU_SUBTYPE_V6 = 0x06;
}With the enums above you can refer to macho.CPU_TYPE_X86 and instead of
macho.CPU_TYPE.CPU_TYPE_X86 and macho.CPU_SUBTYPE_INTEL.CPU_SUBTYPE_I386.
Everything described so far assumes that your module lives inside the
yara-x repository itself. This is the right approach when you intend to
contribute the module upstream, but it requires modifying the yara-x source
tree and rebuilding the library. If you want to ship a module as part of your
own crate—without forking or patching yara-x—you can use the custom
modules API instead.
Custom modules are registered at link time through the
inventory crate. When your crate is linked into
a binary together with yara-x, the module is discovered automatically and
behaves exactly like a built-in module: it can be import-ed in rules, its
fields are accessible in conditions, and its functions are callable.
Add yara-x and protobuf as dependencies:
[dependencies]
yara-x = { version = "..." }
protobuf = { version = "3" }
[build-dependencies]
protobuf-codegen = { version = "3" }Just like built-in modules, an external module's structure is described by a
.proto file. The difference is that instead of placing it inside the
yara-x source tree and relying on yara-x's own build script, you compile
it yourself in build.rs using protobuf-codegen:
// proto/foobar.proto
syntax = "proto2";
package foobar;
message Foobar {
optional uint64 count = 1;
optional string label = 2;
repeated string tags = 3;
}// build.rs
fn main() {
println!("cargo:rerun-if-changed=build.rs");
println!("cargo:rerun-if-changed=proto");
protobuf_codegen::Codegen::new()
.pure()
.cargo_out_dir("protos")
.include("proto")
.input("proto/foobar.proto")
.run_from_script();
}The .pure() call tells protobuf-codegen to use its built-in Rust parser,
so you do not need to have protoc installed. The generated code is placed
under $OUT_DIR/protos/, and you include it from your library like this:
pub mod proto {
include!(concat!(env!("OUT_DIR"), "/protos/mod.rs"));
}
pub use proto::foobar::Foobar;With the proto in place, registering the module requires two things: a main
function and a call to yara_x::register_module!. The main function follows the same
contract as in built-in modules—it receives the scanned data, populates the
protobuf, and returns it:
use yara_x::errors::ModuleError;
fn foobar_main(
data: &[u8],
_meta: Option<&[u8]>,
) -> Result<Foobar, ModuleError> {
let mut out = Foobar::new();
out.count = Some(data.len() as u64);
out.label = Some("foobar".to_owned());
Ok(out)
}
yara_x::register_module!("foobar", Foobar, foobar_main);A few things to note here:
- The first argument is the string used in
import "foobar"in YARA rules. - The second argument is the root protobuf message type. The root descriptor is automatically obtained from this type.
- The third argument is the main function. If your module is data-only (the caller always
injects the output via
set_module_output), it can be omitted. - The macro automatically sets the Rust module name path via
module_path!()so that YARA can find any callable functions.
If a custom module shares its name with a built-in module, the built-in one takes precedence and your registration is silently ignored.
Custom modules can also export functions that are callable from YARA rules, the
same way built-in modules do with #[module_export]. From an external crate
you use the same attribute, but you must pass an extra yara_x_crate
argument so that the macro can generate fully-qualified type references:
pub mod fns {
use yara_x::ScanContext;
#[yara_x::module_export(yara_x_crate = "yara_x")]
pub fn add(_ctx: &ScanContext, a: i64, b: i64) -> i64 {
a + b
}
}The yara_x_crate = "yara_x" argument is required whenever the macro is used
outside of the yara-x source tree. Without it the macro generates bare type
names (Caller, ScanContext, etc.) that are only in scope inside yara-x.
The function signature rules are exactly the same as for built-in modules— see Valid function arguments and Valid return types.
After this, the add function is callable from YARA rules as foobar.add(a, b):
import "foobar"
rule add_works {
condition:
foobar.add(3, 4) == 7
}Sometimes you already have the data your module would expose, and you don't
want to re-derive it inside main_fn. The Scanner API lets you inject a
pre-built protobuf directly, bypassing main_fn entirely for that scan:
let rules = yara_x::Compiler::new()
.add_source(r#"import "foobar" rule r { condition: foobar.count == 99 }"#)?
.build();
let mut out = Foobar::new();
out.count = Some(99);
let mut scanner = yara_x::Scanner::new(&rules);
scanner.set_module_output(Box::new(out))?;
scanner.scan(data)?;set_module_output identifies the target module by the type of the message
you pass in. YARA matches it against the root_descriptor you registered, so
the type must be exactly the message type declared as your module's root.
Rust's linker may discard your crate's inventory::submit! initializer if
nothing in the final binary directly references a symbol from your crate.
The safest workaround is to expose a no-op function and call it from the
binary's entry point:
/// Call this from your binary's `main` (or from test setup) to ensure
/// the linker keeps this crate's module registration.
pub fn ensure_registered() {}fn main() {
my_module_crate::ensure_registered();
// ...
}You can verify that the registration worked by checking the module registry:
my_module_crate::ensure_registered();
let names: Vec<&str> = yara_x::mods::module_names().collect();
assert!(names.contains(&"foobar"));You'll notice that each module in /lib/src/modules/ has a tests/
directory with a nested testdata/ directory. The testing framework is
expecting a particular format and input structure to use them:
- File converted to Intel Hex format format
- Intel Hex format output zipped
- End file named <sha_256>.in.zip
To convert the binary to Intel Hex format, we can use the various tools, depending on the operating system.
To start, we need the raw binary with the sha256 of the binary as its
identifier. This can be done differently on various platforms. These steps
assume the binary is named its sha256 hash (<sha256_hash> is used as a
placeholder).
You can leverage objcopy.
objcopy -I binary -O ihex <sha256_hash> <sha256_hash>.inYou can leverage llvm-objcopy.
llvm-objcopy -I binary -O ihex <sha256_hash> <sha256_hash>.inIf you cannot use objcopy or llvm-objcopy on your current OS, you can use
the provided scripts
at python-intelhex/intelhex,
assuming you can run Python.
bin2hex.py <sha256_hash> <sha256_hash>.inYou can then archive the file into a zip archive using an archival utility and move it to the appropriate test directory. An example of that is below:
zip <sha256_hash>.in.zip <sha256_hash>.in
mv <sha256_hash>.in.zip <location_of_yara-x>/lib/src/modules/<module>/tests/testdata/If you need the files back in binary form, you can inverse the steps above.
You can then unarchive the file using an archival utility. An example of that is below:
unzip <sha256_hash>.in.zipYou can leverage objcopy.
objcopy -I ihex -O binary <sha256_hash>.in <sha256_hash>You can leverage llvm-objcopy.
llvm-objcopy -I ihex -O binary <sha256_hash>.in <sha256_hash>If you cannot use objcopy or llvm-objcopy on your current OS, you can use
the provided scripts
at python-intelhex/intelhex,
assuming you can run Python.
hex2bin.py <sha256_hash>.in <sha256_hash>