- Enriching network traffic metadata via NetBox lookups
- Compare and highlight discrepancies between NetBox inventory and observed network traffic
- Populating the NetBox inventory
- Compare NetBox inventory with database of known vulnerabilities
- Preloading NetBox inventory
- Backup and restore
Malcolm can utilize an instance of NetBox, an open-source "solution for modeling and documenting modern networks." Users may either use Malcolm's embedded NetBox instance (available at https://localhost/netbox/ if connecting locally), or connect Malcolm to a remote NetBox instance not managed by Malcolm. This choice is made during configuration (see this example or the NetBox section of Environment variable files in the documentation).
The design of a potentially deeper integration between Malcolm and NetBox is a work in progress.
Please see the NetBox page on GitHub, its documentation and its public demo for more information.
As Zeek logs and Suricata alerts are parsed and enriched (if the NETBOX_ENRICHMENT environment variable in ./config/netbox-common.env is set to true), the NetBox API will be queried for information about the associated hosts. If found, the information retrieved from NetBox will be used to enrich these logs through the creation of the following new fields. See the NetBox API documentation and the NetBox documentation for more information.
- destination.…
    - destination.device.cluster (/virtualization/clusters/) (for Virtual Machine device types)
    - destination.device.device_type (/dcim/device-types/)
    - destination.device.id (/dcim/devices/{id})
    - destination.device.manufacturer (/dcim/manufacturers/)
    - destination.device.name (/dcim/devices/)
    - destination.device.role (/dcim/device-roles/)
    - destination.device.service (/ipam/services/)
    - destination.device.site (/dcim/sites/)
    - destination.device.details (full JSON object, only with NETBOX_ENRICHMENT_VERBOSE: 'true')
    - destination.segment.id (/ipam/prefixes/{id})
    - destination.segment.name (/ipam/prefixes/{description})
    - destination.segment.site (/dcim/sites/)
    - destination.segment.tenant (/tenancy/tenants/)
    - destination.segment.details (full JSON object, only with NETBOX_ENRICHMENT_VERBOSE: 'true')
- source.… same as destination.…
- collected as related fields (the same approach used in ECS)
    - related.device_type
    - related.device_id
    - related.device_name
    - related.manufacturer
    - related.role
    - related.segment
    - related.service
    - related.site
For Malcolm's purposes, both physical devices and virtualized hosts will be stored as described above: the device_type field can be used to distinguish between them.
NetBox has the concept of sites. Sites can have overlapping IP address ranges. The site to associate with network traffic can be specified when PCAP is uploaded, when configuring live analysis, and when configuring forwarding from Hedgehog Linux. If not otherwise specified, the value of the NETBOX_DEFAULT_SITE environment variable in netbox-common.env will be used for these enrichment lookups.
When NetBox enrichment is attempted for a log, the value netbox is automatically added to its tags field.
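As a rough illustration of how the source.… and destination.… values might roll up into the related fields listed above, the following Python sketch gathers them into de-duplicated lists. It is not Malcolm's enrichment code: the nested-dictionary log layout, the names collect_related and RELATED_SOURCES, and the exact mapping of each related field to its source sub-fields are assumptions inferred from the field names.

```python
from typing import Any

# Assumed mapping from each "related" field to the per-endpoint sub-fields it
# collects. Inferred from the field names; not taken from Malcolm's source.
RELATED_SOURCES = {
    "device_type": [("device", "device_type")],
    "device_id": [("device", "id")],
    "device_name": [("device", "name")],
    "manufacturer": [("device", "manufacturer")],
    "role": [("device", "role")],
    "segment": [("segment", "name")],
    "service": [("device", "service")],
    "site": [("device", "site"), ("segment", "site")],
}


def collect_related(log: dict[str, Any]) -> dict[str, list[Any]]:
    """Gather source.* and destination.* NetBox values into related.* lists."""
    related: dict[str, list[Any]] = {}
    for related_name, paths in RELATED_SOURCES.items():
        for endpoint in ("source", "destination"):
            for group, key in paths:
                value = log.get(endpoint, {}).get(group, {}).get(key)
                if value is None:
                    continue
                # Accept either a single value or a list of values.
                for item in value if isinstance(value, list) else [value]:
                    bucket = related.setdefault(related_name, [])
                    if item not in bucket:
                        bucket.append(item)
    return related
```

Fields for which neither endpoint has a value are simply absent from the result, which mirrors the idea that a lack of enrichment data is itself informative.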
As Malcolm cross-checks network traffic with NetBox's model (as described above), the resulting enrichment data (or lack thereof) can highlight devices and services observed in network traffic for which there is no corresponding entry in the list of inventoried assets.
These uninventoried devices and services are highlighted in two dashboards:
- Zeek Known Summary - this dashboard draws from the periodically-generated known_* and software logs to provide a summary of the known devices and services in the network. The Uninventoried Observed Services and Uninventoried Observed Hosts tables show services and hosts (by IP address) that weren't found when searched via the NetBox API.
- Asset Interaction Analysis - this dashboard contains much of the same information from the Zeek Known Summary dashboard, but it is from a traffic standpoint rather than just an "observed" standpoint. The Uninventoried Internal Source IPs, Uninventoried Internal Destination IPs and Uninventoried Internal Assets - Logs tables highlight communications involving devices not found when searched via the NetBox API.
This feature was implemented as described in idaholab/Malcolm#133.
While the initial effort of populating NetBox's network segment and device inventory manually is high, it is the preferred method to ensure creation of an accurate model of the intended network design.
The Populating Data section of the NetBox documentation outlines mechanisms available to populate data in NetBox, including manual object creation, bulk import, scripting and the NetBox REST API.
The following elements of the NetBox data model are used by Malcolm for Asset Interaction Analysis.
- Network segments
- Network Hosts
    - Devices
    - Virtual Machines
- IP Addresses
    - Can be assigned to devices and virtual machines
- Other
If the NETBOX_AUTO_POPULATE environment variable in ./config/netbox-common.env is set to true, uninventoried devices with private IP addresses (as defined in RFC 1918 and RFC 4193) observed in known network segments will be automatically created in the NetBox inventory based on the information available. This value is set to true by answering Y to "Should Malcolm automatically populate NetBox inventory based on observed network traffic?" during configuration.
However, this feature should be enabled only after careful consideration. The purpose of an asset management system is to document the intended state of a network: with Malcolm configured to populate NetBox with the live network state, a network misconfiguration could result in an incorrect documented configuration.
Devices created using this autopopulate method will include a tags value of Autopopulated. It is recommended that users periodically review automatically-created devices for correctness and fill in known details that couldn't be determined from network traffic. For example, the manufacturer field for automatically-created devices will be set based on the organizationally unique identifier (OUI) determined from the first three bytes of the observed MAC address, which may not be accurate if the device's traffic was observed across a router. If possible, observed hostnames (extracted from logs that provide a mapping of IP address to host name, such as Zeek's dns.log, ntlm.log, and dhcp.log) will be used in the naming of the automatically-created devices, falling back to the device manufacturer otherwise (e.g., MYHOSTNAME vs. Schweitzer Engineering @ 10.10.0.123).
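The naming fallback and the OUI extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names oui_prefix and autopopulated_device_name are hypothetical, and the sketch does not perform the OUI-to-manufacturer lookup itself.

```python
from typing import Optional


def oui_prefix(mac: str) -> str:
    """Return the first three bytes of a MAC address, formatted as 'AA:BB:CC'."""
    # Drop separators such as ':', '-' or '.' and normalize case.
    hex_digits = "".join(ch for ch in mac if ch.isalnum()).upper()
    if len(hex_digits) != 12 or any(c not in "0123456789ABCDEF" for c in hex_digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(hex_digits[i:i + 2] for i in (0, 2, 4))


def autopopulated_device_name(ip: str, hostname: Optional[str], manufacturer: str) -> str:
    """Prefer an observed hostname; otherwise fall back to 'manufacturer @ ip'."""
    if hostname:
        return hostname
    return f"{manufacturer} @ {ip}"
```

For example, autopopulated_device_name("10.10.0.123", None, "Schweitzer Engineering") yields the fallback form shown in the text, while passing a hostname returns the hostname unchanged.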
Since device autocreation is based on IP address, information about network segments (IP prefixes) must first be manually specified in NetBox in order for devices to be automatically populated. Users should populate the description field in the NetBox IPAM Prefixes data model to specify a name to be used for NetBox network segment autopopulation and enrichment; otherwise, the IP prefix itself will be used.
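To make the role of the prefix description concrete, here is a minimal Python sketch of resolving an IP address to a segment name. It assumes prefixes are available as dictionaries with prefix and description keys, and that the most specific containing prefix is chosen when several match; the function name segment_name_for is hypothetical, and Malcolm's own lookup may behave differently.

```python
import ipaddress
from typing import Optional


def segment_name_for(ip: str, prefixes: list[dict]) -> Optional[str]:
    """Name the segment containing ip, or return None if no prefix is known."""
    addr = ipaddress.ip_address(ip)
    best = None
    for entry in prefixes:
        net = ipaddress.ip_network(entry["prefix"], strict=False)
        # Assumption: prefer the longest (most specific) containing prefix.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    if best is None:
        return None  # no known segment, so nothing to autopopulate against
    net, entry = best
    # Use the description as the segment name, else the prefix itself.
    return entry.get("description") or str(net)
```

An address that falls outside every known prefix returns None, which corresponds to the requirement that segments be defined before devices in them can be autopopulated.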
Although network devices can be automatically created using this method, services should be inventoried manually. The Uninventoried Observed Services visualization in the Zeek Known Summary dashboard can help users review network services to be created in NetBox.
See idaholab/Malcolm#135 for more information on this feature.
When passive device autopopulation is enabled, devices with addresses in private IP space will be autopopulated by default. You can control this behavior using the NETBOX_AUTO_POPULATE_SUBNETS environment variable in ./config/netbox-common.env. This variable accepts a comma-separated list of private CIDR subnets, with the following logic:
- If left blank, all private IPv4 and IPv6 address ranges (as defined in RFC 1918 and RFC 4193) will be autopopulated.
- Use an exclamation point (!) before a CIDR to explicitly exclude that subnet.
- If only exclusions are listed, all private IPs are allowed except those excluded.
- If both inclusions and exclusions are listed:
    - Only addresses matching the allowed subnets will be considered.
    - Among those, any matching excluded subnets will be rejected.
- Network base and broadcast addresses (e.g., .0 and .255) are not considered assignable and will be ignored.
This variable is especially useful for excluding dynamic address ranges such as those used by DHCP, which should generally not trigger autopopulation in NetBox. Since these addresses can change frequently and aren't tied to specific devices, including them could result in inaccurate or noisy inventory data. By fine-tuning which private subnets are included or excluded, users can ensure that only meaningful, typically static assignments are autopopulated.
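The following Python sketch shows one way the single-site rule logic could be evaluated. It is an interpretation, not Malcolm's implementation: it assumes that when an address matches several rules the most specific (longest-prefix) rule decides, a reading chosen because it reproduces the outcomes described in the examples for this variable. The names parse_rules and is_autopopulate_allowed are hypothetical, and the sketch omits the exclusion of network base and broadcast addresses.

```python
import ipaddress

# Private ranges named in the text: RFC 1918 (IPv4) and RFC 4193 (IPv6 ULA).
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7")
]


def parse_rules(rule_text: str):
    """Split 'a,!b,c' into (network, allowed) pairs; '!' marks an exclusion."""
    rules = []
    for token in rule_text.split(","):
        token = token.strip()
        if not token:
            continue
        allowed = not token.startswith("!")
        # strict=False tolerates CIDRs with host bits set, e.g. 10.0.10.0/16.
        net = ipaddress.ip_network(token.lstrip("!"), strict=False)
        rules.append((net, allowed))
    return rules


def is_autopopulate_allowed(ip: str, rule_text: str = "") -> bool:
    """Decide whether ip passes the subnet rules in rule_text."""
    addr = ipaddress.ip_address(ip)
    if not any(addr in net for net in PRIVATE_NETS):
        return False  # only private addresses are candidates at all
    rules = parse_rules(rule_text)
    matches = [(net, allowed) for net, allowed in rules if addr in net]
    if matches:
        # Assumption: the most specific matching rule wins.
        return max(matches, key=lambda m: m[0].prefixlen)[1]
    # No rule matched: allowed only if no inclusion rules were listed.
    return not any(allowed for _, allowed in rules)
```

With an empty rule string every private address passes; with only exclusions, everything private except the excluded subnets passes; once any inclusion is listed, unmatched addresses are rejected.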
Users may wish to apply different CIDR subnet filters for autopopulation within different NetBox sites. To support this, the NETBOX_AUTO_POPULATE_SUBNETS environment variable can accept multiple site-specific entries, each specifying a NetBox site name or numeric site ID, followed by a colon (:), and a comma-separated list of subnet rules (just like the single-site case described above). Multiple site entries should be separated by semicolons (;).
If no matching site-specific rule is found, the default rule — defined using an asterisk (*) as the site key, or by omitting the site name or ID — will be used as a fallback if present. If no fallback is defined, then all private IPs are autopopulated by default.
Examples:
- 192.168.100.0/24
    - Only allow addresses in 192.168.100.0/24
- !172.16.0.0/12
    - Allow all private IPs except 172.16.0.0/12
- !10.0.0.0/8,10.0.10.0/24
    - Exclude 10.0.0.0/8 generally, but allow 10.0.10.0/24 as an override
- 10.0.0.0/8,!10.0.10.0/16,10.0.10.5/32
    - Allow all of 10.0.0.0/8 except 10.0.10.0/16, but still allow 10.0.10.5
- !fc00::/7,fd12:3456:789a:1::/64
    - Exclude all ULA IPv6 ranges, except a specific subnet
- site1:10.0.0.0/8,!10.0.10.0/16,10.0.10.5/32;site2:!172.16.0.0/12;site3:!fc00::/7,fd12:3456:789a:1::/64;!192.168.0.0/16
    - Specify different autopopulation rules for different NetBox sites
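The site-specific format can be parsed with a short Python sketch such as the one below. It is an assumption-laden illustration rather than Malcolm's parser: the names parse_site_rules and rules_for_site are hypothetical, and because IPv6 CIDRs themselves contain colons, the sketch treats an entry as a default rule whenever its first comma-separated item already parses as a CIDR.

```python
import ipaddress


def _is_cidr(text: str) -> bool:
    """True if text (with any leading '!' removed) parses as an IP network."""
    try:
        ipaddress.ip_network(text.lstrip("!"), strict=False)
        return True
    except ValueError:
        return False


def parse_site_rules(value: str) -> dict[str, list[str]]:
    """Map each site key (or '*' for the default) to its list of subnet rules."""
    rules: dict[str, list[str]] = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        first_token = entry.split(",")[0].strip()
        site, sep, rest = entry.partition(":")
        if sep and not _is_cidr(first_token):
            # 'site:rules'; an empty site name is treated as the default.
            key, rule_text = site.strip() or "*", rest
        else:
            # No site prefix (this also covers bare IPv6 rules like '!fc00::/7').
            key, rule_text = "*", entry
        rules[key] = [t.strip() for t in rule_text.split(",") if t.strip()]
    return rules


def rules_for_site(rules: dict[str, list[str]], site: str) -> list[str]:
    """Return the site's rules, else the default rules, else an empty list."""
    return rules.get(site, rules.get("*", []))
```

An empty list from rules_for_site corresponds to the "no fallback defined" case, in which all private IPs are autopopulated. Note that this heuristic cannot distinguish a numeric site ID followed directly by an IPv6 rule (such as 1:fd00::/8) from a bare IPv6 CIDR; a real parser would need a rule for that case.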
Malcolm's NetBox inventory is prepopulated with a collection of community-sourced device type definitions which can then be augmented by users manually or through preloading. During passive autopopulation, the device manufacturer is inferred from organizationally unique identifiers (OUIs), which make up the first three octets of a MAC address. The IEEE Standards Association maintains the registry of OUIs, but the organization names in that registry are not always consistent with the manufacturer names used in NetBox. In other words, there's not a foolproof programmatic way for Malcolm to map MAC address OUI organization names to NetBox manufacturer names, barring creating and maintaining a manual mapping (which would be very large and difficult to keep up-to-date).
Malcolm's [NetBox lookup code]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/logstash/ruby/netbox_enrich.rb) used in the log enrichment pipeline attempts to match OUI organization names against the list of NetBox's manufacturers using "fuzzy string matching", a technique in which two strings of characters are compared and assigned a similarity score between 0 (completely dissimilar) and 1 (identical). The NETBOX_DEFAULT_FUZZY_THRESHOLD environment variable in netbox-common.env can be used to tune the threshold for determining a match. A fairly high value is recommended (above 0.85; 0.95 is the default) to avoid autopopulating the NetBox inventory with devices with manufacturers that don't actually exist in the network being monitored.
Users may select between two behaviors for when the match threshold is not met (i.e., no manufacturer is found in the NetBox database which closely matches the OUI organization name). This behavior is specified by the NETBOX_DEFAULT_AUTOCREATE_MANUFACTURER environment variable in netbox-common.env:
- NETBOX_DEFAULT_AUTOCREATE_MANUFACTURER=false - the autopopulated device will be created with the manufacturer set to Unspecified
- NETBOX_DEFAULT_AUTOCREATE_MANUFACTURER=true - the autopopulated device will be created along with a new manufacturer entry in the NetBox database set to the OUI organization name
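The threshold and fallback behavior can be illustrated with the Python sketch below. It uses difflib.SequenceMatcher from the Python standard library purely as a stand-in similarity measure; the metric used by Malcolm's Ruby lookup code may be different, so scores here are not comparable to Malcolm's. The function names and the greater-than-or-equal comparison against the threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0 (dissimilar) and 1 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_manufacturer(oui_org: str, known: list[str],
                         threshold: float = 0.95,
                         autocreate: bool = False) -> str:
    """Pick the closest known manufacturer, or apply the configured fallback."""
    best_name, best_score = None, 0.0
    for name in known:
        score = similarity(oui_org, name)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name
    # Threshold not met: create a new manufacturer or use the placeholder.
    return oui_org if autocreate else "Unspecified"
```

Lowering the threshold makes near matches such as "Siemens AG" and "Siemens" resolve to the existing entry, at the cost of a higher chance of attributing a device to the wrong manufacturer.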
See idaholab/Malcolm#136.
See idaholab/Malcolm#134.
If Malcolm is using its own embedded NetBox instance, YML files in [./netbox/preload]({{ site.github.repository_url }}/tree/{{ site.github.build_revision }}/netbox/preload/) under the Malcolm installation directory will be preloaded upon startup using the third-party netbox-initializers plugin. Examples illustrating the format of these YML files can be found at its GitHub repository.
If Malcolm is using its own embedded NetBox instance, the NetBox database may be backed up and restored using ./scripts/netbox-backup and ./scripts/netbox-restore, respectively. While Malcolm is running, run the following command from within the Malcolm installation directory to back up the entire NetBox database:
$ ./scripts/netbox-backup
NetBox configuration database saved to ('malcolm_netbox_backup_20230110-133855.gz', 'malcolm_netbox_backup_20230110-133855.media.tar.gz')
To clear the existing NetBox database and restore a previous backup, run the following command (substituting the filename of the netbox_….gz to be restored) from within the Malcolm installation directory while Malcolm is running:
./scripts/netbox-restore --netbox-restore ./malcolm_netbox_backup_20230110-125756.gz
Users who wish to have a prior NetBox database backup (created with netbox-backup as described above) restored automatically on startup may manually copy that .gz file to the ./netbox/preload directory. Upon startup that file will be extracted and used to populate the NetBox database, taking priority over the other preload files. This process does not remove the .gz file from the directory after restoring it; the backup will be restored again on subsequent restarts unless the file is manually removed.
Note that network log enrichment will fail while a restore is in progress (indicated with HTTP/1.1 403 messages in the output of the netbox container in the Malcolm debug logs), but should resume once the restore process has completed.

