Data model
Note: This page describes the main database of entities (IP, ASN, ...). Events/alerts are stored in a separate database (EventDB).
All data about entities are stored in MongoDB. There is a collection for each entity type, containing a record (a JSON document) for each entity.
There are currently these collections/entity types:
| collection | description | data type of _id |
|---|---|---|
| ip | IPv4 address | string (dotted-decimal) |
| asn | Autonomous system | number |
Each record is stored as a dictionary (object); its keys are called attributes.
The identifier of the entity (e.g. IP address or AS number) is stored in the special key _id (used by MongoDB as the primary key).
Each entity record contains at least the following attributes:
| key name | data type | description |
|---|---|---|
| _id | string/number | ID of the entity, i.e. IP address, AS number, etc. |
| ts_added | ISODate (datetime in Python) | Time of record creation |
| ts_last_update | ISODate (datetime in Python) | Time of last update of the record |
Note: All times are always stored in UTC.
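As an illustration, a minimal record carrying only the mandatory attributes could be built as in the sketch below. The helper name `new_record` and the sample address are hypothetical, and the use of timezone-aware `datetime` objects is an assumption made here to keep the UTC requirement explicit.

```python
from datetime import datetime, timezone

def new_record(entity_id):
    """Build a minimal entity record (hypothetical helper, not part of the project)."""
    # Take the current time in UTC, since all times are stored in UTC.
    now = datetime.now(timezone.utc)
    return {
        "_id": entity_id,        # IP address (string) or AS number (number)
        "ts_added": now,         # time of record creation
        "ts_last_update": now,   # equals ts_added until the first update
    }

rec = new_record("192.0.2.1")  # sample address from a documentation range
```

A real update would then set `ts_last_update` to a fresh UTC timestamp while leaving `ts_added` unchanged.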
Attribute names may be hierarchical (using dot-notation), corresponding to nested dictionaries/objects.
In the documentation, attribute names are sometimes prefixed with the entity type they belong to and a colon (:), e.g. ip:geo.ctry or asn:descr.
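The prefixed notation can be split into its parts mechanically. The following is a small sketch; the function `split_attr_name` is a hypothetical helper used only for illustration.

```python
def split_attr_name(name):
    """Split 'ip:geo.ctry' into the entity type and the list of nested keys.

    Hypothetical helper. Returns (None, keys) when no entity-type prefix is present.
    """
    etype, sep, attr = name.partition(":")
    if not sep:
        # No colon found: the whole string is the attribute name.
        return None, name.split(".")
    return etype, attr.split(".")
```

For example, `split_attr_name("ip:geo.ctry")` yields the entity type `"ip"` and the key path `["geo", "ctry"]`.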
Many attributes have hierarchical or otherwise complex values. The possible formats are as follows:
Single value (string/number/bool/null) directly under the main key or a fixed hierarchy of keys and subkeys (the hierarchy is used only to group related keys together).

    <key>: <value>
    <key>: {
        <subkey>: <value>,
        <subkey>: {
            <subkey>: <value>,
            <subkey>: <value>,
        }
    }
The attribute name is then composed by joining the key and subkeys with a dot, e.g. <key>.<subkey>.<subkey>.
Example:

    "hostname": null,
    "geo": {
        "ctry": "CZ",
        "city": "Prague"
    }
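The mapping between nested dictionaries and dotted attribute names can be sketched in Python as follows. Both functions are hypothetical helpers, not part of the project's code. Note that `flatten` treats every nested dictionary as a key hierarchy, so it is only suitable for plain attributes like the ones in the example above.

```python
def get_attr(record, name):
    """Look up a dotted attribute name, e.g. 'geo.ctry'; return None if missing."""
    value = record
    for key in name.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def flatten(record, prefix=""):
    """Map every leaf value to its full dotted attribute name."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Descend into the sub-dictionary, extending the name with a dot.
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

record = {"hostname": None, "geo": {"ctry": "CZ", "city": "Prague"}}
flat = flatten(record)
```

Here `flat` contains the keys `hostname`, `geo.ctry` and `geo.city`, and `get_attr(record, "geo.ctry")` returns `"CZ"`.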
If a value needs a confidence to be assigned, it's stored as follows:

    <key>: {"v": <value>, "c": <confidence>}

Confidence is a real number between 0.0 and 1.0 (1.0 means 100% confidence). Confidence is optional; if it's not present, 1.0 is assumed.
If several values of the attribute are possible, each with its own confidence, an array may be used:

    <key>: [
        {"v": <value1>, "c": <confidence_of_value1>},
        {"v": <value2>, "c": <confidence_of_value2>},
        ...
    ]
Each particular attribute should always use the same variant (i.e. with or without the array, labeled as list+conf or conf) for all entities.
The ".v" and ".c" are not considered part of the attribute name; the name is composed only of the <key> (and possible subkeys).
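Code reading such attributes may want to handle the conf and list+conf variants uniformly. The sketch below is one possible way to do so; the helper names and the sample values are hypothetical and serve only to illustrate the default confidence of 1.0.

```python
def normalize_conf(value):
    """Return a list of (value, confidence) pairs for a conf or list+conf attribute."""
    # A single {"v": ..., "c": ...} object is treated as a one-element list.
    items = value if isinstance(value, list) else [value]
    # A missing "c" key means a confidence of 1.0.
    return [(item["v"], item.get("c", 1.0)) for item in items]

def best_value(value):
    """Return the value with the highest confidence, or None for an empty list."""
    pairs = normalize_conf(value)
    if not pairs:
        return None
    return max(pairs, key=lambda pair: pair[1])[0]

single = {"v": "CZ"}                                   # conf variant, no "c" given
multi = [{"v": "a", "c": 0.7}, {"v": "b", "c": 0.2}]   # list+conf variant
```

With these inputs, `normalize_conf(single)` gives one pair with confidence 1.0, and `best_value(multi)` returns `"a"`.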
Example: TBD
TBD
Some attributes, like ip:events or ip:bl (blacklists), use their own special format. Formats of particular attributes are described in Attributes.