Data model
Note: This page describes the main database of entities (IP, ASN, ...). Events/alerts are stored in a separate database (EventDB).
All data about entities are stored in MongoDB. There is a collection for each entity type, containing a record (a JSON document) for each entity.
There are currently these collections/entity types:
| collection | description | data type of _id |
|---|---|---|
| ip | IPv4 address | string (dotted-decimal) |
| asn | Autonomous system | number |
Each record is stored as a dictionary (object); its keys are called attributes.
Identification of the entity (e.g. IP address or AS number) is stored in a special key _id (used by MongoDB as primary key).
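The `_id` conventions from the table above can be illustrated with a minimal sketch. The helper name `make_entity_id` is hypothetical and not part of the system described here; it only assumes the two entity types currently listed.

```python
import ipaddress


def make_entity_id(entity_type, raw):
    """Return a value usable as _id for the given entity type.

    Hypothetical helper, for illustration only.
    """
    if entity_type == "ip":
        # IPv4 addresses are identified by their dotted-decimal string;
        # ipaddress raises ValueError for anything that is not valid IPv4.
        return str(ipaddress.IPv4Address(raw))
    if entity_type == "asn":
        # Autonomous systems are identified by their number.
        return int(raw)
    raise ValueError(f"unknown entity type: {entity_type!r}")
```

For example, `make_entity_id("asn", "2852")` yields the number `2852`, so the value has the data type the `asn` collection expects for its primary key.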
Each entity record contains at least the following attributes:
| key name | data type | description |
|---|---|---|
| _id | string/number | ID of the entity, i.e. IP address, AS number, etc. |
| ts_added | ISODate (datetime in Python) | Time of record creation |
| ts_last_update | ISODate (datetime in Python) | Time of last update of the record |
Note: All times are always stored in UTC.
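As a rough sketch of the mandatory attributes, a new record could be built in Python as shown below. The function names `new_entity_record` and `touch` are hypothetical. The sketch uses timezone-aware `datetime` objects to make the UTC requirement explicit, which is an assumption; this page does not say whether the real code uses aware or naive values.

```python
from datetime import datetime, timezone


def new_entity_record(entity_id):
    """Build a minimal entity record (hypothetical helper)."""
    # All times are stored in UTC.
    now = datetime.now(timezone.utc)
    return {
        "_id": entity_id,        # IP address, AS number, etc.
        "ts_added": now,         # time of record creation
        "ts_last_update": now,   # equals ts_added until the first update
    }


def touch(record):
    """Refresh ts_last_update after the record has been modified."""
    record["ts_last_update"] = datetime.now(timezone.utc)
    return record
```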
Attribute names may be hierarchical (using dot-notation), corresponding to nested dictionaries/objects.
In the documentation, attribute names are sometimes written prefixed with the entity type they are used for and a colon (:), e.g. ip:geo.ctry or asn:descr.
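The prefixed notation can be taken apart mechanically. The following is only a sketch; `split_qualified_name` is a hypothetical helper, not a function of the described system.

```python
def split_qualified_name(name):
    """Split 'ip:geo.ctry' into ('ip', ['geo', 'ctry']).

    Hypothetical helper, for illustration only.
    """
    entity_type, sep, attr = name.partition(":")
    if not sep or not entity_type or not attr:
        raise ValueError(f"expected '<entity type>:<attribute>', got {name!r}")
    # The attribute part may itself be hierarchical (dot-notation).
    return entity_type, attr.split(".")
```

With this helper, `split_qualified_name("asn:descr")` returns `("asn", ["descr"])`.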
Many attributes have hierarchical or otherwise complex values. The format of attributes is as follows:
Single value (string/number/bool/null) directly under the main key or a fixed hierarchy of keys and subkeys (the hierarchy is used only to group related keys together).
<key>: <value>
<key>: {
    <subkey>: <value>,
    <subkey>: {
        <subkey>: <value>,
        <subkey>: <value>,
    }
}
The attribute name is then composed by joining the key and subkeys with a dot, e.g. <key>.<subkey>.<subkey>.
Example:
"hostname": null,
"geo": {
    "ctry": "CZ",
    "city": "Prague"
}
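To show how a dotted attribute name maps onto nested dictionaries, here is a minimal sketch of a lookup function. The name `get_attr` and the sample `_id` are assumptions made for the illustration.

```python
def get_attr(record, name, default=None):
    """Look up a dot-notation attribute name in a nested record.

    Hypothetical helper, for illustration only.
    """
    current = record
    for key in name.split("."):
        # Descend one level per name component; stop if the path is missing.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


record = {
    "_id": "192.0.2.1",  # sample address, for illustration only
    "hostname": None,
    "geo": {"ctry": "CZ", "city": "Prague"},
}
```

Here `get_attr(record, "geo.ctry")` returns `"CZ"`. Because `hostname` is stored as null, `get_attr(record, "hostname")` returns `None`, which this simple version cannot distinguish from a missing attribute unless a different `default` is passed.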
If a value needs a confidence to be assigned, it's stored as follows:
<key>: {"v": <value>, "c": <confidence>}
Confidence is a real number between 0.0 and 1.0 (1.0 means 100% confidence). Confidence is optional; if it's not present, 1.0 is assumed.
If more values of the attribute are possible, each with a different confidence, an array may be used:
<key>: [
    {"v": <value1>, "c": <confidence_of_value1>},
    {"v": <value2>, "c": <confidence_of_value2>},
    ...
]
Each particular attribute should always use the same variant (i.e. with or without the array) for all entities.
The ".v" and ".c" are not considered part of the attribute name, which is composed only of the <key> (and possible subkeys).
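The two confidence variants can be read uniformly, as the following sketch suggests. The helpers `normalize` and `most_confident` are hypothetical. The sketch assumes that every item is a dictionary with a "v" key and that a missing "c" means 1.0, as stated above.

```python
def normalize(raw):
    """Return a list of (value, confidence) pairs.

    Accepts both the single-object and the array variant.
    Hypothetical helper, for illustration only.
    """
    items = raw if isinstance(raw, list) else [raw]
    # A missing "c" means full confidence (1.0).
    return [(item["v"], item.get("c", 1.0)) for item in items]


def most_confident(raw):
    """Return the value with the highest confidence."""
    return max(normalize(raw), key=lambda pair: pair[1])[0]
```

For instance, `normalize({"v": "CZ"})` gives `[("CZ", 1.0)]`, and for the array `[{"v": "a", "c": 0.7}, {"v": "b", "c": 0.2}]` the function `most_confident` picks `"a"`.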
Example: TBD
TBD
Some attributes, like ip:events or ip:bl (blacklists), use their own special format. Formats of particular attributes are described in Attributes.