Data model

Václav Bartoš edited this page Jul 31, 2017 · 7 revisions

Note: This page describes the main database of entities (IP, ASN, ...). Events/alerts are stored in a separate database (EventDB).

Organization of the database

All data about entities are stored in MongoDB. There is a collection for each entity type, containing a record (a JSON document) for each entity.

There are currently these collections/entity types:

collection   description                data type of _id keys
ip           IPv4 address               string (dotted-decimal)
asn          Autonomous system number

Records and attributes

Each record is stored as a dictionary (object); its keys are called attributes.

The identifier of the entity (e.g. IP address or AS number) is stored under the special key _id (used by MongoDB as the primary key).

Each entity record contains at least the following attributes:

key name         data type                      description
_id              string/number                  ID of the entity, e.g. IP address or AS number
ts_added         ISODate (datetime in Python)   Time of record creation
ts_last_update   ISODate (datetime in Python)   Time of last update of the record

Note: All times are stored in UTC.
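
As an illustration, a minimal record with the mandatory attributes might be built in Python as sketched below. The helper name new_record is hypothetical and not part of the described system, and setting ts_last_update equal to ts_added at creation is an assumption.

```python
from datetime import datetime, timezone

def new_record(entity_id):
    """Build a minimal entity record (hypothetical helper, for illustration only)."""
    now = datetime.now(timezone.utc)  # all times are stored in UTC
    return {
        "_id": entity_id,        # e.g. IP address or AS number
        "ts_added": now,         # time of record creation
        "ts_last_update": now,   # assumption: same as ts_added for a new record
    }

record = new_record("192.0.2.1")  # documentation-range address, used as sample data
```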

Attribute names may be hierarchical (using dot-notation), corresponding to nested dictionaries/objects.

In the documentation, attribute names are sometimes prefixed with the entity type they belong to and a colon (:), e.g. ip:geo.ctry or asn:descr.
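
A small sketch of how such a prefixed name could be split into its entity type and the chain of keys. The function split_attr_name is a hypothetical helper, and returning None for a name without a prefix is an arbitrary choice made for this example.

```python
def split_attr_name(name):
    """Split 'ip:geo.ctry' into ('ip', ['geo', 'ctry']) (hypothetical helper)."""
    etype, sep, attr = name.partition(":")
    if not sep:  # no colon, so the name carries no entity-type prefix
        etype, attr = None, name
    return etype, attr.split(".")
```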

Generic formats of attribute values

Many attributes have hierarchical or otherwise complex values. The generic value formats are as follows:

Simple values (plain)

A single value (string/number/bool/null) stored directly under the main key, or under a fixed hierarchy of keys and subkeys (the hierarchy only groups related keys together).

<key>: <value>
<key>: {
    <subkey>: <value>,
    <subkey>: {
        <subkey>: <value>,
        <subkey>: <value>
    }
}

The attribute name is then composed by joining the key and subkeys with a dot, e.g. <key>.<subkey>.<subkey>.

Example:

"hostname": null,
"geo": {
    "ctry": "CZ",
    "city": "Prague"
}
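
The dotted naming can be illustrated with a short Python sketch that resolves an attribute name against a nested record. The function get_attr is a hypothetical helper written for this page; it is not taken from the project's code.

```python
def get_attr(record, name, default=None):
    """Look up a dotted attribute name in a nested record (illustrative helper)."""
    node = record
    for key in name.split("."):
        # Stop as soon as the path leaves the dictionary hierarchy
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

record = {"hostname": None, "geo": {"ctry": "CZ", "city": "Prague"}}
print(get_attr(record, "geo.ctry"))  # CZ
```

Note that an attribute stored as null (like hostname here) and a missing attribute both yield None unless a different default is passed.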

Values with confidence (conf, list+conf)

If a value needs a confidence to be assigned, it's stored as follows:

<key>: {"v": <value>, "c": <confidence>}

Confidence is a real number between 0.0 and 1.0 (1.0 means 100% confidence). Confidence is optional; if it's not present, 1.0 is assumed.

If an attribute can have several values, each with a different confidence, an array may be used:

<key>: [
    {"v": <value1>, "c": <confidence_of_value1>},
    {"v": <value2>, "c": <confidence_of_value2>},
    ...
]

Each particular attribute should always use the same variant (i.e. with or without the array, labeled as list+conf or conf) for all entities.

The ".v" and ".c" are not considered part of the attribute name, which is composed only of the <key> (and possible subkeys).
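
Code reading such attributes has to handle both variants and the optional confidence. A minimal sketch follows, assuming the stored value is either a single {"v": ..., "c": ...} object or a list of them; the function values_with_conf and the sample values are hypothetical.

```python
def values_with_conf(value):
    """Return a list of (value, confidence) pairs for a conf or list+conf attribute."""
    items = value if isinstance(value, list) else [value]
    # A missing "c" key means full confidence (1.0)
    return [(item["v"], item.get("c", 1.0)) for item in items]

single = {"v": "example.org"}                          # conf variant, confidence omitted
multi = [{"v": "A", "c": 0.8}, {"v": "B", "c": 0.3}]   # list+conf variant (made-up data)
```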

Example: TBD

Tags

TBD

Special formats

Some attributes, like ip:events or ip:bl (blacklists), use their own special format. Formats of particular attributes are described in Attributes.
