
Data model

Václav Bartoš edited this page Mar 9, 2017 · 7 revisions

Note: This page describes the main database of entities (IP, ASN, ...). Events/alerts are stored in a separate database (EventDB).

Organization of database

All data about entities are stored in MongoDB. There is a collection for each entity type, containing a record (a JSON document) for each entity.

There are currently these collections/entity types:

collection | description | data type of _id keys
ip | IPv4 address | string (dotted-decimal)
asn | Autonomous system number |

Records and attributes

Each record is stored as a dictionary (object); its keys are called attributes.

The identifier of the entity (e.g. IP address or AS number) is stored in a special key _id (used by MongoDB as the primary key).

Each entity record contains at least the following attributes:

key name | data type | description
_id | string/number | ID of the entity, i.e. IP address, AS number, etc.
ts_added | ISODate (datetime in Python) | Time of record creation
ts_last_update | ISODate (datetime in Python) | Time of last update of the record

Note: All times are always stored in UTC.
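As an illustration of the mandatory attributes, the following is a minimal sketch of how such a record could be built in Python. The helper names `new_record` and `touch` are hypothetical and not part of the project; the sketch only assumes that timestamps are created as timezone-aware UTC datetimes.

```python
from datetime import datetime, timezone


def new_record(entity_id, now=None):
    """Build a minimal entity record (hypothetical helper, for illustration)."""
    if now is None:
        # Timestamps are always UTC, as required by the data model.
        now = datetime.now(timezone.utc)
    return {
        "_id": entity_id,       # IP address, AS number, etc.
        "ts_added": now,        # time of record creation
        "ts_last_update": now,  # equals ts_added until the first update
    }


def touch(record, now=None):
    """Set ts_last_update to the given time, or to the current UTC time."""
    record["ts_last_update"] = now if now is not None else datetime.now(timezone.utc)
    return record


rec = new_record("192.0.2.1")
```

Passing `now` explicitly keeps the helpers deterministic, which may be convenient in tests.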

Attribute names may be hierarchical (using dot-notation), corresponding to nested dictionaries/objects.

In the documentation, attribute names are sometimes written prefixed with the entity type they are used for and a colon (:), e.g. ip:geo.ctry or asn:descr.
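The naming conventions above can be sketched in Python as follows. Both function names (`split_qualified`, `get_attr`) are hypothetical helpers introduced only for this illustration; they operate on a record already loaded as a plain dictionary.

```python
def split_qualified(name):
    """Split 'ip:geo.ctry' into ('ip', 'geo.ctry').

    The entity type is None when the name carries no prefix.
    """
    etype, sep, attr = name.partition(":")
    if not sep:
        return None, name
    return etype, attr


def get_attr(record, attr, default=None):
    """Follow a dot-separated attribute name through nested dictionaries."""
    node = record
    for key in attr.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


record = {"_id": "192.0.2.1", "geo": {"ctry": "CZ"}}
etype, attr = split_qualified("ip:geo.ctry")
print(etype, get_attr(record, attr))  # ip CZ
```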

Generic formats of attribute values

Many attributes have hierarchical or otherwise complex values. The generic formats of attribute values are as follows:

Simple values

A single value (string/number/bool/null) stored directly under the main key, or under a fixed hierarchy of keys and subkeys (the hierarchy is used only to group related keys together).

<key>: <value>
<key>: {
    <subkey>: <value>,
    <subkey>: {
        <subkey>: <value>,
        <subkey>: <value>,
    }
}

The attribute name is then composed by joining the key and subkeys with a dot, e.g. <key>.<subkey>.<subkey>.

Example:

"hostname": null,
"geo": {
    "ctry": "CZ",
    "city": "Prague"
}
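To make the naming rule concrete, here is a small sketch that lists the attribute names of the example record above by joining keys and subkeys with dots. The function `attribute_names` is a hypothetical helper. It treats every nested dictionary as a pure grouping hierarchy, so it applies only to simple values, not to the formats described in the following sections.

```python
def attribute_names(record, prefix=""):
    """Return dot-joined attribute names for a record of simple values."""
    names = []
    for key, value in record.items():
        full = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # A nested dictionary only groups related keys; descend into it.
            names.extend(attribute_names(value, full))
        else:
            names.append(full)
    return names


record = {"hostname": None, "geo": {"ctry": "CZ", "city": "Prague"}}
print(attribute_names(record))  # ['hostname', 'geo.ctry', 'geo.city']
```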

Values with confidence

If a value needs a confidence to be assigned, it's stored as follows:

<key>: {"v": <value>, "c": <confidence>}

Confidence is a real number between 0.0 and 1.0 (1.0 means 100% confidence). Confidence is optional; if it's not present, 1.0 is assumed.

If more values of the attribute are possible, each with a different confidence, an array may be used:

<key>: [
    {"v": <value1>, "c": <confidence_of_value1>},
    {"v": <value2>, "c": <confidence_of_value2>},
    ...
]

Each particular attribute should always use the same variant (i.e. with or without the array) for all entities.

The ".v" and ".c" are not considered part of the attribute name; the name is composed only of the <key> (and possible subkeys).
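A reader of such attributes has to handle both variants and the default confidence. The following is a minimal sketch of one way to do that; the helper names `values_with_confidence` and `most_confident`, as well as the sample values, are hypothetical and chosen only for illustration.

```python
def values_with_confidence(attr_value):
    """Normalize either variant to a list of (value, confidence) pairs.

    A missing "c" key is treated as confidence 1.0.
    """
    items = attr_value if isinstance(attr_value, list) else [attr_value]
    return [(item["v"], item.get("c", 1.0)) for item in items]


def most_confident(attr_value):
    """Return the value with the highest confidence, or None if there is none."""
    pairs = values_with_confidence(attr_value)
    if not pairs:
        return None
    return max(pairs, key=lambda pair: pair[1])[0]


single = {"v": "CZ"}                                    # confidence omitted
multi = [{"v": "a", "c": 0.7}, {"v": "b", "c": 0.2}]    # array variant
print(values_with_confidence(single))  # [('CZ', 1.0)]
print(most_confident(multi))           # a
```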

Example: TBD

Tags

TBD

Special formats

Some attributes, like ip:events or ip:bl (blacklists), use their own special format. Formats of particular attributes are described in Attributes.
