Skip to content

Latest commit

 

History

History
97 lines (77 loc) · 2.9 KB

File metadata and controls

97 lines (77 loc) · 2.9 KB

Indexes

Table indexes are data structures that allow fast lookups by an indexed attribute or combination of attributes.

In DataJoint, indexes are created by one of the three mechanisms:

  1. Primary key
  2. Foreign key
  3. Explicitly defined indexes

The first two mechanisms are obligatory. Every table has a primary key, which serves as an unique index. Therefore, restrictions by a primary key are very fast. Foreign keys create additional indexes unless a suitable index already exists.

Indexes for single primary key tables

Let’s say a mouse in the lab has a lab-specific ID but it also has a separate id issued by the animal facility.

@schema
class Mouse(dj.Manual):
    definition = """
    mouse_id : int  # lab-specific ID
    ---
    tag_id : int  # animal facility ID
    """

In this case, searching for a mouse by mouse_id is much faster than by tag_id because mouse_id is a primary key, and is therefore indexed.

To make searches faster on fields other than the primary key or a foreign key, you can add a secondary index explicitly.

Regular indexes are declared as index(attr1, ..., attrN) on a separate line anywhere in the table declaration (below the primary key divide).

Indexes can be declared with unique constraint as unique index (attr1, ..., attrN).

Let’s redeclare the table with a unique index on tag_id.

@schema
class Mouse(dj.Manual):
    definition = """
    mouse_id : int  # lab-specific ID
    ---
    tag_id : int  # animal facility ID
    unique index (tag_id)
    """

Now, searches with mouse_id and tag_id are similarly fast.

Indexes for tables with multiple primary keys

Let’s now imagine that rats in a lab are identified by the combination of lab_name and rat_id in a table Rat.

@schema
class Rat(dj.Manual):
    definition = """
    lab_name : char(16)
    rat_id : int unsigned # lab-specific ID
    ---
    date_of_birth = null : date
    """

Note that despite the fact that rat_id is in the index, searches by rat_id alone are not helped by the index because it is not first in the index. This is similar to searching for a word in a dictionary that orders words alphabetically. Searching by the first letters of a word is easy but searching by the last few letters of a word requires scanning the whole dictionary.

In this table, the primary key is a unique index on the combination (lab_name, rat_id). Therefore searches on these attributes or on lab_name alone are fast. But this index cannot help searches on rat_id alone. Similarly, searing by date_of_birth requires a full-table scan and is inefficient.

To speed up searches by the rat_id and date_of_birth, we can explicit indexes to Rat:

@schema
class Rat2(dj.Manual):
    definition = """
    lab_name : char(16)
    rat_id : int unsigned # lab-specific ID
    ---
    date_of_birth = null : date

    index(rat_id)
    index(date_of_birth)
    """