| title | Dolt System Tables |
|---|
dolt_branches contains information about branches known to the database.
Because the branch information is global to all clients, not just your
session, dolt_branches system table is read-only. Branches can be created
or deleted with the DOLT_BRANCH() stored procedure.
+------------------------+----------+
| Field | Type |
+------------------------+----------+
| name | TEXT |
| hash | TEXT |
| latest_committer | TEXT |
| latest_committer_email | TEXT |
| latest_commit_date | DATETIME |
| latest_commit_message | TEXT |
| remote | TEXT |
| branch | TEXT |
| dirty | BOOLEAN |
| latest_author | TEXT |
| latest_author_email | TEXT |
| latest_author_date | DATETIME |
+------------------------+----------+
The latest_author* columns reflect the original author of the branch's HEAD commit, which can differ from the committer. See commit identity variables.
Get all the branches.
{% embed url="https://www.dolthub.com/repositories/dolthub/first-hour-db/embed/main?q=select+*+from+dolt_branches%3B" %}
remote and branch show the remote host and branch that each branch is tracking.
dirty is TRUE when there are uncommitted changed on the branch.
To find the current active branch use select active_branch().
dolt_branches only contains information about local branches. For
branches on a remote you have fetched, see
dolt_remote_branches.
dolt_remote_branches contains information about branches on remotes
you have fetched. It has a similar schema as dolt_branches, but the remote, branch, and dirty columns
don't make sense in this context and are not included. Only remote branches are included in this table.
+------------------------+----------+
| Field | Type |
+------------------------+----------+
| name | TEXT |
| hash | TEXT |
| latest_committer | TEXT |
| latest_committer_email | TEXT |
| latest_commit_date | DATETIME |
| latest_commit_message | TEXT |
| latest_author | TEXT |
| latest_author_email | TEXT |
| latest_author_date | DATETIME |
+------------------------+----------+
Get all local and remote branches in a single query. Remote branches
will have the prefix remotes/<remoteName> in their names.
SELECT *
FROM dolt_branches
UNION
SELECT * FROM dolt_remote_branches;+-----------------+----------------------------------+------------------+------------------------+-------------------------+----------------------------+
| name | hash | latest_committer | latest_committer_email | latest_commit_date | latest_commit_message |
+-----------------+----------------------------------+------------------+------------------------+-------------------------+----------------------------+
| main | r3flrdqk73lkcrugtbohcdbb3hmr2bev | Zach Musgrave | zach@dolthub.com | 2023-02-01 18:59:55.156 | Initialize data repository |
| remotes/rem1/b1 | r3flrdqk73lkcrugtbohcdbb3hmr2bev | Zach Musgrave | zach@dolthub.com | 2023-02-01 18:59:55.156 | Initialize data repository |
+-----------------+----------------------------------+------------------+------------------------+-------------------------+----------------------------+
dolt_docs stores the contents of Dolt docs (LICENSE.md,
README.md).
You can modify the contents of these files via SQL, but you are not guaranteed to see these changes reflected in the files on disk.
+----------+------+
| field | type |
+----------+------+
| doc_name | text |
| doc_text | text |
+----------+------+
Gets all docs.
{% embed url="https://www.dolthub.com/repositories/dolthub/first-hour-db/embed/main?q=select+*+from+dolt_docs%3B" %}
dolt_procedures stores each stored procedure that has been created
on the database.
The values in this table are implementation details associated with the storage of stored procedures. It is recommended to use built-in SQL statements for examining and modifying stored procedures rather than using this table directly.
+-------------+----------+
| field | type |
+-------------+----------+
| name | longtext |
| create_stmt | longtext |
| created_at | datetime |
| modified_at | datetime |
| sql_mode | longtext |
+-------------+----------+
When using the standard CREATE PROCEDURE workflow, the name column
will always be lowercase. Dolt assumes that name is always lowercase
as a result, and manually inserting a stored procedure must also have
a lowercase name. Otherwise, it will be invisible to some
operations, such as DROP PROCEDURE.
CREATE PROCEDURE simple_proc1(x DOUBLE, y DOUBLE) SELECT x*y;
CREATE PROCEDURE simple_proc2() SELECT name FROM category;{% embed url="https://www.dolthub.com/repositories/dolthub/first-hour-db/embed/main?q=SELECT+*+FROM+dolt_procedures%3B" %}
The dolt_query_catalog system table stores named queries for your database.
Like all data stored in Dolt, these named queries are versioned alongside your data, so
after you create, modify, or remove a named query, you'll need to commit that change to save it.
You can use the Dolt CLI to save and execute named queries or you can use the
dolt_query_catalog system table directly to add, modify, or delete named queries.
All named queries are displayed in the Queries tab of your database on
DoltHub.
+---------------+-----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------+------+-----+---------+-------+
| id | varchar(16383) | NO | PRI | | |
| display_order | bigint unsigned | NO | | | |
| name | varchar(16383) | YES | | | |
| query | varchar(16383) | YES | | | |
| description | varchar(16383) | YES | | | |
+---------------+-----------------+------+-----+---------+-------+
Using the dolthub/docs_examples from DoltHub as an example, you can create a
named query using the CLI, or by directly inserting into the dolt_query_catalog table.
> dolt sql -q "select * from tablename" -s "select all" -m "Query to select all records from tablename"After creating a named query, you can view it in the dolt_query_catalog table:
{% embed url="https://www.dolthub.com/repositories/dolthub/docs_examples/embed/main?q=select+*+from+dolt_query_catalog%3B" %}
Then you can use the dolt CLI to execute it:
> dolt sql -x "Large Irises"
Executing saved query 'Large Irises':
select distinct(class) from classified_measurements where petal_length_cm > 5
+------------+
| class) |
+------------+
| versicolor |
| virginica |
+------------+Last, but not least, if you want to persist that named query, be sure to commit your change to the
dolt_query_catalog table.
dolt add dolt_query_catalog
dolt commit -m "Adding new named query"dolt_remotes returns the remote subcontents of the repo_state.json, similar
to running dolt remote -v from the command line.
The dolt_remotes table is currently read only. Use the CLI dolt remote functions or dolt_remote() procedure
to add, update or delete remotes.
+-------------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------+------+-----+---------+-------+
| name | text | NO | PRI | | |
| url | text | NO | | | |
| fetch_specs | json | YES | | | |
| params | json | YES | | | |
+-------------+------+------+-----+---------+-------+
SELECT *
FROM dolt_remotes
WHERE name = 'origin';+--------+-----------------------------------------+--------------------------------------+--------+
| name | url | fetch_specs | params |
+--------+-----------------------------------------+--------------------------------------+--------+
| origin | file:///go/github.com/dolthub/dolt/rem1 | [refs/heads/*:refs/remotes/origin/*] | map[] |
+--------+-----------------------------------------+--------------------------------------+--------+
dolt_backups returns the backup contents of the repo_state.json, similar to running dolt backup -v from the command line.
The dolt_backups table is currently read-only. Use the dolt_backup() procedure to add, update, or delete backups.
+-------------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------+------+-----+---------+-------+
| name | text | NO | PRI | | |
| url | text | NO | | | |
+-------------+------+------+-----+---------+-------+
sql SELECT * FROM dolt_backups;
+-------------+----------------------------------------+
| name | url |
+-------------+----------------------------------------+
| backup-west | aws://[ddb-westtable:s3bucket-west]/db |
| backup-east | aws://[ddb-easttable:s3bucket-east]/db |
| backup-local| file:///path/to/local/backup |
+-------------+----------------------------------------+
dolt_schemas stores SQL schema fragments for a dolt database that
are versioned alongside the database itself. Certain DDL statements
will modify this table and the value of this table in a SQL session
will affect what database entities exist in the session.
The values in this table are implementation details associated with the storage of certain schema elements. It is recommended to use built-in SQL statements for examining and modifying schemas, rather than using this table directly.
+----------+----------------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------------------------+------+-----+---------+-------+
| type | varchar(64) COLLATE utf8mb4_0900_ai_ci | NO | PRI | NULL | |
| name | varchar(64) COLLATE utf8mb4_0900_ai_ci | NO | PRI | NULL | |
| fragment | longtext | YES | | NULL | |
| extra | json | YES | | NULL | |
+----------+----------------------------------------+------+-----+---------+-------+
Currently, all VIEW, TRIGGER and EVENT definitions are stored in the dolt_schemas table.
The column type defines whether the fragment is view, trigger or event.
The column name is the fragment name as supplied in the CREATE statement.
The column fragment stores the CREATE statement of the fragment. The column
json is any additional important information such as CreateAt field
for the fragment.
The values in this table are partly implementation details associated with the implementation of the underlying database objects.
CREATE VIEW four AS SELECT 2+2 FROM dual;
CREATE TABLE mytable (x INT PRIMARY KEY);
CREATE TRIGGER inc_insert BEFORE INSERT ON mytable FOR EACH ROW SET NEW.x = NEW.x + 1;
CREATE EVENT monthly_gc ON SCHEDULE EVERY 1 MONTH DO CALL DOLT_GC();Then you can view them in dolt_schemas:
{% embed url="https://www.dolthub.com/repositories/dolthub/docs_examples/embed/main?q=select+*+from+dolt_schemas%3B" %}
dolt_tags shows information for all active tags in the current database.
DOLT_TAG() procedure can be used to INSERT and DELETE tags on the dolt_tags table.
+----------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------+------+-----+---------+-------+
| tag_name | text | NO | PRI | NULL | |
| tag_hash | text | NO | PRI | NULL | |
| tagger | text | NO | | NULL | |
| email | text | NO | | NULL | |
| date | datetime | NO | | NULL | |
| message | text | NO | | NULL | |
+----------+----------+------+-----+---------+-------+
Create a tag using dolt_tag() stored procedure.
CALL DOLT_TAG('_migrationtest','head','-m','savepoint for migration testing');+--------+
| status |
+--------+
| 0 |
+--------+
Get all the tags.
{% embed url="https://www.dolthub.com/repositories/dolthub/first-hour-db/embed/main?q=SELECT+*+FROM+dolt_tags%3B" %}
dolt_branch_activity provides insights into branch usage within the lifetime of a running dolt sql-server. This table helps administrators identify stale or unused branches. The data is global to the server, and it not related to the persisted data in the database. For the reason, the system_start_time column will always be the start time of your server, and the last_read and last_write columns should be considered in relation to that time. Activity for branches which have been deleted will not be included in this table.
For performance reasons, this table is only available if you opt-in to enable it. See server configuration for details.
+-------------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+----------+------+-----+---------+-------+
| branch | text | NO | PRI | NULL | |
| last_read | datetime | YES | | NULL | |
| last_write | datetime | YES | | NULL | |
| active_sessions | int | NO | | NULL | |
| system_start_time | datetime | NO | | NULL | |
+-------------------+----------+------+-----+---------+-------+
branch: The name of the branch. All current branches are listed, even if unused.last_read: The last time the branch was accessed in a query or checked out.NULLindicates no reads during the server's lifetime.last_write: The last time the branch was updated or committed to.NULLindicates no writes during the server's lifetime.active_sessions: The number of active server connections to the branch.system_start_time: The server's start time. Alllast_readandlast_writetimestamps occur after this time.
Find branches with no active sessions and no activity in the last 7 days:
SELECT *
FROM dolt_branch_activity
WHERE active_sessions = 0
AND system_start_time < NOW() - INTERVAL 7 DAY
AND (last_read IS NULL OR last_read < NOW() - INTERVAL 7 DAY)
AND (last_write IS NULL OR last_write < NOW() - INTERVAL 7 DAY);dolt_statistics includes currently collected
database statistics. This information is stored outside of the
commit graph and is not subject to versioning semantics.
+-----------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+----------+------+-----+---------+-------+
| database_name | text | NO | PRI | NULL | |
| table_name | text | NO | PRI | NULL | |
| index_name | text | NO | PRI | NULL | |
| row_count | bigint | NO | | NULL | |
| distinct_count | bigint | NO | | NULL | |
| null_count | bigint | NO | | NULL | |
| columns | text | NO | | NULL | |
| types | text | NO | | NULL | |
| upper_bound | text | NO | | NULL | |
| upper_bound_cnt | bigint | NO | | NULL | |
| created_at | datetime | NO | | NULL | |
| mcv1 | text | NO | | NULL | |
| mcv2 | text | NO | | NULL | |
| mcv3 | text | NO | | NULL | |
| mcv4 | text | NO | | NULL | |
| mcvCounts | text | NO | | NULL | |
+-----------------+----------+------+-----+---------+-------+For every user table that has a primary key, there is a queryable system view named dolt_blame_$tablename
which can be queried to see the user and commit responsible for the current value of each row.
This is equivalent to the dolt blame CLI command.
Tables without primary keys will not have an associated dolt_blame_$tablename.
The dolt_blame_$tablename system view has the following columns:
+-------------------+----------+
| field | type |
+-------------------+----------+
| commit | text |
| commit_date | datetime |
| committer | text |
| email | text |
| message | text |
| primary key cols | |
+-------------------+----------+
The remaining columns are dependent on the schema of the user table.
Every column from the primary key of your table will be included in the dolt_blame_$tablename system table.
Executing a SELECT * query for a dolt_blame_$tablename system view will show you the primary key columns
for every row in the underlying user table and the commit metadata for the last commit that modified that row.
Note that if a table has any uncommitted changes in the working set,
those will not be displayed in the dolt_blame_$tablename system view.
dolt_blame_$tablename is only available for tables with a primary key.
Attempting to query dolt_blame_$tablename for a table without a primary key will return an error message.
Consider the following example table city:
{% embed url="https://www.dolthub.com/repositories/dolthub/first-hour-db/embed/main?q=describe+city%3B" %}
To find who set the current values, we can query the dolt_blame_city table:
{% embed url="https://www.dolthub.com/repositories/dolthub/first-hour-db/embed/main?q=select+*+from+dolt_blame_city+limit+20%3B" %}
The dolt_commit_ancestors table records the ancestors for every commit in the database. Each commit has one or two
ancestors, two in the case of a merge commit.
Each commit hash has one or two entries in the table, depending on whether it has one or two parent commits. The root
commit of the database has a NULL parent. For merge commits, the merge base will have parent_index 0, and the commit
merged will have parent_index 1.
+--------------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------+------+-----+---------+-------+
| commit_hash | text | NO | PRI | | |
| parent_hash | text | NO | PRI | | |
| parent_index | int | NO | PRI | | |
+--------------+------+------+-----+---------+-------+
The dolt_commits system table shows ALL commits in a Dolt database.
This is similar, but different from the dolt_log system table
and the dolt log CLI command.
dolt log shows you commit history for all commit ancestors reachable from the current HEAD of the
checked out branch, whereas dolt_commits shows all commits from the entire database, no matter which branch is checked out.
> describe dolt_commits;
+--------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+----------+------+-----+---------+-------+
| commit_hash | text | NO | PRI | | |
| committer | text | NO | | | |
| email | text | NO | | | |
| date | datetime | NO | | | |
| message | text | NO | | | |
| author | text | NO | | | |
| author_email | text | NO | | | |
| author_date | datetime | NO | | | |
+--------------+----------+------+-----+---------+-------+
See dolt_log for author* column semantics.
Using the dolthub/first-hour-db database from DoltHub,
we can query for the five commits before April 20th, 2022, across all commits in the database
(regardless of what is checked out to HEAD) with this query:
{% embed url="https://www.dolthub.com/repositories/dolthub/first-hour-db/embed/main?q=SELECT+*%0AFROM+dolt_commits%0Awhere+date+%3C+%222022-04-20%22%0A" %}
For every user table named $TABLENAME, there is a read-only system table named dolt_history_$TABLENAME
that can be queried to find a row's value at every commit in the current branch's history.
Every Dolt history table contains columns for commit_hash, committer, and commit_date, plus every column
from the user table's schema at the current checked out branch.
+-------------+----------+
| field | type |
+-------------+----------+
| commit_hash | TEXT |
| committer | TEXT |
| commit_date | DATETIME |
| other cols | |
+-------------+----------+
Consider a table named mytable with the following schema:
+------------+--------+
| field | type |
+------------+--------+
| x | INT |
+------------+--------+
The schema for dolt_history_mytable would be:
+-------------+----------+
| field | type |
+-------------+----------+
| x | INT |
| commit_hash | TEXT |
| committer | TEXT |
| commit_date | DATETIME |
+-------------+----------+
Assume a database with the mytable table above and the following commit graph:
B---E feature
/
A---C---D main
When the feature branch is checked out, the following query returns the results below, showing
the row at every ancestor commit reachable from our current branch.
{% embed url="https://www.dolthub.com/repositories/dolthub/docs_examples/embed/feature?q=SELECT+*+FROM+dolt_history_mytable%3B" %}
The dolt_log system table contains the commit log for all commits reachable from the current HEAD.
This is the same data returned by the dolt log CLI command.
+--------------+-----------------+
| field | type |
+--------------+-----------------+
| commit_hash | text |
| committer | text |
| email | text |
| date | datetime |
| message | text |
| commit_order | bigint unsigned |
| parents | text |
| refs | text |
| signature | text |
| author | text |
| author_email | text |
| author_date | datetime |
+--------------+-----------------+
The commit_order field is an unsigned integer value that indicates the order of commits in descending order from HEAD.
Note that commit_order values can be repeated for different levels of the topological sort of the commit graph.
The parents, refs, and signature columns are populated only when explicitly projected by the query. parents is a comma-separated list of parent commit hashes. refs lists branches and tags that point to the commit. signature contains the commit's GPG signature when present.
The author* columns capture the original commit author, which can differ from the committer (e.g. after dolt cherry-pick). They mirror committer, email, and date when no separate author identity is recorded. See commit identity variables.
The following query shows the commits reachable from the current checked out head and created by user jennifersp since April, 2022:
For every user table named $TABLENAME, there is a read-only system table named dolt_commit_diff_$TABLENAME
that can be queried to see a diff of the data in the specified table between any two commits in the database.
For example, you can use this system table to view the diff between two commits on different branches.
The schema of the returned data from this system table is based on the schema of the underlying user table
at the currently checked out branch.
You must provide from_commit and to_commit in all queries to this system table in order to specify the
to and from points for the diff of your table data. Each returned row describes how a row in the underlying
user table has changed from the from_commit ref to the to_commit ref by showing the old and new values.
dolt_commit_diff_$TABLENAME is the analogue of the dolt diff CLI command.
It represents the two-dot diff
between the two commits provided.
The dolt_diff_$TABLENAME system table also exposes diff information, but instead of a two-way diff,
it returns a log of individual diffs between all adjacent commits in the history of the current branch.
In other words, if a row was changed in 10 separate commits, dolt_diff_$TABLENAME will show 10 separate rows –
one for each individual delta. In contrast, dolt_commit_diff_$TABLENAME would show a single row that combines
all the individual commit deltas into one diff.
The DOLT_DIFF() table function is an alternative to the
dolt_commit_diff_$tablename system table for cases where a table's schema has changed between the to and from
commits. Consider the DOLT_DIFF() table function if you need to see the schema from each of those commits,
instead of using the schema from the currently checked out branch.
+------------------+----------+
| field | type |
+------------------+----------+
| from_commit | TEXT |
| from_commit_date | DATETIME |
| to_commit | TEXT |
| to_commit_date | DATETIME |
| diff_type | TEXT |
| other cols | |
+------------------+----------+
The remaining columns are dependent on the schema of the user table at the currently checked out branch.
For every column X in your table at the currently checked out branch, there are columns in the result set named
from_X and to_X with the same type as X in the current schema.
The from_commit and to_commit parameters must both be specified in the query, or an error is returned.
Consider a simple example with a table that has one column:
+--------------+
| field | type |
+--------------+
| x | int |
+--------------+
Based on the table's schema above, the schema of the dolt_commit_diff_$TABLENAME will be:
+------------------+----------+
| field | type |
+------------------+----------+
| to_x | int |
| to_commit | longtext |
| to_commit_date | datetime |
| from_x | int |
| from_commit | longtext |
| from_commit_date | datetime |
| diff_type | varchar |
+------------------+----------+
Now consider the following branch structure:
A---B---C feature
/
D---E---F---G main
We can use the above table to represent two types of diffs: a two-point diff and a three-point diff. In a two-point diff we want to see the difference in rows between Point C and Point G.
We can also compute a three-point diff using this table. In a three-point diff we want to see how our feature branch has diverged from our common ancestor E, without including the changes from F and G on main.
The dolt_merge_base function
computes the closest ancestor E between main and feature.
There is one special to_commit value WORKING which can be used to
see what changes are in the working set that have yet to be committed
to HEAD. It is often useful to use the HASHOF() function
to get the commit hash of a branch, or an ancestor commit. The above table
requires both from_commit and to_commit to be filled.
The dolt_diff system table shows which tables in the current database were changed in each commit reachable from the active branch's HEAD. When multiple tables are changed in a single commit, there is one row in the dolt_diff system table for each table, all with the same commit hash. Any staged or unstaged changes in the working set are included with the value WORKING for their commit_hash. After identifying the tables that changed in a commit, the dolt_diff_$TABLENAME system tables can be used to determine the data that changed in each table.
The DOLT_DIFF system table has the following columns:
+---------------+----------+
| field | Type |
+---------------+----------+
| commit_hash | text |
| table_name | text |
| committer | text |
| email | text |
| date | datetime |
| message | text |
| data_change | boolean |
| schema_change | boolean |
| author | text |
| author_email | text |
| author_date | datetime |
+---------------+----------+
See dolt_log for author* column semantics.
dolt_diff displays the changes from the current branch HEAD, including any working set changes. If a commit did not
make any changes to tables (e.g. an empty commit), it is not included in the dolt_diff results.
Taking the
dolthub/first-hour-db
database from DoltHub as our
example, the following query uses the dolt_diff system table to find all commits, and the tables they changed,
from the month of April 2022.
From these results, we can see there were four commits to this database in April 2022. Commits
224helo only changed the dolt_schemas table, commit 7jrvg1a changed the dolt_docs
table, and commit 5jpgb0f made changes to two tables. We can also see which of these tables had changes made to their schemas vs just data.
To dig deeper into these changes, we can query
the dolt_diff_$TABLE system tables specific to each of the changed tables, like this:
The dolt_column_diff system table shows which columns and tables in the current database were changed in each commit
reachable from the active branch's HEAD. When multiple columns are changed in a single commit, there is one row in the
dolt_column_diff system table for each column, all with the same commit hash. Any staged changes in the working set
are included with the value STAGED for their commit_hash. Any unstaged changes in the working set are included with
the value WORKING for their commit_hash.
The DOLT_COLUMN_DIFF system table has the following columns
+--------------+----------+
| field | Type |
+--------------+----------+
| commit_hash | text |
| table_name | text |
| column_name | text |
| committer | text |
| email | text |
| date | datetime |
| message | text |
| diff_type | text |
| author | text |
| author_email | text |
| author_date | datetime |
+--------------+----------+
See dolt_log for author* column semantics.
dolt_column_diff displays the changes from the current branch HEAD, including any working set changes. If a commit did not
make any changes to tables (e.g. an empty commit), it is not included in the dolt_column_diff results.
Taking the
first-hour-db
database from DoltHub as our
example, the following query uses the dolt_column_diff system table to find commits and tables where the column name was updated.
If we narrow in on the dolt_schemas table we can count the number of commits that updated each column
over the course of all our commits.
From these results, we can see that fields describing the reasons an inmate is being held are being updated far more frequently than the fields holding demographic information about inmates.
For every user table named $TABLENAME, there is a read-only system
table named dolt_diff_$TABLENAME that returns a list of diffs showing
how rows have changed over time on the current branch.
Each row in the result set represents a row that has changed between two adjacent
commits on the current branch – if a row has been updated in 10 commits, then 10
individual rows are returned, showing each of the 10 individual updates.
Compared to the
dolt_commit_diff_$TABLENAME system table,
the dolt_diff_$TABLENAME system table focuses on
how a particular row has evolved over time in the current branch's history.
The major differences are that dolt_commit_diff_$TABLENAME requires specifying from_commit and to_commit,
works on any commits in the database (not just the current branch),
and returns a single combined diff for all changes to a row between those two commits. In the example
above where a row is changed 10 times, dolt_commit_diff_$TABLENAME would only return a single row
showing the diff, instead of the 10 individual deltas.
Every Dolt diff table will have the columns
+------------------+----------+
| field | type |
+------------------+----------+
| from_commit | TEXT |
| from_commit_date | DATETIME |
| to_commit | TEXT |
| to_commit_date | DATETIME |
| diff_type | TEXT |
| other cols | |
+------------------+----------+
The remaining columns are dependent on the schema of the user
table at the current branch. For every column X in your table at the current branch there will
be columns in the result set named from_X and to_X with the same type as X.
For a table named states with the following schema:
+------------+--------+
| field | type |
+------------+--------+
| state | TEXT |
| population | BIGINT |
| area | BIGINT |
+-------------+-------+
The schema for dolt_diff_states would be:
+------------------+----------+
| field | type |
+-----------------+-----------+
| from_state | TEXT |
| from_population | BIGINT |
| from_area | TEXT |
| from_commit | TEXT |
| from_commit_date | DATETIME |
| to_state | TEXT |
| to_population | BIGINT |
| to_area | TEXT |
| to_commit | TEXT |
| to_commit_date | DATETIME |
| diff_type | TEXT |
+------------------+----------+
A SELECT * query for a diff table will show you every change
that has occurred to each row for every commit in this branch's
history. Using to_commit or from_commit will limit the data to
specific commits. There is one special to_commit value WORKING
which can be used to see what changes are in the working set that have
yet to be committed to HEAD. It is often useful to use the
HASHOF()
function to get the commit hash of a branch, or an ancestor
commit. For example, to get the differences between the last commit and its parent
you could use to_commit=HASHOF("HEAD") and from_commit=HASHOF("HEAD^").
For each row the field diff_type will be one of the values added,
modified, or removed. You can filter which rows appear in the
result set to one or more of those diff_type values in order to
limit which types of changes will be returned.
Taking the
dolthub/us-jails
database from DoltHub as our
example, the following query will retrieve the jails whose total
num_inmates_rated_for have changed the most between 2 versions.
dolt_conflicts is a system table that has a row for every table in the working set that has an unresolved merge
conflict.
+---------------+-----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------+------+-----+---------+-------+
| table | text | NO | PRI | | |
| num_conflicts | bigint unsigned | NO | | | |
+---------------+-----------------+------+-----+---------+-------+Query this table when resolving conflicts in a SQL session. For more information on resolving merge conflicts in SQL, see docs for the dolt_conflicts_$TABLENAME tables.
For each table $TABLENAME in conflict after a merge, there is a
corresponding system table named dolt_conflicts_$TABLENAME. The
schema of each such table contains three columns for each column in
the actual table, representing each row in conflict for each of ours,
theirs, and base values.
Consider a table mytable with this schema:
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| a | int | NO | PRI | | |
| b | int | YES | | | |
+-------+------+------+-----+---------+-------+If we attempt a merge that creates conflicts in this table, the dynamically generated table will show the details of the conflict:
mydb> select * from dolt_conflicts_mytable;
+----------------------------------+--------+--------+-------+-------+---------------+---------+---------+-----------------+------------------------+
| from_root_ish | base_a | base_b | our_a | our_b | our_diff_type | their_a | their_b | their_diff_type | dolt_conflict_id |
+----------------------------------+--------+--------+-------+-------+---------------+---------+---------+-----------------+------------------------+
| gip4h957r8k07c9414lkp3sqe7rh9an6 | NULL | NULL | 3 | 3 | added | 3 | 1 | added | hWDLmYufTrm+eVjFSVzPWw |
| gip4h957r8k07c9414lkp3sqe7rh9an6 | NULL | NULL | 4 | 4 | added | 4 | 2 | added | gi2p1YbSwu8oUV/WRSpr3Q |
+----------------------------------+--------+--------+-------+-------+---------------+---------+---------+-----------------+------------------------+The table has a single row for every row that conflicts in the table. Each column is listed three times, once for the
base, once for the ours value, and once for the theirs value. Additionally, the our_diff_type and their_diff_type columns
indicate whether the row was "added", "modified", or "removed" in the corresponding branch. The dolt_conflict_id column is unique
for each row, and can be used to identify the specific conflict when writing automated conflict resolution applications. Finally, the
first column, from_root_ish is the ID of the database root at the time of the merge. User code generally ignores this column.
To simplify, you can query for just the columns you are interested in:
mydb> select dolt_conflict_id, base_a, base_b, our_a, our_b, their_a, their_b from dolt_conflicts_mytable;
+------------------------+--------+--------+-------+-------+---------+---------+
| dolt_conflict_id | base_a | base_b | our_a | our_b | their_a | their_b |
+------------------------+--------+--------+-------+-------+---------+---------+
| hWDLmYufTrm+eVjFSVzPWw | NULL | NULL | 3 | 3 | 3 | 1 |
| gi2p1YbSwu8oUV/WRSpr3Q | NULL | NULL | 4 | 4 | 4 | 2 |
+------------------------+--------+--------+-------+-------+---------+---------+To mark conflicts as resolved, delete them from the corresponding
table. To effectively keep all our values, I would simply run:
mydb> delete from dolt_conflicts_mytable;If I wanted to keep all their values, I would first run this statement:
mydb> replace into mytable (select their_a, their_b from dolt_conflicts_mytable);For convenience, you can also modify the our_ columns of the
dolt_conflicts_mytable to update the corresponding row in mytable. The above
replace statement can be rewritten as:
mydb> update dolt_conflicts_mytable set our_a = their_a, our_b = their_b;And of course you can use any combination of ours, theirs and
base rows in these statements.
{% hint style="info" %}
-
Updates made to the
our_columns are applied to the original table using the primary key (or keyless hash). If the row does not exist, it will be inserted. Updates made toour_columns will never delete a row, however. -
dolt_conflict_idis a unique identifier for the conflict. It is particularly useful when writing software that needs to resolve conflicts automatically. -
from_root_ishis the commit hash of the "from branch" of the merge. This hash can be used to identify which merge produced a conflict, since conflicts can accumulate across merges.
{% endhint %}
dolt_schema_conflicts is a system table that has a row for every table in the working
set that has an unresolved schema conflict.
> SELECT table_name, description, base_schema, our_schema, their_schema FROM dolt_schema_conflicts;
+------------+--------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| table_name | description | base_schema | our_schema | their_schema |
+------------+--------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| people | different column definitions for our | CREATE TABLE `people` ( | CREATE TABLE `people` ( | CREATE TABLE `people` ( |
| | column age and their column age | `id` int NOT NULL, | `id` int NOT NULL, | `id` int NOT NULL, |
| | | `last_name` varchar(120), | `last_name` varchar(120), | `last_name` varchar(120), |
| | | `first_name` varchar(120), | `first_name` varchar(120), | `first_name` varchar(120), |
| | | `birthday` datetime(6), | `birthday` datetime(6), | `birthday` datetime(6), |
| | | `age` int DEFAULT '0', | `age` float, | `age` bigint, |
| | | PRIMARY KEY (`id`) | PRIMARY KEY (`id`) | PRIMARY KEY (`id`) |
| | | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; |
+------------+--------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+Query this table when resolving schema conflicts in a SQL session. For more information on resolving schema conflicts during merge, see the docs on conflicts.
The dolt_merge_status system table tells a user if a merge is active. It has the following schema:
CREATE TABLE `dolt_merge_status` (
-- Whether a merge is currently active or not
`is_merging` tinyint NOT NULL,
-- The commit spec that was used to initiate the merge
`source` text,
-- The commit that the commit spec resolved to at the time of merge
`source_commit` text,
-- The target destination working set
`target` text,
-- A list of tables that have conflicts or constraint violations
`unmerged_tables` text
)Let's create a simple conflict:
dolt sql -q "CREATE TABLE t (a INT PRIMARY KEY, b INT);"
dolt add .
dolt commit -am "base"
dolt checkout -b right
dolt sql <<SQL
ALTER TABLE t ADD c INT;
INSERT INTO t VALUES (1, 2, 1);
SQL
dolt commit -am "right"
dolt checkout main
dolt sql -q "INSERT INTO t values (1, 3);"
dolt commit -am "left"
dolt merge rightOutput of SELECT * from dolt_merge_status;:
+------------+--------+----------------------------------+-----------------+-----------------+
| is_merging | source | source_commit | target | unmerged_tables |
+------------+--------+----------------------------------+-----------------+-----------------+
| true | right | fbghslue1k9cfgbi00ti4r8417frgbca | refs/heads/main | t |
+------------+--------+----------------------------------+-----------------+-----------------+
The dolt_stashes system table returns information about all stashes in the current database.
The dolt_stashes table is read-only. Use the dolt_stash() stored procedure or the dolt stash cli functions to create, apply, or delete stashes.
+----------------+------+------+-----+
| Field | Type | Null | Key |
+----------------+------+------+-----+
| name | text | NO | |
| stash_id | text | NO | |
| branch | text | NO | |
| hash | text | NO | |
| commit_message | text | NO | |
+----------------+------+------+-----+
name: The name of the stash entrystash_id: The unique identifier for the stash (e.g., "stash@{0}")branch: The branch where the stash was createdhash: The commit hash of the stashed changescommit_message: Message for the commit the stash was created on
SELECT *
FROM dolt_stashes
WHERE name = 'myStash';+---------+------------+--------+----------------------------------+------------------+
| name | stash_id | branch | hash | commit_message |
+---------+------------+--------+----------------------------------+------------------+
| myStash | stash@{0} | main | pnpq4p07977jjbpkg6ojj2mpjp2kru9r | Created a table |
+---------+------------+--------+----------------------------------+------------------+
dolt_status returns the status of the database session, analogous to
running dolt status from the command line.
+------------+---------+------+-----+
| Field | Type | Null | Key |
+------------+---------+------+-----+
| table_name | text | NO | PRI |
| staged | tinyint | NO | PRI |
| status | text | NO | PRI |
+------------+---------+------+-----+
SELECT *
FROM dolt_status
WHERE staged=false;+------------+--------+-----------+
| table_name | staged | status |
+------------+--------+-----------+
| one_pk | false | new table |
+------------+--------+-----------+
dolt_status_ignored returns the status of the database session, including ignored tables (tables with names
matching patterns in the dolt_ignore system table). Selecting from this table is analogous to
running dolt status --ignored from the command line.
+------------+---------+------+-----+
| Field | Type | Null | Key |
+------------+---------+------+-----+
| table_name | text | NO | PRI |
| staged | tinyint | NO | PRI |
| status | text | NO | PRI |
| ignored | bool | NO | |
+------------+---------+------+-----+
INSERT INTO dolt_ignore VALUES ("ignored_*", true);
SELECT *
FROM dolt_status_ignored
WHERE staged=false;+---------------+--------+-----------+---------+
| table_name | staged | status | ignored |
+---------------+--------+-----------+---------+
| ignored_table | false | new table | true |
+---------------+--------+-----------+---------+
This system table shows you which rows have been changed in your workspace and if they are staged.
It is the union of rows changed from HEAD to STAGED, and STAGED to WORKING. Any table listed in
dolt_status table will have a non-empty corresponding dolt_workspace_$TABLENAME table. Changes
listed are all relative to the HEAD of the current branch.
These tables can be modified in order to update what changes are staged for commit. Workspace review
The schema of the source table is going to affect the schema of the workspace table. The first three column are always the same, then the schema of the source table is used to create "to_" and "from_" columns.
Each row in the dolt_workspace_$TABLENAME corresponds to a single row update in the table.
+------------------+-----------------+
| field | type |
+------------------+-----------------+
| id | bigint unsigned |
| staged | bool |
| diff_type | varchar(1023) |
| to_x | ... |
| to_y | ... |
| from_x | ... |
| from_y | ... |
+------------------+-----------------+The staged column will be TRUE when the changes are going to be committed on the next
call to dolt_commit(). Changes which have staged = FALSE are present in your
workspace which means all queries in your session contain them but they will not be recorded
in the event that dolt_commit() is executed.
There are two ways you can alter the state of your workspace using these tables.
- The
stagedcolumn can be toggled for any row. If changing from false to true, the row values will be moved to staging. If there are already staged changes for that row, they will be overwritten. If changing from true to false, the row values will be unstaged. If there are other changes in the workspace for that row, the workspace change will be preserved and the staged change will be dropped. - Any row which has
staged = FALSEcan be deleted. This will result in reverting the change to the row in the source table.
SELECT *
FROM dolt_workspace_mytable
WHERE staged=false+----+--------+-----------+-------+----------+---------+------------+
| id | staged | diff_type | to_id | to_value | from_id | from_value |
+----+--------+-----------+-------+----------+---------+------------+
| 0 | false | modified | 3 | 44 | 3 | 31 |
| 1 | false | modified | 4 | 68 | 4 | 1 |
| 2 | false | modified | 9 | 47 | 9 | 59 |
+----+--------+-----------+-------+----------+---------+------------+
3 rows in set (0.00 sec)UPDATE dolt_workspace_mytable SET staged = TRUE WHERE to_id = 3;
CALL dolt_commit("-m", "Added row id 3 in my table");The dolt_workspace_$TABLENAME tables are generated based on the session state when inspected,
so they can not be considered stable on a branch which has multiple editors.
The dolt_constraint_violations system table contains one row for every table that has a constraint violation
introduced by a merge. Dolt enforces constraints (such as foreign keys) during normal SQL operations, but it's possible
that a merge puts one or more tables in a state where constraints no longer hold. For example, a row deleted in the
merge base could be referenced via a foreign key constraint by an added row in the merged commit. Use
dolt_constraint_violations to discover such violations.
+----------------+-----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-----------------+------+-----+---------+-------+
| table | text | NO | PRI | | |
| num_violations | bigint unsigned | NO | | | |
+----------------+-----------------+------+-----+---------+-------+For each table $TABLENAME with a constraint violation after a merge, there is a corresponding system table named
dolt_constraint_violations_$TABLENAME. Each row in the table represents a constraint violation that must be resolved
via INSERT, UPDATE, or DELETE statements. Resolve each constraint violation before committing the result of the
merge that introduced them.
For a hypothetical table a with the following schema:
+-------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| x | bigint | NO | PRI | | |
| y | varchar(1) | YES | | | |
+-------+------------+------+-----+---------+-------+dolt_constraint_violations_a will have the following schema:
+----------------+------------------------------------------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------------------------------------------------------+------+-----+---------+-------+
| from_root_ish | varchar(1023) | YES | | | |
| violation_type | enum('foreign key','unique index','check constraint','not null') | NO | PRI | | |
| x | bigint | NO | PRI | | |
| y | varchar(1) | YES | | | |
| violation_info | json | YES | | | |
+----------------+------------------------------------------------------------------+------+-----+---------+-------+Each row in the table represents a row in the primary table that is in violation of one or more
constraint violations. The violation_info field is a JSON payload describing the violation. The
payload varies depending on the type of constraint violation recorded.
For foreign key violations:
{
"ForeignKey": "key_name",
"Table": "myTable",
"Columns": ["col1", "col2"],
"Index": "myIdx",
"OnDelete": "RESTRICT",
"OnUpdate": "RESTRICT",
"ReferencedColumns": ["col3", "col4"],
"ReferencedIndex": "myIdx2",
"ReferencedTable": "refTable"
}For unique constraints:
{
"Name": "constraint_name",
"Columns": ["col1", "col2"]
}For not null constraints:
{
"Columns": ["col1", "col2"]
}For check constraints:
{
"Name": "constraint_name",
"Expression": "(col1 > 0)"
}As with dolt_conflicts, delete rows from the corresponding dolt_constraint_violations table to signal to dolt that
you have resolved any such violations before committing.
Configuration Tables can be staged and versioned just like user tables. They always exist, even in an empty database.
dolt_ignore stores a list of "table name patterns", and a boolean flag for each pattern indicating whether tables that match the patterns should not be staged for commit.
This only affects the staging of new tables. Tables that have already been staged or committed are not affected the contents of dolt_ignore, and changes to those tables can still be staged.
+------------+---------+------+-----+
| Field | Type | Null | Key |
+------------+---------+------+-----+
| pattern | text | NO | PRI |
| ignored | tinyint | NO | |
+------------+---------+------+-----+
The format of patterns is a simplified version of gitignore’s patterns:
- An asterisk "*" or ampersand "%" matches any number of characters.
- The character "?" matches any one character.
- All other characters match exactly.
If a table name matches multiple patterns with different values for ignored, the most specific pattern is chosen (a pattern A is more specific than a pattern B if all names that match A also match pattern B, but not vice versa.) If no pattern is most specific, then attempting to stage that table will result in an error.
Tables that match patterns in dolt_ignore can be force-committed by passing the --force flag to dolt add or CALL dolt_add.
dolt diff won't display new untracked tables that match ignore patterns. Diffs between two commits display all tables regardless of the current dolt_ignore contents.
INSERT INTO dolt_ignore VALUES ("generated_*", true), ("generated_exception", false);
CREATE TABLE foo (pk int);
CREATE TABLE generated_foo (pk int);
CREATE TABLE generated_exception (pk int);
CALL dolt_add("-A");
SELECT *
FROM dolt_status
WHERE staged=true;+---------------------+--------+-----------+
| table_name | staged | status |
+---------------------+--------+-----------+
| foo | true | new table |
| generated_exception | true | new table |
+---------------------+--------+-----------+
dolt_nonlocal_tables affects how table names are resolved, allowing a table name to resolve to a table stored on a different branch.
Each row in the table stores a table name (or a table name pattern, see below), as well as a target ref and target table name, which Dolt uses to locate the table data.
+----------------------+------+------+-----+
| Field | Type | Null | Key |
+----------------------+------+------+-----+
| table_name | text | NO | PRI |
| target_ref | text | YES | |
| ref_table | text | YES | |
| options | text | YES | |
+----------------------+------+------+-----+
table_name column can be a pattern. The format of patterns is a simplified version of gitignore’s patterns:
- An asterisk "*" or ampersand "%" matches any number of characters.
- The character "?" matches any one character.
- All other characters match exactly.
target_ref is the branch, tag, or commit hash containing the table. If target_ref is a local branch, then the table will be writable. If target_ref is a tag, commit hash, or remote branch, then the table will be read-only. If omitted, Dolt looks for a table on the current branch.
ref_table is the name of the table on the referenced branch. If omitted, Dolt looks for a table with the same name used in the query. The ref_table column will usually be omitted, but can be used if you want the table name on the target branch to be different from the table name on the source branch.
options is currently unused, but may be used in the future to provide configuration options. It must be set to the value "immediate".
INSERT INTO dolt_nonlocal_tables (table_name, target_ref, options) VALUES ('global_*', 'global_branch', 'immediate');
INSERT INTO global_users VALUES ('timsehn'); -- Equivalent to INSERT INTO global_branch.global_users VALUES ('timsehn')
INSERT INTO dolt_nonlocal_tables (table_name, target_ref, ref_table, options) VALUES ('origin_products', 'origin/main', 'products', 'immediate');
SELECT * FROM origin_products; -- Equivalent to SELECT * FROM `origin/main`.global_branch.productsdolt_tests stores test definitions that can be executed using the DOLT_TEST_RUN() table function. Tests define SQL queries with assertions about their expected results, providing a way to validate database behavior and catch regressions.
+----------------------+------+------+-----+
| Field | Type | Null | Key |
+----------------------+------+------+-----+
| test_name | text | NO | PRI |
| test_group | text | YES | |
| test_query | text | NO | |
| assertion_type | text | NO | |
| assertion_comparator | text | NO | |
| assertion_value | text | NO | |
+----------------------+------+------+-----+
Test queries must be read-only (no INSERT, UPDATE, DELETE, or DDL statements) and can only contain a single SQL statement.
There are three types of assertions available:
expected_rows: Asserts on the number of rows returned by the query. The assertion_value should be an integer.expected_columns: Asserts on the number of columns returned by the query. The assertion_value should be an integer.expected_single_value: Asserts on the value of a single cell. The query must return exactly one row and one column. The assertion_value is compared against the returned value.
The assertion_comparator field supports the following comparison operators: ==, !=, <, >, <=, >=.
Create a test that verifies a table has the expected number of rows:
INSERT INTO dolt_tests VALUES
('check_user_count', 'users', 'SELECT * FROM users', 'expected_rows', '==', '10');Create a test that validates a specific calculated value:
INSERT INTO dolt_tests VALUES
('total_revenue', 'finance', 'SELECT SUM(amount) FROM sales', 'expected_single_value', '>=', '100000');dolt_rebase is only present while an interactive rebase is in progress, and only on the branch where the rebase is being executed. For example, when rebasing the feature1 branch, the rebase will be executed on the dolt_rebase_feature1 branch, and the dolt_rebase system table will exist on that branch while the rebase is in-progress. The dolt_rebase system table starts off with the default rebase plan, which is to pick all of the commits identified for the rebase. Users can adjust the rebase plan by updating the dolt_rebase table to change the rebase action, reword a commit message, or even add new rows with additional commits to be applied as part of the rebase. For more details about rebasing, see the dolt_rebase() stored procedure.
+----------------+---------------------------------------------------+
| Field | Type |
+----------------+---------------------------------------------------+
| rebase_order | DECIMAL(6,2) |
| action | ENUM('pick', 'drop', 'reword', 'squash', 'fixup') |
| commit_hash | TEXT |
| commit_message | TEXT |
+----------------+---------------------------------------------------+
The action field can take one of the following rebase actions:
pick- apply a commit as is and keep its commit message.drop- do not apply a commit. The row in thedolt_rebasetable can also be deleted to drop a commit from the rebase plan.reword- apply a commit, and use the updated commit message from thecommit_messagefield in thedolt_rebasetable for its commit message. Note that if you edit thecommit_messagebut do not use therewordaction, the original commit message will still be used.squash- apply a commit, but include its changes in the previous commit instead of creating a new commit. The commit message of the previous commit will be altered to include the previous commit message as well as the commit message from the squashed commit. Note that the rebase plan MUST include apickorrewordaction in the plan before asquashaction.fixup- apply a commit, but include its changes in the previous commit instead of creating a new commit. The commit message of the previous commit will NOT be changed, and the commit message from the fixup commit will be discarded. Note that the rebase plan MUST include apickorrewordaction in the plan before afixupaction.
To squash all commits into a single commit and include the commit messages from all commits, the following query can be used:
update dolt_rebase set action = 'squash' where rebase_order > 1;To reword a commit with commit hash '123aef456f', be sure to set the action to reword and to update the commit_message field:
update dolt_rebase set action = 'reword', commit_message = 'here is my new message' where commit_hash = '123aef456f';To drop the second commit in the rebase plan, you can use the drop action:
update dolt_rebase set action = 'drop' where rebase_order = 2;Or you can simply delete that row from the dolt_rebase table:
delete from dolt_rebase where rebase_order = 2;