|
2 | 2 |
|
3 | 3 | A database connector is a compatibility layer that converts data structures that a |
4 | 4 | database natively works with to the ones that VectorCode works with. The connector |
5 | | -classes provides abstractions for VectorCode operations (`vectorise`, `query`, etc.), |
| 5 | +classes provide abstractions for VectorCode operations (`vectorise`, `query`, etc.), |
6 | 6 | which enables the use of different database backends. |
7 | 7 |
|
8 | 8 | <!-- mtoc-start --> |
9 | 9 |
|
10 | | -* [Creating Database Connectors](#creating-database-connectors) |
11 | | -* [Implementation Details](#implementation-details) |
12 | | - * [Connector Configuration](#connector-configuration) |
13 | | - * [Database Settings](#database-settings) |
14 | | - * [Documenting the Database Settings](#documenting-the-database-settings) |
15 | | - * [CRUD Operations](#crud-operations) |
| 10 | +* [Adding a New Database Connector](#adding-a-new-database-connector) |
| 11 | +* [Key Implementation Details](#key-implementation-details) |
| 12 | + * [The `Config` Object](#the-config-object) |
| 13 | + * [Implementing Abstract Methods](#implementing-abstract-methods) |
| 14 | + * [Error Handling](#error-handling) |
| 15 | +* [Testing](#testing) |
16 | 16 |
|
17 | 17 | <!-- mtoc-end --> |
18 | 18 |
|
19 | | -# Creating Database Connectors |
20 | | - |
21 | | -To add support for a new database backend, you'd need to: |
22 | | - |
23 | | -1. Implement a child class of `vectorcode.database.base.DatabaseConnectorBase` and all |
24 | | - of its abstract methods, and put it under this directory. |
25 | | -2. Add a new entry in the [`get_database_connector`](./__init__.py) function that |
26 | | - initialises your new database connector when the `configs.db_type` points to the new |
27 | | - database. |
28 | | -3. Add tests for your new database connector. The new tests should verify that your |
29 | | - connector correctly converts between the native data structures from the database and |
30 | | - the VectorCode data structures that the rest of the codebase (embedding function, |
31 | | - reranker, etc.)can work with. |
32 | | - |
33 | | -# Implementation Details |
34 | | - |
35 | | -> Apart from this document, you may refer to [the `DatabaseConnectorBase`](./base.py) |
36 | | -> and [the `ChromaDB0Connector`](./chroma0.py) implementations as reference designs of |
37 | | -> a new database connector. |
38 | | -
|
39 | | -In the following sections, I'll use the term _database_ to refer to the actual database |
40 | | -backends (chromadb, pgvector, etc.) that holds the data and performs the CRUD operations, |
41 | | -and the term _connector_ to refer to our compatibility layer (child classes of |
42 | | -`vectorcode.database.base.DatabaseConnectorBase`). |
43 | | - |
44 | | -## Connector Configuration |
45 | | - |
46 | | -The connector has a private attribute (that is, the attribute name is prefixed by a `_`) |
47 | | -`self._configs`. This is a `vectorcode.cli_utils.Config` object that holds various |
48 | | -configuration options, including the database settings used to initialise the |
49 | | -connections to the database and the parameters used for the CRUD operations with the |
50 | | -database. This attribute is **mutable** and _should_ be updated before calling a CRUD |
51 | | -method using the `self.update_config(new_config)` or the `self.replace_config(new_config)` |
52 | | -methods. However, the database-related settings shouldn't be changed. A new connector |
53 | | -instance should be created for that purpose. |
54 | | - |
55 | | -## Database Settings |
56 | | - |
57 | | -The database settings are configured in the JSON configuration file, and will be parsed |
58 | | -and stored in the `config.db_type` and `config.db_params` attributes of the |
59 | | -`self._configs` object. |
60 | | - |
61 | | -The `db_type` attribute is a string that indicates the type of the database backend |
62 | | -(for example, `ChromaDB0` for Chromadb 0.6.3). |
63 | | - |
64 | | -The `db_params` attribute is a dictionary that holds some database-specific settings |
65 | | -(for example, the database API endpoint URL and/or database directory). |
66 | | - |
67 | | -### Documenting the Database Settings |
68 | | - |
69 | | -Please document about the database-specific settings (`db_params`) in the doc-string |
70 | | -of your database connector. This doc-string will be presented in the error message when |
71 | | -the database fails to initialise, and should provide instructions to help the user |
72 | | -debug their configuration. |
73 | | - |
74 | | -## CRUD Operations |
75 | | - |
76 | | -Historically, the parameters of VectorCode operations have been stored and propagated |
77 | | -in a `vectorcode.cli_utils.Config` object. The database connectors continue to follow |
78 | | -this pattern. That is, each of the abstract methods that represent an abstracted |
79 | | -database operation (`query()`, `vectorise()`, `list()`, etc.) should read the necessary |
80 | | -parameters (`project_root`, file paths, query keywords, etc.) from the `self._configs` |
81 | | -attribute. Note that the `self._configs` attribute is mutable, so you should always read |
82 | | -the parameters from it directly for each of the operations. |
83 | | - |
84 | | -> Some methods support keyword arguments that allows temporarily overriding some |
85 | | -> parameters. For example, the `list_collection_content` method supports overriding |
86 | | -> `self._configs` by passing `_collection_id` and `collection_path`. The idea is that |
87 | | -> these methods can usually be used by the implementation of other methods or subcommands |
88 | | -> (for example, `list_collection_content` is used in `count` and `check_orphanes`), |
89 | | -> and being able to pass such parameters are convenient when writing those implementations. |
| 19 | +# Adding a New Database Connector |
| 20 | + |
| 21 | +To add support for a new database backend, you will need to: |
| 22 | + |
| 23 | +1. **Implement a connector class**: Create a new file in this directory and implement a child class of `vectorcode.database.base.DatabaseConnectorBase`. You must implement all of its abstract methods. |
| 24 | +2. **Write tests**: Add tests for your new connector in the `tests/database/` directory. The tests should mock the database's API and verify that your connector correctly converts data between the database's native format and VectorCode's data structures. |
| 25 | +3. **Register your connector**: Add a new entry in the `get_database_connector` function in `src/vectorcode/database/__init__.py` to initialize your new connector. |
| 26 | + |
| 27 | +For a concrete example, refer to the implementation of `DatabaseConnectorBase` and the `ChromaDB0Connector`. |
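
The three steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the project's actual code: `Config` and `DatabaseConnectorBase` are simplified stand-ins for the real classes, while `count`, `InMemoryConnector`, the `"InMemory"` type string, and the dictionary-based registry are hypothetical names chosen only for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Config:
    """Stand-in for vectorcode.cli_utils.Config (only the two fields used here)."""

    db_type: str = "ChromaDB0"
    db_params: dict = field(default_factory=dict)


class DatabaseConnectorBase(ABC):
    """Stand-in for vectorcode.database.base.DatabaseConnectorBase."""

    def __init__(self, configs: Config):
        self._configs = configs

    @abstractmethod
    def count(self) -> int:
        """Hypothetical abstract method, for illustration only."""


class InMemoryConnector(DatabaseConnectorBase):
    """Hypothetical connector backed by a plain list.

    db_params:
        documents: initial list of documents (default: empty).
    """

    def __init__(self, configs: Config):
        super().__init__(configs)
        self._documents = list(configs.db_params.get("documents", []))

    def count(self) -> int:
        return len(self._documents)


def get_database_connector(configs: Config) -> DatabaseConnectorBase:
    """Sketch of the registration step: map db_type to a connector class."""
    registry = {"InMemory": InMemoryConnector}
    try:
        connector_cls = registry[configs.db_type]
    except KeyError:
        raise ValueError(f"Unsupported database backend: {configs.db_type}") from None
    return connector_cls(configs)
```

The class docstring lists the `db_params` keys the connector understands, which is where the document asks contributors to describe them.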
| 28 | + |
| 29 | +# Key Implementation Details |
| 30 | + |
| 31 | +## The `Config` Object |
| 32 | + |
| 33 | +All settings for a connector are passed through a single `vectorcode.cli_utils.Config` object, which is available as `self._configs`. This includes: |
| 34 | + |
| 35 | +- **Database Settings**: The `db_type` string and `db_params` dictionary are used to configure the connection to the database backend. As a contributor, you should document the specific `db_params` your connector requires in the class's docstring. |
| 36 | +- **Operation Parameters**: Parameters for operations like `query` or `vectorise` are also present in this object. |
| 37 | + |
| 38 | +The `self._configs` attribute is mutable and can be updated for subsequent operations, but the database connection settings (`db_type`, `db_params`) should not be changed after initialization. |
| 39 | + |
| 40 | +## Implementing Abstract Methods |
| 41 | + |
| 42 | +When implementing the abstract methods from `DatabaseConnectorBase`, you should: |
| 43 | + |
| 44 | +- Read the necessary parameters from the `self._configs` object. |
| 45 | +- Perform the corresponding operation against the database. |
| 46 | +- Return data in the format specified by the method's type hints (e.g., `QueryResult`, `CollectionInfo`). |
| 47 | + |
| 48 | +**Please refer to the docstrings in `DatabaseConnectorBase` for the specific API contract of each method.** They contain detailed information about what each method is expected to do and what parameters it uses from the `Config` object. |
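
As a rough sketch of the pattern described above, the example below reads operation parameters from `self._configs` at call time and returns a typed result. Everything here is an assumption made for illustration: the `Config` and `QueryResult` stand-ins, their field names (`query`, `n_result`, `path`, `score`), the `rows` key in `db_params`, and the keyword-count scoring, which merely takes the place of a real vector search.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Stand-in for vectorcode.cli_utils.Config; field names are assumptions."""

    db_params: dict = field(default_factory=dict)
    query: list = field(default_factory=list)
    n_result: int = 1


@dataclass
class QueryResult:
    """Stand-in result type; the real QueryResult may carry different fields."""

    path: str
    score: float


class SketchConnector:
    """Hypothetical connector that keeps its data in a dictionary."""

    def __init__(self, configs: Config):
        self._configs = configs
        # Connection settings are read once; they should not change afterwards.
        self._rows = dict(configs.db_params.get("rows", {}))

    def query(self) -> list:
        # Operation parameters are read at call time because _configs is mutable.
        keywords = self._configs.query
        limit = self._configs.n_result
        scored = [
            QueryResult(path, float(sum(text.count(k) for k in keywords)))
            for path, text in self._rows.items()
        ]
        scored.sort(key=lambda r: r.score, reverse=True)
        return scored[:limit]
```

Because `query()` looks up `self._configs.query` and `self._configs.n_result` on every call, updating the config between calls changes the next result without creating a new connector.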
| 49 | + |
| 50 | +## Error Handling |
| 51 | + |
| 52 | +If the underlying database library raises a specific exception (e.g., when a collection is not found), you should consider catching it and re-raising it as one of VectorCode's custom database exceptions from `vectorcode.database.errors`. This ensures consistent error handling in the CLI and other clients.
| 53 | + |
| 54 | +For example: |
| 55 | +```python |
| 56 | +from vectorcode.database.errors import CollectionNotFoundError |
| 57 | + |
| 58 | +try:
| 59 | +    some_action_here()
| 60 | +except SomeCustomException as e:
| 61 | +    raise CollectionNotFoundError("The collection was not found.") from e
| 62 | +``` |
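
A self-contained variant of the snippet above may make the pattern easier to try out. Only the name `CollectionNotFoundError` comes from the document, and it is redefined here as a stand-in so the example runs without VectorCode installed. `BackendMissingCollection`, `backend_get_collection`, and `get_collection` are hypothetical.

```python
class CollectionNotFoundError(Exception):
    """Stand-in for vectorcode.database.errors.CollectionNotFoundError."""


class BackendMissingCollection(KeyError):
    """Hypothetical exception raised by a database client library."""


def backend_get_collection(store: dict, name: str) -> list:
    """Hypothetical client call that fails with a backend-specific exception."""
    if name not in store:
        raise BackendMissingCollection(name)
    return store[name]


def get_collection(store: dict, name: str) -> list:
    """Connector-side wrapper that translates the backend exception."""
    try:
        return backend_get_collection(store, name)
    except BackendMissingCollection as e:
        # "from e" keeps the original exception as __cause__ for debugging.
        raise CollectionNotFoundError(f"Collection {name!r} was not found.") from e
```

Callers then only need to handle `CollectionNotFoundError`, regardless of which backend raised the underlying error.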
| 63 | + |
| 64 | +# Testing |
| 65 | + |
| 66 | +The unit tests for database backends should go under [`tests/database/`](../../../tests/database/). |
| 67 | +The tests should mock the request body and return values of the database. Integration |
| 68 | +tests that interact with an actual database are out of scope for now. |
| 69 | + |
| 70 | +> The tests for the subcommands currently use mocked database connectors. They're not |
| 71 | +> supposed to interact with live databases. |
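
A mocked unit test along these lines might look like the sketch below. It uses the standard library's `unittest` and `unittest.mock` only to stay dependency-free; the project's own test runner and fixtures may differ. `SketchConnector`, its `list_paths` method, the client's `fetch` call, and the row layout are all hypothetical.

```python
import unittest
from unittest.mock import MagicMock


class SketchConnector:
    """Hypothetical connector wrapping a database client object."""

    def __init__(self, client):
        self._client = client

    def list_paths(self, collection: str) -> list:
        # Convert the backend's native rows into a plain list of paths.
        rows = self._client.fetch(collection)
        return sorted(row["metadata"]["path"] for row in rows)


class SketchConnectorTest(unittest.TestCase):
    def test_list_paths_converts_native_rows(self):
        # The mocked client returns rows in the backend's native shape.
        client = MagicMock()
        client.fetch.return_value = [
            {"metadata": {"path": "b.py"}},
            {"metadata": {"path": "a.py"}},
        ]
        connector = SketchConnector(client)

        self.assertEqual(connector.list_paths("demo"), ["a.py", "b.py"])
        # Check the request that the connector sent to the mocked backend.
        client.fetch.assert_called_once_with("demo")
```

The test checks both directions of the conversion: the request sent to the mocked backend and the shape of the value handed back to the caller.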