This repository was archived by the owner on Oct 30, 2024. It is now read-only.

Commit e7b8cd3

chore: docs
1 parent 95e0ca6 commit e7b8cd3

16 files changed

Lines changed: 150 additions & 32 deletions

docs/docs/03-architecture.md

Lines changed: 4 additions & 6 deletions
@@ -22,12 +22,10 @@ You can make use of it in the CLI by setting the `KNOW_SERVER_URL` environment v
 ## 3. Index Database
 
 The index database is an additional (relational) metadata database which keeps track of all datasets and ingested files and their relationships.
-It enables some extra convenience features but does not store the actual data (embeddings).
-The current implementation uses **SQLite**.
-It's fully embedded and does not require any additional setup.
+It enables some extra convenience features but does not store the actual data (content & embeddings).
+The current implementation uses **SQLite** by default, which is fully embedded and does not require any additional setup.
 
 ## 4. Vector Database
 
-The vector database is the main storage for the embeddings of the ingested documents along with some metadata (e.g. source file information).
-The current implementation uses [**chromem-go**](https://github.com/philippgille/chromem-go).
-It's fully embedded and does not require any additional setup.
+The vector database is the main storage for the content and embeddings of the ingested documents along with some metadata (e.g. source file information).
+The current implementation uses [**chromem-go**](https://github.com/philippgille/chromem-go) by default, which is fully embedded and does not require any additional setup.

docs/docs/06-databases.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+---
+title: Index & Vector Databases
+---
+
+# Index & Vector Databases
+
+## Index Database
+
+The index database is an additional (relational) metadata database which keeps track of all datasets and ingested files and their relationships.
+It enables some extra convenience features but does not store the actual data (content & embeddings).
+The current implementation uses **SQLite** by default, which is fully embedded and does not require any additional setup.
+
+You can configure it by setting a database connection string via the `KNOW_INDEX_DSN` environment variable.
+The following options are available:
+
+- [SQLite](https://www.sqlite.org/) (default): `KNOW_INDEX_DSN="sqlite:///home/me/mysqlite.db"`
+- [Postgres](https://www.postgresql.org/): `KNOW_INDEX_DSN="postgres://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"`
+
+
+## Vector Database
+
+The vector database is the main storage for the content and embeddings of the ingested documents along with some metadata (e.g. source file information).
+The current implementation uses [**chromem-go**](https://github.com/philippgille/chromem-go) by default, which is fully embedded and does not require any additional setup.
+
+You can configure it by setting a database connection string via the `KNOW_VECTOR_DSN` environment variable.
+The following options are available:
+
+- [Chromem-Go](https://github.com/philippgille/chromem-go) (default): `KNOW_VECTOR_DSN="chromem:///path/to/directory"` (Note: we're using a customized fork of chromem-go, so some details may differ from the original project)
+- [PGVector](https://github.com/pgvector/pgvector): `KNOW_VECTOR_DSN="pgvector://knowledge:knowledge@localhost:5432/knowledge?sslmode=disable"`
+- [SQLite-Vec](https://github.com/asg017/sqlite-vec): `KNOW_VECTOR_DSN="sqlite-vec:///home/me/mysqlite.db"`

docs/docs/99-cmd/knowledge.md

Lines changed: 3 additions & 0 deletions
@@ -20,12 +20,15 @@ knowledge [flags]
 * [knowledge askdir](knowledge_askdir.md) - Retrieve sources for a query from a dataset generated from a directory
 * [knowledge create-dataset](knowledge_create-dataset.md) - Create a new dataset
 * [knowledge delete-dataset](knowledge_delete-dataset.md) - Delete a dataset
+* [knowledge delete-file](knowledge_delete-file.md) - Delete a file from a dataset
 * [knowledge edit-dataset](knowledge_edit-dataset.md) - Edit an existing dataset
 * [knowledge export](knowledge_export.md) - Export one or more datasets as an archive (zip)
 * [knowledge get-dataset](knowledge_get-dataset.md) - Get a dataset
+* [knowledge get-file](knowledge_get-file.md) - Get a file from a dataset
 * [knowledge import](knowledge_import.md) - Import one or more datasets from an archive (zip) (default: all datasets)
 * [knowledge ingest](knowledge_ingest.md) - Ingest a file/directory into a dataset
 * [knowledge list-datasets](knowledge_list-datasets.md) - List existing datasets
+* [knowledge load](knowledge_load.md) - Load a file and transform it to markdown
 * [knowledge retrieve](knowledge_retrieve.md) - Retrieve sources for a query from a dataset
 * [knowledge version](knowledge_version.md) -

docs/docs/99-cmd/knowledge_askdir.md

Lines changed: 7 additions & 3 deletions
@@ -15,20 +15,24 @@ knowledge askdir [--path <path>] <query> [flags]
 --auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
 --concurrency int Number of concurrent ingestion processes ($KNOW_INGEST_CONCURRENCY) (default 10)
 -c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
---dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
+--dedupe-func string Name of the deduplication function to use ($KNOW_INGEST_DEDUPE_FUNC)
 --embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
+--err-on-unsupported-file Error on unsupported file types ($KNOW_INGEST_ERR_ON_UNSUPPORTED_FILE)
 --flow string Flow name ($KNOW_FLOW)
---flows-file string Path to a YAML/JSON file containing ingestion/retrieval flows ($KNOW_FLOWS_FILE)
+--flows-file string Path to a YAML/JSON file containing ingestion/retrieval flows ($KNOW_FLOWS_FILE) (default "blueprint:default")
 -h, --help help for askdir
 --ignore-extensions string Comma-separated list of file extensions to ignore ($KNOW_INGEST_IGNORE_EXTENSIONS)
 --ignore-file string Path to a .gitignore style file ($KNOW_INGEST_IGNORE_FILE)
 --include-hidden Include hidden files and directories ($KNOW_INGEST_INCLUDE_HIDDEN)
+--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
+-w, --keyword strings Keywords that retrieved documents must contain ($KNOW_RETRIEVE_KEYWORDS)
 --no-create-dataset Do NOT create the dataset if it doesn't exist ($KNOW_INGEST_NO_CREATE_DATASET)
+--no-prune Do not prune deleted files ($KNOW_ASKDIR_NO_PRUNE)
 --no-recursive Don't recursively ingest directories ($KNOW_NO_INGEST_RECURSIVE)
 -p, --path string Path to the directory to query ($KNOWLEDGE_CLIENT_ASK_DIR_PATH) (default ".")
 --server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
 -k, --top-k int Number of sources to retrieve ($KNOWLEDGE_CLIENT_ASK_DIR_TOP_K) (default 10)
---vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
+--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
 ```
 
 ### SEE ALSO
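
This hunk (and the matching ones in the other command pages) renames two options: `--dsn` (`$KNOW_DB_DSN`) becomes `--index-dsn` (`$KNOW_INDEX_DSN`), and `--vector-dbpath` (`$KNOW_VECTOR_DB_PATH`) becomes `--vector-dsn` (`$KNOW_VECTOR_DSN`), whose default is the old path behind a `chromem:` prefix. Scripts that still export the old variables would need updating. Below is a minimal Go sketch of that translation, assuming the old index value was already a full DSN and the old vector value was a bare chromem-go path. `migrateEnv` is a hypothetical helper, not part of the knowledge CLI, and the diff does not say whether the CLI still reads the old variables.

```go
package main

import (
	"fmt"
	"sort"
)

// migrateEnv returns a copy of env in which the pre-commit variables are mapped
// onto the new ones. Values that are already set are never overwritten.
// Hypothetical helper for updating scripts; not part of the knowledge CLI.
func migrateEnv(env map[string]string) map[string]string {
	out := make(map[string]string, len(env))
	for k, v := range env {
		out[k] = v
	}
	// $KNOW_DB_DSN was already a full DSN (default "sqlite://..."), so it carries over unchanged.
	if _, set := out["KNOW_INDEX_DSN"]; !set {
		if v, ok := env["KNOW_DB_DSN"]; ok {
			out["KNOW_INDEX_DSN"] = v
		}
	}
	// $KNOW_VECTOR_DB_PATH was a bare path; the new default puts the same
	// path behind a "chromem:" prefix.
	if _, set := out["KNOW_VECTOR_DSN"]; !set {
		if p, ok := env["KNOW_VECTOR_DB_PATH"]; ok {
			out["KNOW_VECTOR_DSN"] = "chromem:" + p
		}
	}
	delete(out, "KNOW_DB_DSN")
	delete(out, "KNOW_VECTOR_DB_PATH")
	return out
}

func main() {
	migrated := migrateEnv(map[string]string{
		"KNOW_DB_DSN":         "sqlite:///home/me/knowledge.db",
		"KNOW_VECTOR_DB_PATH": "/home/me/vector.db",
	})
	// Print in sorted order so the output is deterministic.
	keys := make([]string, 0, len(migrated))
	for k := range migrated {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("export %s=%q\n", k, migrated[k])
	}
}
```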

docs/docs/99-cmd/knowledge_create-dataset.md

Lines changed: 2 additions & 2 deletions
@@ -14,11 +14,11 @@ knowledge create-dataset <dataset-id> [flags]
 ```
 --auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
 -c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
---dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
 --embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
 -h, --help help for create-dataset
+--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
 --server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
---vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
+--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
 ```
 
 ### SEE ALSO

docs/docs/99-cmd/knowledge_delete-dataset.md

Lines changed: 2 additions & 2 deletions
@@ -14,11 +14,11 @@ knowledge delete-dataset <dataset-id> [flags]
 ```
 --auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
 -c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
---dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
 --embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
 -h, --help help for delete-dataset
+--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
 --server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
---vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
+--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
 ```
 
 ### SEE ALSO
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+---
+title: "knowledge delete-file"
+---
+## knowledge delete-file
+
+Delete a file from a dataset
+
+```
+knowledge delete-file <file-id|file-abs-path> [flags]
+```
+
+### Options
+
+```
+--auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
+-c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
+-d, --dataset string Target Dataset ID ($KNOWLEDGE_CLIENT_DELETE_FILE_DATASET) (default "default")
+--embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
+-h, --help help for delete-file
+--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
+--server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
+--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
+```
+
+### SEE ALSO
+
+* [knowledge](knowledge.md) -
+
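
The new `delete-file` command takes either a file ID or an absolute file path as its argument. The diff does not show how the CLI tells the two apart. One plausible approach, sketched here in Go under that assumption, is to treat any absolute path as a path and everything else as an ID; `fileRef` and `parseFileRef` are hypothetical names, not the CLI's own.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// fileRef is a hypothetical representation of the <file-id|file-abs-path>
// argument: exactly one of ID or AbsPath is set.
type fileRef struct {
	ID      string
	AbsPath string
}

// parseFileRef treats absolute paths as paths (cleaned with filepath.Clean)
// and anything else as a file ID. Relative paths are NOT resolved here; a
// caller wanting that would pass them through filepath.Abs first.
func parseFileRef(arg string) fileRef {
	if filepath.IsAbs(arg) {
		return fileRef{AbsPath: filepath.Clean(arg)}
	}
	return fileRef{ID: arg}
}

func main() {
	for _, arg := range []string{"/home/me/docs/../notes.md", "some-file-id"} {
		ref := parseFileRef(arg)
		if ref.AbsPath != "" {
			fmt.Println("delete by path:", ref.AbsPath)
		} else {
			fmt.Println("delete by ID:", ref.ID)
		}
	}
}
```

Requiring an absolute path, as the usage string does, keeps the lookup independent of the directory the command is run from.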

docs/docs/99-cmd/knowledge_edit-dataset.md

Lines changed: 2 additions & 2 deletions
@@ -14,14 +14,14 @@ knowledge edit-dataset <dataset-id> [flags]
 ```
 --auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
 -c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
---dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
 --embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
 -h, --help help for edit-dataset
+--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
 --replace-metadata strings replace metadata with key-value pairs (existing metadata will be removed) ($KNOWLEDGE_CLIENT_EDIT_DATASET_REPLACE_METADATA)
 --reset-metadata reset metadata to default (empty) ($KNOWLEDGE_CLIENT_EDIT_DATASET_RESET_METADATA)
 --server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
 --update-metadata strings update metadata key-value pairs (existing metadata will be updated/preserved) ($KNOWLEDGE_CLIENT_EDIT_DATASET_UPDATE_METADATA)
---vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
+--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
 ```
 
 ### SEE ALSO

docs/docs/99-cmd/knowledge_export.md

Lines changed: 2 additions & 2 deletions
@@ -15,12 +15,12 @@ knowledge export <dataset-id> [<dataset-id>...] [flags]
 -a, --all Export all datasets ($KNOWLEDGE_CLIENT_EXPORT_DATASETS_ALL)
 --auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
 -c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
---dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
 --embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
 -h, --help help for export
+--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
 --output string Output path ($KNOWLEDGE_CLIENT_EXPORT_DATASETS_OUTPUT) (default ".")
 --server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
---vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
+--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
 ```
 
 ### SEE ALSO

docs/docs/99-cmd/knowledge_get-dataset.md

Lines changed: 2 additions & 2 deletions
@@ -15,12 +15,12 @@ knowledge get-dataset <dataset-id> [flags]
 --archive string Path to the archive file ($KNOWLEDGE_CLIENT_GET_DATASET_ARCHIVE)
 --auto-migrate string Auto migrate database ($KNOW_DB_AUTO_MIGRATE) (default "true")
 -c, --config-file string Path to the configuration file ($KNOW_CONFIG_FILE)
---dsn string Server database connection string (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_DB_DSN)
 --embedding-model-provider string Embedding model provider ($KNOW_EMBEDDING_MODEL_PROVIDER) (default "openai")
 -h, --help help for get-dataset
+--index-dsn string Index Database Connection string (relational DB) (default "sqlite://$XDG_DATA_HOME/gptscript/knowledge/knowledge.db") ($KNOW_INDEX_DSN)
 --no-docs Do not include documents in output (way less verbose) ($KNOWLEDGE_CLIENT_GET_DATASET_NO_DOCS)
 --server string URL of the Knowledge API Server ($KNOW_SERVER_URL)
---vector-dbpath string VectorDBPath to the vector database (default "$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DB_PATH)
+--vector-dsn string DSN to the vector database (default "chromem:$XDG_DATA_HOME/gptscript/knowledge/vector.db") ($KNOW_VECTOR_DSN)
 ```
 
 ### SEE ALSO
