Fix spelling and grammar errors across docs (#208)
* Fix spelling and grammar errors across 16 docs files
Fix 31 spelling/grammar errors found during a full repo scan, including
misspellings (ingsted, clusering, seperately, sercrets, etc.), missing
words, wrong articles (an→a before consonants), doubled words, and
incorrect verb forms.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update landing page image
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: prrao87 <35005448+prrao87@users.noreply.github.com>
`docs/geneva/jobs/startup.mdx` — 1 addition, 1 deletion
@@ -68,4 +68,4 @@ Here are some steps you can take to pre-warm worker nodes and pods so that ex
**Make a warmup call:** Making an initial request to Ray will load the pod and zip contents onto the worker node so that subsequent startups will be fast.

- **Prevent nodes from auto-scaling down:** During cluster creation, you can specifiy`idle_timeout_seconds` option -- this is the amount of time before an node needs to be idle before it is considered for de-provisioning.
+ **Prevent nodes from auto-scaling down:** During cluster creation, you can specify the `idle_timeout_seconds` option -- this is the amount of time a node needs to be idle before it is considered for de-provisioning.
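To make the scale-down rule concrete, here is a minimal sketch of the idle-timeout behavior described above; the function name and default value are illustrative, not part of the cluster API.

```python
# Minimal sketch of the idle-timeout rule: a node becomes a candidate for
# de-provisioning only once it has been idle for at least
# idle_timeout_seconds. Names and the default value are illustrative.
def should_deprovision(idle_seconds: float, idle_timeout_seconds: float = 300.0) -> bool:
    return idle_seconds >= idle_timeout_seconds

print(should_deprovision(120.0))   # still warm: False
print(should_deprovision(900.0))   # eligible for scale-down: True
```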
You can also define a **stateful** UDF that retains its state across calls.
- This can be used to share code and **parameterize your UDFs**. In the example below, the model being used is a parameter that can be specified at UDF registration time. It can also be used to paramterize input column names of `pa.RecordBatch` batch UDFS.
+ This can be used to share code and **parameterize your UDFs**. In the example below, the model being used is a parameter that can be specified at UDF registration time. It can also be used to parameterize input column names of `pa.RecordBatch` batch UDFs.
- This also can be used to **optimize expensive initialization** that may require heavy resource on the distributed workers. For example, this can be used to load an model to the GPU once for all records sent to a worker instead of once per record or per batch of records.
+ This can also be used to **optimize expensive initialization** that may require heavy resources on the distributed workers. For example, a model can be loaded to the GPU once for all records sent to a worker instead of once per record or per batch of records.
A stateful UDF is a `Callable` class with a `__call__()` method. The call method can be a scalar function or a batched function.
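As a rough illustration of such a stateful UDF, here is a sketch in plain Python. The class and parameter names are hypothetical, and a real UDF would do its heavy setup (e.g. loading a model to the GPU) in `__init__`:

```python
# Sketch of a stateful UDF: a Callable class. __init__ runs once per worker
# (the place for expensive setup such as loading a model), while __call__
# handles each record. Class and parameter names are illustrative.
class EmbedText:
    def __init__(self, model_name: str = "demo-model"):
        self.model_name = model_name  # parameterized at registration time
        self.calls = 0                # state retained across calls

    def __call__(self, text: str) -> str:
        self.calls += 1
        return f"{self.model_name}:{text.lower()}"

udf = EmbedText(model_name="my-model")
print(udf("Hello"))  # my-model:hello
print(udf.calls)     # 1
```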
@@ -260,11 +260,11 @@ Let's say you backfilled data with your UDF and then noticed that your data has
2. Most values are correct, but some values are incorrect due to a failure in UDF execution.
3. Values were calculated correctly, and you want to perform a second pass to fix up some of the values.
- In scenario 1, you'll most likely want to replaced the UDF with a new version and recalulate all the values. You should perform a `alter_table` and then `backfill`.
+ In scenario 1, you'll most likely want to replace the UDF with a new version and recalculate all the values. You should perform an `alter_table` and then `backfill`.
In scenario 2, you'll most likely want to re-execute `backfill` to fill in the values. If the error is in your code (certain cases not handled), you can modify the UDF, perform an `alter_table`, and then `backfill` with some filters.
- In scenario 3, you have a few options. A) You could `alter` your UDF and include the fixup operations in the UDF. You'd `alter_table` and then `backfill` recalculating all the values. B) You could have a chain of computed columns -- create a new column, calculate the "fixed" up values and have your application use the new column or a combination of the original column. This is similar to A but does not recalulate A and can incur more storage. C) You could `update` the values in the the column with the fixed up values. This may be expedient but also sacrifices reproducability.
+ In scenario 3, you have a few options. A) You could `alter` your UDF and include the fixup operations in the UDF. You'd `alter_table` and then `backfill`, recalculating all the values. B) You could have a chain of computed columns -- create a new column, calculate the fixed-up values, and have your application use the new column or a combination of it and the original column. This is similar to A but does not recalculate A, and it can incur more storage. C) You could `update` the values in the column with the fixed-up values. This may be expedient but also sacrifices reproducibility.
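Scenarios 1 and 2 can be sketched with a toy model; `Table`, `alter_table`, and `backfill` below are stand-ins for illustration, not Geneva's actual API:

```python
# Toy model of the backfill scenarios above (not Geneva's actual API):
# alter_table swaps in a new UDF version; backfill recomputes values,
# optionally restricted by a filter.
class Table:
    def __init__(self, rows):
        self.rows = rows      # input records
        self.udf = None       # current UDF for the computed column
        self.computed = {}    # row index -> computed value

    def alter_table(self, udf):
        self.udf = udf        # register the new UDF version

    def backfill(self, filter=None):
        for i, row in enumerate(self.rows):
            if filter is None or filter(i, self.computed.get(i)):
                self.computed[i] = self.udf(row)

tbl = Table(["a", "b", "c"])
tbl.alter_table(str.upper)
tbl.backfill()                               # scenario 1: recalculate everything
print(tbl.computed)                          # {0: 'A', 1: 'B', 2: 'C'}

tbl.computed[1] = None                       # simulate one failed value
tbl.backfill(filter=lambda i, v: v is None)  # scenario 2: fill only the gaps
print(tbl.computed[1])                       # B
```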
The next section shows you how to change your column definition by `alter`ing the UDF.
`docs/integrations/ai/genkit.mdx` — 3 additions, 3 deletions
@@ -41,14 +41,14 @@ This'll add LanceDB as a retriever and indexer to the genkit instance. You can s
Let's see the raw retrieval results:

<img width="1710" alt="Screenshot 2025-05-11 at 7 21 05 PM" src="https://github.com/user-attachments/assets/b8d356ed-8421-4790-8fc0-d6af563b9657" />
- On running this query, you'll 5 results fetched from the lancedb table, where each result looks something like this:
+ On running this query, you'll get 5 results fetched from the LanceDB table, where each result looks something like this:

<img width="1417" alt="Screenshot 2025-05-11 at 7 21 18 PM" src="https://github.com/user-attachments/assets/77429525-36e2-4da6-a694-e58c1cf9eb83" />
## Creating a custom RAG flow

- Now that we've seen how you can use LanceDB for in a genkit pipeline, let's refine the flow and create a RAG. A RAG flow will consist of an index and a retreiver with its outputs postprocessed an fed into an LLM for final response
+ Now that we've seen how you can use LanceDB in a Genkit pipeline, let's refine the flow and create a RAG. A RAG flow will consist of an indexer and a retriever, with its outputs postprocessed and fed into an LLM for the final response.
### Creating custom indexer flows

You can also create custom indexer flows, utilizing more options and features provided by LanceDB.
@@ -68,6 +68,6 @@ You can also create custom retriever flows, utilizing more options and features
`docs/integrations/ai/langchain.mdx` — 2 additions, 2 deletions
@@ -97,7 +97,7 @@ This method creates a scalar (for non-vector columns) or a vector index on a table.
| `index_cache_size` | `Optional[int]` | Size of the index cache. | `None` |
| `name` | `Optional[str]` | Name of the table to create the index on. | `None` |

- For index creation make sure your table has enough data in it. An ANN index is ususally not needed for datasets ~100K vectors. For large-scale (>1M) or higher dimension vectors, it is beneficial to create an ANN index.
+ For index creation, make sure your table has enough data in it. An ANN index is usually not needed for datasets of ~100K vectors. For large-scale (>1M) or higher-dimension vectors, it is beneficial to create an ANN index.
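The sizing guidance above reduces to a rough rule of thumb; the threshold below is an assumption drawn from the text, not a LanceDB API:

```python
# Rough rule of thumb from the text above: brute-force (flat) search is
# usually fine up to roughly 100K vectors; consider an ANN index as the
# table grows toward millions of rows. The threshold is illustrative.
def needs_ann_index(num_vectors: int, threshold: int = 100_000) -> bool:
    return num_vectors > threshold

print(needs_ann_index(50_000))     # False
print(needs_ann_index(2_000_000))  # True
```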
`docs/integrations/ai/synthetic-data-kit.mdx` — 1 addition, 1 deletion
@@ -7,7 +7,7 @@ description: "Use Meta Llama's Synthetic Data Kit with LanceDB to generate high-
- [Sythetic Data Kit](https://github.com/meta-llama/synthetic-data-kit) is a tool from Meta LLAMA that helps you generate high-quality synthetic datasets for fine-tuning large language models (LLMs). It simplifies the process of preparing data for fine-tuning by providing a command-line interface (CLI) with a modular four-command flow.
+ [Synthetic Data Kit](https://github.com/meta-llama/synthetic-data-kit) is a tool from Meta Llama that helps you generate high-quality synthetic datasets for fine-tuning large language models (LLMs). It simplifies the process of preparing data for fine-tuning by providing a command-line interface (CLI) with a modular four-command flow.
One of the key features of the `synthetic-data-kit` is its use of the Lance format for storing and ingesting datasets. This allows for efficient storage and retrieval of data, which is crucial when working with large datasets.
`docs/integrations/data/dlt.mdx` — 2 additions, 2 deletions
@@ -40,9 +40,9 @@ In this example, we will be fetching movie information from the [Open Movie Data
3. **Specify necessary credentials and/or embedding model details:**

- In order to fetch data from the OMDb API, you will need to pass a valid API key into your pipeline. Depending on whether you're using LanceDB OSS or LanceDB Enterprise, you also may need to provide the necessary credentials to connect to the LanceDB instance. These can be pasted inside `.dlt/sercrets.toml`.
+ In order to fetch data from the OMDb API, you will need to pass a valid API key into your pipeline. Depending on whether you're using LanceDB OSS or LanceDB Enterprise, you may also need to provide the necessary credentials to connect to the LanceDB instance. These can be pasted inside `.dlt/secrets.toml`.
- dlt's LanceDB integration also allows you to automatically embed the data during ingestion. Depending on the embedding model chosen, you may need to paste the necessary credentials inside `.dlt/sercrets.toml`:
+ dlt's LanceDB integration also allows you to automatically embed the data during ingestion. Depending on the embedding model chosen, you may need to paste the necessary credentials inside `.dlt/secrets.toml`:
```toml
[sources.rest_api]
api_key = "api_key"  # Enter the API key for the OMDb API
```
| "`retrieval_query`" | Specifies the given text is a query in a search/retrieval setting. |
- | "`retrieval_document`" | Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title but is automatically proided by Embeddings API |
+ | "`retrieval_document`" | Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title, but one is automatically provided by the Embeddings API. |
| "`semantic_similarity`" | Specifies the given text will be used for Semantic Textual Similarity (STS). |
| "`classification`" | Specifies that the embeddings will be used for classification. |
- | "`clusering`" | Specifies that the embeddings will be used for clustering. |
+ | "`clustering`" | Specifies that the embeddings will be used for clustering. |
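The task types in the table form a small closed set, so a caller can validate one before making an embedding request; the helper below is illustrative only, not part of the Embeddings API:

```python
# Task types from the table above. Validating before the API call gives a
# clearer error than a failed request; this helper is illustrative only.
VALID_TASK_TYPES = {
    "retrieval_query",
    "retrieval_document",
    "semantic_similarity",
    "classification",
    "clustering",
}

def check_task_type(task_type: str) -> str:
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    return task_type

print(check_task_type("clustering"))  # clustering
```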