Commit 74a79fe

Authored by justinrmiller, claude, and prrao87
Fix spelling and grammar errors across docs (#208)
* Fix spelling and grammar errors across 16 docs files: fixes 31 spelling/grammar errors found during a full repo scan, including misspellings (ingsted, clusering, seperately, sercrets, etc.), missing words, wrong articles (an → a before consonants), doubled words, and incorrect verb forms.

* Update landing page image

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: prrao87 <35005448+prrao87@users.noreply.github.com>
1 parent 8d81547 commit 74a79fe

17 files changed

Lines changed: 24 additions & 24 deletions


docs/embedding/quickstart.mdx

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ from lancedb.embeddings import get_registry
 - `lancedb`: The main database connection and operations
 - `LanceModel`: Pydantic model for defining table schemas
 - `Vector`: Field type for storing vector embeddings
-- `get_registry()`: Access to the embedding function registry. It has all the supported as well custom embedding functions registered by the user
+- `get_registry()`: Access to the embedding function registry. It has all the supported as well as custom embedding functions registered by the user
 
 ## Step 2: Connect to LanceDB
 

docs/enterprise/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ ensures complete data sovereignty and high performance at scale.
 ### 2. Best performance for petabyte scale
 
 LanceDB OSS is built on the highly-efficient Lance format and offers extensive features out of the box. Our
-Enterprise solution amplifies these benefits by means of a custom-build distributed system.
+Enterprise solution amplifies these benefits by means of a custom-built distributed system.
 
 | Benefit | Description |
 |:--------|:------------|

docs/geneva/jobs/startup.mdx

Lines changed: 1 addition & 1 deletion
@@ -68,4 +68,4 @@ Here are some steps you can take to pre-warming worker nodes and pods so that ex
 
 **Make a warmup call:** Making an initial request to ray will load the pod and zips content to the worker node so that subsequent startups will be fast.
 
-**Prevent nodes from auto-scaling down:** During cluster creation, you can specifiy `idle_timeout_seconds` option -- this is the amount of time before an node needs to be idle before it is considered for de-provisioning.
+**Prevent nodes from auto-scaling down:** During cluster creation, you can specify `idle_timeout_seconds` option -- this is the amount of time before a node needs to be idle before it is considered for de-provisioning.

docs/geneva/udfs/udfs.mdx

Lines changed: 4 additions & 4 deletions
@@ -83,9 +83,9 @@ def sum_vals(vals: np.ndarray | None) -> int | None:
 
 You can also define a **stateful** UDF that retains its state across calls.
 
-This can be used to share code and **parameterize your UDFs**. In the example below, the model being used is a parameter that can be specified at UDF registration time. It can also be used to paramterize input column names of `pa.RecordBatch` batch UDFS.
+This can be used to share code and **parameterize your UDFs**. In the example below, the model being used is a parameter that can be specified at UDF registration time. It can also be used to parameterize input column names of `pa.RecordBatch` batch UDFS.
 
-This also can be used to **optimize expensive initialization** that may require heavy resource on the distributed workers. For example, this can be used to load an model to the GPU once for all records sent to a worker instead of once per record or per batch of records.
+This also can be used to **optimize expensive initialization** that may require heavy resources on the distributed workers. For example, this can be used to load a model to the GPU once for all records sent to a worker instead of once per record or per batch of records.
 
 A stateful UDF is a `Callable` class, with `__call__()` method. The call method can be a scalar function or a batched function.
 

@@ -260,11 +260,11 @@ Let's say you backfilled data with your UDF then you noticed that your data has
 2. Most values are correct but some values are incorrect due to a failure in UDF execution.
 3. Values calculated correctly and you want to perform a second pass to fixup some of the values.
 
-In scenario 1, you'll most likely want to replaced the UDF with a new version and recalulate all the values. You should perform a `alter_table` and then `backfill`.
+In scenario 1, you'll most likely want to replace the UDF with a new version and recalculate all the values. You should perform a `alter_table` and then `backfill`.
 
 In scenario 2, you'll most likely want to re-execute `backfill` to fill in the values. If the error is in your code (certain cases not handled), you can modify the UDF, and perform an `alter_table`, and then `backfill` with some filters.
 
-In scenario 3, you have a few options. A) You could `alter` your UDF and include the fixup operations in the UDF. You'd `alter_table` and then `backfill` recalculating all the values. B) You could have a chain of computed columns -- create a new column, calculate the "fixed" up values and have your application use the new column or a combination of the original column. This is similar to A but does not recalulate A and can incur more storage. C) You could `update` the values in the the column with the fixed up values. This may be expedient but also sacrifices reproducability.
+In scenario 3, you have a few options. A) You could `alter` your UDF and include the fixup operations in the UDF. You'd `alter_table` and then `backfill` recalculating all the values. B) You could have a chain of computed columns -- create a new column, calculate the "fixed" up values and have your application use the new column or a combination of the original column. This is similar to A but does not recalculate A and can incur more storage. C) You could `update` the values in the column with the fixed up values. This may be expedient but also sacrifices reproducibility.
 
 The next section shows you how to change your column definition by `alter`ing the UDF.
 

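The stateful-UDF pattern that this hunk documents (a `Callable` class whose parameters are fixed at registration time and whose expensive setup runs once per worker) can be sketched in plain Python. This is a hedged illustration: `Embedder` and its lazy model load are hypothetical stand-ins, not Geneva's actual registration API.

```python
# Hedged sketch of a stateful UDF as described in the udfs.mdx hunk.
# `Embedder` and `load` are illustrative names, not Geneva's API.

class Embedder:
    """Stateful UDF: parameterized at registration, heavy init done once per worker."""

    def __init__(self, model_name: str):
        self.model_name = model_name  # parameter chosen at UDF registration time
        self._model = None            # expensive resource, loaded lazily on first call

    def __call__(self, text: str) -> int:
        if self._model is None:
            # Stand-in for loading a real model onto a worker/GPU exactly once.
            self._model = lambda s: len(s)
        return self._model(text)

udf = Embedder("toy-length-model")
results = [udf(t) for t in ["a", "bb", "ccc"]]  # the "model" is loaded only once
```

Because the state lives on the instance, every record routed to the same worker reuses the loaded model instead of paying the initialization cost per record or per batch.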
docs/indexing/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ HNSW builds on k-ANN in two main ways:
 
 This recursive structure can be thought of as separating into layers:
 
-* At the bottom-most layer, an k-ANN graph on the whole dataset is present.
+* At the bottom-most layer, a k-ANN graph on the whole dataset is present.
 * At the second layer, a k-ANN graph on a fraction of the dataset (e.g. 10%) is present.
 * At the Lth layer, a k-ANN graph is present. It is over a (constant) fraction (e.g. 10%) of the vectors/vertices present in the L-1th layer.

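The layered structure in this hunk (each layer keeping a constant fraction of the layer below) can be sketched numerically. This is a minimal illustration of the layer sizes only, not of HNSW graph construction; the 10% fraction follows the example in the text.

```python
# Hedged sketch of the HNSW layer sizes described in the indexing.mdx hunk:
# the bottom layer covers the whole dataset; each higher layer keeps a
# constant fraction (here 10%) of the layer beneath it.

def hnsw_layer_sizes(n_vectors: int, fraction: float = 0.1) -> list[int]:
    sizes = [n_vectors]  # layer 0: k-ANN graph over the whole dataset
    while sizes[-1] > 1:
        nxt = max(1, int(sizes[-1] * fraction))
        sizes.append(nxt)  # layer L: a fraction of layer L-1
        if nxt == 1:
            break
    return sizes

layers = hnsw_layer_sizes(1_000_000)  # sizes shrink geometrically toward the top
```

The geometric shrinkage is why a search can start at a tiny top layer and descend, refining its neighborhood at each level.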
docs/integrations/ai/genkit.mdx

Lines changed: 3 additions & 3 deletions
@@ -41,14 +41,14 @@ This'll add LanceDB as a retriever and indexer to the genkit instance. You can s
 Let's see the raw retrieval results
 
 <img width="1710" alt="Screenshot 2025-05-11 at 7 21 05 PM" src="https://github.com/user-attachments/assets/b8d356ed-8421-4790-8fc0-d6af563b9657" />
-On running this query, you'll 5 results fetched from the lancedb table, where each result looks something like this:
+On running this query, you'll get 5 results fetched from the lancedb table, where each result looks something like this:
 <img width="1417" alt="Screenshot 2025-05-11 at 7 21 18 PM" src="https://github.com/user-attachments/assets/77429525-36e2-4da6-a694-e58c1cf9eb83" />
 
 
 
 ## Creating a custom RAG flow
 
-Now that we've seen how you can use LanceDB for in a genkit pipeline, let's refine the flow and create a RAG. A RAG flow will consist of an index and a retreiver with its outputs postprocessed an fed into an LLM for final response
+Now that we've seen how you can use LanceDB in a Genkit pipeline, let's refine the flow and create a RAG. A RAG flow will consist of an index and a retriever with its outputs postprocessed and fed into an LLM for final response
 
 ### Creating custom indexer flows
 You can also create custom indexer flows, utilizing more options and features provided by LanceDB.

@@ -68,6 +68,6 @@ You can also create custom retriever flows, utilizing more options and features
 <CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
 {TsFrameworksGenkitCustomRetriever}
 </CodeBlock>
-Now using our retrieval flow, we can ask question about the ingsted PDF
+Now using our retrieval flow, we can ask a question about the ingested PDF
 <img width="1306" alt="Screenshot 2025-05-11 at 7 18 45 PM" src="https://github.com/user-attachments/assets/86c66b13-7c12-4d5f-9d81-ae36bfb1c346" />
 

docs/integrations/ai/langchain.mdx

Lines changed: 2 additions & 2 deletions
@@ -97,7 +97,7 @@ This method creates a scalar(for non-vector cols) or a vector index on a table.
 |`index_cache_size`|`Optional[int]` |Size of the index cache.|`None`|
 |`name`|`Optional[str]` |Name of the table to create index on.|`None`|
 
-For index creation make sure your table has enough data in it. An ANN index is ususally not needed for datasets ~100K vectors. For large-scale (>1M) or higher dimension vectors, it is beneficial to create an ANN index.
+For index creation make sure your table has enough data in it. An ANN index is usually not needed for datasets ~100K vectors. For large-scale (>1M) or higher dimension vectors, it is beneficial to create an ANN index.
 
 <CodeBlock filename="Python" language="Python" icon="python">
 {PyFrameworksLangchainCreateIndex}

@@ -209,7 +209,7 @@ Similarly, `max_marginal_relevance_search_by_vector()` function returns docs mos
 
 ##### add_images()
 
-This method ddds images by automatically creating their embeddings and adds them to the vectorstore.
+This method adds images by automatically creating their embeddings and adds them to the vectorstore.
 
 | Name | Type | Purpose | Default |
 |------------|-------------------------------|--------------------------------|---------|

docs/integrations/ai/synthetic-data-kit.mdx

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ description: "Use Meta Llama's Synthetic Data Kit with LanceDB to generate high-
 
 
 
-[Sythetic Data Kit](https://github.com/meta-llama/synthetic-data-kit) is a tool from Meta LLAMA that helps you generate high-quality synthetic datasets for fine-tuning large language models (LLMs). It simplifies the process of preparing data for fine-tuning by providing a command-line interface (CLI) with a modular four-command flow.
+[Synthetic Data Kit](https://github.com/meta-llama/synthetic-data-kit) is a tool from Meta LLAMA that helps you generate high-quality synthetic datasets for fine-tuning large language models (LLMs). It simplifies the process of preparing data for fine-tuning by providing a command-line interface (CLI) with a modular four-command flow.
 
 One of the key features of the `synthetic-data-kit` is its use of the Lance format for storing and ingesting datasets. This allows for efficient storage and retrieval of data, which is crucial when working with large datasets.
 

docs/integrations/data/dlt.mdx

Lines changed: 2 additions & 2 deletions
@@ -40,9 +40,9 @@ In this example, we will be fetching movie information from the [Open Movie Data
 
 3. **Specify necessary credentials and/or embedding model details:**
 
-In order to fetch data from the OMDb API, you will need to pass a valid API key into your pipeline. Depending on whether you're using LanceDB OSS or LanceDB Enterprise, you also may need to provide the necessary credentials to connect to the LanceDB instance. These can be pasted inside `.dlt/sercrets.toml`.
+In order to fetch data from the OMDb API, you will need to pass a valid API key into your pipeline. Depending on whether you're using LanceDB OSS or LanceDB Enterprise, you also may need to provide the necessary credentials to connect to the LanceDB instance. These can be pasted inside `.dlt/secrets.toml`.
 
-dlt's LanceDB integration also allows you to automatically embed the data during ingestion. Depending on the embedding model chosen, you may need to paste the necessary credentials inside `.dlt/sercrets.toml`:
+dlt's LanceDB integration also allows you to automatically embed the data during ingestion. Depending on the embedding model chosen, you may need to paste the necessary credentials inside `.dlt/secrets.toml`:
 ```toml
 [sources.rest_api]
 api_key = "api_key" # Enter the API key for the OMDb API

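A fuller `.dlt/secrets.toml` for this pipeline might look like the sketch below. The section and key names follow dlt's documented LanceDB destination configuration, but treat them as assumptions to verify against the dlt docs for your version; all values are placeholders.

```toml
# Hypothetical .dlt/secrets.toml sketch for the OMDb -> LanceDB pipeline above.
# Key names are assumptions based on dlt's LanceDB destination; values are placeholders.
[sources.rest_api]
api_key = "your-omdb-api-key"

[destination.lancedb.credentials]
uri = ".lancedb"                                        # local OSS path, or a remote db URI
api_key = "your-lancedb-api-key"                        # only needed for LanceDB Cloud/Enterprise
embedding_model_provider_api_key = "your-provider-key"  # only if the chosen embedding model needs one
```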
docs/integrations/embedding/gemini.mdx

Lines changed: 2 additions & 2 deletions
@@ -11,10 +11,10 @@ The Gemini Embedding Model API supports various task types:
 | Task Type | Description |
 |-------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | "`retrieval_query`" | Specifies the given text is a query in a search/retrieval setting. |
-| "`retrieval_document`" | Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title but is automatically proided by Embeddings API |
+| "`retrieval_document`" | Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title but is automatically provided by Embeddings API |
 | "`semantic_similarity`" | Specifies the given text will be used for Semantic Textual Similarity (STS). |
 | "`classification`" | Specifies that the embeddings will be used for classification. |
-| "`clusering`" | Specifies that the embeddings will be used for clustering. |
+| "`clustering`" | Specifies that the embeddings will be used for clustering. |
 
 
 Usage Example:
