Commit 8ed60a3

docs: Add API doc links to Geneva docs (#203)
1 parent 74a79fe commit 8ed60a3

20 files changed

Lines changed: 103 additions & 1 deletion

docs/geneva/index.mdx

Lines changed: 7 additions & 0 deletions
@@ -56,3 +56,10 @@ Visit the following pages to learn more about featuring engineering in LanceDB E
 - **UDFs**: [Using UDFs](/geneva/udfs/udfs) · [Blob helpers](/geneva/udfs/blobs/) · [Error handling](/geneva/udfs/error_handling) · [Advanced configuration](/geneva/udfs/advanced-configuration)
 - **Jobs**: [Backfilling](/geneva/jobs/backfilling/) · [Startup optimizations](/geneva/jobs/startup/) · [Materialized views](/geneva/jobs/materialized-views/) · [Execution contexts](/geneva/jobs/contexts/) · [Geneva console](/geneva/jobs/console) · [Performance](/geneva/jobs/performance/)
 - **Deployment**: [Deployment overview](/geneva/deployment/) · [Helm deployment](/geneva/deployment/helm/) · [Troubleshooting](/geneva/deployment/troubleshooting/)
+
+## API Reference
+
+- [`geneva.connect()`](https://lancedb.github.io/geneva/api/) — connect to a Geneva database
+- [Connection](https://lancedb.github.io/geneva/api/connection/) — manage tables, views, jobs, clusters, and manifests
+- [Table](https://lancedb.github.io/geneva/api/table/) — add columns, backfill, search, and manage table data
+- [UDF](https://lancedb.github.io/geneva/api/udf/) — define user-defined functions for feature computation

docs/geneva/jobs/backfilling.mdx

Lines changed: 2 additions & 0 deletions
@@ -125,3 +125,5 @@ tbl.backfill("embedding", where="content is not null and embeddding is not null"
 Reference:
 * [`backfill` API](https://lancedb.github.io/geneva/api/table/#geneva.table.Table.backfill)
 * [`backfill_async` API](https://lancedb.github.io/geneva/api/table/#geneva.table.Table.backfill_async)
+* [`plan_backfill` API](https://lancedb.github.io/geneva/api/table/#geneva.table.Table.plan_backfill)
+* [UDF](https://lancedb.github.io/geneva/api/udf/): `@udf` decorator and UDF configuration options including `checkpoint_size`

docs/geneva/jobs/conflicts.mdx

Lines changed: 4 additions & 0 deletions
@@ -117,3 +117,7 @@ If using LanceDB Enterprise which has auto-compaction enabled, consider disablin

 - [Backfilling](/geneva/jobs/backfilling) - Triggering and configuring backfill operations
 - [Advanced Configuration](/geneva/udfs/advanced-configuration) - Environment variables for retry behavior
+
+## API Reference
+
+- [Table](https://lancedb.github.io/geneva/api/table/): `backfill()`, `add()`, `merge_insert()`, and other table mutation methods

docs/geneva/jobs/console.mdx

Lines changed: 6 additions & 0 deletions
@@ -44,3 +44,9 @@ See the Geneva clusters that you have defined to run jobs. Because clusters can

 ### Manifests
 See the Manifests you've defined and what packages/dependencies they contain. As with clusters, manifests are reusable, so it's easy to start a new job with the same dependencies as an old one by just specifying the manifest name.
+
+## API Reference
+
+- [Connection](https://lancedb.github.io/geneva/api/connection/): `get_job()`, `list_jobs()`, `list_clusters()`, `list_manifests()`
+- [Cluster](https://lancedb.github.io/geneva/api/cluster/): `GenevaCluster` and cluster configuration classes
+- [Manifest](https://lancedb.github.io/geneva/api/manifest/): `GenevaManifest` and manifest builder classes

docs/geneva/jobs/contexts.mdx

Lines changed: 6 additions & 0 deletions
@@ -296,3 +296,9 @@ ctx.__enter__()
 ctx.__exit__(None,None,None)
 ```
 </CodeGroup>
+
+## API Reference
+
+- [Connection](https://lancedb.github.io/geneva/api/connection/): `context()`, `local_ray_context()`, `list_clusters()`, `list_manifests()`
+- [Cluster](https://lancedb.github.io/geneva/api/cluster/): `KubeRayClusterBuilder`, `LocalRayClusterBuilder`, `ExternalRayClusterBuilder`, and worker configuration classes
+- [Manifest](https://lancedb.github.io/geneva/api/manifest/): `PipManifestBuilder`, `CondaManifestBuilder`, `SiteManifestBuilder`

docs/geneva/jobs/index.mdx

Lines changed: 7 additions & 0 deletions
@@ -41,3 +41,10 @@ Set up and access the Geneva Console for monitoring and managing Geneva jobs, cl
 4. **Monitor performance** and optimize based on usage patterns

 For detailed information about each job type and execution context, explore the documentation in this section.
+
+## API Reference
+
+- [Table](https://lancedb.github.io/geneva/api/table/): `backfill()`, `backfill_async()`, `refresh()`, `plan_backfill()`, `plan_refresh()`, and other job-triggering methods
+- [Connection](https://lancedb.github.io/geneva/api/connection/): `get_job()`, `list_jobs()`, `context()`, and `local_ray_context()`
+- [Cluster](https://lancedb.github.io/geneva/api/cluster/) — configure KubeRay and local Ray execution backends
+- [Manifest](https://lancedb.github.io/geneva/api/manifest/) — package Python environments for remote workers

docs/geneva/jobs/lifecycle.mdx

Lines changed: 5 additions & 0 deletions
@@ -119,3 +119,8 @@ By default, checkpoints are stored in a `_ckp/` subdirectory inside the table's
 ### Resuming Failed Jobs

 To resume a failed job, simply re-run the same backfill or refresh command. The job will automatically detect existing checkpoints, skip already-processed fragments, and continue from where it left off.
+
+## API Reference
+
+- [Table](https://lancedb.github.io/geneva/api/table/): `backfill()`, `backfill_async()`, `refresh()`, `JobFuture`
+- [Connection](https://lancedb.github.io/geneva/api/connection/): `get_job()`, `list_jobs()`

docs/geneva/jobs/materialized-views.mdx

Lines changed: 6 additions & 0 deletions
@@ -106,3 +106,9 @@ No. The UDF does not but any UDF calculated values in the original table come to
 ### On MV refresh, do we force materialization of UDFs cols on the source table?

 No. They are managed at the source table only. If it is null the null values are propagated. Future options may force materialization/backfill "recursively".
+
+## API Reference
+
+- [Connection](https://lancedb.github.io/geneva/api/connection/): `create_materialized_view()`, `create_view()`
+- [Table](https://lancedb.github.io/geneva/api/table/): `refresh()`, `plan_refresh()`
+- [Query](https://lancedb.github.io/geneva/api/query/): `create_materialized_view()` on query builder

docs/geneva/jobs/performance.mdx

Lines changed: 6 additions & 0 deletions
@@ -158,3 +158,9 @@ Certain jobs that take a small dataset and expand it may appear as if the writer
 An example is a table that contains a list of URLs pointing to large media files. This list is relatively small (&lt; 100MB) and can fit into a single fragment. A UDF that downloads will fetch all the data and then attempt to write all of it out through the single writer. This single writer can then be responsible for serially writing out 500+GB of data to a single file!

 To mitigate this, you can load your initial table so that there will be multiple fragments. Each fragment with new outputs can be written in parallel with higher write throughput.
+
+## API Reference
+
+- [Cluster](https://lancedb.github.io/geneva/api/cluster/): `KubeRayClusterBuilder`, `CpuWorkerBuilder`, `GpuWorkerBuilder` — configure CPU/GPU/memory resources
+- [Table](https://lancedb.github.io/geneva/api/table/): `backfill()`, `compact_files()`, `optimize()`
+- [UDF](https://lancedb.github.io/geneva/api/udf/): `@udf` decorator options: `num_cpus`, `num_gpus`, `memory`, `batch_size`

docs/geneva/jobs/troubleshooting.mdx

Lines changed: 7 additions & 0 deletions
@@ -180,3 +180,10 @@ Refreshing a materialized view runs admission control for each UDF in the view.
 - **Versions** – Same Ray version on client and cluster; same Python minor (e.g. 3.10.x) on both. See [Troubleshooting Geneva Deployments](/geneva/deployment/troubleshooting#confirming-dependency-versions).
 - **Remote execution** – Use `ray.available_resources()` and a simple `@ray.remote` task to confirm the cluster is reachable and has the expected CPUs/GPUs/memory.
 - **Permissions** – Run a minimal remote task that `import geneva` and touches the same bucket/path as your job to surface permission issues early.
+
+## API Reference
+
+- [UDF](https://lancedb.github.io/geneva/api/udf/): `@udf` decorator options: `num_gpus`, `num_cpus`, `memory`
+- [Cluster](https://lancedb.github.io/geneva/api/cluster/): `KubeRayClusterBuilder`, `CpuWorkerBuilder`, `GpuWorkerBuilder`
+- [Table](https://lancedb.github.io/geneva/api/table/): `backfill()`, `refresh()`
+- [Error Handling](https://lancedb.github.io/geneva/api/error_handling/): `FatalWorkerOOMError`, `FatalWorkerCrashError`, and other worker error types
