Commit 1ea93e1

docs: refactor format specification structure
1 parent 6aafaee commit 1ea93e1

36 files changed

Lines changed: 242 additions & 490 deletions

docs/clean-full-website.sh

Lines changed: 155 additions & 1 deletion
@@ -8,6 +8,8 @@ docs_src="$script_dir/src"
 rm -rf "$docs_src/format/catalog"
 rm -rf "$docs_src/format/namespace"
 rm -f "$docs_src/format/layout.png"
+rm -f "$docs_src/format/overview.png"
+rm -f "$docs_src/format/java-sdk-example.png"
 rm -rf "$docs_src/integrations/huggingface"
 rm -rf "$docs_src/integrations/duckdb"
 rm -rf "$docs_src/integrations/spark"
@@ -21,4 +23,156 @@ rm -f "$docs_src/community/project-specific/ray.md"
 rm -f "$docs_src/community/project-specific/spark.md"
 rm -f "$docs_src/community/project-specific/trino.md"
 
-git -C "$repo_root" restore --source=HEAD --worktree docs/src/format/.pages docs/src/integrations/.pages 2>/dev/null || true
+cat > "$docs_src/format/.pages" <<'EOF'
+nav:
+- Overview: index.md
+- File Format: file
+- Table Format: table
+- Index Formats: index
+- Catalog Specs: catalog
+- Namespace Client Spec: namespace
+EOF
+
+cat > "$docs_src/integrations/.pages" <<'EOF'
+nav:
+- Apache DataFusion: datafusion.md
+- PostgreSQL: https://github.com/lancedb/pglance
+- PyTorch: pytorch.md
+- Tensorflow: tensorflow.md
+EOF
+
+mkdir -p "$docs_src/format/catalog/dir"
+mkdir -p "$docs_src/format/catalog/rest"
+mkdir -p "$docs_src/format/namespace/operations/models"
+mkdir -p "$docs_src/format/namespace/supported-catalogs"
+
+cat > "$docs_src/format/catalog/.pages" <<'EOF'
+title: Catalog Specs
+nav:
+- Overview: index.md
+- Directory Catalog: dir
+- REST Catalog: rest
+EOF
+
+cat > "$docs_src/format/catalog/index.md" <<'EOF'
+# Catalog Specs
+
+This section describes how Lance catalogs organize, discover, and coordinate Lance tables.
+
+When a local `lance-namespace` checkout with split catalog docs is available, `docs/make-full-website.sh` replaces these placeholders with the latest source content.
+
+See also:
+
+- [Directory Catalog](dir/index.md)
+- [REST Catalog](rest/index.md)
+- [Namespace Client Spec](../namespace/index.md)
+EOF
+
+cat > "$docs_src/format/catalog/dir/index.md" <<'EOF'
+# Directory Catalog
+
+The Directory Catalog is the storage-native catalog format for Lance.
+
+This placeholder page keeps the local website buildable when external catalog docs are not available.
+Run `docs/make-full-website.sh` with `LANCE_NAMESPACE_REPO` pointing at a split `lance-namespace` checkout to populate the full specification.
+EOF
+
+cat > "$docs_src/format/catalog/rest/index.md" <<'EOF'
+# REST Catalog
+
+The REST Catalog is the service-oriented catalog specification for Lance.
+
+This placeholder page keeps the local website buildable when external catalog docs are not available.
+Run `docs/make-full-website.sh` with `LANCE_NAMESPACE_REPO` pointing at a split `lance-namespace` checkout to populate the full specification.
+EOF
+
+cat > "$docs_src/format/namespace/.pages" <<'EOF'
+title: Namespace Client Spec
+nav:
+- Overview: index.md
+- Objects & Relationships: object-relationship.md
+- Operations: operations
+- Supported Catalogs: supported-catalogs
+EOF
+
+cat > "$docs_src/format/namespace/index.md" <<'EOF'
+# Namespace Client Spec
+
+The Lance Namespace Client Spec defines the interface that engines and tools use to discover tables, resolve locations, and coordinate table operations through catalogs.
+
+When a local `lance-namespace` checkout with split namespace docs is available, `docs/make-full-website.sh` replaces these placeholders with the latest source content.
+
+See also:
+
+- [Objects & Relationships](object-relationship.md)
+- [Operations](operations/index.md)
+- [Supported Catalogs](supported-catalogs/index.md)
+EOF
+
+cat > "$docs_src/format/namespace/object-relationship.md" <<'EOF'
+# Objects & Relationships
+
+This placeholder page keeps the local website buildable when external namespace docs are not available.
+
+Run `docs/make-full-website.sh` with `LANCE_NAMESPACE_REPO` pointing at a split `lance-namespace` checkout to populate the full object model description.
+EOF
+
+cat > "$docs_src/format/namespace/operations/.pages" <<'EOF'
+title: Operations
+nav:
+- Overview: index.md
+- Models: models
+EOF
+
+cat > "$docs_src/format/namespace/operations/index.md" <<'EOF'
+# Operations
+
+This placeholder page keeps the local website buildable when external namespace docs are not available.
+
+Run `docs/make-full-website.sh` with `LANCE_NAMESPACE_REPO` pointing at a split `lance-namespace` checkout to populate the operation reference.
+EOF
+
+cat > "$docs_src/format/namespace/operations/models/.pages" <<'EOF'
+title: Models
+EOF
+
+cat > "$docs_src/format/namespace/operations/models/index.md" <<'EOF'
+# Operation Models
+
+This placeholder page keeps the local website buildable when external namespace docs are not available.
+EOF
+
+cat > "$docs_src/format/namespace/supported-catalogs/.pages" <<'EOF'
+title: Supported Catalogs
+nav:
+- Overview: index.md
+- Lance Directory Catalog: lance-dir.md
+- Lance REST Catalog: lance-rest.md
+- Template: template.md
+EOF
+
+cat > "$docs_src/format/namespace/supported-catalogs/index.md" <<'EOF'
+# Supported Catalogs
+
+This placeholder page keeps the local website buildable when external namespace docs are not available.
+
+Run `docs/make-full-website.sh` with `LANCE_NAMESPACE_REPO` and `LANCE_NAMESPACE_IMPLS_REPO` set to local checkouts to populate the full integration catalog list.
+EOF
+
+cat > "$docs_src/format/namespace/supported-catalogs/lance-dir.md" <<'EOF'
+# Lance Directory Catalog
+
+This placeholder page keeps the local website buildable when external namespace docs are not available.
+EOF
+
+cat > "$docs_src/format/namespace/supported-catalogs/lance-rest.md" <<'EOF'
+# Lance REST Catalog
+
+This placeholder page keeps the local website buildable when external namespace docs are not available.
+EOF
+
+cat > "$docs_src/format/namespace/supported-catalogs/template.md" <<'EOF'
+# Template
+
+This placeholder page keeps the local website buildable when external namespace docs are not available.
+EOF
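
Note on the hunk above: each placeholder page is written with the same shape, a `mkdir -p` for the parent directory followed by a `cat >` heredoc containing a title and a notice. The sketch below condenses that pattern for illustration only. The `write_placeholder` helper and the `mktemp` scratch directory are assumptions and do not appear in the commit, which spells out every heredoc explicitly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory standing in for "$docs_src"; not part of the commit.
docs_src=$(mktemp -d)

# write_placeholder is an illustrative helper, not a function from the script.
# It creates the parent directory, then writes a title and the placeholder notice.
write_placeholder() {
  local target="$1"
  local title="$2"

  mkdir -p "$(dirname "$target")"
  # Unquoted EOF so that $title is expanded inside the heredoc.
  cat > "$target" <<EOF
# $title

This placeholder page keeps the local website buildable when external namespace docs are not available.
EOF
}

write_placeholder "$docs_src/format/namespace/supported-catalogs/lance-dir.md" "Lance Directory Catalog"
write_placeholder "$docs_src/format/namespace/supported-catalogs/lance-rest.md" "Lance REST Catalog"

# Show the generated title line of one page.
head -n 1 "$docs_src/format/namespace/supported-catalogs/lance-dir.md"
```

A helper like this would shorten the script, at the cost of making each generated page harder to read at a glance in the diff; the commit's explicit heredocs keep every page's content visible in place.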

docs/make-full-website.sh

Lines changed: 51 additions & 60 deletions
@@ -67,6 +67,7 @@ copy_docs_dir() {
 
   if [ -d "$source_dir" ]; then
     mkdir -p "$(dirname "$target_dir")"
+    rm -rf "$target_dir"
     cp -R "$source_dir" "$target_dir"
     return 0
   fi
@@ -87,21 +88,7 @@ copy_file_if_exists() {
   return 1
 }
 
-resolve_required_repo_dir() {
-  local repo_label="$1"
-  local repo_path="$2"
-
-  repo_path=$(normalize_path "$repo_path")
-
-  if [ ! -d "$repo_path" ]; then
-    echo "Expected $repo_label repo at '$repo_path'. Override it with the matching --*-repo option or environment variable." >&2
-    exit 1
-  fi
-
-  (cd -- "$repo_path" && pwd)
-}
-
-resolve_optional_repo_dir() {
+resolve_repo_dir() {
   local repo_path="$1"
 
   repo_path=$(normalize_path "$repo_path")
@@ -114,59 +101,38 @@ resolve_optional_repo_dir() {
   printf '%s\n' "$repo_path"
 }
 
-require_dir() {
-  local required_dir="$1"
-  local message="$2"
+warn_missing_repo() {
+  local repo_label="$1"
+  local repo_path="$2"
 
-  if [ ! -d "$required_dir" ]; then
-    echo "$message" >&2
-    exit 1
-  fi
+  echo "Warning: $repo_label repo not found at '$repo_path'; keeping placeholder docs." >&2
 }
 
-namespace_repo=$(resolve_required_repo_dir "Lance Namespace" "$namespace_repo_input")
-namespace_impls_repo=$(resolve_required_repo_dir "Lance Namespace Impls" "$namespace_impls_repo_input")
-spark_repo=$(resolve_optional_repo_dir "$spark_repo_input")
-ray_repo=$(resolve_optional_repo_dir "$ray_repo_input")
-trino_repo=$(resolve_optional_repo_dir "$trino_repo_input")
-duckdb_repo=$(resolve_optional_repo_dir "$duckdb_repo_input")
-huggingface_repo=$(resolve_optional_repo_dir "$huggingface_repo_input")
+namespace_repo=$(resolve_repo_dir "$namespace_repo_input")
+namespace_impls_repo=$(resolve_repo_dir "$namespace_impls_repo_input")
+spark_repo=$(resolve_repo_dir "$spark_repo_input")
+ray_repo=$(resolve_repo_dir "$ray_repo_input")
+trino_repo=$(resolve_repo_dir "$trino_repo_input")
+duckdb_repo=$(resolve_repo_dir "$duckdb_repo_input")
+huggingface_repo=$(resolve_repo_dir "$huggingface_repo_input")
 
 "$script_dir/clean-full-website.sh"
 
-require_dir \
-  "$namespace_repo/docs/src/catalog" \
-  "Expected catalog docs at '$namespace_repo/docs/src/catalog'. Use a Lance Namespace checkout that contains split catalog docs, or override LANCE_NAMESPACE_REPO to point at one."
-require_dir \
-  "$namespace_repo/docs/src/namespace" \
-  "Expected namespace docs at '$namespace_repo/docs/src/namespace'. Use a Lance Namespace checkout that contains split namespace docs, or override LANCE_NAMESPACE_REPO to point at one."
-require_dir \
-  "$namespace_impls_repo/docs/src" \
-  "Expected namespace implementation docs at '$namespace_impls_repo/docs/src'. Override LANCE_NAMESPACE_IMPLS_REPO if needed."
-
-for optional_repo in \
-  "$spark_repo" \
-  "$ray_repo" \
-  "$trino_repo" \
-  "$duckdb_repo" \
-  "$huggingface_repo"; do
-  if [ ! -d "$optional_repo" ]; then
-    echo "Note: optional repo '$optional_repo' not found; skipping its docs." >&2
-  fi
-done
-
-copy_docs_dir "$namespace_repo/docs/src/catalog" "$docs_src/format/catalog"
-copy_docs_dir "$namespace_repo/docs/src/namespace" "$docs_src/format/namespace"
+if copy_docs_dir "$namespace_repo/docs/src/catalog" "$docs_src/format/catalog"; then
+  :
+else
+  warn_missing_repo "Lance Namespace catalog docs" "$namespace_repo/docs/src/catalog"
+fi
 
-cat > "$docs_src/format/.pages" <<'EOF'
-nav:
-- Overview: index.md
-- File Format: file
-- Table Format: table
-- Catalog Specs: catalog
-- Namespace Client Spec: namespace
-EOF
+if copy_docs_dir "$namespace_repo/docs/src/namespace" "$docs_src/format/namespace"; then
+  copy_file_if_exists "$namespace_repo/docs/src/overview.png" "$docs_src/format/overview.png"
+  copy_file_if_exists "$namespace_repo/docs/src/java-sdk-example.png" "$docs_src/format/java-sdk-example.png"
+  :
+else
+  warn_missing_repo "Lance Namespace namespace docs" "$namespace_repo/docs/src/namespace"
+fi
 
+if [ -f "$docs_src/format/namespace/operations/index.md" ]; then
 python3 - <<'PY' "$docs_src/format/namespace/operations/index.md"
 from pathlib import Path
 import sys
@@ -175,7 +141,9 @@ path = Path(sys.argv[1])
 text = path.read_text()
 path.write_text(text.replace('[Models](models/)', '[Models](models/index.md)'))
 PY
+fi
 
+mkdir -p "$docs_src/format/namespace/operations/models"
 cat > "$docs_src/format/namespace/operations/models/index.md" <<'EOF'
 # Operation Models
 
@@ -185,6 +153,7 @@ These pages define the JSON schemas referenced by the [operations overview](../i
 EOF
 
 supported_catalogs_dir="$docs_src/format/namespace/supported-catalogs"
+if [ -d "$namespace_impls_repo/docs/src" ]; then
 while IFS= read -r -d '' source_file; do
   target_file="$supported_catalogs_dir/$(basename "$source_file")"
   if [ -e "$target_file" ]; then
@@ -227,6 +196,9 @@ if not inserted:
 
 target.write_text('\n'.join(output) + '\n')
 PY
+else
+  warn_missing_repo "Lance Namespace Impls docs" "$namespace_impls_repo/docs/src"
+fi
 
 integration_entries=(
 )
@@ -237,22 +209,41 @@ done < <(sed -n '2,$p' "$docs_src/integrations/.pages")
 
 if copy_docs_dir "$duckdb_repo/docs/src" "$docs_src/integrations/duckdb"; then
   integration_entries+=(" - DuckDB: duckdb")
+else
+  warn_missing_repo "Lance DuckDB docs" "$duckdb_repo/docs/src"
 fi
 
 if copy_docs_dir "$huggingface_repo/docs/src" "$docs_src/integrations/huggingface"; then
   integration_entries+=(" - Huggingface: huggingface")
+else
+  warn_missing_repo "Lance HuggingFace docs" "$huggingface_repo/docs/src"
 fi
 
 if copy_docs_dir "$spark_repo/docs/src" "$docs_src/integrations/spark"; then
+  python3 - <<'PY' "$docs_src/integrations/spark/operations/ddl/create-index.md"
+from pathlib import Path
+import sys
+
+path = Path(sys.argv[1])
+if path.exists():
+    text = path.read_text()
+    path.write_text(text.replace('https://lance.org/format/table/index/scalar/fts/#tokenizers', 'https://lance.org/format/index/scalar/fts/#tokenizers'))
+PY
   integration_entries+=(" - Apache Spark: spark")
+else
+  warn_missing_repo "Lance Spark docs" "$spark_repo/docs/src"
 fi
 
 if copy_docs_dir "$ray_repo/docs/src" "$docs_src/integrations/ray"; then
   integration_entries+=(" - Ray: ray")
+else
+  warn_missing_repo "Lance Ray docs" "$ray_repo/docs/src"
 fi
 
 if copy_docs_dir "$trino_repo/docs/src" "$docs_src/integrations/trino"; then
   integration_entries+=(" - Trino: trino")
+else
+  warn_missing_repo "Lance Trino docs" "$trino_repo/docs/src"
 fi
 
 {
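
Note on the changes to this file: hard failures (`require_dir`, `resolve_required_repo_dir`) give way to a copy-or-warn fallback, so a missing checkout leaves the placeholder docs in place instead of aborting. The following is a minimal, self-contained sketch of that control flow under stated assumptions. The body of `copy_docs_dir` is only partly visible in the diff, so its `return 1` fallback here is assumed, and the `work` scratch layout is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simplified stand-in for the script's copy_docs_dir. Only part of the real
# function is visible in the diff; the "return 1" fallback is an assumption.
copy_docs_dir() {
  local source_dir="$1"
  local target_dir="$2"

  if [ -d "$source_dir" ]; then
    mkdir -p "$(dirname "$target_dir")"
    # Without this, cp -R would nest the source inside an existing target dir.
    rm -rf "$target_dir"
    cp -R "$source_dir" "$target_dir"
    return 0
  fi

  return 1
}

warn_missing_repo() {
  local repo_label="$1"
  local repo_path="$2"

  echo "Warning: $repo_label repo not found at '$repo_path'; keeping placeholder docs." >&2
}

# Hypothetical scratch layout: the DuckDB source tree exists, the Ray one does not.
work=$(mktemp -d)
mkdir -p "$work/duckdb/docs/src" "$work/site/integrations/duckdb"
echo "# DuckDB" > "$work/duckdb/docs/src/index.md"
echo "stale" > "$work/site/integrations/duckdb/old.md"

integration_entries=()

if copy_docs_dir "$work/duckdb/docs/src" "$work/site/integrations/duckdb"; then
  integration_entries+=(" - DuckDB: duckdb")
else
  warn_missing_repo "Lance DuckDB docs" "$work/duckdb/docs/src"
fi

if copy_docs_dir "$work/ray/docs/src" "$work/site/integrations/ray"; then
  integration_entries+=(" - Ray: ray")
else
  warn_missing_repo "Lance Ray docs" "$work/ray/docs/src"
fi

# Only the integration whose docs were copied ends up in the nav entries.
printf '%s\n' "${integration_entries[@]}"
```

Because the function is called as an `if` condition, its non-zero return does not trip `set -e`; the missing Ray checkout produces a warning on stderr and the run continues.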

docs/src/format/.pages

Lines changed: 3 additions & 1 deletion
@@ -2,4 +2,6 @@ nav:
 - Overview: index.md
 - File Format: file
 - Table Format: table
-
+- Index Formats: index
+- Catalog Specs: catalog
+- Namespace Client Spec: namespace

docs/src/format/AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@ Also see [root AGENTS.md](../../../AGENTS.md) for cross-language standards.
 
 - Explain schema/data evolution with concrete mechanics (field IDs, tombstones, data rewrites) — don't just name operations or defer to external specs.
 - Describe all algorithms with full detail: parameters, precision, ordering, normalization bounds, and implementation steps — never reference an algorithm by name alone.
-- Index docs must include explicit file schemas and describe reader navigation (page type distinction, root/entry point location) — follow the pattern in `table/index/scalar/bitmap.md`.
+- Index docs must include explicit file schemas and describe reader navigation (page type distinction, root/entry point location) — follow the pattern in `index/scalar/bitmap.md`.
