[fix](doc) clean up table-design, vector index, lakehouse, and view docs#3718

Open: boluor wants to merge 1 commit into apache:master from boluor:fix-batch31-final-simple

@boluor boluor commented May 20, 2026

Summary

A pass of small correctness fixes across docs/, versioned_docs/version-{2.1,3.x,4.x}/, and the matching i18n/zh-CN/ files.

sql-manual/.../view/SHOW-VIEW.md

  • Replace the informal `grammar:` label with a proper `## Syntax` heading.
  • Drop the empty trailing `## Best Practice` section.

table-design/data-type.md — DECIMAL row

The cell had two separate bugs:

  • It closed the `<ul>` with a second opening `<ul>` instead of `</ul>`, so the list was never closed in the HTML.
  • The third bullet read `When 16 < precision <= 38`, which overlaps the previous bullet. It should be `When 18 < precision <= 38`, matching the boundaries used in the version-3.x / version-2.1 docs.

Also close each <li> explicitly to match the corrected </ul>.
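
The overlap in the third bullet can be checked mechanically. The following is a minimal sketch, assuming the first two bullets cover `precision <= 9` and `9 < precision <= 18`. The PR states only the third bullet's bounds, so the lower bounds used here are illustrative.

```python
def covering_ranges(precision, ranges):
    """Return the (low, high] ranges that contain the given precision."""
    return [(low, high) for low, high in ranges if low < precision <= high]


def find_overlaps(ranges, max_precision=38):
    """List every precision from 1 to max_precision claimed by more than one range."""
    return [p for p in range(1, max_precision + 1)
            if len(covering_ranges(p, ranges)) > 1]


# Each tuple is a half-open interval (low, high], i.e. low < precision <= high.
# The bounds 0 and 9 for the first two bullets are assumptions, not from the PR.
buggy = [(0, 9), (9, 18), (16, 38)]
fixed = [(0, 9), (9, 18), (18, 38)]

print(find_overlaps(buggy))  # precisions 17 and 18 fall into two ranges
print(find_overlaps(fixed))  # no precision is claimed twice
```

With the old lower bound of 16, precisions 17 and 18 match both the second and third bullets; with 18 the three ranges partition 1..38 cleanly.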

table-design/index/prefix-index.md

Frontmatter keywords listed Prefix Index and Sort Key twice each — dedupe.
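
For reference, an order-preserving dedupe of a keyword list is a one-liner in Python. This is only a sketch: it assumes the keywords have already been parsed into a list of strings, and the sample list is hypothetical apart from the two duplicated entries named above.

```python
def dedupe_keywords(keywords):
    """Drop repeated keywords while keeping the first-seen order."""
    return list(dict.fromkeys(keywords))


# Hypothetical keyword list; only the two duplicated entries come from the PR.
keywords = ["Prefix Index", "Sort Key", "Prefix Index", "Sort Key"]
print(dedupe_keywords(keywords))
```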

table-design/index/ngram-bloomfilter-index.md

The example SELECT projects `any(product_title)`, but the result-table headers shown immediately below are `any_value(product_title)`. Rewrite the SELECT to `any_value(product_title)` so the displayed result actually matches the query a reader would copy.

table-design/index/inverted-index/custom-analyzer.md

  • Strip a stray `</content></invoke>` block at the bottom of the file (it looks like residue from a previous edit); version-4.x and current were affected.
  • Dedupe `custom analyzer` in the frontmatter keywords.
  • Rename `min_ngram` / `max_ngram` to `min_gram` / `max_gram` in the parameter table and the bulleted parameter descriptions. Every working example in the same file, and the underlying tokenizer parameters Doris uses, are `min_gram` / `max_gram`.

table-design/index/vector-index/hnsw.md

The prose said "…takes about 1.2x the memory…" while the formula on the very next line, and the example calculation that arrives at 650 MB, both use 1.3x. Update the prose to 1.3x.
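
The inconsistency is easy to confirm with arithmetic. In the sketch below, the 650 MB result and the two multipliers come from the description above; the raw vector size is not stated there, so it is inferred by dividing the result by 1.3.

```python
from fractions import Fraction


def hnsw_footprint_mb(raw_mb, overhead):
    """Scale a raw-vector size (in MB) by an index overhead multiplier."""
    return raw_mb * overhead


example_result_mb = 650            # result of the doc's example calculation
formula_ratio = Fraction(13, 10)   # 1.3x, as used by the formula
prose_ratio = Fraction(12, 10)     # 1.2x, as stated in the old sentence

# Inferred, not quoted: the raw size that 1.3x turns into 650 MB.
implied_raw_mb = example_result_mb / formula_ratio

print(implied_raw_mb)                                  # 500
print(hnsw_footprint_mb(implied_raw_mb, prose_ratio))  # 600
```

Only the 1.3x multiplier reproduces the 650 MB figure; 1.2x on the same base would give 600 MB.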

table-design/index/vector-index/ivf.md

For 128-dim / 1M rows, the formula above the table evaluates to "≈ 500 MB" while the reference-values table directly below says 496 MB. Align the table cell to 500 MB so the two presentations agree.

table-design/index/vector-index/overview.md

The Prepared Statement example used `FROM l2_distance_approximate`, but that is the scalar function being projected, not a table. The intended table (used by all surrounding examples in this file) is `sift_1M`.

data-operate/import/data-source/bigquery.md

https://cloud.google.com/bigquerydocs/exporting-data is a 404. The correct path is https://cloud.google.com/bigquery/docs/exporting-data.

lakehouse/catalogs/jdbc-catalog-overview.md

The Maven Central example URL was given over plain HTTP (http://repo1.maven.org/...). Maven Central has required HTTPS since January 2020 and no longer serves plain-HTTP requests, so the example fails as shown. Change it to https://.
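
If the same substitution needs to be applied in bulk, a scheme upgrade limited to the affected host could look like the sketch below. The helper name `force_https` and the path in the sample URL are placeholders; the actual path in the doc is not reproduced here.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url, hosts=("repo1.maven.org",)):
    """Rewrite http:// to https:// for the listed hosts; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in hosts:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# The path below is a placeholder, not the path used in the doc.
print(force_https("http://repo1.maven.org/placeholder/driver.jar"))
```

Restricting the rewrite to an explicit host list avoids touching unrelated `http://` examples, such as local endpoints.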

Test plan

  • Apply each substitution to the equivalent files in docs/, versioned_docs/, and i18n/zh-CN/ where the same content exists.
  • Spot-check rendered diffs for each finding.
  • CI build (docusaurus + sidebar checks).
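
As a complement to the spot-checks, a small script can confirm that none of the stale strings survive in any tree. This is a rough sketch, not part of the PR: the pattern list is taken from the findings above, the directory names are those listed in the summary, and a plain substring match may flag legitimate uses that would need manual review.

```python
from pathlib import Path

# Stale strings named in this PR; each should be gone after the fix.
STALE_PATTERNS = [
    "min_ngram",
    "max_ngram",
    "bigquerydocs/exporting-data",
    "http://repo1.maven.org",
    "</content></invoke>",
]


def find_leftovers(roots, patterns=STALE_PATTERNS):
    """Yield (path, line number, pattern) for each stale string still present."""
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # skip trees that do not exist in this checkout
        for path in sorted(root_path.rglob("*.md")):
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                for pattern in patterns:
                    if pattern in line:
                        yield path, lineno, pattern


if __name__ == "__main__":
    # Directory names follow the PR summary; run from the repository root.
    for path, lineno, pattern in find_leftovers(["docs", "versioned_docs", "i18n/zh-CN"]):
        print(f"{path}:{lineno}: still contains {pattern!r}")
```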

- SHOW-VIEW.md: replace informal `grammar:` line with `## Syntax` heading and remove empty `## Best Practice` section.
- data-type.md DECIMAL row: close `<ul>` properly (was a second `<ul>`), close each `<li>`, and fix `16 < precision <= 38` to `18 < precision <= 38`.
- prefix-index.md: dedupe `Prefix Index` and `Sort Key` in frontmatter keywords.
- ngram-bloomfilter-index.md: rewrite SELECT to use `any_value(product_title)` so it matches the result-table headers shown below.
- custom-analyzer.md: remove stray `</content></invoke>` trailer left from a previous edit, dedupe `custom analyzer` keyword, and rename `min_ngram`/`max_ngram` in the parameter table and descriptions to `min_gram`/`max_gram` to match the working examples and the Lucene tokenizer parameters Doris uses.
- hnsw.md: memory-footprint sentence said "1.2x" while the formula and the example calculation use 1.3x — change the sentence to 1.3x.
- ivf.md: reference table cell for 128-dim / 1M was 496 MB, mismatching the "≈ 500 MB" formula immediately above — align to 500 MB.
- vector-index/overview.md: the Prepared Statement example used `FROM l2_distance_approximate` (a scalar function, not a table) instead of `FROM sift_1M` like the surrounding examples.
- bigquery.md: fix malformed Google Cloud docs URL `cloud.google.com/bigquerydocs/exporting-data` → `cloud.google.com/bigquery/docs/exporting-data`.
- jdbc-catalog-overview.md: upgrade Maven Central example URL from `http://repo1.maven.org` to `https://`.

Applied consistently across `docs/`, the matching `versioned_docs/version-{2.1,3.x,4.x}/` trees, and the corresponding `i18n/zh-CN/` files where the same content exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@boluor boluor requested a deployment to Production May 21, 2026 05:04 — with GitHub Actions In progress