2 changes: 1 addition & 1 deletion docs/data-operate/import/data-source/bigquery.md
@@ -125,7 +125,7 @@ AS (

#### 2.2 Inspect the exported files on GCS

The command above exports `sales_data` to GCS. Each partition produces one or more files with incrementing file names. For details, see [exporting-data](https://cloud.google.com/bigquerydocs/exporting-data#exporting_data_into_one_or_more_files).
The command above exports `sales_data` to GCS. Each partition produces one or more files with incrementing file names. For details, see [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files).

![gcs_export](/images/data-operate/gcs_export.png)

2 changes: 1 addition & 1 deletion docs/lakehouse/catalogs/jdbc-catalog-overview.md
@@ -65,7 +65,7 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

2. Local absolute path. For example, `file:///path/to/mysql-connector-j-8.3.0.jar`. The Jar file must be pre-placed in the specified path on all FE/BE nodes.

3. HTTP URL. For example: `http://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar`. The system will download the driver file from this HTTP address. Only supports HTTP services without authentication.
3. HTTP URL. For example: `https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar`. The system will download the driver file from this HTTP address. Only supports HTTP services without authentication.

* Optional Properties

@@ -10,7 +10,7 @@

This statement is used to display all views based on the given table

grammar:
## Syntax

```sql
SHOW VIEW { FROM | IN } table [ FROM db ]
@@ -28,5 +28,3 @@ grammar:

SHOW, VIEW

## Best Practice

2 changes: 1 addition & 1 deletion docs/table-design/data-type.md
@@ -21,7 +21,7 @@ The list of data types supported by Apache Doris is as follows:
| [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | Signed integer, range [-2^127 + 1 ~ 2^127 - 1]. |
| [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOATING-POINT) | 4 | Floating-point number, range [-3.4*10^38 ~ 3.4*10^38]. |
| [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/FLOATING-POINT) | 8 | Floating-point number, range [-1.79*10^308 ~ 1.79*10^308]. |
| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | High-precision fixed-point number. Format: DECIMAL(P[,S]). P represents the total number of significant digits (precision), and S represents the number of digits after the decimal point (scale). The range of P is [1, MAX_P]. When `enable_decimal256`=false, MAX_P=38; when `enable_decimal256`=true, MAX_P=76. The range of S is [0, P].<br>The default value of `enable_decimal256` is false. Setting it to true yields more precise results but incurs some performance overhead.<br>Storage size:<ul><li>When 0 < precision <= 9, occupies 4 bytes.<li>When 9 < precision <= 18, occupies 8 bytes.<li>When 16 < precision <= 38, occupies 16 bytes.<li>When 38 < precision <= 76, occupies 32 bytes.<ul>|
| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | High-precision fixed-point number. Format: DECIMAL(P[,S]). P represents the total number of significant digits (precision), and S represents the number of digits after the decimal point (scale). The range of P is [1, MAX_P]. When `enable_decimal256`=false, MAX_P=38; when `enable_decimal256`=true, MAX_P=76. The range of S is [0, P].<br>The default value of `enable_decimal256` is false. Setting it to true yields more precise results but incurs some performance overhead.<br>Storage size:<ul><li>When 0 < precision <= 9, occupies 4 bytes.</li><li>When 9 < precision <= 18, occupies 8 bytes.</li><li>When 18 < precision <= 38, occupies 16 bytes.</li><li>When 38 < precision <= 76, occupies 32 bytes.</li></ul>|
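
The storage-size rule in the corrected DECIMAL row can be sketched as a small lookup. This is only an illustration of the thresholds listed in the table, not Doris code; the function name `decimal_storage_bytes` and its parameters are hypothetical.

```python
def decimal_storage_bytes(precision: int, enable_decimal256: bool = False) -> int:
    """Return the storage size in bytes for DECIMAL(precision[, scale]).

    Hypothetical helper that mirrors the thresholds in the table row above.
    """
    # MAX_P is 38 by default and 76 when enable_decimal256 is true.
    max_p = 76 if enable_decimal256 else 38
    if not 1 <= precision <= max_p:
        raise ValueError(f"precision must be in [1, {max_p}], got {precision}")
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    if precision <= 38:
        return 16
    return 32  # 38 < precision <= 76
```

With the corrected boundary, a precision of 17 or 18 maps to 8 bytes, and 16 bytes starts at precision 19.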

### [Date Types](../sql-manual/basic-element/sql-data-types/data-type-overview#date-types)

11 changes: 4 additions & 7 deletions docs/table-design/index/inverted-index/custom-analyzer.md
@@ -4,7 +4,6 @@
"language": "en",
"description": "Doris custom analyzers combine character filters, tokenizers, and token filters to flexibly control text segmentation strategies, improving the search relevance and precision of inverted indexes.",
"keywords": [
"custom analyzer",
"custom analyzer",
"inverted index tokenizer",
"tokenizer",
@@ -91,17 +90,17 @@ Supported tokenizer types:
| Type | Description | Main Parameters |
| --- | --- | --- |
| `standard` | Standard tokenization (follows Unicode text segmentation), suitable for most languages | None |
| `ngram` | Splits by N-grams | `min_ngram`, `max_ngram`, `token_chars` |
| `edge_ngram` | Generates N-grams starting from the beginning of the word | `min_ngram`, `max_ngram`, `token_chars` |
| `ngram` | Splits by N-grams | `min_gram`, `max_gram`, `token_chars` |
| `edge_ngram` | Generates N-grams starting from the beginning of the word | `min_gram`, `max_gram`, `token_chars` |
| `keyword` | Outputs the entire text as a single term, often combined with token_filter | None |
| `char_group` | Splits by the given characters | `tokenize_on_chars` |
| `basic` | Simple English / digit / Chinese / Unicode tokenization | `extra_chars` |
| `icu` | ICU internationalized tokenization, supports complex scripts in multiple languages | None |

Parameter descriptions:

- `min_ngram`: minimum length (default 1)
- `max_ngram`: maximum length (default 2)
- `min_gram`: minimum length (default 1)
- `max_gram`: maximum length (default 2)
- `token_chars`: character categories to keep (default: keep all). Options: `letter`, `digit`, `whitespace`, `punctuation`, `symbol`
- `tokenize_on_chars`: a character list or category. Categories support `whitespace`, `letter`, `digit`, `punctuation`, `symbol`, `cjk`
- `extra_chars`: additional ASCII characters to split on (such as `[]().`)
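
To make the effect of `min_gram` and `max_gram` concrete, here is a minimal sketch of N-gram and edge N-gram generation. It is an illustration under assumptions, not the Doris tokenizer: the function names are hypothetical, `token_chars` filtering is ignored, the whole input is treated as a single word, and the token order Doris emits may differ.

```python
def ngram_tokens(text: str, min_gram: int = 1, max_gram: int = 2) -> list[str]:
    """Emit every substring whose length is between min_gram and max_gram."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                tokens.append(text[start:start + size])
    return tokens


def edge_ngram_tokens(text: str, min_gram: int = 1, max_gram: int = 2) -> list[str]:
    """Emit only the N-grams anchored at the beginning of the text."""
    return [text[:size] for size in range(min_gram, max_gram + 1) if size <= len(text)]


print(ngram_tokens("abc"))       # ['a', 'ab', 'b', 'bc', 'c']
print(edge_ngram_tokens("abc"))  # ['a', 'ab']
```

The defaults of 1 and 2 match the defaults stated in the parameter list above.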
@@ -503,5 +502,3 @@ Result:
1. Nesting multiple components in a custom `analyzer` may degrade tokenization performance.
2. The `select tokenize` tokenization function supports custom analyzers and can be used to debug tokenization results.
3. Only one of the predefined `built_in_analyzer` and a custom `analyzer` can exist on the same index.
</content>
</invoke>
2 changes: 1 addition & 1 deletion docs/table-design/index/ngram-bloomfilter-index.md
@@ -239,7 +239,7 @@ mysql> SELECT COUNT() FROM amazon_reviews;
```sql
SELECT
product_id,
any(product_title),
any_value(product_title),
AVG(star_rating) AS rating,
COUNT() AS count
FROM
2 changes: 0 additions & 2 deletions docs/table-design/index/prefix-index.md
@@ -6,8 +6,6 @@
"keywords": [
"Prefix Index",
"Sort Key",
"Sort Key",
"Prefix Index",
"Apache Doris",
"sparse index",
"query acceleration",
2 changes: 1 addition & 1 deletion docs/table-design/index/vector-index/hnsw.md
@@ -418,7 +418,7 @@ Before high-concurrency queries, run a cold query first to warm up the index fil

#### Memory Footprint and Performance

> **An HNSW index (without quantization compression) takes about 1.2x the memory of the vectors it indexes.**
> **An HNSW index (without quantization compression) takes about 1.3x the memory of the vectors it indexes.**
For example, for a 128-dimensional, 1M dataset, an HNSW FLAT index needs about `128 x 4 x 1,000,000 x 1.3 ~= 650 MB`.
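
The estimate above can be reproduced with a short calculation. This is a rough sketch assuming 4-byte float components and the 1.3x overhead factor quoted in the text; the function name `hnsw_flat_memory_bytes` is hypothetical.

```python
def hnsw_flat_memory_bytes(dim: int, rows: int,
                           bytes_per_value: int = 4,
                           overhead: float = 1.3) -> int:
    """Estimate HNSW FLAT index memory: raw vector bytes times an overhead factor."""
    raw_vector_bytes = dim * bytes_per_value * rows
    return round(raw_vector_bytes * overhead)


estimate = hnsw_flat_memory_bytes(128, 1_000_000)
print(f"{estimate / 1e6:.1f} MB (decimal), {estimate / 2**20:.1f} MiB")
```

For 128 dimensions and 1M rows this gives roughly 666 MB in decimal units, or about 635 MiB, which brackets the rounded figure of about 650 MB.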

2 changes: 1 addition & 1 deletion docs/table-design/index/vector-index/ivf.md
@@ -395,7 +395,7 @@ Reference values:

| dim | rows | Estimated memory |
| --- | --- | --- |
| 128 | 1M | 496 MB |
| 128 | 1M | 500 MB |
| 768 | 1M | 2.9 GB |

To guarantee query performance, the BE must have enough memory to hold the entire index. Otherwise, frequent IO on index files causes severe query performance degradation.
2 changes: 1 addition & 1 deletion docs/table-design/index/vector-index/overview.md
@@ -478,7 +478,7 @@ Common embedding-model outputs are typically 768 dimensions or higher. If you em
```java
// use `?` as placeholders; readStatement should be reused
PreparedStatement readStatement = conn.prepareStatement("SELECT id, l2_distance_approximate(embedding, cast (? as ARRAY<FLOAT>)) AS distance
FROM l2_distance_approximate
FROM sift_1M
ORDER BY distance
LIMIT 10");
@@ -125,7 +125,7 @@ AS (

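#### 2.2 Inspect the exported files on GCS

The command above exports `sales_data` to GCS. Each partition produces one or more files with incrementing file names. For the exact rules, see [exporting-data](https://cloud.google.com/bigquerydocs/exporting-data#exporting_data_into_one_or_more_files)
The command above exports `sales_data` to GCS. Each partition produces one or more files with incrementing file names. For the exact rules, see [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files)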

![gcs_export](/images/data-operate/gcs_export.png)

@@ -65,7 +65,7 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

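2. Local absolute path. For example, `file:///path/to/mysql-connector-j-8.3.0.jar`. The Jar file must be pre-placed in the specified path on all FE/BE nodes.

3. HTTP URL. For example: `http://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar`. The system will download the driver file from this HTTP address. Only supports HTTP services without authentication.
3. HTTP URL. For example: `https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar`. The system will download the driver file from this HTTP address. Only supports HTTP services without authentication.


* Optional Properties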
@@ -21,7 +21,7 @@ The list of data types supported by Apache Doris is as follows:
| [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | Signed integer, range [-2^127 + 1 ~ 2^127 - 1]. |
| [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOATING-POINT) | 4 | Floating-point number, range [-3.4*10^38 ~ 3.4*10^38]. |
| [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/FLOATING-POINT) | 8 | Floating-point number, range [-1.79*10^308 ~ 1.79*10^308]. |
| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | High-precision fixed-point number. Format: DECIMAL(P[,S]). P represents the total number of significant digits (precision), and S represents the number of digits after the decimal point (scale). The range of P is [1, MAX_P]. When `enable_decimal256`=false, MAX_P=38; when `enable_decimal256`=true, MAX_P=76. The range of S is [0, P].<br>The default value of `enable_decimal256` is false. Setting it to true yields more precise results but incurs some performance overhead.<br>Storage size:<ul><li>When 0 < precision <= 9, occupies 4 bytes.<li>When 9 < precision <= 18, occupies 8 bytes.<li>When 16 < precision <= 38, occupies 16 bytes.<li>When 38 < precision <= 76, occupies 32 bytes.<ul>|
| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | High-precision fixed-point number. Format: DECIMAL(P[,S]). P represents the total number of significant digits (precision), and S represents the number of digits after the decimal point (scale). The range of P is [1, MAX_P]. When `enable_decimal256`=false, MAX_P=38; when `enable_decimal256`=true, MAX_P=76. The range of S is [0, P].<br>The default value of `enable_decimal256` is false. Setting it to true yields more precise results but incurs some performance overhead.<br>Storage size:<ul><li>When 0 < precision <= 9, occupies 4 bytes.</li><li>When 9 < precision <= 18, occupies 8 bytes.</li><li>When 18 < precision <= 38, occupies 16 bytes.</li><li>When 38 < precision <= 76, occupies 32 bytes.</li></ul>|

### [Date Types](../sql-manual/basic-element/sql-data-types/data-type-overview#日期类型)

@@ -91,17 +91,17 @@ PROPERTIES (
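| Type | Description | Main Parameters |
| --- | --- | --- |
| `standard` | Standard tokenization (follows Unicode text segmentation), suitable for most languages | None |
| `ngram` | Splits by N-grams | `min_ngram`, `max_ngram`, `token_chars` |
| `edge_ngram` | Generates N-grams starting from the beginning of the word | `min_ngram`, `max_ngram`, `token_chars` |
| `ngram` | Splits by N-grams | `min_gram`, `max_gram`, `token_chars` |
| `edge_ngram` | Generates N-grams starting from the beginning of the word | `min_gram`, `max_gram`, `token_chars` |
| `keyword` | Outputs the entire text as a single term, often combined with token_filter | None |
| `char_group` | Splits by the given characters | `tokenize_on_chars` |
| `basic` | Simple English / digit / Chinese / Unicode tokenization | `extra_chars` |
| `icu` | ICU internationalized tokenization, supports complex scripts in multiple languages | None |

Parameter descriptions:

- `min_ngram`: minimum length (default 1)
- `max_ngram`: maximum length (default 2)
- `min_gram`: minimum length (default 1)
- `max_gram`: maximum length (default 2)
- `token_chars`: character categories to keep (default: keep all). Options: `letter`, `digit`, `whitespace`, `punctuation`, `symbol`
- `tokenize_on_chars`: a character list or category. Categories support `whitespace`, `letter`, `digit`, `punctuation`, `symbol`, `cjk`
- `extra_chars`: additional ASCII characters to split on (such as `[]().`)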
@@ -239,7 +239,7 @@ mysql> SELECT COUNT() FROM amazon_reviews;
```sql
SELECT
product_id,
any(product_title),
any_value(product_title),
AVG(star_rating) AS rating,
COUNT() AS count
FROM
@@ -418,7 +418,7 @@ Doris's ANN index is built on Meta's open-source [faiss](https://github.com/facebookres

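#### Memory Footprint and Performance

> **An HNSW index (without quantization compression) takes about 1.2x the memory of the vectors it indexes.**
> **An HNSW index (without quantization compression) takes about 1.3x the memory of the vectors it indexes.**
For example, for a 128-dimensional, 1M dataset, an HNSW FLAT index needs about `128 × 4 × 1,000,000 × 1.3 ≈ 650 MB`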

@@ -396,7 +396,7 @@ Doris's ANN index is built on Meta's open-source [faiss](https://github.com/facebookres

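| dim | rows | Estimated memory |
| --- | --- | --- |
| 128 | 1M | 496 MB |
| 128 | 1M | 500 MB |
| 768 | 1M | 2.9 GB |

To guarantee query performance, the BE must have enough memory to hold the entire index; otherwise, frequent IO on index files causes severe query performance degradation.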
@@ -478,7 +478,7 @@ PROPERTIES (
```java
// use `?` as placeholders; readStatement should be reused
PreparedStatement readStatement = conn.prepareStatement("SELECT id, l2_distance_approximate(embedding, cast (? as ARRAY<FLOAT>)) AS distance
FROM l2_distance_approximate
FROM sift_1M
ORDER BY distance
LIMIT 10");
@@ -99,7 +99,7 @@ PROPERTIES (

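2.2. **Inspect the exported files on GCS**

The command above exports sales_data to GCS. Each partition produces one or more files with incrementing file names. For details, see [exporting-data](https://cloud.google.com/bigquerydocs/exporting-data#exporting_data_into_one_or_more_files). The result looks as follows:
The command above exports sales_data to GCS. Each partition produces one or more files with incrementing file names. For details, see [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files). The result looks as follows:
![gcs_export](/images/data-operate/gcs_export.png)

## 3. Import data into Doris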
@@ -65,7 +65,7 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

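2. Local absolute path. For example, `file:///path/to/mysql-connector-j-8.3.0.jar`. The Jar file must be pre-placed in the specified path on all FE/BE nodes.

3. HTTP URL. For example: `http://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar`. The system will download the driver file from this HTTP address. Only supports HTTP services without authentication.
3. HTTP URL. For example: `https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar`. The system will download the driver file from this HTTP address. Only supports HTTP services without authentication.


* Optional Properties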
@@ -21,7 +21,7 @@ The list of data types supported by Apache Doris is as follows:
| [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | Signed integer, range [-2^127 + 1 ~ 2^127 - 1]. |
| [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOAT) | 4 | Floating-point number, range [-3.4*10^38 ~ 3.4*10^38]. |
| [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/DOUBLE) | 8 | Floating-point number, range [-1.79*10^308 ~ 1.79*10^308]. |
| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | High-precision fixed-point number. Format: DECIMAL(P[,S]). P represents the total number of significant digits (precision), and S represents the number of digits after the decimal point (scale). The range of P is [1, MAX_P]. When `enable_decimal256`=false, MAX_P=38; when `enable_decimal256`=true, MAX_P=76. The range of S is [0, P].<br>The default value of `enable_decimal256` is false. Setting it to true yields more precise results but incurs some performance overhead.<br>Storage size:<ul><li>When 0 < precision <= 9, occupies 4 bytes.<li>When 9 < precision <= 18, occupies 8 bytes.<li>When 16 < precision <= 38, occupies 16 bytes.<li>When 38 < precision <= 76, occupies 32 bytes.<ul>|
| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | High-precision fixed-point number. Format: DECIMAL(P[,S]). P represents the total number of significant digits (precision), and S represents the number of digits after the decimal point (scale). The range of P is [1, MAX_P]. When `enable_decimal256`=false, MAX_P=38; when `enable_decimal256`=true, MAX_P=76. The range of S is [0, P].<br>The default value of `enable_decimal256` is false. Setting it to true yields more precise results but incurs some performance overhead.<br>Storage size:<ul><li>When 0 < precision <= 9, occupies 4 bytes.</li><li>When 9 < precision <= 18, occupies 8 bytes.</li><li>When 18 < precision <= 38, occupies 16 bytes.</li><li>When 38 < precision <= 76, occupies 32 bytes.</li></ul>|

### [Date Types](../sql-manual/basic-element/sql-data-types/data-type-overview#日期类型)

@@ -178,7 +178,7 @@ mysql> SELECT COUNT() FROM amazon_reviews;
```
SELECT
product_id,
any(product_title),
any_value(product_title),
AVG(star_rating) AS rating,
COUNT() AS count
FROM
@@ -99,7 +99,7 @@ PROPERTIES (

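2.2. **Inspect the exported files on GCS**

The command above exports sales_data to GCS. Each partition produces one or more files with incrementing file names. For details, see [exporting-data](https://cloud.google.com/bigquerydocs/exporting-data#exporting_data_into_one_or_more_files). The result looks as follows:
The command above exports sales_data to GCS. Each partition produces one or more files with incrementing file names. For details, see [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files). The result looks as follows:
![gcs_export](/images/data-operate/gcs_export.png)

## 3. Import data into Doris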
@@ -65,7 +65,7 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

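2. Local absolute path. For example, `file:///path/to/mysql-connector-j-8.3.0.jar`. The Jar file must be pre-placed in the specified path on all FE/BE nodes.

3. HTTP URL. For example: `http://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar`. The system will download the driver file from this HTTP address. Only supports HTTP services without authentication.
3. HTTP URL. For example: `https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar`. The system will download the driver file from this HTTP address. Only supports HTTP services without authentication.


* Optional Properties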