vectordb_bench/frontend/components/custom/displaypPrams.py
Lines changed: 2 additions & 4 deletions
@@ -1,6 +1,5 @@
 def displayParams(st):
-    st.markdown(
-        """
+    st.markdown("""
 - `Folder Path` - The path to the folder containing all the files. Please ensure that all files in the folder are in the `Parquet` format.
 - Vectors data files: The file should have two kinds of columns: `id` as an incrementing `int` and `emb` as an array of `float32`. The name of two columns could be defined on your own.
 - Query test vectors: The file could be named on your own and should have two kinds of columns: `id` as an incrementing `int` and `emb` as an array of `float32`. The `id` column must be named as `id`, and `emb` column could be defined on your own.
@@ -14,8 +13,7 @@ def displayParams(st):
 
 - `Label percentages` - If you have filter file, please input label percentage you want to real run and `split with ','` when it's `more than one`. If you `don't have` filter file, than `keep the text vacant.`
 
-"""
-    )
+""")
     st.caption(
         """We recommend limiting the number of test query vectors, like 1,000.""",
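Review note: the hunks in this file appear to only move the opening `"""` onto the `st.markdown(` line and the closing `)` onto the `"""` line, which should leave the string passed to `st.markdown` unchanged. One way to check a claim like that is to compare the parsed syntax trees of the old and new call styles. The sketch below uses shortened stand-in snippets (`OLD_STYLE`, `NEW_STYLE`) and a hypothetical helper `same_ast`; none of these names come from the repository, and whitespace inside the real strings would still need to be compared against the actual files.

```python
import ast

# Stand-in for the old call style: string argument on its own line.
OLD_STYLE = '''st.markdown(
    """
- item
"""
)
'''

# Stand-in for the new call style: string opens on the call line.
NEW_STYLE = '''st.markdown("""
- item
""")
'''


def same_ast(source_a: str, source_b: str) -> bool:
    """Return True when both sources parse to the same syntax tree.

    ast.dump omits line and column attributes by default, so pure
    layout changes outside string literals do not affect the result.
    """
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


if __name__ == "__main__":
    print(same_ast(OLD_STYLE, NEW_STYLE))
```

If the closing `"""` were indented differently in the old and new versions, the string literal itself would differ and the comparison would report a mismatch, which is the case worth looking for in a reformatting PR like this one.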
vectordb_bench/frontend/components/welcome/explainPrams.py
Lines changed: 8 additions & 16 deletions
@@ -1,24 +1,20 @@
 def explainPrams(st):
     st.markdown("## descriptions")
     st.markdown("### 1. Overview")
-    st.markdown(
-        """
+    st.markdown("""
 - **VectorDBBench(VDBBench)** is an open-source benchmarking tool designed specifically for vector databases. Its main features include:
 - (1) An easy-to-use **web UI** for configuration of tests and visual analysis of results.
 - (2) A comprehensive set of **standards for testing and metric collection**.
 - (3) Support for **various scenarios**, including additional support for **Filter** and **Streaming** based on standard tests.
 - VDBBench embraces open-source and welcome contributions of code and test result submissions. The testing process and extended scenarios of VDBBench, as well as the intention behind our design will be introduced as follows.
-"""
-    )
+""")
     st.markdown("### 2. Dataset")
-    st.markdown(
-        """
+    st.markdown("""
 - We provide two embedding datasets:
 - (1)*[Cohere 768dim](https://huggingface.co/datasets/Cohere/wikipedia-22-12)*, generated using the **Cohere** model based on the Wikipedia corpus.
 - (2)*[Cohere 1024dim](https://huggingface.co/datasets/Cohere/beir-embed-english-v3)*, generated using the **Cohere** embed-english-v3.0 model based on the bioasq corpus.
 - (3)*OpenAI 1536dim*, generated using the **OpenAI** model based on the [C4 corpus](https://huggingface.co/datasets/legacy-datasets/c4).
-"""
-    )
+""")
     st.markdown("### 3. Standard Test")
     st.markdown(
         """
@@ -43,15 +39,12 @@ def explainPrams(st):
         unsafe_allow_html=True,
     )
     st.markdown("### 4. Filter Search Test")
-    st.markdown(
-        """
+    st.markdown("""
 - Compared to the Standard Test, the **Filter Search** introduces additional scalar constraints (e.g. **color == red**) during the Search Test. Different **filter_ratios** present varying levels of challenge to the VectorDB's search performance.
 - We provide an additional **string column** containing 10 labels with different distribution ratios (50%,20%,10%,5%,2%,1%,0.5%,0.2%,0.1%). For each label, we conduct both a **Serial Test** and a **Concurrency Test** to observe the VectorDB's performance in terms of **QPS, latency, and recall** under different filtering conditions.
-"""
-    )
+""")
     st.markdown("### 5. Streaming Search Test")
-    st.markdown(
-        """
+    st.markdown("""
 Different from Standard's load and search separation, Streaming Search Test primarily focuses on **search performance during the insertion process**.
 Different **base dataset sizes** and varying **insertion rates** set distinct challenges to the VectorDB's search capabilities.
 VDBBench will send insert requests at a **fixed rate**, maintaining consistent insertion pressure. The search test consists of three steps as follows:
@@ -62,5 +55,4 @@ def explainPrams(st):
 - Note: at this time, the insertion pressure drops to zero since data insertion is complete.
 - 3.**Optimized Search (Optional)**
 - Users can optionally perform an additional optimization step followed by a Serial Test and a Concurrent Test, recording qps, latency, and recall performance. This step **compares performance in Streaming section with the theoretically optimal performance**.
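Review note: the Streaming Search text above says insert requests are sent at a fixed rate to keep insertion pressure constant. As a rough illustration of what fixed-rate pacing means, the sketch below computes when each batch would be due. The function `insert_schedule` and its parameters (`total_rows`, `rows_per_second`, `batch_size`) are hypothetical names chosen for this example; this is not VDBBench's implementation, and a real client would also have to deal with slow or failed inserts.

```python
from typing import Iterator, Tuple


def insert_schedule(
    total_rows: int, rows_per_second: float, batch_size: int
) -> Iterator[Tuple[float, int, int]]:
    """Yield (due_offset_seconds, start_row, end_row) for each insert batch.

    A batch starting at row `start` is due `start / rows_per_second`
    seconds after the test begins, so the average insertion rate stays
    at `rows_per_second` regardless of the batch size.
    """
    if rows_per_second <= 0 or batch_size <= 0:
        raise ValueError("rows_per_second and batch_size must be positive")
    for start in range(0, total_rows, batch_size):
        end = min(start + batch_size, total_rows)  # last batch may be short
        yield start / rows_per_second, start, end


if __name__ == "__main__":
    # 250 rows at 100 rows/s in batches of 100 rows.
    for due, start, end in insert_schedule(250, 100, 100):
        print(f"t+{due:.1f}s: insert rows [{start}, {end})")
```

A sender built on this schedule would sleep until each due offset before issuing the batch. Scheduling against the start time, rather than sleeping a fixed interval after each request, keeps the rate from drifting when individual inserts take longer than expected.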