Commit 3ddc319

Merge pull request #22 from vectordotdev/data/quickwit-snapshot
Add quickwit-oss/quickwit historical snapshot
2 parents 9a25e2a + 6d76d26 commit 3ddc319

12 files changed

Lines changed: 125360 additions & 0 deletions

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+[
+  {
+    "author": {
+      "login": "vtripolitakis"
+    },
+    "bodyText": "Hello and congrats for your great work.\nAs per #140 I see that for the time being, range queries are not supported. Is any plan on the roadmap to add such support?\nThanks in advance.",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 2
+    },
+    "createdAt": "2021-11-26T12:18:16Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 817,
+    "stateReason": "REOPENED",
+    "title": "Support for range queries",
+    "updatedAt": "2023-05-15T22:35:29Z",
+    "upvoteCount": 2,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/817"
+  },
+  {
+    "author": {
+      "login": "oronsh"
+    },
+    "bodyText": "I've noticed that tantivy holds a thread pool per IndexWriter so I suppose each indexing server will open NUMBER_OF_INDEXES * NUM_THREADS_PER_INDEX threads?\nIt sounds like a lot of threads to me, am I missing something?\nThanks a lot! :)",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 1
+    },
+    "createdAt": "2021-11-03T18:49:35Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 724,
+    "stateReason": null,
+    "title": "Is indexing server going to support multi tenancy?",
+    "updatedAt": "2023-05-15T22:36:28Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/724"
+  },
+  {
+    "author": {
+      "login": "jume-dev"
+    },
+    "bodyText": "Hi,\njust a quick question, is it possible to do a wild card search on a text field? I have a simple test data structure of people with firstname, lastname and address. Firstname is mostly only one word and I wanted to search on word fragments, like for example if I had a Willy in my data, I wanted to search for illy and find him.\nIs this possible yet and if so, how?\nBest regards",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 1
+    },
+    "createdAt": "2021-07-30T19:23:49Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 346,
+    "stateReason": null,
+    "title": "Wildcardsearch",
+    "updatedAt": "2022-06-19T07:18:17Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/346"
+  },
+  {
+    "author": {
+      "login": "fmassot"
+    },
+    "bodyText": "Dataset description\nDataset containing 40 millions of hdfs log entries, available here.\nSize: 13GB.\nExample of a log entry formatted for quickwit\n{\n \"timestamp\": 1460530013,\n \"severity_text\": \"INFO\",\n \"body\": \"PacketResponder: BP-108841162-10.10.34.11-1440074360971:blk_1074072698_331874, type=HAS_DOWNSTREAM_IN_PIPELINE terminating\",\n \"resource\": {\n \"service\": \"datanode/01\"\n },\n \"attributes\": {\n \"class\": \"org.apache.hadoop.hdfs.server.datanode.DataNode\"\n }\n}\n\nIndexing performance\n\nDocuments Read: 40069452 Parse Errors: 0 Published Splits: 9 Dataset Size: 14003MB Throughput: 45.32MB/s\nIndexed 40069452 documents in 5.15min\n\nSplit generated: 8 with 5 million docs and 1 with 69452 docs on apple M1.\nTypically, for 1 split, we have:\n\n.term: 11MB\n.fast: 13MB\nhotcache: 129MB\n.store: 413MB\n.idx: 108MB\n\nLocal search performance\nTODO\nConfiguration\nDoc mapper\n{\n \"store_source\": true,\n \"default_search_fields\": [\"body\", \"severity_text\"],\n \"timestamp_field\": \"timestamp\",\n \"field_mappings\": [\n {\n \"name\": \"timestamp\",\n \"type\": \"i64\",\n \"fast\": true\n },\n {\n \"name\": \"severity_text\",\n \"type\": \"text\"\n },\n {\n \"name\": \"body\",\n \"type\": \"text\"\n },\n {\n \"name\": \"resource\",\n \"type\": \"object\",\n \"field_mappings\": [\n {\n \"name\": \"service\",\n \"type\": \"text\"\n }\n ]\n },\n {\n \"name\": \"attributes\",\n \"type\": \"object\",\n \"field_mappings\": [\n {\n \"name\": \"class\",\n \"type\": \"text\"\n }\n ]\n }\n ]\n}",
+    "category": {
+      "name": "Show and tell"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 0
+    },
+    "createdAt": "2021-07-06T19:11:31Z",
+    "isAnswered": null,
+    "locked": false,
+    "number": 200,
+    "stateReason": null,
+    "title": "Index and search on HDFS log dataset",
+    "updatedAt": "2022-06-27T01:43:33Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/200"
+  }
+]
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
+[
+  {
+    "author": {
+      "login": "zbalkan"
+    },
+    "bodyText": "Hi. I am currently using Wazuh as SIEM with OpenSearch as the log database. Yet, the default OpenSearch implementation does not provide immutable indices and it is not easy to set it up without a third party. I am wondering if I can replace OpenSearch instance with Quickwit. Do you have a documentation, article, or tutorial for that?",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 1
+    },
+    "createdAt": "2022-09-27T18:50:08Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 2024,
+    "stateReason": null,
+    "title": "Wazuh with Quickwit backend",
+    "updatedAt": "2023-05-15T22:25:06Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/2024"
+  },
+  {
+    "author": {
+      "login": "tshepang"
+    },
+    "bodyText": "One can use S3 as primary storage with Quickwit, so I do wonder if one can similarly use Azure object storage.",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 1
+    },
+    "createdAt": "2022-08-24T00:02:19Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 1873,
+    "stateReason": null,
+    "title": "can one use azure object storage",
+    "updatedAt": "2022-08-24T15:18:34Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/1873"
+  },
+  {
+    "author": {
+      "login": "collimarco"
+    },
+    "bodyText": "I am looking at \"Concepts\" in the Documentation.\nThere is a part that I cannot find: what are the files stored in S3? What is the file structure of indexes and reverse indexes on S3? What is the content / structure of each file?\nIt would be really useful in order to better understand this project.\nCurrently the documentation suggests that there is split pruning based on timestamp and tags, but then it doesn't explain anything about full text search, aggregations, etc. It would be interesting to see the steps involved or some diagrams.",
+    "category": {
+      "name": "General"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 1
+    },
+    "createdAt": "2022-07-07T10:06:38Z",
+    "isAnswered": null,
+    "locked": false,
+    "number": 1738,
+    "stateReason": null,
+    "title": "Indexing overview",
+    "updatedAt": "2022-07-12T00:25:36Z",
+    "upvoteCount": 2,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/1738"
+  },
+  {
+    "author": {
+      "login": "collimarco"
+    },
+    "bodyText": "From your documentation:\n\nNote that this metadata is only generated when the cardinality of the field is less than 1 000. Tag pruning is notably useful on multi-tenant datasets.\n\nDoes that mean that we can only have 1000 tenants?\nI was thinking about using the tenant ID as a tag...",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 2
+    },
+    "createdAt": "2022-06-20T09:58:44Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 1652,
+    "stateReason": null,
+    "title": "Tag pruning limits",
+    "updatedAt": "2023-05-15T22:33:21Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/1652"
+  },
+  {
+    "author": {
+      "login": "HeenaBansal2009"
+    },
+    "bodyText": "Hi @fmassot ,\nI have few more questions about quickwit indexing .\n\nwhat kind of tokenizer does the indexer use, ngram or language specific tokenizer.\nDoes quickwit tokens filter space too like , query = VPN authenticated user is supported in current release?\n\nThanks.",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 1
+    },
+    "createdAt": "2022-05-06T16:20:26Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 1388,
+    "stateReason": null,
+    "title": "Is the index in Quikwit language independent?",
+    "updatedAt": "2023-05-15T22:34:00Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/1388"
+  },
+  {
+    "author": {
+      "login": "HeenaBansal2009"
+    },
+    "bodyText": "I have a question\nWhat all result formats quickwit supports. I read somewhere that quickwit supports only JSON as of now.\nI would really appreciate if some can confirm the supported query results formats as well as the ingested data format for quickwit .",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 3
+    },
+    "createdAt": "2022-05-02T15:40:14Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 1357,
+    "stateReason": null,
+    "title": "Supported format in QUICKWIT",
+    "updatedAt": "2022-06-16T18:58:47Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/1357"
+  },
+  {
+    "author": {
+      "login": "HeenaBansal2009"
+    },
+    "bodyText": "Hi ,\nI am trying to ingest json file in to Quickwit using below query. I am getting error. Please advice what is wrong here.\ncat ./testdata/hackernews.json | ./quickwit index search --index hackernews\nI followed below steps before ingesting data to quickwit :\n\nI converted the. native.zst downloaded from hackernews to json using clickhouse.\nfeed index configuration as per data schema to quickwit with (./quickwit index create --index-config hacker news.yaml)\nNow When i am trying to ingest json file in to Quickwit using (cat ./testdata/hackernews.json | ./quickwit index search --index hackernews), I ma getting command usability error.",
+    "category": {
+      "name": "Q&A"
+    },
+    "closed": false,
+    "closedAt": null,
+    "comments": {
+      "totalCount": 3
+    },
+    "createdAt": "2022-05-03T14:53:14Z",
+    "isAnswered": true,
+    "locked": false,
+    "number": 1359,
+    "stateReason": null,
+    "title": "Trying to ingest json file in to quickwit.",
+    "updatedAt": "2022-06-02T02:17:50Z",
+    "upvoteCount": 1,
+    "url": "https://github.com/quickwit-oss/quickwit/discussions/1359"
+  }
+]
