Commit 5546e5a
Add 372 fix 489 (#528)
* fix video indexer auth
* fixed video indexer support
* fixed video indexer support
* fix video indexer support
* fixed video indexer support
* added support for .xlsm
* fixed all scope issue
* added support for xml, yaml, and log
#### **JSON**
- Uses `RecursiveJsonSplitter`:
  - `max_chunk_size=600`
  - `convert_lists=True`
- Produces JSON strings that retain original structure.
- See `process_json_file`.
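
To illustrate what `convert_lists=True` implies, here is a minimal standard-library sketch. It assumes, based on my understanding of the LangChain option, that lists are rewritten as dictionaries keyed by index so they can be split like any other object. The helper name `lists_to_dicts` is hypothetical; it is neither the library's code nor part of `process_json_file`.

```python
import json


def lists_to_dicts(value):
    """Recursively replace every list with a dict keyed by the item index (as a string)."""
    if isinstance(value, list):
        return {str(i): lists_to_dicts(item) for i, item in enumerate(value)}
    if isinstance(value, dict):
        return {key: lists_to_dicts(item) for key, item in value.items()}
    return value  # scalars are returned unchanged


if __name__ == "__main__":
    doc = {"users": [{"name": "a", "tags": ["x", "y"]}]}
    # Each chunk would later be serialized back to a JSON string.
    print(json.dumps(lists_to_dicts(doc)))
```

After this step every container is a dictionary, so a splitter can descend key by key and stop once a serialized piece approaches the 600-character limit.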
#### **XML**
- Uses `RecursiveCharacterTextSplitter` with XML-aware separators.
- **Structure-preserving chunking**:
  - Separators prioritized: `\n\n` → `\n` → `>` (end of XML tags) → space → character
  - Splits at logical boundaries to maintain tag integrity
- **Chunked by 4000 characters** with 200-character overlap for context preservation.
- **Goal**: Preserve XML structure while providing manageable chunks for LLM processing.
- See `process_xml`.
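
The separator priority above can be sketched without LangChain. This is a simplified stand-in, not `RecursiveCharacterTextSplitter` itself and not the code in `process_xml`: the function name `split_by_priority` is hypothetical, each separator stays attached to the end of its piece, and chunk overlap is left out for brevity.

```python
def split_by_priority(text, chunk_size=4000, separators=("\n\n", "\n", ">", " ", "")):
    """Split text into chunks of at most chunk_size characters,
    trying the coarsest separator first and falling back to finer ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # Split on the separator and keep it at the end of each piece,
    # so a chunk that ends at ">" ends on a closed tag.
    parts = text.split(sep)
    pieces = [part + sep for part in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])

    chunks, current = [], ""
    for piece in pieces:
        if len(current) + len(piece) <= chunk_size:
            current += piece
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with the finer separators.
            chunks.extend(split_by_priority(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    xml = "<a><b>hello world</b><c>x</c></a>"
    for chunk in split_by_priority(xml, chunk_size=10):
        print(repr(chunk))
```

Because separators are kept with the text that precedes them, concatenating the chunks reproduces the input exactly, which makes the behaviour easy to check.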
#### **YAML / YML**
- Processed using regex word splitting (similar to TXT).
- **Chunked by 400 words**.
- Maintains YAML structure through simple word-based splitting.
- See `process_yaml`.
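
A minimal sketch of fixed-size word chunking follows. The regex `\S+` and the function name `chunk_by_words` are assumptions for illustration; the actual pattern used by `process_yaml` is not shown in this commit message.

```python
import re


def chunk_by_words(text: str, words_per_chunk: int = 400) -> list[str]:
    """Split text into whitespace-delimited words and group them into fixed-size chunks."""
    words = re.findall(r"\S+", text)
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


if __name__ == "__main__":
    sample = "a: 1\nb:\n  c: 2"
    print(chunk_by_words(sample, words_per_chunk=3))
```

Note that in this sketch the words are rejoined with single spaces, so newlines and indentation are not kept; whether the real function behaves the same way cannot be determined from the description alone.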
#### **LOG**
- Processed using line-based chunking to maintain log record integrity.
- **Never splits mid-line** to preserve complete log entries.
- **Line-Level Chunking**:
  1. Split the file into lines using `splitlines(keepends=True)` to preserve line endings.
  2. Accumulate complete lines until the chunk reaches the target of roughly 1000 words.
  3. When adding the next line would exceed the target and the chunk already has content:
     - Finalize the current chunk
     - Start a new chunk with the current line
  4. If a single line exceeds the target, it gets its own chunk to prevent infinite loops.
  5. Emit chunks with complete log records.
- **Goal**: Provide substantial log context (1000 words) while ensuring no log entry is split across chunks.
- See `process_log`.
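
The line-level steps above can be sketched as follows. This is an illustration under stated assumptions rather than the body of `process_log`: the function name `chunk_log_lines` is hypothetical, and words are counted by splitting each line on whitespace.

```python
def chunk_log_lines(text: str, target_words: int = 1000) -> list[str]:
    """Group complete lines into chunks of roughly target_words words each."""
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0

    for line in text.splitlines(keepends=True):  # keep the original line endings
        line_words = len(line.split())
        # Finalize the chunk only if it already has content and this line would overflow it.
        if current and current_words + line_words > target_words:
            chunks.append("".join(current))
            current, current_words = [], 0
        # An oversized line lands in an empty chunk and is closed on the next iteration.
        current.append(line)
        current_words += line_words

    if current:
        chunks.append("".join(current))
    return chunks


if __name__ == "__main__":
    log = "a b c\nd e\nf g h i j k\nl\n"
    for chunk in chunk_log_lines(log, target_words=5):
        print(repr(chunk))
```

Since lines are never cut and their endings are kept, joining the chunks returns the original file content unchanged.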
* updated yaml to use recursivesplitter
* removed chunk overlap for yaml and xml
* added support for older .doc files and .docm
* added keyword and abstract for each doc in citation
* added multi-modal input support
* added ai vision analysis
1 parent b2aa14a commit 5546e5a
31 files changed
Lines changed: 4238 additions & 296 deletions
File tree
- application/single_app
- static/js
- admin
- chat
- templates
- docs
- features
- v0.229.086
- v0.229.088
- fixes
- functional_tests