@@ -235,31 +235,136 @@ rm ./mydb/.lock
235235
236236---
237237
238- ### Memory Issues
238+ ### Memory Configuration
239239
240- ** Problem ** : Out of memory errors
240+ #### JVM Memory Configuration
241241
242- ** Solutions ** :
242+ Configure JVM memory via the `ARCADEDB_JVM_ARGS` environment variable **before** importing `arcadedb_embedded`:
243243
244- 1 . ** Increase JVM Heap** :
245- ``` python
246- import jpype
244+ **Basic Configuration:**
247245
248- # Set before first import
249- jpype.startJVM(" -Xmx4g" ) # 4GB heap
246+ ``` bash
247+ # Default: 4GB heap
248+ python script.py
250249
251- import arcadedb_embedded as arcadedb
250+ # Production: 8GB heap with matching initial size
251+ export ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g"
252+ python script.py
253+
254+ # One-liner
255+ ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g" python script.py
256+ ```
257+
258+ **Common JVM Options:**
259+
260+ | Option | Description | Example |
261+ | --------| -------------| ----------|
262+ | ` -Xmx<size> ` | Maximum heap memory | ` -Xmx8g ` (8 gigabytes) |
263+ | ` -Xms<size> ` | Initial heap size (recommended: same as ` -Xmx ` ) | ` -Xms8g ` |
264+ | ` -XX:MaxDirectMemorySize=<size> ` | Limit off-heap direct buffers | ` -XX:MaxDirectMemorySize=8g ` |
265+ | ` -Darcadedb.vectorIndex.locationCacheSize=<count> ` | Max vector locations to cache (default: -1 = unlimited) | ` -Darcadedb.vectorIndex.locationCacheSize=100000 ` |
266+ | ` -Darcadedb.vectorIndex.graphBuildCacheSize=<count> ` | Max vectors cached during HNSW build (default: 10000) | ` -Darcadedb.vectorIndex.graphBuildCacheSize=3000 ` |
267+ | ` -Darcadedb.vectorIndex.mutationsBeforeRebuild=<count> ` | Mutations before graph rebuild (default: 100) | ` -Darcadedb.vectorIndex.mutationsBeforeRebuild=200 ` |
268+
269+ **Vector Index Memory Tuning:**
270+
271+ For applications that use vector indexes, bound the caches to control memory usage:
272+
273+ ``` bash
274+ # Conservative: bounded caches for large vector datasets
275+ export ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=8g \
276+ -Darcadedb.vectorIndex.locationCacheSize=100000 \
277+ -Darcadedb.vectorIndex.graphBuildCacheSize=3000"
278+ python vector_app.py
279+ ```
280+
281+ **Cache Size Guidelines:**
282+
283+ - `locationCacheSize`: Number of vector locations (each ~56 bytes)
284+   - 100000 entries ≈ 5.6 MB
285+   - -1 = unlimited (backward compatible, may consume unbounded memory)
286+   - Recommended: 100000 for datasets with 1M+ vectors
287+
288+ - `graphBuildCacheSize`: Number of vectors cached during HNSW build
289+   - Memory ≈ cacheSize × (dimensions × 4 + 64) bytes
290+   - For 768-dim: 10000 entries ≈ 30 MB
291+   - Lower values reduce build-time memory spikes
292+   - Recommended: 3000-5000 for high-dimensional vectors
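
The sizing formulas above can be checked with a short calculation. The following is a minimal sketch, not part of `arcadedb_embedded`: the function and constant names are hypothetical, and the per-entry costs (56 bytes per location, `dimensions × 4 + 64` bytes per cached vector) are taken from the guidelines above rather than measured.

```python
# Per-entry costs as stated in the guidelines (assumed, not measured).
LOCATION_ENTRY_BYTES = 56
VECTOR_OVERHEAD_BYTES = 64
BYTES_PER_FLOAT = 4


def location_cache_bytes(entries: int) -> int:
    """Estimate memory for a bounded locationCacheSize."""
    if entries < 0:
        # -1 means unlimited, so there is no finite estimate to return.
        raise ValueError("unbounded cache has no fixed size estimate")
    return entries * LOCATION_ENTRY_BYTES


def graph_build_cache_bytes(entries: int, dimensions: int) -> int:
    """Estimate memory for graphBuildCacheSize at a given vector dimension."""
    if entries < 0 or dimensions <= 0:
        raise ValueError("entries must be >= 0 and dimensions must be > 0")
    return entries * (dimensions * BYTES_PER_FLOAT + VECTOR_OVERHEAD_BYTES)


def to_mb(num_bytes: int) -> float:
    """Convert bytes to decimal megabytes."""
    return num_bytes / 1_000_000


if __name__ == "__main__":
    print(f"locationCacheSize=100000: {to_mb(location_cache_bytes(100_000)):.1f} MB")
    print(f"graphBuildCacheSize=10000, 768-dim: "
          f"{to_mb(graph_build_cache_bytes(10_000, 768)):.1f} MB")
    print(f"graphBuildCacheSize=2000, 1536-dim: "
          f"{to_mb(graph_build_cache_bytes(2_000, 1536)):.1f} MB")
```

These figures are estimates in decimal megabytes; actual JVM usage will differ with object headers and garbage-collector behavior.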
293+
294+ **Memory Planning:**
295+
296+ ``` text
297+ Total Process Memory = JVM Heap + Off-Heap Components
298+
299+ Off-Heap Components:
300+ - Direct buffers (MaxDirectMemorySize)
301+ - Metaspace (class definitions)
302+ - Page cache
303+ - Thread stacks
304+ - Vector index caches (if bounded)
305+
306+ Rule of thumb: Plan for 1.5-2× your heap size in actual RAM
307+ ```
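
The rule of thumb above translates into a simple range. A minimal sketch, using a hypothetical helper name, that applies the 1.5-2× multiplier to a chosen heap size:

```python
def plan_ram_gb(heap_gb: float) -> tuple[float, float]:
    """Return the (low, high) RAM range in GB for a given JVM heap size.

    Applies the 1.5-2x rule of thumb; the multipliers are planning
    guidance from the text above, not a measured requirement.
    """
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    return (heap_gb * 1.5, heap_gb * 2.0)


if __name__ == "__main__":
    for heap in (2, 8, 16):
        low, high = plan_ram_gb(heap)
        print(f"-Xmx{heap}g: plan for {low:g}-{high:g} GB of RAM")
```

For an 8 GB heap this gives a 12-16 GB range, which leaves room for direct buffers, metaspace, and thread stacks.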
308+
309+ **Example Configurations:**
310+
311+ ``` bash
312+ # Small datasets (<1M records, <100K vectors)
313+ ARCADEDB_JVM_ARGS="-Xmx2g -Xms2g"
314+
315+ # Medium datasets (1M-10M records, 100K-1M vectors)
316+ ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=8g"
317+
318+ # Large datasets (10M+ records, 1M+ vectors) with bounded caches
319+ ARCADEDB_JVM_ARGS="-Xmx16g -Xms16g -XX:MaxDirectMemorySize=16g \
320+ -Darcadedb.vectorIndex.locationCacheSize=100000 \
321+ -Darcadedb.vectorIndex.graphBuildCacheSize=5000"
322+
323+ # High-dimensional vectors (e.g., 1536-dim embeddings)
324+ ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=8g \
325+ -Darcadedb.vectorIndex.locationCacheSize=50000 \
326+ -Darcadedb.vectorIndex.graphBuildCacheSize=2000"
327+ ```
328+
329+ !!! warning "Configuration Timing"
330+ `ARCADEDB_JVM_ARGS` must be set **before** the first `import arcadedb_embedded`. The
331+ JVM can only be configured once per Python process.
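
The timing rule can also be respected from inside a script. The sketch below assumes that assigning `os.environ["ARCADEDB_JVM_ARGS"]` before the first import has the same effect as exporting the variable in the shell; the documentation above only shows the shell form, so treat this as an assumption to verify. `build_jvm_args` and `configure_jvm` are hypothetical helpers, not part of the package.

```python
import os
import sys


def build_jvm_args(heap="8g", direct_memory=None,
                   location_cache_size=None, graph_build_cache_size=None):
    """Assemble an ARCADEDB_JVM_ARGS string from the options documented above."""
    args = [f"-Xmx{heap}", f"-Xms{heap}"]  # matching initial and maximum heap
    if direct_memory is not None:
        args.append(f"-XX:MaxDirectMemorySize={direct_memory}")
    if location_cache_size is not None:
        args.append(
            f"-Darcadedb.vectorIndex.locationCacheSize={location_cache_size}")
    if graph_build_cache_size is not None:
        args.append(
            f"-Darcadedb.vectorIndex.graphBuildCacheSize={graph_build_cache_size}")
    return " ".join(args)


def configure_jvm(**options):
    """Set ARCADEDB_JVM_ARGS, refusing if the package was already imported."""
    if "arcadedb_embedded" in sys.modules:
        # Too late: the JVM can only be configured once per Python process.
        raise RuntimeError("set ARCADEDB_JVM_ARGS before importing arcadedb_embedded")
    os.environ["ARCADEDB_JVM_ARGS"] = build_jvm_args(**options)
    return os.environ["ARCADEDB_JVM_ARGS"]


configure_jvm(heap="8g", direct_memory="8g",
              location_cache_size=100000, graph_build_cache_size=3000)

# Only now import the package (left commented so the sketch runs without it):
# import arcadedb_embedded as arcadedb
```

The guard on `sys.modules` turns a silently ignored setting into an explicit error, which is easier to diagnose than an unexpected default heap.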
332+
333+ !!! tip "Alternative: ARCADEDB_JVM_ERROR_FILE"
334+ Set the crash log location with `ARCADEDB_JVM_ERROR_FILE`:
335+ ```bash
336+ export ARCADEDB_JVM_ERROR_FILE="/var/log/arcade/errors.log"
337+ ```
338+
339+ #### Out of Memory Errors
340+
341+ **Problem**: `OutOfMemoryError` or heap space errors
342+
343+ **Solutions**:
344+
345+ 1. **Increase Heap via Environment Variable** (Recommended):
346+ ``` bash
347+ export ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g"
348+ python script.py
349+ ```
350+
351+ 2. **Bound Vector Caches** (for vector workloads):
352+ ``` bash
353+ export ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g \
354+ -Darcadedb.vectorIndex.locationCacheSize=100000 \
355+ -Darcadedb.vectorIndex.graphBuildCacheSize=3000"
356+ python script.py
252357 ```
253358
254- 2 . ** Use Batch Processing** :
359+ 3. **Use Batch Processing**:
255360 ``` python
256361 batch_size = 1000
257362 for i in range(0, len(data), batch_size):
258363     batch = data[i:i + batch_size]
259364     process_batch(batch)
260365 ```
261366
262- 3 . ** Close ResultSets** :
367+ 4. **Close ResultSets**:
263368 ``` python
264369 result = db.query("sql", "SELECT FROM LargeTable")
265370 try: