MygramDB supports a rich boolean query syntax for complex text search operations.
- Basic Syntax
- Boolean Operators
- Complex Boolean Queries
- Operator Precedence
- Quoted Phrases
- Filter Conditions
- Sorting (ORDER BY)
- Pagination (LIMIT/OFFSET)
- BM25 Relevance Scoring (SORT _score)
- Highlighting (HIGHLIGHT)
- Fuzzy Search (FUZZY)
- Error Handling
- Performance Tips
SEARCH <table> <query_expression> [FILTER ...] [SORT ...] [LIMIT ...] [OFFSET ...]
COUNT <table> <query_expression> [FILTER ...]
SEARCH <table> <term>
Example:
SEARCH threads golang
SEARCH Response:
OK RESULTS <total_count> <id1> <id2> <id3> ...
Example:
OK RESULTS 3 101 205 387
COUNT Response:
OK COUNT <number>
Example:
OK COUNT 42
Search for documents containing all specified terms.
SEARCH <table> term1 AND term2 AND term3
Example:
SEARCH threads golang AND tutorial
Search for documents containing any of the specified terms.
SEARCH <table> term1 OR term2 OR term3
Example:
SEARCH threads golang OR python OR rust
Exclude documents containing specific terms.
SEARCH <table> term1 NOT term2
Example:
SEARCH threads tutorial NOT beginner
Important: NOT excludes documents from the result set. Use it carefully with large indexes.
Use parentheses to group expressions and control operator precedence:
SEARCH <table> (term1 OR term2) AND term3
Example:
SEARCH threads (golang OR python) AND tutorial
This finds documents containing "tutorial" AND either "golang" OR "python".
You can nest multiple levels of parentheses:
SEARCH <table> ((term1 OR term2) AND term3) OR term4
Example:
SEARCH threads ((golang OR python) AND web) OR rust
This finds:
- Documents with "web" AND ("golang" OR "python")
- OR documents with "rust"
SEARCH threads (golang OR python) AND tutorial NOT beginner
SEARCH posts database AND (mysql OR postgresql) NOT sqlite
SEARCH articles "machine learning" AND (python OR R) NOT tensorflow
When no parentheses are used, operators have the following precedence (highest to lowest):
- NOT (highest)
- AND (medium)
- OR (lowest)
Query: a OR b AND c
Parsed as: a OR (b AND c)
Query: NOT a AND b
Parsed as: (NOT a) AND b
Query: a AND b OR c AND d
Parsed as: (a AND b) OR (c AND d)
Best Practice: Use parentheses to make intent explicit, even when not strictly necessary.
Use double quotes " or single quotes ' for exact phrase matching:
SEARCH <table> "exact phrase"
SEARCH <table> 'machine learning'
Supported escape sequences inside quoted strings:
\n- Newline\t- Tab\r- Carriage return\\- Backslash\"- Double quote\'- Single quote
Example:
SEARCH articles "hello \"world\""
Quoted phrases can be combined with boolean operators:
SEARCH threads "web framework" AND (golang OR python)
SEARCH posts "machine learning" NOT "deep learning"
Filter results by column values using the FILTER clause.
SEARCH <table> <query> FILTER <column> <operator> <value> [FILTER <col> <op> <val> ...]
Multiple filters can be specified (all must match - AND logic).
=orEQ- Equal!=orNE- Not equal>orGT- Greater than>=orGTE- Greater than or equal<orLT- Less than<=orLTE- Less than or equal
Single filter:
SEARCH articles tech FILTER status = 1
Multiple filters:
SEARCH articles tech FILTER status = 1 FILTER category = ai
Comparison operators:
SEARCH articles tech FILTER views > 1000
SEARCH articles tech FILTER created_at >= 2024-01-01
SEARCH articles tech FILTER priority != 0
With boolean queries:
SEARCH threads (golang OR python) AND tutorial FILTER status = published
MygramDB supports filtering on indexed filter columns:
- Integer:
status=1,priority=5 - String:
category=tech,author=john - Date/Time:
created_at=2024-01-15T10:30:00
Note: Only columns configured as filters in config.yaml can be used in FILTER clauses.
- Bitmap indexes: Very fast for low-cardinality columns (e.g., status, category)
- Dictionary compression: Efficient for string columns
- Filtering order: Filters are applied after text search intersection
Sort search results using the SORT clause.
SEARCH <table> <query> SORT <column> [ASC|DESC]
Note: The ORDER BY syntax is not supported. Use SORT instead.
If SORT is not specified, results are sorted by primary key in descending order (newest first).
SEARCH threads golang
-- Equivalent to: SEARCH threads golang SORT id DESC
Full syntax:
SEARCH threads golang SORT id ASC
SEARCH threads golang SORT id DESC
Shorthand syntax (recommended):
SEARCH threads golang SORT ASC -- Primary key ascending
SEARCH threads golang SORT DESC -- Primary key descending
Sort by any indexed filter column:
SEARCH threads golang SORT created_at DESC LIMIT 10
SEARCH posts database SORT score ASC LIMIT 20
SEARCH threads (golang OR python) AND tutorial SORT created_at DESC LIMIT 10
SEARCH posts ((mysql OR postgresql) AND database) NOT sqlite SORT score ASC
Sorting Algorithm:
- With LIMIT: Uses
partial_sort- O(N × log(K)) where K = LIMIT + OFFSET - Without LIMIT: Uses full sort - O(N × log(N))
- Memory: In-place sorting, no additional memory allocation
For large result sets (e.g., 1M documents with 800K matches):
- Always use LIMIT whenever possible to leverage partial_sort optimization (~3x faster)
- Sorting by primary key is faster than filter columns for numeric keys
- Results are sorted before applying OFFSET/LIMIT (ensures correct pagination)
Example Performance:
- 800K results with LIMIT 100: ~3x faster with partial_sort
- Sorting happens in-place: no memory overhead
- Primary key: Always valid
- Filter columns: Must exist in at least one document
- Non-existent columns: Logged as warning, treated as NULL values (sorted last)
Control the number of results returned using LIMIT and OFFSET.
SEARCH <table> <query> LIMIT <n>
Example:
SEARCH articles tech LIMIT 10
Default: 100 (configurable via api.default_limit in config.yaml)
Range: 5-1000 (configurable via kMinLimit and kMaxLimit)
SEARCH <table> <query> OFFSET <n>
Example:
SEARCH articles tech LIMIT 10 OFFSET 20
This returns results 21-30 (skips first 20).
Page 1 (first 10 results):
SEARCH articles tech LIMIT 10 OFFSET 0
Page 2 (results 11-20):
SEARCH articles tech LIMIT 10 OFFSET 10
Page 3 (results 21-30):
SEARCH articles tech LIMIT 10 OFFSET 20
MygramDB rejects queries whose combined expression length (search text + AND/NOT terms + FILTER values) exceeds the configured limit.
- Default: 128 characters
- Config:
api.max_query_length(0disables the guard) - Error:
ERROR Query expression length (...) exceeds maximum allowed length...
Keep boolean expressions compact or raise the limit in config.yaml if applications require longer filters.
SEARCH threads (golang OR python) AND tutorial
FILTER status = published
SORT created_at DESC
LIMIT 10
OFFSET 20
This query:
- Finds documents with "tutorial" AND ("golang" OR "python")
- Filters to only published documents
- Sorts by creation date (newest first)
- Returns results 21-30 (page 3 with 10 results per page)
- LIMIT optimization: Enables partial_sort (much faster for large result sets)
- OFFSET cost: O(N) where N = OFFSET (results are still generated, just not returned)
- Best practice: Use LIMIT with ORDER BY for consistent pagination
- Deep pagination: Large OFFSET values (e.g., 10000+) can be slow
Sort results by relevance using the BM25 ranking function.
SEARCH <table> <query> SORT _score [ASC|DESC]
BM25 computes a relevance score for each document based on:
- TF (Term Frequency): How often the search term appears in the document
- IDF (Inverse Document Frequency): How rare the term is across all documents
- Document length normalization: Shorter documents with matching terms score higher
Parameters (built-in, not configurable):
k1 = 1.2— Term frequency saturationb = 0.75— Document length normalization (0 = none, 1 = full)
SEARCH articles "machine learning" SORT _score DESC LIMIT 10
SEARCH articles golang AND tutorial SORT _score LIMIT 20
BM25 scoring requires verify_text to be set to "ascii" or "all" in configuration, since term frequency is counted from stored normalized text.
memory:
verify_text: "all" # or "ascii"SEARCH articles "database" SORT _score DESC FILTER category = tech LIMIT 10
Return text snippets with search terms highlighted using configurable tags.
SEARCH <table> <query> HIGHLIGHT [TAG <open> <close>] [SNIPPET_LEN <n>] [MAX_FRAGMENTS <n>]
| Option | Default | Range | Description |
|---|---|---|---|
| TAG | <em> / </em> |
— | Open and close tags wrapping matched terms |
| SNIPPET_LEN | 100 | 1–10,000 | Max code points per snippet fragment |
| MAX_FRAGMENTS | 3 | 1–100 | Max fragments joined by ellipsis (...) |
Default highlighting:
SEARCH articles "machine learning" HIGHLIGHT LIMIT 10
Custom tags:
SEARCH articles "golang" HIGHLIGHT TAG <strong> </strong> LIMIT 10
Longer snippets with more fragments:
SEARCH articles "database" HIGHLIGHT SNIPPET_LEN 200 MAX_FRAGMENTS 5 LIMIT 10
Highlighting requires verify_text to be set to "ascii" or "all" in configuration.
SEARCH articles "tech" HIGHLIGHT TAG <b> </b> SORT _score DESC FILTER status = 1 LIMIT 10
Match terms within a specified Levenshtein edit distance (insertions, deletions, substitutions).
SEARCH <table> <query> FUZZY [distance]
- distance (optional):
1(default) or21: Match terms within 1 edit operation2: Match terms within 2 edit operations
SEARCH articles "machne" FUZZY LIMIT 10
SEARCH articles "databse" FUZZY 2 LIMIT 10
Fuzzy search pre-filters candidates by length difference to avoid unnecessary distance computations. Use FUZZY 1 (default) for best performance.
The following will return errors:
Empty parentheses:
SEARCH threads ()
ERROR Invalid query: empty expression in parentheses
Unclosed parentheses:
SEARCH threads (golang AND python
ERROR Invalid query: unclosed parentheses
Extra closing parentheses:
SEARCH threads golang AND python)
ERROR Invalid query: unexpected closing parenthesis
Operator without operands:
SEARCH threads AND
ERROR Invalid query: operator without operands
Trailing operator:
SEARCH threads golang AND
ERROR Invalid query: trailing operator
Unclosed quotes:
SEARCH threads "golang tutorial
ERROR Invalid query: unclosed quote
Non-existent table:
SEARCH nonexistent tech
ERROR Table not found: nonexistent
Invalid filter column:
SEARCH articles tech FILTER invalid_column=1
ERROR Filter column not found: invalid_column
Non-existent column:
SEARCH articles tech ORDER BY nonexistent DESC
WARNING Column 'nonexistent' not found in documents, treating as NULL
Note: Non-existent columns generate a warning but don't error (treated as NULL).
-- Good: Specific term first
SEARCH articles "machine learning" AND tutorial
-- Less optimal: Generic term first
SEARCH articles tutorial AND "machine learning"
-- Explicit and readable
SEARCH threads (golang OR python) AND (web OR api)
-- Harder to understand
SEARCH threads golang OR python AND web OR api
-- Good: Positive term first
SEARCH articles tech NOT old
-- Less optimal: Leading NOT
SEARCH articles NOT old
Leading NOT requires scanning all documents before exclusion.
-- Good: Filter narrows results early
SEARCH articles tech FILTER category = ai FILTER status = 1
-- Works but less efficient
SEARCH articles tech AND ai AND published
Filters on indexed columns are faster than text search on those terms.
-- Good: Uses partial_sort optimization
SEARCH articles tech SORT created_at DESC LIMIT 10
-- Slower: Full sort of all results
SEARCH articles tech SORT created_at DESC
-- Efficient
SEARCH articles tech LIMIT 10 OFFSET 0
-- Less efficient (large OFFSET)
SEARCH articles tech LIMIT 10 OFFSET 10000
Consider alternative pagination strategies for deep results (e.g., cursor-based).
All boolean query syntax works with COUNT as well:
COUNT <table> <query_expression> [FILTER ...]
Examples:
COUNT threads (golang OR python) AND tutorial
COUNT articles tech FILTER status = 1 FILTER category = ai
COUNT posts database AND (mysql OR postgresql) NOT sqlite
Note: COUNT does not support SORT, LIMIT, or OFFSET (not needed for counting).
Queries are parsed into an Abstract Syntax Tree (AST) with proper operator precedence:
query → or_expr
or_expr → and_expr (OR and_expr)*
and_expr → not_expr (AND not_expr)*
not_expr → NOT not_expr | primary
primary → TERM | '(' or_expr ')'
- AND operations: Efficient intersection using sorted posting lists
- OR operations: Efficient union using set operations
- NOT operations: Complement against all documents (potentially expensive)
- Parentheses: No performance overhead; only affects parsing
MygramDB uses n-gram tokenization for indexing and search:
- Default n-gram size: 2 (bigrams) - configurable per table
- CJK text: Separate n-gram size for kanji/kana (configurable)
- Unicode normalization: NFKC normalization, width conversion, optional lowercasing
Manually trigger snapshot synchronization from MySQL to MygramDB.
Syntax:
SYNC <table_name>
Example:
SYNC articles
Response:
OK SYNC STARTED table=articles job_id=1
See SYNC Command Guide for detailed usage.
Check the progress and status of SYNC operations.
Syntax:
SYNC STATUS
Example:
SYNC STATUS
Response Examples:
table=articles status=IN_PROGRESS progress=10000/25000 rows (40.0%) rate=5000 rows/s
table=articles status=COMPLETED rows=25000 time=5.2s gtid=uuid:123 replication=STARTED
status=IDLE message="No sync operation performed"
See SYNC Command Guide for detailed field descriptions.
- SYNC Command Guide - Manual snapshot synchronization
- Protocol Reference - All commands and protocol details
- Configuration Guide - Configure filters, n-gram sizes, and limits
- Performance Tuning - Advanced optimization techniques
- README - Project overview and quick start