- Overview
- Core Concepts
- Document Operations
- Query Strings
- Tokenization
- Query and Sort Options
- Handling Results
- Faceted Search
- Index Management
- Best Practices
- Quotas and Limits
- Important Characteristics
- Local Development Server Limitations
- Common Patterns
- Quick Reference
The Search API provides a model for indexing documents with structured data, enabling full-text search operations with advanced querying capabilities. Documents and indexes are stored in a separate persistent store optimized for search operations.
A document is an object with:
- Unique ID (doc_id): Up to 500 printable ASCII characters (codes 33-126)
- List of fields: Named, typed data containers
- Maximum size: 1 MB per document
- Can be auto-generated or manually specified
- Cannot begin with ! or be wrapped in double underscores (__)
- Must contain only visible, printable ASCII characters
- Used for direct retrieval without search
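The identifier rules above can be checked on the client before a document is built. The following is a minimal sketch and not part of the Search API: the class `DocIdRules` and the method `isValidDocId` are hypothetical names, and the sketch assumes that "wrapped in double underscores" means the ID both starts and ends with `__`.

```java
final class DocIdRules {
    static final int MAX_LENGTH = 500;

    // Returns true when the id satisfies the documented doc_id constraints.
    static boolean isValidDocId(String id) {
        if (id == null || id.isEmpty() || id.length() > MAX_LENGTH) {
            return false;
        }
        // Must not begin with an exclamation point.
        if (id.charAt(0) == '!') {
            return false;
        }
        // Assumption: "wrapped" means both a leading and a trailing "__".
        if (id.startsWith("__") && id.endsWith("__")) {
            return false;
        }
        // Only visible, printable ASCII (codes 33-126) is allowed.
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            if (c < 33 || c > 126) {
                return false;
            }
        }
        return true;
    }
}
```

A caller might run this check on user-supplied identifiers and fall back to an auto-generated ID when it fails.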
- Text Field: Plain text, searchable word-by-word (max 1,048,576 characters)
- HTML Field: HTML markup, only text outside tags is searchable (max 1,048,576 characters)
- Atom Field: Indivisible string, not tokenized (max 500 characters)
- Number Field: A double-precision floating-point number. Values for this field must be in the range -2,147,483,647 to 2,147,483,647.
- Date Field: Date object (stored as days since 1/1/1970 UTC)
- Geopoint Field: Latitude and longitude coordinates
- Case sensitive, ASCII only
- Must start with letter
- Can contain letters, digits, underscore
- Maximum 500 characters
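The field-name rules above map naturally onto a regular expression. This is a hedged sketch only: `FieldNameRules` and `isValidFieldName` are hypothetical names introduced for illustration, not SDK calls.

```java
import java.util.regex.Pattern;

final class FieldNameRules {
    static final int MAX_LENGTH = 500;

    // A letter first, then ASCII letters, digits, or underscores.
    private static final Pattern NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    static boolean isValidFieldName(String name) {
        return name != null
                && name.length() <= MAX_LENGTH
                && NAME.matcher(name).matches();
    }
}
```

Because names are case sensitive, `Price` and `price` both pass this check but would refer to different fields.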
An index stores documents for retrieval. Key characteristics:
- No limit on number of documents or indexes
- Default size limit: 10 GB per index (can increase to 200 GB)
- Total storage across all indexes: 0.25 GB free quota
- Supports document retrieval by ID, ID range, or query
Query strings can be:
- Global search: Search values across all fields
- Field search: Target specific fields by name
- Mixed: Combine both approaches
- Maximum length: 2000 characters
Search results include:
- Number of documents found (estimate)
- Number of documents returned (actual)
- Collection of ScoredDocument objects
- Maximum 10,000 matching documents per search
- Default return: 20 documents at a time
User currentUser = UserServiceFactory.getUserService().getCurrentUser();
String userEmail = currentUser == null ? "" : currentUser.getEmail();
String myDocId = "PA6-5000";
Document doc = Document.newBuilder()
.setId(myDocId) // Optional
.addField(Field.newBuilder().setName("content").setText("the rain in spain"))
.addField(Field.newBuilder().setName("email").setText(userEmail))
.addField(Field.newBuilder().setName("published").setDate(new Date()))
.build();
IndexSpec indexSpec = IndexSpec.newBuilder().setName(indexName).build();
Index index = SearchServiceFactory.getSearchService().getIndex(indexSpec);
// put() accepts one or more documents (up to 200 per call)
index.put(doc);
Best Practice: Batch operations with up to 200 documents are more efficient than single additions.
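To stay under the 200-document limit per call, a larger collection has to be split into batches first. The sketch below shows one way to do that in plain Java; `Batches` and `partition` are hypothetical helper names, and the sketch is generic so it works for documents or doc_id strings alike.

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {
    // Documented per-call limit for put and delete.
    static final int MAX_BATCH_SIZE = 200;

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each resulting batch would then be passed to a single put or delete call on the index.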
Documents are immutable once added. To update:
1. Create a new document with the same doc_id
2. Add it to the index (replaces the old document)
// By doc_id
Document doc = index.get("AZ125");
// Range of doc_ids
GetResponse<Document> docs = index.getRange(
GetRequest.newBuilder()
.setStartId("AZ125")
.setLimit(100)
.build()
);
// Delete by doc_id (batch up to 200)
List<String> docIds = new ArrayList<>();
docIds.add("doc1");
docIds.add("doc2");
index.delete(docIds);
Search for values in any field:
"rose water" // Find documents with both words
"1776-07-04" // Find date or text matching this
"NOT red" // Find documents without "red"
"red OR blue" // Find either color
"keyboard AND mouse" // Find both terms
Boolean Operator Precedence: NOT > OR > AND
Target specific fields:
"product:piano" // Equality
"price < 500" // Comparison
"product:piano AND price < 5000" // Combined
"color:(red OR blue)" // Multiple values
"birthday >= 2000-12-31" // Date comparison

| Field Type | Operators |
|---|---|
| Atom | : = |
| Text/HTML | : = |
| Number | : = < <= > >= |
| Date | : = < <= > >= |
| Geopoint | Use distance() function |

"distance(survey_marker, geopoint(35.2, 40.5)) < 100"
"distance(home, geopoint(35.2, 40.5)) > 100"
Use ~ prefix to match word variations:
"~cat" // Matches "cat" and "cats"
"~dog" // Matches "dog" and "dogs"
Exact phrase matching:
"Comment:\"insanely great\""
"Title:\"Tom&Jerry\""
String fields are tokenized on:
- Whitespace characters
- Most punctuation marks
- Special characters
Special Cases:
- Underscore _ and ampersand & do NOT break tokens
- Acronyms like "I.B.M." become "ibm"
- Hash signs in patterns like #google or c# remain part of the word
- Apostrophes in possessives like "John's" stay attached
Atom Fields: Never tokenized (exact match only)
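To get a feel for how these rules interact, here is a rough approximation in plain Java. It is not the service's tokenizer: `TokenizerSketch` is a hypothetical class, the exact punctuation set is an assumption, and for simplicity it keeps every `#` and apostrophe rather than only those in patterns like `#google`, `c#`, or possessives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizerSketch {
    // Two or more single letters each followed by a dot, e.g. "I.B.M."
    private static final Pattern ACRONYM = Pattern.compile("\\b(?:[A-Za-z]\\.){2,}");
    // Assumption: anything other than these characters separates tokens.
    private static final Pattern SEPARATORS = Pattern.compile("[^a-z0-9_&#']+");

    static List<String> tokenize(String text) {
        // Step 1: collapse acronyms by dropping their dots.
        Matcher m = ACRONYM.matcher(text);
        StringBuffer collapsed = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(collapsed, m.group().replace(".", ""));
        }
        m.appendTail(collapsed);

        // Step 2: lowercase, then split on whitespace and punctuation.
        String lowered = collapsed.toString().toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (String piece : SEPARATORS.split(lowered)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

Under these assumptions, `Tom&Jerry's` stays a single token while `rain-in-spain` splits into three, which mirrors the special cases listed above.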
QueryOptions options = QueryOptions.newBuilder()
.setLimit(25)
.setFieldsToReturn("model", "price", "description")
.build();
Query query = Query.newBuilder()
.setOptions(options)
.build(queryString);
Results<ScoredDocument> result = index.search(query);

| Property | Description | Default | Maximum |
|---|---|---|---|
| Limit | Max documents to return | 20 | 1000 |
| Offset | Starting position | 0 | 1000 |
| Cursor | Alternative to offset | null | - |
| ReturningIdsOnly | Return IDs only | false | - |
| FieldsToReturn | Specific fields to include | All fields | 100 fields |
| ExpressionsToReturn | Computed fields | None | - |
| FieldsToSnippet | Generate snippets | None | - |
SortOptions sortOptions = SortOptions.newBuilder()
.addSortExpression(
SortExpression.newBuilder()
.setExpression("price")
.setDirection(SortExpression.SortDirection.DESCENDING)
.setDefaultValueNumeric(0)
)
.addSortExpression(
SortExpression.newBuilder()
.setExpression("brand")
.setDirection(SortExpression.SortDirection.DESCENDING)
.setDefaultValue("")
)
.setLimit(1000)
.build();
Important: Sorting limits results to 10,000 documents maximum. Default sort limit is 1,000.
Create computed fields using expressions:
"price * quantity"
"(men + women)/2"
"min(daily_use, 10) * rate"
"snippet('rose', flower, 120)"
Special Terms:
- _rank: Document's rank property
- _score: Match score (if MatchScorer enabled)
Numeric Functions:
- max(...), min(...), abs(...), log(...), pow(x, y), count(field)
Geopoint Functions:
- geopoint(lat, long): Create geopoint
- distance(point1, point2): Calculate distance in meters
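Since distance() works in meters, it helps to know roughly how many meters separate two coordinates when choosing a threshold. The sketch below uses the standard haversine formula with a mean Earth radius of 6,371 km; both are assumptions, because the formula the service actually uses is not stated here. `GeoDistance` and `haversineMeters` are hypothetical names.

```java
final class GeoDistance {
    // Assumption: mean Earth radius; the service may use a different model.
    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    // Great-circle distance in meters between two latitude/longitude pairs in degrees.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }
}
```

One degree of longitude at the equator comes out at roughly 111 km with this model, so a query threshold of 100 covers only a very small area.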
Generate text fragments showing matched content:
snippet(query, body, [max_chars])
// Example
QueryOptions options = QueryOptions.newBuilder()
.setFieldsToSnippet("description", "content")
.build();
Returns HTML with matched text in boldface, default 160 characters.
Results<ScoredDocument> result = index.search(query);
// Get counts
long totalMatches = result.getNumberFound();
int numberOfDocsReturned = result.getNumberReturned();
// Iterate documents
for (ScoredDocument doc : result) {
String maker = doc.getOnlyField("maker").getText();
double price = doc.getOnlyField("price").getNumber();
}
int offset = 0;
int numberRetrieved; // Declared outside the loop so the while condition can read it
do {
    QueryOptions options = QueryOptions.newBuilder()
        .setOffset(offset) // Offset is capped at 1000; use a cursor to page deeper
        .build();
    Query query = Query.newBuilder()
        .setOptions(options)
        .build(queryString);
    Results<ScoredDocument> result = index.search(query);
    numberRetrieved = result.getNumberReturned();
    if (numberRetrieved > 0) {
        offset += numberRetrieved;
        // Process documents
    }
} while (numberRetrieved > 0);
Cursor cursor = Cursor.newBuilder().build();
do {
QueryOptions options = QueryOptions.newBuilder()
.setCursor(cursor)
.build();
Query query = Query.newBuilder()
.setOptions(options)
.build(queryString);
Results<ScoredDocument> result = index.search(query);
cursor = result.getCursor();
// Process documents
} while (cursor != null);
Cursor cursor = Cursor.newBuilder()
.setPerResult(true)
.build();
QueryOptions options = QueryOptions.newBuilder()
.setCursor(cursor)
.build();
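// Build the query from the options so the per-result cursor setting takes effect
Query query = Query.newBuilder().setOptions(options).build(queryString);
Results<ScoredDocument> result = index.search(query);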
for (ScoredDocument doc : result) {
if (/* document of interest */) {
cursor = doc.getCursor();
}
}
// Save
String cursorString = cursor.toWebSafeString();
// Restore
Cursor cursor = Cursor.newBuilder().build(cursorString);
Document doc = Document.newBuilder()
.setId("doc1")
.addField(Field.newBuilder().setName("name").setAtom("x86"))
.addFacet(Facet.withAtom("type", "computer"))
.addFacet(Facet.withNumber("ram_size_gb", 8.0))
.build();
Facet Rules:
- Name: Same rules as field names (500 char max)
- Value: Atom string (500 char max) or number
- No limit on values per facet or facets per document
- Can have multiple values for same facet name
Results<ScoredDocument> result = index.search(
Query.newBuilder()
.setEnableFacetDiscovery(true)
.build("name:x86")
);
for (FacetResult facetResult : result.getFacets()) {
System.out.printf("Facet %s:\n", facetResult.getName());
for (FacetResultValue facetValue : facetResult.getValues()) {
System.out.printf(" %s: Count=%s\n",
facetValue.getLabel(),
facetValue.getCount()
);
}
}
Results<ScoredDocument> result = index.search(
Query.newBuilder()
.addReturnFacet("type")
.addReturnFacet("ram_size_gb")
.build("name:x86")
);
Results<ScoredDocument> result = index.search(
Query.newBuilder()
.addReturnFacet(FacetRequest.newBuilder()
.setName("type")
.addValueConstraint("computer")
.addValueConstraint("printer"))
.addReturnFacet(FacetRequest.newBuilder()
.setName("ram_size_gb")
.addRange(FacetRange.withEnd(4.0))
.addRange(FacetRange.withStartEnd(4.0, 8.0))
.addRange(FacetRange.withStart(8.0)))
.build("name:x86")
);
Query query = Query.newBuilder()
.setFacetOptions(FacetOptions.newBuilder()
.setDiscoveryLimit(5) // Default: 10
.setDiscoveryValueLimit(10) // Default: 10
.setDepth(6000) // Default: 1000
.build())
.build(queryString);
Query query = Query.newBuilder()
.addFacetRefinementFromToken(refinement_key1)
.addFacetRefinementFromToken(refinement_key2)
.build("some_query");
Refinement Logic:
- Same facet refinements: Combined with OR
- Different facet refinements: Combined with AND
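The OR-within, AND-across rule can be expressed as a small predicate. The sketch below models facets as plain maps from facet name to a set of string values; this representation and the names `RefinementLogic` and `matches` are assumptions for illustration and do not reflect how the service stores or evaluates refinements.

```java
import java.util.Map;
import java.util.Set;

final class RefinementLogic {
    // A document matches when, for EVERY refined facet name (AND),
    // it carries AT LEAST ONE of the requested values (OR).
    static boolean matches(Map<String, Set<String>> docFacets,
                           Map<String, Set<String>> refinements) {
        for (Map.Entry<String, Set<String>> refinement : refinements.entrySet()) {
            Set<String> docValues = docFacets.get(refinement.getKey());
            if (docValues == null) {
                return false; // Document lacks this facet entirely.
            }
            boolean anyMatch = false;
            for (String wanted : refinement.getValue()) {
                if (docValues.contains(wanted)) {
                    anyMatch = true;
                    break;
                }
            }
            if (!anyMatch) {
                return false;
            }
        }
        return true;
    }
}
```

With refinements `type` in {computer, printer} and `ram_size_gb` in {8}, a document tagged computer with 8 GB matches, while a computer with 16 GB does not.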
Schemas are maintained automatically and show all field names and types:
GetResponse<Index> response = SearchServiceFactory.getSearchService()
.getIndexes(GetIndexesRequest.newBuilder()
.setSchemaFetched(true)
.build());
for (Index index : response) {
Schema schema = index.getSchema();
for (String fieldName : schema.getFieldNames()) {
List<FieldType> typesForField = schema.getFieldTypes(fieldName);
// Process schema information
}
}
Schema Characteristics:
- Auto-updated as documents are added
- Fields can never be removed from schema
- Same field name can have multiple types
- Not returned by default (must request explicitly)
// Current namespace only
GetResponse<Index> response = SearchServiceFactory.getSearchService()
.getIndexes(GetIndexesRequest.newBuilder().build());
// All namespaces
GetResponse<Index> response = SearchServiceFactory.getSearchService()
.getIndexes(GetIndexesRequest.newBuilder()
.setAllNamespaces(true)
.build());
Pagination: Maximum 1000 indexes per call. Use setStartIndexName() for more.
// Maximum allowed size
long maxSize = index.getStorageLimit();
// Current usage (estimate)
long currentUsage = index.getStorageUsage();
To delete an index:
1. Delete all documents
2. Delete the index schema
// Delete all documents
while (true) {
List<String> docIds = new ArrayList<>();
GetRequest request = GetRequest.newBuilder()
.setReturningIdsOnly(true)
.build();
GetResponse<Document> response = index.getRange(request);
if (response.getResults().isEmpty()) {
break;
}
for (Document doc : response) {
docIds.add(doc.getId());
}
index.delete(docIds);
}
- Batch Operations: Always batch puts/deletes (up to 200 documents)
- Use Document Rank for Pre-sorting:
  // Set rank to price for default price sorting
  Document.newBuilder().setRank(price)
- Avoid Expensive Operations:
  - Use atom fields for boolean data (not numbers)
  - Transform negations: cuisine_known:yes vs NOT cuisine:undefined
  - Transform disjunctions: cuisine:Asian vs cuisine:Japanese OR cuisine:Korean
  - Eliminate tautologies: city:toronto vs city:toronto AND NOT city:montreal
- Narrow Before Sorting:
  "cuisine:japanese" + sort by distance
  // Good: Filter first, then sort smaller set
  "cuisine:japanese AND city:<user-city>" + sort by distance
- Use Categories to Avoid Sorting:
  // Create price ranges: price_0_10, price_11_20, etc.
  "price_range:price_21_30 OR price_range:price_31_40"
- Avoid Scoring Unless Needed: Scoring is expensive in operations and time
- Use Rank Strategically: Default sort by rank is most efficient
- Multiple Sort Orders: Create separate indexes for different sort orders
  - Index 1: rank = price
  - Index 2: rank = MAXINT - price
- Multi-valued Fields: Only first value used in sorts
- Document ID Design: Can't search on doc_id directly, so also store in atom field if needed
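The two-index trick for multiple sort orders (rank = price in one index, rank = MAXINT - price in the other) comes down to a pair of small functions. This is a sketch under stated assumptions: MAXINT is taken to be `Integer.MAX_VALUE`, the price is a non-negative integer such as cents, and `PriceRank` with its methods is a hypothetical helper.

```java
final class PriceRank {
    // Rank for the first index: grows as the price grows.
    static int rankForIndex1(int priceCents) {
        requireNonNegative(priceCents);
        return priceCents;
    }

    // Rank for the second index: shrinks as the price grows,
    // so the default rank ordering is reversed relative to index 1.
    static int rankForIndex2(int priceCents) {
        requireNonNegative(priceCents);
        return Integer.MAX_VALUE - priceCents;
    }

    private static void requireNonNegative(int priceCents) {
        if (priceCents < 0) {
            throw new IllegalArgumentException("price must not be negative");
        }
    }
}
```

The same document would be written to both indexes, each time with the rank computed for that index, so either ordering is available without an explicit sort.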
| Resource | Free Quota |
|---|---|
| Total storage | 0.25 GB |
| Queries | 1000/day |
| Adding documents | 0.01 GB/day |
| Resource | Limit |
|---|---|
| Query execution time | 100 aggregated minutes/minute |
| Documents added/deleted | 15,000/minute |
| Index size | 10 GB (up to 200 GB with request) |
| Document size | 1 MB |
| Query string length | 2000 characters |
| Documents per search | 10,000 max found |
| Documents per put/delete | 200 |
| Fields to return | 100 |
| Sort limit | 10,000 (default 1,000) |
| Resource | Cost |
|---|---|
| Storage | $0.18/GB/month |
| Queries | $0.50/10K queries |
| Indexing | $2.00/GB |
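As a worked example of the cost table above, the rates can be combined into a rough monthly estimate. The sketch deliberately ignores the free quota, rounding, and any billing granularity, all of which are assumptions; `CostEstimate` and `monthlyUsd` are hypothetical names.

```java
final class CostEstimate {
    static final double STORAGE_USD_PER_GB_MONTH = 0.18;
    static final double QUERY_USD_PER_10K = 0.50;
    static final double INDEXING_USD_PER_GB = 2.00;

    // Rough monthly cost in USD, before any free quota is subtracted.
    static double monthlyUsd(double storedGb, long queries, double indexedGb) {
        double storage = storedGb * STORAGE_USD_PER_GB_MONTH;
        double querying = (queries / 10_000.0) * QUERY_USD_PER_10K;
        double indexing = indexedGb * INDEXING_USD_PER_GB;
        return storage + querying + indexing;
    }
}
```

For 10 GB stored, 100,000 queries, and 1 GB indexed in a month, the three terms are 1.80, 5.00, and 2.00 USD, for a total of about 8.80 USD.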
Changes to documents propagate across data centers with eventual consistency:
- Updates may not be immediately visible
- Search results may not reflect most recent changes
- Designed for high availability across distributed systems
Document doc = Document.newBuilder()
.setId(docId)
.setRank(customRank) // Default: seconds since Jan 1, 2011
.setLocale("en") // Language encoding
.addField(...)
.build();
Rank Usage:
- Positive integer determining default sort order
- Don't assign same rank to >10,000 documents
- Referenced as _rank in expressions
- Stored as days since 1/1/1970 UTC
- Time component ignored for indexing/searching
- Query format: yyyy-mm-dd (leading zeros optional)
- Sort order for same date is undefined
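The "days since 1/1/1970 UTC" representation can be illustrated with a one-line conversion. This is only a sketch of the idea, not the service's storage code: `DateFieldSketch` and `daysSinceEpochUtc` are hypothetical names, and the input is assumed to be epoch milliseconds.

```java
final class DateFieldSketch {
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Whole days since 1970-01-01 UTC; the time of day is discarded.
    // floorDiv keeps instants before 1970 on the correct (negative) day.
    static long daysSinceEpochUtc(long epochMillis) {
        return Math.floorDiv(epochMillis, MILLIS_PER_DAY);
    }
}
```

Two timestamps on the same UTC day collapse to the same value under this scheme, which is consistent with the time component being ignored for indexing and searching.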
Features NOT available on local dev server:
- Stemming (e.g., ~cat)
- Asian language tokenization
- Match scoring
- Diacritical marks in atom/text/HTML fields
public Results<ScoredDocument> searchWithOptions(
String indexName,
String queryString
) {
SortOptions sortOptions = SortOptions.newBuilder()
.addSortExpression(
SortExpression.newBuilder()
.setExpression("price")
.setDirection(SortExpression.SortDirection.DESCENDING)
.setDefaultValueNumeric(0))
.setLimit(1000)
.build();
QueryOptions options = QueryOptions.newBuilder()
.setLimit(25)
.setFieldsToReturn("model", "price", "description")
.setSortOptions(sortOptions)
.build();
Query query = Query.newBuilder()
.setOptions(options)
.build(queryString);
IndexSpec indexSpec = IndexSpec.newBuilder()
.setName(indexName)
.build();
Index index = SearchServiceFactory.getSearchService()
.getIndex(indexSpec);
return index.search(query);
}
final int maxRetry = 3;
int attempts = 0;
int delay = 2;
while (true) {
try {
index.put(document);
break;
} catch (PutException e) {
if (StatusCode.TRANSIENT_ERROR.equals(e.getOperationResult().getCode())
&& ++attempts < maxRetry) {
Thread.sleep(delay * 1000);
delay *= 2; // Exponential backoff
continue;
} else {
throw e;
}
}
}
Global: "value1 value2"
Field: "field:value"
Comparison: "price < 100"
Boolean: "field1:value1 AND field2:value2"
Negation: "NOT field:value"
Parentheses: "(field1:value1 OR field2:value2) AND field3:value3"
Stemming: "~word"
Exact phrase: "field:\"exact phrase\""
Geopoint: "distance(field, geopoint(lat, long)) < 100"
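When the Boolean forms above are mixed without parentheses, the stated precedence (NOT, then OR, then AND) decides the grouping. The sketch below makes that grouping visible by adding explicit parentheses. It is a toy parser, not the service's query parser: `PrecedenceDemo` and `parenthesize` are hypothetical names, and it assumes space-separated terms with explicit AND, OR, and NOT operators and no existing parentheses or quoted phrases.

```java
import java.util.Arrays;
import java.util.List;

final class PrecedenceDemo {
    private final List<String> tokens;
    private int pos = 0;

    private PrecedenceDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    // Returns the query with explicit parentheses showing how it groups.
    static String parenthesize(String query) {
        PrecedenceDemo p = new PrecedenceDemo(Arrays.asList(query.trim().split("\\s+")));
        String result = p.parseAnd();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return result;
    }

    // AND binds loosest, so it is parsed at the outermost level.
    private String parseAnd() {
        String left = parseOr();
        while (peekIs("AND")) {
            pos++;
            left = "(" + left + " AND " + parseOr() + ")";
        }
        return left;
    }

    // OR binds tighter than AND.
    private String parseOr() {
        String left = parseNot();
        while (peekIs("OR")) {
            pos++;
            left = "(" + left + " OR " + parseNot() + ")";
        }
        return left;
    }

    // NOT binds tightest and applies to the term that follows it.
    private String parseNot() {
        if (peekIs("NOT")) {
            pos++;
            return "(NOT " + parseNot() + ")";
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Missing term");
        }
        return tokens.get(pos++);
    }

    private boolean peekIs(String op) {
        return pos < tokens.size() && tokens.get(pos).equals(op);
    }
}
```

For instance, `red OR blue AND large` groups as `((red OR blue) AND large)` under this precedence, which differs from the AND-before-OR convention of many programming languages; explicit parentheses in the query string avoid any ambiguity.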
- Atom: Exact match only (product IDs, categories, booleans)
- Text: Word-by-word search (descriptions, comments)
- HTML: Like text but ignores markup (formatted content)
- Number: Numeric comparisons (prices, quantities)
- Date: Date range queries (timestamps, birthdays)
- Geopoint: Distance calculations (locations, coordinates)