Google App Engine Search API Documentation

Overview
Core Concepts
- Documents
- Field Types
- Indexes
- Queries
- Results
Document Operations
Query Strings
Tokenization
Query and Sort Options
Handling Results
Faceted Search
Index Management
Best Practices
- Performance Optimization
- Data Modeling
Quotas and Limits
Important Characteristics
Local Development Server Limitations
Common Patterns
- Search with Pagination
- Error Handling with Retry
Quick Reference
- Query String Syntax
- Field Type Selection Guide

Overview

The Search API provides a model for indexing documents with structured data, enabling full-text search operations with advanced querying capabilities. Documents and indexes are stored in a separate persistent store optimized for search operations.

Core Concepts

Documents

A document is an object with:

Unique ID (doc_id): Up to 500 printable ASCII characters (codes 33-126)
List of fields: Named, typed data containers
Maximum size: 1 MB per document

Document Identifier Rules

Can be auto-generated or manually specified
Cannot begin with an exclamation point (!), and cannot both begin and end with a double underscore (__)
Must contain only visible, printable ASCII characters
Used for direct retrieval without search
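
The identifier rules above can be checked on the client before a document is built. This is a minimal sketch: `DocIdRules` and `isValidDocId` are illustrative names, not part of the Search API, and rejecting an empty ID is an assumption.

```java
final class DocIdRules {
    static final int MAX_LENGTH = 500;

    // Returns true when the ID satisfies the documented doc_id rules.
    static boolean isValidDocId(String id) {
        // Assumption: an empty or null ID is treated as invalid here.
        if (id == null || id.isEmpty() || id.length() > MAX_LENGTH) {
            return false;
        }
        if (id.charAt(0) == '!') {
            return false;
        }
        if (id.startsWith("__") && id.endsWith("__")) {
            return false;
        }
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            // Only visible, printable ASCII (codes 33-126) is allowed.
            if (c < 33 || c > 126) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `isValidDocId("PA6-5000")` passes, while an ID containing a space or starting with `!` is rejected.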

Field Types

String Fields

Text Field: Plain text, searchable word-by-word (max 1,048,576 characters)
HTML Field: HTML markup, only text outside tags is searchable (max 1,048,576 characters)
Atom Field: Indivisible string, not tokenized (max 500 characters)

Non-Text Fields

Number Field: A double-precision floating-point number. Values for this field must be in the range -2,147,483,647 to 2,147,483,647.
Date Field: Date object (stored as days since 1/1/1970 UTC)
Geopoint Field: Latitude and longitude coordinates

Field Naming Rules

Case sensitive, ASCII only
Must start with letter
Can contain letters, digits, underscore
Maximum 500 characters
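
The naming rules above fit in a single regular expression. The sketch below is one possible client-side check; `FieldNameRules` is a hypothetical helper, not an API class.

```java
import java.util.regex.Pattern;

final class FieldNameRules {
    // One ASCII letter, then up to 499 ASCII letters, digits, or underscores.
    private static final Pattern VALID_NAME =
        Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,499}");

    static boolean isValidFieldName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }
}
```

Because names are case sensitive, `Price` and `price` both pass this check but refer to different fields.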

Indexes

An index stores documents for retrieval. Key characteristics:

No limit on number of documents or indexes
Default size limit: 10 GB per index (can increase to 200 GB)
Total storage across all indexes: 0.25 GB free quota
Supports document retrieval by ID, ID range, or query

Queries

Query strings can be:

Global search: Search values across all fields
Field search: Target specific fields by name
Mixed: Combine both approaches

Maximum query string length: 2000 characters

Results

Search results include:

Number of documents found (estimate)
Number of documents returned (actual)
Collection of ScoredDocument objects
Maximum 10,000 matching documents per search
Default return: 20 documents at a time

Document Operations

Creating Documents

User currentUser = UserServiceFactory.getUserService().getCurrentUser();
String userEmail = currentUser == null ? "" : currentUser.getEmail();
String myDocId = "PA6-5000";

Document doc = Document.newBuilder()
    .setId(myDocId)  // Optional
    .addField(Field.newBuilder().setName("content").setText("the rain in spain"))
    .addField(Field.newBuilder().setName("email").setText(userEmail))
    .addField(Field.newBuilder().setName("published").setDate(new Date()))
    .build();

Adding Documents to Index

IndexSpec indexSpec = IndexSpec.newBuilder().setName(indexName).build();
Index index = SearchServiceFactory.getSearchService().getIndex(indexSpec);

// Add a single document
index.put(document);

// Batch put (up to 200 documents per call); documents is a List<Document>
index.put(documents);

Best Practice: Batch operations with up to 200 documents are more efficient than single additions.
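
When more than 200 documents need to be added, the list has to be split before calling put. The sketch below shows only the splitting step with plain JDK types; `Batches` and `partition` are illustrative names, and each resulting sublist would then be passed to `index.put(...)`.

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {
    static final int MAX_BATCH_SIZE = 200;

    // Splits items into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 450 documents and `MAX_BATCH_SIZE`, this yields three batches of 200, 200, and 50.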

Updating Documents

Documents are immutable once added. To update:

1. Create a new document with the same doc_id
2. Add it to the index (replaces the old document)

Retrieving Documents

// By doc_id
Document doc = index.get("AZ125");

// Range of doc_ids
GetResponse<Document> docs = index.getRange(
    GetRequest.newBuilder()
        .setStartId("AZ125")
        .setLimit(100)
        .build()
);

Deleting Documents

// Delete by doc_id (batch up to 200)
List<String> docIds = new ArrayList<>();
docIds.add("doc1");
docIds.add("doc2");
index.delete(docIds);

Query Strings

Global Search

Search for values in any field:

"rose water"              // Find documents with both words
"1776-07-04"             // Find date or text matching this
"NOT red"                // Find documents without "red"
"red OR blue"            // Find either color
"keyboard AND mouse"     // Find both terms

Boolean Operator Precedence: NOT > OR > AND
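
Because OR binds more tightly than AND, a query such as `red OR blue AND green` groups as `(red OR blue) AND green`, which differs from most programming languages. The toy evaluator below illustrates that grouping only. It is a sketch, not the service's parser: `PrecedenceSketch` is a hypothetical class, and it assumes well-formed input made of single-word terms separated by explicit operators, with no parentheses or field names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PrecedenceSketch {
    private final List<String> tokens;
    private final Set<String> documentWords;
    private int pos = 0;

    private PrecedenceSketch(String query, Set<String> documentWords) {
        this.tokens = Arrays.asList(query.trim().split("\\s+"));
        this.documentWords = documentWords;
    }

    // documentWords is expected to hold lower-case words.
    static boolean matches(String query, Set<String> documentWords) {
        return new PrecedenceSketch(query, documentWords).parseAnd();
    }

    // Lowest precedence: orExpr (AND orExpr)*
    private boolean parseAnd() {
        boolean value = parseOr();
        while (peekIs("AND")) {
            pos++;
            boolean right = parseOr(); // always parse so pos advances
            value = value && right;
        }
        return value;
    }

    // Middle precedence: notExpr (OR notExpr)*
    private boolean parseOr() {
        boolean value = parseNot();
        while (peekIs("OR")) {
            pos++;
            boolean right = parseNot();
            value = value || right;
        }
        return value;
    }

    // Highest precedence: NOT notExpr | term
    private boolean parseNot() {
        if (peekIs("NOT")) {
            pos++;
            return !parseNot();
        }
        return documentWords.contains(tokens.get(pos++).toLowerCase(Locale.ROOT));
    }

    private boolean peekIs(String operator) {
        return pos < tokens.size() && tokens.get(pos).equals(operator);
    }
}
```

For a document containing only the word "red", `red OR blue AND green` does not match here, because the AND with "green" applies to the whole OR group.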

Field Search

Target specific fields:

"product:piano"                          // Equality
"price < 500"                            // Comparison
"product:piano AND price < 5000"         // Combined
"color:(red OR blue)"                    // Multiple values
"birthday >= 2000-12-31"                 // Date comparison

Relational Operators by Field Type

Field Type	Operators
Atom	`:` `=`
Text/HTML	`:` `=`
Number	`:` `=` `<` `<=` `>` `>=`
Date	`:` `=` `<` `<=` `>` `>=`
Geopoint	Use `distance()` function

Geopoint Queries

"distance(survey_marker, geopoint(35.2, 40.5)) < 100"
"distance(home, geopoint(35.2, 40.5)) > 100"

Special Features

Stemming

Use ~ prefix to match word variations:

"~cat" // Matches "cat" and "cats"
"~dog" // Matches "dog" and "dogs"

Quoted Strings

Exact phrase matching:

"Comment:\"insanely great\""
"Title:\"Tom&Jerry\""

Tokenization

String fields are tokenized on:

- Whitespace characters
- Most punctuation marks
- Special characters

Special Cases:

- Underscore _ and ampersand & do NOT break tokens
- Acronyms like "I.B.M." become "ibm"
- Hash signs in patterns like #google or c# remain part of the word
- Apostrophes in possessives like "John's" stay attached

Atom Fields: Never tokenized (exact match only)
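
To get a feel for these rules, the following is a rough, ASCII-only approximation of the tokenization described above. It is not the service's tokenizer: `TokenizerSketch` is a hypothetical class, and it keeps `#` and `'` attached everywhere, not only in the patterns listed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizerSketch {
    // Two or more single letters, each followed by a dot, e.g. "I.B.M."
    private static final Pattern ACRONYM =
        Pattern.compile("\\b(?:[A-Za-z]\\.){2,}");
    // Anything other than letters, digits, and the characters that stay attached.
    private static final Pattern SEPARATORS = Pattern.compile("[^a-z0-9_&#']+");

    static List<String> tokenize(String text) {
        // Step 1: collapse dotted acronyms ("I.B.M." -> "IBM").
        Matcher matcher = ACRONYM.matcher(text);
        StringBuffer collapsed = new StringBuffer();
        while (matcher.find()) {
            String withoutDots = matcher.group().replace(".", "");
            matcher.appendReplacement(collapsed, Matcher.quoteReplacement(withoutDots));
        }
        matcher.appendTail(collapsed);

        // Step 2: lower-case, then split on whitespace and remaining punctuation.
        String lower = collapsed.toString().toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (String piece : SEPARATORS.split(lower)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

Under these assumptions, "I.B.M. sells John's c# tools, tom&jerry snake_case!" yields the tokens ibm, sells, john's, c#, tools, tom&jerry, snake_case.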

Query and Sort Options

Basic Query Construction

QueryOptions options = QueryOptions.newBuilder()
    .setLimit(25)
    .setFieldsToReturn("model", "price", "description")
    .build();

Query query = Query.newBuilder()
    .setOptions(options)
    .build(queryString);

Results<ScoredDocument> result = index.search(query);

QueryOptions Properties

Property	Description	Default	Maximum
Limit	Max documents to return	20	1000
Offset	Starting position	0	1000
Cursor	Alternative to offset	null	-
ReturningIdsOnly	Return IDs only	false	-
FieldsToReturn	Specific fields to include	All fields	100 fields
ExpressionsToReturn	Computed fields	None	-
FieldsToSnippet	Generate snippets	None	-

Sort Options

SortOptions sortOptions = SortOptions.newBuilder()
    .addSortExpression(
        SortExpression.newBuilder()
            .setExpression("price")
            .setDirection(SortExpression.SortDirection.DESCENDING)
            .setDefaultValueNumeric(0)
    )
    .addSortExpression(
        SortExpression.newBuilder()
            .setExpression("brand")
            .setDirection(SortExpression.SortDirection.DESCENDING)
            .setDefaultValue("")
    )
    .setLimit(1000)
    .build();

Important: Sorting limits results to 10,000 documents maximum. Default sort limit is 1,000.

Field Expressions

Create computed fields using expressions:

"price * quantity"
"(men + women)/2"
"min(daily_use, 10) * rate"
"snippet('rose', flower, 120)"

Special Terms:

- _rank: Document's rank property
- _score: Match score (if MatchScorer enabled)

Numeric Functions:

- max(...), min(...), abs(...), log(...), pow(x, y), count(field)

Geopoint Functions:

- geopoint(lat, long): Create geopoint
- distance(point1, point2): Calculate distance in meters
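
To sanity-check a threshold such as `distance(...) < 100` on the client, a great-circle estimate is usually close enough. The sketch below uses the haversine formula with an assumed mean Earth radius. The formula the service actually uses is not stated here, so treat the result as an approximation; `GeoDistanceSketch` is a hypothetical helper.

```java
final class GeoDistanceSketch {
    // Assumption: mean Earth radius in meters.
    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    // Great-circle distance in meters between two points given in degrees.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

One degree of longitude at the equator comes out at roughly 111 km, so a 100-meter radius is a very tight filter.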

Snippets

Generate text fragments showing matched content:

snippet(query, body, [max_chars])

// Example
QueryOptions options = QueryOptions.newBuilder()
    .setFieldsToSnippet("description", "content")
    .build();

Returns HTML with matched text in boldface, default 160 characters.

Handling Results

Processing Results

Results<ScoredDocument> result = index.search(query);

// Get counts
long totalMatches = result.getNumberFound();
int numberOfDocsReturned = result.getNumberReturned();

// Iterate documents
for (ScoredDocument doc : result) {
    String maker = doc.getOnlyField("maker").getText();
    double price = doc.getOnlyField("price").getNumber();
}

Using Offsets

int offset = 0;
int numberRetrieved;  // Declared outside the loop so the while condition can read it
do {
    QueryOptions options = QueryOptions.newBuilder()
        .setOffset(offset)
        .build();

    Query query = Query.newBuilder()
        .setOptions(options)
        .build(queryString);

    Results<ScoredDocument> result = index.search(query);
    numberRetrieved = result.getNumberReturned();

    if (numberRetrieved > 0) {
        offset += numberRetrieved;
        // Process documents
    }
} while (numberRetrieved > 0);

Using Cursors (Recommended for Large Result Sets)

Per-Query Cursor

Cursor cursor = Cursor.newBuilder().build();

do {
    QueryOptions options = QueryOptions.newBuilder()
        .setCursor(cursor)
        .build();

    Query query = Query.newBuilder()
        .setOptions(options)
        .build(queryString);

    Results<ScoredDocument> result = index.search(query);
    cursor = result.getCursor();

    // Process documents
} while (cursor != null);

Per-Result Cursor

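Cursor cursor = Cursor.newBuilder()
    .setPerResult(true)
    .build();

QueryOptions options = QueryOptions.newBuilder()
    .setCursor(cursor)
    .build();

// The options must be attached to the query for the cursor to take effect
Query query = Query.newBuilder()
    .setOptions(options)
    .build(queryString);

Results<ScoredDocument> result = index.search(query);

for (ScoredDocument doc : result) {
    if (/* document of interest */) {
        // A later search can resume immediately after this document
        cursor = doc.getCursor();
    }
}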

Saving/Restoring Cursors

// Save
String cursorString = cursor.toWebSafeString();

// Restore
Cursor cursor = Cursor.newBuilder().build(cursorString);

Faceted Search

Adding Facets to Documents

Document doc = Document.newBuilder()
    .setId("doc1")
    .addField(Field.newBuilder().setName("name").setAtom("x86"))
    .addFacet(Facet.withAtom("type", "computer"))
    .addFacet(Facet.withNumber("ram_size_gb", 8.0))
    .build();

Facet Rules:

- Name: Same rules as field names (500 char max)
- Value: Atom string (500 char max) or number
- No limit on values per facet or facets per document
- Can have multiple values for same facet name

Retrieving Facet Information

Automatic Discovery

Results<ScoredDocument> result = index.search(
    Query.newBuilder()
        .setEnableFacetDiscovery(true)
        .build("name:x86")
);

for (FacetResult facetResult : result.getFacets()) {
    System.out.printf("Facet %s:\n", facetResult.getName());
    for (FacetResultValue facetValue : facetResult.getValues()) {
        System.out.printf(" %s: Count=%s\n",
            facetValue.getLabel(),
            facetValue.getCount()
        );
    }
}

By Name

Results<ScoredDocument> result = index.search(
    Query.newBuilder()
        .addReturnFacet("type")
        .addReturnFacet("ram_size_gb")
        .build("name:x86")
);

By Name and Value

Results<ScoredDocument> result = index.search(
    Query.newBuilder()
        .addReturnFacet(FacetRequest.newBuilder()
            .setName("type")
            .addValueConstraint("computer")
            .addValueConstraint("printer"))
        .addReturnFacet(FacetRequest.newBuilder()
            .setName("ram_size_gb")
            .addRange(FacetRange.withEnd(4.0))
            .addRange(FacetRange.withStartEnd(4.0, 8.0))
            .addRange(FacetRange.withStart(8.0)))
        .build("name:x86")
);

Facet Options

Query query = Query.newBuilder()
    .setFacetOptions(FacetOptions.newBuilder()
        .setDiscoveryLimit(5)         // Default: 10
        .setDiscoveryValueLimit(10)   // Default: 10
        .setDepth(6000)               // Default: 1000
        .build())
    .build(queryString);

Using Refinements

Query query = Query.newBuilder()
    .addFacetRefinementFromToken(refinement_key1)
    .addFacetRefinementFromToken(refinement_key2)
    .build("some_query");

Refinement Logic:

- Same facet refinements: Combined with OR
- Different facet refinements: Combined with AND
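
The OR-within, AND-across rule can be stated as a small predicate. The sketch below models facets and refinements as plain maps of atom values to show the logic only; `RefinementSketch` is a hypothetical class, and the service applies this filtering itself.

```java
import java.util.Map;
import java.util.Set;

final class RefinementSketch {
    // A document matches when, for every refined facet name,
    // it carries at least one of the requested values.
    static boolean matches(Map<String, Set<String>> documentFacets,
                           Map<String, Set<String>> refinements) {
        for (Map.Entry<String, Set<String>> refinement : refinements.entrySet()) {
            Set<String> documentValues = documentFacets.get(refinement.getKey());
            if (documentValues == null) {
                return false; // different facets combine with AND
            }
            boolean anyValueMatches = false;
            for (String wanted : refinement.getValue()) {
                if (documentValues.contains(wanted)) {
                    anyValueMatches = true; // same facet combines with OR
                    break;
                }
            }
            if (!anyValueMatches) {
                return false;
            }
        }
        return true;
    }
}
```

A document with type=computer matches refinements type=computer and type=printer together, but fails once a refinement on another facet that it lacks is added.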

Index Management

Index Schemas

Schemas are maintained automatically and show all field names and types:

GetResponse<Index> response = SearchServiceFactory.getSearchService()
    .getIndexes(GetIndexesRequest.newBuilder()
        .setSchemaFetched(true)
        .build());

for (Index index : response) {
    Schema schema = index.getSchema();
    for (String fieldName : schema.getFieldNames()) {
        List<FieldType> typesForField = schema.getFieldTypes(fieldName);
        // Process schema information
    }
}

Schema Characteristics:

- Auto-updated as documents are added
- Fields can never be removed from schema
- Same field name can have multiple types
- Not returned by default (must request explicitly)

Retrieving All Indexes

// Current namespace only
GetResponse<Index> response = SearchServiceFactory.getSearchService()
    .getIndexes(GetIndexesRequest.newBuilder().build());

// All namespaces
GetResponse<Index> response = SearchServiceFactory.getSearchService()
    .getIndexes(GetIndexesRequest.newBuilder()
        .setAllNamespaces(true)
        .build());

Pagination: Maximum 1000 indexes per call. Use setStartIndexName() for more.

Checking Index Size

// Maximum allowed size
long maxSize = index.getStorageLimit();

// Current usage (estimate)
long currentUsage = index.getStorageUsage();

Deleting an Index

To delete an index:

1. Delete all documents
2. Delete the index schema

// Delete all documents
while (true) {
    List<String> docIds = new ArrayList<>();
    GetRequest request = GetRequest.newBuilder()
        .setReturningIdsOnly(true)
        .build();
    GetResponse<Document> response = index.getRange(request);

    if (response.getResults().isEmpty()) {
        break;
    }

    for (Document doc : response) {
        docIds.add(doc.getId());
    }
    index.delete(docIds);
}

// Delete the index schema once the index is empty
index.deleteSchema();

Best Practices

Performance Optimization

Batch Operations: Always batch puts/deletes (up to 200 documents)
Use Document Rank for Pre-sorting:

// Set rank to price for default price sorting
Document.newBuilder().setRank(price)

Avoid Expensive Operations:
- Use atom fields for boolean data (not numbers)
- Transform negations: use cuisine_known:yes instead of NOT cuisine:undefined
- Transform disjunctions: use cuisine:Asian instead of cuisine:Japanese OR cuisine:Korean
- Eliminate tautologies: use city:toronto instead of city:toronto AND NOT city:montreal
Narrow Before Sorting:

// Less efficient: sorts every matching document
"cuisine:japanese" + sort by distance

// Good: Filter first, then sort smaller set
"cuisine:japanese AND city:<user-city>" + sort by distance

Use Categories to Avoid Sorting:

// Create price ranges: price_0_10, price_11_20, etc.
"price_range:price_21_30 OR price_range:price_31_40"
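
The category value has to be computed when the document is indexed. The sketch below derives the bucket names used in the example query, assuming whole-number, non-negative prices and that the result is stored in an atom field named `price_range`. `PriceRanges` is a hypothetical helper.

```java
final class PriceRanges {
    // Maps a price to its bucket: 0-10, 11-20, 21-30, and so on.
    static String priceRange(int price) {
        if (price < 0) {
            throw new IllegalArgumentException("price must be non-negative");
        }
        if (price <= 10) {
            return "price_0_10"; // the first bucket also includes 0
        }
        int lower = ((price - 1) / 10) * 10 + 1;
        return "price_" + lower + "_" + (lower + 9);
    }
}
```

A price of 25 maps to `price_21_30`, so it is found by the query above without any numeric comparison or sort.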

Avoid Scoring Unless Needed: Scoring is expensive in operations and time

Data Modeling

Use Rank Strategically: Default sort by rank is most efficient
Multiple Sort Orders: Create separate indexes for different sort orders
- Index 1: rank = price
- Index 2: rank = MAXINT - price
Multi-valued Fields: Only first value used in sorts
Document ID Design: Can't search on doc_id directly, so also store in atom field if needed
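
The two-index trick relies on results coming back in descending rank order by default. The sketch below simulates that ordering with plain integers to show why `MAXINT - price` yields cheapest-first results. `RankSketch` is a hypothetical class, and prices are assumed to be non-negative whole numbers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntUnaryOperator;

final class RankSketch {
    // Index 1: highest price first.
    static int rankForDescendingPrice(int price) {
        return price;
    }

    // Index 2: lowest price first.
    static int rankForAscendingPrice(int price) {
        return Integer.MAX_VALUE - price;
    }

    // Simulates the default ordering: larger rank values come first.
    static List<Integer> sortByRankDescending(List<Integer> prices, IntUnaryOperator rankOf) {
        List<Integer> sorted = new ArrayList<>(prices);
        sorted.sort(Comparator.comparingInt((Integer p) -> rankOf.applyAsInt(p)).reversed());
        return sorted;
    }
}
```

Prices 30, 10, 20 come back as 30, 20, 10 with the first rank function and as 10, 20, 30 with the second.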

Quotas and Limits

Free Quotas

Resource	Free Quota
Total storage	0.25 GB
Queries	1000/day
Adding documents	0.01 GB/day

Safety Limits (All Apps)

Resource	Limit
Query execution time	100 aggregated minutes/minute
Documents added/deleted	15,000/minute
Index size	10 GB (up to 200 GB with request)
Document size	1 MB
Query string length	2000 characters
Documents per search	10,000 max found
Documents per put/delete	200
Fields to return	100
Sort limit	10,000 (default 1,000)

Pricing (Beyond Free Quotas)

Resource	Cost
Storage	$0.18/GB/month
Queries	$0.50/10K queries
Indexing	$2.00/GB

Important Characteristics

Eventual Consistency

Changes to documents propagate across data centers with eventual consistency:

- Updates may not be immediately visible
- Search results may not reflect most recent changes
- Designed for high availability across distributed systems

Document Properties

Document doc = Document.newBuilder()
    .setId(docId)
    .setRank(customRank)    // Default: seconds since Jan 1, 2011
    .setLocale(Locale.ENGLISH)  // java.util.Locale: language of the document
    .addField(...)
    .build();

Rank Usage:

- Positive integer determining default sort order
- Don't assign same rank to >10,000 documents
- Referenced as _rank in expressions
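
The default rank noted in the snippet above (seconds since Jan 1, 2011) can be reproduced when a custom rank should fall back to creation order. This is a sketch under the assumption that the reference point is midnight UTC; `DefaultRankSketch` is a hypothetical helper.

```java
import java.time.Instant;

final class DefaultRankSketch {
    // Assumption: the reference point is 2011-01-01T00:00:00 UTC.
    private static final Instant RANK_EPOCH = Instant.parse("2011-01-01T00:00:00Z");

    // Seconds elapsed between the rank epoch and the given creation time.
    static int defaultRank(Instant createdAt) {
        long seconds = createdAt.getEpochSecond() - RANK_EPOCH.getEpochSecond();
        return Math.toIntExact(seconds); // throws if it does not fit in an int
    }
}
```

Newer documents get larger values, which is why the default ordering shows the most recently added documents first.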

Date Field Precision

Stored as days since 1/1/1970 UTC
Time component ignored for indexing/searching
Query format: yyyy-mm-dd (leading zeros optional)
Sort order for same date is undefined
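
The day-level precision means two timestamps on the same UTC day are indistinguishable to the index. The sketch below shows the conversion with `java.time`; `DateFieldSketch` is a hypothetical helper, not an API class.

```java
import java.time.Instant;
import java.time.ZoneOffset;

final class DateFieldSketch {
    // Whole days between 1970-01-01 (UTC) and the given instant; time of day is dropped.
    static long daysSinceEpochUtc(Instant instant) {
        return instant.atZone(ZoneOffset.UTC).toLocalDate().toEpochDay();
    }
}
```

If ordering within a day matters, one option is to store the timestamp in a separate number field and sort on that.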

Local Development Server Limitations

Features NOT available on local dev server:

- Stemming (e.g., ~cat)
- Asian language tokenization
- Match scoring
- Diacritical marks in atom/text/HTML fields

Common Patterns

Search with Pagination

public Results<ScoredDocument> searchWithOptions(
    String indexName,
    String queryString
) {
    SortOptions sortOptions = SortOptions.newBuilder()
        .addSortExpression(
            SortExpression.newBuilder()
                .setExpression("price")
                .setDirection(SortExpression.SortDirection.DESCENDING)
                .setDefaultValueNumeric(0))
        .setLimit(1000)
        .build();

    QueryOptions options = QueryOptions.newBuilder()
        .setLimit(25)
        .setFieldsToReturn("model", "price", "description")
        .setSortOptions(sortOptions)
        .build();

    Query query = Query.newBuilder()
        .setOptions(options)
        .build(queryString);

    IndexSpec indexSpec = IndexSpec.newBuilder()
        .setName(indexName)
        .build();
    Index index = SearchServiceFactory.getSearchService()
        .getIndex(indexSpec);

    return index.search(query);
}

Error Handling with Retry

final int maxRetry = 3;
int attempts = 0;
int delay = 2;

while (true) {
    try {
        index.put(document);
        break;
    } catch (PutException e) {
        if (StatusCode.TRANSIENT_ERROR.equals(e.getOperationResult().getCode())
            && ++attempts < maxRetry) {
            Thread.sleep(delay * 1000);
            delay *= 2;  // Exponential backoff
            continue;
        } else {
            throw e;
        }
    }
}

Quick Reference

Query String Syntax

Global:          "value1 value2"
Field:           "field:value"
Comparison:      "price < 100"
Boolean:         "field1:value1 AND field2:value2"
Negation:        "NOT field:value"
Parentheses:     "(field1:value1 OR field2:value2) AND field3:value3"
Stemming:        "~word"
Exact phrase:    "field:\"exact phrase\""
Geopoint:        "distance(field, geopoint(lat, long)) < 100"

Field Type Selection Guide

Atom: Exact match only (product IDs, categories, booleans)
Text: Word-by-word search (descriptions, comments)
HTML: Like text but ignores markup (formatted content)
Number: Numeric comparisons (prices, quantities)
Date: Date range queries (timestamps, birthdays)
Geopoint: Distance calculations (locations, coordinates)
