
Log Store Configuration and Usage

This guide explains how to configure subgraph indexing logs storage in graph-node.


Overview

Graph Node supports multiple logs storage backends for subgraph indexing logs. Subgraph indexing logs include:

  • User-generated logs: Explicit logging from subgraph mapping code (log.info(), log.error(), etc.)
  • Runtime logs: Handler execution, event processing, data source activity
  • System logs: Warnings, errors, and diagnostics from the indexing system

Available backends:

  • File: JSON Lines files on local filesystem (for local development)
  • Elasticsearch: Enterprise-grade search and analytics (for production)
  • Loki: Grafana's lightweight log aggregation system (for production)
  • Disabled: No log storage (default)

All backends share the same query interface through GraphQL, making it easy to switch between them.

Important Note: When log storage is disabled (the default), subgraph logs still appear in stdout/stderr as they always have. The "disabled" setting simply means logs are not stored separately in a queryable format. You can still see logs in your terminal or container logs; they just won't be available via the _logs GraphQL query.

How Log Stores Work

Architecture

┌─────────────────┐
│  Subgraph Code  │
│   (mappings)    │
└────────┬────────┘
         │ log.info(), log.error(), etc.
         ▼
┌─────────────────┐
│  Graph Runtime  │
│   (WebAssembly) │
└────────┬────────┘
         │ Log events
         ▼
┌─────────────────┐
│   Log Drain     │  ◄─── slog-based logging system
└────────┬────────┘
         │ Write
         ▼
┌─────────────────┐
│   Log Store     │  ◄─── Configurable backend
│  (ES/Loki/File) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  GraphQL API    │  ◄─── Unified query interface
│  (port 8000)    │
└─────────────────┘

Log Flow

  1. Log sources generate logs from:
    • User mapping code (explicit log.info(), log.error(), etc. calls)
    • Subgraph runtime (handler execution, event processing, data source triggers)
    • System warnings and errors (indexing issues, constraint violations, etc.)
  2. Graph runtime captures these logs with metadata (timestamp, level, source location)
  3. Log drain formats logs and writes to configured backend
  4. Log store persists logs and handles queries
  5. GraphQL API exposes logs through the _logs query

Log Entry Structure

Each log entry contains:

  • id: Unique identifier
  • subgraphId: Deployment hash (QmXxx...)
  • timestamp: ISO 8601 timestamp (e.g., 2024-01-15T10:30:00.123456789Z)
  • level: CRITICAL, ERROR, WARNING, INFO, or DEBUG
  • text: Log message
  • arguments: Key-value pairs from structured logging
  • meta: Source location (module, line, column)
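
Pretty-printed (for example with jq), an entry has the following shape; the values here mirror the compact JSON Lines record shown under Storage Format below:

{
  "id": "QmTest-2024-01-15T10:30:00.123456789Z",
  "subgraphId": "QmTest",
  "timestamp": "2024-01-15T10:30:00.123456789Z",
  "level": "error",
  "text": "Handler execution failed, retries: 3",
  "arguments": [{ "key": "retries", "value": "3" }],
  "meta": { "module": "mapping.ts", "line": 42, "column": 10 }
}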

Log Store Types

File-based Logs

Best for: Local development, testing

How It Works

File-based logs store each subgraph's logs in a separate JSON Lines (.jsonl) file:

graph-logs/
├── QmSubgraph1Hash.jsonl
├── QmSubgraph2Hash.jsonl
└── QmSubgraph3Hash.jsonl

Each line in the file is a complete JSON object representing one log entry.

Storage Format

{"id":"QmTest-2024-01-15T10:30:00.123456789Z","subgraphId":"QmTest","timestamp":"2024-01-15T10:30:00.123456789Z","level":"error","text":"Handler execution failed, retries: 3","arguments":[{"key":"retries","value":"3"}],"meta":{"module":"mapping.ts","line":42,"column":10}}

Query Performance

File-based logs stream through files line-by-line with bounded memory usage.

Performance characteristics:

  • Query time: O(n) where n = number of log entries
  • Memory usage: O(skip + first) - only matching entries kept in memory
  • Suitable for: Development and testing

Configuration

Minimum configuration (CLI):

graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend file \
  --log-store-file-dir ./graph-logs

Full configuration (environment variables):

export GRAPH_LOG_STORE_BACKEND=file
export GRAPH_LOG_STORE_FILE_DIR=/var/log/graph-node
export GRAPH_LOG_STORE_FILE_MAX_SIZE=104857600      # 100MB
export GRAPH_LOG_STORE_FILE_RETENTION_DAYS=30
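
To confirm the backend is writing, deploy a subgraph and check the configured directory (QmExample is a placeholder deployment hash):

ls /var/log/graph-node/
tail -f /var/log/graph-node/QmExample.jsonl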

Features

Advantages:

  • No external dependencies
  • Simple setup (just specify a directory)
  • Human-readable format (JSON Lines)
  • Easy to inspect with standard tools (jq, grep, etc.)
  • Good for debugging during development

Limitations:

  • Not suitable for production with high log volume
  • No indexing (O(n) query time scales with file size)
  • No automatic log rotation or retention management
  • Single file per subgraph (no sharding)

When to Use

Use file-based logs when:

  • Developing subgraphs locally
  • Testing on a development machine
  • Running low-traffic subgraphs (< 1000 total logs/day including system logs)
  • You want simple log access without external services

Elasticsearch

Best for: Production deployments, high log volume, advanced search

How It Works

Elasticsearch stores logs in indices with full-text search capabilities, making it ideal for production deployments with high log volume.

Architecture:

graph-node → Elasticsearch HTTP API → Elasticsearch cluster
                                    → Index: subgraph-logs-*
                                    → Query DSL for filtering

Features

Advantages:

  • Indexed searching: Fast queries even with millions of logs
  • Full-text search: Powerful text search across log messages
  • Scalability: Handles billions of log entries
  • High availability: Supports clustering and replication
  • Kibana integration: Rich visualization and dashboards for operators
  • Time-based indices: Efficient retention management

Considerations:

  • Requires Elasticsearch cluster (infrastructure overhead)
  • Resource-intensive (CPU, memory, disk)

Configuration

Minimum configuration (CLI):

graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend elasticsearch \
  --log-store-elasticsearch-url http://localhost:9200

Full configuration with authentication:

graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend elasticsearch \
  --log-store-elasticsearch-url https://es.example.com:9200 \
  --log-store-elasticsearch-user elastic \
  --log-store-elasticsearch-password secret \
  --log-store-elasticsearch-index subgraph-logs

Environment variables:

export GRAPH_LOG_STORE_BACKEND=elasticsearch
export GRAPH_LOG_STORE_ELASTICSEARCH_URL=http://localhost:9200
export GRAPH_LOG_STORE_ELASTICSEARCH_USER=elastic
export GRAPH_LOG_STORE_ELASTICSEARCH_PASSWORD=secret
export GRAPH_LOG_STORE_ELASTICSEARCH_INDEX=subgraph-logs
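
For local testing, a throwaway single-node Elasticsearch can be started with Docker. This is a sketch: the image tag is an assumption (use a current version), and disabling security is only appropriate on a local machine:

docker run -d --name es-local -p 9200:9200 \
  -e discovery.type=single-node \
  -e xpack.security.enabled=false \
  docker.elastic.co/elasticsearch/elasticsearch:8.13.4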

Index Configuration

Logs are stored in the configured index (default: subgraph). The index mapping is automatically created.

Recommended index settings for production:

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s"
  }
}
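
These settings can be applied by creating the index ahead of time; a sketch assuming the default index name (subgraph) and an unauthenticated local cluster:

curl -X PUT http://localhost:9200/subgraph \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 1, "refresh_interval": "5s"}}'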

Query Performance

Performance characteristics:

  • Query time: O(log n) with indexing
  • Memory usage: Minimal (server-side filtering)
  • Suitable for: Millions to billions of log entries

When to Use

Use Elasticsearch when:

  • Running production deployments
  • High log volume
  • Need advanced search and filtering
  • Want to build dashboards with Kibana
  • Need high availability and scalability
  • Have DevOps resources to manage Elasticsearch, or can use a managed Elasticsearch deployment

Loki

Best for: Production deployments, Grafana users, cost-effective at scale

How It Works

Loki is Grafana's log aggregation system, designed to be cost-effective and easy to operate. Unlike Elasticsearch, Loki only indexes metadata (not full-text), making it more efficient for time-series log data.

Architecture:

graph-node → Loki HTTP API → Loki
                           → Stores compressed chunks
                           → Indexes labels only

Features

Advantages:

  • Cost-effective: Lower storage costs than Elasticsearch
  • Grafana integration: Native integration with Grafana
  • Horizontal scalability: Designed for cloud-native deployments
  • Multi-tenancy: Built-in tenant isolation
  • Efficient compression: Optimized for log data
  • LogQL: Powerful query language similar to PromQL
  • Lower resource usage: Less CPU/memory than Elasticsearch

Considerations:

  • No full-text indexing (slower text searches)
  • Best used with Grafana (less tooling than Elasticsearch)
  • Younger ecosystem than Elasticsearch
  • Query performance depends on label cardinality

Configuration

Minimum configuration (CLI):

graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend loki \
  --log-store-loki-url http://localhost:3100

With multi-tenancy:

graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend loki \
  --log-store-loki-url http://localhost:3100 \
  --log-store-loki-tenant-id my-graph-node

Environment variables:

export GRAPH_LOG_STORE_BACKEND=loki
export GRAPH_LOG_STORE_LOKI_URL=http://localhost:3100
export GRAPH_LOG_STORE_LOKI_TENANT_ID=my-graph-node
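
As with Elasticsearch, a local Loki instance for testing can be run with Docker; the image ships with a default configuration, and the tag here is an assumption:

docker run -d --name loki-local -p 3100:3100 grafana/loki:2.9.4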

Labels

Loki uses labels for indexing. Graph Node automatically creates labels:

  • subgraph_id: Deployment hash
  • level: Log level
  • job: "graph-node"
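
Because these labels are indexed, Loki can also be queried directly over its HTTP API with label selectors; a sketch, where QmExample is a placeholder deployment hash:

curl -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="graph-node", subgraph_id="QmExample", level="error"}' \
  --data-urlencode 'limit=50'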

Query Performance

Performance characteristics:

  • Query time: O(n) for text searches, O(log n) for label queries
  • Memory usage: Minimal (server-side processing)
  • Suitable for: Millions to billions of log entries
  • Best performance with label-based filtering

When to Use

Use Loki when:

  • Already using Grafana for monitoring
  • Need cost-effective log storage at scale
  • Want simpler operations than Elasticsearch
  • Multi-tenancy is required
  • Log volume is very high (> 1M logs/day)
  • Full-text search is not critical

Disabled

Best for: Minimalist deployments, reduced overhead

How It Works

When log storage is disabled (the default), subgraph logs are still written to stdout/stderr along with all other graph-node logs. They are just not stored separately in a queryable format.

Important: "Disabled" does NOT mean logs are discarded. It means:

  • Logs appear in stdout/stderr (traditional behavior)
  • Logs are not stored in a separate queryable backend
  • The _logs GraphQL query returns empty results

This is the default behavior: logs continue to work exactly as they did before this feature was added.

Configuration

Explicitly disable:

export GRAPH_LOG_STORE_BACKEND=disabled

Or simply don't configure a backend (defaults to disabled):

# No log store configuration = disabled
graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001

Features

Advantages:

  • Zero additional overhead
  • No external dependencies
  • Minimal configuration
  • Logs still appear in stdout/stderr for debugging

Limitations:

  • Cannot query logs via GraphQL (_logs returns empty results)
  • No separation of subgraph logs from other graph-node logs in stdout
  • Logs mixed with system logs (harder to filter programmatically)
  • No structured querying or filtering capabilities

When to Use

Use disabled log storage when:

  • Running minimal test deployments with fewer dependencies
  • Exposing logs to users is not required for your use case
  • You want subgraph logs handled by external log collection (e.g., container logs)

Configuration

Environment Variables

Environment variables are the recommended way to configure log stores, especially in containerized deployments.

Backend Selection

GRAPH_LOG_STORE_BACKEND=<backend>

Valid values: disabled, elasticsearch, loki, file

Elasticsearch

GRAPH_LOG_STORE_ELASTICSEARCH_URL=http://localhost:9200
GRAPH_LOG_STORE_ELASTICSEARCH_USER=elastic          # Optional
GRAPH_LOG_STORE_ELASTICSEARCH_PASSWORD=secret       # Optional
GRAPH_LOG_STORE_ELASTICSEARCH_INDEX=subgraph        # Default: "subgraph"

Loki

GRAPH_LOG_STORE_LOKI_URL=http://localhost:3100
GRAPH_LOG_STORE_LOKI_TENANT_ID=my-tenant           # Optional

File

GRAPH_LOG_STORE_FILE_DIR=/var/log/graph-node
GRAPH_LOG_STORE_FILE_MAX_SIZE=104857600            # Default: 100MB
GRAPH_LOG_STORE_FILE_RETENTION_DAYS=30             # Default: 30

CLI Arguments

CLI arguments provide the same functionality as environment variables, and the two can be mixed.

Backend Selection

--log-store-backend <backend>

Elasticsearch

--log-store-elasticsearch-url <URL>
--log-store-elasticsearch-user <USER>
--log-store-elasticsearch-password <PASSWORD>
--log-store-elasticsearch-index <INDEX>

Loki

--log-store-loki-url <URL>
--log-store-loki-tenant-id <TENANT_ID>

File

--log-store-file-dir <DIR>
--log-store-file-max-size <BYTES>
--log-store-file-retention-days <DAYS>

Configuration Precedence

When multiple configuration methods are used:

  1. CLI arguments take highest precedence
  2. Environment variables are used if no CLI args provided
  3. Defaults are used if neither is set
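
For example, if both are set, the CLI flag wins; a sketch using the flags and variables documented above:

# Environment selects Loki...
export GRAPH_LOG_STORE_BACKEND=loki

# ...but the CLI argument takes precedence, so the file backend is used
graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend file \
  --log-store-file-dir ./graph-logs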

Querying Logs

All log backends share the same GraphQL query interface. Logs are queried through the subgraph-specific GraphQL endpoint:

  • Subgraph by deployment: http://localhost:8000/subgraphs/id/<deployment-hash>
  • Subgraph by name: http://localhost:8000/subgraphs/name/<subgraph-name>

The _logs query is automatically scoped to the subgraph in the URL, so you don't need to pass a subgraphId parameter.

Note: Queries return all log types - both user-generated logs from mapping code and system-generated runtime logs (handler execution, events, warnings, etc.). Use the search filter to search for specific messages, or level to filter by severity.

Basic Query

Query the _logs field at your subgraph's GraphQL endpoint:

query {
  _logs(
    first: 100
  ) {
    id
    timestamp
    level
    text
  }
}

Example endpoint: http://localhost:8000/subgraphs/id/QmYourDeploymentHash

Query with Filters

query {
  _logs(
    level: ERROR
    from: "2024-01-01T00:00:00Z"
    to: "2024-01-31T23:59:59Z"
    search: "timeout"
    first: 50
    skip: 0
  ) {
    id
    timestamp
    level
    text
    arguments {
      key
      value
    }
    meta {
      module
      line
      column
    }
  }
}

Available Filters

Filter   Type       Description
level    LogLevel   Filter by level: CRITICAL, ERROR, WARNING, INFO, DEBUG
from     String     Start timestamp (ISO 8601)
to       String     End timestamp (ISO 8601)
search   String     Case-insensitive substring search in log messages
first    Int        Number of results to return (default: 100, max: 1000)
skip     Int        Number of results to skip for pagination (max: 10000)

Response Fields

Field         Type                 Description
id            String               Unique log entry ID
timestamp     String               ISO 8601 timestamp with nanosecond precision
level         LogLevel             Log level (CRITICAL, ERROR, WARNING, INFO, DEBUG)
text          String               Complete log message with arguments
arguments     [(String, String)]   Structured key-value pairs
meta.module   String               Source file name
meta.line     Int                  Line number
meta.column   Int                  Column number

Query Examples

Recent Errors

query RecentErrors {
  _logs(
    level: ERROR
    first: 20
  ) {
    timestamp
    text
    meta {
      module
      line
    }
  }
}

Search for Specific Text

query SearchTimeout {
  _logs(
    search: "timeout"
    first: 50
  ) {
    timestamp
    level
    text
  }
}

Handler Execution Logs

query HandlerLogs {
  _logs(
    search: "handler"
    first: 50
  ) {
    timestamp
    level
    text
  }
}

Time Range Query

query LogsInRange {
  _logs(
    from: "2024-01-15T00:00:00Z"
    to: "2024-01-15T23:59:59Z"
    first: 1000
  ) {
    timestamp
    level
    text
  }
}

Pagination

# First page
query Page1 {
  _logs(
    first: 100
    skip: 0
  ) {
    id
    text
  }
}

# Second page
query Page2 {
  _logs(
    first: 100
    skip: 100
  ) {
    id
    text
  }
}
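
To page through all logs from a script, the same two arguments can be driven in a loop; a sketch using curl and jq, where QmExample is a placeholder deployment hash (note the documented skip cap of 10000):

SKIP=0
while true; do
  PAGE=$(curl -s -X POST http://localhost:8000/subgraphs/id/QmExample \
    -H "Content-Type: application/json" \
    -d "{\"query\": \"{ _logs(first: 100, skip: $SKIP) { id text } }\"}")
  # Stop when a page comes back empty
  COUNT=$(echo "$PAGE" | jq '.data._logs | length')
  [ "$COUNT" -eq 0 ] && break
  echo "$PAGE" | jq -r '.data._logs[].text'
  SKIP=$((SKIP + 100))
done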

Querying the log store using cURL

curl -X POST http://localhost:8000/subgraphs/id/<deployment-hash> \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ _logs(level: ERROR, first: 10) { timestamp level text } }"
  }'
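
Piping the response through jq makes it easier to read, for example printing one line per entry:

curl -s -X POST http://localhost:8000/subgraphs/id/<deployment-hash> \
  -H "Content-Type: application/json" \
  -d '{"query": "{ _logs(level: ERROR, first: 10) { timestamp level text } }"}' \
  | jq -r '.data._logs[] | "\(.timestamp) [\(.level)] \(.text)"'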

Performance Considerations

File-based (development only):

  • Streams through files line-by-line (bounded memory usage)
  • Memory usage limited to O(skip + first) entries
  • Query time is O(n) where n = total log entries in file

Elasticsearch:

  • Indexed queries are fast regardless of size
  • Text searches are optimized with full-text indexing
  • Can handle billions of log entries
  • Best for production with high query volume

Loki:

  • Label-based queries are fast (indexed)
  • Text searches scan compressed chunks (slower than Elasticsearch)
  • Good performance with proper label filtering
  • Best for production with Grafana integration

Choosing the Right Backend

Decision Matrix

Scenario                  Recommended Backend     Reason
Local development         File                    Simple, no dependencies, easy to inspect
Testing/staging           File or Elasticsearch   File for simplicity, ES if testing production config
Production                Elasticsearch or Loki   Both handle scale well
Using Grafana             Loki                    Native integration
Cost-sensitive at scale   Loki                    Lower storage costs
Want rich ecosystem       Elasticsearch           More tools and plugins
Minimal deployment        Disabled                No overhead

Resource Requirements

File-based

  • Disk: Minimal (log files only)
  • Memory: Depends on file size during queries
  • CPU: Minimal
  • Network: None
  • External services: None

Elasticsearch

  • Disk: High (indices + replicas)
  • Memory: 4-8GB minimum for small deployments
  • CPU: Medium to high
  • Network: HTTP API calls
  • External services: Elasticsearch cluster

Loki

  • Disk: Medium (compressed chunks)
  • Memory: 2-4GB minimum
  • CPU: Low to medium
  • Network: HTTP API calls
  • External services: Loki server

Best Practices

General

  1. Start with file-based for development - Simplest setup, easy debugging
  2. Use Elasticsearch or Loki for production - Better performance and features
  3. Monitor log volume - Set up alerts if log volume grows unexpectedly (includes both user logs and system-generated runtime logs)
  4. Set retention policies - Don't keep logs forever (disk space and cost)
  5. Use structured logging - Pass key-value pairs to log functions for better filtering

File-based Logs

  1. Monitor file size - While queries use bounded memory, larger files take longer to scan (O(n) query time)
  2. Archive old logs - Manually archive/delete old files or implement external rotation
  3. Monitor disk usage - Files can grow quickly with verbose logging
  4. Use JSON tools - jq is excellent for inspecting .jsonl files locally

Example local inspection:

# Count logs by level
cat graph-logs/QmExample.jsonl | jq -r '.level' | sort | uniq -c

# Find errors in last 1000 lines
tail -n 1000 graph-logs/QmExample.jsonl | jq 'select(.level == "error")'

# Search for specific text
cat graph-logs/QmExample.jsonl | jq 'select(.text | contains("timeout"))'

Elasticsearch

  1. Use index patterns - Time-based indices for easier management
  2. Configure retention - Use Index Lifecycle Management (ILM)
  3. Monitor cluster health - Set up Elasticsearch monitoring
  4. Tune for your workload - Adjust shards/replicas based on log volume
  5. Use Kibana - Visualize and explore logs effectively

Example Elasticsearch retention policy:

{
  "policy": {
    "phases": {
      "hot": { "min_age": "0ms", "actions": {} },
      "warm": { "min_age": "7d", "actions": {} },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}
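
The policy can be installed and attached to the log index through the ILM API; a sketch assuming the JSON above is saved as graph-logs-policy.json and the cluster is an unauthenticated local one:

# Create the lifecycle policy
curl -X PUT http://localhost:9200/_ilm/policy/graph-logs-policy \
  -H 'Content-Type: application/json' \
  -d @graph-logs-policy.json

# Attach it to the log index (default index name: subgraph)
curl -X PUT http://localhost:9200/subgraph/_settings \
  -H 'Content-Type: application/json' \
  -d '{"index.lifecycle.name": "graph-logs-policy"}'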

Loki

  1. Use proper labels - Don't over-index, keep label cardinality low
  2. Configure retention - Set retention period in Loki config
  3. Use Grafana - Native integration provides best experience
  4. Compress efficiently - Loki's compression works best with batch writes
  5. Multi-tenancy - Use tenant IDs if running multiple environments
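
For item 2, retention is enforced by Loki's compactor. A minimal configuration sketch (keys as in Loki 2.x; verify against your version's documentation):

compactor:
  retention_enabled: true
limits_config:
  retention_period: 744h   # 31 days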

Example Grafana query:

{subgraph_id="QmExample", level="error"} |= "timeout"

Troubleshooting

File-based Logs

Problem: Log file doesn't exist

  • Check GRAPH_LOG_STORE_FILE_DIR is set correctly
  • Verify directory is writable by graph-node

Problem: Queries are slow

  • Subgraph logs file may be very large
  • Consider archiving old logs or implementing retention
  • For high-volume production use, switch to Elasticsearch or Loki

Problem: Disk filling up

  • Implement log rotation
  • Reduce log verbosity in subgraph code
  • Set up monitoring for disk usage

Elasticsearch

Problem: Cannot connect to Elasticsearch

  • Verify GRAPH_LOG_STORE_ELASTICSEARCH_URL is correct
  • Check Elasticsearch is running: curl http://localhost:9200
  • Verify authentication credentials if using security features
  • Check network connectivity and firewall rules

Problem: No logs appearing in Elasticsearch

  • Check Elasticsearch cluster health
  • Verify index exists: curl http://localhost:9200/_cat/indices
  • Check graph-node logs for write errors
  • Verify Elasticsearch has disk space

Problem: Queries are slow

  • Check Elasticsearch cluster health and resources
  • Verify indices are not over-sharded
  • Consider adding replicas for query performance
  • Review query patterns and add appropriate indices

Loki

Problem: Cannot connect to Loki

  • Verify GRAPH_LOG_STORE_LOKI_URL is correct
  • Check Loki is running: curl http://localhost:3100/ready
  • Verify tenant ID if using multi-tenancy
  • Check network connectivity

Problem: No logs appearing in Loki

  • Check Loki service health
  • Verify Loki has disk space for chunks
  • Check graph-node logs for write errors
  • Verify Loki retention settings aren't deleting logs immediately

Problem: Queries return no results in Grafana

  • Check label selectors match what graph-node is sending
  • Verify time range includes when logs were written
  • Check Loki retention period
  • Verify tenant ID matches if using multi-tenancy

Further Reading