This guide explains how to configure subgraph indexing logs storage in graph-node.
- Overview
- How Log Stores Work
- Log Store Types
- Configuration
- Querying Logs
- Migrating from Deprecated Configuration
- Choosing the Right Backend
- Best Practices
- Troubleshooting
Graph Node supports multiple logs storage backends for subgraph indexing logs. Subgraph indexing logs include:
- User-generated logs: Explicit logging from subgraph mapping code (`log.info()`, `log.error()`, etc.)
- Runtime logs: Handler execution, event processing, data source activity
- System logs: Warnings, errors, and diagnostics from the indexing system
Available backends:
- File: JSON Lines files on local filesystem (for local development)
- Elasticsearch: Enterprise-grade search and analytics (for production)
- Loki: Grafana's lightweight log aggregation system (for production)
- Disabled: No log storage (default)
All backends share the same query interface through GraphQL, making it easy to switch between them.
Important Note: When log storage is disabled (the default), subgraph logs still appear in stdout/stderr as they always have. The "disabled" setting simply means logs are not stored separately in a queryable format. You can still see logs in your terminal or container logs - they just won't be available via the _logs GraphQL query.
┌─────────────────┐
│ Subgraph Code │
│ (mappings) │
└────────┬────────┘
│ log.info(), log.error(), etc.
▼
┌─────────────────┐
│ Graph Runtime │
│ (WebAssembly) │
└────────┬────────┘
│ Log events
▼
┌─────────────────┐
│ Log Drain │ ◄─── slog-based logging system
└────────┬────────┘
│ Write
▼
┌─────────────────┐
│ Log Store │ ◄─── Configurable backend
│ (ES/Loki/File) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ GraphQL API │ ◄─── Unified query interface
│ (port 8000) │
└─────────────────┘
- Log sources generate logs from:
  - User mapping code (explicit `log.info()`, `log.error()`, etc. calls)
  - Subgraph runtime (handler execution, event processing, data source triggers)
  - System warnings and errors (indexing issues, constraint violations, etc.)
- Graph runtime captures these logs with metadata (timestamp, level, source location)
- Log drain formats logs and writes to the configured backend
- Log store persists logs and handles queries
- GraphQL API exposes logs through the `_logs` query
Each log entry contains:
- `id`: Unique identifier
- `subgraphId`: Deployment hash (`QmXxx...`)
- `timestamp`: ISO 8601 timestamp (e.g., `2024-01-15T10:30:00.123456789Z`)
- `level`: CRITICAL, ERROR, WARNING, INFO, or DEBUG
- `text`: Log message
- `arguments`: Key-value pairs from structured logging
- `meta`: Source location (module, line, column)
Best for: Local development, testing
File-based logs store each subgraph's logs in a separate JSON Lines (.jsonl) file:
graph-logs/
├── QmSubgraph1Hash.jsonl
├── QmSubgraph2Hash.jsonl
└── QmSubgraph3Hash.jsonl
Each line in the file is a complete JSON object representing one log entry.
```json
{"id":"QmTest-2024-01-15T10:30:00.123456789Z","subgraphId":"QmTest","timestamp":"2024-01-15T10:30:00.123456789Z","level":"error","text":"Handler execution failed, retries: 3","arguments":[{"key":"retries","value":"3"}],"meta":{"module":"mapping.ts","line":42,"column":10}}
```

File-based logs stream through files line-by-line with bounded memory usage.
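That streaming behavior can be sketched in Python. This is a hypothetical reader, not graph-node's implementation: it scans the file one line at a time and only the requested page of matching entries is ever held in memory.

```python
import json
from itertools import islice

def read_logs(path, level=None, skip=0, first=100):
    """Stream a .jsonl log file, keeping only the requested page in memory."""
    def matching(lines):
        for line in lines:
            entry = json.loads(line)
            if level is None or entry["level"] == level:
                yield entry
    with open(path) as f:
        # islice applies skip/first lazily, without materializing the file
        return list(islice(matching(f), skip, skip + first))
```

This is why query time is O(n) in the file size while memory stays bounded by `skip + first`.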
Performance characteristics:
- Query time: O(n) where n = number of log entries
- Memory usage: O(skip + first) - only matching entries kept in memory
- Suitable for: Development and testing
Minimum configuration (CLI):
```sh
graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend file \
  --log-store-file-dir ./graph-logs
```

Full configuration (environment variables):
```sh
export GRAPH_LOG_STORE_BACKEND=file
export GRAPH_LOG_STORE_FILE_DIR=/var/log/graph-node
export GRAPH_LOG_STORE_FILE_MAX_SIZE=104857600  # 100MB
export GRAPH_LOG_STORE_FILE_RETENTION_DAYS=30
```

Advantages:
- No external dependencies
- Simple setup (just specify a directory)
- Human-readable format (JSON Lines)
- Easy to inspect with standard tools (`jq`, `grep`, etc.)
- Good for debugging during development
Limitations:
- Not suitable for production with high log volume
- No indexing (O(n) query time scales with file size)
- No automatic log rotation or retention management
- Single file per subgraph (no sharding)
Use file-based logs when:
- Developing subgraphs locally
- Testing on a development machine
- Running low-traffic subgraphs (< 1000 total logs/day including system logs)
- You want simple log access without external services
Best for: Production deployments, high log volume, advanced search
Elasticsearch stores logs in indices with full-text search capabilities, making it ideal for production deployments with high log volume.
Architecture:
graph-node → Elasticsearch HTTP API → Elasticsearch cluster
→ Index: subgraph-logs-*
→ Query DSL for filtering
Advantages:
- Indexed searching: Fast queries even with millions of logs
- Full-text search: Powerful text search across log messages
- Scalability: Handles billions of log entries
- High availability: Supports clustering and replication
- Kibana integration: Rich visualization and dashboards for operators
- Time-based indices: Efficient retention management
Considerations:
- Requires Elasticsearch cluster (infrastructure overhead)
- Resource-intensive (CPU, memory, disk)
Minimum configuration (CLI):
```sh
graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend elasticsearch \
  --log-store-elasticsearch-url http://localhost:9200
```

Full configuration with authentication:
```sh
graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend elasticsearch \
  --log-store-elasticsearch-url https://es.example.com:9200 \
  --log-store-elasticsearch-user elastic \
  --log-store-elasticsearch-password secret \
  --log-store-elasticsearch-index subgraph-logs
```

Environment variables:
```sh
export GRAPH_LOG_STORE_BACKEND=elasticsearch
export GRAPH_LOG_STORE_ELASTICSEARCH_URL=http://localhost:9200
export GRAPH_LOG_STORE_ELASTICSEARCH_USER=elastic
export GRAPH_LOG_STORE_ELASTICSEARCH_PASSWORD=secret
export GRAPH_LOG_STORE_ELASTICSEARCH_INDEX=subgraph-logs
```

Logs are stored in the configured index (default: `subgraph`). The index mapping is created automatically.
Recommended index settings for production:
```json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s"
  }
}
```

Performance characteristics:
- Query time: O(log n) with indexing
- Memory usage: Minimal (server-side filtering)
- Suitable for: Millions to billions of log entries
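The recommended settings above can be applied when creating the index via Elasticsearch's create-index REST API (`PUT /<index>`). The sketch below builds the request with only the Python standard library; the URL and index name are examples, and in practice you would send the request with `urllib.request.urlopen` or an Elasticsearch client.

```python
import json
from urllib.request import Request

def create_index_request(base_url, index, shards=3, replicas=1, refresh="5s"):
    """Build a PUT request that creates an index with the settings shown above.

    Illustrative only: settings mirror the recommended production values;
    adjust shard/replica counts for your own cluster.
    """
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            "refresh_interval": refresh,
        }
    }
    return Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# req = create_index_request("http://localhost:9200", "subgraph-logs")
# urllib.request.urlopen(req)  # would apply the settings on the cluster
```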
Use Elasticsearch when:
- Running production deployments
- High log volume
- Need advanced search and filtering
- Want to build dashboards with Kibana
- Need high availability and scalability
- Have DevOps resources to manage Elasticsearch, or can use a managed Elasticsearch deployment
Best for: Production deployments, Grafana users, cost-effective at scale
Loki is Grafana's log aggregation system, designed to be cost-effective and easy to operate. Unlike Elasticsearch, Loki only indexes metadata (not full-text), making it more efficient for time-series log data.
Architecture:
graph-node → Loki HTTP API → Loki
→ Stores compressed chunks
→ Indexes labels only
Advantages:
- Cost-effective: Lower storage costs than Elasticsearch
- Grafana integration: Native integration with Grafana
- Horizontal scalability: Designed for cloud-native deployments
- Multi-tenancy: Built-in tenant isolation
- Efficient compression: Optimized for log data
- LogQL: Powerful query language similar to PromQL
- Lower resource usage: Less CPU/memory than Elasticsearch
Considerations:
- No full-text indexing (slower text searches)
- Best used with Grafana (less tooling than Elasticsearch)
- Younger ecosystem than Elasticsearch
- Query performance depends on label cardinality
Minimum configuration (CLI):
```sh
graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend loki \
  --log-store-loki-url http://localhost:3100
```

With multi-tenancy:
```sh
graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001 \
  --log-store-backend loki \
  --log-store-loki-url http://localhost:3100 \
  --log-store-loki-tenant-id my-graph-node
```

Environment variables:
```sh
export GRAPH_LOG_STORE_BACKEND=loki
export GRAPH_LOG_STORE_LOKI_URL=http://localhost:3100
export GRAPH_LOG_STORE_LOKI_TENANT_ID=my-graph-node
```

Loki uses labels for indexing. Graph Node automatically creates these labels:

- `subgraph_id`: Deployment hash
- `level`: Log level
- `job`: `"graph-node"`
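To make the label model concrete, here is a sketch of a payload for Loki's push API (`/loki/api/v1/push`), carrying the labels listed above. This illustrates Loki's wire format, not the exact payload graph-node emits:

```python
import json
import time

def loki_push_payload(subgraph_id, level, line, job="graph-node"):
    """Build a Loki push-API payload with graph-node-style labels.

    Illustrative sketch: the label set mirrors the list above, but the
    payload graph-node actually sends may differ in detail.
    """
    ts_ns = str(time.time_ns())  # Loki expects nanosecond timestamps as strings
    return {
        "streams": [
            {
                "stream": {"subgraph_id": subgraph_id, "level": level, "job": job},
                "values": [[ts_ns, line]],
            }
        ]
    }

# json.dumps(loki_push_payload("QmExample", "error", "Handler execution failed"))
# would be POSTed to http://localhost:3100/loki/api/v1/push
```

Because only the `stream` labels are indexed, keeping their cardinality low (as Loki's best practices recommend) is what keeps queries fast.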
Performance characteristics:
- Query time: O(n) for text searches, O(log n) for label queries
- Memory usage: Minimal (server-side processing)
- Suitable for: Millions to billions of log entries
- Best performance with label-based filtering
Use Loki when:
- Already using Grafana for monitoring
- Need cost-effective log storage at scale
- Want simpler operations than Elasticsearch
- Multi-tenancy is required
- Log volume is very high (> 1M logs/day)
- Full-text search is not critical
Best for: Minimalist deployments, reduced overhead
When log storage is disabled (the default), subgraph logs are still written to stdout/stderr along with all other graph-node logs. They are just not stored separately in a queryable format.
Important: "Disabled" does NOT mean logs are discarded. It means:
- Logs appear in stdout/stderr (traditional behavior)
- Logs are not stored in a separate queryable backend
- The `_logs` GraphQL query returns empty results
This is the default behavior - logs continue to work exactly as they did before this feature was added.
Explicitly disable:
```sh
export GRAPH_LOG_STORE_BACKEND=disabled
```

Or simply don't configure a backend (defaults to disabled):
```sh
# No log store configuration = disabled
graph-node \
  --postgres-url postgresql://graph:pass@localhost/graph-node \
  --ethereum-rpc mainnet:https://... \
  --ipfs 127.0.0.1:5001
```

Advantages:
- Zero additional overhead
- No external dependencies
- Minimal configuration
- Logs still appear in stdout/stderr for debugging
Limitations:
- Cannot query logs via GraphQL (`_logs` returns empty results)
- No separation of subgraph logs from other graph-node logs in stdout
- Logs mixed with system logs (harder to filter programmatically)
- No structured querying or filtering capabilities
Use disabled log storage when:
- Running minimal test deployments with fewer dependencies
- Exposing logs to users is not required for your use case
- You want subgraph logs handled by external log collection (e.g., container logs)
Environment variables are the recommended way to configure log stores, especially in containerized deployments.
```sh
GRAPH_LOG_STORE_BACKEND=<backend>
```

Valid values: `disabled`, `elasticsearch`, `loki`, `file`

Elasticsearch:

```sh
GRAPH_LOG_STORE_ELASTICSEARCH_URL=http://localhost:9200
GRAPH_LOG_STORE_ELASTICSEARCH_USER=elastic      # Optional
GRAPH_LOG_STORE_ELASTICSEARCH_PASSWORD=secret   # Optional
GRAPH_LOG_STORE_ELASTICSEARCH_INDEX=subgraph    # Default: "subgraph"
```

Loki:

```sh
GRAPH_LOG_STORE_LOKI_URL=http://localhost:3100
GRAPH_LOG_STORE_LOKI_TENANT_ID=my-tenant        # Optional
```

File:

```sh
GRAPH_LOG_STORE_FILE_DIR=/var/log/graph-node
GRAPH_LOG_STORE_FILE_MAX_SIZE=104857600         # Default: 100MB
GRAPH_LOG_STORE_FILE_RETENTION_DAYS=30          # Default: 30
```

CLI arguments provide the same functionality as environment variables, and the two can be mixed together.
```sh
--log-store-backend <backend>
```

Elasticsearch:

```sh
--log-store-elasticsearch-url <URL>
--log-store-elasticsearch-user <USER>
--log-store-elasticsearch-password <PASSWORD>
--log-store-elasticsearch-index <INDEX>
```

Loki:

```sh
--log-store-loki-url <URL>
--log-store-loki-tenant-id <TENANT_ID>
```

File:

```sh
--log-store-file-dir <DIR>
--log-store-file-max-size <BYTES>
--log-store-file-retention-days <DAYS>
```

When multiple configuration methods are used:
- CLI arguments take highest precedence
- Environment variables are used if no CLI args provided
- Defaults are used if neither is set
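The precedence rules can be sketched as a small resolver. This is illustrative only, not graph-node's actual configuration code:

```python
import os

def resolve_backend(cli_value=None, env=None, default="disabled"):
    """Resolve the log-store backend: CLI flag > environment variable > default.

    Hypothetical helper mirroring the precedence rules above.
    """
    env = os.environ if env is None else env
    if cli_value is not None:
        return cli_value  # CLI arguments take highest precedence
    # Fall back to the environment variable, then the default
    return env.get("GRAPH_LOG_STORE_BACKEND", default)
```

For example, passing `--log-store-backend loki` wins even when `GRAPH_LOG_STORE_BACKEND=file` is exported.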
All log backends share the same GraphQL query interface. Logs are queried through the subgraph-specific GraphQL endpoint:
- Subgraph by deployment: `http://localhost:8000/subgraphs/id/<deployment-hash>`
- Subgraph by name: `http://localhost:8000/subgraphs/name/<subgraph-name>`
The `_logs` query is automatically scoped to the subgraph in the URL, so you don't need to pass a `subgraphId` parameter.
Note: Queries return all log types - both user-generated logs from mapping code and system-generated runtime logs (handler execution, events, warnings, etc.). Use the `search` filter to find specific messages, or `level` to filter by severity.
Query the `_logs` field at your subgraph's GraphQL endpoint:

```graphql
query {
  _logs(first: 100) {
    id
    timestamp
    level
    text
  }
}
```

Example endpoint: `http://localhost:8000/subgraphs/id/QmYourDeploymentHash`
```graphql
query {
  _logs(
    level: ERROR
    from: "2024-01-01T00:00:00Z"
    to: "2024-01-31T23:59:59Z"
    search: "timeout"
    first: 50
    skip: 0
  ) {
    id
    timestamp
    level
    text
    arguments {
      key
      value
    }
    meta {
      module
      line
      column
    }
  }
}
```

| Filter | Type | Description |
|---|---|---|
| `level` | `LogLevel` | Filter by level: CRITICAL, ERROR, WARNING, INFO, DEBUG |
| `from` | `String` | Start timestamp (ISO 8601) |
| `to` | `String` | End timestamp (ISO 8601) |
| `search` | `String` | Case-insensitive substring search in log messages |
| `first` | `Int` | Number of results to return (default: 100, max: 1000) |
| `skip` | `Int` | Number of results to skip for pagination (max: 10000) |
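Given the caps above (`first` ≤ 1000, `skip` ≤ 10000), a client paging through logs can precompute its page arguments. This is a hypothetical client-side helper, not part of graph-node:

```python
MAX_FIRST = 1000   # server-side cap on page size (see table above)
MAX_SKIP = 10000   # server-side cap on pagination offset

def pages(total, page_size=100):
    """Yield (first, skip) argument pairs for paging through _logs results."""
    page_size = min(page_size, MAX_FIRST)  # respect the per-page cap
    skip = 0
    while skip < total and skip <= MAX_SKIP:
        yield (min(page_size, total - skip), skip)
        skip += page_size

# pages(250, 100) yields (100, 0), (100, 100), (50, 200)
```

Note that the `skip` cap means only the first ~10k entries are reachable by offset pagination; narrow the `from`/`to` window to go deeper.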
| Field | Type | Description |
|---|---|---|
| `id` | `String` | Unique log entry ID |
| `timestamp` | `String` | ISO 8601 timestamp with nanosecond precision |
| `level` | `LogLevel` | Log level (CRITICAL, ERROR, WARNING, INFO, DEBUG) |
| `text` | `String` | Complete log message with arguments |
| `arguments` | `[(String, String)]` | Structured key-value pairs |
| `meta.module` | `String` | Source file name |
| `meta.line` | `Int` | Line number |
| `meta.column` | `Int` | Column number |
```graphql
query RecentErrors {
  _logs(level: ERROR, first: 20) {
    timestamp
    text
    meta {
      module
      line
    }
  }
}
```

```graphql
query SearchTimeout {
  _logs(search: "timeout", first: 50) {
    timestamp
    level
    text
  }
}
```

```graphql
query HandlerLogs {
  _logs(search: "handler", first: 50) {
    timestamp
    level
    text
  }
}
```

```graphql
query LogsInRange {
  _logs(from: "2024-01-15T00:00:00Z", to: "2024-01-15T23:59:59Z", first: 1000) {
    timestamp
    level
    text
  }
}
```

```graphql
# First page
query Page1 {
  _logs(first: 100, skip: 0) {
    id
    text
  }
}

# Second page
query Page2 {
  _logs(first: 100, skip: 100) {
    id
    text
  }
}
```

```sh
curl -X POST http://localhost:8000/subgraphs/id/<deployment-hash> \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ _logs(level: ERROR, first: 10) { timestamp level text } }"
  }'
```

File-based: for development only
- Streams through files line-by-line (bounded memory usage)
- Memory usage limited to O(skip + first) entries
- Query time is O(n) where n = total log entries in file
Elasticsearch:
- Indexed queries are fast regardless of size
- Text searches are optimized with full-text indexing
- Can handle billions of log entries
- Best for production with high query volume
Loki:
- Label-based queries are fast (indexed)
- Text searches scan compressed chunks (slower than Elasticsearch)
- Good performance with proper label filtering
- Best for production with Grafana integration
| Scenario | Recommended Backend | Reason |
|---|---|---|
| Local development | File | Simple, no dependencies, easy to inspect |
| Testing/staging | File or Elasticsearch | File for simplicity, ES if testing production config |
| Production | Elasticsearch or Loki | Both handle scale well |
| Using Grafana | Loki | Native integration |
| Cost-sensitive at scale | Loki | Lower storage costs |
| Want rich ecosystem | Elasticsearch | More tools and plugins |
| Minimal deployment | Disabled | No overhead |
- Disk: Minimal (log files only)
- Memory: Depends on file size during queries
- CPU: Minimal
- Network: None
- External services: None
- Disk: High (indices + replicas)
- Memory: 4-8GB minimum for small deployments
- CPU: Medium to high
- Network: HTTP API calls
- External services: Elasticsearch cluster
- Disk: Medium (compressed chunks)
- Memory: 2-4GB minimum
- CPU: Low to medium
- Network: HTTP API calls
- External services: Loki server
- Start with file-based for development - Simplest setup, easy debugging
- Use Elasticsearch or Loki for production - Better performance and features
- Monitor log volume - Set up alerts if log volume grows unexpectedly (includes both user logs and system-generated runtime logs)
- Set retention policies - Don't keep logs forever (disk space and cost)
- Use structured logging - Pass key-value pairs to log functions for better filtering
- Monitor file size - While queries use bounded memory, larger files take longer to scan (O(n) query time)
- Archive old logs - Manually archive/delete old files or implement external rotation
- Monitor disk usage - Files can grow quickly with verbose logging
- Use JSON tools - `jq` is excellent for inspecting `.jsonl` files locally
Example local inspection:
```sh
# Count logs by level
cat graph-logs/QmExample.jsonl | jq -r '.level' | sort | uniq -c

# Find errors in last 1000 lines
tail -n 1000 graph-logs/QmExample.jsonl | jq 'select(.level == "error")'

# Search for specific text
cat graph-logs/QmExample.jsonl | jq 'select(.text | contains("timeout"))'
```

- Use index patterns - Time-based indices for easier management
- Configure retention - Use Index Lifecycle Management (ILM)
- Monitor cluster health - Set up Elasticsearch monitoring
- Tune for your workload - Adjust shards/replicas based on log volume
- Use Kibana - Visualize and explore logs effectively
Example Elasticsearch retention policy:
```json
{
  "policy": {
    "phases": {
      "hot": { "min_age": "0ms", "actions": {} },
      "warm": { "min_age": "7d", "actions": {} },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}
```

(applied with `PUT _ilm/policy/graph-logs-policy`)

- Use proper labels - Don't over-index, keep label cardinality low
- Configure retention - Set retention period in Loki config
- Use Grafana - Native integration provides best experience
- Compress efficiently - Loki's compression works best with batch writes
- Multi-tenancy - Use tenant IDs if running multiple environments
Example Grafana query:
```logql
{subgraph_id="QmExample", level="error"} |= "timeout"
```
Problem: Log file doesn't exist
- Check `GRAPH_LOG_STORE_FILE_DIR` is set correctly
- Verify the directory is writable by graph-node
Problem: Queries are slow
- Subgraph logs file may be very large
- Consider archiving old logs or implementing retention
- For high-volume production use, switch to Elasticsearch or Loki
Problem: Disk filling up
- Implement log rotation
- Reduce log verbosity in subgraph code
- Set up monitoring for disk usage
Problem: Cannot connect to Elasticsearch
- Verify `GRAPH_LOG_STORE_ELASTICSEARCH_URL` is correct
- Check Elasticsearch is running: `curl http://localhost:9200`
- Verify authentication credentials if using security features
- Check network connectivity and firewall rules
Problem: No logs appearing in Elasticsearch
- Check Elasticsearch cluster health
- Verify the index exists: `curl http://localhost:9200/_cat/indices`
- Check graph-node logs for write errors
- Verify Elasticsearch has disk space
Problem: Queries are slow
- Check Elasticsearch cluster health and resources
- Verify indices are not over-sharded
- Consider adding replicas for query performance
- Review query patterns and add appropriate indices
Problem: Cannot connect to Loki
- Verify `GRAPH_LOG_STORE_LOKI_URL` is correct
- Check Loki is running: `curl http://localhost:3100/ready`
- Verify the tenant ID if using multi-tenancy
- Check network connectivity
Problem: No logs appearing in Loki
- Check Loki service health
- Verify Loki has disk space for chunks
- Check graph-node logs for write errors
- Verify Loki retention settings aren't deleting logs immediately
Problem: Queries return no results in Grafana
- Check label selectors match what graph-node is sending
- Verify time range includes when logs were written
- Check Loki retention period
- Verify tenant ID matches if using multi-tenancy