[FEATURE] Add include_metadata request parameter for PPL queries #5235

@Hailong-am

Is your feature request related to a problem?

Yes. In PPL queries, metadata fields like _id and _index appear in the results only if they are listed explicitly. This gets in the way of common workflows such as searching around documents.

Use Case: Search Around Documents

A typical workflow is:

  1. Query to find N interesting documents (e.g., errors, anomalies, specific patterns)
  2. Use the returned _id values to search for surrounding context:
    • Log entries before/after an error event
    • Related events in the same time window
    • Documents from the same session/user
    • Time-series data around an anomaly

Current problem: The wildcard * excludes metadata fields by default, so source=logs | fields * returns all regular fields but no _id. Users must either list every field explicitly (tedious, and it breaks with dynamic schemas) or run a second query to fetch the IDs (an extra round trip).
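
The hand-off between steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, assuming the response uses the schema/datarows layout and that it already carries _id and _index columns; `rows_as_dicts`, `document_refs`, and the sample payload are hypothetical and not part of the plugin.

```python
def rows_as_dicts(response):
    """Pair each row in "datarows" with the column names from "schema"."""
    names = [col["name"] for col in response["schema"]]
    return [dict(zip(names, row)) for row in response["datarows"]]


def document_refs(response):
    """Collect (_index, _id) pairs for rows that carry an _id column."""
    refs = []
    for row in rows_as_dicts(response):
        if "_id" in row:
            refs.append((row.get("_index"), row["_id"]))
    return refs


# Hypothetical response for illustration; real field names will differ.
sample = {
    "schema": [
        {"name": "level", "type": "string"},
        {"name": "_id", "type": "string"},
        {"name": "_index", "type": "string"},
    ],
    "datarows": [["ERROR", "a1", "logs"], ["ERROR", "b2", "logs"]],
}

print(document_refs(sample))  # [('logs', 'a1'), ('logs', 'b2')]
```

Without an _id column in the result, `document_refs` returns an empty list, which is the situation described above for fields *.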

What solution would you like?

Add a request-level parameter include_metadata to the PPL query API:

POST /_plugins/_ppl?include_metadata=true
{
  "query": "source=logs | where level='ERROR' | fields * | head 10"
}

Result: All regular fields plus the metadata fields (_id, _index, _score, _routing, _sort).
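
From a client's point of view, the proposed call might look like the Python sketch below. `include_metadata` is the parameter this issue proposes and does not exist yet; `build_ppl_request` and the base URL are hypothetical names used for illustration. The request is only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_ppl_request(base_url, query, include_metadata=False):
    """Build a POST request for the PPL endpoint without sending it.

    include_metadata maps to the request parameter proposed in this issue.
    """
    url = f"{base_url}/_plugins/_ppl"
    if include_metadata:
        # Proposed parameter; the default (omitted) keeps today's behavior.
        url += "?" + urlencode({"include_metadata": "true"})
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ppl_request(
    "http://localhost:9200",  # assumed local cluster address
    "source=logs | where level='ERROR' | fields * | head 10",
    include_metadata=True,
)
print(req.get_method(), req.full_url)
```

The query text is unchanged between the two modes; only the URL differs, which is what keeps the parameter easy to toggle from SDKs.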

Benefits:

  • ✅ Get document IDs in a single query - no second round trip
  • ✅ No query text changes needed - just a request parameter
  • ✅ Easy to use in SDKs and client applications
  • ✅ Works with any field selection (wildcards, patterns, explicit fields)
  • ✅ Non-breaking - default behavior unchanged (metadata excluded)
  • ✅ Enables search-around-documents and correlation workflows

What alternatives have you considered?

  1. Explicit listing - source=logs | fields *, _id, _index

    • Requires knowing metadata field names
    • Only works with Calcite enabled
    • Undocumented
  2. Changing default behavior - Make fields * include metadata

    • Breaking change for existing queries

Do you have any additional context?

This feature would make PPL more practical for observability and log analysis workflows where document correlation is essential. The request parameter approach is simple, discoverable, and aligns with how similar features work in other query APIs.

Metadata fields to include when enabled:

  • _id - Document ID
  • _index - Index name
  • _score - Search relevance score
  • _routing - Routing value
  • _sort - Sort values
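
The intended wildcard semantics can be modelled with a small Python sketch. It only illustrates the behavior requested in this issue, not the plugin's actual implementation; `METADATA_FIELDS` mirrors the list above, and `expand_wildcard` is a hypothetical helper.

```python
# Metadata fields named in this issue.
METADATA_FIELDS = ("_id", "_index", "_score", "_routing", "_sort")


def expand_wildcard(index_fields, include_metadata=False):
    """Model how `fields *` would resolve with and without the parameter."""
    regular = [f for f in index_fields if f not in METADATA_FIELDS]
    if include_metadata:
        # Proposed behavior: regular fields first, then metadata fields.
        return regular + list(METADATA_FIELDS)
    # Current default: metadata fields stay excluded.
    return regular


print(expand_wildcard(["level", "message"]))
# ['level', 'message']
print(expand_wildcard(["level", "message"], include_metadata=True))
# ['level', 'message', '_id', '_index', '_score', '_routing', '_sort']
```

The ordering of metadata fields after regular fields is an assumption made for the sketch; the issue does not specify column order.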

Labels: PPL (Piped processing language), enhancement (New feature or request)
