[FEATURE] PPL query support for Analytics engine integration #5247

Description

@dai-chen

Is your feature request related to a problem?

PPL queries currently only execute against Lucene-backed indices. With the Analytics engine providing Parquet-backed storage, PPL queries targeting non-Lucene indices need to be routed through the unified query pipeline to the Analytics engine for execution. The Unified Query API already supports PPL V3 Calcite-based RelNode generation, but the end-to-end integration — query routing, schema building, execution handoff, and response formatting — is not yet ready.

Technical requirements:

  • PPL queries against non-Lucene indices return correct results through _plugins/_ppl endpoint
  • PPL explain API (_plugins/_ppl/_explain) returns the logical plan handed off to the Analytics engine
  • Default response format (JDBC JSON with schema, datarows, total, size, status) matches existing PPL response format
  • Clear error messages distinguishing client errors (query parsing/planning in SQL plugin) from server errors (query optimization/distributed planning/execution from Analytics engine)
  • Observability: metrics and latency tracking across both SQL plugin (routing, parsing, planning) and Analytics engine (optimization, execution)
  • PPL queries on Lucene indices are unaffected (no regression)
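To make the response-format requirement concrete, here is a minimal sketch of the JDBC JSON shape (schema, datarows, total, size, status) built from an `Iterable<Object[]>`. It is an illustration only, not the plugin's JdbcResponseFormatter: the `Column` class, the `toJdbcResponse` helper, the type names, and the sample rows are all hypothetical, and a plain `Map` stands in for the JSON body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JdbcResponseSketch {

    /** Hypothetical column descriptor; not a class from the SQL plugin. */
    public static final class Column {
        final String name;
        final String type;

        public Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Builds a map with the five top-level fields of the JDBC JSON response. */
    public static Map<String, Object> toJdbcResponse(List<Column> columns, Iterable<Object[]> rows) {
        // schema: one {name, type} entry per column, in column order
        List<Map<String, String>> schema = new ArrayList<>();
        for (Column column : columns) {
            Map<String, String> entry = new LinkedHashMap<>();
            entry.put("name", column.name);
            entry.put("type", column.type);
            schema.add(entry);
        }

        // datarows: each Object[] row becomes one list of values
        List<List<Object>> datarows = new ArrayList<>();
        for (Object[] row : rows) {
            datarows.add(Arrays.asList(row));
        }

        Map<String, Object> response = new LinkedHashMap<>();
        response.put("schema", schema);
        response.put("datarows", datarows);
        // Assumption: no pagination, so total and size both equal the row count.
        response.put("total", datarows.size());
        response.put("size", datarows.size());
        response.put("status", 200);
        return response;
    }

    public static void main(String[] args) {
        List<Column> columns = Arrays.asList(new Column("host", "string"), new Column("cnt", "integer"));
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"web-1", 3}); // made-up sample data
        rows.add(new Object[] {"web-2", 5});
        System.out.println(toJdbcResponse(columns, rows));
    }
}
```

Keeping the field names and order identical for Lucene and non-Lucene indices is what lets existing clients consume Analytics engine results without changes.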

What solution would you like?

  1. Plugin wiring and dependency integration: Add analytics-engine as extendedPlugins dependency, resolve Calcite jar conflicts (classloader sharing and bundlePlugin excludes), wire SchemaBuilder and QueryPlanExecutor from the Analytics engine via Guice.
  2. Query routing and execution handoff: Add RestUnifiedQueryAction and AnalyticsExecutionEngine that detect non-Lucene indices, route PPL queries through UnifiedQueryPlanner.plan() → QueryPlanExecutor.execute(), and schedule execution on sql-worker thread pool with security context propagation.
  3. Response formatting and explain support: Format Iterable<Object[]> results via the existing JdbcResponseFormatter; return the logical RelNode plan via _plugins/_ppl/_explain, and include the physical plan if the Analytics engine provides a QueryPlanExecutor.explain() API.
  4. Error handling and observability: Client vs server / SQL plugin vs Analytics engine error classification, query size limit enforcement, request/failure metrics, and planning/execution latency logging.
  5. Integration and regression tests: End-to-end ITs with analytics-engine plugin verifying PPL query, explain, response format, error handling, non-Lucene routing, and Lucene regression.
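The routing decision in step 2 and the error classification in step 4 can be sketched as below. This is a rough illustration under stated assumptions, not the proposed RestUnifiedQueryAction or AnalyticsExecutionEngine: the `Engine` and `Stage` enums, the `isLuceneBacked` predicate, the `parquet_` naming rule, and the mapping of client and server errors to HTTP 400 and 500 are all hypothetical.

```java
import java.util.function.Predicate;

public class QueryRoutingSketch {

    /** Hypothetical execution targets for a PPL query. */
    public enum Engine { LUCENE, ANALYTICS }

    /** Hypothetical pipeline stages, following the split described in the requirements. */
    public enum Stage { PARSING, PLANNING, OPTIMIZATION, DISTRIBUTED_PLANNING, EXECUTION }

    /** Lucene-backed indices stay on the existing path; everything else goes to the Analytics engine. */
    public static Engine route(String indexName, Predicate<String> isLuceneBacked) {
        return isLuceneBacked.test(indexName) ? Engine.LUCENE : Engine.ANALYTICS;
    }

    /** Parsing and planning run in the SQL plugin, so failures there are treated as client errors. */
    public static boolean isClientError(Stage stage) {
        return stage == Stage.PARSING || stage == Stage.PLANNING;
    }

    /** Assumption: client errors map to HTTP 400 and Analytics engine failures to HTTP 500. */
    public static int httpStatusFor(Stage stage) {
        return isClientError(stage) ? 400 : 500;
    }

    public static void main(String[] args) {
        // Made-up detection rule for the demo; real detection would inspect index metadata.
        Predicate<String> isLuceneBacked = name -> !name.startsWith("parquet_");

        System.out.println(route("parquet_logs", isLuceneBacked));
        System.out.println(route("app_logs", isLuceneBacked));
        System.out.println(httpStatusFor(Stage.PLANNING));
        System.out.println(httpStatusFor(Stage.EXECUTION));
    }
}
```

Keeping the detection rule behind a predicate leaves the Lucene path untouched when the predicate returns true, which is the regression guarantee the requirements ask for.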

What alternatives have you considered?

See parent issue #5246 for the design comparison of Option A (Query Delegation), Option B (Unified Query Pipeline), and Option C (Calcite Schema Adapter).

Do you have any additional context?

  • PoC PR: dai-chen/sql-1#10 — validates end-to-end flow with real analytics-engine plugin in integration tests
  • Pending from Analytics engine team: stable API interfaces (SchemaBuilder, QueryPlanExecutor), plugin integration mechanism, and integration-ready build for testing
  • If a workable Analytics engine build is not available, we may create an analytics-engine stub with mock data to unblock integration work and testing

Metadata

Labels

PPL (Piped processing language), enhancement (New feature or request)
Projects

Status
Not Started