Skip to content

[feature] AI-assisted Lucene query-builder #219

@righel

Description

@righel

🔍 Feature Request: AI-assisted Lucene Query Builder

Summary

Add an AI-assisted natural language to Lucene query builder in the misp-workbench search interface, allowing analysts to describe what they're looking for in plain language and have it automatically translated into valid Lucene/OpenSearch queries.

Motivation

Building Lucene queries manually against the OpenSearch backend requires analysts to know the exact field names, syntax, and boolean logic. This creates a steep learning curve and slows down threat hunting workflows — especially for less experienced users or during time-sensitive investigations.

An AI-assisted query builder would lower this barrier significantly and speed up attribute and correlation searches.

Proposed Behaviour

  • A text input in the search UI allows the user to type a natural language description (e.g. "show me all IP attributes from the last 7 days correlated with APT28 events")
  • The UI sends the prompt to a backend endpoint (or directly to an LLM API) that returns a valid Lucene/OpenSearch query string
  • The generated query is pre-filled into the existing query input field, ready for review and execution
  • The user can edit the generated query before running it

Suggested Implementation Approach

  • Backend: add a POST /api/v1/search/ai-query endpoint that accepts a { "prompt": "..." } body, calls an LLM (e.g. via OpenAI-compatible API or a locally hosted model), and returns { "query": "<lucene query string>" }
  • Prompt engineering: the system prompt should include the relevant OpenSearch index mappings / field names used in misp-workbench (e.g. attribute.type, attribute.value, event.info, correlation.uuid, etc.) so the model can generate accurate field-scoped queries
  • Frontend: add an "Ask AI" button or input section alongside the existing Lucene query field in the search view; show a loading state while the query is being generated, then inject the result into the query field
  • Configuration: the LLM endpoint and API key should be configurable via environment variables (.env), keeping the feature optional and self-hostable

Example

User input:

Find all domain attributes tagged tlp:red seen in the past 30 days

Generated Lucene query:

attribute.type:domain AND tags:tlp\:red AND attribute.timestamp:[now-30d TO now]

Acceptance Criteria

  • Natural language input is accepted in the search UI
  • A valid Lucene/OpenSearch query is generated and pre-filled in the query field
  • The feature is disabled/skipped gracefully when no LLM is configured
  • Field names in generated queries match the actual OpenSearch index schema
  • The generated query is editable by the user before execution
  • Basic error handling when the LLM returns an invalid or empty response

Notes

  • This feature is scoped for the hackathon and can start as a thin prototype — full production polish (streaming, query explanation, history) can follow in a separate issue
  • Consider whether the LLM call should happen client-side (e.g. browser → LLM API directly) or server-side (browser → FastAPI → LLM API) — server-side is preferred to keep API keys out of the frontend

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions