fix(repository-elasticsearch): sanitize orgId/envId per OTel data stream rules #17246

Open
remibaptistegio wants to merge 1 commit into master from fix-otel-dataset-orgId-sanitization
@remibaptistegio
Contributor

Issue

Hyphenated org / env ids (e.g. my-org-1, prod-eu) silently returned zero results from the OTel-mode tracing + logs Elasticsearch lookups.

Description

The OTel collector's elasticsearchexporter (in mapping.mode: otel) runs each data-stream component through sanitizeDataStreamField before writing: disallowed runes (including the hyphen) become _, the remaining runes are lowercased, and the result is truncated to 100 bytes. The Gravitee-side lookup used the generic IndexNameUtils.format, which only lowercases, so any org / env id containing a hyphen produced a different index name on read than the one the collector had written, and every query missed.
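To make the rule above concrete, here is a minimal Java sketch of the sanitization as described in this PR. It is not the code of OtelDataStreamIndexUtils; the class name, method name, and the exact disallowed character set are assumptions (the set is my recollection of the exporter's disallowedDatasetRunes and should be checked against data_stream_router.go). It also stops at a whole code point when truncating, which is a simplification of a raw byte cut.

```java
final class OtelSanitizeSketch {

    // ASSUMPTION: recalled from the exporter's disallowedDatasetRunes; verify upstream.
    private static final String DISALLOWED_DATASET_CHARS = "-\\/*?\"<>| ,#:";

    // Per the PR description, each component is truncated to 100 bytes.
    private static final int MAX_BYTES = 100;

    private OtelSanitizeSketch() {}

    /** Replaces disallowed characters with '_', lowercases the rest, caps at 100 UTF-8 bytes. */
    static String sanitize(String value) {
        StringBuilder out = new StringBuilder(value.length());
        int bytes = 0;
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            int mapped = DISALLOWED_DATASET_CHARS.indexOf(cp) >= 0 ? '_' : Character.toLowerCase(cp);
            int size = utf8Length(mapped);
            if (bytes + size > MAX_BYTES) {
                break; // simplification: never split a code point
            }
            out.appendCodePoint(mapped);
            bytes += size;
        }
        return out.toString();
    }

    // Number of bytes a code point occupies in UTF-8.
    private static int utf8Length(int cp) {
        return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    }
}
```

With this sketch, `sanitize("my-org-1")` yields `my_org_1`, whereas a lowercase-only formatter would keep `my-org-1`, which is the read/write mismatch described above.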

This PR adds OtelDataStreamIndexUtils mirroring the exporter's algorithm and routes both ElasticsearchTracingRepository and ElasticsearchOtelLogRepository through it.

Design note: the utility always applies the dataset rule (the stricter superset of the namespace rule) to every placeholder, regardless of its position in the template. Losing potentially cosmetic hyphens in a namespace-position placeholder is preferable to miscategorising the position; both sides agree on the substitution and the lookup works.
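The design note can be illustrated with a small sketch in which every placeholder value gets the dataset rule while the template's own separators are left alone. The `{name}` placeholder syntax, the example template, and all names below are hypothetical and are not taken from the Gravitee code base; the disallowed set is again an assumption to verify against the exporter source.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OtelTemplateSketch {

    // ASSUMPTION: hypothetical "{name}" placeholder syntax, for illustration only.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");
    // ASSUMPTION: recalled dataset rule set; verify against data_stream_router.go.
    private static final String DISALLOWED = "-\\/*?\"<>| ,#:";
    private static final int MAX_BYTES = 100;

    private OtelTemplateSketch() {}

    /** Substitutes placeholders in one pass; only the values are sanitized, never the template. */
    static String format(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(m -> {
            String raw = values.get(m.group(1));
            // Unknown placeholders are left untouched.
            return Matcher.quoteReplacement(raw == null ? m.group() : sanitize(raw));
        });
    }

    // Dataset rule applied to every value, whatever its position in the template.
    static String sanitize(String value) {
        StringBuilder out = new StringBuilder();
        int bytes = 0;
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            int mapped = DISALLOWED.indexOf(cp) >= 0 ? '_' : Character.toLowerCase(cp);
            int size = mapped < 0x80 ? 1 : mapped < 0x800 ? 2 : mapped < 0x10000 ? 3 : 4;
            if (bytes + size > MAX_BYTES) {
                break;
            }
            out.appendCodePoint(mapped);
            bytes += size;
        }
        return out.toString();
    }
}
```

For example, `format("traces-{orgId}.{envId}-default", Map.of("orgId", "My-Org-1", "envId", "prod-eu"))` returns `traces-my_org_1.prod_eu-default`: the hyphens that belong to the template survive, while those inside the ids become underscores.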

Additional context

  • Upstream reference: opentelemetry-collector-contrib/exporter/elasticsearchexporter/data_stream_router.go — see sanitizeDataStreamField, disallowedDatasetRunes, disallowedNamespaceRunes, maxDataStreamBytes.
  • elastic/apm-data implements the same sanitization (verified via source), so this also works against Elastic Cloud managed OTLP intake.
  • 9 new unit tests on OtelDataStreamIndexUtils (hyphens, mixed case, full disallowed-rune set, structural separators preserved, multi-substitution, 100-byte truncation, edge cases).
  • Both repos' captured-body tests updated with the sanitized expectations (test-org → test_org).

…eam rules

The OTel collector elasticsearchexporter (mapping.mode: otel) runs
each data stream component through sanitizeDataStreamField before
writing — disallowed runes (including the hyphen) become '_' and the
result is lowercased. The Gravitee-side lookup used the generic
IndexNameUtils.format which only lowercases, so any org/env id with
a hyphen produced a different index name on read than the collector
wrote, and every query missed.

Adds OtelDataStreamIndexUtils mirroring the exporter's algorithm
(see opentelemetry-collector-contrib/exporter/elasticsearchexporter/
data_stream_router.go) and routes both ElasticsearchTracingRepository
and ElasticsearchOtelLogRepository through it. Always applies the
dataset rule (the stricter superset over namespace) to every
placeholder, regardless of position in the template — losing
cosmetic hyphens in namespace-position values is preferable to
miscategorising a placeholder.
@remibaptistegio remibaptistegio requested a review from a team as a code owner May 29, 2026 14:58