Commit abd27fb

MikeGoldsmith, codeboten, and xrmx authored
config: load and validate generated config models (#4898)
* config: generate model code from json schema

  Proposing that the first step towards implementing OpenTelemetry Configuration is to produce the model code from the json schema. I did a quick search for tools available to do this and came across datamodel-codegen which seems to do what I expected.

  Will open following pull requests (in draft) to use this model code, I just want to keep these as clearly separated as possible to make reviewing them easier.

  Signed-off-by: alex boten <223565+codeboten@users.noreply.github.com>

* update tox command’s deps and allowlist
* add use-union-operator to datamodel-codegen and regenerate models file
* add changelog
* disable union-operator and set target python to 3.10
* ignore generated file for linting
* fix TypeAlias import for python 3.9 in generated models file
* update uv.lock with new dev dependencies
* run precommit

  Signed-off-by: alex boten <223565+codeboten@users.noreply.github.com>

* config: add yaml/json file loading and env var substitution
* Fix pyright config structure in pyproject.toml

  datamodel-codegen section was inserted between [tool.pyright] and its include/exclude config, causing pyright to check entire repo (599 files) instead of just included paths. Moved datamodel-codegen section after pyright config.

* Fix typecheck and pylint errors

  - Fix re.sub callback return type (must return str, not str | None)
  - Rename DOLLAR_PLACEHOLDER to dollar_placeholder (pylint naming)
  - Rename f to temp_file/config_file (pylint naming)
  - Update uv.lock

* Add types-PyYAML and file-configuration extra to typecheck

  Fixes yaml import warning by installing PyYAML type stubs and the file-configuration optional dependencies.

* run precommit
* config: add jsonschema validation and vendor OTel config schema

  - Bundle schema.json (v1.0.0-rc.3) alongside models.py
  - Add jsonschema >= 4.0 to file-configuration optional extra
  - Validate parsed config against schema before constructing model, with field path included in error messages for nested violations
  - Switch datamodel-codegen to use local schema (drop [http] extra)
  - Add schema validation tests (wrong type, missing required, nested path, enum violation)

  Assisted-by: Claude Sonnet 4.6

* config: fix pylint warnings in _loader.py

  - Replace global statement with list cache in _get_schema
  - Extract _validate_schema helper to reduce branches/statements in load_config_file

  Assisted-by: Claude Sonnet 4.6

* fix: bump typing_extensions to 4.12.0 for Python 3.13+ compat

  jsonschema's `referencing` dep uses TypeVar defaults which requires typing_extensions>=4.12.0 on Python 3.13+ (4.10.0 raises AttributeError on TypeVar.__default__).

  Assisted-by: Claude Sonnet 4.6

* config: address review feedback - simplify env substitution and bump deps

  - replace $$ placeholder trick with single-pass regex matching both $$ and ${VAR} patterns, correctly handling edge case of $$${VAR}
  - bump pyyaml to 6.0.3 and jsonschema to 4.26.0 in test-requirements
  - add PR link to CHANGELOG entry

  Co-authored-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
  Assisted-by: Claude Sonnet 4.6

* fix: downgrade jsonschema to 4.25.1 for Python 3.9 compatibility

  jsonschema 4.26.0 requires Python>=3.10.

  Assisted-by: Claude Sonnet 4.6

---------

Signed-off-by: alex boten <223565+codeboten@users.noreply.github.com>
Co-authored-by: alex boten <223565+codeboten@users.noreply.github.com>
Co-authored-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
1 parent f08e522 commit abd27fb

17 files changed

Lines changed: 4326 additions & 749 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## Unreleased

+- `opentelemetry-sdk`: Add file configuration support with YAML/JSON loading, environment variable substitution, and schema validation against the vendored OTel config JSON schema
+  ([#4898](https://github.com/open-telemetry/opentelemetry-python/pull/4898))
 - Fix intermittent CI failures in `getting-started` and `tracecontext` jobs caused by GitHub git CDN SHA propagation lag by installing contrib packages from the already-checked-out local copy instead of a second git clone
   ([#4958](https://github.com/open-telemetry/opentelemetry-python/pull/4958))
 - `opentelemetry-sdk`: fix type annotations on `MetricReader` and related types
opentelemetry-sdk/pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -32,6 +32,12 @@ dependencies = [
   "typing-extensions >= 4.5.0",
 ]

+[project.optional-dependencies]
+file-configuration = [
+  "pyyaml >= 6.0",
+  "jsonschema >= 4.0",
+]
+
 [project.entry-points.opentelemetry_environment_variables]
 sdk = "opentelemetry.sdk.environment_variables"
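The `file-configuration` extra above pulls in `pyyaml` and `jsonschema`, and the loader in this commit raises `ImportError` at import time when either is missing. A minimal sketch of checking for them up front with only the standard library follows. The helper name `missing_file_configuration_deps` is hypothetical and not part of the SDK; `yaml` is the import name of the `pyyaml` distribution.

```python
import importlib.util


def missing_file_configuration_deps(modules=("yaml", "jsonschema")):
    """Return the import names from `modules` that cannot be found.

    Hypothetical helper for illustration; not part of opentelemetry-sdk.
    """
    return [
        name for name in modules if importlib.util.find_spec(name) is None
    ]


missing = missing_file_configuration_deps()
if missing:
    print(
        "Missing optional dependencies:",
        ", ".join(missing),
        "(try: pip install 'opentelemetry-sdk[file-configuration]')",
    )
```

A check like this lets an application report the missing extra in its own words instead of surfacing the loader's `ImportError`.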

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Copyright The OpenTelemetry Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""OpenTelemetry SDK File Configuration.
+
+This module provides support for configuring the OpenTelemetry SDK
+using declarative configuration files (YAML or JSON).
+
+Example:
+    >>> from opentelemetry.sdk._configuration.file import load_config_file
+    >>> config = load_config_file("otel-config.yaml")
+    >>> print(config.file_format)
+    1.0-rc.3
+"""
+
+from opentelemetry.sdk._configuration.file._env_substitution import (
+    EnvSubstitutionError,
+    substitute_env_vars,
+)
+from opentelemetry.sdk._configuration.file._loader import (
+    ConfigurationError,
+    load_config_file,
+)
+
+__all__ = [
+    "load_config_file",
+    "substitute_env_vars",
+    "ConfigurationError",
+    "EnvSubstitutionError",
+]
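A caller-side sketch of the public API exported above, assuming the private module path shown in the diff stays as-is. The wrapper `try_load` and the file name `does-not-exist.yaml` are hypothetical; the import is guarded so the snippet also runs where the SDK or the `file-configuration` extra is not installed.

```python
try:
    from opentelemetry.sdk._configuration.file import (
        ConfigurationError,
        load_config_file,
    )
except ImportError:  # SDK or the file-configuration extra is not installed
    ConfigurationError = load_config_file = None


def try_load(path):
    """Return (config, error_message); exactly one of the two is None.

    Hypothetical wrapper for illustration; not part of the SDK.
    """
    if load_config_file is None:
        return None, "file configuration support is not installed"
    try:
        return load_config_file(path), None
    except ConfigurationError as exc:
        # The loader chains the underlying error, so exc.__cause__ can be
        # inspected when more detail is needed.
        return None, str(exc)


config, error = try_load("does-not-exist.yaml")
print(error)
```

Returning the message instead of re-raising keeps a bad or missing file from aborting startup; whether that is the right policy depends on the application.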
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+# Copyright The OpenTelemetry Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Environment variable substitution for configuration files."""
+
+import logging
+import os
+import re
+
+_logger = logging.getLogger(__name__)
+
+
+class EnvSubstitutionError(Exception):
+    """Raised when environment variable substitution fails.
+
+    This occurs when a ${VAR} reference is found but the environment
+    variable is not set and no default value is provided.
+    """
+
+
+def substitute_env_vars(text: str) -> str:
+    """Substitute environment variables in configuration text.
+
+    Supports the following syntax:
+    - ${VAR}: Substitute with environment variable VAR. Raises error if not found.
+    - ${VAR:-default}: Substitute with VAR if set, otherwise use default value.
+    - $$: Escape sequence for literal $.
+
+    Args:
+        text: Configuration text with potential ${VAR} placeholders.
+
+    Returns:
+        Text with environment variables substituted.
+
+    Raises:
+        EnvSubstitutionError: If a required environment variable is not found.
+
+    Examples:
+        >>> os.environ['SERVICE_NAME'] = 'my-service'
+        >>> substitute_env_vars('name: ${SERVICE_NAME}')
+        'name: my-service'
+        >>> substitute_env_vars('name: ${MISSING:-default}')
+        'name: default'
+        >>> substitute_env_vars('price: $$100')
+        'price: $100'
+    """
+    # Pattern matches $$ (escape sequence) or ${VAR_NAME} / ${VAR_NAME:-default_value}
+    # Handling both in a single pass ensures $$ followed by ${VAR} works correctly
+    pattern = r"\$\$|\$\{([A-Za-z_][A-Za-z0-9_]*)(:-([^}]*))?\}"
+
+    def replace_var(match) -> str:
+        if match.group(1) is None:
+            # Matched $$, return literal $
+            return "$"
+
+        var_name = match.group(1)
+        has_default = match.group(2) is not None
+        default_value = match.group(3) if has_default else None
+
+        value = os.environ.get(var_name)
+
+        if value is None:
+            if has_default:
+                return default_value or ""
+            _logger.error(
+                "Environment variable '%s' not found and no default provided",
+                var_name,
+            )
+            raise EnvSubstitutionError(
+                f"Environment variable '{var_name}' not found and no default provided"
+            )
+
+        return value
+
+    return re.sub(pattern, replace_var, text)
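The single-pass behaviour described in the comments above can be tried in isolation. The sketch below reuses the regex from the diff but reads values from a plain dict rather than `os.environ`, so it runs without touching the process environment. The function name `substitute` and the `KeyError` for a missing variable are stand-ins for illustration, not the module's API.

```python
import re

# Same pattern as in the diff: "$$", "${NAME}", or "${NAME:-default}".
PATTERN = re.compile(r"\$\$|\$\{([A-Za-z_][A-Za-z0-9_]*)(:-([^}]*))?\}")


def substitute(text, env):
    """Hypothetical stand-in for substitute_env_vars that reads from `env`."""

    def replace(match) -> str:
        name = match.group(1)
        if name is None:
            # The "$$" alternative matched: emit a literal "$".
            return "$"
        if name in env:
            return env[name]
        if match.group(2) is not None:
            # A ":-default" part was present (possibly empty).
            return match.group(3) or ""
        raise KeyError(name)

    return PATTERN.sub(replace, text)


env = {"HOST": "localhost"}
print(substitute("endpoint: http://${HOST}:4317", env))  # endpoint: http://localhost:4317
print(substitute("literal: $$${HOST}", env))  # literal: $localhost
print(substitute("escaped: $${HOST}", env))  # escaped: ${HOST}
print(substitute("${MISSING:-fallback}", env))  # fallback
```

Because the regex scans left to right and consumes `$$` before looking at what follows, `$$${HOST}` yields a literal `$` followed by the substituted value, while `$${HOST}` leaves `${HOST}` untouched.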
Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
+# Copyright The OpenTelemetry Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Configuration file loading and parsing."""
+
+import importlib.resources
+import json
+import logging
+from pathlib import Path
+from typing import Any
+
+from opentelemetry.sdk._configuration.file._env_substitution import (
+    substitute_env_vars,
+)
+from opentelemetry.sdk._configuration.models import OpenTelemetryConfiguration
+
+try:
+    import yaml
+except ImportError as exc:
+    raise ImportError(
+        "File configuration requires pyyaml. "
+        "Install with: pip install opentelemetry-sdk[file-configuration]"
+    ) from exc
+
+try:
+    import jsonschema
+except ImportError as exc:
+    raise ImportError(
+        "File configuration requires jsonschema. "
+        "Install with: pip install opentelemetry-sdk[file-configuration]"
+    ) from exc
+
+_schema_cache: list[dict] = []
+
+
+def _get_schema() -> dict:
+    if not _schema_cache:
+        schema_path = (
+            importlib.resources.files("opentelemetry.sdk._configuration")
+            / "schema.json"
+        )
+        _schema_cache.append(
+            json.loads(schema_path.read_text(encoding="utf-8"))
+        )
+    return _schema_cache[0]
+
+
+_logger = logging.getLogger(__name__)
+
+
+class ConfigurationError(Exception):
+    """Raised when configuration file loading, parsing, or validation fails.
+
+    This includes errors from:
+    - File not found or inaccessible
+    - Invalid YAML/JSON syntax
+    - Schema validation failures
+    - Environment variable substitution errors
+    """
+
+
+def load_config_file(file_path: str) -> OpenTelemetryConfiguration:
+    """Load and parse an OpenTelemetry configuration file.
+
+    Supports YAML and JSON formats. Performs environment variable substitution
+    before parsing.
+
+    Args:
+        file_path: Path to the configuration file (.yaml, .yml, or .json).
+
+    Returns:
+        Parsed OpenTelemetryConfiguration object.
+
+    Raises:
+        ConfigurationError: If the file cannot be read, parsed, or validated,
+            or if a required environment variable is missing.
+
+    Examples:
+        >>> config = load_config_file("otel-config.yaml")
+        >>> print(config.tracer_provider)
+    """
+    path = Path(file_path)
+
+    if not path.exists():
+        _logger.error("Configuration file not found: %s", file_path)
+        raise ConfigurationError(f"Configuration file not found: {file_path}")
+
+    if not path.is_file():
+        _logger.error("Configuration path is not a file: %s", file_path)
+        raise ConfigurationError(
+            f"Configuration path is not a file: {file_path}"
+        )
+
+    try:
+        with open(path, encoding="utf-8") as config_file:
+            content = config_file.read()
+    except OSError as exc:
+        _logger.exception("Failed to read configuration file: %s", file_path)
+        raise ConfigurationError(
+            f"Failed to read configuration file: {file_path}"
+        ) from exc
+
+    # Perform environment variable substitution
+    try:
+        content = substitute_env_vars(content)
+    except Exception as exc:
+        raise ConfigurationError(
+            f"Environment variable substitution failed: {exc}"
+        ) from exc
+
+    # Parse based on file extension
+    suffix = path.suffix.lower()
+    try:
+        if suffix in (".yaml", ".yml"):
+            data = yaml.safe_load(content)
+        elif suffix == ".json":
+            data = json.loads(content)
+        else:
+            _logger.error("Unsupported file format: %s", suffix)
+            raise ConfigurationError(
+                f"Unsupported file format: {suffix}. Use .yaml, .yml, or .json"
+            )
+    except yaml.YAMLError as exc:
+        _logger.exception("Failed to parse YAML from %s", file_path)
+        raise ConfigurationError(f"Failed to parse YAML: {exc}") from exc
+    except json.JSONDecodeError as exc:
+        _logger.exception("Failed to parse JSON from %s", file_path)
+        raise ConfigurationError(f"Failed to parse JSON: {exc}") from exc
+
+    if data is None:
+        _logger.error("Configuration file is empty: %s", file_path)
+        raise ConfigurationError("Configuration file is empty")
+
+    if not isinstance(data, dict):
+        _logger.error(
+            "Configuration must be a mapping/object, got %s",
+            type(data).__name__,
+        )
+        raise ConfigurationError(
+            f"Configuration must be a mapping/object, got {type(data).__name__}"
+        )
+
+    _validate_schema(data)
+
+    # Convert to OpenTelemetryConfiguration model
+    try:
+        config = _dict_to_model(data)
+    except Exception as exc:
+        _logger.exception(
+            "Failed to validate configuration from %s", file_path
+        )
+        raise ConfigurationError(
+            f"Failed to validate configuration: {exc}"
+        ) from exc
+
+    return config
+
+
+def _validate_schema(data: dict) -> None:
+    """Validate configuration dict against the OTel configuration JSON schema.
+
+    Raises:
+        ConfigurationError: If the data does not conform to the schema.
+    """
+    try:
+        jsonschema.validate(
+            instance=data,
+            schema=_get_schema(),
+            cls=jsonschema.Draft202012Validator,
+        )
+    except jsonschema.ValidationError as exc:
+        raise ConfigurationError(
+            f"Configuration does not match schema: {exc.message} "
+            f"(at {' -> '.join(str(p) for p in exc.absolute_path)})"
+            if exc.absolute_path
+            else f"Configuration does not match schema: {exc.message}"
+        ) from exc
+    except jsonschema.SchemaError as exc:
+        raise ConfigurationError(
+            f"Invalid configuration schema: {exc.message}"
+        ) from exc
+
+
+def _dict_to_model(data: dict[str, Any]) -> OpenTelemetryConfiguration:
+    """Convert dictionary to OpenTelemetryConfiguration model.
+
+    Uses the generated dataclass from models.py. This provides basic
+    validation through dataclass field types.
+
+    Args:
+        data: Parsed configuration dictionary.
+
+    Returns:
+        OpenTelemetryConfiguration instance.
+
+    Raises:
+        TypeError: If data doesn't match expected structure.
+        ValueError: If values are invalid.
+    """
+    # Construct the top-level model from the validated dict. Nested fields
+    # are stored as dicts rather than their dataclass types; factory functions
+    # in later PRs will handle the full recursive conversion when building
+    # SDK objects.
+    try:
+        config = OpenTelemetryConfiguration(**data)
+        return config
+    except TypeError as exc:
+        # Provide more helpful error message
+        raise TypeError(
+            f"Configuration structure is invalid. "
+            f"Check that all required fields are present and correctly typed: {exc}"
+        ) from exc
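To show what the field path in a schema error looks like, here is a small sketch of the message layout used by `_validate_schema`, written without `jsonschema` so it runs on its own. It assumes `absolute_path` behaves like the deque of keys and indices that `jsonschema.ValidationError` exposes. The function name `format_schema_error` and the example path are hypothetical.

```python
from collections import deque


def format_schema_error(message, absolute_path):
    """Mirror the message layout of _validate_schema (illustrative only)."""
    if absolute_path:
        location = " -> ".join(str(part) for part in absolute_path)
        return (
            f"Configuration does not match schema: {message} (at {location})"
        )
    return f"Configuration does not match schema: {message}"


# Hypothetical violation nested inside a list: keys and indices are joined
# in order, so integer indices appear between the surrounding key names.
nested = deque(["tracer_provider", "processors", 0, "batch"])
print(format_schema_error("'abc' is not of type 'integer'", nested))

# A violation at the document root has an empty path, so no location is added.
print(format_schema_error("'file_format' is a required property", deque()))
```

The root-level case matters because an empty path is falsy, which is what selects the shorter message in the loader's conditional expression.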
