Refine Presto worker config validation#6
Conversation
WalkthroughAdds automatic worker memory detection and env var configuration: init.py now uses psutil to compute and export query/system memory values, validates required coordinator keys, and updates discovery URI; requirements and templates were updated to include new memory variables. Changes
Sequence Diagram(s)sequenceDiagram
participant Init as init.py
participant Ps as psutil
participant Env as Env / Filesystem
Init->>Init: start worker env setup
Init->>Init: _get_required_env_var(PRESTO_COORDINATOR_SERVICENAME)
alt key missing
Init-->>Env: log error, return False (abort setup)
else key present
Init->>Init: _get_required_env_var(PRESTO_COORDINATOR_HTTPPORT)
Init->>Ps: get total system memory
Ps-->>Init: total memory bytes
rect rgb(220,240,200)
Note over Init: compute query/system memory GB\nand system mem limit (math)
end
Init->>Env: set PRESTO_WORKER_CONFIGPROPERTIES_* env vars
Init->>Init: build PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI
Init-->>Env: write/update worker.env and templates
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
tools/deployment/presto-clp/worker.env (1)
3-5: Avoid hardcoding generated memory values in template env.Since init.py always computes these, keep them unset or commented to prevent drift/confusion across hosts.
Proposed diff:
+## Memory values are auto-generated by scripts/init.py based on the host/container limit. +## Leave unset unless you intend to override. -PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEMORY_GB=8 -PRESTO_WORKER_CONFIGPROPERTIES_QUERY_MEMORY_GB=5 +# PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEMORY_GB= +# PRESTO_WORKER_CONFIGPROPERTIES_QUERY_MEMORY_GB= ... -PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEM_LIMIT_GB=8 +# PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEM_LIMIT_GB=Please confirm the runtime actually sources the init.py–generated env file (not worker.env) so these can remain commented.
Also applies to: 8-8
tools/deployment/presto-clp/scripts/init.py (3)
3-4: Guard psutil import to fail fast with a clear message.Top-level import will crash before logging if psutil isn’t installed. Make it explicit and check at use-site.
Proposed diff:
-import math -import os +import math +import os @@ -import psutil +try: + import psutil # type: ignore[import-not-found] +except ImportError: + psutil = None # type: ignore[assignment]And at the start of
_configure_worker_memory_env_vars:def _configure_worker_memory_env_vars(env_vars: Dict[str, str]) -> bool: @@ - try: + if psutil is None: + logger.error( + "psutil is required to detect system memory. Install with: pip install -r tools/deployment/presto-clp/scripts/requirements.txt" + ) + return False + try: total_memory_bytes = int(psutil.virtual_memory().total)Also applies to: 9-9
236-243: Validate coordinator port value before composing discovery URI.Ensure the port is numeric and within [1, 65535] to catch misconfig early.
Proposed diff:
- env_vars["PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI"] = ( - f"http://{coordinator_service_name}:{coordinator_http_port}" - ) + try: + port_int = int(coordinator_http_port) + if not (1 <= port_int <= 65535): + raise ValueError + except ValueError: + logger.error( + "Invalid PRESTO_COORDINATOR_HTTPPORT '%s' in '%s'", + coordinator_http_port, + coordinator_common_env_file_path, + ) + return False + env_vars["PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI"] = ( + f"http://{coordinator_service_name}:{port_int}" + )
305-314: Treat whitespace-only values as empty for required env keys.dotenv entries like KEY=" " should fail validation, not pass.
Proposed diff:
- value = config.get(key) - if value is None: + raw_value = config.get(key) + if raw_value is None: logger.error("Missing required key %s in '%s'", key, env_file_path) raise KeyError(key) - - if value == "": + value = raw_value.strip() + if value == "": logger.error("Empty value for required key %s in '%s'", key, env_file_path) raise KeyError(key)Consider applying the same empty-string check to
_get_required_config_valuefor parity with file-based configs.tools/deployment/presto-clp/scripts/requirements.txt (1)
1-1: Pin versions for reproducible, secure installs with correct major version bounds.Unpinned packages risk breakage and supply-chain drift. The proposed diff needs a correction: psutil's upper bound should allow major version 7, which is now stable.
Proposed diff:
-psutil -python-dotenv -PyYAML +psutil>=5.9,<8 +python-dotenv>=1.0,<2 +PyYAML>=6.0,<7psutil 7.1.2 (released Oct 25, 2025) is the latest stable version, so the original constraint
<7was too restrictive.Optionally manage dependencies via pip-tools (constraints/lock).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tools/deployment/presto-clp/scripts/init.py(2 hunks)tools/deployment/presto-clp/scripts/requirements.txt(1 hunks)tools/deployment/presto-clp/worker.env(1 hunks)tools/deployment/presto-clp/worker/config-template/config.properties(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
tools/deployment/presto-clp/scripts/init.py
257-257: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
tools/deployment/presto-clp/worker/config-template/config.properties (1)
6-8: LGTM: property placeholders match new env vars.The three properties line up with init.py outputs. No issues spotted.
| try: | ||
| total_memory_bytes = int(psutil.virtual_memory().total) | ||
| except (AttributeError, ValueError) as exc: | ||
| logger.error("Unable to determine total system memory: %s", exc) | ||
| return False | ||
|
|
||
| total_memory_gb = total_memory_bytes / (1024**3) | ||
| total_memory_gb_floor = max(1, math.floor(total_memory_gb)) | ||
|
|
||
| system_memory_gb = min( | ||
| total_memory_gb_floor, | ||
| max(1, math.floor(total_memory_gb * 0.90)), | ||
| ) | ||
| query_memory_gb = min( | ||
| system_memory_gb, | ||
| max(1, math.floor(system_memory_gb * (2 / 3))), | ||
| ) | ||
| system_mem_limit_gb = min( | ||
| total_memory_gb_floor, | ||
| max(system_memory_gb, max(1, math.floor(total_memory_gb * 0.94))), | ||
| ) | ||
|
|
There was a problem hiding this comment.
Make memory detection cgroup-aware; switch to exception logging.
psutil reports host RAM, not container limits. In containers, this can over-allocate Presto and cause OOM. Also, Ruff flags TRY400 here.
Apply these changes:
- Detect Linux cgroup limits (v2: /sys/fs/cgroup/memory.max; v1: /sys/fs/cgroup/memory/memory.limit_in_bytes) and cap to the lower of psutil total and cgroup limit.
- Use logger.exception when memory detection fails.
Patch inside function:
- try:
- total_memory_bytes = int(psutil.virtual_memory().total)
- except (AttributeError, ValueError) as exc:
- logger.error("Unable to determine total system memory: %s", exc)
- return False
+ try:
+ total_memory_bytes = int(psutil.virtual_memory().total)
+ except Exception:
+ logger.exception("Unable to determine total system memory")
+ return False
+
+ # Respect container memory limits when present (cgroup v2/v1)
+ cgroup_limit_bytes = _read_cgroup_memory_limit_bytes()
+ if cgroup_limit_bytes and cgroup_limit_bytes > 0:
+ total_memory_bytes = min(total_memory_bytes, cgroup_limit_bytes)Add this helper (outside the function) to keep things tidy:
from pathlib import Path
def _read_cgroup_memory_limit_bytes() -> Optional[int]:
# cgroup v2
v2 = Path("/sys/fs/cgroup/memory.max")
# cgroup v1
v1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")
for p in (v2, v1):
try:
with open(p, "r", encoding="utf-8") as f:
raw = f.read().strip()
if raw and raw.lower() != "max":
val = int(raw)
if val > 0:
return val
except FileNotFoundError:
continue
except Exception as exc:
logger.debug("Ignoring cgroup memory read error from %s: %s", p, exc)
return NoneThis prevents overshooting container RAM and aligns with typical Presto deployment patterns.
🧰 Tools
🪛 Ruff (0.14.1)
257-257: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/init.py around lines 254 to 275,
psutil.virtual_memory() is being used alone which returns host RAM and can
over-allocate inside containers; change the logic to detect cgroup limits and
cap total_memory_bytes to the lower of psutil total and any cgroup memory limit
(check cgroup v2 /sys/fs/cgroup/memory.max and cgroup v1
/sys/fs/cgroup/memory/memory.limit_in_bytes, ignore values of "max" or
non-positive, and treat missing files as no limit). Add a helper function
(placed outside the current function) that reads those two paths robustly,
returns an Optional[int] cgroup limit, and logs debug on read errors; then use
that helper to set total_memory_bytes = min(psutil_total, cgroup_limit) when
cgroup_limit is present. Also switch the existing except handler to
logger.exception(...) so exceptions during memory detection are logged with
traceback.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tools/deployment/presto-clp/scripts/init.py (1)
251-286: Container memory limits are not respected; missing error handling.This is the same concern raised in a previous review: psutil reports host RAM, which can cause over-allocation in containers leading to OOM kills. Additionally, there's no error handling for the psutil call.
As suggested in the previous review:
- Detect cgroup limits (v2:
/sys/fs/cgroup/memory.max; v1:/sys/fs/cgroup/memory/memory.limit_in_bytes) and cap to the lower of psutil total and cgroup limit- Add try-except with
logger.exceptionfor the memory detection callBased on learnings
🧹 Nitpick comments (1)
tools/deployment/presto-clp/scripts/init.py (1)
258-261: Simplify redundant min() operation.The outer
min(total_memory_gb_floor, ...)is redundant sincemath.floor(total_memory_gb * 0.90)will always be ≤total_memory_gb_floor.Apply this diff:
- system_memory_gb = min( - total_memory_gb_floor, - max(1, math.floor(total_memory_gb * 0.90)), - ) + system_memory_gb = max(1, math.floor(total_memory_gb * 0.90))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tools/deployment/presto-clp/scripts/init.py(2 hunks)tools/deployment/presto-clp/worker.env(1 hunks)
🔇 Additional comments (4)
tools/deployment/presto-clp/worker.env (1)
3-5: Clarify the purpose of these hardcoded memory values.The init.py script now dynamically computes these memory environment variables using psutil. Are these hardcoded values intended as:
- Fallback defaults if init.py fails?
- Example values for documentation?
- Temporary values that get overwritten by the init script output?
If they're meant to be computed dynamically, consider adding a comment explaining their purpose to avoid confusion.
tools/deployment/presto-clp/scripts/init.py (3)
3-4: LGTM!The new imports are correctly added to support memory calculation (math) and system memory detection (psutil).
Also applies to: 9-9
232-246: LGTM!The refactored validation logic correctly uses the new
_get_required_env_varhelper to enforce explicit handling of missing/empty values, and properly constructs the discovery URI.
289-309: LGTM!The helper correctly enforces explicit validation for both missing and empty environment variable values, with appropriate error logging and exception handling.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tools/deployment/presto-clp/scripts/init.py (1)
251-287: Add error handling and cgroup-awareness to memory detection.The function has two critical issues:
No error handling: If
psutil.virtual_memory()fails, the function will crash. Wrap the call in a try-except block.Container limits not respected:
psutil.virtual_memory().totalreports host RAM, not container cgroup limits. In containerized Presto deployments, this can allocate more memory than the container has, causing OOM kills.Apply the solution from the previous review:
Add a helper function to read cgroup limits:
from pathlib import Path def _read_cgroup_memory_limit_bytes() -> Optional[int]: """Reads container memory limit from cgroup v2 or v1.""" # cgroup v2 v2 = Path("/sys/fs/cgroup/memory.max") # cgroup v1 v1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes") for p in (v2, v1): try: with open(p, "r", encoding="utf-8") as f: raw = f.read().strip() if raw and raw.lower() != "max": val = int(raw) if val > 0: return val except FileNotFoundError: continue except Exception as exc: logger.debug("Ignoring cgroup memory read error from %s: %s", p, exc) return NoneThen update
_configure_worker_memory_env_vars:def _configure_worker_memory_env_vars(env_vars: Dict[str, str]) -> bool: """Adds memory-related environment variables for the worker.""" - total_memory_bytes = psutil.virtual_memory().total + try: + total_memory_bytes = int(psutil.virtual_memory().total) + except Exception: + logger.exception("Unable to determine total system memory") + return False + + # Respect container memory limits when present (cgroup v2/v1) + cgroup_limit_bytes = _read_cgroup_memory_limit_bytes() + if cgroup_limit_bytes and cgroup_limit_bytes > 0: + total_memory_bytes = min(total_memory_bytes, cgroup_limit_bytes) + total_memory_gb = total_memory_bytes / (1024**3)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tools/deployment/presto-clp/scripts/init.py(2 hunks)tools/deployment/presto-clp/worker.env(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tools/deployment/presto-clp/worker.env
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
tools/deployment/presto-clp/scripts/init.py (2)
3-3: LGTM!The new imports are properly utilized in the memory configuration logic.
Also applies to: 9-9
232-246: LGTM!The refactoring properly enforces explicit handling of missing coordinator config entries. The try-except pattern cleanly propagates validation failures by returning
False.
| def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str: | ||
| """ | ||
| Fetches a required entry from an environment variables dictionary. | ||
|
|
||
| :param config: | ||
| :param key: The key that must exist in `config` with a non-empty value. | ||
| :param env_file_path: | ||
| :return: The string value associated with `key`. | ||
| :raises KeyError: If `key` is missing. | ||
| """ | ||
|
|
||
| value = config.get(key) | ||
| if value is None: | ||
| logger.error("Missing required key %s in '%s'", key, env_file_path) | ||
| raise KeyError(key) | ||
|
|
||
| return value | ||
|
|
There was a problem hiding this comment.
Consider validating empty strings to match the docstring.
The docstring states the function requires a "non-empty value," but the implementation only checks for None. Empty strings would pass validation but might be invalid for coordinator settings.
If empty strings should be rejected, apply this diff:
def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str:
"""
Fetches a required entry from an environment variables dictionary.
:param config:
:param key: The key that must exist in `config` with a non-empty value.
:param env_file_path:
:return: The string value associated with `key`.
:raises KeyError: If `key` is missing.
"""
value = config.get(key)
- if value is None:
+ if not value:
logger.error("Missing required key %s in '%s'", key, env_file_path)
raise KeyError(key)
return valueAlternatively, update the docstring to clarify that only None is checked:
- :param key: The key that must exist in `config` with a non-empty value.
+ :param key: The key that must exist in `config` with a non-None value.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str: | |
| """ | |
| Fetches a required entry from an environment variables dictionary. | |
| :param config: | |
| :param key: The key that must exist in `config` with a non-empty value. | |
| :param env_file_path: | |
| :return: The string value associated with `key`. | |
| :raises KeyError: If `key` is missing. | |
| """ | |
| value = config.get(key) | |
| if value is None: | |
| logger.error("Missing required key %s in '%s'", key, env_file_path) | |
| raise KeyError(key) | |
| return value | |
| def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str: | |
| """ | |
| Fetches a required entry from an environment variables dictionary. | |
| :param config: | |
| :param key: The key that must exist in `config` with a non-empty value. | |
| :param env_file_path: | |
| :return: The string value associated with `key`. | |
| :raises KeyError: If `key` is missing. | |
| """ | |
| value = config.get(key) | |
| if not value: | |
| logger.error("Missing required key %s in '%s'", key, env_file_path) | |
| raise KeyError(key) | |
| return value |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/init.py around lines 289 to 306, the
function claims to require a "non-empty value" but only checks for None; change
the guard to reject empty or whitespace-only strings (e.g., treat value is None
or value.strip() == "" as missing), keep the same logger.error and raise
KeyError(key) behavior, and return the value otherwise; if you instead prefer
only to disallow None, update the docstring to remove "non-empty" wording.
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68fcd8c77ea48327afeeddb4baf320f9
Summary by CodeRabbit
New Features
Configuration
Dependencies