feat: add CelestialBody ingestion layer to A2A handoff server by igor-holt · Pull Request #73 · Genesis-Conductor-Engine/Yennefer

Igor Holt (igor-holt) · 2026-04-10T00:35:45Z

Motivation

Provide an A2A ingestion pipeline to convert a raw repository into a standardized CelestialBody JSON object for downstream Yennefer/hand-off workflows.
Surface simple heuristics (mass, atmosphere, gravity, orbital_state) and a deterministic "seismic test" so external agents can validate repository invariants before handoff.
Expose ingestion over the existing A2A service so other agents can programmatically register or inspect repository artifacts.

Description

Added genesis-q-mem/celestial_ingestion.py containing a CelestialBody dataclass and ingestion pipeline that infers language counts, LOC, summary, mass, atmosphere, gravity, orbital_state, and a seismic_test (schema check + deterministic hash citation).
Added POST /api/a2a/ingest to genesis-q-mem/a2a_handoff_server.py which validates input (repo_path), calls ingest_repository, and returns the celestial_body payload or a 400 error on invalid input.
Integrated ingest_repository and IngestionError into the A2A server and kept all existing endpoints unchanged.
Added unit tests at genesis-q-mem/tests/test_celestial_ingestion.py to verify successful ingestion and missing-path error behavior.

Testing

Ran the unit tests with cd genesis-q-mem && pytest -q tests/test_celestial_ingestion.py and the final run reported 2 passed (tests cover successful ingestion and missing-path rejection).
An initial test collection error due to import path was observed and resolved in the test file (adjusted sys.path), after which automated tests passed.
No other automated test suites were modified; the A2A service continues to listen on port 8200 and only the single new endpoint POST /api/a2a/ingest was added.

Codex Task

cloudflare-workers-and-pages · 2026-04-10T00:35:56Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Updated (UTC)
❌ Deployment failed View logs	yennefer	`c22db0a`	Apr 10 2026, 12:36 AM

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c22db0ab2a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

chatgpt-codex-connector · 2026-04-10T00:38:56Z

+        seismic_test={},
+    )
+    payload = body.to_dict()
+    payload["seismic_test"] = _run_seismic_test(payload)


Compute seismic citation from the returned payload

The citation is generated before seismic_test is populated (payload["seismic_test"] is still {} at hash time), and then the payload is mutated with the final seismic block. As a result, consumers cannot validate the returned object by hashing the response body: the advertised deterministic citation will always disagree with the final payload content, breaking downstream verification workflows.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-10T00:38:56Z

+    mass = _compute_mass(file_count=file_count, loc=loc)
+    gravity = _compute_gravity(mass, atmosphere, language_counts)
+    orbital_state = {
+        "phase": metadata.get("phase", "stable"),


Validate metadata type before field access

ingest_repository assumes metadata is a mapping and calls .get(...), but the new /api/a2a/ingest endpoint accepts arbitrary JSON values for metadata. If a client sends a non-object (for example a string or array), this raises AttributeError and escapes as a 500 because the route only catches IngestionError; invalid client input should be rejected as 400 instead of crashing the request.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request introduces a new API endpoint for repository ingestion and adds utility functions to process repository metadata into CelestialBody objects. My review highlights a critical path traversal vulnerability in the API endpoint, as user-provided paths are not validated. Additionally, I have suggested optimizing the line counting logic to avoid loading entire files into memory, simplifying the redundant hash calculation in the seismic test, and refactoring the filesystem traversal to reduce I/O overhead.

gemini-code-assist · 2026-04-10T00:38:59Z

+def ingest_celestial_body():
+    """Convert a raw repository path into a CelestialBody payload."""
+    data = request.json or {}
+    repo_path = data.get("repo_path")


The repo_path is taken directly from user input and used to access the filesystem without validation. This presents a significant security risk as it allows for Path Traversal attacks. An attacker could provide paths like /etc/passwd or other sensitive directories, which the server would then attempt to process and potentially expose through the ingestion metadata. You should validate that the provided path is within a designated, safe base directory.

gemini-code-assist · 2026-04-10T00:38:59Z

+            continue
+        try:
+            if file.suffix.lower() in CODE_EXTENSIONS:
+                total += len(file.read_text(encoding="utf-8", errors="ignore").splitlines())


Using file.read_text().splitlines() loads the entire content of every source file into memory at once. For repositories containing very large files, this can lead to excessive memory consumption or Out-Of-Memory (OOM) errors. It is more efficient to iterate over the file object to count lines.

Suggested change

total += len(file.read_text(encoding="utf-8", errors="ignore").splitlines())

with file.open("r", encoding="utf-8", errors="ignore") as f:

total += sum(1 for _ in f)

gemini-code-assist · 2026-04-10T00:38:59Z

+    digest_a = hashlib.sha256(canonical.encode()).hexdigest()
+    digest_b = hashlib.sha256(canonical.encode()).hexdigest()
+    invariant = digest_a == digest_b


Calculating the same SHA-256 hash twice (digest_a and digest_b) from the same input and then comparing them is redundant. This does not provide a meaningful "invariance" check or stress test. If the intent is to provide a deterministic citation for the payload, a single calculation is sufficient.

Suggested change

digest_a = hashlib.sha256(canonical.encode()).hexdigest()

digest_b = hashlib.sha256(canonical.encode()).hexdigest()

invariant = digest_a == digest_b

digest_a = hashlib.sha256(canonical.encode()).hexdigest()

invariant = True

gemini-code-assist · 2026-04-10T00:38:59Z

+    language_counts = _infer_language_counts(root)
+    file_count = sum(language_counts.values())
+    loc = _line_count(root)


The repository is traversed twice: once in _infer_language_counts and again in _line_count. This doubles the I/O overhead, which can be significant for large repositories. Consider refactoring these into a single pass over the filesystem that collects both language statistics and line counts simultaneously.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a repository “ingestion” pipeline that infers heuristic metadata and exposes it via a new A2A API endpoint for downstream handoff workflows.

Changes:

Introduces CelestialBody + ingest_repository() to derive language/LOC signals, heuristics, and a deterministic “seismic test” hash.
Adds POST /api/a2a/ingest to call ingestion and return a celestial_body payload (or 400 on invalid input).
Adds pytest coverage for successful ingestion and missing-path behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File	Description
genesis-q-mem/celestial_ingestion.py	Implements CelestialBody model + repository ingestion heuristics and seismic test output.
genesis-q-mem/a2a_handoff_server.py	Adds a new API endpoint that validates input and returns ingestion results.
genesis-q-mem/tests/test_celestial_ingestion.py	Adds unit tests for ingestion success and error handling.

Copilot · 2026-04-10T03:55:13Z

+class CelestialBody:
+    body_id: str
+    name: str
+    source_path: str


The ingestion payload includes source_path as an absolute resolved server filesystem path. Since this payload is returned by the API, it can unintentionally disclose server directory structure. Consider omitting source_path from the returned payload, or returning only a sanitized identifier (e.g., repo basename) or a path relative to a configured allowed base directory.

Copilot · 2026-04-10T03:55:14Z

+    body = CelestialBody(
+        body_id=body_id,
+        name=body_name,
+        source_path=str(root),


The ingestion payload includes source_path as an absolute resolved server filesystem path. Since this payload is returned by the API, it can unintentionally disclose server directory structure. Consider omitting source_path from the returned payload, or returning only a sanitized identifier (e.g., repo basename) or a path relative to a configured allowed base directory.

Copilot · 2026-04-10T03:55:14Z

+@app.route('/api/a2a/ingest', methods=['POST'])
+def ingest_celestial_body():
+    """Convert a raw repository path into a CelestialBody payload."""
+    data = request.json or {}
+    repo_path = data.get("repo_path")


This endpoint accepts an arbitrary repo_path and passes it to ingest_repository, which walks the filesystem and reads files (README + code files). Without an allowlist/sandbox, a client can point this at any readable directory on the server, causing unintended data exposure. Recommend restricting ingestion to a configured base directory (and verifying the resolved path stays within it), or requiring a pre-registered repo identifier instead of a raw filesystem path.

Copilot · 2026-04-10T03:55:14Z

+    try:
+        body = ingest_repository(repo_path, metadata)


This endpoint accepts an arbitrary repo_path and passes it to ingest_repository, which walks the filesystem and reads files (README + code files). Without an allowlist/sandbox, a client can point this at any readable directory on the server, causing unintended data exposure. Recommend restricting ingestion to a configured base directory (and verifying the resolved path stays within it), or requiring a pre-registered repo identifier instead of a raw filesystem path.

Copilot · 2026-04-10T03:55:14Z

+    data = request.json or {}
+    repo_path = data.get("repo_path")
+    metadata = data.get("metadata", {})


metadata is assumed to be a dict but isn’t validated. If a client sends a non-object JSON value (e.g., string/list/null) for metadata, ingest_repository will later call metadata.get(...) and throw AttributeError, returning a 500 instead of a 400. Add input validation here (ensure metadata is a dict, otherwise return 400) and/or defensive validation inside ingest_repository.

Copilot · 2026-04-10T03:55:14Z

+
+    if not repo_path:
+        return jsonify({"error": "repo_path is required"}), 400
+


metadata is assumed to be a dict but isn’t validated. If a client sends a non-object JSON value (e.g., string/list/null) for metadata, ingest_repository will later call metadata.get(...) and throw AttributeError, returning a 500 instead of a 400. Add input validation here (ensure metadata is a dict, otherwise return 400) and/or defensive validation inside ingest_repository.

Suggested change

if not isinstance(metadata, dict):

return jsonify({"error": "metadata must be an object"}), 400

Copilot · 2026-04-10T03:55:15Z

+@app.route('/api/a2a/ingest', methods=['POST'])
+def ingest_celestial_body():
+    """Convert a raw repository path into a CelestialBody payload."""
+    data = request.json or {}


Using request.json can raise a BadRequest exception for invalid JSON with an application/json content-type, which would bypass this handler and return a default error. Prefer request.get_json(silent=True) or {} so you can consistently return a structured 400 response for invalid bodies.

Copilot · 2026-04-10T03:55:15Z

+def _run_seismic_test(payload: dict[str, Any]) -> dict[str, Any]:
+    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
+    digest_a = hashlib.sha256(canonical.encode()).hexdigest()
+    digest_b = hashlib.sha256(canonical.encode()).hexdigest()


The seismic test hashes the same canonical string twice, so invariant will always be true unless there’s an internal hashing failure. This is redundant work and doesn’t add validation value. Consider computing a single digest and basing the test on that, or (if the intent is to detect non-determinism) re-canonicalize from independently re-materialized data rather than re-hashing the same bytes.

Suggested change

digest_b = hashlib.sha256(canonical.encode()).hexdigest()

rematerialized = json.loads(canonical)

canonical_rematerialized = json.dumps(rematerialized, sort_keys=True, separators=(",", ":"))

digest_b = hashlib.sha256(canonical_rematerialized.encode()).hexdigest()

Copilot · 2026-04-10T03:55:15Z

+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from celestial_ingestion import IngestionError, ingest_repository


Modifying sys.path in the test file is a brittle workaround and can cause import-order issues as the test suite grows. Prefer configuring imports via packaging (e.g., making genesis-q-mem importable) or pytest configuration (e.g., pythonpath/PYTHONPATH in a pytest.ini/conftest.py) so tests don’t need to mutate global interpreter state.

Suggested change

import sys

from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from celestial_ingestion import IngestionError, ingest_repository

import importlib.util

from pathlib import Path

_MODULE_PATH = Path(__file__).resolve().parents[1] / "celestial_ingestion.py"

_SPEC = importlib.util.spec_from_file_location("celestial_ingestion", _MODULE_PATH)

_MODULE = importlib.util.module_from_spec(_SPEC)

assert _SPEC is not None and _SPEC.loader is not None

_SPEC.loader.exec_module(_MODULE)

IngestionError = _MODULE.IngestionError

ingest_repository = _MODULE.ingest_repository

Copilot · 2026-04-10T03:55:15Z

+def test_ingest_repository_rejects_missing_path():
+    try:
+        ingest_repository("/tmp/does-not-exist-123")
+        assert False, "Expected IngestionError"
+    except IngestionError:
+        assert True


This test is likely to be flaky because it hardcodes a path under /tmp that could exist on some systems. Also, the try/except + assert True/False pattern is harder to read and can hide unexpected exceptions. Prefer using a tmp-based guaranteed-missing path (e.g., from tmp_path) and pytest.raises(IngestionError).

feat: add celestial body ingestion endpoint for A2A

c22db0a

Igor Holt (igor-holt) added the codex label Apr 10, 2026 — with ChatGPT Codex Connector

chatgpt-codex-connector Bot reviewed Apr 10, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 10, 2026

View reviewed changes

Igor Holt (igor-holt) requested a review from Copilot April 10, 2026 03:48

Copilot AI reviewed Apr 10, 2026

View reviewed changes

	total += len(file.read_text(encoding="utf-8", errors="ignore").splitlines())
	with file.open("r", encoding="utf-8", errors="ignore") as f:
	total += sum(1 for _ in f)


		if not repo_path:
		return jsonify({"error": "repo_path is required"}), 400



	if not isinstance(metadata, dict):
	return jsonify({"error": "metadata must be an object"}), 400

-    digest_b = hashlib.sha256(canonical.encode()).hexdigest()
+    rematerialized = json.loads(canonical)
+    canonical_rematerialized = json.dumps(rematerialized, sort_keys=True, separators=(",", ":"))
+    digest_b = hashlib.sha256(canonical_rematerialized.encode()).hexdigest()

-import sys
-from pathlib import Path
-sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
-from celestial_ingestion import IngestionError, ingest_repository
+import importlib.util
+from pathlib import Path
+_MODULE_PATH = Path(__file__).resolve().parents[1] / "celestial_ingestion.py"
+_SPEC = importlib.util.spec_from_file_location("celestial_ingestion", _MODULE_PATH)
+_MODULE = importlib.util.module_from_spec(_SPEC)
+assert _SPEC is not None and _SPEC.loader is not None
+_SPEC.loader.exec_module(_MODULE)
+IngestionError = _MODULE.IngestionError
+ingest_repository = _MODULE.ingest_repository

Conversation

Igor Holt (igor-holt) commented Apr 10, 2026

Motivation

Description

Testing

Uh oh!

cloudflare-workers-and-pages Bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying with Cloudflare Workers

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cloudflare-workers-and-pages Bot commented Apr 10, 2026 •

edited

Loading