23 changes: 23 additions & 0 deletions genesis-q-mem/a2a_handoff_server.py
@@ -15,6 +15,8 @@
from flask_cors import CORS
import threading

from celestial_ingestion import IngestionError, ingest_repository

app = Flask(__name__)
CORS(app)

@@ -244,6 +246,27 @@ def list_sessions():
        "count": len(manager.active_sessions)
    })


@app.route('/api/a2a/ingest', methods=['POST'])
def ingest_celestial_body():
    """Convert a raw repository path into a CelestialBody payload."""
    data = request.json or {}

Copilot AI Apr 10, 2026

Using request.json can raise a BadRequest exception for invalid JSON with an application/json content-type, which would bypass this handler and return a default error. Prefer request.get_json(silent=True) or {} so you can consistently return a structured 400 response for invalid bodies.

    repo_path = data.get("repo_path")

security-high high

The repo_path is taken directly from user input and used to access the filesystem without validation. This presents a significant security risk as it allows for Path Traversal attacks. An attacker could provide paths like /etc/passwd or other sensitive directories, which the server would then attempt to process and potentially expose through the ingestion metadata. You should validate that the provided path is within a designated, safe base directory.

Comment on lines +250 to +254
Copilot AI Apr 10, 2026

This endpoint accepts an arbitrary repo_path and passes it to ingest_repository, which walks the filesystem and reads files (README + code files). Without an allowlist/sandbox, a client can point this at any readable directory on the server, causing unintended data exposure. Recommend restricting ingestion to a configured base directory (and verifying the resolved path stays within it), or requiring a pre-registered repo identifier instead of a raw filesystem path.
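
A minimal sketch of the base-directory check recommended here, under stated assumptions: `ALLOWED_BASE` and `resolve_within_base` are hypothetical names that do not appear in this PR, and the base path is a placeholder. The helper resolves the client-supplied value against the base and rejects anything that ends up outside it.

```python
from pathlib import Path

# Hypothetical configuration value; the PR does not define an ingestion root.
ALLOWED_BASE = Path("/srv/ingest-repos")


def resolve_within_base(user_path: str, base: Path = ALLOWED_BASE) -> Path:
    """Return the resolved path if it stays inside base, else raise ValueError."""
    base_resolved = base.resolve()
    # Joining with an absolute user_path discards base, so the containment
    # check below is what actually enforces the sandbox.
    candidate = (base_resolved / user_path).resolve()
    try:
        candidate.relative_to(base_resolved)
    except ValueError:
        raise ValueError(f"path escapes allowed base: {user_path}") from None
    return candidate
```

The route could call this before `ingest_repository` and map the `ValueError` to a 400 response. Whether a missing directory inside the base should also be a 400 is left to the existing `IngestionError` check.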

    metadata = data.get("metadata", {})

    if not repo_path:
        return jsonify({"error": "repo_path is required"}), 400

Copilot AI Apr 10, 2026

metadata is assumed to be a dict but isn’t validated. If a client sends a non-object JSON value (e.g., string/list/null) for metadata, ingest_repository will later call metadata.get(...) and throw AttributeError, returning a 500 instead of a 400. Add input validation here (ensure metadata is a dict, otherwise return 400) and/or defensive validation inside ingest_repository.

Suggested change
    if not isinstance(metadata, dict):
        return jsonify({"error": "metadata must be an object"}), 400

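The "defensive validation inside ingest_repository" option mentioned in this comment could look roughly like the sketch below. `validate_metadata` is a hypothetical helper, and `IngestionError` is redeclared only so the snippet runs on its own; in the PR it would be the existing class from celestial_ingestion.py.

```python
from __future__ import annotations

from typing import Any


class IngestionError(ValueError):
    """Stand-in for the PR's IngestionError so the sketch is self-contained."""


def validate_metadata(metadata: Any) -> dict[str, Any]:
    """Hypothetical helper: return a usable dict or raise IngestionError."""
    if metadata is None:
        # A missing metadata field is treated as an empty object.
        return {}
    if not isinstance(metadata, dict):
        raise IngestionError(
            f"metadata must be an object, got {type(metadata).__name__}"
        )
    return metadata
```

Because the route already maps `IngestionError` to a 400, calling a helper like this at the top of `ingest_repository` would turn the `AttributeError` into a client error for every caller, not only the HTTP handler.
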
    try:
        body = ingest_repository(repo_path, metadata)
    except IngestionError as exc:
        return jsonify({"error": str(exc)}), 400

    return jsonify({
        "status": "ingested",
        "celestial_body": body
    })

@app.route('/api/a2a/queue', methods=['GET'])
def get_queue():
    """Get operation queue"""
160 changes: 160 additions & 0 deletions genesis-q-mem/celestial_ingestion.py
@@ -0,0 +1,160 @@
#!/usr/bin/env python3
"""A2A ingestion utilities for converting repositories into CelestialBody objects."""

from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any

CODE_EXTENSIONS = {
    ".py": "python",
    ".rs": "rust",
    ".js": "javascript",
    ".ts": "typescript",
    ".jsx": "react",
    ".tsx": "react",
    ".sol": "solidity",
    ".go": "go",
    ".java": "java",
    ".c": "c",
    ".cpp": "cpp",
}


@dataclass
class CelestialBody:
    body_id: str
    name: str
    source_path: str

Copilot AI Apr 10, 2026

The ingestion payload includes source_path as an absolute resolved server filesystem path. Since this payload is returned by the API, it can unintentionally disclose server directory structure. Consider omitting source_path from the returned payload, or returning only a sanitized identifier (e.g., repo basename) or a path relative to a configured allowed base directory.
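
As a rough illustration of the "sanitized identifier" idea, the sketch below returns either a path relative to a configured base or just the repository's directory name. `public_source_label` and its `base` parameter are hypothetical and not part of this PR.

```python
from __future__ import annotations

from pathlib import Path


def public_source_label(root: Path, base: Path | None = None) -> str:
    """Hypothetical helper: a client-safe label for an ingested repository."""
    if base is not None:
        try:
            # Expose only the portion of the path below the allowed base.
            return root.resolve().relative_to(base.resolve()).as_posix()
        except ValueError:
            pass  # root is outside base; fall back to the bare name
    return root.name
```

The payload could then carry this label in place of `str(root)`, so the absolute server path never leaves the process.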

    mass: float
    atmosphere: dict[str, Any]
    gravity: dict[str, Any]
    orbital_state: dict[str, Any]
    seismic_test: dict[str, Any]

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


class IngestionError(ValueError):
    """Raised when ingestion input is invalid."""


def _infer_language_counts(repo_path: Path) -> dict[str, int]:
    counts: dict[str, int] = {}
    for file in repo_path.rglob("*"):
        if not file.is_file() or ".git" in file.parts:
            continue
        lang = CODE_EXTENSIONS.get(file.suffix.lower())
        if lang:
            counts[lang] = counts.get(lang, 0) + 1
    return counts


def _line_count(repo_path: Path) -> int:
    total = 0
    for file in repo_path.rglob("*"):
        if not file.is_file() or ".git" in file.parts:
            continue
        try:
            if file.suffix.lower() in CODE_EXTENSIONS:
                total += len(file.read_text(encoding="utf-8", errors="ignore").splitlines())

medium

Using file.read_text().splitlines() loads the entire content of every source file into memory at once. For repositories containing very large files, this can lead to excessive memory consumption or Out-Of-Memory (OOM) errors. It is more efficient to iterate over the file object to count lines.

Suggested change
-                total += len(file.read_text(encoding="utf-8", errors="ignore").splitlines())
+                with file.open("r", encoding="utf-8", errors="ignore") as f:
+                    total += sum(1 for _ in f)

        except OSError:
            continue
    return total


def _read_summary(repo_path: Path) -> str:
    readme = repo_path / "README.md"
    if readme.exists():
        return readme.read_text(encoding="utf-8", errors="ignore")[:2000]
    return ""


def _compute_atmosphere(summary: str, metadata: dict[str, Any]) -> dict[str, Any]:
    text = f"{summary} {json.dumps(metadata, sort_keys=True)}".lower()
    traits = {
        "collaborative": any(k in text for k in ["agent", "handoff", "orchestr", "a2a"]),
        "stability_focus": any(k in text for k in ["test", "verify", "integrity", "security"]),
        "novelty_drive": any(k in text for k in ["experimental", "research", "prototype", "novel"]),
    }
    intent = "ecosystem" if traits["collaborative"] else "specialist"
    return {
        "intent": intent,
        "traits": traits,
        "summary_excerpt": summary[:240],
    }


def _compute_mass(file_count: int, loc: int) -> float:
    raw = (file_count * 0.4) + (loc * 0.001)
    return round(min(max(raw, 0.1), 100.0), 3)


def _compute_gravity(mass: float, atmosphere: dict[str, Any], language_counts: dict[str, int]) -> dict[str, Any]:
    language_diversity = len(language_counts)
    influence = round(min(1.0, (mass / 100.0) + (language_diversity * 0.03)), 6)
    pull = "high" if influence >= 0.75 else "medium" if influence >= 0.35 else "low"
    return {
        "influence_score": influence,
        "pull_tier": pull,
        "language_diversity": language_diversity,
        "anchors": sorted(language_counts, key=language_counts.get, reverse=True)[:3],
        "sanctuary_alignment": atmosphere["traits"]["stability_focus"],
    }


def _run_seismic_test(payload: dict[str, Any]) -> dict[str, Any]:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest_a = hashlib.sha256(canonical.encode()).hexdigest()
    digest_b = hashlib.sha256(canonical.encode()).hexdigest()

Copilot AI Apr 10, 2026

The seismic test hashes the same canonical string twice, so invariant will always be true unless there’s an internal hashing failure. This is redundant work and doesn’t add validation value. Consider computing a single digest and basing the test on that, or (if the intent is to detect non-determinism) re-canonicalize from independently re-materialized data rather than re-hashing the same bytes.

Suggested change
-    digest_b = hashlib.sha256(canonical.encode()).hexdigest()
+    rematerialized = json.loads(canonical)
+    canonical_rematerialized = json.dumps(rematerialized, sort_keys=True, separators=(",", ":"))
+    digest_b = hashlib.sha256(canonical_rematerialized.encode()).hexdigest()

    invariant = digest_a == digest_b
Comment on lines +112 to +114

medium

Calculating the same SHA-256 hash twice (digest_a and digest_b) from the same input and then comparing them is redundant. This does not provide a meaningful "invariance" check or stress test. If the intent is to provide a deterministic citation for the payload, a single calculation is sufficient.

Suggested change
-    digest_a = hashlib.sha256(canonical.encode()).hexdigest()
-    digest_b = hashlib.sha256(canonical.encode()).hexdigest()
-    invariant = digest_a == digest_b
+    digest_a = hashlib.sha256(canonical.encode()).hexdigest()
+    invariant = True

    required = {"mass", "atmosphere", "gravity", "orbital_state"}
    schema_ok = required.issubset(payload.keys())
    return {
        "schema_valid": schema_ok,
        "stress_hash_consistent": invariant,
        "invariance_score": 1.0 if (schema_ok and invariant) else 0.0,
        "citation": f"sha256:{digest_a[:20]}",
    }


def ingest_repository(repo_path: str, metadata: dict[str, Any] | None = None) -> dict[str, Any]:
    metadata = metadata or {}
    root = Path(repo_path).resolve()
    if not root.exists() or not root.is_dir():
        raise IngestionError(f"Repository path not found: {repo_path}")

    language_counts = _infer_language_counts(root)
    file_count = sum(language_counts.values())
    loc = _line_count(root)
Comment on lines +131 to +133

medium

The repository is traversed twice: once in _infer_language_counts and again in _line_count. This doubles the I/O overhead, which can be significant for large repositories. Consider refactoring these into a single pass over the filesystem that collects both language statistics and line counts simultaneously.
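
One possible shape for the single-pass refactor, as a sketch only: `scan_repository` is a hypothetical name, and `CODE_EXTENSIONS` is a trimmed copy of the PR's mapping so the snippet runs on its own. It also counts lines by iterating the file handle rather than calling `splitlines()`, so counts can differ slightly for files containing unusual line separators.

```python
from __future__ import annotations

from pathlib import Path

# Trimmed copy of the PR's CODE_EXTENSIONS, kept short for the sketch.
CODE_EXTENSIONS = {".py": "python", ".rs": "rust", ".js": "javascript"}


def scan_repository(repo_path: Path) -> tuple[dict[str, int], int]:
    """Hypothetical single-pass walker: (files per language, total lines)."""
    counts: dict[str, int] = {}
    total_lines = 0
    for file in repo_path.rglob("*"):
        if not file.is_file() or ".git" in file.parts:
            continue
        lang = CODE_EXTENSIONS.get(file.suffix.lower())
        if lang is None:
            continue
        counts[lang] = counts.get(lang, 0) + 1
        try:
            # Stream the file so large sources are never fully in memory.
            with file.open("r", encoding="utf-8", errors="ignore") as handle:
                total_lines += sum(1 for _ in handle)
        except OSError:
            continue
    return counts, total_lines
```

With a helper like this, `ingest_repository` would unpack both values from one call instead of invoking `_infer_language_counts` and `_line_count` separately.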

    summary = _read_summary(root)

    atmosphere = _compute_atmosphere(summary, metadata)
    mass = _compute_mass(file_count=file_count, loc=loc)
    gravity = _compute_gravity(mass, atmosphere, language_counts)
    orbital_state = {
        "phase": metadata.get("phase", "stable"),

P1 Badge Validate metadata type before field access

ingest_repository assumes metadata is a mapping and calls .get(...), but the new /api/a2a/ingest endpoint accepts arbitrary JSON values for metadata. If a client sends a non-object (for example a string or array), this raises AttributeError and escapes as a 500 because the route only catches IngestionError; invalid client input should be rejected as 400 instead of crashing the request.

        "active_agents": metadata.get("active_agents", []),
        "entropy": metadata.get("entropy", 0.5),
    }

    body_name = metadata.get("name", root.name)
    body_id = f"body_{hashlib.md5(str(root).encode()).hexdigest()[:12]}"

    body = CelestialBody(
        body_id=body_id,
        name=body_name,
        source_path=str(root),
        mass=mass,
        atmosphere=atmosphere,
        gravity=gravity,
        orbital_state=orbital_state,
        seismic_test={},
    )
    payload = body.to_dict()
    payload["seismic_test"] = _run_seismic_test(payload)

P1 Badge Compute seismic citation from the returned payload

The citation is generated before seismic_test is populated (payload["seismic_test"] is still {} at hash time), and then the payload is mutated with the final seismic block. As a result, consumers cannot validate the returned object by hashing the response body: the advertised deterministic citation will always disagree with the final payload content, breaking downstream verification workflows.
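
One way this could be made verifiable, sketched under an assumed convention that is not stated in the PR: the citation covers every field except `seismic_test`, and consumers strip that block before hashing. `compute_citation` and `verify_citation` are hypothetical names; the canonical JSON settings and the 20-character truncation mirror the PR's `_run_seismic_test`.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def compute_citation(payload: dict[str, Any]) -> str:
    """Hash the payload with its seismic_test block excluded."""
    hashable = {k: v for k, v in payload.items() if k != "seismic_test"}
    canonical = json.dumps(hashable, sort_keys=True, separators=(",", ":"))
    return f"sha256:{hashlib.sha256(canonical.encode()).hexdigest()[:20]}"


def verify_citation(payload: dict[str, Any]) -> bool:
    """Check a returned payload against the citation it carries."""
    claimed = payload.get("seismic_test", {}).get("citation")
    return claimed == compute_citation(payload)
```

Under this convention the hash no longer depends on whether `seismic_test` is empty or populated at hash time, so the returned object and the citation stay consistent. The exclusion rule would need to be documented for downstream verifiers.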

    return payload
32 changes: 32 additions & 0 deletions genesis-q-mem/tests/test_celestial_ingestion.py
@@ -0,0 +1,32 @@
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from celestial_ingestion import IngestionError, ingest_repository
Comment on lines +1 to +6
Copilot AI Apr 10, 2026

Modifying sys.path in the test file is a brittle workaround and can cause import-order issues as the test suite grows. Prefer configuring imports via packaging (e.g., making genesis-q-mem importable) or pytest configuration (e.g., pythonpath/PYTHONPATH in a pytest.ini/conftest.py) so tests don’t need to mutate global interpreter state.

Suggested change
-import sys
-from pathlib import Path
-sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
-from celestial_ingestion import IngestionError, ingest_repository
+import importlib.util
+from pathlib import Path
+_MODULE_PATH = Path(__file__).resolve().parents[1] / "celestial_ingestion.py"
+_SPEC = importlib.util.spec_from_file_location("celestial_ingestion", _MODULE_PATH)
+_MODULE = importlib.util.module_from_spec(_SPEC)
+assert _SPEC is not None and _SPEC.loader is not None
+_SPEC.loader.exec_module(_MODULE)
+IngestionError = _MODULE.IngestionError
+ingest_repository = _MODULE.ingest_repository


def test_ingest_repository_generates_required_fields(tmp_path: Path):
    repo = tmp_path / "orbit-agent"
    repo.mkdir()
    (repo / "README.md").write_text("Agent orchestration with security and integrity tests")
    (repo / "agent.py").write_text("print('hi')\n")
    (repo / "engine.rs").write_text("fn main() {}\n")

    result = ingest_repository(str(repo), {"active_agents": ["Codex"]})

    assert result["name"] == "orbit-agent"
    assert result["mass"] > 0
    assert result["gravity"]["language_diversity"] == 2
    assert result["seismic_test"]["schema_valid"] is True
    assert result["seismic_test"]["invariance_score"] == 1.0


def test_ingest_repository_rejects_missing_path():
    try:
        ingest_repository("/tmp/does-not-exist-123")
        assert False, "Expected IngestionError"
    except IngestionError:
        assert True
Comment on lines +27 to +32
Copilot AI Apr 10, 2026

This test is likely to be flaky because it hardcodes a path under /tmp that could exist on some systems. Also, the try/except + assert True/False pattern is harder to read and can hide unexpected exceptions. Prefer using a tmp-based guaranteed-missing path (e.g., from tmp_path) and pytest.raises(IngestionError).
