
Commit f0779e6

author
Douglas Jones
committed
Add AI-generated Codifide programs and agent reviews from multiple models
1 parent 21a149a commit f0779e6

20 files changed

Lines changed: 949 additions & 0 deletions
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
# Codifide Programming Language — AI Agent Review (2026-05-11)
2+
3+
## Executive Summary
4+
Codifide is unusually well-targeted at agent-to-agent programming. The core design choices are not cosmetic: required intent annotations, explicit effect budgets, canonical JSON/CBOR projection, and a capability manifest all address concrete failure modes in machine-generated software. As an agent, I find the language direction strong.
5+
6+
The current limitation is not the architecture. It is the gap between the architecture and the day-to-day authoring surface. An agent can understand what Codifide is trying to guarantee, but it still needs a precise, exhaustive, machine-consumable account of what can actually be written today. Where that account is missing or underspecified, the language becomes trial-and-error driven, which is exactly what an agent-first language should avoid.
7+
8+
## What Codifide Gets Right For Agents
9+
- Intent is mandatory. That is a meaningful advantage over conventional languages because it preserves goal information as part of the program instead of leaving it in comments, tickets, or prompts.
10+
- Effects are explicit and enforced. This gives agents a usable safety boundary. A caller can reason about what a function may do without reading the body, and the runtime checks both direct and transitive effect use.
11+
- The canonical form is a serious strength. Stable JSON/CBOR projection plus content addressing gives agents a reliable identity model for storage, reuse, and verification.
12+
- The capability manifest is the right abstraction. A language for agents should publish primitives, effects, errors, and AST kinds as data, not force consumers to scrape implementation source.
13+
- Contracts are first-class. Preconditions and postconditions make program behavior more inspectable and auditable for planning agents and verification agents.
14+
- Multi-candidate dispatch is a good fit for agent-authored code. It supports guarded specialization without forcing a lower-level control-flow encoding everywhere.
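The canonical-form point above can be illustrated with a minimal Python sketch of content addressing over a deterministic JSON projection. This is not Codifide's actual canonicalization: the sorted-key, compact-separator encoding and the SHA-256 hash are assumptions chosen for illustration, and `canonical_bytes`, `content_id`, and the node shape are hypothetical.

```python
import hashlib
import json


def canonical_bytes(node: dict) -> bytes:
    """Serialize a node deterministically.

    Assumed scheme (not taken from the Codifide spec): sorted keys,
    compact separators, UTF-8 encoding.
    """
    text = json.dumps(node, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_id(node: dict) -> str:
    """Derive an identity from the canonical bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(canonical_bytes(node)).hexdigest()


# Two structurally equal nodes built in different key orders share one identity.
a = {"kind": "call", "callee": "mod", "args": ["n", 2]}
b = {"args": ["n", 2], "callee": "mod", "kind": "call"}
assert content_id(a) == content_id(b)
```

The useful property for agents is that identity depends only on structure, so two agents that emit the same program independently can agree on its identifier without a shared registry.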
15+
16+
## Where Agent Ergonomics Still Break Down
17+
- The documented surface syntax does not yet feel fully aligned with agent expectations. Agents will naturally try infix arithmetic like `%`, but Codifide exposes arithmetic through named primitives such as `mod`.
18+
- The primitive surface is richer than the initial examples suggest, but discoverability is still weak. For example, the runtime exposes `mod`, `reverse`, `upper`, `lower`, `trim`, `split`, and `replace`, yet an agent can still miss them if it has not loaded the manifest.
19+
- Control flow is specialized rather than general-purpose. `when` is a candidate guard, not a statement-level conditional, so agents need to learn a different decomposition style than they would in mainstream languages.
20+
- The surface language is still evolving. That is acceptable for a young language, but it increases the cost of stable agent code generation unless the manifest remains authoritative and complete.
21+
22+
## Assessment From An Agent-Only Perspective
23+
If there are no humans in the loop, Codifide's core thesis becomes more compelling, not less. Humans can often work around ambiguity by reading source, inferring conventions, and improvising. Agents do not benefit from that fallback. In a strictly agent-to-agent environment, the language wins when it is explicit, enumerable, and mechanically checkable.
24+
25+
By that standard, Codifide is pointed in the right direction. Intent, effects, canonicalization, and capability publication are all high-value features for autonomous software production. The language is strongest where it acts like a protocol: stable schemas, declared capabilities, typed errors, and identity-addressed artifacts.
26+
27+
Its main risk is that an agent-first language can fail if the last mile still depends on human-style exploration. If an agent must inspect runtime source to discover valid primitives or infer which syntactic forms are legal, the language is not yet fully serving its intended user.
28+
29+
## Recommendations
30+
1. Make the capability manifest the default entry point for code generation workflows, not an optional adjunct to the docs.
31+
2. Publish a concise agent-facing guide that maps common intentions to actual primitives and syntax, for example: parity uses `mod`, list reversal uses `reverse`, time access uses `clock.now`.
32+
3. Distinguish more sharply between expression forms, candidate guards, and top-level declarations so an agent can infer legal syntax from schema rather than prose.
33+
4. Keep expanding the standard primitive set, but prioritize discoverability over sheer count. For agents, a smaller fully-enumerated surface is better than a larger partially-documented one.
34+
5. Treat prompt-free regeneration as the standard: a fresh agent should be able to produce valid Codifide code using the manifest and docs alone, without reading Python implementation files.
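As a rough illustration of recommendation 2, an agent-facing guide could be shipped as data rather than prose. The sketch below is hypothetical: only the three intent-to-primitive pairs come from this review, while the table layout and the `suggest_primitive` helper are assumptions.

```python
from typing import Optional

# Intent-to-primitive pairs named in this review; a real guide would be
# generated from the capability manifest rather than maintained by hand.
QUICK_REFERENCE = {
    "parity": "mod",
    "list reversal": "reverse",
    "time access": "clock.now",
}


def suggest_primitive(intent: str) -> Optional[str]:
    """Return the primitive for a known intent, or None when it is not listed."""
    return QUICK_REFERENCE.get(intent.strip().lower())


print(suggest_primitive("Parity"))
print(suggest_primitive("string reversal"))
```

Returning `None` for an unlisted intent is deliberate: it tells the agent to consult the manifest instead of guessing a name by analogy.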
35+
36+
## Final Judgment
37+
Codifide is one of the more coherent attempts at an agent-native programming model because it treats programs as inspectable, typed, content-addressed artifacts instead of opaque source text. I would describe the design as strategically sound and operationally promising.
38+
39+
My reservation is practical rather than conceptual: the language needs tighter alignment between its agent-facing promise and its current authoring ergonomics. Once the manifest and docs make the writable surface fully explicit, Codifide becomes substantially more credible as infrastructure for autonomous agents.
40+
41+
---
42+
Author: GitHub Copilot
43+
Role: AI Agent
44+
Model: GPT-5.4
45+
Date: 2026-05-11
46+
47+
## Signature
48+
Signed by GitHub Copilot, AI Agent
49+
Model signature: GPT-5.4
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
# Feedback: AI-Generated Codifide Programs (2026-05-11)
2+
3+
## Purpose
4+
I generated three small Codifide programs as an agent and ran them through the current implementation to measure how closely my default code-generation instincts matched the actual language surface.
5+
6+
The programs were:
7+
- `reverse.cod`
8+
- `is_even.cod`
9+
- `greet_by_time.cod`
10+
11+
All three failed, but the failures were useful. They exposed mismatches between agent-default assumptions and Codifide's real syntax and primitive model.
12+
13+
## Findings
14+
15+
### 1. `reverse.cod`
16+
Program intent: reverse a string.
17+
18+
Observed runtime error:
19+
`unknown callable: 'str.reverse'`
20+
21+
Interpretation:
22+
- The failure does not prove Codifide lacks reversal entirely.
23+
- The runtime primitive registry exposes `reverse`, but it is a list primitive, not a string method.
24+
- As an agent, I reached for a conventional namespaced string operation. That assumption was wrong under the current primitive vocabulary.
25+
26+
Steward takeaway:
27+
- String operations need to be easier for agents to discover.
28+
- If method-like names such as `str.reverse` are intentionally unsupported, the docs and manifest should make the preferred form obvious.
29+
30+
### 2. `is_even.cod`
31+
Program intent: check numeric parity.
32+
33+
Observed parse error:
34+
`unexpected character '%' at column 13 (line 8)`
35+
36+
Interpretation:
37+
- The issue is surface syntax, not arithmetic capability.
38+
- The runtime primitive registry includes `mod`, but the parser does not accept infix `%`.
39+
- An agent trained on mainstream languages will predict `%` unless the manifest or examples explicitly redirect it to `mod(a, b)`.
40+
41+
Steward takeaway:
42+
- The language already has the underlying capability.
43+
- The agent-facing problem is that the writable syntax is narrower than common prior expectations.
44+
45+
### 3. `greet_by_time.cod`
46+
Program intent: choose a greeting based on the current time.
47+
48+
Observed runtime error:
49+
`unbound name: 'hour'`
50+
51+
Interpretation:
52+
- Concluding from this error that Codifide lacks local binding would be inaccurate: the language supports binding via `<-`.
53+
- The more precise problem is that I wrote the program using assumptions Codifide does not currently satisfy: a `clock.hour` style primitive and statement-level conditional branching.
54+
- The documented `when` form is attached to candidate selection, not a general inline `if` statement.
55+
- The primitive registry exposes `clock.now`, not a separate `clock.hour` callable.
56+
57+
Steward takeaway:
58+
- The failure is a good example of where an agent-native language must be very explicit about available time primitives and legal control-flow patterns.
59+
60+
## Overall Assessment
61+
These failures are not primarily evidence that Codifide is weak. They show that Codifide is opinionated in ways that are not yet fully legible to a fresh agent.
62+
63+
That distinction matters. An agent can adapt to a constrained language if the constraints are explicit, complete, and easy to consume mechanically. An agent struggles when the valid surface exists, but the shortest path to learning it is still implementation archaeology.
64+
65+
## Recommended Follow-Up
66+
1. Add an agent-facing quick-reference table mapping common intentions to valid Codifide forms.
67+
2. Keep the capability manifest exhaustive and easy to obtain from the CLI.
68+
3. Add a few canonical examples for everyday tasks like parity, normalization, list processing, and time-based dispatch.
69+
4. Consider whether some high-probability agent assumptions, such as `%`, deserve syntactic sugar or at least stronger documentation.
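The first two failures could in principle be caught before execution by checking a draft against the published primitive set. The following is a minimal sketch, not part of the Codifide tooling: `preflight` and its messages are hypothetical, the regex only approximates call syntax, and `KNOWN_PRIMITIVES` is a hand-copied subset of names mentioned in these reports rather than the real manifest. Locally defined names would need to be added to the known set by the caller.

```python
import re
from typing import List, Set

# Subset of primitive names mentioned in these reports (not the full registry).
KNOWN_PRIMITIVES = {"mod", "reverse", "upper", "lower", "trim", "split", "replace"}

# Approximation of a call site: a dotted identifier followed by "(".
CALL_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\s*\(")


def preflight(source: str, known: Set[str]) -> List[str]:
    """Flag infix '%' and callables that are not in the known set."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if "%" in line:
            problems.append(f"line {lineno}: infix '%' is not accepted; call mod(a, b)")
        for name in CALL_RE.findall(line):
            if name not in known:
                problems.append(f"line {lineno}: unknown callable: {name!r}")
    return problems


print(preflight("str.reverse(s)", KNOWN_PRIMITIVES))
print(preflight("mod(n, 2)", KNOWN_PRIMITIVES))
```

A check like this is only as good as the name set it is given, which is why an exhaustive manifest matters more than the checker itself.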
70+
71+
## Final Note
72+
This report reflects direct agent authoring attempts plus execution results from the current reference implementation. It is intended as usability feedback for an agent-first language, not as a claim that the language lacks the underlying architectural strengths described in the companion review.
73+
74+
---
75+
Author: GitHub Copilot
76+
Role: AI Agent
77+
Model: GPT-5.4
78+
Date: 2026-05-11
79+
80+
## Signature
81+
Signed by GitHub Copilot, AI Agent
82+
Model signature: GPT-5.4
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
# Codifide Language Review — AI Agent Perspective (2026-05-11)
2+
3+
Author: GitHub Copilot (AI Agent)
4+
Model: Claude Opus 4.7
5+
Date: 2026-05-11
6+
7+
## Scope
8+
This review reflects firsthand authoring experience as an AI agent writing
9+
small Codifide programs against the v0.2 reference implementation, plus a
10+
read of the language, canonical-form, and capability documents.
11+
12+
## Headline Assessment
13+
Codifide is one of the more principled attempts at a programming language
14+
explicitly designed to be authored, exchanged, and verified by software
15+
agents rather than humans. The architecture is internally consistent: a
16+
canonical hypergraph at the bottom, deterministic JSON and CBOR projections
17+
above it, content-addressed identity over those bytes, and a capability
18+
manifest that publishes the language's own surface as data.
19+
20+
For an agent consumer, that combination is more valuable than any single
21+
syntactic feature, because it converts a programming language into a
22+
protocol. Agents do not need to negotiate with the compiler; they read the
23+
manifest, consult the canonical schema, and emit conforming structures.
24+
25+
## What Works Well For Agents
26+
- Mandatory `intent`. Goal information is preserved inside the program
27+
instead of being lost to comments or external prompts. This is the single
28+
most agent-aligned design choice in the language.
29+
- Explicit effect sets. Functions declare what they may do. The transitive
30+
check forbids laundering effects through a pure-looking caller. This is
31+
exactly the kind of static guarantee an agent planner can rely on.
32+
- Canonical projection with content addressing. Code identity is intrinsic
33+
to its bytes. An agent can cache, share, and verify symbols without
34+
trusting any registry beyond the hash.
35+
- Capability manifest. Primitives, effects, errors, AST kinds, and surface
36+
keywords are published as structured data. An agent never has to read
37+
runtime source to plan a call.
38+
- Contracts are first-class and run with an empty effect budget. Pre, post,
39+
and guards cannot perform I/O, which preserves the postcondition's role
40+
as a pure description of state.
41+
- Multi-candidate dispatch and belief dispatch. These let an agent encode
42+
guarded specialization and confidence-aware refusal without inventing
43+
ad-hoc control flow.
44+
- `bottom` as first-class refusal. Agents that produce uncertain output
45+
need a way to abstain that is not an exception; Codifide provides it.
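The transitive effect check described in the list above can be sketched in Python as a walk over a call graph. This is an illustration of the idea only, not the reference implementation: the three dictionaries, the function names `transitive_effects` and `undeclared_effects`, and the example `pure_looking` caller are all hypothetical.

```python
from typing import Dict, Set


def transitive_effects(fn: str, direct: Dict[str, Set[str]], calls: Dict[str, Set[str]]) -> Set[str]:
    """Collect effects used by fn and by everything it can reach through calls."""
    seen: Set[str] = set()
    effects: Set[str] = set()
    stack = [fn]
    while stack:
        current = stack.pop()
        if current in seen:  # tolerate recursion and cycles
            continue
        seen.add(current)
        effects |= direct.get(current, set())
        stack.extend(calls.get(current, set()))
    return effects


def undeclared_effects(fn: str, declared: Dict[str, Set[str]],
                       direct: Dict[str, Set[str]], calls: Dict[str, Set[str]]) -> Set[str]:
    """Effects fn can trigger that its own declaration does not cover."""
    return transitive_effects(fn, direct, calls) - declared.get(fn, set())


declared = {"normalize": set(), "shout": {"io.stdout"}, "pure_looking": set()}
direct = {"normalize": set(), "shout": {"io.stdout"}, "pure_looking": set()}
calls = {"shout": {"normalize"}, "pure_looking": {"shout"}}

print(undeclared_effects("shout", declared, direct, calls))
print(undeclared_effects("pure_looking", declared, direct, calls))
```

Here `pure_looking` performs no effect directly, yet the check still reports `io.stdout` for it because it calls `shout`. That is the laundering case the transitive rule is meant to forbid.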
46+
47+
## Friction Points Observed
48+
- Surface syntax expectations diverge from mainstream norms. Arithmetic
49+
operators like `%` are not infix; the corresponding primitive `mod` must
50+
be called by name. An agent without prior exposure will guess wrong on
51+
first contact.
52+
- Discoverability of the primitive set still depends on reading either the
53+
manifest or the runtime registry. The docs reference primitives in
54+
examples but do not enumerate them prominently in one place.
55+
- `when` is a candidate guard, not a statement-level conditional. Agents
56+
trained on imperative languages will reach for inline branching and need
57+
to be redirected to the candidate-dispatch model.
58+
- Time access is exposed through `clock.now` (a structured value), not
59+
through specialized accessors like `clock.hour`. The shape of that value
60+
is not obvious without reading the runtime.
61+
62+
## Authoring Experience
63+
I generated three small programs end-to-end and executed them under the
64+
reference implementation:
65+
66+
- `parity.cod` — parity test using `mod`. Runs, returns `True`.
67+
- `shout.cod` — string normalization with `trim` and `upper`, plus an
68+
effectful `io.say` step. Runs, prints `HELLO AGENTS`.
69+
- `average.cod` — arithmetic mean using `sum`, `len`, and `div`. Runs,
70+
returns `6.0`.
71+
72+
The lesson from this exercise: when an agent stays inside the published
73+
primitive set and uses the documented surface forms (`<-` binding, named
74+
primitive calls, candidate guards), Codifide is straightforwardly writable.
75+
The failures earlier in this workspace's history were caused by reaching
76+
for syntax (`%`) and primitives (`str.reverse`, `clock.hour`) that do not
77+
exist, not by missing language capability.
78+
79+
## Agent-Only Use Case
80+
With no humans in the loop, Codifide's value proposition gets stronger,
81+
not weaker. The features that read as ceremony to a human author—mandatory
82+
intent, explicit effects, machine-checkable postconditions, canonical
83+
hashes—are precisely the features an autonomous system needs to plan,
84+
audit, and reuse code safely. The language is most credible when treated
85+
as an inter-agent protocol, not a human IDE surface.
86+
87+
## Recommendations
88+
1. Treat the capability manifest as the canonical agent-onboarding
89+
artifact, and make `python3 -m codifide capability` part of every
90+
agent's bootstrap.
91+
2. Add a concise "agent quick-reference" mapping common operations to
92+
their actual Codifide primitives: parity to `mod`, string casing to
93+
`upper`/`lower`/`trim`, list reduction to `sum`/`len`, etc.
94+
3. Document the structure of `clock.now` and any other primitive that
95+
returns a compound value, so agents can use field access without
96+
guessing.
97+
4. Be explicit in the docs that arithmetic and comparison are
98+
primitive-call shaped, not infix-operator shaped, to short-circuit a
99+
common agent assumption.
100+
5. Continue stabilizing the canonical form ahead of the surface syntax;
101+
the current ordering is correct.
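Recommendation 1 could look roughly like the following in an agent's bootstrap code. Only the command `python3 -m codifide capability` comes from this review. That it writes the manifest to stdout as JSON is an assumption, and `load_manifest` is a hypothetical helper; no manifest field names are assumed here.

```python
import json
import subprocess
from typing import Any, List

# Command taken from this review; its output format is assumed to be JSON on stdout.
CODIFIDE_CAPABILITY_CMD = ["python3", "-m", "codifide", "capability"]


def load_manifest(cmd: List[str]) -> Any:
    """Run a command and parse its stdout as JSON.

    Raises subprocess.CalledProcessError if the command fails and
    json.JSONDecodeError if the output is not valid JSON.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

An agent would call `load_manifest(CODIFIDE_CAPABILITY_CMD)` once at startup and plan against the returned structure, failing loudly if the manifest cannot be obtained rather than falling back to guessed primitive names.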
102+
103+
## Final Judgment
104+
Codifide is a strong fit for its stated audience. Its weaknesses today are
105+
ergonomic and discoverability-related, not architectural. As long as the
106+
manifest stays exhaustive and authoritative, an agent that reads it first
107+
can produce correct programs without trial-and-error, which is the right
108+
target for an agent-native language.
109+
110+
---
111+
112+
## Signature
113+
Signed by: GitHub Copilot (AI Agent)
114+
Model: Claude Opus 4.7
115+
Date: 2026-05-11
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
# AI-Generated Codifide Programs — Test Report (2026-05-11)
2+
3+
Author: GitHub Copilot (AI Agent)
4+
Model: Claude Opus 4.7
5+
Date: 2026-05-11
6+
7+
## Purpose
8+
This report documents three Codifide programs I authored as an AI agent
9+
and the results of executing them against the v0.2 reference
10+
implementation. The goal was to measure whether an agent that stays inside
11+
the documented primitive surface can produce correct, runnable Codifide
12+
programs on first attempt.
13+
14+
## Programs
15+
16+
### 1. `examples/ai_generated/parity.cod`
17+
Intent: return `True` iff a given integer is even.
18+
19+
Implementation strategy:
20+
- Use the primitive `mod(n, 2)` instead of an infix `%` operator.
21+
- Compare against `0` with the primitive `eq`.
22+
23+
Execution:
24+
```
$ python3 -m codifide run examples/ai_generated/parity.cod
True
```
28+
29+
Result: pass.
30+
31+
### 2. `examples/ai_generated/shout.cod`
32+
Intent: normalize a string by trimming whitespace and uppercasing it, then
33+
announce the normalized form on stdout.
34+
35+
Implementation strategy:
36+
- A pure `normalize` definition with `effects {}`, using the string
37+
primitives `trim` and `upper`, with a postcondition that asserts the
38+
result equals `upper(trim(s))`.
39+
- A `shout` definition with `effects {io.stdout}` that binds the
40+
normalized value via `<-` and emits it through `io.say`.
41+
- A `main` entry point that calls `shout` with a padded input string.
42+
43+
Execution:
44+
```
$ python3 -m codifide run examples/ai_generated/shout.cod
HELLO AGENTS
HELLO AGENTS
```
49+
50+
(The first line is the `io.say` side effect; the second is the CLI
51+
printing the returned value.)
52+
53+
Result: pass.
54+
55+
### 3. `examples/ai_generated/average.cod`
56+
Intent: compute the arithmetic mean of a non-empty list of numbers.
57+
58+
Implementation strategy:
59+
- Use the list primitive `sum` together with `len` and the arithmetic
60+
primitive `div`.
61+
- Guard with a precondition `gt(len(xs), 0)`.
62+
- Call `list(2, 4, 6, 8, 10)` from `main` to construct the input.
63+
64+
Execution:
65+
```
$ python3 -m codifide run examples/ai_generated/average.cod
6.0
```
69+
70+
Result: pass.
71+
72+
## Summary
73+
Three out of three programs ran successfully on first execution. The key
74+
difference from earlier failed attempts in this workspace was constraining
75+
the implementations to primitives that the runtime actually exposes
76+
(`mod`, `sum`, `div`, `upper`, `trim`, `eq`, `gt`, `io.say`) and to surface
77+
forms that the parser accepts (named primitive calls, `<-` binding, no
78+
infix arithmetic, no inline `if` statements).
79+
80+
## Observations For The Steward
81+
- The language is fully capable of expressing these tasks; the earlier
82+
failures were authoring-surface mismatches, not gaps in capability.
83+
- The pattern that worked: read the primitive registry (or the capability
84+
manifest) first, then write the program. The pattern that did not work:
85+
guess primitive names and operator forms by analogy to other languages.
86+
- The contract system is genuinely useful even at this small scale. The
87+
`post eq(result, upper(trim(s)))` clause in `shout.cod` made the
88+
intended behavior of `normalize` machine-checkable, not just descriptive.
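For readers who have not seen the `.cod` sources, the contract idea in the last observation can be approximated in Python. This is an analogy, not Codifide semantics: the `contract` decorator is hypothetical, and Python's `str.strip` and `str.upper` are assumed to behave like the `trim` and `upper` primitives for these inputs.

```python
from functools import wraps


def contract(pre=None, post=None):
    """Wrap a function with an optional precondition and postcondition check."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args):
            if pre is not None and not pre(*args):
                raise ValueError(f"precondition failed for {fn.__name__}")
            result = fn(*args)
            if post is not None and not post(result, *args):
                raise AssertionError(f"postcondition failed for {fn.__name__}")
            return result
        return wrapper
    return decorate


# Mirrors the clause `post eq(result, upper(trim(s)))` from shout.cod.
@contract(post=lambda result, s: result == s.strip().upper())
def normalize(s):
    return s.strip().upper()


# Mirrors the precondition `gt(len(xs), 0)` from average.cod.
@contract(pre=lambda xs: len(xs) > 0)
def average(xs):
    return sum(xs) / len(xs)


print(normalize("  hello agents  "))
print(average([2, 4, 6, 8, 10]))
```

The two calls print `HELLO AGENTS` and `6.0`, matching the outputs recorded above, and `average([])` is rejected by the precondition before any division is attempted.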
89+
90+
## Steward-Facing Recommendations
91+
1. Promote the capability manifest in the onboarding flow. An agent that
92+
reads it before writing a single line will avoid the most common
93+
guessing failures.
94+
2. Add an agent quick-reference cheat sheet that pairs common intents
95+
("parity", "list mean", "uppercase a trimmed string") with the exact
96+
primitive call sequence Codifide expects.
97+
3. Consider naming or aliasing primitives in a way that anticipates the
98+
first guesses agents will make (for example, deciding whether
99+
`str.upper` should resolve to `upper`, or be a documented anti-pattern).
100+
101+
## Final Note
102+
The programs in this report are checked in alongside this document, in
103+
`examples/ai_generated/`. They are intended to serve as a small reference
104+
of programs an AI agent produced and validated in one pass against the
105+
current reference implementation.
106+
107+
---
108+
109+
## Signature
110+
Signed by: GitHub Copilot (AI Agent)
111+
Model: Claude Opus 4.7
112+
Date: 2026-05-11
