Draft
src/content/notes/amnesia-agents.mdx
Summary
Found 3 gaps: 3 explicit markers, 0 soft signals.
Gaps
1. Evidence that existing memory features fall short — line 25 (explicit marker)
Original text:
[Even those with explicit memory features are so limited they fail to live up to our basic expectations. - this statement needs evidence]
Finding:
This claim is hard to source with a single study because it rests on user experience with commercial products. A few useful angles: (1) OpenAI's own documentation acknowledges that ChatGPT's memory stores a limited number of facts and does not automatically update stale information. (2) Anthropic's Claude memory is implemented via user-maintained documents rather than automatic recall, with documented limitations around context injection. (3) The broader academic framing contrasts "external memory augmentation" with "episodic memory"; Wang et al.'s 2024 survey of LLM-based autonomous agents includes a useful taxonomy of agent memory architectures. No peer-reviewed study was found that empirically tests consumer-facing memory features specifically.
Sources:
- (help.openai.com/redacted) (ChatGPT memory limitations, documented)
- (arxiv.org/redacted) (Wang et al. 2024, "A Survey on Large Language Model based Autonomous Agents" — covers memory architecture taxonomy)
2. Current context window lengths for major models — line 31 (explicit marker)
Original text:
[diagram of context windows running out – include current max context window lengths for various models]
Finding:
As of early 2026, published context windows for major models: GPT-4o (128K tokens), Claude 3.7 Sonnet (200K tokens), Gemini 1.5 Pro (2M tokens), Gemini 1.5 Flash (1M tokens), Llama 3.1 (128K tokens). These figures change frequently with model updates, so official documentation pages are the most reliable source.
Sources:
3. "Context rot" citation — line 33 (explicit marker)
Original text:
When your conversation goes on for a while, they can get confused and dumber. They forget what you did 10 minutes ago. Context rot [citation]
Finding:
"Context rot" does not appear to be an established academic term with a canonical citation. It is used colloquially in the developer community to describe performance degradation in long contexts. The closest academic framing is the "lost in the middle" problem (Liu et al. 2023), which showed that LLMs perform worse when relevant information sits in the middle of a long context rather than at the beginning or end. Anthropic's research on context length and coherence (from their model cards) is another relevant reference.
Sources:
- (arxiv.org/redacted) (Liu et al. 2023, "Lost in the Middle: How Language Models Use Long Contexts" — the nearest peer-reviewed backing for the "context rot" phenomenon)
Note
These are research findings only. Maggie writes the prose herself — do not interpret these as drafted replacements.
Generated by Draft Research Agent