Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Changed
- ⚠️ BREAKING: `vault.read` is now **faithful by default** (ADR-203, [#133](https://github.com/aaronsb/obsidian-mcp-plugin/issues/133)). It returns the **complete, byte-exact file source** (no more newline-flattened fragments) when the file fits a ~50k-char budget. Large files return a **verbatim page 1 with absolute line bookends** (`page=N` to continue) instead of a context-breaking raw dump. `returnFullFile: true` is the explicit whole-large-file override; `query`/`strategy`/`maxFragments` still return semantic fragments. The structured envelope no longer double-encodes the body. Clients that relied on the old fragmented default should pass fragment params explicitly.

### Security
- 🔴 CRITICAL: Identified authentication vulnerability - no API key validation ([#9](https://github.com/aaronsb/obsidian-mcp-plugin/issues/9))
- 🔴 CRITICAL: Identified path traversal vulnerability in file operations ([#10](https://github.com/aaronsb/obsidian-mcp-plugin/issues/10))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,14 +66,43 @@ from **action/result rendering** rather than by one blunt format switch.
Adopt a **two-bucket response contract**. This ADR is the decision of
record; implementation is tracked separately and is *not* part of this ADR.

1. **Content reads are faithful by default.** `vault.read` returns the
**complete, verbatim file source** by default — byte-exact as stored by
Obsidian (newlines, indentation, trailing whitespace, tabs, fenced
blocks, frontmatter delimiters preserved). Fragmentation / summarization
becomes an **explicit opt-in** for large-file context savings via the
*existing* knobs (`query`, `strategy`, `maxFragments`); `returnFullFile`
is no longer required to get the truth and is retired or demoted to a
no-op alias. The read→edit round-trip is correct by construction.
1. **Content reads are faithful by default — but never context-breaking.**
`vault.read` returns **byte-exact source** as stored by Obsidian
(newlines, indentation, trailing whitespace, tabs, fenced blocks,
frontmatter delimiters preserved — never flattened/fragmented unless
fragments are explicitly requested). The hard invariant: **a default read
must not hand the agent a context-breaking raw dump.**

The paginate/no-paginate boundary is a **character budget**
(`READ_PAGE_CHARS`, default **50000** ≈ ~12k tokens, one tunable
constant) — *not* a line count. A line count is an invalid proxy for the
thing being bounded: 1500 five-word lines and 1500 hundred-word lines
have wildly different context cost. **Bookends stay line-based** (for
`edit.at_line`); the two reconcile by defining a *page* as the longest
run of **whole lines whose cumulative size ≤ `READ_PAGE_CHARS`**,
reported with the line range it covers. Lines are never split mid-line.

- **Fits the budget (the common case — most notes):** the **whole file,
verbatim, in one load**. No pagination structure, no ceremony.
- **Exceeds the budget:** **page 1 only** — a single *contiguous*
verbatim block: the longest whole-line run within `READ_PAGE_CHARS`
(not an array of chunk objects), carrying **bookends**: `lineStart`,
`lineEnd`, `totalLines`, `bytes`, and a `nextPage` hint
(`vault.read(path, page=2)`). Absolute line numbers are preserved, so
`edit.at_line` on a large file still works precisely and the agent
pages forward instead of a truncated preview, lossy fragments, or a
context bomb. (A single line that alone exceeds the budget is returned
whole as its own page — never split — flagged; correctness wins over
the budget in that rare case.)
- **`returnFullFile: true`:** explicit override — the entire file
verbatim regardless of size. `returnFullFile` is therefore **retained
and repurposed** (not retired): it is the deliberate large-context
opt-in, used when the agent knowingly accepts the cost.
- **`query` / `strategy` / `maxFragments`:** semantic fragment retrieval,
unchanged.

The read→edit round-trip is correct by construction in every branch
(per-page content is byte-exact for its line range).

2. **Action/result outputs stay concise-formatted by default.** Tools whose
result is a status / confirmation / list (create, update, delete, move,
Expand All @@ -94,12 +123,13 @@ record; implementation is tracked separately and is *not* part of this ADR.
`_Formatter error_` fallback; the failing `returnFullFile` formatted
branch is fixed or removed as part of (1).

Backward compatibility: (1) changes the default shape/size of `vault.read`
for clients that currently rely on the fragmented/flattened default. This
is a deliberate breaking change to fix a correctness bug; it ships with a
clear changelog entry and a documented opt-in (`strategy`/`maxFragments`)
for the previous context-saving behaviour. A prerelease + functional
verification of a real read→`edit.window` round-trip gates the rollout.
Backward compatibility: (1) changes the default shape of `vault.read` for
clients relying on the fragmented/flattened default. Deliberate breaking
change to fix a correctness bug; ships with a changelog entry, the
`strategy`/`maxFragments` opt-in for the old context-saving behaviour, and
`returnFullFile:true` for whole-large-file access. A prerelease + functional
verification of a real read→`edit.window` round-trip (incl. a large file
exercising bookended page 1 → `at_line`) gates the rollout.

#133 is **reframed, not closed by fiat**: it remains the user-facing
tracking issue for bucket 1; this ADR records the agreed direction and
Expand All @@ -124,19 +154,24 @@ explicitly supersedes its specific "global `raw` flip" remedy.
default `vault.read` (smaller, lossy payloads). Mitigated by changelog,
the existing `strategy`/`maxFragments` opt-in for context savings, and a
gated prerelease test.
- Large files now return full content by default → higher per-call context
cost for the read-everything pattern. Mitigated: fragmentation is still
one explicit parameter away, and a `wordCount`/size `warning` is retained
to nudge large-file callers toward it.
- Large files no longer return verbatim by default — they return a
bookended page 1. This is a behavioural change for any client that read a
large file in one default call; `returnFullFile:true` restores that
explicitly. Accepted: a context-breaking default is the failure this ADR
exists to prevent.
- Pagination adds a small surface to `vault.read` (`page` param,
`lineStart/lineEnd/totalLines/bytes/nextPage` fields). Justified: it's
what lets large-file `edit.at_line` keep working instead of forcing
fragments.

### Neutral

- Two-bucket contract should be documented in the README and tool
descriptions so the default semantics are discoverable.
- `returnFullFile` becomes redundant; retire or alias with a deprecation
note.
- No change to `IObsidianAPI` or the MCP tool surface beyond the default
response shape/size of `vault.read` and the de-duplicated envelope.
- `returnFullFile` is retained and repurposed as the explicit
whole-large-file override (not retired).
- No change to `IObsidianAPI`; the MCP `vault` tool gains a `page`
parameter and the read response gains pagination bookends.

## Alternatives Considered

Expand All @@ -155,3 +190,15 @@ explicitly supersedes its specific "global `raw` flip" remedy.
- **New dedicated `vault.source` tool, leave `read` lossy.** Rejected:
splits the surface and leaves the obvious tool (`read`) a footgun; the
principled fix is making the obvious tool correct by default.
- **Return full verbatim regardless of size (no guard).** Rejected
outright: a large file dumped raw breaks the agent's context — this is
the exact failure mode the pagination guard exists to prevent, and the
reason `returnFullFile:true` is an *explicit, knowing* opt-in.
- **Line-count page budget (e.g. 1500 lines).** Rejected: line count does
not bound context — 1500 short lines vs 1500 long lines differ by orders
of magnitude. The budget must be size-based (`READ_PAGE_CHARS`); only the
*bookends* are line-based, for `at_line`.
- **Array of page/chunk objects in one response.** Rejected: that is just
the fragmented default in another shape and is not "one load." A page is
a single contiguous verbatim block; multi-page access is sequential via
the `page` parameter.
81 changes: 65 additions & 16 deletions src/formatters/vault.ts
Original file line number Diff line number Diff line change
Expand Up @@ -150,13 +150,15 @@ export interface FileReadFragment {
}

export interface FileReadResponse {
path: string;
path?: string;
content: string | FileReadFragment[];
metadata?: {
size: number;
modified: number;
size?: number;
modified?: number;
created?: number;
extension: string;
extension?: string;
totalLines?: number;
bytes?: number;
};
frontmatter?: Record<string, unknown>;
tags?: string[];
Expand All @@ -166,22 +168,41 @@ export interface FileReadResponse {
strategy: string;
query?: string;
};
pagination?: {
paginated: boolean;
page: number;
pageLineStart: number;
pageLineEnd: number;
totalLines: number;
bytes: number;
hasMore: boolean;
nextPage: string | null;
oversizedLine?: boolean;
beyondEnd?: boolean;
};
warning?: string;
}

export function formatFileRead(response: FileReadResponse): string {
const { path, content, metadata, frontmatter, tags, fragmentMetadata } = response;
const { path, content, metadata, frontmatter, tags, fragmentMetadata, pagination } = response;
const lines: string[] = [];

const fileName = path.split('/').pop() || path;
const safePath = path || 'file';
const fileName = safePath.split('/').pop() || safePath;
lines.push(header(1, `File: ${fileName}`));
lines.push('');

// Metadata summary
lines.push(property('Path', path, 0));
if (metadata) {
lines.push(property('Path', safePath, 0));
if (metadata && typeof metadata.size === 'number') {
lines.push(property('Size', formatFileSize(metadata.size), 0));
}
if (metadata && typeof metadata.modified === 'number') {
lines.push(property('Modified', formatDate(metadata.modified), 0));
}
if (metadata && typeof metadata.totalLines === 'number') {
lines.push(property('Lines', String(metadata.totalLines), 0));
}

// Tags
if (tags && tags.length > 0) {
Expand Down Expand Up @@ -217,6 +238,17 @@ export function formatFileRead(response: FileReadResponse): string {
// Content - handle both string and fragment array formats
lines.push('');

if (content != null && typeof content !== 'string' && !Array.isArray(content)) {
// Unrecognized/binary structured passthrough — render a safe note
// instead of falling through to the _Formatter error_ fallback.
lines.push(header(2, 'Content'));
lines.push('');
lines.push('_(non-text content; use `raw: true` for the structured payload)_');
lines.push(divider());
lines.push(summaryFooter());
return joinLines(lines);
}

if (Array.isArray(content)) {
// Fragment-based response
const fragments = content;
Expand All @@ -239,18 +271,35 @@ export function formatFileRead(response: FileReadResponse): string {
lines.push(`... and ${fragments.length - 5} more fragments`);
}
} else {
// Simple string content
// Verbatim string content (ADR-203: content reads are faithful by
// default — the formatted default path must NOT truncate, or an agent
// cannot derive a byte-matching edit.window oldText without raw:true,
// which is the exact #133 friction this ADR retires). The data layer
// already bounds size to READ_PAGE_CHARS (or it's an explicit
// returnFullFile override), so emitting the full block is safe. No
// wrapping code fence: the body may itself contain ``` fences, and a
// wrapper would corrupt round-trip fidelity.
lines.push(header(2, 'Content'));
lines.push('');
lines.push(content);
}

const contentLines = content.split('\n');
const previewLines = contentLines.slice(0, 50);
lines.push('```markdown');
lines.push(previewLines.join('\n'));
if (contentLines.length > 50) {
lines.push(`\n... (${contentLines.length - 50} more lines)`);
if (pagination && pagination.paginated) {
lines.push('');
lines.push(header(2, 'Pagination'));
lines.push(property('Page', `${pagination.page} (lines ${pagination.pageLineStart}-${pagination.pageLineEnd} of ${pagination.totalLines})`, 0));
if (pagination.hasMore && pagination.nextPage) {
lines.push(property('Next', pagination.nextPage, 0));
}
lines.push('```');
if (pagination.beyondEnd) {
lines.push(' (requested page is past end of file)');
}
lines.push(' returnFullFile=true for the whole file · query/strategy/maxFragments for fragments · line numbers are absolute (edit.at_line works)');
}

if (response.warning) {
lines.push('');
lines.push(`> ${response.warning}`);
}

lines.push(divider());
Expand Down
1 change: 1 addition & 0 deletions src/semantic/operations/vault.ts
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ export async function executeVaultOperation(ctx: RouterContext, action: string,
return await readFileWithFragments(ctx.api, ctx.fragmentRetriever, {
path,
returnFullFile: paramBool(params, 'returnFullFile'),
page: paramNum(params, 'page'),
query: paramStr(params, 'query'),
strategy,
maxFragments: paramNum(params, 'maxFragments')
Expand Down
4 changes: 2 additions & 2 deletions src/tools/semantic-tools.ts
Original file line number Diff line number Diff line change
Expand Up @@ -433,7 +433,7 @@ function getParametersForOperation(operation: string): Record<string, unknown> {
},
page: {
type: 'number',
description: 'Page number for paginated results'
description: 'Page number for paginated results (list/search; also large-file read — see returnFullFile)'
},
pageSize: {
type: 'number',
Expand All @@ -450,7 +450,7 @@ function getParametersForOperation(operation: string): Record<string, unknown> {
},
returnFullFile: {
type: 'boolean',
description: 'Return full file instead of fragments (WARNING: large files can consume significant context)'
description: 'read: force the ENTIRE file verbatim regardless of size (explicit large-context override). Default read already returns the whole file verbatim when it fits the size budget; large files return a verbatim page 1 with absolute line bookends (use page=N to continue, or query/strategy/maxFragments for fragments).'
},
includeContent: {
type: 'boolean',
Expand Down
Loading
Loading