Commit 292d677

[skills] Fix heavy file ingestion markdown lint
1 parent cdc4354 commit 292d677

8 files changed

Lines changed: 28 additions & 31 deletions

Binary file, -10 bytes (contents not shown)
Binary file, -7 bytes (contents not shown)
Binary file, -8 bytes (contents not shown)

skills/heavy-file-ingestion/README.md

Lines changed: 5 additions & 5 deletions
@@ -24,10 +24,10 @@ Heavy File Ingestion stops agents from wasting expensive context on raw PDFs, sl
 ## Installation
 
 1. Copy the entire [`heavy-file-ingestion`](./) folder into a place your AI client can access, not just `SKILL.md`. The skill expects the bundled `scripts/` and `references/` folders to stay next to it.
-2. For Claude Code, place the folder at `~/.claude/skills/heavy-file-ingestion/`.
-3. For Codex or Cursor, keep the folder in your workspace or copy the contents into that client's skills or rules location.
-4. Restart or reload the client so it picks up [`SKILL.md`](./SKILL.md).
-5. When you want the deterministic converters available, run the skill script with either:
+1. For Claude Code, place the folder at `~/.claude/skills/heavy-file-ingestion/`.
+1. For Codex or Cursor, keep the folder in your workspace or copy the contents into that client's skills or rules location.
+1. Restart or reload the client so it picks up [`SKILL.md`](./SKILL.md).
+1. When you want the deterministic converters available, run the skill script with either:
 
 ```bash
 uv run \
@@ -38,7 +38,7 @@ uv run \
 python skills/heavy-file-ingestion/scripts/convert_heavy_file.py /absolute/path/to/file.pdf
 ```
 
-6. If you already have `markitdown` installed and want to prefer it for rich document conversion, add `--prefer markitdown`.
+1. If you already have `markitdown` installed and want to prefer it for rich document conversion, add `--prefer markitdown`.
 
 ## Downloadable Variants
 
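The README hunk rewrites every ordered-list marker to `1.`, including the item that follows the fenced command. The commit message only says "markdown lint" and does not name the tool (markdownlint's MD029 `ol-prefix` rule is one check that governs these prefixes). As a minimal sketch of that transformation, the hypothetical helper below, which is not part of the repository, rewrites list markers while skipping lines inside backtick fences so shell commands are left alone:

```python
import re

# Hypothetical helper (not in the repository): rewrite ordered-list markers to "1.".
FENCE = "`" * 3
# Leading indentation, then digits and a period, followed by whitespace.
OL_MARKER = re.compile(r"^(\s*)\d+\.(?=\s)")


def normalize_ol_prefixes(markdown: str) -> str:
    """Rewrite every ordered-list marker outside backtick fences to "1."."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            # Toggle on opening/closing fences; the fence lines themselves are kept.
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)  # leave code (for example shell commands) untouched
        else:
            out.append(OL_MARKER.sub(r"\g<1>1.", line))
    return "\n".join(out)
```

This is deliberately naive: it does not parse CommonMark, ignores `~~~` fences, and would not reproduce the SKILL.md hunks, which keep an incrementing `1.`/`2.`/`3.` run after the last code block.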
skills/heavy-file-ingestion/SKILL.md

Lines changed: 8 additions & 8 deletions
@@ -21,20 +21,20 @@ Agents waste money and context when they read heavyweight files raw. This skill
 ## Core Policy
 
 1. **Convert before reading.** Do not dump raw heavyweight files into model context if a deterministic converter can create a cheaper artifact.
-2. **Index before reasoning.** Read the generated `index.md` or `index.json` first. It should tell you what is in the file, how clean the extraction was, and whether escalation is justified.
-3. **Match the converter to the file type.**
+1. **Index before reasoning.** Read the generated `index.md` or `index.json` first. It should tell you what is in the file, how clean the extraction was, and whether escalation is justified.
+1. **Match the converter to the file type.**
 - PDFs and documents: markdown artifact
 - Presentations: markdown slide outline
 - Spreadsheets: CSV per sheet plus a markdown manifest
-4. **Escalate by cost tier, not instinct.**
+1. **Escalate by cost tier, not instinct.**
 - Tier 1: deterministic converter plus index
 - Tier 2: cheap model on the extracted artifact only if quality flags say the deterministic pass lost structure
 - Tier 3: expensive model only after the file has already been compressed into markdown, CSV, or a sampled subset
 
 ## Process
 
 1. Identify the file path, extension, and rough size.
-2. Run the converter script instead of reading the original file directly:
+1. Run the converter script instead of reading the original file directly:
 
 ```bash
 uv run \
@@ -45,15 +45,15 @@ uv run \
 python skills/heavy-file-ingestion/scripts/convert_heavy_file.py /absolute/path/to/file.ext
 ```
 
-3. If you already have `markitdown` installed and want to prefer it for PDF or DOCX conversion, rerun with:
+1. If you already have `markitdown` installed and want to prefer it for PDF or DOCX conversion, rerun with:
 
 ```bash
 python skills/heavy-file-ingestion/scripts/convert_heavy_file.py /absolute/path/to/file.ext --prefer markitdown
 ```
 
-4. Read the generated `index.md` first.
-5. Only read the extracted markdown or CSV outputs that the index says are worth reading.
-6. If the index flags weak extraction, use a cheap fallback:
+1. Read the generated `index.md` first.
+2. Only read the extracted markdown or CSV outputs that the index says are worth reading.
+3. If the index flags weak extraction, use a cheap fallback:
 - Try an alternate deterministic converter
 - Use a small model to rebuild only the structure or outline from the extracted artifact
 - Escalate to a stronger model only when the cheaper passes still leave critical ambiguity
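The Core Policy's tiered escalation is effectively a small decision procedure. A minimal sketch follows, assuming the index exposes a list of quality flags and a record of whether a cheap pass still left critical ambiguity; `ExtractionReport`, its fields, and `choose_tier` are hypothetical names, not the real `index.json` schema:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractionReport:
    """Hypothetical summary of an index; the real index.json schema may differ."""
    quality_flags: List[str] = field(default_factory=list)  # e.g. "lost_tables"
    critical_ambiguity_after_cheap_pass: bool = False


def choose_tier(report: ExtractionReport) -> int:
    """Return the cheapest tier the Core Policy allows for this report."""
    if not report.quality_flags:
        return 1  # Tier 1: the deterministic converter plus index is enough
    if not report.critical_ambiguity_after_cheap_pass:
        return 2  # Tier 2: cheap model, on the extracted artifact only
    return 3  # Tier 3: expensive model, still only on the compressed artifact
```

Ordering the checks from cheapest to most expensive mirrors the policy's "by cost tier, not instinct" rule: a higher tier is only reachable when every cheaper condition has failed.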

skills/heavy-file-ingestion/variants/claude-code/SKILL.md

Lines changed: 6 additions & 7 deletions
@@ -20,14 +20,14 @@ Claude Code has the tools to convert files locally, so it should not waste conte
 ## Process
 
 1. Do not read the original heavyweight file directly into context if conversion is possible.
-2. Resolve the bundled converter relative to this skill directory: `scripts/convert_heavy_file.py`
-3. Run the converter first. Default command:
+1. Resolve the bundled converter relative to this skill directory: `scripts/convert_heavy_file.py`
+1. Run the converter first. Default command:
 
 ```bash
 python scripts/convert_heavy_file.py /absolute/path/to/file.ext
 ```
 
-4. If dependencies are missing, prefer:
+1. If dependencies are missing, prefer:
 
 ```bash
 uv run \
@@ -38,12 +38,12 @@ uv run \
 python scripts/convert_heavy_file.py /absolute/path/to/file.ext
 ```
 
-5. Read the generated `index.md` before reading any converted artifact.
-6. Use the index to decide the cheapest next step:
+1. Read the generated `index.md` before reading any converted artifact.
+2. Use the index to decide the cheapest next step:
 - `read_extracted_artifact`: read the markdown or CSV and continue
 - `install_dependency_and_retry`: install the missing deterministic dependency and rerun
 - `cheap_model_or_stronger_converter`: retry with a better converter or use a cheaper model only on the extracted artifact
-7. Only escalate to a stronger model after the file has already been compressed into markdown, CSV, or a short sampled subset.
+3. Only escalate to a stronger model after the file has already been compressed into markdown, CSV, or a short sampled subset.
 
 ## Client Rules
 
@@ -55,4 +55,3 @@ uv run \
 ## Bundled References
 
 - `references/open-source-stack.md` explains the tool choices and fallback strategy.
-
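The Claude Code variant resolves `scripts/convert_heavy_file.py` relative to the skill directory. A Python sketch of building that command as an argument list, assuming the caller already knows the skill directory; `build_converter_command` is a hypothetical helper, and the only flag it uses, `--prefer`, is the one the README and SKILL.md document:

```python
import sys
from pathlib import Path
from typing import List, Optional


def build_converter_command(
    skill_dir: Path, target: Path, prefer: Optional[str] = None
) -> List[str]:
    """Build argv for the bundled converter (hypothetical helper, not in the skill)."""
    # The variant says to resolve the script relative to the skill directory.
    script = skill_dir / "scripts" / "convert_heavy_file.py"
    # The documented commands pass an absolute path to the input file.
    cmd = [sys.executable, str(script), str(target.resolve())]
    if prefer is not None:
        cmd += ["--prefer", prefer]  # e.g. "markitdown", per the README
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths containing spaces. `sys.executable` reuses the current interpreter; when dependencies are missing, the variant's `uv run` form is the documented alternative.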
skills/heavy-file-ingestion/variants/claude-desktop/SKILL.md

Lines changed: 4 additions & 5 deletions
@@ -20,11 +20,11 @@ Claude Desktop does not have the same local shell workflow as coding agents, so
 ## Process
 
 1. Do not ingest the raw heavyweight file by default.
-2. First ask for the cheapest workable artifact:
+1. First ask for the cheapest workable artifact:
 - PDF or DOCX: markdown
 - PPTX: markdown slide outline
 - XLSX: CSV per sheet or a small sample plus sheet names
-3. If the user has not converted it yet, offer exact commands they can run outside Claude Desktop.
+1. If the user has not converted it yet, offer exact commands they can run outside Claude Desktop.
 
 ### Suggested Conversion Commands
 
@@ -41,16 +41,15 @@ If the script is not available, say so and ask the user for:
 - a CSV export
 - or a small representative excerpt
 
-4. Once the user provides the converted artifact, create a quick index:
+1. Once the user provides the converted artifact, create a quick index:
 - file type
 - sections, slides, or sheet names
 - row counts or page counts if available
 - any obvious extraction-quality problems
-5. Only then analyze the content.
+2. Only then analyze the content.
 
 ## Client Rules
 
 - Be explicit about the tradeoff: converting first is cheaper and usually better.
 - If the user insists on staying inside Claude Desktop, ask for a smaller excerpt rather than taking the whole file raw.
 - Use raw ingestion only for genuinely small files where conversion would cost more effort than it saves.
-
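The Claude Desktop variant asks for a quick index once the user supplies a converted artifact. For a single CSV export, a minimal sketch could record the file type, column names, a row count, and one obvious extraction problem (rows with differing column counts). `quick_index_csv` and its dictionary keys are illustrative assumptions, not a format the skill defines, and the row count assumes the first row is a header:

```python
import csv
import io


def quick_index_csv(name: str, text: str) -> dict:
    """Summarize one CSV export (hypothetical helper; keys are illustrative)."""
    rows = list(csv.reader(io.StringIO(text)))
    problems = []
    if not rows:
        problems.append("empty export")
    elif len({len(row) for row in rows}) > 1:
        # Differing column counts may indicate a broken or truncated export.
        problems.append("rows have differing column counts")
    return {
        "file_type": "csv",
        "name": name,
        "columns": rows[0] if rows else [],
        "row_count": max(len(rows) - 1, 0),  # excludes the assumed header row
        "problems": problems,
    }
```

Only after a summary like this looks clean would the variant move on to analyzing the content itself.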
skills/heavy-file-ingestion/variants/codex/SKILL.md

Lines changed: 5 additions & 6 deletions
@@ -20,13 +20,13 @@ Codex can run local commands and inspect files, so direct ingestion of bulky doc
 ## Process
 
 1. Do not open the raw heavyweight file as your first move if a deterministic conversion path exists.
-2. Run the bundled converter from this skill directory:
+1. Run the bundled converter from this skill directory:
 
 ```bash
 python scripts/convert_heavy_file.py /absolute/path/to/file.ext
 ```
 
-3. If the environment is clean and needs packages, prefer:
+1. If the environment is clean and needs packages, prefer:
 
 ```bash
 uv run \
@@ -37,12 +37,12 @@ uv run \
 python scripts/convert_heavy_file.py /absolute/path/to/file.ext
 ```
 
-4. Read `index.md` first, not the original file.
-5. Follow the index recommendation:
+1. Read `index.md` first, not the original file.
+2. Follow the index recommendation:
 - `read_extracted_artifact`: inspect the generated markdown or CSV
 - `cheap_model_or_stronger_converter`: retry with a better deterministic tool or use a cheaper model on the extracted artifact only
 - `manual_review`: tell the user the deterministic route failed and propose the next cheapest fallback
-6. Use expensive model context only after the file has already been compressed into a smaller artifact.
+3. Use expensive model context only after the file has already been compressed into a smaller artifact.
 
 ## Client Rules
 
@@ -54,4 +54,3 @@ uv run \
 ## Bundled References
 
 - `references/open-source-stack.md` explains the tool choices and fallback tiers.
-
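The Codex and Claude Code variants both route on a recommendation value read from the index (`install_dependency_and_retry` appears only in the Claude Code list, `manual_review` only in the Codex list). Assuming `index.json` stores that value under a `recommendation` key, which the skill files do not confirm, the routing reduces to a lookup:

```python
import json

# Next steps paraphrased from the Codex and Claude Code variants.
NEXT_STEPS = {
    "read_extracted_artifact": "inspect the generated markdown or CSV",
    "install_dependency_and_retry": "install the missing deterministic dependency and rerun",
    "cheap_model_or_stronger_converter": (
        "retry with a better deterministic tool or a cheaper model on the extracted artifact"
    ),
    "manual_review": "report that the deterministic route failed and propose the next cheapest fallback",
}


def next_step_from_index(index_json: str) -> str:
    """Map the index recommendation to an action; unknown values go to manual review."""
    data = json.loads(index_json)
    # "recommendation" is an assumed key name, not confirmed by the skill files.
    recommendation = data.get("recommendation", "manual_review")
    return NEXT_STEPS.get(recommendation, NEXT_STEPS["manual_review"])
```

Falling back to manual review for unknown or missing values keeps the default away from reading the raw file, which is the first rule in every variant.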