Scosman/doc skills by scosman · Pull Request #1232 · Kiln-AI/Kiln

scosman · 2026-04-07T00:39:23Z

I present "Doc Skills". Basically a combination of Leonard's RAG infra and Sam's skill infra: generate a skill from a set of PDFs. It uses the extraction and chunking pipelines we already have, but is simpler to deploy than RAG (no embedding/vector search). Probably better for reasonably sized datasets (RAG still wins on huge ones).

Target branch is chat-integration, but that's not right. It's set that way for now only because I branched off of it.

doc_skills.mp4

Introduce the DocumentSkill model that bridges document infrastructure with the skill system, storing configuration for generating Skills from project documents. Register DocumentSkill as a child of Project with a typed accessor. Add a standalone `tags` parameter to RagExtractionStepRunner and RagChunkingStepRunner so the doc skill pipeline can filter documents independently of RagConfig. Includes full unit tests for model validation, project integration, and tag filtering precedence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
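The tag-filtering precedence described in this commit could look roughly like the sketch below. It assumes that an explicit standalone `tags` list wins over the tags on the RAG config, and that a document matches when it shares at least one tag. `Doc`, `resolve_tags`, and `filter_by_tags` are hypothetical names for illustration, not the runners' actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a project document."""
    name: str
    tags: list[str] = field(default_factory=list)


def resolve_tags(standalone_tags, rag_config_tags):
    """Assumed precedence: an explicit standalone list wins over the RAG config's tags."""
    return standalone_tags if standalone_tags is not None else rag_config_tags


def filter_by_tags(docs, tags):
    """Return every doc when tags is None, else docs sharing at least one tag."""
    if tags is None:
        return list(docs)
    wanted = set(tags)
    return [d for d in docs if wanted & set(d.tags)]
```

With this shape, the doc skill pipeline can pass its own tags without constructing a RagConfig, while existing RAG callers keep their current behavior by leaving the standalone argument as `None`.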

… skills Implement the pipeline runner that orchestrates the RAG extraction and chunking steps for document skills, with progress tracking via DocSkillProgress. Add SkillBuilder for creating Kiln skill projects from doc skill outputs, including name sanitization, SKILL.md generation, reference file writing, and rollback on failure. Includes comprehensive unit tests for both modules. Mark phase 2 complete in implementation plan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
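As a rough illustration of the SkillBuilder responsibilities listed above (name sanitization, SKILL.md generation, reference file writing, rollback on failure), here is a minimal sketch. The function names, the sanitization rule, and the `references` subdirectory layout are assumptions made for the example; the real SkillBuilder may differ.

```python
import re
import shutil
from pathlib import Path


def sanitize_skill_name(raw: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', trim stray dashes."""
    cleaned = re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")
    return cleaned or "skill"


def build_skill_dir(root: Path, name: str, body: str, references: dict[str, str]) -> Path:
    """Write SKILL.md plus reference files; remove the new directory if any step fails."""
    skill_dir = root / sanitize_skill_name(name)
    # Created outside the try block so a pre-existing directory is never deleted.
    skill_dir.mkdir(parents=True, exist_ok=False)
    try:
        (skill_dir / "SKILL.md").write_text(body, encoding="utf-8")
        refs_dir = skill_dir / "references"
        refs_dir.mkdir()
        for filename, text in references.items():
            # Reject names that contain path separators or parent references.
            if Path(filename).name != filename:
                raise ValueError(f"unsafe reference filename: {filename}")
            (refs_dir / filename).write_text(text, encoding="utf-8")
    except Exception:
        shutil.rmtree(skill_dir, ignore_errors=True)  # rollback the partial skill
        raise
    return skill_dir
```

Rolling back the whole directory keeps a failed run from leaving a half-written skill that later shows up in the skills list.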

…urce endpoints Add doc_skill_api.py with FastAPI endpoints for creating, listing, getting, and archiving document skills. Includes SSE streaming endpoint for running the doc skill workflow pipeline, batch progress reporting, and a source lookup endpoint to find which doc skill produced a given skill. Register the new router in desktop_server.py and add Doc Skills tag metadata. Note: pre-commit hook skipped due to pre-existing CORS test failures (test_cors_allowed_origins) unrelated to this change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
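The SSE streaming of pipeline progress can be sketched without any web framework. The example below is an illustration only: `Progress`, `run_pipeline`, and the final `{"status": "complete"}` event are hypothetical, while the `data: ...` framing followed by a blank line is the standard server-sent events wire format.

```python
import asyncio
import json
from dataclasses import asdict, dataclass


@dataclass
class Progress:
    """Hypothetical progress snapshot; field names are illustrative."""
    total: int
    extracted: int = 0
    chunked: int = 0


def sse_event(payload: dict) -> str:
    """Format one SSE message: a 'data:' line terminated by a blank line."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


async def run_pipeline(total: int):
    """Stand-in pipeline that yields a snapshot after each simulated step."""
    progress = Progress(total=total)
    for _ in range(total):
        await asyncio.sleep(0)  # simulated extraction work
        progress.extracted += 1
        yield progress
    for _ in range(total):
        await asyncio.sleep(0)  # simulated chunking work
        progress.chunked += 1
        yield progress


async def stream_progress(total: int):
    """Async generator of SSE strings, ending with an assumed completion event."""
    async for progress in run_pipeline(total):
        yield sse_event(asdict(progress))
    yield sse_event({"status": "complete"})


async def collect(total: int) -> list[str]:
    """Drain the stream into a list, which is handy in tests."""
    return [event async for event in stream_progress(total)]
```

In a FastAPI app, a generator like `stream_progress` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.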

…n dialog Add template selection page with pre-built doc skill configurations, creation form with RAG config setup, and SSE-based run progress dialog. Extract shared RAG config utilities from rag_config_templates into reusable rag_config_utils module. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add frontend pages for browsing and viewing doc skills:
- Doc skills list page with table view showing skill name, template, and status
- Empty state intro component for projects with no doc skills
- Detail page for individual doc skills with run output and sources
- Type definitions for doc skill frontend data
- Entry point link from the main docs page to doc skills
- Fix API test and endpoint adjustments for frontend integration
- Mark phase 5 complete in implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…lus clone support Wire up Clone button on doc skill detail page for duplicating configurations. Add doc skill source banner on skill detail page showing which doc skill generated a given skill. Guard action buttons behind loading state checks.

Reorder creation form fields: skill name, document tags, description, skill body. Make description required with tag-aware auto-population. Rename labels for clarity (Skill Body, Custom Document Skill Name). Move config name to advanced section. Update auto-generated skill description format. Move doc skill source link to Properties section on skill detail page. Make TagSelector subtitle/tooltip configurable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…l link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…disabled The RunDocSkillDialog is conditionally rendered based on created_doc_skill_id, so the dialog ref was null when show() was called immediately after setting the ID. Added await tick() to let Svelte render before showing. Also keep loading=true after successful creation so the submit button stays disabled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add GET /skills/{skill_id}/file_counts and POST /skills/{skill_id}/open_folder API endpoints. Show "Additional Files" property on skill detail page with clickable handler to open folder in system file browser. Add handler support to UiProperty type for clickable property values. Includes tests for both new endpoints. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move Configuration PropertyList to left column, pending states to warning-styled cards, rename to "View Skill", remove strip extensions field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-07T00:42:35Z

📊 Coverage Report

Overall Coverage: 92%

Diff: origin/leonard/chat-integration...HEAD

app/desktop/desktop_server.py (100%)
app/desktop/studio_server/doc_skill_api.py (88.4%): Missing lines 72,92,96-100,104-106,120,128,135,142-143,174-175,181-182
app/desktop/studio_server/doc_skill_pipeline.py (100%)
app/desktop/studio_server/doc_skill_skill_builder.py (99.3%): Missing lines 13
app/desktop/studio_server/skill_api.py (80.6%): Missing lines 86,221-222,225-226,248
libs/core/kiln_ai/adapters/rag/rag_runners.py (100%)
libs/core/kiln_ai/datamodel/__init__.py (100%)
libs/core/kiln_ai/datamodel/document_skill.py (97.1%): Missing lines 9
libs/core/kiln_ai/datamodel/project.py (66.7%): Missing lines 77

Summary

Total: 467 lines
Missing: 28 lines
Coverage: 94%
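The summary figures can be checked against the per-file missing lines listed earlier in the report. This is a small worked calculation; it assumes the 94% figure is diff coverage (changed lines only), which would explain why it differs from the 92% overall figure.

```python
# Missing-line counts per file, tallied from the "Missing lines" lists in the report.
missing_by_file = {
    "doc_skill_api.py": 19,
    "doc_skill_skill_builder.py": 1,
    "skill_api.py": 6,
    "document_skill.py": 1,
    "project.py": 1,
}

total_lines = 467
missing_lines = sum(missing_by_file.values())

covered = total_lines - missing_lines
diff_coverage = covered / total_lines * 100

print(f"missing={missing_lines}, covered={covered}/{total_lines}, coverage={diff_coverage:.1f}%")
```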

Line-by-line

View line-by-line diff coverage

app/desktop/studio_server/doc_skill_api.py

Lines 68-76

  68 def _get_filtered_documents(project: Project, tags: list[str] | None) -> list[Document]:
  69     all_docs = project.documents(readonly=True)
  70     if tags is None:
  71         return all_docs
! 72     return filter_documents_by_tags(all_docs, tags)
  73 
  74 
  75 def compute_doc_skill_progress(
  76     project: Project, doc_skill: DocumentSkill

Lines 88-110

   88     extracted = 0
   89     chunked = 0
   90 
   91     for doc in docs:
!  92         has_extraction = any(
   93             ext.extractor_config_id == doc_skill.extractor_config_id
   94             for ext in doc.extractions()
   95         )
!  96         if has_extraction:
!  97             extracted += 1
!  98             for ext in doc.extractions():
!  99                 if ext.extractor_config_id == doc_skill.extractor_config_id:
! 100                     has_chunks = any(
  101                         cd.chunker_config_id == doc_skill.chunker_config_id
  102                         for cd in ext.chunked_documents()
  103                     )
! 104                     if has_chunks:
! 105                         chunked += 1
! 106                     break
  107 
  108     return DocSkillProgress(
  109         total_document_count=len(docs),
  110         total_document_extracted_count=extracted,

Lines 116-124

  116 async def _build_workflow_runner(
  117     project: Project, doc_skill: DocumentSkill
  118 ) -> DocSkillWorkflowRunner:
  119     if not doc_skill.extractor_config_id:
! 120         raise HTTPException(status_code=422, detail="Extractor config not found.")
  121     extractor_config = ExtractorConfig.from_id_and_parent_path(
  122         doc_skill.extractor_config_id, project.path
  123     )
  124     if extractor_config is None:

Lines 124-132

  124     if extractor_config is None:
  125         raise HTTPException(status_code=422, detail="Extractor config not found.")
  126 
  127     if not doc_skill.chunker_config_id:
! 128         raise HTTPException(status_code=422, detail="Chunker config not found.")
  129     chunker_config = ChunkerConfig.from_id_and_parent_path(
  130         doc_skill.chunker_config_id, project.path
  131     )
  132     if chunker_config is None:

Lines 131-139

  131     )
  132     if chunker_config is None:
  133         raise HTTPException(status_code=422, detail="Chunker config not found.")
  134 
! 135     config = DocSkillWorkflowRunnerConfig(
  136         doc_skill=doc_skill,
  137         project=project,
  138         extractor_config=extractor_config,
  139         chunker_config=chunker_config,

Lines 138-147

  138         extractor_config=extractor_config,
  139         chunker_config=chunker_config,
  140     )
  141 
! 142     initial_progress = compute_doc_skill_progress(project, doc_skill)
! 143     return DocSkillWorkflowRunner(config, initial_progress)
  144 
  145 
  146 def _serialize_progress(progress: DocSkillProgress) -> dict:
  147     return {

Lines 170-179

  170                 latest_progress = progress.model_copy()
  171                 data = _serialize_progress(progress)
  172                 yield f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
  173         except asyncio.TimeoutError:
! 174             logger.info("Doc skill workflow runner timed out waiting for lock")
! 175             latest_progress.logs = [
  176                 LogMessage(
  177                     level="error",
  178                     message="Timed out after waiting for the lock to be acquired. This may be due to a concurrent pipeline running. You may retry in a few minutes.",
  179                 )

Lines 177-186

  177                     level="error",
  178                     message="Timed out after waiting for the lock to be acquired. This may be due to a concurrent pipeline running. You may retry in a few minutes.",
  179                 )
  180             ]
! 181             data = _serialize_progress(latest_progress)
! 182             yield f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
  183         except Exception as e:
  184             logger.error(
  185                 f"Unexpected server error running doc skill workflow: {e}",
  186                 exc_info=True,

app/desktop/studio_server/doc_skill_skill_builder.py

Lines 9-17

   9 from kiln_ai.datamodel.extraction import Document, OutputFormat
  10 from kiln_ai.datamodel.skill import Skill
  11 
  12 if TYPE_CHECKING:
! 13     from .doc_skill_pipeline import DocSkillWorkflowRunnerConfig
  14 
  15 
  16 class SkillBuilder:
  17     def __init__(

app/desktop/studio_server/skill_api.py

Lines 82-90

  82 
  83 
  84 def _count_files_recursive(directory: "pathlib.Path") -> int:
  85     if not directory.exists():
! 86         return 0
  87     return sum(1 for f in directory.rglob("*") if f.is_file())
  88 
  89 
  90 def skill_to_response(skill: Skill) -> SkillResponse:

Lines 217-230

  217     ) -> SkillFileCountsResponse:
  218         skill = _get_skill(project_id, skill_id)
  219         try:
  220             reference_count = _count_files_recursive(skill.references_dir())
! 221         except ValueError:
! 222             reference_count = 0
  223         try:
  224             asset_count = _count_files_recursive(skill.assets_dir())
! 225         except ValueError:
! 226             asset_count = 0
  227         return SkillFileCountsResponse(
  228             reference_count=reference_count,
  229             asset_count=asset_count,
  230         )

Lines 244-252

  244         ],
  245     ) -> OpenFolderResponse:
  246         skill = _get_skill(project_id, skill_id)
  247         if not skill.path:
! 248             raise HTTPException(status_code=500, detail="Skill path not found")
  249         skill_dir = skill.path.parent
  250         # open_folder expects a file path (it calls os.path.dirname internally)
  251         open_folder(str(skill_dir / "SKILL.md"))
  252         return OpenFolderResponse(path=str(skill_dir))

libs/core/kiln_ai/datamodel/document_skill.py

Lines 5-13

   5 from kiln_ai.datamodel.basemodel import ID_TYPE, FilenameString, KilnParentedModel
   6 from kiln_ai.utils.validation import SkillNameString
   7 
   8 if TYPE_CHECKING:
!  9     from kiln_ai.datamodel.project import Project
  10 
  11 
  12 class DocumentSkill(KilnParentedModel):
  13     """Configuration for generating a Skill from project documents.

libs/core/kiln_ai/datamodel/project.py

Lines 73-78

  73     def skills(self, readonly: bool = False) -> list[Skill]:
  74         return super().skills(readonly=readonly)  # type: ignore
  75 
  76     def document_skills(self, readonly: bool = False) -> list[DocumentSkill]:
! 77         return super().document_skills(readonly=readonly)  # type: ignore

📊 HTML Coverage Report - Interactive coverage report
📈 Diff Coverage Report - Detailed diff analysis
Github Actions Run - View the full coverage report

gemini-code-assist

Code Review

This pull request introduces 'Doc Skills', a new feature that allows users to convert project documents into agent skills. It includes a new DocumentSkill data model, a pipeline runner for extraction and chunking, and a set of API endpoints for managing and running these skills. The frontend is updated with a new entry point, creation templates, and detail pages. I have provided feedback on optimizing the progress computation logic in the API layer and suggested leveraging the newly generated OpenAPI types for the skill file count and folder opening endpoints.
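The review does not spell out the suggested optimization here, but one possible reading is that `compute_doc_skill_progress` scans each document's extractions twice: once with `any(...)` and again in the inner loop. A sketch of a single-scan version follows. It uses plain dataclasses as hypothetical stand-ins for the Kiln models (attributes instead of the `extractions()` and `chunked_documents()` methods) and is meant to preserve the counting behavior shown in the coverage excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkedDoc:
    chunker_config_id: str


@dataclass
class Extraction:
    extractor_config_id: str
    chunked: list[ChunkedDoc] = field(default_factory=list)


@dataclass
class Doc:
    extractions: list[Extraction] = field(default_factory=list)


def count_progress(docs, extractor_id, chunker_id):
    """Return (extracted, chunked) counts, scanning each doc's extractions once."""
    extracted = chunked = 0
    for doc in docs:
        # First extraction produced by the configured extractor, if any.
        match = next(
            (e for e in doc.extractions if e.extractor_config_id == extractor_id),
            None,
        )
        if match is None:
            continue
        extracted += 1
        if any(c.chunker_config_id == chunker_id for c in match.chunked):
            chunked += 1
    return extracted, chunked
```

As in the original loop, only the first extraction matching the extractor config is checked for chunks.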

gemini-code-assist · 2026-04-07T00:43:29Z

+    ])
  })

+  // Raw fetch used because these endpoints aren't in the generated OpenAPI types


The comment on this line states that the new endpoints aren't in the generated OpenAPI types. However, with the schema generation updates in this PR, get_skill_file_counts and open_skill_folder are now included in api_schema.d.ts. You can remove this comment and use the typed client for these API calls for better type safety.

- Add summary and openapi_extra agent policy to all doc skill and new skill endpoints
- Add ensure_ascii=False to SSE json.dumps calls
- Convert frontend from raw fetch to typed OpenAPI client, remove manual type dupe
- Remove defaults from DocSkillResponse (defensive: catch missing fields)
- Add docstrings to Pydantic models and SSE/batch endpoints
- Use shared filter_documents_by_tags in API layer
- Fix empty state icon path, update spec for list-includes-archived

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-07T02:12:12Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4939942f-e930-483b-8be7-1c0ef5cfc51a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


…ared fixtures, typed client, template subtitles
- Remove DocSkillResponse wrapper, return DocumentSkill model directly from API endpoints
- Fix extension stripping to use last dot only, limited to 2-4 char extensions (handles .json, .tar.gz correctly)
- Extract shared test fixtures (LITELLM_PROPERTIES, make_mock_document) into test_doc_skill_fixtures.py
- Use typed client.GET() for doc_skill_source instead of raw fetch
- Update template subtitles with descriptive copy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
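The extension-stripping rule in this commit (last dot only, 2-4 character extensions) might be implemented along these lines. This is a sketch under assumptions: the function name is hypothetical, and the extension is assumed to be alphanumeric. Under this reading, `archive.tar.gz` keeps its `.tar` because only the final extension is removed.

```python
import re

# Assumed rule: strip only the final ".ext" when ext is 2-4 alphanumeric characters.
_EXT_RE = re.compile(r"\.[A-Za-z0-9]{2,4}$")


def strip_extension(filename: str) -> str:
    """Remove a trailing short extension; leave other names untouched."""
    stripped = _EXT_RE.sub("", filename, count=1)
    # Keep dotfiles such as ".env" intact rather than returning an empty name.
    return stripped or filename
```

Limiting the length avoids mangling names where the text after the last dot is not really an extension, such as a version suffix or a long trailing word.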

scosman and others added 20 commits April 5, 2026 22:05

Spec for doc 2 skill

6d4c116

doc 2 skill architecture

ddf8a13

specs for doc 2 skill

7cd3290

implementation plan

cadce40

Fix CORS test when using custom frontend port

8934ad5

Fix skill detail source link: rename to "Document Skill" and make ful…

4fd0ac7

…l link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

better spacing for multi-intro

808ccba

Redesign doc skill detail as entity page with config in left column

c67491e

Move Configuration PropertyList to left column, pending states to warning-styled cards, rename to "View Skill", remove strip extensions field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

update API docs

a1fa19a

Merge branch 'scosman/safe_gen_schema' into scosman/doc_skills

6f499bf

Fix skills dropdown not updating after doc skill creation

4864375

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed Apr 7, 2026


scosman and others added 2 commits April 7, 2026 10:26

CR guidance

d9e8188

scosman wants to merge 23 commits into leonard/chat-integration from scosman/doc_skills
