Scosman/doc skills#1232
Conversation
Introduce the DocumentSkill model that bridges document infrastructure with the skill system, storing configuration for generating Skills from project documents. Register DocumentSkill as a child of Project with a typed accessor. Add a standalone `tags` parameter to RagExtractionStepRunner and RagChunkingStepRunner so the doc skill pipeline can filter documents independently of RagConfig. Includes full unit tests for model validation, project integration, and tag filtering precedence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… skills Implement the pipeline runner that orchestrates the RAG extraction and chunking steps for document skills, with progress tracking via DocSkillProgress. Add SkillBuilder for creating Kiln skill projects from doc skill outputs, including name sanitization, SKILL.md generation, reference file writing, and rollback on failure. Includes comprehensive unit tests for both modules. Mark phase 2 complete in implementation plan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
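Here is a rough sketch of the SkillBuilder flow described above: sanitize the name, write SKILL.md and the reference files, and roll back on failure. Several details are assumptions for illustration only. These are the naming rule (lowercase, hyphen-separated, at most 64 characters), the YAML-style front matter in SKILL.md, and the `references/` folder name. Kiln's actual `SkillNameString` rules and file layout may differ, and `sanitize_skill_name` and `build_skill_dir` are hypothetical.

```python
import re
import shutil
from pathlib import Path


def sanitize_skill_name(raw: str, max_len: int = 64) -> str:
    """Hypothetical rule: lowercase, collapse runs of non [a-z0-9] into '-', trim stray hyphens."""
    name = re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")
    name = name[:max_len].rstrip("-")
    if not name:
        raise ValueError(f"Cannot derive a skill name from {raw!r}")
    return name


def build_skill_dir(
    root: Path, raw_name: str, description: str, body: str, references: dict[str, str]
) -> Path:
    """Write SKILL.md plus reference files; delete the partial directory if any write fails."""
    skill_dir = root / sanitize_skill_name(raw_name)
    skill_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing skill folder
    try:
        front_matter = f"---\nname: {skill_dir.name}\ndescription: {description}\n---\n\n"
        (skill_dir / "SKILL.md").write_text(front_matter + body, encoding="utf-8")
        ref_dir = skill_dir / "references"
        ref_dir.mkdir()
        for filename, text in references.items():
            (ref_dir / filename).write_text(text, encoding="utf-8")
    except Exception:
        # Roll back so a failed build leaves no half-written skill behind, then re-raise
        shutil.rmtree(skill_dir, ignore_errors=True)
        raise
    return skill_dir
```

The directory is created with `exist_ok=False` before the `try`. As a result, rollback only deletes a folder that this call created itself.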
…urce endpoints Add doc_skill_api.py with FastAPI endpoints for creating, listing, getting, and archiving document skills. Includes SSE streaming endpoint for running the doc skill workflow pipeline, batch progress reporting, and a source lookup endpoint to find which doc skill produced a given skill. Register the new router in desktop_server.py and add Doc Skills tag metadata. Note: pre-commit hook skipped due to pre-existing CORS test failures (test_cors_allowed_origins) unrelated to this change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
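The streaming endpoint emits Server-Sent Events of the form `data: <json>\n\n`. The coverage report shows that a lock timeout is reported as a final error log instead of an HTTP error. Below is a minimal, stdlib-only sketch of that pattern. It assumes progress arrives as plain dicts and that a single `asyncio.Lock` serializes pipeline runs. `sse_frame` and `stream_progress` are hypothetical names and are not the code in doc_skill_api.py.

```python
import asyncio
import json
from collections.abc import AsyncIterator


def sse_frame(payload: dict) -> str:
    """Encode one SSE message; ensure_ascii=False keeps non-ASCII text readable."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


async def stream_progress(
    lock: asyncio.Lock, updates: AsyncIterator[dict], lock_timeout: float
) -> AsyncIterator[str]:
    """Hypothetical sketch: run one pipeline at a time and stream each progress update as SSE."""
    latest: dict = {}
    try:
        await asyncio.wait_for(lock.acquire(), timeout=lock_timeout)
    except asyncio.TimeoutError:
        # Tell the client why nothing ran, as a final progress message with an error log
        latest["logs"] = [{"level": "error", "message": "Timed out waiting for the lock."}]
        yield sse_frame(latest)
        return
    try:
        async for update in updates:
            latest = dict(update)
            yield sse_frame(latest)
    finally:
        lock.release()
```

In FastAPI, a generator like this would usually be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.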
…n dialog Add template selection page with pre-built doc skill configurations, creation form with RAG config setup, and SSE-based run progress dialog. Extract shared RAG config utilities from rag_config_templates into reusable rag_config_utils module. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add frontend pages for browsing and viewing doc skills:
- Doc skills list page with table view showing skill name, template, and status
- Empty state intro component for projects with no doc skills
- Detail page for individual doc skills with run output and sources
- Type definitions for doc skill frontend data
- Entry point link from the main docs page to doc skills
- Fix API test and endpoint adjustments for frontend integration
- Mark phase 5 complete in implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lus clone support Wire up Clone button on doc skill detail page for duplicating configurations. Add doc skill source banner on skill detail page showing which doc skill generated a given skill. Guard action buttons behind loading state checks.
Reorder creation form fields: skill name, document tags, description, skill body. Make description required with tag-aware auto-population. Rename labels for clarity (Skill Body, Custom Document Skill Name). Move config name to advanced section. Update auto-generated skill description format. Move doc skill source link to Properties section on skill detail page. Make TagSelector subtitle/tooltip configurable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…disabled The RunDocSkillDialog is conditionally rendered based on created_doc_skill_id, so the dialog ref was null when show() was called immediately after setting the ID. Added await tick() to let Svelte render before showing. Also keep loading=true after successful creation so the submit button stays disabled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add GET /skills/{skill_id}/file_counts and POST /skills/{skill_id}/open_folder
API endpoints. Show "Additional Files" property on skill detail page with
clickable handler to open folder in system file browser. Add handler support
to UiProperty type for clickable property values. Includes tests for both
new endpoints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move Configuration PropertyList to left column, pending states to warning-styled cards, rename to "View Skill", remove strip extensions field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📊 Coverage Report
Overall Coverage: 92%
Diff: origin/leonard/chat-integration...HEAD
Line-by-line diff coverage

app/desktop/studio_server/doc_skill_api.py
Lines 68-76
68 def _get_filtered_documents(project: Project, tags: list[str] | None) -> list[Document]:
69 all_docs = project.documents(readonly=True)
70 if tags is None:
71 return all_docs
! 72 return filter_documents_by_tags(all_docs, tags)
73
74
75 def compute_doc_skill_progress(
76 project: Project, doc_skill: DocumentSkill
Lines 88-110
88 extracted = 0
89 chunked = 0
90
91 for doc in docs:
! 92 has_extraction = any(
93 ext.extractor_config_id == doc_skill.extractor_config_id
94 for ext in doc.extractions()
95 )
! 96 if has_extraction:
! 97 extracted += 1
! 98 for ext in doc.extractions():
! 99 if ext.extractor_config_id == doc_skill.extractor_config_id:
! 100 has_chunks = any(
101 cd.chunker_config_id == doc_skill.chunker_config_id
102 for cd in ext.chunked_documents()
103 )
! 104 if has_chunks:
! 105 chunked += 1
! 106 break
107
108 return DocSkillProgress(
109 total_document_count=len(docs),
110 total_document_extracted_count=extracted,
Lines 116-124
116 async def _build_workflow_runner(
117 project: Project, doc_skill: DocumentSkill
118 ) -> DocSkillWorkflowRunner:
119 if not doc_skill.extractor_config_id:
! 120 raise HTTPException(status_code=422, detail="Extractor config not found.")
121 extractor_config = ExtractorConfig.from_id_and_parent_path(
122 doc_skill.extractor_config_id, project.path
123 )
124 if extractor_config is None:
Lines 124-132
124 if extractor_config is None:
125 raise HTTPException(status_code=422, detail="Extractor config not found.")
126
127 if not doc_skill.chunker_config_id:
! 128 raise HTTPException(status_code=422, detail="Chunker config not found.")
129 chunker_config = ChunkerConfig.from_id_and_parent_path(
130 doc_skill.chunker_config_id, project.path
131 )
132 if chunker_config is None:
Lines 131-139
131 )
132 if chunker_config is None:
133 raise HTTPException(status_code=422, detail="Chunker config not found.")
134
! 135 config = DocSkillWorkflowRunnerConfig(
136 doc_skill=doc_skill,
137 project=project,
138 extractor_config=extractor_config,
139 chunker_config=chunker_config,
Lines 138-147
138 extractor_config=extractor_config,
139 chunker_config=chunker_config,
140 )
141
! 142 initial_progress = compute_doc_skill_progress(project, doc_skill)
! 143 return DocSkillWorkflowRunner(config, initial_progress)
144
145
146 def _serialize_progress(progress: DocSkillProgress) -> dict:
147 return {
Lines 170-179
170 latest_progress = progress.model_copy()
171 data = _serialize_progress(progress)
172 yield f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
173 except asyncio.TimeoutError:
! 174 logger.info("Doc skill workflow runner timed out waiting for lock")
! 175 latest_progress.logs = [
176 LogMessage(
177 level="error",
178 message="Timed out after waiting for the lock to be acquired. This may be due to a concurrent pipeline running. You may retry in a few minutes.",
179 )
Lines 177-186
177 level="error",
178 message="Timed out after waiting for the lock to be acquired. This may be due to a concurrent pipeline running. You may retry in a few minutes.",
179 )
180 ]
! 181 data = _serialize_progress(latest_progress)
! 182 yield f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
183 except Exception as e:
184 logger.error(
185 f"Unexpected server error running doc skill workflow: {e}",
186 exc_info=True,

app/desktop/studio_server/doc_skill_skill_builder.py
Lines 9-17
9 from kiln_ai.datamodel.extraction import Document, OutputFormat
10 from kiln_ai.datamodel.skill import Skill
11
12 if TYPE_CHECKING:
! 13 from .doc_skill_pipeline import DocSkillWorkflowRunnerConfig
14
15
16 class SkillBuilder:
17 def __init__(

app/desktop/studio_server/skill_api.py
Lines 82-90
82
83
84 def _count_files_recursive(directory: "pathlib.Path") -> int:
85 if not directory.exists():
! 86 return 0
87 return sum(1 for f in directory.rglob("*") if f.is_file())
88
89
90 def skill_to_response(skill: Skill) -> SkillResponse:
Lines 217-230
217 ) -> SkillFileCountsResponse:
218 skill = _get_skill(project_id, skill_id)
219 try:
220 reference_count = _count_files_recursive(skill.references_dir())
! 221 except ValueError:
! 222 reference_count = 0
223 try:
224 asset_count = _count_files_recursive(skill.assets_dir())
! 225 except ValueError:
! 226 asset_count = 0
227 return SkillFileCountsResponse(
228 reference_count=reference_count,
229 asset_count=asset_count,
230 )
Lines 244-252
244 ],
245 ) -> OpenFolderResponse:
246 skill = _get_skill(project_id, skill_id)
247 if not skill.path:
! 248 raise HTTPException(status_code=500, detail="Skill path not found")
249 skill_dir = skill.path.parent
250 # open_folder expects a file path (it calls os.path.dirname internally)
251 open_folder(str(skill_dir / "SKILL.md"))
252 return OpenFolderResponse(path=str(skill_dir))

libs/core/kiln_ai/datamodel/document_skill.py
Lines 5-13
5 from kiln_ai.datamodel.basemodel import ID_TYPE, FilenameString, KilnParentedModel
6 from kiln_ai.utils.validation import SkillNameString
7
8 if TYPE_CHECKING:
! 9 from kiln_ai.datamodel.project import Project
10
11
12 class DocumentSkill(KilnParentedModel):
13 """Configuration for generating a Skill from project documents.

libs/core/kiln_ai/datamodel/project.py
Lines 73-78
73 def skills(self, readonly: bool = False) -> list[Skill]:
74 return super().skills(readonly=readonly) # type: ignore
75
76 def document_skills(self, readonly: bool = False) -> list[DocumentSkill]:
! 77 return super().document_skills(readonly=readonly) # type: ignore
Code Review
This pull request introduces 'Doc Skills', a new feature that allows users to convert project documents into agent skills. It includes a new DocumentSkill data model, a pipeline runner for extraction and chunking, and a set of API endpoints for managing and running these skills. The frontend is updated with a new entry point, creation templates, and detail pages. I have provided feedback on optimizing the progress computation logic in the API layer and suggested leveraging the newly generated OpenAPI types for the skill file count and folder opening endpoints.
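This responds to the review's suggestion about optimizing the progress computation. In the coverage report, `compute_doc_skill_progress` calls `doc.extractions()` twice per document. The single-pass sketch that follows uses hypothetical in-memory stand-ins (`DocStub`, `ExtractionStub`, `ChunkedStub`), where extractions and chunked documents are plain lists instead of Kiln's loaded child models. It tries to keep the counting rule as far as that rule can be read from the flattened listing: a document counts as chunked if any extraction from the doc skill's extractor has a chunked document from its chunker.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkedStub:
    chunker_config_id: str


@dataclass
class ExtractionStub:
    extractor_config_id: str
    chunked: list[ChunkedStub] = field(default_factory=list)


@dataclass
class DocStub:
    extractions: list[ExtractionStub] = field(default_factory=list)


@dataclass
class ProgressStub:
    total: int
    extracted: int
    chunked: int


def compute_progress(docs: list[DocStub], extractor_id: str, chunker_id: str) -> ProgressStub:
    """Count documents extracted with extractor_id, and those also chunked with chunker_id."""
    extracted = chunked = 0
    for doc in docs:
        # Read each document's extractions once, keeping only this doc skill's extractor
        matching = [e for e in doc.extractions if e.extractor_config_id == extractor_id]
        if not matching:
            continue
        extracted += 1
        if any(c.chunker_config_id == chunker_id for e in matching for c in e.chunked):
            chunked += 1
    return ProgressStub(total=len(docs), extracted=extracted, chunked=chunked)
```

How much this helps depends on the cost of `extractions()` in practice. If that call reads child files from disk, halving the reads per document is the main gain.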
])
})

// Raw fetch used because these endpoints aren't in the generated OpenAPI types
The comment on this line states that the new endpoints aren't in the generated OpenAPI types. However, with the schema generation updates in this PR, get_skill_file_counts and open_skill_folder are now included in api_schema.d.ts. You can remove this comment and use the typed client for these API calls for better type safety.
- Add summary and openapi_extra agent policy to all doc skill and new skill endpoints
- Add ensure_ascii=False to SSE json.dumps calls
- Convert frontend from raw fetch to typed OpenAPI client, remove manual type dupe
- Remove defaults from DocSkillResponse (defensive: catch missing fields)
- Add docstrings to Pydantic models and SSE/batch endpoints
- Use shared filter_documents_by_tags in API layer
- Fix empty state icon path, update spec for list-includes-archived

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch.
Configuration used: .coderabbit.yaml (review profile: CHILL, plan: Pro)
…ared fixtures, typed client, template subtitles
- Remove DocSkillResponse wrapper, return DocumentSkill model directly from API endpoints
- Fix extension stripping to use last dot only, limited to 2-4 char extensions (handles .json, .tar.gz correctly)
- Extract shared test fixtures (LITELLM_PROPERTIES, make_mock_document) into test_doc_skill_fixtures.py
- Use typed client.GET() for doc_skill_source instead of raw fetch
- Update template subtitles with descriptive copy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
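The extension-stripping fix reads as follows: remove only the text after the last dot, and only when that suffix is 2-4 characters long. The sketch below assumes the suffix must also be alphanumeric, and it takes "handles .tar.gz correctly" to mean that only `.gz` is dropped. `strip_extension` is a hypothetical name. The commit message does not say where in the pipeline the rule is applied.

```python
import re

# Assumed rule: a final ".xx" to ".xxxx" suffix of letters/digits; earlier dots are left alone
_EXTENSION = re.compile(r"\.[A-Za-z0-9]{2,4}$")


def strip_extension(filename: str) -> str:
    """Remove a short trailing extension, keeping any other dots in the name."""
    return _EXTENSION.sub("", filename)
```

For example, `strip_extension("backup.tar.gz")` gives `"backup.tar"`. A name like `"Q3.report"` is left unchanged because its suffix is longer than four characters.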
I present "Doc Skills". Basically a combination of Leonard's RAG infra and Sam's skill infra: generate a skill from a set of PDFs. It uses the extraction and chunking pipelines we already have, but is simpler to deploy than RAG (no embedding/vector search). Probably better for reasonably sized datasets (RAG still wins on huge ones).
The target branch is chat-integration, but that's not right. It's only set that way for now because I branched off of it.
doc_skills.mp4