feat: agent can regex through Skill reference files by leonardmq · Pull Request #1304 · Kiln-AI/Kiln

leonardmq · 2026-04-21T09:03:40Z

What does this PR do?

WIP prototype for letting agent regex through Skill reference files.

Checklists

Tests have been run locally and passed
New tests have been added to any work in /lib

coderabbitai · 2026-04-21T09:04:01Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6ffd18fa-69d1-421a-bec1-17c73ed8b9bb

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch leonard/skill-search

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a new SkillSearchTool that allows agents to perform regex searches across markdown reference files within a skill. It also enhances the existing SkillTool to support directory listings with line counts. Key changes include updates to the prompt builder to provide instructions for the new search capability, new data model methods for iterating through references and counting lines, and corresponding updates to the adapter logic for tool resolution and caching. Feedback focuses on performance and memory efficiency, specifically regarding the simultaneous loading of multiple files into memory and the non-streaming approach to line counting, as well as a potential ReDoS vulnerability from unconstrained regex patterns.

gemini-code-assist · 2026-04-21T09:05:42Z

+                prefix_rel = None
+                scope_label = None
+            try:
+                corpus = list(skill.iter_markdown_references(prefix=prefix_rel))


Converting the corpus generator to a list forces all markdown file contents into memory simultaneously. For skills with many or large reference files, this can lead to excessive memory consumption or OOM errors. It is safer to iterate over the generator directly to process files one by one. Note that if you remove list(), you should also ensure that any potential ValueError from the generator initialization is handled correctly.

Suggested change

corpus = list(skill.iter_markdown_references(prefix=prefix_rel))

corpus = skill.iter_markdown_references(prefix=prefix_rel)

gemini-code-assist · 2026-04-21T09:05:42Z

+                elif entry.suffix == ".md":
+                    rel = entry.relative_to(base_dir).as_posix()
+                    try:
+                        line_count = skill.count_file_lines(root, rel)


Calling count_file_lines for every markdown file during a directory listing is a major performance bottleneck. Since the current implementation of count_file_lines reads the full content of each file, listing a directory with many files results in O(Total Size of All Files) I/O operations. This makes the tool very slow and memory-intensive for large reference sets.

gemini-code-assist · 2026-04-21T09:05:42Z

+            raise ValueError(
+                f"Unknown resource root: {resource_root!r}. Expected 'references' or 'assets'."
+            )
+        text = self._read_resource(base_dir, relative_path)


The count_file_lines method reads the entire file content into memory just to count newlines. This is inefficient for large reference or asset files. A streaming approach (e.g., reading the file in chunks or line-by-line) would be significantly more memory-efficient.

gemini-code-assist · 2026-04-21T09:05:42Z

+            return ToolCallResult(output="Error: 'pattern' parameter is required.")
+
+        try:
+            regex = re.compile(pattern, re.IGNORECASE)


Compiling and executing a regex pattern provided by the agent without constraints or timeouts poses a risk of Regular Expression Denial of Service (ReDoS). A maliciously crafted or accidentally complex pattern can cause catastrophic backtracking, hanging the execution thread. Consider using a regex engine with built-in protections or implementing a timeout for the search operation.

github-actions · 2026-04-21T09:06:56Z

📊 Coverage Report

Overall Coverage: 92%

Diff: origin/main...HEAD

libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (21.4%): Missing lines 488-490,492,495,497-499,503-505
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (88.7%): Missing lines 838-841,844-845
libs/core/kiln_ai/adapters/prompt_builders.py (100%)
libs/core/kiln_ai/datamodel/skill.py (97.1%): Missing lines 133
libs/core/kiln_ai/datamodel/tool_id.py (100%)
libs/core/kiln_ai/tools/skill_search_tool.py (96.9%): Missing lines 234,252-255,382-383
libs/core/kiln_ai/tools/skill_tool.py (96.4%): Missing lines 191-192

Summary

Total: 409 lines
Missing: 27 lines
Coverage: 93%

Line-by-line

View line-by-line diff coverage

libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py

Lines 484-508

  484                     self._tool_cache[cache_key] = skill_def
  485                     tool_definitions.append(skill_def)
  486 
  487         if skill_search_tool_ids:
! 488             cache_key = "search::" + "::".join(sorted(skill_search_tool_ids))
! 489             if cache_key in self._tool_cache:
! 490                 tool_definitions.append(self._tool_cache[cache_key])
  491             else:
! 492                 from kiln_ai.adapters.adapter_registry import (
  493                     load_skills_from_tool_ids,
  494                 )
! 495                 from kiln_ai.tools.skill_search_tool import SkillSearchTool
  496 
! 497                 skills_dict = load_skills_from_tool_ids(task, skill_search_tool_ids)
! 498                 if skills_dict:
! 499                     search_tool = SkillSearchTool(
  500                         f"{SKILL_SEARCH_TOOL_ID_PREFIX}_combined",
  501                         list(skills_dict.values()),
  502                     )
! 503                     search_def = await search_tool.toolcall_definition()
! 504                     self._tool_cache[cache_key] = search_def
! 505                     tool_definitions.append(search_def)
  506 
  507         return tool_definitions if tool_definitions else None

libs/core/kiln_ai/adapters/model_adapters/base_adapter.py

Lines 834-849

  834             tools.append(SkillTool(f"{SKILL_TOOL_ID_PREFIX}_combined", skills))
  835 
  836         search_skills = self._resolve_skill_search_tool_skills()
  837         if search_skills:
! 838             seen_names = set()
! 839             for skill in search_skills:
! 840                 if skill.name in seen_names:
! 841                     raise ValueError(
  842                         f"Duplicate skill name '{skill.name}'. Each skill must have a unique name."
  843                     )
! 844                 seen_names.add(skill.name)
! 845             tools.append(
  846                 SkillSearchTool(
  847                     f"{SKILL_SEARCH_TOOL_ID_PREFIX}_combined", search_skills
  848                 )
  849             )

libs/core/kiln_ai/datamodel/skill.py

Lines 129-137

  129             return
  130 
  131         for md_path in sorted(target.rglob("*.md")):
  132             if not md_path.is_file():
! 133                 continue
  134             try:
  135                 content = md_path.read_text(encoding="utf-8")
  136             except UnicodeDecodeError:
  137                 continue

libs/core/kiln_ai/tools/skill_search_tool.py

Lines 230-238

  230                     output="Error: Only markdown files (.md) are searchable"
  231                 )
  232             rel = resource[len(REFERENCES_PREFIX) :]
  233             if not rel:
! 234                 return ToolCallResult(
  235                     output="Error: 'resource' must be a file path. Use 'path_prefix' to search a directory."
  236                 )
  237             base_dir = skill.references_dir()
  238             try:

Lines 248-259

  248                     output="Error: 'resource' must be a file path. Use 'path_prefix' to search a directory."
  249                 )
  250             try:
  251                 content = skill.read_reference(rel)
! 252             except FileNotFoundError:
! 253                 return ToolCallResult(output=f"Error: Resource not found: {resource}")
! 254             except ValueError as e:
! 255                 return ToolCallResult(output=f"Error: {e}")
  256             corpus = [(rel, content)]
  257             scope_label = resource
  258         else:
  259             if path_prefix is not None:

Lines 378-387

  378         if body.startswith("---\n"):
  379             try:
  380                 end_idx = body.index("\n---\n", 3)
  381                 body = body[end_idx + 5 :]
! 382             except ValueError:
! 383                 pass
  384         body = body.lstrip("\n")
  385         for i, line in enumerate(body.splitlines()):
  386             if i >= _H1_SEARCH_LINES:
  387                 break

libs/core/kiln_ai/tools/skill_tool.py

Lines 187-196

  187                             f"{prefix}{connector}{entry.name} ({line_count} lines)"
  188                         )
  189                         totals["md_files"] += 1
  190                         totals["md_lines"] += line_count
! 191                     except (ValueError, OSError, UnicodeDecodeError):
! 192                         tree_lines.append(f"{prefix}{connector}{entry.name}")
  193                 else:
  194                     tree_lines.append(f"{prefix}{connector}{entry.name}")
  195 
  196         walk(dir_path, "")

📊 HTML Coverage Report - Interactive coverage report
📈 Diff Coverage Report - Detailed diff analysis
Github Actions Run - View the full coverage report

feat: agent can regex through Skill reference files

9fd190d

leonardmq marked this pull request as draft April 21, 2026 09:03

gemini-code-assist Bot reviewed Apr 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: agent can regex through Skill reference files#1304

feat: agent can regex through Skill reference files#1304
leonardmq wants to merge 1 commit intomainfrom
leonard/skill-search

leonardmq commented Apr 21, 2026

Uh oh!

coderabbitai Bot commented Apr 21, 2026

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Uh oh!

github-actions Bot commented Apr 21, 2026

libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py

libs/core/kiln_ai/adapters/model_adapters/base_adapter.py

libs/core/kiln_ai/datamodel/skill.py

libs/core/kiln_ai/tools/skill_search_tool.py

libs/core/kiln_ai/tools/skill_tool.py

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	corpus = list(skill.iter_markdown_references(prefix=prefix_rel))
	corpus = skill.iter_markdown_references(prefix=prefix_rel)

Conversation

leonardmq commented Apr 21, 2026

What does this PR do?

Checklists

Uh oh!

coderabbitai Bot commented Apr 21, 2026

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 21, 2026

📊 Coverage Report

Diff: origin/main...HEAD

Summary

Line-by-line

libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py

libs/core/kiln_ai/adapters/model_adapters/base_adapter.py

libs/core/kiln_ai/datamodel/skill.py

libs/core/kiln_ai/tools/skill_search_tool.py

libs/core/kiln_ai/tools/skill_tool.py

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant