Conversation
… add a non recursive version to limit return files
There was a problem hiding this comment.
Pull request overview
This PR adds graceful handling for directories with too many files and introduces a non-recursive file listing option. It implements a configurable file limit (default 100) via environment variable to prevent performance issues when listing large directories, and provides an alternative non-recursive listing function for better control over output size.
- Adds
FILE_LIMIT_FOR_LIST_FILESenvironment variable with default value of 100 - Implements
list_files_non_recursivetool to list files in a single directory level only - Adds file count check in
list_filesthat returns an informative error when limit is exceeded
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/seclab_taskflows/toolboxes/local_file_viewer.yaml | Adds FILE_LIMIT_FOR_LIST_FILES environment variable configuration |
| src/seclab_taskflows/mcp_servers/local_file_viewer.py | Implements file limit checking, adds recursive parameter to _list_files, and introduces list_files_non_recursive tool |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if entry.is_dir(): | ||
| continue | ||
| if not recursive: | ||
| dirname = remove_root_dir(entry.filename) | ||
| if Path(dirname).parent == Path(root_dir): | ||
| results.append(dirname) | ||
| filename = remove_root_dir(entry.filename) | ||
| if root_dir and not is_subdirectory(root_dir, filename): | ||
| continue | ||
| if not recursive and Path(filename).parent != Path(root_dir): | ||
| continue | ||
| results.append(filename) |
There was a problem hiding this comment.
Logic error in the _list_files function: when an entry is a directory, the code continues to process it as a file on line 84. When entry.is_dir() is True, the code should skip to the next iteration after handling the directory case, but currently it falls through and processes the directory entry again as if it were a file. The logic should add a 'continue' statement after line 83 to prevent processing directories as files. Additionally, line 84 onwards will execute for both files and directories in the current implementation.
| source_path = sanitize_file_path(source_path, [LOCAL_GH_DIR]) | ||
| if not source_path or not source_path.exists(): | ||
| return f"Invalid {owner} and {repo}. Check that the input is correct or try to fetch the repo from gh first." | ||
| content = _list_files(source_path, path, recursive=False) |
There was a problem hiding this comment.
The error message suggests using 'list_files_non_recursive' but this won't solve the problem if the non-recursive listing itself has more than FILE_LIMIT_FOR_LIST_FILES files in a single directory. The non-recursive function should also check and enforce the file limit. Consider applying the same limit check in list_files_non_recursive to prevent it from returning an excessive number of results.
| content = _list_files(source_path, path, recursive=False) | |
| content = _list_files(source_path, path, recursive=False) | |
| if len(content) > FILE_LIMIT_FOR_LIST_FILES: | |
| return f"Too many files to display in {owner}/{repo} at path {path} ({len(content)} files)." |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| repo: str = Field(description="The name of the repository"), | ||
| path: str = Field(description="The path to the directory in the repository")) -> str: | ||
| """ | ||
| List the files of a directory from a local GitHub repository non-recursively. |
There was a problem hiding this comment.
There's a spelling error in the docstring. "non recursively" should be hyphenated as "non-recursively" for proper English grammar.
| if not recursive: | ||
| dirname = remove_root_dir(entry.filename) | ||
| if Path(dirname).parent == Path(root_dir): | ||
| results.append(dirname) |
There was a problem hiding this comment.
The logic for handling directories in non-recursive mode has an issue. When a directory matches the parent path condition (line 82), the code appends it to results but then continues to line 84 which skips to the next entry. However, this means that if recursive is False and we're NOT in a directory entry, the code will still process files normally. The problem is that directory entries are being added to the results list, but the function should only list files and subdirectories at the specified level, not intermediate directory paths. This could lead to unexpected results where directory paths are included in what should be a file listing.
| if not recursive: | |
| dirname = remove_root_dir(entry.filename) | |
| if Path(dirname).parent == Path(root_dir): | |
| results.append(dirname) | |
| # In non-recursive mode, do not append directories to results |
| if root_dir and not is_subdirectory(root_dir, filename): | ||
| continue | ||
| if not recursive and Path(filename).parent != Path(root_dir): | ||
| continue |
There was a problem hiding this comment.
Inconsistent indentation detected. Line 89 has extra indentation (8 spaces instead of 4), which breaks Python's indentation consistency. This should be aligned with the if statement on line 88.
| continue | |
| continue |
| return json.dumps(content, indent=2) | ||
|
|
||
|
|
There was a problem hiding this comment.
The file limit check doesn't apply to the non-recursive version. This creates an inconsistency: if a user hits the limit with the recursive version and follows the suggestion to use list_files_non_recursive, they could still get thousands of files back if the directory has many files at that level. Consider adding the same file limit check to list_files_non_recursive for consistency and to prevent performance issues.
| return json.dumps(content, indent=2) | |
| if len(content) > FILE_LIMIT_FOR_LIST_FILES: | |
| return f"Too many files to display in {owner}/{repo} at path {path} ({len(content)} files)." | |
| return json.dumps(content, indent=2) |
| return f"Invalid {owner} and {repo}. Check that the input is correct or try to fetch the repo from gh first." | ||
| content = _list_files(source_path, path) | ||
| if len(content) > FILE_LIMIT_FOR_LIST_FILES: | ||
| return f"Too many files to display in {owner}/{repo} at path {path} ({len(content)} files). Try using `list_files_non_recursive` instead." |
There was a problem hiding this comment.
The error message suggests using list_files_non_recursive but wraps the function name in backticks, which may not render properly in all contexts where this error is displayed. Consider using either plain text or ensuring the output format properly handles markdown formatting.
| return f"Too many files to display in {owner}/{repo} at path {path} ({len(content)} files). Try using `list_files_non_recursive` instead." | |
| return f"Too many files to display in {owner}/{repo} at path {path} ({len(content)} files). Try using list_files_non_recursive instead." |
…list_file allow list files to exit gracefully when there are too many files and add a non recursive version to limit return files
No description provided.