
Commit 32baf80

👌 Replace gitignore-parser with ignore-python and add per-project follow_links (#64)
Replaces `gitignore-parser` with `ignore-python` (Rust `ignore` crate bindings) for file discovery. This provides native nested `.gitignore` support, improved performance, and behavioral parity with ubcode. Adds per-project `follow_links` config.

### Dependency swap

- `gitignore-parser>=0.1.11` → `ignore-python>=0.3.3`

### Source discovery rewrite (`source_discover/source_discover.py`)

- `WalkBuilder` replaces `Path.rglob()` + `gitignore_parser.parse_gitignore()` + `fnmatch`
- Include/exclude patterns mapped to `OverrideBuilder` globs
- File type filtering stays in Python (overrides would take precedence over gitignore rules)
- Non-existent source directories now return empty list (`WalkBuilder` raises; old `rglob` silently yielded nothing)

### Walker filter alignment with ubc_codelinks

The walker configuration now replicates the Rust `ignore` crate's `standard_filters(gitignore)` followed by `hidden(false)`, matching ubcode behaviour exactly. Since the Python `ignore-python` bindings don't expose `standard_filters()`, we set all six individual filter methods:

```python
builder.ignore(gitignore)       # .ignore file support
builder.parents(gitignore)      # parent ignore files
builder.git_ignore(gitignore)   # .gitignore
builder.git_global(gitignore)   # global gitignore
builder.git_exclude(gitignore)  # .git/info/exclude
builder.hidden(False)           # always show hidden files
```

Previously, `git_global` and `git_exclude` were hardcoded to `False` and `.ignore`/parent ignore support was not configured.

### New `follow_links` config

- `follow_links: bool = False` added to `SourceDiscoverConfig`, both TypedDicts, CLI `discover` command, and `src_trace` directive passthrough

### Semantic change: include/exclude

With the `ignore` crate's override system, include whitelists files (overriding gitignore), then exclude removes from that set. Previously include had absolute priority over both gitignore and exclude.

```toml
[codelinks.projects.myproject.source_discover]
src_dir = "./src"
follow_links = true
gitignore = true
include = ["**/*.cpp"]
exclude = ["build/*"]
```

### Shared portable test fixture

Added a shared JSON test fixture (`tests/data/discover_fixtures.json`) with 18 test cases covering all discovery behaviours. The same fixture is consumed by both this project and `ubc_codelinks` to verify matching behaviour across the Python and Rust implementations.

Test cases cover:

- Basic file extension discovery
- `.gitignore` (root-level and nested)
- `.ignore` file support
- `include` and `exclude` patterns (individually and combined)
- Default include derived from language extensions
- Hidden file discovery
- All comment types (cpp, python, rust, cs, yaml)
- Non-existent `src_dir` edge case
- Deeply nested directory trees
- All C++ file extensions

### Documentation updates

- **`configuration.rst`**: Added `follow_links` option docs, expanded `gitignore` section to list all supported ignore sources (nested `.gitignore`, `.ignore`, `.git/info/exclude`, global gitignore, parent ignore files), removed "Nested .gitignore is NOT supported" limitation note, corrected `include`/`exclude` priority description
- **`discover.rst`**: Added `follow_links` to advanced filtering example
- **`roadmap.rst`**: Marked nested `.gitignore` support as completed

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chrisjsewell <2997570+chrisjsewell@users.noreply.github.com>
Co-authored-by: Chris Sewell <chrisj_sewell@hotmail.com>
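The include/exclude priority change described in the commit message can be modelled with a small standard-library sketch. This is only an illustration under stated assumptions: `old_keep` and `new_keep` are hypothetical helper names, `fnmatchcase` stands in for the `ignore` crate's glob matcher (whose pattern semantics differ), and `git_ignored` is a precomputed flag rather than a real ignore-file lookup.

```python
from fnmatch import fnmatchcase


def matches_any(path: str, patterns: list[str]) -> bool:
    """True if the relative path matches at least one glob pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def old_keep(path: str, include: list[str], exclude: list[str], git_ignored: bool) -> bool:
    """Order before this commit: include wins outright, then gitignore, then exclude."""
    if include and matches_any(path, include):
        return True
    if git_ignored:
        return False
    if exclude and matches_any(path, exclude):
        return False
    return True


def new_keep(path: str, include: list[str], exclude: list[str], git_ignored: bool) -> bool:
    """Order described in the commit message (a model, not the crate's code)."""
    # Exclude patterns remove files even when an include pattern matches.
    if exclude and matches_any(path, exclude):
        return False
    # With include patterns present, only matching files survive,
    # and a match overrides gitignore.
    if include:
        return matches_any(path, include)
    # Without include patterns, gitignore rules decide.
    return not git_ignored


# A generated C++ file under build/ that matches both include and exclude.
print(old_keep("build/gen.cpp", ["*.cpp"], ["build/*"], False))
print(new_keep("build/gen.cpp", ["*.cpp"], ["build/*"], False))
```

Under this model the file under `build/` is kept by the old order and dropped by the new one, which is the behavioural difference users with overlapping `include` and `exclude` patterns are most likely to notice.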
1 parent a2a79e4 commit 32baf80

10 files changed

Lines changed: 608 additions & 51 deletions

‎docs/source/components/configuration.rst‎

Lines changed: 38 additions & 7 deletions
@@ -178,15 +178,17 @@ Configures how **Sphinx-CodeLinks** discovers and processes source files within
    exclude = []
    include = []
    gitignore = true
+   follow_links = false
    comment_type = "cpp"

 **Configuration fields:**

 - ``src_dir`` - Root directory for source file discovery (relative to Sphinx project root or the directory where the TOML config file is located if given)
 - ``exclude`` - List of glob patterns to exclude from processing
 - ``include`` - List of glob patterns to include (if empty, includes all files)
-- ``gitignore`` - Whether to respect ``.gitignore`` rules when discovering files (Nested .gitignore is NOT supported yet)
-- ``comment_type`` - Comment style for the programming language ("cpp" and "python" are currently supported)
+- ``gitignore`` - Whether to respect ``.gitignore``, ``.ignore``, and related ignore files when discovering files
+- ``follow_links`` - Whether to follow symbolic links during file discovery
+- ``comment_type`` - Comment style for the programming language

 .. _`source_dir`:

@@ -251,7 +253,9 @@ Defines a list of glob patterns for files to explicitly include in discovery. Wh
        "include/**/*.hpp"
    ]

-**Priority:** The ``include`` option has the highest priority and overrides both ``exclude`` and ``gitignore`` settings.
+**Priority:** When ``include`` patterns are specified, only files matching those patterns
+are considered (this overrides ``gitignore`` exclusions for matched files).
+``exclude`` patterns are then applied to remove files from that set.

 **Common inclusion patterns:**

@@ -317,7 +321,9 @@ Specifies the comment syntax style used in the source code files. This determine
 gitignore
 ^^^^^^^^^

-Controls whether to respect ``.gitignore`` files when discovering source files. When enabled, files and directories listed in ``.gitignore`` will be automatically excluded from processing.
+Controls whether to respect ignore files when discovering source files.
+When enabled, files and directories matched by ignore rules will be automatically
+excluded from processing.

 **Type:** ``bool``
 **Default:** ``true``
@@ -329,10 +335,34 @@ Controls whether to respect ``.gitignore`` files when discovering source files.

 **Behavior:**

-- ``true`` - Respect ``.gitignore`` rules (recommended)
-- ``false`` - Ignore ``.gitignore`` files and process all matching files
+When set to ``true`` (recommended), the following ignore sources are respected:

-.. important:: **Current Limitation:** This option only supports the root-level ``.gitignore`` file. Nested ``.gitignore`` files in subdirectories or parent directories are not currently processed.
+- ``.gitignore`` files (including nested ``.gitignore`` files in subdirectories)
+- ``.ignore`` files (same syntax as ``.gitignore``, useful for non-git projects)
+- ``.git/info/exclude``
+- Global gitignore (e.g. ``~/.config/git/ignore``)
+- Parent directory ignore files
+
+When set to ``false``, all ignore files are disregarded and every matching file is processed.
+
+follow_links
+^^^^^^^^^^^^
+
+Controls whether symbolic links are followed during file discovery.
+When disabled, symbolic links to directories are not traversed.
+
+**Type:** ``bool``
+**Default:** ``false``
+
+.. code-block:: toml
+
+   [codelinks.projects.my_project.source_discover]
+   follow_links = true
+
+**Behavior:**
+
+- ``false`` - Symbolic links to directories are skipped (default, safer)
+- ``true`` - Symbolic links are followed, discovering files inside linked directories

 For more information about the usage examples, see :ref:`source discover <discover>`.

@@ -355,6 +385,7 @@ Configures how **Sphinx-CodeLinks** analyse source files to extract markers from
    exclude = []
    include = []
    gitignore = true
+   follow_links = false
    comment_type = "cpp"

    [codelinks.projects.my_project.analyse]
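To make the documented `follow_links` behaviour concrete, here is a minimal sketch that uses the standard library's `os.walk` as a stand-in for the walker. It is an analogy only: `walk_files` and `demo` are hypothetical names, the directory layout is invented for the example, and the real implementation uses `WalkBuilder` and additionally resolves each discovered path.

```python
import os
import tempfile
from pathlib import Path


def walk_files(root: Path, follow_links: bool) -> list[str]:
    """Return sorted POSIX-style file paths under root, relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in filenames:
            found.append(Path(dirpath, name).relative_to(root).as_posix())
    return sorted(found)


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway tree with one symlinked directory and walk it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "src"
        outside = Path(tmp) / "vendor"  # hypothetical directory outside src_dir
        root.mkdir()
        outside.mkdir()
        (root / "main.cpp").write_text("// main\n")
        (outside / "lib.cpp").write_text("// lib\n")
        # src/vendor_link -> ../vendor
        os.symlink(outside, root / "vendor_link", target_is_directory=True)
        return walk_files(root, False), walk_files(root, True)


without_links, with_links = demo()
print(without_links)
print(with_links)
```

With links not followed, only `main.cpp` is found; with links followed, `vendor_link/lib.cpp` appears as well. The default of `false` avoids pulling in files from outside the source tree and sidesteps symlink cycles.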

‎docs/source/components/discover.rst‎

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ Usage Examples
2626
include = []
2727
exclude = ["src/legacy/**", "**/*_test.cpp"]
2828
gitignore = true
29+
follow_links = false
2930
comment_type = "cpp"
3031
3132
**Python Project:**

‎docs/source/development/roadmap.rst‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,7 @@ Source Code Parsing
1919

2020
- Introduce a configurable option to strip leading characters (e.g., ``*``) from commented RST blocks.
2121
- Enrich tagged scopes with additional metadata.
22-
- Enhance ``.gitignore`` handling to support nested ``.gitignore`` files.
22+
- ✅ Nested ``.gitignore`` files are now supported (implemented via ``ignore-python``).
2323

2424
Defining Needs in Source Code
2525
-----------------------------

‎pyproject.toml‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ readme = "README.md"
1212
requires-python = ">= 3.12"
1313
dependencies = [
1414
"comment-parser>=1.2.4",
15-
"gitignore-parser>=0.1.11",
15+
"ignore-python>=0.3.3",
1616
"typer>=0.16.0",
1717
"click < 8.2", # click 8.2.* produces empty errors if no args are given
1818
"jsonschema",

‎src/sphinx_codelinks/cmd.py‎

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -170,7 +170,7 @@ def analyse( # noqa: PLR0912 # for CLI, so it needs the branches
170170

171171

172172
@app.command(no_args_is_help=True)
173-
def discover(
173+
def discover( # noqa: PLR0913 # CLI command requires multiple parameters
174174
src_dir: Annotated[
175175
Path,
176176
typer.Argument(
@@ -203,9 +203,13 @@ def discover(
     gitignore: Annotated[
         bool,
         typer.Option(
-            help="Respect .gitignore in the given directory. Nested .gitignore Not supported"
+            help="Respect .gitignore files in the given directory and its parents"
         ),
     ] = True,
+    follow_links: Annotated[
+        bool,
+        typer.Option(help="Follow symbolic links during file discovery"),
+    ] = False,
     comment_type: Annotated[
         CommentType,
         typer.Option(
@@ -222,6 +226,7 @@ def discover(
         "exclude": exclude,
         "include": include,
         "gitignore": gitignore,
+        "follow_links": follow_links,
         "comment_type": comment_type,
     }

‎src/sphinx_codelinks/source_discover/config.py‎

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ class SourceDiscoverSectionConfigType(TypedDict, total=False):
3030
exclude: list[str]
3131
include: list[str]
3232
gitignore: bool
33+
follow_links: bool
3334
comment_type: CommentType
3435

3536

@@ -40,6 +41,7 @@ class SourceDiscoverConfigType(TypedDict, total=False):
     exclude: list[str]
     include: list[str]
     gitignore: bool
+    follow_links: bool
     comment_type: CommentType


@@ -69,6 +71,9 @@ def field_names(cls) -> set[str]:
     gitignore: bool = field(default=True, metadata={"schema": {"type": "boolean"}})
     """Whether to respect .gitignore to exclude files."""

+    follow_links: bool = field(default=False, metadata={"schema": {"type": "boolean"}})
+    """Whether to follow symbolic links during file discovery."""
+
     comment_type: str = field(
         default="cpp",
         metadata={
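The new field carries its JSON-schema fragment in the dataclass `metadata`, the same pattern as the existing `gitignore` field. The diff does not show how the project consumes that metadata, so the following is only a sketch of one plausible use: `DiscoverOptions` and `build_schema` are hypothetical names, not part of the codebase.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DiscoverOptions:
    """Hypothetical stand-in for the two boolean fields shown in the diff."""

    gitignore: bool = field(default=True, metadata={"schema": {"type": "boolean"}})
    follow_links: bool = field(default=False, metadata={"schema": {"type": "boolean"}})


def build_schema(cls: type) -> dict:
    """Collect each field's "schema" metadata into one object schema."""
    return {
        "type": "object",
        "properties": {f.name: f.metadata["schema"] for f in fields(cls)},
        "additionalProperties": False,
    }


schema = build_schema(DiscoverOptions)
print(schema["properties"]["follow_links"])
```

Keeping the schema fragment next to the default means a new option such as `follow_links` needs only one declaration to gain both a default value and a validation rule.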

‎src/sphinx_codelinks/source_discover/source_discover.py‎

Lines changed: 57 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,8 @@
1-
from collections.abc import Callable
2-
import fnmatch
31
import os
42
from pathlib import Path
53

6-
from gitignore_parser import ( # type: ignore[import-untyped] # library has no stub
7-
parse_gitignore,
8-
)
4+
from ignore import WalkBuilder
5+
from ignore.overrides import OverrideBuilder
96

107
from sphinx_codelinks.source_discover.config import (
118
COMMENT_FILETYPE,
@@ -17,51 +14,72 @@
 class SourceDiscover:
     def __init__(self, src_discover_config: SourceDiscoverConfig):
         self.src_discover_config = src_discover_config
-        # Only gitignore at source root is considered.
-        # TODO: Support nested gitignore files
-        gitignore_path = self.src_discover_config.src_dir / ".gitignore"
-        self.gitignore_matcher: Callable[[str], bool] | None = (
-            parse_gitignore(gitignore_path)
-            if self.src_discover_config.gitignore and gitignore_path.exists()
-            else None
-        )
         # normalize the file types to lower case with leading dot
         self.file_types = {
             f".{ext}" for ext in COMMENT_FILETYPE[src_discover_config.comment_type]
         }

         self.source_paths = self._discover()

+    def _build_overrides(self) -> OverrideBuilder | None:
+        """Build an OverrideBuilder for include/exclude patterns.
+
+        Include patterns are added as whitelist globs.
+        Exclude patterns are added as negated globs (prefixed with ``!``).
+        """
+        has_include = bool(self.src_discover_config.include)
+        has_exclude = bool(self.src_discover_config.exclude)
+
+        if not has_include and not has_exclude:
+            return None
+
+        ob = OverrideBuilder(self.src_discover_config.src_dir)
+
+        if has_include:
+            for pattern in self.src_discover_config.include:
+                ob.add(pattern)
+
+        if has_exclude:
+            for pattern in self.src_discover_config.exclude:
+                ob.add(f"!{pattern}")
+
+        return ob
+
     def _discover(self) -> list[Path]:
         """Discover source files recursively in the given directory."""
+        src_dir = self.src_discover_config.src_dir
+        if not src_dir.is_dir():
+            return []
+
+        gitignore = self.src_discover_config.gitignore
+
+        builder = WalkBuilder(src_dir)
+        # Replicate the Rust ignore crate's standard_filters(gitignore)
+        # followed by hidden(false), matching ubc_codelinks behaviour.
+        builder.ignore(gitignore)
+        builder.parents(gitignore)
+        builder.git_ignore(gitignore)
+        builder.git_global(gitignore)
+        builder.git_exclude(gitignore)
+        builder.hidden(False)
+        builder.follow_links(self.src_discover_config.follow_links)
+
+        override_builder = self._build_overrides()
+        if override_builder is not None:
+            builder.overrides(override_builder.build())
+
         discovered_files = []
-        for filepath in self.src_discover_config.src_dir.rglob("*"):
-            if filepath.is_file():
-                if self.file_types and filepath.suffix.lower() not in self.file_types:
-                    continue
-                rel_filepath = str(
-                    filepath.relative_to(self.src_discover_config.src_dir)
-                )
-                if self.src_discover_config.include and self._matches_any(
-                    rel_filepath, self.src_discover_config.include
-                ):
-                    # "includes" has the highest priority over "gitignore" and "excludes"
-                    discovered_files.append(filepath)
-                    continue
-                if self.gitignore_matcher and self.gitignore_matcher(
-                    str(filepath.absolute())
-                ):
-                    continue
-                if self.src_discover_config.exclude and self._matches_any(
-                    rel_filepath, self.src_discover_config.exclude
-                ):
-                    continue
-                discovered_files.append(filepath)
+        for entry in builder.build():
+            filepath = entry.path()
+            if not filepath.is_file():
+                continue
+            if self.file_types and filepath.suffix.lower() not in self.file_types:
+                continue
+            # resolve() produces canonical absolute paths; follow_links only
+            # controls whether the walker descends into symlinked directories
+            discovered_files.append(filepath.resolve())
+
         sorted_filepaths = sorted(
             discovered_files, key=lambda x: os.path.normcase(os.path.normpath(x))
         )
         return sorted_filepaths
-
-    def _matches_any(self, rel_filepath: str, patterns: list[str]) -> bool:
-        """Check if the given file path matches any of the given patterns."""
-        return any(fnmatch.fnmatch(rel_filepath, pattern) for pattern in patterns)
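Two steps of `_discover` stay in Python after the rewrite: the case-insensitive suffix filter and the normalised sort. They can be tried in isolation with the sketch below, which needs no walker at all. `filter_and_sort`, `FILE_TYPES`, and the candidate paths are hypothetical; in the real code the suffix set is derived from `COMMENT_FILETYPE`.

```python
import os
from pathlib import Path

# Hypothetical subset of suffixes; the real set comes from COMMENT_FILETYPE.
FILE_TYPES = {".cpp", ".h"}


def filter_and_sort(paths: list[Path], file_types: set[str]) -> list[Path]:
    """Keep paths whose lower-cased suffix is allowed, then sort them.

    An empty file_types set disables the filter, mirroring the
    ``if self.file_types and ...`` guard in the diff.
    """
    kept = [p for p in paths if not file_types or p.suffix.lower() in file_types]
    # normpath collapses redundant separators; normcase lower-cases on
    # Windows and is a no-op on POSIX, keeping the order stable per platform.
    return sorted(kept, key=lambda p: os.path.normcase(os.path.normpath(p)))


candidates = [Path("b/Main.CPP"), Path("a/x.h"), Path("a/notes.txt"), Path("a/y.cpp")]
result = filter_and_sort(candidates, FILE_TYPES)
print([p.as_posix() for p in result])
```

The commit message gives the reason for filtering here rather than through override globs: overrides would take precedence over gitignore rules, so a suffix whitelist expressed as overrides would re-admit ignored files.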

‎src/sphinx_codelinks/sphinx_extension/directives/src_trace.py‎

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -213,6 +213,7 @@ def get_src_files(
213213
gitignore=src_discover_config.gitignore,
214214
include=src_discover_config.include,
215215
exclude=src_discover_config.exclude,
216+
follow_links=src_discover_config.follow_links,
216217
comment_type=src_discover_config.comment_type,
217218
)
218219
source_discover = SourceDiscover(src_discover)
