Commit 5b9845e

✨ Add JSONC language support for comment parsing and traceability
Adds jsonc language support, also checks .json files if they start with a comment (see jsonc.org).
1 parent a65483e commit 5b9845e

14 files changed

Lines changed: 231 additions & 8 deletions

File tree

docs/source/components/analyse.rst

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ Limitations

 **Current Limitations:**

-- **Language Support**: C/C++ (``//``, ``/* */``), C# (``//``, ``/* */``, ``///``), Python (``#``), YAML (``#``) and Rust (``//``, ``/* */``, ``///``) comment styles are supported
+- **Language Support**: C/C++ (``//``, ``/* */``), C# (``//``, ``/* */``, ``///``), Python (``#``), YAML (``#``), Rust (``//``, ``/* */``, ``///``) and JSONC (``//``, ``/* */``) comment styles are supported
 - **Single Comment Style**: Each analysis run processes only one comment style at a time

 Extraction Examples

docs/source/components/configuration.rst

Lines changed: 7 additions & 1 deletion
@@ -271,7 +271,7 @@ Specifies the comment syntax style used in the source code files. This determine

 **Type:** ``str``
 **Default:** ``"cpp"``
-**Supported values:** ``"cpp"``, ``"python"``, ``"cs"``, ``"yaml"``, ``"rust"``
+**Supported values:** ``"cpp"``, ``"python"``, ``"cs"``, ``"yaml"``, ``"rust"``, ``"jsonc"``

 .. code-block:: toml

@@ -315,6 +315,12 @@ Specifies the comment syntax style used in the source code files. This determine
        ``///`` (doc comments),
        ``//!`` (inner doc comments)
      - ``.rs``
+   * - JSON with Comments (JSONC)
+     - ``"jsonc"``
+     - ``//`` (single-line),
+       ``/* */`` (multi-line)
+     - ``.jsonc`` (always); ``.json`` only when the file opens with a comment
+       (e.g. the mode line ``// -*- mode: jsonc -*-``)

 .. note:: Future versions may support additional programming languages.

docs/source/components/features.rst

Lines changed: 27 additions & 0 deletions
@@ -158,6 +158,33 @@ Features
 .. fault:: Sphinx-codelinks halucinates traceability objects in Rust
    :id: FAULT_RUST_2

+.. feature:: JSONC Language Support
+   :id: FE_JSONC
+
+   Support for defining traceability objects in JSON with Comments (JSONC) files.
+
+   The JSONC parser leverages tree-sitter to identify and extract single-line (``//``)
+   and multi-line (``/* */``) comments from JSON data, associating each marker with the
+   surrounding data structure such as the key/value pair, array item, or object it
+   annotates.
+
+   ``.jsonc`` files are always parsed as JSONC. A ``.json`` file is only treated as JSONC
+   when it opens with a comment (e.g. the mode line ``// -*- mode: jsonc -*-``), following
+   the `JSONC filename convention <https://jsonc.org/#filename-extension>`_.
+
+   Key capabilities:
+
+   * Detection of inline and leading comments
+   * Association of comments with key/value pairs and array items
+   * Support for both ``//`` and ``/* */`` comment styles
+   * Opt-in handling of ``.json`` files via a leading comment
+
+.. fault:: Traceability objects are not detected in JSONC
+   :id: FAULT_JSONC_1
+
+.. fault:: Sphinx-codelinks hallucinates traceability objects in JSONC
+   :id: FAULT_JSONC_2
+
 .. feature:: Customized comment styles
    :id: FE_CMT

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ dependencies = [
     "tree-sitter-c-sharp>=0.23.1",
     "tree-sitter-yaml>=0.7.1",
     "tree-sitter-rust>=0.23.0",
+    "tree-sitter-json>=0.24.8",
 ]

 [build-system]

src/sphinx_codelinks/analyse/utils.py

Lines changed: 60 additions & 4 deletions
@@ -29,6 +29,8 @@
         "trait_item",
         "mod_item",
     },
+    # @JSONC Scope Node Types, IMPL_JSONC_2, impl, [FE_JSONC]
+    CommentType.jsonc: {"pair", "object", "array", "document"},
 }

 # initialize logger
@@ -60,6 +62,19 @@
 (line_comment) @comment
 (block_comment) @comment
 """
+JSONC_QUERY = """(comment) @comment"""
+
+# JSON value node types that can be associated with a comment.
+JSON_STRUCTURE_TYPES = {
+    "pair",
+    "object",
+    "array",
+    "string",
+    "number",
+    "true",
+    "false",
+    "null",
+}


 def is_text_file(filepath: Path, sample_size: int = 2048) -> bool:
@@ -77,7 +92,7 @@ def is_text_file(filepath: Path, sample_size: int = 2048) -> bool:
         return False


-# @Tree-sitter parser initialization for multiple languages, IMPL_LANG_1, impl, [FE_C_SUPPORT, FE_CPP, FE_PY, FE_YAML, FE_RUST]
+# @Tree-sitter parser initialization for multiple languages, IMPL_LANG_1, impl, [FE_C_SUPPORT, FE_CPP, FE_PY, FE_YAML, FE_RUST, FE_JSONC]
 def init_tree_sitter(comment_type: CommentType) -> tuple[Parser, Query]:
     if comment_type == CommentType.cpp:
         import tree_sitter_cpp  # noqa: PLC0415
@@ -104,6 +119,11 @@ def init_tree_sitter(comment_type: CommentType) -> tuple[Parser, Query]:

         parsed_language = Language(tree_sitter_rust.language())
         query = Query(parsed_language, RUST_QUERY)
+    elif comment_type == CommentType.jsonc:
+        import tree_sitter_json  # noqa: PLC0415
+
+        parsed_language = Language(tree_sitter_json.language())
+        query = Query(parsed_language, JSONC_QUERY)
     else:
         raise ValueError(f"Unsupported comment style: {comment_type}")
     parser = Parser(parsed_language)
@@ -203,8 +223,11 @@ def find_yaml_next_structure(node: TreeSitterNode) -> TreeSitterNode | None:
     return None


-def find_yaml_prev_sibling_on_same_row(node: TreeSitterNode) -> TreeSitterNode | None:
-    """Find a previous named sibling that is on the same row as the comment."""
+def find_prev_sibling_on_same_row(node: TreeSitterNode) -> TreeSitterNode | None:
+    """Find a previous named sibling that is on the same row as the comment.
+
+    Grammar-agnostic: used to detect inline comments in both YAML and JSONC.
+    """
     comment_row = node.start_point.row
     current = node.prev_named_sibling

@@ -225,7 +248,7 @@ def find_yaml_prev_sibling_on_same_row(node: TreeSitterNode) -> TreeSitterNode |
 def find_yaml_associated_structure(node: TreeSitterNode) -> TreeSitterNode | None:
     """Find the YAML structure (key-value pair, list item, etc.) associated with a comment."""
     # First, check if this is an inline comment by looking for a previous sibling on the same row
-    prev_sibling_same_row = find_yaml_prev_sibling_on_same_row(node)
+    prev_sibling_same_row = find_prev_sibling_on_same_row(node)
     if prev_sibling_same_row:
         return prev_sibling_same_row

@@ -244,6 +267,35 @@ def find_yaml_associated_structure(node: TreeSitterNode) -> TreeSitterNode | Non
     return None


+def find_jsonc_associated_structure(node: TreeSitterNode) -> TreeSitterNode | None:
+    """Find the JSON structure (key/value pair, value, list item) for a comment.
+
+    JSON is data rather than code, so association follows the same intent as YAML:
+    an inline comment belongs to the value on its row, a leading comment belongs to
+    the following structure, otherwise it belongs to the enclosing structure.
+    """
+    # Inline comment: a value/pair on the same row, before the comment
+    prev_sibling_same_row = find_prev_sibling_on_same_row(node)
+    if prev_sibling_same_row:
+        return prev_sibling_same_row
+
+    # Leading comment: the next structure following the comment
+    current = node.next_named_sibling
+    while current:
+        if current.type in JSON_STRUCTURE_TYPES:
+            return current
+        current = current.next_named_sibling
+
+    # Otherwise: the enclosing structure
+    parent = node.parent
+    while parent:
+        if parent.type in {"pair", "object", "array"}:
+            return parent
+        parent = parent.parent
+
+    return None
+
+
 def find_associated_scope(
     node: TreeSitterNode, comment_type: CommentType = CommentType.cpp
 ) -> TreeSitterNode | None:
@@ -252,6 +304,10 @@ def find_associated_scope(
         # YAML uses different structure association logic
         return find_yaml_associated_structure(node)

+    if comment_type == CommentType.jsonc:
+        # JSONC uses data-aware structure association logic
+        return find_jsonc_associated_structure(node)
+
     if node.type == CommentCategory.docstring:
         # Only for python's docstring
         return find_enclosing_scope(node, comment_type)
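
The three-step association order in `find_jsonc_associated_structure` (inline, then leading, then enclosing) can be exercised without tree-sitter. The sketch below is illustrative only: `ToyNode` and `associate` are hypothetical stand-ins, not part of the commit, and the inline check assumes that "same row" means the previous sibling ends on the row where the comment starts (the body of `find_prev_sibling_on_same_row` is not fully visible in this diff). The toy tree approximates the layout of `tests/data/jsonc/demo.jsonc` with zero-based rows.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Mirrors JSON_STRUCTURE_TYPES from the diff.
JSON_STRUCTURE_TYPES = {
    "pair", "object", "array", "string", "number", "true", "false", "null",
}


@dataclass(eq=False)  # identity comparison, so list.index() finds the exact node
class ToyNode:
    """Hypothetical stand-in for a tree-sitter node (not the real API)."""

    type: str
    start_row: int
    end_row: int
    parent: ToyNode | None = None
    children: list[ToyNode] = field(default_factory=list)

    def add(self, child: ToyNode) -> ToyNode:
        child.parent = self
        self.children.append(child)
        return child

    def sibling(self, offset: int) -> ToyNode | None:
        """Return the sibling at the given offset, or None at the edges."""
        if self.parent is None:
            return None
        siblings = self.parent.children
        idx = siblings.index(self) + offset
        return siblings[idx] if 0 <= idx < len(siblings) else None


def associate(comment: ToyNode) -> ToyNode | None:
    # 1. Inline: a previous sibling ending on the row where the comment starts
    #    (assumed semantics of the same-row check).
    current = comment.sibling(-1)
    while current is not None and current.end_row == comment.start_row:
        if current.type != "comment":
            return current
        current = current.sibling(-1)
    # 2. Leading: the next structure after the comment.
    current = comment.sibling(1)
    while current is not None:
        if current.type in JSON_STRUCTURE_TYPES:
            return current
        current = current.sibling(1)
    # 3. Otherwise: the nearest enclosing pair, object, or array.
    parent = comment.parent
    while parent is not None:
        if parent.type in {"pair", "object", "array"}:
            return parent
        parent = parent.parent
    return None


# Toy tree approximating demo.jsonc.
doc = ToyNode("document", 0, 14)
modeline = doc.add(ToyNode("comment", 0, 0))
root = doc.add(ToyNode("object", 1, 14))
alpha_comment = root.add(ToyNode("comment", 2, 2))
alpha = root.add(ToyNode("pair", 3, 3))
items = root.add(ToyNode("pair", 4, 7))
array = items.add(ToyNode("array", 4, 7))
first = array.add(ToyNode("string", 5, 5))
inline_comment = array.add(ToyNode("comment", 5, 5))
second = array.add(ToyNode("string", 6, 6))
block_comment = root.add(ToyNode("comment", 8, 10))
beta = root.add(ToyNode("pair", 11, 13))
# A comment with nothing after it falls back to the enclosing array.
trailing_comment = array.add(ToyNode("comment", 7, 7))
```

In this toy tree the leading comments resolve to the `alpha` and `beta` pairs, the inline comment resolves to the `"first"` string, and the mode line resolves to the root object, because the object is the next structure after it.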

src/sphinx_codelinks/source_discover/config.py

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,7 @@
     "cs": ["cs"],
     "yaml": ["yml", "yaml"],
     "rust": ["rs"],
+    "jsonc": ["jsonc", "json"],
 }


@@ -21,6 +22,8 @@ class CommentType(str, Enum):
     yaml = "yaml"
     # @Support Rust style comments, IMPL_RUST_1, impl, [FE_RUST];
     rust = "rust"
+    # @Support JSONC style comments, IMPL_JSONC_1, impl, [FE_JSONC];
+    jsonc = "jsonc"


 class SourceDiscoverSectionConfigType(TypedDict, total=False):

src/sphinx_codelinks/source_discover/source_discover.py

Lines changed: 27 additions & 0 deletions
@@ -6,10 +6,28 @@

 from sphinx_codelinks.source_discover.config import (
     COMMENT_FILETYPE,
+    CommentType,
     SourceDiscoverConfig,
 )


+def _json_starts_with_comment(filepath: Path, sample_size: int = 256) -> bool:
+    """Return True if a ``.json`` file's first non-whitespace content is a comment.
+
+    Used to decide whether a ``.json`` file should be treated as JSONC. Per
+    https://jsonc.org/#filename-extension a ``.json`` file should only be treated as
+    JSONC when it opens with a comment (e.g. the mode line ``// -*- mode: jsonc -*-``).
+    """
+    try:
+        with filepath.open("rb") as f:
+            chunk = f.read(sample_size)
+    except OSError:
+        return False
+    # strip a leading UTF-8 BOM, then leading whitespace
+    text = chunk.removeprefix(b"\xef\xbb\xbf").lstrip()
+    return text.startswith((b"//", b"/*"))
+
+
 # @Source code file discovery with gitignore support, IMPL_DISC_1, impl, [FE_DISCOVERY, FE_CLI_DISCOVER]
 class SourceDiscover:
     def __init__(self, src_discover_config: SourceDiscoverConfig):
@@ -75,6 +93,15 @@ def _discover(self) -> list[Path]:
                     continue
                 if self.file_types and filepath.suffix.lower() not in self.file_types:
                     continue
+                # @JSONC .json files require a leading comment, IMPL_JSONC_3, impl, [FE_JSONC]
+                # A plain ``.json`` file is only treated as JSONC when it opens with a
+                # comment; otherwise it is skipped under the ``jsonc`` comment type.
+                if (
+                    self.src_discover_config.comment_type == CommentType.jsonc
+                    and filepath.suffix.lower() == ".json"
+                    and not _json_starts_with_comment(filepath)
+                ):
+                    continue
                 # resolve() produces canonical absolute paths; follow_links only
                 # controls whether the walker descends into symlinked directories
                 discovered_files.append(filepath.resolve())
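
To see which `.json` files the leading-comment rule would keep, the helper can be tried in isolation. This is a minimal sketch: `json_starts_with_comment` is a standalone copy of the helper from the diff (renamed without the underscore), while `classify` and the sample contents are hypothetical and exist only for this illustration. `bytes.removeprefix` requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def json_starts_with_comment(filepath: Path, sample_size: int = 256) -> bool:
    """Standalone copy of the helper from the diff, for experimentation."""
    try:
        with filepath.open("rb") as f:
            chunk = f.read(sample_size)
    except OSError:
        return False
    # Strip a leading UTF-8 BOM, then leading whitespace.
    text = chunk.removeprefix(b"\xef\xbb\xbf").lstrip()
    return text.startswith((b"//", b"/*"))


def classify(samples: dict[str, bytes]) -> dict[str, bool]:
    """Write each sample to a temporary .json file and run the check on it."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, content in samples.items():
            path = Path(tmp) / f"{name}.json"
            path.write_bytes(content)
            results[name] = json_starts_with_comment(path)
    return results


# Hypothetical sample contents covering the main cases.
SAMPLES = {
    "modeline": b'// -*- mode: jsonc -*-\n{"value": 42}\n',
    "plain": b'{\n  "value": 42\n}\n',
    "bom_block": b"\xef\xbb\xbf\n  /* header */\n{}\n",
    "late_comment": b'{\n  // not at the start\n  "value": 42\n}\n',
}
RESULTS = classify(SAMPLES)
MISSING = json_starts_with_comment(Path("does-not-exist.json"))
```

Only `modeline` and `bom_block` pass the check; a comment that appears after the opening brace does not opt the file in, and an unreadable file is treated as plain JSON. Because only the first `sample_size` bytes are read, a file with more than 256 bytes of leading whitespace before its first comment would also be treated as plain JSON.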

tests/data/jsonc/demo.jsonc

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+// -*- mode: jsonc -*-
+{
+  // @JSONC alpha implementation, IMPL_JSONC_A, impl, [REQ_JSONC_1]
+  "alpha": 1,
+  "items": [
+    "first", // @JSONC inline item, IMPL_JSONC_B, impl, [REQ_JSONC_2]
+    "second"
+  ],
+  /* Block comment with marker
+     @JSONC beta implementation, IMPL_JSONC_C, impl, [REQ_JSONC_3]
+  */
+  "beta": {
+    "nested": true
+  }
+}

tests/data/jsonc/plain.json

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+{
+  "value": 42
+}
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+// -*- mode: jsonc -*-
+{
+  // @JSONC modeline file, IMPL_JSONC_D, impl, [REQ_JSONC_4]
+  "value": 42
+}
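
The markers in these fixtures appear to follow a one-line layout of the form `@<title>, <id>, <type>, [<links>]`. As a rough illustration of what an extractor has to pull out of each comment, here is a minimal sketch. The regular expression, the `parse_marker` function, and the field names `title`, `id`, `type`, and `links` are assumptions inferred from the fixture text alone; they are not the project's actual parser.

```python
import re

# Assumed marker layout, inferred from the test data:
#   @<title>, <id>, <type>, [<link>, <link>, ...]
MARKER = re.compile(
    r"@(?P<title>[^,]+),\s*(?P<id>[^,]+),\s*(?P<type>[^,]+),\s*\[(?P<links>[^\]]*)\]"
)


def parse_marker(comment_text: str):
    """Return the marker fields found in a comment, or None if there is no marker."""
    match = MARKER.search(comment_text)
    if match is None:
        return None
    links = [link.strip() for link in match["links"].split(",") if link.strip()]
    return {
        "title": match["title"].strip(),
        "id": match["id"].strip(),
        "type": match["type"].strip(),
        "links": links,
    }


LINE_MARKER = parse_marker(
    "// @JSONC alpha implementation, IMPL_JSONC_A, impl, [REQ_JSONC_1]"
)
BLOCK_MARKER = parse_marker(
    "/* Block comment with marker\n"
    "   @JSONC beta implementation, IMPL_JSONC_C, impl, [REQ_JSONC_3]\n"
    "*/"
)
MODE_LINE = parse_marker("// -*- mode: jsonc -*-")
```

Under these assumptions the mode line yields no marker, so it only serves to opt the file in as JSONC, while the line and block comments each yield one traceability object.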
