Commit cf80633

hyunhee-jo and claude authored
chore(repo): add issue/PR templates, ruff/black config, and CI lint workflow (#7)
Objective: The repository had no issue/PR templates and no automated linting, which led to inconsistently formatted contributions and bug reports that lacked context.

Approach: Add GitHub templates (bug report, feature request, question, and PR), a ruff/black configuration aligned with upstream opendataloader-pdf, and a lint CI workflow that enforces that configuration on pushes to main and on pull requests targeting main. Notebooks under docs/ are excluded, matching LlamaIndex's convention.

Evidence: `ruff check` and `black --check` both pass locally; all 33 unit tests pass after auto-formatting; CI is green (lint, unit-test on 3.10 and 3.13, min-dep-test, CodeQL).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0c3fbb0 commit cf80633

10 files changed

Lines changed: 162 additions & 46 deletions


Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+---
+name: Bug report
+about: Report an issue
+title: ""
+labels: bug
+assignees: ""
+---
+
+### Bug
+
+<!-- Describe the buggy behavior you have observed. -->
+
+...
+
+### Steps to reproduce
+
+<!-- Minimal code snippet or command sequence that reproduces the bug. -->
+
+...
+
+### Package version
+
+<!-- Run `pip show opendataloader-pdf-llamaindex` and paste the version. -->
+
+...
+
+### Python version
+
+<!-- Copy the output of `python --version`. -->
+
+...
+
+### Java version
+
+<!-- Copy the output of `java --version`. OpenDataLoader PDF requires Java 11+. -->
+
+...
+
+### LlamaIndex version
+
+<!-- Run `pip show llama-index-core` and paste the version. -->
+
+...
+
+<!-- ATTENTION: When sharing screenshots, attachments, or other data make sure not to include any sensitive information. -->

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+blank_issues_enabled: false

Lines changed: 21 additions & 0 deletions

@@ -0,0 +1,21 @@
+---
+name: Feature request
+about: Suggest an idea
+title: ""
+labels: enhancement
+assignees: ""
+---
+
+### Requested feature
+
+<!-- Describe the feature you have in mind and the user need it addresses. -->
+
+...
+
+### Alternatives
+
+<!-- Describe any alternatives you have considered. -->
+
+...
+
+<!-- ATTENTION: When sharing screenshots, attachments, or other data make sure not to include any sensitive information. -->

.github/ISSUE_TEMPLATE/question.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+name: Question
+about: Ask a question
+title: ""
+labels: question
+assignees: ""
+---
+
+### Question
+
+<!-- Describe what you would like to achieve and which part you need help with. -->
+
+...
+
+<!-- ATTENTION: When sharing screenshots, attachments, or other data make sure not to include any sensitive information. -->

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+<!-- Thank you for your contribution! -->
+
+<!-- STEPS TO FOLLOW:
+1. Add a description of the changes (frequently the same as the commit description)
+2. Enter the issue number next to "Resolves #" below (if there is no tracking issue resolved, **remove that section**)
+3. Make sure the PR title follows the **Commit Message Formatting**: https://www.conventionalcommits.org/en/v1.0.0/#summary.
+4. Follow the steps in the checklist below, starting with the **Commit Message Formatting**.
+-->
+
+<!-- Uncomment this section with the issue number if an issue is being resolved
+**Issue resolved by this Pull Request:**
+Resolves #
+--->
+
+**Checklist:**
+
+- [ ] Documentation has been updated, if necessary.
+- [ ] Examples have been added, if necessary.
+- [ ] Tests have been added, if necessary.
+- [ ] Lint passes locally (`ruff check .` and `black --check .`).
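
Step 3 of the template asks for PR titles in Conventional Commits form, `<type>[optional scope][!]: <description>`, which this commit's own title `chore(repo): ...` follows. A rough title check is sketched below. The regex and the `ALLOWED_TYPES` set are assumptions, not something the repository ships: the specification itself only defines `feat` and `fix` and permits other types, and the extra ones listed here follow the common Angular convention.

```python
import re

# Assumption: a commonly used set of types. The Conventional Commits spec
# defines only feat and fix and allows additional types.
ALLOWED_TYPES = {
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
}

# <type>[(scope)][!]: <description>
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def check_pr_title(title: str) -> bool:
    """Return True if `title` looks like a Conventional Commits summary line."""
    match = TITLE_RE.match(title)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```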

.github/workflows/lint.yml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+name: Lint
+
+on:
+  pull_request:
+    branches: [main]
+  push:
+    branches: [main]
+
+permissions:
+  contents: read
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+      - name: Install lint tools
+        run: |
+          python -m pip install --upgrade pip
+          pip install "ruff>=0.5.0" "black>=24.0"
+      - name: Run ruff
+        run: ruff check .
+      - name: Run black
+        run: black --check .
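
The workflow runs `ruff check .` and then `black --check .`, and the PR checklist asks contributors to run the same two commands locally. A minimal local wrapper might look like the sketch below; the script and `run_lint` are hypothetical, not part of the commit, and it assumes both tools are installed (for example through the `dev` extra in `pyproject.toml`). Unlike CI, where a failing ruff step stops the job before black runs, this sketch runs every command and reports all failures.

```python
import subprocess
import sys
from typing import Callable, Sequence

# The same commands the "Run ruff" and "Run black" CI steps execute.
LINT_COMMANDS: list[list[str]] = [
    ["ruff", "check", "."],
    ["black", "--check", "."],
]


def _run(cmd: Sequence[str]) -> int:
    # Inherit stdout/stderr so each tool prints its own report.
    return subprocess.run(list(cmd), check=False).returncode


def run_lint(runner: Callable[[Sequence[str]], int] = _run) -> int:
    """Run every lint command; return 0 only if all of them pass."""
    failed = [" ".join(cmd) for cmd in LINT_COMMANDS if runner(cmd) != 0]
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(run_lint())
```

The injectable `runner` is only there so the logic can be exercised without the tools installed.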

llama_index/readers/opendataloader_pdf/base.py

Lines changed: 6 additions & 13 deletions
@@ -200,8 +200,7 @@ def lazy_load_data(
         fmt = self.format.lower()
         if fmt not in _FORMAT_TO_EXT:
             raise ValueError(
-                f"Invalid format '{self.format}'. "
-                f"Valid options: {list(_FORMAT_TO_EXT.keys())}"
+                f"Invalid format '{self.format}'. " f"Valid options: {list(_FORMAT_TO_EXT.keys())}"
             )
 
         if isinstance(file_path, (str, Path)):
@@ -226,7 +225,7 @@ def lazy_load_data(
 
         try:
             output_dir = tempfile.mkdtemp()
-        except OSError as e:
+        except OSError:
             logger.exception("Failed to create temp directory")
             return
 
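
Removing `as e` satisfies ruff's `F841` (local variable assigned but never used) without losing diagnostics: `logger.exception` logs at ERROR level and attaches the exception currently being handled, so the traceback still reaches the log. The self-contained sketch below checks this with an in-memory handler; `ListHandler`, `make_temp_dir` and the logger name are illustrative, and the function returns `None` where the original generator simply `return`s.

```python
import logging
import tempfile
from typing import Callable, Optional

logger = logging.getLogger("odl_sketch.tempdir")
logger.propagate = False  # keep the demo quiet; records go only to our handler


class ListHandler(logging.Handler):
    """Keeps emitted records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
logger.addHandler(handler)


def make_temp_dir(mkdtemp: Callable[[], str] = tempfile.mkdtemp) -> Optional[str]:
    try:
        return mkdtemp()
    except OSError:  # no `as e`: the name would never be used (F841)
        # logger.exception is logger.error(..., exc_info=True), so the
        # active OSError and its traceback are attached to the record.
        logger.exception("Failed to create temp directory")
        return None


def failing_mkdtemp() -> str:
    raise OSError("disk full")
```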

@@ -260,9 +259,7 @@ def lazy_load_data(
             }
             # --- END SYNCED CONVERT KWARGS ---
             # Omit None values so the core engine applies its own defaults.
-            convert_kwargs = {
-                k: v for k, v in convert_kwargs.items() if v is not None
-            }
+            convert_kwargs = {k: v for k, v in convert_kwargs.items() if v is not None}
 
             convert(
                 input_path=paths,
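
The comment states the intent: leaving a keyword out lets the callee apply its own default, while passing `None` explicitly would override that default. The sketch below demonstrates the difference with a hypothetical `fake_convert`; it does not reproduce the real `opendataloader_pdf.convert` signature.

```python
from typing import Any, Optional


def fake_convert(
    input_path: str, output_format: str = "markdown", pages: Optional[str] = "all"
) -> dict[str, Any]:
    # Stand-in engine: echoes back the settings it actually ended up using.
    return {"input_path": input_path, "output_format": output_format, "pages": pages}


def drop_none(options: dict[str, Any]) -> dict[str, Any]:
    # Same comprehension as the diff: omit None so the callee's defaults apply.
    return {k: v for k, v in options.items() if v is not None}


options = {"output_format": None, "pages": "1"}
print(fake_convert("a.pdf", **drop_none(options)))
# {'input_path': 'a.pdf', 'output_format': 'markdown', 'pages': '1'}
print(fake_convert("a.pdf", **options)["output_format"])
# None
```

Filtering with `is not None` rather than on truthiness keeps deliberate falsy settings such as `0`, `False` or `""`.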
@@ -272,7 +269,7 @@ def lazy_load_data(
                 text_page_separator=page_sep,
                 html_page_separator=page_sep,
             )
-        except Exception as e:
+        except Exception:
             if self.hybrid:
                 raise
             logger.exception("Error during conversion")
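
The handler above re-raises in hybrid mode and otherwise logs and continues. A bare `raise` re-raises the exception currently being handled, unchanged and with its traceback, so that branch never needed the name `e` either. The control flow is sketched below; `convert_or_log` and its `strict` flag are hypothetical stand-ins for the reader method and `self.hybrid`.

```python
import logging
from typing import Callable

logger = logging.getLogger("odl_sketch.convert")
logger.addHandler(logging.NullHandler())
logger.propagate = False  # keep the demo quiet


def convert_or_log(convert: Callable[[], None], strict: bool) -> bool:
    """Run `convert`; re-raise failures when strict, otherwise log them."""
    try:
        convert()
    except Exception:
        if strict:
            raise  # bare raise: the original exception and traceback propagate
        logger.exception("Error during conversion")
        return False
    return True


def broken_convert() -> None:
    raise RuntimeError("backend unavailable")
```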
@@ -288,13 +285,9 @@ def lazy_load_data(
             if self.split_pages:
                 if fmt == "json":
                     data = json.loads(content)
-                    yield from self._split_json_into_pages(
-                        data, source_name, fmt, extra_info
-                    )
+                    yield from self._split_json_into_pages(data, source_name, fmt, extra_info)
                 else:
-                    yield from self._split_into_pages(
-                        content, source_name, fmt, extra_info
-                    )
+                    yield from self._split_into_pages(content, source_name, fmt, extra_info)
             else:
                 yield Document(
                     text=content,

pyproject.toml

Lines changed: 13 additions & 1 deletion
@@ -40,4 +40,16 @@ Repository = "https://github.com/opendataloader-project/opendataloader-pdf-llama
 Issues = "https://github.com/opendataloader-project/opendataloader-pdf-llamaindex/issues"
 
 [project.optional-dependencies]
-dev = ["pytest>=8.0", "pytest-socket>=0.7.0"]
+dev = ["pytest>=8.0", "pytest-socket>=0.7.0", "ruff>=0.5.0", "black>=24.0"]
+
+[tool.black]
+line-length = 100
+
+[tool.ruff]
+line-length = 100
+target-version = "py310"
+exclude = ["dist", "build", "docs", "*.ipynb"]
+
+[tool.ruff.lint]
+select = ["E", "F", "I"]
+ignore = []
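
Black and ruff each read their own `line-length`, so the two values have to be kept equal by hand; if they drift, black can produce lines that ruff's `E501` then reports. A small consistency check is sketched below; it is not part of the repository. It uses the standard-library `tomllib` (Python 3.11+) and falls back to the `tomli` backport on 3.10, which this project targets. Separately, note that ruff's `exclude` setting replaces its built-in default exclusions (such as `.git` and `.venv`), whereas `extend-exclude` adds to them.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # Python 3.10: the tomli backport has the same API
    import tomli as tomllib

# The configuration added in this commit, inlined so the sketch is self-contained.
PYPROJECT = """
[tool.black]
line-length = 100

[tool.ruff]
line-length = 100
target-version = "py310"
exclude = ["dist", "build", "docs", "*.ipynb"]

[tool.ruff.lint]
select = ["E", "F", "I"]
ignore = []
"""


def line_lengths(pyproject_text: str) -> tuple[int, int]:
    """Return the (black, ruff) line-length settings from pyproject.toml text."""
    tool = tomllib.loads(pyproject_text)["tool"]
    return tool["black"]["line-length"], tool["ruff"]["line-length"]


black_len, ruff_len = line_lengths(PYPROJECT)
if black_len != ruff_len:
    raise SystemExit(f"line-length mismatch: black={black_len}, ruff={ruff_len}")
```

To check the real file instead, pass `Path("pyproject.toml").read_text(encoding="utf-8")` (from `pathlib`) to `line_lengths`.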

tests/test_integration.py

Lines changed: 4 additions & 9 deletions
@@ -8,6 +8,7 @@
 
 import pytest
 from llama_index.core.schema import Document
+
 from llama_index.readers.opendataloader_pdf import OpenDataLoaderPDFReader
 
 from .conftest import java_available
@@ -81,23 +82,17 @@ def test_sanitize(self, sample_pdf) -> None:
         assert len(docs) >= 1
 
     def test_pages_selection(self, multi_page_pdf) -> None:
-        reader = OpenDataLoaderPDFReader(
-            pages="1", split_pages=False
-        )
+        reader = OpenDataLoaderPDFReader(pages="1", split_pages=False)
         docs = list(reader.load_data(file_path=multi_page_pdf))
         assert len(docs) == 1
 
     def test_use_struct_tree(self, sample_pdf) -> None:
-        reader = OpenDataLoaderPDFReader(
-            use_struct_tree=True, split_pages=False
-        )
+        reader = OpenDataLoaderPDFReader(use_struct_tree=True, split_pages=False)
         docs = list(reader.load_data(file_path=sample_pdf))
         assert len(docs) >= 1
 
     def test_keep_line_breaks(self, sample_pdf) -> None:
-        reader = OpenDataLoaderPDFReader(
-            keep_line_breaks=True, split_pages=False
-        )
+        reader = OpenDataLoaderPDFReader(keep_line_breaks=True, split_pages=False)
         docs = list(reader.load_data(file_path=sample_pdf))
         assert len(docs) >= 1
 
tests/test_readers_opendataloader_pdf.py

Lines changed: 8 additions & 23 deletions
@@ -5,7 +5,7 @@
 from unittest.mock import MagicMock, mock_open, patch
 
 import pytest
-from llama_index.core.schema import Document
+
 from llama_index.readers.opendataloader_pdf import OpenDataLoaderPDFReader
 
 # Save original before any monkeypatching.
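
The removed `from llama_index.core.schema import Document` was unused in this test module, which ruff reports as `F401`; the blank line in its place (like the one added in `tests/test_integration.py`) comes from the `I` (isort) rules, which put `llama_index.readers.opendataloader_pdf` in its own import section. As a rough illustration of what `F401` looks for, here is a toy checker built on the standard-library `ast` module. It is a simplification (it ignores `__all__`, re-exports, string annotations and `del`), not how ruff is implemented.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Toy F401-style check: return names that are imported but never used."""
    tree = ast.parse(source)
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports are not tracked by this toy
                # `import a.b` binds `a`; `... import x as y` binds `y`.
                imported.add(alias.asname or alias.name.split(".")[0])
    # Every use, including the head of `pytest.fixture`, is an ast.Name node.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


SAMPLE = """
import pytest
from llama_index.core.schema import Document
from llama_index.readers.opendataloader_pdf import OpenDataLoaderPDFReader


@pytest.fixture
def reader():
    return OpenDataLoaderPDFReader()
"""
```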
@@ -24,9 +24,7 @@ def _bypass_input_validation(monkeypatch):
         "llama_index.readers.opendataloader_pdf.base._java_available",
         lambda: True,
     )
-    monkeypatch.setattr(
-        "llama_index.readers.opendataloader_pdf.base._java_found", None
-    )
+    monkeypatch.setattr("llama_index.readers.opendataloader_pdf.base._java_found", None)
 
     def _fake_exists(self):
         if self.suffix == ".pdf":
@@ -107,9 +105,7 @@ def test_format_case_insensitive(self) -> None:
             "llama_index.readers.opendataloader_pdf.base.Path.glob",
             return_value=[],
         ):
-            with patch(
-                "llama_index.readers.opendataloader_pdf.base.shutil.rmtree"
-            ):
+            with patch("llama_index.readers.opendataloader_pdf.base.shutil.rmtree"):
                 list(reader.lazy_load_data(file_path="dummy.pdf"))
 
 
@@ -259,10 +255,7 @@ def test_split_two_pages(self) -> None:
 
     def test_content_before_separator(self) -> None:
         reader = OpenDataLoaderPDFReader()
-        content = (
-            "Before separator"
-            "\n<<<ODL_PAGE_BREAK_2>>>\nPage two"
-        )
+        content = "Before separator" "\n<<<ODL_PAGE_BREAK_2>>>\nPage two"
         docs = list(reader._split_into_pages(content, "test.pdf", "text"))
         assert len(docs) == 2
         assert docs[0].text == "Before separator"
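
This test pins down one detail of page splitting: text before the first `<<<ODL_PAGE_BREAK_N>>>` marker becomes its own document. A hypothetical splitter consistent with that case is sketched below. It returns `(page_number, text)` pairs instead of `Document` objects, and both the `None` page number for the leading chunk and the dropping of empty chunks are assumptions, since these tests do not show what the real `_split_into_pages` does there.

```python
import re
from typing import Optional

# Marker format as it appears in the tests: "\n<<<ODL_PAGE_BREAK_N>>>\n".
_PAGE_BREAK = re.compile(r"\n?<<<ODL_PAGE_BREAK_(\d+)>>>\n?")


def split_pages(content: str) -> list[tuple[Optional[int], str]]:
    """Split text on page-break markers into (page_number, text) pairs.

    Text before the first marker gets page_number None (an assumption);
    chunks that are empty after stripping are dropped.
    """
    # With one capture group, re.split yields [lead, n1, text1, n2, text2, ...].
    parts = _PAGE_BREAK.split(content)
    pages: list[tuple[Optional[int], str]] = []
    if parts[0].strip():
        pages.append((None, parts[0]))
    for number, text in zip(parts[1::2], parts[2::2]):
        if text.strip():
            pages.append((int(number), text))
    return pages
```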
@@ -292,9 +285,7 @@ def test_extra_info_merged(self) -> None:
         reader = OpenDataLoaderPDFReader()
         content = "\n<<<ODL_PAGE_BREAK_1>>>\nContent"
         docs = list(
-            reader._split_into_pages(
-                content, "doc.pdf", "text", extra_info={"custom": "value"}
-            )
+            reader._split_into_pages(content, "doc.pdf", "text", extra_info={"custom": "value"})
         )
         assert docs[0].metadata["custom"] == "value"
 
@@ -350,9 +341,7 @@ def test_extra_info_merged(self) -> None:
         reader = OpenDataLoaderPDFReader(format="json")
         data = {"kids": [{"type": "paragraph", "page number": 1, "content": "p"}]}
         docs = list(
-            reader._split_json_into_pages(
-                data, "test.pdf", "json", extra_info={"key": "val"}
-            )
+            reader._split_json_into_pages(data, "test.pdf", "json", extra_info={"key": "val"})
         )
         assert docs[0].metadata["key"] == "val"
 
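
The JSON test above passes `_split_json_into_pages` a dict whose `kids` elements carry a `"page number"` key. A hypothetical sketch of grouping such elements by page follows. Only the `kids`, `page number` and `content` keys come from the test; skipping elements that lack either key, and returning plain strings instead of `Document` objects, are assumptions about how the real method behaves.

```python
from collections import defaultdict
from typing import Any


def group_by_page(data: dict[str, Any]) -> dict[int, list[str]]:
    """Group the `content` of top-level `kids` elements by their page number."""
    pages: dict[int, list[str]] = defaultdict(list)
    for element in data.get("kids", []):
        page = element.get("page number")
        content = element.get("content")
        if page is None or content is None:
            continue  # assumption: elements without a page or text are skipped
        pages[page].append(content)
    # Return a plain dict ordered by page number.
    return dict(sorted(pages.items()))


data = {
    "kids": [
        {"type": "paragraph", "page number": 1, "content": "p"},
        {"type": "heading", "page number": 2, "content": "Title"},
        {"type": "paragraph", "page number": 2, "content": "Body"},
    ]
}
```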

@@ -400,9 +389,7 @@ def test_metadata_with_hybrid(
             patch("opendataloader_pdf.convert"),
             patch("builtins.open", mock_open(read_data="content")),
         ):
-            reader = OpenDataLoaderPDFReader(
-                split_pages=False, hybrid="docling-fast"
-            )
+            reader = OpenDataLoaderPDFReader(split_pages=False, hybrid="docling-fast")
             docs = list(reader.lazy_load_data(file_path="doc.pdf"))
 
         assert docs[0].metadata["hybrid"] == "docling-fast"
@@ -561,9 +548,7 @@ class TestImportError:
 
     @patch("llama_index.readers.opendataloader_pdf.base.tempfile.mkdtemp")
     @patch("llama_index.readers.opendataloader_pdf.base.shutil.rmtree")
-    def test_import_error_propagates(
-        self, mock_rmtree: MagicMock, mock_mkdtemp: MagicMock
-    ) -> None:
+    def test_import_error_propagates(self, mock_rmtree: MagicMock, mock_mkdtemp: MagicMock) -> None:
         mock_mkdtemp.return_value = "/tmp/test"
         reader = OpenDataLoaderPDFReader()
 
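
The reformatted signature keeps the argument order `mock_rmtree, mock_mkdtemp`, which is correct because stacked `unittest.mock.patch` decorators are applied bottom-up: the decorator nearest the function (`shutil.rmtree`) supplies the first mock argument. The standalone illustration below patches `tempfile.mkdtemp` and `shutil.rmtree` directly rather than through `llama_index.readers.opendataloader_pdf.base`; `make_and_remove` and `check_order` are illustrative names only.

```python
import shutil
import tempfile
from unittest.mock import MagicMock, patch


def make_and_remove() -> str:
    # Code under test: create a temporary directory, then delete it.
    path = tempfile.mkdtemp()
    shutil.rmtree(path)
    return path


@patch("tempfile.mkdtemp")  # outer decorator -> supplies the last mock argument
@patch("shutil.rmtree")  # inner decorator -> supplies the first mock argument
def check_order(mock_rmtree: MagicMock, mock_mkdtemp: MagicMock) -> str:
    mock_mkdtemp.return_value = "/tmp/test"
    path = make_and_remove()
    # Both calls hit the mocks, so nothing is created or deleted on disk.
    mock_rmtree.assert_called_once_with("/tmp/test")
    return path
```

If the two parameters were swapped, `return_value` would be set on the `rmtree` mock and the `assert_called_once_with` check would fail.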
