
Commit 04d82c0

Merge pull request #294 from freelawproject/293-listed-in-authorities-but-not-actually-cited
Exclude star-pagination text when extracting citations
2 parents 4d0bd74 + 10c42da commit 04d82c0

3 files changed

Lines changed: 12 additions & 2 deletions

CHANGES.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -12,6 +12,7 @@ Changes:
 
 Fixes:
 - Modifies rendering of AhocorasickTokenizer parameter in API docs II
+- Removed star-pagination markers from extracted text #293
 
 ## Current
 
```

eyecite/clean.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -51,7 +51,9 @@ def html(html_content: str) -> str:
             parent::link |
             parent::head |
             parent::page-number |
-            parent::script)]"""
+            parent::script |
+            parent::*[@class="star-pagination"]
+            )]"""
     )
     return " ".join(text)
 
```
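
The new predicate drops any text node whose immediate parent element has `class="star-pagination"`, in addition to the existing `link`, `head`, `page-number`, and `script` exclusions. The XPath itself needs a full XPath engine such as lxml's; the sketch below approximates the same filter with Python's standard-library `html.parser` so it runs without extra dependencies. `TextCollector` and `extract_text` are hypothetical names, not part of eyecite, and the final whitespace collapse is a simplification of eyecite's separate `inline_whitespace` clean step.

```python
from html.parser import HTMLParser

# Parents whose *direct* text children are dropped, mirroring the XPath
# predicate in the diff above. (Hypothetical constants, not eyecite code.)
EXCLUDED_PARENT_TAGS = {"link", "head", "page-number", "script"}
EXCLUDED_PARENT_CLASS = "star-pagination"

# HTML void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes, skipping excluded parents."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, class attribute) for each open element
        self.parts = []  # text nodes that survive the filter

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack:
            tag, cls = self.stack[-1]
            # Like the XPath, only the immediate parent is checked, and the
            # class attribute must equal "star-pagination" exactly.
            if tag in EXCLUDED_PARENT_TAGS or cls == EXCLUDED_PARENT_CLASS:
                return
        self.parts.append(data)


def extract_text(html_content: str) -> str:
    parser = TextCollector()
    parser.feed(html_content)
    parser.close()
    # Join text nodes with spaces, then collapse whitespace runs
    # (a simplification of eyecite's inline_whitespace step).
    return " ".join(" ".join(parser.parts).split())
```

On an excerpt of the new test sentence, `extract_text('<p><i>Crane</i> v. <i>Hyde Park,</i> 135 <span class="star-pagination">*355</span> Mass. 147</p>')` returns `'Crane v. Hyde Park, 135 Mass. 147'`. As with the XPath, only the direct parent is inspected, so text nested one level deeper inside the marker span (for example in a `<b>`) is still kept.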

tests/test_FindTest.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -828,7 +828,14 @@ def test_find_citations(self):
             # Fix for index error when searching for case name
             ("<p>State v. Luna-Benitez (S53965). Alternative writ issued, dismissed, 342 Or 255</p>",
              [case_citation(volume="342", reporter="Or", page="255")],
-             {'clean_steps': ['html', 'inline_whitespace']})
+             {'clean_steps': ['html', 'inline_whitespace']}),
+            # Test remove text with star-pagination class
+            ("<p>The somewhat similar cases of <i>Crane</i> v. <i>Hyde Park,</i> 135 <span class=\"star-pagination\">*355</span> Mass. 147, and <i>Mahoning County</i> v. <i>Young,</i> 16 U.S. App. 253, also cited by the defendant, likewise turned upon a question of forfeiture for breach of a condition subsequent in a deed to a municipal corporation.</p>",
+             [case_citation(volume="135", reporter="Mass.", page="147",
+                            metadata={"plaintiff": "Crane",
+                                      "defendant": "Hyde Park"}
+                            )],
+             {'clean_steps': ['html', 'inline_whitespace']})
         )
 
         # fmt: on
```
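
In the new test case the marker text `*355` falls between the volume (`135`) and the reporter (`Mass.`). A toy volume-reporter-page pattern shows how such a stray token can corrupt a match. The regex and `find_mass_cites` below are hypothetical and far simpler than eyecite's reporter matching; the sketch only illustrates the kind of failure the new test guards against.

```python
import re
from typing import Dict, List

# Hypothetical, heavily simplified pattern: <volume> Mass. <page>.
# It is NOT eyecite's matcher; it only illustrates the failure mode.
MASS_CITE = re.compile(r"\b(?P<volume>\d+)\s+(?P<reporter>Mass\.)\s+(?P<page>\d+)\b")


def find_mass_cites(text: str) -> List[Dict[str, str]]:
    """Return the volume/reporter/page groups for every match in text."""
    return [m.groupdict() for m in MASS_CITE.finditer(text)]


# Marker text left in: the digits of "*355" sit right before the reporter,
# so the toy pattern reads them as the volume.
print(find_mass_cites("Crane v. Hyde Park, 135 *355 Mass. 147"))
# → [{'volume': '355', 'reporter': 'Mass.', 'page': '147'}]

# Marker text removed: the real volume is recovered.
print(find_mass_cites("Crane v. Hyde Park, 135 Mass. 147"))
# → [{'volume': '135', 'reporter': 'Mass.', 'page': '147'}]
```

Filtering the marker out in the `html` clean step, as this commit does, means the citation patterns never see it, so none of them has to learn to skip page markers individually.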
