Commit 9caf31c

chore: add benchmark for standardize_quotes comparison
1 parent 9c3b27c commit 9caf31c

2 files changed

Lines changed: 27 additions & 0 deletions

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -265,3 +265,4 @@ tests-root = "test_unstructured"
 test-framework = "pytest"
 ignore-paths = []
 formatter-cmds = ["ruff check --exit-zero --fix-only $file", "ruff format $file"]
+benchmarks-root = "test_unstructured/benchmarks"
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+from unstructured.metrics.text_extraction import standardize_quotes
+
+SAMPLE_TEXTS = [
+    "She said \u201cHello\u201d and then whispered \u2018Goodbye\u2019 before leaving.",
+    "\u201eTo be, or not to be, that is the question\u201d - Shakespeare\u2019s famous quote.",
+    "\u00abWhen he said \u201clife is beautiful,\u201d I believed him\u00bb wrote Maria.",
+    "\u275dDo you remember when we first met?\u275e she asked with a smile.",
+    "\u301dThe meeting starts at 10:00, don\u2019t be late!\u301f announced the manager.",
+    '\u300cHe told me "This is important" yesterday\u300d, she explained.',
+    "\u300eThe sun was setting. The birds were singing. It was peaceful.\u300f",
+    "\ufe42Meeting #123 @ 15:00 - Don\u2019t forget!\ufe41",
+    "\u300cHello\u300d, \u275dWorld\u275e, \"Test\", 'Example', \u201eQuote\u201d, \u00abFinal\u00bb",  # noqa: E501
+    "It\u2019s John\u2019s book, isn\u2019t it?",
+    '\u2039Testing the system\u2019s capability for "quoted" text\u203a',
+    "\u275bFirst sentence. Second sentence. Third sentence.\u275c",
+    "\u300cChapter 1\u300d: \u275dThe Beginning\u275e - \u201eA new story\u201d begins \u00abtoday\u00bb.",  # noqa: E501
+]
+
+
+def run_standardize_quotes():
+    for text in SAMPLE_TEXTS:
+        standardize_quotes(text)
+
+
+def test_benchmark_standardize_quotes(benchmark):
+    benchmark(run_standardize_quotes)
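
The benchmark above hands a zero-argument workload function to the `benchmark` fixture, which calls it repeatedly and records timings. The same pattern can be approximated with the standard library alone. The following is a minimal sketch, not the project's code: `normalize_quotes`, `QUOTE_MAP`, and `time_workload` are hypothetical names, and the mapping covers only a handful of quote characters as a stand-in for `standardize_quotes`.

```python
import timeit

# Hypothetical stand-in for the function under test. This is NOT the
# unstructured implementation; it maps only a few quote characters.
QUOTE_MAP = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u00ab": '"',  # left-pointing double angle quotation mark
    "\u00bb": '"',  # right-pointing double angle quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}
_TABLE = str.maketrans(QUOTE_MAP)


def normalize_quotes(text: str) -> str:
    """Replace the mapped Unicode quotes with their ASCII equivalents."""
    return text.translate(_TABLE)


SAMPLES = [
    "She said \u201cHello\u201d and then whispered \u2018Goodbye\u2019 before leaving.",
    "It\u2019s John\u2019s book, isn\u2019t it?",
]


def run_normalize_quotes() -> None:
    """Workload: one pass over every sample, mirroring run_standardize_quotes."""
    for text in SAMPLES:
        normalize_quotes(text)


def time_workload(number: int = 1000, repeat: int = 5) -> float:
    """Return the best observed time in seconds for one workload call."""
    timings = timeit.repeat(run_normalize_quotes, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    print(f"{time_workload():.3e} s per workload call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes. A dedicated benchmarking plugin typically adds calibration and richer statistics on top of this.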