Commit 3c6954d

Fix Markdown/bulk export review follow-ups from PR #19 (#20)
Apply the ten code-review findings on the freshly-merged Markdown + bulk export workflow. No new features; correctness and UI/UX only.

Converter (tsv_odf_converter.py):
- _MD_LINE_START_RE: require whitespace (or end of string) after the digits-dot pattern so paragraphs starting with a decimal like "1.5 million" or "2025.06 release" are no longer mangled to "\1.5 million".
- _emit_kv: emit blank lines before and after so adjacent KV elements no longer collapse into one CommonMark paragraph and the bolded key stays visually distinct.
- _emit_paragraph: when DocElement.raw_lines preserved per-line breaks (multi-line addresses, poetry), emit each line with a CommonMark hard break (two trailing spaces) instead of collapsing into a single run-on line.

Markdown export UI (conclusion_export_mixin.py):
- _export_markdown_file: write to "<path>.tmp" then os.replace so a failed/cancelled conversion no longer destroys a pre-existing file the user picked to overwrite via FileDialog.
- _on_md_save_response: guard against Gio.File.get_path() == None (remote/MTP/GVfs locations) and surface a clear toast instead of an AttributeError swallowed as a generic "Export failed".
- _on_md_export_clicked → _on_md_save_response → _export_markdown_file → _on_md_export_finished: pass include_front_matter and open_after through the closure chain instead of self attributes, so overlapping per-row exports can no longer clobber each other's settings.
- _on_folder_chosen (bulk): show _EXPORT_FAILED_MSG on non-Dismiss errors so the user gets feedback instead of a silently-vanishing dialog; also guards remote folders with the same toast as the single-file path.
- _run_bulk_export / _bulk_convert_one: snapshot md_include_front_matter and odf_include_images once at batch start and pass them via an options dict, so a mid-batch toggle from another dialog can no longer produce a non-uniform batch.
- _bulk_convert_one (md branch): write to "<path>.tmp" and os.replace, matching the single-file atomic-write pattern.
- _bulk_export_worker: when fmt is not in _BULK_EXTENSIONS, close the dialog and toast a real export-failed message instead of falling back to a misleading "Saved 0 files".
- _build_progress_dialog: cancel handler now disables the button, swaps its label and AT-SPI accessible name to "Cancelling…", and rewrites the subtitle to "Finishing current step…". The progress callback short-circuits after cancel so the message doesn't get overwritten by late per-file updates. Gives the user immediate feedback while parse_tsv_pages finishes its current step on a long PDF.

Settings persistence (services/settings.py):
- Add _load_md_settings / _save_md_settings handling the md_export.include_front_matter and md_export.open_after_export config keys; hooked into load_settings / _save_all_settings.
- _update_md_setting (UI) now calls settings._save_md_settings() before config.save() so the Markdown toggles actually persist across restarts, matching the ODF flow.

Tests (test_markdown_export.py):
- TestEscapeMd.test_decimal_at_line_start_not_escaped: pins the new decimal-aware behavior; "1." alone still escapes via the end-of-string boundary.
- TestCreateMarkdown.test_kv_elements_separated_by_blank_line: pins the new KV separator rule.
- TestCreateMarkdown.test_paragraph_preserves_raw_lines: pins the new hard-break handling for multi-line OCR paragraphs.
- TestSaveMdSettings.{test_save_md_settings_writes_both_keys, test_load_md_settings_reads_both_keys, test_save_md_settings_defaults_when_unset}: cover the new persistence path end-to-end with a minimal config double.

364 passing tests, ruff clean.
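The "<path>.tmp" then os.replace pattern named for _export_markdown_file and _bulk_convert_one is not part of this page's diff, so the following is only a minimal sketch of the general technique. The helper name write_text_atomically and its cleanup behavior are assumptions for illustration, not the project's actual code.

```python
import os
from pathlib import Path


def write_text_atomically(target: Path, text: str) -> None:
    """Write text to a sibling "<name>.tmp" file, then swap it into place.

    Hypothetical helper: the real export code may differ in naming,
    error handling, and whether it calls fsync.
    """
    tmp_path = target.with_name(target.name + ".tmp")
    try:
        with open(tmp_path, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push data to disk before the swap
        # os.replace overwrites the destination in one step, so a reader
        # sees either the old file or the complete new one.
        os.replace(tmp_path, target)
    except BaseException:
        # On failure or cancellation, drop the partial temp file and
        # leave any pre-existing target untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Because the temporary file sits next to the target, both paths are on the same filesystem, which is what lets os.replace behave as a single rename rather than a copy.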
1 parent f5791e3 commit 3c6954d
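The decimal-aware rule described for _MD_LINE_START_RE can be illustrated with a small sketch. The converter's real pattern and escape placement are not shown in this diff, so ORDERED_MARKER_RE and escape_ordered_marker below are hypothetical stand-ins, and escaping the dot is one assumed way to neutralize an ordered-list marker.

```python
import re

# Hypothetical stand-in for _MD_LINE_START_RE: digits followed by a dot
# count as a list marker only when whitespace or end of string comes next.
ORDERED_MARKER_RE = re.compile(r"^(\d+)\.(?=\s|$)")


def escape_ordered_marker(line: str) -> str:
    """Escape a leading "N." so Markdown does not start an ordered list."""
    # r"\1\\." re-emits the digits, then a literal backslash and the dot.
    return ORDERED_MARKER_RE.sub(r"\1\\.", line)


if __name__ == "__main__":
    for sample in ("1. First item", "1.5 million", "2025.06 release", "1."):
        print(repr(sample), "->", repr(escape_ordered_marker(sample)))
```

The lookahead is what separates "1. First item" (escaped) from "1.5 million" (left alone), while a bare "1." still matches through the end-of-string alternative.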

4 files changed

Lines changed: 294 additions & 52 deletions

src/bigocrpdf/services/settings.py

Lines changed: 36 additions & 12 deletions
@@ -142,6 +142,7 @@ def load_settings(self) -> None:
         self._load_date_settings()
         self._load_text_extraction_settings()
         self._load_odf_settings()
+        self._load_md_settings()
         self._load_preprocessing_settings()
         self._load_image_export_settings()
         self._load_pdf_output_settings()
@@ -197,6 +198,10 @@ def _load_odf_settings(self) -> None:
         self.odf_use_formatting = self._config.get("odf_export.use_formatting", True)
         self.odf_open_after_export = self._config.get("odf_export.open_after_export", False)
 
+    def _load_md_settings(self) -> None:
+        self.md_include_front_matter = self._config.get("md_export.include_front_matter", False)
+        self.md_open_after_export = self._config.get("md_export.open_after_export", False)
+
     def _load_preprocessing_settings(self) -> None:
         self.dpi = self._config.get("rapidocr.dpi", DEFAULT_DPI)
         self.enable_preprocessing = self._config.get(
@@ -473,6 +478,7 @@ def _save_all_settings(self) -> None:
         self._save_text_extraction_settings()
         self._save_editor_settings()
         self._save_odf_settings()
+        self._save_md_settings()
         self._save_preprocessing_settings()
         self._save_image_export_settings()
         self._save_pdf_output_settings()
@@ -536,6 +542,18 @@ def _save_odf_settings(self) -> None:
             "odf_export.open_after_export", self.odf_open_after_export, save_immediately=False
         )
 
+    def _save_md_settings(self) -> None:
+        self._config.set(
+            "md_export.include_front_matter",
+            getattr(self, "md_include_front_matter", False),
+            save_immediately=False,
+        )
+        self._config.set(
+            "md_export.open_after_export",
+            getattr(self, "md_open_after_export", False),
+            save_immediately=False,
+        )
+
     def _save_preprocessing_settings(self) -> None:
         self._config.set("rapidocr.dpi", self.dpi, save_immediately=False)
         self._config.set("rapidocr.language", self.ocr_language, save_immediately=False)
@@ -641,20 +659,26 @@ def get_pdf_suffix(self) -> str:
 
         # Add date elements with their preferred order
         if self.include_year:
-            date_components.append((
-                self.date_format_order.get("year", 1),
-                f"{now.tm_year}",
-            ))
+            date_components.append(
+                (
+                    self.date_format_order.get("year", 1),
+                    f"{now.tm_year}",
+                )
+            )
         if self.include_month:
-            date_components.append((
-                self.date_format_order.get("month", 2),
-                f"{now.tm_mon:02d}",
-            ))
+            date_components.append(
+                (
+                    self.date_format_order.get("month", 2),
+                    f"{now.tm_mon:02d}",
+                )
+            )
         if self.include_day:
-            date_components.append((
-                self.date_format_order.get("day", 3),
-                f"{now.tm_mday:02d}",
-            ))
+            date_components.append(
+                (
+                    self.date_format_order.get("day", 3),
+                    f"{now.tm_mday:02d}",
+                )
+            )
 
         # Sort components by their position value
         date_components.sort(key=lambda x: x[0])
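
The persistence path added in services/settings.py can be exercised in isolation with a small config double, much as the commit message describes for TestSaveMdSettings. The sketch below is an assumption-laden illustration: FakeConfig and MdSettingsSketch are hypothetical names, and only the two config keys, the getattr defaults, and the save_immediately=False calls are taken from the diff.

```python
from typing import Any


class FakeConfig:
    """Hypothetical in-memory stand-in for the real config object."""

    def __init__(self) -> None:
        self.values: dict[str, Any] = {}
        self.save_calls = 0

    def get(self, key: str, default: Any = None) -> Any:
        return self.values.get(key, default)

    def set(self, key: str, value: Any, save_immediately: bool = True) -> None:
        self.values[key] = value
        if save_immediately:
            self.save()

    def save(self) -> None:
        self.save_calls += 1


class MdSettingsSketch:
    """Mirrors only the Markdown load/save pair shown in the diff."""

    def __init__(self, config: FakeConfig) -> None:
        self._config = config

    def load_md_settings(self) -> None:
        self.md_include_front_matter = self._config.get(
            "md_export.include_front_matter", False
        )
        self.md_open_after_export = self._config.get(
            "md_export.open_after_export", False
        )

    def save_md_settings(self) -> None:
        # getattr with a default keeps saving safe even when
        # load_md_settings has not populated the attributes yet.
        self._config.set(
            "md_export.include_front_matter",
            getattr(self, "md_include_front_matter", False),
            save_immediately=False,
        )
        self._config.set(
            "md_export.open_after_export",
            getattr(self, "md_open_after_export", False),
            save_immediately=False,
        )


if __name__ == "__main__":
    config = FakeConfig()
    settings = MdSettingsSketch(config)
    settings.md_include_front_matter = True
    settings.save_md_settings()
    config.save()  # one explicit flush, after both keys are staged
    print(config.values)
```

Staging both keys with save_immediately=False and flushing once mirrors the order the commit message gives for _update_md_setting: save the Markdown settings first, then call config.save().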
