Add support for chapter ("avdelning") level in document structure #27
Merged
Enable automatic processing of documents containing AVDELNING (division) structures. The formatting infrastructure was already in place; this change removes the blocking condition that prevented these documents from being processed.
Changes:
- Remove AVDELNING block from ignore_rules() in sfs_processor.py
- Add integration tests for AVDELNING formatting and section tagging
- Verify AVDELNING headers are formatted as H2 with class="avdelning"
Tested with document 2025:400 (Socialtjänstlag), which contains 9 AVDELNING sections, all processed correctly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Generate clean, concise IDs for AVDELNING sections using the format "avd1", "avd2", etc., matching the existing "kap1", "kap2" format for chapters.
Changes:
- Convert Roman numerals (I, II, III) to Arabic numbers (1, 2, 3)
- Convert Swedish ordinals (FÖRSTA, ANDRA) to Arabic numbers (1, 2, 3)
- Use "avd" prefix instead of "avdelning" for brevity
- Add tests for both Roman numeral and Swedish ordinal formats
Example output:
- "AVDELNING I" -> id="avd1"
- "ANDRA AVDELNINGEN" -> id="avd2"
- "AVD. III" -> id="avd3"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
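The ID scheme in this commit can be illustrated with a small Python sketch. It is not the code in formatters/format_sfs_text.py: the names `avdelning_id`, `roman_to_int`, `SWEDISH_ORDINALS` and both regular expressions are hypothetical and chosen only to reproduce the example output listed in the commit message. The sketch matches uppercase headers only.

```python
import re
from typing import Optional

# Hypothetical lookup tables; the tables used in the PR are not shown.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}
SWEDISH_ORDINALS = {
    "FÖRSTA": 1, "ANDRA": 2, "TREDJE": 3, "FJÄRDE": 4, "FEMTE": 5,
    "SJÄTTE": 6, "SJUNDE": 7, "ÅTTONDE": 8, "NIONDE": 9, "TIONDE": 10,
}

# Assumed header shapes: "AVDELNING I", "AVD. III", "ANDRA AVDELNINGEN".
ROMAN_HEADER = re.compile(r"^(?:AVDELNING|AVD\.)\s+([IVXLC]+)\b")
ORDINAL_HEADER = re.compile(r"^([A-ZÅÄÖ]+)\s+AVDELNING(?:EN)?\b")


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'IV' to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def avdelning_id(header: str) -> Optional[str]:
    """Return an 'avdN' id for a division header, or None if it is not one."""
    header = header.strip()
    match = ROMAN_HEADER.match(header)
    if match:
        return f"avd{roman_to_int(match.group(1))}"
    match = ORDINAL_HEADER.match(header)
    if match and match.group(1) in SWEDISH_ORDINALS:
        return f"avd{SWEDISH_ORDINALS[match.group(1)]}"
    return None


print(avdelning_id("AVDELNING I"))        # avd1
print(avdelning_id("ANDRA AVDELNINGEN"))  # avd2
print(avdelning_id("AVD. III"))           # avd3
```

Returning None for anything that does not match lets a caller fall back to whatever ID logic already applies to other section types.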
AVDELNING sections now properly wrap all their child KAPITEL sections and
their descendants. Previously, AVDELNING sections would close immediately
after their heading, with KAPITEL sections as siblings instead of children.
Changes:
- Track when inside an AVDELNING section
- Increase effective level by +1 for all headings inside AVDELNING
- H2 KAPITEL becomes effective level 3
- H3 subsections become effective level 4
- H4 paragraphs become effective level 5
- AVDELNING only closes when encountering another AVDELNING or document end
Example structure:
```
<section id="avd1" class="avdelning">
## AVDELNING I
<section id="kap1" class="kapitel">
## 1 kap.
...child sections...
</section>
<section id="kap2" class="kapitel">
## 2 kap.
</section>
</section>
```
Verified with document 2025:400:
- 647 opening tags match 647 closing tags
- All KAPITEL sections properly nested inside AVDELNING
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
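The nesting rule from this commit can be sketched as a stack of effective levels. This is a minimal illustration, not the PR's implementation: `wrap_sections` and the `Heading` tuple are hypothetical, and the input is assumed to be a pre-parsed list of headings rather than raw document text.

```python
from typing import List, Tuple

# (markdown level, section id, css class, heading text); a hypothetical shape.
Heading = Tuple[int, str, str, str]


def wrap_sections(headings: List[Heading]) -> List[str]:
    """Emit <section> wrappers so that KAPITEL nest inside AVDELNING."""
    out: List[str] = []
    stack: List[int] = []  # effective levels of the currently open sections
    in_avdelning = False
    for level, sec_id, css_class, text in headings:
        if css_class == "avdelning":
            effective = level
            in_avdelning = True
        else:
            # Inside an AVDELNING every heading counts as one level deeper.
            effective = level + 1 if in_avdelning else level
        # Close every open section at the same or a deeper effective level.
        while stack and stack[-1] >= effective:
            stack.pop()
            out.append("</section>")
        out.append(f'<section id="{sec_id}" class="{css_class}">')
        out.append("#" * level + " " + text)
        stack.append(effective)
    # The document end closes whatever is still open.
    while stack:
        stack.pop()
        out.append("</section>")
    return out


sample = [
    (2, "avd1", "avdelning", "AVDELNING I"),
    (2, "kap1", "kapitel", "1 kap."),
    (2, "kap2", "kapitel", "2 kap."),
    (2, "avd2", "avdelning", "AVDELNING II"),
]
print("\n".join(wrap_sections(sample)))
```

Because an AVDELNING keeps its own level while its children are pushed one level deeper, only another AVDELNING or the end of the document pops it off the stack, which is the closing behaviour described in the list above.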
Fix issue where title-case references to divisions in "Lagens disposition" (e.g., "Avdelning I") were incorrectly treated as division headers, creating spurious AVDELNING sections.
Changes:
- Remove re.IGNORECASE from is_chapter_header() pattern matching
- Remove re.IGNORECASE from generate_section_id() pattern matching
- Only all-uppercase "AVDELNING" is now recognized as a division header
- Title-case "Avdelning" is treated as regular paragraph content
Impact:
- Document 2025:400 now has exactly 9 AVDELNING sections (not 18)
- "Lagens disposition" section correctly lists divisions as text content
- All section tags remain balanced (638 opening, 638 closing)
Before: "AVDELNING I" and "Avdelning I" both matched
After: Only "AVDELNING I" matches (case-sensitive)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
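The effect of dropping re.IGNORECASE can be shown with a short Python sketch. The pattern and the helper `looks_like_avdelning_header` are assumptions made for illustration; the actual patterns in is_chapter_header() and generate_section_id() are not reproduced here.

```python
import re

# Assumed pattern for a Roman-numeral division header.
PATTERN = r"^(?:AVDELNING|AVD\.)\s+[IVXLC]+\b"

BEFORE = re.compile(PATTERN, re.IGNORECASE)  # behaviour before this commit
AFTER = re.compile(PATTERN)                  # case-sensitive, as after this commit


def looks_like_avdelning_header(line: str) -> bool:
    """Case-sensitive check: only all-uppercase AVDELNING counts as a header."""
    return AFTER.match(line.strip()) is not None


for line in ("AVDELNING I", "Avdelning I"):
    print(line, bool(BEFORE.match(line)), looks_like_avdelning_header(line))
# AVDELNING I True True
# Avdelning I True False
```

With the case-insensitive pattern, each title-case mention in "Lagens disposition" counts as a header, which is consistent with the drop from 18 to 9 sections reported above.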
Content under AVDELNING sections now uses heading levels one step deeper to
reflect the proper hierarchy:
- AVDELNING headers remain H2 (##)
- KAPITEL headers become H3 (###) when inside AVDELNING
- Subsection titles become H4 (####)
- Paragraph markers (§) become H5 (#####) when inside AVDELNING
This ensures the Markdown heading hierarchy properly represents the
document structure, where AVDELNING is the top-level organizational
unit containing KAPITEL and their paragraphs.
Example structure:
## AVDELNING I. (H2)
### 1 kap. (H3 - shifted from H2)
#### Section title (H4)
##### 1 § (H5 - shifted from H4)
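A minimal Python sketch of the heading shift, assuming the Markdown is processed line by line. `shift_headings` and both regular expressions are hypothetical and not taken from the PR; the cap at six `#` characters is also an assumption, added because Markdown defines no deeper heading level.

```python
import re
from typing import List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
# Assumed shapes of a division header: Roman-numeral or Swedish-ordinal form.
DIVISION = re.compile(
    r"^(?:(?:AVDELNING|AVD\.)\s+[IVXLC]+|[A-ZÅÄÖ]+\s+AVDELNING(?:EN)?)\b"
)


def shift_headings(lines: List[str]) -> List[str]:
    """Push every heading that follows an AVDELNING header one level deeper."""
    out: List[str] = []
    in_avdelning = False
    for line in lines:
        match = HEADING.match(line)
        if not match:
            out.append(line)  # body text is left untouched
            continue
        hashes, text = match.groups()
        if DIVISION.match(text):
            in_avdelning = True
            out.append(line)  # the AVDELNING header itself stays at H2
        elif in_avdelning:
            depth = min(len(hashes) + 1, 6)  # Markdown stops at H6
            out.append("#" * depth + " " + text)
        else:
            out.append(line)
    return out


print(shift_headings(["## AVDELNING I.", "## 1 kap.", "### Section title", "#### 1 §"]))
# ['## AVDELNING I.', '### 1 kap.', '#### Section title', '##### 1 §']
```

Headings that appear before the first AVDELNING keep their original level, so documents without divisions are unaffected by the shift.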
Summary
This PR adds full support for documents containing AVDELNING (division) structures. The formatting infrastructure was already 90% in place - this change removes the blocking condition and improves ID generation.
Changes
1. Remove AVDELNING Block
- sfs_processor.py, ignore_rules(): removed the block that prevented documents with AVDELNING from being processed
2. Improve AVDELNING Section ID Generation
- formatters/format_sfs_text.py
3. Add Integration Tests
- test/test_format_sfs_text.py
Example Output
AVDELNING Patterns Supported
- AVDELNING I, AVDELNING II, AVD. III → id="avd1", id="avd2", id="avd3"
- FÖRSTA AVDELNING, ANDRA AVDELNINGEN → id="avd1", id="avd2"
Testing
✅ All 33 tests pass
✅ Tested with document 2025:400 (Socialtjänstlag) containing 9 AVDELNING sections
✅ All sections processed correctly with proper formatting and IDs
Impact