Add parsers for YES BANK, Suryoday SFB, and Kotak Mahindra Bank by anshuman852 · Pull Request #8 · akhilnarang/cc-parser

anshuman852 · 2026-04-22T13:00:09Z

No description provided.

Add a bank-specific parser and detection path for YES BANK statements to prevent summary/header rows from being parsed as transactions and to correctly classify Dr/Cr entries.

Add SSFB-specific detection and parsing for statement summary fields while isolating MITC/example-page content so reconciliation remains stable.

… tests

Correctness:
- Rename shadowed `words` variable to `name_parts` in name extraction
- Tighten is_credit check from {"CR", "C"} to == "CR" to avoid false positives from single-letter tokens
- Use \s+ instead of \n in SSFB summary regex for PyMuPDF version robustness
- Add comments explaining intentional (Dr|Cr) group ignoring in Purchases and Payments summary regexes
- Add comment flagging SSFB hardcoded empty transactions list

Style/maintainability:
- Move _EXCLUDED_WORDS to module scope (consistent with _MERCHANT_CATEGORIES)
- Extract _format_card_number() helper to deduplicate card number formatting (was inlined 3 times)
- Remove unused detected_members variable and debug dict entry

Tests:
- Add BankOption and factory detection tests for ssfb and yesbank
- Add parser-contract smoke tests for both parsers
- Fix pre-existing test_browser.py bank count (12 -> 14)
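Two of the correctness fixes above can be illustrated in isolation. The following is a minimal sketch, not the code from this PR: `is_credit_marker`, `SUMMARY_RE`, and `find_total_due` are hypothetical names, and the "Total Amount Due" label is only an assumed example of a summary field.

```python
import re
from typing import Optional


def is_credit_marker(token: str) -> bool:
    """Return True only for an exact "CR" marker (case-insensitive).

    Accepting a bare "C" as well would flag any stray single-letter
    token as a credit, which is the false positive the commit describes.
    """
    return token.strip().upper() == "CR"


# Hypothetical summary pattern. \s+ matches one or more whitespace
# characters, so it works whether the extractor puts the value on the
# same line as the label or on the next line.
SUMMARY_RE = re.compile(r"Total\s+Amount\s+Due\s+([\d,]+\.\d{2})")


def find_total_due(text: str) -> Optional[str]:
    """Return the captured amount string, or None when no match is found."""
    match = SUMMARY_RE.search(text)
    return match.group(1) if match else None
```

A pattern that hardcodes `\n` between label and value would match only one of the two layouts exercised here, which is why `\s+` is the more tolerant choice when extraction output varies.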

- Remove unused parse_amount import
- Remove unused multi_letter_words variable (populated but never read)
- Fix email camel-case splitter: run ([a-z])([A-Z]) regex before .upper() so it can actually match mixed-case email local parts
- Fix _MERCHANT_CATEGORIES comment to reflect that categories are stripped from narration, not included
- Reapply merchant category and Dr/Cr stripping after context merge rebuilds narration from raw tokens (prevents re-introduction)
- Remove unused current_member variable and its references
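The camel-case fix above comes down to ordering: once a string is upper-cased, the `([a-z])([A-Z])` pattern has no lowercase letter left to match. A small sketch of both orderings, using hypothetical function names that are not taken from the PR:

```python
import re

# Lowercase letter immediately followed by an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"([a-z])([A-Z])")


def split_then_upper(local_part: str) -> str:
    """Insert a space at each camel-case boundary, then upper-case."""
    return _CAMEL_BOUNDARY.sub(r"\1 \2", local_part).upper()


def upper_then_split(local_part: str) -> str:
    """Buggy ordering: after .upper() there are no lowercase letters,
    so the pattern never matches and the input is returned unsplit."""
    return _CAMEL_BOUNDARY.sub(r"\1 \2", local_part.upper())
```

With the input `"johnDoe"`, the first function yields `"JOHN DOE"` while the second yields `"JOHNDOE"`.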

Copilot

Pull request overview

Adds Kotak Mahindra Bank support to the credit-card statement parsing system by wiring a new parser into the registry, detection factory, and CLI/browser surfaces, along with contract tests.

Changes:

Introduces a new KotakParser implementation and registers it in the parser registry.
Extends bank auto-detection rules and CLI enum to include the new kotak slug.
Updates contract/browser tests to cover Kotak availability, detection, and minimal parsing.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


File	Description
`cc_parser/parsers/kotak.py`	Adds the Kotak statement parsing implementation (name/due date/TAD + transactions).
`cc_parser/parsers/registry.py`	Registers `kotak` parser in the stable registry ordering.
`cc_parser/parsers/factory.py`	Adds a `KOTAK` detection rule and updates `detect_bank` docstring.
`cc_parser/cli.py`	Exposes `kotak` via `BankOption` and updates `parse_statement` docstring.
`tests/test_contracts.py`	Adds contract coverage for Kotak exposure, detection, and smoke parsing.
`tests/test_browser.py`	Updates browser bank list test to include `kotak` and new count.
`AGENTS.md`	Documents the new Kotak parser profile entry.

Copilot · 2026-04-22T13:05:54Z

+        self.assertEqual(result.file, "test.pdf")
+        self.assertEqual(result.name, "JOHN DOE")
+        self.assertEqual(result.due_date, "08/05/2026")
+        self.assertEqual(result.statement_total_amount_due, "4186.50")


The test expects statement_total_amount_due to be "4186.50" even though the input text contains a comma ("4,186.50"). The Kotak parser currently uses normalize_amount(), which does not remove thousands separators, so it will return "4,186.50" and this assertion will fail. Align the expectation with existing conventions (e.g., keep commas like the Equitas test) or change the parser to explicitly strip commas for this field (and ensure it’s consistent across parsers).

Suggested change

-        self.assertEqual(result.statement_total_amount_due, "4186.50")
+        self.assertEqual(result.statement_total_amount_due, "4,186.50")
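If the second option in the comment above is preferred (stripping commas in the parser rather than changing the test), the change could look roughly like this. `strip_thousands_separators` is a hypothetical helper written for illustration; it is not part of `cc_parser.parsers.tokens`, and it makes no claim about what `normalize_amount()` does beyond what the review comment states.

```python
def strip_thousands_separators(amount: str) -> str:
    """Remove comma grouping from an amount string.

    Works for both Western ("4,186.50") and Indian ("1,23,456.78")
    grouping, since every comma is simply dropped.
    """
    return amount.replace(",", "").strip()
```

Whichever direction is chosen, the review's point about consistency applies: if one parser returns `"4186.50"` and another returns `"4,186.50"` for the same kind of field, downstream consumers have to handle both forms.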

Copilot · 2026-04-22T13:05:54Z

+def _extract_name(first_page_text: str) -> str | None:
+    """Extract cardholder name near the top of page 1."""
+    # Format: "Ansuman Mishra Monthly statement for your League Credit Card X3188"
+    match = re.search(
+        r"^([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+Monthly\s+statement",
+        first_page_text,
+        re.MULTILINE,
+    )
+    if not match:
+        return None
+    candidate = " ".join(match.group(1).split()).strip().upper()
+    parts = candidate.split()
+    if 2 <= len(parts) <= 6:
+        return candidate
+    return None


_extract_name() only matches Title Case names ([A-Z][a-z]+ ...) but the new Kotak contract test input uses an ALL-CAPS name ("JOHN DOE"), so name will be None and the test/parsed output will not match. Update the regex to also accept all-caps (and/or mixed-case) names before "Monthly statement", while still guarding against overly long/short matches.
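One way to act on this comment is sketched below, under stated assumptions: each name part must start with an uppercase letter but may continue in either case, and parts are separated by spaces or tabs only. The `[ \t]+` separator is a deliberate deviation from the `\s+` in the hunk above, chosen so a match cannot absorb a preceding line; `extract_name` and `_NAME_RE` are illustrative names, not the PR's final code.

```python
import re
from typing import Optional

# Each part: an uppercase letter followed by letters of any case, so
# "John", "JOHN", and "McDonald" all qualify. "statement" is lowercase,
# so the name group cannot swallow the trailing anchor phrase.
_NAME_RE = re.compile(
    r"^([A-Z][A-Za-z]*(?:[ \t]+[A-Z][A-Za-z]*)*)[ \t]+Monthly\s+statement",
    re.MULTILINE,
)


def extract_name(first_page_text: str) -> Optional[str]:
    """Return the upper-cased cardholder name, or None if not found."""
    match = _NAME_RE.search(first_page_text)
    if not match:
        return None
    candidate = " ".join(match.group(1).split()).upper()
    # Same length guard as the original: between 2 and 6 name parts.
    if 2 <= len(candidate.split()) <= 6:
        return candidate
    return None
```

This keeps the original guard against overly short or long matches while accepting both the Title Case sample from the code comment and the all-caps input used in the contract test.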

Copilot · 2026-04-22T13:05:54Z

+from cc_parser.parsers.tokens import (
+    format_amount,
+    normalize_amount,
+    parse_amount,
+    parse_amount_token,
+    sum_amounts,
+    sum_points,
+)


parse_amount is imported from cc_parser.parsers.tokens but never used in this module. Please remove the unused import to avoid lint noise and keep imports minimal.

anshuman852 added 6 commits April 12, 2026 04:53

feat(yesbank): add dedicated YES BANK statement parser

1358339

feat(ssfb): add Suryoday SFB statement parser

2e4fbb0

Merge origin/master with upstream/master - keep equitas, yesbank, ssfb

d896717

feat(kotak): add Kotak Mahindra Bank statement parser

d218759

Copilot AI review requested due to automatic review settings April 22, 2026 13:00

Copilot started reviewing on behalf of anshuman852 April 22, 2026 13:00

anshuman852 wants to merge 6 commits into akhilnarang:master from anshuman852:master
