Skip to content

Fix/import#665

Merged
jbpenrath merged 3 commits into
developmentfrom
fix/import
May 20, 2026
Merged

Fix/import#665
jbpenrath merged 3 commits into
developmentfrom
fix/import

Conversation

@jbpenrath
Copy link
Copy Markdown
Contributor

@jbpenrath jbpenrath commented May 12, 2026

Purpose

We get some pst that are unparsable by pypff. To help user to understand
that the issue is coming from the PST file, we improve the exception
raised by pst task and display a custom error message according to the
error format.

Another pst issue has been encounted.
Sent items from shared mailboxes — and many internal Exchange messages —
expose every PR_SENDER_/PR_SENT_REPRESENTING_ slot as an unresolvable
X.500 DN. compose_email then rejected the EML for lack of a valid From
address and pst_tasks silently dropped the message at debug level, so
entire folders disappeared from the import without a trace.

Summary by CodeRabbit

  • New Features

    • Import UI now shows an import recap with success %, imported/total and failure count; warns on high failure rates. Task recap is passed through the import workflow.
  • Bug Fixes

    • PST importer improves sender/recipient resolution for shared/delegated mailboxes, synthesizes safe fallback senders, and ensures deterministic Message-IDs.
    • Fails fast and surfaces a clear "PST unreadable" error for corrupt/unreadable archives.
  • Tests

    • Expanded PST importer and task-level tests covering unreadable PSTs, sender/recipient heuristics, and message-id behavior.
  • Localization / UI

    • Added localized import status and error strings; updated completed-import UI and styles.

Review Change Stack

@jbpenrath jbpenrath self-assigned this May 12, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cd6c7656-a7db-4a2e-88b5-bb293d3ae15e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR enhances PST archive importing: add early PST-readability checks, improve MAPI sender/recipient resolution and deterministic Message-ID fallback, make message read/reconstruction failures explicit, and surface an import recap UI including PST-unreadable errors and high-failure-rate warnings.

Changes

PST Import Enhancement with Better Error Handling and User Feedback

Layer / File(s) Summary
PST constants, imports and readability assertion
src/backend/core/services/importer/pst.py, src/backend/core/mda/rfc5322/composer.py
Adds hashlib import and new MAPI constants, PR_INTERNET_MESSAGE_ID, introduces PSTFileUnreadableError and assert_pst_readable() to validate PST root/store before processing; re-raises EmailComposeError in composer.
SMTP-first sender extraction
src/backend/core/services/importer/pst.py
Reworks _extract_sender_from_mapi to prefer SMTP-valued fields (direct sender, sent-representing, creator/last-modifier) and accept preferred_name to preserve a header display name while sourcing SMTP.
Display-recipient parsing
src/backend/core/services/importer/pst.py
Parses Outlook PR_DISPLAY_TO/CC/BCC semicolon-delimited strings and exposes helper to extract display recipients when recipient table is empty.
Recipient fallback & Message-ID/From synthesis
src/backend/core/services/importer/pst.py
Adds recipient fallback chain and Message-ID helpers that prefer header → PR_INTERNET_MESSAGE_ID → deterministic synthesized ID; trusts header From only when it parses to SMTP and synthesizes Unknown Sender using recipient_email domain when resolution fails.
walk_pst_messages explicit-failure yields
src/backend/core/services/importer/pst.py
Extends walk_pst_messages to accept recipient_email, documents its use as sender-fallback domain, and yields eml_bytes=None for read/reconstruction failures while logging appropriately.
Task integration and failure classification
src/backend/core/services/importer/pst_tasks.py
process_pst_file_task calls assert_pst_readable() after opening PST, passes recipient_email into walk_pst_messages, treats eml_bytes=None yields as reconstruction failures, and handles PSTFileUnreadableError by returning FAILURE with PST_UNREADABLE: error text.
Backend tests for sender/recipient, PST readability, and task behavior
src/backend/core/tests/importer/test_pst_import.py
Adds tests for Message-ID selection/synthesis, unknown-sender fallback, shared-mailbox delegate resolution, display-recipient parsing, assert_pst_readable behavior, and task-level unreadable-PST handling; adds shared-mailbox fixture.
Hook types and modal recap plumbing
src/frontend/src/hooks/use-import-task.ts, src/frontend/src/features/controlled-modals/message-importer/index.tsx
Exports ImportTaskStatus/ImportTaskRecap, exposes failureCount/totalMessages via the hook, and threads ImportTaskRecap through the modal to the completed step.
Loader/completed UI and styling
src/frontend/src/features/controlled-modals/message-importer/step-loader.tsx, src/frontend/src/features/controlled-modals/message-importer/step-completed.tsx, src/frontend/src/features/controlled-modals/message-importer/_index.scss
StepLoader invokes onComplete with a recap and shows PST_UNREADABLE-specific error text on failure; StepCompleted displays imported/failed counts, computes percent and high-failure warning; completed UI styles updated.
Localization for import status and errors
src/frontend/public/locales/common/en-US.json, src/frontend/public/locales/common/fr-FR.json
Adds pluralized failed-count, "High failure rate" label, "Import complete" and progress strings, PST-unreadable error message; removes legacy success-notification key.
End-to-end import UI test
src/e2e/src/__tests__/message-import.spec.ts
E2E test updated to assert new completion UI elements (percent, per-archive counts) and absence of high-failure warning for a single-message import.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • sylvinus
  • sdemagny
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Fix/import' is too vague and generic. It uses a non-descriptive convention (Fix/) without conveying the specific changes or main issues being addressed. Use a more descriptive title that captures the main change, such as 'Improve PST import error handling and X.500 sender recovery' or 'Fix PST parsing failures and shared mailbox sender resolution'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 96.23% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/backend/core/mda/rfc5322/composer.py`:
- Around line 561-564: Add a warning-level log before re-raising
EmailComposeError to surface validation issues without treating them as
unexpected failures: in the except EmailComposeError block (in composer.py,
handling EmailComposeError), call the module/logger's warning method with a
concise message including the exception details (e.g., "Email validation failed"
+ exception info) and then re-raise the error so behavior is unchanged but
operational visibility is improved.

In `@src/backend/core/services/importer/pst.py`:
- Around line 1089-1094: The return type of walk_pst_messages is incorrect
because the generator yields None for message content on failures; update the
annotation and docstring to reflect that the fourth or final yielded payload can
be None (e.g., change the Generator[Tuple[str, str, int, int, bytes], None,
None] typing to Generator[Tuple[str, str, int, int, Optional[bytes]], None,
None] or equivalent Union[bytes, None]) and adjust any local type comments;
locate the function by the name walk_pst_messages and the yield sites that
return None for failed message reads to ensure the typing and
docstring/annotations match the actual yields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 83e3c3b4-1662-4b4f-b234-2ca562a7d5c3

📥 Commits

Reviewing files that changed from the base of the PR and between 1b03e1d and 6c93039.

📒 Files selected for processing (11)
  • src/backend/core/mda/rfc5322/composer.py
  • src/backend/core/services/importer/pst.py
  • src/backend/core/services/importer/pst_tasks.py
  • src/backend/core/tests/importer/test_pst_import.py
  • src/frontend/public/locales/common/en-US.json
  • src/frontend/public/locales/common/fr-FR.json
  • src/frontend/src/features/controlled-modals/message-importer/_index.scss
  • src/frontend/src/features/controlled-modals/message-importer/index.tsx
  • src/frontend/src/features/controlled-modals/message-importer/step-completed.tsx
  • src/frontend/src/features/controlled-modals/message-importer/step-loader.tsx
  • src/frontend/src/hooks/use-import-task.ts

Comment thread src/backend/core/mda/rfc5322/composer.py Outdated
Comment thread src/backend/core/services/importer/pst.py
@jbpenrath jbpenrath force-pushed the fix/import branch 3 times, most recently from a8dc8be to 912fa93 Compare May 19, 2026 13:00
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/backend/core/services/importer/pst.py`:
- Around line 959-963: The warning currently logs PII by including
fallback_email and jmap_data.get("subject"); change the logger.warning call (the
one referencing fallback_email and jmap_data) to emit a non-PII message such as
"PST message has no resolvable sender; synthesizing sender" or include only a
non-identifying token (e.g. message id if safe) instead of fallback_email or
subject so no email addresses or subject content are written to logs.
- Line 1111: The generator signature for walk_pst_messages incorrectly types the
flag_status element as int while _get_flag_status() returns Optional[int];
update the return type in walk_pst_messages (and the other affected generator
signatures/ranges) so the yielded tuple uses Optional[int] for the flag_status
position to reflect that PR_FLAG_STATUS can be absent, and ensure any related
type annotations or type hints in functions referenced (e.g., walk_pst_messages
and calls that consume its tuples) are adjusted accordingly.

In
`@src/frontend/src/features/controlled-modals/message-importer/step-completed.tsx`:
- Line 53: The current label uses t('Failed: {{failed}} messages', { failed:
failure }) which yields incorrect grammar for singular; change the call in the
StepCompleted component (the t(...) invocation that renders the failure recap)
to use i18next count-based pluralization by providing a plural key (e.g. 'failed
messages') and passing { count: failure } instead of { failed: failure }. Add
corresponding locale entries using the _one and _other suffixes (e.g. "failed
messages_one" and "failed messages_other") in your translation JSONs to supply
singular and plural text (use {{count}} in the messages). Ensure any other
occurrences using the same pattern are updated to follow the same convention.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c671dc9e-e7b4-453b-9e8d-9b8ae5051c36

📥 Commits

Reviewing files that changed from the base of the PR and between 6c93039 and 912fa93.

📒 Files selected for processing (12)
  • src/backend/core/mda/rfc5322/composer.py
  • src/backend/core/services/importer/pst.py
  • src/backend/core/services/importer/pst_tasks.py
  • src/backend/core/tests/importer/test_pst_import.py
  • src/e2e/src/__tests__/message-import.spec.ts
  • src/frontend/public/locales/common/en-US.json
  • src/frontend/public/locales/common/fr-FR.json
  • src/frontend/src/features/controlled-modals/message-importer/_index.scss
  • src/frontend/src/features/controlled-modals/message-importer/index.tsx
  • src/frontend/src/features/controlled-modals/message-importer/step-completed.tsx
  • src/frontend/src/features/controlled-modals/message-importer/step-loader.tsx
  • src/frontend/src/hooks/use-import-task.ts

Comment thread src/backend/core/services/importer/pst.py
Comment thread src/backend/core/services/importer/pst.py Outdated
Comment thread src/frontend/src/features/controlled-modals/message-importer/step-completed.tsx Outdated
@jbpenrath jbpenrath force-pushed the fix/import branch 2 times, most recently from 9116d33 to 9c0e256 Compare May 19, 2026 14:51
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/core/services/importer/pst_tasks.py (1)

108-116: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move pst.open_file_object(reader) inside the try block to catch malformed PST files at open time.

open_file_object() can throw OSError on malformed archives (per pypff's strict format validation), but currently sits outside the try block at line 112. Any exception from the file open is therefore caught by the generic except Exception handler at line 278 instead of the dedicated except PSTFileUnreadableError handler at line 262, causing malformed files to return a generic error instead of PST_UNREADABLE.

Proposed fix
             # Open PST file
             pst = pypff.file()
-            pst.open_file_object(reader)
-
             try:
+                pst.open_file_object(reader)
                 # Fail fast on archives whose MAPI tree is broken — otherwise
                 # the traversal crashes later with an opaque AttributeError.
                 assert_pst_readable(pst)

Note: The except PSTFileUnreadableError handler expects that exception type. If you need to also classify OSError from open_file_object() as unreadable, wrap the call in a try-except that converts the error, or add a separate handler for OSError that yields the same PST_UNREADABLE result.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/backend/core/services/importer/pst_tasks.py` around lines 108 - 116, Move
the call to pst.open_file_object(reader) inside the same try block that calls
assert_pst_readable(pst) so any OSError raised while opening a malformed PST is
caught and handled as unreadable; specifically, update the try around
assert_pst_readable to also include pst.open_file_object(reader) or wrap
pst.open_file_object(reader) in a small try/except that converts OSError into
PSTFileUnreadableError so the existing except PSTFileUnreadableError handler
will run and yield PST_UNREADABLE.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/backend/core/services/importer/pst_tasks.py`:
- Around line 145-152: The branch inside the loop in pst_tasks.py that handles
eml_bytes is None increments current_message and failure_count but then
continues without calling update_state(), leaving the frontend with stale
progress; modify the eml_bytes is None branch in the function that iterates PST
messages (the loop updating current_message/failure_count) to call the same
update_state() invocation used for successful messages before continuing so
every failure also publishes progress updates to the frontend.

In `@src/backend/core/services/importer/pst.py`:
- Around line 743-755: The debug logging in the display-recipient parsing block
leaks raw recipient tokens (the logger.debug calls that include token); remove
or replace those calls so they never emit full PII — e.g., drop the
logger.debug("Failed to parse display recipient token %r", token) and
logger.debug("Dropping display recipient with no email: %r", token) usages and
instead log a generic message or a non-PII metric (like "failed to parse display
recipient" or count/hashed/masked info). Update the code around
parse_email_address(...) and when building addresses with
_addr_tuple_to_dict(...) to ensure no raw token value is printed to logs.
- Around line 818-835: The function _apply_recipient_fallback_chain currently
returns early if any of jmap_data["to"], ["cc"], or ["bcc"] exist, which
prevents filling missing fields; change the early-return logic so it only
returns when all three recipient fields are already present (i.e., require AND
instead of OR) or remove the return and let the existing loop fill only the
missing keys. Keep the rest of the logic using _extract_recipients_from_mapi and
_extract_display_recipients_from_mapi and ensure the final for-loop only assigns
jmap_data[key] when recipients[key] is present and jmap_data[key] is not.

In `@src/frontend/public/locales/common/en-US.json`:
- Around line 319-320: Update the singular recap translations so the `_one` keys
use singular wording: change the value for "Failed: {{count}} messages_one" to
use "message" (not "messages"), and likewise fix any other `_one` keys (e.g.,
the pair currently at the other occurrence) so they render "Failed: {{count}}
message" for count=1 while leaving the `_other` keys unchanged.

---

Outside diff comments:
In `@src/backend/core/services/importer/pst_tasks.py`:
- Around line 108-116: Move the call to pst.open_file_object(reader) inside the
same try block that calls assert_pst_readable(pst) so any OSError raised while
opening a malformed PST is caught and handled as unreadable; specifically,
update the try around assert_pst_readable to also include
pst.open_file_object(reader) or wrap pst.open_file_object(reader) in a small
try/except that converts OSError into PSTFileUnreadableError so the existing
except PSTFileUnreadableError handler will run and yield PST_UNREADABLE.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0146b9d4-d49e-4b99-b1a0-27fcfcd7b8a6

📥 Commits

Reviewing files that changed from the base of the PR and between 912fa93 and 9c0e256.

📒 Files selected for processing (12)
  • src/backend/core/mda/rfc5322/composer.py
  • src/backend/core/services/importer/pst.py
  • src/backend/core/services/importer/pst_tasks.py
  • src/backend/core/tests/importer/test_pst_import.py
  • src/e2e/src/__tests__/message-import.spec.ts
  • src/frontend/public/locales/common/en-US.json
  • src/frontend/public/locales/common/fr-FR.json
  • src/frontend/src/features/controlled-modals/message-importer/_index.scss
  • src/frontend/src/features/controlled-modals/message-importer/index.tsx
  • src/frontend/src/features/controlled-modals/message-importer/step-completed.tsx
  • src/frontend/src/features/controlled-modals/message-importer/step-loader.tsx
  • src/frontend/src/hooks/use-import-task.ts

Comment thread src/backend/core/services/importer/pst_tasks.py
Comment thread src/backend/core/services/importer/pst.py Outdated
Comment thread src/backend/core/services/importer/pst.py
Comment thread src/frontend/public/locales/common/en-US.json Outdated
@jbpenrath jbpenrath changed the base branch from main to development May 19, 2026 21:19
jbpenrath added 2 commits May 19, 2026 23:21
We get some pst that are unparsable by pypff. To help user to understand
that the issue is coming from the PST file, we improve the exception
raised by pst task and display a custom error message according to the
error format.
Sent items from shared mailboxes — and many internal Exchange messages —
expose every PR_SENDER_*/PR_SENT_REPRESENTING_* slot as an unresolvable
X.500 DN. compose_email then rejected the EML for lack of a valid From
address and pst_tasks silently dropped the message at debug level, so
entire folders disappeared from the import without a trace.
@jbpenrath jbpenrath force-pushed the fix/import branch 2 times, most recently from d6177db to 8c823ab Compare May 19, 2026 21:45
PST messages without transport_headers (drafts, locally composed items)
were reconstructed with no Message-ID at all, and Exchange/O365 exports
sometimes drop the header even on received items. With no mime_id to
key on, deliver_inbound_message skipped its dedup check and inserted
the same message on every import — and even twice within a single
import when the same message appeared in multiple Outlook folders.
@jbpenrath jbpenrath merged commit f054040 into development May 20, 2026
9 checks passed
@jbpenrath jbpenrath deleted the fix/import branch May 20, 2026 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant