Fix/import#665
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR enhances PST archive importing: add early PST-readability checks, improve MAPI sender/recipient resolution and deterministic Message-ID fallback, make message read/reconstruction failures explicit, and surface an import recap UI including PST-unreadable errors and high-failure-rate warnings. ChangesPST Import Enhancement with Better Error Handling and User Feedback
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/core/mda/rfc5322/composer.py`:
- Around line 561-564: Add a warning-level log before re-raising
EmailComposeError to surface validation issues without treating them as
unexpected failures: in the except EmailComposeError block (in composer.py,
handling EmailComposeError), call the module/logger's warning method with a
concise message including the exception details (e.g., "Email validation failed"
+ exception info) and then re-raise the error so behavior is unchanged but
operational visibility is improved.
In `@src/backend/core/services/importer/pst.py`:
- Around line 1089-1094: The return type of walk_pst_messages is incorrect
because the generator yields None for message content on failures; update the
annotation and docstring to reflect that the fourth or final yielded payload can
be None (e.g., change the Generator[Tuple[str, str, int, int, bytes], None,
None] typing to Generator[Tuple[str, str, int, int, Optional[bytes]], None,
None] or equivalent Union[bytes, None]) and adjust any local type comments;
locate the function by the name walk_pst_messages and the yield sites that
return None for failed message reads to ensure the typing and
docstring/annotations match the actual yields.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 83e3c3b4-1662-4b4f-b234-2ca562a7d5c3
📒 Files selected for processing (11)
src/backend/core/mda/rfc5322/composer.pysrc/backend/core/services/importer/pst.pysrc/backend/core/services/importer/pst_tasks.pysrc/backend/core/tests/importer/test_pst_import.pysrc/frontend/public/locales/common/en-US.jsonsrc/frontend/public/locales/common/fr-FR.jsonsrc/frontend/src/features/controlled-modals/message-importer/_index.scsssrc/frontend/src/features/controlled-modals/message-importer/index.tsxsrc/frontend/src/features/controlled-modals/message-importer/step-completed.tsxsrc/frontend/src/features/controlled-modals/message-importer/step-loader.tsxsrc/frontend/src/hooks/use-import-task.ts
a8dc8be to
912fa93
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/core/services/importer/pst.py`:
- Around line 959-963: The warning currently logs PII by including
fallback_email and jmap_data.get("subject"); change the logger.warning call (the
one referencing fallback_email and jmap_data) to emit a non-PII message such as
"PST message has no resolvable sender; synthesizing sender" or include only a
non-identifying token (e.g. message id if safe) instead of fallback_email or
subject so no email addresses or subject content are written to logs.
- Line 1111: The generator signature for walk_pst_messages incorrectly types the
flag_status element as int while _get_flag_status() returns Optional[int];
update the return type in walk_pst_messages (and the other affected generator
signatures/ranges) so the yielded tuple uses Optional[int] for the flag_status
position to reflect that PR_FLAG_STATUS can be absent, and ensure any related
type annotations or type hints in functions referenced (e.g., walk_pst_messages
and calls that consume its tuples) are adjusted accordingly.
In
`@src/frontend/src/features/controlled-modals/message-importer/step-completed.tsx`:
- Line 53: The current label uses t('Failed: {{failed}} messages', { failed:
failure }) which yields incorrect grammar for singular; change the call in the
StepCompleted component (the t(...) invocation that renders the failure recap)
to use i18next count-based pluralization by providing a plural key (e.g. 'failed
messages') and passing { count: failure } instead of { failed: failure }. Add
corresponding locale entries using the _one and _other suffixes (e.g. "failed
messages_one" and "failed messages_other") in your translation JSONs to supply
singular and plural text (use {{count}} in the messages). Ensure any other
occurrences using the same pattern are updated to follow the same convention.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c671dc9e-e7b4-453b-9e8d-9b8ae5051c36
📒 Files selected for processing (12)
src/backend/core/mda/rfc5322/composer.pysrc/backend/core/services/importer/pst.pysrc/backend/core/services/importer/pst_tasks.pysrc/backend/core/tests/importer/test_pst_import.pysrc/e2e/src/__tests__/message-import.spec.tssrc/frontend/public/locales/common/en-US.jsonsrc/frontend/public/locales/common/fr-FR.jsonsrc/frontend/src/features/controlled-modals/message-importer/_index.scsssrc/frontend/src/features/controlled-modals/message-importer/index.tsxsrc/frontend/src/features/controlled-modals/message-importer/step-completed.tsxsrc/frontend/src/features/controlled-modals/message-importer/step-loader.tsxsrc/frontend/src/hooks/use-import-task.ts
9116d33 to
9c0e256
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/core/services/importer/pst_tasks.py (1)
108-116:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove
pst.open_file_object(reader)inside the try block to catch malformed PST files at open time.
open_file_object()can throwOSErroron malformed archives (per pypff's strict format validation), but currently sits outside the try block at line 112. Any exception from the file open is therefore caught by the genericexcept Exceptionhandler at line 278 instead of the dedicatedexcept PSTFileUnreadableErrorhandler at line 262, causing malformed files to return a generic error instead ofPST_UNREADABLE.Proposed fix
# Open PST file pst = pypff.file() - pst.open_file_object(reader) - try: + pst.open_file_object(reader) # Fail fast on archives whose MAPI tree is broken — otherwise # the traversal crashes later with an opaque AttributeError. assert_pst_readable(pst)Note: The
except PSTFileUnreadableErrorhandler expects that exception type. If you need to also classifyOSErrorfromopen_file_object()as unreadable, wrap the call in a try-except that converts the error, or add a separate handler forOSErrorthat yields the samePST_UNREADABLEresult.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/core/services/importer/pst_tasks.py` around lines 108 - 116, Move the call to pst.open_file_object(reader) inside the same try block that calls assert_pst_readable(pst) so any OSError raised while opening a malformed PST is caught and handled as unreadable; specifically, update the try around assert_pst_readable to also include pst.open_file_object(reader) or wrap pst.open_file_object(reader) in a small try/except that converts OSError into PSTFileUnreadableError so the existing except PSTFileUnreadableError handler will run and yield PST_UNREADABLE.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/core/services/importer/pst_tasks.py`:
- Around line 145-152: The branch inside the loop in pst_tasks.py that handles
eml_bytes is None increments current_message and failure_count but then
continues without calling update_state(), leaving the frontend with stale
progress; modify the eml_bytes is None branch in the function that iterates PST
messages (the loop updating current_message/failure_count) to call the same
update_state() invocation used for successful messages before continuing so
every failure also publishes progress updates to the frontend.
In `@src/backend/core/services/importer/pst.py`:
- Around line 743-755: The debug logging in the display-recipient parsing block
leaks raw recipient tokens (the logger.debug calls that include token); remove
or replace those calls so they never emit full PII — e.g., drop the
logger.debug("Failed to parse display recipient token %r", token) and
logger.debug("Dropping display recipient with no email: %r", token) usages and
instead log a generic message or a non-PII metric (like "failed to parse display
recipient" or count/hashed/masked info). Update the code around
parse_email_address(...) and when building addresses with
_addr_tuple_to_dict(...) to ensure no raw token value is printed to logs.
- Around line 818-835: The function _apply_recipient_fallback_chain currently
returns early if any of jmap_data["to"], ["cc"], or ["bcc"] exist, which
prevents filling missing fields; change the early-return logic so it only
returns when all three recipient fields are already present (i.e., require AND
instead of OR) or remove the return and let the existing loop fill only the
missing keys. Keep the rest of the logic using _extract_recipients_from_mapi and
_extract_display_recipients_from_mapi and ensure the final for-loop only assigns
jmap_data[key] when recipients[key] is present and jmap_data[key] is not.
In `@src/frontend/public/locales/common/en-US.json`:
- Around line 319-320: Update the singular recap translations so the `_one` keys
use singular wording: change the value for "Failed: {{count}} messages_one" to
use "message" (not "messages"), and likewise fix any other `_one` keys (e.g.,
the pair currently at the other occurrence) so they render "Failed: {{count}}
message" for count=1 while leaving the `_other` keys unchanged.
---
Outside diff comments:
In `@src/backend/core/services/importer/pst_tasks.py`:
- Around line 108-116: Move the call to pst.open_file_object(reader) inside the
same try block that calls assert_pst_readable(pst) so any OSError raised while
opening a malformed PST is caught and handled as unreadable; specifically,
update the try around assert_pst_readable to also include
pst.open_file_object(reader) or wrap pst.open_file_object(reader) in a small
try/except that converts OSError into PSTFileUnreadableError so the existing
except PSTFileUnreadableError handler will run and yield PST_UNREADABLE.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0146b9d4-d49e-4b99-b1a0-27fcfcd7b8a6
📒 Files selected for processing (12)
src/backend/core/mda/rfc5322/composer.pysrc/backend/core/services/importer/pst.pysrc/backend/core/services/importer/pst_tasks.pysrc/backend/core/tests/importer/test_pst_import.pysrc/e2e/src/__tests__/message-import.spec.tssrc/frontend/public/locales/common/en-US.jsonsrc/frontend/public/locales/common/fr-FR.jsonsrc/frontend/src/features/controlled-modals/message-importer/_index.scsssrc/frontend/src/features/controlled-modals/message-importer/index.tsxsrc/frontend/src/features/controlled-modals/message-importer/step-completed.tsxsrc/frontend/src/features/controlled-modals/message-importer/step-loader.tsxsrc/frontend/src/hooks/use-import-task.ts
We get some pst that are unparsable by pypff. To help user to understand that the issue is coming from the PST file, we improve the exception raised by pst task and display a custom error message according to the error format.
Sent items from shared mailboxes — and many internal Exchange messages — expose every PR_SENDER_*/PR_SENT_REPRESENTING_* slot as an unresolvable X.500 DN. compose_email then rejected the EML for lack of a valid From address and pst_tasks silently dropped the message at debug level, so entire folders disappeared from the import without a trace.
d6177db to
8c823ab
Compare
PST messages without transport_headers (drafts, locally composed items) were reconstructed with no Message-ID at all, and Exchange/O365 exports sometimes drop the header even on received items. With no mime_id to key on, deliver_inbound_message skipped its dedup check and inserted the same message on every import — and even twice within a single import when the same message appeared in multiple Outlook folders.
Purpose
We get some pst that are unparsable by pypff. To help user to understand
that the issue is coming from the PST file, we improve the exception
raised by pst task and display a custom error message according to the
error format.
Another pst issue has been encounted.
Sent items from shared mailboxes — and many internal Exchange messages —
expose every PR_SENDER_/PR_SENT_REPRESENTING_ slot as an unresolvable
X.500 DN. compose_email then rejected the EML for lack of a valid From
address and pst_tasks silently dropped the message at debug level, so
entire folders disappeared from the import without a trace.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Localization / UI