Commit ff59a2a

V0.5.1 dev (#341)
* OpenAPI root url fix
* Journaling OSS setup
* feat: add preserve-original-file mode for email ingestion for GoBD compliance
  - Add `preserveOriginalFile` option to ingestion sources and connectors
  - Stream original EML/MBOX/PST emails to temp files instead of holding full buffers in memory, reducing memory allocation during ingestion
  - Skip attachment binary extraction and EML re-serialization when preserve mode is enabled; use raw file on disk as source of truth
  - Update `EmailObject` to use `tempFilePath` instead of in-memory `eml` buffer across all connectors (EML, MBOX, PST)
  - Add new database migration (0032) for `preserve_original_file` column
  - Add frontend UI toggle with tooltip (tippy.js) for the new option
  - Replace console.warn calls with structured pino logger in connectors
* add isjournaled property to archived_email
* feat(ingestion): add unmerge ingestion source functionality

  Introduces the ability to detach a child ingestion source from its merge group, making it a standalone root source. Changes include:
  - Add `unmerge` controller method with auth and error handling
  - Add POST `/v1/ingestion-sources/{id}/unmerge` route with OpenAPI docs
  - Implement `IngestionService.unmerge` backend logic
  - Add unmerge UI action and handler in the frontend ingestion view
  - Fix bulk delete to also remove children of deleted root sources
  - Update docs with new API operation and merging sources user guide
* code formatting
* Database migration file for enum `partially_active`
* Error handling improvement
1 parent e5e1195 commit ff59a2a

77 files changed

Lines changed: 13053 additions & 446 deletions

.env.example

Lines changed: 21 additions & 0 deletions
@@ -104,3 +104,24 @@ ENCRYPTION_KEY=
 # Apache Tika Integration
 # ONLY active if TIKA_URL is set
 TIKA_URL=http://tika:9998
+
+
+# Enterprise features (Skip this part if you are using the open-source version)
+
+# Batch size for managing retention policy lifecycle. (This number of emails will be checked each time when retention policy scans the database. Adjust based on your system capability.)
+RETENTION_BATCH_SIZE=1000
+
+# --- SMTP Journaling (Enterprise only) ---
+# The port the embedded SMTP journaling listener binds to inside the container.
+# This is the port your MTA (Exchange, MS365, Postfix, etc.) will send journal reports to.
+# The docker-compose.yml maps this same port on the host side by default.
+SMTP_JOURNALING_PORT=2525
+# The domain used to generate routing addresses for journaling sources.
+# Each source gets a unique address like journal-<id>@<domain>.
+# Set this to the domain/subdomain whose MX record points to this server.
+SMTP_JOURNALING_DOMAIN=journal.yourdomain.com
+# Maximum number of waiting jobs in the journal queue before the SMTP listener
+# returns 4xx temporary failures (backpressure). The MTA will retry automatically.
+JOURNAL_QUEUE_BACKPRESSURE_THRESHOLD=10000
+#BullMQ worker concurrency for processing journaled emails. Increase on servers with more CPU cores.
+JOURNAL_WORKER_CONCURRENCY=3
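
Review note: the backpressure setting described in this hunk can be read as a plain threshold check on the number of waiting jobs. The sketch below is only an illustration under assumptions: `readThreshold` and `decideSmtpResponse` are hypothetical helpers, the reply code 451 is one possible 4xx choice, and treating "equal to the threshold" as over the limit is a guess. None of this is taken from the commit's implementation.

```typescript
// Hypothetical shape of the decision the SMTP listener has to make.
interface SmtpDecision {
    accept: boolean;
    code: number; // SMTP reply code sent back to the MTA
    message: string;
}

// Reads JOURNAL_QUEUE_BACKPRESSURE_THRESHOLD from an env-style record.
// Falls back to 10000 (the value shown in .env.example) when unset or invalid.
function readThreshold(env: Record<string, string | undefined>): number {
    const parsed = parseInt(env.JOURNAL_QUEUE_BACKPRESSURE_THRESHOLD ?? '', 10);
    return !isNaN(parsed) && parsed > 0 ? parsed : 10000;
}

// Assumption: the caller obtains `waitingJobs` from the queue library.
function decideSmtpResponse(waitingJobs: number, threshold: number): SmtpDecision {
    if (waitingJobs >= threshold) {
        // A 4xx reply is a temporary failure, so the sending MTA retries later.
        return { accept: false, code: 451, message: 'Queue busy, try again later' };
    }
    return { accept: true, code: 250, message: 'OK' };
}
```

Returning a temporary failure instead of dropping the message keeps the retry responsibility with the sending MTA, which is what the comment in `.env.example` relies on.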

docker-compose.yml

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ services:
         container_name: open-archiver
         restart: unless-stopped
         ports:
-            - '3000:3000' # Frontend
+            - '${PORT_FRONTEND:-3000}:3000' # Frontend
         env_file:
             - .env
         volumes:
@@ -42,7 +42,7 @@ services:
             - open-archiver-net

     meilisearch:
-        image: getmeili/meilisearch:v1.15
+        image: getmeili/meilisearch:v1.38
         container_name: meilisearch
         restart: unless-stopped
         environment:

docs/api/ingestion.md

Lines changed: 4 additions & 0 deletions
@@ -37,3 +37,7 @@ Manage ingestion sources — the configured connections to email providers (Goog
 ## Force Sync

 <OAOperation operationId="triggerForceSync" />
+
+## Unmerge an Ingestion Source
+
+<OAOperation operationId="unmergeIngestionSource" />
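
Review note: a client call to the new operation might look like the sketch below. The route `POST /v1/ingestion-sources/{id}/unmerge` comes from the commit message and `http://localhost:3000` from the OpenAPI `servers` entry; the `buildUnmergeRequest` helper and the Bearer token header are assumptions made for illustration, not the documented authentication scheme.

```typescript
// Hypothetical request descriptor; `init` can be passed straight to fetch().
interface UnmergeRequest {
    url: string;
    init: { method: 'POST'; headers: Record<string, string> };
}

function buildUnmergeRequest(baseUrl: string, sourceId: string, token: string): UnmergeRequest {
    // Strip trailing slashes so the path is not doubled up.
    const root = baseUrl.replace(/\/+$/, '');
    const url = `${root}/v1/ingestion-sources/${encodeURIComponent(sourceId)}/unmerge`;
    return {
        url,
        // Assumption: the API accepts a Bearer token in the Authorization header.
        init: { method: 'POST', headers: { Authorization: `Bearer ${token}` } },
    };
}

// Usage (not executed here):
//   const { url, init } = buildUnmergeRequest('http://localhost:3000', id, token);
//   const res = await fetch(url, init);
//   Per the controller in this commit: 200 returns the updated source,
//   401 means unauthorized, 404 means the source was not found.
```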

docs/api/openapi.json

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
     },
     "servers": [
         {
-            "url": "http://localhost:3001",
+            "url": "http://localhost:3000",
             "description": "Local development"
         }
     ],

docs/user-guides/email-providers/index.md

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ Choose your provider from the list below to get started:
 - [EML Import](./eml.md)
 - [PST Import](./pst.md)
 - [Mbox Import](./mbox.md)
+- [Merging Ingestion Sources](./merging-sources.md)
Lines changed: 105 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
# Merging Ingestion Sources
2+
3+
Merged ingestion groups let you combine multiple ingestion sources so that their emails appear unified in browsing, search, and thread views. This is useful when you want to pair a historical archive (for example, a PST or Mbox import) with a live connection, or when migrating between providers.
4+
5+
## Concepts
6+
7+
| Term | Definition |
8+
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
9+
| **Root source** | An ingestion source where no merge parent is set. Shown as the primary row in the Ingestions table. All emails in the group are physically owned by the root. |
10+
| **Child source** | An ingestion source merged into a root. Acts as a fetch assistant — it connects to the provider and retrieves emails, but all data is stored under the root source. |
11+
| **Group** | A root source and all its children. All emails from every member are stored under and owned by the root. |
12+
13+
The hierarchy is **flat** — only one level of nesting is supported. If you merge a source into a child, the system automatically redirects the relationship to the root.
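
Review note: because the hierarchy is flat, the redirect rule above reduces to a single lookup. The sketch below is a minimal illustration; the `mergedIntoId` field and the `resolveMergeTarget` function are hypothetical names, not the project's schema or service API.

```typescript
// Hypothetical minimal view of an ingestion source.
interface SourceNode {
    id: string;
    mergedIntoId: string | null; // null means this source is a root
}

// Returns the ID that a new child should actually be attached to.
function resolveMergeTarget(targetId: string, sources: Map<string, SourceNode>): string {
    const target = sources.get(targetId);
    if (!target) {
        throw new Error(`Unknown ingestion source: ${targetId}`);
    }
    // If the selected target is itself a child, redirect to its root.
    // One hop is enough because only one level of nesting exists.
    return target.mergedIntoId ?? target.id;
}
```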
14+
15+
## Root Ownership — How Storage and Data Work
16+
17+
This is the key design principle of merged sources:
18+
19+
> **Child sources are assistants. They fetch emails from their provider but never own any stored data. Every email ingested by a child is written to the root source's storage folder and assigned the root source's ID in the database.**
20+
21+
In practical terms:
22+
23+
- The storage path for every email belongs to the root: `openarchiver/{root-name}-{root-id}/emails/...`
24+
- Every `archived_emails` database row created by a child ingestion will have `ingestionSourceId` set to the **root's ID**, not the child's.
25+
- Attachments are also stored under the root's folder and scoped to the root's ID.
26+
- The root's **Preserve Original File** (GoBD compliance) setting is inherited by all children in the group. A child's own `preserveOriginalFile` setting is ignored during ingestion — only the root's setting applies.
27+
28+
This means browsing the root source's emails will show all emails from the entire group, including those fetched by child sources, without any extra configuration.
29+
30+
## When to Use Merged Sources
31+
32+
- **Historical + live**: Import a PST archive and merge it into an active IMAP or Google Workspace connection so historical and current emails appear in one unified mailbox.
33+
- **Provider migration**: Add a new Microsoft 365 connector and merge it with your existing Google Workspace connector during a cutover period.
34+
- **Backfill**: Import an Mbox export and merge it with a live connection to cover a gap in the archive.
35+
36+
## How to Merge a New Source Into an Existing One
37+
38+
Merging can only be configured **at creation time**.
39+
40+
1. Navigate to the **Ingestions** page.
41+
2. Click **Create New** to open the ingestion source form.
42+
3. Fill in the provider details as usual.
43+
4. Expand the **Advanced Options** section at the bottom of the form. This section is only visible when at least one ingestion source already exists.
44+
5. Check **Merge into existing ingestion** and select the target root source from the dropdown.
45+
6. Click **Submit**.
46+
47+
The new source will run its initial import normally. Once complete, its emails will appear alongside those of the root source — all stored under the root.
48+
49+
## How Emails Appear When Merged
50+
51+
When you browse archived emails for a root source, you see all emails in the group because they are all physically owned by the root. There is nothing to aggregate — the data is already unified at the storage and database level.
52+
53+
The same applies to search: filtering by a root source ID returns all emails in the group.
54+
55+
Threads also span the merge group. If a reply arrived via a different source than the original message, it still appears in the correct thread.
56+
57+
## How Syncing Works
58+
59+
Each source syncs **independently**. The scheduler picks up all sources with status `active` or `error`, regardless of whether they are merged.
60+
61+
- File-based imports (PST, EML, Mbox) finish with status `imported` and are never re-synced automatically.
62+
- Live sources (IMAP, Google Workspace, Microsoft 365) continue their normal sync cycle.
63+
64+
When you trigger **Force Sync** on a root source, the system also queues a sync for all non-file-based children that are currently `active` or `error`.
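
Review note: the Force Sync rule above amounts to a filter over the root's children. The sketch below assumes a hypothetical `GroupMember` shape with a `fileBased` flag and a `parentId` field; the real data model and queueing code are not shown in this diff.

```typescript
// Hypothetical view of a source inside a merge group.
interface GroupMember {
    id: string;
    parentId: string | null; // null for a root source
    status: string; // e.g. 'active', 'error', 'imported'
    fileBased: boolean; // true for PST, EML and Mbox imports
}

// Returns the IDs to queue when Force Sync is triggered on `rootId`:
// the root itself plus every live child that is currently active or in error.
function sourcesToSyncOnForce(rootId: string, all: GroupMember[]): string[] {
    const children = all.filter(
        (s) =>
            s.parentId === rootId &&
            !s.fileBased &&
            (s.status === 'active' || s.status === 'error')
    );
    return [rootId, ...children.map((c) => c.id)];
}
```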
65+
66+
## Deduplication Across the Group
67+
68+
When ingesting emails, duplicate detection covers the **entire merge group**. If the same email (matched by its RFC `Message-ID` header or provider-specific ID) already exists anywhere in the group, it is skipped and not stored again.
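
Review note: group-wide deduplication can be pictured as one shared index per merge group, checked before each write. The sketch below is an in-memory illustration only; `GroupDedupIndex` and the key format are hypothetical, and a real implementation would presumably query the database instead.

```typescript
// Identifiers an incoming email may carry.
interface IncomingEmail {
    messageId?: string; // RFC Message-ID header
    providerId?: string; // provider-specific ID
}

// One instance is assumed per merge group (for example keyed by the root's ID).
class GroupDedupIndex {
    private seen = new Set<string>();

    // Namespaced keys keep a Message-ID from colliding with a provider ID.
    private keysFor(email: IncomingEmail): string[] {
        const keys: string[] = [];
        if (email.messageId) keys.push(`mid:${email.messageId}`);
        if (email.providerId) keys.push(`pid:${email.providerId}`);
        return keys;
    }

    // Returns true if the email is new to the group and records its keys.
    // An email without any identifier cannot be matched and is always stored.
    shouldStore(email: IncomingEmail): boolean {
        const keys = this.keysFor(email);
        if (keys.some((k) => this.seen.has(k))) {
            return false;
        }
        keys.forEach((k) => this.seen.add(k));
        return true;
    }
}
```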
69+
70+
## Preserve Original File (GoBD Compliance) and Merged Sources
71+
72+
The **Preserve Original File** setting on the root source governs the entire group. When this setting is enabled on the root:
73+
74+
- All emails ingested by child sources are also stored unmodified (raw EML, no attachment stripping).
75+
- The child's own `preserveOriginalFile` setting has no effect — the root's setting is always used.
76+
77+
This ensures consistent compliance behaviour across the group. If you require GoBD or SEC 17a-4 compliance for an entire merged group, enable **Preserve Original File** on the root source before adding any children.
78+
79+
## Editing Sources in a Group
80+
81+
Each source in a group can be edited independently. Expand the group row in the Ingestions table by clicking the chevron, then use the actions menu on the specific source (root or child) you want to edit.
82+
83+
## Unmerging a Child Source
84+
85+
To detach a child from its group and make it standalone:
86+
87+
1. Expand the group row by clicking the chevron next to the root source name.
88+
2. Open the actions menu on the child source.
89+
3. Click **Unmerge**.
90+
91+
The child becomes an independent root source. No email data is moved or deleted.
92+
93+
> **Note:** Because all emails fetched by the child were stored under the root source's ID, unmerging the child does not transfer those emails. Historical emails ingested while the source was a child remain owned by the root. Only new emails ingested after unmerging will be stored under the (now standalone) child.
94+
95+
## Deleting Sources in a Group
96+
97+
- **Deleting a root source** also deletes all its children: their configuration, and all emails, attachments, storage files, and search index entries owned by the root are all removed. Because all group emails are stored under the root, this effectively removes the entire group's archive.
98+
- **Deleting a child source** removes only the child's configuration and sync state. Emails already ingested by the child are stored under the root and are **not** deleted.
99+
100+
A warning is shown in the delete confirmation dialog when a root source has children.
101+
102+
## Known Limitations
103+
104+
- **Merging existing standalone sources is not supported.** You can only merge a source into a group at creation time. To merge two existing sources, you must delete one and recreate it with the merge target selected.
105+
- **Historical data from a child source before unmerging remains with the root.** If you unmerge a child, emails it previously ingested stay owned by the root and are not migrated to the child.

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
     "name": "open-archiver",
-    "version": "0.5.0",
+    "version": "0.5.1",
     "private": true,
     "license": "SEE LICENSE IN LICENSE file",
     "scripts": {

packages/backend/scripts/generate-openapi-spec.mjs

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ const options = {
         },
         servers: [
             {
-                url: 'http://localhost:3001',
+                url: 'http://localhost:3000',
                 description: 'Local development',
             },
         ],

packages/backend/src/api/controllers/ingestion.controller.ts

Lines changed: 25 additions & 0 deletions
@@ -177,6 +177,31 @@ export class IngestionController {
         }
     };

+    public unmerge = async (req: Request, res: Response): Promise<Response> => {
+        try {
+            const { id } = req.params;
+            const userId = req.user?.sub;
+            if (!userId) {
+                return res.status(401).json({ message: req.t('errors.unauthorized') });
+            }
+            const actor = await this.userService.findById(userId);
+            if (!actor) {
+                return res.status(401).json({ message: req.t('errors.unauthorized') });
+            }
+            const updatedSource = await IngestionService.unmerge(id, actor, req.ip || 'unknown');
+            const safeSource = this.toSafeIngestionSource(updatedSource);
+            return res.status(200).json(safeSource);
+        } catch (error) {
+            logger.error({ err: error }, `Unmerge ingestion source ${req.params.id} error`);
+            if (error instanceof Error && error.message === 'Ingestion source not found') {
+                return res.status(404).json({ message: req.t('ingestion.notFound') });
+            } else if (error instanceof Error) {
+                return res.status(400).json({ message: error.message });
+            }
+            return res.status(500).json({ message: req.t('errors.internalServerError') });
+        }
+    };
+
     public triggerForceSync = async (req: Request, res: Response): Promise<Response> => {
         try {
             const { id } = req.params;

packages/backend/src/api/controllers/storage.controller.ts

Lines changed: 5 additions & 1 deletion
@@ -40,7 +40,11 @@ export class StorageController {

             const fileStream = await this.storageService.get(safePath);
             const fileName = path.basename(safePath);
-            res.setHeader('Content-Disposition', `attachment; filename="${fileName}"`);
+            const encodedFileName = encodeURIComponent(fileName);
+            res.setHeader(
+                'Content-Disposition',
+                `attachment; filename="${encodedFileName}"; filename*=UTF-8''${encodedFileName}`
+            );
             fileStream.pipe(res);
         } catch (error) {
             console.error('Error downloading file:', error);
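
Review note: the change in this hunk percent-encodes the file name with `encodeURIComponent` and sends it in both `filename` and `filename*`. A slightly stricter variant, sketched here as a suggestion rather than what the commit does, also escapes `'`, `(`, `)` and `*`. `encodeURIComponent` leaves those four characters untouched, but they are not in the `attr-char` set that RFC 5987 permits unencoded in an extended parameter value. The helper names are hypothetical.

```typescript
// Percent-encodes a value for use in an RFC 5987 extended parameter (filename*).
function encodeRFC5987(value: string): string {
    return encodeURIComponent(value).replace(
        /['()*]/g,
        (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`
    );
}

// Builds the header value in the same two-parameter form used in the diff.
function contentDisposition(fileName: string): string {
    const encoded = encodeRFC5987(fileName);
    return `attachment; filename="${encoded}"; filename*=UTF-8''${encoded}`;
}
```

For example, a file named `Bericht (März).eml` would be sent as `Bericht%20%28M%C3%A4rz%29.eml` in both parameters.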
