Commit a09067c

Merge #312: docs: [#310] research database backup strategies
1168780 docs: [#310] add lessons learned from backup POC implementation (Jose Celano)
74a082d docs: [#310] update issue spec progress - all tasks complete (Jose Celano)
c92c199 feat: [#310] add SQLite backup support to backup container (Jose Celano)
c7cc2c5 docs: [#310] recommend maintenance-window as primary backup solution (Jose Celano)
2d97bf1 docs: [#310] add unit tests and improve documentation clarity (Jose Celano)
1c63945 docs: [#310] add maintenance-window solution artifacts and update default backup interval (Jose Celano)
b9fd407 docs: [#310] add Torrust Demo database analysis for exclude-statistics (Jose Celano)
717b4b8 docs: [#310] add alternative backup solutions for large databases (Jose Celano)
6cff3b6 docs: [#310] reorganize backup-strategies folder structure (Jose Celano)
7b3fb8a docs: [#310] complete Phase 7 documentation and update conclusions (Jose Celano)
7ea89cf docs: [#310] add SQLite large database backup findings (Jose Celano)
d040367 docs: [#310] Phase 6 - Restore validation complete (Jose Celano)
e370b48 test: [#310] add unit tests for backup script with bats-core (Jose Celano)
e9b506c refactor: [#310] add Rust-style documentation to backup script (Jose Celano)
bf7e719 docs: [#310] implement Phase 5 - backup maintenance (packaging & retention) (Jose Celano)
4eb29fe docs: [#310] update Phase 5 plan with maintenance approach (Jose Celano)
7076964 docs: [#310] mark Phase 4 as complete in progress table (Jose Celano)
672fafc docs: [#310] reference issue #313 for Ansible ownership fix (Jose Celano)
cb36c17 docs: [#310] add production considerations and fix non-root container (Jose Celano)
745b102 docs: [#310] update Phase 4 documentation with implementation details (Jose Celano)
fd8c19a docs: [#310] implement Phase 4 - config files backup (Jose Celano)
550d382 docs: [#310] add backup integrity verification tests to Phase 3 (Jose Celano)
bfbedd9 docs: [#310] reorganize artifacts with backup-container folder (Jose Celano)
270ef7c docs: [#310] complete Phase 3 - MySQL backup with mariadb-dump (Jose Celano)
7b661aa docs: [#310] complete Phase 2 - minimal backup container (Jose Celano)
37fca8a docs: [#310] reorganize PoC into structured folder with phase docs (Jose Celano)
1406df4 docs: [#310] complete Phase 1 of sidecar backup PoC (Jose Celano)
8b8aa51 docs: [#310] add PoC plan and answer backup requirements questions (Jose Celano)
2fcf835 docs: [#310] add MySQL backup research and sidecar container solution (Jose Celano)
17b15a8 docs: [#310] add preliminary conclusions for backup research (Jose Celano)
848dbde docs: [#310] research database backup strategies (Jose Celano)

Pull request description:

## Summary

Comprehensive research documentation for database backup strategies as part of Epic #309 (Add backup support).

This PR includes **complete research** for SQLite and MySQL backup strategies, backup tools evaluation, container backup architectures, a **working proof-of-concept backup container** with **58 bats-core unit tests**, and a **recommended solution** (Maintenance Window Hybrid approach).

## What's Included

### Database Backup Strategies

#### SQLite

- **Backup approaches**: `.backup` command (Online Backup API), `VACUUM INTO`, file copy risks
- **WAL mode analysis**: Checkpointing behavior, persistence, pros/cons
- **Backup verification and restore procedures**: Integrity checks, recovery steps
- **Torrust Live Demo analysis**: Current implementation (unsafe `cp`), proposed improvements
- **⚠️ Critical Large Database Finding**: SQLite `.backup` stalls at 10% after 16+ hours for 17GB database (~37 MB/hour effective rate). Maintenance window backup completes in 72 seconds.

#### MySQL

- **Backup approaches**: `mysqldump`, physical backups, binary log backups
- **Container-specific considerations**: Accessing MySQL in Docker containers
- **Backup verification and restore procedures**

### Container Backup Architectures

- 5 patterns documented: Host Crontab, Centralized, Sidecar, Orchestrator, External Tool
- Comparison matrix with pros/cons
- Decision flowchart for pattern selection

### Backup Tools Evaluation

- ✅ **Restic**: Recommended - mature, encrypted, deduplicated, Docker support
- ⚠️ **Kopia**: Alternative - newer, more features (GUI, ECC, server mode), less mature
- ❌ **Rustic**: Discarded - beta status, not production-ready
- Two-phase backup approach documented (DB dump → file backup)

### Solution Comparison (NEW)

Four backup solutions evaluated with detailed trade-off analysis:

| Solution | Best For | Complexity |
|----------|----------|------------|
| Continuous Sidecar | Hot backups, simple setup | Low |
| **Maintenance Window** | Large DBs, complete consistency | Medium |
| External Scheduler | Multi-service environments | High |
| Native Database | WAL-enabled SQLite | Low |

**Recommended Solution**: Maintenance Window Hybrid (95% container, 5% host script)

### Maintenance Window Backup POC (Complete - NEW)

A working proof-of-concept with **58 bats-core unit tests** supporting both MySQL and SQLite:

| Feature | Status |
|---------|--------|
| MySQL backup with mysqldump | ✅ Complete |
| SQLite backup with sqlite3 | ✅ Complete |
| Config file backup | ✅ Complete |
| Retention policy (delete old backups) | ✅ Complete |
| Single mode (run once, exit) | ✅ Complete |
| Continuous mode (loop) | ✅ Complete |
| Host orchestration script | ✅ Complete |
| Crontab configuration | ✅ Complete |
| 58 unit tests | ✅ All passing |

**POC Artifacts**:

- Multi-stage Dockerfile with MySQL and SQLite support
- `backup.sh` script with modular functions
- `maintenance-backup.sh` host orchestration script
- Docker Compose examples for MySQL and SQLite
- Production and test crontab configurations
- Lessons learned document with implementation concerns

## Key Findings

| Finding | Details |
|---------|---------|
| SQLite Safe Backup | Use `.backup` command (Online Backup API) - safe during concurrent writes |
| SQLite Large DB Limitation | `.backup` impractical for DBs > 1GB due to locking overhead (~37 MB/hour) |
| Maintenance Window Backup | 72 seconds for 17GB SQLite (vs ~17 days with `.backup`) |
| Disk I/O Capacity | 445 MB/s proven - SQLite locking is bottleneck, not disk |
| MySQL Backup | `mysqldump` works reliably for containerized deployments |
| WAL Mode | Optional for safe backups, useful for read performance under high load |
| Recommended Tool | Restic - battle-tested, simple, Docker-native, sufficient features |
| Recommended Solution | Maintenance Window Hybrid - container + host crontab |
| Sidecar Pattern | Best for single-server deployments with few services |

## Lessons Learned (Implementation Concerns)

Key pain points discovered during POC that affect future implementation:

| Pain Point | Severity | Notes |
|------------|----------|-------|
| Template conditionals for DB type | Medium | Docker Compose env vars differ for MySQL vs SQLite |
| Path translation (host/container) | Medium | Multiple representations of same path |
| SSH agent key selection | Low | Use `IdentitiesOnly=yes` |
| Container exits in single mode | Low | Expected behavior, just surprising |
| Log rotation missing | Low | Easy to add, often forgotten |
| Backup verification missing | Medium | Important for production |

## Related Issues

- Closes #310 (Research Database Backup Strategies)
- Part of Epic #309
- Created on torrust-demo repo:
  - Issue #85: Use `.backup` instead of `cp`
  - Issue #86: Evaluate WAL mode for high-traffic scenario

## Checklist

### Research Complete

- [x] SQLite backup approaches documented
- [x] SQLite large database findings (17GB test)
- [x] MySQL backup approaches documented
- [x] WAL mode analysis with checkpointing behavior
- [x] Backup verification and restore procedures
- [x] Torrust Live Demo analysis
- [x] Container backup architectures (5 patterns)
- [x] Backup tools evaluation (Restic, Kopia, Rustic)
- [x] Solution comparison (4 approaches)
- [x] Recommended solution documented

### POC Complete

- [x] Multi-stage Dockerfile with MySQL and SQLite support
- [x] 58 bats-core unit tests (all passing)
- [x] MySQL backup/restore validated
- [x] SQLite backup/restore validated
- [x] Config file backup
- [x] Retention policy (delete expired backups)
- [x] Single mode (run once, exit)
- [x] Continuous mode (loop with interval)
- [x] Host orchestration script
- [x] Crontab configurations (production + test)
- [x] Docker Compose examples (MySQL + SQLite)
- [x] Lessons learned document
- [x] Issue spec progress updated (all tasks complete)

### Future Work (out of scope for this PR)

- [ ] Implement backup command in deployer
- [ ] Off-site transfer automation (S3, Backblaze B2)
- [ ] Backup encryption
- [ ] Backup verification command

## Documentation Structure

```
docs/research/backup-strategies/
├── README.md                              # Overview and navigation
├── conclusions.md                         # Key findings and recommendations
├── requirements.md                        # Design preferences
├── architectures/
│   └── container-patterns.md              # 5 architecture patterns
├── databases/
│   ├── mysql/
│   │   ├── README.md
│   │   └── backup-approaches.md
│   └── sqlite/
│       ├── README.md
│       ├── backup-approaches.md
│       ├── large-database-backup.md       # Critical 17GB findings
│       └── torrust-live-demo/
│           ├── README.md
│           ├── current-implementation.md
│           └── proposed-improvements.md
├── tools/
│   ├── README.md                          # Tools overview
│   ├── restic.md                          # Detailed Restic evaluation
│   └── restic-vs-kopia.md                 # Comparison document
└── solutions/
    ├── README.md                          # Solution comparison (NEW)
    ├── sidecar-container/                 # Original sidecar POC
    └── maintenance-window/                # Recommended solution (NEW)
        ├── README.md                      # Architecture and workflow
        ├── implementation-recommendations.md  # Lessons learned
        └── artifacts/
            ├── backup-container/
            │   ├── Dockerfile
            │   ├── backup.sh
            │   └── backup_test.bats       # 58 tests
            ├── docker-compose-with-backup-mysql.yml
            ├── docker-compose-with-backup-sqlite.yml
            ├── maintenance-backup.sh
            ├── maintenance-backup.cron
            └── maintenance-backup-test.cron
```

ACKs for top commit:
  josecelano: ACK 1168780

Tree-SHA512: 90338487494b44eefe56fc6943497a2caa2715e9e459a24d36f287d5ff50938d48ce33c9d2dfb3eef08929dfb79ec80fbf73b228c570ee16b0e26d2034939c98
2 parents 1b112c9 + 1168780 commit a09067c
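
The Key Findings in the commit message recommend SQLite's Online Backup API (the `.backup` command) over a plain file copy, and list integrity checks as part of backup verification. The sketch below illustrates both steps under one assumption: it uses Python's standard `sqlite3` module, whose `Connection.backup` method wraps the same Online Backup API, rather than the `sqlite3` CLI used by the POC's `backup.sh`. The function name `backup_sqlite` and its arguments are hypothetical and do not come from the PR.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(source: Path, target: Path) -> bool:
    """Copy `source` to `target` via the Online Backup API, then verify the copy.

    Hypothetical helper for illustration; not part of the PR's backup.sh.
    """
    if not source.is_file():
        # sqlite3.connect() would silently create an empty database otherwise.
        raise FileNotFoundError(source)

    src = sqlite3.connect(source)
    dst = sqlite3.connect(target)
    try:
        # pages=-1 copies the whole database in a single step.
        src.backup(dst, pages=-1)
    finally:
        dst.close()
        src.close()

    # PRAGMA integrity_check returns a single row containing "ok"
    # when no corruption is found in the copied file.
    check = sqlite3.connect(target)
    try:
        (result,) = check.execute("PRAGMA integrity_check").fetchone()
    finally:
        check.close()
    return result == "ok"
```

This only shows the shape of a backup-then-verify step. The large-database finding reported above (the online backup stalling on a 17GB database under load) still applies to this approach.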

53 files changed

Lines changed: 11249 additions & 74 deletions

docs/issues/310-research-database-backup-strategies.md

Lines changed: 68 additions & 68 deletions
@@ -12,33 +12,33 @@ Research and document backup strategies for SQLite databases, MySQL databases, a

 ### SQLite Backup

-- [ ] Learn how to backup SQLite files safely while used in production (no locks, safe copying)
-- [ ] Research tools and techniques for copying and compressing SQLite backups
-- [ ] Investigate redundancy strategies for SQLite backups (cloud volumes, S3, backup services, snapshots)
-- [ ] Document current Torrust Live Demo SQLite backup implementation
+- [x] Learn how to backup SQLite files safely while used in production (no locks, safe copying)
+- [x] Research tools and techniques for copying and compressing SQLite backups
+- [x] Investigate redundancy strategies for SQLite backups (cloud volumes, S3, backup services, snapshots)
+- [x] Document current Torrust Live Demo SQLite backup implementation

 ### MySQL Backup

-- [ ] Research MySQL backup approaches for containerized deployments
-- [ ] Learn about MySQL-specific backup tools (mysqldump, hot backup, volume snapshots)
-- [ ] Investigate compression and redundancy strategies for MySQL backups
+- [x] Research MySQL backup approaches for containerized deployments
+- [x] Learn about MySQL-specific backup tools (mysqldump, hot backup, volume snapshots)
+- [x] Investigate compression and redundancy strategies for MySQL backups

 ### Complete Storage Folder Backup

-- [ ] Research approaches for backing up the entire deployment storage folder
-- [ ] Learn about tools for full directory backups (tar, rsync, volume snapshots)
-- [ ] Understand trade-offs between full storage backup and selective approaches
+- [x] Research approaches for backing up the entire deployment storage folder
+- [x] Learn about tools for full directory backups (tar, rsync, volume snapshots)
+- [x] Understand trade-offs between full storage backup and selective approaches

 ### Selective Files Backup

-- [ ] Identify which configuration files and directories need backup
-- [ ] Research strategies for backing up specific files (docker-compose, tracker config, etc.)
-- [ ] Learn about version control and organization for selective backups
+- [x] Identify which configuration files and directories need backup
+- [x] Research strategies for backing up specific files (docker-compose, tracker config, etc.)
+- [x] Learn about version control and organization for selective backups

 ### General Research

-- [ ] Explore different backup scope strategies and their trade-offs
-- [ ] Document all findings in `docs/research/backup-strategies/` (to be created during research)
+- [x] Explore different backup scope strategies and their trade-offs
+- [x] Document all findings in `docs/research/backup-strategies/` (to be created during research)

 ## 🏗️ Architecture Requirements


@@ -221,87 +221,87 @@ Research different approaches to defining backup scope:

 ### Phase 1: SQLite Research (estimated 4-6 hours)

-- [ ] Read SQLite backup documentation
-- [ ] Research safe file copy approaches while database is in use
-- [ ] Investigate SQLite locking mechanisms and WAL mode
-- [ ] Research compression tools and techniques
-- [ ] Learn about cloud volume attachment and snapshot strategies
-- [ ] Study S3 and backup service integration options
-- [ ] Analyze Torrust Live Demo backup script implementation
-- [ ] Document all findings in `docs/research/backup-strategies/sqlite-backup-strategies.md` (create folder and file)
+- [x] Read SQLite backup documentation
+- [x] Research safe file copy approaches while database is in use
+- [x] Investigate SQLite locking mechanisms and WAL mode
+- [x] Research compression tools and techniques
+- [x] Learn about cloud volume attachment and snapshot strategies
+- [x] Study S3 and backup service integration options
+- [x] Analyze Torrust Live Demo backup script implementation
+- [x] Document all findings in `docs/research/backup-strategies/sqlite-backup-strategies.md` (create folder and file)

 ### Phase 2: MySQL Research (estimated 4-6 hours)

-- [ ] Read MySQL backup documentation
-- [ ] Research `mysqldump` usage and locking behavior
-- [ ] Investigate physical backup tools (Percona XtraBackup)
-- [ ] Learn about Docker volume backup strategies
-- [ ] Research compression techniques for MySQL dumps
-- [ ] Study cloud redundancy options for MySQL backups
-- [ ] Test basic mysqldump in Docker container (optional hands-on)
-- [ ] Document all findings in `docs/research/backup-strategies/mysql-backup-strategies.md`
+- [x] Read MySQL backup documentation
+- [x] Research `mysqldump` usage and locking behavior
+- [x] Investigate physical backup tools (Percona XtraBackup)
+- [x] Learn about Docker volume backup strategies
+- [x] Research compression techniques for MySQL dumps
+- [x] Study cloud redundancy options for MySQL backups
+- [x] Test basic mysqldump in Docker container (optional hands-on)
+- [x] Document all findings in `docs/research/backup-strategies/mysql-backup-strategies.md`

 ### Phase 3: Configuration Research (estimated 2-3 hours)

-- [ ] Identify all configuration files and directories
-- [ ] Research file copy and archive tools (`tar`, `rsync`)
-- [ ] Learn about compression options and trade-offs
-- [ ] Study configuration storage strategies
-- [ ] Research version control for config backups
-- [ ] Document all findings in `docs/research/backup-strategies/configuration-backup-strategies.md`
+- [x] Identify all configuration files and directories
+- [x] Research file copy and archive tools (`tar`, `rsync`)
+- [x] Learn about compression options and trade-offs
+- [x] Study configuration storage strategies
+- [x] Research version control for config backups
+- [x] Document all findings in `docs/research/backup-strategies/configuration-backup-strategies.md`

 ### Phase 4: Backup Scope Strategies (estimated 2-3 hours)

-- [ ] Research full storage backup approaches
-- [ ] Compare database-only backup patterns
-- [ ] Study selective backup strategies
-- [ ] Learn about layered backup approaches
-- [ ] Document trade-offs for each strategy
-- [ ] Document all findings in `docs/research/backup-strategies/backup-scope-strategies.md`
+- [x] Research full storage backup approaches
+- [x] Compare database-only backup patterns
+- [x] Study selective backup strategies
+- [x] Learn about layered backup approaches
+- [x] Document trade-offs for each strategy
+- [x] Document all findings in `docs/research/backup-strategies/backup-scope-strategies.md`

 ### Phase 5: Documentation Review (estimated 1 hour)

-- [ ] Review all research documents for completeness
-- [ ] Create README in research folder with overview
-- [ ] Ensure all research questions are addressed
-- [ ] Cross-reference with Torrust Live Demo implementation
-- [ ] Run linters and ensure documentation quality
-- [ ] Update issue with any follow-up questions or findings
+- [x] Review all research documents for completeness
+- [x] Create README in research folder with overview
+- [x] Ensure all research questions are addressed
+- [x] Cross-reference with Torrust Live Demo implementation
+- [x] Run linters and ensure documentation quality
+- [x] Update issue with any follow-up questions or findings

 ## Acceptance Criteria

 > **Note for Contributors**: These criteria define what the PR reviewer will check. Use this as your pre-review checklist before submitting the PR to minimize back-and-forth iterations.

 **Quality Checks**:

-- [ ] Pre-commit checks pass: `./scripts/pre-commit.sh`
+- [x] Pre-commit checks pass: `./scripts/pre-commit.sh`

 **Research Documentation**:

-- [ ] SQLite backup approaches documented (safe copying, compression, redundancy)
-- [ ] MySQL backup approaches documented (tools, techniques, containerization)
-- [ ] Configuration backup approaches documented
-- [ ] Backup scope strategies compared
-- [ ] Torrust Live Demo implementation analyzed and documented
-- [ ] All research questions addressed with sufficient detail
-- [ ] Cloud redundancy strategies documented (volumes, S3, snapshots)
-- [ ] Compression techniques compared
+- [x] SQLite backup approaches documented (safe copying, compression, redundancy)
+- [x] MySQL backup approaches documented (tools, techniques, containerization)
+- [x] Configuration backup approaches documented
+- [x] Backup scope strategies compared
+- [x] Torrust Live Demo implementation analyzed and documented
+- [x] All research questions addressed with sufficient detail
+- [x] Cloud redundancy strategies documented (volumes, S3, snapshots)
+- [x] Compression techniques compared

 **Research Completeness**:

-- [ ] All research questions in specifications section answered
-- [ ] Tools and techniques identified for each backup type
-- [ ] Trade-offs documented for different approaches
-- [ ] References to official documentation included
-- [ ] Findings organized in `docs/research/backup-strategies/` folder
-- [ ] README created in research folder with overview
+- [x] All research questions in specifications section answered
+- [x] Tools and techniques identified for each backup type
+- [x] Trade-offs documented for different approaches
+- [x] References to official documentation included
+- [x] Findings organized in `docs/research/backup-strategies/` folder
+- [x] README created in research folder with overview

 **Documentation Quality**:

-- [ ] Markdown linting passes (markdownlint)
-- [ ] Spell checking passes (cspell)
-- [ ] All links valid and properly formatted
-- [ ] Code examples properly formatted with syntax highlighting (if any)
+- [x] Markdown linting passes (markdownlint)
+- [x] Spell checking passes (cspell)
+- [x] All links valid and properly formatted
+- [x] Code examples properly formatted with syntax highlighting (if any)

 ## Related Documentation


Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+# Backup Strategies Research
+
+**Issue**: [#310 - Research database backup strategies](https://github.com/torrust/torrust-tracker-deployer/issues/310)
+**Parent Epic**: [#309 - Add backup support](https://github.com/torrust/torrust-tracker-deployer/issues/309)
+
+## Overview
+
+This folder contains research documentation for backup strategies in the context of Torrust Tracker deployments. The research covers different backup types, database-specific approaches, and general requirements.
+
+> **📋 See [Conclusions](conclusions.md) for a summary of key findings and recommendations.**
+
+## Backup Types
+
+The research is organized around three backup types:
+
+| Type | Description | Status |
+| -------------------- | ----------------------------------------------- | ----------- |
+| **Storage Backup** | Complete backup of entire storage (config + DB) | Not started |
+| **Database Backup** | Database-specific backup using safe tools | In progress |
+| **Selective Backup** | Partial storage backup (e.g., config only) | Not started |
+
+See [requirements.md](requirements.md) for detailed explanation of each type.
+
+## Documents
+
+### General
+
+| Document | Description |
+| ------------------------------- | ---------------------------------------------------------------- |
+| [Requirements](requirements.md) | Collected requirements, constraints, and backup type definitions |
+| [Conclusions](conclusions.md) | Summary of key findings and recommendations |
+
+### Database-Specific
+
+| Folder | Description |
+| -------------------------------------- | -------------------------------------------------------------------- |
+| [databases/sqlite/](databases/sqlite/) | SQLite backup research (approaches, current implementation analysis) |
+| [databases/mysql/](databases/mysql/) | MySQL backup research (mysqldump, hot backups, locking behavior) |
+
+### Architectures
+
+| Document | Description |
+| -------------------------------------------------------------------------- | -------------------------------------------- |
+| [architectures/container-patterns.md](architectures/container-patterns.md) | Container-based backup architecture patterns |
+
+### Tools
+
+| Document | Description |
+| ---------------- | ---------------------------------- |
+| [tools/](tools/) | Backup tool research (restic, etc) |
+
+### Solutions
+
+| Folder | Description |
+| ------------------------ | ---------------------------------------------------- |
+| [solutions/](solutions/) | Proposed backup solutions and architectural patterns |
+
+**⭐ Recommended**: [Maintenance Window Pattern](solutions/maintenance-window/) -
+A hybrid approach combining container-based backup with host-level orchestration.
+The backup container runs once per day (triggered by crontab), not continuously.
+This works for databases of any size and preserves container portability.
+
+**Alternative**: [Sidecar Container Pattern](solutions/sidecar-container/) -
+A dedicated backup container that runs continuously. Only practical for small
+databases (< 1GB) due to SQLite locking issues under load.
+
+## Research Status
+
+### SQLite
+
+- [x] Document backup approaches
+- [x] Analyze Torrust Live Demo implementation
+- [x] Investigate journal mode
+- [ ] Test in containerized environment
+
+### MySQL
+
+- [x] Research mysqldump approaches
+- [x] Research hot backup tools (Percona XtraBackup)
+- [x] Document locking behavior (no lock needed for InnoDB)
+
+### General
+
+- [x] Document backup types
+- [x] Collect requirements from discussions
+- [ ] Research compression strategies
+- [ ] Research retention policies
+- [ ] Research restore procedures
+
+## Key Decisions Captured
+
+From research discussions:
+
+1. **No data loss acceptable** - Safety is priority over simplicity
+2. **User provides backup path** - Deployer is storage-location agnostic
+3. **Keep complexity low** - No cloud provider APIs, focus on local backups
+4. **Database-aware backups** - Simple `cp` not acceptable for databases
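
The README above recommends the Maintenance Window pattern, where the backup runs once per day from crontab, and rules out a plain `cp` of a database that is in use. A minimal sketch of how those two points can fit together follows. It assumes the pattern amounts to stopping the writer, copying the file, and restarting the writer, and that the service closes the database cleanly so no journal or WAL file is left behind. The `stop_service` and `start_service` hooks, the function names, and the file naming scheme are all hypothetical. The PR's real orchestration lives in `maintenance-backup.sh`, which is not shown on this page. `prune_old_backups` illustrates the retention policy (delete old backups) listed in the commit message, again as an assumption about its behavior.

```python
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def maintenance_backup(
    db_path: Path,
    backup_dir: Path,
    stop_service: Callable[[], None],
    start_service: Callable[[], None],
) -> Path:
    """Copy the database file while the service that writes to it is stopped."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{db_path.stem}-{stamp}{db_path.suffix}"

    stop_service()  # hypothetical hook, e.g. stop the tracker container
    try:
        # With no writer running, a plain file copy is assumed to be consistent.
        # shutil.copy does not preserve mtime, so the copy is stamped "now".
        shutil.copy(db_path, target)
    finally:
        start_service()  # always restart, even if the copy failed
    return target


def prune_old_backups(backup_dir: Path, max_age_days: int) -> list[Path]:
    """Delete files in `backup_dir` last modified more than `max_age_days` ago."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in sorted(backup_dir.iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Passing the stop and start steps in as callables keeps the sketch independent of any particular container tooling and lets it be exercised without Docker. The `finally` block limits the downtime to the duration of the copy even when the copy raises.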
