feat(forum): show topics during backup + disk-usage storage stat (#200) #202
Fixes #200: forum/Topics groups showed "0 topics" during long media-heavy backups, and the Storage stat lagged actual disk usage.

- fetch forum topics up front in _backup_dialog (before the media backfill) so the topic list appears within seconds instead of only at end-of-run; the end-of-run pass remains as an idempotent backstop
- paginate GetForumTopicsRequest beyond 100 topics and wrap each page in call_with_flood_retry; skip deleted (title-less) topics
- compute the "Storage" stat from actual on-disk usage (du, counting the deduplicated _shared store once) instead of SUM(file_size); label sizes with correct binary units (GiB/MiB/TiB)
- expose a backup_in_progress flag via /api/stats and show a live indicator in the viewer; offer a "View all messages" link when a forum has no topics recorded yet
- tests: early forum fetch, topic pagination, deleted-topic guard, backup-in-progress flag (cleared in finally), du sizing, stats field
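To illustrate the du-style sizing and binary-unit labels described in the commit message, here is a minimal sketch. It is not the project's `compute_directory_size`; the function names `directory_size_bytes` and `format_binary_size` are hypothetical. It assumes the deduplicated `_shared` store is referenced through hard links or symlinks, which the PR does not state, and it sums apparent file sizes (`st_size`) rather than allocated blocks.

```python
import os
import stat


def directory_size_bytes(root: str) -> int:
    """Sum regular-file sizes under root, counting each inode once.

    Assumption: deduplicated files are hard links or symlinks into a
    shared store, so tracking (device, inode) pairs avoids double counting.
    """
    seen = set()
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect the link, not its target
            except OSError:
                continue  # file vanished mid-walk, e.g. while a backup runs
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and special files
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total


def format_binary_size(num_bytes: int) -> str:
    """Label a byte count with binary units (1 KiB = 1024 B)."""
    units = ("B", "KiB", "MiB", "GiB", "TiB")
    value = float(num_bytes)
    index = 0
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return f"{num_bytes} B"
    return f"{value:.1f} {units[index]}"
```

A caller would combine the two, for example `format_binary_size(directory_size_bytes(storage_path))`, where `storage_path` stands for whatever directory holds the media.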
📝 Walkthrough
Adds on-disk storage-based statistics, exposes a backup-in-progress flag through the API and web UI, updates forum-topic backup to fetch topics earlier and paginate them, and bumps the release to 7.17.0.
Changes: Backup stats, forum flow, and release metadata
Estimated code review effort: 4 (Complex), ~60 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally:
docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #202 +/- ##
==========================================
+ Coverage 92.21% 92.35% +0.13%
==========================================
Files 25 25
Lines 7196 7261 +65
==========================================
+ Hits 6636 6706 +70
+ Misses 560 555 -5
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/db/adapter.py`:
- Around line 1210-1217: The total media size path in the stats calculation is
blocking the event loop because `compute_directory_size` is synchronous and used
from async flows like `refresh_stats` and `lifespan`. Update the storage-path
branch in `src/db/adapter.py` to offload `compute_directory_size(storage_path)`
to a worker thread using `asyncio` (ensure it is imported), while keeping the
database aggregate branch unchanged. Use the existing stats method around this
block to locate the fix.
- Around line 1210-1217: The Storage total in the media size calculation is
using the broader backup directory instead of the media-only location, which
inflates the result. Update the size path used in the media summary logic inside
the code that computes total media size so it uses the media directory from the
config (or explicitly filters out non-media files like the database file) rather
than the full backup path.
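The first finding above asks for the synchronous directory walk to run in a worker thread. A minimal sketch of that pattern with `asyncio.to_thread` follows. `slow_directory_size` and `total_media_size` are hypothetical stand-ins, not the project's functions, and the sleep only simulates slow disk I/O.

```python
import asyncio
import time
from typing import Optional, Tuple


def slow_directory_size(path: str) -> int:
    """Hypothetical stand-in for a blocking, synchronous directory walk."""
    time.sleep(0.2)  # simulate slow disk I/O
    return 4096


async def total_media_size(storage_path: Optional[str], db_sum: int) -> int:
    """Use the on-disk size when a path is given, else the DB aggregate."""
    if storage_path is None:
        return db_sum  # database branch stays unchanged
    # Run the blocking walk in a worker thread so the event loop keeps serving.
    return await asyncio.to_thread(slow_directory_size, storage_path)


async def main() -> Tuple[int, int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the loop gets to run while the walk is in flight.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    size = await total_media_size("/data/backups", db_sum=0)
    task.cancel()
    return size, ticks
```

Running `asyncio.run(main())` returns the size together with a non-zero tick count, which shows that other coroutines continued to run during the walk. Calling `slow_directory_size` directly inside the coroutine would freeze the heartbeat for the whole duration.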
In `@src/telegram_backup.py`:
- Around line 2249-2251: The skip-topic debug log in the topic-handling block
leaks PII by printing topic.id. Update the logging around
self.config.should_skip_topic(...) to avoid mentioning the topic identifier or
title, and instead emit an aggregated count or a generic skip message. Keep the
change localized to the logger.debug call in this section so the behavior stays
the same while removing all per-topic identifiers.
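One way to satisfy the logging finding is to count skipped topics and log a single aggregate line. The sketch below assumes a generic `should_skip` predicate; `filter_topics` is a hypothetical helper, not code from `src/telegram_backup.py`.

```python
import logging
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")

logger = logging.getLogger("backup_sketch")


def filter_topics(
    topics: Iterable[T], should_skip: Callable[[T], bool]
) -> Tuple[List[T], int]:
    """Drop skipped topics and log only how many were dropped."""
    kept: List[T] = []
    skipped = 0
    for topic in topics:
        if should_skip(topic):
            skipped += 1  # no id or title is written to the log
            continue
        kept.append(topic)
    if skipped:
        logger.debug("Skipped %d topic(s) excluded by configuration", skipped)
    return kept, skipped
```

The filtering behavior is unchanged; only the log output loses its per-topic identifiers.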
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 37a1a6a3-7bdc-4bb8-9968-5a015b38d708
📒 Files selected for processing (12)
- docs/CHANGELOG.md
- pyproject.toml
- src/__init__.py
- src/db/adapter.py
- src/message_utils.py
- src/telegram_backup.py
- src/web/main.py
- src/web/templates/index.html
- tests/test_db_adapter.py
- tests/test_telegram_backup.py
- tests/test_telegram_backup_extended.py
- tests/test_web_routes.py
- run the du walk off the event loop (asyncio.to_thread) and outside the DB session so it never stalls the viewer or pins a connection
- fall back to the DB size when the storage path is missing/unmounted (du==0 while media exists) instead of caching a spurious 0
- topic pagination: keep partial results on a mid-pagination failure (avoid amplifying FloodWait via per-topic inference), anchor the message-based offsets on the last topic that has a top_message, and warn when the page cap truncates a very large forum
- match the early forum-topic fetch guard to the end-of-run loop (isinstance Channel) and wrap the end-of-run fetch in try/except so a topic error can't skip folders/stats; drop a chat-name INFO log (PII) and scrub topic ids from the skip-topic debug logs
- viewer: use an 'all' sentinel for "View all messages" so the topics sidebar and back navigation stay correct; label per-file sizes in KiB/MiB
- tests: offset_date advance via the messages map, seen_count termination with skipped topics, count=None, partial-result-on-failure, and the missing-storage-path DB fallback
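The pagination behavior listed above (partial results on failure, termination by seen count, a page cap) can be sketched without Telethon. Everything here is hypothetical: `Topic`, `Page`, `fetch_all_topics` and the single `offset_topic` cursor are simplifications, whereas the real request also carries date and message offsets. The broad `except` stands in for whatever specific errors the project handles.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Topic:
    id: int
    title: Optional[str]  # None models a deleted (title-less) topic


@dataclass
class Page:
    topics: List[Topic]
    count: Optional[int]  # server-reported total; may be missing


def fetch_all_topics(
    fetch_page: Callable[[int, int], Page],
    page_size: int = 100,
    max_pages: int = 50,
) -> Tuple[List[Topic], bool]:
    """Return (topics, truncated); truncated is True if the page cap was hit."""
    collected: List[Topic] = []
    seen_count = 0  # every topic returned, including ones that get skipped
    offset_topic = 0
    truncated = False
    for _ in range(max_pages):
        try:
            page = fetch_page(offset_topic, page_size)
        except Exception:
            break  # keep what was already collected instead of discarding it
        if not page.topics:
            break
        seen_count += len(page.topics)
        collected.extend(t for t in page.topics if t.title)  # skip deleted
        offset_topic = page.topics[-1].id
        if len(page.topics) < page_size:
            break  # short page: nothing further to fetch
        if page.count is not None and seen_count >= page.count:
            break  # reached the reported total
    else:
        truncated = True  # loop ended by exhausting max_pages
    return collected, truncated
```

Counting `seen_count` before filtering matters: if skipped topics were excluded from the count, a forum with deleted topics would never reach the reported total and would keep requesting pages.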
Summary
Fixes #200 — two problems with one root cause: forum/Topics-mode groups showed "0 topics" in the viewer during long media-heavy backups, and the Storage statistic lagged actual disk usage.
Root cause: `backup_all()` recorded forum topics and recomputed stats only at the very end of a full pass, after all messages + media. A ~190 GiB media backlog meant that end-step never ran, so the topic table stayed empty and the cached stats stayed stale.

Changes
- `_backup_dialog` now fetches a forum's topics right after `upsert_chat` (before the media backfill), so the topic list appears within seconds and message counts fill in as the backup progresses. The end-of-run pass is kept as an idempotent backstop.
- `GetForumTopicsRequest` now paginates beyond 100 topics (via `count`/offsets) and each page is wrapped in `call_with_flood_retry`; deleted (title-less) topics are skipped.
- The Storage stat is now computed from actual on-disk usage (`compute_directory_size`, du-style, counting the dedup `_shared` store once) instead of `SUM(file_size)`; `calculate_and_store_statistics(storage_path=…)` falls back to the old SUM when no path is given. Sizes are labeled in binary units (GiB/MiB/TiB).
- A `backup_in_progress` metadata flag (set at the start of `backup_all`, cleared in a `finally`) is exposed via `/api/stats` and shown live in the viewer's Backup Statistics panel.

Tests
+15 tests covering: early forum fetch, topic pagination (>100), deleted-topic guard, the in-progress flag (cleared in `finally` even on exception), du sizing vs the DB-SUM fallback, and the new `/api/stats` field. Full suite: 1892 passing; `ruff check .` and `ruff format --check .` clean.

Notes
`GetForumTopicsRequest` is correctly imported from `telethon.tl.functions.messages` for the pinned Telethon (1.43.2); verified that `functions.channels` does not exist there.

Closes #200
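The in-progress flag described in this PR relies on `try`/`finally` so that a failed run cannot leave the viewer showing a backup that is no longer running. A minimal sketch of that pattern follows, with a hypothetical in-memory `MetadataStore` in place of the real metadata table and `backup_all_sketch` and `stats_payload` as illustrative names only.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata table."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}

    async def set(self, key: str, value: str) -> None:
        self.values[key] = value


async def backup_all_sketch(
    store: MetadataStore, run_backup: Callable[[], Awaitable[None]]
) -> None:
    """Mark the backup as running, and always clear the mark afterwards."""
    await store.set("backup_in_progress", "1")
    try:
        await run_backup()
    finally:
        # Runs on success, on exception, and on cancellation alike.
        await store.set("backup_in_progress", "0")


def stats_payload(store: MetadataStore) -> Dict[str, bool]:
    """Shape of the extra field a stats endpoint could return."""
    return {"backup_in_progress": store.values.get("backup_in_progress") == "1"}
```

If `run_backup` raises, the exception still propagates to the caller, but `stats_payload` reports `False` afterwards because the `finally` block has already cleared the flag.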