
feat(forum): show topics during backup + disk-usage storage stat (#200) #202

Merged
GeiserX merged 2 commits into main from fix/issue-200-forum-topics-stats on Jun 25, 2026

GeiserX (Owner) commented Jun 25, 2026

Summary

Fixes #200: two problems with one root cause. Forum (Topics-mode) groups showed "0 topics" in the viewer during long, media-heavy backups, and the Storage statistic lagged behind actual disk usage.

Root cause: backup_all() recorded forum topics and recomputed stats only at the very end of a full pass, after all messages and media. With a ~190 GiB media backlog, that final step was never reached, so the topic table stayed empty and the cached stats stayed stale.

Changes

  • Topics up front: _backup_dialog now fetches a forum's topics right after upsert_chat (before the media backfill), so the topic list appears within seconds and message counts fill in as the backup progresses. The end-of-run pass is kept as an idempotent backstop.
  • Pagination + FloodWait: topic fetching now pages through GetForumTopicsRequest results beyond the first 100 topics (using count and the offset fields), each page is wrapped in call_with_flood_retry, and deleted (title-less) topics are skipped.
  • Disk-usage storage: the "Storage" stat is now computed from on-disk usage (compute_directory_size, du-style, counting the deduplicated _shared store once) instead of SUM(file_size). calculate_and_store_statistics(storage_path=…) takes the path and falls back to the old SUM when no path is given. Sizes are labeled in binary units (GiB/MiB/TiB).
  • Backup-in-progress indicator: a backup_in_progress metadata flag (set at the start of backup_all, cleared in a finally) is exposed via /api/stats and shown live in the viewer's Backup Statistics panel.
  • Graceful empty-topics UI: when a forum has no topics recorded yet, the viewer shows a context-aware hint and a "View all messages" button instead of a dead-end "No topics found".
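A minimal sketch of the reordered flow above, assuming a hypothetical FakeBackup class whose methods (upsert_chat, backup_forum_topics, backfill_media, refresh_stats) only record the order in which they run. The real _backup_dialog and backup_all do far more, and the is_forum flag here stands in for the actual forum/Channel check:

```python
import asyncio
import logging

log = logging.getLogger(__name__)


class FakeBackup:
    """Hypothetical stand-in that records the order of backup steps."""

    def __init__(self, fail_backstop: bool = False) -> None:
        self.steps: list[str] = []
        self.fail_backstop = fail_backstop
        self.in_backstop = False

    async def upsert_chat(self, chat: str) -> None:
        self.steps.append("upsert_chat")

    async def backup_forum_topics(self, chat: str) -> None:
        # Optionally simulate a topic error, but only in the end-of-run pass.
        if self.fail_backstop and self.in_backstop:
            raise RuntimeError("simulated topic error")
        self.steps.append("topics")

    async def backfill_media(self, chat: str) -> None:
        self.steps.append("media")  # the step that can run for a very long time

    async def refresh_stats(self) -> None:
        self.steps.append("stats")

    async def backup_dialog(self, chat: str, is_forum: bool) -> None:
        await self.upsert_chat(chat)
        if is_forum:
            # Record topics before the media backfill so the viewer
            # can list them within seconds.
            await self.backup_forum_topics(chat)
        await self.backfill_media(chat)

    async def backup_all(self, chats: list[tuple[str, bool]]) -> None:
        for chat, is_forum in chats:
            await self.backup_dialog(chat, is_forum)
        # End-of-run backstop: re-record topics idempotently, but never let
        # a topic error skip the stats refresh that follows.
        self.in_backstop = True
        for chat, is_forum in chats:
            if not is_forum:
                continue
            try:
                await self.backup_forum_topics(chat)
            except Exception:
                log.warning("end-of-run topic refresh failed; continuing")
        await self.refresh_stats()
```

Because recording topics is idempotent, running it both early and at the end is harmless, and the try/except keeps a failing topic call from costing the stats refresh.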

Tests

+15 tests covering: early forum fetch, topic pagination (>100), deleted-topic guard, the in-progress flag (cleared in finally even on exception), du sizing vs the DB-SUM fallback, and the new /api/stats field. Full suite: 1892 passing; ruff check . and ruff format --check . clean.
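The flag lifecycle those tests exercise can be sketched as follows. This assumes a hypothetical in-memory MetadataStore with set_metadata/get_metadata methods in place of the real adapter, and an api_stats function standing in for the /api/stats handler:

```python
import asyncio
from typing import Awaitable, Callable, Optional


class MetadataStore:
    """Hypothetical in-memory stand-in for the adapter's metadata table."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    async def set_metadata(self, key: str, value: str) -> None:
        self._data[key] = value

    async def get_metadata(self, key: str) -> Optional[str]:
        return self._data.get(key)


async def backup_all(store: MetadataStore, run_pass: Callable[[], Awaitable[None]]) -> None:
    # Raise the flag first so /api/stats reports it while the pass runs.
    await store.set_metadata("backup_in_progress", "1")
    try:
        await run_pass()
    finally:
        # Cleared even if run_pass raises, so the viewer does not keep
        # showing "backup running" after an exception.
        await store.set_metadata("backup_in_progress", "0")


async def api_stats(store: MetadataStore) -> dict:
    # Report the flag as a real boolean, not the stored string.
    flag = await store.get_metadata("backup_in_progress")
    return {"backup_in_progress": flag == "1"}
```

Note that a finally block covers exceptions, not a hard-killed process; in that case the flag would stay set until the next run clears it.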

Notes

  • GetForumTopicsRequest is correctly imported from telethon.tl.functions.messages for the pinned Telethon version (1.43.2); verified that the functions.channels variant does not exist there.
  • Deferred to a follow-up: parallelizing the media download path so large archives finish faster.

Closes #200

Summary by CodeRabbit

  • New Features

    • Added a live backup status indicator in the web interface.
    • Added a “view all messages” option for forums with no detected topics.
    • Forum topics are now fetched more reliably during backups, including larger topic lists.
  • Bug Fixes

    • Fixed the “0 topics” display issue in the web viewer during long backups.
    • Improved storage statistics to better match actual disk usage and use correct binary units.
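For reference, binary units divide by 1024 at each step. Printing a 1024-based quotient with a decimal "GB" label is off by about 7% at the GiB scale, since 1 GiB is 1,073,741,824 bytes. The viewer's own formatter lives in index.html. This Python sketch, with a hypothetical format_bytes name, only illustrates the convention:

```python
def format_bytes(n: int) -> str:
    """Format a byte count with binary (1024-based) unit labels."""
    if n < 1024:
        return f"{n} B"
    units = ["KiB", "MiB", "GiB", "TiB"]
    size = n / 1024
    # Stop at the first unit where the value drops below 1024;
    # anything larger is expressed in TiB, the biggest unit here.
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"
```

For example, format_bytes(190 * 1024**3) gives "190.0 GiB", the backlog size mentioned in the summary.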

Fixes #200 — forum/Topics groups showed "0 topics" during long media-heavy
backups, and the Storage stat lagged actual disk usage.

- fetch forum topics up front in _backup_dialog (before the media backfill)
  so the topic list appears within seconds instead of only at end-of-run;
  the end-of-run pass remains as an idempotent backstop
- paginate GetForumTopicsRequest beyond 100 topics and wrap each page in
  call_with_flood_retry; skip deleted (title-less) topics
- compute the "Storage" stat from actual on-disk usage (du, counting the
  deduplicated _shared store once) instead of SUM(file_size); label sizes
  with correct binary units (GiB/MiB/TiB)
- expose a backup_in_progress flag via /api/stats and show a live indicator
  in the viewer; offer a "View all messages" link when a forum has no
  topics recorded yet
- tests: early forum fetch, topic pagination, deleted-topic guard,
  backup-in-progress flag (cleared in finally), du sizing, stats field
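A du-style walk along the lines of compute_directory_size might look like this. It assumes, which the PR does not state, that the dedup _shared store is linked into per-chat folders by hard links or symlinks. Hard links share an inode, so each (st_dev, st_ino) pair is counted once, and os.lstat keeps symlinks from counting their targets a second time. Allocated blocks (st_blocks * 512) are used where the platform reports them, with st_size as the fallback. Unlike du, directory entries themselves are not counted:

```python
import os


def compute_directory_size(root: str) -> int:
    """Return du-style on-disk bytes under root, counting each inode once."""
    seen: set[tuple[int, int]] = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks into _shared
            except OSError:
                continue  # file removed mid-walk
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            blocks = getattr(st, "st_blocks", None)
            # Allocated size where the platform reports it, else apparent size.
            total += blocks * 512 if blocks is not None else st.st_size
    return total
```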

coderabbitai Bot commented Jun 25, 2026

Warning

Review limit reached

@GeiserX, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 28 minutes and 25 seconds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0b79b0dd-e076-495c-8538-4674b831c151

📥 Commits

Reviewing files that changed from the base of the PR and between e390394 and c3e3046.

📒 Files selected for processing (5)
  • src/db/adapter.py
  • src/telegram_backup.py
  • src/web/templates/index.html
  • tests/test_db_adapter.py
  • tests/test_telegram_backup_extended.py
📝 Walkthrough

Adds on-disk storage-based statistics, exposes a backup-in-progress flag through the API and web UI, updates forum-topic backup to fetch topics earlier and paginate them, and bumps the release to 7.17.0.

Changes

Backup stats, forum flow, and release metadata

| Layer | File(s) | Summary |
|---|---|---|
| On-disk storage statistics | src/message_utils.py, src/db/adapter.py, src/telegram_backup.py, src/web/main.py, src/web/templates/index.html, tests/test_db_adapter.py | A directory-size helper is added, calculate_and_store_statistics can read on-disk media usage from storage_path, stats refresh callers pass the backup path, the UI formats sizes with binary units, and tests cover both size sources. |
| Backup progress state | src/telegram_backup.py, src/web/main.py, src/web/templates/index.html, tests/test_web_routes.py, tests/test_telegram_backup_extended.py | backup_all() sets and clears backup_in_progress, /api/stats exposes it as a boolean, and the stats UI renders the indicator and loads the field from the API; tests cover the flag lifecycle and endpoint output. |
| Forum topic backup flow | src/telegram_backup.py, src/web/templates/index.html, tests/test_telegram_backup.py, tests/test_telegram_backup_extended.py | _backup_dialog() fetches forum topics before message iteration, _backup_forum_topics() paginates Telethon forum-topic pages and skips untitled deleted topics, the forum empty state adds a messages view action, and tests cover the new topic-fetching paths. |
| Release metadata | docs/CHANGELOG.md, pyproject.toml, src/__init__.py | The changelog entry, package version, and module version are updated to 7.17.0. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 73.81% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly reflects the main forum-backup and storage-stat changes in this PR. |
| Description check | ✅ Passed | The description is detailed and covers the change, tests, and notes, though it doesn't follow the exact template format. |
| Linked Issues check | ✅ Passed | The changes satisfy #200 by fetching forum topics earlier, paginating topic loads, and showing progress while backups run. |
| Out of Scope Changes check | ✅ Passed | The remaining changes are support work for the stated objectives and don't introduce unrelated behavior. |


github-actions

🐳 Dev images published!

  • drumsergio/telegram-archive:dev
  • drumsergio/telegram-archive-viewer:dev

The dev/test instance will pick up these changes automatically (Portainer GitOps).

To test locally:

docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev

codecov Bot commented Jun 25, 2026

Codecov Report

❌ Patch coverage is 91.42857% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.35%. Comparing base (0c629f5) to head (c3e3046).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/telegram_backup.py | 92.95% | 5 Missing ⚠️ |
| src/message_utils.py | 87.50% | 2 Missing ⚠️ |
| src/web/main.py | 60.00% | 2 Missing ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##             main     #202      +/-   ##
==========================================
+ Coverage   92.21%   92.35%   +0.13%     
==========================================
  Files          25       25              
  Lines        7196     7261      +65     
==========================================
+ Hits         6636     6706      +70     
+ Misses        560      555       -5     
| Files with missing lines | Coverage Δ |
|---|---|
| src/__init__.py | 100.00% <100.00%> (ø) |
| src/db/adapter.py | 88.54% <100.00%> (+0.07%) ⬆️ |
| src/message_utils.py | 79.21% <87.50%> (+0.81%) ⬆️ |
| src/web/main.py | 86.52% <60.00%> (+0.01%) ⬆️ |
| src/telegram_backup.py | 91.79% <92.95%> (+0.83%) ⬆️ |

coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/db/adapter.py`:
- Around line 1210-1217: The total media size path in the stats calculation is
blocking the event loop because `compute_directory_size` is synchronous and used
from async flows like `refresh_stats` and `lifespan`. Update the storage-path
branch in `src/db/adapter.py` to offload `compute_directory_size(storage_path)`
to a worker thread using `asyncio` (ensure it is imported), while keeping the
database aggregate branch unchanged. Use the existing stats method around this
block to locate the fix.
- Around line 1210-1217: The Storage total in the media size calculation is
using the broader backup directory instead of the media-only location, which
inflates the result. Update the size path used in the media summary logic inside
the code that computes total media size so it uses the media directory from the
config (or explicitly filters out non-media files like the database file) rather
than the full backup path.

In `@src/telegram_backup.py`:
- Around line 2249-2251: The skip-topic debug log in the topic-handling block
leaks PII by printing topic.id. Update the logging around
self.config.should_skip_topic(...) to avoid mentioning the topic identifier or
title, and instead emit an aggregated count or a generic skip message. Keep the
change localized to the logger.debug call in this section so the behavior stays
the same while removing all per-topic identifiers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 37a1a6a3-7bdc-4bb8-9968-5a015b38d708

📥 Commits

Reviewing files that changed from the base of the PR and between 0c629f5 and e390394.

📒 Files selected for processing (12)
  • docs/CHANGELOG.md
  • pyproject.toml
  • src/__init__.py
  • src/db/adapter.py
  • src/message_utils.py
  • src/telegram_backup.py
  • src/web/main.py
  • src/web/templates/index.html
  • tests/test_db_adapter.py
  • tests/test_telegram_backup.py
  • tests/test_telegram_backup_extended.py
  • tests/test_web_routes.py

- run the du walk off the event loop (asyncio.to_thread) and outside the DB
  session so it never stalls the viewer or pins a connection
- fall back to the DB size when the storage path is missing/unmounted
  (du==0 while media exists) instead of caching a spurious 0
- topic pagination: keep partial results on a mid-pagination failure
  (avoid amplifying FloodWait via per-topic inference), anchor the
  message-based offsets on the last topic that has a top_message, and warn
  when the page cap truncates a very large forum
- match the early forum-topic fetch guard to the end-of-run loop
  (isinstance Channel) and wrap the end-of-run fetch in try/except so a
  topic error can't skip folders/stats; drop a chat-name INFO log (PII) and
  scrub topic ids from the skip-topic debug logs
- viewer: use an 'all' sentinel for "View all messages" so the topics
  sidebar and back navigation stay correct; label per-file sizes in KiB/MiB
- tests: offset_date advance via the messages map, seen_count termination
  with skipped topics, count=None, partial-result-on-failure, and the
  missing-storage-path DB fallback
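The pagination rules listed above can be sketched without Telethon. Everything here is hypothetical: Topic and Page are simplified shapes for a GetForumTopicsRequest result (with messages modeled as a dict from message id to date), fetch_page stands in for a request already wrapped in call_with_flood_retry, and MAX_PAGES is an arbitrary cap:

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Optional

log = logging.getLogger(__name__)

PAGE_SIZE = 100  # topics requested per page
MAX_PAGES = 50  # hypothetical safety cap


@dataclass
class Topic:
    id: int
    title: Optional[str]  # empty/None for deleted topics
    top_message: Optional[int]


@dataclass
class Page:
    count: Optional[int]  # server-reported total; may be None
    topics: list
    messages: dict = field(default_factory=dict)  # message id -> date


def fetch_all_topics(
    fetch_page: Callable[[int, int, int, int], Page],
    page_size: int = PAGE_SIZE,
    max_pages: int = MAX_PAGES,
) -> list:
    kept: list = []
    skipped = 0
    seen_count = 0  # every topic returned, including skipped ones
    offset_date = offset_id = offset_topic = 0
    for _ in range(max_pages):
        try:
            page = fetch_page(offset_date, offset_id, offset_topic, page_size)
        except Exception:
            # Keep what we have rather than falling back to per-topic
            # inference, which could amplify FloodWait.
            log.warning("topic pagination failed; keeping %d topics", len(kept))
            break
        if not page.topics:
            break
        seen_count += len(page.topics)
        for topic in page.topics:
            if not topic.title:
                skipped += 1  # deleted topic
            else:
                kept.append(topic)
        if len(page.topics) < page_size:
            break  # short page: assume nothing is left
        if page.count is not None and seen_count >= page.count:
            break
        # Anchor the next page on the last topic that has a top_message.
        anchor = next((t for t in reversed(page.topics) if t.top_message), None)
        if anchor is None:
            break
        offset_topic = anchor.id
        offset_id = anchor.top_message
        offset_date = page.messages.get(anchor.top_message, offset_date)
    else:
        log.warning("topic page cap (%d) reached; forum may be truncated", max_pages)
    if skipped:
        # Aggregate count only: no topic ids or titles in the log.
        log.debug("skipped %d deleted topics", skipped)
    return kept
```

Counting seen_count over all returned topics, not just kept ones, lets termination against count work even when deleted topics are filtered out.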

GeiserX merged commit 8757e33 into main on Jun 25, 2026
7 of 8 checks passed
GeiserX deleted the fix/issue-200-forum-topics-stats branch on June 25, 2026 at 21:07

Development

Successfully merging this pull request may close these issues.

[Bug]: Unable to display topic content for groups with "Topics" mode enabled
