Commit 59dd7c5

chore(drive): address review feedback — quota counter, orphan reap, docs
1 parent 50cd424 commit 59dd7c5

3 files changed

Lines changed: 175 additions & 0 deletions

README.md

Lines changed: 27 additions & 0 deletions
@@ -336,6 +336,33 @@ Tune `drive_idle_timeout_secs` (default 300) upward if you tunnel long-poll HTTP

 > **Security note:** `mhrv-drive-node` is effectively an open TCP relay for whoever has read/write access to the shared Drive folder — anything that can drop a `req-…mux-…bin` file in there can open arbitrary `host:port` connections through the node. Keep the folder narrowly scoped (one OAuth account, no link sharing) and don't run the node on a machine you don't control.

+### Onboarding a non-technical user (Android)
+
+Once one device has finished OAuth, you can hand the configured state to another via QR or text — no Cloud Console steps required on the receiving end. In the Drive section: **Share Drive setup** → **Show QR + payload** → copy / send the `mhrv-rs-setup://...` link via WhatsApp / Telegram / SMS. The recipient pastes the link, scans the QR, picks the QR image from their gallery, or just taps the link if their messenger linkifies it. The bundle includes the OAuth refresh token, so they don't run their own consent flow — they share the sharer's Google identity for the `drive.file` scope.
+
+Caveat: the **sharer** still needs an unfiltered path to `accounts.google.com` for the initial OAuth dance, since the consent page opens in their system browser. If your network blocks Google Accounts, do the initial OAuth on a different network (mobile data, friend's Wi-Fi) and then share the resulting setup. Recipients aren't bound by this — they get the refresh token via the QR or link.
+
+When the consent page warns _"Google hasn't verified this app"_, that's expected for personal Cloud projects in **Testing** publication status. Click **Advanced → Go to mhrv-drive (unsafe)** → grant the `drive.file` scope. This is the same flow as deploying an Apps Script for the existing modes.
+
+### Quota and reachability
+
+Google Drive's free-tier per-user quota is **1,000 requests per 100 seconds**. Default `drive_poll_ms = 100` plus `drive_flush_ms = 100` is comfortably below that even under heavy traffic, but if you lower either interval further or run a single OAuth identity across many devices you can exceed it. The Rust side logs a `WARN` at 80% of the quota and an `ERROR` once the quota is reached or exceeded — watch for `Drive API rate climbing` in the logs. Increase `drive_poll_ms` / `drive_flush_ms` if you see either message.
+
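As a rough check on the numbers above, the sketch below computes a ceiling on requests per 100-second window from the two intervals. It assumes, hypothetically, that every poll tick and every flush tick issues exactly one Drive request; the README does not state this, and idle ticks may well be skipped or batched, so the result is an upper bound rather than a measurement. The function and constant names are illustrative and are not part of mhrv-rs.

```rust
/// Length of the quota window described above, in milliseconds.
const WINDOW_MS: u64 = 100_000;
/// Per-user limit quoted above (requests per 100 s).
const QUOTA_PER_USER_100S: u64 = 1_000;

/// Ceiling on requests in one window for a single timer.
/// ASSUMPTION: one request per tick (not confirmed by the source).
fn max_calls_per_window(interval_ms: u64) -> u64 {
    if interval_ms == 0 {
        return u64::MAX; // a zero interval means an unbounded rate
    }
    WINDOW_MS / interval_ms
}

/// Worst-case share of the per-user quota, in whole percent,
/// for one device running both the poll and the flush timer.
fn worst_case_quota_percent(poll_ms: u64, flush_ms: u64) -> u64 {
    let calls = max_calls_per_window(poll_ms)
        .saturating_add(max_calls_per_window(flush_ms));
    calls.saturating_mul(100) / QUOTA_PER_USER_100S
}
```

Under this one-request-per-tick assumption, 100 ms for both intervals gives a ceiling of 2,000 calls per window and 500 ms for both gives 400, so how far a given configuration sits below the quota in practice depends on how many ticks actually reach Drive.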
+Before deploying, sanity-check that your network can actually reach Drive's edge IPs. The most informative test (from the host that will run `mhrv-drive-node` or the client):
+
+```bash
+curl --resolve www.googleapis.com:443:216.239.38.120 \
+  -I https://www.googleapis.com/drive/v3/files
+```
+
+A 401 response (no auth) is success — it means TCP reached Google and the TLS handshake completed. A connect timeout, RST, or TLS error means the same DPI / RST-injection path that affects the Apps Script outbound also hits Drive's API endpoint, and this mode won't work better than the existing Apps Script ones on that network.
+
+### Garbage collection
+
+Both sides reap their own files via `cleanup_loop` (every 5 s, deletes own files older than `OLD_FILE_TTL = 60 s` using Drive's `createdTime` so cross-machine clock skew can't false-positive). The poll path also auto-deletes peer files older than `STARTUP_STALE_TTL = 5 min` that look like leftovers from a previous run, and reaps orphan response files for our own client ID at the same TTL, which covers the edge case where `mhrv-drive-node` dies mid-batch and can't run its own cleanup.
+
+If you ever notice `MHRV-Drive` accumulating files past these windows, check the Live logs / Docker logs on both sides for poll errors that prevent the cleanup loop from firing.
+
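The two TTLs above amount to a small decision rule, sketched here under stated assumptions: the `Owner` enum and `should_reap` function are hypothetical names for illustration, and `created` stands in for Drive's `createdTime` already parsed into a `SystemTime`. This is not the project's `cleanup_loop`, only the age comparison it describes.

```rust
use std::time::{Duration, SystemTime};

/// TTLs quoted in the paragraph above.
const OLD_FILE_TTL: Duration = Duration::from_secs(60);
const STARTUP_STALE_TTL: Duration = Duration::from_secs(5 * 60);

/// Who wrote the file (hypothetical enum, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Owner {
    Own,
    Peer,
}

/// Returns true if a file created at `created` should be reaped at `now`.
/// Own files use the short TTL; peer leftovers use the long one.
fn should_reap(owner: Owner, created: SystemTime, now: SystemTime) -> bool {
    let ttl = match owner {
        Owner::Own => OLD_FILE_TTL,
        Owner::Peer => STARTUP_STALE_TTL,
    };
    // `duration_since` fails when `created` is later than `now`;
    // such a file is treated as "not old" rather than deleted.
    match now.duration_since(created) {
        Ok(age) => age > ttl,
        Err(_) => false,
    }
}
```

Passing `now` as a parameter keeps the rule testable without a real clock; a 2-minute-old peer file is kept while a 2-minute-old own file is reaped.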
 ## Running on OpenWRT (or any musl distro)

 The `*-linux-musl-*` archives ship a fully static CLI that runs on OpenWRT, Alpine, and any libc-less Linux userland. Put the binary on the router and start it as a service:

src/drive_tunnel.rs

Lines changed: 36 additions & 0 deletions
@@ -646,6 +646,42 @@ impl DriveEngine {
                         }
                     }
                 }
+
+                // Reap orphan peer files. Normal flow has each side
+                // deleting its own files via `cleanup_loop` above plus the
+                // `processed`-then-delete path in `poll_once`. The edge
+                // case is the peer dying mid-batch: a `res-*` file it
+                // wrote remains in the folder, the dead node can't run
+                // its own cleanup, and our own cleanup above only
+                // touches files matching our `my_dir` prefix. Without
+                // the block below, those orphans accumulate forever.
+                //
+                // Scoped to `<peer_dir>-<my_client_id>-mux-` so a single
+                // client sharing a folder with several others doesn't
+                // touch their in-flight files. Uses STARTUP_STALE_TTL
+                // (5 min) — much longer than the per-file lifetime in
+                // normal operation, so this only fires on the orphan
+                // case; a slow round-trip won't trip it.
+                let orphan_prefix = format!(
+                    "{}-{}-mux-",
+                    self.peer_dir.as_str(),
+                    self.client_id,
+                );
+                if !self.client_id.is_empty() {
+                    if let Ok(orphans) = self.backend.list_query(&orphan_prefix).await {
+                        if let Some(orphan_cutoff) =
+                            SystemTime::now().checked_sub(STARTUP_STALE_TTL)
+                        {
+                            for file in orphans {
+                                if let Some(created) = file.created_time {
+                                    if created < orphan_cutoff {
+                                        let _ = self.backend.delete(&file.name).await;
+                                    }
+                                }
+                            }
+                        }
+                    }
+                }
             }
         }
     }
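To make the scoping in this hunk easier to reason about, here is a standalone sketch of the same selection logic with the Drive backend replaced by a plain slice. `FileMeta` and `orphans_to_delete` are hypothetical stand-ins, the file names in any usage are invented, and the sketch assumes that `list_query` behaves like a name-prefix match, which the diff does not show.

```rust
use std::time::{Duration, SystemTime};

const STARTUP_STALE_TTL: Duration = Duration::from_secs(5 * 60);

/// Minimal stand-in for a Drive listing entry (hypothetical type).
struct FileMeta {
    name: String,
    created_time: Option<SystemTime>,
}

/// Builds the `<peer_dir>-<client_id>-mux-` prefix used in the hunk above.
fn orphan_prefix(peer_dir: &str, client_id: &str) -> String {
    format!("{}-{}-mux-", peer_dir, client_id)
}

/// Names of files that match our prefix and are older than the stale TTL.
/// ASSUMPTION: prefix matching approximates what `list_query` returns.
fn orphans_to_delete<'a>(
    files: &'a [FileMeta],
    peer_dir: &str,
    client_id: &str,
    now: SystemTime,
) -> Vec<&'a str> {
    // An empty client ID would widen the prefix to every client, so bail out.
    if client_id.is_empty() {
        return Vec::new();
    }
    let prefix = orphan_prefix(peer_dir, client_id);
    // `checked_sub` returns None if the cutoff cannot be represented.
    let Some(cutoff) = now.checked_sub(STARTUP_STALE_TTL) else {
        return Vec::new();
    };
    files
        .iter()
        .filter(|f| f.name.starts_with(&prefix))
        // Files without a creation time are never deleted.
        .filter(|f| matches!(f.created_time, Some(t) if t < cutoff))
        .map(|f| f.name.as_str())
        .collect()
}
```

The empty-ID guard matters because `format!("{}-{}-mux-", "res", "")` yields `res--mux-`, a prefix that no longer identifies this client; skipping the reap in that case is the conservative choice, as the hunk also does.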

src/google_drive.rs

Lines changed: 112 additions & 0 deletions
@@ -9,9 +9,18 @@ use std::collections::HashMap;
 use std::fs;
 use std::io::Write;
 use std::path::PathBuf;
+use std::sync::atomic::{AtomicU64, Ordering};
 use std::sync::Arc;
 use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

+/// Google Drive's free-tier quota is 1000 requests / 100 s / user. We
+/// surface a warning at 80% and an error past 100% so operators see
+/// quota pressure in logs before Drive starts handing back 403s. The
+/// per-project ceiling (10k / 100s / project) is shared across all
+/// users of an OAuth client and isn't observable from one client.
+const DRIVE_QUOTA_PER_USER_100S: u64 = 1000;
+const DRIVE_QUOTA_WARN_THRESHOLD: u64 = (DRIVE_QUOTA_PER_USER_100S * 80) / 100;
+
 use bytes::Bytes;
 use http_body_util::{BodyExt, Full};
 use hyper::client::conn::http2::SendRequest;
@@ -34,6 +43,90 @@ const GOOGLE_API_HOST: &str = "www.googleapis.com";
 const DRIVE_SCOPE: &str = "https://www.googleapis.com/auth/drive.file";
 const HTTP_TIMEOUT: Duration = Duration::from_secs(60);

+/// Lock-free counter of Drive REST calls within a fixed 100-second
+/// bucket. The bucket boundary is best-effort — a request right at the
+/// 100s mark may land in the previous or next window, which is fine
+/// since Google's quota is also approximate. Used purely for logging
+/// and the [`QuotaSnapshot`] surface; doesn't gate or rate-limit.
+#[derive(Debug)]
+pub struct QuotaTracker {
+    start: Instant,
+    bucket_start_secs: AtomicU64,
+    bucket_count: AtomicU64,
+    total: AtomicU64,
+}
+
+impl QuotaTracker {
+    fn new() -> Self {
+        Self {
+            start: Instant::now(),
+            bucket_start_secs: AtomicU64::new(0),
+            bucket_count: AtomicU64::new(0),
+            total: AtomicU64::new(0),
+        }
+    }
+
+    /// Bump on every Drive REST call. Returns the count for the
+    /// current 100-second window so callers can decide if the rate
+    /// looks scary. Logs a warning at 80% of the per-user quota and
+    /// an error past 100%, throttled to once per 50 calls past the
+    /// limit so we don't spam the log under sustained overrun.
+    fn record_call(&self) -> u64 {
+        let now_secs = self.start.elapsed().as_secs();
+        let bucket = self.bucket_start_secs.load(Ordering::Relaxed);
+        if now_secs.saturating_sub(bucket) >= 100 {
+            // Fixed window: stale bucket → reset. Race-prone in the
+            // strict sense (two threads can both reset) but the
+            // off-by-one calls don't matter for a logging counter.
+            self.bucket_start_secs.store(now_secs, Ordering::Relaxed);
+            self.bucket_count.store(0, Ordering::Relaxed);
+        }
+        let count = self.bucket_count.fetch_add(1, Ordering::Relaxed) + 1;
+        self.total.fetch_add(1, Ordering::Relaxed);
+        if count == DRIVE_QUOTA_WARN_THRESHOLD {
+            tracing::warn!(
+                "Drive API rate climbing: {}/100s — free-tier limit is {}/100s/user. \
+                 Consider increasing drive_poll_ms / drive_flush_ms to slow down.",
+                count,
+                DRIVE_QUOTA_PER_USER_100S,
+            );
+        } else if count >= DRIVE_QUOTA_PER_USER_100S && count.is_multiple_of(50) {
+            tracing::error!(
+                "Drive API rate {}/100s — exceeded free-tier per-user quota ({}/100s). \
+                 Expect 403/429 responses. Drive returns these with no Retry-After, so \
+                 the caller has to back off itself.",
+                count,
+                DRIVE_QUOTA_PER_USER_100S,
+            );
+        }
+        count
+    }
+
+    /// Snapshot of the live counters for UI display. Cheap (atomic
+    /// loads only, no allocation).
+    pub fn snapshot(&self) -> QuotaSnapshot {
+        QuotaSnapshot {
+            total: self.total.load(Ordering::Relaxed),
+            current_window: self.bucket_count.load(Ordering::Relaxed),
+            window_secs: 100,
+            quota_per_user: DRIVE_QUOTA_PER_USER_100S,
+        }
+    }
+}
+
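The comment in `record_call` accepts that two threads can both reset the bucket. For comparison, the sketch below shows one way to narrow that race with `compare_exchange`, so that only the thread that wins the swap zeroes the count. It is an alternative for discussion, not what the commit does; `WindowCounter` and `record_call_at` are hypothetical names, and the clock is passed in as `now_secs` so the behaviour can be checked deterministically. A call that lands between the swap and the `store(0)` can still be lost, so the counter remains approximate.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const WINDOW_SECS: u64 = 100;

/// Fixed-window call counter (hypothetical type mirroring two of the
/// fields of the `QuotaTracker` in the hunk above).
struct WindowCounter {
    bucket_start_secs: AtomicU64,
    bucket_count: AtomicU64,
}

impl WindowCounter {
    fn new() -> Self {
        Self {
            bucket_start_secs: AtomicU64::new(0),
            bucket_count: AtomicU64::new(0),
        }
    }

    /// Records one call at `now_secs` and returns the count for the
    /// current window.
    fn record_call_at(&self, now_secs: u64) -> u64 {
        let bucket = self.bucket_start_secs.load(Ordering::Relaxed);
        if now_secs.saturating_sub(bucket) >= WINDOW_SECS {
            // Only the thread whose swap succeeds resets the count;
            // a thread that loses the race skips the reset.
            if self
                .bucket_start_secs
                .compare_exchange(bucket, now_secs, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
            {
                self.bucket_count.store(0, Ordering::Relaxed);
            }
        }
        self.bucket_count.fetch_add(1, Ordering::Relaxed) + 1
    }
}
```

Whether the extra atomic operation is worth it for a logging-only counter is a judgment call; the original comment's reasoning that off-by-a-few counts are harmless still applies.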
+/// Read-only view of the quota counter for UI / status surfaces.
+/// `current_window` is the count of API calls in the most recent
+/// 100-second bucket; `quota_per_user` is the documented free-tier
+/// limit. Workspace / paid Cloud projects get higher ceilings, but
+/// without knowing the user's project tier we display the floor.
+#[derive(Clone, Copy, Debug, Default, serde::Serialize)]
+pub struct QuotaSnapshot {
+    pub total: u64,
+    pub current_window: u64,
+    pub window_secs: u64,
+    pub quota_per_user: u64,
+}
+
 #[derive(Debug, thiserror::Error)]
 pub enum DriveError {
     #[error("io: {0}")]
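A status surface consuming `QuotaSnapshot` might turn it into a percentage and a one-line summary along these lines. The struct is redeclared (without the `serde` derive) so the sketch compiles on its own; `utilization_percent` and `render_meter` are hypothetical helpers, and the output format is invented for illustration.

```rust
/// Same fields as the `QuotaSnapshot` added in this commit, redeclared
/// here so the sketch is self-contained.
#[derive(Clone, Copy, Debug, Default)]
struct QuotaSnapshot {
    total: u64,
    current_window: u64,
    window_secs: u64,
    quota_per_user: u64,
}

/// Share of the per-user quota used in the current window, in whole
/// percent. Returns 0 for a zero quota, which is what a
/// `Default`-constructed snapshot carries.
fn utilization_percent(s: &QuotaSnapshot) -> u64 {
    if s.quota_per_user == 0 {
        return 0;
    }
    s.current_window.saturating_mul(100) / s.quota_per_user
}

/// One-line summary for a log line or status view (hypothetical format).
fn render_meter(s: &QuotaSnapshot) -> String {
    format!(
        "Drive API: {}/{} calls in the current {} s window ({}%), {} total",
        s.current_window,
        s.quota_per_user,
        s.window_secs,
        utilization_percent(s),
        s.total,
    )
}
```

The zero-quota guard is there because the struct derives `Default`, so a consumer can legitimately receive a snapshot whose `quota_per_user` is 0.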
@@ -73,6 +166,10 @@ struct GoogleApiClient {
     /// The live HTTP/2 sender. `None` until first use, replaced if a
     /// request fails because the connection went away.
     sender: Mutex<Option<SendRequest<Full<Bytes>>>>,
+    /// Per-process Drive REST call counter. Wrapped in `Arc` so the
+    /// outer [`GoogleDriveBackend`] can hand snapshots to the UI
+    /// without holding a lock on this client.
+    quota: Arc<QuotaTracker>,
 }

 struct HttpResponse {
@@ -105,6 +202,7 @@ impl GoogleApiClient {
             host_header,
             tls_connector: TlsConnector::from(Arc::new(tls_config)),
             sender: Mutex::new(None),
+            quota: Arc::new(QuotaTracker::new()),
         }
     }

@@ -156,6 +254,12 @@ impl GoogleApiClient {
         headers: Vec<(String, String)>,
         body: &[u8],
     ) -> Result<HttpResponse, DriveError> {
+        // Tick the rate counter on every Drive REST call. This is what
+        // surfaces "you're about to hit the free-tier quota" warnings
+        // in the log without any UI work, and what feeds
+        // [`quota_snapshot()`] for surfaces that want to display it.
+        self.quota.record_call();
+
         // Build the request once. The :authority pseudo-header is what
         // routes the request inside Google's HTTP/2 frontend; it must be
         // the API host even though we're connected via SNI=front_domain.
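Because `record_call` now runs on every request, its logging rule decides how noisy the log gets under load. The rule can be isolated as a pure function, sketched below with a hypothetical `QuotaLog` enum and `quota_log_for` helper. It uses `count % 50 == 0`, which is equivalent to `is_multiple_of(50)` for a non-zero divisor.

```rust
const DRIVE_QUOTA_PER_USER_100S: u64 = 1000;
const DRIVE_QUOTA_WARN_THRESHOLD: u64 = (DRIVE_QUOTA_PER_USER_100S * 80) / 100;

/// Which log line, if any, a given window count triggers
/// (hypothetical enum, for illustration only).
#[derive(Debug, PartialEq)]
enum QuotaLog {
    Warn,
    Error,
}

/// Mirrors the branch structure in `record_call`: one warning exactly
/// at the 80% mark, then one error every 50 calls from the limit on.
fn quota_log_for(count: u64) -> Option<QuotaLog> {
    if count == DRIVE_QUOTA_WARN_THRESHOLD {
        Some(QuotaLog::Warn)
    } else if count >= DRIVE_QUOTA_PER_USER_100S && count % 50 == 0 {
        Some(QuotaLog::Error)
    } else {
        None
    }
}
```

Two consequences follow from the equality test on the warning branch: the warning fires once per window rather than on every call above 800, and counts such as 850 or 900 produce no log line at all until the limit itself is reached.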
@@ -776,6 +880,14 @@ impl GoogleDriveBackend {
     pub fn credentials_path(&self) -> &PathBuf {
         &self.credentials_path
     }
+
+    /// Snapshot of the Drive API rate counter — total calls since
+    /// process start plus the count in the most recent 100-second
+    /// window. Used by stats / status surfaces that want to render a
+    /// quota meter; cheap (atomic loads only).
+    pub fn quota_snapshot(&self) -> QuotaSnapshot {
+        self.api.quota.snapshot()
+    }
 }

 fn http_error(resp: HttpResponse) -> DriveError {
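The error message added in this commit notes that the caller has to back off itself when Drive answers with 403/429. The commit only counts and logs; it does not implement a backoff. A caller that wanted one could start from a capped exponential delay such as the sketch below, where `BASE_DELAY`, `MAX_DELAY`, `backoff_delay`, and `may_be_rate_limited` are all hypothetical, and the constants are arbitrary starting points rather than values recommended by Google or by this project.

```rust
use std::time::Duration;

/// Hypothetical tuning constants, chosen only for illustration.
const BASE_DELAY: Duration = Duration::from_millis(500);
const MAX_DELAY: Duration = Duration::from_secs(32);

/// Delay before retry number `attempt` (0-based):
/// `BASE_DELAY * 2^attempt`, capped at `MAX_DELAY`.
fn backoff_delay(attempt: u32) -> Duration {
    // `checked_shl` returns None once the shift reaches the bit width.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    BASE_DELAY
        .checked_mul(factor)
        .map_or(MAX_DELAY, |d| d.min(MAX_DELAY))
}

/// Statuses the log message above associates with quota pressure.
/// A 403 can also mean a plain permission error, so this is a hint only.
fn may_be_rate_limited(status: u16) -> bool {
    status == 403 || status == 429
}
```

In practice a random jitter is usually added to each delay so that several clients sharing one OAuth identity do not retry in lockstep; it is left out here to keep the function deterministic.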
