
Deploy #23 (Draft)

amrit110 wants to merge 4 commits into `main` from `deploy`

Conversation

@amrit110 (Member)

Summary

Clickup Ticket(s): Link(s) if applicable.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check <src_dir>)
  • Manual testing performed (describe below)

Manual testing details:

Screenshots/Recordings

Related Issues

Deployment Notes

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

- Bump uv from 0.9.11 → 0.10.12 in all CI workflows to match local version
- Add index-strategy = "unsafe-best-match" to [tool.uv] so uv finds the
  correct torch/flash-attn variants across multiple indexes
- Regenerate uv.lock with uv 0.10.12 for a consistent cross-platform lock
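The second bullet corresponds to a `pyproject.toml` fragment along these lines (a sketch of the one setting named above; any surrounding index definitions are omitted):

```toml
# pyproject.toml
[tool.uv]
# Let uv pick the best-matching wheel across all configured indexes,
# instead of pinning each package to the first index that lists it.
# Needed so torch / flash-attn variants resolve across multiple indexes.
index-strategy = "unsafe-best-match"
```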

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
amrit110 and others added 3 commits March 20, 2026 19:50
nginx answers the TCP probe in <1s, moving the container into
"serving" mode where Cloud Run throttles CPU between requests.
The background Python import process (torch/transformers) then
starves for CPU between 5s health check intervals, pushing
startup from ~30s to 100s+.

Add --no-cpu-throttling so the background process always gets
CPU during cold start.
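As a sketch, the flag is passed at deploy time roughly like this (service name and region are placeholders, not values from this PR):

```shell
# Hypothetical deploy command; SERVICE and REGION are placeholders.
gcloud run deploy SERVICE \
  --region REGION \
  --no-cpu-throttling  # keep CPU allocated between requests, so the
                       # background torch/transformers import isn't starved
```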

Also fix a pre-existing health check bug: curl -sf writes the
HTTP code ("502") to stdout and then exits non-zero on 4xx/5xx,
causing || echo "000" to append "000" — HTTP becomes "502000"
which never matches "200". Switch to curl -s (no fail-on-error)
so the status code is captured cleanly. Bump retries 10→20 for
a 100s window as belt-and-suspenders.
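The concatenation bug can be illustrated with a small simulation (the helper here is illustrative; with `-f`, curl exits non-zero on 4xx/5xx while `-w '%{http_code}'` still prints the code):

```python
def simulate_curl(http_code: int, fail_on_error: bool) -> tuple[str, int]:
    """Model `curl -w '%{http_code}'`: the code is printed either way;
    with -f (fail_on_error) curl also exits non-zero on 4xx/5xx."""
    exit_code = 22 if fail_on_error and http_code >= 400 else 0
    return str(http_code), exit_code

# Buggy: HTTP=$(curl -sf ... || echo "000") — output and fallback concatenate.
out, rc = simulate_curl(502, fail_on_error=True)
http = out + ("000" if rc != 0 else "")
assert http == "502000"   # never equals "200", so the health check loops forever

# Fixed: curl -s (no -f) exits 0 on a 502, so the fallback never fires.
out, rc = simulate_curl(502, fail_on_error=False)
http = out + ("000" if rc != 0 else "")
assert http == "502"
```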

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rror

When the user submits text immediately after a cold start, nginx is up
but uvicorn hasn't finished loading yet (~30s). Previously this showed
a confusing browser alert("Analysis failed: Server error (502)").

Now:
- 502/503 with no tokens received → show "Server is starting up —
  retrying in 5s..." with a live countdown, auto-retry up to 10×
  (50s window covers uvicorn's ~30s startup)
- Other errors → show a styled inline error banner (magenta, fades
  after 8s) instead of a blocking alert() popup
- Tag the http status on thrown errors so detection doesn't rely on
  parsing the message string
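The detection rule in the first bullet can be sketched as follows (the function and constant names are illustrative, not the actual frontend code, which is JavaScript):

```python
RETRYABLE = {502, 503}
MAX_RETRIES = 10       # 10 retries x 5s delay = 50s window, covering ~30s startup
RETRY_DELAY_S = 5

def should_retry(status: int, tokens_received: bool, attempt: int) -> bool:
    # Only a cold-start gateway error with nothing streamed yet is retried;
    # a response that already produced tokens is treated as a real failure
    # and shown in the inline error banner instead.
    return status in RETRYABLE and not tokens_received and attempt < MAX_RETRIES

assert should_retry(502, tokens_received=False, attempt=0)
assert not should_retry(502, tokens_received=True, attempt=0)
assert not should_retry(500, tokens_received=False, attempt=0)
```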

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add slowapi>=0.1.9 dependency
- Add RATE_LIMIT env var (default 10/minute, configurable at deploy time)
- Add _get_client_ip helper respecting Cloud Run X-Forwarded-For header
- Apply @limiter.limit to /analyze and /analyze/stream endpoints
- Fix torch/torchvision/torchaudio uv sources to use platform markers
  so macOS falls back to PyPI CPU wheels instead of failing on CUDA index