Tailor improvements v2#723
Conversation
…orkflow ## Partial change acceptance (diff preview) - Users can now toggle individual AI suggestions on/off in the diff preview modal - Added "Select all / Deselect all" controls and accepted/total count indicator - New `onConfirmPartial(acceptedIndices)` callback on DiffPreviewModal - Backend `/resumes/improve/confirm` accepts `partial_confirm` flag and filters improvements list to only accepted changes ## Job description from URL - New `POST /jobs/fetch-url` endpoint extracts job description from a public URL - Frontend tailor page gains a tab toggle: "Paste Text" / "From URL" - URL input with Fetch button; on success populates textarea and switches to text mode - New schemas: `FetchJobUrlRequest`, `FetchJobUrlResponse` - New i18n keys in all 5 languages (en, es, zh, ja, pt-BR): `inputMode`, `urlInput`, error messages ## Playwright-based extraction for JS-rendered portals - Rule-based `_SmartExtractor` (httpx) for static HTML sites - `SITE_MAP` with site-specific CSS selectors for known portals: - theprotocol.it → `[id='section-offerView']` - nofluffjobs.com → `[id='posting-description']` - pracuj.pl, justjoin.it, linkedin.com, indeed.com also configured - Playwright runs in `ThreadPoolExecutor` (sync API) — avoids Windows asyncio subprocess limitation with uvicorn SelectorEventLoop - All selectors are evaluated, longest match wins (prevents partial extraction) ## Granular LLM workflow - New `improve_resume_granular()`: two focused LLM calls instead of one large call - Call A: summary + technicalSkills only (IMPROVE_RESUME_PROMPT_SKILLS) - Call B: workExperience bullet descriptions only (IMPROVE_RESUME_PROMPT_EXPERIENCE) - New prompt templates and JSON schema examples in `prompts/templates.py` - Frontend workflow selector: "Standard" vs "Granular" mode passed as `workflow_mode` - Reduces hallucination risk by keeping each prompt small and focused ## Factual field protection - `_restore_factual_fields()`: after any LLM improvement, institution names, degree titles, employment dates, company names and job titles are always restored from the original resume — LLM cannot alter factual metadata Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| return None | ||
|
|
||
|
|
||
| def _is_private_host(hostname: str) -> bool: |
There was a problem hiding this comment.
[WARNING]: SSRF check can be bypassed with DNS rebinding attacks
The _is_private_host function resolves the hostname once at validation time, but a malicious DNS server could return a public IP initially and then rebind to a private IP for the actual request. This is a Time-Of-Check-To-Time-Of-Use (TOCTOU) vulnerability.
Consider:
- Using httpx's built-in
limitswithmax_redirects=0and validating after connection - Resolving DNS yourself and passing the IP directly to httpx
- Using a library designed for SSRF prevention
| raise HTTPException(status_code=400, detail=f"Unsupported provider: {provider}") | ||
|
|
||
| stored = _load_config() | ||
| api_key: str = stored.get("api_key", "") or settings.llm_api_key or "" |
There was a problem hiding this comment.
[WARNING]: API key may be empty or missing when fetching provider models
If stored.get("api_key") is empty string and settings.llm_api_key is also empty, the code will send requests with an empty Bearer token or query param. This should explicitly fail fast with a clear error message.
if provider != "anthropic" and not api_key:
raise HTTPException(
status_code=400,
detail=f"API key required to fetch models for {provider}. Please configure your API key first."
)| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| _playwright_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="playwright") |
There was a problem hiding this comment.
[SUGGESTION]: ThreadPoolExecutor may not be properly cleaned up
While atexit.register is used, this doesn't guarantee cleanup in all cases (e.g., SIGKILL, exceptions during shutdown). Consider using a context manager pattern or FastAPI's lifespan events for proper resource management.
Code Review SummaryStatus: 1 New Issue | 2 Issues Carried Forward | Recommendation: Address before merge Overview
Changes Since Last Review (abb24a6...83ebf97)
New Issues (this review)WARNING
Carried Forward Issues (unchanged code)WARNING
SUGGESTION
Previous Fixes Verified (from earlier commits)
Files Reviewed in Incremental Diff (1 file)
Fix these issues in Kilo Cloud Reviewed by claude-opus-4.5 · 259,264 tokens |
There was a problem hiding this comment.
18 issues found across 23 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/backend/app/routers/jobs.py">
<violation number="1" location="apps/backend/app/routers/jobs.py:138">
P1: SSRF guard only checks a single IPv4 resolution via gethostbyname, so hosts with multiple A/AAAA records (including private/IPv6) can bypass the check if the resolved address differs from the one validated.</violation>
<violation number="2" location="apps/backend/app/routers/jobs.py:157">
P2: The multiline noise regex strips any leading number on every line, which will remove legitimate numeric prefixes in job descriptions (e.g., numbered requirements or "5+ years"). This degrades extracted content.</violation>
<violation number="3" location="apps/backend/app/routers/jobs.py:576">
P0: Custom agent: **Flag Security Vulnerabilities**
SSRF bypass via open redirect: the private-host check only validates the *initial* URL, but `follow_redirects=True` lets httpx follow a 302 to an internal/cloud-metadata IP. An attacker's public URL can redirect to `http://169.254.169.254/...` to exfiltrate cloud credentials.
Either disable automatic redirects or add an httpx `event_hook` on `response` that re-validates each redirect target against `_is_private_host` before following it.</violation>
</file>
<file name="apps/backend/app/routers/resumes.py">
<violation number="1" location="apps/backend/app/routers/resumes.py:234">
P2: Index-based restoration of education/workExperience factual fields can corrupt data if the LLM reorders entries; entries should be matched by id (or another stable key) instead of list position.</violation>
<violation number="2" location="apps/backend/app/routers/resumes.py:610">
P2: Immutable factual fields are restored before `refine_resume`, but refinement can mutate those fields (LLM keyword injection and recursive phrase removal). Since the refined output overwrites `improved_data` without a post-refinement restore, the final output can drift from the original factual data.</violation>
<violation number="3" location="apps/backend/app/routers/resumes.py:799">
P2: Custom agent: **Flag Security Vulnerabilities**
Client-controlled `partial_confirm` flag bypasses the server-side hash integrity check that ensures confirmed data matches a prior preview. Any client can set `partial_confirm: true` to submit arbitrary `improved_data` that was never generated or validated by the server, undermining the preview/confirm integrity flow. Consider enforcing the hash check server-side unconditionally, or storing partial edits server-side and validating them against the original preview.</violation>
<violation number="4" location="apps/backend/app/routers/resumes.py:1286">
P2: DB update failures are silently swallowed in retry failure path, allowing API to return `processing_status='failed'` even when that status was not persisted.</violation>
</file>
<file name="apps/backend/app/routers/health.py">
<violation number="1" location="apps/backend/app/routers/health.py:24">
P2: LLM health cache is global and not keyed by config, so `/status` can return health results for a previous provider/model/api_key when config changes within the TTL.</violation>
<violation number="2" location="apps/backend/app/routers/health.py:32">
P2: Cache timestamp is recorded before the long health check and then stored unchanged, so slow/timeout checks write an already-expired cache entry and force repeated expensive probes.</violation>
</file>
<file name="apps/frontend/components/ui/dropdown.tsx">
<violation number="1" location="apps/frontend/components/ui/dropdown.tsx:46">
P2: Filtering is enabled with `search.trim()` but matching uses the untrimmed `search`, so leading/trailing spaces can hide valid results.</violation>
<violation number="2" location="apps/frontend/components/ui/dropdown.tsx:130">
P2: New searchable dropdown UI strings are hardcoded in English instead of using the existing translation system.</violation>
</file>
<file name="apps/backend/app/schemas/models.py">
<violation number="1" location="apps/backend/app/schemas/models.py:387">
P2: customSections pre-validator drops valid CustomSection instances because it only preserves dict values, causing silent data loss when models are passed in before coercion.</violation>
</file>
<file name="apps/frontend/components/tailor/diff-preview-modal.tsx">
<violation number="1" location="apps/frontend/components/tailor/diff-preview-modal.tsx:40">
P2: acceptedIndices is initialized from detailedChanges only on first render and never resynced, so reopening the modal or receiving new detailedChanges can leave stale selections and submit indices from a previous diff.</violation>
</file>
<file name="apps/frontend/app/(default)/settings/page.tsx">
<violation number="1" location="apps/frontend/app/(default)/settings/page.tsx:305">
P2: Async model fetch results can overwrite the model list after the user switches providers, because the response is applied without verifying the provider is still current or canceling in‑flight requests.</violation>
<violation number="2" location="apps/frontend/app/(default)/settings/page.tsx:785">
P2: User-facing text for the new model fetch UI is hardcoded instead of using the page’s translation system, causing inconsistent localization.</violation>
</file>
<file name="apps/backend/app/llm.py">
<violation number="1" location="apps/backend/app/llm.py:373">
P2: Ollama health-check fallback still hardcodes localhost, bypassing the Windows loopback normalization when api_base is unset.</violation>
</file>
<file name="apps/backend/app/routers/config.py">
<violation number="1" location="apps/backend/app/routers/config.py:255">
P2: `/provider-models` ignores provider-specific API keys stored in `config["api_keys"]` and always uses the legacy global `api_key`. This breaks multi-provider setups and causes wrong-auth/401s when switching providers.</violation>
</file>
<file name="apps/backend/app/services/improver.py">
<violation number="1" location="apps/backend/app/services/improver.py:282">
P2: Call B merges workExperience descriptions by list index even though each entry has an id. If the LLM reorders or inserts entries, descriptions can be written onto the wrong experience record, corrupting resume content.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
24 issues found across 100 files (changes from recent commits).
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name=".claude/skills/brainstorming/scripts/helper.js">
<violation number="1" location=".claude/skills/brainstorming/scripts/helper.js:2">
P2: Custom agent: **Flag Security Vulnerabilities**
WebSocket URL falls back to insecure `ws://` when not on HTTPS, violating the requirement that network requests use HTTPS/TLS.</violation>
</file>
<file name="apps/frontend/app/(default)/tailor/page.tsx">
<violation number="1" location="apps/frontend/app/(default)/tailor/page.tsx:156">
P2: Custom agent: **React Performance and Best Practices**
Unused `elapsed` state is updated every second, forcing unnecessary re-renders during generate/regenerate without affecting UI or logic.</violation>
</file>
<file name="apps/backend/app/services/parser.py">
<violation number="1" location="apps/backend/app/services/parser.py:61">
P2: Month restoration fails for ongoing ranges like "2021 - Present" because the lookup key only uses numeric years. When markdown has "May 2021 - Present", the key becomes "2021", so parsed entries with "2021 - Present" never match and months are not restored for current roles.</violation>
</file>
<file name="apps/frontend/messages/pt-BR.json">
<violation number="1" location="apps/frontend/messages/pt-BR.json:331">
P2: Date placeholder uses Portuguese month abbreviations, but resume date strings must use English month abbreviations ('Mon YYYY') for downstream parsing.</violation>
</file>
<file name="apps/frontend/messages/ja.json">
<violation number="1" location="apps/frontend/messages/ja.json:297">
P2: Japanese date placeholders now use numeric month text (e.g., "2020年1月 - 現在"), but the app requires English month abbreviations in "Mon YYYY" format; this placeholder will drive invalid input for parsers/normalizers.</violation>
<violation number="2" location="apps/frontend/messages/ja.json:331">
P2: Education placeholder uses Japanese numeric month formatting instead of the required English month abbreviations (Mon YYYY), which can lead to incompatible date parsing.</violation>
<violation number="3" location="apps/frontend/messages/ja.json:346">
P2: Project date placeholder uses a Japanese numeric month format, but date strings must use English month abbreviations ('Mon YYYY') for parser compatibility.</violation>
</file>
<file name="apps/frontend/lib/api/client.ts">
<violation number="1" location="apps/frontend/lib/api/client.ts:65">
P2: apiFetch always overrides any caller-provided AbortSignal, so external cancellation no longer works for callers that pass `options.signal`.</violation>
</file>
<file name="apps/backend/app/config_cache.py">
<violation number="1" location="apps/backend/app/config_cache.py:45">
P2: Cache TTL is bypassed when the cached config is an empty dict because the cache hit check relies on `_config_cache` truthiness. This causes repeated disk reads/logging even within the TTL when config.json is empty or missing.</violation>
</file>
<file name=".claude/skills/brainstorming/scripts/start-server.sh">
<violation number="1" location=".claude/skills/brainstorming/scripts/start-server.sh:125">
P2: Fixed 5-second readiness timeout can report failure while leaving a detached server process running.</violation>
</file>
<file name="apps/frontend/messages/es.json">
<violation number="1" location="apps/frontend/messages/es.json:297">
P2: Resume date placeholders now use Spanish month abbreviations (e.g., "Ene", "Ago"), but the parser expects English 'Mon YYYY' format only. Users following these placeholders can enter dates that won’t parse correctly.</violation>
</file>
<file name=".claude/skills/brainstorming/scripts/stop-server.sh">
<violation number="1" location=".claude/skills/brainstorming/scripts/stop-server.sh:24">
P2: Prefix-only check for /tmp can be bypassed with `..` or symlinks, allowing `rm -rf` to delete paths outside /tmp.</violation>
</file>
<file name="apps/backend/tests/conftest.py">
<violation number="1" location="apps/backend/tests/conftest.py:57">
P2: Education years uses numeric range instead of required 'Mon YYYY' format, which can break date validation/regex expectations for ResumeData fixtures.</violation>
</file>
<file name=".claude/skills/systematic-debugging/find-polluter.sh">
<violation number="1" location=".claude/skills/systematic-debugging/find-polluter.sh:29">
P2: `find -path` won’t match the example glob pattern (`src/**/*.test.ts`) because it matches `./`-prefixed paths and doesn’t support globstar semantics. This can incorrectly report “No test files found” even when tests exist.</violation>
<violation number="2" location=".claude/skills/systematic-debugging/find-polluter.sh:40">
P2: Unquoted iteration over `$TEST_FILES` causes word-splitting, breaking test paths containing whitespace.</violation>
</file>
<file name="apps/frontend/messages/zh.json">
<violation number="1" location="apps/frontend/messages/zh.json:297">
P2: Chinese numeric date placeholders now suggest formats the parser doesn’t recognize; date parsing/normalization only matches English month names/abbreviations, so these placeholders can lead to unparseable input.</violation>
</file>
<file name="apps/backend/app/routers/config.py">
<violation number="1" location="apps/backend/app/routers/config.py:303">
P2: Provider model discovery uses legacy `api_key` instead of provider-specific stored keys, so `/config/provider-models` can call provider APIs with the wrong or empty key.</violation>
</file>
<file name="apps/backend/app/services/refiner.py">
<violation number="1" location="apps/backend/app/services/refiner.py:299">
P2: Word-boundary matching in _keyword_in_text fails for skills with non-word characters (e.g., C++, C#, Node.js). The new alignment check uses this helper to decide whether to flag a skill as fabricated; false negatives will remove legitimate skills.</violation>
</file>
<file name="apps/backend/app/prompts/templates.py">
<violation number="1" location="apps/backend/app/prompts/templates.py:169">
P1: Prompt date rules preserve source month text without enforcing English month abbreviations, conflicting with English-only month parsing logic downstream.</violation>
</file>
<file name=".claude/skills/brainstorming/scripts/server.js">
<violation number="1" location=".claude/skills/brainstorming/scripts/server.js:66">
P1: decodeFrame allocates a buffer based solely on the client-provided payload length without enforcing a maximum frame size, allowing a large frame to exhaust memory and crash the server.</violation>
<violation number="2" location=".claude/skills/brainstorming/scripts/server.js:281">
P2: knownFiles is never updated on deletions, so re-created screen files with the same name are treated as updates and won’t reset .events.</violation>
</file>
<file name="apps/backend/app/routers/resumes.py">
<violation number="1">
P2: Preview hash is computed from raw LLM output while confirm hashes the Pydantic-normalized payload, so defaults/coercions can change the data and cause valid confirmations to be rejected due to hash mismatch.</violation>
<violation number="2" location="apps/backend/app/routers/resumes.py:716">
P2: Factual-field restoration for education/work history was removed; after LLM output only personalInfo is preserved, so company/title/years and education fields can now be mutated and persisted without correction.</violation>
</file>
<file name="apps/backend/app/routers/enrichment.py">
<violation number="1" location="apps/backend/app/routers/enrichment.py:63">
P2: `_extract_item_from_resume` can crash on malformed list elements because it calls `.get()` on `entries[index]` without verifying the entry is a dict.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| if (buffer.length < totalLen) return null; | ||
|
|
||
| const mask = buffer.slice(maskOffset, dataOffset); | ||
| const data = Buffer.alloc(payloadLen); |
There was a problem hiding this comment.
P1: decodeFrame allocates a buffer based solely on the client-provided payload length without enforcing a maximum frame size, allowing a large frame to exhaust memory and crash the server.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/skills/brainstorming/scripts/server.js, line 66:
<comment>decodeFrame allocates a buffer based solely on the client-provided payload length without enforcing a maximum frame size, allowing a large frame to exhaust memory and crash the server.</comment>
<file context>
@@ -0,0 +1,338 @@
+ if (buffer.length < totalLen) return null;
+
+ const mask = buffer.slice(maskOffset, dataOffset);
+ const data = Buffer.alloc(payloadLen);
+ for (let i = 0; i < payloadLen; i++) {
+ data[i] = buffer[dataOffset + i] ^ mask[i % 4];
</file context>
| debounceTimers.delete(filename); | ||
| const filePath = path.join(SCREEN_DIR, filename); | ||
|
|
||
| if (!fs.existsSync(filePath)) return; // file was deleted |
There was a problem hiding this comment.
P2: knownFiles is never updated on deletions, so re-created screen files with the same name are treated as updates and won’t reset .events.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/skills/brainstorming/scripts/server.js, line 281:
<comment>knownFiles is never updated on deletions, so re-created screen files with the same name are treated as updates and won’t reset .events.</comment>
<file context>
@@ -0,0 +1,338 @@
+ debounceTimers.delete(filename);
+ const filePath = path.join(SCREEN_DIR, filename);
+
+ if (!fs.existsSync(filePath)) return; // file was deleted
+ touchActivity();
+
</file context>
# Conflicts: # apps/backend/app/llm.py # apps/backend/app/prompts/templates.py # apps/backend/app/routers/resumes.py # apps/backend/app/services/improver.py # apps/frontend/app/(default)/tailor/page.tsx # apps/frontend/components/tailor/diff-preview-modal.tsx # apps/frontend/next.config.ts
| return f"data: {json.dumps(payload)}\n\n" | ||
|
|
||
|
|
||
| def _load_config() -> dict: |
There was a problem hiding this comment.
[WARNING]: Local function shadows imported alias
This _load_config() function shadows the load_config as _load_config imported from config_cache at line 17. The imported version provides 5-minute TTL caching, but this local version reads directly from disk on every call.
Impact:
- The import at line 17 becomes dead code
_load_feature_config()and_get_content_language()below use this uncached version- Config is read from disk on every request instead of using the cache
Recommended fix: Remove lines 73-94 (local _load_config, _load_feature_config, and _get_content_language functions) and use the imported versions consistently:
# At line 17, change to:
from app.config_cache import get_content_language, load_config
# Then use load_config() and get_content_language() throughoutThere was a problem hiding this comment.
2 issues found across 6 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/backend/app/services/refiner.py">
<violation number="1" location="apps/backend/app/services/refiner.py:55">
P1: ASCII-only boundary checks regress Unicode word-boundary matching and can produce false positives in non-English text.</violation>
</file>
<file name="apps/backend/app/services/parser.py">
<violation number="1" location="apps/backend/app/services/parser.py:70">
P2: Ongoing date restoration is incomplete: only `"YYYY - Present"` is mapped, so other accepted ongoing tokens/casing fail exact lookup and months are not restored.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
2 issues found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/backend/app/routers/health.py">
<violation number="1" location="apps/backend/app/routers/health.py:32">
P2: Global cache lock is held across awaited health check I/O, serializing concurrent `/status` requests and causing head-of-line blocking.</violation>
</file>
<file name="apps/backend/app/routers/config.py">
<violation number="1" location="apps/backend/app/routers/config.py:267">
P1: New API-key guard can incorrectly reject valid provider configs because it checks only legacy `api_key` instead of provider-resolved keys.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/backend/app/routers/health.py">
<violation number="1" location="apps/backend/app/routers/health.py:39">
P2: Cache stampede risk: moving health check outside the lock allows concurrent `/status` requests to trigger duplicate expensive LLM health checks when cache is cold/expired.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| async def _run_health_check(cache_key: str, config) -> dict: # type: ignore[no-untyped-def] | ||
| """Execute a health check, store the result, and remove the in-flight marker.""" | ||
| try: | ||
| result = await asyncio.wait_for(check_llm_health(config), timeout=60.0) |
There was a problem hiding this comment.
[WARNING]: Task cancellation leaves stale entry in _health_inflight
If the task is cancelled (e.g., client disconnect, server shutdown), asyncio.CancelledError propagates from this await, skipping the cleanup in the async with _health_cache_lock block below. The cancelled task remains in _health_inflight, causing all subsequent requests to await a cancelled task that immediately raises CancelledError.
Consider handling CancelledError explicitly to clean up _health_inflight:
async def _run_health_check(cache_key: str, config) -> dict:
try:
result = await asyncio.wait_for(check_llm_health(config), timeout=60.0)
except asyncio.CancelledError:
async with _health_cache_lock:
_health_inflight.pop(cache_key, None)
raise
except Exception as e:
logger.warning("LLM health check failed: %s", e)
result = {"healthy": False, "error": "health check timed out or failed"}
async with _health_cache_lock:
_health_cache[cache_key] = {"result": result, "ts": time.monotonic()}
_health_inflight.pop(cache_key, None)
return resultThere was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/backend/app/routers/health.py">
<violation number="1" location="apps/backend/app/routers/health.py:56">
P1: Shared in-flight health task can be cancelled by a requester and left poisoned in `_health_inflight`, causing repeated `/status` failures.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| if task is None: | ||
| task = asyncio.ensure_future(_run_health_check(cache_key, config)) | ||
| _health_inflight[cache_key] = task | ||
| return await task |
There was a problem hiding this comment.
P1: Shared in-flight health task can be cancelled by a requester and left poisoned in _health_inflight, causing repeated /status failures.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/backend/app/routers/health.py, line 56:
<comment>Shared in-flight health task can be cancelled by a requester and left poisoned in `_health_inflight`, causing repeated `/status` failures.</comment>
<file context>
@@ -16,33 +16,44 @@
+ if task is None:
+ task = asyncio.ensure_future(_run_health_check(cache_key, config))
+ _health_inflight[cache_key] = task
+ return await task
</file context>
| return await task | |
| return await asyncio.shield(task) |
Code Review Report — PR #723: Tailor Improvements v2Reviewer: @srbhr (via automated code review) Summary
CRITICAL (must fix before merge)C1. SSRF bypass via IPv6 —
|
| # | File | Issue |
|---|---|---|
| M1 | jobs.py |
No response size limit on fetched HTML — a malicious URL could serve 100MB+ and consume server memory |
| M2 | jobs.py |
_LLM_FILTER_PROMPT concatenates raw web page text without sanitization — prompt injection risk |
| M3 | jobs.py |
Playwright ThreadPoolExecutor(max_workers=2) has no queue depth limit — potential DoS under burst |
| M4 | jobs.py |
_SmartExtractor._in_skip() scans entire stack per character event — O(depth × nodes) |
| M5 | health.py |
asyncio.ensure_future() is deprecated — use asyncio.create_task() |
| M6 | config.py |
_ANTHROPIC_MODELS list is hardcoded and will become stale |
| M7 | config.py |
Error message echoes raw provider user input back in response |
| M8 | jobs.py |
_apply_site_cleanup never reads trim_before_last config key — dead config |
| M9 | tailor/page.tsx |
Frontend enforces HTTPS-only but backend accepts HTTP — inconsistent validation |
| M10 | tailor/page.tsx |
JSON.parse(JSON.stringify()) for deep clone — structuredClone() is more robust |
| M11 | tailor/page.tsx |
revertChange array removal via indexOf may remove wrong duplicate entry |
| M12 | config.ts |
provider param not URL-encoded in API fetch call |
| M13 | tests/diff-preview-modal.test.tsx |
No new tests for checkbox toggle, select all/none, partial confirm, disabled confirm button |
LOW
| # | File | Issue |
|---|---|---|
| L1 | refiner.py |
Regex compiled inside _keyword_in_text() on every call — move to module level |
| L2 | parser.py |
_ongoing_re compiled inside function body — move to module level |
| L3 | resumes.py |
generate() return type is None but function is an AsyncGenerator |
| L4 | resumes.py |
Inner function named generate could shadow imports |
| L5 | llm.py / config.py |
Ollama URL "http://127.0.0.1:11434" hardcoded in 4+ places — extract to constant |
| L6 | tailor/page.tsx |
handleConfirmPartial duplicates most of handleConfirmChanges — extract common logic |
| L7 | tailor/page.tsx |
result as unknown as ResumeData double cast bypasses type checking after mutations |
| L8 | dropdown.tsx |
searchable prop is backwards-compatible, no issues |
| L9 | tailor/page.tsx |
Timer/interval cleanup is correctly implemented |
What's Good
- SSE streaming implementation is well-structured with proper buffer parsing
- Timer cleanup and
useEffectteardown are correct - SITE_MAP approach for known job boards is solid
- Fallback chain (CSS selectors → Playwright → LLM filter) is reasonable
- Dropdown
searchableprop is backwards-compatible
Recommendation: Fix the 4 CRITICAL issues and the HIGH-severity security items (H2, H3, H4) before this can be merged. The SSRF issues in particular need attention since this PR introduces a URL-fetching endpoint that accepts user-provided URLs.
|
@Paranoidal97 I think we need to split up the PR into smaller chunks. And maybe introduce a plugin ecosystem that people can use to perform more actions on top of this. |
There was a problem hiding this comment.
1 issue found across 14 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/backend/app/routers/config.py">
<violation number="1" location="apps/backend/app/routers/config.py:211">
P2: `logger` is not defined in this module; this will raise `NameError` in the ConnectError handler and mask the intended 503 response.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| model_names: list[str] = [m["name"] for m in data.get("models", [])] | ||
| return {"models": sorted(model_names)} | ||
| except httpx.ConnectError: | ||
| logger.warning("Cannot connect to Ollama at %s", api_base) |
There was a problem hiding this comment.
P2: logger is not defined in this module; this will raise NameError in the ConnectError handler and mask the intended 503 response.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/backend/app/routers/config.py, line 211:
<comment>`logger` is not defined in this module; this will raise `NameError` in the ConnectError handler and mask the intended 503 response.</comment>
<file context>
@@ -208,9 +208,10 @@ async def get_ollama_models() -> dict:
model_names: list[str] = [m["name"] for m in data.get("models", [])]
return {"models": sorted(model_names)}
except httpx.ConnectError:
+ logger.warning("Cannot connect to Ollama at %s", api_base)
raise HTTPException(
status_code=503,
</file context>
| logger.warning("Cannot connect to Ollama at %s", api_base) | |
| logging.warning("Cannot connect to Ollama at %s", api_base) |
|
@srbhr What specific chunks would you like to break it down into? Regarding point 2, “Plugin ecosystem,” could you elaborate on what potential extensions you see here? Personally, I see two potential extensions at the moment
|
Pull Request Title
Related Issue
Description
copilot:summary
Type
Proposed Changes
Screenshots / Code Snippets (if applicable)
How to Test
Checklist
Additional Information
copilot:walkthrough
Summary by cubic
Improves the tailoring flow with partial acceptance, live streaming previews, and URL-based job fetching with JS-aware extraction. Adds dynamic model discovery (including local
ollama) with a searchable picker, plus stronger factual protection, parsing, health checks, and network safety.New Features
partial_confirm./resumes/improve/preview/streamwith frontend streaming support for slow models.POST /jobs/fetch-urlextracts descriptions via site CSS rules; Playwright fallback runs in a background thread for JS-heavy sites; LLM post-filter removes UI chrome; longest selector match avoids partial extraction; UI toggle “Paste Text” / “From URL”./config/provider-modelsand/config/ollama-models; Settings shows a searchable model dropdown and fetches provider models using stored keys.Bug Fixes
ollamabase to127.0.0.1, tighterlitellm/httpxtimeouts.Written for commit 9746217. Summary will update on new commits.