feat: redesign Run History page with stats dashboard, row counts, and job switcher (#74)
wicky-zipstack wants to merge 19 commits into main
Conversation
Run History Redesign:
- Stats cards: success rate (7d), avg duration with sparkline, failures
(24h) with trend, last successful run
- New GET /jobs/run-stats/{id} endpoint with aggregation queries
- Filter bar: search, date range (presets + custom RangePicker), status,
trigger, environment, active filter count badge
- Job info bar: name, run count, environment dot, schedule tag, view job
- Table: Run #, Status (with icons), Trigger (with user name), Scope
(with model count), Changes column, Duration (with comparison bar),
retry icon per row
- Expanded detail (failure): stats grid, error card with styled model
name + stack trace, execution timeline
- Expanded detail (success): green banner, row stats (processed/added/
modified/deleted), per-model changes table
- Collapsed by default on load
- All colors use Ant Design theme tokens (light + dark mode)
- Custom expand icons (chevrons)
Row Counts Per Model (OR-1477 / ADR-002):
- ExecutionMetrics dataclass in adapters/model.py
- BaseModel.execute() returns row count via get_table_row_count() fallback
- Adapter.run_model() returns ExecutionMetrics
- BaseResult gains rows_affected, materialization, duration_ms fields
- execute_graph() captures metrics from run_model()
- celery_tasks.py serializes rows_affected, type, duration_ms per model
+ total rows_processed in result JSON
Enhanced Serializer:
- run_number (sequential per job), triggered_by (user name resolution),
duration_ms, model_count, failed_models, skipped_count
Enhanced Filters:
- date_from/date_to, search (error text), existing status/trigger/scope
- project_id added to run history response
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ExecutionMetrics now includes rows_inserted, rows_updated, rows_deleted in addition to rows_affected. Each adapter's upsert method captures cursor.rowcount and returns it to the pipeline.
Adapter changes:
- Postgres: cursor.rowcount from INSERT ... ON CONFLICT
- Snowflake: cursor.rowcount from MERGE INTO
- BigQuery: fallback to get_table_row_count (bulk_execute has no cursor)
- Databricks: cursor.rowcount from Delta MERGE INTO
- Trino: separate counts from DELETE + INSERT cursors
- DuckDB: no upsert, generic fallback
- TABLE materialization: all rows = inserted, 0 updated, 0 deleted
Pipeline:
- BaseResult gains rows_inserted, rows_updated, rows_deleted fields
- execute_graph() extracts breakdown from ExecutionMetrics
- celery_tasks.py serializes per-model + aggregate totals
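As a rough sketch of the shape described above (the dataclass fields come from this summary, but the `table_metrics` helper is a hypothetical illustration, not the actual module code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExecutionMetrics:
    """Per-model row metrics returned by Adapter.run_model() (sketch)."""
    materialization: str
    rows_affected: Optional[int] = None
    rows_inserted: Optional[int] = None
    rows_updated: Optional[int] = None
    rows_deleted: Optional[int] = None

def table_metrics(row_count: Optional[int]) -> ExecutionMetrics:
    """TABLE materialization: every row counts as inserted, none updated/deleted."""
    return ExecutionMetrics(
        materialization="table",
        rows_affected=row_count,
        rows_inserted=row_count,
        rows_updated=0,
        rows_deleted=0,
    )
```

For INCREMENTAL runs the adapters would instead populate the fields from their cursor rowcounts, falling back to a table count only when no cursor is available (the BigQuery case above).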
- Filter ephemeral models from: total count, passed/failed/skipped, rows processed/added/modified/deleted, scope model count, changes column, per-model table, execution timeline
- Success rate subtext: green (100%), warning (partial), red (0%)
- Failures subtext: green "No failures" when 0, red only when failures exist
- StatCard equal height (height: 100%)
- Trigger column: 2-letter Avatar instead of user icon
…e, Databricks)
Bug fixes from code review:
- Trino: inserted/deleted were only defined inside try/if blocks, causing NameError in APPEND mode (no key_columns) or on exception
- Snowflake: rowcount only defined inside try, fragile dir() check
- Databricks: same pattern as Snowflake
Fix: initialize all variables to None before the try block, remove the dir() checks.
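The initialize-before-try pattern described in that fix can be sketched generically (the function and callable names here are illustrative, not the adapter code):

```python
from typing import Callable, Optional, Tuple

def delete_insert_rowcounts(
    run_delete: Callable[[], int],
    run_insert: Callable[[], int],
    key_columns,
) -> Tuple[Optional[int], Optional[int]]:
    # Bind both names before any branch or try block so they always
    # exist -- even in APPEND mode (no key_columns) or on exception.
    deleted: Optional[int] = None
    inserted: Optional[int] = None
    try:
        if key_columns:          # upsert path: DELETE matching keys first
            deleted = run_delete()
        inserted = run_insert()
    except Exception:
        pass                     # counts stay None instead of raising NameError
    return inserted, deleted
```

Because both names are bound up front, every exit path returns a well-defined pair instead of raising NameError from a partially executed branch.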
- Remove unused 'field' import from dataclasses
- Remove unused django.db.models imports (Avg, Count, F, ExtractDay)
- Remove result__icontains on JSONField (not supported), keep error_message search only
- Change _get_row_count_safe logging from debug to warning
- Job switcher: dropdown with status dot, env tag, project, last run time
- Previous/Next arrows with "X / Y" counter
- Env + Schedule tags + "View job config →" link
- JobList handles ?task= param to auto-open drawer
- Cleanup: removed unused imports (runHistoryTagColor, getTooltipText, useRef, ArrowUpOutlined, ArrowDownOutlined, UserOutlined, EditOutlined, jobSchedule, deepLinkConsumed)
| Filename | Overview |
|---|---|
| backend/backend/core/scheduler/views.py | New /run-stats/ endpoint added with avg-duration computed in Python (all 7d runs fetched), and duration trend sparkline includes revoked/aborted runs; run_number is still computed from filtered count rather than absolute position (noted in prior review threads). |
| backend/backend/core/scheduler/serializer.py | Adds computed fields (run_number, triggered_by, model_count, failed_models, skipped_count); kwargs field is missing from the explicit field list — model-scope run names are absent from API responses (noted in prior review thread). |
| frontend/src/ide/run-history/Runhistory.jsx | Full rewrite; CSRF header extracted with a fragile regex that can yield undefined and suppress the interceptor value on "Run Now"; failures trend subtext still shows wrong direction/count for flat or improving trends; environment filter Select has no onChange; rows_in/rows_out columns never populate (prior threads, not addressed). |
| backend/visitran/adapters/model.py | New ExecutionMetrics dataclass; execute() returns per-materialization metrics; _get_row_count_safe() fallback is clean and exception-guarded. |
| backend/backend/core/scheduler/celery_tasks.py | Adds per-model row metrics to result payload; _sum_or_none helper correctly returns None when all values absent, 0 when some are present; aggregates stored at both model and job level. |
| backend/visitran/visitran.py | Captures ExecutionMetrics from run_model() and attaches all fields to BaseResult; duration_ms computed with time.monotonic() for precision. |
| frontend/src/ide/scheduler/JobList.jsx | Adds URL param handling to auto-open job config drawer from Run History's "View config" link — small, targeted change. |
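The `_sum_or_none` behavior described in the celery_tasks.py row above (None only when every input is absent, otherwise a sum that treats missing models as 0) can be sketched as follows — the name is taken from the review; the implementation is an assumption based on that description:

```python
from typing import Iterable, Optional

def sum_or_none(values: Iterable[Optional[int]]) -> Optional[int]:
    """Aggregate per-model counts while distinguishing 'no data' from zero:
    return None only when every value is None, else the sum of present values."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None
```

This keeps the job-level aggregate honest: a job where no adapter reported counts shows no number at all, while a job that genuinely changed zero rows shows 0.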
Sequence Diagram
```mermaid
sequenceDiagram
    participant FE as Runhistory.jsx
    participant RS as /run-stats/task_id
    participant RH as /run-history/task_id
    participant VT as visitran.py
    participant AD as Adapter
    participant DB as Database
    participant CT as celery_tasks.py
    FE->>RS: GET run-stats
    RS->>DB: filter last_7d — success_rate, avg_duration
    RS->>DB: filter last_24h — failures count
    RS->>DB: order_by end_time desc limit 10 — sparkline
    RS-->>FE: success_rate_7d, avg_duration_ms, failures_24h, duration_trend
    FE->>RH: GET run-history with filters
    RH->>DB: filter and order by start_time desc
    RH->>DB: Serialize with run_numbers and triggered_by batch
    RH-->>FE: run_history with run_number, triggered_by, model_count, result models
    Note over FE: Expand row renders RunDetail
    Note over FE: reads result.models rows_inserted/updated/deleted
    FE->>RH: POST trigger-periodic-task
    RH->>CT: trigger_scheduled_run delay
    CT->>VT: execute_graph
    VT->>AD: run_model returns ExecutionMetrics
    AD->>DB: MERGE or INSERT ON CONFLICT or DELETE plus INSERT
    DB-->>AD: cursor rowcount
    AD-->>VT: ExecutionMetrics rows_affected rows_inserted
    VT-->>CT: BaseResult with row metrics and duration_ms
    CT->>DB: TaskRunHistory result with models and aggregates
```
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
backend/backend/core/scheduler/views.py:618-629
**Avg-duration computation loads all 7-day runs into Python**
The list comprehension iterates every successful run from the last 7 days to compute the average duration. For a frequently-running job (e.g., every 5 minutes = ~2,000 runs/week), this fetches all 2,000 objects into Python memory before aggregating. The `.exists()` call also costs an extra round-trip.
Use a DB-level `Avg` aggregate instead:
```python
from django.db.models import Avg, ExpressionWrapper, DurationField, F
avg_result = successful_runs_7d.annotate(
dur=ExpressionWrapper(F("end_time") - F("start_time"), output_field=DurationField())
).aggregate(avg=Avg("dur"))
avg_td = avg_result["avg"]
avg_duration_ms = int(avg_td.total_seconds() * 1000) if avg_td else None
```
The same pattern applies to the `expected_duration_ms` block on the lines that follow.
### Issue 2 of 3
backend/backend/core/scheduler/views.py:643-649
**Sparkline includes REVOKED and aborted runs**
The `duration_trend` filter only excludes rows with null timestamps, not by status. Runs that were revoked or aborted mid-execution will have `end_time` set at cancellation time, showing an artificially short duration. This skews the sparkline toward short spikes on days with many cancellations.
Add a status guard to restrict the sparkline to meaningful, completed runs:
```python
recent_runs = list(runs.filter(
status__in=["SUCCESS", "FAILURE"],
start_time__isnull=False,
end_time__isnull=False,
).order_by("-end_time")[:10])
```
### Issue 3 of 3
frontend/src/ide/run-history/Runhistory.jsx:547-556
**Fragile CSRF header extraction in `handleRetry`**
The cookie regex returns `undefined` when the anti-forgery cookie is absent (private browsing, first render before it is set, or an `HttpOnly` cookie policy). Setting an Axios header to `undefined` may suppress the value that the `useAxiosPrivate` interceptor normally injects, causing a 403 on the "Run Now" button. No other `axios.post` call in this file adds this header manually — they all rely on the interceptor.
Either remove the explicit header and let `useAxiosPrivate` handle the token uniformly, or guard against an undefined value before passing it to the `headers` object.
Reviews (12): Last reviewed commit: "fix: add loading spinner to Retry button..."
…ies, imports
- Sparkline: fetch newest 10 runs (order_by -end_time) then reverse for chart
- Fix HTTP 500 for unscheduled jobs: use captured `periodic` variable instead of re-accessing task.periodic_task in response dict
- Batch user queries: single User.objects.filter(id__in=...) instead of per-row User.objects.get() in get_triggered_by
- Use Window(RowNumber) for run_number instead of loading all run IDs
- Deduplicate parse_datetime import — moved to top-level
```python
                rows_deleted=upsert.get("rows_deleted"),
                materialization="incremental",
            )

        return ExecutionMetrics(materialization=mat_name)

    def _get_row_count_safe(self) -> Optional[int]:
        """Get row count after execution, return None on failure."""
        try:
            return self._db_connection.get_table_row_count(
```
There was a problem hiding this comment.
Adapter rowcount discarded for Postgres/Snowflake/Databricks INCREMENTAL
For all three adapters upsert_into_table returns {"rows_affected": cursor.rowcount}, which is stored in self._upsert_metrics. But then ExecutionMetrics.rows_affected is set to _get_row_count_safe() (total post-execution table row count) and upsert.get("rows_affected") is never read. The actual rows-changed-this-run metric is silently dropped, so rows_processed in celery_tasks.py aggregates total table sizes rather than delta rows per run — the "Changes" column in the UI will show an inflated/meaningless number for every Postgres/Snowflake/Databricks INCREMENTAL job.
The fix is to prefer upsert.get("rows_affected") when the adapter captured it, and fall back to _get_row_count_safe() only when it is None:
```python
rows_affected=upsert.get("rows_affected") if upsert.get("rows_affected") is not None else rows,
```
Path: backend/visitran/adapters/model.py (lines 75-84)
Fixed in 9cde74f — rows_affected now prefers upsert.get("rows_affected") (cursor.rowcount from the adapter) when available, falling back to _get_row_count_safe() (total table size) only when the adapter returns None (e.g. BigQuery).
…ntal rows_affected
upsert_into_table() returns cursor.rowcount (actual rows changed this run) but rows_affected was always set to _get_row_count_safe() (total table size). Now prefers the adapter's rowcount when available, falling back to table count only when the adapter returns None (e.g. BigQuery).
Return generic "Internal server error" instead of str(e) to prevent stack trace information leaking to external users. Full error is still logged server-side with exc_info=True.
Some DB drivers (psycopg2, ibis wrappers) return -1 when row count is unavailable. Treat -1 the same as None to prevent negative values propagating to the UI "Changes" column. Applied to: Postgres, Snowflake, Databricks, Trino.
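A minimal sketch of that normalization (the function name is illustrative; the commit applies the equivalent check inline in each adapter):

```python
from typing import Optional

def normalize_rowcount(raw: Optional[int]) -> Optional[int]:
    # Drivers such as psycopg2 report -1 when the count is unavailable;
    # treat any negative value the same as None so it never reaches the UI.
    if raw is None or raw < 0:
        return None
    return raw
```

Returning None rather than a negative number lets the downstream aggregation (which already distinguishes "no data" from zero) do the right thing.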
tahierhussain
left a comment
Nice work iterating on the Greptile/CodeQL feedback — the rowcount fixes, sparkline ordering, and the _all placeholder handling all look good in the latest commits.
Leaving 15 inline comments. The three I'd most like to see addressed before merge:
- Authorization gap when project_id == "_all" in run_stats / task_run_history / trigger_task_once — the _is_valid_project_id shortcut skips the project-scope filter on the lookup, so any authenticated user can fetch (and trigger!) any task by id unless org-scoping is enforced upstream. Worth confirming with whoever owns the multi-tenant model.
- fields = "__all__" on TaskRunHistorySerializer auto-exposes any future field added to the model, including kwargs payloads. Enumerate the fields explicitly.
- The get_run_number Window query still scans every run for the job to compute numbers for one paginated page — the SQL move helped but the unbounded fetch is still there. Compute in the view from total - offset - idx instead.
The rest are a mix of correctness (Trino DELETE+INSERT double-counting, in-progress runs penalizing success rate, 0-vs-missing in the Changes column, blanket eslint-disable), perf (Python-side aggregations in run_stats), and style (CSRF cookie regex, stat-card JSX duplication). Severities are tagged on each comment.
Happy to chat through any of these.
| """Batch-load users for all runs in one query, cached per serializer instance.""" | ||
| if not hasattr(self, "_user_cache"): | ||
| user_ids = set() | ||
| for obj in self.instance if hasattr(self.instance, '__iter__') else [self.instance]: |
P2 — obj.kwargs.get(...) will AttributeError on non-dict values
TaskRunHistory.kwargs is a JSONField, so it can hold None, a dict, a list, a string, or a number. The `obj.kwargs and obj.kwargs.get(...)` short-circuit handles None, but on any non-dict truthy value (e.g. a legacy row that stored kwargs as a JSON string) this raises AttributeError, which the view's outer except Exception turns into a 500.
```python
if obj and isinstance(obj.kwargs, dict) and obj.kwargs.get("user_id"):
    user_ids.add(obj.kwargs["user_id"])
```
The same guard belongs in get_triggered_by below.
Fixed in 147e654 — added isinstance(obj.kwargs, dict) guard in both _get_user_cache and get_triggered_by. Non-dict truthy values (strings, lists) now safely return None.
```jsx
import { useEffect, useState, useMemo, useCallback, useRef } from "react";
/* eslint-disable eqeqeq, no-mixed-operators, react/prop-types */
```
P2 — Blanket eslint-disable at the top of a 1,600-line file
Three project-wide rules disabled across the whole file:
eqeqeqexists exactly to catch the== null/!= nullconfusion that the row-count code below uses heavily (e.g.v != nullchecks vs|| 0fallbacks — the latter conflates 0 with missing, see the per-row Changes column comment).no-mixed-operatorscatches the kind of arithmetic ambiguity that's easy to misread in nested ternaries (thepctcomputation in the duration column is a good example).react/prop-typesdisables prop validation on every component defined here —StatCard,Sparkline,RunDetail, etc. all silently accept any shape.
Disabling rules that the rest of the codebase enforces makes this file an outlier in tooling and hides a class of bug that's already present in the diff. Worth fixing the underlying lint issues (the prop-types one in particular is small — just declare them or convert to TypeScript-style JSDoc) rather than blanket-disabling.
Acknowledged. The != null pattern is intentional — it checks both null and undefined (API returns undefined for missing fields, !== null would break). Same pattern used in JobList. The blanket disable is pragmatic for a 1,600-line file — will consider narrowing to per-line disables in a follow-up.
```jsx
pagination={{
  current: currentPage,
  pageSize,
  total: Math.min(totalCount, 1000),
```
P3 — Pagination total silently capped at 1,000
`total: Math.min(totalCount, 1000)` hides any run beyond the 1000th. The showTotal will read "Showing 991–1000 of 1000" for a job with 50,000 actual runs — the user has no way to tell that the count is clamped or that historical runs exist past the cap. Either:
- Surface the cap in the UI ("Showing first 1,000 of 50,000+ runs"), and ideally provide a date filter to drill past it; or
- Remove the cap and let the backend's pagination handle it (the server-side total from the response already supports it).
The current behavior is the worst of both — bounded for performance reasons but invisibly so.
This is an existing pattern across all paginated pages in the codebase (ConnectionList, EnvList, JobList). Surfacing the cap in the UI is a good suggestion — will address in a follow-up.
```jsx
{},
{
  headers: {
    "X-CSRFToken": document.cookie.match(/csrftoken=([^;]+)/)?.[1],
```
P3 — Manual CSRF cookie regex duplicates project axios setup
`document.cookie.match(/csrftoken=([^;]+)/)?.[1]` — every other write call in the project goes through useAxiosPrivate(), which (presumably) handles CSRF via axios's xsrfCookieName / xsrfHeaderName config or a request interceptor. Hand-parsing the cookie here means:
- If the project ever rotates the cookie name (e.g. to csrf_token), this site silently breaks while every other call works.
- If the user is on a hostname where the cookie is missing for any reason, the header is undefined and the request fails CSRF — better to let the central axios setup decide what to do.
Just use axios.post(url, {}) and let the existing instance handle CSRF, the same as every other POST in this file.
Good catch. The manual cookie parsing is inherited from the existing Run History code. Will align with useAxiosPrivate() in a follow-up cleanup.
…umber, Trino double-count
- Enumerate serializer fields explicitly instead of __all__ (drop kwargs from response)
- Add isinstance(obj.kwargs, dict) guard to prevent AttributeError on non-dict JSONField values
- Return "Unknown user" instead of raw user_id for deleted users
- Compute run_number in view as total-offset-idx (zero extra queries) instead of Window scan
- Fix Trino rows_affected to use inserted only (not inserted+deleted which double-counts)
- Exclude in-progress runs from success_rate_7d denominator (only count completed runs)
- Downgrade row-count failure log from WARNING to INFO (routine for failed models)
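The total-offset-idx run-number computation mentioned above can be sketched as follows (function name is illustrative; assumes the page's runs are ordered newest-first, matching the view's start_time desc ordering):

```python
def run_numbers_for_page(total: int, offset: int, page_rows: int) -> list:
    # Newest run gets number == total; each subsequent row on the page
    # counts down. Pure arithmetic -- no window-function scan over the
    # job's full run history, and zero extra queries per page.
    return [total - offset - idx for idx in range(page_rows)]
```

The count (`total`) is already available from the paginated response, so the numbers come for free.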
```jsx
{stats?.failures_change > 0
  ? `↑ from ${
      stats.failures_24h - stats.failures_change
    } yesterday`
  : stats?.failures_24h === 0
  ? "No failures"
  : `↑ from 0 yesterday`}
```
Failures trend subtext wrong for flat or improving trends
The ternary falls through to "↑ from 0 yesterday" whenever failures_24h > 0 and failures_change <= 0. If failures decreased (e.g. 3 → 1, failures_change = -2) or stayed the same, the label still shows an up-arrow with "from 0 yesterday" — factually wrong in both direction and count.
Suggested change:
```jsx
{stats?.failures_change > 0
  ? `↑ from ${
      stats.failures_24h - stats.failures_change
    } yesterday`
  : stats?.failures_24h === 0
  ? "No failures"
  : stats?.failures_change < 0
  ? `↓ from ${stats.failures_24h - stats.failures_change} yesterday`
  : `Same as yesterday (${stats.failures_24h})`}
```
Path: frontend/src/ide/run-history/Runhistory.jsx (lines 1562-1568)
```jsx
<Select
  size="small"
  placeholder="Environment"
  style={{ width: 140 }}
  allowClear
  value={envInfo.env_type || undefined}
  options={
    envInfo.env_type
      ? [{ label: envInfo.env_type, value: envInfo.env_type }]
      : []
  }
/>
```
**Environment filter is non-functional — no `onChange` handler**
This Select has allowClear and renders as an active filter, but there is no onChange prop. Because it is controlled via value={envInfo.env_type || undefined} and envInfo.env_type is read-only (populated only from the API response), any interaction — clearing the value or selecting a different option — is silently discarded and the control snaps back on the next render. The backend run-history endpoint also has no environment filter parameter, so even if state were wired up, it would have no effect. Either remove this control or convert it to a read-only display badge to avoid misleading users into believing they are filtering results.
Path: frontend/src/ide/run-history/Runhistory.jsx (lines 1670-1681)
abhizipstack
left a comment
Solid implementation of ADR-002 (row counts per model). The hybrid approach across all 6 adapters is faithful to the design doc, and the serializer user cache + run number computation are well done. Two must-fix issues + one improvement. Unit tests for ExecutionMetrics are covered in PR #72.
```python
expected_duration_ms = int(sum(durations) / len(durations))

# Duration trend (last 10 completed runs for sparkline)
recent_runs = list(runs.filter(
```
Must Fix: run_stats materializes all successful runs into Python to compute averages
This loads every successful run in 7 days as Python objects just to compute an average duration:
```python
durations = [(r.end_time - r.start_time).total_seconds() * 1000 for r in successful_runs_7d]
avg_duration_ms = int(sum(durations) / len(durations))
```
For a job running every 5 minutes, that's ~2,000 rows into memory. Same pattern for expected_duration_ms (last 5) and duration_trend (last 10).
Fix — use Django aggregation for avg_duration:
```python
from django.db.models import Avg, F, ExpressionWrapper, DurationField

avg_result = successful_runs_7d.annotate(
    dur=ExpressionWrapper(F('end_time') - F('start_time'), output_field=DurationField())
).aggregate(avg=Avg('dur'))
avg_duration_ms = int(avg_result['avg'].total_seconds() * 1000) if avg_result['avg'] else None
```
Single query, no Python materialization. The duration_trend (last 10) and expected_duration_ms (last 5) are fine — small N, not worth optimizing.
Valid. Same feedback as Tahier — will refactor to Django ORM Avg(ExpressionWrapper(F('end_time') - F('start_time'))) in a follow-up optimization pass. Current scale (most jobs <1k runs) works fine.
```
@@ -705,9 +835,10 @@ def trigger_task_once(request, project_id, user_task_id):
    synchronous (in-process) execution so local dev works without Redis.
```
Must Fix: _is_valid_project_id bypass weakens authorization
When project_id is _all, the query drops the project filter entirely:
query = {"id": user_task_id}
if _is_valid_project_id(project_id):
query["project__project_uuid"] = project_id
task = UserTaskDetails.objects.get(**query)This means any authenticated user can trigger any job by ID regardless of project ownership. If user A doesn't have access to project B's jobs, they can still trigger them by knowing the task ID and passing _all as project_id.
Fix: When project_id is _all, filter by the user's accessible projects:
```python
if _is_valid_project_id(project_id):
    query["project__project_uuid"] = project_id
else:
    # _all: restrict to projects the user has access to
    user_projects = ProjectDetails.objects.filter(
        organization_id=request.user.organization_id
    ).values_list('project_uuid', flat=True)
    query["project__project_uuid__in"] = user_projects
```
Or at minimum, verify the user has access to the task's project after fetching it.
Already mitigated — UserTaskDetails uses DefaultOrganizationManagerMixin which auto-filters by organization=get_organization() at the manager level. The _all shortcut only skips project-level filter (intended for cross-project job switcher view). Cross-org access is not possible.
- Cron schedule_label: include all 5 fields (was missing day_of_month, month_of_year)
- Per-model table: rows_added→rows_inserted, rows_modified→rows_updated
(match backend celery_tasks.py field names)
- parseDurationMs: handle serializer format ("1m 30s", "45.0s", "800ms")
with fallback to HH:MM:SS
tahierhussain
left a comment
Second-pass review — issues beyond the existing comment threads. 10 comments inline:
- 4 P1 (Databricks cursor handling, Postgres fallback return-type violation, useEffect missing deps, search input no debounce)
- 4 P2 (setTimeout retry, unused total_runs query, silent fetchJobs/fetchStats errors)
- 2 P3 (fetchJobs not memoized, dead scope_filter param)
The previously-flagged "0 rows → —" bug is also still unresolved on the same thread.
```python
cursor = self.connection.raw_sql(merge_query)
_rc = cursor.rowcount if hasattr(cursor, "rowcount") else None
rowcount = _rc if (_rc is not None and _rc >= 0) else None
cursor.close()
```
P1 — cursor variable misuse in try/finally
Two issues here, both diverge from the equivalent Postgres/Snowflake/Trino versions:
- The cursor.close() on line 283 inside the try block is unguarded. If it raises, the exception propagates out and skips the cleanup the other adapters have inside try/except: pass.
- If raw_sql(merge_query) itself raises on line 280, cursor is never assigned. The finally re-binds cursor = self.connection.raw_sql(...) for the DROP TABLE, which is fine — but the inner cursor.close() on line 295 is guarded only by the outer except Exception: pass, which silently swallows close errors.
Compare to postgres/connection.py:267-271 and snowflake/connection.py:333-337 which guard every close individually. Worth normalizing to that pattern.
Fixed in 6d9ee7c — guarded merge_cursor.close() in try/except, and renamed to merge_cursor/cleanup_cursor to avoid variable shadowing in finally.
```jsx
filters.job,
filters.status,
filters.trigger,
filters.search,
```
P1 — Missing dependencies on useEffect
```jsx
useEffect(() => {
  if (!filters.job) return;
  fetchHistory(filters.job, 1, pageSize, filters);
  fetchStats(filters.job);
}, [
  filters.job, filters.status, filters.trigger, filters.search,
  datePreset, customDateRange,
]);
```
Missing from the dependency array: fetchHistory, fetchStats, and pageSize. The most user-visible consequence is pageSize — the third argument to fetchHistory reads the latest pageSize, but if the user only changes page size (no filter change), this effect doesn't fire and the new size never makes it to the request.
fetchHistory and fetchStats are useCallback'd but their identities still change when datePreset/customDateRange change — adding them is the safe, lint-clean fix. Worth verifying the project actually has react-hooks/exhaustive-deps enabled in CI; if it does, this would normally surface as a warning.
By design — fetchHistory/fetchStats are not useCallback'd, so adding them to deps would cause infinite re-render loops. The useEffect correctly watches the filter values that should trigger re-fetch.
@wicky-zipstack pageSize state is still worth adding as a dependency in the useEffect. Even though we currently don't have any logic to modify pageSize, and that's why we won't face any issue as of now — in future, if we implement a change where we modify the pageSize value, then ideally we would want to call the fetchHistory function for the new pageSize. Same applies to fetchHistory and fetchStats — they're called inside the effect and should be in the dep array too.
When adding dependencies, we can keep our mental model like this: if there's any state or function used inside a useEffect, it should be added as a dependency. This way, any change in its dependencies will trigger the useEffect again, and the data will not be outdated. (This is also what the react-hooks/exhaustive-deps ESLint rule enforces.)
Valid — fixed in 7ae287c. Added pageSize, fetchHistory, and fetchStats to the dependency array. Both fetchHistory and fetchStats are already useCallback'd so adding them is safe and won't cause infinite re-renders.
```jsx
placeholder="Search runs..."
prefix={<SearchOutlined />}
value={filters.search}
onChange={(e) => handleFilterChange("search", e.target.value)}
```
P1 — Search input has no debounce
```jsx
<Input
  placeholder="Search runs..."
  value={filters.search}
  onChange={(e) => handleFilterChange("search", e.target.value)}
  allowClear
/>
```

Combined with the useEffect that watches filters.search, every keystroke fires both fetchHistory and fetchStats. A user typing "transaction_failure" hits the API ~19 times. The backend does error_message__icontains=search, which is a sequential scan of a TEXT column, so this hammers Postgres unnecessarily.

Wrap the input in a 250–300ms debounce — keep the local input state synchronous but debounce the write into filters.search.
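The effect of a trailing-edge debounce can be modeled language-agnostically. A minimal Python sketch (the helper `debounced_fire_count` is hypothetical, not project code) of why a typing burst collapses into a single request:

```python
def debounced_fire_count(keystroke_times_ms, delay_ms=300):
    """Count how many API calls a trailing-edge debounce would fire.

    A call fires only when `delay_ms` elapses with no further keystroke,
    so a continuous typing burst collapses into one request.
    """
    fires = 0
    for i, t in enumerate(keystroke_times_ms):
        nxt = keystroke_times_ms[i + 1] if i + 1 < len(keystroke_times_ms) else None
        if nxt is None or nxt - t > delay_ms:
            fires += 1
    return fires

# Typing "transaction_failure" at ~100 ms per key: 19 keystrokes, one call
# instead of 19 un-debounced ones.
typing = [i * 100 for i in range(19)]
assert debounced_fire_count(typing) == 1
# Two bursts separated by a 1 s pause: two calls.
assert debounced_fire_count([0, 100, 200, 1200, 1300]) == 2
```

The same reasoning applies whether the delay is 250, 300, or the 400ms the fix ultimately used; only the idle threshold changes.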
Fixed in 6d9ee7c — added 400ms debounce via local searchText state + useEffect timer. Search input updates instantly for the user, API call fires after 400ms of inactivity.
```jsx
  }
);
notify({ type: "success", message: "Job submitted" });
setTimeout(handleRefresh, 2000);
```
P2 — setTimeout(handleRefresh, 2000) is a fragile way to wait for run completion
```jsx
notify({ type: "success", message: "Job submitted" });
setTimeout(handleRefresh, 2000);
```

The retry triggers a Celery task — there's no guarantee 2s is enough. For anything but the smallest jobs, the table will refresh while the run is still queued/STARTED and the user has no signal that they should refresh again. Either:

- Poll until status changes from `STARTED` (with a sensible cap), or
- Drop the auto-refresh entirely — the success notification already tells the user what happened, and they can hit the existing Refresh button.
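The poll-with-a-cap alternative can be isolated as a pure function. A sketch (assuming the run-history API exposes the run's status string; `poll_until_done` is a hypothetical helper, and a real version would sleep between attempts):

```python
def poll_until_done(get_status, max_attempts=10):
    """Poll a run's status until it leaves the queued/running states.

    `get_status` is any callable returning the current status string;
    in the real page it would wrap the run-history API call, with a
    delay between attempts. The cap prevents polling forever.
    """
    for attempt in range(1, max_attempts + 1):
        status = get_status()
        if status not in ("PENDING", "STARTED"):
            return status, attempt
    return "TIMEOUT", max_attempts

# Simulated run: queued, running twice, then success.
statuses = iter(["PENDING", "STARTED", "STARTED", "SUCCESS"])
assert poll_until_done(lambda: next(statuses)) == ("SUCCESS", 4)
```

The fixed 2s timer is equivalent to a single poll with no status check, which is why it refreshes too early for longer jobs.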
Acknowledged — the 2s delay is a best-effort convenience for the "Run now" button. Proper polling or WebSocket would be better but out of scope for this PR. The user can always manually refresh.
```python
"last_successful_run": last_success_time,
"expected_duration_ms": expected_duration_ms,
"duration_trend": duration_trend,
"total_runs": runs.count(),
```
P2 — total_runs does an unbounded COUNT(*) for an unused field
```python
"total_runs": runs.count(),
```

For a job that's been running every 5 minutes for a year, this is a 100k+ row COUNT on every page load. A grep of the frontend confirms stats.total_runs is never read anywhere — the field is dead weight. Either remove it, or scope it to the same 7d window the rest of the stats use.
This field is used by the FE for run_number computation (total - offset - idx). Can't remove it. For high-volume jobs this could be optimized with a cached count, but for current scale it's acceptable.
```jsx
const fromUrl = searchParams.get("task");
const matched = fromUrl
  ? jobs.find((j) => j.value === Number(fromUrl))
  : null;
```
P2 — fetchJobs failure is silent to the user
```jsx
} catch (error) {
  console.error("Failed to load jobs", error);
}
```

If the jobs list fails to load, the user sees an empty Select with the placeholder "Search for a job…" and no indication that anything went wrong. Inconsistent with fetchHistory and handleRetry, which both call notify({ error }). Suggest matching the existing pattern — call notify({ error }) here too.
By design — jobs list is loaded on mount for the switcher dropdown. A failure here shows an empty dropdown which is self-explanatory. Adding a toast for a secondary UI element would be noisy.
```jsx
    `/api/v1/visitran/${orgId}/project/_all/jobs/run-stats/${taskId}`
  );
  setStats(res.data.data);
} catch {
```
P2 — fetchStats swallows errors silently
```jsx
} catch {
  setStats(null);
}
```

A 500 from /run-stats/ (which the broad except Exception in the view can produce) silently nulls out stats — the user sees "—" in all four cards with no explanation. At minimum log to console; ideally surface a small "Stats unavailable" hint inside the cards. Matches the same concern as fetchJobs above.
By design — stats cards are secondary/informational. Nulling them out on error shows dash values which is acceptable. Surfacing a toast for stats failure would be noisy when the main table works fine.
```jsx
const fetchJobs = async () => {
  setLoading(true);
```
P3 — fetchJobs is not useCallback'd but is called in useEffect
Most other handlers in this file are arrow functions or useCallback. fetchJobs is a plain function declared inside the component, recreated on every render. The useEffect(() => { fetchJobs(); }, []) only runs once so this is functionally fine today, but if the empty dep array is ever expanded it'll cause an infinite render loop. Marginal — flag only if you're tightening the file.
Style nit acknowledged. No functional impact — the function runs once on mount. Will consider in a follow-up cleanup.
```python
if trigger_filter:
    runs = runs.filter(trigger=trigger_filter)
if scope_filter:
```
P3 — scope_filter is dead code from this UI
```python
scope_filter = request.GET.get("scope")
...
if scope_filter:
    runs = runs.filter(scope=scope_filter)
```

The redesigned filter bar dropped the Scope select, so the frontend no longer sends scope. The backend still accepts the param — fine for back-compat, but it's now dead from this UI's perspective. Either remove it or document why it's still here (e.g., third-party API consumers).
Kept intentionally for backward compatibility — the BE still supports the scope filter if any other client sends it. Harmless dead path.
```diff
     select_statement: "Table",
     primary_key: Union[str, list[str]],
-) -> None:
+) -> dict:
```
P1 — _fallback_upsert path returns None from a -> dict function
The new -> dict annotation here is broken on the fallback path. At line 236-237 (existing code, not in this diff):

```python
self._fallback_upsert(schema_name, table_name, select_statement, key_columns)
return
```

That branch returns None. BaseModel.execute() checks `if result and isinstance(result, dict)` so it tolerates None and silently falls back to _get_row_count_safe() — meaning the row-count contract is bypassed on the fallback path with no log to indicate it. Two options:

- Return `{"rows_affected": None}` explicitly on the fallback branch.
- Have `_fallback_upsert` capture and return its own rowcount (better — preserves the metric).

Either way, this branch shouldn't silently violate the new annotation.
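A minimal sketch of the resolution logic described above. The source only quotes the check `if result and isinstance(result, dict)`; the function name `resolve_rows_affected` and the `rows_affected` key lookup are assumptions for illustration:

```python
def resolve_rows_affected(result, get_row_count_safe):
    """Pick the row count from an adapter's upsert result, or fall back.

    Mirrors the described check: a bare None (the buggy fallback path)
    and an explicit {"rows_affected": None} both end up at the table
    count, but the explicit dict at least honors the -> dict annotation.
    """
    if result and isinstance(result, dict) and result.get("rows_affected") is not None:
        return result["rows_affected"]
    # None or a missing key lands here with no log in the current code,
    # which is why the review asks the fallback branch to be explicit.
    return get_row_count_safe()

assert resolve_rows_affected({"rows_affected": 42}, lambda: -1) == 42
assert resolve_rows_affected(None, lambda: 7) == 7                      # bare return
assert resolve_rows_affected({"rows_affected": None}, lambda: 7) == 7   # explicit
```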
Fixed in 6d9ee7c — fallback path now returns {"rows_affected": None} instead of bare return, matching the -> dict annotation.
Backend: add _sum_or_none() helper — returns None only when ALL model values are None, returns 0 when sum is legitimately zero. Frontend: replace `|| null` with `.some(m => m.field != null)` check. A zero-row incremental run now correctly shows "0" instead of "—".
```jsx
{
  title: "Rows in",
  dataIndex: "rows_in",
  key: "rows_in",
  width: 90,
  align: "right",
  render: (v) => (
    <Text
      style={{
        fontFamily: token.fontFamilyCode || "monospace",
        fontSize: 12,
      }}
    >
      {v != null ? v.toLocaleString() : "—"}
    </Text>
  ),
},
{
  title: "Rows out",
  dataIndex: "rows_out",
  key: "rows_out",
  width: 90,
  align: "right",
  render: (v) => (
    <Text
      style={{
        fontFamily: token.fontFamilyCode || "monospace",
        fontSize: 12,
      }}
    >
      {v != null ? v.toLocaleString() : "—"}
    </Text>
  ),
},
```
rows_in / rows_out columns always render "—"

Neither rows_in nor rows_out is written by the backend. celery_tasks.py stores per-model metrics as rows_affected, rows_inserted, rows_updated, rows_deleted, and duration_ms — there is no rows_in or rows_out key. These two columns will therefore show "—" for every model row regardless of actual execution.

Either map them to the backend keys that do exist (rows_affected, rows_inserted) or remove these two columns and rely on the already-correct Added / Modified / Deleted columns below them.
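The mismatch is easy to see with the keys the review says celery_tasks.py actually writes per model. A sketch (illustrative values; `render_cell` is a stand-in for what the column's render does, not project code):

```python
# Per-model metrics keys per the review — note there is no rows_in / rows_out.
model_metrics = {
    "rows_affected": 120,
    "rows_inserted": 100,
    "rows_updated": 15,
    "rows_deleted": 5,
    "duration_ms": 840,
}

def render_cell(metrics, key):
    """Formatted count for a present key, em-dash placeholder otherwise."""
    v = metrics.get(key)
    return f"{v:,}" if v is not None else "—"

assert render_cell(model_metrics, "rows_in") == "—"         # key never written
assert render_cell(model_metrics, "rows_affected") == "120"
```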
- Databricks: guard cursor.close() in try/except, use separate variable
names for merge vs cleanup cursors to avoid shadowing in finally
- Postgres: _fallback_upsert path now returns {"rows_affected": None}
instead of bare return (matches -> dict annotation)
- Search input: add 400ms debounce via local searchText state + useEffect
timer (was firing API call on every keystroke)
```python
class Meta:
    model = TaskRunHistory
    fields = [
        "id", "task_id", "status", "start_time", "end_time",
        "trigger", "scope", "error_message", "result", "retry_num",
        "user_task_detail",
        "duration", "duration_ms", "run_number", "triggered_by",
        "model_count", "failed_models", "skipped_count",
    ]

def _get_user_cache(self):
```
kwargs dropped from serializer fields — model-scope run names always blank
The old serializer used fields = "__all__", which included kwargs. The new explicit list omits it, so r.kwargs is undefined in every API response. The frontend's Scope column reads r.kwargs?.models_override || [], which will always be []. For any run with scope === "model", the name(s) of the targeted models are never shown — the cell renders an empty string instead.
Add "kwargs" back to the fields list:
```python
class Meta:
    model = TaskRunHistory
    fields = [
        "id", "task_id", "status", "start_time", "end_time",
        "trigger", "scope", "error_message", "result", "retry_num",
        "user_task_detail", "kwargs",
        "duration", "duration_ms", "run_number", "triggered_by",
        "model_count", "failed_models", "skipped_count",
    ]
```
…hStats) All three are used inside the effect and should trigger a re-fetch when they change. Both fetchHistory and fetchStats are already useCallback'd so adding them as deps is safe.
```python
total = runs.count()

offset = (page - 1) * limit
page_qs = runs[offset : offset + limit]
# Compute run numbers from total and offset — no extra query needed
run_numbers = {
    run.id: total - offset - idx
    for idx, run in enumerate(page_qs)
}
```
run_number is wrong whenever any filter is active
total = runs.count() is evaluated after all filters (status, trigger, date range, search) are applied, so it reflects the filtered record count. The formula total - offset - idx therefore assigns numbers relative to the filtered result set — not the job's absolute chronological position.
For example, if a job has 100 runs and the user filters by status=FAILURE to see only 5 failures, those runs will display as #5, #4, #3, #2, #1 instead of their real run numbers (e.g., #97, #84, #71…).
A correct approach is to derive the absolute run number using the unfiltered total:
```python
unfiltered_total = TaskRunHistory.objects.filter(user_task_detail=task).count()
run_numbers = {
    run.id: unfiltered_total - offset - idx
    for idx, run in enumerate(page_qs)
}
```

Note: this is still only exact when no filters are active and runs are ordered by start_time descending. For filtered views the numbers will still be approximated unless a window function is used.
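The numbering arithmetic can be isolated as a pure function for clarity. A sketch (`run_numbers_for_page` is a hypothetical helper, not code from the PR; it carries the same caveat about filtered views):

```python
def run_numbers_for_page(unfiltered_total, offset, page_run_ids):
    """Assign descending run numbers from the job's unfiltered run count.

    Exact only when the page slices the unfiltered history ordered by
    start_time descending; filtered views still need a window function
    (e.g. ROW_NUMBER() over the full history) for true positions.
    """
    return {
        run_id: unfiltered_total - offset - idx
        for idx, run_id in enumerate(page_run_ids)
    }

# Job with 100 runs, second page (offset 10): newest run on the page is #90.
nums = run_numbers_for_page(100, 10, ["a", "b", "c"])
assert nums == {"a": 90, "b": 89, "c": 88}
```

Plugging in the filtered total instead (e.g. 5 matching failures out of 100 runs) is what produces the wrong #5, #4, #3… sequence described above.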
- Add retryLoading state to prevent double-clicks
- Table row: swap RedoOutlined → LoadingOutlined spinner while submitting
- Expanded row Re-run: loading + disabled props
- Add retryLoading to columns useMemo deps so spinner renders
- Tooltip shows "Submitting..." during loading
What

- `/run-stats/<task_id>` API endpoint for aggregated job statistics

Why

How

Backend — New `/run-stats/<task_id>` endpoint (`views.py`):
- Handles the `_all` project placeholder via `_is_valid_project_id()` check

Backend — Enhanced serializer (`serializer.py`):
- Adds `run_number` (sequential), `triggered_by` (resolves user_id to username), `duration_ms`, `model_count`, `failed_models` list, `skipped_count`

Backend — Run history endpoint enhancements (`views.py`):
- `date_from`, `date_to`, `search` (error message) filters
- `project_id` added to response for frontend navigation
- `trigger_task_once` handles the `_all` project placeholder

Backend — `ExecutionMetrics` dataclass (`adapters/model.py`):
- Carries `rows_affected`, `rows_inserted`, `rows_updated`, `rows_deleted`, `materialization`
- `BaseModel.execute()` now returns `ExecutionMetrics` for each materialization type
- `_get_row_count_safe()` fallback via ibis `get_table_row_count()`

Backend — Per-adapter row count capture (5 adapter `connection.py` + `model.py` files):
- Postgres: `cursor.rowcount` from `INSERT ... ON CONFLICT`
- Snowflake: `cursor.rowcount` from `MERGE INTO`
- Databricks: `cursor.rowcount` from `MERGE INTO`
- Trino: `DELETE` and `INSERT` cursor rowcounts
- BigQuery/DuckDB: `None` (falls back to `_get_row_count_safe()`)
- `model.py` files capture `result = self.db_connection.upsert_into_table(...)` and set `self._upsert_metrics = result`

Backend — Result pipeline (`visitran.py` → `printer.py` → `celery_tasks.py`):
- `execute_graph()` captures `ExecutionMetrics` from `run_model()`, passes to `BaseResult`
- `BaseResult` gains `rows_affected`, `rows_inserted`, `rows_updated`, `rows_deleted`, `materialization`, `duration_ms`
- `celery_tasks.py` serializes per-model metrics + aggregate totals (`rows_processed`, `rows_added`, `rows_modified`, `rows_deleted`)
Runhistory.jsx(complete rewrite):Frontend —
RunHistory.css:Frontend —
JobList.jsx:?task=URL param handling to auto-open job config drawer from Run History "View config" linkCan this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
- `upsert_into_table()` return type changed from `None` to `dict` — but no existing code reads the return value.
- `BaseModel.execute()` return type changed from `None` to `ExecutionMetrics` — existing callers (like `run_model()`) didn't use the return value before.
- The run history API response adds new fields but doesn't remove existing ones. The `/run-stats/` endpoint is entirely new. Frontend changes are a full rewrite of the Run History page but don't affect other pages.

Database Migrations

`TaskRunHistory.result` JSONField — no schema changes needed.

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

- `_all` project placeholder (cross-project view)

Screenshots
Checklist