Keep last metrics for finished jobs#2628

Merged
un-def merged 2 commits into master from issue_2618_keep_metrics_from_finished_jobs
May 14, 2025

Conversation

un-def commented May 13, 2025

Collaborator

The retention window is 1800 seconds (last 30 minutes) by default, configurable via the DSTACK_SERVER_METRICS_WINDOW_SECONDS environment variable.

Closes: #2618
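
One possible reading of the retention window, sketched under assumptions: for a finished job, only the points that fall within the window before the newest collected point are kept. The variable name and the 1800-second default come from the description above; `keep_last_window` and its arguments are hypothetical and not taken from the dstack codebase.

```python
import os
from datetime import datetime, timedelta

# Name and default as given in the PR description (30 minutes).
WINDOW_SECONDS = int(os.getenv("DSTACK_SERVER_METRICS_WINDOW_SECONDS", "1800"))


def keep_last_window(timestamps, window_seconds=WINDOW_SECONDS):
    """Return the timestamps within `window_seconds` of the newest one.

    Hypothetical helper: illustrates the idea of a retention window only.
    """
    if not timestamps:
        return []
    cutoff = max(timestamps) - timedelta(seconds=window_seconds)
    return [ts for ts in timestamps if ts >= cutoff]
```

With the default window, a point collected 29 minutes before the last one would be kept and a point collected 31 minutes before it would be dropped.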

@un-def un-def requested a review from r4victor May 13, 2025 16:07
@peterschmidt85

Contributor

Anything required on the UI's side?

@r4victor

Collaborator

So the last 30 minutes of logs are stored forever after the run is finished?

@peterschmidt85, is this what you proposed, or would having logs for some time after runs finish be sufficient?

@peterschmidt85

Contributor

Ideally, of course, we could introduce a TTL, but I assume we don't support that for logs yet either?

@r4victor

r4victor commented May 14, 2025

Collaborator

@peterschmidt85, sorry, I meant metrics, of course, not logs. We already had a TTL for metrics:

SERVER_METRICS_TTL_SECONDS = int(os.getenv("DSTACK_SERVER_METRICS_TTL_SECONDS", 3600))

That TTL applied to finished runs as well, so you could see the metrics of a finished run until the TTL expired.

So what this PR does, I believe, is keep the last metrics of finished runs forever.

@peterschmidt85

peterschmidt85 commented May 14, 2025

Contributor

That was the TTL for metrics of a live run. For finished runs, I guess it makes sense to set the TTL to a week or so. We discussed that with @un-def, no?

@r4victor

Collaborator

> That was TTL for metrics of a live run.

Well, it applied to all runs before. If the idea is to introduce different TTLs for active and finished runs, it's not what the PR does.

@un-def

un-def commented May 14, 2025

Collaborator Author

> That was TTL for metrics of a live run. For finished run, I guess it makes sense to set TTL as a week or so. We discussed that with @un-def no?

I've just replaced the retention window with a simpler solution using two separate TTL settings for running and finished jobs.

By default:

  • DSTACK_SERVER_METRICS_RUNNING_TTL_SECONDS — 1 hour
  • DSTACK_SERVER_METRICS_FINISHED_TTL_SECONDS — 1 week
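
A minimal sketch of how two such TTLs could drive cleanup, assuming each metric point carries a timestamp and a flag saying whether its job has finished. Only the two environment variable names and their defaults (1 hour and 1 week) come from this thread; `MetricPoint`, `is_expired`, and the field names are hypothetical and do not reflect the actual dstack models.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Names and defaults as stated in this thread: 1 hour and 1 week.
RUNNING_TTL_SECONDS = int(os.getenv("DSTACK_SERVER_METRICS_RUNNING_TTL_SECONDS", "3600"))
FINISHED_TTL_SECONDS = int(os.getenv("DSTACK_SERVER_METRICS_FINISHED_TTL_SECONDS", "604800"))


@dataclass
class MetricPoint:
    """Hypothetical record, used here for illustration only."""
    timestamp: datetime
    job_finished: bool


def is_expired(
    point: MetricPoint,
    now: datetime,
    running_ttl: int = RUNNING_TTL_SECONDS,
    finished_ttl: int = FINISHED_TTL_SECONDS,
) -> bool:
    """Pick the TTL by job state and report whether the point has outlived it."""
    ttl = finished_ttl if point.job_finished else running_ttl
    return now - point.timestamp > timedelta(seconds=ttl)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    old_point = MetricPoint(timestamp=now - timedelta(hours=2), job_finished=True)
    print(is_expired(old_point, now))
```

Under the defaults, a two-hour-old point is past the TTL for a running job but still within the TTL for a finished one, which is the behavior the discussion above asks for.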

@un-def un-def merged commit e69902f into master May 14, 2025
25 checks passed
@un-def un-def deleted the issue_2618_keep_metrics_from_finished_jobs branch May 14, 2025 10:19

Development

Successfully merging this pull request may close these issues.

[Feature]: Ensure metrics are shown also for finished runs

3 participants