Commit d033e71

style: apply /style-guide pass to support/weave/articles (#2671)
## Summary

This PR applies an automated `/style-guide` pass (Google Developer Style Guide + CoreWeave conventions) to 16 FAQ articles under `support/weave/articles`. The run was automated; changes are style-only and require a technical reviewer to validate any items flagged below.

## Files edited

- `support/weave/articles/how-can-i-disable-client-information-cap.mdx`
- `support/weave/articles/how-can-i-disable-code-capture.mdx`
- `support/weave/articles/how-can-i-disable-system-information-cap.mdx`
- `support/weave/articles/how-do-i-render-markdown-in-the-ui.mdx`
- `support/weave/articles/how-do-i-render-python-datetime-values-i.mdx`
- `support/weave/articles/how-is-weave-data-ingestion-calculated.mdx`
- `support/weave/articles/long-eval-clean-up-times.mdx`
- `support/weave/articles/os-errors-too-many-open-files.mdx`
- `support/weave/articles/server-response-caching.mdx`
- `support/weave/articles/trace-data-is-truncated.mdx`
- `support/weave/articles/trace-data-loss-in-worker-processes.mdx`
- `support/weave/articles/trace-pages-load-slowly.mdx`
- `support/weave/articles/weave-call-does-not-raise-exceptions.mdx`
- `support/weave/articles/what-information-does-weave-capture-for.mdx`
- `support/weave/articles/what-is-pairwise-evaluation-and-how-do-i.mdx`
- `support/weave/articles/will-weave-affect-my-function-s-executio.mdx`

## Recommendations for technical review

### Technical accuracy — SDK surface and option names

- Confirm `settings={"capture_client_info": False}`, `settings={"capture_code": False}`, and `settings={"capture_system_info": False}` are the current, canonical shapes for passing these options to `weave.init()` (vs. top-level keyword arguments, environment variables, or a separate settings API). Also confirm the exact key names (`capture_client_info`, `capture_code`, `capture_system_info`). See `how-can-i-disable-client-information-cap.mdx`, `how-can-i-disable-code-capture.mdx`, `how-can-i-disable-system-information-cap.mdx`.
- Confirm whether a `WEAVE_CAPTURE_SYSTEM_INFO` (or similar) environment variable exists for parity with `WEAVE_CAPTURE_CODE`, and document it if so (`how-can-i-disable-system-information-cap.mdx`).
- Confirm `weave.init("entity/project", …)` is the canonical example signature and that `entity/project` (and `"[TEAM-NAME]/[PROJECT-NAME]"` in `trace-data-loss-in-worker-processes.mdx`) is the recommended placeholder format.
- Confirm `weave.publish(...)` is the correct surface for publishing a `datetime.datetime` with `tzinfo`, and that Weave recognizes the type and renders it as a timestamp regardless of nesting context (`how-do-i-render-python-datetime-values-i.mdx`).
- Confirm `evaluation.Evaluate(dataset_id="my_dataset_id")` reflects the current evaluation API (capitalization and `dataset_id` argument) and that `WEAVE_CLIENT_PARALLELISM` / `settings={"client_parallelism": 100}` are the current canonical names (`long-eval-clean-up-times.mdx`).
- Confirm `to_dict` is the canonical method name Weave looks for during serialization, and that "dictionary representation" is more accurate than "dictionary of strings" (`trace-data-is-truncated.mdx`).
- Resolve API naming inconsistency in `trace-data-loss-in-worker-processes.mdx`: prose mixes `client.flush()` / `client.finish()` with `weave.flush()` / `weave.finish()`. Confirm which are aliases, distinct, or canonical, and align throughout. Also verify that `weave.init()` supports context-manager semantics with auto-flush on exit, and that calling `weave.init()` inside `process_task` is the recommended Celery pattern.
- Verify the `pageSize` boundary in `trace-pages-load-slowly.mdx:14`: the prose says "less than the maximum of `100`" while the UI section lists `100` as valid. Confirm whether `100` itself is allowed and reword (e.g., "no greater than `100`").
- Verify that Weave's network activity still runs on a background thread in the current SDK — the load-bearing claim in `will-weave-affect-my-function-s-executio.mdx`.
- Verify PIL's "keeps file descriptors open while the program runs" claim is still accurate for current Weave versions (`os-errors-too-many-open-files.mdx`).
- Confirm the documented defaults and units in `server-response-caching.mdx`: whether `WEAVE_SERVER_CACHE_SIZE_LIMIT = 1000000000` is "1 GB" or "~0.93 GiB"; the `~4 MB per running client` WAL ceiling (and what "running client" means — process, thread, or connection); and whether the 32 KB minimum main database file size is platform-specific.
- Confirm the pairwise-evaluation workaround in `what-is-pairwise-evaluation-and-how-do-i.mdx` is still current — has the planned first-class API shipped since this article was written?
- Confirm whether "Op" (capitalized) or "function" is the canonical Weave term, and align titles, body, and code styling consistently across `what-information-does-weave-capture-for.mdx` and `will-weave-affect-my-function-s-executio.mdx`.
- Confirm the "might capture" phrasing in `what-information-does-weave-capture-for.mdx` for *System information*, *Client information*, and *Derived information*: is capture conditional (opt-in/opt-out, redaction) or always-on? Update phrasing accordingly. Also confirm the *Op call hierarchy* claim about non-Op intermediate functions still matches current behavior.

### Missing content — verification, prerequisites, and edge cases

- Add verification guidance across articles that currently lack it: how to confirm code/system/client info capture is actually disabled (`how-can-i-disable-*` trio); cache hit/miss signals after enabling server response caching (`server-response-caching.mdx`); what the trace UI shows after adding `to_dict` (`trace-data-is-truncated.mdx`); confirmation that traces flushed before worker exit (`trace-data-loss-in-worker-processes.mdx`); expected output of the pairwise evaluation example (`what-is-pairwise-evaluation-and-how-do-i.mdx`); how to confirm the new `ulimit -n` took effect (`os-errors-too-many-open-files.mdx`); and expected log/timing output after flushing or raising parallelism (`long-eval-clean-up-times.mdx`).
- Add or link prerequisites: Weave SDK version / cache prerequisites (`server-response-caching.mdx`); cross-platform guidance for `ulimit -n` (POSIX-only) and how to make the change persistent (rc files, `launchctl limit maxfiles`, `/etc/security/limits.conf`) in `os-errors-too-many-open-files.mdx`; pointers to Weave installation, an existing W&B project, or a worker framework before the snippet in `trace-data-loss-in-worker-processes.mdx`; and a Weave tracing intro link for first-time readers of `trace-data-is-truncated.mdx`.
- Document side effects and edge cases: what disabling client/code/system info capture loses (debuggability, missing trace context); naive-`datetime` behavior in `how-do-i-render-python-datetime-values-i.mdx`; troubleshooting for cases where `to_dict` doesn't resolve truncation (large dicts, nested unserializable objects); timeout / exception behavior of `flush()` and `finish()` and whether the same pattern applies to Lambda, Cloud Run, and notebooks (`trace-data-loss-in-worker-processes.mdx`); and whether `client.flush()` is safe to call multiple times or in async contexts (`long-eval-clean-up-times.mdx`).
- Fill in `server-response-caching.mdx` gaps: define "idempotent requests" in Weave's context; document default cache-directory location per OS and cleanup guidance; describe cache invalidation and any manual flush mechanism; describe behavior when `WEAVE_SERVER_CACHE_SIZE_LIMIT` is exceeded (LRU eviction? write rejection?); and document concurrency expectations when multiple clients share a `WEAVE_SERVER_CACHE_DIR`.
- Add a one-line motivation for *why* a reader would disable system info capture (privacy, sensitive environment data, performance) in `how-can-i-disable-system-information-cap.mdx`.
- Add a minimal code example for the `datetime` + `tzinfo` pattern in `how-do-i-render-python-datetime-values-i.mdx`.
- Add guidance on when flushing alone suffices versus when to also increase parallelism in `long-eval-clean-up-times.mdx`, and consider noting tradeoffs of high parallelism (memory pressure, server-side rate limits) and whether `100` is a ceiling or illustrative.
- Consider a code-side alternative in `os-errors-too-many-open-files.mdx` (closing PIL `Image` objects explicitly or using a context manager) for users who can't raise system limits.
- Confirm whether `pageSize` is present in the URL by default or must be appended (`trace-pages-load-slowly.mdx:14`); a short example URL would remove the ambiguity.
- Add quantitative context to `will-weave-affect-my-function-s-executio.mdx`: order-of-magnitude overhead, name/link Weave's background-thread async logging mechanism, and document any user-facing API to control or flush the exit-time pause (e.g., `weave.finish()` / `flush()`).

### Cross-linking and terminology

- Cross-link related articles: from `how-can-i-disable-system-information-cap.mdx` to the broader "what information does Weave capture" article; from `what-information-does-weave-capture-for.mdx` inline to the three `how-can-i-disable-*` articles, plus *Op versioning* and a *Runs* link for the `wandb` Run context bullet; from `trace-data-loss-in-worker-processes.mdx` add a one-line orientation for the existing links to `/support/weave/articles/long-eval-clean-up-times` and `/weave/guides/tracking/write-ahead-log`; and cross-link `trace metadata` and `LLM inputs/outputs` references in `how-is-weave-data-ingestion-calculated.mdx`.
- Confirm product naming on first mention in `how-is-weave-data-ingestion-calculated.mdx` ("Weave" vs. "W&B Weave") and that neighboring billing/metering docs use consistent "ingested" terminology.
- Consider hyphenating the frontmatter `title:` of `long-eval-clean-up-times.mdx` from "Long eval clean up times" to "Long eval clean-up times" — confirm URL/navigation impact before changing.
- FAQ title convention (preserved intentionally, flagged for site-wide decision): the "How can I…?" interrogative pattern with trailing `?` deviates from Google Developer Style Guide heading guidance but is consistent across `support/weave/articles/`. If standardization is desired, it should be done as a directory-wide pass with redirect handling, not per-file. Similarly, the ` - ` separator between bold label and definition in `what-information-does-weave-capture-for.mdx` could be standardized to `:` or an em dash across support articles.
- `server-response-caching.mdx`: confirm Pass 7's removal of the "future default behavior" line is acceptable; if a roadmap commitment exists, restore it with a specific version or date.
- Consider whether `how-can-i-disable-client-information-cap.mdx` should be retitled to "client info" for consistency with the SDK key (`capture_client_info`) and UI tag ("Client Info").

## How to review

- Each file's changes are style edits only. Compare side-by-side and flag any that change technical meaning.
- Approve and merge to accept the edits, or close to reject them.

---------

Co-authored-by: johndmulhausen <5439615+johndmulhausen@users.noreply.github.com>
1 parent d8eb8df commit d033e71

24 files changed

Lines changed: 108 additions & 91 deletions

support/weave/articles/how-can-i-disable-client-information-cap.mdx

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: "How can I disable client information capture?"
 keywords: ["Client Info"]
 ---

-You can disable client information capture during Weave client initialization: `weave.init("entity/project", settings={"capture_client_info": False})`.
+To disable client information capture, pass the setting during W&B Weave client initialization: `weave.init("entity/project", settings={"capture_client_info": False})`.

 ---

support/weave/articles/how-can-i-disable-code-capture.mdx

Lines changed: 4 additions & 2 deletions
@@ -3,8 +3,10 @@ title: "How can I disable code capture?"
 keywords: ["Code Capture"]
 ---

-You can disable code capture during Weave client initialization: `weave.init("entity/project", settings={"capture_code": False})`.
-You can also use the [environment variable](/weave/guides/core-types/env-vars) `WEAVE_CAPTURE_CODE=false`.
+To disable code capture, use one of the following methods:
+
+- During W&B Weave client initialization, set `capture_code` to `False`: `weave.init("entity/project", settings={"capture_code": False})`.
+- Set the [environment variable](/weave/guides/core-types/env-vars) `WEAVE_CAPTURE_CODE=false`.

 ---
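
The environment-variable method in `how-can-i-disable-code-capture.mdx` can also be applied from inside a Python entry point. The sketch below is a minimal illustration, assuming Weave reads `WEAVE_CAPTURE_CODE` when `weave.init()` runs, so the variable has to be set first. The `weave.init()` call is left as a comment because it needs the SDK and a real project.

```python
import os

# Assumption: Weave reads WEAVE_CAPTURE_CODE when weave.init() runs,
# so the variable is set before any initialization happens.
os.environ["WEAVE_CAPTURE_CODE"] = "false"

# With the SDK installed, initialization would follow, for example:
# import weave
# weave.init("entity/project")

print(os.environ["WEAVE_CAPTURE_CODE"])
```

Setting the variable in the shell (`export WEAVE_CAPTURE_CODE=false`) has the same effect without touching the code.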

support/weave/articles/how-can-i-disable-system-information-cap.mdx

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: "How can I disable system information capture?"
 keywords: ["System Info"]
 ---

-You can disable system information capture during Weave client initialization: `weave.init("entity/project", settings={"capture_system_info": False})`.
+To disable system information capture, pass the setting during Weave client initialization: `weave.init("entity/project", settings={"capture_system_info": False})`.

 ---

support/weave/articles/how-do-i-render-markdown-in-the-ui.mdx

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: "How do I render Markdown in the UI?"
 keywords: ["UI Rendering"]
 ---

-Wrap your string with `weave.Markdown(...)` before saving, and use `weave.publish(...)` to store it. Weave uses the objects type to determine rendering, and `weave.Markdown` maps to a known UI renderer. The value will be shown as a formatted Markdown object in the UI. For a full code sample, see [Viewing calls](/weave/guides/tracking/tracing#viewing-calls).
+Before saving, wrap your string with `weave.Markdown(...)`, then use `weave.publish(...)` to store it. The object's type determines rendering, and `weave.Markdown` maps to a known UI renderer. The UI displays the value as a formatted Markdown object. For a full code sample, see [Viewing calls](/weave/guides/tracking/tracing#viewing-calls).

 ---

support/weave/articles/how-do-i-render-python-datetime-values-i.mdx

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: "How do I render Python datetime values in the UI?"
 keywords: ["UI Rendering"]
 ---

-Use Pythons `datetime.datetime` (with timezone info), and publish the object using `weave.publish(...)`. Weave recognizes this type and renders it as a timestamp.
+Use Python's `datetime.datetime` with timezone information, and publish the object using `weave.publish(...)`. Weave recognizes this type and renders it as a timestamp.

 ---
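
A minimal sketch of the timezone-aware value that `how-do-i-render-python-datetime-values-i.mdx` describes, using only the standard library. The `weave.publish(...)` call is shown as a comment because it needs the SDK and an initialized project, and the object name `"release_time"` is a hypothetical placeholder.

```python
from datetime import datetime, timezone

# A naive datetime has no tzinfo; an aware one carries an explicit timezone.
naive = datetime(2025, 1, 15, 9, 30)
aware = datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc)

print(naive.tzinfo)       # None
print(aware.isoformat())  # 2025-01-15T09:30:00+00:00

# With the SDK installed and weave.init(...) already called, the aware value
# would be published like this (the name is a hypothetical placeholder):
# weave.publish(aware, name="release_time")
```

How Weave treats the naive value is not stated in the article, so the sketch only shows how to tell the two apart before publishing.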

support/weave/articles/how-is-weave-data-ingestion-calculated.mdx

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: "How is Weave data ingestion calculated?"
 keywords: ["Data Capture"]
 ---

-We define ingested bytes as bytes that we receive, process, and store on your behalf. This includes trace metadata, LLM inputs/outputs, and any other information you explicitly log to Weave, but does not include communication overhead (e.g., HTTP headers) or any other data that is not placed in long-term storage. We count bytes as "ingested" only once at the time they are received and stored.
+Weave defines ingested bytes as bytes that Weave receives, processes, and stores on your behalf. Ingested bytes include trace metadata, LLM inputs/outputs, and any other information you explicitly log to Weave. Ingested bytes don't include communication overhead (for example, HTTP headers) or any other data that isn't placed in long-term storage. Weave counts bytes as "ingested" only once, when it receives and stores them.

 ---

support/weave/articles/long-eval-clean-up-times.mdx

Lines changed: 14 additions & 8 deletions
@@ -3,13 +3,17 @@ title: "Long eval clean up times"
 keywords: ["Performance", "Evaluation"]
 ---

-The following two methods should be used together in order to improve performance when running evaluations with large datasets.
+This page describes two methods to use together to reduce long clean-up times when you run W&B Weave evaluations with large datasets. It's intended for users who have noticed extended delays after their evaluation code finishes but before the program exits.

-### Flushing
+The following sections describe how to flush pending background work and how to increase client parallelism.

-When running evaluations with large datasets, you may experience a long period of time before program execution, while the dataset is being uploaded in background threads. This generally occurs when main thread execution finished before background cleanup is complete. Calling `client.flush()` will force all background tasks to be processed in the main thread, ensuring parallel processing during main thread execution. This can improve performance when user code completes before data has been uploaded to the server.
+## Flush pending background work

-Example:
+Flushing forces pending background work to complete in parallel with your main thread, rather than waiting for it after your code finishes.
+
+When you run evaluations with large datasets, you may experience a long delay before program execution completes, while the dataset uploads in background threads. This occurs when main thread execution finishes before background clean-up completes. Calling `client.flush()` forces all background tasks to process in the main thread, ensuring parallel processing during main thread execution. This can improve performance when user code completes before data uploads to the server.
+
+The following example flushes pending background work after an evaluation:

 ```python
 client = weave.init("fast-upload")
@@ -20,13 +24,15 @@ result = evaluation.Evaluate(dataset_id="my_dataset_id")
 client.flush()
 ```

-### Increasing client parallelism
+## Increase client parallelism
+
+Increasing client parallelism gives Weave more threads to use for background work such as dataset uploads, which can further reduce clean-up time alongside flushing.

-Client parallelism is automatically determined based on the environment, but can be set manually using the following environment variable:
+Weave determines client parallelism automatically based on the environment, but you can set it manually using the following environment variable:

-- `WEAVE_CLIENT_PARALLELISM`: The number of threads available for parallel processing. Increasing this number will increase the number of threads available for parallel processing, potentially improving the performance of background tasks like dataset uploads.
+- `WEAVE_CLIENT_PARALLELISM`: The number of threads available for parallel processing. Increasing this value can improve the performance of background tasks such as dataset uploads.

-This can also be set programmatically using the `settings` argument to `weave.init()`:
+You can also set this programmatically using the `settings` argument to `weave.init()`:

 ```python
 client = weave.init("fast-upload", settings={"client_parallelism": 100})

support/weave/articles/os-errors-too-many-open-files.mdx

Lines changed: 3 additions & 3 deletions
@@ -3,11 +3,11 @@ title: "OS errors - Too many open files"
 keywords: ["Performance"]
 ---

-### `[Errno 24]: Too many open files`
+## `[Errno 24]: Too many open files`

-This error occurs when the number of open files exceeds the limit set by your operating system. In Weave, this may happen because you're working with large image datasets. Weave uses `PIL` for image processing, which keeps file descriptors open for the duration of the program.
+This error occurs when the number of open files exceeds the limit set by your operating system. In Weave, this can happen because you're working with large image datasets. Weave uses `PIL` for image processing, which keeps file descriptors open while the program runs.

-To resolve this issue, increase the system limit for open files to `65,536` using `ulimit`:
+To resolve this issue, increase the system limit for open files to `65536` using `ulimit`:

 ```bash
 ulimit -n 65536

support/weave/articles/server-response-caching.mdx

Lines changed: 23 additions & 23 deletions
@@ -3,61 +3,61 @@ title: "Server response caching"
 keywords: ["Performance"]
 ---

-Weave provides server response caching to improve performance when making repeated queries or working with limited network bandwidth. While currently disabled by default, this feature is expected to become the default behavior in a future release.
+Weave provides server response caching to improve performance when you run repeated queries or work with limited network bandwidth. This article explains when caching helps, how to enable it, which requests Weave caches, and how the cache uses disk space, so you can decide whether to turn it on and plan for its storage footprint. The feature is deactivated by default.

-### When to use caching
+## When to use caching

-Server response caching is particularly beneficial when:
+Server response caching is helpful when:

-- You frequently run the same queries
-- You have limited network bandwidth
-- You're working in an environment with high latency
-- You're developing offline and want to cache responses for later use
+- You frequently run the same queries.
+- You have limited network bandwidth.
+- You're working in an environment with high latency.
+- You're developing offline and want to cache responses for later use.

-This feature is especially useful when running repeated evaluations on a dataset, as it allows caching the dataset between runs.
+This feature is useful when you run repeated evaluations on a dataset, because Weave can cache the dataset between runs.

-### How to enable caching
+## Enable caching

-To enable caching, you can set the following environment variables:
+To enable caching, set the following environment variables. `WEAVE_USE_SERVER_CACHE` turns the cache on, and the other variables let you tune its size and location.

 ```bash
 # Enable server response caching
 export WEAVE_USE_SERVER_CACHE=true

-# Set cache size limit (default is 1GB)
+# Set cache size limit (default is 1 GB)
 export WEAVE_SERVER_CACHE_SIZE_LIMIT=1000000000

 # Set cache directory (optional, defaults to temporary directory)
 export WEAVE_SERVER_CACHE_DIR=/path/to/cache
 ```

-### Caching behavior
+## Caching behavior

-Technically, this feature will cache idempotent requests against the server. Specifically, we cache:
+This section describes which server requests Weave caches, so you can predict which operations benefit from caching. The feature caches idempotent requests against the server. Weave caches the following requests:

 - `obj_read`
 - `table_query`
 - `table_query_stats`
 - `refs_read_batch`
 - `file_content_read`

-### Cache size and storage details
+## Cache size and storage details

-The cache size is controlled by `WEAVE_SERVER_CACHE_SIZE_LIMIT` (in bytes). The actual disk space used consists of three components:
+Use this section to estimate the disk space the cache requires, so you can size `WEAVE_SERVER_CACHE_SIZE_LIMIT` for your environment. `WEAVE_SERVER_CACHE_SIZE_LIMIT` (in bytes) controls the cache size. The actual disk space used consists of three components:

-1. A constant 32KB checksum file
-2. A Write-Ahead Log (WAL) file up to ~4MB per running client (automatically removed when the program exits)
-3. The main database file, which is at least 32KB and at most `WEAVE_SERVER_CACHE_SIZE_LIMIT`
+- A constant 32 KB checksum file.
+- A Write-Ahead Log (WAL) file up to ~4 MB per running client. Weave removes the WAL file automatically when the program exits.
+- The main database file, which is at least 32 KB and at most `WEAVE_SERVER_CACHE_SIZE_LIMIT`.

 Total disk space used:

-- While running >= 32KB + ~4MB + cache size
-- After exit >= 32KB + cache size
+- While running >= 32 KB + ~4 MB + cache size.
+- After exit >= 32 KB + cache size.

-For example, with a 5MB cache limit:
+For example, with a 5 MB cache limit:

-- While running: ~9MB maximum
-- After exit: ~5MB maximum
+- While running: ~9 MB maximum.
+- After exit: ~5 MB maximum.

 ---
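
The storage arithmetic in `server-response-caching.mdx` can be checked with a few lines of Python. This is a sketch only: the helper name `max_cache_disk_usage` is hypothetical, and it assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes), which the article does not specify.

```python
CHECKSUM_FILE_BYTES = 32_000   # constant 32 KB checksum file
WAL_MAX_BYTES = 4_000_000      # WAL file, up to ~4 MB per running client


def max_cache_disk_usage(size_limit_bytes: int, running_clients: int = 0) -> int:
    """Upper bound on disk usage for a given WEAVE_SERVER_CACHE_SIZE_LIMIT."""
    wal = WAL_MAX_BYTES * running_clients  # WAL files are removed on exit
    return CHECKSUM_FILE_BYTES + wal + size_limit_bytes


limit = 5_000_000  # the 5 MB example from the article
print(max_cache_disk_usage(limit, running_clients=1))  # 9032000, about 9 MB
print(max_cache_disk_usage(limit))                     # 5032000, about 5 MB
```

The two results line up with the article's "~9 MB maximum" while running and "~5 MB maximum" after exit.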

support/weave/articles/trace-data-is-truncated.mdx

Lines changed: 4 additions & 2 deletions
@@ -3,9 +3,9 @@ title: "Trace data is truncated"
 keywords: ["Trace Data"]
 ---

-Sometimes, large trace data is partially cut off in the Weave UI. This problem occurs because default trace output is a raw, custom Python object that Weave doesn't know how to serialize.
+Sometimes, large trace data is cut off in the Weave UI. This happens because the default trace output is a raw, custom Python object that Weave can't serialize. This page shows how to expose the full trace data so it displays in the UI.

-To ensure that large trace data isn't cut off, define a dictionary of strings to return all trace data.
+To prevent large trace data from being cut off, define a `to_dict` method that returns a dictionary of strings containing all trace data. Because Weave can serialize dictionaries, this approach gives the UI access to the full object state. The following example shows the pattern:

 ```python
 import weave
@@ -26,6 +26,8 @@ def make_my_obj():
     return MyObj(x)
 ```

+With this `to_dict` method in place, Weave can serialize the object and display its contents in the trace UI instead of truncating the raw representation.
+
 ---

 {/* AUTO-GENERATED: tab badges */}
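
The diff for `trace-data-is-truncated.mdx` elides the middle of the article's example, so here is a standalone sketch of the `to_dict` pattern that runs without the Weave SDK. The field names are hypothetical, and the statement that Weave uses `to_dict` during serialization comes from the article, not from this code.

```python
class MyObj:
    """Custom object whose state should be fully visible in a trace."""

    def __init__(self, x: int, label: str) -> None:
        self.x = x
        self.label = label

    def to_dict(self) -> dict[str, str]:
        # Return every field as a string so the result is a plain,
        # serializable dictionary.
        return {"x": str(self.x), "label": self.label}


obj = MyObj(42, "example")
print(obj.to_dict())  # {'x': '42', 'label': 'example'}
```

In the article's version, a function decorated as a Weave op returns such an object, and the dictionary is what appears in the trace UI.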
