docs: Update README and torque documentation to include the new uniqueOneOf factory function, its usage examples, and enhancements in dataset generation with background token counting and new export formats.
-    T.generatedAssistant({ prompt: "Respond to greeting" }), // AI generated, gets remaining weight
+    T.generatedAssistant({
+      prompt: "Respond to greeting",
+      reasoning: T.generatedReasoning({
+        prompt: "Reason about the greeting",
+      }),
+      // or reasoning: reasoning({ content: "...." }),
+    }), // AI generated, gets remaining weight
   ]),
-  T.times(T.between(1, 3), [
+  T.times(between(1, 3), [
     T.generatedUser({
       prompt: "Chat about weather. Optionally mentioning previous message",
     }),
@@ -261,6 +267,42 @@ const schema = () => [

 The `collection` name identifies the shared pool (so multiple `oneOf` calls can coordinate). The `id` property must be a string, number, or boolean and is used to track uniqueness. Torque throws if the pool is exhausted, making it easy to guarantee perfect round-robin coverage.

+#### Using `uniqueOneOf` factory function
+
+For a simpler API, use `uniqueOneOf` to automatically generate IDs and create a reusable function. This is especially useful when you want to create the unique selection function outside of your schema:
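The hunk ends before the README's own example. As a rough illustration of the behaviour described here (a named pool shared across calls, uniqueness tracked by `id`, an error once the pool is exhausted, and a factory that generates the IDs for you), the following is a standalone sketch. It is not Torque's implementation: `pickUnique` and `makeUniquePicker` are hypothetical names, and the real `uniqueOneOf` signature is not visible in this diff.

```typescript
// Hypothetical stand-in for the behaviour described above; not Torque's code.
type Id = string | number | boolean;

interface Candidate<T> {
  id: Id;
  value: T;
}

// One set of used ids per named collection, shared across calls.
const usedByCollection = new Map<string, Set<Id>>();

function poolFor(collection: string): Set<Id> {
  const existing = usedByCollection.get(collection);
  if (existing) return existing;
  const created = new Set<Id>();
  usedByCollection.set(collection, created);
  return created;
}

// Returns the first candidate whose id is still unused in the collection.
// Order is fixed here for readability; a real picker would likely be random.
function pickUnique<T>(collection: string, candidates: Candidate<T>[]): T {
  const used = poolFor(collection);
  const next = candidates.find((c) => !used.has(c.id));
  if (next === undefined) {
    throw new Error(`Unique pool "${collection}" is exhausted`);
  }
  used.add(next.id);
  return next.value;
}

// Factory in the spirit of `uniqueOneOf`: ids come from array positions,
// and the returned function can be created outside any schema.
let nextCollection = 0;
function makeUniquePicker<T>(values: T[]): () => T {
  const collection = `auto-${nextCollection++}`;
  const candidates = values.map((value, index) => ({ id: index, value }));
  return () => pickUnique(collection, candidates);
}

const pickTopic = makeUniquePicker(["weather", "sports", "music"]);
const picked = [pickTopic(), pickTopic(), pickTopic()];
```

Each call to `pickTopic` consumes one entry from its private pool, so a fourth call would throw. That is the property the README relies on for full coverage without repeats.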
 - If omitted, a random seed is generated and displayed in the CLI
 - Seeds control both `torque` random selections and AI model sampling (when supported by the provider)

+### Background Token Counting
+
+Token counts for each row are computed off the main thread using a worker pool so dataset generation stays responsive. Configure the pool with `tokenCounterWorkers` (default: `3`), or disable counting entirely by setting it to `0`.
-Choose your preferred output format for generated datasets:
+Choose your preferred output file format and data structure:

 ```typescript
-// Export as JSONL (default - line-delimited JSON)
+// Export as JSONL with default ai-sdk structure (default)
 await generateDataset(schema, {
   count: 100,
   model: openai("gpt-4o-mini"),
-  format: "jsonl", // default, can be omitted
+  format: "jsonl",
   output: "data/dataset.jsonl",
 });

-// Export as Parquet (columnar format, efficient for analytics)
+// Export in OpenAI Chat Completions format (tools + messages structure)
 await generateDataset(schema, {
   count: 100,
   model: openai("gpt-4o-mini"),
-  format: "parquet",
-  output: "data/dataset.parquet",
+  format: "jsonl",
+  exportFormat: "chat_template",
+  output: "data/finetune.jsonl",
 });
 ```

-**Supported formats:**
+**Supported File Formats (`format`):**

 - **`jsonl`** (default) - JSON Lines format, one row per line. Best for streaming and line-by-line processing.
 - **`parquet`** - Apache Parquet columnar format. More efficient for large datasets and analytics tools (e.g., Pandas, DuckDB, Apache Spark).

+**Supported Data Structures (`exportFormat`):**
+
+- **`ai-sdk`** (default) - Internal Torque format, compatible with Vercel AI SDK. Includes schema metadata, tool definitions, and full message objects.
+- **`chat_template`** - OpenAI Chat Completions compatible format. Flattened message structure with `tools` and `messages` top-level keys. Ideal for fine-tuning or direct API usage.
+
 Both formats write rows incrementally as they're generated, so large datasets won't consume excessive memory.

 > 💡 When `format` is specified without `output`, the file extension is automatically set based on the format.
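To make the `chat_template` description in this hunk more concrete, the sketch below flattens a hypothetical internal row into an object with `tools` and `messages` as its top-level keys, following the publicly documented OpenAI Chat Completions shape for function tools. `InternalRow` and `toChatTemplate` are assumptions for illustration only; the diff does not show Torque's actual row schema or exporter.

```typescript
// Hypothetical internal row; Torque's real `ai-sdk` row shape is not shown in this diff.
interface InternalRow {
  meta: { seed: number };
  tools: { name: string; description: string; parameters: object }[];
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

// OpenAI Chat Completions style: each tool is wrapped as
// { type: "function", function: { name, description, parameters } }.
interface ChatTemplateRow {
  tools: {
    type: "function";
    function: { name: string; description: string; parameters: object };
  }[];
  messages: { role: string; content: string }[];
}

// Drop row-level metadata and keep only the two top-level keys.
function toChatTemplate(row: InternalRow): ChatTemplateRow {
  return {
    tools: row.tools.map((tool) => ({
      type: "function" as const,
      function: { ...tool },
    })),
    messages: row.messages.map(({ role, content }) => ({ role, content })),
  };
}

const exampleRow: InternalRow = {
  meta: { seed: 42 },
  tools: [
    {
      name: "get_weather",
      description: "Look up the current weather for a city",
      parameters: { type: "object", properties: { city: { type: "string" } } },
    },
  ],
  messages: [
    { role: "user", content: "What's the weather in Oslo?" },
    { role: "assistant", content: "Let me check." },
  ],
};

// One JSON object per line is what a JSONL file would contain.
const jsonlLine = JSON.stringify(toChatTemplate(exampleRow));
```

Keeping the conversion a pure function of a single row fits the incremental writing mentioned above, since each row can be serialized and appended as soon as it is generated.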
packages/torque/README.md
Lines changed: 36 additions & 0 deletions
@@ -270,6 +270,42 @@ const schema = () => [

 The `collection` name identifies the shared pool (so multiple `oneOf` calls can coordinate). The `id` property must be a string, number, or boolean and is used to track uniqueness. Torque throws if the pool is exhausted, making it easy to guarantee perfect round-robin coverage.

+#### Using `uniqueOneOf` factory function
+
+For a simpler API, use `uniqueOneOf` to automatically generate IDs and create a reusable function. This is especially useful when you want to create the unique selection function outside of your schema: