Commit 0163100

author
Michal Warda
committed
docs: Update README and torque documentation to include the new uniqueOneOf factory function, its usage examples, and enhancements in dataset generation with background token counting and new export formats.
1 parent faa80e3 commit 0163100

2 files changed

Lines changed: 105 additions & 9 deletions

README.md

Lines changed: 69 additions & 9 deletions
@@ -31,9 +31,15 @@ await T.generateDataset(
     T.oneOf([
       // pick one randomly (weights are optional)
       { value: T.assistant({ content: "Hello!" }), weight: 0.3 }, // static
-      T.generatedAssistant({ prompt: "Respond to greeting" }), // AI generated, gets remaining weight
+      T.generatedAssistant({
+        prompt: "Respond to greeting",
+        reasoning: T.generatedReasoning({
+          prompt: "Reason about the greeting",
+        }),
+        // or reasoning: reasoning({ content: "...." }),
+      }), // AI generated, gets remaining weight
     ]),
-    T.times(T.between(1, 3), [
+    T.times(between(1, 3), [
       T.generatedUser({
         prompt: "Chat about weather. Optionally mentioning previous message",
       }),
@@ -261,6 +267,42 @@ const schema = () => [
 
 The `collection` name identifies the shared pool (so multiple `oneOf` calls can coordinate). The `id` property must be a string, number, or boolean and is used to track uniqueness. Torque throws if the pool is exhausted, making it easy to guarantee perfect round-robin coverage.
 
+#### Using `uniqueOneOf` factory function
+
+For a simpler API, use `uniqueOneOf` to automatically generate IDs and create a reusable function. This is especially useful when you want to create the unique selection function outside of your schema:
+
+```ts
+import { uniqueOneOf } from "@qforge/torque";
+
+// Create the factory function outside generation
+const tools = [weatherTool, calendarTool, flightTool];
+const oneOfTools = uniqueOneOf(tools);
+
+// Or with weighted options
+const weightedTools = [
+  { value: weatherTool, weight: 0.5 },
+  { value: calendarTool, weight: 0.3 },
+  flightTool, // unweighted, gets remaining weight
+];
+const oneOfWeightedTools = uniqueOneOf(weightedTools);
+
+const schema = () => {
+  const tool = oneOfTools(); // Returns a unique tool each time
+  return [
+    tool.toolFunction(),
+    generatedUser({ prompt: "Ask question requiring this tool" }),
+    generatedToolCall(tool, "t1"),
+    generatedToolCallResult(tool, "t1"),
+  ];
+};
+```
+
+The `uniqueOneOf` factory automatically:
+
+- Generates unique IDs for each item
+- Creates a unique collection name
+- Returns a function that enforces uniqueness across calls
+
 > 💡 See weighted example: [`examples/weighted-one-of.ts`](examples/weighted-one-of.ts)
 > 💡 Full utilities demo: [`examples/composition-utilities.ts`](examples/composition-utilities.ts) | [▶️ Try in Browser](https://stackblitz.com/github/qforge-dev/torque/tree/main/stackblitz-templates/composition-utilities)
 
@@ -369,33 +411,51 @@ await generateDataset(schema, {
 - If omitted, a random seed is generated and displayed in the CLI
 - Seeds control both `torque` random selections and AI model sampling (when supported by the provider)
 
+### Background Token Counting
+
+Token counts for each row are computed off the main thread using a worker pool so dataset generation stays responsive. Configure the pool with `tokenCounterWorkers` (default: `3`), or disable counting entirely by setting it to `0`.
+
+```typescript
+await generateDataset(schema, {
+  count: 20,
+  model: openai("gpt-5-mini"),
+  tokenCounterWorkers: 5, // spawn 5 token-counting workers
+});
+```
+
 ### Output Formats
 
-Choose your preferred output format for generated datasets:
+Choose your preferred output file format and data structure:
 
 ```typescript
-// Export as JSONL (default - line-delimited JSON)
+// Export as JSONL with default ai-sdk structure (default)
 await generateDataset(schema, {
   count: 100,
   model: openai("gpt-4o-mini"),
-  format: "jsonl", // default, can be omitted
+  format: "jsonl",
   output: "data/dataset.jsonl",
 });
 
-// Export as Parquet (columnar format, efficient for analytics)
+// Export in OpenAI Chat Completions format (tools + messages structure)
 await generateDataset(schema, {
   count: 100,
   model: openai("gpt-4o-mini"),
-  format: "parquet",
-  output: "data/dataset.parquet",
+  format: "jsonl",
+  exportFormat: "chat_template",
+  output: "data/finetune.jsonl",
 });
 ```
 
-**Supported formats:**
+**Supported File Formats (`format`):**
 
 - **`jsonl`** (default) - JSON Lines format, one row per line. Best for streaming and line-by-line processing.
 - **`parquet`** - Apache Parquet columnar format. More efficient for large datasets and analytics tools (e.g., Pandas, DuckDB, Apache Spark).
 
+**Supported Data Structures (`exportFormat`):**
+
+- **`ai-sdk`** (default) - Internal Torque format, compatible with Vercel AI SDK. Includes schema metadata, tool definitions, and full message objects.
+- **`chat_template`** - OpenAI Chat Completions compatible format. Flattened message structure with `tools` and `messages` top-level keys. Ideal for fine-tuning or direct API usage.
+
 Both formats write rows incrementally as they're generated, so large datasets won't consume excessive memory.
 
 > 💡 When `format` is specified without `output`, the file extension is automatically set based on the format.
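
Reviewer's illustration of the `chat_template` bullet in the hunk above: a minimal sketch of what a row with top-level `tools` and `messages` keys might look like once serialized as one JSONL line. The type names (`ChatMessage`, `ToolDefinition`, `ChatTemplateRow`) and the helper `toJsonlLine` are hypothetical and are not taken from Torque; the fields Torque actually emits may differ.

```typescript
// Hypothetical shapes for illustration only; not Torque's real row types.
type ChatMessage = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

type ToolDefinition = {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema for the arguments
  };
};

type ChatTemplateRow = {
  tools: ToolDefinition[];
  messages: ChatMessage[];
};

// Serialize one row as a single JSONL line (no pretty-printing).
function toJsonlLine(row: ChatTemplateRow): string {
  return JSON.stringify(row);
}

const exampleRow: ChatTemplateRow = {
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Look up the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
  messages: [
    { role: "user", content: "What's the weather in Oslo?\nThanks!" },
    { role: "assistant", content: "Let me check that for you." },
  ],
};

const line = toJsonlLine(exampleRow);
console.log(line);
```

Because `JSON.stringify` escapes newlines inside string values, each row stays on one physical line, which is what line-by-line JSONL readers expect.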

packages/torque/README.md

Lines changed: 36 additions & 0 deletions
@@ -270,6 +270,42 @@ const schema = () => [
 
 The `collection` name identifies the shared pool (so multiple `oneOf` calls can coordinate). The `id` property must be a string, number, or boolean and is used to track uniqueness. Torque throws if the pool is exhausted, making it easy to guarantee perfect round-robin coverage.
 
+#### Using `uniqueOneOf` factory function
+
+For a simpler API, use `uniqueOneOf` to automatically generate IDs and create a reusable function. This is especially useful when you want to create the unique selection function outside of your schema:
+
+```ts
+import { uniqueOneOf } from "@qforge/torque";
+
+// Create the factory function outside generation
+const tools = [weatherTool, calendarTool, flightTool];
+const oneOfTools = uniqueOneOf(tools);
+
+// Or with weighted options
+const weightedTools = [
+  { value: weatherTool, weight: 0.5 },
+  { value: calendarTool, weight: 0.3 },
+  flightTool, // unweighted, gets remaining weight
+];
+const oneOfWeightedTools = uniqueOneOf(weightedTools);
+
+const schema = () => {
+  const tool = oneOfTools(); // Returns a unique tool each time
+  return [
+    tool.toolFunction(),
+    generatedUser({ prompt: "Ask question requiring this tool" }),
+    generatedToolCall(tool, "t1"),
+    generatedToolCallResult(tool, "t1"),
+  ];
+};
+```
+
+The `uniqueOneOf` factory automatically:
+
+- Generates unique IDs for each item
+- Creates a unique collection name
+- Returns a function that enforces uniqueness across calls
+
 > 💡 See weighted example: [`examples/weighted-one-of.ts`](examples/weighted-one-of.ts)
 > 💡 Full utilities demo: [`examples/composition-utilities.ts`](examples/composition-utilities.ts) | [▶️ Try in Browser](https://stackblitz.com/github/qforge-dev/torque/tree/main/stackblitz-templates/composition-utilities)
 
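
Reviewer's illustration of the behavior the added `uniqueOneOf` section describes (a factory that returns a picker enforcing uniqueness across calls and failing once the pool is exhausted). This is a minimal sketch, not Torque's implementation: `makeUniquePicker` and its injectable `rng` parameter are hypothetical names, weights are omitted for brevity, and the private array simply stands in for the automatically created collection.

```typescript
// Hypothetical stand-in for illustration; not the library's uniqueOneOf.
function makeUniquePicker<T>(
  items: readonly T[],
  rng: () => number = Math.random, // expected to return a value in [0, 1)
): () => T {
  // Private copy of the pool; each call removes the item it returns.
  const remaining = [...items];

  return () => {
    if (remaining.length === 0) {
      // Mirrors the documented behavior of throwing on an exhausted pool.
      throw new Error("unique pool exhausted");
    }
    const index = Math.floor(rng() * remaining.length);
    // splice removes the chosen item so it cannot be returned again.
    return remaining.splice(index, 1)[0] as T;
  };
}

// Create the picker once, outside the schema, then call it per row.
const pickTool = makeUniquePicker(["weather", "calendar", "flight"]);
const seen = [pickTool(), pickTool(), pickTool()];
console.log(seen); // all three tools, in random order, with no repeats
```

Creating the picker outside the schema function is what lets its state persist between rows; a picker created inside the schema would start from a full pool on every call and could repeat items.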
