Commit ce4640f

docs: document data loader APIs
1 parent b0eee98 commit ce4640f

1 file changed

Lines changed: 28 additions & 1 deletion

README.md

@@ -213,8 +213,35 @@ Runnable examples:
 
 - Built-in loaders: MNIST, Fashion-MNIST, CIFAR-10
 - URI-backed data sources: `file://`, `https://`, `hf+https://`, and `hf://...`
+- Dataset operations: deterministic shuffle/split, stratified split, filter/map/transform views, batch flows, and epoch flows
+- Raw dataset parsers: CSV, TSV, JSON arrays/objects, JSON Lines (`.jsonl`, `.ndjson`)
+- Type-safe transform DSLs: image/tensor transforms plus suspendable raw data pipelines
 - Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG)
-- Type-safe transform DSL: resize, crop, normalize, toTensor
+
+```kotlin
+val raw = JvmDataSourceResolver().rawDataset {
+    from("hf://datasets/org/repo@main/train.jsonl")
+    format(DataFormat.JSON_LINES)
+    cachePolicy(CachePolicy.Use)
+}
+
+val withoutLabel = dataPipeline<RawDataset>()
+    .stage(
+        dataTransformer(
+            name = "drop-label",
+            outputSchema = { schema -> DataSchema(schema.columns - "label") }
+        ) { dataset ->
+            val columns = dataset.schema.columns - "label"
+            dataset.copy(
+                schema = DataSchema(columns),
+                rows = dataset.rows.map { row ->
+                    RawDataRow(row.values.filterKeys { key -> key in columns })
+                }
+            )
+        }
+    )
+    .execute(raw)
+```
 
 
 ### Edge AI: Arduino / C99 Export
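
Note on the added README snippet: the `drop-label` stage removes a column from both the schema and every row. The sketch below reproduces only that transformation so it can be run without the library. `DataSchema`, `RawDataRow`, and `RawDataset` here are hypothetical stand-ins declared locally; their shapes are assumed from how the README example uses them, and the real types may differ. `dropColumn` is likewise a name introduced only for illustration.

```kotlin
// Hypothetical stand-ins for the types used in the README example.
// Shapes are assumptions; the library's real definitions may differ.
data class DataSchema(val columns: List<String>)
data class RawDataRow(val values: Map<String, Any?>)
data class RawDataset(val schema: DataSchema, val rows: List<RawDataRow>)

// Return a copy of the dataset with one column removed from the schema
// and from every row. The input dataset is left unchanged.
fun dropColumn(dataset: RawDataset, name: String): RawDataset {
    val columns = dataset.schema.columns - name
    val keep = columns.toSet() // set lookup avoids a list scan per key
    return dataset.copy(
        schema = DataSchema(columns),
        rows = dataset.rows.map { row ->
            RawDataRow(row.values.filterKeys { key -> key in keep })
        }
    )
}

fun main() {
    val raw = RawDataset(
        schema = DataSchema(listOf("text", "label")),
        rows = listOf(
            RawDataRow(mapOf("text" to "hello", "label" to 1)),
            RawDataRow(mapOf("text" to "world", "label" to 0))
        )
    )
    val withoutLabel = dropColumn(raw, "label")
    println(withoutLabel.schema.columns)        // prints [text]
    println(withoutLabel.rows.first().values)   // prints {text=hello}
}
```

Keeping the schema and the row values in sync inside one function mirrors what the README stage does with its `outputSchema` lambda and transformer body; whether the library validates that the two agree is not stated in the diff.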
