Commit 0c4672c

Update version to 0.5.0 and add API reference documentation

Author: birchkwok
1 parent: a113dec

26 files changed

Lines changed: 4171 additions & 6 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "numpack"
-version = "0.4.5"
+version = "0.5.0"
 edition = "2021"
 
 [lib]

docs/02_core_operations.md

Lines changed: 60 additions & 0 deletions
@@ -12,6 +12,7 @@ This document covers all core NumPack operations including save, load, replace,
 - [Drop Operations](#drop-operations)
 - [Random Access](#random-access)
 - [Metadata Operations](#metadata-operations)
+- [Array Operations](#array-operations)
 - [Stream Loading](#stream-loading)
 - [File Management](#file-management)
 
@@ -531,6 +532,65 @@ with NumPack("data.npk") as npk:
 
 ---
 
+## Array Operations
+
+### `clone(source_name, target_name)`
+
+Clone an existing array to a new array name.
+
+#### Parameters
+
+- `source_name` (str): Name of the source array to clone
+- `target_name` (str): Name for the cloned array
+
+#### Example
+
+```python
+with NumPack("data.npk") as npk:
+    # Save original data
+    npk.save({'original': np.random.rand(100, 50)})
+
+    # Clone to new array
+    npk.clone('original', 'backup')
+
+    # Modify the clone independently
+    backup = npk.load('backup')
+    backup *= 2.0
+    npk.save({'backup': backup})
+
+    # Original is unchanged
+    original = npk.load('original')
+```
+
+#### Notes
+
+- The cloned array is independent of the original
+- Raises `KeyError` if source array doesn't exist
+- Raises `ValueError` if target array already exists
+
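
The notes for `clone` above pin down three behaviours: the copy is independent of the source, a missing source raises `KeyError`, and an existing target raises `ValueError`. A minimal sketch of those semantics follows, using a hypothetical in-memory `ToyStore` (a plain dict of nested lists). It is an illustration only and says nothing about how NumPack implements `clone` or stores arrays.

```python
import copy


class ToyStore:
    """Hypothetical in-memory stand-in used only to illustrate clone semantics."""

    def __init__(self):
        self._arrays = {}

    def save(self, arrays):
        # Store (or overwrite) each named array.
        self._arrays.update(arrays)

    def load(self, name):
        return self._arrays[name]

    def clone(self, source_name, target_name):
        if source_name not in self._arrays:
            raise KeyError(f"source array {source_name!r} does not exist")
        if target_name in self._arrays:
            raise ValueError(f"target array {target_name!r} already exists")
        # Deep copy so later edits to the clone never touch the source.
        self._arrays[target_name] = copy.deepcopy(self._arrays[source_name])


store = ToyStore()
store.save({"original": [[1.0, 2.0], [3.0, 4.0]]})
store.clone("original", "backup")

# Modify the clone in place; the original keeps its values.
store.load("backup")[0][0] = 99.0
```

The two error checks run before any data is copied, so a failed call leaves the store unchanged.
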
+### `get_io_stats()`
+
+Get I/O statistics for the NumPack instance.
+
+#### Returns
+
+- `Dict[str, Any]`: Dictionary containing backend statistics
+
+#### Example
+
+```python
+with NumPack("data.npk") as npk:
+    stats = npk.get_io_stats()
+    print(f"Backend: {stats['backend_type']}")
+```
+
+#### Notes
+
+- Currently returns basic backend information
+- Detailed per-call statistics may be added in future versions
+
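
Since the notes above say only basic backend information is returned for now, and `backend_type` is the only key the example shows, callers may want to read the dictionary defensively. The sketch below assumes nothing about other keys. `summarize_io_stats` is a hypothetical helper, and the sample dictionaries are made up for illustration.

```python
from typing import Any, Dict


def summarize_io_stats(stats: Dict[str, Any]) -> str:
    """Format a stats dict without assuming any key beyond what is present."""
    backend = stats.get("backend_type", "unknown")
    # Any additional keys are reported generically, sorted for stable output.
    extras = {k: v for k, v in sorted(stats.items()) if k != "backend_type"}
    if not extras:
        return f"Backend: {backend}"
    details = ", ".join(f"{k}={v}" for k, v in extras.items())
    return f"Backend: {backend} ({details})"


# Made-up inputs; real code would pass the result of npk.get_io_stats().
print(summarize_io_stats({"backend_type": "example"}))
print(summarize_io_stats({}))
```

Using `dict.get` with a default keeps the caller working if a future version adds keys or drops one.
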
+---
+
 ## Stream Loading
 
 ### `stream_load(array_name, buffer_size=None)`

docs/06_quick_reference.md

Lines changed: 7 additions & 1 deletion
@@ -201,7 +201,7 @@ to_torch_file('input.npk', 'output.pt')
 to_parquet_file('input.npk', 'output.parquet')
 ```
 
-### Metadata
+### Metadata & Array Operations
 
 ```python
 # Get shape
@@ -216,11 +216,17 @@ exists = npk.has_array('array')
 # Get modification time
 timestamp = npk.get_modify_time('array')
 
+# Clone an array
+npk.clone('source_array', 'target_array')
+
 # Reset (clear all)
 npk.reset()
 
 # Compact after deletions
 npk.update('array')
+
+# Get I/O statistics
+stats = npk.get_io_stats()
 ```
 
 ### Batch Modes

docs/07_io_conversion.md

Lines changed: 153 additions & 0 deletions
@@ -2,6 +2,23 @@
 
 NumPack provides comprehensive format conversion utilities for seamless integration with popular data frameworks.
 
+## Table of Contents
+
+- [Overview](#overview)
+- [PyTorch Conversion](#pytorch-conversion)
+- [PyArrow/Feather Conversion](#pyarrowfeather-conversion)
+- [Parquet Conversion](#parquet-conversion)
+- [SafeTensors Conversion](#safetensors-conversion)
+- [Other Formats](#other-formats)
+- [Text File Conversion](#text-file-conversion)
+- [Pandas Conversion](#pandas-conversion)
+- [S3 Cloud Storage](#s3-cloud-storage)
+- [Zero-Copy Utilities](#zero-copy-utilities)
+- [Supported Formats Summary](#supported-formats-summary)
+- [Best Practices](#best-practices)
+
+---
+
 ## Overview
 
 NumPack supports two types of conversions:
@@ -333,6 +350,140 @@ print("Model conversion pipeline complete!")
 
 ---
 
+## Text File Conversion
+
+### TXT Files
+
+```python
+from numpack.io import from_txt, to_txt
+
+# .txt → .npk (whitespace-delimited)
+from_txt('data.txt', 'output.npk', array_name='data', delimiter=None)
+
+# .npk → .txt
+to_txt('input.npk', 'output.txt', array_name='data', delimiter='\t')
+```
+
+**Parameters:**
+- `delimiter`: Field separator (default: whitespace)
+- `skip_header`: Number of header rows to skip
+- `dtype`: Target data type
+
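
The `delimiter`, `skip_header`, and `dtype` parameters listed above follow the usual conventions of text loaders. The stdlib-only sketch below shows what those three options mean for a small input. `parse_txt` is a hypothetical helper written for illustration and does not reflect how `from_txt` is implemented.

```python
from typing import Callable, List, Optional


def parse_txt(
    text: str,
    delimiter: Optional[str] = None,
    skip_header: int = 0,
    dtype: Callable[[str], object] = float,
) -> List[list]:
    """Parse delimited text into rows of converted values."""
    rows = []
    for line in text.splitlines()[skip_header:]:
        if not line.strip():
            continue  # ignore blank lines
        # delimiter=None splits on any run of whitespace, like str.split().
        fields = line.split(delimiter)
        rows.append([dtype(field) for field in fields])
    return rows


# One header row, then data separated by mixed spaces and tabs.
sample = "x y z\n1 2 3\n4   5\t6\n"
whitespace_rows = parse_txt(sample, skip_header=1, dtype=int)

# An explicit tab delimiter with the default float conversion.
tab_rows = parse_txt("1.5\t2.5\n", delimiter="\t")
```

With `delimiter=None`, runs of spaces and tabs count as one separator, which is why the uneven spacing in `sample` still yields three fields per row.
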
+---
+
+## Pandas Conversion
+
+### DataFrame ↔ .npk
+
+```python
+from numpack.io import from_pandas, to_pandas
+import pandas as pd
+
+# DataFrame → .npk
+df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
+from_pandas(df, 'output.npk', array_name='dataframe')
+
+# .npk → DataFrame
+df = to_pandas('input.npk', array_name='dataframe')
+print(df.columns)
+```
+
+**Notes:**
+- Numeric columns are converted to NumPy arrays
+- String columns may require special handling
+
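
The notes above distinguish numeric columns from string columns. One common form of "special handling" is to encode each string column as integer codes plus a lookup table before storing it as an array. The sketch below shows that idea on a plain dict of columns. `split_columns` and the encoding scheme are assumptions for illustration, not a description of what `from_pandas` does.

```python
from numbers import Number
from typing import Dict, List, Tuple


def split_columns(
    columns: Dict[str, list],
) -> Tuple[Dict[str, list], Dict[str, List[str]]]:
    """Return (numeric_columns, vocabularies) with strings replaced by integer codes."""
    numeric: Dict[str, list] = {}
    vocabularies: Dict[str, List[str]] = {}
    for name, values in columns.items():
        if all(isinstance(v, Number) for v in values):
            # Already numeric: keep the values unchanged.
            numeric[name] = list(values)
            continue
        # Build a sorted vocabulary and map each string to its position.
        vocab = sorted({str(v) for v in values})
        index = {label: code for code, label in enumerate(vocab)}
        numeric[name] = [index[str(v)] for v in values]
        vocabularies[name] = vocab
    return numeric, vocabularies


table = {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "label": ["cat", "dog", "cat"]}
encoded, vocabs = split_columns(table)
```

Keeping the vocabulary next to the codes lets the original strings be restored after a round trip.
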
+---
+
+## S3 Cloud Storage
+
+NumPack supports direct reading and writing to Amazon S3.
+
+### S3 ↔ .npk
+
+```python
+from numpack.io import from_s3, to_s3
+
+# Download from S3 and convert to .npk (uses default AWS credentials)
+from_s3('s3://my-bucket/data.npy', 'output.npk')
+
+# Public bucket access
+from_s3('s3://public-bucket/data.csv', 'output.npk', anon=True)
+
+# Upload .npk to S3
+to_s3('input.npk', 's3://my-bucket/output.parquet')
+
+# Specify output format
+to_s3('input.npk', 's3://my-bucket/output.csv', format='csv')
+```
+
+**Parameters:**
+- `s3_path`: S3 URI in the form `s3://bucket/path/to/file`
+- `format`: Input/output format (`'auto'`, `'numpy'`, `'csv'`, `'txt'`, `'parquet'`, `'feather'`, `'hdf5'`)
+- `**s3_kwargs`: Keyword arguments forwarded to `s3fs.S3FileSystem` (e.g., `anon=True` for public buckets)
+
+**Dependencies:** `s3fs`
+
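
The `s3_path` and `format` parameters above imply two small steps: splitting an `s3://bucket/path/to/file` URI into bucket and key, and, for `format='auto'`, choosing a format from the file name. The diff does not say how NumPack resolves `'auto'`, so the extension table below is an assumption, and `split_s3_uri` and `guess_format` are hypothetical helpers.

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlsplit

# Assumed extension-to-format mapping; the format names come from the
# parameter list above, the extensions are illustrative guesses.
EXTENSION_FORMATS = {
    ".npy": "numpy",
    ".csv": "csv",
    ".txt": "txt",
    ".parquet": "parquet",
    ".feather": "feather",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
}


def split_s3_uri(s3_path: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/file' into (bucket, key)."""
    parts = urlsplit(s3_path)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 URI: {s3_path!r}")
    return parts.netloc, parts.path.lstrip("/")


def guess_format(s3_path: str, format: str = "auto") -> str:
    """Return the explicit format, or infer one from the key's extension."""
    if format != "auto":
        return format
    _, key = split_s3_uri(s3_path)
    suffix = PurePosixPath(key).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"cannot infer format from {key!r}")
    return EXTENSION_FORMATS[suffix]


bucket, key = split_s3_uri("s3://my-bucket/path/to/data.npy")
inferred = guess_format("s3://my-bucket/output.parquet")
explicit = guess_format("s3://my-bucket/output.bin", format="csv")
```

An explicit `format` always takes precedence in this sketch, so an unrecognised extension only fails when inference is actually requested.
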
+---
+
+## Zero-Copy Utilities
+
+NumPack provides zero-copy utilities for efficient data exchange with other libraries.
+
+### DLPack Protocol
+
+```python
+from numpack.io import to_dlpack, from_dlpack
+
+# NumPy → DLPack capsule
+arr = np.random.rand(100, 50)
+capsule = to_dlpack(arr)
+
+# DLPack capsule → NumPy
+arr_restored = from_dlpack(capsule)
+```
+
+### Arrow Zero-Copy
+
+```python
+from numpack.io import numpy_to_arrow_zero_copy, arrow_to_numpy_zero_copy
+
+# NumPy → Arrow (zero-copy)
+arr = np.random.rand(100, 50).astype(np.float32)
+arrow_arr = numpy_to_arrow_zero_copy(arr)
+
+# Arrow → NumPy (zero-copy)
+numpy_arr = arrow_to_numpy_zero_copy(arrow_arr)
+```
+
+### PyTorch Zero-Copy
+
+```python
+from numpack.io import numpy_to_torch_zero_copy, torch_to_numpy_zero_copy
+
+# NumPy → PyTorch (shared memory)
+arr = np.random.rand(100, 50).astype(np.float32)
+tensor = numpy_to_torch_zero_copy(arr)
+
+# PyTorch → NumPy (shared memory)
+numpy_arr = torch_to_numpy_zero_copy(tensor)
+```
+
+### ZeroCopyArray Wrapper
+
+```python
+from numpack.io import ZeroCopyArray, wrap_for_zero_copy
+
+# Wrap array for zero-copy operations
+arr = np.random.rand(100, 50)
+zc_arr = wrap_for_zero_copy(arr)
+
+# Access as different formats
+torch_tensor = zc_arr.to_torch()
+arrow_array = zc_arr.to_arrow()
+```
+
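
"Zero-copy" in the section above means that the converted object refers to the same underlying buffer instead of duplicating it, so a write through one handle is visible through the other. The property can be illustrated without any third-party library using the standard `array` and `memoryview` types. This is only an analogy for the shared-memory behaviour described, not NumPack code.

```python
from array import array

# A contiguous buffer of four float64 values.
data = array("d", [1.0, 2.0, 3.0, 4.0])

# memoryview exposes the same buffer without copying it.
view = memoryview(data)

# A slice of a memoryview is still a view onto the original memory.
tail = view[2:]
tail[0] = 30.0  # writes through to data[2]

# By contrast, list(...) makes an independent copy.
copied = list(data)
copied[0] = -1.0  # does not affect data[0]

shared_bytes = view.nbytes  # 4 items * 8 bytes each
```

The trade-off is the usual one for shared buffers: no copy cost, but any in-place change is seen by every holder of the buffer.
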
+---
+
 ## Supported Formats Summary
 
 | Format | Import | Export | Dependencies |
@@ -345,7 +496,9 @@ print("Model conversion pipeline complete!")
 | HDF5 (.h5) ||| `h5py` |
 | Zarr ||| `zarr` |
 | CSV ||| - |
+| TXT ||| - |
 | Pandas ||| `pandas` |
+| S3 ||| `boto3`, `s3fs` |
 
 ---

docs/README.md

Lines changed: 5 additions & 1 deletion
@@ -15,6 +15,11 @@ Welcome to the NumPack API documentation! NumPack is a high-performance array st
 
 ### API Reference
 
+- **[API Reference (Detailed)](./api_reference/README.md)** ⭐ NEW
+  - Complete function-level documentation
+  - Parameters, return values, and examples
+  - Organized by module (Core, IO, Utils)
+
 - **[02. Core Operations](./02_core_operations.md)**
   - Complete API reference for all basic operations
   - `save()`, `load()`, `replace()`, `append()`, `drop()`
@@ -76,7 +81,6 @@ Welcome to the NumPack API documentation! NumPack is a high-performance array st
 | **API lookup** | [Core Operations](./02_core_operations.md) | Complete API reference |
 | **Performance optimization** | [Batch Processing](./03_batch_processing.md), [Performance Guide](./05_performance_guide.md) | 25-174x speedup |
 | **Large datasets** | [Advanced Features](./04_advanced_features.md) | Lazy loading, streaming |
-
 | **Quick answers** | [Quick Reference](./06_quick_reference.md) | Cheatsheet, common patterns |
 | **Format conversion** | [IO Conversion](./07_io_conversion.md) | PyTorch, Arrow, Parquet, SafeTensors |

docs/api_reference/README.md

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+# NumPack API Reference
+
+Complete API documentation for all NumPack modules and functions.
+
+## Documentation Structure
+
+```
+api_reference/
+├── README.md             # This file
+├── core/
+│   ├── numpack_class.md  # NumPack class
+│   ├── lazy_array.md     # LazyArray class
+│   └── batch_modes.md    # BatchModeContext & WritableBatchMode
+├── io/
+│   ├── README.md         # IO module overview
+│   ├── pytorch.md        # PyTorch conversion
+│   ├── arrow_feather.md  # PyArrow/Feather conversion
+│   ├── parquet.md        # Parquet conversion
+│   ├── safetensors.md    # SafeTensors conversion
+│   ├── numpy.md          # NumPy conversion
+│   ├── hdf5.md           # HDF5 conversion
+│   ├── zarr.md           # Zarr conversion
+│   ├── csv_txt.md        # CSV/TXT conversion
+│   ├── pandas.md         # Pandas conversion
+│   ├── s3.md             # S3 cloud storage
+│   └── zero_copy.md      # Zero-copy utilities
+└── utils/
+    ├── package_io.md     # pack/unpack functions
+    └── utilities.md      # Utility functions
+```
+
+## Quick Navigation
+
+### Core Classes
+
+| Class | Description | Documentation |
+|-------|-------------|---------------|
+| `NumPack` | Main array storage class | [numpack_class.md](./core/numpack_class.md) |
+| `LazyArray` | Memory-mapped lazy loading | [lazy_array.md](./core/lazy_array.md) |
+| `BatchModeContext` | In-memory batch caching | [batch_modes.md](./core/batch_modes.md) |
+| `WritableBatchMode` | Zero-copy writable batch | [batch_modes.md](./core/batch_modes.md) |
+
+### IO Conversion Functions
+
+| Format | Import | Export | Documentation |
+|--------|--------|--------|---------------|
+| PyTorch | `from_torch`, `from_torch_file` | `to_torch`, `to_torch_file` | [pytorch.md](./io/pytorch.md) |
+| Arrow/Feather | `from_arrow`, `from_feather_file` | `to_arrow`, `to_feather_file` | [arrow_feather.md](./io/arrow_feather.md) |
+| Parquet | `from_parquet_table`, `from_parquet_file` | `to_parquet_table`, `to_parquet_file` | [parquet.md](./io/parquet.md) |
+| SafeTensors | `from_safetensors`, `from_safetensors_file` | `to_safetensors`, `to_safetensors_file` | [safetensors.md](./io/safetensors.md) |
+| NumPy | `from_numpy` | `to_numpy` | [numpy.md](./io/numpy.md) |
+| HDF5 | `from_hdf5` | `to_hdf5` | [hdf5.md](./io/hdf5.md) |
+| Zarr | `from_zarr` | `to_zarr` | [zarr.md](./io/zarr.md) |
+| CSV/TXT | `from_csv`, `from_txt` | `to_csv`, `to_txt` | [csv_txt.md](./io/csv_txt.md) |
+| Pandas | `from_pandas` | `to_pandas` | [pandas.md](./io/pandas.md) |
+| S3 | `from_s3` | `to_s3` | [s3.md](./io/s3.md) |
+
+### Utility Functions
+
+| Function | Description | Documentation |
+|----------|-------------|---------------|
+| `pack` | Package NumPack directory | [package_io.md](./utils/package_io.md) |
+| `unpack` | Extract NumPack package | [package_io.md](./utils/package_io.md) |
+| `get_package_info` | Get package metadata | [package_io.md](./utils/package_io.md) |
+| `get_backend_info` | Get backend information | [utilities.md](./utils/utilities.md) |
+
+## Import Patterns
+
+```python
+# Core class
+from numpack import NumPack, LazyArray
+
+# IO conversion functions
+from numpack.io import from_torch, to_torch
+from numpack.io import from_numpy, to_numpy
+
+# Package operations
+from numpack import pack, unpack, get_package_info
+
+# Backend info
+from numpack import get_backend_info
+```
+
+## Version
+
+This documentation is for NumPack version **0.5.0**.

0 commit comments
