
NumPack provides comprehensive format conversion utilities for seamless integration with popular data frameworks.

## Table of Contents

- [Overview](#overview)
- [PyTorch Conversion](#pytorch-conversion)
- [PyArrow/Feather Conversion](#pyarrowfeather-conversion)
- [Parquet Conversion](#parquet-conversion)
- [SafeTensors Conversion](#safetensors-conversion)
- [Other Formats](#other-formats)
- [Text File Conversion](#text-file-conversion)
- [Pandas Conversion](#pandas-conversion)
- [S3 Cloud Storage](#s3-cloud-storage)
- [Zero-Copy Utilities](#zero-copy-utilities)
- [Supported Formats Summary](#supported-formats-summary)
- [Best Practices](#best-practices)

---

## Overview

NumPack supports two types of conversions:
@@ -333,6 +350,140 @@ print("Model conversion pipeline complete!")

---

## Text File Conversion

### TXT Files

```python
from numpack.io import from_txt, to_txt

# .txt → .npk (whitespace-delimited)
from_txt('data.txt', 'output.npk', array_name='data', delimiter=None)

# .npk → .txt
to_txt('input.npk', 'output.txt', array_name='data', delimiter='\t')
```

**Parameters:**
- `delimiter`: Field separator (default: whitespace)
- `skip_header`: Number of header rows to skip
- `dtype`: Target data type
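
To make the three parameters concrete, here is a minimal NumPy-only sketch. It uses `np.genfromtxt` and `np.savetxt`, which accept like-named arguments. Whether `from_txt` and `to_txt` behave the same way internally is an assumption, so treat this as an illustration of the parameter semantics, not of NumPack itself.

```python
import io

import numpy as np

# In-memory stand-in for a text file with one header row.
text = io.StringIO("x y z\n1 2 3\n4 5 6\n")

# delimiter=None splits on runs of whitespace, skip_header drops the title row,
# and dtype selects the element type of the resulting array.
data = np.genfromtxt(text, delimiter=None, skip_header=1, dtype=np.float32)
print(data.shape, data.dtype)

# Writing back with an explicit tab delimiter.
out = io.StringIO()
np.savetxt(out, data, delimiter="\t", fmt="%g")
print(out.getvalue())
```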

---

## Pandas Conversion

### DataFrame ↔ .npk

```python
from numpack.io import from_pandas, to_pandas
import pandas as pd

# DataFrame → .npk
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
from_pandas(df, 'output.npk', array_name='dataframe')

# .npk → DataFrame
df = to_pandas('input.npk', array_name='dataframe')
print(df.columns)
```

**Notes:**
- Numeric columns are converted to NumPy arrays
- String columns may require special handling
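
The notes above can be illustrated with plain pandas. The sketch below shows one possible way to separate numeric columns from a string column before conversion. The integer-code encoding is an assumption chosen for illustration, not something NumPack is documented to do, and the `label` column is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "label": ["x", "y", "x"]})

# Numeric columns map directly onto a 2-D NumPy array
# (mixed int and float columns are upcast to float64).
numeric = df.select_dtypes(include="number").to_numpy()

# One possible treatment for a string column: encode it as integer codes
# and keep the unique labels so the mapping can be reversed later.
codes, labels = pd.factorize(df["label"])

print(numeric.shape, numeric.dtype)
print(codes, list(labels))
```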

---

## S3 Cloud Storage

NumPack can read from and write to Amazon S3 directly.

### S3 ↔ .npk

```python
from numpack.io import from_s3, to_s3

# Download from S3 and convert to .npk (uses default AWS credentials)
from_s3('s3://my-bucket/data.npy', 'output.npk')

# Public bucket access
from_s3('s3://public-bucket/data.csv', 'output.npk', anon=True)

# Upload .npk to S3
to_s3('input.npk', 's3://my-bucket/output.parquet')

# Specify output format
to_s3('input.npk', 's3://my-bucket/output.csv', format='csv')
```

**Parameters:**
- `s3_path`: S3 URI in the form `s3://bucket/path/to/file`
- `format`: Input/output format (`'auto'`, `'numpy'`, `'csv'`, `'txt'`, `'parquet'`, `'feather'`, `'hdf5'`)
- `**s3_kwargs`: Keyword arguments forwarded to `s3fs.S3FileSystem` (e.g., `anon=True` for public buckets)

**Dependencies:** `s3fs`
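
As a rough illustration of the `s3://bucket/path/to/file` form and of what `format='auto'` might do, the standard-library sketch below splits a URI into bucket and key and guesses a format from the file extension. The helpers `split_s3_uri` and `guess_format` and the extension table are hypothetical. NumPack's actual detection logic is not shown in this document.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension table; the values mirror the `format` names listed above.
EXTENSION_FORMATS = {
    ".npy": "numpy",
    ".csv": "csv",
    ".txt": "txt",
    ".parquet": "parquet",
    ".feather": "feather",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
}


def split_s3_uri(s3_path):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError otherwise."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def guess_format(s3_path):
    """Return a format name based on the key's extension, or None if unknown."""
    _, key = split_s3_uri(s3_path)
    suffix = PurePosixPath(key).suffix.lower()
    return EXTENSION_FORMATS.get(suffix)


print(split_s3_uri("s3://my-bucket/path/to/file.csv"))
print(guess_format("s3://my-bucket/output.parquet"))
```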

---

## Zero-Copy Utilities

NumPack provides zero-copy utilities for efficient data exchange with other libraries.

### DLPack Protocol

```python
import numpy as np
from numpack.io import to_dlpack, from_dlpack

# NumPy → DLPack capsule
arr = np.random.rand(100, 50)
capsule = to_dlpack(arr)

# DLPack capsule → NumPy
arr_restored = from_dlpack(capsule)
```
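
For readers unfamiliar with what "zero-copy" means here, the NumPy-only sketch below shows a DLPack exchange without NumPack. It assumes a NumPy version that ships `np.from_dlpack` (1.22 or later). How NumPack's `to_dlpack` and `from_dlpack` wrap this protocol is not covered.

```python
import numpy as np

arr = np.arange(6, dtype=np.float32).reshape(2, 3)

# np.from_dlpack accepts any object implementing __dlpack__ and
# __dlpack_device__, which NumPy arrays do; the element data is not copied.
view = np.from_dlpack(arr)

# Both names refer to the same underlying buffer, so a write through
# `arr` is visible through `view`.
arr[0, 0] = 42.0
print(np.shares_memory(arr, view), view[0, 0])
```

Because the buffer is shared, the usual caveat applies: mutating one side mutates the other, so copy explicitly when independent data is needed.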

### Arrow Zero-Copy

```python
import numpy as np
from numpack.io import numpy_to_arrow_zero_copy, arrow_to_numpy_zero_copy

# NumPy → Arrow (zero-copy)
arr = np.random.rand(100, 50).astype(np.float32)
arrow_arr = numpy_to_arrow_zero_copy(arr)

# Arrow → NumPy (zero-copy)
numpy_arr = arrow_to_numpy_zero_copy(arrow_arr)
```

### PyTorch Zero-Copy

```python
import numpy as np
from numpack.io import numpy_to_torch_zero_copy, torch_to_numpy_zero_copy

# NumPy → PyTorch (shared memory)
arr = np.random.rand(100, 50).astype(np.float32)
tensor = numpy_to_torch_zero_copy(arr)

# PyTorch → NumPy (shared memory)
numpy_arr = torch_to_numpy_zero_copy(tensor)
```

### ZeroCopyArray Wrapper

```python
import numpy as np
from numpack.io import ZeroCopyArray, wrap_for_zero_copy

# Wrap array for zero-copy operations
arr = np.random.rand(100, 50)
zc_arr = wrap_for_zero_copy(arr)

# Access as different formats
torch_tensor = zc_arr.to_torch()
arrow_array = zc_arr.to_arrow()
```

---

## Supported Formats Summary

| Format | Import | Export | Dependencies |
@@ -345,7 +496,9 @@ print("Model conversion pipeline complete!")
| HDF5 (.h5) | ✅ | ✅ | `h5py` |
| Zarr | ✅ | ✅ | `zarr` |
| CSV | ✅ | ✅ | - |
| TXT | ✅ | ✅ | - |
| Pandas | ✅ | ✅ | `pandas` |
| S3 | ✅ | ✅ | `boto3`, `s3fs` |

---
