
Commit 64dd792

Support uninitialized tensors in file IO reads
Add IsInitialized helpers to tensor types so callers do not inspect Data() directly. Allocate destination tensors during NumPy-backed file reads when the tensor has no data pointer, using the shape discovered from the file before copying data. Document CSV, MAT, and NPY IO examples, and clarify that MAT helpers operate on MAT-file variables rather than providing a general HDF5 interface. Add CSV, MAT, NPY, and tensor initialization regression coverage.
1 parent 580b8cc commit 64dd792

9 files changed

Lines changed: 187 additions & 13 deletions


docs_input/api/io/index.rst

Lines changed: 60 additions & 1 deletion
@@ -3,8 +3,67 @@
 Input/Output
 ############
 
+MatX file IO helpers read and write common array file formats through the
+optional ``MATX_ENABLE_FILEIO`` support. Include ``matx.h`` and use the
+``matx::io`` namespace functions shown below.
+
+Read functions can write into an already-sized tensor, or into a
+default-constructed tensor with the desired rank and value type. When the tensor
+has no storage yet, MatX allocates it after discovering the shape from the file.
+
+CSV
+===
+
+CSV files support rank-1 and rank-2 tensors. The delimiter is passed explicitly,
+and ``read_csv`` skips the first row by default.
+
+.. code-block:: cpp
+
+   tensor_t<float, 2> samples;
+   io::read_csv(samples, "samples.csv", ",");
+
+   io::write_csv(samples, "samples_out.csv", ",");
+
+To read a file without skipping the first row, pass ``false`` for the final
+argument.
+
+.. code-block:: cpp
+
+   io::read_csv(samples, "samples_out.csv", ",", false);
+
+MAT
+===
+
+MAT files can contain multiple named variables. ``read_mat`` and ``write_mat``
+therefore take a variable name in addition to the file name.
+
+.. code-block:: cpp
+
+   tensor_t<float, 2> A;
+   io::read_mat(A, "arrays.mat", "A");
+
+   auto B = io::read_mat<tensor_t<float, 2>>("arrays.mat", "B");
+
+   io::write_mat(A, "arrays_out.mat", "A");
+
+MATLAB v7.3 MAT files are HDF5-based, but the MatX MAT helpers are variable
+oriented and use SciPy's MAT-file routines. Treat them as MAT-file readers and
+writers rather than as a general HDF5 interface.
+
+NPY
+===
+
+NPY files store a single NumPy array per file.
+
+.. code-block:: cpp
+
+   tensor_t<float, 2> x;
+   io::read_npy(x, "x.npy");
+
+   io::write_npy(x, "x_out.npy");
+
 .. toctree::
    :maxdepth: 1
    :glob:
 
-   *
+   *
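
The documentation added above says that ``read_csv`` skips the first row by default and that the shape is discovered from the file. A standalone sketch of that idea follows. It is not MatX code: `CsvTable` and `parse_csv` are hypothetical names introduced only for illustration, the input is an in-memory string rather than a file, and the delimiter is a single character (an assumption made to keep the sketch short).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: a row-major rank-2 table whose shape is only
// known once the text has been parsed.
struct CsvTable {
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<float> values;
};

// Parse CSV text and discover the shape from the data itself.
// skip_header defaults to true, mirroring the documented read_csv default.
inline CsvTable parse_csv(const std::string &text, char delim,
                          bool skip_header = true) {
  CsvTable out;
  std::istringstream lines(text);
  std::string line;
  bool first = true;
  while (std::getline(lines, line)) {
    if (first) {
      first = false;
      if (skip_header) continue;  // drop the header row
    }
    if (line.empty()) continue;   // ignore blank lines
    std::istringstream fields(line);
    std::string field;
    std::size_t count = 0;
    while (std::getline(fields, field, delim)) {
      out.values.push_back(std::stof(field));
      ++count;
    }
    if (out.rows == 0) {
      out.cols = count;           // the first data row fixes the column count
    } else if (count != out.cols) {
      throw std::runtime_error("ragged CSV row");
    }
    ++out.rows;
  }
  // Invariant: the flat buffer holds exactly rows * cols values.
  assert(out.values.size() == out.rows * out.cols);
  return out;
}
```

Calling `parse_csv("a,b\n1.5,2.5\n3.5,4.5\n", ',')` yields a 2 x 2 table, while passing `false` as the last argument treats the first row as data.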

docs_input/api/io/read_mat.rst

Lines changed: 5 additions & 1 deletion
@@ -3,11 +3,15 @@
 read_mat
 ========
 
-Read a CSV file into a tensor
+Read a variable from a MAT file into a tensor
 
 .. note::
    This function requires the optional ``MATX_ENABLE_FILEIO`` compile flag
 
+MAT files can contain multiple named variables. Pass the variable name in
+``var`` to select the tensor to read. MATLAB v7.3 MAT files are HDF5-based, but
+these helpers use SciPy's MAT-file routines and are not a general HDF5
+interface.
 
 
 .. versionadded:: 0.3.0

docs_input/api/io/write_mat.rst

Lines changed: 3 additions & 1 deletion
@@ -3,11 +3,13 @@
 write_mat
 =========
 
-Write an operator to a MAT file
+Write a tensor to a MAT file variable
 
 .. note::
    This function requires the optional ``MATX_ENABLE_FILEIO`` compile flag
 
+MAT files can contain multiple named variables. ``write_mat`` writes the tensor
+under the variable name passed in ``var``.
 
 
 .. versionadded:: 0.3.0

include/matx/core/dynamic_tensor.h

Lines changed: 5 additions & 0 deletions
@@ -218,6 +218,11 @@ class dynamic_tensor_t {
 
   __MATX_INLINE__ T *Data() const { return ldata_; }
 
+  __MATX_INLINE__ bool IsInitialized() const noexcept
+  {
+    return Data() != nullptr;
+  }
+
   __MATX_INLINE__ index_t TotalSize() const {
     index_t total = 1;
     for (int i = 0; i < rank_; ++i) {

include/matx/core/pybind.h

Lines changed: 30 additions & 3 deletions
@@ -40,6 +40,7 @@ MATX_IGNORE_WARNING_PUSH_GCC("-Wnull-dereference")
 #include <pybind11/embed.h>
 #include <pybind11/numpy.h>
 MATX_IGNORE_WARNING_POP_GCC
+#include <algorithm>
 #include <optional>
 #include <filesystem>
 
@@ -378,6 +379,11 @@ class MATX_PYBIND_VISIBILITY MatXPybind {
     return indices;
   }
 
+  template <typename TensorType>
+  static constexpr bool has_is_initialized_v = requires(const TensorType &ten) {
+    ten.IsInitialized();
+  };
+
   template <typename TensorType>
   void NumpyToTensorView(TensorType &ten,
                          const std::string fname)
@@ -387,14 +393,35 @@ class MATX_PYBIND_VISIBILITY MatXPybind {
   }
 
   template <typename TensorType>
-  void NumpyToTensorView(TensorType ten,
+  void NumpyToTensorView(TensorType &&ten,
                          const pybind11::object &np_ten)
   {
-    using T = typename TensorType::value_type;
-    constexpr int RANK = TensorType::Rank();
+    using Tensor = remove_cvref_t<TensorType>;
+    using T = typename Tensor::value_type;
+    constexpr int RANK = Tensor::Rank();
 
     using ntype = matx_convert_complex_type<T>;
     auto ften = pybind11::array_t<ntype>(np_ten);
+    auto info = ften.request();
+
+    MATX_ASSERT_STR(info.ndim == RANK, matxInvalidDim,
+                    "Numpy array rank does not match tensor rank");
+
+    if constexpr (has_is_initialized_v<Tensor>) {
+      if (!ten.IsInitialized()) {
+        cuda::std::array<matx::index_t, RANK> shape;
+        std::copy_n(info.shape.begin(), RANK, std::begin(shape));
+        // The copy below writes from host code, so use the default
+        // host-accessible allocation path.
+        make_tensor(ten, shape);
+      }
+    }
+
+    for (int d = 0; d < RANK; d++) {
+      MATX_ASSERT_STR(ten.Size(d) == static_cast<index_t>(info.shape[d]),
+                      matxInvalidSize,
+                      "Numpy array dimension size does not match tensor size");
+    }
 
     if constexpr (RANK == 0) {
       ten() = ConvertComplex(ften.at());
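
The control flow in this hunk (detect whether the type offers `IsInitialized()`, allocate from the discovered shape if the destination has no storage, then validate every dimension before copying) can be sketched without MatX or pybind11. Everything below is an assumption made for illustration: `ToyTensor`, `has_is_initialized`, and `read_into` are hypothetical names, the rank is fixed at 2, and the detection uses a C++17 `std::void_t` trait in place of the C++20 `requires` expression shown in the diff.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical rank-2 tensor: default construction leaves it without storage.
struct ToyTensor {
  std::array<std::size_t, 2> shape{0, 0};
  std::unique_ptr<float[]> storage;

  float *Data() const noexcept { return storage.get(); }
  bool IsInitialized() const noexcept { return Data() != nullptr; }

  void Allocate(const std::array<std::size_t, 2> &s) {
    assert(!IsInitialized());  // this sketch only allocates once
    shape = s;
    storage = std::make_unique<float[]>(s[0] * s[1]);
  }
};

// C++17 stand-in for the requires-expression in the diff: true when
// `const T&` has a callable IsInitialized() member.
template <typename T, typename = void>
struct has_is_initialized : std::false_type {};

template <typename T>
struct has_is_initialized<
    T, std::void_t<decltype(std::declval<const T &>().IsInitialized())>>
    : std::true_type {};

// Copy "file" data into the tensor, allocating first if it has no storage.
template <typename Tensor>
void read_into(Tensor &ten, const std::array<std::size_t, 2> &file_shape,
               const float *file_data) {
  if constexpr (has_is_initialized<Tensor>::value) {
    if (!ten.IsInitialized()) {
      ten.Allocate(file_shape);  // shape comes from the file, not the caller
    }
  }
  // An already-sized tensor must match the file exactly.
  if (ten.shape != file_shape) {
    throw std::invalid_argument("file shape does not match tensor shape");
  }
  std::copy_n(file_data, file_shape[0] * file_shape[1], ten.Data());
}
```

With this sketch, a default-constructed `ToyTensor` passed to `read_into` takes on the file's shape, while one pre-allocated with a different shape causes `std::invalid_argument` to be thrown, analogous to the `NPYReadUninitialized` and `NPYReadInitializedShapeMismatch` tests in this commit.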

include/matx/core/tensor_impl.h

Lines changed: 10 additions & 0 deletions
@@ -1557,6 +1557,16 @@ MATX_IGNORE_WARNING_POP_GCC
       return data_.ldata_;
     }
 
+    /**
+     * @brief Check whether this tensor has an assigned data pointer.
+     *
+     * @return true if the tensor has storage or a non-owning data pointer
+     */
+    __MATX_INLINE__ __MATX_HOST__ __MATX_DEVICE__ bool IsInitialized() const noexcept
+    {
+      return Data() != nullptr;
+    }
+
     /**
      * @brief Set data pointer
      *
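
The doc comment above notes that `IsInitialized()` is true for owned storage and for a non-owning data pointer alike, since it only checks `Data()` against `nullptr`. A small standalone sketch of that semantics follows. `ToyView` and `first_or` are hypothetical names used for illustration only and do not correspond to MatX types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view: it records a pointer and a length, nothing more.
template <typename T>
class ToyView {
public:
  ToyView() = default;
  ToyView(T *data, std::size_t size) : data_(data), size_(size) {
    assert(data != nullptr || size == 0);  // a null view must be empty
  }

  T *Data() const noexcept { return data_; }
  std::size_t Size() const noexcept { return size_; }

  // Same check as in the diff: initialized means "has a data pointer",
  // regardless of who owns the memory behind it.
  bool IsInitialized() const noexcept { return Data() != nullptr; }

private:
  T *data_ = nullptr;
  std::size_t size_ = 0;
};

// Callers query the view rather than comparing Data() to nullptr themselves.
template <typename T>
T first_or(const ToyView<T> &view, T fallback) {
  return (view.IsInitialized() && view.Size() > 0) ? view.Data()[0] : fallback;
}
```

A default-constructed `ToyView<float>` reports `false`, and a view wrapped around a caller-owned buffer reports `true` even though the view never allocated anything.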

include/matx/file_io/file_io.h

Lines changed: 6 additions & 6 deletions
@@ -200,9 +200,9 @@ void write_csv(const TensorType &t, const std::string fname,
 /**
  * @brief Read a MAT file into a tensor view
  *
- * MAT files use SciPy's loadmat() function to read various MATLAB file
- * types in. MAT files are supersets of HDF5 files, and are allowed to
- * have multiple fields in them.
+ * MAT files use SciPy's loadmat() function to read MATLAB variables. MATLAB
+ * v7.3 MAT files are HDF5-based, but this helper is intended for MAT-file
+ * variables rather than as a general HDF5 interface.
  *
  * @tparam TensorType
  *   Data type of tensor
@@ -241,9 +241,9 @@ void read_mat(TensorType &t, const std::string fname,
 /**
  * @brief Read a MAT file and return a tensor view
  *
- * MAT files use SciPy's loadmat() function to read various MATLAB file
- * types in. MAT files are supersets of HDF5 files, and are allowed to
- * have multiple fields in them.
+ * MAT files use SciPy's loadmat() function to read MATLAB variables. MATLAB
+ * v7.3 MAT files are HDF5-based, but this helper is intended for MAT-file
+ * variables rather than as a general HDF5 interface.
  *
  * @tparam TensorType
  *   Data type of tensor

test/00_io/FileIOTests.cu

Lines changed: 62 additions & 1 deletion
@@ -75,6 +75,21 @@ TYPED_TEST(FileIoTestsNonComplexFloatTypes, SmallCSVRead)
   MATX_EXIT_HANDLER();
 }
 
+TYPED_TEST(FileIoTestsNonComplexFloatTypes, SmallCSVReadUninitialized)
+{
+  MATX_ENTER_HANDLER();
+  using TestType = cuda::std::tuple_element_t<0, TypeParam>;
+  tensor_t<TestType, 2> t;
+
+  io::read_csv(t, this->small_csv, ",");
+
+  ASSERT_EQ(t.Size(0), 10);
+  ASSERT_EQ(t.Size(1), 2);
+  MATX_TEST_ASSERT_COMPARE(this->pb, t, this->small_csv.c_str(), 0.01);
+
+  MATX_EXIT_HANDLER();
+}
+
 TYPED_TEST(FileIoTestsNonComplexFloatTypes, CSVReadFileNotFound)
 {
   MATX_ENTER_HANDLER();
@@ -146,6 +161,21 @@ TYPED_TEST(FileIoTestsNonComplexFloatTypes, MATRead)
   MATX_EXIT_HANDLER();
 }
 
+TYPED_TEST(FileIoTestsNonComplexFloatTypes, MATReadUninitialized)
+{
+  MATX_ENTER_HANDLER();
+  using TestType = cuda::std::tuple_element_t<0, TypeParam>;
+  tensor_t<TestType, 2> t;
+
+  io::read_mat(t, "../test/00_io/test.mat", "myvar");
+
+  ASSERT_EQ(t.Size(0), 1);
+  ASSERT_EQ(t.Size(1), 10);
+  ASSERT_NEAR(t(0,0), 1.456, 0.001);
+
+  MATX_EXIT_HANDLER();
+}
+
 TYPED_TEST(FileIoTestsNonComplexFloatTypes, MATReadFileNotFound)
 {
   MATX_ENTER_HANDLER();
@@ -338,6 +368,37 @@ TYPED_TEST(FileIoTestsNonComplexFloatTypes, NPYRead)
   MATX_EXIT_HANDLER();
 }
 
+TYPED_TEST(FileIoTestsNonComplexFloatTypes, NPYReadUninitialized)
+{
+  MATX_ENTER_HANDLER();
+  using TestType = cuda::std::tuple_element_t<0, TypeParam>;
+
+  tensor_t<TestType, 2> t;
+
+  io::read_npy(t, "../test/00_io/test.npy");
+
+  ASSERT_EQ(t.Size(0), 2);
+  ASSERT_EQ(t.Size(1), 3);
+  ASSERT_NEAR(t(0, 0), 1.5, 0.001);
+  ASSERT_NEAR(t(1, 2), 6.5, 0.001);
+
+  MATX_EXIT_HANDLER();
+}
+
+TYPED_TEST(FileIoTestsNonComplexFloatTypes, NPYReadInitializedShapeMismatch)
+{
+  MATX_ENTER_HANDLER();
+  using TestType = cuda::std::tuple_element_t<0, TypeParam>;
+
+  auto t = make_tensor<TestType>({1, 3});
+
+  ASSERT_THROW({
+    io::read_npy(t, "../test/00_io/test.npy");
+  }, matx::detail::matxException);
+
+  MATX_EXIT_HANDLER();
+}
+
 TYPED_TEST(FileIoTestsNonComplexFloatTypes, NPYReadFileNotFound)
 {
   MATX_ENTER_HANDLER();
@@ -375,4 +436,4 @@ TYPED_TEST(FileIoTestsNonComplexFloatTypes, NPYWrite)
   }
 
   MATX_EXIT_HANDLER();
-}
+}

test/00_tensor/TensorCreationTests.cu

Lines changed: 6 additions & 0 deletions
@@ -210,11 +210,17 @@ TYPED_TEST(TensorCreationTestsAll, StaticTensorDataPointer)
 {
   using TestType = cuda::std::tuple_element_t<0, TypeParam>;
 
+  tensor_t<TestType, 2> uninitialized;
+  ASSERT_EQ(uninitialized.Data(), nullptr);
+  ASSERT_FALSE(uninitialized.IsInitialized());
+
   auto mt1 = make_tensor<TestType, 10>();
   ASSERT_NE(mt1.Data(), nullptr);
+  ASSERT_TRUE(mt1.IsInitialized());
 
   auto mt2 = make_tensor<TestType, 4, 5>();
   ASSERT_NE(mt2.Data(), nullptr);
+  ASSERT_TRUE(mt2.IsInitialized());
 }
 
 TYPED_TEST(TensorCreationTestsAll, StaticTensorAssignOnes)
