Commit d0ca8b2

chen2021673 authored and kilinchange committed
refactor(tests): streamline test fixtures and framework APIs
- Add requires_grad default parameter to Tensor ctor so tests can construct autograd-enabled tensors without a fixture helper.
- Remove InfiniTrainTest::createTensor, AutogradTestBase, and FillConstantTensor; call sites use std::make_shared<Tensor>(...) and Tensor::Fill(value) directly.
- Replace gtest_main with a custom tests/common/test_main.cc that initializes GlobalEnv once before RUN_ALL_TESTS, eliminating the need for GlobalEnv::IsInitialized and per-suite SetUpTestSuite init guards.
- Gate CUDA test registration on USE_CUDA: when disabled, the CUDA parameterization is simply not instantiated instead of skipped at runtime.
- Move test_macros.cmake to cmake/ and include test headers via full project-root paths.
- Drop tests' dependency on example/*/config.h; TransformerModule tests now construct TransformerConfig directly.
- Add SanitizeGPT2Config / SanitizeLLaMA3Config in example/.
1 parent 11ed37e commit d0ca8b2

44 files changed

Lines changed: 672 additions & 543 deletions

Lines changed: 3 additions & 10 deletions
@@ -17,9 +17,6 @@
 
 include_guard(GLOBAL)
 
-# Path to this file's directory (tests/common/)
-set(TEST_MACROS_DIR "${CMAKE_CURRENT_LIST_DIR}")
-
 # -----------------------------------------------------------------------------
 # Load GoogleTest module (provides gtest_discover_tests)
 # -----------------------------------------------------------------------------
@@ -59,20 +56,16 @@ macro(infini_train_add_test)
     endif()
 
     # 1. Create executable target
-    add_executable(${ARG_TEST_NAME} ${ARG_SOURCES})
+    add_executable(${ARG_TEST_NAME} ${ARG_SOURCES} $<TARGET_OBJECTS:test_main>)
 
     # 2. Disable -Werror so tests can run under relaxed warning levels
     target_compile_options(${ARG_TEST_NAME} PRIVATE -Wno-error)
 
-    # 3. Link Google Test
-    target_link_libraries(${ARG_TEST_NAME} PRIVATE
-        GTest::gtest
-        GTest::gtest_main
-    )
+    # 3. Link Google Test (uses custom main from test_main that initializes GlobalEnv)
+    target_link_libraries(${ARG_TEST_NAME} PRIVATE GTest::gtest)
 
     # 4. Add include paths
     target_include_directories(${ARG_TEST_NAME} PRIVATE
-        ${TEST_MACROS_DIR}
         ${glog_SOURCE_DIR}/src
     )

docs/test_infrastructure_design.md

Lines changed: 39 additions & 34 deletions
@@ -9,7 +9,6 @@ tests/
 ├── CMakeLists.txt # top level: include macros + add_subdirectory
 ├── common/
 │ ├── CMakeLists.txt # header-only interface library
-│ ├── test_macros.cmake # CMake macros: infini_train_add_test / infini_train_add_test_suite
 │ └── test_utils.h # C++ base classes, skip macros, fill helper functions
 ├── tensor/ # Tensor creation / copy / destruction / operators
 ├── optimizer/ # Optimizer creation / step
@@ -18,6 +17,9 @@ tests/
 ├── lora/ # LoRA-related
 ├── dtype/ # Scalar / dtype dispatch + compile-time negative tests
 └── transformer/ # Transformer architecture tests
+
+cmake/
+└── test_macros.cmake # CMake macros: infini_train_add_test / infini_train_add_test_suite
 ```
 
 ### Core design: device parameterization
@@ -32,14 +34,15 @@ tests/
 
 | Base class | Purpose | Capabilities provided |
 |------|------|-----------|
-| `InfiniTrainTest` | General parameterized tests | `GetDevice()`, `createTensor(shape, dtype, requires_grad)` |
-| `AutogradTestBase` | Autograd tests | `createTensor(shape, value)` with automatic `requires_grad=true` + sequential fill |
+| `InfiniTrainTest` | General parameterized tests | `GetDevice()` (the currently parameterized `Device`) |
 
-**Why is AutogradTestBase needed?**
+Tensors in tests are created directly through the `Tensor` constructor:
 
-- All autograd tests need `requires_grad=true`
-- All autograd tests need filled data
-- Forward/backward tests need input data to verify results. `AutogradTestBase` builds in `FillSequentialTensor`, so each test does not have to call it manually
+```cpp
+auto t = std::make_shared<Tensor>(shape, DataType::kFLOAT32, GetDevice());
+auto g = std::make_shared<Tensor>(shape, DataType::kFLOAT32, GetDevice(), /*requires_grad=*/true);
+t->Fill(1.0f); // constant fill (built-in framework API)
+```
 
 ### Skipping specific platforms
 
@@ -110,14 +113,14 @@ mkdir tests/foo
 // tests/foo/test_foo_basic.cc
 #include <gtest/gtest.h>
 #include "infini_train/include/tensor.h"
-#include "test_utils.h"
+#include "tests/common/test_utils.h"
 
 using namespace infini_train;
 
 class FooBasicTest : public infini_train::test::InfiniTrainTest {};
 
 TEST_P(FooBasicTest, CreateTensor) {
-    auto tensor = createTensor({2, 3});
+    auto tensor = std::make_shared<Tensor>(std::vector<int64_t>{2, 3}, DataType::kFLOAT32, GetDevice());
     EXPECT_NE(tensor, nullptr);
 }
 
@@ -129,12 +132,9 @@ TEST_P(FooBasicTest, CUDAOnlyFeature) {
 INFINI_TRAIN_REGISTER_TEST(FooBasicTest);
 ```
 
-**Choosing (or creating) a base class:**
+**Choosing a base class:**
 
-| Scenario | Base class |
-|------|------|
-| General tests | `InfiniTrainTest` (provides `createTensor(shape, dtype, requires_grad)`) |
-| Needs autograd | `AutogradTestBase` (provides `createTensor(shape, value)`, automatic `requires_grad=true` + sequential fill) |
+All test classes inherit from `InfiniTrainTest`. When gradients are needed, pass `requires_grad=true` to the `Tensor` constructor; when data needs to be filled, use `Tensor::Fill`.
 
 **Step 2: Write CMakeLists.txt**
 
@@ -176,8 +176,6 @@ add_subdirectory(foo)
 | Function / macro | Purpose |
 |-----------|------|
 | `GetDevice()` | Returns the currently parameterized `Device` (base-class method) |
-| `createTensor(shape, dtype, requires_grad)` | Creates a tensor on the current device (`InfiniTrainTest` base-class method) |
-| `FillSequentialTensor(tensor, start)` | Fills with increasing values; handles device tensors automatically (fill on CPU first, then copy) |
 | `SKIP_CPU()` | Skip the CPU instance |
 | `ONLY_CPU()` | Run only on the CPU instance |
 | `ONLY_CUDA()` | Run only on the CUDA instance |
@@ -201,23 +199,12 @@ enum class DeviceType : int8_t {
 
 ### 4.2 Test utility layer: `test_utils.h`
 
-1. Add runtime detection functions and a counterpart of `CudaDeviceTypes`
+1. Add a compile-time include of the MACA header (mirroring CUDA)
 
 ```cpp
-#ifdef USE_MACA
-inline int GetMacaDeviceCount() { /* macaGetDeviceCount ... */ }
-#else
-inline int GetMacaDeviceCount() { return 0; }
+#if defined(USE_MACA)
+#include <maca_runtime_api.h>
 #endif
-inline bool HasMacaRuntime() { return GetMacaDeviceCount() > 0; }
-
-inline std::vector<Device::DeviceType> MacaDeviceTypes() {
-    if (HasMacaRuntime()) {
-        return {Device::DeviceType::kMACA};
-    }
-    LOG(INFO) << "No MACA runtime found, skipping MACA tests.";
-    return {};
-}
 ```
 
 2. Add an `ONLY_MACA()` macro:
@@ -227,18 +214,36 @@ inline std::vector<Device::DeviceType> MacaDeviceTypes() {
     do { if (GetParam() != infini_train::Device::DeviceType::kMACA) { GTEST_SKIP() << "MACA-only test"; } } while (0)
 ```
 
+If you want semantics like `REQUIRE_MIN_DEVICES(n)` but for MACA, add a new macro modeled on the `USE_CUDA` branch; likewise, when `USE_MACA` is off the macro can simply skip.
+
 ### 4.3 Registration macro: add a MACA instance
 
+Following the `USE_CUDA` approach, an instance is not registered when its build switch is off:
+
 ```cpp
+#if defined(USE_CUDA) && defined(USE_MACA)
 #define INFINI_TRAIN_REGISTER_TEST(TestName) \
     INSTANTIATE_TEST_SUITE_P(CPU, TestName, \
         ::testing::Values(infini_train::Device::DeviceType::kCPU)); \
     INSTANTIATE_TEST_SUITE_P(CUDA, TestName, \
-        ::testing::ValuesIn(infini_train::test::CudaDeviceTypes())); \
+        ::testing::Values(infini_train::Device::DeviceType::kCUDA)); \
     INSTANTIATE_TEST_SUITE_P(MACA, TestName, \
-        ::testing::ValuesIn(infini_train::test::MacaDeviceTypes()))
+        ::testing::Values(infini_train::Device::DeviceType::kMACA))
+#elif defined(USE_CUDA)
+#define INFINI_TRAIN_REGISTER_TEST(TestName) /* CPU + CUDA, same as today */
+#elif defined(USE_MACA)
+#define INFINI_TRAIN_REGISTER_TEST(TestName) \
+    INSTANTIATE_TEST_SUITE_P(CPU, TestName, \
+        ::testing::Values(infini_train::Device::DeviceType::kCPU)); \
+    INSTANTIATE_TEST_SUITE_P(MACA, TestName, \
+        ::testing::Values(infini_train::Device::DeviceType::kMACA))
+#else
+#define INFINI_TRAIN_REGISTER_TEST(TestName) /* CPU only */
+#endif
 ```
 
+If the machine lacks the corresponding device at runtime (for example, built with `USE_MACA` but no MACA hardware), let the tests fail outright instead of skipping silently.
+
 ### 4.4 CMake layer: `test_macros.cmake`
 
 Extend the default label list from `cpu cuda` to `cpu cuda maca`
@@ -248,7 +253,7 @@ inline std::vector<Device::DeviceType> MacaDeviceTypes() {
 | Step | File | Change |
 |------|------|------|
 | 1 | `device.h` | Add `kMACA` to the `DeviceType` enum |
-| 2 | `test_utils.h` | Add `GetMacaDeviceCount()` / `HasMacaRuntime()` / `MacaDeviceTypes()` / `ONLY_MACA()` |
-| 3 | `test_utils.h` | Add a MACA instance to `INFINI_TRAIN_REGISTER_TEST` |
+| 2 | `test_utils.h` | Add the `<maca_runtime_api.h>` include under `USE_MACA`, and `ONLY_MACA()` |
+| 3 | `test_utils.h` | Add a MACA instance to `INFINI_TRAIN_REGISTER_TEST`, conditional on `USE_MACA` |
 | 4 | `test_macros.cmake` | Extend the default label list to `cpu cuda maca` |
 | 5 | `CMakeLists.txt` (root) | Add `USE_MACA` option + MACA SDK lookup + kernel compilation |
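
The design doc above gates device registration on build switches, so a disabled backend produces no test instance at all rather than an instance that is skipped at runtime. A minimal sketch of that idea, independent of GoogleTest, is shown below. It uses a single compile-time device list instead of the doc's per-device `INSTANTIATE_TEST_SUITE_P` calls, and `DeviceType`, `EnabledDeviceTypes`, and `DeviceName` are hypothetical stand-ins rather than names from the framework.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for infini_train::Device::DeviceType.
enum class DeviceType { kCPU, kCUDA, kMACA };

// Devices to parameterize over, decided at compile time. A backend whose
// build flag is off contributes no entry, so no test instance exists for it
// (as opposed to an instance that is created and then skipped at runtime).
inline std::vector<DeviceType> EnabledDeviceTypes() {
    std::vector<DeviceType> devices{DeviceType::kCPU};
#if defined(USE_CUDA)
    devices.push_back(DeviceType::kCUDA);
#endif
#if defined(USE_MACA)
    devices.push_back(DeviceType::kMACA);
#endif
    return devices;
}

// Human-readable label, e.g. for naming test instances or CTest labels.
inline std::string DeviceName(DeviceType type) {
    switch (type) {
    case DeviceType::kCPU:
        return "CPU";
    case DeviceType::kCUDA:
        return "CUDA";
    case DeviceType::kMACA:
        return "MACA";
    }
    return "unknown";
}
```

One consequence of this choice, as the doc notes, is that a binary built with a backend enabled but run on a machine without that hardware fails loudly, since nothing probes for devices at registration time.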

docs/test_usage_guide.md

Lines changed: 16 additions & 17 deletions
@@ -49,15 +49,15 @@ ctest -R tensor --output-on-failure
 Create a new file in the corresponding subdirectory under `tests/`, for example `tests/tensor/test_tensor_copy.cc`:
 
 ```cpp
-#include "common/test_utils.h"
+#include "tests/common/test_utils.h"
 
 class TensorCopyTest : public infini_train::test::InfiniTrainTest {};
 
 TEST_P(TensorCopyTest, CopiesDataCorrectly) {
-    auto src = createTensor({4}, DataType::kFLOAT32);
-    FillSequentialTensor(src);
+    auto src = std::make_shared<Tensor>(std::vector<int64_t>{4}, DataType::kFLOAT32, GetDevice());
+    src->Fill(1.0f);
 
-    auto dst = createTensor({4}, DataType::kFLOAT32);
+    auto dst = std::make_shared<Tensor>(std::vector<int64_t>{4}, DataType::kFLOAT32, GetDevice());
     // ... perform the copy and assert ...
     EXPECT_EQ(dst->Dims(), src->Dims());
 }
@@ -66,7 +66,7 @@ INFINI_TRAIN_REGISTER_TEST(TensorCopyTest);
 ```
 
 Notes:
-- Inherit from `InfiniTrainTest` (autograd tests inherit from `AutogradTestBase`).
+- Inherit from `InfiniTrainTest`.
 - Use `TEST_P`; the device parameter is injected automatically by the framework.
 - Call `INFINI_TRAIN_REGISTER_TEST` at the end of the file to instantiate both the CPU and CUDA variants automatically.
 
@@ -89,11 +89,9 @@ infini_train_add_test_suite(test_tensor_copy test_tensor_copy.cc)
 | Method | Description |
 |---|---|
 | `GetDevice()` | Returns the device of the current test instance (CPU or CUDA) |
-| `createTensor(shape)` | Creates a `kFLOAT32` tensor on the current device |
-| `createTensor(shape, dtype)` | Creates a tensor of the given data type |
-| `createTensor(shape, dtype, requires_grad)` | Creates a tensor with autograd enabled |
-| `FillSequentialTensor(tensor)` | Fills the tensor with 0, 1, 2, … (handles CPU/GPU transfer automatically) |
-| `FillConstantTensor(tensor, value)` | Fills all elements of the tensor with a constant |
+| `tensor->Fill(value)` | Fills all elements of the tensor with a constant (built-in `Tensor` method) |
+
+Create tensors directly with `std::make_shared<Tensor>(shape, dtype, GetDevice(), requires_grad)`; the `requires_grad` parameter defaults to `false`, and tests that need gradients simply pass `true`.
 
 ---
 
@@ -130,20 +128,21 @@ TEST_P(MyTest, 需要多卡) {
 
 ## Autograd tests
 
-When pre-filled input tensors are needed, inherit from `AutogradTestBase`:
+To create a tensor with autograd enabled, pass `true` as the fourth argument of the `Tensor` constructor:
 
 ```cpp
-#include "common/test_utils.h"
+#include "tests/common/test_utils.h"
 
-class MyOpTest : public infini_train::test::AutogradTestBase {};
+class MyOpTest : public infini_train::test::InfiniTrainTest {};
 
 TEST_P(MyOpTest, 前向传播) {
-    // input_ and weight_ are already created on the current device and filled with sequential values
-    auto output = MyOp(input_, weight_);
+    auto input = std::make_shared<Tensor>(std::vector<int64_t>{2, 3}, DataType::kFLOAT32, GetDevice(), true);
+    input->Fill(1.0f);
+    auto weight = std::make_shared<Tensor>(std::vector<int64_t>{4, 3}, DataType::kFLOAT32, GetDevice(), true);
+    weight->Fill(0.5f);
+    auto output = MyOp(input, weight);
     EXPECT_NE(output, nullptr);
 }
 
 INFINI_TRAIN_REGISTER_TEST(MyOpTest);
 ```
-
-`AutogradTestBase` inherits from `InfiniTrainTest` and pre-creates the `input_` and `weight_` tensors, filled with sequential values.
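
The usage guide now relies on a defaulted `requires_grad` constructor parameter plus a `Fill` method in place of fixture helpers. The framework's actual `Tensor` class is not shown in this diff, so the sketch below uses a hypothetical CPU-only `MiniTensor` purely to illustrate how a trailing default argument keeps non-autograd call sites short while letting autograd tests opt in with one extra argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical, CPU-only stand-in for the framework's Tensor class.
class MiniTensor {
public:
    // requires_grad is a trailing parameter with a default, so existing
    // call sites that never mention it keep compiling unchanged.
    explicit MiniTensor(const std::vector<int64_t> &dims, bool requires_grad = false)
        : dims_(dims), requires_grad_(requires_grad), data_(NumElements(dims), 0.0f) {}

    // Constant fill: every element is set to `value`.
    void Fill(float value) { std::fill(data_.begin(), data_.end(), value); }

    bool requires_grad() const { return requires_grad_; }
    const std::vector<int64_t> &Dims() const { return dims_; }
    const std::vector<float> &data() const { return data_; }

private:
    // Product of all dimensions; an empty shape yields one element.
    static std::size_t NumElements(const std::vector<int64_t> &dims) {
        return static_cast<std::size_t>(
            std::accumulate(dims.begin(), dims.end(), int64_t{1}, std::multiplies<int64_t>()));
    }

    std::vector<int64_t> dims_;
    bool requires_grad_;
    std::vector<float> data_;
};
```

A test would then write `std::make_shared<MiniTensor>(std::vector<int64_t>{2, 3})` for a plain tensor and append `true` when gradients are needed, mirroring the call shape used in the guide.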

example/gpt2/checkpoint_loader.cc

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ std::shared_ptr<nn::TransformerModel> LoadFromLLMC(const std::string &filepath)
     gpt2_config.n_layer = n_layer;
     gpt2_config.n_head = n_head;
     gpt2_config.n_embd = n_embd;
+    gpt2::SanitizeGPT2Config(gpt2_config);
     auto local_gpt2 = std::make_shared<nn::TransformerModel>(gpt2_config);
 
     LOG(INFO) << "magic: " << magic << " version: " << version << " block_size: " << block_size

example/gpt2/config.h

Lines changed: 16 additions & 0 deletions
@@ -1,5 +1,7 @@
 #pragma once
 
+#include "glog/logging.h"
+
 #include "infini_train/include/nn/modules/transformer/transformer_config.h"
 
 namespace nn = infini_train::nn;
@@ -23,4 +25,18 @@ inline nn::TransformerConfig GPT2Config() {
             .multiple_of = 1};
 }
 
+inline void SanitizeGPT2Config(const nn::TransformerConfig &c) {
+    CHECK_GT(c.block_size, 0);
+    CHECK_GT(c.vocab_size, 0);
+    CHECK_GE(c.vocab_size, c.original_vocab_size);
+    CHECK_GT(c.n_layer, 0);
+    CHECK_GT(c.n_head, 0);
+    CHECK_GT(c.n_embd, 0);
+    CHECK_EQ(c.n_embd % c.n_head, 0) << "n_embd must be divisible by n_head";
+    CHECK_EQ(c.n_kv_head, c.n_head) << "GPT-2 does not use GQA; n_kv_head must equal n_head";
+    CHECK(c.attention_type == nn::AttentionType::kStandard) << "GPT-2 requires standard attention";
+    CHECK(c.activation_type == nn::MLPType::kGELU) << "GPT-2 requires GELU activation";
+    CHECK(c.norm_type == nn::NormType::kLayerNorm) << "GPT-2 requires LayerNorm";
+}
+
 } // namespace gpt2
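
`SanitizeGPT2Config` above aborts through glog `CHECK` macros on the first violated invariant. For readers who want to try the same ordering of checks without glog, here is a small sketch that returns an error message instead of aborting. `MiniConfig` is a hypothetical subset of `nn::TransformerConfig`, `ValidateGPT2Like` is an illustrative name, and the default field values are example numbers only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical subset of nn::TransformerConfig, for illustration only.
struct MiniConfig {
    int64_t n_layer = 12;
    int64_t n_head = 12;
    int64_t n_kv_head = 12;
    int64_t n_embd = 768;
};

// Returns an empty string when the config passes, otherwise a message for
// the first violated invariant. Positivity of n_head is checked before the
// modulo so that the divisibility test never divides by zero.
inline std::string ValidateGPT2Like(const MiniConfig &c) {
    if (c.n_layer <= 0) {
        return "n_layer must be positive";
    }
    if (c.n_head <= 0) {
        return "n_head must be positive";
    }
    if (c.n_embd <= 0) {
        return "n_embd must be positive";
    }
    if (c.n_embd % c.n_head != 0) {
        return "n_embd must be divisible by n_head";
    }
    if (c.n_kv_head != c.n_head) {
        return "n_kv_head must equal n_head";
    }
    return "";
}
```

Returning a message makes the rules easy to unit-test; the commit's abort-on-failure style is the stricter choice for an example binary, where continuing with an inconsistent config has no value.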

example/gpt2/main.cc

Lines changed: 1 addition & 0 deletions
@@ -190,6 +190,7 @@ void Train(const nn::parallel::Rank &rank) {
         model = gpt2::LoadFromLLMC(FLAGS_llmc_filepath);
     } else if (kModelToConfigs.count(FLAGS_model)) {
         model_config = kModelToConfigs.at(FLAGS_model);
+        gpt2::SanitizeGPT2Config(model_config);
         model = std::make_shared<nn::TransformerModel>(model_config);
     }

example/llama3/checkpoint_loader.cc

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@ std::shared_ptr<nn::TransformerModel> LoadFromLLMC(const std::string &filepath)
     llama3_config.use_scaled_rope = static_cast<bool>(use_scaled_rope);
     llama3_config.norm_eps = norm_eps;
     llama3_config.max_gen_batch_size = max_gen_bs;
+    llama3::SanitizeLLaMA3Config(llama3_config);
     auto llama3 = std::make_shared<nn::TransformerModel>(llama3_config);
 
     // ========== pp_size:num_stages; vpp_size: num_chunks_per_stage ==========

example/llama3/config.h

Lines changed: 22 additions & 0 deletions
@@ -1,5 +1,7 @@
 #pragma once
 
+#include "glog/logging.h"
+
 #include "infini_train/include/nn/modules/transformer/transformer_config.h"
 
 namespace nn = infini_train::nn;
@@ -22,4 +24,24 @@ inline nn::TransformerConfig LLaMA3Config() {
             .ffn_dim_multiplier = 1.5f,
             .multiple_of = 256};
 }
+
+inline void SanitizeLLaMA3Config(const nn::TransformerConfig &c) {
+    CHECK_GT(c.block_size, 0);
+    CHECK_GT(c.vocab_size, 0);
+    CHECK_GE(c.vocab_size, c.original_vocab_size);
+    CHECK_GT(c.n_layer, 0);
+    CHECK_GT(c.n_head, 0);
+    CHECK_GT(c.n_kv_head, 0);
+    CHECK_LE(c.n_kv_head, c.n_head);
+    CHECK_EQ(c.n_head % c.n_kv_head, 0) << "n_head must be divisible by n_kv_head for GQA";
+    CHECK_GT(c.n_embd, 0);
+    CHECK_EQ(c.n_embd % c.n_head, 0) << "n_embd must be divisible by n_head";
+    CHECK(c.attention_type == nn::AttentionType::kRoPE) << "LLaMA-3 requires RoPE attention";
+    CHECK(c.activation_type == nn::MLPType::kSwiGLU) << "LLaMA-3 requires SwiGLU activation";
+    CHECK(c.norm_type == nn::NormType::kRMSNorm) << "LLaMA-3 requires RMSNorm";
+    CHECK(!c.add_bias_linear) << "LLaMA-3 has no bias in linear layers";
+    CHECK(!c.tie_weights) << "LLaMA-3 does not tie embedding and lm_head weights";
+    CHECK(c.ffn_dim_multiplier.has_value()) << "LLaMA-3 requires ffn_dim_multiplier";
+    CHECK_GT(c.multiple_of, 0);
+}
 } // namespace llama3
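
The two divisibility checks in `SanitizeLLaMA3Config` exist because grouped-query attention derives per-head sizes by integer division: each head gets `n_embd / n_head` channels, and each KV head is shared by `n_head / n_kv_head` query heads. The sketch below makes that arithmetic explicit. `GqaShape` and `ComputeGqaShape` are hypothetical names for illustration and are not part of the framework.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: shapes derived from a GQA configuration.
struct GqaShape {
    int64_t head_dim;       // channels per attention head: n_embd / n_head
    int64_t queries_per_kv; // query heads sharing one KV head: n_head / n_kv_head
};

// Returns true and fills `out` only when the same invariants that the
// sanitizer checks hold; otherwise returns false and leaves `out` untouched.
inline bool ComputeGqaShape(int64_t n_embd, int64_t n_head, int64_t n_kv_head, GqaShape *out) {
    if (n_head <= 0 || n_kv_head <= 0 || n_kv_head > n_head) {
        return false;
    }
    if (n_head % n_kv_head != 0) {
        return false; // query heads could not be split evenly across KV heads
    }
    if (n_embd <= 0 || n_embd % n_head != 0) {
        return false; // embedding could not be split evenly across heads
    }
    out->head_dim = n_embd / n_head;
    out->queries_per_kv = n_head / n_kv_head;
    return true;
}
```

For example, with `n_embd = 4096`, `n_head = 32`, and `n_kv_head = 8` (illustrative numbers), the helper yields a head dimension of 128 and 4 query heads per KV head, while `n_kv_head = 5` is rejected.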

example/llama3/main.cc

Lines changed: 1 addition & 0 deletions
@@ -174,6 +174,7 @@ void Train(const nn::parallel::Rank &rank) {
     if (!FLAGS_llmc_filepath.empty()) {
         model = llama3::LoadFromLLMC(FLAGS_llmc_filepath);
     } else {
+        llama3::SanitizeLLaMA3Config(model_config);
         model = std::make_shared<nn::TransformerModel>(model_config);
     }

infini_train/include/nn/parallel/global.h

Lines changed: 0 additions & 2 deletions
@@ -31,8 +31,6 @@ class GlobalEnv {
     void Init(int threads_per_process, int tensor_parallel_size, bool sequence_parallel_enabled,
               int pipeline_parallel_size, int virtual_pipeline_parallel_size);
 
-    bool IsInitialized() const;
-
     int nnodes() const;
 
     int nproc_per_node() const;
