Commit c6be596

add back files in CycleMLP
1 parent caf5f7d commit c6be596

4 files changed

Lines changed: 4 additions & 632 deletions

image_classification/CycleMLP/README.md

Lines changed: 4 additions & 49 deletions
@@ -126,7 +126,6 @@ python main_multi_gpu.py \
 </details>
 
 ## Training
-### Training with single GPU
 To train the CycleMLP model on ImageNet2012 with single GPUs, run the following script using command line:
 ```shell
 sh run_train.sh
@@ -141,9 +140,11 @@ python main_single_gpu.py \
 -data_path='/dataset/imagenet' \
 ```
 
-### Training with multi-GPU
-Run training using multi-GPUs:
+<details>
 
+<summary>
+Run training using multi-GPUs:
+</summary>
 
 
 ```shell
@@ -159,55 +160,9 @@ python main_multi_gpu.py \
 -data_path='/dataset/imagenet' \
 ```
 
-
-
-### Training with multi-node
-PaddleVit also supports multi-node distributed training under collective mode.
-
-Suppose you have 2 hosts (denoted as node) with 4 gpus on each machine.
-Nodes IP addresses are `192.168.0.16` and `192.168.0.17`.
-
-Then some lines of `run_train_multi_node.sh` should be modified:
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 # number of gpus
-
--ips= '192.168.0.16, 192.168.0.17' # seperated by comma
-```
-Run training script in every node:
-```shell
-sh run_train_multi.sh
-```
-
-<details>
-<summary>It is possible to train with multi-node even when you have only one machine</summary>
-
-1. Install docker and paddle. For more details, please refer to
-[here](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/docker/fromdocker.html).
-
-2. Create a network between docker containers.
-```shell
-docker network create -d bridge paddle_net
-```
-3. Create multiple containers as virtual hosts/nodes. Suppose creating 2 containers
-with 2 gpus on each node.
-```shell
-docker run --name paddle0 -it -d --gpus "device=0,1" --network paddle_net\
-paddlepaddle/paddle:2.2.0-gpu-cuda10.2-cudnn7 /bin/bash
-docker run --name paddle1 -it -d --gpus "device=2,3" --network paddle_net\
-paddlepaddle/paddle:2.2.0-gpu-cuda10.2-cudnn7 /bin/bash
-```
-> Noted:
-> 1. One can assign same gpu device to different containers. But it may occur OOM since multiple models will run on the same gpu.
-> 2. One should use `-v` to bind PaddleViT repository to container.
-
-4. Modify `run_train_multi_node.sh` as described above and run the training script on every container.
-
-> Noted: One can use `ping` or `ip -a` bash command to check containers' ip addresses.
-
 </details>
 
 
-
 ## Visualization Attention Map
 **(coming soon)**
 