Commit b294d3b

Merge pull request #8 from modelai/ymir-dev
[yolov5 + doc + mmdet] add aldd mining algorithm for yolov5
2 parents 6b49b40 + ef09dcf commit b294d3b

19 files changed

Lines changed: 456 additions & 84 deletions

README.MD

Lines changed: 12 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -99,13 +99,14 @@ gpu: single Tesla P4
9999
100100
gpu: single GeForce GTX 1080 Ti
101101
102-
| docker image | batch size | epoch number | model | voc2012 val map50 | training time | note |
103-
| - | - | - | - | - | - | - |
104-
| yolov5 | 16 | 100 | yolov5s | 70.35% | 2h | coco-pretrained |
105-
| yolov7 | 16 | 100 | yolov7-tiny | 70.4% | 5h | coco-pretrained |
106-
| mmdetection | 16 | 100 | yolox_tiny | 66.2% | 5h | coco-pretrained |
107-
| detectron2 | 2 | 20000 steps | retinanet_R_50_FPN_1x | 53.54% | 2h | imagenet-pretrained |
108-
| nanodet | 16 | 100 | nanodet-plus-m_416 | 58.63% | 5h | imagenet-pretrained |
102+
| docker image | image size | batch size | epoch number | model | voc2012 val map50 | training time | note |
103+
| - | - | - | - | - | - | - | - |
104+
| yolov4 | 608 | 64/32 | 20000 steps | yolov4 | 72.73% | 6h | imagenet-pretrained |
105+
| yolov5 | 640 | 16 | 100 | yolov5s | 70.35% | 2h | coco-pretrained |
106+
| yolov7 | 640 | 16 | 100 | yolov7-tiny | 70.4% | 5h | coco-pretrained |
107+
| mmdetection | 640 | 16 | 100 | yolox_tiny | 66.2% | 5h | coco-pretrained |
108+
| detectron2 | 640 | 2 | 20000 steps | retinanet_R_50_FPN_1x | 53.54% | 2h | imagenet-pretrained |
109+
| nanodet | 416 | 16 | 100 | nanodet-plus-m_416 | 58.63% | 5h | imagenet-pretrained |
109110
110111
---
111112
@@ -148,11 +149,14 @@ docker build -t ymir-executor/mmdet:cu111-tmi -f docker/Dockerfile.cuda111 .
148149
149150
## how to import pretrained model weights
150151
151-
- [import pretrained model weights](https://github.com/IndustryEssentials/ymir/blob/dev/dev_docs/import-extra-models.md)
152+
- [import and finetune model](https://github.com/modelai/ymir-executor-fork/wiki/import-and-finetune-model)
153+
154+
- ~~[import pretrained model weights](https://github.com/IndustryEssentials/ymir/blob/dev/dev_docs/import-extra-models.md)~~
152155
153156
## reference
154157
155158
- [mining algorithm: CALD](https://github.com/we1pingyu/CALD/)
159+
- [mining algorithm: ALDD](https://gitlab.com/haghdam/deep_active_learning)
156160
- [yolov4](https://github.com/AlexeyAB/darknet)
157161
- [yolov5](https://github.com/ultralytics/yolov5)
158162
- [mmdetection](https://github.com/open-mmlab/mmdetection)

README_zh-CN.MD

Lines changed: 13 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -86,7 +86,7 @@
8686
8787
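- Training set: voc2012-train 5717 images
8888
- Test set: voc2012-val 5823 images
89-
- Image size: 640 (416 for nanodet)
89+
- Image size: 640 (416 for nanodet, 608 for yolov4)
9090
9191
**Because the coco dataset contains the classes in the voc dataset, this comparison is not fair and is for reference only**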
9292
@@ -101,13 +101,14 @@ gpu: single Tesla P4
101101
102102
gpu: single GeForce GTX 1080 Ti
103103
104-
| docker image | batch size | epoch number | model | voc2012 val map50 | training time | note |
105-
| - | - | - | - | - | - | - |
106-
| yolov5 | 16 | 100 | yolov5s | 70.35% | 2h | coco-pretrained |
107-
| yolov7 | 16 | 100 | yolov7-tiny | 70.4% | 5h | coco-pretrained |
108-
| mmdetection | 16 | 100 | yolox_tiny | 66.2% | 5h | coco-pretrained |
109-
| detectron2 | 2 | 20000 steps | retinanet_R_50_FPN_1x | 53.54% | 2h | imagenet-pretrained |
110-
| nanodet | 16 | 100 | nanodet-plus-m_416 | 58.63% | 5h | imagenet-pretrained |
104+
| docker image | image size | batch size | epoch number | model | voc2012 val map50 | training time | note |
105+
| - | - | - | - | - | - | - | - |
106+
| yolov4 | 608 | 64/32 | 20000 steps | yolov4 | 72.73% | 6h | imagenet-pretrained |
107+
| yolov5 | 640 | 16 | 100 | yolov5s | 70.35% | 2h | coco-pretrained |
108+
| yolov7 | 640 | 16 | 100 | yolov7-tiny | 70.4% | 5h | coco-pretrained |
109+
| mmdetection | 640 | 16 | 100 | yolox_tiny | 66.2% | 5h | coco-pretrained |
110+
| detectron2 | 640 | 2 | 20000 steps | retinanet_R_50_FPN_1x | 53.54% | 2h | imagenet-pretrained |
111+
| nanodet | 416 | 16 | 100 | nanodet-plus-m_416 | 58.63% | 5h | imagenet-pretrained |
111112
112113
---
113114
@@ -164,13 +165,16 @@ docker build -t ymir-executor/live-code:mxnet-tmi -f mxnet.dockerfile
164165
165166
## How to import pretrained models
166167
167-
- [How to import external models](https://github.com/IndustryEssentials/ymir/blob/dev/dev_docs/import-extra-models.md)
168+
- [How to import and finetune external models](https://github.com/modelai/ymir-executor-fork/wiki/import-and-finetune-model)
169+
170+
- ~~[How to import external models](https://github.com/IndustryEssentials/ymir/blob/dev/dev_docs/import-extra-models.md)~~
168171
169172
- Models can also be imported via `模型管理/模型列表/导入模型` (Model Management / Model List / Import Model) on the ymir web page
170173
171174
## Reference
172175
173176
- [mining algorithm: CALD](https://github.com/we1pingyu/CALD/)
177+
- [mining algorithm: ALDD](https://gitlab.com/haghdam/deep_active_learning)
174178
- [yolov4](https://github.com/AlexeyAB/darknet)
175179
- [yolov5](https://github.com/ultralytics/yolov5)
176180
- [mmdetection](https://github.com/open-mmlab/mmdetection)

det-mmdetection-tmi/mmdet/utils/util_ymir.py

Lines changed: 107 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
import logging
66
import os
77
import os.path as osp
8-
from typing import Any, List, Optional
8+
from typing import Any, Iterable, List, Optional
99

1010
import mmcv
1111
import yaml
@@ -14,18 +14,37 @@
1414
from nptyping import NDArray, Shape, UInt8
1515
from packaging.version import Version
1616
from ymir_exc import result_writer as rw
17+
from ymir_exc.util import get_merged_config
1718

1819
BBOX = NDArray[Shape['*,4'], Any]
1920
CV_IMAGE = NDArray[Shape['*,*,3'], UInt8]
2021

2122

22-
def modify_mmdet_config(mmdet_cfg: Config, ymir_cfg: edict) -> Config:
23+
def modify_mmcv_config(mmcv_cfg: Config, ymir_cfg: edict) -> None:
2324
"""
2425
useful for training process
2526
- modify dataset config
2627
- modify model output channel
2728
- modify epochs, checkpoint, tensorboard config
2829
"""
30+
def recursive_modify_attribute(mmcv_cfg: Config, attribute_key: str, attribute_value: Any):
31+
"""
32+
recursively modify mmcv_cfg:
33+
1. mmcv_cfg.attribute_key to attribute_value
34+
2. mmcv_cfg.xxx.xxx.xxx.attribute_key to attribute_value (recursive)
35+
3. mmcv_cfg.xxx[i].attribute_key to attribute_value (i=0, 1, 2 ...)
36+
4. mmcv_cfg.xxx[i].xxx.xxx[j].attribute_key to attribute_value
37+
"""
38+
for key in mmcv_cfg:
39+
if key == attribute_key:
40+
mmcv_cfg[key] = attribute_value
41+
elif isinstance(mmcv_cfg[key], Config):
42+
recursive_modify_attribute(mmcv_cfg[key], attribute_key, attribute_value)
43+
elif isinstance(mmcv_cfg[key], Iterable):
44+
for cfg in mmcv_cfg[key]:
45+
if isinstance(cfg, Config):
46+
recursive_modify_attribute(cfg, attribute_key, attribute_value)
47+
2948
# modify dataset config
3049
ymir_ann_files = dict(train=ymir_cfg.ymir.input.training_index_file,
3150
val=ymir_cfg.ymir.input.val_index_file,
@@ -35,8 +54,12 @@ def modify_mmdet_config(mmdet_cfg: Config, ymir_cfg: edict) -> Config:
3554
# so set smaller samples_per_gpu for validation
3655
samples_per_gpu = ymir_cfg.param.samples_per_gpu
3756
workers_per_gpu = ymir_cfg.param.workers_per_gpu
38-
mmdet_cfg.data.samples_per_gpu = samples_per_gpu
39-
mmdet_cfg.data.workers_per_gpu = workers_per_gpu
57+
mmcv_cfg.data.samples_per_gpu = samples_per_gpu
58+
mmcv_cfg.data.workers_per_gpu = workers_per_gpu
59+
60+
# modify model output channel
61+
num_classes = len(ymir_cfg.param.class_names)
62+
recursive_modify_attribute(mmcv_cfg.model, 'num_classes', num_classes)
4063

4164
for split in ['train', 'val', 'test']:
4265
ymir_dataset_cfg = dict(type='YmirDataset',
@@ -47,7 +70,7 @@ def modify_mmdet_config(mmdet_cfg: Config, ymir_cfg: edict) -> Config:
4770
data_root=ymir_cfg.ymir.input.root_dir,
4871
filter_empty_gt=False)
4972
# modify dataset config for `split`
50-
mmdet_dataset_cfg = mmdet_cfg.data.get(split, None)
73+
mmdet_dataset_cfg = mmcv_cfg.data.get(split, None)
5174
if mmdet_dataset_cfg is None:
5275
continue
5376

@@ -63,33 +86,60 @@ def modify_mmdet_config(mmdet_cfg: Config, ymir_cfg: edict) -> Config:
6386
else:
6487
raise Exception(f'unsupported source dataset type {src_dataset_type}')
6588

66-
# modify model output channel
67-
mmdet_model_cfg = mmdet_cfg.model.bbox_head
68-
mmdet_model_cfg.num_classes = len(ymir_cfg.param.class_names)
69-
7089
# modify epochs, checkpoint, tensorboard config
7190
if ymir_cfg.param.get('max_epochs', None):
72-
mmdet_cfg.runner.max_epochs = ymir_cfg.param.max_epochs
73-
mmdet_cfg.checkpoint_config['out_dir'] = ymir_cfg.ymir.output.models_dir
91+
mmcv_cfg.runner.max_epochs = int(ymir_cfg.param.max_epochs)
92+
mmcv_cfg.checkpoint_config['out_dir'] = ymir_cfg.ymir.output.models_dir
7493
tensorboard_logger = dict(type='TensorboardLoggerHook', log_dir=ymir_cfg.ymir.output.tensorboard_dir)
75-
if len(mmdet_cfg.log_config['hooks']) <= 1:
76-
mmdet_cfg.log_config['hooks'].append(tensorboard_logger)
94+
if len(mmcv_cfg.log_config['hooks']) <= 1:
95+
mmcv_cfg.log_config['hooks'].append(tensorboard_logger)
7796
else:
78-
mmdet_cfg.log_config['hooks'][1].update(tensorboard_logger)
97+
mmcv_cfg.log_config['hooks'][1].update(tensorboard_logger)
7998

99+
# TODO save only the best top-k model weight files.
80100
# modify evaluation and interval
81-
interval = max(1, mmdet_cfg.runner.max_epochs // 30)
82-
mmdet_cfg.evaluation.interval = interval
83-
mmdet_cfg.evaluation.metric = ymir_cfg.param.get('metric', 'bbox')
101+
val_interval: int = int(ymir_cfg.param.get('val_interval', 1))
102+
if val_interval > 0:
103+
val_interval = min(val_interval, mmcv_cfg.runner.max_epochs)
104+
else:
105+
val_interval = 1
106+
107+
mmcv_cfg.evaluation.interval = val_interval
108+
mmcv_cfg.evaluation.metric = ymir_cfg.param.get('metric', 'bbox')
109+
110+
# save best top-k model weights files
111+
# max_keep_ckpts <= 0 # save all checkpoints
112+
max_keep_ckpts: int = int(ymir_cfg.param.get('max_keep_checkpoints', 1))
113+
mmcv_cfg.checkpoint_config.interval = mmcv_cfg.evaluation.interval
114+
mmcv_cfg.checkpoint_config.max_keep_ckpts = max_keep_ckpts
115+
84116
# TODO whether to evaluate the AP for each class
85117
# mmcv_cfg.evaluation.classwise = True
86118

87119
# fix DDP error
88-
mmdet_cfg.find_unused_parameters = True
89-
return mmdet_cfg
120+
mmcv_cfg.find_unused_parameters = True
121+
122+
# set work dir
123+
mmcv_cfg.work_dir = ymir_cfg.ymir.output.models_dir
124+
125+
args_options = ymir_cfg.param.get("args_options", '')
126+
cfg_options = ymir_cfg.param.get("cfg_options", '')
127+
128+
# auto load offered weight file if not set by user!
129+
if (args_options.find('--resume-from') == -1 and args_options.find('--load-from') == -1
130+
and cfg_options.find('load_from') == -1 and cfg_options.find('resume_from') == -1): # noqa: E129
131+
132+
weight_file = get_best_weight_file(ymir_cfg)
133+
if weight_file:
134+
if cfg_options:
135+
cfg_options += f' load_from={weight_file}'
136+
else:
137+
cfg_options = f'load_from={weight_file}'
138+
else:
139+
logging.warning('no weight file used for training!')
90140

91141

92-
def get_weight_file(cfg: edict) -> str:
142+
def get_best_weight_file(cfg: edict) -> str:
93143
"""
94144
return the weight file path by priority
95145
find weight file in cfg.param.pretrained_model_params or cfg.param.model_params_path
@@ -118,6 +168,7 @@ def get_weight_file(cfg: edict) -> str:
118168
if cfg.ymir.run_training:
119169
weight_files = [f for f in glob.glob('/weights/**/*', recursive=True) if f.endswith(('.pth', '.pt'))]
120170

171+
# load pretrained model weight for yolox only
121172
model_name_splits = osp.basename(cfg.param.config_file).split('_')
122173
if len(weight_files) > 0 and model_name_splits[0] == 'yolox':
123174
yolox_weight_files = [
@@ -145,6 +196,30 @@ def write_ymir_training_result(last: bool = False, key_score: Optional[float] =
145196
_write_ancient_ymir_training_result(key_score)
146197

147198

199+
def get_topk_checkpoints(files: List[str], k: int) -> List[str]:
200+
"""
201+
return the k newest `best_*` and the k newest `epoch_*`/`iter_*` checkpoint files; the caller drops the other checkpoints from the result list.
202+
"""
203+
checkpoints_files = [f for f in files if f.endswith(('.pth', '.pt'))]
204+
205+
best_pth_files = [f for f in checkpoints_files if osp.basename(f).startswith('best_')]
206+
if len(best_pth_files) > 0:
207+
# newest first
208+
topk_best_pth_files = sorted(best_pth_files, key=os.path.getctime, reverse=True)
209+
else:
210+
topk_best_pth_files = []
211+
212+
epoch_pth_files = [f for f in checkpoints_files if osp.basename(f).startswith(('epoch_', 'iter_'))]
213+
if len(epoch_pth_files) > 0:
214+
topk_epoch_pth_files = sorted(epoch_pth_files, key=os.path.getctime, reverse=True)
215+
else:
216+
topk_epoch_pth_files = []
217+
218+
# slicing past the end of a list is safe in python, so no length check is needed
219+
return topk_best_pth_files[0:k] + topk_epoch_pth_files[0:k]
220+
221+
222+
# TODO save topk checkpoints, fix invalid stage due to delete checkpoint
148223
def _write_latest_ymir_training_result(last: bool = False, key_score: Optional[float] = None):
149224
if key_score:
150225
logging.info(f'key_score is {key_score}')
@@ -165,6 +240,11 @@ def _write_latest_ymir_training_result(last: bool = False, key_score: Optional[f
165240

166241
if last:
167242
# save all output file
243+
ymir_cfg = get_merged_config()
244+
max_keep_checkpoints = int(ymir_cfg.param.get('max_keep_checkpoints', 1))
245+
if max_keep_checkpoints > 0:
246+
topk_checkpoints = get_topk_checkpoints(result_files, max_keep_checkpoints)
247+
result_files = [f for f in result_files if not f.endswith(('.pth', '.pt'))] + topk_checkpoints
168248
rw.write_model_stage(files=result_files, mAP=float(map), stage_name='last')
169249
else:
170250
# save newest weight file in format epoch_xxx.pth or iter_xxx.pth
@@ -201,13 +281,17 @@ def _write_ancient_ymir_training_result(key_score: Optional[float] = None):
201281
# eval_result may be empty dict {}.
202282
map = eval_result.get('bbox_mAP_50', 0)
203283

204-
WORK_DIR = os.getenv('YMIR_MODELS_DIR')
205-
if WORK_DIR is None or not osp.isdir(WORK_DIR):
206-
raise Exception(f'please set valid environment variable YMIR_MODELS_DIR, invalid directory {WORK_DIR}')
284+
ymir_cfg = get_merged_config()
285+
WORK_DIR = ymir_cfg.ymir.output.models_dir
207286

208287
# assert only one model config file in work_dir
209288
result_files = [osp.basename(f) for f in glob.glob(osp.join(WORK_DIR, '*')) if osp.basename(f) != 'result.yaml']
210289

290+
max_keep_checkpoints = int(ymir_cfg.param.get('max_keep_checkpoints', 1))
291+
if max_keep_checkpoints > 0:
292+
topk_checkpoints = get_topk_checkpoints(result_files, max_keep_checkpoints)
293+
result_files = [f for f in result_files if not f.endswith(('.pth', '.pt'))] + topk_checkpoints
294+
211295
training_result_file = osp.join(WORK_DIR, 'result.yaml')
212296
if osp.exists(training_result_file):
213297
with open(training_result_file, 'r') as f:
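
Note on `get_topk_checkpoints` above: it splits checkpoint files into a `best_*` group and an `epoch_*`/`iter_*` group, sorts each group newest first, and keeps `k` from each. The sketch below reproduces that selection under a few assumptions: it sorts by `os.path.getmtime` rather than `os.path.getctime` so that the file times can be set explicitly in the example, and the helper name `newest_k` and the file names are hypothetical. Either timestamp call needs paths that resolve from the current working directory, so the sketch passes full paths.

```python
import os
import tempfile
from typing import List, Tuple


def newest_k(files: List[str], prefixes: Tuple[str, ...], k: int) -> List[str]:
    """Return the k most recently modified files whose basename starts with one of `prefixes`."""
    matched = [f for f in files if os.path.basename(f).startswith(prefixes)]
    # newest first; slicing past the end of the list is safe
    return sorted(matched, key=os.path.getmtime, reverse=True)[:k]


with tempfile.TemporaryDirectory() as tmp:
    names = ["epoch_1.pth", "epoch_2.pth", "epoch_3.pth", "best_bbox_mAP_epoch_2.pth"]
    paths = []
    for age, name in enumerate(names):
        path = os.path.join(tmp, name)
        open(path, "w").close()
        # give every file a distinct, increasing modification time
        os.utime(path, (1_000 + age, 1_000 + age))
        paths.append(path)

    selected = newest_k(paths, ("best_",), 1) + newest_k(paths, ("epoch_", "iter_"), 1)
    kept = [os.path.basename(p) for p in selected]
```

With `k=1` this keeps one best checkpoint and one latest epoch checkpoint, which matches the `max_keep_checkpoints: 1` default described in the training template.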

det-mmdetection-tmi/tools/train.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@
1616
from mmdet.datasets import build_dataset
1717
from mmdet.models import build_detector
1818
from mmdet.utils import collect_env, get_root_logger, setup_multi_processes
19-
from mmdet.utils.util_ymir import modify_mmdet_config
19+
from mmdet.utils.util_ymir import modify_mmcv_config
2020
from ymir_exc.util import get_merged_config
2121

2222

@@ -101,7 +101,7 @@ def main():
101101
cfg = Config.fromfile(args.config)
102102
print(cfg)
103103
# modify mmdet config from file
104-
cfg = modify_mmdet_config(mmdet_cfg=cfg, ymir_cfg=ymir_cfg)
104+
modify_mmcv_config(mmcv_cfg=cfg, ymir_cfg=ymir_cfg)
105105

106106
if args.cfg_options is not None:
107107
cfg.merge_from_dict(args.cfg_options)
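
Note on `modify_mmcv_config`, called above: it no longer assumes a single `model.bbox_head`; its inner helper `recursive_modify_attribute` overwrites every `num_classes` found anywhere under `cfg.model`. The following is a minimal sketch of that traversal, assuming plain `dict`/`list` containers in place of `mmcv.Config` so it runs without mmcv. The name `set_nested_key` and the sample `model` dict are hypothetical, and unlike the helper in the commit this version also descends into lists nested inside lists.

```python
from typing import Any


def set_nested_key(cfg: Any, key: str, value: Any) -> int:
    """Set every occurrence of `key` in nested dicts/lists; return how many were updated."""
    updated = 0
    if isinstance(cfg, dict):
        for k in cfg:
            if k == key:
                cfg[k] = value
                updated += 1
            else:
                updated += set_nested_key(cfg[k], key, value)
    elif isinstance(cfg, (list, tuple)):
        for item in cfg:
            updated += set_nested_key(item, key, value)
    # scalars and strings are left untouched
    return updated


model = {
    "type": "FasterRCNN",
    "roi_head": {"bbox_head": [{"num_classes": 80}, {"num_classes": 80}]},
    "rpn_head": {"anchor_generator": {"scales": [8]}},
}
count = set_nested_key(model, "num_classes", 20)
```

Returning the number of updates is a choice made for this sketch; it makes it easy to warn when a config contains no `num_classes` at all.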
Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,12 @@
11
shm_size: '32G'
22
export_format: 'ark:raw'
33
samples_per_gpu: 16
4-
workers_per_gpu: 16
4+
workers_per_gpu: 8
55
max_epochs: 300
66
config_file: 'configs/yolox/yolox_tiny_8x8_300e_coco.py'
77
args_options: ''
88
cfg_options: ''
99
metric: 'bbox'
10+
val_interval: 1 # evaluate every `val_interval` epochs (capped at max_epochs); <=0 falls back to 1
11+
max_keep_checkpoints: 1 # <=0 means save all weight files, 1 means save last and best weight files, k means save top-k best weight files and top-k epoch/step weight files
1012
port: 12345
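
Note on `val_interval`: `modify_mmcv_config` in `util_ymir.py` caps a positive value at `max_epochs` and replaces zero or negative values with 1. A small sketch of that rule, with the hypothetical function name `normalize_val_interval`:

```python
def normalize_val_interval(val_interval: int, max_epochs: int) -> int:
    """Mirror the clamping applied to `val_interval` before it is written to the config."""
    if val_interval > 0:
        # never evaluate less often than once per run
        return min(val_interval, max_epochs)
    # zero or negative values fall back to evaluating every epoch
    return 1


examples = {
    "typical": normalize_val_interval(5, 300),
    "too_large": normalize_val_interval(500, 300),
    "zero": normalize_val_interval(0, 300),
    "negative": normalize_val_interval(-1, 300),
}
```

Because the checkpoint interval is set to the evaluation interval in the same function, this value also controls how often checkpoints are written.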

det-mmdetection-tmi/ymir_infer.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
from easydict import EasyDict as edict
1010
from mmcv import DictAction
1111
from mmdet.apis import inference_detector, init_detector
12-
from mmdet.utils.util_ymir import get_weight_file
12+
from mmdet.utils.util_ymir import get_best_weight_file
1313
from tqdm import tqdm
1414
from ymir_exc import dataset_reader as dr
1515
from ymir_exc import env, monitor
@@ -87,7 +87,7 @@ def __init__(self, cfg: edict):
8787

8888
# Specify the path to model config and checkpoint file
8989
config_file = get_config_file(cfg)
90-
checkpoint_file = get_weight_file(cfg)
90+
checkpoint_file = get_best_weight_file(cfg)
9191
options = cfg.param.get('cfg_options', None)
9292
cfg_options = parse_option(options) if options else None
9393

det-mmdetection-tmi/ymir_mining.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -283,6 +283,10 @@ def mining(self):
283283
beta = 1.3
284284
mining_result = []
285285
for asset_path in tbar:
286+
# batch-level sync, avoid 30min time-out error
287+
if LOCAL_RANK != -1:
288+
dist.barrier()
289+
286290
img = cv2.imread(asset_path)
287291
# xyxy,conf,cls
288292
result = self.predict(img)

det-mmdetection-tmi/ymir_train.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,9 +5,9 @@
55
import sys
66

77
from easydict import EasyDict as edict
8-
from mmdet.utils.util_ymir import get_weight_file, write_ymir_training_result
8+
from mmdet.utils.util_ymir import get_best_weight_file, write_ymir_training_result
99
from ymir_exc import monitor
10-
from ymir_exc.util import YmirStage, get_merged_config, get_ymir_process
10+
from ymir_exc.util import YmirStage, find_free_port, get_merged_config, get_ymir_process
1111

1212

1313
def main(cfg: edict) -> int:
@@ -32,7 +32,7 @@ def main(cfg: edict) -> int:
3232
(cfg_options is None or (cfg_options.find('load_from') == -1 and
3333
cfg_options.find('resume_from') == -1)):
3434

35-
weight_file = get_weight_file(cfg)
35+
weight_file = get_best_weight_file(cfg)
3636
if weight_file:
3737
if cfg_options:
3838
cfg_options += f' load_from={weight_file}'
@@ -55,7 +55,7 @@ def main(cfg: edict) -> int:
5555
f"--work-dir {work_dir} --gpu-id {gpu_id}"
5656
else:
5757
os.environ.setdefault('CUDA_VISIBLE_DEVICES', gpu_id)
58-
port = cfg.param.get('port')
58+
port = find_free_port()
5959
os.environ.setdefault('PORT', str(port))
6060
cmd = f"bash ./tools/dist_train.sh {config_file} {num_gpus} " + \
6161
f"--work-dir {work_dir}"
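
Note on the `port` change above: distributed training now takes its port from `find_free_port()` instead of the fixed `port` parameter. The implementation of `ymir_exc.util.find_free_port` is not part of this commit, so the sketch below is only an assumption about how such a helper commonly works: bind a socket to port 0 and let the operating system choose an unused port. The name `pick_free_port` is hypothetical.

```python
import socket


def pick_free_port() -> int:
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # port 0 tells the OS to assign any free port
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


port = pick_free_port()
```

The port is released when the socket closes, so another process could in principle claim it before the training launcher binds to it; for a single training job per container this is usually acceptable.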
