Commit 3f6f055 (1 parent: f77eee3)

363 files changed: 35486 additions & 0 deletions
Encoder_Eval/OpenTAD/.gitignore

Lines changed: 129 additions & 0 deletions

# ignore folder
.vscode
.idea
dataset
/pretrained/
/logs/
/exps/
/trash/
/temp/

# ignore annotation
!/data/
/data/*
!/data/*.sh

dcgm
log
*.err
*.out
/wandb/
build/
dist/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
parts/
sdist/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# model weight
TAD_docker/
pretrained/
perception_models/

Encoder_Eval/OpenTAD/README.md

Lines changed: 117 additions & 0 deletions
# OpenTAD (CUDA 12.5 Compatible)

> **Status**: *alpha* — installation verified on CUDA 12.5; dataset/model pipelines still untested (see **TODO**).

---

## Table of Contents

1. [Installation](#installation)
2. [Usage](#usage)
3. [Dataset Layout](#dataset-layout)
4. [`video_TAD.sh` Arguments](#video_TADsh-arguments)
5. [TODO](#todo)

---

## Installation
19+
20+
### 1. Create the environment & install **PyTorch**
21+
22+
```bash
23+
conda create -n opentad python=3.10.12
24+
conda activate opentad
25+
26+
# CUDA 12.4 wheels also work on 12.5
27+
pip install torch==2.2.2 torchvision==0.17.2 \
28+
--extra-index-url https://download.pytorch.org/whl/cu124
29+
```
30+
> Python < = 3.9 or Python > = 3.11 may fail to install the environment of OpenTAD
31+
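The version constraint in the note above can be checked up front. A minimal sketch (`python_ok` is a hypothetical helper, not part of OpenTAD):

```python
# Guard mirroring the note above: only the Python 3.10 series is verified
# for this environment; <= 3.9 and >= 3.11 may fail to build it.
import sys

def python_ok(version=sys.version_info) -> bool:
    """Return True only for the verified 3.10.x interpreter series."""
    return (version[0], version[1]) == (3, 10)

if not python_ok():
    print("warning: untested Python version, expected 3.10.x")
```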
### 2. Install **MMCV** & **MMAction2**

```bash
pip install openmim
mim install mmcv==2.1.0
mim install mmaction2==1.2.0
```

> **Heads‑up 📌** `mmaction2 == 1.2.0` may raise an `import drn` error. Fix:
>
> 1. Clone `https://github.com/open-mmlab/mmaction2` (matching tag).
> 2. Copy the folder `mmaction/models/localizers/drn` into the same path inside your **conda** site‑packages for `mmaction2`.

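The copy step in the heads-up above can be scripted. A minimal sketch, assuming you have already cloned `mmaction2` and know your site-packages path; `drn_paths` and `apply_fix` are hypothetical helpers, and both paths in the comments are illustrative:

```python
# Hypothetical helper for the `import drn` fix described above: copy the
# missing drn sub-package from a local mmaction2 clone into site-packages.
from pathlib import Path
import shutil

def drn_paths(clone_root: str, site_packages: str):
    """Return (source, destination) paths for the drn sub-package."""
    rel = Path("mmaction") / "models" / "localizers" / "drn"
    return Path(clone_root) / rel, Path(site_packages) / rel

def apply_fix(clone_root: str, site_packages: str) -> None:
    src, dst = drn_paths(clone_root, site_packages)
    # Copy the whole drn folder, tolerating a partially existing destination.
    shutil.copytree(src, dst, dirs_exist_ok=True)
```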
### 3. Install **OpenTAD**

```bash
git clone git@github.com:sming256/OpenTAD.git
cd OpenTAD
pip install -r requirements.txt
```

---

## Usage

The project is wrapped by a single entry‑point script:

```bash
bash video_TAD.sh
```

This will perform:

1. **Feature extraction** (Hugging Face or local `.pth` backbones)
2. **Training / inference** with an Action Detection model

---

## Dataset Layout

```
<DATA_PATH>
└── <dataset_name>/
    ├── raw_data/
    │   └── video/       # *.mp4 | *.avi
    ├── feature/         # extracted *.npy features
    └── annotations/     # *.json or *.csv labels
```

If the dataset needs to be re-downloaded, or annotations are missing, see the [Encoder_TAD data download guide](https://github.com/FeilongTangmonash/Encoder_TAD/blob/41f101281c6c1259e5a38f8f642e539d0861932e/doc/en/data.md) for how to obtain the data.
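Before launching `video_TAD.sh`, the layout above can be sanity-checked. A minimal stdlib sketch (`check_layout` is a hypothetical helper, not repo code):

```python
# Verify that a dataset follows the directory tree shown above.
from pathlib import Path

REQUIRED = ["raw_data/video", "feature", "annotations"]

def check_layout(data_path, dataset_name):
    """Return the list of missing sub-directories (empty means the layout is OK)."""
    root = Path(data_path) / dataset_name
    return [d for d in REQUIRED if not (root / d).is_dir()]
```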

---

## `video_TAD.sh` Arguments

| Variable | Description | Example |
| --- | --- | --- |
| `DATA_PATH` | Root folder of the dataset (see above) | `/data/charades` |
| `CONFIG_PATH` | Path to the model config you want to run | `configs/charades/temporalmaxer.py` |
| `CHECKPOINT_PATH` | Where to save / load model checkpoints | `./work_dirs` |
| **Hugging Face mode** | | |
| `MODEL_NAME` | HF *short* model id | `videomae-base` |
| `CKPT` | Full HF repo path | `facebook/videomae-base` |
| `MODEL_TYPE` | Backbone family name | `videomae` / `internvideo` |
| **Local `.pth` mode** | | |
| `MODEL_NAME` | Name accepted by `timm.create_model` | `internvideo2_tem_dense_urope_tube_small_patch16_224_fc_512_v1` |
| `CKPT` | `.pth` checkpoint path | `~/checkpoints/backbone_tube248_dense_moreepoch.pt` |
| `MODEL_TYPE` | Custom family name | `univit` |

---

## TODO

* [ ] Validate **dataset preprocessing** scripts on target datasets
* [ ] Benchmark **model training** & ensure checkpoints load correctly
* [ ] Add CI workflow for CUDA 12.5 container build

---

## License

This fork inherits the original [OpenTAD license](LICENSE) unless otherwise noted.

---

*Enjoy Temporal Action Detection!* 🚀
Lines changed: 43 additions & 0 deletions
model = dict(
    type="ActionFormer",
    projection=dict(
        type="Conv1DTransformerProj",
        in_channels=2048,
        out_channels=512,
        arch=(2, 2, 5),  # layers in embed / stem / branch
        conv_cfg=dict(kernel_size=3, proj_pdrop=0.0),
        norm_cfg=dict(type="LN"),
        attn_cfg=dict(n_head=4, n_mha_win_size=19),
        path_pdrop=0.1,
        use_abs_pe=False,
        max_seq_len=2304,
    ),
    neck=dict(
        type="FPNIdentity",
        in_channels=512,
        out_channels=512,
        num_levels=6,
    ),
    rpn_head=dict(
        type="ActionFormerHead",
        num_classes=20,
        in_channels=512,
        feat_channels=512,
        num_convs=2,
        cls_prior_prob=0.01,
        prior_generator=dict(
            type="PointGenerator",
            strides=[1, 2, 4, 8, 16, 32],
            regression_range=[(0, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 10000)],
        ),
        loss_normalizer=100,
        loss_normalizer_momentum=0.9,
        center_sample="radius",
        center_sample_radius=1.5,
        label_smoothing=0.0,
        loss=dict(
            cls_loss=dict(type="FocalLoss"),
            reg_loss=dict(type="DIOULoss"),
        ),
    ),
)
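For intuition, each of the six pyramid levels in the `prior_generator` above pairs a stride with a regression range: the strides double per level and the ranges are contiguous, so every action length is claimed by exactly one level. An illustrative stdlib sketch (not OpenTAD code):

```python
# Illustrative only: pair each pyramid level's stride with its assigned
# regression range, mirroring the prior_generator settings above.
strides = [1, 2, 4, 8, 16, 32]
ranges = [(0, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 10000)]

for level, (s, (lo, hi)) in enumerate(zip(strides, ranges)):
    # points at this level are assigned regression targets in [lo, hi)
    print(f"level {level}: stride {s}, regression range [{lo}, {hi})")
```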
Lines changed: 30 additions & 0 deletions
model = dict(
    type="AFSD",
    neck=dict(
        type="AFSDNeck",
        in_channels=2048,
        out_channels=512,
        frame_num=768,  # 96*8
        layer_num=6,
    ),
    rpn_head=dict(
        type="AFSDCoarseHead",
        in_channels=512,
        out_channels=512,
        frame_num=768,  # 96*8
        fpn_strides=[4, 8, 16, 32, 64, 128],
        num_classes=2,
        layer_num=6,
        feat_t=768 // 8,
    ),
    roi_head=dict(
        type="AFSDRefineHead",
        in_channels=512,
        num_classes=2,
        # for loss
        overlap_thresh=0.6,
        loc_weight=1.0,
        loc_bounded=True,
        use_smooth_l1=True,
    ),
)
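The temporal arithmetic in this config can be spelled out: 768 input frames at a downsampling factor of 8 yield the `feat_t` value used above. Illustrative only:

```python
# Illustrative arithmetic for the AFSD config above: 768 input frames with
# a temporal downsampling of 8 give 96 coarse feature steps.
frame_num = 768          # 96 * 8, as in the config
feat_t = frame_num // 8  # temporal length of the coarse feature map
print(feat_t)  # → 96
```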
Lines changed: 41 additions & 0 deletions
model = dict(
    type="BMN",
    projection=dict(
        type="ConvSingleProj",
        in_channels=400,
        out_channels=256,
        num_convs=2,
        conv_cfg=dict(groups=4),
    ),
    rpn_head=dict(
        type="TemporalEvaluationHead",  # tem
        in_channels=256,
        num_classes=2,
        conv_cfg=dict(groups=4),
        loss=dict(pos_thresh=0.5, gt_type=["startness", "endness"]),
    ),
    roi_head=dict(
        type="StandardProposalMapHead",
        proposal_generator=dict(type="DenseProposalMap", tscale=128, dscale=128),
        proposal_roi_extractor=dict(
            type="BMNExtractor",
            in_channels=256,
            roi_channels=512,
            out_channels=128,
            tscale=128,
            dscale=128,
            prop_extend_ratio=0.5,
        ),
        proposal_head=dict(
            type="PEMHead",  # FC_head
            in_channels=128,
            feat_channels=128,
            num_convs=2,
            num_classes=2,
            loss=dict(
                cls_loss=dict(type="BalancedBCELoss", pos_thresh=0.9),
                reg_loss=dict(type="BalancedL2Loss", high_thresh=0.7, low_thresh=0.3, weight=5.0),
            ),
        ),
    ),
)
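With `tscale=128` and `dscale=128`, `DenseProposalMap` scores one candidate per (start, duration) cell of a 128x128 map. An illustrative sketch of that enumeration, assuming the convention that a proposal must end within the temporal axis (that boundary convention is my assumption, not taken from the source):

```python
# Illustrative only: enumerate the dense proposal map implied by the
# tscale=128 / dscale=128 settings above, keeping only in-bounds cells.
tscale, dscale = 128, 128

valid = [(t, d)
         for d in range(1, dscale + 1)   # candidate durations
         for t in range(tscale)          # candidate start positions
         if t + d <= tscale]             # proposal must end inside the axis
print(len(valid))  # number of in-bounds candidate proposals
```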
