EvolvingLMMs-Lab
diff --git a/Encoder_Eval/OpenTAD/.gitignore b/Encoder_Eval/OpenTAD/.gitignore
Lines changed: 129 additions & 0 deletions
diff --git a/Encoder_Eval/OpenTAD/README.md b/Encoder_Eval/OpenTAD/README.md
Lines changed: 117 additions & 0 deletions
diff --git a/Encoder_Eval/OpenTAD/configs/_base_/models/actionformer.py b/Encoder_Eval/OpenTAD/configs/_base_/models/actionformer.py
Lines changed: 43 additions & 0 deletions
diff --git a/Encoder_Eval/OpenTAD/configs/_base_/models/afsd.py b/Encoder_Eval/OpenTAD/configs/_base_/models/afsd.py
Lines changed: 30 additions & 0 deletions
diff --git a/Encoder_Eval/OpenTAD/configs/_base_/models/bmn.py b/Encoder_Eval/OpenTAD/configs/_base_/models/bmn.py
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,129 @@
+# ignore folder
+.vscode
+.idea
+dataset
+/pretrained/
+/logs/
+/exps/
+/trash/
+/temp/
+
+# ignore everything under /data/ (annotations etc.) except *.sh scripts
+!/data/
+/data/*
+!/data/*.sh
+
+dcgm
+log
+*.err
+*.out
+/wandb/
+build/
+dist/
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+parts/
+sdist/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# model weights and related local folders
+TAD_docker/
+pretrained/
+perception_models/
@@ -0,0 +1,117 @@
+# OpenTAD
+# OpenTAD (CUDA 12.5 Compatible)
+
+> **Status**: *alpha* — installation verified on CUDA 12.5; dataset/model pipelines still untested (see **TODO**).
+
+---
+
+## Table of Contents
+
+1. [Installation](#installation)
+2. [Usage](#usage)
+3. [Dataset Layout](#dataset-layout)
+4. [`video_TAD.sh` Arguments](#video_tadsh-arguments)
+5. [TODO](#todo)
+
+---
+
+## Installation
+
+### 1. Create the environment & install **PyTorch**
+
+```bash
+conda create -n opentad python=3.10.12
+conda activate opentad
+
+# torch 2.2.2 has no cu124 wheels; the cu121 build also runs on CUDA 12.5 drivers
+pip install torch==2.2.2 torchvision==0.17.2 \
+  --extra-index-url https://download.pytorch.org/whl/cu121
+```
+> **Note:** Python <= 3.9 or >= 3.11 may fail to install the OpenTAD environment; use Python 3.10.
+### 2. Install **MMCV** & **MMAction2**
+
+```bash
+pip install openmim
+mim install mmcv==2.1.0
+mim install mmaction2==1.2.0
+```
+
+> **Heads‑up 📌** `mmaction2==1.2.0` may fail with an import error for the missing `drn` module. Fix:
+>
+> 1. Clone `https://github.com/open-mmlab/mmaction2` and check out the tag that matches the installed version.
+> 2. Copy the folder `mmaction/models/localizers/drn` to the same relative path inside the `mmaction` package in your conda environment's `site-packages`.
+
+### 3. Install **OpenTAD**
+
+```bash
+git clone git@github.com:sming256/OpenTAD.git
+cd OpenTAD
+pip install -r requirements.txt
+```
+
+---
+
+## Usage
+
+The project is wrapped by a single entry‑point script:
+
+```bash
+bash video_TAD.sh
+```
+
+The script runs two stages in order:
+
+1. **Feature extraction** with a Hugging Face or local `.pth` backbone
+2. **Training / inference** of a temporal action detection model on the extracted features
+
+---
+
+## Dataset Layout
+
+```
+<DATA_PATH>
+└── <dataset_name>/
+    ├── raw_data/
+    │   └── video/          #   *.mp4 | *.avi
+    ├── feature/            #   extracted *.npy features
+    └── annotations/        #   *.json or *.csv labels
+```
+If a dataset needs to be downloaded again or its annotations are missing, see the [Encoder_TAD data download guide](https://github.com/FeilongTangmonash/Encoder_TAD/blob/41f101281c6c1259e5a38f8f642e539d0861932e/doc/en/data.md) for download instructions.
+
+
+---
+
+## `video_TAD.sh` Arguments
+
+| Variable              | Description                              | Example                             |
+| --------------------- | ---------------------------------------- | ----------------------------------- |
+| `DATA_PATH`           | Root folder of the dataset (see above)   | `/data/charades`                    |
+| `CONFIG_PATH`         | Path to the model config you want to run | `configs/charades/temporalmaxer.py` |
+| `CHECKPOINT_PATH`     | Where to save / load model checkpoints   | `./work_dirs`                       |
+| **Hugging Face mode** |                                          |                                     |
+| `MODEL_NAME`          | HF *short* model id                      | `videomae-base`                     |
+| `CKPT`                | Full HF repo path                        | `facebook/videomae-base`            |
+| `MODEL_TYPE`          | Backbone family name                     | `videomae` / `internvideo`          |
+| **Local `.pth` mode** |                                          |                                     |
+| `MODEL_NAME`          | Name accepted by `timm.create_model`     | `internvideo2_tem_dense_urope_tube_small_patch16_224_fc_512_v1`          |
+| `CKPT`                | `.pth` checkpoint path                   | `~/checkpoints/backbone_tube248_dense_moreepoch.pt`     |
+| `MODEL_TYPE`          | Custom family name                       | `univit`                            |
+
+---
+
+## TODO
+
+* [ ] Validate **dataset preprocessing** scripts on target datasets
+* [ ] Benchmark **model training** & ensure checkpoints load correctly
+* [ ] Add CI workflow for CUDA 12.5 container build
+
+---
+
+## License
+
+This fork inherits the original [OpenTAD license](LICENSE) unless otherwise noted.
+
+---
+
+*Enjoy Temporal Action Detection!* 🚀
+
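Reviewer sketch for the README's "Dataset Layout" section: before launching `video_TAD.sh`, it may help to confirm that the three documented sub-directories exist. The helper name `missing_layout_dirs` and the constant `EXPECTED_SUBDIRS` are hypothetical and not part of this PR; the sketch assumes `DATA_PATH` is the parent of `<dataset_name>/`, as drawn in the layout tree (the arguments table's `/data/charades` example suggests it may instead point at the dataset folder itself).

```python
import tempfile
from pathlib import Path

# Sub-directories listed under <DATA_PATH>/<dataset_name>/ in the README.
EXPECTED_SUBDIRS = ("raw_data/video", "feature", "annotations")


def missing_layout_dirs(data_path, dataset_name):
    """Return the expected sub-directories that are absent (hypothetical helper)."""
    root = Path(data_path) / dataset_name
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    # Demo on a throwaway directory that only contains raw_data/video.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "charades" / "raw_data" / "video").mkdir(parents=True)
        print(missing_layout_dirs(tmp, "charades"))  # ['feature', 'annotations']
```

An empty list means the layout matches the README; any returned entry names a folder to create or download first.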
@@ -0,0 +1,43 @@
+model = dict(
+    type="ActionFormer",
+    projection=dict(
+        type="Conv1DTransformerProj",
+        in_channels=2048,
+        out_channels=512,
+        arch=(2, 2, 5),  # layers in embed / stem / branch
+        conv_cfg=dict(kernel_size=3, proj_pdrop=0.0),
+        norm_cfg=dict(type="LN"),
+        attn_cfg=dict(n_head=4, n_mha_win_size=19),
+        path_pdrop=0.1,
+        use_abs_pe=False,
+        max_seq_len=2304,
+    ),
+    neck=dict(
+        type="FPNIdentity",
+        in_channels=512,
+        out_channels=512,
+        num_levels=6,
+    ),
+    rpn_head=dict(
+        type="ActionFormerHead",
+        num_classes=20,
+        in_channels=512,
+        feat_channels=512,
+        num_convs=2,
+        cls_prior_prob=0.01,
+        prior_generator=dict(
+            type="PointGenerator",
+            strides=[1, 2, 4, 8, 16, 32],
+            regression_range=[(0, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 10000)],
+        ),
+        loss_normalizer=100,
+        loss_normalizer_momentum=0.9,
+        center_sample="radius",
+        center_sample_radius=1.5,
+        label_smoothing=0.0,
+        loss=dict(
+            cls_loss=dict(type="FocalLoss"),
+            reg_loss=dict(type="DIOULoss"),
+        ),
+    ),
+)
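Reviewer sketch for the ActionFormer config above: the pyramid settings have to agree with each other, and the arithmetic below makes that explicit. It assumes each level's temporal length is `max_seq_len // stride`. The divisibility check mirrors a constraint from the original ActionFormer code base (sequence length divisible by `stride * 2 * (win_size // 2)`); whether OpenTAD enforces the same rule is an assumption. `level_lengths` is a hypothetical helper.

```python
# Values copied from the ActionFormer config above.
strides = [1, 2, 4, 8, 16, 32]
regression_range = [(0, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 10000)]
num_levels = 6
max_seq_len = 2304
n_mha_win_size = 19


def level_lengths(seq_len, level_strides):
    """Temporal length of each pyramid level; requires even divisibility."""
    for s in level_strides:
        if seq_len % s != 0:
            raise ValueError(f"seq_len={seq_len} is not divisible by stride {s}")
    return [seq_len // s for s in level_strides]


lengths = level_lengths(max_seq_len, strides)

# One stride and one regression range per pyramid level.
assert len(strides) == len(regression_range) == num_levels

# Local-attention constraint (assumed, see lead-in): window of 19 -> span of 18.
divisor = max(strides) * (n_mha_win_size // 2) * 2

print(lengths)                 # [2304, 1152, 576, 288, 144, 72]
print(sum(lengths))            # 4536 candidate points per padded sequence
print(max_seq_len % divisor)   # 0, since 2304 = 4 * 576
```

Under these assumptions, changing `n_mha_win_size` or adding a pyramid level would require revisiting `max_seq_len` as well.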
@@ -0,0 +1,30 @@
+model = dict(
+    type="AFSD",
+    neck=dict(
+        type="AFSDNeck",
+        in_channels=2048,
+        out_channels=512,
+        frame_num=768,  # 96*8
+        layer_num=6,
+    ),
+    rpn_head=dict(
+        type="AFSDCoarseHead",
+        in_channels=512,
+        out_channels=512,
+        frame_num=768,  # 96*8
+        fpn_strides=[4, 8, 16, 32, 64, 128],
+        num_classes=2,
+        layer_num=6,
+        feat_t=768 // 8,
+    ),
+    roi_head=dict(
+        type="AFSDRefineHead",
+        in_channels=512,
+        num_classes=2,
+        # for loss
+        overlap_thresh=0.6,
+        loc_weight=1.0,
+        loc_bounded=True,
+        use_smooth_l1=True,
+    ),
+)
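Reviewer sketch for the AFSD config above: the inline comments (`# 96*8`, `768 // 8`) and the stride list encode relationships that are easy to break when editing one value. The check below only verifies the arithmetic visible in the config; it makes no claim about how OpenTAD's `AFSDNeck` consumes these values. `strides_double` is a hypothetical helper.

```python
# Values copied from the AFSD config above.
frame_num = 768
fpn_strides = [4, 8, 16, 32, 64, 128]
layer_num = 6
feat_t = 768 // 8


def strides_double(strides):
    """True when every stride is exactly twice the previous one."""
    return all(b == 2 * a for a, b in zip(strides, strides[1:]))


checks = {
    "frame_num == 96 * 8": frame_num == 96 * 8,
    "feat_t == frame_num // 8": feat_t == frame_num // 8,
    "one stride per layer": len(fpn_strides) == layer_num,
    "strides double per level": strides_double(fpn_strides),
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```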
@@ -0,0 +1,41 @@
+model = dict(
+    type="BMN",
+    projection=dict(
+        type="ConvSingleProj",
+        in_channels=400,
+        out_channels=256,
+        num_convs=2,
+        conv_cfg=dict(groups=4),
+    ),
+    rpn_head=dict(
+        type="TemporalEvaluationHead",  # tem
+        in_channels=256,
+        num_classes=2,
+        conv_cfg=dict(groups=4),
+        loss=dict(pos_thresh=0.5, gt_type=["startness", "endness"]),
+    ),
+    roi_head=dict(
+        type="StandardProposalMapHead",
+        proposal_generator=dict(type="DenseProposalMap", tscale=128, dscale=128),
+        proposal_roi_extractor=dict(
+            type="BMNExtractor",
+            in_channels=256,
+            roi_channels=512,
+            out_channels=128,
+            tscale=128,
+            dscale=128,
+            prop_extend_ratio=0.5,
+        ),
+        proposal_head=dict(
+            type="PEMHead",  # FC_head
+            in_channels=128,
+            feat_channels=128,
+            num_convs=2,
+            num_classes=2,
+            loss=dict(
+                cls_loss=dict(type="BalancedBCELoss", pos_thresh=0.9),
+                reg_loss=dict(type="BalancedL2Loss", high_thresh=0.7, low_thresh=0.3, weight=5.0),
+            ),
+        ),
+    ),
+)
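Reviewer sketch for the BMN config above: with `tscale=128` and `dscale=128`, the dense proposal map has 128 x 128 cells, but only about half can describe a proposal that ends inside the sequence. The indexing convention (cell `(d, s)` starts at snippet `s` and lasts `d + 1` snippets) and the way `prop_extend_ratio` widens the sampled region follow the BMN paper's description; both are assumptions about OpenTAD's `DenseProposalMap` and `BMNExtractor`, and the two function names are hypothetical.

```python
def valid_proposals(tscale, dscale):
    """List (start, end) pairs for a dense start x duration map.

    Assumes cell (d, s) is a proposal starting at snippet s and lasting
    d + 1 snippets; it is kept only if it ends inside the sequence.
    """
    return [
        (s, s + d + 1)
        for d in range(dscale)
        for s in range(tscale)
        if s + d + 1 <= tscale
    ]


def extend(start, end, ratio):
    """Widen a proposal by ratio * length on each side (assumed semantics)."""
    length = end - start
    return start - ratio * length, end + ratio * length


props = valid_proposals(128, 128)
print(len(props), 128 * 128)   # 8256 16384
print(extend(32, 64, 0.5))     # (16.0, 80.0)
```

The count 8256 is `128 * 129 / 2`: one fewer valid start position for each extra snippet of duration.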