TOSA Paper Verification Report

1. Paper Read-Through (Complete)

Read full local PDF at papers/2504.09228.pdf (11 pages, including method, experiments, references).

Key extracted technical details:

  • Core method: ViT single-stream tracker with occlusion-robust representation (ORR) via random template masking.
  • Masking mechanism: random masking modeled with a spatial Cox process (Eq. 1 in paper).
  • ORR training objective: MSE consistency loss between unmasked vs masked template token features (Eq. 2).
  • Distillation: Adaptive Feature-Based Knowledge Distillation (AFKD) with IoU-conditioned weighting (Eq. 3).
  • Prediction loss: focal + GIoU + L1, with weighted sum (Eq. 4).
  • Training recipe (reported): batch 32, AdamW, lr 4e-4, wd 1e-4, 300 epochs, LR drop at epoch 240.
  • Datasets (reported training): GOT-10k, LaSOT, COCO, TrackingNet.
  • Benchmarks (reported eval): DTB70, UAVDT, VisDrone2018, UAV123.
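
To make the ORR consistency term (Eq. 2) concrete, the following is a toy sketch only, not the paper's implementation. It assumes a stand-in encoder (`toy_encoder`) instead of the ViT, and replaces the spatial Cox process with a simpler doubly stochastic mask: a random masking ratio is drawn first, then each template token is masked independently. The names `sample_mask`, `apply_mask`, `toy_encoder`, `consistency_loss`, and the `max_ratio` value are all hypothetical.

```python
import random

def sample_mask(num_tokens, rng, max_ratio=0.5):
    """Draw a random masking intensity, then mask each token independently."""
    ratio = rng.uniform(0.0, max_ratio)  # random intensity (assumed range)
    return [rng.random() < ratio for _ in range(num_tokens)]

def apply_mask(tokens, mask):
    """Zero out masked tokens; unmasked tokens are copied unchanged."""
    return [[0.0] * len(tok) if m else list(tok) for tok, m in zip(tokens, mask)]

def toy_encoder(tokens):
    """Stand-in for the ViT: add the mean token to every token (global mixing)."""
    dim = len(tokens[0])
    mean = [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]
    return [[tok[d] + mean[d] for d in range(dim)] for tok in tokens]

def consistency_loss(tokens, mask):
    """MSE between features of the unmasked and the masked template."""
    clean = toy_encoder(tokens)
    masked = toy_encoder(apply_mask(tokens, mask))
    diffs = [(c - m) ** 2 for ct, mt in zip(clean, masked) for c, m in zip(ct, mt)]
    return sum(diffs) / len(diffs)

rng = random.Random(0)
template = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]
mask = sample_mask(len(template), rng)
print(sum(mask), "tokens masked, loss =", consistency_loss(template, mask))
```

With no tokens masked the loss is exactly zero; minimizing it under masking pushes the encoder toward features that do not change when parts of the template are hidden, which is the occlusion-robustness idea described above.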

2. Reference Repo Check

Cloned: repositories/ORTrack.

Validation findings:

  • Repository exists and is public.
  • README matches paper title and CVPR 2025 claim.
  • Code structure contains train/test pipelines and ORTrack implementation files.
  • Config files align with paper hyperparameters.

Practical runtime check:

  • A direct run in a modern Python/Torch environment fails due to legacy dependencies (torch._six import and old stack assumptions).
  • Requirements pin torch==1.10.0+cu102 and torchvision==0.10.0+cu102, and assume a Python 3.8-style environment.
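
One possible compatibility-first workaround for the torch._six failure is to register a small stand-in module before the tracker code is imported. The sketch below is an assumption, not something taken from the ORTrack repository: `install_six_shim` is a hypothetical helper, and the attributes it provides (`string_classes`, `inf`) are guesses at what the legacy code needs, so the real import sites should be checked first.

```python
import importlib
import sys
import types

def install_six_shim(module_name="torch._six"):
    """Register a stand-in module if `module_name` cannot be imported.

    Returns True when a shim was installed, False when the real module exists.
    """
    try:
        importlib.import_module(module_name)
        return False
    except ImportError:
        pass
    shim = types.ModuleType(module_name)
    # Assumed attribute set; extend it with whatever the legacy code imports.
    shim.string_classes = (str, bytes)
    shim.inf = float("inf")
    sys.modules[module_name] = shim
    return True
```

Calling `install_six_shim()` once at the top of the entry script, before any tracker import, would be the intended usage. This only addresses the import error; other old-stack assumptions mentioned above would still need separate fixes or the pinned environment.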

3. Dataset Accessibility Check

Shared dataset volume mounted:

  • /Volumes/AIFlowDev/RobotFlowLabs/datasets/

Found locally:

  • shared/coco
  • wave10_staging/visdrone exists but currently empty (download log shows Google Drive quota failures).

Not found in current shared volume scan:

  • UAVDT, DTB70, UAV123, full GOT-10k, full LaSOT, full TrackingNet, internal 1.8M UAV dataset path explicitly named for this module.
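
The scan above can be repeated with a short script. This is a minimal sketch, assuming each dataset lives in one directory under the shared root; `scan_datasets` is a hypothetical helper, and only the two relative paths shown in the example call come from this report.

```python
from pathlib import Path

def scan_datasets(root, expected):
    """Classify each expected dataset directory as present, empty, or missing."""
    root = Path(root)
    status = {}
    for name, rel_path in expected.items():
        path = root / rel_path
        if not path.is_dir():
            status[name] = "missing"
        elif not any(path.iterdir()):
            status[name] = "empty"      # directory exists but holds no files
        else:
            status[name] = "present"
    return status

report = scan_datasets(
    "/Volumes/AIFlowDev/RobotFlowLabs/datasets/",
    {"coco": "shared/coco", "visdrone": "wave10_staging/visdrone"},
)
for name, state in report.items():
    print(f"{name}: {state}")
```

Distinguishing "empty" from "missing" matters here because the VisDrone staging directory exists but contains no data.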

4. Metrics Plausibility Check

Paper claims (example):

  • ORTrack-DeiT on UAVDT: 83.4 Prec / 60.1 Succ.
  • ORTrack-DeiT on VisDrone2018: 88.6 Prec / 66.8 Succ.
  • ORTrack-DeiT speed: ~206 FPS (GPU); the ORTrack-D variants report a further speed uplift.

Plausibility assessment:

  • Claims are internally consistent with reported lightweight tiny-backbone architecture.
  • Reported ablations are coherent (ORR improves robustness; AFKD improves speed with small accuracy drop).
  • A full independent rerun is not yet possible in this module due to environment and dataset availability constraints.
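
For a future rerun, the Prec and Succ figures would need to be recomputed from tracker output. The report does not restate the paper's metric definitions, so the sketch below assumes the common one-pass-evaluation convention: precision is the fraction of frames with center error within 20 pixels, and success is the mean success rate over 21 evenly spaced IoU thresholds. Boxes are assumed to be `(x, y, w, h)` with a top-left origin, and all function names are hypothetical.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x, y, w, h) with (x, y) the top-left corner."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def center_error(a, b):
    """Euclidean distance between box centers, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))

def precision(preds, gts, pixel_threshold=20.0):
    """Fraction of frames whose center error is within the pixel threshold."""
    hits = sum(center_error(p, g) <= pixel_threshold for p, g in zip(preds, gts))
    return hits / len(gts)

def success_auc(preds, gts, steps=21):
    """Mean success rate over IoU thresholds 0, 0.05, ..., 1 (for steps=21)."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Both functions return values in [0, 1]; multiplying by 100 gives numbers on the same scale as the figures quoted above, provided the benchmark toolkit uses the same thresholds.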

5. Independent Reproduction / Citation Signal

External signal gathered:

  • Semantic Scholar entry resolves and reports citation activity for this work.
  • GitHub repository shows meaningful community engagement (stars/issues/forks).

Status:

  • No independent third-party full reproduction package has been identified yet.
  • Evidence indicates active downstream usage and citation, but not enough to mark the work as fully independently reproduced in our infrastructure.

6. Red Flags (Documented)

  1. Environment fragility: upstream stack is tightly pinned to older CUDA/PyTorch tooling.
  2. Dataset bottleneck: staged VisDrone download attempts currently blocked by Google Drive quota.
  3. Weight packaging: teacher model artifact handling in upstream repo is manual and not reproducibility-optimized.

7. Verdict

  • Verdict: VERIFIED_WITH_RISKS
  • Paper is real, method is technically coherent, and code exists.
  • Proceed with staged ANIMA integration using a compatibility-first implementation path.
  • CTO review flag: YES (due to reproducibility risk from environment and dataset access).