This is an official implementation of the multi-modalities pre-training model in InternVideo, which is responsible for multi-modal tasks including zero-shot action recognition, zero-shot multiple choice, zero-shot retrieval, video question answering, and video-text retrieval, and is also one of the components of the final InternVideo model.
We currently provide the B/16 model; please download it from Aliyun and place it under the models folder. The model uses UniformerV2 as its backbone and was trained for 12 days on 128 NVIDIA A100 GPUs.
To classify the demo video of an airplane taking off, run python demo.py; you should see results like the following (from the L/14 model):
Label probs:
an airplane is taking off : 0.9562
an airplane is flying : 0.0438
a dog is chasing a ball : 0.0000
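The label probabilities above follow the usual CLIP-style zero-shot scoring: encode the video and each candidate label text, L2-normalize the embeddings, and softmax the scaled cosine similarities. A minimal sketch of that scoring step (the embeddings and temperature here are hypothetical stand-ins for real encoder outputs, not InternVideo's actual API):

```python
import numpy as np

def zero_shot_probs(video_emb, text_embs, temperature=0.01):
    """Softmax over cosine similarities between one video embedding
    and several text (label) embeddings -- CLIP-style scoring."""
    v = video_emb / np.linalg.norm(video_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (t @ v) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    exp = np.exp(logits)
    return exp / exp.sum()

# Toy random embeddings standing in for encoder outputs (hypothetical).
rng = np.random.default_rng(0)
video = rng.normal(size=512)
labels = rng.normal(size=(3, 512))  # one row per candidate label
probs = zero_shot_probs(video, labels)
```

With real encoder outputs, the highest probability would correspond to the best-matching label, as in the airplane example above.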
This folder provides a minimal inference implementation for easier usage. For training and fine-tuning on downstream tasks, please refer to the other task-specific folders.
If you intend to use InternVideo for your own video-language tasks: for alignment tasks such as retrieval, use only the video encoder and text encoder; if your task involves modality fusion, such as video question answering, also use the features from the cross-modality decoder.
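For alignment-only tasks such as retrieval, the two encoders' embeddings can be compared directly, with no fusion step: rank gallery items by cosine similarity to the query embedding. A minimal sketch under that assumption (the embeddings below are hypothetical placeholders, not real encoder outputs):

```python
import numpy as np

def retrieve(query_emb, gallery_embs, top_k=2):
    """Rank gallery items (e.g. videos) by cosine similarity to a
    query (e.g. a text embedding); return indices and similarities."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity per gallery item
    order = np.argsort(-sims)[:top_k]  # best matches first
    return order, sims[order]

# Hypothetical embeddings standing in for encoder outputs.
rng = np.random.default_rng(1)
text_query = rng.normal(size=256)
videos = rng.normal(size=(5, 256))
top_idx, top_sims = retrieve(text_query, videos)
```

The same ranking works in either direction (text-to-video or video-to-text), which is why retrieval needs only the aligned encoders.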
The training code and the L/14 model are on their way.