MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project.
The master branch works with PyTorch 1.5+.
Action Recognition Results on Kinetics-400
Skeleton-based Action Recognition Results on NTU-RGB+D-120
Skeleton-based Spatio-Temporal Action Detection and Action Recognition Results on Kinetics-400
Spatio-Temporal Action Detection Results on AVA-2.1
Modular design: We decompose a video understanding framework into different components. One can easily construct a customized video understanding framework by combining different modules.
Support four major video understanding tasks: MMAction2 implements various algorithms for multiple video understanding tasks, including action recognition, action localization, spatio-temporal action detection, and skeleton-based action recognition. We support 27 different algorithms and 20 different datasets for the four major tasks.
Well tested and documented: We provide detailed documentation and API reference, as well as unit tests.
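To make the "modular design" point concrete, here is a minimal sketch of how a framework can be assembled from named components described by nested config dictionaries. It is illustrative only and is not MMAction2's actual API: the names `REGISTRY`, `register`, `build`, `ToyBackbone`, `ToyHead`, and `ToyRecognizer` are hypothetical and exist only for this example.

```python
# Illustrative registry pattern; NOT MMAction2's real API.
# All names below (REGISTRY, register, build, Toy*) are hypothetical.
from typing import Any, Callable, Dict

REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(cls):
    """Record a component class in the registry under its class name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg: dict):
    """Instantiate a component from a config dict that has a 'type' key."""
    args = dict(cfg)  # copy so the caller's config is not mutated
    component_type = args.pop("type")
    # Nested dicts that carry a 'type' key are built recursively.
    for key, value in list(args.items()):
        if isinstance(value, dict) and "type" in value:
            args[key] = build(value)
    return REGISTRY[component_type](**args)


@register
class ToyBackbone:
    def __init__(self, depth: int):
        self.depth = depth


@register
class ToyHead:
    def __init__(self, num_classes: int):
        self.num_classes = num_classes


@register
class ToyRecognizer:
    def __init__(self, backbone, head):
        self.backbone = backbone
        self.head = head


# Swapping a module only requires changing its config entry.
model = build(dict(
    type="ToyRecognizer",
    backbone=dict(type="ToyBackbone", depth=50),
    head=dict(type="ToyHead", num_classes=400),
))
print(type(model).__name__, model.backbone.depth, model.head.num_classes)
```

The design choice this illustrates is that components are selected by name in a config rather than hard-coded, so a different backbone or head can be combined without touching the code that builds the model.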
A brand-new version, MMAction2 v1.0.0rc0, was released on 01/09/2022.
Find more new features in the 1.x branch. Issues and PRs are welcome!
Release: v0.24.0 was released on 05/05/2022. Please refer to changelog.md for details and release history.
MMAction2 depends on PyTorch, MMCV, MMDetection (optional), and MMPose (optional). Below are quick steps for installation. Please refer to install.md for more detailed instructions.
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate open-mmlab
pip3 install openmim
mim install mmcv-full
mim install mmdet # optional
mim install mmpose # optional
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip3 install -e .
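After running the steps above, a quick way to check that the packages are importable is sketched below. It uses only the Python standard library; the helper name `find_missing` is hypothetical, and the import names `torch`, `mmcv`, `mmdet`, `mmpose`, and `mmaction` are assumed to be the module names these packages install.

```python
# Minimal post-install check; find_missing is a hypothetical helper.
import importlib.util


def find_missing(module_names):
    """Return the names in module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Assumed import names for the dependencies listed above.
REQUIRED = ["torch", "mmcv", "mmaction"]
OPTIONAL = ["mmdet", "mmpose"]

print("Missing required modules:", find_missing(REQUIRED) or "none")
print("Missing optional modules:", find_missing(OPTIONAL) or "none")
```

This only confirms that the modules can be found; it does not verify version compatibility between PyTorch, CUDA, and MMCV, for which install.md remains the reference.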
Please see getting_started.md for the basic usage of MMAction2. There are also tutorials:
A Colab tutorial is also provided. You may preview the notebook here or run it directly on Colab.
Action Recognition
C3D (CVPR'2014) | TSN (ECCV'2016) | I3D (CVPR'2017) | I3D Non-Local (CVPR'2018) | R(2+1)D (CVPR'2018)
TRN (ECCV'2018) | TSM (ICCV'2019) | TSM Non-Local (ICCV'2019) | SlowOnly (ICCV'2019) | SlowFast (ICCV'2019)
CSN (ICCV'2019) | TIN (AAAI'2020) | TPN (CVPR'2020) | X3D (CVPR'2020) | OmniSource (ECCV'2020)
MultiModality: Audio (ArXiv'2020) | TANet (ArXiv'2020) | TimeSformer (ICML'2021)
Action Localization
SSN (ICCV'2017) | BSN (ECCV'2018) | BMN (ICCV'2019)
Spatio-Temporal Action Detection
ACRN (ECCV'2018) | SlowOnly+Fast R-CNN (ICCV'2019) | SlowFast+Fast R-CNN (ICCV'2019) | LFB (CVPR'2019)
Skeleton-based Action Recognition
ST-GCN (AAAI'2018) | 2s-AGCN (CVPR'2019) | PoseC3D (ArXiv'2021)
Results and models are available in the README.md of each method's config directory. A summary can be found on the model zoo page.
We will keep up with the latest progress of the community and support more popular algorithms and frameworks. If you have any feature requests, please feel free to leave a comment in Issues.
Action Recognition
HMDB51 (Homepage) (ICCV'2011) | UCF101 (Homepage) (CRCV-IR-12-01) | ActivityNet (Homepage) (CVPR'2015) | Kinetics-[400/600/700] (Homepage) (CVPR'2017)
SthV1 (ICCV'2017) | SthV2 (Homepage) (ICCV'2017) | Diving48 (Homepage) (ECCV'2018) | Jester (Homepage) (ICCV'2019)
Moments in Time (Homepage) (TPAMI'2019) | Multi-Moments in Time (Homepage) (ArXiv'2019) | HVU (Homepage) (ECCV'2020) | OmniSource (Homepage) (ECCV'2020)
FineGYM (Homepage) (CVPR'2020)
Action Localization
THUMOS14 (Homepage) (THUMOS Challenge 2014) | ActivityNet (Homepage) (CVPR'2015)
Spatio-Temporal Action Detection
UCF101-24* (Homepage) (CRCV-IR-12-01) | JHMDB* (Homepage) (ICCV'2015) | AVA (Homepage) (CVPR'2018)
Skeleton-based Action Recognition
PoseC3D-FineGYM (Homepage) (ArXiv'2021) | PoseC3D-NTURGB+D (Homepage) (ArXiv'2021) | PoseC3D-UCF101 (Homepage) (ArXiv'2021) | PoseC3D-HMDB51 (Homepage) (ArXiv'2021)
Datasets marked with * are not fully supported yet, but related dataset preparation steps are provided. A summary can be found on the Supported Datasets page.
To demonstrate the efficacy and efficiency of our framework, we compare MMAction2 with some other popular frameworks and official releases in terms of speed. Details can be found in benchmark.
Please refer to data_preparation.md for general guidance on data preparation. The supported datasets are listed in supported_datasets.md.
Please refer to FAQ for frequently asked questions.
Many research works and projects from the community are currently built on MMAction2.
See projects.md for all related projects.
We appreciate all contributions to improve MMAction2. Please refer to CONTRIBUTING.md in MMCV for more details about the contributing guidelines.
MMAction2 is an open-source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, and the users who give valuable feedback. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new models.
If you find this project useful in your research, please consider citing:
@misc{2020mmaction2,
title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
author={MMAction2 Contributors},
howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
year={2020}
}
This project is released under the Apache 2.0 license.