This repository contains code for 3D-RetinaNet, a novel single-stage action detection network proposed along with the ROAD dataset. Our TPAMI paper contains a detailed description of 3D-RetinaNet and the ROAD dataset. This code supports training and evaluation on the ROAD and UCF-24 datasets.
We need three things to get started with training: the datasets, Kinetics pre-trained weights, and PyTorch with torchvision and tensorboardX.
We currently support only the following two datasets.
See the ROAD dataset page for download and pre-processing instructions.
For the UCF24 dataset, you can download the rgb-images from my Google Drive link. Download the annotations from corrected-UCF10-annots-repo. The extracted directory is laid out as follows:
- ucf24/
    - pyannot_with_class_names.pkl
    - rgb-images
        - class-name ...
            - video-name ...
                - images ......
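Before training, it can help to verify that the extracted data matches the layout above. The following is a minimal sketch and not part of this repository: the function name `check_ucf24_layout` is hypothetical, and it assumes the nesting `rgb-images/class-name/video-name/images` shown in the listing.

```python
from pathlib import Path


def check_ucf24_layout(root):
    """Return a list of problems found under the UCF24 root directory."""
    root = Path(root)
    problems = []
    # The annotation pickle is expected directly under ucf24/.
    if not (root / "pyannot_with_class_names.pkl").is_file():
        problems.append("missing pyannot_with_class_names.pkl")
    frames_dir = root / "rgb-images"
    if not frames_dir.is_dir():
        problems.append("missing rgb-images directory")
        return problems
    # Assumed nesting: rgb-images/<class-name>/<video-name>/<images>
    video_dirs = [
        video
        for class_dir in frames_dir.iterdir() if class_dir.is_dir()
        for video in class_dir.iterdir() if video.is_dir()
    ]
    if not video_dirs:
        problems.append("no class-name/video-name directories under rgb-images")
    for video in video_dirs:
        if not any(f.is_file() for f in video.iterdir()):
            problems.append(f"no images in {video.relative_to(root)}")
    return problems
```

Calling `check_ucf24_layout("/home/user/ucf24")` returns an empty list when nothing is missing, so the result can be used directly in a pre-flight check.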
pip install tensorboardx
Change into the kinetics-pt directory and run the bash file get_kinetics_weights.sh to download the Kinetics pre-trained weights, or download them from Google-Drive. Name the folder kinetics-pt; it is important to name it right.
Training parameters can be passed to main.py as flags or changed manually.
You will need 4 GPUs (each with at least 10GB VRAM) to run training.
Let's assume that you extracted the dataset in /home/user/road/ and the weights in /home/user/kinetics-pt/. Your training command, run from the root directory of this repo, is then:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py /home/user/ /home/user/ /home/user/kinetics-pt/ --MODE=train --ARCH=resnet50 --MODEL_TYPE=I3D --DATASET=road --TRAIN_SUBSETS=train_3 --SEQ_LEN=8 --TEST_SEQ_LEN=8 --BATCH_SIZE=4 --LR=0.0041
The second instance of /home/user/ in the above command specifies where checkpoint weights and logs are going to be stored. In this case, checkpoints and logs will be in /home/user/road/cache/<experiment-name>/.
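To make the role of the three positional paths concrete, here is a small sketch of how such a command line could be parsed and how the checkpoint directory could be derived from it. It is an illustration only and not the actual parser in main.py: the argument names `DATA_ROOT`, `SAVE_ROOT`, `MODEL_PATH`, the helper `cache_dir`, and the experiment name `my-experiment` are all assumptions.

```python
import argparse
import posixpath


def build_parser():
    # Hypothetical mirror of the command line shown above, not main.py itself.
    parser = argparse.ArgumentParser()
    parser.add_argument("DATA_ROOT", help="parent directory of the dataset folder")
    parser.add_argument("SAVE_ROOT", help="parent directory for checkpoints and logs")
    parser.add_argument("MODEL_PATH", help="directory holding the Kinetics weights")
    parser.add_argument("--DATASET", default="road")
    return parser


def cache_dir(args, experiment_name):
    # Pattern from the text: <SAVE_ROOT>/<DATASET>/cache/<experiment-name>/
    return posixpath.join(args.SAVE_ROOT, args.DATASET, "cache", experiment_name) + "/"


args = build_parser().parse_args(
    ["/home/user/", "/home/user/", "/home/user/kinetics-pt/", "--DATASET=road"]
)
print(cache_dir(args, "my-experiment"))  # /home/user/road/cache/my-experiment/
```

Passing a different second path moves checkpoints and logs without touching the dataset location.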
Different parameters in main.py will result in different performance. For ROAD, the validation split is selected automatically based on the training split number.
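One plausible reading of this automatic selection is that a training split such as `train_3` is paired with the validation split carrying the same number. The sketch below shows that mapping; the function name `matching_val_subset` and the `val_<n>` naming are assumptions for illustration and may differ from what main.py does.

```python
def matching_val_subset(train_subset):
    """Map a ROAD training split such as 'train_3' to 'val_3' (assumed naming)."""
    prefix, _, index = train_subset.rpartition("_")
    if prefix != "train" or not index.isdigit():
        raise ValueError(f"expected a name like 'train_3', got {train_subset!r}")
    return f"val_{index}"


print(matching_val_subset("train_3"))  # val_3
```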
You can train on the ucf24 dataset by changing some command line parameters, as the training schedule and learning rate differ from those used for road training.
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py /home/user/ /home/user/ /home/user/kinetics-pt/ --MODE=train --ARCH=resnet50 --MODEL_TYPE=I3D --DATASET=ucf24 --TRAIN_SUBSETS=train --VAL_SUBSETS=val --SEQ_LEN=8 --TEST_SEQ_LEN=8 --BATCH_SIZE=4 --LR=0.00245 --MILESTONES=6,8 --MAX_EPOCHS=10
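The `--LR`, `--MILESTONES`, and `--MAX_EPOCHS` flags in the command above describe a step-wise learning-rate schedule. The sketch below shows how such a schedule typically behaves, under two assumptions that the text does not confirm: the rate is multiplied by a decay factor `gamma=0.1` at each milestone, and epochs are counted from zero. The helper `lr_at_epoch` is hypothetical.

```python
def lr_at_epoch(base_lr, milestones, epoch, gamma=0.1):
    """Step schedule: multiply base_lr by gamma once for every milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Values taken from the UCF24 command: --LR=0.00245 --MILESTONES=6,8 --MAX_EPOCHS=10
milestones = [6, 8]
for epoch in range(10):
    print(epoch, lr_at_epoch(0.00245, milestones, epoch))
```

Under these assumptions the rate stays at its base value until the first milestone and then drops by a factor of ten at each milestone.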
During training, frame-mean-ap is computed on a subset of the validation split.
LR, MILESTONES, MAX_EPOCHS, and BATCH_SIZE are the main parameters that control the training process.
label_types is a very important variable: it defines which label types are used for training, and at validation time it is bumped up by one with the ego-action label type. It is created in data/dataset.py for each dataset separately, copied to args in main.py, and used again at evaluation time.
To generate the tubes and evaluate them, you first need frame-level detections, which are then linked together. It is pretty simple in our case. Similar to the training command, you can run the following commands. These can run on a single GPU.
There are various MODEs in main.py. You can run each step independently or together. At the moment, the gen_dets mode generates and evaluates frame-wise detections and finally performs tube building and evaluation.
For the ROAD dataset, run the following command:
python main.py /home/user/ /home/user/ /home/user/kinetics-pt/ --MODE=gen_dets --MODEL_TYPE=I3D --TEST_SEQ_LEN=8 --TRAIN_SUBSETS=train_3 --SEQ_LEN=8 --BATCH_SIZE=4 --LR=0.0041
and for UCF24
python main.py /home/user/ /home/user/ /home/user/kinetics-pt/ --MODE=gen_dets --ARCH=resnet50 --MODEL_TYPE=I3D --DATASET=ucf24 --TRAIN_SUBSETS=train --VAL_SUBSETS=val --SEQ_LEN=8 --TEST_SEQ_LEN=8 --BATCH_SIZE=4 --LR=0.00245 --EVAL_EPOCHS=10 --GEN_NMS=80 --TOPK=20 --PATHS_IOUTH=0.25 --TRIM_METHOD=indiv
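Tube building links frame-level detections across time. As a rough illustration of the idea, the sketch below greedily extends each tube with the best-overlapping detection in the next frame when their IoU reaches a threshold, using 0.25 to echo the `--PATHS_IOUTH=0.25` flag above. This is a generic greedy linker written for this README, not the algorithm in tubes.py, and the function names `iou` and `link_detections` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames, iou_thresh=0.25):
    """Greedily link per-frame (box, score) detections into tubes.

    frames: list over time; each item is a list of (box, score) tuples.
    Returns a list of tubes, each a list of (frame_index, box, score).
    """
    tubes = []   # every tube, finished or still growing
    active = []  # tubes that were extended (or started) in the previous frame
    for t, dets in enumerate(frames):
        unused = sorted(dets, key=lambda d: d[1], reverse=True)
        next_active = []
        for tube in active:
            last_box = tube[-1][1]
            # Pick the unused detection that overlaps this tube's last box the most.
            best = max(unused, key=lambda d: iou(last_box, d[0]), default=None)
            if best is not None and iou(last_box, best[0]) >= iou_thresh:
                tube.append((t, best[0], best[1]))
                unused.remove(best)
                next_active.append(tube)
        # Every unmatched detection starts a new tube.
        for box, score in unused:
            tube = [(t, box, score)]
            tubes.append(tube)
            next_active.append(tube)
        active = next_active
    return tubes
```

A tube that finds no match in a frame is simply closed here; handling of missed detections and temporal trimming is left out to keep the sketch short.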
Go through main.py to understand the functions of the flags used above.
A .json file is dumped, which is used for evaluation; see tubes.py for more details.
See modules/evaluation.py and data/dataset.py for the frame-level and video-level evaluation code that computes frame-mAP and video-mAP.
Here you will find the reproduced results from our paper. We use training split #3 for reproduction, on a different machine from the one used to generate the results in the paper. Below are the test results on validation split #3, which is closer to the test set than the other splits in terms of environmental conditions. There is a small change in learning rate here, so the results differ slightly from the paper. Also, the ROAD dataset has six tasks, which makes it difficult to balance the learning among them.
The model is set to I3D with a resnet50 backbone. Kinetics pre-trained weights are used for resnet50I3D; the download link is given above in the Requirements section. Results on split #3 with a test sequence length of 8 are reported as frame-mAP/video-mAP.
| Model | I3D |
| --- | --- |
| Agentness | 54.7/-- |
| Agent | 31.1/26.0 |
| Action | 22.0/16.1 |
| Location | 27.3/24.2 |
| Duplexes | 23.7/19.5 |
| Events/triplets | 13.9/15.5 |
| AV-action | 44.8/-- |
| UCF24 results | |
| Actionness | -- |
| Action detection | -- |
| ActionNess-framewise | -- |
If this work has been helpful in your research, please cite the following articles:
@ARTICLE {singh2022road,
author = {Singh, Gurkirt and Akrigg, Stephen and Di Maio, Manuele and Fontana, Valentina and Alitappeh, Reza Javanmard and Saha, Suman and Jeddisaravi, Kossar and Yousefi, Farzad and Culley, Jacob and Nicholson, Tom and others},
journal = {IEEE Transactions on Pattern Analysis & Machine Intelligence},
title = {ROAD: The ROad event Awareness Dataset for autonomous Driving},
year = {2022},
volume = {},
number = {01},
issn = {1939-3539},
pages = {1-1},
keywords = {roads;autonomous vehicles;task analysis;videos;benchmark testing;decision making;vehicle dynamics},
doi = {10.1109/TPAMI.2022.3150906},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {feb}
}
@inproceedings{singh2017online,
title={Online real-time multiple spatiotemporal action localisation and prediction},
author={Singh, Gurkirt and Saha, Suman and Sapienza, Michael and Torr, Philip HS and Cuzzolin, Fabio},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={3637--3646},
year={2017}
}
@article{maddern20171,
title={1 year, 1000 km: The Oxford RobotCar dataset},
author={Maddern, Will and Pascoe, Geoffrey and Linegar, Chris and Newman, Paul},
journal={The International Journal of Robotics Research},
volume={36},
number={1},
pages={3--15},
year={2017},
publisher={SAGE Publications Sage UK: London, England}
}