Awesome Open Source

Search results for video understanding

113 search results found

Mmaction2 ⭐ 3,647

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Awesome Action Recognition ⭐ 3,494

A curated list of action recognition and related area resources

Ask Anything ⭐ 2,404

[VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Kubric ⭐ 2,089

A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

Temporal Shift Module ⭐ 2,005

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

Awesome Video Diffusion ⭐ 1,793

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.

Mmaction ⭐ 1,415

An open-source toolbox for action understanding based on PyTorch

Paddlevideo ⭐ 1,355

Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton-based action recognition models, and practical applications for video tagging and sport action detection.

Temporal Segment Networks ⭐ 1,235

Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

Videomae ⭐ 864

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Tsn Pytorch ⭐ 751

Temporal Segment Networks (TSN) in PyTorch

Internvideo ⭐ 736

InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)

Awesome Grounding ⭐ 689

awesome grounding: A curated list of research papers in visual grounding

Action Detection ⭐ 551

Temporal action detection with SSN

Activity Recognition With Cnn And Rnn ⭐ 402

Temporal Segments LSTM and Temporal-Inception for Activity Recognition

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

Chat Univi ⭐ 382

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Video Understanding Dataset ⭐ 352

A collection of recent video understanding datasets, under construction!

[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition

End-to-End Learning of Motion Representation for Video Understanding

Specvqgan ⭐ 262

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

Multiverse ⭐ 222

Dataset, code and model for the CVPR'20 paper "The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction", and for the ECCV'20 SimAug paper.

Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Actionvlad ⭐ 201

ActionVLAD for video action classification (CVPR 2017)

Tadaconv ⭐ 177

[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.

Movienet Tools ⭐ 174

Tools for movie and video research

Youtube 8m ⭐ 166

The 2nd place solution to the Youtube-8M Video Understanding Challenge by Team Monkeytyping (based on TensorFlow)

deep learning sex position classifier

Cap4video ⭐ 149

【CVPR'2023 Highlight】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Text4vis ⭐ 149

【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective

Object_level_visual_reasoning ⭐ 148

PyTorch implementation of "Object level Visual Reasoning in Videos", F. Baradel, N. Neverova, C. Wolf, J. Mille, G. Mori, ECCV 2018

Video2tfrecord ⭐ 142

Easily convert RGB video data (e.g. .avi) to the TensorFlow tfrecords file format for training, e.g., a neural network in TensorFlow. This implementation allows limiting the number of frames per video stored in the tfrecords.

CVPR2019 STEP: Spatio-Temporal Progressive Learning for Video Action Detection

Video Contrastive Learning ⭐ 139

Video Contrastive Learning with Global Context, ICCVW 2021

Videomaev2 ⭐ 134

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Tubedetr ⭐ 127

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

【CVPR'2023】Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

Awesome Activity Prediction ⭐ 127

Paper list of activity prediction and related area

[CVPR 2020] Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation (PyTorch)

Frozenbilm ⭐ 120

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

I3d_finetune ⭐ 104

TensorFlow code for finetuning I3D model on UCF101.

Just Ask ⭐ 101

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

[ICCV 2021 Oral] Deep Evidential Action Recognition

【AAAI 2021】MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Vidchapters ⭐ 93

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

Motionsqueeze ⭐ 92

Official PyTorch Implementation of MotionSqueeze, ECCV 2020

Pyanomaly ⭐ 92

Useful Toolbox for Anomaly Detection

S3d.pytorch ⭐ 89

Spatiotemporal-separable 3D convolution network.

Mmpd_rppg_dataset ⭐ 75

MMPD: Multi-Domain Mobile Video Physiology Dataset (EMBC 2023 Oral)

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Efficient 3D Backbone Network for Temporal Modeling

[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

[ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting"

Awesome Temporally Language Grounding ⭐ 59

A curated list of “Temporally Language Grounding” and related area

Din Group Activity Recognition Benchmark ⭐ 53

[ICCV 2021] A new codebase containing various methods for Group Activity Recognition. Paper title: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition.

Mini Kinetics 200 ⭐ 51

Mini-Kinetics-200 data splits used in paper "Rethinking Spatiotemporal Feature Learning For Video Understanding"

Graph_distillation ⭐ 51

Graph Distillation for Action Detection

Temporally Language Grounding ⭐ 49

A PyTorch implementation of some state-of-the-art models for "Temporally Language Grounding in Untrimmed Videos"

[NeurIPS 2019] Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition

Icme2019 Ctr ⭐ 47

The Code for ICME2019 Grand Challenge: Short Video Understanding (Single Model Ranks 6th)

Code release for "Training a Large Video Model on a Single Machine in a Day"

Temporal Shift Module ⭐ 46

Unofficial implementation for paper `Temporal Shift Module for Efficient Video Understanding`

I3d Tensorflow ⭐ 45

Inflated 3D ConvNets for video understanding

Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment

[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv.org/abs/2212.04500)

Openpvsg ⭐ 33

Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23

Pointtad ⭐ 30

[NeurIPS 2022] PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points

[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

Glimpse_clouds ⭐ 28

PyTorch implementation of the paper "Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points", F. Baradel, C. Wolf, J. Mille, G.W. Taylor, CVPR 2018

[Codes of CVPR'21 paper] CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning

Pi Consistency Activity Detection ⭐ 26

End-to-End Semi-Supervised Learning for Video Action Detection [CVPR 2022]

STCNet: Spatio-Temporal Cross Network for Industrial Smoke Detection

Rdn4depth ⭐ 22

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos, IJCAI 2019

Mica Movieclip ⭐ 22

This repository contains the codebase for MovieCLIP: Visual Scene Recognition in Movies

Graph learning framework for long-term video understanding

[IEEE T-IP 2022] TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

[IEEE T-IP 2021] Semantics-aware Adaptive Knowledge Distillation for Cross-modal Action Recognition

Progressive Action Prediction ⭐ 18

[CVPR 2023] Code for action prediction from videos

Cv Datasets ⭐ 18

Collection of open datasets in computer vision.

Soccerdb ⭐ 17

SoccerDB: A Large-Scale Database for Comprehensive Video Understanding

Temporal Augmentation ⭐ 17

Temporal augmentation with two-stream ConvNet features on human action recognition

Fitness Aqa ⭐ 16

Fitness Action Quality Assessment or your AI-Fitness Coach [ECCV 2022]

[ECCV 2022] Official PyTorch implementation of the paper "Semi-Supervised Temporal Action Detection with Proposal-Free Masking"

Ltcontext ⭐ 16

[ICCV 2023] How Much Temporal Long-Term Context is Needed for Action Segmentation?

Tem Adapter ⭐ 15

[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer

Image Pretraining For Video ⭐ 15

[ECCV 2022] This repository includes the official implementation of our paper "In Defense of Image Pre-Training for Spatiotemporal Recognition".

Cp 360 Weakly Supervised Saliency ⭐ 13

CP-360-Weakly-Supervised-Saliency

Videomae Action Detection ⭐ 13

[NeurIPS 2022 Spotlight] VideoMAE for Action Detection

Orbit 2022 Winner Method ⭐ 13

Few-Shot Video Object Recognition with Embedding Adaptation and Uniform Clip Sampling: Winner of ORBIT Few-Shot Object Recognition Challenge 2022

Online Product Reviews for Affordances

Region Based Non Local Network ⭐ 11

[Codes of paper]: Region-based Non-local operation for Video Classification

My implementation (PyTorch) for the paper SST: Single-Stream Temporal Action Proposals (http://vision.stanford.edu/pdf/buch2017cvpr.pdf).

Cvpr2023 Cmpae ⭐ 10

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

【AAAI 2022】Temporal Action Proposal Generation with Background Constraint

Piano Skills Assessment ⭐ 10

Piano Skills Assessment [IEEE MMSP 2021]

Deepepisodicmemory ⭐ 10

Deep neural network architecture for representing robot experiences in an episodic-like memory which facilitates encoding, recalling, and predicting action experience - Research Project at KIT's High Performance Humanoids Technologies Lab (H2T)

C3d Lstm Pytorch ⭐ 9

C3D-LSTM implementation in PyTorch

VTC: Improving Video-Text Retrieval with User Comments

1-100 of 113 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.