Awesome Open Source
Search
Programming Languages
Languages
All Categories
Categories
About
Search results for video language
video-language
x
20 search results found
Vlog
⭐
380
Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.
Referformer
⭐
281
[CVPR2022] Official Implementation of ReferFormer
Univl
⭐
264
An official implementation for " UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Univtg
⭐
230
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
All In One
⭐
180
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
Egovlp
⭐
176
[NeurIPS2022] Egocentric Video-Language Pretraining
Multi Modal Transformer
⭐
117
The repository collects many various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and self-supervised learning models. Additionally, it also collects many useful tutorials and tools in these related domains.
Alpro
⭐
109
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Vidil
⭐
41
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Region_learner
⭐
31
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"
Perceiver_vl
⭐
30
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
Mac
⭐
24
An end-to-end masked contrastive video-and-language pre-training framework
Vidsitu
⭐
23
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Gvl
⭐
19
Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Decembert
⭐
15
Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
Awesome Video Text Datasets
⭐
13
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Shot2story
⭐
11
A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.
Vlg
⭐
8
VLG: General Video Recognition with Web Textual Knowledge (https://arxiv.org/abs/2212.01638)
Awesome Video Language Understanding
⭐
7
A Survey on video and language understanding.
Guidance Based Video Grounding
⭐
6
The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
Related Searches
Python Video Language (14)
Pre Training Video Language (4)
Pytorch Video Language (4)
Multimodal Video Language (3)
1-20 of 20 search results
Privacy
|
About
|
Terms
|
Follow Us On Twitter
Copyright 2018-2024 Awesome Open Source. All rights reserved.