Awesome Open Source

Search results for video language

20 search results found

Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.

Referformer ⭐ 281

[CVPR2022] Official Implementation of ReferFormer

An official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding

All In One ⭐ 180

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training

[NeurIPS2022] Egocentric Video-Language Pretraining

Multi Modal Transformer ⭐ 117

This repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"

Region_learner ⭐ 31

The PyTorch implementation of "Video-Text Pre-training with Learned Regions"

Perceiver_vl ⭐ 30

PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)

An end-to-end masked contrastive video-and-language pre-training framework

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

Official implementation of the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"

Decembert ⭐ 15

PyTorch version of "DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization" (NAACL 2021)

Awesome Video Text Datasets ⭐ 13

A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.

Shot2story ⭐ 11

A new multi-shot video understanding benchmark, Shot2Story20K, with detailed shot-level captions and comprehensive video summaries.

VLG: General Video Recognition with Web Textual Knowledge (https://arxiv.org/abs/2212.01638)

Awesome Video Language Understanding ⭐ 7

A survey on video and language understanding.

Guidance Based Video Grounding ⭐ 6

The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"

Copyright 2018-2024 Awesome Open Source. All rights reserved.