Awesome Open Source
Search results for python vision and language
90 search results found
Prismer ⭐ 1,245
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Oscar ⭐ 995
Oscar and VinVL
Multimodal Gpt ⭐ 971
Multimodal-GPT
Xmodaler ⭐ 929
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
One Peace ⭐ 714
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Clipbert ⭐ 649
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Groundinglmm ⭐ 434
Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Uniter ⭐ 418
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
Proctoring Ai ⭐ 397
Software for automatic monitoring in online proctoring
Pointllm ⭐ 276
[arXiv 2023] PointLLM: Empowering Large Language Models to Understand Point Clouds
X Vlm ⭐ 272
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Vl T5 ⭐ 245
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
Calvin ⭐ 210
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Image Captioning ⭐ 188
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
Lrv Instruction ⭐ 160
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Tcl ⭐ 152
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
Etpnav ⭐ 145
Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
Llavar ⭐ 133
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Pytorch_violet ⭐ 130
A PyTorch implementation of VIOLET
Tubedetr ⭐ 127
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Hero ⭐ 125
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
Arel ⭐ 124
Code for the ACL paper "No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling"
Frozenbilm ⭐ 120
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Vldet ⭐ 117
[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)
Pseudo Q ⭐ 116
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Alpro ⭐ 109
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Clip Caption Reward ⭐ 104
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
Rs5m ⭐ 103
RS5M: a large-scale vision language dataset for remote sensing
Recurrent Vln Bert ⭐ 90
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation
Vilio ⭐ 82
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
Ofasys ⭐ 79
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Eda ⭐ 76
[CVPR 2023] EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Vl Plm ⭐ 75
Exploiting unlabeled data with vision and language models for object detection, ECCV 2022
Vl_adapter ⭐ 75
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR 2022)
Plip ⭐ 67
Pathology Language and Image Pre-Training (PLIP) is the first vision and language foundation model for Pathology AI. PLIP is a large-scale pre-trained model that can be used to extract visual and language features from pathology images and text descriptions. The model is a fine-tuned version of the original CLIP model.
Lightningdot ⭐ 65
Source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper LightningDOT
X2 Vlm ⭐ 63
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains
Factualscenegraph ⭐ 62
The FACTUAL benchmark dataset and the pre-trained textual scene graph parser trained on FACTUAL.
Discrete Continuous Vln ⭐ 60
Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Hirest ⭐ 56
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
Robo Vln ⭐ 56
PyTorch code for the ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Rosita ⭐ 53
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Hulc ⭐ 52
Hierarchical Universal Language Conditioned Policies
Rva ⭐ 50
Code for CVPR'19 "Recursive Visual Attention in Visual Dialog"
Villa ⭐ 46
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
Eccv Caption ⭐ 46
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
Multimodal ⭐ 45
A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"
Mia ⭐ 42
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
Awesome Vqa Latest ⭐ 42
Visual Question Answering Paper List.
Sugar Crepe ⭐ 40
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Visual Spatial Reasoning ⭐ 38
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Cbp ⭐ 33
Official TensorFlow implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
Pytorch_empirical Mvm ⭐ 30
A PyTorch implementation of EmpiricalMVM
Perceiver_vl ⭐ 30
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
Vognet Pytorch ⭐ 28
[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
Clevr Dialog ⭐ 28
Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog
Iais ⭐ 27
[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval
Lang2seg ⭐ 25
Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019
Vidsitu ⭐ 23
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Trar Vqa ⭐ 23
This is the official PyTorch implementation of our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on the VQA task
Hulc2 ⭐ 22
[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data
Zerovl ⭐ 20
[ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources
Pacscore ⭐ 20
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. CVPR 2023
Aerial Vision And Dialog Navigation ⭐ 20
Codebase of the ACL 2023 (Findings) Paper "Aerial Vision-and-Dialog Navigation"
Pytorch_ldast ⭐ 19
A PyTorch implementation of LDAST
Cyclical Visual Captioning ⭐ 18
PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision
Xmodal Ctx ⭐ 18
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Explore And Match ⭐ 16
Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos
Stl Vqa ⭐ 16
Good practices from this VQA system, such as POS-tag attention, structured triplet learning, and triplet attention, are very general and can be inserted into almost any vision-and-language task
Hero_video_feature_extractor ⭐ 15
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
Gst Visdial ⭐ 15
💬 Official PyTorch Implementation for CVPR'23 "The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training"
C3vqg Official ⭐ 14
Code for the paper "C3VQG: Category Consistent Cyclic Visual Question Generation".
Cpl ⭐ 14
Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
Clip Openness ⭐ 13
Code for "Delving into the Openness of CLIP"
Partglot ⭐ 12
Official Implementation of PartGlot (CVPR 2022 Oral)
Gpt Vision Assistant ⭐ 12
A simple implementation of Be My Eyes GPT-4, a vision LLM that acts as a personal assistant
Map2seq_vln ⭐ 11
Code for ORAR Agent for Vision and Language Navigation on Touchdown and map2seq
Val ⭐ 11
Code for the paper [CVPR20] Image Search with Text Feedback by Visiolinguistic Attention Learning
Prompt Adapter ⭐ 10
Prompt Tuning based Adapter for Vision-Language Model Adaption
Spacap3d ⭐ 9
[IJCAI 2022] Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (official PyTorch implementation)
Open Fashion Clip ⭐ 8
This is the official repository for the paper "OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data". ICIAP 2023
Foolyourvllms ⭐ 8
Code for paper: Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations
Vlpd ⭐ 8
Official Code of CVPR'23 Paper "VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision"
Mozuma ⭐ 7
Model Zoo for Multimedia Applications
Gvcci ⭐ 7
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
Zeroshot Storytelling ⭐ 7
GitHub repository for Zero Shot Visual Storytelling
Tgn ⭐ 6
TensorFlow reproduction of the EMNLP-2018 paper "Temporally Grounding Natural Sentence in Video"
Code_ssi ⭐ 5
An implementation of SSI
Naq ⭐ 5
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory. CVPR 2023.
Inside ⭐ 5
INSIDE: Steering Spatial Attention with Non-Imaging Information in CNNs
Copyright 2018-2024 Awesome Open Source. All rights reserved.