Awesome Open Source
Search results for: python, vision-language
53 search results found
Groundingdino (⭐ 4,165) - Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Marqo (⭐ 3,893) - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Chinese Clip (⭐ 2,816) - Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
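The Chinese Clip entry covers CLIP-style cross-modal retrieval for Chinese text. As a rough illustration of what that means in practice, here is a minimal sketch using the Hugging Face port of Chinese-CLIP, not the repository's own scripts; the model ID and processor calls are assumptions:

import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

# Assumed Hugging Face checkpoint; the repository itself ships its own loaders.
model_id = "OFA-Sys/chinese-clip-vit-base-patch16"
model = ChineseCLIPModel.from_pretrained(model_id)
processor = ChineseCLIPProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
texts = ["一只猫", "一条狗", "一辆汽车"]  # candidate Chinese captions

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-to-text similarity logits; softmax turns them into a distribution over the captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print({t: round(float(p), 3) for t, p in zip(texts, probs[0])})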
Ofa (⭐ 2,142) - Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
One Peace (⭐ 714) - A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Video Chatgpt (⭐ 590) - "Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Daclip Uir (⭐ 441) - PyTorch code for "Controlling Vision-Language Models for Universal Image Restoration", ICLR 2024.
Seed (⭐ 326) - Empowers LLMs with the ability to see and draw.
Calvin (⭐ 210) - CALVIN: A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Kaleido Bert (⭐ 207) - (CVPR 2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain.
Lvit (⭐ 200) - [IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Open Groundingdino (⭐ 135) - This is the third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Vln Bevbert (⭐ 130) - [ICCV 2023] Official repo of "BEVBert: Multimodal Map Pre-training for Language-guided Navigation"
Visual Chinese Llama Alpaca (⭐ 129) - Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)
Vse_infty (⭐ 110) - Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021
Vision Language Models Are Bows (⭐ 95) - Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
Poda (⭐ 86) - [ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"
Vip Llava (⭐ 81) - ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Vision Language Transformer (⭐ 76) - Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)
Next Qa (⭐ 74) - NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
S2 Transformer (⭐ 70) - [IJCAI 2022] Official PyTorch code for the paper "S2 Transformer for Image Captioning"
Clip2protect (⭐ 66) - [CVPR 2023] Official repository of the paper "CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search".
Stale (⭐ 63) - [ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting"
D Cube (⭐ 56) - A detection/segmentation dataset with class names characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Hulc (⭐ 52) - Hierarchical Universal Language Conditioned Policies
Mix Generation (⭐ 51) - MixGen: A New Multi-Modal Data Augmentation
Mcm (⭐ 49) - PyTorch implementation of MCM (Delving into out-of-distribution detection with vision-language representations), NeurIPS 2022
Vltvg (⭐ 47) - Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
Pkol (⭐ 43) - [TIP 2022] Official code of the paper "Video Question Answering with Prior Knowledge and Object-sensitive Learning"
Bagformer (⭐ 41) - PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Vidil (⭐ 41) - PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
Contraclip (⭐ 33) - Authors' official PyTorch implementation of "ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences".
Active_vln (⭐ 32) - The repository of the ECCV 2020 paper "Active Visual Information Gathering for Vision-Language Navigation"
Hulc2 (⭐ 22) - [ICRA 2023] Grounding Language with Visual Affordances over Unstructured Data
Protext (⭐ 21) - Official repository of the paper "Learning to Prompt with Text Only Supervision for Vision-Language Models".
Multimodal Meta Learn (⭐ 19) - Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).
Openfusion (⭐ 16) - Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
Pos Subspaces (⭐ 15) - [NeurIPS'23] Parts of Speech–Grounded Subspaces in Vision-Language Models
Decembert (⭐ 15) - PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
Debias Vision Lang (⭐ 14) - A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning [AACL 2022]
Next Oe (⭐ 14) - NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Rewrite (⭐ 14) - [NeurIPS 2023] Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
Hqga (⭐ 13) - Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
Shot2story (⭐ 11) - A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.
Ntu 2022fall Dlcv (⭐ 11) - Deep Learning for Computer Vision (深度學習於電腦視覺) by Frank Wang (王鈺強)
Autoregressive_inference (⭐ 10) - Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)
Image Captioning (⭐ 10) - Image captioning using Python and BLIP
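The Image Captioning entry describes captioning with Python and BLIP. Below is a minimal sketch of that workflow, assuming the Hugging Face transformers checkpoint "Salesforce/blip-image-captioning-base" rather than the repository's own code:

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed checkpoint; the linked repository may load BLIP differently.
model_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))  # prints a short caption of the image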
Managertower (⭐ 9) - Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Dramaqa (⭐ 8) - DramaQA Starter Code (2021)
Soonet (⭐ 6) - Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
Zerogen (⭐ 6) - [NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles, PyTorch Implementation
Rtic Gcn Pytorch (⭐ 5) - Official PyTorch Implementation of RTIC
Vision Language Examples (⭐ 5) - Vision-language model example code.