Awesome Open Source
Search results for "python vision language model"
Filters: python, vision-language-model
30 search results found
LLaVA (⭐ 12,514): [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.
Qwen-VL (⭐ 2,400): The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
Prismer (⭐ 1,245): The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Multimodal Maestro (⭐ 871): Effective prompting for large multimodal models such as GPT-4 Vision, LLaVA, and CogVLM. 🔥
InternLM-XComposer (⭐ 820)
InstructCV (⭐ 464): Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists".
GroundingLMM (⭐ 434): Grounding Large Multimodal Model (GLaMM), the first model of its kind capable of generating natural-language responses seamlessly integrated with object segmentation masks.
Chat-UniVi (⭐ 382): Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding.
Multi-Modality Arena (⭐ 308): Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
VPGTrans (⭐ 234): Code for "VPGTrans: Transfer Visual Prompt Generator across LLMs". VL-LLaMA, VL-Vicuna.
LAMDA-PILOT (⭐ 134): 🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox.
VoxPoser (⭐ 103): VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models.
LIQE (⭐ 102): [CVPR 2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective.
ViECap (⭐ 96): Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023.
Recognize Any Regions (⭐ 92): Recognize Any Regions.
RoboFlamingo (⭐ 88): Code for RoboFlamingo.
AttackVLM (⭐ 79): Code for the paper "On Evaluating Adversarial Robustness of Large Vision-Language Models".
multi_token (⭐ 54): Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
LMPT (⭐ 40): LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-Tailed Multi-Label Visual Recognition.
Menghini NeurIPS23 Code (⭐ 31): Exploring prompt tuning with pseudo-labels for multiple modalities, learning settings, and training strategies.
Txt2Img-MHN (⭐ 23): [IEEE TIP 2023] Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks.
HGCLIP (⭐ 21): HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding.
CLIPSelf (⭐ 20): Code release for "CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction".
ProbVLM (⭐ 15): ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models.
VLLM Safety Benchmark (⭐ 15): Official PyTorch implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs".
LOVM (⭐ 12): [NeurIPS 2023] Official PyTorch code for "LOVM: Language-Only Vision Model Selection".
CLIP-Binding (⭐ 8): Code to reproduce the experiments in the paper "Does CLIP Bind Concepts? Probing Compositionality in Large Image Models".
Vision Language Examples (⭐ 5): Vision-language model example code.
Kani Vision (⭐ 5): Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
CBVS-UniCLIP (⭐ 5): A Large-Scale Chinese Image-Text Benchmark for Real-World Short-Video Search Scenarios.
Copyright 2018-2024 Awesome Open Source. All rights reserved.