Awesome Open Source
Search results for "python vision language model"
Filters: python, vision-language-model
30 search results found
LLaVA (⭐ 12,514): [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.
Qwen-VL (⭐ 2,400): The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
Prismer (⭐ 1,245): The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Multimodal Maestro (⭐ 871): Effective prompting for large multimodal models such as GPT-4 Vision, LLaVA, and CogVLM. 🔥
InternLM-XComposer (⭐ 820)
InstructCV (⭐ 464): Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists".
GroundingLMM (⭐ 434): Grounding Large Multimodal Model (GLaMM), the first model of its kind capable of generating natural-language responses seamlessly integrated with object segmentation masks.
Chat-UniVi (⭐ 382): Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding.
Multi-Modality Arena (⭐ 308): Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
VPGTrans (⭐ 234): Code for "VPGTrans: Transfer Visual Prompt Generator across LLMs". VL-LLaMA, VL-Vicuna.
LAMDA-PILOT (⭐ 134): 🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox.
VoxPoser (⭐ 103): VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models.
LIQE (⭐ 102): [CVPR 2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective.
ViECap (⭐ 96): Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023.
Recognize Any Regions (⭐ 92): Recognize Any Regions.
RoboFlamingo (⭐ 88): Code for RoboFlamingo.
AttackVLM (⭐ 79): Code for the paper "On Evaluating Adversarial Robustness of Large Vision-Language Models".
multi_token (⭐ 54): Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
LMPT (⭐ 40): LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-Tailed Multi-Label Visual Recognition.
Menghini NeurIPS23 Code (⭐ 31): Exploring prompt tuning with pseudo-labels for multiple modalities, learning settings, and training strategies.
Txt2Img-MHN (⭐ 23): [IEEE TIP 2023] Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks.
HGCLIP (⭐ 21): HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding.
CLIPSelf (⭐ 20): Code release for "CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction".
ProbVLM (⭐ 15): ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models.
VLLM Safety Benchmark (⭐ 15): Official PyTorch implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs".
LOVM (⭐ 12): [NeurIPS 2023] Official PyTorch code for "LOVM: Language-Only Vision Model Selection".
CLIP-Binding (⭐ 8): Code to reproduce the experiments in the paper "Does CLIP Bind Concepts? Probing Compositionality in Large Image Models".
Vision Language Examples (⭐ 5): Vision-language model example code.
Kani Vision (⭐ 5): Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
CBVS-UniCLIP (⭐ 5): A Large-Scale Chinese Image-Text Benchmark for Real-World Short-Video Search Scenarios.
Copyright 2018-2024 Awesome Open Source. All rights reserved.