Awesome Open Source
Search results for "vision-language-model": 42 results found
LLaVA (⭐ 12,514)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. (A minimal usage sketch follows the results list.)
Qwen-VL (⭐ 2,400)
The official repo of Qwen-VL (Tongyi Qianwen-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
VLM_survey (⭐ 1,405)
Vision-Language Models for Vision Tasks: A Survey
Prismer (⭐ 1,245)
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Multimodal Maestro (⭐ 871)
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA, or CogVLM. 🔥
InternLM-XComposer (⭐ 820)
Awesome Japanese LLM (⭐ 585)
Overview of Japanese LLMs (日本語LLMまとめ)
AdvancedLiterateMachinery (⭐ 464)
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
InstructCV (⭐ 464)
Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
GroundingLMM (⭐ 434)
Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Chat-UniVi (⭐ 382)
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
InternVL (⭐ 364)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks, an open-source alternative to ViT-22B
Multi-Modality Arena (⭐ 308)
Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side, with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
AlphaCLIP (⭐ 273)
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
VPGTrans (⭐ 234)
Code for "VPGTrans: Transfer Visual Prompt Generator across LLMs" (VL-LLaMA, VL-Vicuna).
Awesome Knowledge-Driven AD (⭐ 192)
A curated list of awesome knowledge-driven autonomous driving resources (continually updated)
LAMDA-PILOT (⭐ 134)
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
VoxPoser (⭐ 103)
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
LIQE (⭐ 102)
[CVPR 2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
ViECap (⭐ 96)
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning (ICCV 2023)
Recognize Any Regions (⭐ 92)
Recognize Any Regions
RoboFlamingo (⭐ 88)
Code for RoboFlamingo
AttackVLM (⭐ 79)
Code for the paper "On Evaluating Adversarial Robustness of Large Vision-Language Models"
Multi_token (⭐ 54)
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
LMPT (⭐ 40)
LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
Awesome Multimodal LLM Autonomous Driving (⭐ 35)
Multimodal Large Language Models for Autonomous Driving [WACV 2024 Survey Paper]
LLaVA Docker (⭐ 32)
Docker image for LLaVA: Large Language and Vision Assistant
Menghini NeurIPS23 Code (⭐ 31)
Exploring prompt tuning with pseudolabels for multiple modalities, learning settings, and training strategies.
Txt2Img-MHN (⭐ 23)
[IEEE TIP 2023] Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks
HGCLIP (⭐ 21)
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Awesome Multimodal LLM (⭐ 20)
Reading list for Multimodal Large Language Models
CLIPSelf (⭐ 20)
Code release of "CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction"
VLLM Safety Benchmark (⭐ 15)
Official PyTorch implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
ProbVLM (⭐ 15)
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
LOVM (⭐ 12)
[NeurIPS 2023] Official PyTorch code for "LOVM: Language-Only Vision Model Selection"
LLM Survey (⭐ 9)
The official GitHub page for the survey paper "A Survey on Large Language Models: Applications, Challenges, Limitations, and Practical Usage".
CG-VLM (⭐ 8)
The official repo for "Contrastive Vision-Language Alignment Makes Efficient Instruction Learner".
CLIP-Binding (⭐ 8)
Code to reproduce the experiments in the paper "Does CLIP Bind Concepts? Probing Compositionality in Large Image Models".
SPEC (⭐ 6)
The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
CBVS-UniCLIP (⭐ 5)
A large-scale Chinese image-text benchmark for real-world short-video search scenarios
Kani Vision (⭐ 5)
Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
Vision Language Examples (⭐ 5)
Vision-language model example code.
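Several of the results above (LLaVA, LLaVA Docker, Kani Vision) center on LLaVA-style assistants. As a minimal sketch of how such a model is typically queried, the snippet below uses the Hugging Face transformers integration; the checkpoint id llava-hf/llava-1.5-7b-hf, the image URL, and the prompt text are illustrative assumptions, not artifacts of any listed project.

# Minimal LLaVA query via Hugging Face transformers (a sketch; assumes
# transformers >= 4.36 with the LLaVA integration, the community checkpoint
# below, and `accelerate` installed for device_map="auto").
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder image URL; any RGB image works.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

# LLaVA 1.5 checkpoints expect the <image> token inside a USER/ASSISTANT template.
prompt = "USER: <image>\nDescribe this image in one sentence. ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))

Running this needs enough GPU memory for a 7B model in fp16; a smaller or quantized checkpoint can be substituted without changing the call pattern.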