Awesome Open Source
Search results for "python llava" (20 results found)
LLaVA (⭐ 12,514): [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.

XTuner (⭐ 944): An efficient, flexible, and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM).

Multimodal Maestro (⭐ 871): Effective prompting for large multimodal models such as GPT-4 Vision, LLaVA, and CogVLM.

UForm (⭐ 729): Pocket-sized multimodal AI for content understanding and generation across multilingual texts, images, and (soon) video, up to 5x faster than OpenAI CLIP and LLaVA.

Video-ChatGPT (⭐ 590): A video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation, and introduces a rigorous quantitative evaluation benchmark for video-based conversational models.

Awesome Foundation And Multimodal Models (⭐ 223): Curated list of top foundation and multimodal models [Paper + Code].

LRV-Instruction (⭐ 160): [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning.

VLMEvalKit (⭐ 137): Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 30+ Hugging Face models, and 15+ benchmarks.

LLaVAR (⭐ 133): Code and data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding".

HallusionBench (⭐ 128): HallusionBench: You See What You Think? Or You Think What You See? An image-context reasoning benchmark challenging for GPT-4V(ision), LLaVA-1.5, and other multi-modality models.

Taggui (⭐ 103): Tag manager and captioner for image datasets.

ViP-LLaVA (⭐ 81): ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts.

Multi_token (⭐ 54): Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

RestAI (⭐ 23): RestAI is an open-source AIaaS (AI as a Service) platform built on top of LlamaIndex, Langchain, and Transformers, with precise embedding usage and tuning, free-form agents, and automatic loading and unloading of local LLMs.

Vision Core Ai (⭐ 21): Demo Python script app for interacting with a llama.cpp server using the Whisper API, a microphone, and webcam devices.

Gpt 4v Distribution Shift (⭐ 16): Code for "How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation".

Mmc (⭐ 13)

LLaVA-JP (⭐ 8): LLaVA-JP is a Japanese VLM trained with the LLaVA method.

Metatron2 (⭐ 5): A multimodal Discord bot with machine-learning functions, including LLM chat, image generation, and speech generation capabilities.

Kani Vision (⭐ 5): Kani extension supporting vision-language models (VLMs); comes with model-agnostic support for GPT-Vision and LLaVA.
Copyright 2018-2024 Awesome Open Source. All rights reserved.