Awesome Open Source
Search results for "llava" (25 results found)
Llava ⭐ 12,514: [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.
Findthechatgpter ⭐ 1,774: ChatGPT's explosive popularity marked a key step toward AGI; this project collects open-source alternatives to ChatGPT, including …
Xtuner ⭐ 944: An efficient, flexible, and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM).
Multimodal Maestro ⭐ 871: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA, or CogVLM. 🔥
Uform ⭐ 729: Pocket-sized multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA. 🖼️ & 🖋️
Video Chatgpt ⭐ 590: "Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. The authors also introduce a rigorous "Quantitative Evaluation Benchmarking" for video-based conversational models.
Awesome Foundation And Multimodal Models ⭐ 223: 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]
Lrv Instruction ⭐ 160: [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning.
Vlmevalkit ⭐ 137: Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks.
Llavar ⭐ 133: Code/data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding".
Hallusionbench ⭐ 128: HallusionBench: You See What You Think? Or You Think What You See? An image-context reasoning benchmark that challenges GPT-4V(ision), LLaVA-1.5, and other multi-modality models.
Llava Cpp Server ⭐ 116: LLaVA server (llama.cpp).
Taggui ⭐ 103: Tag manager and captioner for image datasets.
Vip Llava ⭐ 81: ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts.
Multi_token ⭐ 54: Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Flowgen ⭐ 33: AutoGen visualized: visual tools for multi-agent development.
Llava Docker ⭐ 32: Docker image for LLaVA: Large Language and Vision Assistant.
Restai ⭐ 23: RestAI is an open-source AIaaS (AI as a Service) platform built on top of LlamaIndex, LangChain, and Transformers. Offers precise embeddings usage and tuning, free-form agents, and automatic loading and unloading of local LLMs.
Vision Core Ai ⭐ 21: Demo Python script to interact with a llama.cpp server using the Whisper API, a microphone, and webcam devices.
Gpt 4v Distribution Shift ⭐ 16: Code for "How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation".
Mmc ⭐ 13
Llava Jp ⭐ 8: LLaVA-JP is a Japanese VLM trained with the LLaVA method.
Captain ⭐ 5: All-in-one captioning tool for image datasets.
Metatron2 ⭐ 5: A multimodal Discord bot with machine learning functions, including LLM chat, image generation, and speech generation capabilities.
Kani Vision ⭐ 5: Kani extension supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
Related Searches
Multimodal Llava (15)
Python Llava (13)
Llm Llava (13)
Clips Llava (7)
Gpt 4 Llava (7)
Artificial Intelligence Llava (7)
Llama Llava (6)
Vision Language Model Llava (6)
Chatbot Llava (5)
Llama2 Llava (4)
Copyright 2018-2024 Awesome Open Source. All rights reserved.