Awesome Open Source
Search results for "llava" (25 results found)
Llava ⭐ 12,514: [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.
Findthechatgpter ⭐ 1,774: ChatGPT's explosive popularity marked a key step toward AGI; this project collects open-source alternatives to ChatGPT, including …
Xtuner ⭐ 944: An efficient, flexible, and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM).
Multimodal Maestro ⭐ 871: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA, or CogVLM. 🔥
Uform ⭐ 729: Pocket-sized multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA. 🖼️ & 🖋️
Video Chatgpt ⭐ 590: "Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. The authors also introduce a rigorous "Quantitative Evaluation Benchmarking" for video-based conversational models.
Awesome Foundation And Multimodal Models ⭐ 223: 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]
Lrv Instruction ⭐ 160: [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning.
Vlmevalkit ⭐ 137: Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks.
Llavar ⭐ 133: Code/data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding".
Hallusionbench ⭐ 128: HallusionBench: You See What You Think? Or You Think What You See? An image-context reasoning benchmark that challenges GPT-4V(ision), LLaVA-1.5, and other multi-modality models.
Llava Cpp Server ⭐ 116: LLaVA server (llama.cpp).
Taggui ⭐ 103: Tag manager and captioner for image datasets.
Vip Llava ⭐ 81: ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts.
Multi_token ⭐ 54: Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Flowgen ⭐ 33: AutoGen visualized: visual tools for multi-agent development.
Llava Docker ⭐ 32: Docker image for LLaVA: Large Language and Vision Assistant.
Restai ⭐ 23: RestAI is an open-source AIaaS (AI as a Service) platform built on top of LlamaIndex, LangChain, and Transformers. Offers precise embeddings usage and tuning, free-form agents, and automatic loading and unloading of local LLMs.
Vision Core Ai ⭐ 21: Demo Python script to interact with a llama.cpp server using the Whisper API, a microphone, and webcam devices.
Gpt 4v Distribution Shift ⭐ 16: Code for "How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation".
Mmc ⭐ 13
Llava Jp ⭐ 8: LLaVA-JP is a Japanese VLM trained with the LLaVA method.
Captain ⭐ 5: All-in-one captioning tool for image datasets.
Metatron2 ⭐ 5: A multimodal Discord bot with machine learning functions, including LLM chat, image generation, and speech generation capabilities.
Kani Vision ⭐ 5: Kani extension supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
Related Searches
Multimodal Llava (15)
Python Llava (13)
Llm Llava (13)
Clips Llava (7)
Gpt 4 Llava (7)
Artificial Intelligence Llava (7)
Llama Llava (6)
Vision Language Model Llava (6)
Chatbot Llava (5)
Llama2 Llava (4)
Copyright 2018-2024 Awesome Open Source. All rights reserved.