Awesome Open Source

Search results for multimodal llava

10 search results found

Llava ⭐ 12,514

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Awesome Foundation And Multimodal Models ⭐ 223

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]

Lrv Instruction ⭐ 160

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Vlmevalkit ⭐ 137

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

Llava Cpp Server ⭐ 116

LLaVA server (llama.cpp).

Vip Llava ⭐ 81

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Multi_token ⭐ 54

Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

Llava Docker ⭐ 32

Docker image for LLaVA: Large Language and Vision Assistant

Metatron2 ⭐ 5

A multimodal Discord bot with machine learning functions, including LLM chat, image generation, and speech generation

Related Searches

Python Multimodal (186)

Artificial Intelligence Multimodal (53)

Pytorch Multimodal (49)

Llm Multimodal (44)

Dataset Multimodal (34)

Natural Language Processing Multimodal (27)

Machine Learning Multimodal (26)

Multimodal Gpt4 (23)

Computer Vision Multimodal (22)

Chatgpt Multimodal (18)

Copyright 2018-2024 Awesome Open Source. All rights reserved.