Awesome Open Source
Search results for "python llava" (20 results found)
LLaVA (⭐ 12,514): [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V-level capabilities and beyond.

XTuner (⭐ 944): An efficient, flexible, and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM).

Multimodal Maestro (⭐ 871): Effective prompting for large multimodal models such as GPT-4 Vision, LLaVA, and CogVLM.

UForm (⭐ 729): Pocket-sized multimodal AI for content understanding and generation across multilingual texts, images, and (soon) video, up to 5x faster than OpenAI CLIP and LLaVA.

Video-ChatGPT (⭐ 590): A video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation, and introduces a rigorous quantitative evaluation benchmark for video-based conversational models.

Awesome Foundation And Multimodal Models (⭐ 223): Curated list of top foundation and multimodal models [Paper + Code].

LRV-Instruction (⭐ 160): [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning.

VLMEvalKit (⭐ 137): Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 30+ Hugging Face models, and 15+ benchmarks.

LLaVAR (⭐ 133): Code and data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding".

HallusionBench (⭐ 128): HallusionBench: You See What You Think? Or You Think What You See? An image-context reasoning benchmark challenging for GPT-4V(ision), LLaVA-1.5, and other multi-modality models.

Taggui (⭐ 103): Tag manager and captioner for image datasets.

ViP-LLaVA (⭐ 81): ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts.

Multi_token (⭐ 54): Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

RestAI (⭐ 23): RestAI is an open-source AIaaS (AI as a Service) platform built on top of LlamaIndex, Langchain, and Transformers, with precise embedding usage and tuning, free-form agents, and automatic loading and unloading of local LLMs.

Vision Core Ai (⭐ 21): Demo Python script app for interacting with a llama.cpp server using the Whisper API, a microphone, and webcam devices.

Gpt 4v Distribution Shift (⭐ 16): Code for "How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation".

Mmc (⭐ 13)

LLaVA-JP (⭐ 8): LLaVA-JP is a Japanese VLM trained with the LLaVA method.

Metatron2 (⭐ 5): A multimodal Discord bot with machine-learning functions, including LLM chat, image generation, and speech generation capabilities.

Kani Vision (⭐ 5): Kani extension supporting vision-language models (VLMs); comes with model-agnostic support for GPT-Vision and LLaVA.
Copyright 2018-2024 Awesome Open Source. All rights reserved.