Awesome Open Source

Programming Languages

Search results for clips llava

0 search results found

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Video Chatgpt ⭐ 590

"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Awesome Foundation And Multimodal Models ⭐ 223

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]

Vlmevalkit ⭐ 137

Open-source evaluation toolkit of large vision-language models (LVLMs), support GPT-4v, Gemini, QwenVLPlus, 30+ HF models, 15+ benchmarks

Vip Llava ⭐ 81

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Gpt 4v Distribution Shift ⭐ 16

Code for "How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation"

All in one captioning tool for image datasets

1-0 of 0 search results

Privacy | About | Terms | Follow Us On Twitter

Copyright 2018-2024 Awesome Open Source. All rights reserved.