Awesome Open Source

Search results for multimodal vision language model

8 search results found

Llava ⭐ 12,514

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Internlm Xcomposer ⭐ 820

Awesome Japanese Llm ⭐ 585

Overview of Japanese LLMs (日本語LLMまとめ)

Advancedliteratemachinery ⭐ 464

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Multi_token ⭐ 54

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Awesome Multimodal Llm Autonomous Driving ⭐ 35

Multimodal Large Language Models for Autonomous Driving [WACV 2024 Survey Paper]

Llava Docker ⭐ 32

Docker image for LLaVA: Large Language and Vision Assistant

The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"

Cbvs Uniclip ⭐ 5

A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
