Awesome Open Source
Search results for the tags multimodal and vision-and-language
12 search results found
Multimodal Gpt ⭐ 971
Multimodal-GPT
One Peace ⭐ 714
A general representation model across the vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Awesome Japanese Llm ⭐ 585
Japanese LLM roundup: an overview of Japanese LLMs
Pointllm ⭐ 276
[arXiv 2023] PointLLM: Empowering Large Language Models to Understand Point Clouds
Lrv Instruction ⭐ 160
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Llavar ⭐ 133
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Vldet ⭐ 117
[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)
Awesome Colorful Llm ⭐ 83
A collection of recent advances driven by large language models (LLMs) across domains including Vision, Audio, Agents, Robotics, and fundamental sciences such as Mathematics.
Iais ⭐ 27
[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval
Mac ⭐ 24
An end-to-end masked contrastive video-and-language pre-training framework
Trar Vqa ⭐ 23
Official PyTorch implementation of the ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" for the VQA task
Clip Openness ⭐ 13
Code for "Delving into the Openness of CLIP"
Groundvlp ⭐ 7
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Spatial Reasoning ⭐ 6
Grounding Language Models for Compositional and Spatial Reasoning
Related Searches
Python Multimodal (87)
Python Vision And Language (69)
Pytorch Vision And Language (34)
Copyright 2018-2024 Awesome Open Source. All rights reserved.