Awesome Open Source

Search results for multimodal vision and language

12 search results found

Multimodal Gpt ⭐ 971

One Peace ⭐ 714

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Awesome Japanese Llm ⭐ 585

Overview of Japanese LLMs

Pointllm ⭐ 276

[arXiv 2023] PointLLM: Empowering Large Language Models to Understand Point Clouds

Lrv Instruction ⭐ 160

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

Awesome Colorful Llm ⭐ 83

Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics, and Fundamental Sciences such as Mathematics.

[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval

An end-to-end masked contrastive video-and-language pre-training framework

Trar Vqa ⭐ 23

Official PyTorch implementation of the ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on the VQA task

Clip Openness ⭐ 13

Code for "Delving into the Openness of CLIP"

Groundvlp ⭐ 7

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Spatial Reasoning ⭐ 6

Grounding Language Models for Compositional and Spatial Reasoning

Copyright 2018-2024 Awesome Open Source. All rights reserved.