Awesome Open Source
Search results for multimodal vision transformer
10 search results found
Mmpretrain
⭐
3,177
OpenMMLab Pre-training Toolbox and Benchmark
Internvideo
⭐
736
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
One Peace
⭐
714
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Multi Modal Transformer
⭐
117
This repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.
Llava Cpp Server
⭐
116
LLaVA server (llama.cpp).
Rt X
⭐
77
PyTorch implementation of the RT-1-X and RT-2-X models from the paper "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Clip_surgery
⭐
55
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
Cvt2distilgpt2
⭐
46
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Awesome Multimodal Llm Autonomous Driving
⭐
35
Multimodal Large Language Models for Autonomous Driving [WACV 2024 Survey Paper]
Mac
⭐
24
An end-to-end masked contrastive video-and-language pre-training framework
Clipq
⭐
5
A simple implementation of CLIP that splits an image into quadrants and then gets the embeddings for each quadrant
Related Searches
Python Multimodal (137)
Deep Learning Vision Transformer (105)
Python Vision Transformer (72)
Copyright 2018-2024 Awesome Open Source. All rights reserved.