Awesome Open Source

Search results for multimodal vision transformer

10 search results found

Mmpretrain ⭐ 3,177

OpenMMLab Pre-training Toolbox and Benchmark

Internvideo ⭐ 736

InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)

One Peace ⭐ 714

A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Multi Modal Transformer ⭐ 117

This repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.

Llava Cpp Server ⭐ 116

LLaVA server (llama.cpp).

PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"

Clip_surgery ⭐ 55

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

Cvt2distilgpt2 ⭐ 46

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Awesome Multimodal Llm Autonomous Driving ⭐ 35

Multimodal Large Language Models for Autonomous Driving [WACV 2024 Survey Paper]

An end-to-end masked contrastive video-and-language pre-training framework

A simple implementation of CLIP that splits an image into quadrants and then computes an embedding for each quadrant
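The quadrant-splitting idea in the last entry can be illustrated with a minimal sketch. This is not the listed repository's code: the names `split_into_quadrants` and `toy_embed` are hypothetical, the image is assumed to be a plain 2D list of pixel values, and `toy_embed` is only a stand-in for a real CLIP image encoder.

```python
from typing import List, Tuple

Image = List[List[float]]


def split_into_quadrants(image: Image) -> List[Image]:
    """Split a 2D image into four quadrants.

    Order: top-left, top-right, bottom-left, bottom-right.
    With odd dimensions, the extra row or column goes to the
    bottom or right quadrants.
    """
    if not image or not image[0]:
        raise ValueError("image must be a non-empty 2D list")
    mid_row = len(image) // 2
    mid_col = len(image[0]) // 2
    top, bottom = image[:mid_row], image[mid_row:]
    return [
        [row[:mid_col] for row in top],
        [row[mid_col:] for row in top],
        [row[:mid_col] for row in bottom],
        [row[mid_col:] for row in bottom],
    ]


def toy_embed(patch: Image) -> Tuple[float, float]:
    """Hypothetical placeholder for a CLIP image encoder.

    Returns (mean, max) of the pixel values so the example stays
    self-contained; a real pipeline would call an actual encoder here.
    """
    values = [v for row in patch for v in row]
    return (sum(values) / len(values), max(values))


# A 4x4 "image" whose pixel values are 0..15 in row-major order.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
quadrants = split_into_quadrants(image)
embeddings = [toy_embed(q) for q in quadrants]
```

Each quadrant is embedded independently, so the result is one vector per region rather than a single vector for the whole image.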

Related Searches

Python Multimodal (137)

Deep Learning Vision Transformer (105)

Python Vision Transformer (72)

Copyright 2018-2024 Awesome Open Source. All rights reserved.