Awesome Open Source
Search results for vision language pretraining
24 search results found
Lavis (⭐ 7,917): LAVIS - A One-stop Library for Language-Vision Intelligence
Video Llama (⭐ 1,826): [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Declip (⭐ 603): Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Video Chatgpt (⭐ 590): Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation, and introduces a rigorous quantitative evaluation benchmark for video-based conversational models.
Awesome Japanese Llm (⭐ 585): Overview of Japanese LLMs
Paddlemix (⭐ 172): Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretraining models and a diffusion model toolbox, with high performance and flexibility
Valor (⭐ 110): Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Ptp (⭐ 100): [CVPR 2023] Code for "Position-guided Text Prompt for Vision-Language Pre-training"
Recognize Any Regions (⭐ 92): Recognize Any Regions
Multimodality Representation Learning (⭐ 51): A comprehensive collection of research papers on multimodal representation learning, all cited and discussed in the accompanying accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833)
Flair (⭐ 42): FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding
Protoclip (⭐ 38): 📍 Official PyTorch implementation of ProtoCLIP from the paper "Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)
Segclip (⭐ 35): PyTorch implementation of the ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"
Continual Clip (⭐ 30): Official repository for "CLIP model is an Efficient Continual Learner"
Svl_adapter (⭐ 17): SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Flm (⭐ 16): Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Sga (⭐ 14): [ICCV 2023] Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Cosa (⭐ 10): Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Managertower (⭐ 9): Code for the ACL 2023 oral paper "ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning"
Vtc (⭐ 8): VTC: Improving Video-Text Retrieval with User Comments
B2t (⭐ 8): Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation
Blitext (⭐ 8): [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Itra (⭐ 7): A codebase for flexible and efficient Image Text Representation Alignment
Adaptation_robustness (⭐ 5): Evaluates the robustness of adaptation methods on large vision-language models
Related Searches: Python Vision Language Pretraining (8); Multimodal Deep Learning Vision Language Pretraining (3)
Copyright 2018-2024 Awesome Open Source. All rights reserved.