Awesome Open Source
Search results for multimodal deep learning
139 search results found
Lavis
⭐
7,917
LAVIS - A One-stop Library for Language-Vision Intelligence
Bentoml
⭐
6,575
The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
Awesome Text To Image
⭐
1,692
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Pytorch Widedeep
⭐
1,194
A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Awesome Vision Language Pretraining Papers
⭐
724
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
Awesome Grounding
⭐
689
awesome grounding: A curated list of research papers in visual grounding
Nsmusics
⭐
601
NSMusicS (Nine Songs · Music World; 九歌 · 音乐世界), a multi-platform, multi-mode super music software (full-stack development, audio processing, artificial intelligence, natural language processing)
Advancedliteratemachinery
⭐
464
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
Blended Latent Diffusion
⭐
458
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
Multimodal Deep Learning
⭐
433
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Scarches
⭐
294
Reference mapping for single-cell genomics
Awesome Parameter Efficient Transfer Learning
⭐
288
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
Awesome Emotion Recognition In Conversations
⭐
216
A comprehensive reading list for Emotion Recognition in Conversations
Awesome Multimodal In Medical Imaging
⭐
207
A collection of resources on applications of multi-modal learning in medical imaging.
Eccv2022 Papers With Code Demo
⭐
207
A collection of the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!
Awesome Multimodality
⭐
206
A Survey on multimodal learning research.
Multimodal Ml Music
⭐
202
List of academic resources on Multimodal ML for Music
Cav Mae
⭐
201
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Deepviewagg
⭐
195
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
Prophet
⭐
179
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Awesome Vision And Language Pre Training
⭐
176
Recent Advances in Vision and Language Pre-training (VLP)
Recommendation Systems Without Explicit Id Features A Literature Review
⭐
171
Large pre-trained Foundation recommender models
Mmmu
⭐
167
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Capdec
⭐
155
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
Mustard
⭐
140
Multimodal Sarcasm Detection Dataset
Mft
⭐
123
Pytorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
Multimodal Speech Emotion
⭐
122
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
Fusilli
⭐
120
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
The Compiler
⭐
119
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Pseudo Q
⭐
116
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Bitnet
⭐
115
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
Content Moderation Deep Learning
⭐
110
Deep learning based content moderation from text, audio, video & image input modalities.
Video To Retail Platform
⭐
109
An intelligent multimodal-learning based system for video, product and ads analysis. Based on the system, people can build a lot of downstream applications such as product recommendation, video retrieval, etc.
Awesome Vision Language Models For Earth Observation
⭐
105
A curated list of awesome vision and language resources for earth observation.
Video Captioning
⭐
102
This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.
Pali3
⭐
97
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Dj Rn
⭐
93
As a part of HAKE project (HAKE-3D). Code for our CVPR2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".
Multimodal Infomax
⭐
82
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
Swarms Pytorch
⭐
67
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
Awesome Rvos
⭐
66
Referring Video Object Segmentation / Multi-Object Tracking Repo
Awesome 3d Vision And Language
⭐
62
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.
Mmsa Fet
⭐
58
A Tool for extracting multimodal features from videos.
3dcompat V2
⭐
57
3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition
Mintrec
⭐
53
MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)
Multimodality Representation Learning
⭐
51
This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the recently accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833
Embracenet
⭐
50
Robust multimodal integration method implemented in PyTorch and TensorFlow
Vg Gplms
⭐
49
The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".
Muscaps
⭐
46
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)
Cvt2distilgpt2
⭐
46
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Wav2pix
⭐
43
Speech-conditioned face generation using Generative Adversarial Networks (ICASSP 2019)
Bbfn
⭐
42
This repository contains the implementation of the paper -- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis
Pali
⭐
42
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
Hateful_memes Hate_detectron
⭐
41
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975
Multimodal Deep Learning For Disaster Response
⭐
40
Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset
Visual Spatial Reasoning
⭐
38
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Clipmh
⭐
38
CLIPMH: CLIP Multi-modal Hashing
Sutd Trafficqa
⭐
35
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Densecap Pytorch
⭐
34
A simplified pytorch version of densecap
Kosmos2.5
⭐
34
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
Referit3d
⭐
32
Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
Self Supervised Embedding Fusion Transformer
⭐
29
The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.
Mrl
⭐
28
Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)
Artemis
⭐
27
Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)
Affgcn
⭐
25
Attention Feature Fusion based on Spatial-Temporal Graph Convolutional Network (AFFGCN)
Cfcnet
⭐
25
CFCNet for depth completion, NeurIPS 2019.
Vote2cap Detr
⭐
22
Code release for "End-to-End 3D Dense Captioning with Vote2Cap-DETR" (CVPR 2023)
Xmfnet
⭐
21
Code for "Cross-modal Learning for Image-Guided Point Cloud Shape Completion" (NeurIPS 2022)
Slp
⭐
20
Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning
Social Iq
⭐
20
[CVPR 2019 Oral] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence
Neko
⭐
19
In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks
Mmer
⭐
19
Code for the InterSpeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition
Multimodal Future Prediction
⭐
18
The official repository for the CVPR 2019 paper "Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction"
Msa Robustness
⭐
17
NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis
Multimodal Transformer
⭐
17
Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset
Multigraphgan
⭐
17
MultiGraphGAN for predicting multiple target graphs from a source graph using geometric deep learning.
Concatbert
⭐
17
Baseline model for multimodal classification based on images and text. Text representation obtained from pretrained BERT base model and image representation obtained from VGG16 pretrained model.
Meme_challenge
⭐
16
Repository containing code from team Kingsterdam for the Hateful Memes Challenge
Drml
⭐
16
Official Code Release for Diagnosing and Rectifying Vision Models using Language
Revive
⭐
16
Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering (NeurIPS 2022)
3d Bounding Boxes From Monocular Images
⭐
15
A two-stage multi-modal loss model along with rigid body transformations to regress 3D bounding boxes
Edis
⭐
15
Entity-Driven Image Search over Multimodal Web Content (EMNLP 2023)
Circdeep
⭐
15
End-to-End learning framework for circular RNA classification from other long non-coding RNA using multimodal deep learning
C3vqg Official
⭐
14
Code for the paper "C3VQG: Category Consistent Cyclic Visual Question Generation".
Msaf
⭐
14
Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"
Documentclip
⭐
14
Uniteandconquer
⭐
14
[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Attentive Modality Hopping For Ser
⭐
14
TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20
Vision_audio_and_multimodal_projects
⭐
13
This repository includes all computer vision, audio, document AI, and multimodal projects.
Whos Waldo
⭐
13
Who's Waldo? Linking People Across Text and Images. ICCV 2021.
Open Papernotes
⭐
13
Yet another Ph.D. adventure.
Awesome Visual Dialog
⭐
13
Recent Advances in Visual Dialog
Ligand_generation
⭐
12
Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning
Lovm
⭐
12
[NeurIPS 2023] Official Pytorch code for LOVM: Language-Only Vision Model Selection
Mica Deep Mcca
⭐
12
Deep Multiset Canonical Correlation Analysis - An extension of CCA to multiple datasets
Robust Deep Learning Pipeline
⭐
12
Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)
Move2hear Active Av Separation
⭐
11
Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)
Deep Multi Sensory Object Categorization
⭐
11
Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration
Focal
⭐
10
Pytorch Implementation of FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
Crop Forecasting
⭐
10
Predicting rice field yields through the integration of Microsoft Planetary satellite images, meteorological data, and field information in the 2023 EY Open Science Data Challenge - Crop Forecasting.
Piano Skills Assessment
⭐
10
Piano Skills Assessment [IEEE MMSP 2021]
1-100 of 139 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.