Awesome Open Source

Search results for multimodal deep learning

139 search results found

Lavis ⭐ 7,917

LAVIS - A One-stop Library for Language-Vision Intelligence

Bentoml ⭐ 6,575

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

Awesome Text To Image ⭐ 1,692

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

Pytorch Widedeep ⭐ 1,194

A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch

Awesome Vision Language Pretraining Papers ⭐ 724

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

Awesome Grounding ⭐ 689

awesome grounding: A curated list of research papers in visual grounding

Nsmusics ⭐ 601

NSMusicS (Nine Songs · Music World; 九歌 · 音乐世界): multi-platform, multi-mode music software (full-stack development, audio processing, artificial intelligence, natural language processing)

Advancedliteratemachinery ⭐ 464

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Blended Latent Diffusion ⭐ 458

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]

Multimodal Deep Learning ⭐ 433

This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Scarches ⭐ 294

Reference mapping for single-cell genomics

Awesome Parameter Efficient Transfer Learning ⭐ 288

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

Awesome Emotion Recognition In Conversations ⭐ 216

A comprehensive reading list for Emotion Recognition in Conversations

Awesome Multimodal In Medical Imaging ⭐ 207

A collection of resources on applications of multi-modal learning in medical imaging.

Eccv2022 Papers With Code Demo ⭐ 207

A collection of the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!

Awesome Multimodality ⭐ 206

A Survey on multimodal learning research.

Multimodal Ml Music ⭐ 202

List of academic resources on Multimodal ML for Music

Cav Mae ⭐ 201

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Deepviewagg ⭐ 195

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

Prophet ⭐ 179

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Awesome Vision And Language Pre Training ⭐ 176

Recent Advances in Vision and Language Pre-training (VLP)

Recommendation Systems Without Explicit Id Features A Literature Review ⭐ 171

Large pre-trained Foundation recommender models

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Mustard ⭐ 140

Multimodal Sarcasm Detection Dataset

Pytorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.

Multimodal Speech Emotion ⭐ 122

TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18

Fusilli ⭐ 120

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

The Compiler ⭐ 119

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

Pseudo Q ⭐ 116

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch

Content Moderation Deep Learning ⭐ 110

Deep learning based content moderation from text, audio, video & image input modalities.

Video To Retail Platform ⭐ 109

An intelligent multimodal-learning based system for video, product and ads analysis. Based on the system, people can build a lot of downstream applications such as product recommendation, video retrieval, etc.

Awesome Vision Language Models For Earth Observation ⭐ 105

A curated list of awesome vision and language resources for earth observation.

Video Captioning ⭐ 102

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Part of the HAKE project (HAKE-3D). Code for our CVPR2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".

Multimodal Infomax ⭐ 82

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

Swarms Pytorch ⭐ 67

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

Awesome Rvos ⭐ 66

Referring Video Object Segmentation / Multi-Object Tracking Repo

Awesome 3d Vision And Language ⭐ 62

A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.

Mmsa Fet ⭐ 58

A Tool for extracting multimodal features from videos.

3dcompat V2 ⭐ 57

3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition

MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)

Multimodality Representation Learning ⭐ 51

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).

Embracenet ⭐ 50

Robust multimodal integration method implemented in PyTorch and TensorFlow

Vg Gplms ⭐ 49

The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

Cvt2distilgpt2 ⭐ 46

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Speech-conditioned face generation using Generative Adversarial Networks (ICASSP 2019)

This repository contains the implementation of the paper -- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Hateful_memes Hate_detectron ⭐ 41

Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975

Multimodal Deep Learning For Disaster Response ⭐ 40

Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset

Visual Spatial Reasoning ⭐ 38

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.

CLIPMH: CLIP Multi-modal Hashing

Sutd Trafficqa ⭐ 35

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Densecap Pytorch ⭐ 34

A simplified pytorch version of densecap

Kosmos2.5 ⭐ 34

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

Referit3d ⭐ 32

Code accompanying our ECCV-2020 paper on 3D Neural Listeners.

Self Supervised Embedding Fusion Transformer ⭐ 29

The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.

Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)

Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)

Attention Feature Fusion based on Spatial-Temporal Graph Convolutional Network (AFFGCN)

CFCNet for depth completion, NeurIPS 2019.

Vote2cap Detr ⭐ 22

Code release for "End-to-End 3D Dense Captioning with Vote2Cap-DETR" (CVPR2023)

Code for "Cross-modal Learning for Image-Guided Point Cloud Shape Completion" (NeurIPS 2022)

Utils and modules for speech, language, and multimodal processing using PyTorch and PyTorch Lightning

Social Iq ⭐ 20

[CVPR 2019 Oral] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence

In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks

Code for the InterSpeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition

Multimodal Future Prediction ⭐ 18

The official repository for the CVPR 2019 paper "Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction"

Msa Robustness ⭐ 17

NAACL 2022 paper on Analyzing Modality Robustness in Multimodal Sentiment Analysis

Multimodal Transformer ⭐ 17

Attention Based Multi-modal Emotion Recognition; Stanford Emotional Narratives Dataset

Multigraphgan ⭐ 17

MultiGraphGAN for predicting multiple target graphs from a source graph using geometric deep learning.

Concatbert ⭐ 17

Baseline model for multimodal classification based on images and text. Text representation obtained from pretrained BERT base model and image representation obtained from VGG16 pretrained model.

Meme_challenge ⭐ 16

Repository containing code from team Kingsterdam for the Hateful Memes Challenge

Official Code Release for Diagnosing and Rectifying Vision Models using Language

Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering (NeurIPS 2022)

3d Bounding Boxes From Monocular Images ⭐ 15

A two stage multi-modal loss model along with rigid body transformations to regress 3D bounding boxes

Entity-Driven Image Search over Multimodal Web Content (EMNLP 2023)

Circdeep ⭐ 15

End-to-End learning framework for circular RNA classification from other long non-coding RNA using multimodal deep learning

C3vqg Official ⭐ 14

Code for the paper "C3VQG: Category Consistent Cyclic Visual Question Generation".

Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"

Documentclip ⭐ 14

Uniteandconquer ⭐ 14

[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Attentive Modality Hopping For Ser ⭐ 14

TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20

Vision_audio_and_multimodal_projects ⭐ 13

This repository includes all computer vision, audio, document AI, and multimodal projects.

Whos Waldo ⭐ 13

Who's Waldo? Linking People Across Text and Images. ICCV 2021.

Open Papernotes ⭐ 13

Yet another Ph.D. adventure.

Awesome Visual Dialog ⭐ 13

Recent Advances in Visual Dialog

Ligand_generation ⭐ 12

Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning

[NeurIPS 2023] Official Pytorch code for LOVM: Language-Only Vision Model Selection

Mica Deep Mcca ⭐ 12

Deep Multiset Canonical Correlation Analysis - An extension of CCA to multiple datasets

Robust Deep Learning Pipeline ⭐ 12

Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)

Move2hear Active Av Separation ⭐ 11

Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)

Deep Multi Sensory Object Categorization ⭐ 11

Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration

Pytorch Implementation of FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space

Crop Forecasting ⭐ 10

Predicting rice field yields through the integration of Microsoft Planetary satellite images, meteorological data, and field information in the 2023 EY Open Science Data Challenge - Crop Forecasting.

Piano Skills Assessment ⭐ 10

Piano Skills Assessment [IEEE MMSP 2021]

1-100 of 139 search results
