Awesome Open Source
Search results for multimodal learning
99 search results found
Awesome Multimodal Ml
⭐
5,399
Reading list for research topics in multimodal machine learning
Open_flamingo
⭐
3,115
An open-source framework for training large multimodal models.
Awesome Multimodal Research
⭐
1,133
A curated list of Multimodal Related Research.
Iccv 2023 Papers
⭐
806
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
Cornac
⭐
782
A Comparative Framework for Multimodal Recommender Systems
Clip4clip
⭐
663
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Multimodal Toolkit
⭐
533
Multimodal model for text and tabular data, with HuggingFace transformers as the building block for text data
Omml
⭐
528
Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
Awsome Deep Learning For Video Analysis
⭐
507
Papers, code and datasets about deep learning and multi-modal learning for video analysis
Rela
⭐
477
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Unireplknet
⭐
456
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Multimodal Deep Learning
⭐
433
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Omninet
⭐
426
Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
Pykale
⭐
415
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Mevis
⭐
388
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Xpretrain
⭐
382
Multi-modality pre-training
Multibench
⭐
356
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Awesome Vision And Language
⭐
342
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Cm3leon
⭐
288
An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images
Awesome Multimodal Llm
⭐
243
Research Trends in LLM-guided Multimodal Learning.
Mvits_for_class_agnostic_od
⭐
240
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
Clipa
⭐
231
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
Awesome Multimodal In Medical Imaging
⭐
207
A collection of resources on applications of multi-modal learning in medical imaging.
Multimodal Ml Music
⭐
202
List of academic resources on Multimodal ML for Music
Lvit
⭐
200
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Gpt4point
⭐
181
GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
Mmmu
⭐
167
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Icassp 2023 Papers
⭐
139
ICASSP 2023 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023 conference. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
Topicnet
⭐
129
Interface for easier topic modelling.
Tubedetr
⭐
127
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Frozenbilm
⭐
120
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Awesome Self Supervised Multimodal Learning
⭐
111
A curated list of self-supervised multimodal learning resources.
Missing_aware_prompts
⭐
101
Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23
Just Ask
⭐
101
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Pali3
⭐
97
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Vidchapters
⭐
93
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
Vista Net
⭐
82
Code for the paper "VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis", AAAI'19
Ofasys
⭐
79
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Learning2dance_cag_2020
⭐
79
PyTorch implementation of our graph convolutional network (GCN) for human motion generation from music. Also with paired dance-music data for training!
Mmvid
⭐
77
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
Searle
⭐
76
[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion
Baidubigdata19 Urfc
⭐
72
my solution with 0.67 accuracy
Hvpnet
⭐
66
Code for the NAACL2022 paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction"
Favdbench
⭐
62
[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description
Gmu Mmimdb
⭐
62
Source code for training Gated Multimodal Units on MM-IMDb dataset
General Gpt
⭐
61
Multimodal Knowledge Graph
⭐
58
A collection of resources on multimodal knowledge graph, including datasets, papers and contests.
Upop
⭐
54
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Multimodal Vae Public
⭐
52
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
Multiviz
⭐
48
[ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models
Cova Web Object Detection
⭐
37
A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!
Adamml
⭐
36
Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.
G Universal Clip
⭐
33
4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level Recognition workshop at ECCV 2022
Mm Dfn
⭐
31
Source code for ICASSP 2022 paper "MM-DFN: Multimodal Dynamic Fusion Network For Emotion Recognition in Conversations"
Circo
⭐
29
[ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset
Visually Informed Embedding Of Word View
⭐
28
Visually informed embedding of word (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms.
Mrg
⭐
27
Code for the paper "Multimodal Review Generation for Recommender Systems", WWW'19
Dapt
⭐
26
Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)
Egopat3d
⭐
25
[CVPR 2022] Egocentric Action Target Prediction in 3D
Time Enriched Multimodal Depression Detection
⭐
25
Official source code for the paper: "It’s Just a Matter of Time: Detecting Depression with Time-Enriched Multimodal Transformers"
Valhalla Nmt
⭐
23
Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
Mica Movieclip
⭐
22
This repository contains the codebase for MovieCLIP: Visual Scene Recognition in Movies
Ieee_tgrs_ldgnet
⭐
22
Language-aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification, IEEE TGRS, 2023.
Slp
⭐
20
Utils and modules for speech, language, and multimodal processing using PyTorch and PyTorch Lightning
Isbertblind
⭐
19
This repository is for the paper "Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding" (CVPR 2023)
Cmpc
⭐
16
[IJCAI2022] Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast
Autort
⭐
16
Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"
Fed Multimodal
⭐
15
FedMultimodal
Msaf
⭐
14
Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"
Vig
⭐
14
Dataset for Visually Indicated Sound Generation by Perceptually Optimized Classification
Vtlm
⭐
13
Cross-lingual Visual Pre-training for Multimodal Machine Translation
Crossget
⭐
13
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Multimodal Distillation
⭐
13
Codebase for "Multimodal Distillation for Egocentric Action Recognition" (ICCV 2023)
Plmpapers
⭐
12
A paper list of pre-trained language models (PLMs).
Cross Modal Retrieval
⭐
11
Media computing practice assignment: image-text cross-modal search
Almt
⭐
10
Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis
Piano Skills Assessment
⭐
10
Piano Skills Assessment [IEEE MMSP 2021]
Job Recommend Competition
⭐
9
🥇 First-place solution for the KNOW-based job recommendation algorithm competition 🥇
Prml
⭐
9
Multimodal Fully Convolutional Neural networks for Semantic Segmentation.
Numerical Hybrid Qa Literature
⭐
9
A list of Numerical Multimodal reasoning papers and their implementation
Pywikimm
⭐
9
Collects a multimodal dataset of Wikipedia articles and their images
Cxrmate
⭐
8
CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation
Mgn
⭐
8
Official implementation for MGN
Gato
⭐
8
Plug-and-play implementation of "A Generalist Agent" by DeepMind.
Gc Splem
⭐
8
Survival Prediction for Gastric Cancer via Multimodal Learning of Whole Slide Images and Gene Expression -- BIBM 2022
Mmae
⭐
8
Package for Multimodal Autoencoders in TensorFlow / Keras
Visual_question_generation
⭐
7
Torch code for Visual Question Generation
Diverse_and_specific_image_captioning
⭐
7
Unsupervised specificity-guided optimization of Image Captioning models to encourage meaningful diversity in the generated captions.
Kosmosg
⭐
7
My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"
Deepguide
⭐
7
Deep Multimodal Guidance for Medical Image Classification: https://arxiv.org/pdf/2203.05683.pdf
Itra
⭐
7
A codebase for flexible and efficient Image Text Representation Alignment
Guidance Based Video Grounding
⭐
6
The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
Iros2018_ws
⭐
6
End-to-end multimodal emotion and gender recognition with dynamic weights of joint loss
Multimodal Sentiment Analysis For Health Navigation
⭐
6
Emotion recognition methods using facial expressions, speech, audio, and multimodal data
Vqa
⭐
6
Visual Question Answering System
Bipolar Disorder
⭐
6
Automatic recognition of bipolar disorder based on a multi-modal machine learning framework
Mug Bench
⭐
6
Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
Multiviewcropclassification
⭐
5
Public repository of our IGARSS 2023 submission
Autobot
⭐
5
An autoML for explainable text classification.
Copyright 2018-2024 Awesome Open Source. All rights reserved.