Awesome Open Source


Search results for multimodal learning


99 search results found

Awesome Multimodal Ml ⭐ 5,399

Reading list for research topics in multimodal machine learning

Open_flamingo ⭐ 3,115

An open-source framework for training large multimodal models.

Awesome Multimodal Research ⭐ 1,133

A curated list of multimodal-related research.

Iccv 2023 Papers ⭐ 806

ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ Star the repository to support visual intelligence development!

A Comparative Framework for Multimodal Recommender Systems

Clip4clip ⭐ 663

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Multimodal Toolkit ⭐ 533

Multimodal model for text and tabular data, using Hugging Face Transformers as the building block for text data

Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval and image captioning.

Awsome Deep Learning For Video Analysis ⭐ 507

Papers, code and datasets about deep learning and multi-modal learning for video analysis

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

Unireplknet ⭐ 456

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Multimodal Deep Learning ⭐ 433

This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Omninet ⭐ 426

Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

Xpretrain ⭐ 382

Multi-modality pre-training

Multibench ⭐ 356

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

Awesome Vision And Language ⭐ 342

A curated list of awesome vision and language resources (still under construction... stay tuned!)

Cm3leon ⭐ 288

An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images

Awesome Multimodal Llm ⭐ 243

Research Trends in LLM-guided Multimodal Learning.

Mvits_for_class_agnostic_od ⭐ 240

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".

[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"

Awesome Multimodal In Medical Imaging ⭐ 207

A collection of resources on applications of multi-modal learning in medical imaging.

Multimodal Ml Music ⭐ 202

List of academic resources on Multimodal ML for Music

[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"

Gpt4point ⭐ 181

GPT4Point: A Unified Framework for Point-Language Understanding and Generation.

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Icassp 2023 Papers ⭐ 139

ICASSP 2023 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023 conference. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Topicnet ⭐ 129

Interface for easier topic modelling.

Tubedetr ⭐ 127

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

Frozenbilm ⭐ 120

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Awesome Self Supervised Multimodal Learning ⭐ 111

A curated list of self-supervised multimodal learning resources.

Missing_aware_prompts ⭐ 101

Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23

Just Ask ⭐ 101

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Vidchapters ⭐ 93

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

Vista Net ⭐ 82

Code for the paper "VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis", AAAI'19

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Learning2dance_cag_2020 ⭐ 79

PyTorch implementation of our graph convolutional network (GCN) for human motion generation from music. Also includes paired dance-music data for training!

[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion

Baidubigdata19 Urfc ⭐ 72

My solution, with 0.67 accuracy

Code for the NAACL2022 paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction"

Favdbench ⭐ 62

[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description

Gmu Mmimdb ⭐ 62

Source code for training Gated Multimodal Units on MM-IMDb dataset

General Gpt ⭐ 61

Multimodal Knowledge Graph ⭐ 58

A collection of resources on multimodal knowledge graph, including datasets, papers and contests.

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

Multimodal Vae Public ⭐ 52

A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)

Multiviz ⭐ 48

[ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models

Cova Web Object Detection ⭐ 37

A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!

Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.

G Universal Clip ⭐ 33

4th-place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level Recognition workshop at ECCV 2022

Source code for ICASSP 2022 paper "MM-DFN: Multimodal Dynamic Fusion Network For Emotion Recognition in Conversations"

[ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset

Visually Informed Embedding Of Word View ⭐ 28

Visually informed embedding of word (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms.

Code for the paper "Multimodal Review Generation for Recommender Systems", WWW'19

Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)

Egopat3d ⭐ 25

[CVPR 2022] Egocentric Action Target Prediction in 3D

Time Enriched Multimodal Depression Detection ⭐ 25

Official source code for the paper: "It’s Just a Matter of Time: Detecting Depression with Time-Enriched Multimodal Transformers"

Valhalla Nmt ⭐ 23

Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"

Mica Movieclip ⭐ 22

This repository contains the codebase for MovieCLIP: Visual Scene Recognition in Movies

Ieee_tgrs_ldgnet ⭐ 22

Language-aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification, IEEE TGRS, 2023.

Utilities and modules for speech, language, and multimodal processing using PyTorch and PyTorch Lightning

Isbertblind ⭐ 19

This repository is for the paper "Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding" (CVPR 2023)

[IJCAI2022] Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"

Fed Multimodal ⭐ 15

Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"

Dataset for Visually Indicated Sound Generation by Perceptually Optimized Classification

Cross-lingual Visual Pre-training for Multimodal Machine Translation

Crossget ⭐ 13

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

Multimodal Distillation ⭐ 13

Codebase for "Multimodal Distillation for Egocentric Action Recognition" (ICCV 2023)

Plmpapers ⭐ 12

A paper list of pre-trained language models (PLMs).

Cross Modal Retrieval ⭐ 11

Media computing practice assignment: image-text cross-modal search

Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis

Piano Skills Assessment ⭐ 10

Piano Skills Assessment [IEEE MMSP 2021]

Job Recommend Competition ⭐ 9

🥇 1st-place solution for the KNOW-based job recommendation algorithm competition 🥇

Multimodal Fully Convolutional Neural Networks for Semantic Segmentation.

Numerical Hybrid Qa Literature ⭐ 9

A list of Numerical Multimodal reasoning papers and their implementation

Collects a multimodal dataset of Wikipedia articles and their images

CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation

Official implementation for MGN

Plug-and-play implementation of "A Generalist Agent" by DeepMind.

Survival Prediction for Gastric Cancer via Multimodal Learning of Whole Slide Images and Gene Expression -- BIBM 2022

Package for Multimodal Autoencoders in TensorFlow / Keras

Visual_question_generation ⭐ 7

Torch code for Visual Question Generation

Diverse_and_specific_image_captioning ⭐ 7

Unsupervised specificity-guided optimization of Image Captioning models to encourage meaningful diversity in the generated captions.

My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"

Deepguide ⭐ 7

Deep Multimodal Guidance for Medical Image Classification: https://arxiv.org/pdf/2203.05683.pdf

A codebase for flexible and efficient Image Text Representation Alignment

Guidance Based Video Grounding ⭐ 6

The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"

Iros2018_ws ⭐ 6

End-to-end multimodal emotion and gender recognition with dynamic weights of joint loss

Multimodal Sentiment Analysis For Health Navigation ⭐ 6

Emotion recognition methods using facial expressions, speech, audio, and multimodal data

Visual Question Answering System

Bipolar Disorder ⭐ 6

Automatic recognition of bipolar disorder based on a multi-modal machine learning framework

Mug Bench ⭐ 6

Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields

Multiviewcropclassification ⭐ 5

Public repository of our IGARSS 2023 submission

An AutoML tool for explainable text classification.



Copyright 2018-2024 Awesome Open Source. All rights reserved.