Awesome Open Source

Search results for python multimodal learning

74 search results found

Open_flamingo ⭐ 3,115

An open-source framework for training large multimodal models.

Awesome Multimodal Research ⭐ 1,133

A curated list of Multimodal Related Research.

Iccv 2023 Papers ⭐ 806

ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!

A Comparative Framework for Multimodal Recommender Systems

Clip4clip ⭐ 663

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Multimodal Toolkit ⭐ 533

Multimodal model for text and tabular data with HuggingFace transformers as building block for text data

Multi-Modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval and image caption.

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

Unireplknet ⭐ 456

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Omninet ⭐ 426

Official Pytorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

Xpretrain ⭐ 382

Multi-modality pre-training

Cm3leon ⭐ 288

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images

Mvits_for_class_agnostic_od ⭐ 240

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".

[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"

[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"

Gpt4point ⭐ 181

GPT4Point: A Unified Framework for Point-Language Understanding and Generation.

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Topicnet ⭐ 129

Interface for easier topic modelling.

Tubedetr ⭐ 127

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

Frozenbilm ⭐ 120

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Missing_aware_prompts ⭐ 101

Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Vista Net ⭐ 82

Code for the paper "VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis", AAAI'19

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Learning2dance_cag_2020 ⭐ 79

PyTorch implementation of our graph convolutional network (GCN) for human motion generation from music. Also with paired dance-music data for training!

[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion

Baidubigdata19 Urfc ⭐ 72

my solution with 0.67 accuracy

Code for the NAACL2022 paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction"

Gmu Mmimdb ⭐ 62

Source code for training Gated Multimodal Units on MM-IMDb dataset

Favdbench ⭐ 62

[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

Multimodal Vae Public ⭐ 52

A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)

Multiviz ⭐ 48

[ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models

Cova Web Object Detection ⭐ 37

A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!

Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.

Source code for ICASSP 2022 paper "MM-DFN: Multimodal Dynamic Fusion Network For Emotion Recognition in Conversations"

[ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset

Visually Informed Embedding Of Word View ⭐ 28

Visually informed embedding of word (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms.

Code for the paper "Multimodal Review Generation for Recommender Systems", WWW'19

Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)

Time Enriched Multimodal Depression Detection ⭐ 25

Official source code for the paper: "It’s Just a Matter of Time: Detecting Depression with Time-Enriched Multimodal Transformers"

Valhalla Nmt ⭐ 23

Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"

Ieee_tgrs_ldgnet ⭐ 22

Language-aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification, IEEE TGRS, 2023.

Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning

Isbertblind ⭐ 19

This repository is for the paper "Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding" (CVPR 2023)

Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"

[IJCAI2022] Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

Fed Multimodal ⭐ 15

Official implementation of paper "MSAF: Multimodal Split Attention Fusion"

Dataset for Visually Indicated Sound Generation by Perceptually Optimized Classification

Multimodal Distillation ⭐ 13

Codebase for "Multimodal Distillation for Egocentric Action Recognition" (ICCV 2023)

Cross-lingual Visual Pre-training for Multimodal Machine Translation

Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis

Piano Skills Assessment ⭐ 10

Piano Skills Assessment [IEEE MMSP 2021]

Collects a multimodal dataset of Wikipedia articles and their images

Job Recommend Competition ⭐ 9

🥇 First-place solution for the KNOW-based job recommendation algorithm competition 🥇

Multimodal Fully Convolutional Neural networks for Semantic Segmentation.

Official implementation for MGN

Plug in and play Implementation of "A Generalist Agent" by Deepmind.

Survival Prediction for Gastric Cancer via Multimodal Learning of Whole Slide Images and Gene Expression -- BIBM 2022

Package for Multimodal Autoencoders in TensorFlow / Keras

CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation

Diverse_and_specific_image_captioning ⭐ 7

Unsupervised specificity-guided optimization of Image Captioning models to encourage meaningful diversity in the generated captions.

Deepguide ⭐ 7

Deep Multimodal Guidance for Medical Image Classification: https://arxiv.org/pdf/2203.05683.pdf

A codebase for flexible and efficient Image Text Representation Alignment

My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"

Visual Question Answering System

Mug Bench ⭐ 6

Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields

Iros2018_ws ⭐ 6

End-to-end multimodal emotion and gender recognition with dynamic weights of joint loss

Bipolar Disorder ⭐ 6

Automatic recognition of bipolar disorder based on a multi-modal machine learning framework

Multiviewcropclassification ⭐ 5

Public repository of our IGARSS 2023 submission

An autoML for explainable text classification.
