Awesome Open Source


Search results for python multimodal

172 search results found

[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"

Mammut Pytorch ⭐ 85

Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in PyTorch

Zorro Pytorch ⭐ 83

Implementation of Zorro, Masked Multimodal Transformer, in PyTorch

Hyperrim ⭐ 83

Vip Llava ⭐ 81

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"

Visualrwkv ⭐ 75

VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.

Minklocmultimodal ⭐ 73

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

C++ framework to develop multimodal path planning requests

Diverse Structure Inpainting ⭐ 72

CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Swarms Pytorch ⭐ 67

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion

Awesome Rvos ⭐ 66

Referring Video Object Segmentation / Multi-Object Tracking Repo

Code for the NAACL2022 paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction"

"LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.

Audiotoken ⭐ 61

This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

Gemini Pro Bot ⭐ 59

A Python Telegram bot powered by Google's gemini-pro LLM API

Yuren Baichuan 7b ⭐ 58

An open-source multimodal large language model based on baichuan-7b

A large-scale multi-modal pre-trained model

Mkg_analogy ⭐ 56

Code and datasets for the ICLR2023 paper "Multimodal Analogical Reasoning over Knowledge Graphs."

Code for TMI 2020 "Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis"

ADvISER is a flexible framework to encourage task-oriented dialog system research & development

Multi_token ⭐ 54

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Kosmos X ⭐ 53

The Next Generation Multi-Modality Superintelligence

The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)

Mix Generation ⭐ 51

MixGen: A New Multi-Modal Data Augmentation

Graph_distillation ⭐ 51

Graph Distillation for Action Detection

Embracenet ⭐ 50

Robust multimodal integration method implemented in PyTorch and TensorFlow

Openbg Img ⭐ 49

Baselines for CCKS 2022 Task "Link Prediction for Multimodal Product Knowledge Graph"

Cvt2distilgpt2 ⭐ 46

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Distill Bev ⭐ 46

DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)

Keras Llm Robot ⭐ 46

A web UI project for learning about large language models. It includes features such as chat, quantization, fine-tuning, prompt-engineering templates, and multimodality.

MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations

Letmedoit ⭐ 42

An advanced AI assistant that leverages the capabilities of ChatGPT API, Gemini Pro, and AutoGen, enabling it both to engage in conversations and to execute computing tasks on local devices.

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Sloper4d ⭐ 39

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments (CVPR2023)

Brats 2020 ⭐ 38

A complete pipeline for BraTS 2020

Diffblender ⭐ 38

DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models

Keymorph ⭐ 38

Robust multimodal brain registration via keypoints

Sparsesync ⭐ 38

Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at BMVC 2022)

Videodb Python ⭐ 37

VideoDB Python SDK

Iperceive ⭐ 36

Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021

Flipped Vqa ⭐ 35

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Gemini Multimodal Chat ⭐ 35

Multimodal Chat with Gemini API

InFi is a library for building input filters for resource-efficient inference.

PyTorch implementation of "Bootstrap Latent Representations for Multi-modal Recommendation" (WWW '23)

Kosmos2.5 ⭐ 34

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

VLE: Vision-Language Encoder (a vision-language multimodal pre-trained model)

Deep Multimodal Subspace Clustering Networks ⭐ 33

TensorFlow implementation of "Deep Multimodal Subspace Clustering Networks"

[LREC] MMChat: Multi-Modal Chat Dataset on Social Media

Neural Machine Translation with universal Visual Representation (ICLR 2020)

Jailbreaking Large Vision-language Models via Typographic Visual Prompts

Multi View Ae ⭐ 32

Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.

GAKG is a multimodal Geoscience Academic Knowledge Graph (GAKG) framework built by fusing papers' illustrations, text, and bibliometric data.

Mambatransformer ⭐ 31

Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling

Ner Multimodal Pytorch ⭐ 30

PyTorch implementation of "Adaptive Co-attention Network for Named Entity Recognition in Tweets" (AAAI 2018)

The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)

[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval

A large-scale benchmark dataset for color-event based visual tracking

Citrus Farm Dataset ⭐ 26

Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms, ISVC 2023

Meaformer ⭐ 24

[Paper][ACM MM 2023] MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

Inverse Dall E For Optical Character Recognition ⭐ 24

Inverse DALL-E for Optical Character Recognition

State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is not complete and is under active development.

Trar Vqa ⭐ 23

Official PyTorch implementation of our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on the VQA task

Modality Transferable Mer ⭐ 23

Modality-Transferable-MER, multimodal emotion recognition model with zero-shot and few-shot abilities.

Botality Ii ⭐ 23

Telegram bot for Stable Diffusion, text-to-speech, and large language models such as LLaMA and Alpaca

Protein Localization Transformer ⭐ 22

Code for CELL-E: Biological Zero-Shot Text-to-Image Synthesis for Protein Localization Prediction

Utilities and modules for speech, language, and multimodal processing using PyTorch and PyTorch Lightning

Multimodal Graph Learning: how to encode multiple multimodal neighbors with their relations into LLMs

Reform Eval ⭐ 19

A benchmark for evaluating the capabilities of large vision-language models (LVLMs)

Visualwebarena ⭐ 19

VisualWebArena is a benchmark for multimodal agents.

In-progress implementation of a Gato-style generalist multimodal model capable of image, text, RL, and robotics tasks

Mm23 Missrec ⭐ 18

The code for the paper "MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation" (ACM MM 2023).

Hashtag Prediction Pytorch ⭐ 17

Multimodal hashtag prediction with Instagram data and PyTorch (2nd place at OpenResource Hackathon 2019)

Chinese Layoutlm V2 ⭐ 17

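A multimodal language model for Chinese document understanding, supporting multimodal document information extraction and document embedding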

Usearch Images ⭐ 17

Semantic search demo featuring UForm, USearch, UCall, and Streamlit for visualizing and retrieving from image datasets, similar to "CLIP Retrieval"

Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance". (ICCV 2023)

Continuemkgc ⭐ 17

Code for the paper "Continual Multimodal Knowledge Graph Construction"

Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"

Multi-scale Transformer Network for Cross-Modality MR Image Synthesis (IEEE TMI)

Omnifusion ⭐ 16

OmniFusion: a multimodal model to communicate using text and images

[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Ocrautoscore ⭐ 14

An OCR-based automated exam-grading project

Uniteandconquer ⭐ 14

[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Videonavqa ⭐ 14

An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)

Lm Research Hub ⭐ 14

Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)

Real Gemini ⭐ 14

A real-time video understanding and interaction framework built on large multimodal models, for asking questions about and communicating with the world through text, audio, images, and video.

Described ⭐ 14

Automatically describe images sent by users on popular media platforms, incredibly useful for the visually impaired and for complicated imagery.

Code for the paper: Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation.

Clip Openness ⭐ 13

Code for "Delving into the Openness of CLIP"

Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection

Multi-modal Thermal Object Detector

This repo contains the original implementation of VAuLT, the Vision-and-Augmented-Language Transformer. We provide instructions for downloading several multimodal social-media datasets, as well as scripts for experimenting with them. VAuLT is a stack of Transformers: a language model such as BERT preprocesses the text input of ViLT.

Top Reid ⭐ 13

[AAAI 2024] TOP-ReID: Multi-spectral Object Re-Identification with Token Permutation

Mix Stage ⭐ 12

Official Repository for the paper Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach published in ECCV 2020 (https://arxiv.org/abs/2007.12553)

LipNet with Gluon

Multiaug ⭐ 12

Multi-modal data augmentation for machine learning


101-172 of 172 search results


Copyright 2018-2024 Awesome Open Source. All rights reserved.