Awesome Open Source
Search results for python multimodal
172 search results found
Poda
⭐
86
[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"
Mammut Pytorch
⭐
85
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch
Zorro Pytorch
⭐
83
Implementation of Zorro, Masked Multimodal Transformer, in Pytorch
Hyperrim
⭐
83
Vip Llava
⭐
81
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Rt X
⭐
77
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Visualrwkv
⭐
75
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.
Minklocmultimodal
⭐
73
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
Tempus
⭐
72
C++ framework to develop multimodal path planning requests
Diverse Structure Inpainting
⭐
72
CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"
Swarms Pytorch
⭐
67
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
Kopa
⭐
67
[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion
Awesome Rvos
⭐
66
Referring Video Object Segmentation / Multi-Object Tracking Repo
Hvpnet
⭐
66
Code for the NAACL2022 paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction"
Ll3da
⭐
65
"LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
Obelics
⭐
63
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
Audiotoken
⭐
61
This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Gemini Pro Bot
⭐
59
A Python Telegram bot powered by Google's gemini-pro LLM API
Yuren Baichuan 7b
⭐
58
An open-source multimodal large language model based on baichuan-7b
Bdm Db1
⭐
58
A large-scale multi-modal pre-trained model
Mkg_analogy
⭐
56
Code and datasets for the ICLR2023 paper "Multimodal Analogical Reasoning over Knowledge Graphs."
Hinet
⭐
56
Code for TMI 2020 "Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis"
Adviser
⭐
56
ADvISER is a flexible framework to encourage task-oriented dialog system research & development
Multi_token
⭐
54
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Kosmos X
⭐
53
The Next Generation Multi-Modality Superintelligence
Cmg
⭐
52
The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)
Mix Generation
⭐
51
MixGen: A New Multi-Modal Data Augmentation
Graph_distillation
⭐
51
Graph Distillation for Action Detection
Embracenet
⭐
50
Robust multimodal integration method implemented in PyTorch and TensorFlow
Openbg Img
⭐
49
Baselines for CCKS 2022 Task "Link Prediction for Multimodal Product Knowledge Graph"
Cvt2distilgpt2
⭐
46
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Distill Bev
⭐
46
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)
Keras Llm Robot
⭐
46
A web UI project for learning about large language models. It includes features such as chat, quantization, fine-tuning, prompt-engineering templates, and multimodality.
Move
⭐
44
MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations
Letmedoit
⭐
42
An advanced AI assistant that leverages the capabilities of ChatGPT API, Gemini Pro, and AutoGen, enabling it both to engage in conversations and to execute computing tasks on local devices.
Pali
⭐
42
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
Sloper4d
⭐
39
SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments (CVPR2023)
Brats 2020
⭐
38
A complete pipeline for BraTS 2020
Diffblender
⭐
38
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
Keymorph
⭐
38
Robust multimodal brain registration via keypoints
Sparsesync
⭐
38
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)
Videodb Python
⭐
37
VideoDB Python SDK
Iperceive
⭐
36
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering. Published in the IEEE Winter Conference on Applications of Computer Vision (WACV) 2021. Topics: Python 3, PyTorch, CNNs, causality, reasoning, LSTMs, Transformers, multi-head self-attention.
Flipped Vqa
⭐
35
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Gemini Multimodal Chat
⭐
35
Multimodal Chat with Gemini API
Infi
⭐
34
InFi is a library for building input filters for resource-efficient inference.
Bm3
⭐
34
PyTorch implementation of "Bootstrap Latent Representations for Multi-modal Recommendation" (WWW '23)
Kosmos2.5
⭐
34
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
Vle
⭐
33
VLE: Vision-Language Encoder (VLE: a vision-language multimodal pre-trained model)
Deep Multimodal Subspace Clustering Networks
⭐
33
Tensorflow implementation of "Deep Multimodal Subspace Clustering Networks"
Mmchat
⭐
33
[LREC] MMChat: Multi-Modal Chat Dataset on Social Media
Uvr Nmt
⭐
32
Neural Machine Translation with universal Visual Representation (ICLR 2020)
Figstep
⭐
32
Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Multi View Ae
⭐
32
Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.
Gakg
⭐
31
GAKG is a multimodal Geoscience Academic Knowledge Graph framework built by fusing papers' illustrations, text, and bibliometric data.
Mambatransformer
⭐
31
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
Ner Multimodal Pytorch
⭐
30
Pytorch Implementation of "Adaptive Co-attention Network for Named Entity Recognition in Tweets" (AAAI 2018)
A2summ
⭐
30
The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)
Iais
⭐
27
[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval
Coesot
⭐
27
A large-scale benchmark dataset for color-event based visual tracking
Citrus Farm Dataset
⭐
26
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms, ISVC 2023
Meaformer
⭐
24
[Paper][ACM MM 2023] MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid
Inverse Dall E For Optical Character Recognition
⭐
24
Inverse DALL-E for Optical Character Recognition
Ame
⭐
24
State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is not complete and is under active development.
Trar Vqa
⭐
23
The official PyTorch implementation of our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on the VQA task
Modality Transferable Mer
⭐
23
Modality-Transferable-MER, multimodal emotion recognition model with zero-shot and few-shot abilities.
Botality Ii
⭐
23
Telegram bot for Stable Diffusion, text-to-speech, and large language models such as LLaMA and Alpaca
Protein Localization Transformer
⭐
22
Code for CELL-E: Biological Zero-Shot Text-to-Image Synthesis for Protein Localization Prediction
Slp
⭐
20
Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning
Mmgl
⭐
19
Multimodal Graph Learning: how to encode multiple multimodal neighbors with their relations into LLMs
Reform Eval
⭐
19
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
Visualwebarena
⭐
19
VisualWebArena is a benchmark for multimodal agents.
Neko
⭐
19
In-progress implementation of a Gato-style generalist multimodal model capable of image, text, RL, and robotics tasks
Mm23 Missrec
⭐
18
The code for the paper "MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation" (ACM MM 2023).
Hashtag Prediction Pytorch
⭐
17
Multimodal Hashtag Prediction with instagram data & pytorch (2nd Place on OpenResource Hackathon 2019)
Chinese Layoutlm V2
⭐
17
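A multimodal language model for Chinese document understanding, supporting multimodal document information extraction and document embedding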
Usearch Images
⭐
17
Semantic search demo featuring UForm, USearch, UCall, and Streamlit to visualize and retrieve from image datasets, similar to "CLIP Retrieval"
Flip
⭐
17
Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance". (ICCV 2023)
Continuemkgc
⭐
17
Code for the paper "Continual Multimodal Knowledge Graph Construction"
Autort
⭐
16
Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"
Mt Net
⭐
16
Multi-scale Transformer Network for Cross-Modality MR Image Synthesis (IEEE TMI)
Omnifusion
⭐
16
OmniFusion — a multimodal model to communicate using text and images
Nemar
⭐
15
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Mplug
⭐
15
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Ocrautoscore
⭐
14
An OCR-based automated exam-grading project
Uniteandconquer
⭐
14
[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Videonavqa
⭐
14
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Lm Research Hub
⭐
14
Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)
Real Gemini
⭐
14
Real-time video understanding and interaction through text, audio, images, and video with a large multimodal model: a framework for question answering and communicating with the world through text, speech, images, and video.
Described
⭐
14
Automatically describe images sent by users on popular media platforms, incredibly useful for the visually impaired and for complicated imagery.
Afft
⭐
14
Code for the paper: Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation.
Clip Openness
⭐
13
Code for "Delving into the Openness of CLIP"
Fakeout
⭐
13
Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection
Mmtod
⭐
13
Multi-modal Thermal Object Detector
Vault
⭐
13
This repo contains the original implementation of VAuLT, the Vision-and-Augmented-Language Transformer. We provide instructions to download some multimodal social-media datasets, along with scripts to experiment with them. VAuLT is a stack of Transformers: a language model such as BERT preprocesses the text input of ViLT.
Top Reid
⭐
13
[AAAI 2024] TOP-ReID: Multi-spectral Object Re-Identification with Token Permutation
Mmc
⭐
13
Mix Stage
⭐
12
Official Repository for the paper Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach published in ECCV 2020 (https://arxiv.org/abs/2007.12553)
Lipnet
⭐
12
LipNet implemented with Gluon
Multiaug
⭐
12
Multi-modal data augmentation for machine learning
101-172 of 172 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.