Awesome Open Source
Search results for python multimodal
172 search results found
Andromeda
⭐
92
An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast
Poda
⭐
86
[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"
Mammut Pytorch
⭐
85
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch
Hyperrim
⭐
83
Zorro Pytorch
⭐
83
Implementation of Zorro, Masked Multimodal Transformer, in Pytorch
Vip Llava
⭐
81
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Rt X
⭐
77
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Visualrwkv
⭐
75
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.
Minklocmultimodal
⭐
73
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
Diverse Structure Inpainting
⭐
72
CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"
Tempus
⭐
72
C++ framework to develop multimodal path planning requests
Kopa
⭐
67
[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion
Swarms Pytorch
⭐
67
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
Hvpnet
⭐
66
Code for the NAACL2022 paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction"
Awesome Rvos
⭐
66
Referring Video Object Segmentation / Multi-Object Tracking Repo
Ll3da
⭐
65
"LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
Obelics
⭐
63
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
Audiotoken
⭐
61
This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Gemini Pro Bot
⭐
59
A Python Telegram bot powered by Google's gemini-pro LLM API
Bdm Db1
⭐
58
A large-scale multi-modal pre-trained model
Yuren Baichuan 7b
⭐
58
An open-source multimodal large language model based on baichuan-7b
Adviser
⭐
56
ADvISER is a flexible framework to encourage task-oriented dialog system research & development
Mkg_analogy
⭐
56
Code and datasets for the ICLR2023 paper "Multimodal Analogical Reasoning over Knowledge Graphs."
Hinet
⭐
56
Code for TMI 2020 "Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis"
Multi_token
⭐
54
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Kosmos X
⭐
53
The Next Generation Multi-Modality Superintelligence
Cmg
⭐
52
The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)
Mix Generation
⭐
51
MixGen: A New Multi-Modal Data Augmentation
Graph_distillation
⭐
51
Graph Distillation for Action Detection
Embracenet
⭐
50
Robust multimodal integration method implemented in PyTorch and TensorFlow
Openbg Img
⭐
49
Baselines for CCKS 2022 Task "Link Prediction for Multimodal Product Knowledge Graph"
Distill Bev
⭐
46
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)
Cvt2distilgpt2
⭐
46
Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
Keras Llm Robot
⭐
46
A web UI project for learning about large language models. It includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
Move
⭐
44
MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations
Letmedoit
⭐
42
An advanced AI assistant that leverages the capabilities of ChatGPT API, Gemini Pro, and AutoGen, enabling it both to engage in conversations and to execute computing tasks on local devices.
Pali
⭐
42
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
Sloper4d
⭐
39
SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments (CVPR2023)
Keymorph
⭐
38
Robust multimodal brain registration via keypoints
Diffblender
⭐
38
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
Sparsesync
⭐
38
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)
Brats 2020
⭐
38
A complete pipeline for BraTS 2020
Videodb Python
⭐
37
VideoDB Python SDK
Iperceive
⭐
36
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
Flipped Vqa
⭐
35
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Gemini Multimodal Chat
⭐
35
Multimodal Chat with Gemini API
Kosmos2.5
⭐
34
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
Infi
⭐
34
InFi is a library for building input filters for resource-efficient inference.
Bm3
⭐
34
Pytorch implementation for "Bootstrap Latent Representations for Multi-modal Recommendation"-WWW'23
Mmchat
⭐
33
[LREC] MMChat: Multi-Modal Chat Dataset on Social Media
Deep Multimodal Subspace Clustering Networks
⭐
33
Tensorflow implementation of "Deep Multimodal Subspace Clustering Networks"
Vle
⭐
33
VLE: Vision-Language Encoder (VLE: a vision-language multimodal pre-trained model)
Figstep
⭐
32
Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Uvr Nmt
⭐
32
Neural Machine Translation with universal Visual Representation (ICLR 2020)
Multi View Ae
⭐
32
Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.
Gakg
⭐
31
GAKG is a multimodal Geoscience Academic Knowledge Graph (GAKG) framework by fusing papers' illustrations, text, and bibliometric data.
Mambatransformer
⭐
31
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
Ner Multimodal Pytorch
⭐
30
Pytorch Implementation of "Adaptive Co-attention Network for Named Entity Recognition in Tweets" (AAAI 2018)
A2summ
⭐
30
The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)
Iais
⭐
27
[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval
Coesot
⭐
27
A large-scale benchmark dataset for color-event based visual tracking
Citrus Farm Dataset
⭐
26
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms, ISVC 2023
Inverse Dall E For Optical Character Recognition
⭐
24
Inverse DALL-E for Optical Character Recognition
Meaformer
⭐
24
[Paper][ACM MM 2023] MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid
Ame
⭐
24
State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is not complete and is under active development.
Botality Ii
⭐
23
Telegram bot for Stable Diffusion, text-to-speech, and large language models such as LLaMA and Alpaca
Modality Transferable Mer
⭐
23
Modality-Transferable-MER, multimodal emotion recognition model with zero-shot and few-shot abilities.
Trar Vqa
⭐
23
This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task
Protein Localization Transformer
⭐
22
Code for CELL-E: Biological Zero-Shot Text-to-Image Synthesis for Protein Localization Prediction
Slp
⭐
20
Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning
Neko
⭐
19
In Progress Implementation of GATO style Generalist Multimodal model capable of image, text, RL and Robotics tasks
Reform Eval
⭐
19
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
Visualwebarena
⭐
19
VisualWebArena is a benchmark for multimodal agents.
Mmgl
⭐
19
Multimodal Graph Learning: how to encode multiple multimodal neighbors with their relations into LLMs
Mm23 Missrec
⭐
18
The code for the paper "MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation" (ACM MM 2023).
Continuemkgc
⭐
17
Code for the paper "Continual Multimodal Knowledge Graph Construction"
Usearch Images
⭐
17
Semantic Search demo featuring UForm, USearch, UCall, and StreamLit, to visualize and retrieve from image datasets, similar to "CLIP Retrieval"
Chinese Layoutlm V2
⭐
17
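A multimodal language model for Chinese document understanding, supporting multimodal document information extraction and document embedding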
Hashtag Prediction Pytorch
⭐
17
Multimodal Hashtag Prediction with instagram data & pytorch (2nd Place on OpenResource Hackathon 2019)
Flip
⭐
17
Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance". (ICCV 2023)
Omnifusion
⭐
16
OmniFusion — a multimodal model to communicate using text and images
Mt Net
⭐
16
Multi-scale Transformer Network for Cross-Modality MR Image Synthesis (IEEE TMI)
Autort
⭐
16
Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"
Nemar
⭐
15
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Mplug
⭐
15
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Videonavqa
⭐
14
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Described
⭐
14
Automatically describe images sent by users on popular media platforms, incredibly useful for the visually impaired and for complicated imagery.
Afft
⭐
14
Code for the paper: Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation.
Lm Research Hub
⭐
14
Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)
Real Gemini
⭐
14
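Real-time video understanding and interaction through text, audio, image, and video with a large multimodal model. A framework for real-time video understanding and interaction built on large multimodal models, for question answering and communication with the world through text, speech, images, and video.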
Ocrautoscore
⭐
14
An automated OCR exam-grading project
Uniteandconquer
⭐
14
[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Fakeout
⭐
13
Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection
Mmtod
⭐
13
Multi-modal Thermal Object Detector
Clip Openness
⭐
13
Code for "Delving into the Openness of CLIP"
Mmc
⭐
13
Top Reid
⭐
13
[AAAI2024] TOP-ReID: Multi-spectral Object Re-Identification with Token Permutation
Vault
⭐
13
This repo contains the original implementation of VAuLT, the Vision-and-Augmented-Language Transformer. We provide instructions to download some multimodal social-media datasets, and scripts to experiment with. VAuLT is a stack of Transformers, a LM like BERT that preprocesses the text input of ViLT
Multiaug
⭐
12
Multi-modal data augmentation for machine learning
Lipnet
⭐
12
LipNet with gluon
101-172 of 172 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.