Awesome Open Source

Search results for python multimodal

172 search results found

Andromeda ⭐ 92

An all-new language model that processes ultra-long sequences of 100,000+ tokens ultra-fast

[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"

Mammut Pytorch ⭐ 85

Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch

Hyperrim ⭐ 83

Zorro Pytorch ⭐ 83

Implementation of Zorro, Masked Multimodal Transformer, in Pytorch

Vip Llava ⭐ 81

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"

Visualrwkv ⭐ 75

VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.

Minklocmultimodal ⭐ 73

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

Diverse Structure Inpainting ⭐ 72

CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

C++ framework to develop multimodal path planning requests

[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion

Swarms Pytorch ⭐ 67

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

Code for the NAACL2022 paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction"

Awesome Rvos ⭐ 66

Referring Video Object Segmentation / Multi-Object Tracking Repo

"LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.

Audiotoken ⭐ 61

This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

Gemini Pro Bot ⭐ 59

A Python Telegram bot powered by Google's gemini-pro LLM API

A large-scale multi-modal pre-trained model

Yuren Baichuan 7b ⭐ 58

An open-source multimodal large language model based on baichuan-7b

ADvISER is a flexible framework to encourage task-oriented dialog system research & development

Mkg_analogy ⭐ 56

Code and datasets for the ICLR2023 paper "Multimodal Analogical Reasoning over Knowledge Graphs."

Code for TMI 2020 "Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis"

Multi_token ⭐ 54

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Kosmos X ⭐ 53

The Next Generation Multi-Modality Superintelligence

The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)

Mix Generation ⭐ 51

MixGen: A New Multi-Modal Data Augmentation

Graph_distillation ⭐ 51

Graph Distillation for Action Detection

Embracenet ⭐ 50

Robust multimodal integration method implemented in PyTorch and TensorFlow

Openbg Img ⭐ 49

Baselines for CCKS 2022 Task "Link Prediction for Multimodal Product Knowledge Graph"

Distill Bev ⭐ 46

DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)

Cvt2distilgpt2 ⭐ 46

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Keras Llm Robot ⭐ 46

A web UI project for learning about large language models. It includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.

MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations

Letmedoit ⭐ 42

An advanced AI assistant that leverages the capabilities of ChatGPT API, Gemini Pro, and AutoGen, enabling it both to engage in conversations and to execute computing tasks on local devices.

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Sloper4d ⭐ 39

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments (CVPR2023)

Keymorph ⭐ 38

Robust multimodal brain registration via keypoints

Diffblender ⭐ 38

DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models

Sparsesync ⭐ 38

Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)

Brats 2020 ⭐ 38

A complete pipeline for BraTS 2020

Videodb Python ⭐ 37

VideoDB Python SDK

Iperceive ⭐ 36

Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021

Flipped Vqa ⭐ 35

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Gemini Multimodal Chat ⭐ 35

Multimodal Chat with Gemini API

Kosmos2.5 ⭐ 34

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

InFi is a library for building input filters for resource-efficient inference.

Pytorch implementation for "Bootstrap Latent Representations for Multi-modal Recommendation"-WWW'23

[LREC] MMChat: Multi-Modal Chat Dataset on Social Media

Deep Multimodal Subspace Clustering Networks ⭐ 33

Tensorflow implementation of "Deep Multimodal Subspace Clustering Networks"

VLE: Vision-Language Encoder (a vision-language multimodal pre-trained model)

Jailbreaking Large Vision-language Models via Typographic Visual Prompts

Neural Machine Translation with universal Visual Representation (ICLR 2020)

Multi View Ae ⭐ 32

Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.

GAKG is a multimodal Geoscience Academic Knowledge Graph (GAKG) framework by fusing papers' illustrations, text, and bibliometric data.

Mambatransformer ⭐ 31

Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling

Ner Multimodal Pytorch ⭐ 30

Pytorch Implementation of "Adaptive Co-attention Network for Named Entity Recognition in Tweets" (AAAI 2018)

The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)

[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval

A large-scale benchmark dataset for color-event based visual tracking

Citrus Farm Dataset ⭐ 26

Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms, ISVC 2023

Inverse Dall E For Optical Character Recognition ⭐ 24

Inverse DALL-E for Optical Character Recognition

Meaformer ⭐ 24

[Paper][ACM MM 2023] MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is not complete and is under active development.

Botality Ii ⭐ 23

Telegram bot for Stable Diffusion, text-to-speech, and large language models such as LLaMA and Alpaca

Modality Transferable Mer ⭐ 23

Modality-Transferable-MER, multimodal emotion recognition model with zero-shot and few-shot abilities.

Trar Vqa ⭐ 23

This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task

Protein Localization Transformer ⭐ 22

Code for CELL-E: Biological Zero-Shot Text-to-Image Synthesis for Protein Localization Prediction

Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning

In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks

Reform Eval ⭐ 19

A benchmark for evaluating the capabilities of large vision-language models (LVLMs)

Visualwebarena ⭐ 19

VisualWebArena is a benchmark for multimodal agents.

Multimodal Graph Learning: how to encode multiple multimodal neighbors with their relations into LLMs

Mm23 Missrec ⭐ 18

The code for the paper "MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation" (ACM MM 2023).

Continuemkgc ⭐ 17

Code for the paper "Continual Multimodal Knowledge Graph Construction"

Usearch Images ⭐ 17

Semantic search demo featuring UForm, USearch, UCall, and StreamLit, to visualize and retrieve from image datasets, similar to "CLIP Retrieval"

Chinese Layoutlm V2 ⭐ 17

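A multimodal language model for Chinese document understanding, supporting multimodal document information extraction and document embedding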

Hashtag Prediction Pytorch ⭐ 17

Multimodal hashtag prediction with Instagram data and PyTorch (2nd place at OpenResource Hackathon 2019)

Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance". (ICCV 2023)

Omnifusion ⭐ 16

OmniFusion — a multimodal model to communicate using text and images

Multi-scale Transformer Network for Cross-Modality MR Image Synthesis (IEEE TMI)

Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"

[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Videonavqa ⭐ 14

An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)

Described ⭐ 14

Automatically describe images sent by users on popular media platforms, incredibly useful for the visually impaired and for complicated imagery.

Code for the paper: Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation.

Lm Research Hub ⭐ 14

Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)

Real Gemini ⭐ 14

A framework for real-time video understanding and interaction using a large multimodal model, supporting question answering and communication with the world through text, audio, image, and video.

Ocrautoscore ⭐ 14

OCR-based automated exam grading project

Uniteandconquer ⭐ 14

[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection

Multi-modal Thermal Object Detector

Clip Openness ⭐ 13

Code for "Delving into the Openness of CLIP"

Top Reid ⭐ 13

[AAAI 2024] TOP-ReID: Multi-spectral Object Re-Identification with Token Permutation

This repo contains the original implementation of VAuLT, the Vision-and-Augmented-Language Transformer. We provide instructions to download some multimodal social-media datasets, and scripts to experiment with them. VAuLT is a stack of Transformers: an LM like BERT that preprocesses the text input of ViLT

Multiaug ⭐ 12

Multi-modal data augmentation for machine learning

LipNet with gluon

101-172 of 172 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.