Awesome Open Source
Search results for python multimodal
172 search results found
Jina
⭐
19,573
☁️ Build multimodal AI applications with cloud-native stack
Unilm
⭐
16,971
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Llava
⭐
12,514
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Nemo
⭐
9,041
NeMo: a toolkit for conversational AI
Modelscope
⭐
5,517
ModelScope: bring the notion of Model-as-a-Service to life.
Courses
⭐
4,018
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
Rerun
⭐
3,895
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Marqo
⭐
3,893
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Tree Of Thoughts
⭐
3,798
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that elevates model reasoning by at least 70%
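The Tree of Thoughts search that this entry implements can be sketched in plain Python. This is a schematic, not the repository's API: `propose` and `score` are hypothetical stand-ins for LLM calls, replaced here by deterministic functions so the sketch runs without a model.

```python
# Schematic breadth-first Tree-of-Thoughts search. In the real method,
# `propose` and `score` are LLM calls; here they are toy stand-ins.

def propose(thought):
    """Expand one partial 'thought' into candidate next steps (stand-in for an LLM)."""
    return [thought + [d] for d in (0, 1)]

def score(thought):
    """Rate a partial thought; this toy judge prefers thoughts with more 1s."""
    return sum(thought)

def tree_of_thoughts(depth=3, beam=2):
    frontier = [[]]  # start from an empty thought
    for _ in range(depth):
        # expand every surviving thought, then keep only the `beam` best
        candidates = [c for t in frontier for c in propose(t)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

best = tree_of_thoughts()
print(best)  # [1, 1, 1]
```

The beam-pruned frontier is what distinguishes this from plain chain-of-thought sampling: weak partial reasoning paths are discarded at every depth instead of being carried to completion.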
Discoart
⭐
3,773
🪩 Create Disco Diffusion artworks in one line
Cogvlm
⭐
3,690
A state-of-the-art open visual language model | multimodal pre-trained model
Fengshenbang Lm
⭐
3,670
Fengshenbang-LM (封神榜): a large-model series led by the Center for Cognitive Computing and Natural Language Research at IDEA Research Institute
Visualglm 6b
⭐
3,638
Chinese-English bilingual multimodal conversational language model
Img2dataset
⭐
2,986
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Interngpt
⭐
2,976
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. It supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Chinese Clip
⭐
2,816
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
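The cross-modal retrieval that CLIP-style models such as Chinese-CLIP perform reduces, at query time, to ranking image embeddings by cosine similarity against a text embedding. A toy stdlib sketch with made-up vectors (real models produce them with trained encoders):

```python
import math

# Toy CLIP-style retrieval: rank image embeddings by cosine similarity
# to a text embedding. The vectors below are invented for illustration.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

text_emb = [0.9, 0.1, 0.0]                       # embedding of the query text
image_embs = {                                   # embeddings of indexed images
    "cat.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.1, 0.9, 0.2],
}

ranked = sorted(image_embs, key=lambda k: cosine(text_emb, image_embs[k]),
                reverse=True)
print(ranked[0])  # cat.jpg
```

Because both modalities are projected into one shared space during training, the same ranking works in either direction (text-to-image or image-to-text).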
Torchscale
⭐
2,804
Foundation Architecture for (M)LLMs
Deepke
⭐
2,679
An Open Toolkit for Knowledge Graph Extraction and Construction published at EMNLP2022 System Demonstrations.
Next Gpt
⭐
2,602
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Ofa
⭐
2,142
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Video Llava
⭐
1,750
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Gptdiscord
⭐
1,720
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledgebase, YouTube summarizer, and more!
Mplug Owl
⭐
1,657
[Official Implementation] mPLUG-Owl & mPLUG-Owl2: Alibaba MLLM Family.
Metatransformer
⭐
1,325
Meta-Transformer for Unified Multimodal Learning
Autodistill
⭐
1,286
Images to inference with no labeling (use foundation models to train supervised models)
Hcaptcha Challenger
⭐
1,247
🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
Lisa
⭐
1,206
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Awesome Multimodal Research
⭐
1,133
A curated list of Multimodal Related Research.
Motiongpt
⭐
1,018
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
Data Juicer
⭐
994
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Multimodal Gpt
⭐
971
Multimodal-GPT
Viscpm
⭐
916
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint), a bilingual multimodal large-model series built on the CPM foundation models
Medmnist
⭐
903
[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification
Coca Pytorch
⭐
900
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
Internlm Xcomposer
⭐
820
Recsyspapers
⭐
801
A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
Internvideo
⭐
736
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
Uform
⭐
729
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Sdt
⭐
724
This repository is the official implementation of Disentangling Writer and Character Styles for Handwriting Generation (CVPR23).
One Peace
⭐
714
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Salmonn
⭐
710
SALMONN: Speech Audio Language Music Open Neural Network
Clip4clip
⭐
663
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Fastrag
⭐
591
Efficient Retrieval Augmentation and Generation Framework
Swift
⭐
578
LLM training/inference/deployment toolbox of the ModelScope (魔搭) community. Supports various models such as LLaMA, Qwen, ChatGLM, and Baichuan, and training methods such as LoRA, ResTuning, and NEFTune.
Xrayglm
⭐
577
🩺 The first Chinese multimodal medical large model that can read chest X-rays | chest radiograph summarization
Omml
⭐
528
Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
Unicontrol
⭐
497
Unified Controllable Visual Generation Model
Papermage
⭐
494
Library supporting NLP and CV research on scientific papers
Break A Scene
⭐
418
Official implementation for "Break-A-Scene: Extracting Multiple Concepts from a Single Image" [SIGGRAPH Asia 2023]
Pykale
⭐
415
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Swarms
⭐
376
Build, Deploy, and Scale Reliable Swarms of Autonomous Agents for Workflow Automation. Join our Community: https://discord.gg/DbjBMJTSWD
Agentchain
⭐
355
Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks
Languagebind
⭐
346
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Tsflex
⭐
340
Flexible time series feature extraction & processing
Seed
⭐
326
Empowers LLMs with the ability to see and draw.
Dalle Mtf
⭐
296
OpenAI's DALL-E for large-scale training in mesh-tensorflow.
Cm3leon
⭐
288
An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
Pointllm
⭐
276
[arXiv 2023] PointLLM: Empowering Large Language Models to Understand Point Clouds
Clip Guided Diffusion
⭐
267
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Cc2dataset
⭐
264
Easily convert Common Crawl to a dataset of captions and documents: image/text, audio/text, video/text, ...
Oasis
⭐
260
Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)
Easyinstruct
⭐
253
An Easy-to-use Instruction Processing Framework for LLMs.
Audio2head
⭐
252
Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021)
Awesome Foundation And Multimodal Models
⭐
223
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]
Rt 2
⭐
215
Democratization of RT-2 "RT-2: New model translates vision and language into action"
Tsit
⭐
211
[ECCV 2020 Spotlight] A Simple and Versatile Framework for Image-to-Image Translation
Kaleido Bert
⭐
207
(CVPR2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain.
Deepviewagg
⭐
195
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
Emogen
⭐
189
PyTorch Implementation for Paper "Emotionally Enhanced Talking Face Generation"
Fashion Clip
⭐
189
FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
Camliflow
⭐
188
[CVPR 2022 Oral & TPAMI 2023] Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion
Mmrec
⭐
184
A Toolbox for MultiModal Recommendation. Integrating 10+ Models...
Llava Interactive Demo
⭐
181
LLaVA-Interactive-Demo
Bliva
⭐
181
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Multimodalstory Demo
⭐
178
FairyTailor: Multimodal Generative Framework for Storytelling
Paddlemix
⭐
172
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Mmmu
⭐
167
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Lrv Instruction
⭐
160
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Palm E
⭐
143
Implementation of "PaLM-E: An Embodied Multimodal Language Model"
Vlmevalkit
⭐
137
Open-source evaluation toolkit of large vision-language models (LVLMs), support GPT-4v, Gemini, QwenVLPlus, 30+ HF models, 15+ benchmarks
Llavar
⭐
133
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Taisu
⭐
129
TaiSu (太素): a large-scale Chinese multimodal dataset (a billion-scale Chinese vision-language pre-training dataset)
Visual Chinese Llama Alpaca
⭐
129
Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)
Magic
⭐
124
Language Models Can See: Plugging Visual Controls in Text Generation
Unitr
⭐
123
[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"
Fusilli
⭐
120
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
Visual Med Alpaca
⭐
120
Visual Med-Alpaca is an open-source, multi-modal foundation model designed specifically for the biomedical domain, built on LLaMA-7B.
Kb Ner
⭐
119
Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.
Vldet
⭐
117
[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)
Bitnet
⭐
115
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
Qd Detr
⭐
113
Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 paper)
Solc
⭐
109
Remote sensing SAR-optical land-use classification in PyTorch | high-resolution remote sensing semantic segmentation / land-cover classification
Mdvc
⭐
106
PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops)
Zeta
⭐
106
Build high-performance AI models with modular building blocks
Pali3
⭐
97
Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
Llark
⭐
97
Code for the paper "LLark: A Multimodal Foundation Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.
Vision Language Models Are Bows
⭐
95
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
Rstnet
⭐
95
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
Video2music
⭐
94
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
Andromeda
⭐
92
A language model that processes ultra-long sequences of 100,000+ tokens at high speed
1-100 of 172 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.