Awesome Open Source
Search results for artificial intelligence multimodal
28 search results found
Nemo
⭐
9,041
NeMo: a toolkit for conversational AI
Dalle Pytorch
⭐
5,477
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Ai Notes
⭐
4,180
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
Tree Of Thoughts
⭐
3,798
Plug-and-Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%
Awesome Aigc Tutorials
⭐
2,879
Curated tutorials and resources for Large Language Models, AI Painting, and more.
Clip Retrieval
⭐
1,949
Easily compute clip embeddings and build a clip retrieval system with them
Alan Sdk Flutter
⭐
1,742
Conversational AI SDK for Flutter to build AI-powered voice assistants for Flutter applications (iOS and Android)
Alan Sdk Android
⭐
1,732
Conversational AI SDK for Android to build AI-powered voice assistants for Android applications (Java, Kotlin)
Gptdiscord
⭐
1,720
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge base, YouTube summarizer, and more!
Alan Sdk Ionic
⭐
1,515
In-App assistant SDK to build a multimodal conversational UX for applications created with Ionic (React, Angular, Vue)
Metatransformer
⭐
1,325
Meta-Transformer for Unified Multimodal Learning
Alan Sdk Cordova
⭐
1,070
In-App assistant SDK to build a multimodal conversational UX for Apache Cordova applications
Coca Pytorch
⭐
900
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
Modelfusion
⭐
712
The TypeScript library for building AI applications.
Nsmusics
⭐
601
NSMusicS (Nine Songs · Music World: 九歌 · 音乐世界), multi-platform, multi-mode super music software (full-stack development, audio processing, artificial intelligence, natural language processing)
Farmvibes Ai
⭐
580
FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
Alan Sdk Reactnative
⭐
560
In-App assistant SDK to build a multimodal conversational UX for applications created with React Native (iOS, Android)
Psi
⭐
506
Platform for Situated Intelligence
Advancedliteratemachinery
⭐
464
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
Alan Sdk Pcf
⭐
426
Build a voice assistant for any application created with Microsoft Power Apps
Llavavision
⭐
409
A simple "Be My Eyes" web app with a llama.cpp/llava backend
Swarms
⭐
376
Build, Deploy, and Scale Reliable Swarms of Autonomous Agents for Workflow Automation. Join our Community: https://discord.gg/DbjBMJTSWD
Agentchain
⭐
355
Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks
Dalle Mtf
⭐
296
OpenAI's DALL-E for large-scale training in mesh-tensorflow.
Clip Guided Diffusion
⭐
267
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Vectordb Recipes
⭐
267
High quality resources & applications for LLMs, multi-modal models and VectorDBs
Rt 2
⭐
215
Democratization of RT-2 "RT-2: New model translates vision and language into action"
Palm E
⭐
143
Implementation of "PaLM-E: An Embodied Multimodal Language Model"
Bitnet
⭐
115
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
Zeta
⭐
106
Build high-performance AI models with modular building blocks
Pali3
⭐
97
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Video2music
⭐
94
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
Andromeda
⭐
92
An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Tokens Ultra-Fast
Mammut Pytorch
⭐
85
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch
Zorro Pytorch
⭐
83
Implementation of Zorro, Masked Multimodal Transformer, in Pytorch
Rt X
⭐
77
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Swarms Pytorch
⭐
67
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
Pali
⭐
42
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
Letmedoit
⭐
42
An advanced AI assistant that leverages the capabilities of ChatGPT API, Gemini Pro, and AutoGen, enabling it both to engage in conversations and to execute computing tasks on local devices.
Tokencompose
⭐
41
(arXiv) 🧩 TokenCompose: Grounding Diffusion with Token-level Supervision
Videodb Python
⭐
37
VideoDB Python SDK
Llava Docker
⭐
32
Docker image for LLaVA: Large Language and Vision Assistant
Mambatransformer
⭐
31
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
Ame
⭐
24
State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is not complete and is under active development.
Botality Ii
⭐
23
Telegram bot for Stable Diffusion, text-to-speech, and large language models such as LLaMA and Alpaca
Protein Localization Transformer
⭐
22
Code for CELL-E: Biological Zero-Shot Text-to-Image Synthesis for Protein Localization Prediction
Usearch Images
⭐
17
Semantic Search demo featuring UForm, USearch, UCall, and StreamLit, to visualize and retrieve from image datasets, similar to "CLIP Retrieval"
Autort
⭐
16
Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"
Whereisai
⭐
10
AI company, product, and tool collection.
Pegasus
⭐
9
PegasusX: The Future of Multimodal Embeddings 🦄 🦄
Mmca
⭐
8
The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"
Dailypaperclub
⭐
8
The repository for the exclusive Daily Paper Club, hosted at Agora every day at 10pm NYC time on this Discord: https://discord.gg/Gnzh6dnzyz
Tinygptv
⭐
7
Simple Implementation of TinyGPTV in super simple Zeta lego blocks
Mcvae
⭐
6
Multi-Channel Variational Auto Encoder: A Bayesian Deep Learning Framework for Modeling High-Dimensional Heterogeneous Data.
Multimodal Tot
⭐
6
Multi-Modal Tree of Thoughts for DALL-E 3-like automatic self-improvement
Gats
⭐
6
Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in pytorch and zeta
Ekr
⭐
6
Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.
Clipq
⭐
5
A simple implementation of CLIP that splits an image into quadrants and then gets the embeddings for each quadrant
Does Clip Know My Face
⭐
5
Source Code for the Paper "Does CLIP Know my Face?" (Demo: https://huggingface.co/spaces/AIML-TUDA/does-clip-
Mlxtransformer
⭐
5
Simple Implementation of a Transformer in the new framework MLX by Apple
Metatron2
⭐
5
A Multimodal Discord bot with machine learning functions, including LLM chat, Image generation, and Speech Generation capabilities
Platform
⭐
5
Run custom multi-modal AI models fully on-device
Copyright 2018-2024 Awesome Open Source. All rights reserved.