Awesome Open Source

Search results for artificial intelligence multimodal

28 search results found

NeMo: a toolkit for conversational AI

Dalle Pytorch ⭐ 5,477

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Ai Notes ⭐ 4,180

notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.

Tree Of Thoughts ⭐ 3,798

Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%

Awesome Aigc Tutorials ⭐ 2,879

Curated tutorials and resources for Large Language Models, AI Painting, and more.

Clip Retrieval ⭐ 1,949

Easily compute clip embeddings and build a clip retrieval system with them

Alan Sdk Flutter ⭐ 1,742

Conversational AI SDK for Flutter to build AI-powered voice assistants for Flutter applications (iOS and Android)

Alan Sdk Android ⭐ 1,732

Conversational AI SDK for Android to build AI-powered voice assistants for Android applications (Java, Kotlin)

Gptdiscord ⭐ 1,720

A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, YouTube summarizer, and more!

Alan Sdk Ionic ⭐ 1,515

In-App assistant SDK to build a multimodal conversational UX for applications created with Ionic (React, Angular, Vue)

Metatransformer ⭐ 1,325

Meta-Transformer for Unified Multimodal Learning

Alan Sdk Cordova ⭐ 1,070

In-App assistant SDK to build a multimodal conversational UX for Apache Cordova applications

Coca Pytorch ⭐ 900

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

Modelfusion ⭐ 712

The TypeScript library for building AI applications.

Nsmusics ⭐ 601

NSMusicS (Nine Songs · Music World: 九歌 · 音乐世界), multi-platform, multi-mode super music software (full stack development, audio processing, artificial intelligence, natural language processing)

Farmvibes Ai ⭐ 580

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

Alan Sdk Reactnative ⭐ 560

In-App assistant SDK to build a multimodal conversational UX for applications created with React Native (iOS, Android)

Platform for Situated Intelligence

Advancedliteratemachinery ⭐ 464

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Alan Sdk Pcf ⭐ 426

Build a voice assistant for any application created with Microsoft Power Apps

Llavavision ⭐ 409

A simple "Be My Eyes" web app with a llama.cpp/llava backend

Build, Deploy, and Scale Reliable Swarms of Autonomous Agents for Workflow Automation. Join our Community: https://discord.gg/DbjBMJTSWD

Agentchain ⭐ 355

Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks

Dalle Mtf ⭐ 296

OpenAI's DALL-E for large scale training in mesh-tensorflow.

Clip Guided Diffusion ⭐ 267

A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.

Vectordb Recipes ⭐ 267

High quality resources & applications for LLMs, multi-modal models and VectorDBs

Democratization of RT-2 "RT-2: New model translates vision and language into action"

Implementation of "PaLM-E: An Embodied Multimodal Language Model"

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch

Build high-performance AI models with modular building blocks

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Video2music ⭐ 94

Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

Andromeda ⭐ 92

An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast

Mammut Pytorch ⭐ 85

Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch

Zorro Pytorch ⭐ 83

Implementation of Zorro, Masked Multimodal Transformer, in Pytorch

Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"

Swarms Pytorch ⭐ 67

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Letmedoit ⭐ 42

An advanced AI assistant that leverages the capabilities of ChatGPT API, Gemini Pro, and AutoGen, enabling it both to engage in conversations and to execute computing tasks on local devices.

Tokencompose ⭐ 41

(arXiv) 🧩 TokenCompose: Grounding Diffusion with Token-level Supervision

Videodb Python ⭐ 37

VideoDB Python SDK

Llava Docker ⭐ 32

Docker image for LLaVA: Large Language and Vision Assistant

Mambatransformer ⭐ 31

Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling

State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is not complete and is under active development.

Botality Ii ⭐ 23

Telegram bot for stable diffusion, text-to-speech and large language models, such as llama and alpaca

Protein Localization Transformer ⭐ 22

Code for CELL-E: Biological Zero-Shot Text-to-Image Synthesis for Protein Localization Prediction

Usearch Images ⭐ 17

Semantic Search demo featuring UForm, USearch, UCall, and StreamLit, to visualize and retrieve from image datasets, similar to "CLIP Retrieval"

Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"

Whereisai ⭐ 10

AI company, product, and tool collection.

PegasusX: The Future of Multimodal Embeddings 🦄 🦄

The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"

Dailypaperclub ⭐ 8

The repository for the exclusive Daily Paper Club hosted at Agora every 10pm NYC time at this discord: https://discord.gg/Gnzh6dnzyz

Simple Implementation of TinyGPTV in super simple Zeta lego blocks

Multi-Channel Variational Auto Encoder: A Bayesian Deep Learning Framework for Modeling High-Dimensional Heterogeneous Data.

Multimodal Tot ⭐ 6

Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in pytorch and zeta

Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.

A simple implementation of a CLIP that splits up an image into quadrants and then gets the embeddings for each quadrant

Does Clip Know My Face ⭐ 5

Source Code for the Paper "Does CLIP Know my Face?" (Demo: https://huggingface.co/spaces/AIML-TUDA/does-clip-

Mlxtransformer ⭐ 5

Simple Implementation of a Transformer in the new framework MLX by Apple

Metatron2 ⭐ 5

A Multimodal Discord bot with machine learning functions, including LLM chat, Image generation, and Speech Generation capabilities

Run custom multi-modal AI models fully on-device

Related Searches

Machine Learning Artificial Intelligence (5,478)

Python Artificial Intelligence (4,497)

Deep Learning Artificial Intelligence (2,805)

Jupyter Notebook Artificial Intelligence (2,652)

Javascript Artificial Intelligence (2,198)

Artificial Intelligence Neural Network (1,732)

Java Artificial Intelligence (1,340)

C Plus Plus Artificial Intelligence (1,284)

Artificial Intelligence Chatgpt (1,141)

Artificial Intelligence Natural Language Processing (1,008)

Copyright 2018-2024 Awesome Open Source. All rights reserved.