Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.


1️⃣st place at SIGIR eCom Challenge 2020

2️⃣nd place and Best Paper Award at WSDM Challenge 2021

2️⃣nd place at Twitter Recsys Challenge 2021

3️⃣rd place at KDD Cup 2021


Cleora is a genus of moths in the family Geometridae. Their scientific name derives from the Ancient Greek geo γῆ or γαῖα "the earth", and metron μέτρον "measure" in reference to the way their larvae, or "inchworms", appear to "measure the earth" as they move along in a looping fashion.

Cleora is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.

Read the whitepaper "Cleora: A Simple, Strong and Scalable Graph Embedding Scheme"

Cleora embeds entities in n-dimensional spherical spaces using extremely fast, stable, iterative random projections, which allows for unparalleled performance and scalability.
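The idea of stable, iterative projections onto a sphere can be illustrated with a minimal sketch. It is not the Rust implementation: the hash-based initialization, the `seed` parameter, and the function names `init_vector` and `embed` are assumptions made for illustration. Each entity starts from a deterministic vector, is repeatedly replaced by the average of its neighbors' vectors, and is L2-normalized after every iteration.

```python
import hashlib
import math

def init_vector(entity, dim, seed=0):
    """Deterministic pseudo-random vector in [-1, 1) derived from a hash of the entity id."""
    vec = []
    for d in range(dim):
        digest = hashlib.sha256(f"{seed}:{entity}:{d}".encode()).digest()
        # Map the first 8 bytes of the digest to a float in [0, 1), then to [-1, 1).
        u = int.from_bytes(digest[:8], "big") / 2**64
        vec.append(2.0 * u - 1.0)
    return vec

def l2_normalize(vec):
    """Project a vector onto the unit sphere (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def embed(edges, dim=8, iterations=3):
    """edges: iterable of (a, b) undirected pairs. Returns {entity: unit vector}."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    emb = {e: init_vector(e, dim) for e in neighbors}
    for _ in range(iterations):
        new_emb = {}
        for e, nbrs in neighbors.items():
            # One random-walk step: average the current vectors of all neighbors.
            avg = [sum(emb[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
            new_emb[e] = l2_normalize(avg)
        emb = new_emb
    return emb

edges = [("u1", "p1"), ("u1", "p2"), ("u2", "p2"), ("u2", "p3")]
emb_a = embed(edges)
emb_b = embed(edges)  # identical to emb_a: nothing in the process is random
```

Because the starting vectors depend only on the entity identifier, two runs over the same edges give identical embeddings in this sketch.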

Types of data that can be embedded include, for example:

  • heterogeneous undirected graphs
  • heterogeneous undirected hypergraphs
  • text and other categorical array data
  • any combination of the above

Key competitive advantages of Cleora:

  • more than 197x faster than DeepWalk
  • ~4x-8x faster than PyTorch-BigGraph (depends on use case)
  • star expansion, clique expansion, and no expansion support for hypergraphs
  • quality of results outperforming or competitive with other embedding frameworks like PyTorch-BigGraph, GOSH, DeepWalk, LINE
  • can embed extremely large graphs & hypergraphs on a single machine

Embedding times - example:

Algorithm         FB dataset   RoadNet dataset   LiveJournal dataset
Cleora            00:00:43 h   00:21:59 h        01:31:42 h
PyTorch-BigGraph  00:04:33 h   00:31:11 h        07:10:00 h

Link Prediction results - example:

                  FB dataset          RoadNet dataset     LiveJournal dataset
Algorithm         MRR    HitRate@10   MRR    HitRate@10   MRR    HitRate@10
Cleora            0.072  0.172        0.929  0.942        0.586  0.627
PyTorch-BigGraph  0.035  0.072        0.850  0.866        0.565  0.672
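For readers unfamiliar with the two metrics, the sketch below computes them using their usual definitions: MRR is the mean of the reciprocal rank of the true target, and HitRate@10 is the fraction of queries whose true target is ranked in the top 10. The evaluation protocol behind the numbers above is not described here, so the function name `mrr_and_hit_rate` and the sample ranks are purely illustrative.

```python
def mrr_and_hit_rate(ranks, k=10):
    """ranks: 1-based rank of the true target for each test query."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hit_rate = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hit_rate

# Four hypothetical queries whose true targets were ranked 1st, 2nd, 20th and 4th.
mrr, hr10 = mrr_and_hit_rate([1, 2, 20, 4])
```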

Cleora design principles

Cleora is built as a multi-purpose "just embed it" tool, suitable for many different data types and formats.

Cleora ingests a relational table of rows representing a typed and undirected heterogeneous hypergraph, which can contain multiple:

  • typed categorical columns
  • typed categorical array columns

For example, a relational table representing shopping baskets may have the following columns:

user <\t> product <\t> store

With the input file containing values:

user_id <\t> product_id product_id product_id <\t> store_id
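As a rough illustration of this layout, the sketch below splits one row on tabs and then splits each field on spaces, so an array column such as product yields several identifiers. The helper `parse_row` and the sample identifiers are hypothetical and not part of Cleora.

```python
def parse_row(line, columns):
    """Split one tab-separated row into {column name: list of identifiers}."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} columns, got {len(fields)}")
    # Every field is treated as a (possibly single-element) space-separated array.
    return {name: field.split() for name, field in zip(columns, fields)}

row = parse_row("user_1\tprod_a prod_b prod_c\tstore_9", ["user", "product", "store"])
```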

Every column has a type, which is used to determine whether spaces of identifiers between different columns are shared or distinct. It is possible for two columns to share a type, which is the case for homogeneous graphs:

user <\t> user

Based on the column format specification, Cleora performs:

  • Star decomposition of hyper-edges
  • Creation of pairwise graphs for all pairs of entity types
  • Embedding of each graph
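The first two steps can be sketched with the textbook definitions of hypergraph expansion. This is an assumption-laden illustration rather than Cleora's actual code: the function names, the auxiliary hub node, and the choice to skip same-type pairs are all hypothetical. Clique expansion links every pair of hyperedge members directly, while star expansion links each member to one auxiliary node that stands for the hyperedge.

```python
from itertools import combinations

def clique_expansion(hyperedge):
    """Connect every pair of members of the hyperedge directly."""
    return list(combinations(hyperedge, 2))

def star_expansion(hyperedge, edge_id):
    """Connect every member to one auxiliary node standing for the hyperedge."""
    hub = ("hyperedge", edge_id)
    return [(hub, member) for member in hyperedge]

def pairwise_graphs(rows):
    """Group edges by (unordered) pair of distinct column types.

    rows: list of {column type: [identifiers]} dicts, one per table row.
    Returns {(type_a, type_b): [(id_a, id_b), ...]}.
    """
    graphs = {}
    for row in rows:
        for type_a, type_b in combinations(sorted(row), 2):
            edges = graphs.setdefault((type_a, type_b), [])
            edges.extend((a, b) for a in row[type_a] for b in row[type_b])
    return graphs

# One shopping-basket row: one user, two products, one store.
basket = {"user": ["u1"], "product": ["p1", "p2"], "store": ["s1"]}
members = [(t, i) for t, ids in basket.items() for i in ids]

clique_edges = clique_expansion(members)      # 6 direct edges between the 4 members
star_edges = star_expansion(members, 0)       # 4 edges, all through one hub node
graphs = pairwise_graphs([basket])            # one edge list per pair of column types
```

For a hyperedge with k members, clique expansion produces k(k-1)/2 edges and star expansion produces k, which is why the choice matters for rows with long arrays.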

The final output of Cleora consists of multiple files, one for each (undirected) pair of entity types in the table.

Those embeddings can then be utilized in a novel way thanks to their dim-wise independence property, which is described further below.

Key technical features of Cleora embeddings

The embeddings produced by Cleora are different from those produced by Node2vec, Word2vec, DeepWalk or other systems in this class by a number of key properties:

  • efficiency - Cleora is two orders of magnitude faster than Node2vec or DeepWalk
  • inductivity - as Cleora embeddings of an entity are defined only by interactions with other entities, vectors for new entities can be computed on-the-fly
  • updatability - refreshing a Cleora embedding for an entity is a very fast operation allowing for real-time updates without retraining
  • stability - all starting vectors for entities are deterministic, which means that Cleora embeddings on similar datasets will end up being similar. Methods like Word2vec, Node2vec or DeepWalk return different results with every run.
  • cross-dataset compositionality - thanks to stability of Cleora embeddings, embeddings of the same entity on multiple datasets can be combined by averaging, yielding meaningful vectors
  • dim-wise independence - thanks to the process producing Cleora embeddings, every dimension is independent of the others. This property allows for an efficient, low-parameter method of combining multi-view embeddings with Conv1d layers.
  • extreme parallelism and performance - Cleora is written in Rust utilizing thread-level parallelism for all calculations except input file loading. In practice this means that the embedding process is often faster than loading the input data.
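The inductivity and cross-dataset compositionality properties can be illustrated with a small sketch. It assumes that a new entity's vector is the L2-normalized mean of the vectors of the entities it interacted with, and that views are re-normalized after averaging. The text above states only that embeddings are "combined by averaging", so the re-normalization, the helper names `embed_new_entity` and `combine_views`, and the toy vectors are assumptions.

```python
import math

def l2_normalize(vec):
    """Project a vector onto the unit sphere (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def embed_new_entity(neighbor_ids, embeddings):
    """Inductive step: build a vector for an unseen entity from its known neighbors."""
    known = [embeddings[n] for n in neighbor_ids if n in embeddings]
    if not known:
        raise ValueError("no known neighbors to build the vector from")
    return l2_normalize(mean_vector(known))

def combine_views(*views):
    """Average embeddings of the same entity from several datasets, then re-normalize."""
    return l2_normalize(mean_vector(list(views)))

# Toy 2-dimensional product embeddings; a new user bought both products.
products = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
new_user = embed_new_entity(["p1", "p2"], products)

# The same entity embedded on two datasets, merged into one vector.
merged = combine_views([1.0, 0.0], [0.0, 1.0])
```

No retraining is involved in either step: both are a single averaging pass over existing vectors, which is what makes on-the-fly updates cheap.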

Key usability features of Cleora embeddings

The technical properties described above imply good production-readiness of Cleora, which from the end-user perspective can be summarized as follows:

  • heterogeneous relational tables can be embedded without any artificial data pre-processing
  • mixed interaction + text datasets can be embedded with ease
  • cold start problem for new entities is non-existent
  • real-time updates of the embeddings do not require any separate solutions
  • multi-view embeddings work out of the box
  • temporal, incremental embeddings are stable out of the box, with no need for re-alignment, rotations or other methods
  • extremely large datasets are supported and can be embedded within seconds / minutes


More information can be found in the full documentation.

Cleora Enterprise

Cleora Enterprise is now available for selected customers. Key improvements in addition to this open-source version:

  • performance optimizations: 10x faster embedding times
  • latest research: significantly improved embedding quality
  • new feature: item attributes support
  • new feature: multimodal fusion of multiple graphs, text and image embeddings
  • new feature: compressed embeddings in various formats (spherical, hyperbolic, sparse)

For details contact us at [email protected]


Please cite our paper (and the respective papers of the methods used) if you use this code in your own work:

  author    = {Barbara Rychalska, Piotr Babel, Konrad Goluchowski, Andrzej Michalowski, Jacek Dabrowski},
  title     = {Cleora: {A} Simple, Strong and Scalable Graph Embedding Scheme},
  journal   = {CoRR},
  year      = {2021}


Synerise Cleora is MIT licensed, as found in the LICENSE file.

How to Contribute

You are welcome to contribute to this open-source toolbox. Detailed instructions will be released soon as issues.
