Kafka Graphs

Graph Analytics with Apache Kafka
Alternatives To Kafka Graphs
Project NameStarsDownloadsRepos Using ThisPackages Using ThisMost Recent CommitTotal ReleasesLatest ReleaseOpen IssuesLicenseLanguage
3 hours ago145otherC++
Open-source graph database, built for real-time streaming data, compatible with Neo4j.
2 years ago1JavaScript
Yelper recommendation system
Kafka Graphs961a month ago13August 11, 20204apache-2.0Java
Graph Analytics with Apache Kafka
7 years ago5apache-2.0Scala
Experiments in Streaming
Kafka Graphql Examples19
3 years ago1mitClojure
A platform to test solutions for using Kafka with Graphql
Kafka Connect Cosmosdb Graph12
2 months agomitJava
Kafka connector for Cosmos DB Gremlin API
4 months ago1Java
a temporal graph analytics library based on Flink Stateful Functions
Topology Grapher7
3 years ago5March 20, 20204bsd-3-clauseClojure
Library to build a directed graph from a Kafka Streams topology
Kafka Acl Viewer6
3 years ago1apache-2.0Go
View your Kafka cluster ACLs as a graph
a year ago19JavaScript
Kayak is travel search engine which gathers all data from different vendors, process it and display to users. Displayed various analytics graphs like User tracking graph, revenue graph to admin. It is a distributed service oriented application which is implemented through Kafka messaging service to make it more scalable and reliable. Used Redis to cache data.
Alternatives To Kafka Graphs
Select To Compare

Alternative Project Comparisons

Kafka Graphs - Graph Analytics with Kafka

Build Status Maven Javadoc

Kafka Graphs is a client layer for distributed processing of graphs with Apache Kafka.


Releases of Kafka Graphs are deployed to Maven Central.


Creating Graphs

A graph in Kafka Graphs is represented by two tables from Kafka Streams, one for vertices and one for edges. The vertex table is comprised of an ID and a vertex value, while the edge table is comprised of a source ID, target ID, and edge value.

KTable<Long, Long> vertices = ...
KTable<Edge<Long>, Long> edges = ...
KGraph<Long, Long, Long> graph = new KGraph<>(
    GraphSerialized.with(Serdes.Long(), Serdes.Long(), Serdes.Long())

Transforming Graphs

Kafka Graphs provides a number of APIs for transforming graphs in the same manner as Apache Flink Gelly and Apache Spark GraphX.

  • Filtering methods
    • filterOnEdges
    • filterOnVertices
    • subgraph
  • Joining methods
    • joinWithEdges
    • joinWithEdgesOnSource
    • joinWithEdgesOnTarget
    • joinWithVertices
  • Mapping methods
    • mapEdges
    • mapVertices
  • Reducing methods
    • groupReduceOnEdges
    • groupReduceOnNeighbors
    • reduceOnEdges
    • reduceOnNeighbors
  • Transforming methods
    • inDegrees
    • outDegrees
    • undirected

For example, the following will compute the sum of the values of all incoming neighbors for each vertex.

graph.reduceOnNeighbors(new SumValues(), EdgeDirection.IN);

Pregel-Based Graph Algorithms

Kafka Graphs provides a number of graph algorithms based on the vertex-centric approach of Pregel. The vertex-centric approach allows a computation to "think like a vertex" so that it only need consider how the value of a vertex should change based on messages sent from other vertices. The following algorithms are provided by Kafka Graphs:

  1. Breadth-first search (BFS): given a source vertex, determines the minimum number of hops to reach every other vertex.
  2. Label propagation (LP): finds communities in a graph by propagating labels between neighbors.
  3. Local clustering coefficient (LCC): computes the degree of clustering for each vertex as determined by the ratio between the number of triangles a vertex closes with its neighbors to the maximum number of triangles it could close.
  4. Multiple-source shortest paths (MSSP): given a set of source vertices, finds the shortest paths from these vertices to all other vertices.
  5. PageRank (PR): measures the rank or popularity of each vertex by propagating influence between vertices.
  6. Single-source shortest paths (SSSP): given a source vertex, finds the shortest paths to all other vertices.
  7. Weakly connected components (WCC): determines the weakly connected component for each vertex.

For example, here is the implementation of the single-source shortest paths (SSSP) algorithm:

public final class SSSPComputeFunction 
  implements ComputeFunction<Long, Double, Double, Double> {

  public void compute(
    int superstep,
    VertexWithValue<Long, Double> vertex,
    Map<Long, Double> messages,
    Iterable<EdgeWithValue<Long, Double>> edges,
    Callback<Long, Double, Double> cb) {

    double minDistance = (vertex.id().equals(srcVertexId)) 
      ? 0d : Double.POSITIVE_INFINITY;

    for (Double message : messages.values()) {
      minDistance = Math.min(minDistance, message);

    if (minDistance < vertex.value()) {
      for (EdgeWithValue<Long, Double> edge : edges) {
        double distance = minDistance + edge.value();
        cb.sendMessageTo(edge.target(), distance);

Custom Pregel-based graph algorithms can also be added by implementing the ComputeFunction interface.

Distributed Graph Processing

Since Kafka Graphs is built on top of Kafka Streams, it is able to leverage the underlying partitioning scheme of Kafka Streams in order to support distributed graph processing. To facilitate running graph algorithms in a distributed manner, Kafka Graphs provides a REST application for managing graph algorithm executions.

java -jar kafka-graphs-rest-app-1.3.0.jar \
  --kafka.graphs.bootstrapServers=localhost:9092 \

When multiple instantiations of the REST application are started on different hosts, all configured with the same Kafka and ZooKeeper servers, they will automatically coordinate with each other to partition the set of vertices when executing a graph algorithm. When a REST request is sent to one host, it will automatically proxy the request to the other hosts if necessary.

Assuming that tables for the vertices and edges have already created, the following steps can be used to execute a graph algorithm using the REST APIs.

First, the graph is prepared by grouping the edges by the source ID, and also ensuring that the topics for the vertices and edges have the same number of partitions.

POST /prepare
  "algorithm": "sssp",
  "initialVerticesTopic": "vertices", 
  "initialEdgesTopic": "edges", 
  "verticesTopic": "newvertices",
  "edgesGroupedBySourceTopic": "newedges"

Next, the graph algorithm is configured.

POST /pregel
  "algorithm": "sssp",
  "configs": {
    "srcVertexId": 0
  "verticesTopic": "newvertices",
  "edgesGroupedBySourceTopic": "newedges"

The above REST request will return an ID for the graph algorithm instantiation. The graph algorithm is executed as follows.

POST /pregel/{id}
  "numIterations": 30

While the graph algorithm is running we can inquire its state.

GET /pregel/{id}

When the state is COMPLETED we can obtain the result. The result will be streamed (with content type text/event-stream) from all hosts that processed the graph.

GET /pregel/{id}/result

Finally, we clean up resources.

DELETE /pregel/{id}

As you can see above, a graph algorithm may have specific parameters. Here are the possible algorithms and their associated parameters, if any.

Algorithm Parameters Example
bfs srcVertexId "configs": { "srcVertexId": 0 }
mssp landmarkVertexIds "configs": { "landmarkVertexIds": "0,1,2" }
pagerank tolerance, resetProbability "configs": { "tolerance": 0.0001, "resetProbability": 0.15 }
sssp srcVertexId "configs": { "srcVertexId": 0 }

Graph Streaming

Kafka Graphs also provides an experimental single-pass graph streaming analytics framework based on Gelly Streaming. See the Java APIs.

Popular Kafka Projects
Popular Graph Projects
Popular Data Processing Categories

Get A Weekly Email With Trending Projects For These Categories
No Spam. Unsubscribe easily at any time.
Graph Algorithms
Apache Kafka
Kafka Streams