Project Name | Stars | Most Recent Commit | Latest Release | License | Language | Description |
---|---|---|---|---|---|---|
Memgraph | 1,456 | 3 hours ago | | other | C++ | Open-source graph database, built for real-time streaming data, compatible with Neo4j. |
Yelper_recommendation_system | 108 | 2 years ago | | | JavaScript | Yelper recommendation system |
Kafka Graphs | 96 | a month ago | August 11, 2020 | apache-2.0 | Java | Graph Analytics with Apache Kafka |
Coast | 59 | 7 years ago | | apache-2.0 | Scala | Experiments in Streaming |
Kafka Graphql Examples | 19 | 3 years ago | | mit | Clojure | A platform to test solutions for using Kafka with Graphql |
Kafka Connect Cosmosdb Graph | 12 | 2 months ago | | mit | Java | Kafka connector for Cosmos DB Gremlin API |
Hesse | 8 | 4 months ago | | | Java | A temporal graph analytics library based on Flink Stateful Functions |
Topology Grapher | 7 | 3 years ago | March 20, 2020 | bsd-3-clause | Clojure | Library to build a directed graph from a Kafka Streams topology |
Kafka Acl Viewer | 6 | 3 years ago | | apache-2.0 | Go | View your Kafka cluster ACLs as a graph |
Kayak | 6 | a year ago | | | JavaScript | A distributed, service-oriented travel search engine that gathers data from different vendors, processes it, and displays it to users. It uses Kafka messaging for scalability and reliability, and Redis to cache data; admins see analytics such as user-tracking and revenue graphs. |
Kafka Graphs is a client layer for distributed processing of graphs with Apache Kafka.
Releases of Kafka Graphs are deployed to Maven Central.
<dependency>
<groupId>io.kgraph</groupId>
<artifactId>kafka-graphs-core</artifactId>
<version>1.5.0</version>
</dependency>
A graph in Kafka Graphs is represented by two tables from Kafka Streams, one for vertices and one for edges. The vertex table consists of an ID and a vertex value, while the edge table consists of a source ID, a target ID, and an edge value.
KTable<Long, Long> vertices = ...
KTable<Edge<Long>, Long> edges = ...
KGraph<Long, Long, Long> graph = new KGraph<>(
vertices,
edges,
GraphSerialized.with(Serdes.Long(), Serdes.Long(), Serdes.Long())
);
Kafka Graphs provides a number of APIs for transforming graphs, similar to those in Apache Flink Gelly and Apache Spark GraphX:
filterOnEdges
filterOnVertices
subgraph
joinWithEdges
joinWithEdgesOnSource
joinWithEdgesOnTarget
joinWithVertices
mapEdges
mapVertices
groupReduceOnEdges
groupReduceOnNeighbors
reduceOnEdges
reduceOnNeighbors
inDegrees
outDegrees
undirected
For example, the following will compute the sum of the values of all incoming neighbors for each vertex.
graph.reduceOnNeighbors(new SumValues(), EdgeDirection.IN);
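To make the semantics concrete, here is a plain-Java sketch of what this computes on a tiny in-memory graph. It does not use the library at all; the class name and graph data are made up for the example, which simply sums, for each vertex, the values of vertices with an edge pointing to it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceOnNeighborsSketch {
    public static void main(String[] args) {
        // Vertex values: id -> value
        Map<Long, Long> vertices = Map.of(1L, 10L, 2L, 20L, 3L, 30L);
        // Directed edges as {source, target} pairs
        List<long[]> edges = List.of(
            new long[] {1L, 3L},
            new long[] {2L, 3L},
            new long[] {3L, 1L});
        // For each vertex, sum the values of its incoming neighbors,
        // mirroring reduceOnNeighbors(new SumValues(), EdgeDirection.IN)
        Map<Long, Long> sums = new HashMap<>();
        for (long[] e : edges) {
            sums.merge(e[1], vertices.get(e[0]), Long::sum);
        }
        // Vertex 3 receives 10 + 20 = 30; vertex 1 receives 30
        System.out.println(sums);
    }
}
```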
Kafka Graphs provides a number of graph algorithms based on the vertex-centric approach of Pregel. The vertex-centric approach allows a computation to "think like a vertex", so that it need only consider how the value of a vertex should change based on messages sent from other vertices. Kafka Graphs provides the following algorithms: breadth-first search (bfs), local clustering coefficient (lcc), label propagation (lp), multi-source shortest paths (mssp), PageRank (pagerank), single-source shortest paths (sssp), and weakly connected components (wcc).
For example, here is the implementation of the single-source shortest paths (SSSP) algorithm:
public final class SSSPComputeFunction
    implements ComputeFunction<Long, Double, Double, Double> {

  public void compute(
      int superstep,
      VertexWithValue<Long, Double> vertex,
      Map<Long, Double> messages,
      Iterable<EdgeWithValue<Long, Double>> edges,
      Callback<Long, Double, Double> cb) {

    // The source vertex starts at distance 0; every other vertex
    // starts unreachable, at positive infinity.
    double minDistance = (vertex.id().equals(srcVertexId))
        ? 0d : Double.POSITIVE_INFINITY;

    // Take the smallest candidate distance among the incoming messages.
    for (Double message : messages.values()) {
      minDistance = Math.min(minDistance, message);
    }

    // If the distance improved, adopt it and offer each out-neighbor
    // a path through this vertex.
    if (minDistance < vertex.value()) {
      cb.setNewVertexValue(minDistance);
      for (EdgeWithValue<Long, Double> edge : edges) {
        double distance = minDistance + edge.value();
        cb.sendMessageTo(edge.target(), distance);
      }
    }
    // Halt until reactivated by a new message.
    cb.voteToHalt();
  }
}
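To show how Pregel supersteps drive a function like this, the sketch below simulates the execution model in plain Java on a tiny hard-coded graph. This is an illustrative sketch only, not the library's distributed runtime: each superstep delivers messages, a vertex updates when a shorter distance arrives and forwards new candidates to its out-neighbors, and execution stops when no messages remain.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SsspSketch {
    // Directed weighted edge: source -> target with a distance
    record Edge(long source, long target, double value) {}

    public static void main(String[] args) {
        long srcVertexId = 0L;
        List<Edge> edges = List.of(
            new Edge(0, 1, 4.0), new Edge(0, 2, 1.0),
            new Edge(2, 1, 2.0), new Edge(1, 3, 1.0));
        Map<Long, Double> distance = new HashMap<>(Map.of(
            0L, Double.POSITIVE_INFINITY, 1L, Double.POSITIVE_INFINITY,
            2L, Double.POSITIVE_INFINITY, 3L, Double.POSITIVE_INFINITY));

        // Superstep 0: only the source is active, with distance 0.
        Map<Long, Double> messages = Map.of(srcVertexId, 0.0);
        while (!messages.isEmpty()) {
            Map<Long, Double> next = new HashMap<>();
            for (Map.Entry<Long, Double> m : messages.entrySet()) {
                long v = m.getKey();
                double minDistance = m.getValue();
                // Same improvement test as in compute() above
                if (minDistance < distance.get(v)) {
                    distance.put(v, minDistance);
                    for (Edge e : edges) {
                        if (e.source() == v) {
                            next.merge(e.target(),
                                minDistance + e.value(), Math::min);
                        }
                    }
                }
            }
            messages = next; // deliver messages to the next superstep
        }
        // Shortest distances from vertex 0
        System.out.println(distance);
    }
}
```

After convergence the distances are 0 for the source, 1.0 for vertex 2, 3.0 for vertex 1 (via vertex 2), and 4.0 for vertex 3.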
Custom Pregel-based graph algorithms can also be added by implementing the ComputeFunction interface.
Since Kafka Graphs is built on top of Kafka Streams, it is able to leverage the underlying partitioning scheme of Kafka Streams in order to support distributed graph processing. To facilitate running graph algorithms in a distributed manner, Kafka Graphs provides a REST application for managing graph algorithm executions.
java -jar kafka-graphs-rest-app-1.3.0.jar \
--kafka.graphs.bootstrapServers=localhost:9092 \
--kafka.graphs.zookeeperConnect=localhost:2181
When multiple instances of the REST application are started on different hosts, all configured with the same Kafka and ZooKeeper servers, they automatically coordinate with one another to partition the set of vertices when executing a graph algorithm. When a REST request is sent to one host, it will proxy the request to the other hosts if necessary.
Assuming that tables for the vertices and edges have already been created, the following steps can be used to execute a graph algorithm using the REST APIs.
First, the graph is prepared by grouping the edges by the source ID, and also ensuring that the topics for the vertices and edges have the same number of partitions.
POST /prepare
{
"algorithm": "sssp",
"initialVerticesTopic": "vertices",
"initialEdgesTopic": "edges",
"verticesTopic": "newvertices",
"edgesGroupedBySourceTopic": "newedges"
}
Next, the graph algorithm is configured.
POST /pregel
{
"algorithm": "sssp",
"configs": {
"srcVertexId": 0
},
"verticesTopic": "newvertices",
"edgesGroupedBySourceTopic": "newedges"
}
The above REST request will return an ID for the graph algorithm instantiation. The graph algorithm is executed as follows.
POST /pregel/{id}
{
"numIterations": 30
}
While the graph algorithm is running, we can query its state.
GET /pregel/{id}
When the state is COMPLETED, we can obtain the result. The result will be streamed (with content type text/event-stream) from all hosts that processed the graph.
GET /pregel/{id}/result
Finally, we clean up resources.
DELETE /pregel/{id}
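End to end, the REST workflow above might look like the following curl sequence. The host and port (localhost:8080) and the way the instance ID is captured are assumptions for illustration only; per the proxying behavior described above, the requests can be sent to any of the coordinating hosts.

```shell
# Prepare the graph: group edges by source and align partition counts
curl -s -X POST http://localhost:8080/prepare \
  -H "Content-Type: application/json" \
  -d '{"algorithm": "sssp", "initialVerticesTopic": "vertices",
       "initialEdgesTopic": "edges", "verticesTopic": "newvertices",
       "edgesGroupedBySourceTopic": "newedges"}'

# Configure the algorithm; the response contains the instance ID
curl -s -X POST http://localhost:8080/pregel \
  -H "Content-Type: application/json" \
  -d '{"algorithm": "sssp", "configs": {"srcVertexId": 0},
       "verticesTopic": "newvertices", "edgesGroupedBySourceTopic": "newedges"}'
ID=... # set from the ID returned by the POST /pregel response

# Run up to 30 iterations
curl -s -X POST http://localhost:8080/pregel/$ID \
  -H "Content-Type: application/json" -d '{"numIterations": 30}'

# Poll until the state is COMPLETED, then stream the result
curl -s http://localhost:8080/pregel/$ID
curl -s http://localhost:8080/pregel/$ID/result

# Clean up
curl -s -X DELETE http://localhost:8080/pregel/$ID
```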
As you can see above, a graph algorithm may have specific parameters. Here are the possible algorithms and their associated parameters, if any.
Algorithm | Parameters | Example |
---|---|---|
bfs | srcVertexId | "configs": { "srcVertexId": 0 } |
lcc | | |
lp | | |
mssp | landmarkVertexIds | "configs": { "landmarkVertexIds": "0,1,2" } |
pagerank | tolerance, resetProbability | "configs": { "tolerance": 0.0001, "resetProbability": 0.15 } |
sssp | srcVertexId | "configs": { "srcVertexId": 0 } |
wcc | | |
Kafka Graphs also provides an experimental single-pass graph streaming analytics framework based on Gelly Streaming. See the Java APIs.