# summary

pgGraphBLAS is a postgres extension that bridges the GraphBLAS API with the PostgreSQL object-relational database.

GraphBLAS is a sparse linear algebra API optimized for processing graphs encoded as sparse matrices and vectors. In addition to common real/integer matrix algebra operations, GraphBLAS supports up to 960 different "semiring" algebra operations that can be used as basic building blocks to implement a wide variety of graph algorithms.
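To make "semiring" concrete, here is a minimal plain-Python sketch (illustrative only, not pgGraphBLAS code): ordinary matrix multiplication uses the operator pair (+, *); swapping in a different pair changes what the same multiply loop computes. With (min, +), for instance, one multiply extends shortest paths by one hop.

```python
# Matrix multiply with the (+, *) pair replaced by an arbitrary (add, mul) pair.
INF = float("inf")

def semiring_mxm(A, B, add, mul, zero):
    """Multiply A and B, using `add` to accumulate and `mul` to combine."""
    n, m, p = len(A), len(B[0]), len(B)
    C = [[zero] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = zero
            for k in range(p):
                acc = add(acc, mul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C

# Edge weights of a tiny 3-vertex graph; INF is the "algebraic zero" of min-plus.
A = [[INF, 2,   INF],
     [INF, INF, 3],
     [INF, INF, INF]]

# One min-plus multiply: A2[i][j] is the cheapest two-hop path from i to j.
A2 = semiring_mxm(A, A, min, lambda x, y: x + y, INF)
print(A2[0][2])  # 5: the path 0 -> 1 -> 2 costs 2 + 3
```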

pgGraphBLAS leverages the sparse matrix programming expertise of The GraphBLAS Forum and uses the SuiteSparse:GraphBLAS implementation of the API. SuiteSparse:GraphBLAS is the work of Dr. Tim Davis, professor in the Department of Computer Science and Engineering at Texas A&M University.

# intro

For a long time, mathematicians have known that matrices are powerful representations of graphs, as described in this mathematical introduction to GraphBLAS by Dr. Jeremy Kepner, head and founder of the MIT Lincoln Laboratory Supercomputing Center.

As Kepner's paper describes, there are two useful matrix representations of graphs: adjacency matrices and incidence matrices. This introduction focuses on adjacency matrices, as they are simpler, but the same ideas apply to both, and it is easy to switch back and forth between them. (Image Credit: Dr. Jeremy Kepner)

On the left is a directed graph, and on the right, the adjacency matrix that represents it. The matrix has a row and a column for every vertex. If there is an edge going from vertex A to vertex B, then there will be a value present at the intersection of A's row with B's column. For example, vertex 1 connects to 4, so there is a value (dot) at the intersection of the first row and the fourth column. Vertex 4 also connects back to 1, so there are two values in the matrix representing these two edges: one at the (1, 4) position and the other at the (4, 1) position.
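The encoding just described can be sketched in a few lines of plain Python (a hypothetical helper for illustration, not part of pgGraphBLAS), using the edges mentioned in the text:

```python
# Build a dense adjacency matrix from a directed edge list: each edge (a, b)
# puts a 1 at row a, column b (vertices numbered from 1 here).

def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for a, b in edges:
        A[a - 1][b - 1] = 1
    return A

# The edges discussed in the text: 1 -> 4, and 4 -> 1, 4 -> 3.
A = adjacency(7, [(1, 4), (4, 1), (4, 3)])
print(A[0][3], A[3][0])  # 1 1 -- each direction is stored as its own entry
```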

One practical problem with matrix-encoded graphs is that most real-world graphs tend to be sparse. In the example above, only 12 of the 49 possible elements have a value. The elements that do have values tend to be scattered uniformly across the matrix (for "typical" graphs), so dense linear algebra libraries like BLAS or numpy do not encode or operate on them efficiently: the relevant data sits in mostly empty memory, with the actual data elements spaced far apart. This wastes memory and CPU resources and defeats CPU caching mechanisms.

For example, suppose a fictional social network has 1 billion users, and each user has about 100 friends, which means there are about 100 billion (1e+11) connections in the graph. A dense matrix large enough to hold this graph would need (1 billion)^2, or 1,000,000,000,000,000,000 (a quintillion) elements, but only 1e+11 of them would have meaningful values, leaving just one ten-millionth (a fraction of 0.0000001) of the matrix utilized.
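A quick sanity check of the arithmetic above:

```python
# Density of the fictional social-network graph from the example.
users = 1_000_000_000
edges = users * 100     # ~100 friends per user -> 1e+11 edges
dense = users ** 2      # elements a dense matrix would need -> 1e+18
print(edges / dense)    # 1e-07 -- one ten-millionth of the matrix is used
```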

By using a sparse matrix instead of a dense one, only the elements actually used are stored in the matrix. The parts of the matrix with no value are interpreted as an "algebraic zero" value, which might not be the actual number zero, but some other value such as positive or negative infinity, depending on the particular semiring operations applied to the matrix. The math used with sparse matrices is exactly the same as with dense ones; the sparsity of the data doesn't matter to the math, but it does matter to how efficiently the matrix is implemented internally.
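A minimal sketch of this idea in plain Python (SuiteSparse:GraphBLAS actually uses compressed sparse formats like CSR/CSC internally; a dictionary keyed by coordinates is just the simplest way to show the principle): store only the present entries, and report the algebraic zero for everything else.

```python
# Store only present elements; absent positions read back as the
# semiring's "zero", which need not be the number 0.

class SparseMatrix:
    def __init__(self, nrows, ncols, zero=0):
        self.nrows, self.ncols = nrows, ncols
        self.zero = zero      # algebraic zero: 0, or +inf for min-plus, etc.
        self.data = {}        # (row, col) -> value, present entries only

    def __setitem__(self, ij, value):
        self.data[ij] = value

    def __getitem__(self, ij):
        return self.data.get(ij, self.zero)

A = SparseMatrix(10_000, 10_000)
A[1, 4] = 1
print(A[1, 4], A[2, 2], len(A.data))  # 1 0 1 -- one stored entry, rest implicit
```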

pgGraphBLAS is a postgres extension that provides two new types, `matrix` and `vector`, as well as the GraphBLAS API to manipulate them. Aggregate functions are provided to build matrices from SQL queries, and set-returning functions are provided to turn graphs back into relational sets. From a PostgreSQL point of view, matrices look a little bit like arrays, being stored as variable-length column values.

# matrix multiplication

The key operation of GraphBLAS is the matrix multiply, provided by the `mxm` (matrix times matrix), `mxv` (matrix times vector), and `vxm` (vector times matrix) functions. Matrix multiplication has the remarkable property of being useful for finding the neighbors of any node in a graph. By using different combinations of operations (semirings), different graph algorithms can step and accumulate different results, interpreting the data in unique ways, even over the same graphs. (Image Credit: Dr. Jeremy Kepner)

Above is the same graph and matrix as before, shown here as `A`. Next to it we see a vector `v`. When you multiply the transpose of `A` by `v`, the result vector contains all of the neighboring nodes of the nodes specified in `v`. In this case, `v` contains a value for vertex 4, shown in red. Multiplying the matrix by this input vector yields an output vector containing vertex 4's two neighbors, 1 and 3. Iterating this multiplication process produces the most common graph operation: breadth-first search.

``````create function bfs(A matrix, source bigint) returns vector as $$
declare
    n bigint := nrows(A);                      -- The number of result rows.
    v vector := vector_integer(n);             -- int32 result vector of vertex levels.
    q vector := assign(vector_bool(n), false); -- bool mask of completed vertices.
    level integer := 0;                        -- Start at level 0.
    not_done bool := true;                     -- Flag to indicate still work to do.
begin
    q := set_element(q, source, true);         -- Set the source element to done.

    while not_done and level <= n loop         -- While still work to do,
        v := assign(v, level, mask=>q);        -- assign the current level to all
                                               -- vertices in the frontier q,
        q := mxv(transpose(A), q, q,           -- multiply q<mask> = T(A)q,
                 semiring=>'lor_land_bool',    -- using the LOR_LAND_BOOL semiring,
                 mask=>v,                      -- only those *not* masked
                 dmask=>'scmp',                -- by complementing the mask,
                 doutp=>'replace');            -- clearing results in q first.

        not_done := reduce_bool(q);            -- Are there more neighbors?

        level := level + 1;                    -- Increment the level.
    end loop;
    return v;
end;
$$ language plpgsql;
``````

The above code is written in `plpgsql`, postgres' procedural query language, which works well with pgGraphBLAS's algorithmic approach.
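For readers who prefer to see the algorithm outside SQL, here is a plain-Python sketch of the same level-synchronous BFS. The `mxv` step with the `lor_land_bool` semiring corresponds to taking the union of the frontier's out-neighbors, and the complemented mask corresponds to skipping vertices that already have a level.

```python
# Level-synchronous BFS: q is the boolean frontier (the vector q above),
# v records each vertex's level (the int32 vector v above).

def bfs(adj, source):
    """adj: dict vertex -> set of out-neighbors; returns dict vertex -> level."""
    n = len(adj)
    v = {}                            # discovered vertices and their levels
    q = {source}                      # current frontier
    level = 0
    while q and level <= n:
        for u in q:
            v[u] = level              # assign(v, level, mask=>q)
        # mxv with the OR/AND semiring, masked by vertices not yet in v:
        q = {w for u in q for w in adj[u] if w not in v}
        level += 1
    return v

# The tiny graph from the text: edges 1 -> 4, 4 -> 1, 4 -> 3.
adj = {1: {4}, 2: set(), 3: set(), 4: {1, 3}}
print(bfs(adj, 4))  # source 4 at level 0; its neighbors 1 and 3 at level 1
```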

``````postgres=# create table t (m matrix);
CREATE TABLE

postgres=# create table f (i bigint, j bigint);
CREATE TABLE

postgres=# create index on f (i) include (j);
CREATE INDEX

postgres=# insert into t (m) values (matrix_random_bool(10000,10000,1000000));
INSERT 0 1
Time: 1464.482 ms (00:01.464)

postgres=# insert into f select row, col from matrix_elements_bool((select m from t));
INSERT 0 994944
Time: 11110.765 ms (00:11.111)

postgres=# select count(*) from plbfs(100);
count
-------
9999
(1 row)

Time: 4329.555 ms (00:04.330)

postgres=# select print(bfs(m, 100), 1) from t;
print
---------------------------------------------------------------------------
+
GraphBLAS vector: A->V                                                   +
nrows: 10000 ncols: 1 max # entries: 15000                               +
format: standard CSC vlen: 10000 nvec_nonempty: 1 nvec: 1 plen: 1 vdim: 1+
hyper_ratio 0.0625                                                       +
GraphBLAS type:  int32_t size: 4                                         +
number of entries: 9999                                                  +

(1 row)

Time: 364.490 ms
``````

In the simple benchmark above, the GraphBLAS version of BFS is almost 12x faster (364 ms vs. 4330 ms) than the plpgsql procedural version.

# references
