A distributed algorithm applied to the bitcoin blockchain that allows to create a new representation of the transaction - a clusterized graph that combines all the addresses belonging to the same owner/organization.
Alternatives To Bitcoinclusteraggregator
Project NameStarsDownloadsRepos Using ThisPackages Using ThisMost Recent CommitTotal ReleasesLatest ReleaseOpen IssuesLicenseLanguage
Contracts27343 days ago81November 30, 202259gpl-2.0TypeScript
The Graph Protocol
a year agomitC#
Open-source 3D and VR Blockchain Visualisations
3 years ago1mitJavaScript
Create a graph of the ETH balance over time of an Ethereum address
10 years agootherJava
A tool to generate dummy data (name, address, phone number, email, social graph connections, etc.) for populating systems for testing
7 years ago1Python
An IDA Pro Plugin to extract Call Graphs and Control Flow Graphs from an assembly file written in Python.
Decentraland Subgraph20
a year ago8TypeScript
Decentraland data source for The Graph
a year ago4JavaScript
Visualize mesh graphs as forcegraph and on OpenStreetMap
Ethdenver Dapp10
3 years ago7JavaScript
ETHDenver example dApp built on The Graph
5 years agomitC++
6502 / 65C02 / 65816 symbolic disassembler
Blockchain Tracer9
3 years agoJavaScript
Visualizing and Tracing Bitcoin Transactions
Alternatives To Bitcoinclusteraggregator
Select To Compare

Alternative Project Comparisons

Bitcoin Cluster Aggregator


Giacomo Bossi


The aim of the project is to parse and analyze the bitcoin block-chain in order to retrieve useful information about the transaction between known wallets, using a distributed approach. In this project input and output data are stored inside Hadoop Filesystem (hdfs) and the algorithms computation is managed by the Spark cluster computing system.


Since the bitcoin block-chain is heavily compressed, first we must unwrap all the transaction in a more readable way and after condensing all the addresses belonging to the same entity and build an entity graph.

Bitcoin Transaction

Transaction Composition

In order to retrieve all the data needed to compose each transaction it has been used the Hadoop Crypto-Ledger library (LINK), a multi-blockchain parser, that allow to translate the raw-byte data into an object. But this unwrap of the blockchain produce this result:

To recompose each transaction in a more readable way (Input,Output,Amount), it is possible to join two transaction using as key the transaction ID and the transaction index and reconstruct the flow of coins between two wallets in the following way:

So, given a current transaction and a previous one with the backward link of the current pointing to the previous, the resulting unwrapping it’s the following:


We define an entity as a cluster of addresses belonging to the same owner/organization.

Partial Entity

Since a transaction can involve multiple input and multiple output, it’s possible to state that all the inputs belong to the same entity. This cluster is formed by all the addresses that has been used simultaneously as an input in a transaction.
So, for example an entity is the following:

But this kind of entity results incomplete, since there could be another transaction that involves one of more addresses previously used. It’s clear the concept of partial entity, the inputs address of a single transaction describe a possible subset of all the entity’s addresses.

Complete Entity

A complete entity can be defined as the union of the intersection of the partial entities. This kind of clustering results into a recursive intersection between all the inputs of all the transaction, since it’s required to search all the intersections between the other partial entity, and in positive case we merge the two partial entity and start from the beginning again. But this kind of algorithm it’s just to complex to be computed in a feasible time, so we have developed an algorithm that translate the domain into a graph and find all the connected components inside it.

Complete Entity Algorithm

In order to compute the intersection over all the partial entity, we first transform the intersection over the transactions into a graph and then we have applied an optimized version of the connected component algorithm in order to accumulate all the intersection and provide a solution.

Generating chain subgraph

First, given all the partial entity we create a new tuple with the following values inside: (address,partialEntity) After we accumulate for each address all the entity in which it is present: (address,List[PartialEntity])

Now we got for each address all the intersection over the partial entities. From this list we create a list of edges that links all the partial entities together. In order to save memory and computational power, the shape of all the subgraphs defined by a partial entity is a chain.

So, given a list of edges it’s possible to run the connected components algorithm over the set of subgraphs.

Connected Components

The result of the connected components algorithm is the following:

Initial Graph Connected Component Result Resulting Entity

Every entity is the union of a set of partial entity. So, for each cluster, retrieving the content of each partial entity and reassembling together all the addresses we finally find a complete entity.

Entity Graph

The result of the previous computation over all the nodes of the block-chain combined with the all the transaction creates a new graph, the entity graph, a useful instrument to track the bitcoin exchanges between entities like bitcoin exchanges, e-commerce stores, cryptocurrency lottery and many other. A possible representation of the computation is the following

Inside the graph the (-1) node indicates all the exchanges that points a new (never-used) bitcoin address.

Inside the misc folder it's also possible to find two sample algorithms, a pagerank and a triangle counter. The pagerank algorithm returns the ordered list of entity nodes, in order of importance. The triangle counter nodes return the list of entity nodes ordered by the number of triangle found.

Project made during the Erasmus Programme @ National Technique University of Athens


Build the project using maven

mvn clean install

Put the bitcoin blockchain raw data inside a hdfs:

hadoop fs -mkdir -p /user/bitcoin/input
hadoop fs -put ~./.bitcoin/blocks/blk*.dat /user/bitcoin/input

Run the cluster-based graph builder:

spark-submit --class Builder --master local[4] ./target/BitcoinExplorer-0.0.1.jar hdfs://localhost:9000/user/bitcoin/input hdfs://localhost:9000/user/bitcoin/output/vertices hdfs://localhost:9000/user/bitcoin/output/edges

Run the Pagerank Algorithm

spark-submit --class Pagerank --master local[4] ./target/BitcoinExplorer-0.0.1.jar hdfs://localhost:9000/user/bitcoin/output/vertices hdfs://localhost:9000/user/bitcoin/output/edges hdfs://localhost:9000/user/bitcoin/output/pagerank

Run the Triangle Count Algorithm

spark-submit --class TriangleCount --master local[4] ./target/BitcoinExplorer-0.0.1.jar hdfs://localhost:9000/user/bitcoin/output/vertices hdfs://localhost:9000/user/bitcoin/output/edges hdfs://localhost:9000/user/bitcoin/output/trianglecount
Popular Address Projects
Popular Graph Projects
Popular Text Processing Categories
Related Searches

Get A Weekly Email With Trending Projects For These Categories
No Spam. Unsubscribe easily at any time.