Awesome Open Source
Awesome Open Source

|Build Status Linux MacOS| |Build Status Win| |Coverage Status| |PyPi| |Download Counter| |JOSS|

PyClustering

pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each algorithm or model. C++ pyclustering library is a part of pyclustering and supported for Linux, Windows and MacOS operating systems.

Version: 0.11.dev

License: The 3-Clause BSD License

E-Mail: [email protected]

Documentation: https://pyclustering.github.io/docs/0.10.1/html/

Homepage: https://pyclustering.github.io/

PyClustering Wiki: https://github.com/annoviko/pyclustering/wiki

Dependencies

Required packages: scipy, matplotlib, numpy, Pillow

Python version: >=3.6 (32-bit, 64-bit)

C++ version: >= 14 (32-bit, 64-bit)

Performance

Each algorithm is implemented using Python and C/C++ language, if your platform is not supported then Python implementation is used, otherwise C/C++. Implementation can be chosen by ccore flag (by default it is always 'True' and it means that C/C++ is used), for example:

.. code:: python

# As by default - C/C++ part of the library is used
xmeans_instance_1 = xmeans(data_points, start_centers, 20, ccore=True);

# The same - C/C++ part of the library is used by default
xmeans_instance_2 = xmeans(data_points, start_centers, 20);

# Switch off core - Python is used
xmeans_instance_3 = xmeans(data_points, start_centers, 20, ccore=False);

Installation

Installation using pip3 tool:

.. code:: bash

$ pip3 install pyclustering

Manual installation from official repository using Makefile:

.. code:: bash

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# compile CCORE library (core of the pyclustering library).
$ cd ccore/
$ make ccore_64bit      # build for 64-bit OS

# $ make ccore_32bit    # build for 32-bit OS

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using CMake:

.. code:: bash

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# generate build files.
$ mkdir build
$ cmake ..

# build pyclustering-shared target depending on what was generated (Makefile or MSVC solution)
# if Makefile has been generated then
$ make pyclustering-shared

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using Microsoft Visual Studio solution:

  1. Clone repository from: https://github.com/annoviko/pyclustering.git
  2. Open folder pyclustering/ccore
  3. Open Visual Studio project ccore.sln
  4. Select solution platform: x86 or x64
  5. Build pyclustering-shared project.
  6. Add pyclustering folder to python path or install it using setup.py

.. code:: bash

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Proposals, Questions, Bugs

In case of any questions, proposals or bugs related to the pyclustering please contact to [email protected] or create an issue here.

PyClustering Status

+----------------------+------------------------------+-------------------------------------+---------------------------------+ | Branch | master | 0.10.dev | 0.10.1.rel | +======================+==============================+=====================================+=================================+ | Build (Linux, MacOS) | |Build Status Linux MacOS| | |Build Status Linux MacOS 0.10.dev| | |Build Status Linux 0.10.1.rel| | +----------------------+------------------------------+-------------------------------------+---------------------------------+ | Build (Win) | |Build Status Win| | |Build Status Win 0.10.dev| | |Build Status Win 0.10.1.rel| | +----------------------+------------------------------+-------------------------------------+---------------------------------+ | Code Coverage | |Coverage Status| | |Coverage Status 0.10.dev| | |Coverage Status 0.10.1.rel| | +----------------------+------------------------------+-------------------------------------+---------------------------------+

Cite the Library

If you are using pyclustering library in a scientific paper, please, cite the library:

Novikov, A., 2019. PyClustering: Data Mining Library. Journal of Open Source Software, 4(36), p.1230. Available at: http://dx.doi.org/10.21105/joss.01230.

BibTeX entry:

.. code::

@article{Novikov2019,
    doi         = {10.21105/joss.01230},
    url         = {https://doi.org/10.21105/joss.01230},
    year        = 2019,
    month       = {apr},
    publisher   = {The Open Journal},
    volume      = {4},
    number      = {36},
    pages       = {1230},
    author      = {Andrei Novikov},
    title       = {{PyClustering}: Data Mining Library},
    journal     = {Journal of Open Source Software}
}

Brief Overview of the Library Content

Clustering algorithms and methods (module pyclustering.cluster):

+------------------------+---------+-----+ | Algorithm | Python | C++ | +========================+=========+=====+ | Agglomerative | ✓ | ✓ | +------------------------+---------+-----+ | BANG | ✓ | | +------------------------+---------+-----+ | BIRCH | ✓ | | +------------------------+---------+-----+ | BSAS | ✓ | ✓ | +------------------------+---------+-----+ | CLARANS | ✓ | | +------------------------+---------+-----+ | CLIQUE | ✓ | ✓ | +------------------------+---------+-----+ | CURE | ✓ | ✓ | +------------------------+---------+-----+ | DBSCAN | ✓ | ✓ | +------------------------+---------+-----+ | Elbow | ✓ | ✓ | +------------------------+---------+-----+ | EMA | ✓ | | +------------------------+---------+-----+ | Fuzzy C-Means | ✓ | ✓ | +------------------------+---------+-----+ | GA (Genetic Algorithm) | ✓ | ✓ | +------------------------+---------+-----+ | G-Means | ✓ | ✓ | +------------------------+---------+-----+ | HSyncNet | ✓ | ✓ | +------------------------+---------+-----+ | K-Means | ✓ | ✓ | +------------------------+---------+-----+ | K-Means++ | ✓ | ✓ | +------------------------+---------+-----+ | K-Medians | ✓ | ✓ | +------------------------+---------+-----+ | K-Medoids | ✓ | ✓ | +------------------------+---------+-----+ | MBSAS | ✓ | ✓ | +------------------------+---------+-----+ | OPTICS | ✓ | ✓ | +------------------------+---------+-----+ | ROCK | ✓ | ✓ | +------------------------+---------+-----+ | Silhouette | ✓ | ✓ | +------------------------+---------+-----+ | SOM-SC | ✓ | ✓ | +------------------------+---------+-----+ | SyncNet | ✓ | ✓ | +------------------------+---------+-----+ | Sync-SOM | ✓ | | +------------------------+---------+-----+ | TTSAS | ✓ | ✓ | +------------------------+---------+-----+ | X-Means | ✓ | ✓ | +------------------------+---------+-----+

Oscillatory networks and neural networks (module pyclustering.nnet):

+--------------------------------------------------------------------------------+---------+-----+ | Model | Python | C++ | +================================================================================+=========+=====+ | CNN (Chaotic Neural Network) | ✓ | | +--------------------------------------------------------------------------------+---------+-----+ | fSync (Oscillatory network based on Landau-Stuart equation and Kuramoto model) | ✓ | | +--------------------------------------------------------------------------------+---------+-----+ | HHN (Oscillatory network based on Hodgkin-Huxley model) | ✓ | ✓ | +--------------------------------------------------------------------------------+---------+-----+ | Hysteresis Oscillatory Network | ✓ | | +--------------------------------------------------------------------------------+---------+-----+ | LEGION (Local Excitatory Global Inhibitory Oscillatory Network) | ✓ | ✓ | +--------------------------------------------------------------------------------+---------+-----+ | PCNN (Pulse-Coupled Neural Network) | ✓ | ✓ | +--------------------------------------------------------------------------------+---------+-----+ | SOM (Self-Organized Map) | ✓ | ✓ | +--------------------------------------------------------------------------------+---------+-----+ | Sync (Oscillatory network based on Kuramoto model) | ✓ | ✓ | +--------------------------------------------------------------------------------+---------+-----+ | SyncPR (Oscillatory network for pattern recognition) | ✓ | ✓ | +--------------------------------------------------------------------------------+---------+-----+ | SyncSegm (Oscillatory network for image segmentation) | ✓ | ✓ | +--------------------------------------------------------------------------------+---------+-----+

Graph Coloring Algorithms (module pyclustering.gcolor):

+------------------------+---------+-----+ | Algorithm | Python | C++ | +========================+=========+=====+ | DSatur | ✓ | | +------------------------+---------+-----+ | Hysteresis | ✓ | | +------------------------+---------+-----+ | GColorSync | ✓ | | +------------------------+---------+-----+

Containers (module pyclustering.container):

+------------------------+---------+-----+ | Algorithm | Python | C++ | +========================+=========+=====+ | KD Tree | ✓ | ✓ | +------------------------+---------+-----+ | CF Tree | ✓ | | +------------------------+---------+-----+

Examples in the Library

The library contains examples for each algorithm and oscillatory network model:

Clustering examples: pyclustering/cluster/examples

Graph coloring examples: pyclustering/gcolor/examples

Oscillatory network examples: pyclustering/nnet/examples

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/example_cluster_place.png :alt: Where are examples?

Code Examples

Data clustering by CURE algorithm

.. code:: python

from pyclustering.cluster import cluster_visualizer;
from pyclustering.cluster.cure import cure;
from pyclustering.utils import read_sample;
from pyclustering.samples.definitions import FCPS_SAMPLES;

# Input data in following format [ [0.1, 0.5], [0.3, 0.1], ... ].
input_data = read_sample(FCPS_SAMPLES.SAMPLE_LSUN);

# Allocate three clusters.
cure_instance = cure(input_data, 3);
cure_instance.process();
clusters = cure_instance.get_clusters();

# Visualize allocated clusters.
visualizer = cluster_visualizer();
visualizer.append_clusters(clusters, input_data);
visualizer.show();

Data clustering by K-Means algorithm

.. code:: python

from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)

# Prepare initial centers using K-Means++ method.
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()

# Create instance of K-Means algorithm with prepared centers.
kmeans_instance = kmeans(sample, initial_centers)

# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()

# Visualize obtained results
kmeans_visualizer.show_clusters(sample, clusters, final_centers)

Data clustering by OPTICS algorithm

.. code:: python

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Read sample for clustering from some file
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# Run cluster analysis where connectivity radius is bigger than real
radius = 2.0
neighbors = 3
amount_of_clusters = 3
optics_instance = optics(sample, radius, neighbors, amount_of_clusters)

# Performs cluster analysis
optics_instance.process()

# Obtain results of clustering
clusters = optics_instance.get_clusters()
noise = optics_instance.get_noise()
ordering = optics_instance.get_ordering()

# Visualize ordering diagram
analyser = ordering_analyser(ordering)
ordering_visualizer.show_ordering_diagram(analyser, amount_of_clusters)

# Visualize clustering results
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.show()

Simulation of oscillatory network PCNN

.. code:: python

from pyclustering.nnet.pcnn import pcnn_network, pcnn_visualizer

# Create Pulse-Coupled neural network with 10 oscillators.
net = pcnn_network(10)

# Perform simulation during 100 steps using binary external stimulus.
dynamic = net.simulate(50, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

# Allocate synchronous ensembles from the output dynamic.
ensembles = dynamic.allocate_sync_ensembles()

# Show output dynamic.
pcnn_visualizer.show_output_dynamic(dynamic, ensembles)

Simulation of chaotic neural network CNN

.. code:: python

from pyclustering.cluster import cluster_visualizer
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
from pyclustering.nnet.cnn import cnn_network, cnn_visualizer

# Load stimulus from file.
stimulus = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

# Create chaotic neural network, amount of neurons should be equal to amount of stimulus.
network_instance = cnn_network(len(stimulus))

# Perform simulation during 100 steps.
steps = 100
output_dynamic = network_instance.simulate(steps, stimulus)

# Display output dynamic of the network.
cnn_visualizer.show_output_dynamic(output_dynamic)

# Display dynamic matrix and observation matrix to show clustering phenomenon.
cnn_visualizer.show_dynamic_matrix(output_dynamic)
cnn_visualizer.show_observation_matrix(output_dynamic)

# Visualize clustering results.
clusters = output_dynamic.allocate_sync_ensembles(10)
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, stimulus)
visualizer.show()

Illustrations

Cluster allocation on FCPS dataset collection by DBSCAN:

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/fcps_cluster_analysis.png :alt: Clustering by DBSCAN

Cluster allocation by OPTICS using cluster-ordering diagram:

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/optics_example_clustering.png :alt: Clustering by OPTICS

Partial synchronization (clustering) in Sync oscillatory network:

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/sync_partial_synchronization.png :alt: Partial synchronization in Sync oscillatory network

Cluster visualization by SOM (Self-Organized Feature Map)

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/target_som_processing.png :alt: Cluster visualization by SOM

.. |Build Status Linux MacOS| image:: https://travis-ci.org/annoviko/pyclustering.svg?branch=master :target: https://travis-ci.org/annoviko/pyclustering .. |Build Status Win| image:: https://ci.appveyor.com/api/projects/status/4uly2exfp49emwn0/branch/master?svg=true :target: https://ci.appveyor.com/project/annoviko/pyclustering/branch/master .. |Coverage Status| image:: https://coveralls.io/repos/github/annoviko/pyclustering/badge.svg?branch=master&ts=1 :target: https://coveralls.io/github/annoviko/pyclustering?branch=master .. |DOI| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4280556.svg :target: https://doi.org/10.5281/zenodo.4280556 .. |PyPi| image:: https://badge.fury.io/py/pyclustering.svg :target: https://badge.fury.io/py/pyclustering .. |Build Status Linux MacOS 0.10.dev| image:: https://travis-ci.org/annoviko/pyclustering.svg?branch=0.10.dev :target: https://travis-ci.org/annoviko/pyclustering .. |Build Status Win 0.10.dev| image:: https://ci.appveyor.com/api/projects/status/4uly2exfp49emwn0/branch/0.10.dev?svg=true :target: https://ci.appveyor.com/project/annoviko/pyclustering/branch/0.9.dev .. |Coverage Status 0.10.dev| image:: https://coveralls.io/repos/github/annoviko/pyclustering/badge.svg?branch=0.10.dev&ts=1 :target: https://coveralls.io/github/annoviko/pyclustering?branch=0.9.dev .. |Build Status Linux 0.10.1.rel| image:: https://travis-ci.org/annoviko/pyclustering.svg?branch=0.10.1.rel :target: https://travis-ci.org/annoviko/pyclustering .. |Build Status Win 0.10.1.rel| image:: https://ci.appveyor.com/api/projects/status/4uly2exfp49emwn0/branch/0.10.1.rel?svg=true :target: https://ci.appveyor.com/project/annoviko/pyclustering/branch/0.10.1.rel .. |Coverage Status 0.10.1.rel| image:: https://coveralls.io/repos/github/annoviko/pyclustering/badge.svg?branch=0.10.1.rel&ts=1 :target: https://coveralls.io/github/annoviko/pyclustering?branch=0.10.1.rel .. |Download Counter| image:: https://pepy.tech/badge/pyclustering :target: https://pepy.tech/project/pyclustering .. |JOSS| image:: http://joss.theoj.org/papers/10.21105/joss.01230/status.svg :target: https://doi.org/10.21105/joss.01230


Get A Weekly Email With Trending Projects For These Topics
No Spam. Unsubscribe easily at any time.
python (54,384
c-plus-plus (18,727
machine-learning (3,631
python3 (1,642
data-science (900
algorithms (470
neural-networks (435
clustering (178
data-mining (169