Fast k-medoids clustering in Python

This package is a wrapper around the fast Rust k-medoids package, implementing the FasterPAM and FastPAM algorithms along with the baseline k-means-style and PAM algorithms. Furthermore, the (Medoid) Silhouette can be optimized by the FasterMSC, FastMSC, PAMMEDSIL and PAMSIL algorithms.

All algorithms expect a distance matrix and the number of clusters as input. They hence can be used with arbitrary dissimilarity functions.

If you use this code in scientific work, please cite the papers in the References. Thank you.

Installation

Installation with pip or conda

Pre-built packages for many Linux, Windows, and OSX systems are available in PyPI and conda-forge can be installed with

pip install kmedoids respectively
conda install -c conda-forge kmedoids.

On uncommon architectures, you may need to first install Cargo (i.e., the Rust programming language) first, and a subsequent pip install kmedoids will try to compile the package for your CPU architecture and operating system.

Compilation from source

You need to have Rust and Python 3 installed.

Installation uses maturin, for compiling and installing Rust extensions. Maturin is best used within a Python virtual environment.

# activate your desired virtual environment first
pip install maturin
git clone https://github.com/kno10/python-kmedoids.git
cd python-kmedoids
# build and install the package:
maturin develop --release

Integration test to validate the installation.

python -m unittest discover tests

This procedure uses the latest git version from https://github.com/kno10/rust-kmedoids. If you want to use local modifications to the Rust code, you need to provide the source folder of the Rust module in Cargo.toml by setting the path= option of the kmedoids dependency.

Example

import kmedoids
c = kmedoids.fasterpam(distmatrix, 5)
print("Loss is:", c.loss)

Using the sklearn-compatible API

Note that KMedoids defaults to the “precomputed” metric, expecting a pairwise distance matrix. If you have sklearn installed, you can use metric=”euclidean”.

import kmedoids
km = kmedoids.KMedoids(5, method='fasterpam')
c = km.fit(distmatrix)
print("Loss is:", c.inertia_)

MNIST (10k samples)

import kmedoids
import numpy
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances
X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:10000]
diss = euclidean_distances(X)
start = time.time()
fp = kmedoids.fasterpam(diss, 100)
print("FasterPAM took: %.2f ms" % ((time.time() - start)*1000))
print("Loss with FasterPAM:", fp.loss)
start = time.time()
pam = kmedoids.pam(diss, 100)
print("PAM took: %.2f ms" % ((time.time() - start)*1000))
print("Loss with PAM:", pam.loss)

Choosing the optimal number of clusters

This package includes DynMSC, an algorithm that optimizes the Medoid Silhouette, and chooses the “optimal” number of clusters in a range of kmin..kmax. Beware that if you allow a too large kmax, the optimum result will likely have many one-elemental clusters. A too high kmax may mask more desirable results, hence it is recommended that you choose only 2-3 times the number of clusters you expect as maximum.

import kmedoids, numpy
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances
X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:10000]
diss = euclidean_distances(X)
kmin = 10
kmax = 20
dm = kmedoids.dynmsc(diss, kmax, kmin)
print("Optimal number of clusters according to the Medoid Silhouette:", dm.bestk)
print("Medoid Silhouette over range of k:", dm.losses)
print("Range of k:", dm.rangek)

Memory Requirements

Because the algorithms require a distance matrix as input, you need O(N²) memory to use these implementations. With single precision, this matrix needs 4·N² bytes, so a typical laptop with 8 GB of RAM could handle data sets of over 40.000 instances, but if your computation of the distance matrix incurs copying the matrix, only 30.000 or less may be feasible.

The majority of run time usually is the distance matrix computation, so it is recommended you only compute it once, then experiment with different algorithm settings. Avoid recomputing it repeatedly.

For larger data sets, it is recommended to only cluster a representative sample of the data. Usually, this will still yield sufficient result quality.

Implemented Algorithms

K-Medoids Clustering:

FasterPAM (Schubert and Rousseeuw, 2020, 2021)
FastPAM1 (Schubert and Rousseeuw, 2019, 2021)
PAM (Kaufman and Rousseeuw, 1987) with BUILD and SWAP
BUILD (Kaufman and Rousseeuw, 1987)
Alternating (k-means-style approach)

Silhouette Clustering:

DynMSC (Lenssen and Schubert, 2023)
FasterMSC (Lenssen and Schubert, 2022)
FastMSC (Lenssen and Schubert, 2022)
PAMMEDSIL (Van der Laan and Pollard, 2003)
PAMSIL (Van der Laan and Pollard, 2003)

Evaluation:

Medoid Silhouette (Van der Laan and Pollard, 2003)
Silhouette (Kaufman and Rousseeuw, 1987)

Note that the k-means style “alternating” algorithm yields rather poor result quality (see Schubert and Rousseeuw 2021 for an example and explanation).

FasterPAM

kmedoids.fasterpam(diss, medoids, max_iter=100, init='random', random_state=None, n_cpu=-1)

FasterPAM k-medoids clustering

This is an accelerated version of PAM clustering, that eagerly performs any swap found, and contains the O(k) improvement to find the best swaps faster.

References:

Erich Schubert, Peter J. Rousseeuw
Fast and Eager k-Medoids Clustering:
O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms
Information Systems (101), 2021, 101804
https://doi.org/10.1016/j.is.2021.101804 (open access)

Erich Schubert, Peter J. Rousseeuw:
Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
In: 12th International Conference on Similarity Search and Applications (SISAP 2019), 171-187.
https://doi.org/10.1007/978-3-030-32047-8_16
Preprint: https://arxiv.org/abs/1810.05691

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – number of clusters to find or existing medoids
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed (also used for shuffling the processing order)
n_cpu (int) – number of threads to use (-1: automatic)

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

FastPAM1

kmedoids.fastpam1(diss, medoids, max_iter=100, init='random', random_state=None)

FastPAM1 k-medoids clustering

This is an accelerated version of PAM clustering, that performs the same swaps as the original PAM (given the same starting conditions), but finds the best swap O(k) times faster.

References:

Erich Schubert, Peter J. Rousseeuw
Fast and Eager k-Medoids Clustering:
O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms
Information Systems (101), 2021, 101804
https://doi.org/10.1016/j.is.2021.101804 (open access)

Erich Schubert, Peter J. Rousseeuw:
Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
In: 12th International Conference on Similarity Search and Applications (SISAP 2019), 171-187.
https://doi.org/10.1007/978-3-030-32047-8_16
Preprint: https://arxiv.org/abs/1810.05691

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – number of clusters to find or existing medoids
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed if no medoids are given

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

PAM

kmedoids.pam(diss, medoids, max_iter=100, init='build', random_state=None)

PAM k-medoids clustering

This is an implementation of the original PAM (Partitioning Around Medoids) clustering algorithm. For improved versions, see the fastpam and fasterpam methods.

References:

Leonard Kaufman, Peter J. Rousseeuw:
Clustering by means of medoids.
In: Dodge Y (ed) Statistical Data Analysis Based on the L 1 Norm and Related Methods, pp 405–416, 1987

Leonard Kaufman, Peter J. Rousseeuw:
Finding Groups in Data: An Introduction to Cluster Analysis.
John Wiley&Sons, 1990, https://doi.org/10.1002/9780470316801

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – number of clusters to find or existing medoids
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed if no medoids are given

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

Alternating k-medoids (k-means style)

kmedoids.alternating(diss, medoids, max_iter=100, init='random', random_state=None)

Alternating k-medoids clustering (k-means-style algorithm)

Note: this yields substantially worse results than PAM algorithms on difficult data sets.

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – number of clusters to find or existing medoids
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed if no medoids are given

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

PAM BUILD

kmedoids.pam_build(diss, k)

PAM k-medoids clustering – BUILD only

This is an implementation of the original PAM (Partitioning Around Medoids) clustering algorithm. For improved versions, see the fastpam and fasterpam methods.

References:

Leonard Kaufman, Peter J. Rousseeuw:
Clustering by means of medoids.
In: Dodge Y (ed) Statistical Data Analysis Based on the L 1 Norm and Related Methods, 405-416, 1987

Leonard Kaufman, Peter J. Rousseeuw:
Finding Groups in Data: An Introduction to Cluster Analysis.
John Wiley&Sons, 1990, https://doi.org/10.1002/9780470316801

Parameters:

diss (ndarray) – square numpy array of dissimilarities
k (int) – number of clusters to find

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

DynMSC

kmedoids.dynmsc(diss, medoids, minimum_k=2, max_iter=100, init='random', random_state=None)

DynMSC clustering

This is a version of FasterMSC with automatic cluster number selection, that performs FasterMSC for a minimum k to the number of input medoids and returns the clustering with the highest Average Medoid Silhouette.

References:

Lars Lenssen, Erich Schubert:
Medoid silhouette clustering with automatic cluster number selection
Information Systems (120), 2024, 102290
https://doi.org/10.1016/j.is.2023.102290
Preprint: https://arxiv.org/abs/2309.03751

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – maximum number of clusters to find or existing medoids with length of maximum number of clusters to find
minimum_k (int) – minimum number of clusters to find
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed if no medoids are given

Returns:

k-medoids clustering with automatic number of clusters

Return type:

DynkResult

FasterMSC

kmedoids.fastermsc(diss, medoids, max_iter=100, init='random', random_state=None)

FasterMSC clustering

This is an accelerated version of PAMMEDSIL clustering, that eagerly performs any swap found, and contains the O(k^2) improvement to find the best swaps faster.

References:

Lars Lenssen, Erich Schubert:
Medoid silhouette clustering with automatic cluster number selection
Information Systems (120), 2024, 102290
https://doi.org/10.1016/j.is.2023.102290
Preprint: https://arxiv.org/abs/2309.03751

Lars Lenssen, Erich Schubert:
Clustering by Direct Optimization of the Medoid Silhouette
In: 15th International Conference on Similarity Search and Applications (SISAP 2022).
https://doi.org/10.1007/978-3-031-17849-8_15

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – number of clusters to find or existing medoids
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed if no medoids are given

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

FastMSC

kmedoids.fastmsc(diss, medoids, max_iter=100, init='random', random_state=None)

FastMSC clustering

This is an accelerated version of PAMMEDSIL clustering, that performs the same swaps as the original PAMMEDSIL (given the same starting conditions), but finds the best swap O(k^2) times faster.

References:

Lars Lenssen, Erich Schubert:
Medoid silhouette clustering with automatic cluster number selection
Information Systems (120), 2024, 102290
https://doi.org/10.1016/j.is.2023.102290
Preprint: https://arxiv.org/abs/2309.03751

Lars Lenssen, Erich Schubert:
Clustering by Direct Optimization of the Medoid Silhouette
In: 15th International Conference on Similarity Search and Applications (SISAP 2022).
https://doi.org/10.1007/978-3-031-17849-8_15

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – number of clusters to find or existing medoids
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed if no medoids are given

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

PAMMEDSIL

kmedoids.pammedsil(diss, medoids, max_iter=100, init='build', random_state=None)

PAMMEDSIL clustering

This is an implementation of the original PAMMEDSIL clustering algorithm. For improved versions, see the fastmsc and fastermsc methods.

References:

Mark Van der Laan, Katherine Pollard, Jennifer Bryan:
A new partitioning around medoids algorithm.
In: Journal of Statistical Computation and Simulation, pp 575-584, 2003
https://doi.org/10.1080/0094965031000136012

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – number of clusters to find or existing medoids
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed if no medoids are given

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

PAMSIL

kmedoids.pamsil(diss, medoids, max_iter=100, init='build', random_state=None)

PAMSIL k-medoids clustering

This is an implementation of the original PAMSIL.

References:

Mark Van der Laan, Katherine Pollard, Jennifer Bryan:
A new partitioning around medoids algorithm.
In: Journal of Statistical Computation and Simulation, pp 575-584, 2003
https://doi.org/10.1080/0094965031000136012

Parameters:

diss (ndarray) – square numpy array of dissimilarities
medoids (int or ndarray) – number of clusters to find or existing medoids
max_iter (int) – maximum number of iterations
init (str, "random", "first" or "build") – initialization method
random_state (int, RandomState instance or None) – random seed if no medoids are given

Returns:

k-medoids clustering result

Return type:

KMedoidsResult

Silhouette

kmedoids.silhouette(diss, labels, samples=False, n_cpu=-1)

Silhouette index for cluster evaluation.

The Silhouette, proposed by Peter Rousseeuw in 1987, is a popular internal evaluation measure for clusterings. Although it is defined on arbitary metrics, it is most appropriate for evaluating “spherical” clusters, as it expects objects to be closer to all members of its own cluster than to members of other clusters.

References:

Peter J. Rousseeuw:
Silhouettes: A graphical aid to the interpretation and validation of cluster analysis
Journal of Computational and Applied Mathematics, Volume 20, 1987
https://doi.org/10.1016/0377-0427(87)90125-7

Parameters:

diss (ndarray) – square numpy array of dissimilarities
labels (ndarray of int) – cluster labels (use 0 to k-1, no negative values allowed)
samples (boolean) – whether to return individual samples or not
n_cpu (int) – number of threads to use (-1: automatic)

Returns:

tuple containing the average silhouette and the individual samples

Return type:

(float, ndarray)

Medoid Silhouette

kmedoids.medoid_silhouette(diss, meds, samples=False)

Medoid Silhouette index for cluster evaluation.

The Medoid Silhouette is an approximation to the Silhouette index, that uses the distance to the cluster medoids instead of the average distance to the other cluster members. If every point is assigned to the nearest medoid, the Medoid Silhouette of a point reduces to 1-a/b where a is the distance to the nearest, and b the distance to the second nearest medoid. If b is 0, the Medoid Silhouette is 1.

This function assumes you already have a distance matrix. It is not necessary to compute a distance matrix to evaluate the medoid silhouette – only the distances between points and medoids are necessary. If you do not have a distance matrix, simply compute the medoid Silhouette directly, by computing (1) the N x k distance matrix to the medoids, (2) finding the two smallest values for each data point, and (3) computing the average of 1-a/b on these (with 0/0 as 0). This can be implemented in a few lines with numpy easily.

Parameters:

diss (ndarray) – square numpy array of dissimilarities
meds (ndarray of int) – medoid indexes (k distinct values in 0 to n-1)
samples (boolean) – whether to return individual samples or not

Returns:

tuple containing the average Medoid Silhouette and the individual samples

Return type:

(float, ndarray)

k-Medoids result object

class kmedoids.KMedoidsResult(loss, labels, medoids, n_iter=None, n_swap=None)

K-medoids clustering result

Parameters:

loss (float) – Loss of this clustering (sum of deviations)
labels (ndarray) – Cluster assignment
medoids (ndarray) – Chosen medoid indexes
n_iter (int) – Number of iterations
n_swap (int) – Number of swaps performed

sklearn-compatible API

class kmedoids.KMedoids(n_clusters, *, metric='precomputed', metric_params=None, method='fasterpam', init='random', max_iter=300, random_state=None)

K-Medoids Clustering using PAM, FasterPAM, and FasterMSC (sklearn-compatible API).

References:

Erich Schubert and Lars Lenssen:
Fast k-medoids Clustering in Rust and Python
Journal of Open Source Software 7(75), 4183
https://doi.org/10.21105/joss.04183 (open access)

Erich Schubert, Peter J. Rousseeuw:
Fast and Eager k-Medoids Clustering
O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms
Information Systems (101), 2021, 101804
https://doi.org/10.1016/j.is.2021.101804 (open access)

Erich Schubert, Peter J. Rousseeuw:
Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
In: 12th International Conference on Similarity Search and Applications (SISAP 2019), 171-187.
https://doi.org/10.1007/978-3-030-32047-8_16
Preprint: https://arxiv.org/abs/1810.05691

Lars Lenssen, Erich Schubert:
Medoid silhouette clustering with automatic cluster number selection
Information Systems (120), 2024, 102290
https://doi.org/10.1016/j.is.2023.102290
Preprint: https://arxiv.org/abs/2309.03751

Lars Lenssen, Erich Schubert:
Clustering by Direct Optimization of the Medoid Silhouette
In: 15th International Conference on Similarity Search and Applications (SISAP 2022).
https://doi.org/10.1007/978-3-031-17849-8_15

Leonard Kaufman, Peter J. Rousseeuw:
Clustering by means of medoids.
In: Dodge Y (ed) Statistical Data Analysis Based on the L 1 Norm and Related Methods, 405-416, 1987

Leonard Kaufman, Peter J. Rousseeuw:
Finding Groups in Data: An Introduction to Cluster Analysis.
John Wiley&Sons, 1990, https://doi.org/10.1002/9780470316801

Mark Van der Laan, Katherine Pollard, Jennifer Bryan:
A new partitioning around medoids algorithm.
In: Journal of Statistical Computation and Simulation, pp 575-584, 2003
https://doi.org/10.1080/0094965031000136012

Parameters:

n_clusters (int) – The number of clusters to form (maximum number of clusters if method=”dynmsc”)
metric (string, default: 'precomputed') – It is recommended to use ‘precomputed’, in particular when experimenting with different n_clusters. If you have sklearn installed, you may pass any metric supported by sklearn.metrics.pairwise_distances.
metric_params (dict, default=None) – Additional keyword arguments for the metric function.
method (string, "fasterpam" (default), "fastpam1", "pam", "alternate", "fastermsc", "fastmsc", "pamsil" or "pammedsil") – Which algorithm to use
init (string, "random" (default), "first" or "build") – initialization method
max_iter (int) – Specify the maximum number of iterations when fitting
random_state (int, RandomState instance or None) – random seed if no medoids are given

Variables:

cluster_centers – None for ‘precomputed’
medoid_indices – The indices of the medoid rows in X
labels – Labels of each point
inertia – Sum of distances of samples to their closest cluster center

fit(X, y=None)

Fit K-Medoids to the provided data.

Parameters:

X ({array-like, sparse matrix}, shape = (n_samples, n_samples)) – Dataset to cluster
y – ignored

Returns:

self

fit_predict(X, y=None)

Predict the closest cluster for each sample in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data
y (Ignored) – Not used, present for API consistency by convention

Returns:

Cluster labels

Return type:

ndarray of shape (n_samples,)

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

Xarray-like of shape (n_samples, n_features): Input samples.
yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None: Target values (None for unsupervised transformations).
**fit_paramsdict: Additional fit parameters.

Returns

X_newndarray array of shape (n_samples, n_features_new): Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

paramsdict: Parameter names mapped to their values.

predict(X)

Predict the closest cluster for each sample in X.

Parameters:: X ({array-like, sparse matrix}, shape = (n_samples, n_samples)) – New data to predict
Returns:: Index of the cluster each sample belongs to
Return type:: array, shape = (n_query,)

set_output(*, transform=None)

Set output container.

See sphx_glr_auto_examples_miscellaneous_plot_set_output.py for an example on how to use the API.

Parameters

transform{“default”, “pandas”}, default=None

Configure output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
None: Transform configuration is unchanged

Returns

selfestimator instance: Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**paramsdict: Estimator parameters.

Returns

selfestimator instance: Estimator instance.

transform(X)

Transforms X to cluster-distance space.

Parameters:: X ({array-like}, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed') – Data to transform
Returns:: X transformed in the new space of distances to cluster centers
Return type:: {array-like}, shape=(n_query, n_clusters)

References

This software has been published in the Journal of Open-Source Software:

Erich Schubert and Lars Lenssen:

Fast k-medoids Clustering in Rust and Python

Journal of Open Source Software 7(75), 4183

https://doi.org/10.21105/joss.04183 (open access)

For further details on the implemented algorithm FasterPAM, see:

Erich Schubert, Peter J. Rousseeuw

Fast and Eager k-Medoids Clustering:

O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms

Information Systems (101), 2021, 101804

https://doi.org/10.1016/j.is.2021.101804 (open access)

an earlier (slower, and now obsolete) version was published as:

Erich Schubert, Peter J. Rousseeuw:

Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

In: 12th International Conference on Similarity Search and Applications (SISAP 2019), 171-187.

https://doi.org/10.1007/978-3-030-32047-8_16

Preprint: https://arxiv.org/abs/1810.05691

For further details on medoid Silhouette clustering with automatic cluster number selection (FasterMSC, DynMSC), see:

Lars Lenssen, Erich Schubert:

Medoid silhouette clustering with automatic cluster number selection

Information Systems (120), 2024, 102290

https://doi.org/10.1016/j.is.2023.102290

Preprint: https://arxiv.org/abs/2309.03751

an earlier version was published as:

Lars Lenssen, Erich Schubert:

Clustering by Direct Optimization of the Medoid Silhouette

In: 15th International Conference on Similarity Search and Applications (SISAP 2022).

https://doi.org/10.1007/978-3-031-17849-8_15

This is a port of the original Java code from ELKI to Rust. The Rust version is then wrapped for use with Python.

If you use this code in scientific work, please cite above papers. Thank you.

License: GPL-3 or later

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.