clump

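Clustering algorithms for Rust: K-means and mini-batch K-means; DBSCAN, HDBSCAN, and OPTICS for density-based clustering; EVoC for hierarchical clustering; DenStream for streaming data; COP-Kmeans for constrained clustering; and correlation clustering over signed graphs.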

use clump::Dbscan;

let data = vec![vec![0.0, 0.0], vec![0.1, 0.1], vec![10.0, 10.0]];
let labels = Dbscan::new(0.5, 2).fit_predict(&data).unwrap();

Algorithms

Algorithm	Kind	Discovers k	Noise handling	Input
K-means	Centroid	No (k required)	None	`&impl DataRef`
Mini-Batch K-means	Centroid (streaming)	No (k required)	None	`&impl DataRef`
DBSCAN	Density	Yes	Labels noise (`NOISE` sentinel)	`&impl DataRef`
HDBSCAN	Density (hierarchical)	Yes	Labels noise	`&impl DataRef`
DenStream	Density (streaming)	Yes	Decays outliers	`&impl DataRef`
EVoC	Hierarchical	Yes	Near-duplicate detection	`&impl DataRef`
COP-Kmeans	Constrained centroid	No (k required)	None	`&impl DataRef` + constraints
OPTICS	Density (reachability)	Yes	Reachability plot	`&impl DataRef`
Correlation Clustering	Graph-based	Yes	None	`SignedEdge` list
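To make the density-based rows of the table concrete, here is a naive O(n²) DBSCAN sketch in plain Rust. It is not clump's implementation: the function name dbscan_sketch, the NOISE_SKETCH value (usize::MAX), and the convention that min_pts counts the point itself are assumptions made for illustration.

```rust
/// Sentinel label for noise in this sketch (our own choice of value;
/// it only mirrors the idea behind clump::NOISE).
pub const NOISE_SKETCH: usize = usize::MAX;

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Indices of all points within `eps` of point `i`, including `i` itself.
fn region_query(data: &[Vec<f32>], i: usize, eps: f32) -> Vec<usize> {
    (0..data.len())
        .filter(|&j| euclidean(&data[i], &data[j]) <= eps)
        .collect()
}

/// Naive DBSCAN: one label per point, NOISE_SKETCH for noise.
pub fn dbscan_sketch(data: &[Vec<f32>], eps: f32, min_pts: usize) -> Vec<usize> {
    let n = data.len();
    let mut labels: Vec<Option<usize>> = vec![None; n]; // None = not visited yet
    let mut cluster = 0;
    for i in 0..n {
        if labels[i].is_some() {
            continue;
        }
        let neighbors = region_query(data, i, eps);
        if neighbors.len() < min_pts {
            // Provisionally noise; may become a border point of a later cluster.
            labels[i] = Some(NOISE_SKETCH);
            continue;
        }
        // `i` is a core point: start a new cluster and expand it breadth-first.
        labels[i] = Some(cluster);
        let mut queue = neighbors;
        let mut k = 0;
        while k < queue.len() {
            let j = queue[k];
            k += 1;
            if labels[j] == Some(NOISE_SKETCH) {
                labels[j] = Some(cluster); // border point: reachable, not core
            }
            if labels[j].is_some() {
                continue;
            }
            labels[j] = Some(cluster);
            let j_neighbors = region_query(data, j, eps);
            if j_neighbors.len() >= min_pts {
                queue.extend(j_neighbors); // `j` is also core: keep expanding
            }
        }
        cluster += 1;
    }
    labels.into_iter().map(|l| l.unwrap()).collect()
}
```

On the three-point example at the top of this README (eps = 0.5, min_pts = 2), the two nearby points form cluster 0 and the isolated point is labeled noise.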

Quickstart

[dependencies]
clump = "0.5.5"

use clump::{Dbscan, Kmeans};

let data = vec![
    vec![0.0, 0.0],
    vec![0.1, 0.1],
    vec![10.0, 10.0],
    vec![11.0, 11.0],
];

// K-means: returns labels (default: squared Euclidean)
let labels = Kmeans::new(2).with_seed(42).fit_predict(&data).unwrap();
assert_eq!(labels[0], labels[1]);
assert_ne!(labels[0], labels[2]);

// DBSCAN: discovers clusters from density (default: Euclidean)
let labels = Dbscan::new(0.5, 2).fit_predict(&data).unwrap();

Kmeans::fit returns a KmeansFit holding the centroids; call predict on it to label new points. Dbscan::fit_predict marks noise points with the clump::NOISE sentinel label; use fit_predict_with_noise to get Option labels instead.
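A minimal sketch of what "fit, then predict" means for K-means (Lloyd's algorithm): alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points, then reuse the final centroids to label unseen points. KmeansFitSketch and kmeans_sketch are hypothetical names, and seeding from the first k points is a simplification chosen for determinism; clump's actual seeding (controlled by with_seed) is not shown here.

```rust
/// Hypothetical result type: centroids plus `predict` for new points.
pub struct KmeansFitSketch {
    pub centroids: Vec<Vec<f32>>,
}

fn sq_euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl KmeansFitSketch {
    /// Index of the nearest centroid under squared Euclidean distance.
    pub fn predict_one(&self, p: &[f32]) -> usize {
        let mut best = 0;
        let mut best_d = f32::INFINITY;
        for (c, centroid) in self.centroids.iter().enumerate() {
            let d = sq_euclidean(p, centroid);
            if d < best_d {
                best = c;
                best_d = d;
            }
        }
        best
    }

    pub fn predict(&self, points: &[Vec<f32>]) -> Vec<usize> {
        points.iter().map(|p| self.predict_one(p)).collect()
    }
}

/// Lloyd's algorithm, seeded with the first k points (a simplification;
/// implementations commonly use k-means++ seeding instead).
pub fn kmeans_sketch(data: &[Vec<f32>], k: usize, max_iter: usize) -> KmeansFitSketch {
    assert!(k > 0 && k <= data.len(), "need 1 <= k <= n");
    let dim = data[0].len();
    let mut fit = KmeansFitSketch { centroids: data[..k].to_vec() };
    for _ in 0..max_iter {
        // Assignment step: each point goes to its nearest centroid.
        let labels = fit.predict(data);
        // Update step: each centroid moves to the mean of its assigned points.
        let mut sums = vec![vec![0.0f32; dim]; k];
        let mut counts = vec![0usize; k];
        for (p, &l) in data.iter().zip(&labels) {
            counts[l] += 1;
            for (s, x) in sums[l].iter_mut().zip(p) {
                *s += *x;
            }
        }
        let mut moved = false;
        for c in 0..k {
            if counts[c] == 0 {
                continue; // leave an empty cluster's centroid where it is
            }
            let mean: Vec<f32> = sums[c].iter().map(|s| s / counts[c] as f32).collect();
            if mean != fit.centroids[c] {
                moved = true;
            }
            fit.centroids[c] = mean;
        }
        if !moved {
            break; // converged: centroids, and hence assignments, are stable
        }
    }
    fit
}
```

With the Quickstart data and k = 2, this converges in a few iterations to one centroid near (0.05, 0.05) and one near (10.5, 10.5).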

Zero-copy flat input

All algorithms except correlation clustering (which takes a SignedEdge list) accept &impl DataRef. Pass a Vec<Vec<f32>>, or use FlatRef to wrap a flat buffer without copying:

use clump::{FlatRef, Kmeans};

// Four 2-D points packed into one row-major buffer
let flat = vec![0.0f32, 0.0, 0.1, 0.1, 10.0, 10.0, 10.1, 10.1];
let data = FlatRef::new(&flat, 4, 2);
let labels = Kmeans::new(2).with_seed(42).fit_predict(&data).unwrap();
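If FlatRef follows the usual row-major convention (an assumption; its internals are not documented here), point i of an n × d buffer is the slice flat[i*d..(i+1)*d], so reading a point never copies. The hypothetical FlatView below illustrates that layout:

```rust
/// Hypothetical borrowed, row-major view over a flat buffer:
/// point `i` occupies `data[i * dim..(i + 1) * dim]`.
pub struct FlatView<'a> {
    data: &'a [f32],
    n: usize,
    dim: usize,
}

impl<'a> FlatView<'a> {
    pub fn new(data: &'a [f32], n: usize, dim: usize) -> Self {
        assert!(dim > 0, "dimension must be positive");
        assert_eq!(data.len(), n * dim, "buffer length must equal n * dim");
        FlatView { data, n, dim }
    }

    pub fn len(&self) -> usize {
        self.n
    }

    pub fn is_empty(&self) -> bool {
        self.n == 0
    }

    /// Borrow point `i` as a slice; no allocation or copy.
    pub fn row(&self, i: usize) -> &'a [f32] {
        let data: &'a [f32] = self.data;
        &data[i * self.dim..(i + 1) * self.dim]
    }

    /// Iterate over all points as slices of length `dim`.
    pub fn rows(&self) -> impl Iterator<Item = &'a [f32]> {
        let data: &'a [f32] = self.data;
        data.chunks_exact(self.dim)
    }
}
```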

Streaming clustering

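use clump::MiniBatchKmeans;

// Two small example batches of 2-D points arriving one after another
let batch1 = vec![vec![0.0, 0.0], vec![0.1, 0.1], vec![5.0, 5.0], vec![10.0, 10.0]];
let batch2 = vec![vec![0.2, 0.1], vec![5.1, 4.9], vec![10.1, 9.9]];

let mut mbk = MiniBatchKmeans::new(3).with_seed(42);
mbk.update_batch(&batch1).unwrap();
mbk.update_batch(&batch2).unwrap();
// Centroids available via mbk.centroids()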
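The mini-batch update itself is small. The sketch below follows the per-center learning-rate rule from Sculley's web-scale k-means (each centroid moves toward an assigned point with rate 1/count). That is a common formulation but not necessarily the one clump uses; MiniBatchSketch is a hypothetical type, and initial centroids are passed in rather than seeded.

```rust
/// Hypothetical mini-batch k-means state (not clump's API).
pub struct MiniBatchSketch {
    pub centroids: Vec<Vec<f32>>,
    counts: Vec<usize>, // number of points each centroid has absorbed
}

fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn nearest(centroids: &[Vec<f32>], p: &[f32]) -> usize {
    (0..centroids.len())
        .min_by(|&a, &b| sq_dist(p, &centroids[a]).total_cmp(&sq_dist(p, &centroids[b])))
        .expect("at least one centroid")
}

impl MiniBatchSketch {
    /// Starts from caller-supplied centroids (seeding is out of scope here).
    pub fn new(initial: Vec<Vec<f32>>) -> Self {
        assert!(!initial.is_empty(), "need at least one centroid");
        let k = initial.len();
        MiniBatchSketch { centroids: initial, counts: vec![0; k] }
    }

    pub fn update_batch(&mut self, batch: &[Vec<f32>]) {
        // 1. Assign every point in the batch using the current centroids.
        let assignments: Vec<usize> =
            batch.iter().map(|p| nearest(&self.centroids, p)).collect();
        // 2. Move each assigned centroid toward its point with rate 1 / count.
        for (p, &c) in batch.iter().zip(&assignments) {
            self.counts[c] += 1;
            let eta = 1.0 / self.counts[c] as f32;
            for (m, x) in self.centroids[c].iter_mut().zip(p) {
                *m = (1.0 - eta) * *m + eta * *x;
            }
        }
    }
}
```

Because the rate is 1/count, each centroid is the running mean of all points it has absorbed so far, so updates shrink as more batches arrive.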

Constrained clustering

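use clump::{CopKmeans, Constraint};

// Same four points as in the Quickstart
let data = vec![
    vec![0.0, 0.0],
    vec![0.1, 0.1],
    vec![10.0, 10.0],
    vec![11.0, 11.0],
];

let constraints = vec![
    Constraint::MustLink(0, 1),   // points 0 and 1 must share a cluster
    Constraint::CannotLink(0, 2), // points 0 and 2 must be in different clusters
];
let labels = CopKmeans::new(2)
    .with_seed(42)
    .fit_predict_constrained(&data, &constraints)
    .unwrap();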
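COP-Kmeans (Wagstaff et al., 2001) differs from plain K-means only in the assignment step: each point goes to the nearest centroid that does not break a must-link or cannot-link constraint with an already-assigned point, and the run fails if no such centroid exists. A sketch of that step, using a hypothetical local Pair enum shaped like the Constraint variants above:

```rust
/// Hypothetical pairwise constraint between point indices.
#[derive(Clone, Copy, Debug)]
pub enum Pair {
    MustLink(usize, usize),
    CannotLink(usize, usize),
}

/// Would putting point `i` into `cluster` break a constraint, given the
/// labels assigned so far in this pass (`None` = not yet assigned)?
pub fn violates(i: usize, cluster: usize, labels: &[Option<usize>], constraints: &[Pair]) -> bool {
    constraints.iter().any(|&c| {
        let (a, b, must) = match c {
            Pair::MustLink(a, b) => (a, b, true),
            Pair::CannotLink(a, b) => (a, b, false),
        };
        // The partner of `i` in this constraint, if `i` is involved at all.
        let other = if a == i { b } else if b == i { a } else { return false };
        match labels[other] {
            None => false,                   // partner unassigned: no conflict yet
            Some(l) if must => l != cluster, // must-link partner sits elsewhere
            Some(l) => l == cluster,         // cannot-link partner already here
        }
    })
}

/// Assignment step: nearest feasible centroid per point, or None if some
/// point has no feasible cluster (COP-Kmeans then reports failure).
pub fn assign_constrained(points: &[Vec<f32>], centroids: &[Vec<f32>], constraints: &[Pair]) -> Option<Vec<usize>> {
    let mut labels: Vec<Option<usize>> = vec![None; points.len()];
    for i in 0..points.len() {
        // Rank clusters by squared distance, nearest first.
        let mut order: Vec<(f32, usize)> = centroids
            .iter()
            .enumerate()
            .map(|(c, m)| (points[i].iter().zip(m).map(|(x, y)| (x - y) * (x - y)).sum::<f32>(), c))
            .collect();
        order.sort_by(|a, b| a.0.total_cmp(&b.0));
        let c = order.iter().map(|&(_, c)| c).find(|&c| !violates(i, c, &labels, constraints))?;
        labels[i] = Some(c);
    }
    Some(labels.into_iter().map(|l| l.unwrap()).collect())
}
```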

Correlation clustering

use clump::{CorrelationClustering, SignedEdge};

let edges = vec![
    SignedEdge { i: 0, j: 1, weight: 1.0 },   // similar
    SignedEdge { i: 0, j: 2, weight: -1.0 },   // dissimilar
];
let result = CorrelationClustering::new().fit(3, &edges).unwrap();
let labels = result.labels;
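This README does not say which correlation clustering algorithm clump uses. As a point of reference, the sketch below implements the simple pivot method (KwikCluster, from Ailon, Charikar, and Newman): take an unclustered item as pivot and group it with every unclustered item joined to it by a positive edge. Edge and pivot_cluster are hypothetical names; missing edges are treated as "not similar", and pivots are taken in index order so the output is deterministic, whereas the expected 3-approximation guarantee is for uniformly random pivot order on complete graphs with ±1 weights.

```rust
/// Hypothetical signed edge: positive weight = similar, negative = dissimilar.
pub struct Edge {
    pub i: usize,
    pub j: usize,
    pub weight: f64,
}

/// Pivot-based correlation clustering over `n` items.
pub fn pivot_cluster(n: usize, edges: &[Edge]) -> Vec<usize> {
    // Adjacency lists containing only positive ("similar") neighbors.
    let mut positive = vec![Vec::new(); n];
    for e in edges {
        if e.weight > 0.0 {
            positive[e.i].push(e.j);
            positive[e.j].push(e.i);
        }
    }
    let mut labels: Vec<Option<usize>> = vec![None; n];
    let mut next = 0;
    for pivot in 0..n {
        if labels[pivot].is_some() {
            continue; // already absorbed by an earlier pivot
        }
        labels[pivot] = Some(next);
        for &v in &positive[pivot] {
            if labels[v].is_none() {
                labels[v] = Some(next);
            }
        }
        next += 1;
    }
    labels.into_iter().map(|l| l.unwrap()).collect()
}
```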

See also edges_from_distances, which builds signed edges from a distance matrix.
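The exact rule edges_from_distances applies is not documented here. One plausible construction, shown purely as a hypothetical sketch, thresholds a symmetric distance matrix: pairs closer than a cutoff get a positive weight, pairs farther away a negative one, with magnitude equal to the gap from the cutoff.

```rust
/// Hypothetical signed edge (same shape as the SignedEdge example above).
#[derive(Debug, PartialEq)]
pub struct Edge {
    pub i: usize,
    pub j: usize,
    pub weight: f64,
}

/// Hypothetical helper: weight = threshold - distance for each pair i < j.
pub fn signed_edges_from_distances(dist: &[Vec<f64>], threshold: f64) -> Vec<Edge> {
    let n = dist.len();
    let mut edges = Vec::new();
    for i in 0..n {
        // Upper triangle only: the matrix is assumed symmetric.
        for j in (i + 1)..n {
            let weight = threshold - dist[i][j];
            // A pair exactly at the threshold carries no signal, so skip it.
            if weight != 0.0 {
                edges.push(Edge { i, j, weight });
            }
        }
    }
    edges
}
```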

Distance metrics

The point-based algorithms are generic over DistanceMetric. Built-in metrics: SquaredEuclidean, Euclidean, CosineDistance, InnerProductDistance, and CompositeDistance. Call with_metric on any of these algorithms to swap the metric. For a custom metric, implement DistanceMetric, which has a single method: fn distance(&self, a: &[f32], b: &[f32]) -> f32.
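A custom metric is a single method. The sketch below defines a local Metric trait with exactly the signature quoted above and implements Manhattan (L1) distance for it. In real code you would implement clump::DistanceMetric instead, which may carry bounds (for example Send or Sync) not listed in this README, and pass the metric via with_metric; nearest_index is a hypothetical helper showing generic use.

```rust
/// Local stand-in trait with the one-method signature from the README.
pub trait Metric {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32;
}

/// Manhattan (L1) distance: sum of absolute coordinate differences.
pub struct Manhattan;

impl Metric for Manhattan {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        debug_assert_eq!(a.len(), b.len());
        a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
    }
}

/// Code written against the trait works with any metric, e.g. a
/// nearest-point search (hypothetical helper).
pub fn nearest_index<M: Metric>(metric: &M, query: &[f32], points: &[Vec<f32>]) -> Option<usize> {
    points
        .iter()
        .enumerate()
        .map(|(i, p)| (i, metric.distance(query, p)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```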

Features

Optional Cargo features: parallel (Rayon), gpu (Metal-accelerated k-means, macOS only), serde, ndarray (Array2 conversions), simd (NEON/AVX2/AVX-512 distance kernels).

Examples

Example	What it shows
`quickstart`	K-means and DBSCAN on synthetic data
`clustering`	Multiple algorithms on the same dataset, label comparison
`streaming`	Mini-Batch K-means and DenStream on streaming data
`evaluation`	Silhouette score, cluster quality metrics
`flat_input`	Zero-copy `FlatRef` input from raw `&[f32]`

cargo run --example quickstart
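The evaluation example covers the silhouette score. For reference, here is a self-contained sketch of the standard definition; it is not clump's implementation, silhouette_sketch is a hypothetical name, and it assumes Euclidean distance and labels 0..k with at least two clusters and no noise sentinel.

```rust
fn euclid(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Mean silhouette over all points, O(n^2). For point i:
/// a = mean distance to the other members of its cluster,
/// b = smallest mean distance to the members of any other cluster,
/// s(i) = (b - a) / max(a, b); singleton clusters contribute 0.
pub fn silhouette_sketch(data: &[Vec<f32>], labels: &[usize]) -> f32 {
    let n = data.len();
    let k = labels.iter().max().map_or(0, |&m| m + 1);
    assert!(k >= 2, "silhouette needs at least two clusters");
    let mut total = 0.0;
    for i in 0..n {
        // Sum of distances and member counts from point i to each cluster.
        let mut sum = vec![0.0f32; k];
        let mut count = vec![0usize; k];
        for j in 0..n {
            if i != j {
                sum[labels[j]] += euclid(&data[i], &data[j]);
                count[labels[j]] += 1;
            }
        }
        let own = labels[i];
        if count[own] == 0 {
            continue; // singleton cluster: s(i) = 0
        }
        let a = sum[own] / count[own] as f32;
        let b = (0..k)
            .filter(|&c| c != own && count[c] > 0)
            .map(|c| sum[c] / count[c] as f32)
            .fold(f32::INFINITY, f32::min);
        let denom = a.max(b);
        if denom > 0.0 && denom.is_finite() {
            total += (b - a) / denom;
        }
    }
    total / n as f32
}
```

Scores near 1 indicate compact, well-separated clusters; negative scores indicate points that sit closer to another cluster than to their own.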

Benchmarks

benches/comparison.rs contains head-to-head benchmark scaffolding against linfa-clustering for k-means and DBSCAN. Comparative numbers across all algorithms are still a TODO: the algorithms are implemented and tested, but aggregate results have not yet been run and recorded.

License

MIT OR Apache-2.0
