Clustering

The coclust.clustering module provides clustering algorithms.

class coclust.clustering.SphericalKmeans(n_clusters=2, init=None, max_iter=20, n_init=1, tol=1e-09, random_state=None, weighting=True)

Spherical k-means clustering.

Parameters:

n_clusters (int, optional, default: 2) – Number of clusters to form
init (numpy array or scipy sparse matrix, shape (n_features, n_clusters), optional, default: None) – Initial column labels
max_iter (int, optional, default: 20) – Maximum number of iterations
n_init (int, optional, default: 1) – Number of times the algorithm is run with different initializations. The final result is the best output of the n_init consecutive runs.
random_state (integer or numpy.random.RandomState, optional) – The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
tol (float, default: 1e-9) – Relative tolerance on the criterion used to declare convergence
weighting (boolean, default: True) – Flag to activate or deactivate TF-IDF weighting

labels_: array-like, shape (n_rows,) – cluster label of each row

criterion: float – criterion obtained from the best run

criterions: list of floats – sequence of criterion values during the best run

fit(X, y=None)

Perform clustering.

Parameters:

X (numpy array or scipy sparse matrix, shape=(n_samples, n_features)) – Matrix to be analyzed
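To make the roles of n_clusters, max_iter, n_init and tol concrete, here is a minimal dependency-free sketch of a spherical k-means loop. It is not the coclust implementation: the function name spherical_kmeans_sketch, the seed argument (standing in for random_state), the random label initialization, and the choice of criterion (sum of cosine similarities between each row and its assigned center) are all assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans_sketch(rows, n_clusters=2, max_iter=20, n_init=1,
                            tol=1e-9, seed=None):
    """Hypothetical helper; returns (labels, criterion, criterions) of the best run."""
    rng = random.Random(seed)
    rows = [normalize(r) for r in rows]
    n_features = len(rows[0])
    best = None
    for _ in range(n_init):
        # Assumption: each run starts from a random assignment of rows to clusters.
        labels = [rng.randrange(n_clusters) for _ in rows]
        criterions = []
        for _ in range(max_iter):
            # Each center is the normalized sum of the rows assigned to it.
            sums = [[0.0] * n_features for _ in range(n_clusters)]
            for row, k in zip(rows, labels):
                for j, x in enumerate(row):
                    sums[k][j] += x
            centers = [normalize(s) for s in sums]
            # Reassign every row to the center with the highest cosine similarity.
            sims = [[dot(row, c) for c in centers] for row in rows]
            labels = [max(range(n_clusters), key=s.__getitem__) for s in sims]
            criterion = sum(s[k] for s, k in zip(sims, labels))
            criterions.append(criterion)
            # Stop once the relative change of the criterion is within tol.
            if len(criterions) > 1:
                previous = criterions[-2]
                if abs(criterion - previous) <= tol * abs(previous):
                    break
        # Keep the run whose final criterion is the largest.
        if best is None or criterions[-1] > best[1]:
            best = (labels, criterions[-1], criterions)
    return best


# Two groups of rows that use disjoint features.
data = [[3, 1, 0, 0], [4, 2, 0, 0], [0, 0, 2, 5], [0, 0, 1, 3]]
labels, criterion, criterions = spherical_kmeans_sketch(
    data, n_clusters=2, n_init=20, seed=0)
```

A single run can settle in a poor local optimum (for example, all rows in one cluster), which is why the sketch, like the n_init parameter described above, keeps the best of several runs.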

Spherical k-means

coclust.clustering.spherical_kmeans provides an implementation of the spherical k-means algorithm.

class coclust.clustering.spherical_kmeans.SphericalKmeans(n_clusters=2, init=None, max_iter=20, n_init=1, tol=1e-09, random_state=None, weighting=True)

Spherical k-means clustering.

Parameters:

n_clusters (int, optional, default: 2) – Number of clusters to form
init (numpy array or scipy sparse matrix, shape (n_features, n_clusters), optional, default: None) – Initial column labels
max_iter (int, optional, default: 20) – Maximum number of iterations
n_init (int, optional, default: 1) – Number of times the algorithm is run with different initializations. The final result is the best output of the n_init consecutive runs.
random_state (integer or numpy.random.RandomState, optional) – The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
tol (float, default: 1e-9) – Relative tolerance on the criterion used to declare convergence
weighting (boolean, default: True) – Flag to activate or deactivate TF-IDF weighting

labels_: array-like, shape (n_rows,) – cluster label of each row

criterion: float – criterion obtained from the best run

criterions: list of floats – sequence of criterion values during the best run

fit(X, y=None)

Perform clustering.

Parameters:

X (numpy array or scipy sparse matrix, shape=(n_samples, n_features)) – Matrix to be analyzed
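The weighting flag refers to TF-IDF weighting of the input matrix. The exact formula used by the library is not stated on this page, so the following is only a sketch of one common variant (smoothed inverse document frequency followed by L2 row normalization). The helper name tfidf_rows and the sample counts are hypothetical.

```python
import math


def tfidf_rows(counts):
    """Apply smoothed TF-IDF to a dense count matrix, then L2-normalize each row.

    Assumption: idf = ln((1 + n_rows) / (1 + df)) + 1, a widely used variant.
    """
    n_rows = len(counts)
    n_cols = len(counts[0])
    # Document frequency: number of rows in which each column is non-zero.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_cols)]
    idf = [math.log((1 + n_rows) / (1 + d)) + 1.0 for d in df]
    result = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted))
        result.append([x / norm for x in weighted] if norm > 0 else weighted)
    return result


# Column 0 occurs in every row, column 1 in only one row.
counts = [[2, 1, 0], [1, 0, 3], [1, 0, 0]]
weighted = tfidf_rows(counts)
```

In this variant, a column that appears in every row keeps a weight of 1, while rarer columns are boosted, so rows are compared mostly on their distinctive features.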