Clustering

The coclust.clustering module provides clustering algorithms.

class coclust.clustering.SphericalKmeans(n_clusters=2, init=None, max_iter=20, n_init=1, tol=1e-09, random_state=None, weighting=True)[source]

Spherical k-means clustering.

Parameters:
  • n_clusters (int, optional, default: 2) – Number of clusters to form
  • init (numpy array or scipy sparse matrix, shape (n_features, n_clusters), optional, default: None) – Initial column labels
  • max_iter (int, optional, default: 20) – Maximum number of iterations
  • n_init (int, optional, default: 1) – Number of times the algorithm is run with different initializations. The final result is the best output of the n_init consecutive runs.
  • random_state (integer or numpy.random.RandomState, optional) – The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
  • tol (float, default: 1e-9) – Relative tolerance on the criterion, used to declare convergence
  • weighting (boolean, default: True) – Flag to activate or deactivate TF-IDF weighting
labels_

array-like, shape (n_rows,) – cluster label of each row

criterion

float – criterion obtained from the best run

criterions

list of floats – sequence of criterion values during the best run

fit(X, y=None)[source]

Perform clustering.

Parameters: X (numpy array or scipy sparse matrix, shape (n_samples, n_features)) – Matrix to be analyzed
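
For orientation, the following is a minimal NumPy sketch of a generic spherical k-means iteration, showing how n_clusters, max_iter, tol and random_state typically interact. It is not coclust's implementation: the function name spherical_kmeans_sketch, the random-row initialization, the dense input, and the exact form of the criterion (sum of cosine similarities between rows and their assigned centers) are assumptions made for illustration.

```python
import numpy as np


def spherical_kmeans_sketch(X, n_clusters=2, max_iter=20, tol=1e-9, random_state=None):
    """Illustrative single-run spherical k-means on a dense array (hypothetical helper)."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)

    # Project every row onto the unit sphere; all-zero rows are left as zeros.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)

    # Assumed initialization: pick n_clusters distinct rows as starting centers.
    centers = X[rng.choice(X.shape[0], size=n_clusters, replace=False)].copy()

    labels = np.zeros(X.shape[0], dtype=int)
    criterions = []
    for _ in range(max_iter):
        # Rows and centers have unit length, so the dot product is the cosine similarity.
        similarities = X @ centers.T
        labels = similarities.argmax(axis=1)
        criterions.append(float(similarities.max(axis=1).sum()))

        # New center = normalized sum of member rows; empty clusters keep their old center.
        for k in range(n_clusters):
            total = X[labels == k].sum(axis=0)
            length = np.linalg.norm(total)
            if length > 0:
                centers[k] = total / length

        # Stop once the relative change of the criterion falls below tol.
        if len(criterions) > 1:
            previous, current = criterions[-2], criterions[-1]
            if abs(current - previous) <= tol * max(abs(previous), 1e-12):
                break

    return labels, criterions


X = np.array([
    [3.0, 0.0, 0.0],
    [2.0, 0.1, 0.0],
    [0.0, 0.0, 4.0],
    [0.0, 0.2, 5.0],
])
labels, criterions = spherical_kmeans_sketch(X, n_clusters=2, random_state=0)
```

In this sketch the recorded criterion values never decrease from one iteration to the next, which is why a small relative change can serve as the stopping rule. Running it several times with different seeds and keeping the run with the highest final criterion would mimic the role of n_init.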

Spherical k-means

coclust.clustering.spherical_kmeans provides an implementation of the spherical k-means algorithm.

class coclust.clustering.spherical_kmeans.SphericalKmeans(n_clusters=2, init=None, max_iter=20, n_init=1, tol=1e-09, random_state=None, weighting=True)[source]

Spherical k-means clustering.

Parameters:
  • n_clusters (int, optional, default: 2) – Number of clusters to form
  • init (numpy array or scipy sparse matrix, shape (n_features, n_clusters), optional, default: None) – Initial column labels
  • max_iter (int, optional, default: 20) – Maximum number of iterations
  • n_init (int, optional, default: 1) – Number of times the algorithm is run with different initializations. The final result is the best output of the n_init consecutive runs.
  • random_state (integer or numpy.random.RandomState, optional) – The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
  • tol (float, default: 1e-9) – Relative tolerance on the criterion, used to declare convergence
  • weighting (boolean, default: True) – Flag to activate or deactivate TF-IDF weighting
labels_

array-like, shape (n_rows,) – cluster label of each row

criterion

float – criterion obtained from the best run

criterions

list of floats – sequence of criterion values during the best run

fit(X, y=None)[source]

Perform clustering.

Parameters: X (numpy array or scipy sparse matrix, shape (n_samples, n_features)) – Matrix to be analyzed
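
The weighting flag refers to TF-IDF weighting of the input matrix. As a rough illustration of what such a preprocessing step does, the sketch below applies one common TF-IDF variant (the smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1) and then scales each row to unit length. The helper name tfidf_unit_rows and the choice of this particular formula are assumptions; the source does not state which variant coclust applies.

```python
import numpy as np


def tfidf_unit_rows(counts):
    """Smoothed TF-IDF weighting followed by L2 row normalization (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    n_rows = counts.shape[0]

    # Document frequency: number of rows in which each feature occurs.
    doc_freq = (counts > 0).sum(axis=0)

    # Smoothed inverse document frequency; rarer features get larger weights.
    idf = np.log((1.0 + n_rows) / (1.0 + doc_freq)) + 1.0
    weighted = counts * idf

    # Scale each row to unit length so that dot products are cosine similarities.
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    return weighted / np.where(norms > 0, norms, 1.0)


counts = np.array([
    [2, 1, 0],
    [1, 0, 3],
    [4, 0, 0],
])
weighted = tfidf_unit_rows(counts)
row_lengths = np.linalg.norm(weighted, axis=1)
```

After this transformation every non-empty row lies on the unit sphere, which is the setting spherical k-means assumes when it compares rows by cosine similarity.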