Scripts

The input matrix can be a Matlab file or a text file. For a Matlab file, the key under which the matrix is stored must be given. For a text file, each line describes one matrix entry using three columns: the row index, the column index and the value. The column separator is set by a script parameter.
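
The three-column text layout can be illustrated with a short Python sketch that writes a few entries and reads them back. The file name, the helper names and the use of zero-based indices are assumptions made for illustration; the indexing convention expected by the scripts is not stated here.

```python
import csv
import os
import tempfile


def write_triplets(path, entries, sep=","):
    """Write (row, column, value) triplets, one matrix entry per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=sep)
        for row, col, value in entries:
            writer.writerow([row, col, value])


def read_triplets(path, sep=","):
    """Read the triplets back as a {(row, column): value} dictionary."""
    matrix = {}
    with open(path, newline="") as handle:
        for row, col, value in csv.reader(handle, delimiter=sep):
            matrix[(int(row), int(col))] = float(value)
    return matrix


# Hypothetical 3 x 3 matrix with four non-zero entries (zero-based indices
# are an assumption, not something the scripts are known to require).
entries = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    write_triplets(path, entries, sep=",")
    loaded = read_triplets(path, sep=",")

print(loaded[(1, 1)])
```

Passing a different `sep` to both helpers produces a file with another separator, which would then have to be declared to the scripts through the separator parameter.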

Perform co-clustering: the coclust script

The coclust script runs a co-clustering algorithm on a data matrix. The algorithm is selected by the first argument given to coclust. The choices are:

  • modularity
  • specmodularity
  • info

The following command line shows how to run the CoclustMod algorithm three times on a matrix contained in a Matlab file whose matrix key is the string ‘fea’. The computed row labels are to be stored in a file called cstr-rows.txt:

coclust modularity -k fea --n_coclusters 4 --output_row_labels cstr-rows.txt --n_runs 3 cstr.mat
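
When the script has to be called from Python, the same command line can be assembled as an argument list. The sketch below only builds and prints the command; `build_coclust_command` is a hypothetical helper, not part of coclust, and it only handles one-letter and double-dash options.

```python
import shlex


def build_coclust_command(algorithm, input_matrix, **options):
    """Return the argument list for a coclust call (hypothetical helper)."""
    args = ["coclust", algorithm]
    for name, value in options.items():
        # One-letter names become "-k"; longer names become "--n_runs".
        flag = ("-" if len(name) == 1 else "--") + name
        args += [flag, str(value)]
    args.append(input_matrix)
    return args


command = build_coclust_command(
    "modularity",
    "cstr.mat",
    k="fea",
    n_coclusters=4,
    output_row_labels="cstr-rows.txt",
    n_runs=3,
)
print(shlex.join(command))

# To actually run it (requires the coclust script on the PATH):
# import subprocess
# subprocess.run(command, check=True)
```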

To list all the parameters available for a given algorithm, use the -h option as in the following example:

coclust modularity -h
usage: coclust [-h] {modularity,specmodularity,info} ...

Positional Arguments

subparser_name

Possible choices: modularity, specmodularity, info

choose the algorithm to use

Sub-commands:

modularity

use the modularity based algorithm

coclust modularity [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
                   [--output_row_labels OUTPUT_ROW_LABELS]
                   [--output_column_labels OUTPUT_COLUMN_LABELS]
                   [--output_fuzzy_row_labels OUTPUT_FUZZY_ROW_LABELS]
                   [--output_fuzzy_column_labels OUTPUT_FUZZY_COLUMN_LABELS]
                   [--convergence_plot CONVERGENCE_PLOT]
                   [--reorganized_matrix REORGANIZED_MATRIX] [-n N_COCLUSTERS]
                   [-m MAX_ITER] [-e EPSILON]
                   [-i INIT_ROW_LABELS | --n_runs N_RUNS] [--seed SEED]
                   [-l TRUE_ROW_LABELS] [--visu]
                   INPUT_MATRIX
input
INPUT_MATRIX matrix file path
-k, --matlab_matrix_key
 if not set, csv input is considered
-sep, --csv_sep

if not set, “,” is considered; use “\t” for tab-separated values

Default: “,”

output
--output_row_labels
 file path for the predicted row labels
--output_column_labels
 file path for the predicted column labels
--output_fuzzy_row_labels

file path for the predicted fuzzy row labels

Default: 2

--output_fuzzy_column_labels

file path for the predicted fuzzy column labels

Default: 2

--convergence_plot
 file path for the convergence plot
--reorganized_matrix
 file path for the reorganized matrix
algorithm parameters
-n, --n_coclusters

number of co-clusters

Default: 2

-m, --max_iter

maximum number of iterations

Default: 15

-e, --epsilon

stop if the criterion (modularity) variation in an iteration is less than EPSILON

Default: 1e-09

-i, --init_row_labels
 file containing the initial row labels, if not set random initialization is performed
--n_runs

number of runs

Default: 1

--seed
 set the random state, useful for reproducible results
evaluation parameters
-l, --true_row_labels
 file containing the true row labels
--visu

Plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib).

Default: False
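
The interaction between -m and -e can be pictured with the following sketch: iterate at most MAX_ITER times and stop early once the criterion changes by less than EPSILON. This is an illustration of the rule as described above, not the code used by coclust; `step` is a hypothetical callable standing in for one iteration of the algorithm.

```python
def iterate_until_converged(step, max_iter=15, epsilon=1e-9):
    """Call step() until the criterion changes by less than epsilon.

    `step` is a hypothetical callable that performs one iteration and
    returns the new criterion value.
    """
    previous = float("-inf")
    history = []
    for _ in range(max_iter):
        current = step()
        history.append(current)
        if abs(current - previous) < epsilon:
            break  # variation below epsilon: consider the run converged
        previous = current
    return history


# Toy criterion that increases and then plateaus.
values = iter([0.10, 0.25, 0.31, 0.31, 0.31])
history = iterate_until_converged(lambda: next(values), max_iter=15, epsilon=1e-9)
print(history)
```

With these toy values the loop stops at the fourth iteration, the first one whose criterion equals the previous value.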

specmodularity

use the spectral modularity based algorithm

coclust specmodularity [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
                       [--output_row_labels OUTPUT_ROW_LABELS]
                       [--output_column_labels OUTPUT_COLUMN_LABELS]
                       [--reorganized_matrix REORGANIZED_MATRIX]
                       [-n N_COCLUSTERS] [-m MAX_ITER] [-e EPSILON]
                       [--n_runs N_RUNS] [--seed SEED] [-l TRUE_ROW_LABELS]
                       [--visu]
                       INPUT_MATRIX
input
INPUT_MATRIX matrix file path
-k, --matlab_matrix_key
 if not set, csv input is considered
-sep, --csv_sep

if not set, “,” is considered; use “\t” for tab-separated values

Default: “,”

output
--output_row_labels
 file path for the predicted row labels
--output_column_labels
 file path for the predicted column labels
--reorganized_matrix
 file path for the reorganized matrix
algorithm parameters
-n, --n_coclusters

number of co-clusters

Default: 2

-m, --max_iter

maximum number of iterations

Default: 15

-e, --epsilon

stop if the criterion (modularity) variation in an iteration is less than EPSILON

Default: 1e-09

--n_runs

number of runs

Default: 1

--seed
 set the random state, useful for reproducible results
evaluation parameters
-l, --true_row_labels
 file containing the true row labels
--visu

Plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib).

Default: False
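
The --n_runs and --seed options are typically combined: several randomly initialized runs are performed and a fixed seed makes the whole sequence repeatable. The sketch below assumes that the run with the highest criterion is kept, which is a common convention but is not stated on this page; `run_once` and `toy_run` are hypothetical stand-ins for a real co-clustering run.

```python
import random


def best_of_runs(run_once, n_runs=1, seed=None):
    """Run `run_once` n_runs times and keep the result with the best criterion.

    `run_once(rng)` is a hypothetical callable returning (criterion, labels).
    """
    rng = random.Random(seed)  # a single seeded generator drives all runs
    best = None
    for _ in range(n_runs):
        criterion, labels = run_once(rng)
        if best is None or criterion > best[0]:
            best = (criterion, labels)
    return best


def toy_run(rng):
    """Stand-in for one randomly initialized co-clustering run."""
    labels = [rng.randrange(2) for _ in range(6)]
    criterion = sum(labels) / len(labels)  # arbitrary score, for illustration
    return criterion, labels


first = best_of_runs(toy_run, n_runs=3, seed=1)
second = best_of_runs(toy_run, n_runs=3, seed=1)
print(first == second)
```

Because both calls use the same seed, they return the same result, which is the behaviour --seed is meant to provide.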

info

Undocumented

coclust info [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
             [--output_row_labels OUTPUT_ROW_LABELS]
             [--output_column_labels OUTPUT_COLUMN_LABELS]
             [--reorganized_matrix REORGANIZED_MATRIX] [-K N_ROW_CLUSTERS]
             [-L N_COL_CLUSTERS] [-m MAX_ITER] [-e EPSILON]
             [-i INIT_ROW_LABELS | --n_runs N_RUNS] [--seed SEED]
             [-l TRUE_ROW_LABELS] [--visu]
             INPUT_MATRIX
input
INPUT_MATRIX matrix file path
-k, --matlab_matrix_key
 if not set, csv input is considered
-sep, --csv_sep

if not set, “,” is considered; use “\t” for tab-separated values

Default: “,”

output
--output_row_labels
 file path for the predicted row labels
--output_column_labels
 file path for the predicted column labels
--reorganized_matrix
 file path for the reorganized matrix
algorithm parameters
-K, --n_row_clusters

number of row clusters

Default: 2

-L, --n_col_clusters

number of column clusters

Default: 2

-m, --max_iter

maximum number of iterations

Default: 15

-e, --epsilon

stop if the criterion (modularity) variation in an iteration is less than EPSILON

Default: 1e-09

-i, --init_row_labels
 file containing the initial row labels, if not set random initialization is performed
--n_runs

number of runs

Default: 1

--seed
 set the random state, useful for reproducible results
evaluation parameters
-l, --true_row_labels
 file containing the true row labels
--visu

Plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib).

Default: False
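
The idea behind the reorganized matrix produced by --reorganized_matrix and --visu can be sketched as follows: rows and columns are reordered so that those sharing a label become contiguous, which makes the co-clusters appear as blocks. This is a minimal illustration on a small dense matrix; how the scripts store or plot the result is not described here, and the `reorganize` helper is hypothetical.

```python
def reorganize(matrix, row_labels, col_labels):
    """Reorder rows and columns so that equal labels are contiguous."""
    row_order = sorted(range(len(row_labels)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_labels[j])
    return [[matrix[i][j] for j in col_order] for i in row_order]


matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
row_labels = [0, 1, 0, 1]
col_labels = [0, 1, 0, 1]

reordered = reorganize(matrix, row_labels, col_labels)
for row in reordered:
    print(row)
```

After reordering, the two co-clusters of this toy matrix form two diagonal blocks of ones.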

Detect the best number of co-clusters: the coclust-nb script

coclust-nb detects the number of co-clusters giving the best modularity score. It therefore relies on the CoclustMod algorithm. This is a simple yet often effective way to determine the appropriate number of co-clusters. Sample usage is given below:

coclust-nb cstr.csv --seed=1 --n_runs=20 --max_iter=60 --from 2 --to 6
usage: coclust-nb [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
                  [--output_row_labels OUTPUT_ROW_LABELS]
                  [--output_column_labels OUTPUT_COLUMN_LABELS]
                  [--reorganized_matrix REORGANIZED_MATRIX] [--from FROM]
                  [--to TO] [-m MAX_ITER] [-e EPSILON] [--n_runs N_RUNS]
                  [--seed SEED] [--visu]
                  INPUT_MATRIX
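
The selection rule can be sketched in Python as follows: compute a modularity score for each candidate number of co-clusters and keep the best one. The sketch uses the usual bipartite modularity, Q = (1/N) Σ (a_ij − a_i. a_.j / N) over the cells whose row and column share a label, with N the sum of all entries; it is assumed, not confirmed here, that the script's criterion has this form. The candidate partitions are written by hand for illustration, whereas the script would obtain them by running the algorithm.

```python
def modularity(matrix, row_labels, col_labels):
    """Bipartite modularity of a co-clustering given as row/column labels."""
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    score = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if row_labels[i] == col_labels[j]:
                # Observed value minus the value expected from the margins.
                score += value - row_sums[i] * col_sums[j] / total
    return score / total


matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

# Hypothetical partitions for each candidate number of co-clusters;
# each entry is (row_labels, col_labels).
candidates = {
    2: ([0, 1, 0, 1], [0, 1, 0, 1]),
    3: ([0, 1, 0, 2], [0, 1, 0, 2]),
    4: ([0, 1, 2, 3], [0, 1, 2, 3]),
}

scores = {k: modularity(matrix, rows, cols) for k, (rows, cols) in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

On this toy matrix the two-block partition obtains the highest score, so 2 is selected.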

input

INPUT_MATRIX matrix file path
-k, --matlab_matrix_key
 if not set, csv input is considered
-sep, --csv_sep

if not set, “,” is considered; use “\t” for tab-separated values

Default: “,”

output

--output_row_labels
 file path for the predicted row labels
--output_column_labels
 file path for the predicted column labels
--reorganized_matrix
 file path for the reorganized matrix

algorithm parameters

--from

minimum number of co-clusters

Default: 2

--to

maximum number of co-clusters

Default: 10

-m, --max_iter

maximum number of iterations

Default: 15

-e, --epsilon

stop if the criterion (modularity) variation in an iteration is less than EPSILON

Default: 1e-09

--n_runs

number of runs

Default: 1

--seed
 set the random state, useful for reproducible results

evaluation parameters

--visu

Plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib).

Default: False