Scripts

The input matrix can be a Matlab file or a text file. For a Matlab file, you must give the key under which the matrix is stored (option -k). For a text file, each line describes one entry of the matrix in three columns: the row index, the column index and the value. The column separator is set with the -sep option.
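A minimal sketch of the three-column text format follows. It assumes zero-based indices, because this page does not state the index base. It also writes every entry, zeros included. The helpers `write_triples` and `read_triples` are hypothetical and not part of coclust. For tab-separated files, pass `sep="\t"`.

```python
import csv
import io

def write_triples(dense, out, sep=","):
    """Write every entry of a dense matrix (a list of rows) as one
    'row<sep>column<sep>value' line. Zero-based indices are an assumption."""
    writer = csv.writer(out, delimiter=sep, lineterminator="\n")
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            writer.writerow([i, j, value])

def read_triples(src, sep=","):
    """Parse 'row<sep>column<sep>value' lines into a {(row, column): value} dict."""
    entries = {}
    for fields in csv.reader(src, delimiter=sep):
        if not fields:
            continue  # ignore blank lines
        entries[(int(fields[0]), int(fields[1]))] = float(fields[2])
    return entries

buffer = io.StringIO()
write_triples([[1, 0, 2], [0, 3, 0]], buffer)
print(buffer.getvalue(), end="")  # six lines: 0,0,1 / 0,1,0 / 0,2,2 / 1,0,0 / 1,1,3 / 1,2,0
buffer.seek(0)
print(read_triples(buffer)[(1, 1)])  # 3.0
```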

Perform co-clustering: the coclust script

The coclust script runs a co-clustering algorithm on a data matrix. The first argument to coclust selects the algorithm. The choices are:

modularity
specmodularity
info

The following command runs the CoclustMod algorithm three times (--n_runs 3) to find 4 co-clusters in a matrix. The matrix is stored in the Matlab file cstr.mat under the key ‘fea’. The computed row labels are written to the file cstr-rows.txt:

coclust modularity -k fea --n_coclusters 4 --output_row_labels cstr-rows.txt --n_runs 3 cstr.mat
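You may need to repeat such a command, for instance over several matrices. In that case you can build it from Python. The sketch below uses a hypothetical helper, `coclust_command`, which only assembles the argument list from the long option names documented on this page. Actually running the list with `subprocess.run` requires coclust to be installed.

```python
def coclust_command(algorithm, input_matrix, **options):
    """Build the argument list for a coclust call (hypothetical helper).

    Each keyword is a long option name from this page, e.g. n_runs=3
    becomes ['--n_runs', '3']. None/False values are skipped and True
    adds a bare flag such as --visu. INPUT_MATRIX is placed last.
    """
    args = ["coclust", algorithm]
    for name, value in options.items():
        if value is None or value is False:
            continue
        if value is True:
            args.append("--" + name)
        else:
            args.extend(["--" + name, str(value)])
    args.append(input_matrix)  # positional argument comes after the options
    return args

cmd = coclust_command("modularity", "cstr.mat", matlab_matrix_key="fea",
                      n_coclusters=4, output_row_labels="cstr-rows.txt", n_runs=3)
print(" ".join(cmd))
# To execute it (requires coclust on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The printed command is the long-option form of the example above, with --matlab_matrix_key in place of -k.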

To list all parameters available for a given algorithm, use the -h option, as in the following example:

coclust modularity -h

usage: coclust [-h] {modularity,specmodularity,info} ...

Positional Arguments

subparser_name

Possible choices: modularity, specmodularity, info

choose the algorithm to use

Sub-commands:

modularity

use the modularity-based algorithm (CoclustMod)
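The sketch below makes the criterion concrete. It implements a common definition of co-clustering modularity for a non-negative matrix A with total sum N, row sums a_i· and column sums a_·j. A row and a column contribute a_ij − a_i· a_·j / N only when they are assigned to the same co-cluster. We assume, without checking the coclust source, that this is the modularity CoclustMod maximizes, possibly up to normalization details. The function name `comodularity` is hypothetical.

```python
def comodularity(a, row_labels, col_labels):
    """Modularity of a co-clustering of the non-negative matrix `a` (list of rows).

    Q = (1/N) * sum over (i, j) with row_labels[i] == col_labels[j]
        of (a[i][j] - r[i] * c[j] / N),
    where r and c are the row and column sums and N is the total sum.
    """
    n_rows, n_cols = len(a), len(a[0])
    row_sums = [sum(row) for row in a]
    col_sums = [sum(a[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = float(sum(row_sums))
    q = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            if row_labels[i] == col_labels[j]:  # same co-cluster
                q += a[i][j] - row_sums[i] * col_sums[j] / total
    return q / total

blocks = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
print(comodularity(blocks, [0, 0, 1, 1], [0, 0, 1, 1]))  # 0.5
print(comodularity(blocks, [0, 0, 1, 1], [1, 1, 0, 0]))  # -0.5
```

On a block-diagonal matrix, labels that follow the blocks give a positive value. Pairing each row block with the wrong column block makes the value negative.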

coclust modularity [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
                   [--output_row_labels OUTPUT_ROW_LABELS]
                   [--output_column_labels OUTPUT_COLUMN_LABELS]
                   [--output_fuzzy_row_labels OUTPUT_FUZZY_ROW_LABELS]
                   [--output_fuzzy_column_labels OUTPUT_FUZZY_COLUMN_LABELS]
                   [--convergence_plot CONVERGENCE_PLOT]
                   [--reorganized_matrix REORGANIZED_MATRIX] [-n N_COCLUSTERS]
                   [-m MAX_ITER] [-e EPSILON]
                   [-i INIT_ROW_LABELS | --n_runs N_RUNS] [--seed SEED]
                   [-l TRUE_ROW_LABELS] [--visu]
                   INPUT_MATRIX

input

`INPUT_MATRIX`	matrix file path
`-k, --matlab_matrix_key`
	key of the matrix in the Matlab file; if not set, the input is read as a text (CSV) file
`-sep, --csv_sep`
	separator for text input; use “\t” for tab-separated values Default: “,”

output

`--output_row_labels`
	file path for the predicted row labels
`--output_column_labels`
	file path for the predicted column labels
`--output_fuzzy_row_labels`
	file path for the predicted fuzzy row labels Default: 2
`--output_fuzzy_column_labels`
	file path for the predicted fuzzy column labels Default: 2
`--convergence_plot`
	file path for the convergence plot
`--reorganized_matrix`
	file path for the reorganized matrix

algorithm parameters

`-n, --n_coclusters`
	number of co-clusters Default: 2
`-m, --max_iter`	maximum number of iterations Default: 15
`-e, --epsilon`	stop if the criterion (modularity) variation in an iteration is less than EPSILON Default: 1e-09
`-i, --init_row_labels`
	file containing the initial row labels; if not set, random initialization is performed
`--n_runs`	number of runs Default: 1
`--seed`	set the random state, useful for reproducible results
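The generic loop below illustrates how --max_iter, --epsilon, --n_runs and --seed interact. It is a sketch, not coclust's code. The callable `improve` stands for one hypothetical iteration of the algorithm and returns the updated labels and the new criterion value. We also assume that, over several runs, the run with the highest criterion is kept.

```python
import random

def run_once(improve, init_labels, max_iter=15, epsilon=1e-9):
    """One run: iterate until the criterion changes by less than epsilon
    in an iteration, or until max_iter iterations have been done."""
    labels, criterion = init_labels, float("-inf")
    for _ in range(max_iter):
        labels, new_criterion = improve(labels)
        converged = abs(new_criterion - criterion) < epsilon
        criterion = new_criterion
        if converged:
            break
    return labels, criterion

def best_of_runs(improve, n_rows, n_clusters, n_runs=1, seed=None, **kwargs):
    """Repeat run_once from n_runs random initializations and keep the best."""
    rng = random.Random(seed)  # a fixed seed makes the initializations reproducible
    best = None
    for _ in range(n_runs):
        init = [rng.randrange(n_clusters) for _ in range(n_rows)]
        labels, criterion = run_once(improve, init, **kwargs)
        if best is None or criterion > best[1]:
            best = (labels, criterion)
    return best

def toy_step(labels):
    # Hypothetical iteration: keeps the labels and scores them by the
    # number of rows in cluster 0.
    return labels, float(labels.count(0))

print(best_of_runs(toy_step, n_rows=6, n_clusters=2, n_runs=5, seed=0))
```

Because the seed drives every random initialization, two calls with the same --seed value explore the same starting points.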

evaluation parameters

`-l, --true_row_labels`
	file containing the true row labels
`--visu`	plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib) Default: False

specmodularity

use the spectral modularity-based algorithm

coclust specmodularity [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
                       [--output_row_labels OUTPUT_ROW_LABELS]
                       [--output_column_labels OUTPUT_COLUMN_LABELS]
                       [--reorganized_matrix REORGANIZED_MATRIX]
                       [-n N_COCLUSTERS] [-m MAX_ITER] [-e EPSILON]
                       [--n_runs N_RUNS] [--seed SEED] [-l TRUE_ROW_LABELS]
                       [--visu]
                       INPUT_MATRIX

input

`INPUT_MATRIX`	matrix file path
`-k, --matlab_matrix_key`
	key of the matrix in the Matlab file; if not set, the input is read as a text (CSV) file
`-sep, --csv_sep`
	separator for text input; use “\t” for tab-separated values Default: “,”

output

`--output_row_labels`
	file path for the predicted row labels
`--output_column_labels`
	file path for the predicted column labels
`--reorganized_matrix`
	file path for the reorganized matrix

algorithm parameters

`-n, --n_coclusters`
	number of co-clusters Default: 2
`-m, --max_iter`	maximum number of iterations Default: 15
`-e, --epsilon`	stop if the criterion (modularity) variation in an iteration is less than EPSILON Default: 1e-09
`--n_runs`	number of runs Default: 1
`--seed`	set the random state, useful for reproducible results

evaluation parameters

`-l, --true_row_labels`
	file containing the true row labels
`--visu`	plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib) Default: False

info

use the information-theoretic co-clustering algorithm
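The info sub-command sets the numbers of row clusters and column clusters separately (-K, -L). Assume it implements information-theoretic co-clustering. Its criterion is then the mutual information between row clusters and column clusters, computed once the matrix is normalized into a joint distribution. The sketch below computes that quantity in nats for given labels. The function name is hypothetical and this is not coclust's code.

```python
import math

def cocluster_mutual_information(a, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Mutual information I(R; C) between row clusters R and column clusters C,
    where the non-negative matrix `a` (list of rows) is normalized to sum to 1."""
    total = float(sum(sum(row) for row in a))
    # Joint probability of each (row cluster, column cluster) block.
    p = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(a):
        for j, value in enumerate(row):
            p[row_labels[i]][col_labels[j]] += value / total
    p_rows = [sum(block_row) for block_row in p]
    p_cols = [sum(p[k][l] for k in range(n_row_clusters)) for l in range(n_col_clusters)]
    mi = 0.0
    for k in range(n_row_clusters):
        for l in range(n_col_clusters):
            if p[k][l] > 0:  # 0 * log 0 is taken as 0
                mi += p[k][l] * math.log(p[k][l] / (p_rows[k] * p_cols[l]))
    return mi

blocks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(cocluster_mutual_information(blocks, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2))  # log(2), about 0.693
print(cocluster_mutual_information(blocks, [0, 0, 0, 0], [0, 0, 1, 1], 1, 2))  # 0.0
```

A single row cluster carries no information about the column clusters, so the second value is zero.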

coclust info [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
             [--output_row_labels OUTPUT_ROW_LABELS]
             [--output_column_labels OUTPUT_COLUMN_LABELS]
             [--reorganized_matrix REORGANIZED_MATRIX] [-K N_ROW_CLUSTERS]
             [-L N_COL_CLUSTERS] [-m MAX_ITER] [-e EPSILON]
             [-i INIT_ROW_LABELS | --n_runs N_RUNS] [--seed SEED]
             [-l TRUE_ROW_LABELS] [--visu]
             INPUT_MATRIX

input

`INPUT_MATRIX`	matrix file path
`-k, --matlab_matrix_key`
	key of the matrix in the Matlab file; if not set, the input is read as a text (CSV) file
`-sep, --csv_sep`
	separator for text input; use “\t” for tab-separated values Default: “,”

output

`--output_row_labels`
	file path for the predicted row labels
`--output_column_labels`
	file path for the predicted column labels
`--reorganized_matrix`
	file path for the reorganized matrix

algorithm parameters

`-K, --n_row_clusters`
	number of row clusters Default: 2
`-L, --n_col_clusters`
	number of column clusters Default: 2
`-m, --max_iter`	maximum number of iterations Default: 15
`-e, --epsilon`	stop if the criterion (modularity) variation in an iteration is less than EPSILON Default: 1e-09
`-i, --init_row_labels`
	file containing the initial row labels; if not set, random initialization is performed
`--n_runs`	number of runs Default: 1
`--seed`	set the random state, useful for reproducible results

evaluation parameters

`-l, --true_row_labels`
	file containing the true row labels
`--visu`	plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib) Default: False

Detect the best number of co-clusters: the coclust-nb script

coclust-nb detects the number of co-clusters that gives the best modularity score, so it relies on the CoclustMod algorithm. This is a simple yet often effective way to choose an appropriate number of co-clusters. The following example tries every number of co-clusters from 2 to 6 on the text file cstr.csv:

coclust-nb cstr.csv --seed=1 --n_runs=20 --max_iter=60 --from 2 --to 6
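You can picture the coclust-nb search as the loop below. Every number of co-clusters from --from to --to is scored, with --to included since it is the maximum, and the number with the highest modularity is kept. `score` is a hypothetical callable standing in for a CoclustMod fit that returns a modularity value. The toy scores are made up for illustration.

```python
def best_number_of_coclusters(score, n_from=2, n_to=10):
    """Return (best_k, scores), where scores maps each k in [n_from, n_to]
    to score(k) and best_k is the k with the highest score."""
    scores = {k: score(k) for k in range(n_from, n_to + 1)}  # n_to is inclusive
    best_k = max(scores, key=scores.get)  # the first k wins ties
    return best_k, scores

# Toy scores standing in for real modularity values.
toy = {2: 0.31, 3: 0.42, 4: 0.47, 5: 0.44, 6: 0.40}
best_k, scores = best_number_of_coclusters(toy.get, n_from=2, n_to=6)
print(best_k)  # 4
```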

usage: coclust-nb [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
                  [--output_row_labels OUTPUT_ROW_LABELS]
                  [--output_column_labels OUTPUT_COLUMN_LABELS]
                  [--reorganized_matrix REORGANIZED_MATRIX] [--from FROM]
                  [--to TO] [-m MAX_ITER] [-e EPSILON] [--n_runs N_RUNS]
                  [--seed SEED] [--visu]
                  INPUT_MATRIX

input

`INPUT_MATRIX`	matrix file path
`-k, --matlab_matrix_key`
	key of the matrix in the Matlab file; if not set, the input is read as a text (CSV) file
`-sep, --csv_sep`
	separator for text input; use “\t” for tab-separated values Default: “,”

output

`--output_row_labels`
	file path for the predicted row labels
`--output_column_labels`
	file path for the predicted column labels
`--reorganized_matrix`
	file path for the reorganized matrix

algorithm parameters

`--from`	minimum number of co-clusters Default: 2
`--to`	maximum number of co-clusters Default: 10
`-m, --max_iter`	maximum number of iterations Default: 15
`-e, --epsilon`	stop if the criterion (modularity) variation in an iteration is less than EPSILON Default: 1e-09
`--n_runs`	number of runs Default: 1
`--seed`	set the random state, useful for reproducible results

evaluation parameters

`--visu`	plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib) Default: False