Scripts¶
The input matrix can be a Matlab file or a text file. For a Matlab file, the key corresponding to the matrix must be given. For a text file, each line describes one entry of the matrix using three columns: the row index, the column index and the value. The column separator is set by a script parameter.
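As an illustration of the text format described above, the following minimal sketch parses three-column lines into a dictionary of matrix entries. The helper name read_triplets and the sample data are hypothetical, and whether indices start at 0 or 1 is not stated here, so the sample simply assumes 0-based indices.

```python
import csv
import io


def read_triplets(lines, sep=","):
    """Parse 'row<sep>column<sep>value' lines into a {(row, col): value} dict."""
    entries = {}
    for row, col, value in csv.reader(lines, delimiter=sep):
        entries[(int(row), int(col))] = float(value)
    return entries


# Hypothetical 3 x 3 matrix with four non-zero entries, tab-separated.
sample = io.StringIO("0\t0\t2\n0\t2\t1\n1\t1\t3\n2\t0\t4\n")
entries = read_triplets(sample, sep="\t")
print(entries[(1, 1)])
```

The sep argument plays the same role as the separator parameter of the scripts: a comma by default, a tab in this sample.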
Perform co-clustering: the coclust script¶
The coclust script can be used to run a particular co-clustering algorithm on a data matrix. The user has to select an algorithm which is given as a first argument to coclust. The choices are:
- modularity
- specmodularity
- info
The following command line shows how to run the CoclustMod algorithm three times on a matrix contained in a Matlab file whose matrix key is the string ‘fea’. The computed row labels are to be stored in a file called cstr-rows.txt:
coclust modularity -k fea --n_coclusters 4 --output_row_labels cstr-rows.txt --n_runs 3 cstr.mat
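When the script is driven from Python rather than typed by hand, the same command line can be assembled as an argument list. The sketch below only builds and prints the command; the helper coclust_command is hypothetical and uses the long option names listed in the usage section further down.

```python
import shlex


def coclust_command(algorithm, input_matrix, **options):
    """Build an argument list for the coclust script (nothing is executed)."""
    args = ["coclust", algorithm]
    for name, value in options.items():
        args += ["--" + name, str(value)]
    args.append(input_matrix)
    return args


cmd = coclust_command(
    "modularity",
    "cstr.mat",
    matlab_matrix_key="fea",
    n_coclusters=4,
    output_row_labels="cstr-rows.txt",
    n_runs=3,
)
print(shlex.join(cmd))
```

Passing cmd to subprocess.run(cmd, check=True) would execute it, assuming the coclust script is installed and on the PATH.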
To list all available parameters for a given algorithm, use the -h option, as in the following example:
coclust modularity -h
usage: coclust [-h] {modularity,specmodularity,info} ...
Positional Arguments¶
subparser_name | Possible choices: modularity, specmodularity, info. Chooses the algorithm to use. |
Sub-commands:¶
modularity¶
use the modularity based algorithm
coclust modularity [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
[--output_row_labels OUTPUT_ROW_LABELS]
[--output_column_labels OUTPUT_COLUMN_LABELS]
[--output_fuzzy_row_labels OUTPUT_FUZZY_ROW_LABELS]
[--output_fuzzy_column_labels OUTPUT_FUZZY_COLUMN_LABELS]
[--convergence_plot CONVERGENCE_PLOT]
[--reorganized_matrix REORGANIZED_MATRIX] [-n N_COCLUSTERS]
[-m MAX_ITER] [-e EPSILON]
[-i INIT_ROW_LABELS | --n_runs N_RUNS] [--seed SEED]
[-l TRUE_ROW_LABELS] [--visu]
INPUT_MATRIX
input¶
INPUT_MATRIX | matrix file path |
-k, --matlab_matrix_key | if not set, csv input is considered |
-sep, --csv_sep | if not set, “,” is considered; use “\t” for tab-separated values Default: “,” |
output¶
--output_row_labels | file path for the predicted row labels |
--output_column_labels | file path for the predicted column labels |
--output_fuzzy_row_labels | file path for the predicted fuzzy row labels Default: 2 |
--output_fuzzy_column_labels | file path for the predicted fuzzy column labels Default: 2 |
--convergence_plot | file path for the convergence plot |
--reorganized_matrix | file path for the reorganized matrix |
algorithm parameters¶
-n, --n_coclusters | number of co-clusters Default: 2 |
-m, --max_iter | maximum number of iterations Default: 15 |
-e, --epsilon | stop if the criterion (modularity) variation in an iteration is less than EPSILON Default: 1e-09 |
-i, --init_row_labels | file containing the initial row labels; if not set, random initialization is performed |
--n_runs | number of runs Default: 1 |
--seed | set the random state, useful for reproducible results |
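The interplay between --max_iter and --epsilon can be sketched as a generic stopping rule. This is an illustrative loop, not the script's actual implementation: the callable step is hypothetical, and comparing the absolute change of the criterion against epsilon is an assumption.

```python
def iterate_until_stable(step, max_iter=15, epsilon=1e-9):
    """Call step() until the criterion changes by less than epsilon.

    step is a hypothetical callable that performs one iteration and
    returns the new criterion value.
    """
    previous = float("-inf")
    history = []
    for _ in range(max_iter):
        current = step()
        history.append(current)
        if abs(current - previous) < epsilon:
            break
        previous = current
    return history


# Toy criterion that rises and then plateaus.
values = iter([0.2, 0.5, 0.8, 0.8, 0.8])
history = iterate_until_stable(lambda: next(values))
print(history)
```

In this toy run the loop stops at the fourth iteration, the first one whose criterion no longer differs from the previous value.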
evaluation parameters¶
-l, --true_row_labels | file containing the true row labels |
--visu | Plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib). Default: False |
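The bracketed alternatives in the usage string above ([-k ... | -sep ...] and [-i ... | --n_runs ...]) denote mutually exclusive options. The sketch below rebuilds a subset of the documented modularity interface with argparse to show how such a command line is parsed. It is a reconstruction from the documented options, not the script's source, and the int type chosen for --seed is an assumption.

```python
import argparse

parser = argparse.ArgumentParser(prog="coclust")
subparsers = parser.add_subparsers(dest="subparser_name")
mod = subparsers.add_parser("modularity")

# -k and -sep are alternatives, as are -i and --n_runs.
source = mod.add_mutually_exclusive_group()
source.add_argument("-k", "--matlab_matrix_key")
source.add_argument("-sep", "--csv_sep", default=",")
init = mod.add_mutually_exclusive_group()
init.add_argument("-i", "--init_row_labels")
init.add_argument("--n_runs", type=int, default=1)

mod.add_argument("-n", "--n_coclusters", type=int, default=2)
mod.add_argument("-m", "--max_iter", type=int, default=15)
mod.add_argument("-e", "--epsilon", type=float, default=1e-9)
mod.add_argument("--seed", type=int)  # assumed to be an integer
mod.add_argument("--output_row_labels")
mod.add_argument("INPUT_MATRIX")

args = parser.parse_args(
    ["modularity", "-k", "fea", "--n_coclusters", "4", "--n_runs", "3", "cstr.mat"]
)
print(args.n_coclusters, args.n_runs, args.INPUT_MATRIX)
```

Options that are not given keep their documented defaults, so args.max_iter is 15 and args.csv_sep is a comma here.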
specmodularity¶
use the spectral modularity based algorithm
coclust specmodularity [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
[--output_row_labels OUTPUT_ROW_LABELS]
[--output_column_labels OUTPUT_COLUMN_LABELS]
[--reorganized_matrix REORGANIZED_MATRIX]
[-n N_COCLUSTERS] [-m MAX_ITER] [-e EPSILON]
[--n_runs N_RUNS] [--seed SEED] [-l TRUE_ROW_LABELS]
[--visu]
INPUT_MATRIX
input¶
INPUT_MATRIX | matrix file path |
-k, --matlab_matrix_key | if not set, csv input is considered |
-sep, --csv_sep | if not set, “,” is considered; use “\t” for tab-separated values Default: “,” |
output¶
--output_row_labels | file path for the predicted row labels |
--output_column_labels | file path for the predicted column labels |
--reorganized_matrix | file path for the reorganized matrix |
algorithm parameters¶
-n, --n_coclusters | number of co-clusters Default: 2 |
-m, --max_iter | maximum number of iterations Default: 15 |
-e, --epsilon | stop if the criterion (modularity) variation in an iteration is less than EPSILON Default: 1e-09 |
--n_runs | number of runs Default: 1 |
--seed | set the random state, useful for reproducible results |
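How --n_runs and --seed might work together can be sketched as follows: a randomly initialized procedure is repeated and the best result is kept, while a fixed seed makes the whole sequence repeatable. Both the callable run_once and the rule of keeping the run with the highest criterion are assumptions made for illustration.

```python
import random


def best_of_runs(run_once, n_runs=1, seed=None):
    """Run a randomly initialized procedure n_runs times and keep the best.

    run_once is a hypothetical callable taking a random.Random instance and
    returning a (criterion, labels) pair.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        result = run_once(rng)
        if best is None or result[0] > best[0]:
            best = result
    return best


def toy_run(rng):
    # Stand-in for one co-clustering run: random labels, random score.
    labels = [rng.randrange(2) for _ in range(5)]
    return rng.random(), labels


first = best_of_runs(toy_run, n_runs=3, seed=1)
second = best_of_runs(toy_run, n_runs=3, seed=1)
print(first == second)
```

With the same seed, both calls draw the same random numbers and therefore return the same result.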
evaluation parameters¶
-l, --true_row_labels | file containing the true row labels |
--visu | Plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib). Default: False |
info¶
use the information-theoretic algorithm (CoclustInfo)
coclust info [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
[--output_row_labels OUTPUT_ROW_LABELS]
[--output_column_labels OUTPUT_COLUMN_LABELS]
[--reorganized_matrix REORGANIZED_MATRIX] [-K N_ROW_CLUSTERS]
[-L N_COL_CLUSTERS] [-m MAX_ITER] [-e EPSILON]
[-i INIT_ROW_LABELS | --n_runs N_RUNS] [--seed SEED]
[-l TRUE_ROW_LABELS] [--visu]
INPUT_MATRIX
input¶
INPUT_MATRIX | matrix file path |
-k, --matlab_matrix_key | if not set, csv input is considered |
-sep, --csv_sep | if not set, “,” is considered; use “\t” for tab-separated values Default: “,” |
output¶
--output_row_labels | file path for the predicted row labels |
--output_column_labels | file path for the predicted column labels |
--reorganized_matrix | file path for the reorganized matrix |
algorithm parameters¶
-K, --n_row_clusters | number of row clusters Default: 2 |
-L, --n_col_clusters | number of column clusters Default: 2 |
-m, --max_iter | maximum number of iterations Default: 15 |
-e, --epsilon | stop if the criterion (modularity) variation in an iteration is less than EPSILON Default: 1e-09 |
-i, --init_row_labels | file containing the initial row labels; if not set, random initialization is performed |
--n_runs | number of runs Default: 1 |
--seed | set the random state, useful for reproducible results |
evaluation parameters¶
-l, --true_row_labels | file containing the true row labels |
--visu | Plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib). Default: False |
Detect the best number of co-clusters: the coclust-nb script¶
coclust-nb detects the number of co-clusters giving the best modularity score. It therefore relies on the CoclustMod algorithm. This is a simple yet often effective way to determine the appropriate number of co-clusters. A sample usage is given below:
coclust-nb cstr.csv --seed=1 --n_runs=20 --max_iter=60 --from 2 --to 6
usage: coclust-nb [-h] [-k MATLAB_MATRIX_KEY | -sep CSV_SEP]
[--output_row_labels OUTPUT_ROW_LABELS]
[--output_column_labels OUTPUT_COLUMN_LABELS]
[--reorganized_matrix REORGANIZED_MATRIX] [--from FROM]
[--to TO] [-m MAX_ITER] [-e EPSILON] [--n_runs N_RUNS]
[--seed SEED] [--visu]
INPUT_MATRIX
input¶
INPUT_MATRIX | matrix file path |
-k, --matlab_matrix_key | if not set, csv input is considered |
-sep, --csv_sep | if not set, “,” is considered; use “\t” for tab-separated values Default: “,” |
output¶
--output_row_labels | file path for the predicted row labels |
--output_column_labels | file path for the predicted column labels |
--reorganized_matrix | file path for the reorganized matrix |
algorithm parameters¶
--from | minimum number of co-clusters Default: 2 |
--to | maximum number of co-clusters Default: 10 |
-m, --max_iter | maximum number of iterations Default: 15 |
-e, --epsilon | stop if the criterion (modularity) variation in an iteration is less than EPSILON Default: 1e-09 |
--n_runs | number of runs Default: 1 |
--seed | set the random state, useful for reproducible results |
evaluation parameters¶
--visu | Plot modularity values and reorganized matrix (requires NumPy, SciPy and matplotlib). Default: False |
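The search that coclust-nb is described as performing, trying each number of co-clusters between --from and --to and keeping the one with the best modularity, can be sketched as below. The function name, the callable score and the toy modularity values are hypothetical, and treating the --to bound as inclusive is an assumption.

```python
def best_number_of_coclusters(score, start=2, stop=10):
    """Return (k, score(k)) for the k in [start, stop] with the highest score.

    score is a hypothetical callable mapping a number of co-clusters to the
    modularity obtained with that number.
    """
    scores = {k: score(k) for k in range(start, stop + 1)}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]


# Toy modularity values standing in for real CoclustMod runs.
toy_modularity = {2: 0.21, 3: 0.34, 4: 0.41, 5: 0.38, 6: 0.30}
best_k, best_score = best_number_of_coclusters(toy_modularity.get, 2, 6)
print(best_k, best_score)
```

In a real setting, score would run the modularity-based algorithm for a given number of co-clusters and return the resulting modularity.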