PyBMF.utils package

Submodules

PyBMF.utils.boolean_utils module

PyBMF.utils.boolean_utils.add(X, Y, sparse=None, boolean=False)[source]

Matrix-matrix addition for both dense and sparse input with Boolean logic support.

Also supports regular matrix-const addition.

Parameters:
  • X (ndarray, spmatrix, int, float)

  • Y (ndarray, spmatrix, int, float)

  • sparse (bool, default: None)

  • boolean (bool, default: False)
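
To illustrate what Boolean logic means for addition, the sketch below assumes that Boolean addition of two binary matrices is an element-wise OR (so 1 + 1 = 1), while regular addition is the usual sum. add_sketch is a hypothetical, dense-only NumPy stand-in and is not PyBMF's implementation.

```python
import numpy as np

def add_sketch(X, Y, boolean=False):
    """Hypothetical dense-only stand-in for add(); not PyBMF's code."""
    Z = np.asarray(X) + np.asarray(Y)  # regular sum; constants broadcast
    if boolean:
        # Assumed Boolean semantics: any positive sum collapses to 1.
        Z = (Z > 0).astype(int)
    return Z

X = np.array([[1, 0], [1, 1]])
Y = np.array([[1, 1], [0, 1]])
regular = add_sketch(X, Y)                # [[2, 1], [1, 2]]
boolean = add_sketch(X, Y, boolean=True)  # [[1, 1], [1, 1]]
```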

PyBMF.utils.boolean_utils.dot(u, v, boolean=False)[source]

Vector-vector inner product for both dense and sparse input with Boolean logic support.

Parameters:
  • u (ndarray, spmatrix)

  • v (ndarray, spmatrix)

  • boolean (bool, default: False)

PyBMF.utils.boolean_utils.ismat(X)[source]

Whether X is a matrix or not.

Attention: Note that np.matrix is NOT a supported matrix type in PyBMF.

PyBMF.utils.boolean_utils.isnum(X)[source]

Whether X is a number or not.

PyBMF.utils.boolean_utils.matmul(U, V, sparse=None, boolean=False)[source]

Matrix-matrix multiplication for both dense and sparse input with Boolean logic support.

Parameters:
  • U (ndarray, spmatrix)

  • V (ndarray, spmatrix)

  • sparse (bool, default: None)

  • boolean (bool, default: False)
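
A minimal sketch of the difference between the regular and the Boolean product, assuming the Boolean product replaces sum-of-products with OR-of-ANDs. matmul_sketch is a hypothetical dense-only helper, not PyBMF's implementation.

```python
import numpy as np

def matmul_sketch(U, V, boolean=False):
    """Hypothetical dense-only stand-in for matmul(); not PyBMF's code."""
    Z = np.asarray(U) @ np.asarray(V)  # regular matrix product
    if boolean:
        # Assumed Boolean semantics: an entry is 1 if any term of the
        # inner product is 1, i.e. the regular product is positive.
        Z = (Z > 0).astype(int)
    return Z

U = np.array([[1, 1], [0, 1]])
V = np.array([[1, 0], [1, 1]])
regular = matmul_sketch(U, V)                # [[2, 1], [1, 1]]
boolean = matmul_sketch(U, V, boolean=True)  # [[1, 1], [1, 1]]
```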

PyBMF.utils.boolean_utils.multiply(U, V, sparse=None, boolean=False)[source]

Point-wise multiplication for both dense and sparse input with Boolean logic support.

For vector-vector or matrix-matrix Hadamard products. Also supports regular const-vector and const-matrix products.

Parameters:
  • U (ndarray, spmatrix, int, float)

  • V (ndarray, spmatrix, int, float)

  • sparse (bool, default: None) – Whether to enforce a sparse output. If None, keep the same dtype as input.

  • boolean (bool, default: False) – Whether to use Boolean logic on the binary input.

PyBMF.utils.boolean_utils.power(X, n)[source]

Matrix power for both dense and sparse input.

PyBMF.utils.boolean_utils.subtract(X, Y, sparse=False, boolean=False)[source]

Matrix-matrix subtraction for both dense and sparse input with Boolean logic support.

Also supports regular matrix-const subtraction.

Parameters:
  • X (ndarray, spmatrix, int, float)

  • Y (ndarray, spmatrix, int, float)

  • sparse (bool, default: False)

  • boolean (bool, default: False)

PyBMF.utils.collective_display_utils module

PyBMF.utils.collective_display_utils.get_settings(Xs, factors, Us=None)[source]

Get display settings.

Used in the show_matrix() wrapper for CMF models.

Parameters:
  • Xs (list of spmatrix or ndarray)

  • factors (list of int list)

  • Us (list of spmatrix or ndarray, optional)

  • a, b, f (factor id) – Note that a factor id may not be equal to its index in Us and factor_info, especially when the factor ids do not start from 0. split_factor_list only accepts the complete factors list.

PyBMF.utils.collective_display_utils.sort_matrices(Xs, factors)[source]

Sort out matrices.

Transpose the matrices when necessary and return the positions.

PyBMF.utils.collective_evaluate_utils module

PyBMF.utils.collective_evaluate_utils.collective_cover(gt, pd, w, axis, starts=None)[source]

The collective wrapper for cover function.

Parameters:
  • gt (spmatrix) – The concatenated ground-truth matrix.

  • pd (spmatrix) – The concatenated predicted matrix.

  • w (list of float) – The allowed ratio of false positives in each matrix.

  • axis (int in {0, 1}) – The dimension of the basis.

  • starts (list of int) – The starting point of each matrix on the other dimension rather than the dimension of the basis.

Returns:

scores – The scores of each basis over each submatrix.

Return type:

(n_submat, n_basis) array

PyBMF.utils.collective_evaluate_utils.harmonic_score(scores)[source]

Harmonic score(s) of n sets of scores.

Parameters:

scores ((n, k) array)

Returns:

s

Return type:

float or (1, k) array
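
A minimal sketch of one plausible reading, assuming the harmonic score is the harmonic mean taken over the n rows of scores, separately for each of the k columns. harmonic_score_sketch is a hypothetical name, and the handling of zero scores (which would divide by zero here) is left out.

```python
import numpy as np

def harmonic_score_sketch(scores):
    """Column-wise harmonic mean of an (n, k) array (assumed semantics)."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    return n / np.sum(1.0 / scores, axis=0)

scores = np.array([[0.5, 1.0],
                   [1.0, 1.0]])
s = harmonic_score_sketch(scores)  # one value per column
```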

PyBMF.utils.collective_evaluate_utils.weighted_score(scores, weights)[source]

Weighted score(s) of n sets of scores.

Parameters:
  • scores ((n, k) array)

  • weights ((1, n) array)

Returns:

s

Return type:

float or (1, k) array
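
The documented shapes, (1, n) weights and (n, k) scores, are consistent with a matrix product that yields a (1, k) result. The sketch below assumes a weighted average normalized by the sum of the weights; that normalization and the name weighted_score_sketch are assumptions, not taken from PyBMF.

```python
import numpy as np

def weighted_score_sketch(scores, weights):
    """Weighted average of n score rows (normalization is an assumption)."""
    scores = np.asarray(scores, dtype=float)    # shape (n, k)
    weights = np.asarray(weights, dtype=float)  # shape (1, n)
    return weights @ scores / weights.sum()     # shape (1, k)

scores = np.array([[0.2, 0.4],
                   [0.6, 0.8]])
weights = np.array([[1.0, 3.0]])
s = weighted_score_sketch(scores, weights)
```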

PyBMF.utils.collective_transform_utils module

PyBMF.utils.collective_transform_utils.concat_Us_into_U(Us, factors)[source]

Concatenate factors of collective matrices Us into a single pair of factors U.

Used in some collective models.

PyBMF.utils.collective_transform_utils.concat_Xs_into_X(Xs, factors)[source]

Concatenate collective matrices Xs into a single matrix X.

Used in BaseData and some collective models.

PyBMF.utils.collective_transform_utils.concat_factor_info(factor_info, factors)[source]

Concatenate factor_info of collective matrices Xs into a length-2 factor_info of a single matrix X.

PyBMF.utils.collective_transform_utils.split_U_into_Us(U, V, factors, factor_starts)[source]

Separate concatenated factors (U, V) into collective factors Us.

Used in some collective models.

PyBMF.utils.collective_transform_utils.split_X_into_Xs(X, factors, factor_starts)[source]

Split concatenated single matrix X into collective matrices Xs.

Used in some collective models.

PyBMF.utils.collective_utils module

PyBMF.utils.collective_utils.get_dummy_factor_info(Xs, factors)[source]

Get dummy factor_info for collective matrices.

PyBMF.utils.collective_utils.get_factor_dims(Xs, factors)[source]

The dimensions of each factor.

PyBMF.utils.collective_utils.get_factor_list(factors)[source]

Get sorted factor list.

Parameters:

factors (list of int list) – List of factor id pairs, indicating the row and column factors of each matrix. Please follow the convention that factors are numbered consecutively and starting from 0. There must exist a matrix with its factors numbered as [0, 1].

Returns:

factor_list – List of sorted factor ids.

Return type:

list
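
As a sketch of the input convention, the snippet below collects the distinct factor ids from a list of factor id pairs and sorts them. get_factor_list_sketch is a hypothetical stand-in written from the description above, not PyBMF's code.

```python
def get_factor_list_sketch(factors):
    """Sorted list of the distinct factor ids in a list of id pairs."""
    return sorted({f for pair in factors for f in pair})

# Two matrices: the first relates factors 0 and 1, the second 2 and 1.
factors = [[0, 1], [2, 1]]
factor_list = get_factor_list_sketch(factors)
```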

PyBMF.utils.collective_utils.get_factor_starts(Xs, factors)[source]

The starting point of each factor when multiple factors Us are concatenated into a pair of row and column factor U.

PyBMF.utils.collective_utils.get_matrices(factors)[source]

List of related matrices given factors.

This is the inverse of ‘factors’, the list of related factors given matrices.

PyBMF.utils.collective_utils.split_factor_list(factors)[source]

Classify factors into row and column factors.

Please follow the convention that factors are numbered consecutively, starting from 0. There must exist a matrix with its factors numbered as [0, 1]. Factor 0 and those on the same side as 0 are regarded as row factors. Factor 1 and those on the same side as 1 are regarded as column factors.

List f stores the type of each factor with 0 for unclassified, 1 for row factor and 2 for column factor.
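
The classification described above can be sketched as a simple propagation: start from factor 0 (row) and factor 1 (column), then give the still-unclassified factor of each matrix the opposite type of its partner until nothing changes. The function name and the returned (row_factors, col_factors) pair are assumptions for illustration, not PyBMF's actual return format.

```python
def split_factor_list_sketch(factors):
    """Classify factor ids into row and column factors (illustrative)."""
    n = max(max(pair) for pair in factors) + 1
    f = [0] * n        # 0: unclassified, 1: row factor, 2: column factor
    f[0], f[1] = 1, 2  # convention: factor 0 is a row, factor 1 a column
    changed = True
    while changed:
        changed = False
        for a, b in factors:
            if f[a] and not f[b]:
                f[b] = 3 - f[a]  # partner gets the opposite type
                changed = True
            elif f[b] and not f[a]:
                f[a] = 3 - f[b]
                changed = True
    row_factors = [i for i, t in enumerate(f) if t == 1]
    col_factors = [i for i, t in enumerate(f) if t == 2]
    return row_factors, col_factors

rows, cols = split_factor_list_sketch([[0, 1], [2, 1], [2, 3]])
```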

PyBMF.utils.common module

PyBMF.utils.common.binarize(X, threshold=0.5)[source]

Binarize a matrix with a threshold. Also known as the Heaviside step function.

Parameters:
  • X (float ndarray, spmatrix)

  • threshold (float, default: 0.5)

Returns:

result

Return type:

int ndarray, spmatrix
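
A minimal dense-only sketch of thresholding. Whether values exactly equal to the threshold map to 1 or 0 is not stated above; the sketch assumes they map to 1. binarize_sketch is a hypothetical name.

```python
import numpy as np

def binarize_sketch(X, threshold=0.5):
    """Map values >= threshold to 1 and the rest to 0 (assumed boundary)."""
    return (np.asarray(X) >= threshold).astype(int)

X = np.array([[0.2, 0.5],
              [0.9, 0.4]])
B = binarize_sketch(X)  # [[0, 1], [1, 0]]
```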

PyBMF.utils.common.d_sigmoid(X)[source]

PyBMF.utils.common.get_prediction(U, V, boolean=True, sparse=True)[source]

Get prediction.

Parameters:
  • U (array, spmatrix)

  • V (array, spmatrix)

  • boolean (bool) – Whether to apply Boolean multiplication.

PyBMF.utils.common.get_prediction_with_threshold(U, V, u=None, v=None, us=None, vs=None, sparse=True)[source]

Get prediction after thresholding factors U and V.

Parameters:
  • U (ndarray, spmatrix) – The factor matrix.

  • V (ndarray, spmatrix) – The factor matrix.

  • u (float) – The shared threshold across all factors for U.

  • v (float) – The shared threshold across all factors for V.

  • us (list of k floats) – The individual thresholds for each factor in U.

  • vs (list of k floats) – The individual thresholds for each factor in V.

Returns:

X_pd – The prediction matrix.

Return type:

ndarray, spmatrix
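
The sketch below shows how shared and per-factor thresholds might interact. It assumes that U is (m, k) and V is (n, k) with one column per factor, that the per-factor lists take precedence over the shared scalars, and that the prediction is the Boolean product of the binarized factors. All of these, and the function name, are assumptions rather than PyBMF's documented behavior.

```python
import numpy as np

def predict_with_threshold_sketch(U, V, u=None, v=None, us=None, vs=None):
    """Binarize U and V, then take a Boolean product (illustrative)."""
    # Per-factor thresholds broadcast across the k columns.
    tu = np.asarray(us) if us is not None else u
    tv = np.asarray(vs) if vs is not None else v
    U_bin = (np.asarray(U) >= tu).astype(int)
    V_bin = (np.asarray(V) >= tv).astype(int)
    return ((U_bin @ V_bin.T) > 0).astype(int)

U = np.array([[0.9, 0.1], [0.2, 0.8]])
V = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
X_shared = predict_with_threshold_sketch(U, V, u=0.5, v=0.5)
X_each = predict_with_threshold_sketch(U, V, us=[0.5, 0.5], vs=[0.3, 0.95])
```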

PyBMF.utils.common.get_residual(X, U, V)[source]

Get residual matrix of X.

PyBMF.utils.common.get_rng(seed, rng)[source]

Get random number generator.

Parameters:
  • seed (optional) – Random seed.

  • rng (optional) – Random number generator.

PyBMF.utils.common.safe_indexing(X, indices)[source]

Return items or rows from X using indices.

Allows simple indexing of lists or arrays. Modified from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/__init__.py

Parameters:
  • X (array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series) – Data from which to sample rows or items.

  • indices (array-like of int) – Indices according to which X will be subsampled.

Returns:

subset – Subset of X on the first axis.

PyBMF.utils.common.to_interval(X, min, max)[source]

Transform data into interval [min, max].

Parameters:
  • X (ndarray)

  • min (float)

  • max (float)


TODO: support spmatrix.
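
A minimal sketch, assuming the transform is a linear min-max rescaling of the whole array. to_interval_sketch and the argument names lo and hi (used instead of min and max to avoid shadowing the built-ins) are hypothetical; a constant input would divide by zero here.

```python
import numpy as np

def to_interval_sketch(X, lo, hi):
    """Linearly rescale X so its minimum maps to lo and maximum to hi."""
    X = np.asarray(X, dtype=float)
    unit = (X - X.min()) / (X.max() - X.min())  # rescale to [0, 1]
    return unit * (hi - lo) + lo

Y = to_interval_sketch(np.array([0.0, 5.0, 10.0]), -1.0, 1.0)
```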

PyBMF.utils.data_utils module

PyBMF.utils.data_utils.mean(X, axis=None)[source]

Row and column-wise mean.

Parameters:
  • X (ndarray, spmatrix)

  • axis (int, optional)

Returns:

result

Return type:

tuple, ndarray

PyBMF.utils.data_utils.median(X, axis=None)[source]

Row and column-wise median.

Parameters:
  • X (ndarray, spmatrix)

  • axis (int, optional)

Returns:

result

Return type:

tuple, ndarray

PyBMF.utils.data_utils.sample(X, axis, factor_info=None, idx=None, n_samples=None, seed=None)[source]

Sample a matrix by its row or column. Update factor_info if provided.

Parameters:
  • axis (int) – Which dimension to down-sample. 0, sample rows. 1, sample columns.

  • factor_info (list, tuple) – factor_info for single matrix X or collective matrices Xs. For X, factor_info is a list of tuples. For Xs, factor_info is a tuple.

  • idx – The indices to sample with.

  • n_samples – Randomly down-sample to this length.

  • seed – Seed for down-sampling.

PyBMF.utils.data_utils.sort_order(order)[source]

Fix the gap after down-sampling.

E.g. [1, 6, 4, 2] will be turned into [0, 3, 2, 1].
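
The example above amounts to replacing each value by its rank. A minimal sketch, assuming the values in order are distinct, uses a double argsort; sort_order_sketch is a hypothetical name and may differ from how PyBMF computes it.

```python
import numpy as np

def sort_order_sketch(order):
    """Replace each value by its rank among the (distinct) values."""
    return np.argsort(np.argsort(order))

fixed = sort_order_sketch([1, 6, 4, 2])  # ranks: 1->0, 6->3, 4->2, 2->1
```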

PyBMF.utils.data_utils.sum(X, axis=None)[source]

Row and column-wise sum.

Parameters:
  • X (ndarray, spmatrix)

  • axis (int, optional)

Returns:

result

Return type:

tuple, ndarray

PyBMF.utils.data_utils.summarize(X)[source]

Show the summary of a matrix.

Parameters:

X (ndarray, spmatrix)

PyBMF.utils.dataframe_utils module

PyBMF.utils.dataframe_utils._make_html(file_path, file_name, html)[source]

Make an HTML file.

Parameters:
  • file_path (str) – Path to save the html file.

  • file_name (str) – Name of the html file.

  • html (str) – HTML code.

Returns:

full_path – Full path of the html file.

Return type:

str

PyBMF.utils.dataframe_utils._make_name(model=None, model_name=None, format='%Y-%m-%d %H-%M-%S-%f ')[source]

Make a file name for an instance of a model.

The fractional-second field of the timestamp (%f, which strftime renders as microseconds) is included to make the name unique.

Parameters:
  • model (object) – Model object.

  • model_name (str) – Name of the model.

  • format (str) – Format of the timestamp.

Returns:

model_name – Name of the model.

Return type:

str

PyBMF.utils.dataframe_utils._open_html(full_path, browser_path)[source]

Open an HTML file in a browser.

Parameters:
  • full_path (str) – Full path of the html file.

  • browser_path (str) – Path of the browser.

PyBMF.utils.dataframe_utils.get_config(key)[source]

Get config value from settings.ini.

Parameters:

key (str) – Key in settings.ini.

Returns:

value – Value in settings.ini.

Return type:

str

PyBMF.utils.dataframe_utils.log2html(df, log_name, open_browser=True, log_path=None, browser_path=None)[source]

Display and save a dataframe in HTML, and open it in browser if needed.

Please create settings.ini or set log_path and browser_path manually before calling.

Parameters:
  • df (pandas.DataFrame) – Dataframe to be displayed in HTML.

  • log_name (str) – Name of the log file.

  • open_browser (bool) – Whether to open in browser.

  • log_path (str) – Path to save the log file.

  • browser_path (str) – Path of the browser.

PyBMF.utils.dataframe_utils.log2latex(df, log_name, open_browser=True, log_path=None, browser_path=None)[source]

Display a dataframe in TeX on overleaf.com.

This tool automatically highlights the maximum values in each column.

Parameters:
  • df (pandas.DataFrame) – Dataframe to be displayed in TeX.

  • log_name (str) – Name of the log file.

  • open_browser (bool) – Whether to open in browser.

  • log_path (str) – Path to save the log file.

  • browser_path (str) – Path of the browser.

PyBMF.utils.decorator_utils module

PyBMF.utils.decorator_utils.ignore_warnings(func)[source]

PyBMF.utils.decorator_utils.timeit(func)[source]

PyBMF.utils.display module

PyBMF.utils.display.fill_nan(X, mask)[source]

Fill the missing values of a sparse matrix with NaN, so that missing values in a sparse matrix are displayed differently from zeros.

Used for displaying matrices while identifying missing values.

Parameters:
  • X (ndarray or spmatrix) – The matrix with values to be filled with NaN.

  • mask (spmatrix) – The masking matrix. Explicit zeros in mask are not considered as missing. Note that there are several ways to preserve zeros in a sparse matrix. BaseSplit.load_neg_data() is one of them.

Returns:

Y – The dense matrix with NaN in it.

Return type:

ndarray

PyBMF.utils.display.get_size_inches(scaling, ppi, hds, pixels, width_cells, height_cells)[source]

Get figure size in inches.

Parameters:
  • width_cells (int) – Figure width in the number of matrix cells.

  • height_cells (int) – Figure height in the number of matrix cells.

Returns:

  • width_inches (float) – Figure width in inches.

  • height_inches (float) – Figure height in inches.

PyBMF.utils.display.show_factor_distribution(U, V, resolution=100, show_hist=False, show_minmax=True, remove_below=None, us=None, vs=None)[source]

Show the distribution of real-valued factor matrices U and V.

PyBMF.utils.display.show_matrix(settings, scaling=1.0, ppi=96, hds=1.5, pixels=None, title=None, fontsize=8, keep_nan=True, colorbar=False, clim=None, discrete=False, center=True, cmap='rainbow', cmin='gray', cmax='black', cnan='white', save_fig=True)[source]

Show the matrix and factors.

Parameters:
  • settings (list of tuple) – A list of (data, location, title) tuples.

  • scaling (float, default: 1.0) – The scaling factor. The default scaling of 1.0 gives the maximum size at which a figure can be displayed within the screen.

  • ppi (int, default: 96) – Pixels per inch. The ppi of a 4K 24” display is 96.

  • hds (float, default: 1.5) – High DPI scaling, if your Python IDE supports this. The default hds in Spyder is 1.5.

  • pixels (int, optional) – Each cell in a matrix takes up pixels * pixels on screen. This will override scaling.

  • title (string, optional) – The centered suptitle of the figure.

  • fontsize (int, default: 8) – Size of the title and subtitles.

  • colorbar (bool, default: False) – Show colorbar.

  • clim (list, optional) – Colorbar range limit applied over all matrices. If clim is None, each matrix will have its own colorbar range limit separately.

  • discrete (bool, default: False) – Show discrete colorbar.

  • center (bool, default: True) – Available only when discrete is True.

  • cmap (str, default: 'rainbow') – The colormap.

  • cmin (str, default: 'gray') – The color of values lower than the range limit clim.

  • cmax (str, default: 'black') – The color of values higher than the range limit clim.

  • cnan (str, default: 'white') – The color of NaN. Used to differentiate real zeros and NaN in sparse matrices.

  • save_fig (bool, default: True) – Save the figure to output directory.

PyBMF.utils.evaluate_utils module

PyBMF.utils.evaluate_utils.eval(metrics, task, X_gt, X_pd=None, U=None, V=None)[source]

Evaluate with given metrics.

Parameters:
  • X_gt (ndarray or spmatrix)

  • X_pd (ndarray or spmatrix, optional)

  • U (spmatrix, optional)

  • V (spmatrix, optional)

  • metrics (list of str) – List of metric names.

  • task (str in {'prediction', 'reconstruction'}) – If task == ‘prediction’, it ignores the missing values and only uses the triplet from the spmatrix. The triplet may contain zeros, depending on whether negative sampling has been used. If task == ‘reconstruction’, it uses the whole matrix, which treats all missing values in the spmatrix as zeros.

PyBMF.utils.evaluate_utils.header(names, levels, depth=None)[source]

Create multi-level headers.

>>> header(['time', 'k', 'score'], levels=3, depth=2)
[('', 'time', ''), ('', 'k', ''), ('', 'score', '')]

PyBMF.utils.evaluate_utils.record(df_dict, df_name, columns, records, verbose=False)[source]

Create and add records to a dataframe in a logs dict.

Parameters:
  • df_dict (dict)

  • df_name (str)

  • columns (list of str or str tuple)

  • records (list)

  • verbose (bool, default: False)

  • caption (str, optional)

PyBMF.utils.experiment_utils module

PyBMF.utils.experiment_utils.get_model_by_path(path_list)[source]

PyBMF.utils.experiment_utils.get_model_by_time(root='../saved_models/', model_name='Asso', time_start='24-06-03 04:43', time_end='24-06-03 07:00')[source]

PyBMF.utils.generator_utils module

PyBMF.utils.generator_utils.add_noise(X, noise, seed=None, rng=None)[source]

Add noise to a matrix.

Parameters:
  • X (ndarray, spmatrix)

  • noise (list of 2 float in [0, 1]) – Probabilities for false negative (p_pos) and false positive (p_neg).

  • seed (optional) – Random seed.

  • rng (optional) – Random number generator.
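
A minimal dense-only sketch of the flipping scheme implied by the noise parameter, assuming p_pos is the probability of turning a 1 into a 0 (a false negative) and p_neg the probability of turning a 0 into a 1 (a false positive). add_noise_sketch is a hypothetical helper, not PyBMF's implementation.

```python
import numpy as np

def add_noise_sketch(X, noise, seed=None):
    """Flip entries of a binary matrix with probabilities (p_pos, p_neg)."""
    p_pos, p_neg = noise
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    # Draw independent uniform numbers to decide which entries to flip.
    flip_pos = (X == 1) & (rng.random(X.shape) < p_pos)  # 1 -> 0
    flip_neg = (X == 0) & (rng.random(X.shape) < p_neg)  # 0 -> 1
    Y = X.copy()
    Y[flip_pos] = 0
    Y[flip_neg] = 1
    return Y

X = np.array([[1, 0], [0, 1]])
all_removed = add_noise_sketch(X, noise=[1.0, 0.0], seed=0)
unchanged = add_noise_sketch(X, noise=[0.0, 0.0], seed=0)
```

With p_pos = 1.0 every 1 is removed, and with both probabilities at 0.0 the matrix is returned unchanged, which makes the two extremes easy to check.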

PyBMF.utils.generator_utils.reverse_index(idx)[source]

Reverse index.

Example:

idx = np.array([0, 1, 2, 4, 5, 3])
inv = reverse_index(idx)
# inv = [0, 1, 2, 5, 3, 4]
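
The example is consistent with computing the inverse of a permutation. A minimal sketch under that assumption follows; reverse_index_sketch is a hypothetical name and not necessarily how PyBMF implements it.

```python
import numpy as np

def reverse_index_sketch(idx):
    """Inverse permutation: inv[idx[i]] == i for every position i."""
    idx = np.asarray(idx)
    inv = np.empty_like(idx)
    inv[idx] = np.arange(len(idx))
    return inv

idx = np.array([0, 1, 2, 4, 5, 3])
inv = reverse_index_sketch(idx)
restored = idx[inv]  # applying the inverse restores the natural order
```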

PyBMF.utils.generator_utils.shuffle_by_dim(X, dim, seed=None, rng=None)[source]

Shuffle a matrix by dimension.

Parameters:
  • dim (int in {0, 1}) – 0, shuffle by rows. 1, shuffle by columns.

Same as:

np.take(X, idx, axis=dim, out=X)
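
A minimal dense-only sketch of shuffling along one dimension with a random permutation. Returning the permutation idx alongside the shuffled copy, and the name shuffle_by_dim_sketch, are assumptions made for illustration.

```python
import numpy as np

def shuffle_by_dim_sketch(X, dim, seed=None):
    """Return a copy of X shuffled along axis dim, plus the permutation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[dim])
    return np.take(X, idx, axis=dim), idx

X = np.arange(12).reshape(3, 4)
Y, idx = shuffle_by_dim_sketch(X, dim=1, seed=0)  # shuffle the columns
```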

PyBMF.utils.generator_utils.shuffle_matrix(X, seed=None, rng=None)[source]

Shuffle a matrix.

PyBMF.utils.metrics module

PyBMF.utils.metrics.ACC(gt, pd, axis=None)[source]

Accuracy.

PyBMF.utils.metrics.ERR(gt, pd, axis=None)[source]

Error rate.

PyBMF.utils.metrics.F1(gt, pd, axis=None)[source]

F1 score.

tp = TP(gt, pd, axis)
fp = FP(gt, pd, axis)
fn = FN(gt, pd, axis)
return 2 * tp / (2 * tp + fp + fn)
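
The formula can be checked by hand on a small pair of binary vectors. The sketch below counts true positives, false positives and false negatives directly with NumPy; f1_sketch is a hypothetical helper that handles 1d dense input only and ignores the axis argument.

```python
import numpy as np

def f1_sketch(gt, pd):
    """F1 = 2*TP / (2*TP + FP + FN) for 1d binary arrays."""
    gt, pd = np.asarray(gt), np.asarray(pd)
    tp = np.sum((gt == 1) & (pd == 1))  # predicted 1, truly 1
    fp = np.sum((gt == 0) & (pd == 1))  # predicted 1, truly 0
    fn = np.sum((gt == 1) & (pd == 0))  # predicted 0, truly 1
    return 2 * tp / (2 * tp + fp + fn)

gt = np.array([1, 1, 0, 0, 1])
pd = np.array([1, 0, 1, 0, 1])
f1 = f1_sketch(gt, pd)  # TP = 2, FP = 1, FN = 1, so F1 = 4 / 6
```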

PyBMF.utils.metrics.FN(gt, pd, axis=None)[source]

PyBMF.utils.metrics.FNR(gt, pd, axis=None)[source]

Miss rate or false negative rate.

PyBMF.utils.metrics.FP(gt, pd, axis=None)[source]

PyBMF.utils.metrics.FPR(gt, pd, axis=None)[source]

Fall-out or false positive rate.

PyBMF.utils.metrics.MAE(gt, pd, axis=None)[source]

PyBMF.utils.metrics.PPV(gt, pd, axis=None)[source]

Precision or positive predictive value.

PyBMF.utils.metrics.RMSE(gt, pd, axis=None)[source]

PyBMF.utils.metrics.TN(gt, pd, axis=None)[source]

PyBMF.utils.metrics.TNR(gt, pd, axis=None)[source]

Specificity, selectivity or true negative rate.

PyBMF.utils.metrics.TP(gt, pd, axis=None)[source]

PyBMF.utils.metrics.TPR(gt, pd, axis=None)[source]

Sensitivity, recall, hit rate, or true positive rate.

PyBMF.utils.metrics.coverage_score(gt, pd, w_fp=0.5, w_fn=None, axis=None)[source]

Coverage score function to be maximized.

Measure the coverage of gt using pd.

Parameters:

axis (int in {0, 1}, default: None) – The dimension to which the basis belongs. When axis is None, return the overall coverage score. When axis is 0, the basis is at dimension 0, thus return the column-wise coverage scores.

PyBMF.utils.metrics.description_length(gt, U, V, pd=None, w_model=1.0, w_fp=1.0, w_fn=1.0)[source]

The vanilla description length function.

Will compute X_pd from U and V if pd is None.

PyBMF.utils.metrics.get_metrics(gt, pd, metrics, axis=None)[source]

Get results of the metrics all at once.

Metrics from sklearn.metrics are included as a sanity check. Their inputs must be binary arrays, which makes them slow and less flexible.

Parameters:
  • gt (array, spmatrix) – Ground truth, can be 1d array, 2d dense or sparse matrix.

  • pd (array, spmatrix) – Prediction, can be 1d array, 2d dense or sparse matrix. When the inputs are matrices, row and column-wise measurement can be conducted by defining axis.

  • metrics (list of str) – The name of metrics.

  • axis (int in {0, 1}) – When axis == 0, the result contains the column-wise measurements and has one entry per column.

Returns:

results

Return type:

list

PyBMF.utils.metrics.invert(X)[source]

PyBMF.utils.metrics.weighted_error(gt, pd, w_fp=0.5, w_fn=None, axis=None)[source]

Coverage cost function to be minimized.

PyBMF.utils.sparse_utils module

PyBMF.utils.sparse_utils.bool_to_index(x)[source]

PyBMF.utils.sparse_utils.check_sparse(X, sparse=None)[source]

PyBMF.utils.sparse_utils.index_to_bool(x)[source]

PyBMF.utils.sparse_utils.sparse_indexing(X, indices)[source]

PyBMF.utils.sparse_utils.to_dense(X, squeeze=False, keep_nan=False)[source]

Convert to dense array.

PyBMF.utils.sparse_utils.to_sparse(X, type='csr')[source]

Convert to sparse matrix.

Guide for choosing sparsity types: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html

PyBMF.utils.sparse_utils.to_triplet(X)[source]

Convert a dense or sparse matrix to a UIR triplet.
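
A minimal sketch for dense input, assuming a UIR triplet means three aligned arrays of row (user) indices, column (item) indices and values (ratings) for the non-zero entries. to_triplet_sketch is a hypothetical name. For sparse input the same information is available from the coordinate (COO) format, which this NumPy-only sketch does not cover.

```python
import numpy as np

def to_triplet_sketch(X):
    """Return (rows, cols, values) of the non-zero entries of a dense X."""
    X = np.asarray(X)
    rows, cols = np.nonzero(X)  # indices in row-major order
    return rows, cols, X[rows, cols]

X = np.array([[0, 2],
              [3, 0]])
rows, cols, vals = to_triplet_sketch(X)
```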

Module contents