PyBMF.utils package¶

Submodules¶

PyBMF.utils.boolean_utils module¶

PyBMF.utils.boolean_utils.add(X, Y, sparse=None, boolean=False)[source]¶

Matrix-matrix addition for both dense and sparse input with Boolean logic support.

Also supports regular matrix-constant addition.

Parameters:

X (ndarray, spmatrix, int, float)
Y (ndarray, spmatrix, int, float)
sparse (bool, default: None)
boolean (bool, default: False)
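As a rough illustration of what Boolean logic means for addition, the sketch below assumes that Boolean addition is an element-wise logical OR, so that 1 + 1 = 1, whereas regular addition gives 2. It works on plain nested lists instead of ndarray or spmatrix, and add_sketch is a hypothetical helper, not the PyBMF implementation.

```python
def add_sketch(X, Y, boolean=False):
    """Element-wise addition of two equally shaped nested lists.

    Assumption: with boolean=True, addition is a logical OR on binary input.
    """
    if boolean:
        return [[int(bool(x) or bool(y)) for x, y in zip(rx, ry)]
                for rx, ry in zip(X, Y)]
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


X = [[1, 0], [1, 1]]
Y = [[1, 1], [0, 1]]
regular = add_sketch(X, Y)                # [[2, 1], [1, 2]]
boolean = add_sketch(X, Y, boolean=True)  # [[1, 1], [1, 1]]
```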

PyBMF.utils.boolean_utils.dot(u, v, boolean=False)[source]¶

Vector-vector inner product for both dense and sparse input with Boolean logic support.

Parameters:

u (ndarray, spmatrix)
v (ndarray, spmatrix)
boolean (bool, default: False)

PyBMF.utils.boolean_utils.ismat(X)[source]¶

Whether X is a matrix or not.

Attention: np.matrix is NOT a supported matrix type in PyBMF.

PyBMF.utils.boolean_utils.isnum(X)[source]¶: Whether X is a number or not.

PyBMF.utils.boolean_utils.matmul(U, V, sparse=None, boolean=False)[source]¶

Matrix-matrix multiplication for both dense and sparse input with Boolean logic support.

Parameters:

U (ndarray, spmatrix)
V (ndarray, spmatrix)
sparse (bool, default: None)
boolean (bool, default: False)
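The difference between the regular and the Boolean product can be sketched as follows. The example assumes that the Boolean product is an OR of ANDs, which is the same as capping every entry of the regular product at 1. It uses plain nested lists, and matmul_sketch is a hypothetical name rather than the library's code.

```python
def matmul_sketch(U, V, boolean=False):
    """Multiply an (m, k) nested list U by a (k, n) nested list V."""
    k = len(V)
    n = len(V[0])
    out = []
    for row in U:
        out_row = []
        for j in range(n):
            total = sum(row[t] * V[t][j] for t in range(k))
            # Assumption: the Boolean product caps every entry at 1.
            out_row.append(int(total > 0) if boolean else total)
        out.append(out_row)
    return out


U = [[1, 1], [0, 1]]
V = [[1, 0], [1, 1]]
regular = matmul_sketch(U, V)                # [[2, 1], [1, 1]]
boolean = matmul_sketch(U, V, boolean=True)  # [[1, 1], [1, 1]]
```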

PyBMF.utils.boolean_utils.multiply(U, V, sparse=None, boolean=False)[source]¶

Point-wise multiplication for both dense and sparse input with Boolean logic support.

For vector-vector or matrix-matrix Hadamard product. Also supports regular constant-vector and constant-matrix products.

Parameters:

U (ndarray, spmatrix, int, float)
V (ndarray, spmatrix, int, float)
sparse (bool, default: None) – Whether to enforce a sparse output. If None, keep the same dtype as input.
boolean (bool, default: False) – Whether to use Boolean logic on the binary input.

PyBMF.utils.boolean_utils.power(X, n)[source]¶: Matrix power for both dense and sparse input.

PyBMF.utils.boolean_utils.subtract(X, Y, sparse=False, boolean=False)[source]¶

Matrix-matrix subtraction for both dense and sparse input with Boolean logic support.

Also supports regular matrix-constant subtraction.

Parameters:

X (ndarray, spmatrix, int, float)
Y (ndarray, spmatrix, int, float)
sparse (bool, default: False)
boolean (bool, default: False)

PyBMF.utils.collective_display_utils module¶

PyBMF.utils.collective_display_utils.get_settings(Xs, factors, Us=None)[source]¶

Get display settings.

Used in the show_matrix() wrapper for CMF models.

Parameters:

Xs (list of spmatrix or ndarray)
factors (list of int list)
Us (list of spmatrix or ndarray, optional)
a, b, f (factor id) – Note that a factor id may not be equal to its index in Us and factor_info, especially when the factor ids do not start from 0. split_factor_list only accepts the complete factors list.

PyBMF.utils.collective_display_utils.sort_matrices(Xs, factors)[source]¶

Sort out matrices.

Transpose the matrices when necessary and return the positions.

PyBMF.utils.collective_evaluate_utils module¶

PyBMF.utils.collective_evaluate_utils.collective_cover(gt, pd, w, axis, starts=None)[source]¶

The collective wrapper for cover function.

Parameters:

gt (spmatrix) – The concatenated ground-truth matrix.
pd (spmatrix) – The concatenated predicted matrix.
w (list of float) – The allowed ratio of false positives in each matrix.
axis (int in {0, 1}) – The dimension of the basis.
starts (list of int) – The starting point of each matrix on the other dimension rather than the dimension of the basis.

Returns:

scores – The scores of each basis over each submatrix.

Return type:

(n_submat, n_basis) array

PyBMF.utils.collective_evaluate_utils.harmonic_score(scores)[source]¶

Harmonic score(s) of n sets of scores.

Parameters:: scores ((n, k) array)
Returns:: s
Return type:: float or (1, k) array
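A minimal sketch of a harmonic score, assuming it is the column-wise harmonic mean of the n score sets. The exact formula is not stated here, so treat this as an illustration only. harmonic_score_sketch is a hypothetical name, and zero scores would need separate handling.

```python
def harmonic_score_sketch(scores):
    """Column-wise harmonic mean of an (n, k) nested list of positive scores."""
    n = len(scores)
    k = len(scores[0])
    return [n / sum(1.0 / scores[i][j] for i in range(n)) for j in range(k)]


scores = [[0.5, 1.0],
          [1.0, 1.0]]
s = harmonic_score_sketch(scores)  # one harmonic mean per column
```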

PyBMF.utils.collective_evaluate_utils.weighted_score(scores, weights)[source]¶

Weighted score(s) of n sets of scores.

Parameters:

scores ((n, k) array)
weights ((1, n) array)

Returns:

Return type:

float or (1, k) array
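A minimal sketch of a weighted score, assuming it is the product of the (1, n) weights with the (n, k) scores. Whether the weights are normalized first is not stated here, so the sketch applies them as given. weighted_score_sketch is a hypothetical name.

```python
def weighted_score_sketch(scores, weights):
    """Weighted sum over n score sets, giving one value per column."""
    n = len(scores)
    k = len(scores[0])
    return [sum(weights[i] * scores[i][j] for i in range(n)) for j in range(k)]


scores = [[0.5, 1.0],
          [1.0, 0.0]]
weights = [0.5, 0.5]
s = weighted_score_sketch(scores, weights)  # [0.75, 0.5]
```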

PyBMF.utils.collective_transform_utils module¶

PyBMF.utils.collective_transform_utils.concat_Us_into_U(Us, factors)[source]¶

Concatenate factors of collective matrices Us into a single pair of factors U.

Used in some collective models.

PyBMF.utils.collective_transform_utils.concat_Xs_into_X(Xs, factors)[source]¶

Concatenate collective matrices Xs into a single matrix X.

Used in BaseData and some collective models.

PyBMF.utils.collective_transform_utils.concat_factor_info(factor_info, factors)[source]¶: Concatenate factor_info of collective matrices Xs into a length-2 factor_info of a single matrix X.

PyBMF.utils.collective_transform_utils.split_U_into_Us(U, V, factors, factor_starts)[source]¶

Separate concatenated factors (U, V) into collective factors Us.

Used in some collective models.

PyBMF.utils.collective_transform_utils.split_X_into_Xs(X, factors, factor_starts)[source]¶

Split concatenated single matrix X into collective matrices Xs.

Used in some collective models.

PyBMF.utils.collective_utils module¶

PyBMF.utils.collective_utils.get_dummy_factor_info(Xs, factors)[source]¶: Get dummy factor_info for collective matrices.

PyBMF.utils.collective_utils.get_factor_dims(Xs, factors)[source]¶: The dimensions of each factor.

PyBMF.utils.collective_utils.get_factor_list(factors)[source]¶

Get sorted factor list.

Parameters:: factors (list of int list) – List of factor id pairs, indicating the row and column factors of each matrix. Please follow the convention that factors are numbered consecutively and starting from 0. There must exist a matrix with its factors numbered as [0, 1].
Returns:: factor_list – List of sorted factor ids.
Return type:: list
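The convention above can be illustrated with a small sketch that collects the distinct factor ids from the factor pairs and sorts them. get_factor_list_sketch is a hypothetical stand-in written for this example.

```python
def get_factor_list_sketch(factors):
    """Sorted list of the distinct factor ids used by any matrix."""
    return sorted({f for pair in factors for f in pair})


# Three matrices: factors (0, 1), (0, 2) and (2, 3).
factors = [[0, 1], [0, 2], [2, 3]]
factor_list = get_factor_list_sketch(factors)  # [0, 1, 2, 3]
```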

PyBMF.utils.collective_utils.get_factor_starts(Xs, factors)[source]¶: The starting point of each factor when multiple factors Us are concatenated into a pair of row and column factor U.

PyBMF.utils.collective_utils.get_matrices(factors)[source]¶

List of related matrices given factors.

This is the inverse of ‘factors’, the list of related factors given matrices.
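The relation between the two lists can be sketched as below: factors maps each matrix to its two factor ids, and the inverse maps each factor id to the matrices that use it. The return layout (a list indexed by factor id) is an assumption, and get_matrices_sketch is a hypothetical name.

```python
def get_matrices_sketch(factors):
    """For each factor id, list the indices of the matrices that use it."""
    n_factors = max(f for pair in factors for f in pair) + 1
    matrices = [[] for _ in range(n_factors)]
    for m, pair in enumerate(factors):
        for f in pair:
            matrices[f].append(m)
    return matrices


factors = [[0, 1], [0, 2]]
matrices = get_matrices_sketch(factors)  # [[0, 1], [0], [1]]
```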

PyBMF.utils.collective_utils.split_factor_list(factors)[source]¶

Classify factors into row and column factors.

Please follow the convention that factors are numbered consecutively, starting from 0. There must exist a matrix with its factors numbered as [0, 1]. Factor 0 and those on the same side as 0 are regarded as row factors. Factor 1 and those on the same side as 1 are regarded as column factors.

List f stores the type of each factor with 0 for unclassified, 1 for row factor and 2 for column factor.
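The classification described above can be sketched as a simple propagation over the list f. Since the two factors of a matrix lie on opposite sides, a classified factor determines the type of its partner. The return format (two lists) and the name split_factor_list_sketch are assumptions made for this example.

```python
def split_factor_list_sketch(factors):
    """Classify factor ids into row factors and column factors."""
    n_factors = max(f for pair in factors for f in pair) + 1
    f = [0] * n_factors  # 0: unclassified, 1: row factor, 2: column factor
    f[0], f[1] = 1, 2    # the matrix with factors [0, 1] fixes the two sides
    changed = True
    while changed:
        changed = False
        for a, b in factors:
            # The two factors of one matrix have opposite types (1 <-> 2).
            if f[a] != 0 and f[b] == 0:
                f[b] = 3 - f[a]
                changed = True
            elif f[b] != 0 and f[a] == 0:
                f[a] = 3 - f[b]
                changed = True
    row_factors = [i for i, t in enumerate(f) if t == 1]
    col_factors = [i for i, t in enumerate(f) if t == 2]
    return row_factors, col_factors


factors = [[0, 1], [0, 2], [3, 2]]
rows, cols = split_factor_list_sketch(factors)  # rows [0, 3], cols [1, 2]
```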

PyBMF.utils.common module¶

PyBMF.utils.common.binarize(X, threshold=0.5)[source]¶

Binarize a matrix. Also known as the Heaviside step function.

Parameters:

X (float ndarray, spmatrix)
threshold (float, default: 0.5)

Returns:

result

Return type:

int ndarray, spmatrix
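A minimal sketch of thresholding on a nested list. Whether a value exactly equal to the threshold maps to 1 or 0 is not stated here, so the sketch assumes greater-than-or-equal. binarize_sketch is a hypothetical name.

```python
def binarize_sketch(X, threshold=0.5):
    """Map each entry to 1 if it reaches the threshold, otherwise to 0."""
    # Assumption: entries equal to the threshold count as 1.
    return [[int(x >= threshold) for x in row] for row in X]


X = [[0.2, 0.5], [0.9, 0.49]]
B = binarize_sketch(X)  # [[0, 1], [1, 0]]
```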

PyBMF.utils.common.d_sigmoid(X)[source]¶

PyBMF.utils.common.get_prediction(U, V, boolean=True, sparse=True)[source]¶

Get prediction.

Parameters:

U (array, spmatrix)
V (array, spmatrix)
boolean (bool) – Whether to apply Boolean multiplication.

PyBMF.utils.common.get_prediction_with_threshold(U, V, u=None, v=None, us=None, vs=None, sparse=True)[source]¶

Get prediction after thresholding factors U and V.

Parameters:

U (ndarray, spmatrix) – The factor matrix.
V (ndarray, spmatrix) – The factor matrix.
u (float) – The shared threshold across all factors for U.
v (float) – The shared threshold across all factors for V.
us (list of k floats) – The individual thresholds for each factor in U.
vs (list of k floats) – The individual thresholds for each factor in V.

Returns:

X_pd – The prediction matrix.

Return type:

ndarray, spmatrix

PyBMF.utils.common.get_residual(X, U, V)[source]¶: Get residual matrix of X.

PyBMF.utils.common.get_rng(seed, rng)[source]¶

Get random number generator.

Parameters:

seed (optional) – Random seed.
rng (optional) – Random number generator.

PyBMF.utils.common.safe_indexing(X, indices)[source]¶

Return items or rows from X using indices

Allows simple indexing of lists or arrays. Modified from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/__init__.py

Parameters:

X (array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series) – Data from which to sample rows or items.
indices (array-like of int) – Indices according to which X will be subsampled.

Returns:

subset – Subset of X on the first axis.

PyBMF.utils.common.to_interval(X, min, max)[source]¶

Transform data into interval [min, max].

Parameters:

X (ndarray)
min (float)
max (float)

TODO: support spmatrix.
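One common way to map data into an interval is min-max scaling. The sketch below assumes that this is the intended transform, which is not confirmed here. to_interval_sketch, lo and hi are hypothetical names, and a constant input would need separate handling.

```python
def to_interval_sketch(X, lo, hi):
    """Linearly rescale a nested list so its values span [lo, hi]."""
    flat = [x for row in X for x in row]
    x_min, x_max = min(flat), max(flat)
    scale = (hi - lo) / (x_max - x_min)  # assumes X is not constant
    return [[lo + (x - x_min) * scale for x in row] for row in X]


X = [[0.0, 5.0], [10.0, 2.5]]
Y = to_interval_sketch(X, -1.0, 1.0)  # smallest entry -> -1, largest -> 1
```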

PyBMF.utils.data_utils module¶

PyBMF.utils.data_utils.mean(X, axis=None)[source]¶

Row and column-wise mean.

Parameters:

X (ndarray, spmatrix)
axis (int, optional)

Returns:

result

Return type:

tuple, ndarray

PyBMF.utils.data_utils.median(X, axis=None)[source]¶

Row and column-wise median.

Parameters:

X (ndarray, spmatrix)
axis (int, optional)

Returns:

result

Return type:

tuple, ndarray

PyBMF.utils.data_utils.sample(X, axis, factor_info=None, idx=None, n_samples=None, seed=None)[source]¶

Sample a matrix by its row or column. Update factor_info if provided.

Parameters:

axis (int) – Which dimension to down-sample: 0 samples rows, 1 samples columns.
factor_info (list, tuple) – The factor_info for a single matrix X or collective matrices Xs. For X, factor_info is a list of tuples. For Xs, factor_info is a tuple.
idx – The indices to sample with.
n_samples – Randomly down-sample to this length.
seed – Seed for down-sampling.

PyBMF.utils.data_utils.sort_order(order)[source]¶

Fix the gap after down-sampling.

E.g. [1, 6, 4, 2] will be turned into [0, 3, 2, 1].
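The gap-fixing step can be sketched as replacing every value by its rank among the sorted values. The sketch assumes the values are distinct, and sort_order_sketch is a hypothetical name.

```python
def sort_order_sketch(order):
    """Replace each value by its rank, closing gaps left by down-sampling."""
    ranks = {value: rank for rank, value in enumerate(sorted(order))}
    return [ranks[value] for value in order]


new_order = sort_order_sketch([1, 6, 4, 2])  # [0, 3, 2, 1]
```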

PyBMF.utils.data_utils.sum(X, axis=None)[source]¶

Row and column-wise sum.

Parameters:

X (ndarray, spmatrix)
axis (int, optional)

Returns:

result

Return type:

tuple, ndarray

PyBMF.utils.data_utils.summarize(X)[source]¶

To show the summary of a matrix.

Parameters:: X (ndarray, spmatrix)

PyBMF.utils.dataframe_utils module¶

PyBMF.utils.dataframe_utils._make_html(file_path, file_name, html)[source]¶

Make an HTML file.

Parameters:

file_path (str) – Path to save the html file.
file_name (str) – Name of the html file.
html (str) – HTML code.

Returns:

full_path – Full path of the html file.

Return type:

str

PyBMF.utils.dataframe_utils._make_name(model=None, model_name=None, format='%Y-%m-%d %H-%M-%S-%f ')[source]¶

Make a file name for an instance of a model.

Milliseconds are added to the end of the name to make it unique.

Parameters:

model (object) – Model object.
model_name (str) – Name of the model.
format (str) – Format of the timestamp.

Returns:

model_name – Name of the model.

Return type:

str

PyBMF.utils.dataframe_utils._open_html(full_path, browser_path)[source]¶

Open an HTML file in a browser.

Parameters:

full_path (str) – Full path of the html file.
browser_path (str) – Path of the browser.

PyBMF.utils.dataframe_utils.get_config(key)[source]¶

Get config value from settings.ini.

Parameters:: key (str) – Key in settings.ini.
Returns:: value – Value in settings.ini.
Return type:: str

PyBMF.utils.dataframe_utils.log2html(df, log_name, open_browser=True, log_path=None, browser_path=None)[source]¶

Display and save a dataframe in HTML, and open it in browser if needed.

Please create settings.ini or set log_path and browser_path manually before calling.

Parameters:

df (pandas.DataFrame) – Dataframe to be displayed in HTML.
log_name (str) – Name of the log file.
open_browser (bool) – Whether to open in browser.
log_path (str) – Path to save the log file.
browser_path (str) – Path of the browser.

PyBMF.utils.dataframe_utils.log2latex(df, log_name, open_browser=True, log_path=None, browser_path=None)[source]¶

Display a dataframe in TeX on overleaf.com.

This tool automatically highlights the maximum values in each column.

Parameters:

df (pandas.DataFrame) – Dataframe to be displayed in TeX.
log_name (str) – Name of the log file.
open_browser (bool) – Whether to open in browser.
log_path (str) – Path to save the log file.
browser_path (str) – Path of the browser.

PyBMF.utils.decorator_utils module¶

PyBMF.utils.decorator_utils.ignore_warnings(func)[source]¶

PyBMF.utils.decorator_utils.timeit(func)[source]¶

PyBMF.utils.display module¶

PyBMF.utils.display.fill_nan(X, mask)[source]¶

Fill the missing values of a sparse matrix with NaN, so that missing values in a sparse matrix are displayed differently from zeros.

Used for displaying matrices while identifying missing values.

Parameters:

X (ndarray or spmatrix) – The matrix with values to be filled with NaN.
mask (spmatrix) – The masking matrix. Explicit zeros in mask are not considered as missing. Note that there are several ways to preserve zeros in a sparse matrix. BaseSplit.load_neg_data() is one of them.

Returns:

Y – The dense matrix with NaN in it.

Return type:

ndarray
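The distinction between explicit zeros and missing values can be sketched as follows. To stay free of SciPy, the mask is modelled here as a set of stored (row, column) positions, which is an assumption made for illustration. fill_nan_sketch and stored are hypothetical names.

```python
import math


def fill_nan_sketch(X, stored):
    """Keep entries at stored positions and mark every other entry as NaN."""
    return [[float(x) if (i, j) in stored else math.nan
             for j, x in enumerate(row)]
            for i, row in enumerate(X)]


X = [[1, 0], [0, 0]]
stored = {(0, 0), (0, 1)}  # (0, 1) is an explicit zero, so it is not missing
Y = fill_nan_sketch(X, stored)  # the first row is kept, the second row is NaN
```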

PyBMF.utils.display.get_size_inches(scaling, ppi, hds, pixels, width_cells, height_cells)[source]¶

Get figure size in inches.

Parameters:

width_cells (int) – Figure width in the number of matrix cells.
height_cells (int) – Figure height in the number of matrix cells.

Returns:

width_inches (float) – Figure width in inches.
height_inches (float) – Figure height in inches.

PyBMF.utils.display.show_factor_distribution(U, V, resolution=100, show_hist=False, show_minmax=True, remove_below=None, us=None, vs=None)[source]¶: Show the distribution of real-valued factor matrices U and V.

PyBMF.utils.display.show_matrix(settings, scaling=1.0, ppi=96, hds=1.5, pixels=None, title=None, fontsize=8, keep_nan=True, colorbar=False, clim=None, discrete=False, center=True, cmap='rainbow', cmin='gray', cmax='black', cnan='white', save_fig=True)[source]¶

Show the matrix and factors.

Parameters:

settings (list of tuple) – A list of (data, location, title) tuple.
scaling (float, default: 1.0) – The scaling factor. The default scaling of 1.0 is the maximum size at which a figure can be displayed within the screen.
ppi (int, default: 96) – Pixels per inch. The ppi of a 4K 24” display is 96.
hds (float, default: 1.5) – High DPI scaling, if your Python IDE supports this. The default hds in Spyder is 1.5.
pixels (int, optional) – Each cell in a matrix takes up pixels * pixels on screen. This will override scaling.
title (string, optional) – The centered suptitle of the figure.
fontsize (int, default: 8) – Size of the title and subtitles.
colorbar (bool, default: False) – Show colorbar.
clim (list, optional) – Colorbar range limit applied over all matrices. If clim is None, each matrix will have its own colorbar range limit separately.
discrete (bool, default: False) – Show discrete colorbar.
center (bool, default: True) – Available only when discrete is True.
cmap (str, default: 'rainbow') – The colormap.
cmin (str, default: 'gray') – The color of values lower than the range limit clim.
cmax (str, default: 'black') – The color of values higher than the range limit clim.
cnan (str, default: 'white') – The color of NaN. To differentiate real zeros and NaN in sparse matrices.
save_fig (bool, default: True) – Save the figure to output directory.

PyBMF.utils.evaluate_utils module¶

PyBMF.utils.evaluate_utils.eval(metrics, task, X_gt, X_pd=None, U=None, V=None)[source]¶

Evaluate with given metrics.

Parameters:

X_gt (ndarray or spmatrix)
X_pd (ndarray or spmatrix, optional)
U (spmatrix, optional)
V (spmatrix, optional)
metrics (list of str) – List of metric names.
task (str in {'prediction', 'reconstruction'}) – If task == ‘prediction’, it ignores the missing values and only uses the triplet from the spmatrix. The triplet may contain zeros, depending on whether negative sampling has been used. If task == ‘reconstruction’, it uses the whole matrix, which treats all missing values in the spmatrix as zeros.

PyBMF.utils.evaluate_utils.header(names, levels, depth=None)[source]¶

Create multi-level headers.

>>> header(['time', 'k', 'score'], levels=3, depth=2)
[('', 'time', ''), ('', 'k', ''), ('', 'score', '')]

PyBMF.utils.evaluate_utils.record(df_dict, df_name, columns, records, verbose=False)[source]¶

Create and add records to a dataframe in a logs dict.

Parameters:

df_dict (dict)
df_name (str)
columns (list of str or str tuple)
records (list)
verbose (bool, default: False)
caption (str, optional)

PyBMF.utils.experiment_utils module¶

PyBMF.utils.experiment_utils.get_model_by_path(path_list)[source]¶

PyBMF.utils.experiment_utils.get_model_by_time(root='../saved_models/', model_name='Asso', time_start='24-06-03 04:43', time_end='24-06-03 07:00')[source]¶

PyBMF.utils.generator_utils module¶

PyBMF.utils.generator_utils.add_noise(X, noise, seed=None, rng=None)[source]¶

Add noise to a matrix.

Parameters:

X (ndarray, spmatrix)
noise (list of 2 float in [0, 1]) – Probabilities for false negative (p_pos) and false positive (p_neg).
seed (optional) – Random seed.
rng (optional) – Random number generator.
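A minimal sketch of this kind of noise, assuming that every entry is flipped independently: a 1 becomes 0 with probability p_pos, and a 0 becomes 1 with probability p_neg. The actual sampling scheme is not stated here. add_noise_sketch is a hypothetical name, and the standard library random module stands in for the random number generator.

```python
import random


def add_noise_sketch(X, noise, seed=None):
    """Flip the entries of a binary nested list with the given probabilities."""
    p_pos, p_neg = noise
    rng = random.Random(seed)
    Y = []
    for row in X:
        new_row = []
        for x in row:
            # Ones flip with probability p_pos, zeros with probability p_neg.
            p_flip = p_pos if x == 1 else p_neg
            new_row.append(1 - x if rng.random() < p_flip else x)
        Y.append(new_row)
    return Y


X = [[1, 0], [0, 1]]
unchanged = add_noise_sketch(X, [0.0, 0.0], seed=0)  # no entry flips
inverted = add_noise_sketch(X, [1.0, 1.0], seed=0)   # every entry flips
```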

PyBMF.utils.generator_utils.reverse_index(idx)[source]¶

Reverse index.

Example:

idx = np.array([0, 1, 2, 4, 5, 3])
inv = reverse_index(idx)  # inv = [0, 1, 2, 5, 3, 4]
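The example is consistent with computing the inverse permutation, which can be sketched on a plain list as below. reverse_index_sketch is a hypothetical name used only for this illustration.

```python
def reverse_index_sketch(idx):
    """Inverse permutation: inv[idx[i]] == i for every position i."""
    inv = [0] * len(idx)
    for position, target in enumerate(idx):
        inv[target] = position
    return inv


idx = [0, 1, 2, 4, 5, 3]
inv = reverse_index_sketch(idx)  # [0, 1, 2, 5, 3, 4]
```

Indexing idx with inv restores the identity order, which is what makes the result useful for undoing a shuffle.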

PyBMF.utils.generator_utils.shuffle_by_dim(X, dim, seed=None, rng=None)[source]¶

Shuffle a matrix by dimension.

Parameters:

dim – 0 shuffles by rows, 1 shuffles by columns.

Same as np.take(X, idx, axis=dim, out=X).

PyBMF.utils.generator_utils.shuffle_matrix(X, seed=None, rng=None)[source]¶: Shuffle a matrix

PyBMF.utils.metrics module¶

PyBMF.utils.metrics.ACC(gt, pd, axis=None)[source]¶: Accuracy.

PyBMF.utils.metrics.ERR(gt, pd, axis=None)[source]¶: Error rate.

PyBMF.utils.metrics.F1(gt, pd, axis=None)[source]¶

F1 score.

tp = TP(gt, pd, axis)
fp = FP(gt, pd, axis)
fn = FN(gt, pd, axis)
return 2 * tp / (2 * tp + fp + fn)
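The formula can be checked by hand on a small binary example. The sketch below counts the confusion-matrix cells over whole nested lists, which corresponds to axis=None. confusion_counts and f1_sketch are hypothetical helpers, not the functions of this module.

```python
def confusion_counts(gt, pd):
    """Count TP, FP, FN and TN over all entries of two binary nested lists."""
    pairs = [(g, p) for rg, rp in zip(gt, pd) for g, p in zip(rg, rp)]
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    return tp, fp, fn, tn


def f1_sketch(gt, pd):
    """F1 = 2 * TP / (2 * TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(gt, pd)
    return 2 * tp / (2 * tp + fp + fn)


gt = [[1, 1, 0], [0, 1, 0]]
pd = [[1, 0, 0], [1, 1, 0]]
counts = confusion_counts(gt, pd)  # (2, 1, 1, 2)
f1 = f1_sketch(gt, pd)             # 4 / 6
```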

PyBMF.utils.metrics.FN(gt, pd, axis=None)[source]¶

PyBMF.utils.metrics.FNR(gt, pd, axis=None)[source]¶: miss rate or false negative rate

PyBMF.utils.metrics.FP(gt, pd, axis=None)[source]¶

PyBMF.utils.metrics.FPR(gt, pd, axis=None)[source]¶: fall-out or false positive rate

PyBMF.utils.metrics.MAE(gt, pd, axis=None)[source]¶

PyBMF.utils.metrics.PPV(gt, pd, axis=None)[source]¶: precision or positive predictive value

PyBMF.utils.metrics.RMSE(gt, pd, axis=None)[source]¶

PyBMF.utils.metrics.TN(gt, pd, axis=None)[source]¶

PyBMF.utils.metrics.TNR(gt, pd, axis=None)[source]¶: specificity, selectivity or true negative rate

PyBMF.utils.metrics.TP(gt, pd, axis=None)[source]¶

PyBMF.utils.metrics.TPR(gt, pd, axis=None)[source]¶: sensitivity, recall, hit rate, or true positive rate
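For reference, the rate metrics listed here follow the standard confusion-matrix definitions, sketched below from raw counts. rates_sketch is a hypothetical helper, and handling of zero denominators is left out.

```python
def rates_sketch(tp, fp, fn, tn):
    """Standard rates derived from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "TPR": tp / (tp + fn),     # recall, sensitivity
        "FNR": fn / (tp + fn),     # miss rate
        "TNR": tn / (tn + fp),     # specificity
        "FPR": fp / (tn + fp),     # fall-out
        "PPV": tp / (tp + fp),     # precision
        "ACC": (tp + tn) / total,  # accuracy
        "ERR": (fp + fn) / total,  # error rate
    }


rates = rates_sketch(tp=6, fp=2, fn=2, tn=10)
```

Note that TPR + FNR, TNR + FPR and ACC + ERR each sum to 1.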

PyBMF.utils.metrics.coverage_score(gt, pd, w_fp=0.5, w_fn=None, axis=None)[source]¶

Coverage score function to be maximized.

Measure the coverage of the ground truth gt by the prediction pd.

Parameters:: axis (int in {0, 1}, default: None) – The dimension to which the basis belongs. When axis is None, return the overall coverage score. When axis is 0, the basis is at dimension 0, thus return the column-wise coverage scores.

PyBMF.utils.metrics.description_length(gt, U, V, pd=None, w_model=1.0, w_fp=1.0, w_fn=1.0)[source]¶

The vanilla description length function.

Will compute X_pd from U and V if pd is None.

PyBMF.utils.metrics.get_metrics(gt, pd, metrics, axis=None)[source]¶

Get results of the metrics all at once.

Metrics from sklearn.metrics are included as a sanity check. Their inputs must be binary arrays, which makes them slow and less flexible.

Parameters:

gt (array, spmatrix) – Ground truth, can be 1d array, 2d dense or sparse matrix.
pd (array, spmatrix) – Prediction, can be 1d array, 2d dense or sparse matrix. When the inputs are matrices, row and column-wise measurement can be conducted by defining axis.
metrics (list of str) – The names of the metrics.
axis (int in {0, 1}) – When axis == 0, the result contains the column-wise measurements and has the same length as the number of columns.

Returns:

results

Return type:

list

PyBMF.utils.metrics.invert(X)[source]¶

PyBMF.utils.metrics.weighted_error(gt, pd, w_fp=0.5, w_fn=None, axis=None)[source]¶: Coverage cost function to be minimized.

PyBMF.utils.sparse_utils module¶

PyBMF.utils.sparse_utils.bool_to_index(x)[source]¶

PyBMF.utils.sparse_utils.check_sparse(X, sparse=None)[source]¶

PyBMF.utils.sparse_utils.index_to_bool(x)[source]¶

PyBMF.utils.sparse_utils.sparse_indexing(X, indices)[source]¶

PyBMF.utils.sparse_utils.to_dense(X, squeeze=False, keep_nan=False)[source]¶: Convert to dense array

PyBMF.utils.sparse_utils.to_sparse(X, type='csr')[source]¶

Convert to sparse matrix.

Guide for choosing sparsity types: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html

PyBMF.utils.sparse_utils.to_triplet(X)[source]¶: Convert a dense or sparse matrix to a UIR triplet
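A UIR triplet stores a matrix as three parallel sequences of row indices, column indices and values. The sketch below assumes that only non-zero entries of a dense input are kept. to_triplet_sketch is a hypothetical name, and the real function may return arrays rather than lists.

```python
def to_triplet_sketch(X):
    """Convert a dense nested list to (rows, cols, values) of non-zero entries."""
    rows, cols, values = [], [], []
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if x != 0:
                rows.append(i)
                cols.append(j)
                values.append(x)
    return rows, cols, values


X = [[0, 3], [5, 0]]
triplet = to_triplet_sketch(X)  # ([0, 1], [1, 0], [3, 5])
```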
