PyBMF.utils package
Submodules
PyBMF.utils.boolean_utils module
- PyBMF.utils.boolean_utils.add(X, Y, sparse=None, boolean=False)
Matrix-matrix addition for both dense and sparse input, with Boolean logic support.
Also supports regular matrix-constant addition.
- Parameters:
X (ndarray, spmatrix, int, float)
Y (ndarray, spmatrix, int, float)
sparse (bool, default: None)
boolean (bool, default: False)
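A minimal sketch of what boolean=True can mean for addition, assuming the usual Boolean-algebra convention that 1 + 1 = 1 (element-wise logical OR). `boolean_add` is a hypothetical helper built on NumPy and SciPy, not PyBMF's own implementation, and it assumes binary inputs of the same shape.

```python
import numpy as np
from scipy import sparse


def boolean_add(X, Y):
    """Element-wise Boolean addition (logical OR) of two binary matrices.

    Hypothetical helper: sparse input gives a sparse CSR result,
    dense input gives a dense int array.
    """
    if sparse.issparse(X) or sparse.issparse(Y):
        # Ordinary addition of 0/1 entries yields values in {0, 1, 2};
        # mapping every positive count back to 1 turns the sum into an OR.
        S = sparse.csr_matrix(X) + sparse.csr_matrix(Y)
        return (S > 0).astype(int)
    return np.logical_or(X, Y).astype(int)


X = np.array([[1, 0], [1, 1]])
Y = np.array([[1, 1], [0, 0]])
print(boolean_add(X, Y))  # every entry is 1 here, since no cell is 0 in both
```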
- PyBMF.utils.boolean_utils.dot(u, v, boolean=False)
Vector-vector inner product for both dense and sparse input, with Boolean logic support.
- Parameters:
u (ndarray, spmatrix)
v (ndarray, spmatrix)
boolean (bool, default: False)
- PyBMF.utils.boolean_utils.ismat(X)
Whether X is a matrix or not.
Attention: np.matrix is NOT a supported matrix type in PyBMF.
- PyBMF.utils.boolean_utils.matmul(U, V, sparse=None, boolean=False)
Matrix-matrix multiplication for both dense and sparse input with Boolean logic support.
- Parameters:
U (ndarray, spmatrix)
V (ndarray, spmatrix)
sparse (bool, default: None)
boolean (bool, default: False)
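In the Boolean product, entry (i, j) is 1 when row i of U and column j of V share at least one 1, i.e. X_ij = OR_l (U_il AND V_lj). The sketch below (hypothetical `boolean_matmul`, not PyBMF code) assumes binary inputs and uses a common approach: an ordinary integer product followed by a `> 0` test, which also works on SciPy sparse matrices without densifying them.

```python
import numpy as np
from scipy import sparse


def boolean_matmul(U, V):
    """Boolean matrix product of binary U (m x k) and V (k x n)."""
    # The integer product counts how many l satisfy U[i, l] = V[l, j] = 1;
    # any positive count means the Boolean OR over l is 1.
    P = U @ V
    if sparse.issparse(P):
        return (P > 0).astype(int)
    return (np.asarray(P) > 0).astype(int)


U = np.array([[1, 1], [0, 1]])
V = np.array([[1, 0], [1, 0]])
print(boolean_matmul(U, V))  # entry (0, 0) is 1 in Boolean logic, not 2
```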
- PyBMF.utils.boolean_utils.multiply(U, V, sparse=None, boolean=False)
Point-wise multiplication for both dense and sparse input, with Boolean logic support.
Computes the vector-vector or matrix-matrix Hadamard product. Also supports regular constant-vector and constant-matrix products.
- Parameters:
U (ndarray, spmatrix, int, float)
V (ndarray, spmatrix, int, float)
sparse (bool, default: None) – Whether to enforce a sparse output. If None, the output keeps the same matrix type (dense or sparse) as the input.
boolean (bool, default: False) – Whether to use Boolean logic on the binary input.
- PyBMF.utils.boolean_utils.subtract(X, Y, sparse=False, boolean=False)
Matrix-matrix subtraction for both dense and sparse input, with Boolean logic support.
Also supports regular matrix-constant subtraction.
- Parameters:
X (ndarray, spmatrix, int, float)
Y (ndarray, spmatrix, int, float)
sparse (bool, default: False)
boolean (bool, default: False)
PyBMF.utils.collective_display_utils module
- PyBMF.utils.collective_display_utils.get_settings(Xs, factors, Us=None)
Get display settings.
Used in the show_matrix() wrapper for CMF models.
- Parameters:
Xs (list of spmatrix or ndarray)
factors (list of int list)
Us (list of spmatrix or ndarray, optional)
a, b, f (factor id) – Note that a factor id may not equal its index in Us and factor_info, especially when the factor ids do not start from 0. split_factor_list only accepts the complete factor list.
PyBMF.utils.collective_evaluate_utils module
- PyBMF.utils.collective_evaluate_utils.collective_cover(gt, pd, w, axis, starts=None)
The collective wrapper for cover function.
- Parameters:
gt (spmatrix) – The concatenated ground-truth matrix.
pd (spmatrix) – The concatenated predicted matrix.
w (list of float) – The allowed ratio of false positives in each matrix.
axis (int in {0, 1}) – The dimension of the basis.
starts (list of int) – The starting index of each matrix along the other dimension, i.e., the dimension that the basis does not belong to.
- Returns:
scores – The scores of each basis over each submatrix.
- Return type:
(n_submat, n_basis) array
PyBMF.utils.collective_transform_utils module
- PyBMF.utils.collective_transform_utils.concat_Us_into_U(Us, factors)
Concatenate factors of collective matrices Us into a single pair of factors U.
Used in some collective models.
- PyBMF.utils.collective_transform_utils.concat_Xs_into_X(Xs, factors)
Concatenate collective matrices Xs into a single matrix X.
Used in BaseData and some collective models.
- PyBMF.utils.collective_transform_utils.concat_factor_info(factor_info, factors)
Concatenate factor_info of collective matrices Xs into a length-2 factor_info of a single matrix X.
PyBMF.utils.collective_utils module
- PyBMF.utils.collective_utils.get_dummy_factor_info(Xs, factors)
Get dummy factor_info for collective matrices.
- PyBMF.utils.collective_utils.get_factor_list(factors)
Get sorted factor list.
- Parameters:
factors (list of int list) – List of factor id pairs, indicating the row and column factors of each matrix. Please follow the convention that factors are numbered consecutively and starting from 0. There must exist a matrix with its factors numbered as [0, 1].
- Returns:
factor_list – List of sorted factor ids.
- Return type:
list
- PyBMF.utils.collective_utils.get_factor_starts(Xs, factors)
The starting point of each factor when multiple factors Us are concatenated into a pair of row and column factors U.
- PyBMF.utils.collective_utils.get_matrices(factors)
List of related matrices given factors.
This is the inverse of factors, which lists the related factors given matrices.
- PyBMF.utils.collective_utils.split_factor_list(factors)
Classify factors into row and column factors.
Please follow the convention that factors are numbered consecutively starting from 0. There must exist a matrix with its factors numbered as [0, 1]. Factor 0 and those on the same side as 0 are regarded as row factors. Factor 1 and those on the same side as 1 are regarded as column factors.
List f stores the type of each factor with 0 for unclassified, 1 for row factor and 2 for column factor.
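The classification rule above can be read as a two-coloring: the two factors of every matrix lie on opposite sides, factor 0 seeds the row side, factor 1 seeds the column side, and labels propagate until nothing changes. The sketch below, with a hypothetical `split_factors`, follows that reading and the same encoding of f (0 unclassified, 1 row, 2 column); PyBMF's actual implementation and return values may differ.

```python
def split_factors(factors):
    """Classify factor ids into row (1) and column (2) factors.

    factors: list of [row_factor, col_factor] pairs, one per matrix.
    Assumes ids are consecutive from 0 and that [0, 1] is one of the pairs.
    """
    n = max(max(pair) for pair in factors) + 1
    f = [0] * n          # 0: unclassified, 1: row factor, 2: column factor
    f[0], f[1] = 1, 2    # factor 0 seeds the row side, factor 1 the column side
    changed = True
    while changed:
        changed = False
        for a, b in factors:
            # The two factors of one matrix always sit on opposite sides.
            if f[a] and not f[b]:
                f[b] = 3 - f[a]
                changed = True
            elif f[b] and not f[a]:
                f[a] = 3 - f[b]
                changed = True
    rows = [i for i in range(n) if f[i] == 1]
    cols = [i for i in range(n) if f[i] == 2]
    return rows, cols, f


# Three matrices over the factor pairs (0, 1), (0, 2) and (3, 1)
print(split_factors([[0, 1], [0, 2], [3, 1]]))  # ([0, 3], [1, 2], [1, 2, 2, 1])
```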
PyBMF.utils.common module
- PyBMF.utils.common.binarize(X, threshold=0.5)
Binarize a matrix by thresholding, i.e., apply the Heaviside step function shifted to the threshold.
- Parameters:
X (float ndarray, spmatrix)
threshold (float, default: 0.5)
- Returns:
result
- Return type:
int ndarray, spmatrix
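A minimal sketch of thresholding for dense and sparse input. It assumes values equal to the threshold map to 1 (PyBMF may use a strict comparison) and that the threshold is positive, so implicit zeros of a sparse matrix stay 0 and only stored entries need to be tested. `binarize_sketch` is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def binarize_sketch(X, threshold=0.5):
    """Return 1 where X >= threshold and 0 elsewhere (assumes threshold > 0)."""
    if sparse.issparse(X):
        Y = sparse.csr_matrix(X, copy=True)
        # Only stored entries can reach a positive threshold; implicit zeros stay 0.
        Y.data = (Y.data >= threshold).astype(int)
        Y.eliminate_zeros()  # drop entries that fell below the threshold
        return Y
    return (np.asarray(X) >= threshold).astype(int)


X = np.array([[0.2, 0.5], [0.9, 0.0]])
print(binarize_sketch(X))
```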
- PyBMF.utils.common.get_prediction(U, V, boolean=True, sparse=True)
Get the prediction from factors U and V.
- Parameters:
U (array, spmatrix)
V (array, spmatrix)
boolean (bool, default: True) – Whether to apply Boolean multiplication.
sparse (bool, default: True)
- PyBMF.utils.common.get_prediction_with_threshold(U, V, u=None, v=None, us=None, vs=None, sparse=True)
Get prediction after thresholding factors U and V.
- Parameters:
U (ndarray, spmatrix) – The factor matrix.
V (ndarray, spmatrix) – The factor matrix.
u (float) – The shared threshold across all factors in U.
v (float) – The shared threshold across all factors in V.
us (list of k floats) – The individual thresholds for each factor in U.
vs (list of k floats) – The individual thresholds for each factor in V.
- Returns:
X_pd – The prediction matrix.
- Return type:
ndarray, spmatrix
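The two thresholding modes can be sketched for dense factors as follows. This assumes U is m x k, V is n x k, the prediction is the Boolean product of U and V transposed, and entries at or above a threshold become 1; these conventions and the name `predict_with_threshold` are assumptions for illustration, not taken from PyBMF's source.

```python
import numpy as np


def predict_with_threshold(U, V, u=None, v=None, us=None, vs=None):
    """Threshold real-valued factors, then take their Boolean product.

    Assumes dense U (m x k) and V (n x k), with X_pd = U V^T.
    Use either one shared threshold (u / v) or k per-factor thresholds (us / vs).
    """
    tu = us if us is not None else u
    tv = vs if vs is not None else v
    if tu is None or tv is None:
        raise ValueError("a threshold is needed for both U and V")
    # A length-k threshold vector broadcasts across the k columns (factors).
    U_bin = (np.asarray(U) >= np.asarray(tu)).astype(int)
    V_bin = (np.asarray(V) >= np.asarray(tv)).astype(int)
    return (U_bin @ V_bin.T > 0).astype(int)


U = np.array([[0.9, 0.1], [0.2, 0.8]])
V = np.array([[0.7, 0.0], [0.3, 0.6], [0.1, 0.95]])
print(predict_with_threshold(U, V, u=0.5, v=0.5))
print(predict_with_threshold(U, V, us=[0.95, 0.5], v=0.5))
```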
- PyBMF.utils.common.get_rng(seed, rng)
Get random number generator.
- Parameters:
seed (optional) – Random seed.
rng (optional) – Random number generator.
- PyBMF.utils.common.safe_indexing(X, indices)
Return items or rows from X using indices.
Allows simple indexing of lists or arrays. Modified from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/__init__.py
- Parameters:
X (array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series.) – Data from which to sample rows or items.
indices (array-like of int) – Indices according to which X will be subsampled.
- Returns:
subset – Subset of X along the first axis.
PyBMF.utils.data_utils module
- PyBMF.utils.data_utils.mean(X, axis=None)
Row and column-wise mean.
- Parameters:
X (ndarray, spmatrix)
axis (int, optional)
- Returns:
result
- Return type:
tuple, ndarray
- PyBMF.utils.data_utils.median(X, axis=None)
Row and column-wise median.
- Parameters:
X (ndarray, spmatrix)
axis (int, optional)
- Returns:
result
- Return type:
tuple, ndarray
- PyBMF.utils.data_utils.sample(X, axis, factor_info=None, idx=None, n_samples=None, seed=None)
Sample a matrix by its rows or columns. Update factor_info if provided.
- Parameters:
axis (int) – Which dimension to down-sample: 0 to sample rows, 1 to sample columns.
factor_info (list, tuple) – factor_info for a single matrix X or for collective matrices Xs. For X, factor_info is a list of tuples. For Xs, factor_info is a tuple.
idx – The indices to sample with.
n_samples – Randomly down-sample to this length.
seed – Seed for down-sampling.
- PyBMF.utils.data_utils.sort_order(order)
Fix the gaps in an order after down-sampling by replacing each value with its rank.
E.g., [1, 6, 4, 2] will be turned into [0, 3, 2, 1].
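Replacing each value by its rank is a double argsort in NumPy. The sketch below assumes the values in order are distinct, as they are after down-sampling an index list; `close_gaps` is a hypothetical name, not necessarily how sort_order is written.

```python
import numpy as np


def close_gaps(order):
    """Map each (distinct) value to its rank, closing gaps left by down-sampling."""
    order = np.asarray(order)
    # argsort lists positions in ascending value order; a second argsort
    # turns those positions into the rank of every element.
    return np.argsort(np.argsort(order))


print(close_gaps([1, 6, 4, 2]))  # [0 3 2 1]
```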
PyBMF.utils.dataframe_utils module
- PyBMF.utils.dataframe_utils._make_html(file_path, file_name, html)
Make an HTML file.
- Parameters:
file_path (str) – Path to save the HTML file.
file_name (str) – Name of the HTML file.
html (str) – HTML code.
- Returns:
full_path – Full path of the HTML file.
- Return type:
str
- PyBMF.utils.dataframe_utils._make_name(model=None, model_name=None, format='%Y-%m-%d %H-%M-%S-%f ')
Make a file name for an instance of a model.
Fractional seconds (%f) are included in the timestamp to make the name unique.
- Parameters:
model (object) – Model object.
model_name (str) – Name of the model.
format (str) – Format of the timestamp.
- Returns:
model_name – Name of the model.
- Return type:
str
- PyBMF.utils.dataframe_utils._open_html(full_path, browser_path)
Open an HTML file in a browser.
- Parameters:
full_path (str) – Full path of the HTML file.
browser_path (str) – Path of the browser.
- PyBMF.utils.dataframe_utils.get_config(key)
Get config value from settings.ini.
- Parameters:
key (str) – Key in settings.ini.
- Returns:
value – Value in settings.ini.
- Return type:
str
- PyBMF.utils.dataframe_utils.log2html(df, log_name, open_browser=True, log_path=None, browser_path=None)
Display and save a dataframe in HTML, and open it in a browser if needed.
Please create settings.ini or set file_path and browser_path manually before calling.
- Parameters:
df (pandas.DataFrame) – Dataframe to be displayed in HTML.
log_name (str) – Name of the log file.
open_browser (bool) – Whether to open in browser.
log_path (str) – Path to save the log file.
browser_path (str) – Path of the browser.
- PyBMF.utils.dataframe_utils.log2latex(df, log_name, open_browser=True, log_path=None, browser_path=None)
Display a dataframe in TeX on overleaf.com.
This tool automatically highlights the maximum values in each column.
- Parameters:
df (pandas.DataFrame) – Dataframe to be displayed in TeX.
log_name (str) – Name of the log file.
open_browser (bool) – Whether to open in browser.
log_path (str) – Path to save the log file.
browser_path (str) – Path of the browser.
PyBMF.utils.decorator_utils module
PyBMF.utils.display module
- PyBMF.utils.display.fill_nan(X, mask)
Fill the missing values of a sparse matrix with NaN, so that missing values in a sparse matrix are displayed differently from zeros.
Used for displaying matrices while identifying missing values.
- Parameters:
X (ndarray or spmatrix) – The matrix with values to be filled with NaN.
mask (spmatrix) – The masking matrix. Explicit zeros in mask are not treated as missing. Note that there are several ways to preserve zeros in a sparse matrix; BaseSplit.load_neg_data() is one of them.
- Returns:
Y – The dense matrix with NaN in it.
- Return type:
ndarray
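The masking rule can be sketched with SciPy's COO format: every position stored in mask, explicit zeros included, counts as observed, and every other cell becomes NaN. `fill_nan_sketch` is a hypothetical re-implementation for illustration only.

```python
import numpy as np
from scipy import sparse


def fill_nan_sketch(X, mask):
    """Densify X and set every position that is not stored in mask to NaN."""
    # NaN needs a floating-point dtype; astype also makes a copy.
    Y = (X.toarray() if sparse.issparse(X) else np.asarray(X)).astype(float)
    M = sparse.coo_matrix(mask)
    observed = np.zeros(Y.shape, dtype=bool)
    # Stored entries of mask, explicit zeros included, count as observed.
    observed[M.row, M.col] = True
    Y[~observed] = np.nan
    return Y


X = sparse.csr_matrix([[1, 0], [0, 1]])
# Position (0, 1) is an explicit zero in mask, so it stays 0 instead of NaN.
mask = sparse.coo_matrix(([1, 0, 1], ([0, 0, 1], [0, 1, 1])), shape=(2, 2))
print(fill_nan_sketch(X, mask))
```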
- PyBMF.utils.display.get_size_inches(scaling, ppi, hds, pixels, width_cells, height_cells)
Get figure size in inches.
- Parameters:
width_cells (int) – Figure width in the number of matrix cells.
height_cells (int) – Figure height in the number of matrix cells.
- Returns:
width_inches (float) – Figure width in inches.
height_inches (float) – Figure height in inches.
- PyBMF.utils.display.show_factor_distribution(U, V, resolution=100, show_hist=False, show_minmax=True, remove_below=None, us=None, vs=None)
Show the distribution of real-valued factor matrices U and V.
- PyBMF.utils.display.show_matrix(settings, scaling=1.0, ppi=96, hds=1.5, pixels=None, title=None, fontsize=8, keep_nan=True, colorbar=False, clim=None, discrete=False, center=True, cmap='rainbow', cmin='gray', cmax='black', cnan='white', save_fig=True)
Show the matrix and factors.
- Parameters:
settings (list of tuple) – A list of (data, location, title) tuple.
scaling (float, default: 1.0) – The scaling factor. The default scaling is 1.0, the maximum size a figure can be displayed within the screen.
ppi (int, default: 96) – Pixels per inch. The ppi of a 4K 24” display is 96.
hds (float, default: 1.5) – High DPI scaling, if your Python IDE supports this. The default hds in Spyder is 1.5.
pixels (int, optional) – Each cell in a matrix takes up pixels * pixels on screen. This overrides scaling.
title (string, optional) – The centered suptitle of the figure.
fontsize (int, default: 8) – Size of the title and subtitles.
colorbar (bool, default: False) – Show colorbar.
clim (list, optional) – Colorbar range limit applied over all matrices. If clim is None, each matrix will have its own colorbar range limit separately.
discrete (bool, default: False) – Show discrete colorbar.
center (bool, default: True) – Available only when discrete is True.
cmap (str, default: 'rainbow') – The colormap.
cmin (str, default: 'gray') – The color of values lower than the range limit clim.
cmax (str, default: 'black') – The color of values higher than the range limit clim.
cnan (str, default: 'white') – The color of NaN, used to differentiate real zeros from NaN in sparse matrices.
save_fig (bool, default: True) – Save the figure to output directory.
PyBMF.utils.evaluate_utils module
- PyBMF.utils.evaluate_utils.eval(metrics, task, X_gt, X_pd=None, U=None, V=None)
Evaluate with given metrics.
- Parameters:
X_gt (ndarray or spmatrix)
X_pd (ndarray or spmatrix, optional)
U (spmatrix, optional)
V (spmatrix, optional)
metrics (list of str) – List of metric names.
task (str in {'prediction', 'reconstruction'}) – If task == 'prediction', it ignores the missing values and only uses the triplets stored in the spmatrix. The triplets may contain zeros, depending on whether negative sampling has been used. If task == 'reconstruction', it uses the whole matrix, treating all missing values in the spmatrix as zeros.
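The difference between the two tasks can be illustrated with accuracy as an example metric. This is a hypothetical sketch, not PyBMF's eval; it assumes a sparse ground truth X_gt and a dense binary prediction X_pd of the same shape.

```python
import numpy as np
from scipy import sparse


def accuracy_by_task(X_gt, X_pd, task):
    """Accuracy of a dense binary X_pd against a sparse X_gt for either task."""
    if task == "prediction":
        # Only the stored (row, col, value) triplets of X_gt are evaluated;
        # explicit zeros from negative sampling count as observed negatives.
        G = X_gt.tocoo()
        return float(np.mean(X_pd[G.row, G.col] == G.data))
    if task == "reconstruction":
        # Every cell is evaluated; missing entries of X_gt are read as zeros.
        return float(np.mean(X_pd == X_gt.toarray()))
    raise ValueError(f"unknown task: {task}")


# Two observed triplets: (0, 0) = 1 and an explicit negative at (0, 1) = 0.
X_gt = sparse.coo_matrix(([1, 0], ([0, 0], [0, 1])), shape=(2, 2))
X_pd = np.array([[1, 0], [1, 1]])
print(accuracy_by_task(X_gt, X_pd, "prediction"))      # 1.0
print(accuracy_by_task(X_gt, X_pd, "reconstruction"))  # 0.5
```

The second row of X_pd is ignored under 'prediction' because no triplet covers it, while 'reconstruction' scores it against implicit zeros.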
PyBMF.utils.experiment_utils module
PyBMF.utils.generator_utils module
- PyBMF.utils.generator_utils.add_noise(X, noise, seed=None, rng=None)
Add noise to a matrix.
- Parameters:
X (ndarray, spmatrix)
noise (list of 2 float in [0, 1]) – Probabilities for false negative (p_pos) and false positive (p_neg).
seed (optional) – Random seed.
rng (optional) – Random number generator.
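One common way to realize the two noise probabilities is independent bit flips: each 1 is turned off with probability p_pos (a false negative) and each 0 is turned on with probability p_neg (a false positive). The sketch assumes that model and a dense binary X; it uses NumPy's `default_rng` instead of PyBMF's seed/rng handling, and `add_noise_sketch` is a hypothetical name.

```python
import numpy as np


def add_noise_sketch(X, noise, seed=None):
    """Flip entries of a dense binary matrix X.

    noise = [p_pos, p_neg]: each 1 becomes 0 with probability p_pos
    (false negative) and each 0 becomes 1 with probability p_neg (false positive).
    """
    p_pos, p_neg = noise
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    draws = rng.random(X.shape)  # uniform samples in [0, 1)
    flip = np.where(X == 1, draws < p_pos, draws < p_neg)
    return np.where(flip, 1 - X, X)


X = np.eye(4, dtype=int)
print(add_noise_sketch(X, [0.1, 0.05], seed=0))
```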
- PyBMF.utils.generator_utils.reverse_index(idx)
Reverse an index, i.e., compute the inverse permutation of idx.
Example
idx = np.array([0, 1, 2, 4, 5, 3])
inv = reverse_index(idx)  # inv = [0, 1, 2, 5, 3, 4]
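The inverse permutation satisfies inv[idx[i]] == i for every position i, so it can be built with one scatter assignment. `inverse_permutation` is a hypothetical stand-in for reverse_index that reproduces the example above.

```python
import numpy as np


def inverse_permutation(idx):
    """Return inv such that inv[idx[i]] == i for every position i."""
    idx = np.asarray(idx)
    inv = np.empty_like(idx)
    inv[idx] = np.arange(len(idx))  # scatter each position to the value it holds
    return inv


idx = np.array([0, 1, 2, 4, 5, 3])
print(inverse_permutation(idx))  # [0 1 2 5 3 4]
```

For a valid permutation this gives the same result as `np.argsort(idx)`; the scatter form avoids the sort.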
PyBMF.utils.metrics module
- PyBMF.utils.metrics.F1(gt, pd, axis=None)
F1 score, computed as:
tp = TP(gt, pd, axis)
fp = FP(gt, pd, axis)
fn = FN(gt, pd, axis)
return 2 * tp / (2 * tp + fp + fn)
- PyBMF.utils.metrics.TPR(gt, pd, axis=None)
Sensitivity, recall, hit rate, or true positive rate.
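For dense binary arrays, the TP, FP and FN counts behind F1 and TPR reduce to Boolean masks summed over an optional axis; axis=0 gives one value per column, matching the column-wise convention described for get_metrics below. This is a hypothetical sketch, not PyBMF's TP/FP/FN, which also accept sparse input.

```python
import numpy as np


def confusion_counts(gt, pd, axis=None):
    """TP, FP and FN counts for dense binary arrays, overall or along an axis."""
    gt = np.asarray(gt).astype(bool)
    pd = np.asarray(pd).astype(bool)
    tp = np.sum(gt & pd, axis=axis)
    fp = np.sum(~gt & pd, axis=axis)
    fn = np.sum(gt & ~pd, axis=axis)
    return tp, fp, fn


def f1_and_tpr(gt, pd, axis=None):
    tp, fp, fn = confusion_counts(gt, pd, axis)
    f1 = 2 * tp / (2 * tp + fp + fn)
    tpr = tp / (tp + fn)  # sensitivity / recall
    return f1, tpr


gt = np.array([[1, 1], [0, 1]])
pd = np.array([[1, 0], [1, 1]])
print(f1_and_tpr(gt, pd))          # overall: tp = 2, fp = 1, fn = 1
print(f1_and_tpr(gt, pd, axis=0))  # one value per column
```

A row or column with no positives in either array gives 0/0 here, so NumPy returns NaN with a warning.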
- PyBMF.utils.metrics.coverage_score(gt, pd, w_fp=0.5, w_fn=None, axis=None)
Coverage score function to be maximized.
Measures how well the prediction pd covers the ground truth gt.
- Parameters:
axis (int in {0, 1}, default: None) – The dimension to which the basis belongs. When axis is None, return the overall coverage score. When axis is 0, the basis is at dimension 0, thus return the column-wise coverage scores.
- PyBMF.utils.metrics.description_length(gt, U, V, pd=None, w_model=1.0, w_fp=1.0, w_fn=1.0)
The vanilla description length function.
Will compute X_pd from U and V if pd is None.
- PyBMF.utils.metrics.get_metrics(gt, pd, metrics, axis=None)
Get results of the metrics all at once.
Metrics from sklearn.metrics are included as a sanity check. Their inputs must be binary arrays, which makes them slow and less flexible.
- Parameters:
gt (array, spmatrix) – Ground truth, can be a 1d array or a 2d dense or sparse matrix.
pd (array, spmatrix) – Prediction, can be a 1d array or a 2d dense or sparse matrix. When the inputs are matrices, row- and column-wise measurements can be made by setting axis.
metrics (list of str) – The names of the metrics.
axis (int in {0, 1}) – When axis == 0, the result contains column-wise measurements and has the same length as the number of columns.
- Returns:
results
- Return type:
list
PyBMF.utils.sparse_utils module
- PyBMF.utils.sparse_utils.to_sparse(X, type='csr')
Convert to sparse matrix.
Guide for choosing sparsity types: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html
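Converting to a chosen sparse format can be sketched with SciPy's `asformat`; whether to_sparse works this way is an assumption. As a rule of thumb from the SciPy documentation, CSR suits arithmetic and row slicing, CSC suits column slicing, and LIL or COO suit building a matrix incrementally. The parameter is named `fmt` here to avoid shadowing Python's built-in `type`.

```python
import numpy as np
from scipy import sparse


def to_sparse_sketch(X, fmt="csr"):
    """Convert a dense array or any SciPy sparse matrix to the given format."""
    # asformat accepts format names such as 'csr', 'csc', 'coo' and 'lil'.
    return sparse.csr_matrix(X).asformat(fmt)


S = to_sparse_sketch(np.eye(3), "lil")
print(S.format)  # lil
```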