PyBMF.models package

Submodules

PyBMF.models.Asso module

class PyBMF.models.Asso.Asso(tau, k=None, tol=0, w_fp=0.5, w_fn=None)[source]

Bases: BaseModel

The Asso algorithm.

Parameters:
  • k (int, optional) – The target rank. If None, factorization continues until the error falls below tol or another stopping criterion is met.

  • tol (float, default: 0) – The error tolerance.

  • tau (float) – The binarization threshold when building basis. Can be determined via model selection techniques.

  • w_fp (float, default: 0.5) – The penalty weight for false positives (FP).

  • w_fn (float, optional, default: None) – The penalty weight for false negatives (FN). If None, it is set to 1 - w_fp.

_fit()[source]

The main procedure of fitting.

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

init_model()[source]

Initialize the model.

PyBMF.models.Asso.build_assoc(X, dim)[source]

Build the real-valued association matrix.

Parameters:
  • X (ndarray, spmatrix) – The data matrix.

  • dim (int) – The dimension which basis belongs to. If dim == 0, basis is treated as a column vector and vector as a row vector.

Returns:

assoc – The association matrix.

Return type:

spmatrix

PyBMF.models.Asso.build_basis(assoc, tau)[source]

Get the binary-valued basis candidates.

Parameters:
  • assoc (spmatrix) – The association matrix.

  • tau (float) – The threshold for the association matrix.

Returns:

basis – The binary-valued basis candidates.

Return type:

spmatrix
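
As a rough illustration of how build_assoc and build_basis fit together, the sketch below computes column-to-column association confidences and thresholds them with tau. It uses plain Python lists rather than spmatrix, and the helper names are hypothetical. The confidence formula (the one described in the original Asso paper) and the >= comparison are both assumptions, not PyBMF's actual code.

```python
def build_assoc_sketch(X):
    """Column-to-column association confidences of a 0/1 matrix X (list of rows)."""
    cols = [list(col) for col in zip(*X)]
    assoc = []
    for ci in cols:
        support = sum(ci)  # number of rows in which column i is 1
        row = []
        for cj in cols:
            overlap = sum(a & b for a, b in zip(ci, cj))
            # Confidence of "column i implies column j"; 0.0 for empty columns.
            row.append(overlap / support if support else 0.0)
        assoc.append(row)
    return assoc


def build_basis_sketch(assoc, tau):
    """Binarize the association matrix: each row becomes one basis candidate."""
    return [[1 if a >= tau else 0 for a in row] for row in assoc]


X = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]
assoc = build_assoc_sketch(X)
basis = build_basis_sketch(assoc, tau=0.6)
```

A larger tau keeps only strong associations, so the basis candidates become sparser.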

PyBMF.models.Asso.get_vector(X_gt, X_old, s_old, basis, basis_dim, w_fp, w_fn)[source]

Return the optimal column/row vector given a row/column basis candidate.

Parameters:
  • X_gt (spmatrix) – The ground-truth matrix.

  • X_old (spmatrix) – The prediction matrix before adding the current pattern.

  • s_old (array) – The column/row-wise coverage scores of the previous prediction X_old.

  • basis ((1, n) spmatrix) – The basis vector.

  • basis_dim (int) – The dimension to which basis belongs. If basis_dim == 0, a pattern is considered basis.T * vector. Otherwise, it’s considered vector.T * basis. Note that both basis and vector are row vectors.

  • w_fp (float) – The penalty weight for false positives (FP).

  • w_fn (float) – The penalty weight for false negatives (FN).

Returns:

  • score (float) – The coverage score.

  • vector ((1, n) spmatrix) – The matched vector.
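
The following sketch shows one plausible reading of the vector selection: each row of the ground truth decides independently whether using the basis improves a weighted cover score. The function name, the list-based inputs and the exact scoring rule (reward w_fn per newly covered 1, penalty w_fp per newly covered 0) are assumptions for illustration, not the library's implementation.

```python
def get_vector_sketch(X_gt, X_old, basis, w_fp=0.5, w_fn=None):
    """Pick, row by row, whether applying `basis` improves the weighted cover."""
    if w_fn is None:
        w_fn = 1 - w_fp  # same default rule as documented for Asso
    score, vector = 0.0, []
    for gt_row, old_row in zip(X_gt, X_old):
        gain = 0.0
        for gt, old, b in zip(gt_row, old_row, basis):
            if b and not old:  # cell newly covered by this pattern
                gain += w_fn if gt else -w_fp
        use = gain > 0
        vector.append(1 if use else 0)
        if use:
            score += gain
    return score, vector


X_gt = [[1, 1, 0],
        [0, 0, 1],
        [1, 1, 1]]
X_old = [[0, 0, 0] for _ in X_gt]  # nothing covered yet
score, vector = get_vector_sketch(X_gt, X_old, basis=[1, 1, 0])
```

Cells already covered by X_old contribute nothing, which is why the previous prediction is passed in.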

PyBMF.models.AssoIter module

class PyBMF.models.AssoIter.AssoIter(model, w_fp=0.5, w_fn=None)[source]

Bases: Asso

The Asso algorithm with iterative search over each column of U.

_fit()[source]

Use iterative search to refine U.

In the paper, the algorithm uses the cover function, with equal weights for coverage and over-coverage, as the update criterion, and the error function as the stopping criterion. Changing them may improve performance.

check_params(**kwargs)[source]

Check and load model parameters and experiment configurations.

Called upon model initialization and fitting.

# include this in your model class:

def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:

model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)

model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

get_refined_column(k)[source]

Return the optimal column for the given basis.

The other bases remain unchanged.

init_model()[source]

Initialize the model.

PyBMF.models.AssoOpt module

class PyBMF.models.AssoOpt.AssoOpt(model, w_fp=1, w_fn=1)[source]

Bases: Asso

The Asso algorithm with exhaustive search over each row of U.

This implementation may be slow, but it can handle a larger number of factors k or a larger X_train.

_fit()[source]

Using exhaustive search to refine U.

check_params(**kwargs)[source]

Check and load model parameters and experiment configurations.

Called upon model initialization and fitting.

# include this in your model class:

def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:

model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)

model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

init_model()[source]

Initialize the model.

set_optimal_row(i)[source]

Update the i-th row in U.

PyBMF.models.AssoOpt.int2bin(i, bits)[source]

Turn i into a (1, bits) binary sparse matrix.
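
A minimal sketch of the idea behind int2bin, using a plain list instead of a sparse matrix. The helper name and the most-significant-bit-first order are assumptions. Enumerating all integers below 2**bits yields every candidate binary row, which is how an exhaustive search over a row of U could be driven.

```python
def int2bin_sketch(i, bits):
    """Return the `bits`-long binary expansion of i, most significant bit first."""
    return [(i >> shift) & 1 for shift in range(bits - 1, -1, -1)]


# All 2**k candidate rows for k = 2 factors.
candidates = [int2bin_sketch(i, 2) for i in range(2 ** 2)]
```

The number of candidates doubles with every additional factor, which is the cost of exhaustive search.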

PyBMF.models.BaseModel module

class PyBMF.models.BaseModel.BaseModel(**kwargs)[source]

Bases: BaseModelTools

The base class for all the models.

Initialize the model with parameters.

_evaluate(name, info, metrics)[source]

Evaluate on a given dataset.

Parameters:
  • name (str in ['train', 'val', 'test']) – Which matrix to evaluate.

  • info (dict of list) – The extra information to be recorded.

  • metrics (list of str) – The metrics to be evaluated and recorded.

_fit()[source]

Where the tedious fitting procedure takes place.

check_params(**kwargs)[source]

Check and load model parameters and experiment configurations.

Called upon model initialization and fitting.

# include this in your model class:

def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:

model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)

model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)

evaluate(df_name, head_info={}, train_info={}, val_info={}, test_info={}, metrics=['Recall', 'Precision', 'Accuracy', 'F1'], train_metrics=None, val_metrics=None, test_metrics=None, verbose=False)[source]

Evaluate a BMF model on the given train, val and test datasets.

Parameters:
  • df_name (str) – The name of the dataframe in which the results are recorded.

  • head_info (dict) – The names and values of shared information at the head of each record.

  • train_info (dict) – The names and values of external information measured on training data.

  • val_info (dict) – The names and values of external information measured on validation data.

  • test_info (dict) – The names and values of external information measured on testing data.

  • metrics (list of str, default: ['Recall', 'Precision', 'Accuracy', 'F1']) – The metrics to be measured. For metric names check utils.get_metrics.

  • train_metrics (list of str, optional) – The metrics to be measured on training data. If None, metrics is used instead.

  • val_metrics (list of str, optional) – The metrics to be measured on validation data. If None, metrics is used instead.

  • test_metrics (list of str, optional) – The metrics to be measured on testing data. If None, metrics is used instead.

finish(show_logs=True, save_model=True, show_result=True)[source]

Called when the fitting is over.

The default finishing procedure.

Simply override this method if you want to drop or add any part of the procedure.

You can append this to the end of fit() or simply call it from outside.

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model to observations, with validation and test if necessary.

To define a fitting procedure, implement your own _fit() and finish() and append them to this method.

Simply override this method if you want to drop or add any part of the procedure.

Parameters:
  • X_train (ndarray) – Training data.

  • X_val (ndarray) – Validation data.

  • X_test (ndarray) – Test data.

  • **kwargs (dict) – Other parameters.
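
The call order implied by the descriptions of check_params(), load_dataset(), init_model(), _fit() and finish() can be sketched with a toy class. ToyModel is hypothetical and does not inherit from BaseModel; it only records which hook runs when, and the exact sequence inside PyBMF's fit() is an assumption.

```python
class ToyModel:
    """Hypothetical stand-in for a BaseModel subclass; it only records call order."""

    def __init__(self, k):
        self.calls = []
        self.check_params(k=k)

    def check_params(self, **kwargs):
        # Store every given parameter or configuration as an attribute.
        for name, value in kwargs.items():
            setattr(self, name, value)
        self.calls.append("check_params")

    def load_dataset(self, X_train, X_val=None, X_test=None):
        self.X_train, self.X_val, self.X_test = X_train, X_val, X_test
        self.calls.append("load_dataset")

    def init_model(self):
        # Runs after parameters are set and datasets are loaded.
        self.calls.append("init_model")

    def _fit(self):
        self.calls.append("_fit")

    def finish(self):
        self.calls.append("finish")

    def fit(self, X_train, X_val=None, X_test=None, **kwargs):
        self.check_params(**kwargs)
        self.load_dataset(X_train, X_val, X_test)
        self.init_model()
        self._fit()
        self.finish()


model = ToyModel(k=2)
model.fit([[1, 0], [0, 1]], seed=2024, verbose=False)
```

Keeping fit() as a thin sequence of hooks lets a subclass replace only the step it needs to change.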

init_model()[source]

Initialize the model.

Called after params are set and datasets are loaded.

Simply override this method if you want to drop or add any part of the procedure.

load_dataset(X_train, X_val=None, X_test=None)[source]

Load the train, validation and test data.

For matrices that are modified frequently, lil (list of lists) or coo is preferred.

For matrices that are not modified, csr or csc is preferred.

Parameters:
  • X_train (ndarray, spmatrix) – Data for matrix factorization.

  • X_val (ndarray, spmatrix) – Data for model selection.

  • X_test (ndarray, spmatrix) – Data for prediction.

predict_X(U=None, V=None, u=None, v=None, us=None, vs=None, boolean=True)[source]

Update the prediction X_pd.

Parameters:
  • U (ndarray, spmatrix) – Factor matrix.

  • V (ndarray, spmatrix) – Factor matrix.

  • u (float) – The shared threshold for factors in U.

  • v (float) – The shared threshold for factors in V.

  • us (list of k floats) – The thresholds for each factor in U.

  • vs (list of k floats) – The thresholds for each factor in V.

  • boolean (bool) – Whether to apply Boolean multiplication.
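
To make the roles of the thresholds and the Boolean multiplication concrete, here is a small sketch with plain lists. It assumes U is m x k and V is n x k, a single shared threshold per factor matrix, and a >= comparison; the function name is hypothetical and the real method may differ on all three points.

```python
def predict_X_sketch(U, V, u=0.5, v=0.5):
    """Boolean product of the thresholded factors (U is m x k, V is n x k)."""
    U_bin = [[1 if x >= u else 0 for x in row] for row in U]
    V_bin = [[1 if x >= v else 0 for x in row] for row in V]
    # X[i][j] is 1 when row i of U and row j of V share at least one factor.
    return [
        [1 if any(a and b for a, b in zip(u_row, v_row)) else 0 for v_row in V_bin]
        for u_row in U_bin
    ]


U = [[0.9, 0.1],
     [0.2, 0.8]]
V = [[0.7, 0.0],
     [0.6, 0.9],
     [0.1, 0.7]]
X_pd = predict_X_sketch(U, V, u=0.5, v=0.5)
```

With per-factor thresholds (us, vs), the same idea would apply column by column instead of with one shared value.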

show_matrix(settings=None, scaling=None, pixels=None, **kwargs)[source]

The show_matrix() wrapper for BMF models.

If settings is missing, show the factors U, V and X_pd by default.

PyBMF.models.BaseModelTools module

class PyBMF.models.BaseModelTools.BaseModelTools[source]

Bases: object

The helper class for BaseModel.

_early_stop(msg, verbose, k=None)[source]

Handle early convergence or an early stop.

Parameters:
  • msg (str) – The message to be displayed.

  • verbose (bool) – Whether to print the message.

  • k (int, optional) – The number of factors obtained.

_init_factors()[source]

Initialize the factors.

_init_logs()[source]

Initialize the logs.

The logs are stored in a dict that holds the records in one place. The types of records include but are not limited to dataframe, ndarray and list.

_make_name()[source]

Make name.

_save_model(path=None, name=None)[source]

Save the model.

Parameters:
  • path (str) – Path to save the model.

  • name (str) – Name of the model.

_show_logs()[source]

Display all the dataframes in self.logs.

_show_result()[source]

Display the prediction.

Make sure self.X_pd is set properly before calling.

For example:

>>> self.X_pd = get_prediction(U=self.U, V=self.V, boolean=True)

_start_timer()[source]

Start timer.

_stop_timer()[source]

Stop timer.

early_stop(error=None, diff=None, n_iter=None, n_factor=None, msg=None, k=None, verbose=True)[source]

Stopping criteria detection and early stop.

Parameters:
  • k (int) – The number of factors to obtain. This will keep the first k columns in self.U and self.V.

  • error (float) – Current error. To be compared with error tolerance self.tol.

  • diff (float) – Current update difference. To be compared with difference threshold self.min_diff.

  • n_iter (int) – Current number of iterations. To be compared with maximum number of iterations self.max_iter.

  • n_factor (int) – Current number of factors. To be compared with maximum number of factors self.k.

  • verbose (bool) – Whether to print the message.

Returns:

is_improving – Whether the fitting should continue or not.

Return type:

bool
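
The comparisons listed in the parameters above can be sketched as a standalone function. The name should_continue, the default limits and the use of <= and >= are assumptions; in the library the limits are read from self.tol, self.min_diff, self.max_iter and self.k.

```python
def should_continue(error=None, diff=None, n_iter=None, n_factor=None,
                    tol=0.0, min_diff=0.0, max_iter=100, k=None):
    """Return False as soon as any supplied quantity hits its stopping limit."""
    if error is not None and error <= tol:
        return False  # error tolerance reached
    if diff is not None and diff <= min_diff:
        return False  # update too small to matter
    if n_iter is not None and n_iter >= max_iter:
        return False  # iteration budget exhausted
    if n_factor is not None and k is not None and n_factor >= k:
        return False  # target rank reached
    return True


is_improving = should_continue(error=0.5, n_iter=3, tol=0.01, max_iter=100)
```

Quantities left as None are simply not checked, so a model can use only the criteria that apply to it.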

extend_factors(k)[source]

Increase the number of factors to k (k = 1, 2, …).

Parameters:

k (int) – The number of factors to obtain.

import_model(**kwargs)[source]

Import or inherit variables and parameters from another model.

Parameters:

**kwargs – The variables and parameters to be imported.

print_msg(msg, type='I')[source]

Print message.

Parameters:
  • msg (str) – The message to be printed.

  • type (str) – The type of message, e.g. ‘I’ for info, ‘W’ for warning, ‘E’ for error.

set_config(**kwargs)[source]

Set system configurations.

System configurations are those involved when calling the fit() method.

They control the global random seed generator, the verbosity and display settings.

They also identify the type of task the model is dealing with, which affects the evaluation procedure.

System configurations:
  • task (str, {‘prediction’, ‘reconstruction’}) – The type of evaluation task. When the datasets (X_train, X_val and X_test) are provided as csr_matrix, prediction tasks only measure the entries in the sparse matrix (these entries can be 0 or 1, see negative_sampling()), while reconstruction tasks measure the whole matrix (treat sparse matrix as numpy array).

  • seed (int) – Model seed.

  • display (bool, default: False) – Switch for visualization.

  • verbose (bool, default: False) – Switch for verbosity.

  • scaling (float, default: 1.0) – Scaling of images in visualization.

  • pixels (int, default: 2) – Resolution of images in visualization.
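
The difference between the two task types can be illustrated with a toy accuracy computation. The stored entries of a sparse matrix are modelled here as a dict keyed by (row, column), which is only a stand-in for csr_matrix; the function name and the choice of accuracy as the metric are assumptions.

```python
def accuracy_sketch(X_pd, stored, shape, task):
    """Accuracy over stored entries ('prediction') or all cells ('reconstruction')."""
    m, n = shape
    if task == "prediction":
        cells = list(stored)  # only explicitly stored entries, 0 or 1
    elif task == "reconstruction":
        cells = [(i, j) for i in range(m) for j in range(n)]
    else:
        raise ValueError("task must be 'prediction' or 'reconstruction'")
    # Cells that are not stored count as 0 in reconstruction mode.
    hits = sum(X_pd[i][j] == stored.get((i, j), 0) for i, j in cells)
    return hits / len(cells)


stored = {(0, 0): 1, (0, 1): 0, (1, 1): 1}  # (0, 1) is a sampled negative
X_pd = [[1, 1],
        [0, 1]]
acc_prediction = accuracy_sketch(X_pd, stored, (2, 2), "prediction")
acc_reconstruction = accuracy_sketch(X_pd, stored, (2, 2), "reconstruction")
```

The same prediction scores differently under the two tasks because the unobserved cell (1, 0) only counts in reconstruction mode.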

set_factors(k, u, v)[source]

Add a new factor (k = 0, 1, …).

Parameters:
  • k (int) – The index of the factor to be added.

  • u (numpy.ndarray) – The user factor to be added.

  • v (numpy.ndarray) – The item factor to be added.

set_params(**kwargs)[source]

Set model parameters.

The list below gives the commonly used meaning of each parameter.

Model parameters:
  • k (int) – The rank.

  • U (ndarray, spmatrix) – Initial factor matrix when init_method is 'custom'.

  • V (ndarray, spmatrix) – Initial factor matrix when init_method is 'custom'.

  • Us (ndarray, spmatrix) – For collective matrix factorization. Initial factor matrices when init_method is 'custom'.

  • W (ndarray, spmatrix or str in {‘mask’, ‘full’}) – Masking weight matrix. For ‘mask’, it’ll use all samples in X_train (both 1’s and 0’s) as a mask. For ‘full’, it refers to a full 1’s matrix.

  • Ws (list of spmatrix, str in {‘mask’, ‘full’}) – For collective matrix factorization. Masking weight matrices.

  • alpha (list of floats) – For collective matrix factorization. Importance weights for matrices.

  • lr (float) – The learning rate.

  • reg (float) – The regularization parameter.

  • tol (float) – The error tolerance. Fitting will stop when the specified error is below tol.

  • min_diff (float) – The minimal difference. Fitting will stop when the specified change is below min_diff.

  • max_iter (int) – The maximal number of iterations.

  • init_method (str) – The initialization method.

truncate_factors(k)[source]

Get the first k factors (k = 1, 2, …).

Parameters:

k (int) – The number of factors to obtain.
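
What extend_factors and truncate_factors do to a factor matrix can be pictured with list-based stand-ins. The sketch assumes an m x k layout with one column per factor and zero padding for new factors; both are assumptions, and the helper names are hypothetical.

```python
def extend_factors_sketch(U, k):
    """Pad each row of U with zeros until it has k columns."""
    return [row + [0] * (k - len(row)) for row in U]


def truncate_factors_sketch(U, k):
    """Keep only the first k columns of U."""
    return [row[:k] for row in U]


U = [[1, 0],
     [0, 1]]
U_extended = extend_factors_sketch(U, 3)
U_truncated = truncate_factors_sketch(U, 1)
```

In a fitted model the same operation would be applied to both U and V so that their factor counts stay in sync.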

PyBMF.models.BinaryMFPenalty module

class PyBMF.models.BinaryMFPenalty.BinaryMFPenalty(k, U=None, V=None, W='full', beta_loss='frobenius', solver='mu', reg=2.0, reg_growth=3, max_reg=10000000000.0, tol=0.01, min_diff=0.0, max_iter=100, init_method='custom', normalize_method='balance', seed=None)[source]

Bases: ContinuousModel

Binary matrix factorization, penalty function algorithm.

Parameters:
  • reg (float) – The regularization weight ‘lambda’ in the paper.

  • reg_growth (float) – The growing rate of regularization weight.

  • max_reg (float) – The upper bound of regularization weight.

  • tol (float) – The error tolerance ‘epsilon’ in the paper.

_fit()[source]

The multiplicative update of factor matrices U, V.

check_params(**kwargs)[source]

Check the validity of parameters.

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

get_prediction()[source]

Get prediction matrix.

update_U()[source]

The update process of U.

update_V()[source]

The update process of V.

PyBMF.models.BinaryMFPenalty.error(X_gt, X_pd, W, U, V, reg)[source]

Error for penalty function algorithm.

PyBMF.models.BinaryMFPenalty.rec_error(X_gt, X_pd, W)[source]

Reconstruction error.

PyBMF.models.BinaryMFPenalty.reg_error(X)[source]

The regularization function.
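
For intuition, the sketch below shows a penalty of the kind used by penalty-function BMF, which vanishes exactly when every entry is 0 or 1, together with a growth schedule driven by reg, reg_growth and max_reg. The formula (sum of (x^2 - x)^2, up to a constant factor) and the multiplicative schedule are assumptions based on the usual description of the method, not a copy of reg_error().

```python
def reg_error_sketch(X):
    """Binary penalty: zero iff every entry of X is exactly 0 or 1."""
    return sum((x * x - x) ** 2 for row in X for x in row)


def reg_schedule_sketch(reg=2.0, reg_growth=3, max_reg=1e10, steps=4):
    """Regularization weights over `steps` iterations, capped at max_reg."""
    values = []
    for _ in range(steps):
        values.append(reg)
        reg = min(reg * reg_growth, max_reg)
    return values


penalty_binary = reg_error_sketch([[0, 1], [1, 0]])
penalty_half = reg_error_sketch([[0.5]])
weights = reg_schedule_sketch(reg=2.0, reg_growth=3, steps=4)
```

Growing the weight over iterations pushes the factors towards binary values gradually instead of forcing them from the first update.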

PyBMF.models.BinaryMFPenalty.update_U(X, W, U, V, reg, solver='mu', beta_loss='frobenius')[source]

Multiplicative update of factor U.

PyBMF.models.BinaryMFPenalty.update_V(X, W, U, V, reg, solver='mu', beta_loss='frobenius')[source]

Multiplicative update of factor V.

PyBMF.models.BinaryMFThreshold module

class PyBMF.models.BinaryMFThreshold.BinaryMFThreshold(k, U, V, W='mask', u=0.5, v=0.5, lamda=100, solver='line-search', min_diff=0.001, max_iter=100, init_method='custom', normalize_method=None, seed=None)[source]

Bases: ContinuousModel

Binary matrix factorization, thresholding algorithm with line search.

Note

To be released:

For sigmoid link function, please use BinaryMFThresholdExSigmoid.

For columnwise thresholds, please use BinaryMFThresholdExColumnwise.

For both, please use BinaryMFThresholdExSigmoidColumnwise.

Parameters:
  • u (float) – Initial threshold for U.

  • v (float) – Initial threshold for V.

  • lamda (float) – The ‘lambda’ in sigmoid function.

F(**kwargs)

_fit()[source]

The gradient descent method.

check_params(**kwargs)[source]

Check the validity of parameters.

dF(**kwargs)

dXdx(X, x)[source]

The fractional term in the gradient.

This computes dU*/du and dV*/dv (or dW*/dw and dH*/dh as in the paper).

Parameters:

X (ndarray) – The X*, sigmoid(X - x) in the paper.
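
A scalar sketch of the smooth threshold and its derivative with respect to the threshold may help here. It assumes the link has the form sigmoid(lamda * (X - x)), which is where the 'lambda' parameter would enter; the function names are hypothetical and the library works on whole matrices.

```python
import math


def smooth_threshold(X, x, lamda=100):
    """Sigmoid approximation of the step function 1[X >= x]."""
    z = lamda * (X - x)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def d_smooth_threshold_dx(X, x, lamda=100):
    """Derivative with respect to the threshold x: -lamda * s * (1 - s)."""
    s = smooth_threshold(X, x, lamda)
    return -lamda * s * (1.0 - s)


at_threshold = smooth_threshold(0.5, 0.5)
slope_at_threshold = d_smooth_threshold_dx(0.5, 0.5)
```

A larger lamda makes the sigmoid closer to a hard threshold but also makes the gradient vanish faster away from x.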

evaluate_with_threshold(n_iter, new_fval)[source]

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

threshold_to_x()[source]

x_bounds()[source]

x_to_threshold(x_last)[source]

PyBMF.models.ContinuousModel module

class PyBMF.models.ContinuousModel.ContinuousModel[source]

Bases: BaseModel

Base class for continuous binary matrix factorization models.

_show_matrix()[source]

Wrapper for BaseModel._show_matrix().

_to_bool()[source]

Turn X, W, U and V into bool matrices.

For temporary use in development.

_to_dense()[source]

Turn X, W, U and V into dense matrices.

For temporary use in development.

_to_float()[source]

Turn X, W, U and V into float matrices.

For temporary use in development.

init_UV(init_method='normal')[source]

Initialize factors U and V with given init_method.

init_W()[source]

Initialize masking weight matrix for models that accept masking weights.

This turns code names into a matrix.

If W is ‘mask’: W will be assigned 1 for every entry stored in X_train, whether its value is 1, or 0 from negative sampling.

If W is ‘full’: W is an all-ones matrix. The loss will take the whole matrix into consideration.

If W is ndarray or spmatrix: W will be used as the mask matrix.
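
The three cases above can be sketched as follows. The stored entries of X_train are modelled as a set of (row, column) pairs instead of a sparse matrix, and the function name is hypothetical, so this is only an illustration of the rule, not the library's code.

```python
def init_W_sketch(W, stored_entries, shape):
    """Resolve the code names 'mask' and 'full' into a 0/1 weight matrix."""
    m, n = shape
    if isinstance(W, str):
        if W == "full":
            return [[1] * n for _ in range(m)]
        if W == "mask":
            mask = [[0] * n for _ in range(m)]
            for i, j in stored_entries:  # observed 1's and sampled 0's alike
                mask[i][j] = 1
            return mask
        raise ValueError("W must be 'mask', 'full' or a matrix")
    return W  # an explicit matrix is used as given


stored_entries = {(0, 0), (0, 1), (1, 1)}
W_mask = init_W_sketch("mask", stored_entries, (2, 2))
W_full = init_W_sketch("full", stored_entries, (2, 2))
```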

init_model()[source]

The BaseModel.init_model() for continuous models.

normalize_UV()[source]

Normalize factors U and V with given normalize_method.

If ‘balance’: balance each pair of factors, used in BinaryMFPenalty. This does not necessarily map the factors to an interval within [0, 1].

If ‘matrixwise-normalize’: normalize the whole factor matrix to [0, 1], used in thresholding methods. This will maintain the relative magnitude of the values within the whole factor matrix.

If ‘columnwise-normalize’: normalize each factor vector to [0, 1], used in thresholding methods. This will maintain the relative magnitude of the values within each factor vector.

If ‘matrixwise-mapping’: map unique values in the whole factor matrix to an arithmetic sequence in [0, 1]. This will maintain the relative order of the values within the whole factor matrix.

If ‘columnwise-mapping’: map unique values in each factor vector to an arithmetic sequence in [0, 1]. This will maintain the relative order of the values within each factor vector.

If None: do nothing.
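
The difference between the matrixwise and columnwise variants is easiest to see on a small example. The sketch assumes that "normalize to [0, 1]" means min-max scaling, which is a guess; the helper names are hypothetical.

```python
def minmax(values):
    """Scale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input; this choice is an assumption
    return [(v - lo) / (hi - lo) for v in values]


def matrixwise_normalize(U):
    """One scale for the whole matrix: relative magnitudes across columns survive."""
    n = len(U[0])
    flat = minmax([x for row in U for x in row])
    return [flat[i * n:(i + 1) * n] for i in range(len(U))]


def columnwise_normalize(U):
    """One scale per factor (column): each column spans [0, 1] on its own."""
    cols = [minmax(list(col)) for col in zip(*U)]
    return [list(row) for row in zip(*cols)]


U = [[0.0, 2.0],
     [1.0, 4.0]]
U_matrixwise = matrixwise_normalize(U)
U_columnwise = columnwise_normalize(U)
```

After columnwise normalization the two factors are no longer comparable to each other, which matters when a single shared threshold is applied afterwards.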

show_matrix(settings=None, u=None, v=None, boolean=True, **kwargs)[source]

Wrapper of BaseModel.show_matrix() with thresholds u and v.

PyBMF.models.ContinuousModel.unique_values_mapping(arr)[source]

Map unique values in a matrix to [0, 1] interval.
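
A minimal sketch of such a mapping, assuming the unique values are sorted and spread evenly over [0, 1]. It uses nested lists rather than arrays, and the handling of a matrix with a single unique value is an arbitrary choice made for the example.

```python
def unique_values_mapping_sketch(arr):
    """Map the sorted unique values of arr to an arithmetic sequence in [0, 1]."""
    values = sorted({x for row in arr for x in row})
    if len(values) == 1:
        mapping = {values[0]: 0.0}  # degenerate case; this choice is an assumption
    else:
        step = 1 / (len(values) - 1)
        mapping = {v: i * step for i, v in enumerate(values)}
    return [[mapping[x] for x in row] for row in arr]


mapped = unique_values_mapping_sketch([[0.2, 5.0],
                                       [5.0, 0.3]])
```

Only the order of the values is preserved: 0.2 and 0.3 end up as far apart as 0.3 and 5.0.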

PyBMF.models.ELBMF module

class PyBMF.models.ELBMF.ELBMF(k, U=None, V=None, W='full', init_method='custom', reg_l1=0.01, reg_l2=0.02, reg_growth=1.02, rounding=False, beta=0.0, tol=0.0, max_iter=1000, min_diff=1e-08, seed=None)[source]

Bases: ContinuousModel

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model to observations, with validation and test if necessary.

To define a fitting procedure, implement your own _fit() and finish() and append them to this method.

Simply override this method if you want to drop or add any part of the procedure.

Parameters:
  • X_train (ndarray) – Training data.

  • X_val (ndarray) – Validation data.

  • X_test (ndarray) – Test data.

  • **kwargs (dict) – Other parameters.

iPALM()[source]

init_model()[source]

Initialize factors and logging variables.

PyBMF.models.ELBMF.get_integrality_gap(U, reg_l1, reg_l2)[source]

PyBMF.models.ELBMF.prox(U, kai, lamda)[source]

Proximal operator of the elastic net penalty.
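
For readers unfamiliar with proximal operators, the sketch below shows the standard elastic-net prox for the penalty kai * |y| + (lamda / 2) * y^2 (soft-thresholding followed by shrinkage), plus a hypothetical variant that applies it around the nearer of 0 and 1 so that values are pulled towards binary. The second function is an assumption about how such a penalty could be made binary-seeking, not a transcription of PyBMF's prox().

```python
def soft_threshold(x, kai):
    """Prox of kai * |y|: move x towards 0 by kai, stopping at 0."""
    if x > kai:
        return x - kai
    if x < -kai:
        return x + kai
    return 0.0


def prox_elastic_net(x, kai, lamda):
    """Prox of kai * |y| + (lamda / 2) * y**2, evaluated at a scalar x."""
    return soft_threshold(x, kai) / (1 + lamda)


def prox_binary_sketch(x, kai, lamda):
    """Apply the elastic-net prox around the nearer of 0 and 1 (assumed variant)."""
    center = 0.0 if x <= 0.5 else 1.0
    return center + prox_elastic_net(x - center, kai, lamda)


near_zero = prox_binary_sketch(0.02, kai=0.05, lamda=0.5)
near_one = prox_binary_sketch(0.98, kai=0.05, lamda=0.5)
pulled = prox_binary_sketch(0.9, kai=0.05, lamda=0.5)
```

Values within kai of 0 or 1 snap exactly onto them, while values further away are only moved part of the way.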

PyBMF.models.ELBMF.update_U(X, U, V, W, reg_l1, reg_l2, beta, U_last)[source]

A step of Gauss-Seidel optimization for U, applies to V as well.

PyBMF.models.ELBMFNumPy module

class PyBMF.models.ELBMFNumPy.ELBMF(org_A, A, ncomponents, l1reg, l2reg, c, maxiter, tolerance, random_seed=19, beta=0.0, batchsize=None, with_rounding=True, callback=None)[source]

Bases: object

factorize()[source]

init_factorization(seed)[source]

print_loss()[source]

PyBMF.models.ELBMFPyTorch module

PyBMF.models.FastStep module

class PyBMF.models.FastStep.FastStep(k, U=None, V=None, W='full', tau=20, solver='line-search', tol=0, min_diff=0.01, max_round=30, max_iter=50, init_method='uniform', normalize_method=None, seed=None)[source]

Bases: ContinuousModel

The FastStep algorithm. The projected line search is applied.

F(**kwargs)

_fit()[source]

The gradient descent of the k factors.

check_params(**kwargs)[source]

Check and load model parameters and experiment configurations.

Called upon model initialization and fitting.

# include this in your model class:

def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:

model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)

model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)

dF(**kwargs)

early_stop(error=None, diff=None, n_round=None, n_iter=None, n_factor=None, msg=None, k=None, verbose=True)[source]

Early stop detection wrapper.

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model to observations, with validation and test if necessary.

To define a fitting procedure, implement your own _fit() and finish() and append them to this method.

Simply override this method if you want to drop or add any part of the procedure.

Parameters:
  • X_train (ndarray) – Training data.

  • X_val (ndarray) – Validation data.

  • X_test (ndarray) – Test data.

  • **kwargs (dict) – Other parameters.

PyBMF.models.GreConD module

class PyBMF.models.GreConD.GreConD(k=None, tol=0)[source]

Bases: BaseModel

The GreConD algorithm for exact Boolean decomposition.

Parameters:
  • k (int, optional) – The target rank. If None, factorization continues until the error falls below tol or another stopping criterion is met.

  • tol (float, default: 0) – The error tolerance.

_fit()[source]

The main process of fitting.

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

PyBMF.models.GreConD.get_concept(X_gt, X_rs)[source]

Get a concept/pattern from the residual matrix.

Parameters:
  • X_gt (spmatrix) – The ground-truth matrix.

  • X_rs (spmatrix) – The residual matrix.

Returns:

  • score (float) – The TP coverage of the pattern over X_gt.

  • u ((m, 1) spmatrix) – The factors. If the pattern is not found, they’ll be zero vectors.

  • v ((n, 1) spmatrix) – The factors. If the pattern is not found, they’ll be zero vectors.
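
GreConD builds its factors from formal concepts, i.e. maximal all-ones rectangles. The sketch below shows only the closure step that turns a single column into such a rectangle; the greedy choice of which columns to add and the scoring against the residual matrix are left out. All names are hypothetical, and nothing here is taken from get_concept() itself.

```python
def rows_having(X, cols):
    """Indices of rows that contain a 1 in every column of `cols`."""
    return [i for i, row in enumerate(X) if all(row[j] for j in cols)]


def cols_shared(X, rows):
    """Indices of columns that contain a 1 in every row of `rows`."""
    return [j for j in range(len(X[0])) if all(X[i][j] for i in rows)]


def concept_from_column(X, j):
    """Close column j into a formal concept (row set, column set)."""
    rows = rows_having(X, [j])
    cols = cols_shared(X, rows)
    return rows, cols


X = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]
rows, cols = concept_from_column(X, 0)
```

The resulting rectangle contains only 1's, which is why a decomposition built from concepts never introduces false positives.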

PyBMF.models.GreConDPlus module

class PyBMF.models.GreConDPlus.GreConDPlus(k=None, tol=0, w_fp=0.5, w_fn=None)[source]

Bases: BaseModel

The GreConD+ algorithm for approximate Boolean decomposition.

Parameters:
  • k (int, optional) – The target rank. If None, factorization continues until the error falls below tol or another stopping criterion is met.

  • tol (float, default: 0) – The error tolerance.

  • w_fp (float) – The penalty weight for false positives (FP).

  • w_fn (float) – The penalty weight for false negatives (FN).

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

remove_covered(k)[source]

Remove the patterns that are fully covered by the k-th pattern (k = 0, 1, …).

Parameters:

k (int) – The index of the pattern to check against.

remove_overlapped()[source]

Remove overlapped columns and rows.

set_extensions(k, u_exp, v_exp)[source]

Add the extension part of each factor (k = 0, 1, …).

Parameters:
  • k (int) – The index of the factor to be added.

  • u_exp (spmatrix) – The extension u_exp to be added to U_exp.

  • v_exp (spmatrix) – The extension v_exp to be added to V_exp.

PyBMF.models.GreConDPlus._expansion(X_gt, X_old, u, v, w_fp, w_fn, axis)[source]

Row-wise or column-wise expansion of a pattern given u and v.

Parameters:

axis (int, [0, 1]) – The dimension of the factor that remains unchanged: 1 for row-wise expansion and 0 for column-wise expansion.

PyBMF.models.GreConDPlus.expansion(X_gt, X_old, u, v, w_fp, w_fn)[source]

Row-wise or column-wise expansion of a pattern given u and v.

In GreConD+, factors are initially the dense cores. Factor u and v will grow in each iteration. The expansion part is recorded in u_exp and v_exp.

Parameters:
  • X_gt (spmatrix) – The ground-truth matrix.

  • X_old (spmatrix) – The residual matrix before the join of u and v.

  • u ((m, 1) spmatrix) – The factors.

  • v ((n, 1) spmatrix) – The factors.

  • w_fp (float) – The penalty weight for false positives (FP).

  • w_fn (float) – The penalty weight for false negatives (FN).

Returns:

  • u ((m, 1) spmatrix) – The expansion part in u.

  • v ((n, 1) spmatrix) – The expansion part in v.

PyBMF.models.Hyper module

class PyBMF.models.Hyper.Hyper(min_support)[source]

Bases: BaseModel

The Hyper algorithm.

Hyper is an exact decomposition algorithm. Hyper+ is used after fitting a Hyper model. It’s a relaxation of the exact decomposition.

Parameters:

min_support (float) – The ‘alpha’ in the paper. The min support of frequent itemsets.

static cost(T, I, X_rs)[source]

The cost function (gamma) in Hyper.

static find_hyper(I, X_gt, X_rs)[source]

queue : list

The indices of rows with non-zero uncoverage, in descending order. Each row must be a support of I.

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model to observations, with validation and test if necessary.

To define a fitting procedure, implement your own _fit() and finish() and append them to this method.

Simply override this method if you want to drop or add any part of the procedure.

Parameters:
  • X_train (ndarray) – Training data.

  • X_val (ndarray) – Validation data.

  • X_test (ndarray) – Test data.

  • **kwargs (dict) – Other parameters.

init_itemsets()[source]

Initialize candidate itemsets with Apriori.

I : list of int list

init_model()[source]

Initialize the model.

Called after params are set and datasets are loaded.

Simply override this method if you want to drop or add any part of the procedure.

init_transactions()[source]

Initialize transactions with cost.

T : list of int list

c : list of float

X_rs : spmatrix

sort_by_cost()[source]

Sort the lists T, I and c in ascending order of cost c.

PyBMF.models.HyperPlus module

class PyBMF.models.HyperPlus.HyperPlus(model, target_k, samples=500, beta=inf)[source]

Bases: Hyper

The Hyper+ algorithm.

Hyper is an exact decomposition algorithm. Hyper+ is used after fitting a Hyper model. It’s a relaxation of the exact decomposition.

Parameters:
  • model (Hyper class) – The fitted Hyper model.

  • target_k (int) – The target number of factors. By default, it’s 1. This will ask the model to factorize all the way down to k = 1. The last pattern will always be an all-ones matrix.

  • samples (int, default: all possible samples) – Number of pairs to be merged during trials. Smaller samples will be faster but more likely to surpass the false positive rate beta.

  • beta (float) – The upper limit of false positive / ground truth. 1.0 means no limit.

check_params(**kwargs)[source]

Check and load model parameters and experiment configurations.

Called upon model initialization and fitting.

# include this in your model class:

def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:

model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)

model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model to observations, with validation and test if necessary.

To define a fitting procedure, implement your own _fit() and finish() and append them to this method.

Simply override this method if you want to drop or add any part of the procedure.

Parameters:
  • X_train (ndarray) – Training data.

  • X_val (ndarray) – Validation data.

  • X_test (ndarray) – Test data.

  • **kwargs (dict) – Other parameters.

PyBMF.models.HyperPlus.cost_savings(T_0, I_0, T_1, I_1, X_pd)[source]

Compute the cost savings of merging H_0 and H_1.

savings = model description savings / exclusive area of merged pattern

H_0 = [T_0, I_0], H_1 = [T_1, I_1]

PyBMF.models.IP module

class PyBMF.models.IP.IP[source]

Bases: ContinuousModel

PyBMF.models.MEBF module

class PyBMF.models.MEBF.MEBF(k=None, tol=0, t=None, w_fp=1, w_fn=1)[source]

Bases: BaseModel

The MEBF (Median Expansion for Boolean Factorization) algorithm.

Parameters:
  • k (int, optional) – The target rank. If None, factorization continues until the error falls below tol or another stopping criterion is met.

  • tol (float, default: 0) – The error tolerance.

  • w_fp (float) – The penalty weight for false positives (FP).

  • w_fn (float) – The penalty weight for false negatives (FN).

_fit()[source]

The main process of fitting.

bidirectional_growth()[source]

Bi-directional growth algorithm.

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

get_factor(axis)[source]

Get factor for bi-directional growth.

Parameters:

axis (int) – If 0, sort columns, find the middle u and grow on v. If 1, sort rows, find the middle v and grow on u.

Returns:

  • a (csr_matrix) – A factor vector.

  • b (csr_matrix) – A factor vector.

get_weak_signal(axis)[source]

Get factor for weak signal detection.

Parameters:

axis (int) – If 0, find u and grow on v. If 1, find v and grow on u.

Returns:

  • a (csr_matrix) – A factor vector.

  • b (csr_matrix) – A factor vector.

weak_signal_detection()[source]

Weak signal detection algorithm.

PyBMF.models.MaxSAT module

class PyBMF.models.MaxSAT.MaxSAT(k, mode='fast_undercover')[source]

Bases: ContinuousModel

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model to observations, with validation and test if necessary.

For a fitting procedure, implement and append your _fit() and finish().

Simply override this method if you want to drop or add any part of the procedure.

Parameters:
  • X_train (ndarray) – Training data.

  • X_val (ndarray) – Validation data.

  • X_test (ndarray) – Test data.

  • **kwargs (dict) – Other parameters.

PyBMF.models.MessagePassing module

PyBMF.models.NMFSklearn module

class PyBMF.models.NMFSklearn.NMFSklearn(k, U=None, V=None, beta_loss='frobenius', init_method='nndsvd', solver='cd', tol=0.0001, max_iter=1000, seed=None)[source]

Bases: ContinuousModel

NMF by scikit-learn.

Parameters:
  • U (ndarray, spmatrix) – Must be provided if init_method is ‘custom’.

  • V (ndarray, spmatrix) – Must be provided if init_method is ‘custom’.

check_params(**kwargs)[source]

Check and load model parameters and experiment configurations.

Called upon model initialization and fitting.

# include this in your model class:

def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:

model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)

model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model to observations, with validation and test if necessary.

To define a fitting procedure, implement your own _fit() and finish().

Override this method if you want to drop or add any part of the procedure.

Parameters:
  • X_train (ndarray) – Training data.

  • X_val (ndarray) – Validation data.

  • X_test (ndarray) – Test data.

  • **kwargs (dict) – Other parameters.

PyBMF.models.OrMachine module

class PyBMF.models.OrMachine.OrMachine(k)[source]

Bases: ContinuousModel

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model to observations, with validation and test if necessary.

To define a fitting procedure, implement your own _fit() and finish().

Override this method if you want to drop or add any part of the procedure.

Parameters:
  • X_train (ndarray) – Training data.

  • X_val (ndarray) – Validation data.

  • X_test (ndarray) – Test data.

  • **kwargs (dict) – Other parameters.

PyBMF.models.PNLPF module

class PyBMF.models.PNLPF.PNLPF(k, U=None, V=None, W='full', reg=2.0, beta_loss='frobenius', solver='mu', link_lamda=10, reg_growth=3, max_reg=10000000000.0, tol=0.01, min_diff=0.0, max_iter=100, init_method='custom', normalize_method='balance', seed=None)[source]

Bases: BinaryMFPenalty

PNLPF: the penalty function algorithm for binary matrix factorization, with a sigmoid link function.

Parameters:
  • reg (float) – The regularization weight ‘lambda’ in the paper.

  • reg_growth (float) – The growth rate of the regularization weight.

  • max_reg (float) – The upper bound of regularization weight.

  • tol (float) – The error tolerance ‘epsilon’ in the paper.
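How reg, reg_growth and max_reg might interact can be sketched as below. The schedule is an assumption (geometric growth capped at max_reg), and reg_schedule is a hypothetical helper, not a function of the library; the actual update rule and stopping test should be checked in the source.

```python
def reg_schedule(reg=2.0, reg_growth=3, max_reg=1e10, max_iter=100):
    """Yield the regularization weight used at each iteration (assumed schedule)."""
    for _ in range(max_iter):
        yield reg
        # Assumption: multiply by reg_growth each iteration, never exceeding max_reg.
        reg = min(reg * reg_growth, max_reg)


weights = list(reg_schedule(max_iter=5))
print(weights)  # [2.0, 6.0, 18.0, 54.0, 162.0]

capped = list(reg_schedule(reg=1.0, reg_growth=10, max_reg=50, max_iter=4))
print(capped)
```

Under this assumption, a larger reg_growth pushes the factors toward binary values in fewer iterations, at the cost of a less gradual transition.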

get_prediction()[source]

Get prediction matrix.

update_U(**kwargs)

The update process of U.

update_V(**kwargs)

The update process of V.

PyBMF.models.PNLPF.get_prediction_with_sigmoid(U, V, link_lamda)[source]

Get prediction with sigmoid link function.
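A minimal sketch of a sigmoid-link prediction, assuming the form sigmoid(link_lamda * (U Vᵀ - 0.5)) with U of shape (m, k) and V of shape (n, k). The function name, the shift parameter and its value 0.5 are assumptions made for illustration; the exact expression used by the library may differ.

```python
import numpy as np


def sigmoid_link_prediction(U, V, link_lamda=10.0, shift=0.5):
    """Assumed form: sigmoid(link_lamda * (U @ V.T - shift))."""
    Z = link_lamda * (U @ V.T - shift)
    return 1.0 / (1.0 + np.exp(-Z))


U = np.array([[1.0, 0.0], [0.0, 1.0]])              # (m, k) = (2, 2)
V = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])  # (n, k) = (3, 2)
X_pd = sigmoid_link_prediction(U, V)
print(np.round(X_pd, 3))
```

A larger link_lamda makes the sigmoid steeper, so the prediction moves closer to a hard threshold of the product U Vᵀ.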

PyBMF.models.PNLPF.update_U(X, W, U, V, reg, link_lamda, solver='mu', beta_loss='frobenius')[source]

PyBMF.models.PNLPF.update_V(X, W, U, V, reg, link_lamda, solver='mu', beta_loss='frobenius')[source]

PyBMF.models.PRIMP module

PyBMF.models.PRIMPPyTorch module

PyBMF.models.Panda module

class PyBMF.models.Panda.Panda(k=None, tol=0, w_model=1, w_fp=1, w_fn=1, init_method='correlation', exact_decomp=False)[source]

Bases: BaseModel

The PaNDa and PaNDa+ algorithms.

Both PaNDa and PaNDa+ fix w_fp and w_fn to 1.0; this implementation allows them to be tuned.

PaNDa fixes w_model to 1.0 while PaNDa+ does not.

Parameters:
  • k (int, optional) – The target rank. If None, it will factorize until the error is smaller than tol, or until another stopping criterion is met.

  • tol (float, default: 0) – The error tolerance.

  • w_model (float) – The model code weight rho in the PaNDa+ paper.

  • w_fp (float) – The penalty weight for false positives (FP). Not part of the original PaNDa and PaNDa+; added for flexibility.

  • w_fn (float) – The penalty weight for false negatives (FN). Not part of the original PaNDa and PaNDa+; added for flexibility.

  • init_method (str) – How items in the extension list are sorted: ‘frequency’, ‘couples-frequency’ or ‘correlation’ (see sort_items).

  • exact_decomp (bool) – Exact decomposition option. Not part of the original PaNDa and PaNDa+; added to enable exact decomposition.
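The role of the three weights can be illustrated with a weighted description-length cost. This is a sketch under assumptions, not the library's code: panda_cost is a hypothetical helper, the model size is taken as the number of ones in the factors, and U (m, k) and V (n, k) are assumed to reconstruct X through a Boolean product.

```python
import numpy as np


def panda_cost(X, U, V, w_model=1.0, w_fp=1.0, w_fn=1.0):
    """Weighted cost: model size plus false positives and false negatives."""
    X = np.asarray(X, dtype=bool)
    U = np.asarray(U, dtype=bool)
    V = np.asarray(V, dtype=bool)
    # Boolean product: an entry is covered if any pattern covers it.
    X_pd = (U.astype(int) @ V.astype(int).T) > 0
    model_size = int(U.sum() + V.sum())   # ones needed to describe the patterns
    fp = int((X_pd & ~X).sum())           # covered, but 0 in the data
    fn = int((~X_pd & X).sum())           # 1 in the data, but not covered
    return w_model * model_size + w_fp * fp + w_fn * fn


X = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
U = [[1], [1], [0]]   # transactions in the single pattern
V = [[1], [1], [0]]   # items in the single pattern
print(panda_cost(X, U, V))               # 4 (model) + 0 (FP) + 1 (FN)
print(panda_cost(X, U, V, w_model=0.5))
```

With w_model = w_fp = w_fn = 1 this reduces to an unweighted count; lowering w_model makes larger patterns cheaper relative to the noise terms.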

_fit()[source]

The main process of fitting.

In Panda, three lists are maintained:

  • E (list) – The item extension list.

  • I (spmatrix) – The item list.

  • T (spmatrix) – The user or transaction list.

check_params(**kwargs)[source]

Check the validity of the parameters.

extend_core(**kwargs)

find_core(**kwargs)

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

sort_items(method)[source]

Sort the extension list by descending scores.

Parameters:

method (str) – Sort method.

  • frequency – Sort items in the extension list by frequency.

  • couples-frequency – Sort items in the extension list by the frequency of the item pairs that include each item.

  • correlation – Sort items in the extension list by correlation with the current transactions T.
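The three sort methods can be sketched on a dense 0/1 matrix as below. The scoring rules are an interpretation of the descriptions above, not the library's implementation: sort_items_sketch is a hypothetical helper, couples-frequency is taken as the sum of an item's co-occurrence counts with all other items, and correlation as the number of current transactions T that contain the item.

```python
import numpy as np


def sort_items_sketch(X, method, T=None):
    """Return item indices by descending score, together with the scores."""
    X = np.asarray(X, dtype=int)
    if method == 'frequency':
        scores = X.sum(axis=0)
    elif method == 'couples-frequency':
        co = X.T @ X                 # item-by-item co-occurrence counts
        np.fill_diagonal(co, 0)      # ignore an item's pairing with itself
        scores = co.sum(axis=0)
    elif method == 'correlation':
        rows = np.asarray(T, dtype=bool)
        scores = X[rows].sum(axis=0)  # support within the current transactions
    else:
        raise ValueError(f"unknown method: {method}")
    order = np.argsort(-scores, kind='stable')  # ties keep the original order
    return order, scores


X = [[1, 1, 0, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 0, 1, 0]]
for method in ('frequency', 'couples-frequency'):
    order, scores = sort_items_sketch(X, method)
    print(method, order.tolist(), scores.tolist())
order_c, scores_c = sort_items_sketch(X, 'correlation', T=[1, 1, 0, 0])
print('correlation', order_c.tolist(), scores_c.tolist())
```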

PyBMF.models.TransposedModel module

class PyBMF.models.TransposedModel.TransposedModel(model, **kwargs)[source]

Bases: Asso

The model with transposed input.

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.
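The idea of fitting on transposed input can be sketched with a generic wrapper. This is an illustration under assumptions, not the class documented here: TransposedSketch and the toy inner model DensestRowModel are hypothetical, and factors are assumed to satisfy X ≈ U Vᵀ, so that a factorization of Xᵀ yields one of X by swapping U and V.

```python
import numpy as np


class DensestRowModel:
    """Toy inner model: one pattern built from the densest row."""

    def fit(self, X):
        X = np.asarray(X)
        i = X.sum(axis=1).argmax()
        self.V = X[i].reshape(-1, 1)                        # (n, 1) pattern
        self.U = (X @ self.V >= self.V.sum()).astype(int)   # rows containing it
        return self


class TransposedSketch:
    """Fit the wrapped model on X.T, then swap the factors back."""

    def __init__(self, model):
        self.model = model

    def fit(self, X):
        self.model.fit(np.asarray(X).T)
        # X.T ≈ U' V'^T implies X ≈ V' U'^T.
        self.U, self.V = self.model.V, self.model.U
        return self


X = np.array([[1, 1, 0], [1, 1, 1]])
wrapped = TransposedSketch(DensestRowModel()).fit(X)
print(wrapped.U @ wrapped.V.T)
```

This lets an algorithm that searches for basis vectors along one dimension be reused along the other without changing its code.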

PyBMF.models.WNMF module

class PyBMF.models.WNMF.WNMF(k, U=None, V=None, W='mask', beta_loss='frobenius', init_method='normal', solver='mu', tol=0.0, min_diff=0.0, max_iter=30, seed=None)[source]

Bases: ContinuousModel

Weighted Nonnegative Matrix Factorization.

Parameters:
  • U (ndarray, spmatrix) – Must be provided if init_method is ‘custom’.

  • V (ndarray, spmatrix) – Must be provided if init_method is ‘custom’.

_fit()[source]

The alternating minimization algorithm.
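A minimal sketch of alternating multiplicative updates for weighted NMF with the Frobenius loss. It assumes dense arrays, the convention X ≈ U Vᵀ with U of shape (m, k) and V of shape (n, k), and a 0/1 weight matrix W that masks unobserved entries; wnmf_sketch and the eps guard are hypothetical and do not reproduce the library's _fit().

```python
import numpy as np


def wnmf_sketch(X, W, k, max_iter=30, seed=None, eps=1e-12):
    """Alternate multiplicative updates of U and V on the weighted squared error."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    WX = W * X
    errors = []
    for _ in range(max_iter):
        # Update U with V fixed, then V with U fixed.
        U *= (WX @ V) / ((W * (U @ V.T)) @ V + eps)
        V *= (WX.T @ U) / ((W * (U @ V.T)).T @ U + eps)
        errors.append(float(np.sum(W * (X - U @ V.T) ** 2)))
    return U, V, errors


X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
W = np.ones_like(X)
W[0, 3] = 0.0  # treat this entry as unobserved
U, V, errors = wnmf_sketch(X, W, k=2, max_iter=30, seed=0)
print(errors[0], errors[-1])
```

Entries with zero weight do not contribute to the error, so the factors are fitted on the observed entries only.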

check_params(**kwargs)[source]

Check and load model parameters and experiment configurations.

Called upon model initialization and fitting.

# include this in your model class:

def __init__(self, k, tol=0, alpha=0.1, **kwargs):
    # extra keyword arguments (e.g. W, seed) are forwarded to check_params
    self.check_params(k=k, tol=tol, alpha=alpha, **kwargs)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:

model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)

model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)

error(**kwargs)

fit(X_train, X_val=None, X_test=None, **kwargs)[source]

Fit the model.

update(**kwargs)

Module contents