PyBMF.models package¶
Submodules¶
PyBMF.models.Asso module¶
- class PyBMF.models.Asso.Asso(tau, k=None, tol=0, w_fp=0.5, w_fn=None)[source]¶
Bases: BaseModel
The Asso algorithm.
- Parameters:
k (int, optional) – The target rank. If None, it will factorize until the error is smaller than tol, or until another stopping criterion is met.
tol (float, default: 0) – The error tolerance.
tau (float) – The binarization threshold used when building the basis. Can be determined via model selection techniques.
w_fp (float) – The penalty weight for false positives (FP).
w_fn (float, optional, default: None) – The penalty weight for false negatives (FN). If w_fn is None, it will be treated as 1 - w_fp.
- PyBMF.models.Asso.build_assoc(X, dim)[source]¶
Build the real-valued association matrix.
- Parameters:
X (ndarray, spmatrix) – The data matrix.
dim (int) – The dimension which basis belongs to. If dim == 0, basis is treated as a column vector and vector as a row vector.
- Returns:
assoc – The association matrix.
- Return type:
spmatrix
- PyBMF.models.Asso.build_basis(assoc, tau)[source]¶
Get the binary-valued basis candidates.
- Parameters:
assoc (spmatrix) – The association matrix.
tau (float) – The threshold for the association matrix.
- Returns:
basis – The binary-valued basis candidates.
- Return type:
spmatrix
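The two helpers above can be illustrated with a small dependency-free sketch. It assumes the usual Asso construction, where the association between columns i and j is the confidence of "column i implies column j", and it assumes entries greater than or equal to tau are kept. The function names, the plain-list representation and the handling of ties are illustrative choices, not PyBMF's actual implementation, and the dim argument is ignored here.

```python
def build_assoc_sketch(X):
    """Column-to-column association confidences for a 0/1 matrix X (list of rows)."""
    n_cols = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_cols)]
    assoc = []
    for ci in cols:
        support = sum(ci)
        row = []
        for cj in cols:
            overlap = sum(a * b for a, b in zip(ci, cj))
            # Confidence of "column i implies column j"; 0 if column i is empty.
            row.append(overlap / support if support else 0.0)
        assoc.append(row)
    return assoc


def build_basis_sketch(assoc, tau):
    """Binarize the association matrix (assumption: entries >= tau become 1)."""
    return [[1 if a >= tau else 0 for a in row] for row in assoc]


X = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
assoc = build_assoc_sketch(X)
basis = build_basis_sketch(assoc, tau=0.6)
```

Each row of basis is one binary basis candidate. A lower tau yields denser candidates, which is why the threshold is usually tuned by model selection.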
- PyBMF.models.Asso.get_vector(X_gt, X_old, s_old, basis, basis_dim, w_fp, w_fn)[source]¶
Return the optimal column/row vector given a row/column basis candidate.
- Parameters:
X_gt (spmatrix) – The ground-truth matrix.
X_old (spmatrix) – The prediction matrix before adding the current pattern.
s_old (array) – The column/row-wise coverage scores of previous prediction X_pd.
basis ((1, n) spmatrix) – The basis vector.
basis_dim (int) – The dimension to which basis belongs. If basis_dim == 0, a pattern is considered basis.T * vector. Otherwise, it’s considered vector.T * basis. Note that both basis and vector are row vectors.
w_fp (float) – The penalty weight for false positives.
w_fn (float) – The penalty weight for false negatives.
- Returns:
score (float) – The coverage score.
vector ((1, n) spmatrix) – The matched vector.
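As a rough illustration of how a matched vector could be chosen for a basis candidate, the sketch below scores each row by its weighted gain. It assumes the basis lies along the columns (the basis_dim != 0 case), that newly covered 1's are rewarded with w_fn and newly covered 0's are penalized with w_fp, and that a row is used only when its gain is positive. The name get_vector_sketch and these scoring details are assumptions for illustration, not the library's code.

```python
def get_vector_sketch(X_gt, X_old, basis, w_fp=0.5, w_fn=None):
    """Pick the rows that benefit from adding `basis`, and total their gain."""
    if w_fn is None:
        w_fn = 1 - w_fp  # same default rule as documented for Asso
    vector, score = [], 0.0
    for gt_row, old_row in zip(X_gt, X_old):
        gain = 0.0
        for g, o, b in zip(gt_row, old_row, basis):
            if b and not o:  # cell would be newly covered by the pattern
                gain += w_fn if g else -w_fp
        use = gain > 0
        vector.append(1 if use else 0)
        if use:
            score += gain
    return score, vector


X_gt = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
X_old = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
score, vector = get_vector_sketch(X_gt, X_old, basis=[1, 1, 0])
```

With equal weights, the second row gains nothing because one new true positive is cancelled by one new false positive, so it is left out of the vector.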
PyBMF.models.AssoIter module¶
- class PyBMF.models.AssoIter.AssoIter(model, w_fp=0.5, w_fn=None)[source]¶
Bases: Asso
The Asso algorithm with iterative search over each column of U.
- _fit()[source]¶
Uses iterative search to refine U.
In the paper, the algorithm uses the cover function, with equal weights for coverage and over-coverage, as the updating criterion, and the error function as the stopping criterion. Changing them may improve performance.
- check_params(**kwargs)[source]¶
Check and load model parameters and experiment configurations.
Called upon model initialization and fitting.
# include this in your model class:
def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:
model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)
model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)
PyBMF.models.AssoOpt module¶
- class PyBMF.models.AssoOpt.AssoOpt(model, w_fp=1, w_fn=1)[source]¶
Bases: Asso
The Asso algorithm with exhaustive search over each row of U.
This implementation may be slow, but it can handle a larger number of factors k or a larger dimension of X_train.
- check_params(**kwargs)[source]¶
Check and load model parameters and experiment configurations.
Called upon model initialization and fitting.
# include this in your model class:
def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:
model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)
model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)
PyBMF.models.BaseModel module¶
- class PyBMF.models.BaseModel.BaseModel(**kwargs)[source]¶
Bases: BaseModelTools
The base class for all the models.
Initialize the model with parameters.
- _evaluate(name, info, metrics)[source]¶
Evaluate on a given dataset.
- Parameters:
name (str in ['train', 'val', 'test']) – Which matrix to evaluate.
info (dict of list) – The extra information to be recorded.
metrics (list of str) – The metrics to be evaluated and recorded.
- check_params(**kwargs)[source]¶
Check and load model parameters and experiment configurations.
Called upon model initialization and fitting.
# include this in your model class:
def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:
model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)
model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)
- evaluate(df_name, head_info={}, train_info={}, val_info={}, test_info={}, metrics=['Recall', 'Precision', 'Accuracy', 'F1'], train_metrics=None, val_metrics=None, test_metrics=None, verbose=False)[source]¶
Evaluate a BMF model on the given train, val and test dataset.
- Parameters:
df_name (str) – The name of the dataframe to record with.
head_info (dict) – The names and values of shared information at the head of each record.
train_info (dict) – The names and values of external information measured on training data.
val_info (dict) – The names and values of external information measured on validation data.
test_info (dict) – The names and values of external information measured on testing data.
metrics (list of str, default: ['Recall', 'Precision', 'Accuracy', 'F1']) – The metrics to be measured. For metric names check utils.get_metrics.
train_metrics (list of str, optional) – The metrics to be measured on training data. Will use metrics instead if it’s None.
val_metrics (list of str, optional) – The metrics to be measured on validation data. Will use metrics instead if it’s None.
test_metrics (list of str, optional) – The metrics to be measured on testing data. Will use metrics instead if it’s None.
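For reference, the four default metric names correspond to the standard confusion-matrix formulas. The sketch below computes them for two binary matrices given as lists of rows. It is a generic illustration, not the code behind utils.get_metrics, and the function name and the zero-division convention are assumptions.

```python
def basic_metrics(X_gt, X_pd):
    """Recall, Precision, Accuracy and F1 for two equally sized 0/1 matrices."""
    tp = fp = fn = tn = 0
    for gt_row, pd_row in zip(X_gt, X_pd):
        for g, p in zip(gt_row, pd_row):
            if g and p:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    # Assumption: undefined ratios are reported as 0.0.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {'Recall': recall, 'Precision': precision, 'Accuracy': accuracy, 'F1': f1}


result = basic_metrics([[1, 0], [1, 1]], [[1, 1], [0, 1]])
```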
- finish(show_logs=True, save_model=True, show_result=True)[source]¶
Called when the fitting is over.
The default finishing procedure.
Simply overwrite this method if you want to drop or add any parts of the procedures.
You can attach this to the end of fit() or simply call it from outside.
- fit(X_train, X_val=None, X_test=None, **kwargs)[source]¶
Fit the model to observations, with validation and test if necessary.
For a fitting procedure, implement and append your _fit() and finish().
Simply overwrite this method if you want to drop or add any parts of the procedures.
- Parameters:
X_train (ndarray) – Training data.
X_val (ndarray) – Validation data.
X_test (ndarray) – Test data.
**kwargs (dict) – Other parameters.
- init_model()[source]¶
Initialize the model.
Called after params are set and datasets are loaded.
Simply overwrite this method if you want to drop or add any parts of the procedures.
- load_dataset(X_train, X_val=None, X_test=None)[source]¶
Load training, validation and test data.
For matrices that are modified frequently, lil (List of Lists) or coo is preferred.
For matrices that are not modified, csr or csc is preferred.
- Parameters:
X_train (ndarray, spmatrix) – Data for matrix factorization.
X_val (ndarray, spmatrix) – Data for model selection.
X_test (ndarray, spmatrix) – Data for prediction.
- predict_X(U=None, V=None, u=None, v=None, us=None, vs=None, boolean=True)[source]¶
Update the prediction X_pd.
- Parameters:
U (ndarray, spmatrix) – Factor matrix.
V (ndarray, spmatrix) – Factor matrix.
u (float) – The shared threshold for factors in U.
v (float) – The shared threshold for factors in V.
us (list of k floats) – The thresholds for each factor in U.
vs (list of k floats) – The thresholds for each factor in V.
boolean (bool) – Whether to apply Boolean multiplication.
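A minimal sketch of what thresholding followed by Boolean multiplication means, assuming U is m × k, V is n × k, the prediction is U times the transpose of V, and values greater than or equal to the threshold are binarized to 1. The function name and the tie-handling are illustrative assumptions, and only the shared thresholds u and v are covered.

```python
def predict_sketch(U, V, u=0.5, v=0.5, boolean=True):
    """Binarize U and V with shared thresholds, then multiply U by V transposed."""
    Ub = [[1 if x >= u else 0 for x in row] for row in U]
    Vb = [[1 if x >= v else 0 for x in row] for row in V]
    X_pd = []
    for u_row in Ub:
        out = []
        for v_row in Vb:
            s = sum(a * b for a, b in zip(u_row, v_row))
            # Boolean product: any shared active factor gives 1 (logical OR of ANDs).
            out.append((1 if s > 0 else 0) if boolean else s)
        X_pd.append(out)
    return X_pd


U = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.6]]
V = [[0.7, 0.0], [0.6, 0.9], [0.1, 0.8]]
X_bool = predict_sketch(U, V, boolean=True)
X_int = predict_sketch(U, V, boolean=False)
```

The only difference between the two results is the cell covered by both factors: the integer product counts it twice, while the Boolean product caps it at 1.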
PyBMF.models.BaseModelTools module¶
- class PyBMF.models.BaseModelTools.BaseModelTools[source]¶
Bases: object
The helper class for BaseModel.
- _early_stop(msg, verbose, k=None)[source]¶
To deal with early convergence or stop.
- Parameters:
msg (str) – The message to be displayed.
verbose (bool) – Whether to print the message.
k (int, optional) – The number of factors obtained.
- _init_logs()[source]¶
Initialize the logs.
The logs is a dict that holds the records in one place. The types of records include but are not limited to dataframe, ndarray and list.
- _save_model(path=None, name=None)[source]¶
Save the model.
- Parameters:
path (str) – Path to save the model.
name (str) – Name of the model.
- _show_result()[source]¶
Display the prediction.
Make sure self.X_pd is set properly before calling.
For example:
>>> self.X_pd = get_prediction(U=self.U, V=self.V, boolean=True)
- early_stop(error=None, diff=None, n_iter=None, n_factor=None, msg=None, k=None, verbose=True)[source]¶
Stopping criteria detection and early stop.
- Parameters:
error (float) – Current error. To be compared with error tolerance self.tol.
diff (float) – Current update difference. To be compared with difference threshold self.min_diff.
n_iter (int) – Current number of iterations. To be compared with maximum number of iterations self.max_iter.
n_factor (int) – Current number of factors. To be compared with maximum number of factors self.k.
k (int) – The number of factors to obtain. This will keep the first k columns in self.U and self.V.
verbose (bool) – Whether to print the message.
- Returns:
is_improving – Whether the fitting should continue or not.
- Return type:
bool
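The comparisons listed in the parameters can be sketched as a plain function. This is an illustration under assumptions: the direction and strictness of each comparison, the default values and the function name are guesses, and the real method also prints messages and trims the factors.

```python
def is_improving_sketch(error=None, diff=None, n_iter=None, n_factor=None,
                        tol=0.0, min_diff=0.0, max_iter=100, k=None):
    """Return False as soon as any supplied quantity hits its stopping threshold."""
    if error is not None and error <= tol:
        return False  # error tolerance reached
    if diff is not None and diff <= min_diff:
        return False  # updates have become too small
    if n_iter is not None and n_iter >= max_iter:
        return False  # iteration budget used up
    if n_factor is not None and k is not None and n_factor >= k:
        return False  # target number of factors reached
    return True


keep_going = is_improving_sketch(error=0.5, tol=0.01)
converged = is_improving_sketch(error=0.005, tol=0.01)
out_of_budget = is_improving_sketch(n_iter=100, max_iter=100)
```

A fitting loop would typically call such a check once per iteration and break when it returns False.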
- extend_factors(k)[source]¶
Increase the number of factors to k (k = 1, 2, …).
- Parameters:
k (int) – The number of factors to obtain.
- import_model(**kwargs)[source]¶
Import or inherit variables and parameters from another model.
- Parameters:
**kwargs – The variables and parameters to be imported.
- print_msg(msg, type='I')[source]¶
Print message.
- Parameters:
msg (str) – The message to be printed.
type (str) – The type of message, e.g. ‘I’ for info, ‘W’ for warning, ‘E’ for error.
- set_config(**kwargs)[source]¶
Set system configurations.
System configurations are those involved when calling the fit() method.
They control the global random seed generator, the verbosity and display settings.
They also identify the type of task the model is dealing with, which affects the evaluation procedure.
System configurations¶
- task : str, {‘prediction’, ‘reconstruction’}
The type of evaluation task. When the datasets (X_train, X_val and X_test) are provided as csr_matrix, prediction tasks only measure the entries in the sparse matrix (these entries can be 0 or 1, see negative_sampling()), while reconstruction tasks measure the whole matrix (treating the sparse matrix as a NumPy array).
- seed : int
Model seed.
- display : bool, default: False
Switch for visualization.
- verbose : bool, default: False
Switch for verbosity.
- scaling : float, default: 1.0
Scaling of images in visualization.
- pixels : int, default: 2
Resolution of images in visualization.
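The difference between the two task types can be illustrated with a toy accuracy computation. In this sketch the stored entries of a sparse matrix are modelled as a dict keyed by (row, column), which is an assumption made to keep the example dependency-free. The function name is hypothetical.

```python
def accuracy_sketch(X_obs, X_pd, shape, task):
    """X_obs maps (i, j) to a stored 0/1 value; unstored cells count as 0."""
    m, n = shape
    if task == 'prediction':
        cells = list(X_obs)  # only the stored entries are measured
    elif task == 'reconstruction':
        cells = [(i, j) for i in range(m) for j in range(n)]  # the whole matrix
    else:
        raise ValueError("task must be 'prediction' or 'reconstruction'")
    hits = sum(1 for (i, j) in cells if X_obs.get((i, j), 0) == X_pd[i][j])
    return hits / len(cells)


# Stored entries: two positives and one explicit zero (e.g. from negative sampling).
X_obs = {(0, 0): 1, (0, 1): 0, (1, 1): 1}
X_pd = [[1, 1], [1, 1]]
acc_prediction = accuracy_sketch(X_obs, X_pd, (2, 2), 'prediction')
acc_reconstruction = accuracy_sketch(X_obs, X_pd, (2, 2), 'reconstruction')
```

The unstored cell (1, 0) is ignored in the prediction task but counts as a miss in the reconstruction task, so the two scores differ.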
- set_factors(k, u, v)[source]¶
Add a new factor (k = 0, 1, …).
- Parameters:
k (int) – The index of the factor to be added.
u (numpy.ndarray) – The user factor to be added.
v (numpy.ndarray) – The item factor to be added.
- set_params(**kwargs)[source]¶
Model parameters.
The list below gives the commonly used meaning of each parameter.
Model parameters¶
- k : int
The rank.
- U : ndarray, spmatrix
Initial factor matrix when init_method is 'custom'.
- V : ndarray, spmatrix
Initial factor matrix when init_method is 'custom'.
- Us : ndarray, spmatrix
For collective matrix factorization. Initial factor matrices when init_method is 'custom'.
- W : ndarray, spmatrix or str in {‘mask’, ‘full’}
Masking weight matrix. For ‘mask’, it’ll use all samples in X_train (both 1’s and 0’s) as a mask. For ‘full’, it refers to a full 1’s matrix.
- Ws : list of spmatrix, str in {‘mask’, ‘full’}
For collective matrix factorization. Masking weight matrices.
- alpha : list of floats
For collective matrix factorization. Importance weights for matrices.
- lr : float
The learning rate.
- reg : float
The regularization parameter.
- tol : float
The error tolerance. Fitting will stop when the specified error is below tol.
- min_diff : float
The minimal difference. Fitting will stop when the specified change is below min_diff.
- max_iter : int
The maximal number of iterations.
- init_method : str
The initialization method.
PyBMF.models.BinaryMFPenalty module¶
- class PyBMF.models.BinaryMFPenalty.BinaryMFPenalty(k, U=None, V=None, W='full', beta_loss='frobenius', solver='mu', reg=2.0, reg_growth=3, max_reg=10000000000.0, tol=0.01, min_diff=0.0, max_iter=100, init_method='custom', normalize_method='balance', seed=None)[source]¶
Bases: ContinuousModel
Binary matrix factorization, penalty function algorithm.
- Parameters:
reg (float) – The regularization weight ‘lambda’ in the paper.
reg_growth (float) – The growing rate of regularization weight.
max_reg (float) – The upper bound of regularization weight.
tol (float) – The error tolerance ‘epsilon’ in the paper.
- PyBMF.models.BinaryMFPenalty.error(X_gt, X_pd, W, U, V, reg)[source]¶
Error for penalty function algorithm.
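The interplay of reg, reg_growth and max_reg can be pictured as a growth schedule for the regularization weight. The sketch below assumes the weight is multiplied by reg_growth once per iteration and that fitting stops once it would exceed max_reg. Both the multiplicative rule and the stopping behaviour are assumptions for illustration, not a description of the library's loop.

```python
def reg_schedule_sketch(reg=2.0, reg_growth=3, max_reg=1e10, max_iter=100):
    """List the regularization weights visited under multiplicative growth."""
    values = []
    for _ in range(max_iter):
        if reg > max_reg:
            break  # assumed upper bound check
        values.append(reg)
        reg *= reg_growth
    return values


schedule = reg_schedule_sketch()
```

Under these assumptions the weight grows geometrically, so the upper bound is reached long before the default iteration limit.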
PyBMF.models.BinaryMFThreshold module¶
- class PyBMF.models.BinaryMFThreshold.BinaryMFThreshold(k, U, V, W='mask', u=0.5, v=0.5, lamda=100, solver='line-search', min_diff=0.001, max_iter=100, init_method='custom', normalize_method=None, seed=None)[source]¶
Bases: ContinuousModel
Binary matrix factorization, thresholding algorithm with line search.
Note
To be released:
For sigmoid link function, please use BinaryMFThresholdExSigmoid.
For columnwise thresholds, please use BinaryMFThresholdExColumnwise.
For both, please use BinaryMFThresholdExSigmoidColumnwise.
- Parameters:
u (float) – Initial threshold for U.
v (float) – Initial threshold for V.
lamda (float) – The ‘lambda’ in sigmoid function.
- F(**kwargs)¶
- dF(**kwargs)¶
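To see why a large lamda matters, the following sketch treats the sigmoid as a smooth stand-in for a hard threshold. It assumes the sigmoid is applied to lamda times the distance between a factor value and its threshold. The exact form used by BinaryMFThreshold is not stated here, so this is an illustration and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_threshold(x, threshold, lamda=100):
    """Smooth approximation of the step function 'x >= threshold'."""
    return sigmoid(lamda * (x - threshold))


at_threshold = soft_threshold(0.5, 0.5)
above = soft_threshold(0.6, 0.5)
below = soft_threshold(0.4, 0.5)
```

Values slightly above the threshold map close to 1 and values slightly below map close to 0, while the function stays differentiable, which is what makes a gradient-based line search over u and v possible.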
PyBMF.models.ContinuousModel module¶
- class PyBMF.models.ContinuousModel.ContinuousModel[source]¶
Bases: BaseModel
Base class for continuous binary matrix factorization models.
- init_W()[source]¶
Initialize masking weight matrix for models that accept masking weights.
This turns codenames into matrix.
If W is ‘mask’: W will be assigned 1 for any entry in X_train, no matter if the value is 1, or 0 from negative sampling.
If W is ‘full’: W is a full 1 matrix. The loss will take the whole matrix into consideration.
If W is ndarray or spmatrix: W will be used as the mask matrix.
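The three cases can be sketched with plain lists. Here the stored entries of X_train are represented by a collection of (row, column) positions, which is an assumption made to avoid a sparse-matrix dependency. The function name is hypothetical.

```python
def init_W_sketch(W, observed, shape):
    """Turn a codename ('mask' or 'full') into a 0/1 weight matrix."""
    m, n = shape
    if W == 'full':
        return [[1] * n for _ in range(m)]  # every cell contributes to the loss
    if W == 'mask':
        mask = [[0] * n for _ in range(m)]
        for i, j in observed:  # stored entries, whether their value is 1 or 0
            mask[i][j] = 1
        return mask
    return W  # an explicit matrix is used as given


observed = [(0, 0), (0, 1), (1, 1)]
W_full = init_W_sketch('full', observed, (2, 2))
W_mask = init_W_sketch('mask', observed, (2, 2))
W_custom = init_W_sketch([[1, 0], [0, 1]], observed, (2, 2))
```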
- normalize_UV()[source]¶
Normalize factors U and V with given normalize_method.
If ‘balance’: balance each pair of factors, used in BinaryMFPenalty. This does not necessarily map the factors to an interval within [0, 1].
If ‘matrixwise-normalize’: normalize the whole factor matrix to [0, 1], used in thresholding methods. This will maintain the relative magnitude of the values within the whole factor matrix.
If ‘columnwise-normalize’: normalize each factor vector to [0, 1], used in thresholding methods. This will maintain the relative magnitude of the values within each factor vector.
If ‘matrixwise-mapping’: map unique values in the whole factor matrix to an arithmetic sequence in [0, 1]. This will maintain the relative order of the values within the whole factor matrix.
If ‘columnwise-mapping’: map unique values in each factor vector to an arithmetic sequence in [0, 1]. This will maintain the relative order of the values within each factor vector.
If None: do nothing.
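The normalize and mapping variants can be contrasted on a small factor matrix. The sketch assumes that "normalize" means min-max scaling and that "mapping" sends the sorted unique values to evenly spaced points in [0, 1]. The helper names are hypothetical, and the ‘balance’ method is not covered.

```python
def minmax(values):
    """Min-max scale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def map_to_sequence(values):
    """Map sorted unique values to an arithmetic sequence in [0, 1]."""
    uniq = sorted(set(values))
    if len(uniq) == 1:
        return [0.0 for _ in values]
    rank = {x: i / (len(uniq) - 1) for i, x in enumerate(uniq)}
    return [rank[x] for x in values]


def columnwise(M, fn):
    """Apply fn to each column (each factor vector) separately."""
    cols = [fn(list(col)) for col in zip(*M)]
    return [list(row) for row in zip(*cols)]


def matrixwise(M, fn):
    """Apply fn to all entries of the matrix at once."""
    n = len(M[0])
    flat = fn([x for row in M for x in row])
    return [flat[i * n:(i + 1) * n] for i in range(len(M))]


U = [[0.0, 10.0], [2.0, 20.0], [4.0, 40.0]]
U_matrix_norm = matrixwise(U, minmax)
U_column_norm = columnwise(U, minmax)
U_matrix_map = matrixwise(U, map_to_sequence)
```

Matrix-wise scaling keeps the first column small relative to the second, whereas column-wise scaling stretches each factor vector to the full [0, 1] range independently.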
PyBMF.models.ELBMF module¶
- class PyBMF.models.ELBMF.ELBMF(k, U=None, V=None, W='full', init_method='custom', reg_l1=0.01, reg_l2=0.02, reg_growth=1.02, rounding=False, beta=0.0, tol=0.0, max_iter=1000, min_diff=1e-08, seed=None)[source]¶
Bases: ContinuousModel
- fit(X_train, X_val=None, X_test=None, **kwargs)[source]¶
Fit the model to observations, with validation and test if necessary.
For a fitting procedure, implement and append your _fit() and finish().
Simply overwrite this method if you want to drop or add any parts of the procedures.
- Parameters:
X_train (ndarray) – Training data.
X_val (ndarray) – Validation data.
X_test (ndarray) – Test data.
**kwargs (dict) – Other parameters.
PyBMF.models.ELBMFNumPy module¶
PyBMF.models.ELBMFPyTorch module¶
PyBMF.models.FastStep module¶
- class PyBMF.models.FastStep.FastStep(k, U=None, V=None, W='full', tau=20, solver='line-search', tol=0, min_diff=0.01, max_round=30, max_iter=50, init_method='uniform', normalize_method=None, seed=None)[source]¶
Bases: ContinuousModel
The FastStep algorithm. The projected line search is applied.
- F(**kwargs)¶
- check_params(**kwargs)[source]¶
Check and load model parameters and experiment configurations.
Called upon model initialization and fitting.
# include this in your model class:
def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:
model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)
model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)
- dF(**kwargs)¶
- early_stop(error=None, diff=None, n_round=None, n_iter=None, n_factor=None, msg=None, k=None, verbose=True)[source]¶
Early stop detection wrapper.
- fit(X_train, X_val=None, X_test=None, **kwargs)[source]¶
Fit the model to observations, with validation and test if necessary.
For a fitting procedure, implement and append your _fit() and finish().
Simply overwrite this method if you want to drop or add any parts of the procedures.
- Parameters:
X_train (ndarray) – Training data.
X_val (ndarray) – Validation data.
X_test (ndarray) – Test data.
**kwargs (dict) – Other parameters.
PyBMF.models.GreConD module¶
- class PyBMF.models.GreConD.GreConD(k=None, tol=0)[source]¶
Bases: BaseModel
The GreConD algorithm for exact Boolean decomposition.
- Parameters:
k (int, optional) – The target rank. If None, it will factorize until the error is smaller than tol, or until another stopping criterion is met.
tol (float, default: 0) – The error tolerance.
- PyBMF.models.GreConD.get_concept(X_gt, X_rs)[source]¶
Get a concept/pattern from the residual matrix.
- Parameters:
X_gt (spmatrix) – The ground-truth matrix.
X_rs (spmatrix) – The residual matrix.
- Returns:
score (float) – The TP coverage of the pattern over X_gt.
u ((m, 1) spmatrix) – The factors. If the pattern is not found, they’ll be zero vectors.
v ((n, 1) spmatrix) – The factors. If the pattern is not found, they’ll be zero vectors.
PyBMF.models.GreConDPlus module¶
- class PyBMF.models.GreConDPlus.GreConDPlus(k=None, tol=0, w_fp=0.5, w_fn=None)[source]¶
Bases: BaseModel
The GreConD+ algorithm for approximate Boolean decomposition.
- Parameters:
k (int, optional) – The target rank. If None, it will factorize until the error is smaller than tol, or until another stopping criterion is met.
tol (float, default: 0) – The error tolerance.
w_fp (float) – The penalty weights for false positives (FP).
w_fn (float) – The penalty weights for false negatives (FN).
- PyBMF.models.GreConDPlus._expansion(X_gt, X_old, u, v, w_fp, w_fn, axis)[source]¶
Row-wise or column-wise expansion of a pattern given u and v.
- Parameters:
axis (int, [0, 1]) – The dimension of the factor that remains unchanged: 1 for row-wise expansion and 0 for column-wise expansion.
- PyBMF.models.GreConDPlus.expansion(X_gt, X_old, u, v, w_fp, w_fn)[source]¶
Row-wise or column-wise expansion of a pattern given u and v.
In GreConD+, factors are initially the dense cores. Factors u and v will grow in each iteration. The expansion part is recorded in u_exp and v_exp.
- Parameters:
X_gt (spmatrix) – The ground-truth matrix.
X_old (spmatrix) – The residual matrix before the join of u and v.
u ((m, 1) spmatrix) – The factors.
v ((n, 1) spmatrix) – The factors.
w_fp (float) – The penalty weights for false positives (FP).
w_fn (float) – The penalty weights for false negatives (FN).
- Returns:
u ((m, 1) spmatrix) – The expansion part in u.
v ((n, 1) spmatrix) – The expansion part in v.
PyBMF.models.Hyper module¶
- class PyBMF.models.Hyper.Hyper(min_support)[source]¶
Bases: BaseModel
The Hyper algorithm.
Hyper is an exact decomposition algorithm. Hyper+ is used after fitting a Hyper model. It’s a relaxation of the exact decomposition.
- Parameters:
min_support (float) – The ‘alpha’ in the paper. The min support of frequent itemsets.
- static find_hyper(I, X_gt, X_rs)[source]¶
- queue : list
The indices of rows with non-zero uncoverage, in descending order. Each row must be a support of I.
- fit(X_train, X_val=None, X_test=None, **kwargs)[source]¶
Fit the model to observations, with validation and test if necessary.
For a fitting procedure, implement and append your _fit() and finish().
Simply overwrite this method if you want to drop or add any parts of the procedures.
- Parameters:
X_train (ndarray) – Training data.
X_val (ndarray) – Validation data.
X_test (ndarray) – Test data.
**kwargs (dict) – Other parameters.
- init_model()[source]¶
Initialize the model.
Called after params are set and datasets are loaded.
Simply overwrite this method if you want to drop or add any parts of the procedures.
PyBMF.models.HyperPlus module¶
- class PyBMF.models.HyperPlus.HyperPlus(model, target_k, samples=500, beta=inf)[source]¶
Bases: Hyper
The Hyper+ algorithm.
Hyper is an exact decomposition algorithm. Hyper+ is used after fitting a Hyper model. It’s a relaxation of the exact decomposition.
- Parameters:
model (Hyper class) – The fitted Hyper model.
target_k (int) – The target number of factors. By default, it’s 1. This will ask the model to factorize all the way down to k = 1. The last pattern will always be a full 1 matrix.
samples (int, default: all possible samples) – Number of pairs to be merged during trials. A smaller samples value will be faster but more likely to surpass the false positive rate beta.
beta (float) – The upper limit of false positive / ground truth. 1.0 means no limit.
- check_params(**kwargs)[source]¶
Check and load model parameters and experiment configurations.
Called upon model initialization and fitting.
# include this in your model class:
def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:
model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)
model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)
- fit(X_train, X_val=None, X_test=None, **kwargs)[source]¶
Fit the model to observations, with validation and test if necessary.
For a fitting procedure, implement and append your _fit() and finish().
Simply overwrite this method if you want to drop or add any parts of the procedures.
- Parameters:
X_train (ndarray) – Training data.
X_val (ndarray) – Validation data.
X_test (ndarray) – Test data.
**kwargs (dict) – Other parameters.
PyBMF.models.IP module¶
- class PyBMF.models.IP.IP[source]¶
Bases: ContinuousModel
PyBMF.models.MEBF module¶
- class PyBMF.models.MEBF.MEBF(k=None, tol=0, t=None, w_fp=1, w_fn=1)[source]¶
Bases: BaseModel
Median Expansion for Boolean Factorization.
- Parameters:
k (int, optional) – The target rank. If None, it will factorize until the error is smaller than tol, or until another stopping criterion is met.
tol (float, default: 0) – The error tolerance.
w_fp (float) – The penalty weights for false positives (FP).
w_fn (float) – The penalty weights for false negatives (FN).
- get_factor(axis)[source]¶
Get factor for bi-directional growth.
- Parameters:
axis (int) – 0, sort cols, find middle u and grow on v. 1, sort rows, find middle v and grow on u.
- Returns:
a (csr_matrix) – A factor vector.
b (csr_matrix) – A factor vector.
PyBMF.models.MaxSAT module¶
- class PyBMF.models.MaxSAT.MaxSAT(k, mode='fast_undercover')[source]¶
Bases: ContinuousModel
- fit(X_train, X_val=None, X_test=None, **kwargs)[source]¶
Fit the model to observations, with validation and test if necessary.
For a fitting procedure, implement and append your _fit() and finish().
Simply overwrite this method if you want to drop or add any parts of the procedures.
- Parameters:
X_train (ndarray) – Training data.
X_val (ndarray) – Validation data.
X_test (ndarray) – Test data.
**kwargs (dict) – Other parameters.
PyBMF.models.MessagePassing module¶
PyBMF.models.NMFSklearn module¶
- class PyBMF.models.NMFSklearn.NMFSklearn(k, U=None, V=None, beta_loss='frobenius', init_method='nndsvd', solver='cd', tol=0.0001, max_iter=1000, seed=None)[source]¶
Bases: ContinuousModel
NMF by scikit-learn.
- Parameters:
U (ndarray, spmatrix) – Must be provided if init_method is ‘custom’.
V (ndarray, spmatrix) – Must be provided if init_method is ‘custom’.
- check_params(**kwargs)[source]¶
Check and load model parameters and experiment configurations.
Called upon model initialization and fitting.
# include this in your model class:
def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:
model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)
model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)
- fit(X_train, X_val=None, X_test=None, **kwargs)[source]¶
Fit the model to observations, with validation and test if necessary.
For a fitting procedure, implement and append your _fit() and finish().
Simply overwrite this method if you want to drop or add any parts of the procedures.
- Parameters:
X_train (ndarray) – Training data.
X_val (ndarray) – Validation data.
X_test (ndarray) – Test data.
**kwargs (dict) – Other parameters.
PyBMF.models.OrMachine module¶
- class PyBMF.models.OrMachine.OrMachine(k)[source]¶
Bases: ContinuousModel
- fit(X_train, X_val=None, X_test=None, **kwargs)[source]¶
Fit the model to observations, with validation and test if necessary.
For a fitting procedure, implement and append your _fit() and finish().
Simply overwrite this method if you want to drop or add any parts of the procedures.
- Parameters:
X_train (ndarray) – Training data.
X_val (ndarray) – Validation data.
X_test (ndarray) – Test data.
**kwargs (dict) – Other parameters.
PyBMF.models.PNLPF module¶
- class PyBMF.models.PNLPF.PNLPF(k, U=None, V=None, W='full', reg=2.0, beta_loss='frobenius', solver='mu', link_lamda=10, reg_growth=3, max_reg=10000000000.0, tol=0.01, min_diff=0.0, max_iter=100, init_method='custom', normalize_method='balance', seed=None)[source]¶
Bases: BinaryMFPenalty
PNLPF, binary matrix factorization’s penalty function algorithm with sigmoid link function.
- Parameters:
reg (float) – The regularization weight ‘lambda’ in the paper.
reg_growth (float) – The growing rate of regularization weight.
max_reg (float) – The upper bound of regularization weight.
tol (float) – The error tolerance ‘epsilon’ in the paper.
- update_U(**kwargs)¶
The update process of U.
- update_V(**kwargs)¶
The update process of V.
- PyBMF.models.PNLPF.get_prediction_with_sigmoid(U, V, link_lamda)[source]¶
Get prediction with sigmoid link function.
PyBMF.models.PRIMP module¶
PyBMF.models.PRIMPPyTorch module¶
PyBMF.models.Panda module¶
- class PyBMF.models.Panda.Panda(k=None, tol=0, w_model=1, w_fp=1, w_fn=1, init_method='correlation', exact_decomp=False)[source]¶
Bases: BaseModel
PaNDa and PaNDa+ algorithm.
PaNDa and PaNDa+ both fix w_fp, w_fn to 1.0. They’re allowed to be tuned in our implementation.
PaNDa fixes w_model to 1.0 while PaNDa+ does not.
- Parameters:
k (int, optional) – The target rank. If None, it will factorize until the error is smaller than tol, or when other stopping criteria is met.
tol (float, default: 0) – The error tolerance.
w_model (float) – The model code weight rho in the Panda+ paper.
w_fp (float) – The penalty weights for false positives (FP). Added on top of the original works of Panda and Panda+ for flexibility.
w_fn (float) – The penalty weights for false negatives (FN). Added on top of the original works of Panda and Panda+ for flexibility.
init_method (str) – How items are sorted.
exact_decomp (bool) – Exact decomposition option. Added on top of the original works of Panda and Panda+ to enable exact decomposition.
- _fit()[source]¶
The main process of fitting.
In Panda, 3 lists are maintained.
E (list) is the item extension list.
I (spmatrix) is the item list.
T (spmatrix) is the user or transaction list.
- extend_core(**kwargs)¶
- find_core(**kwargs)¶
- sort_items(method)[source]¶
Sort the extension list by descending scores.
- Parameters:
method (str) – Sort method.
frequency: Sort items in extension list by frequency.
couples-frequency: Sort items in extension list by frequency of item-pairs that include an item.
correlation: Sort items in extension list by correlation with current transactions T.
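The simplest of the three options, frequency, can be sketched as follows. It assumes that an item's frequency is its column sum in the binary data matrix and that ties are broken by item index. The function name and the tie-breaking rule are illustrative assumptions.

```python
def sort_by_frequency_sketch(X, items):
    """Order item indices by descending column sum of the 0/1 matrix X."""
    freq = {j: sum(row[j] for row in X) for j in items}
    # Assumption: ties are broken by the smaller item index.
    return sorted(items, key=lambda j: (-freq[j], j))


X = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
]
order = sort_by_frequency_sketch(X, items=[0, 1, 2])
```

The couples-frequency and correlation options follow the same pattern with a different score per item.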
PyBMF.models.TransposedModel module¶
PyBMF.models.WNMF module¶
- class PyBMF.models.WNMF.WNMF(k, U=None, V=None, W='mask', beta_loss='frobenius', init_method='normal', solver='mu', tol=0.0, min_diff=0.0, max_iter=30, seed=None)[source]¶
Bases: ContinuousModel
Weighted Nonnegative Matrix Factorization.
- Parameters:
U (ndarray, spmatrix) – Must be provided if init_method is ‘custom’.
V (ndarray, spmatrix) – Must be provided if init_method is ‘custom’.
- check_params(**kwargs)[source]¶
Check and load model parameters and experiment configurations.
Called upon model initialization and fitting.
# include this in your model class:
def __init__(self, k, tol, alpha):
    self.check_params(k=k, tol=tol, alpha=alpha)

def fit(self, X_train, X_val=None, X_test=None, **kwargs):
    self.check_params(**kwargs)

# call them when initializing and fitting:
model = MyModel(k=10, W='mask', alpha=0.1, seed=1997)
model.fit(X_train, X_val, X_test, seed=2024, task='prediction', verbose=False, display=True)
- error(**kwargs)¶
- update(**kwargs)¶