PyBMF

A Python library for Boolean Matrix Factorization. A project under Preferred.ai.

PyBMF is under active development. We welcome authors of BMF papers and anyone interested in BMF to try it out and contribute. Please contact us if you have any questions or suggestions.

Perspectives

Boolean matrix factorization (BMF) is a well-known problem in pattern mining. Over years of active research, its methods have grown from greedy heuristics to a wide range of more advanced techniques. We believe that developing such algorithms calls for a fair and adaptable playground in which they can be compared.
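For readers new to the problem: BMF approximates a binary matrix X by the Boolean product of two binary factors U and V, where the sum is replaced by logical OR so that overlapping patterns still yield 1. The sketch below illustrates that product and a simple error count using plain Python lists. The helper names boolean_product and reconstruction_error are hypothetical and are not part of PyBMF's API.

```python
# Illustrative only: these helpers are assumptions, not PyBMF functions.

def boolean_product(U, V):
    """Boolean product of U (m x k) and V (k x n): OR over AND, so 1 + 1 = 1."""
    k, n = len(V), len(V[0])
    return [[int(any(row[t] and V[t][j] for t in range(k))) for j in range(n)]
            for row in U]


def reconstruction_error(X, X_hat):
    """Number of cells where X and X_hat disagree (Hamming distance)."""
    return sum(a != b for rx, rh in zip(X, X_hat) for a, b in zip(rx, rh))


# Two overlapping rank-1 patterns; they overlap at cell (1, 1).
U = [[1, 0],
     [1, 1],
     [0, 1]]
V = [[1, 1, 0],
     [0, 1, 1]]

X = boolean_product(U, V)
print(X)  # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(reconstruction_error(X, X))  # 0
```

Under ordinary arithmetic the overlapping cell would be 2. Under the Boolean product it stays 1, which is what separates BMF from standard matrix factorization.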

PyBMF aims to provide a unified framework with:

  1. generators for various types of synthetic data

  2. unified ways of importing real-world data

  3. data splitting and cross-validation utilities

  4. negative sampling utilities for continuous methods

  5. sparse matrix support for heuristics

  6. evaluation tools for binary and continuous metrics

  7. visualization tools for single or multi-matrix data

  8. tools for saving and loading models and logs

  9. support for Boolean matrix simplification and visualization models
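To make items 1, 3 and 4 more concrete, here is a minimal, dependency-free sketch of what such utilities might do: plant a few random patterns, add flip noise, hold out part of the observed 1-cells, and sample 0-cells as negatives. The names make_synthetic and split_with_negatives and all of their parameters are assumptions made for illustration. PyBMF's own generators and splitters may work differently.

```python
import random

# Illustrative only: function names and parameters are assumptions,
# not PyBMF's actual API.

def make_synthetic(m, n, k, density=0.3, noise=0.05, seed=0):
    """Plant k random rank-1 patterns, then flip each cell with probability `noise`."""
    rng = random.Random(seed)
    U = [[int(rng.random() < density) for _ in range(k)] for _ in range(m)]
    V = [[int(rng.random() < density) for _ in range(n)] for _ in range(k)]
    # Boolean product of the planted factors.
    X = [[int(any(U[i][t] and V[t][j] for t in range(k))) for j in range(n)]
         for i in range(m)]
    for i in range(m):
        for j in range(n):
            if rng.random() < noise:
                X[i][j] ^= 1  # flip 0 <-> 1
    return X


def split_with_negatives(X, test_ratio=0.2, seed=0):
    """Hold out a share of the 1-cells and sample as many 0-cells as negatives."""
    rng = random.Random(seed)
    ones = [(i, j) for i, row in enumerate(X) for j, x in enumerate(row) if x]
    zeros = [(i, j) for i, row in enumerate(X) for j, x in enumerate(row) if not x]
    rng.shuffle(ones)
    n_test = int(len(ones) * test_ratio)
    test_pos, train_pos = ones[:n_test], ones[n_test:]
    # One negative per training positive, capped by the number of 0-cells.
    negatives = rng.sample(zeros, min(len(train_pos), len(zeros)))
    return train_pos, test_pos, negatives


X = make_synthetic(m=20, n=15, k=3)
train_pos, test_pos, negatives = split_with_negatives(X)
print(len(train_pos), len(test_pos), len(negatives))
```

Fixing the seed keeps the generated matrix and the split reproducible across runs, which matters when several models are compared on the same data.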

Models

| Category | Model | Paper | Original Implementation | In PyBMF |
|---|---|---|---|---|
| Heuristics | Asso | PKDD2006, TKDE2008 | C | |
| Heuristics | Hyper/Hyper+ | SIGKDD2011 | | |
| Heuristics | GreConD | JCSS2010 | MATLAB | |
| Heuristics | Panda | ICDM2010 | | |
| Heuristics | Panda+ | TKDE2013 | | |
| Heuristics | NASSAU | SDM2015 | link | |
| Heuristics | GreConD+ | DAM2018 | MATLAB | |
| Heuristics | MEBF | AAAI2020 | R | |
| Continuous | NMFSklearn | | | 🛞 Wrapper of sklearn.decomposition.NMF |
| Continuous | WNMF | | | ✅ Multiplicative update |
| Continuous | BinaryMF-Penalty | ICDM2007 | MATLAB | ✅ Multiplicative update |
| Continuous | BinaryMF-Thresholding | ICDM2007 | MATLAB | ✅ Line search |
| Continuous | FastStep | PAKDD2016 | C++ | ✅ Line search |
| Continuous | PRIMP | DMKD2017 | CUDA C++ | ✅ PALM |
| Continuous | PNL-PF | SP2021 | | ✅ Multiplicative update |
| Continuous | ELBMF | NIPS2022 | Julia, Python | ✅ PALM |
| Probabilistic | MessagePassing | ICML2016 | Python | 🛞 Wrapper of original implementation |
| Probabilistic | OrMachine | ICML2017 | Cython | 🛞 Wrapper of original implementation |
| Linear Optimization | ColumnGeneration | AAAI2021 | Python | 🛞 Wrapper of original implementation |
| Satisfiability | UndercoverBMF | AAAI2021 | C++ | 🛞 Wrapper of original implementation |
| Simplification | IterEss | IS2019 | | |
| Simplification | DelegationBMF | AAAI2024 | C++ | |
| Visualization | OrderedBMF | SIAM2019 | C++ | |
| Visualization | BiclusterVisualization | PKDD2023 | Python | |

Compatibility

Currently built and tested on Python 3.9.18.

TODO

  • [ ] Add mask parameter W to PRIMP and ELBMF

  • [ ] Fix DataFrame display utils in dataframe_utils.py

  • [ ] Include BMF visualization models

  • [ ] Diagnosis of thresholding models
