SGER: Collaborative Research: Nonnegative Matrix Factorization for Data Mining. Grant

  • Nonnegative matrix factorization (NMF) factorizes a nonnegative input matrix into two nonnegative matrices of lower rank. It was recently discovered that NMF in its most basic form is equivalent to a relaxed form of K-means clustering, the most widely used pattern discovery algorithm in data mining. This direct link between mathematics and data mining has set in motion a large number of developments in using matrix factorizations for pattern discovery. It turns out that NMF provides more consistent and mathematically well-defined optimization formulations for many fundamental and emerging data-mining problems. NMF algorithms have well-understood properties; they are simple, easy to implement, and well suited to distributed parallel architectures. This research aims to formally establish a comprehensive NMF-based framework for data mining. In particular, we will (1) extend matrix factorization data-mining methodology from its current focus on clustering (pattern discovery) to newer problems: semi-supervised clustering (extending partial knowledge to the whole data set) and classification (pattern prediction, such as distinguishing cancerous tumor tissue from normal tissue); (2) develop fast numerical algorithms and incorporate state-of-the-art numerical optimization techniques; and (3) apply and evaluate the NMF algorithms in different real-world applications, including text mining and bioinformatics.
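
To make the basic factorization concrete, the following is a minimal sketch of NMF in plain Python. The abstract does not name a specific algorithm, so the sketch assumes the standard multiplicative update rules of Lee and Seung for the squared Frobenius error between X and WH. The function names (`nmf`, `frobenius_error`, `cluster_labels`), the toy matrix `X`, and all parameter values are illustrative assumptions and are not taken from the grant. Reading cluster labels from the largest entry in each row of W is one common way to use the clustering interpretation mentioned above.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor X (n x m, nonnegative) into W (n x rank) and H (rank x m).

    Assumed algorithm: Lee-Seung multiplicative updates. Starting from
    strictly positive factors, every update multiplies by a nonnegative
    ratio, so W and H stay nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def cluster_labels(W):
    """Assign each row of X to the factor with the largest weight in W."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]


# Hypothetical toy data: rows 0-1 and rows 2-3 use disjoint columns.
X = [[5, 4, 0, 0],
     [4, 5, 0, 0],
     [0, 0, 3, 4],
     [0, 0, 4, 3]]
W, H = nmf(X, rank=2)
labels = cluster_labels(W)
```

The sketch favors readability over speed. A practical implementation would use a numerical library and a convergence test rather than a fixed iteration count.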

date/time interval

  • September 15, 2008 - August 31, 2009

sponsor award ID

  • 0844513