SGER: Collaborative Research: Nonnegative Matrix Factorization for Data Mining.

Nonnegative matrix factorization (NMF) factorizes an input nonnegative matrix into two nonnegative matrices of lower rank. It is recently discovered that NMF in the most basic form is equivalent to a relaxed K-means clustering, the most widely used pattern discovery algorithm in data mining. This direct link between mathematics and data mining sets in motion a large number of developments on using matrix factorizations for pattern discovery. It turns out that NMF provides more consistent and mathematically well-defined optimization formulations for many fundamental and emerging data-mining problems. NMF algorithms have well-understood properties; they are simple and easy-to-implement, well suited for distributed parallel architectures. This research aims to formally establish a comprehensive NMF-based framework for data mining. In particular, we will (1) extend matrix factorization data-mining methodology from current focus on clustering (pattern discovery) to newer problems: semi-supervised clustering (extending partial knowledge to whole data) and classifications (pattern prediction, such as predicting a cancer tumor tissue from a normal one); (2) develop fast numerical algorithms and incorporate state-of-the-art numerical optimization techniques; and (3) apply and evaluate the NMF algorithms in different real-world applications including text mining and bioinformatics.

FIU Discovery