Mining temporal patterns from iTRAQ mass spectrometry(LC-MS/MS) data Conference

Saeed, F, Pisitkun, T, Knepper, MA et al. (2011). Mining temporal patterns from iTRAQ mass spectrometry(LC-MS/MS) data . 152-159.

cited authors

  • Saeed, F; Pisitkun, T; Knepper, MA; Hoffert, JD

authors

abstract

  • Large-scale proteomic analysis is emerging as a powerful technique in biology and relies heavily on data acquired by state-of-the-art mass spectrometers. As with any other field in Systems Biology, computational tools are required to deal with this ocean of data. iTRAQ (isobaric Tags for Relative and Absolute quantification) is a technique that allows simultaneous quantification of proteins from multiple samples. Although iTRAQ data gives useful insights to the biologist, it is more complex to perform analysis and draw biological conclusions because of its multi-plexed design. One such problem is to find proteins that behave in a similar way (i.e. change in abundance) among various time points since the temporal variations in the proteomics data reveal important biological information. Distance based methods such as Euclidian distance or Pearson coefficient, and clustering techniques such as k-mean etc, are not able to take into account the temporal information of the series. In this paper, we present an linear-time algorithm for clustering similar patterns among various iTRAQ time course data irrespective of their absolute values. The algorithm, referred to as Temporal Pattern Mining(TPM), maps the data from a Cartesian plane to a discrete binary plane. After the mapping a dynamic programming technique allows mining of similar data elements that are temporally closer to each other. The proposed algorithm accurately clusters iTRAQ data that are temporally closer to each other with more than 99% accuracy. Experimental results for different problem sizes are analyzed in terms of quality of clusters, execution time and scalability for large data sets. An example from our proteomics data is provided at the end to demonstrate the performance of the algorithm and its ability to cluster temporal series irrespective of their distance from each other.

publication date

  • December 1, 2011

International Standard Book Number (ISBN) 13

start page

  • 152

end page

  • 159