On the sampling of big mass spectrometry data

Conference paper

Awan, MG, Saeed, F. (2015). On the sampling of big mass spectrometry data. 143-148.

Mass spectrometry (MS) based proteomics has useful biological applications. Modern mass spectrometers produce millions of spectra from complex biological samples in a short time. Existing solutions for the analysis of MS data rely on sequential approaches to process these complex data sets. However, the enormous data deluge from mass spectrometers has severely limited the capabilities of these tools. For useful research to proceed, there is a need to develop big data techniques that allow efficient computation on these heterogeneous, noisy MS data sets. In this paper, we show how random sampling of MS data sets can lead to efficient and fast analysis by decreasing the amount of data that needs to be analyzed for each spectrum. For millions of spectra, our sampling techniques significantly decrease the number of peaks that need to be processed in downstream analysis.
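The abstract does not specify the sampling scheme, so the following is only a minimal sketch of the general idea, assuming uniform random sampling of peaks within a single spectrum. The function name `sample_peaks`, the `fraction` and `seed` parameters, and the representation of a spectrum as a list of `(mz, intensity)` tuples are all hypothetical choices made for illustration, not the authors' method.

```python
import random


def sample_peaks(peaks, fraction, seed=None):
    """Return a uniform random subset of (mz, intensity) peaks, sorted by m/z.

    Illustrative assumption: every peak is equally likely to be kept.
    The paper may use a different (e.g. intensity-aware) scheme.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not peaks:
        return []
    rng = random.Random(seed)
    # Keep at least one peak so a non-empty spectrum never becomes empty.
    k = max(1, round(len(peaks) * fraction))
    # random.Random.sample draws k distinct elements without replacement.
    return sorted(rng.sample(peaks, k))


# Synthetic spectrum with 200 peaks (made-up values, not real MS data).
spectrum = [(100.0 + i * 0.5, float(i % 17 + 1)) for i in range(200)]
reduced = sample_peaks(spectrum, fraction=0.25, seed=42)
print(len(spectrum), "->", len(reduced))  # 200 -> 50
```

Because the subset is drawn per spectrum, the same function could be mapped independently over millions of spectra, which is what would make such a reduction step easy to parallelize.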