Predicting peptide properties from mass spectrometry data using deep attention-based multitask network and uncertainty quantification. Article

Tariq, Usman; Saeed, Fahad (2024). Predicting peptide properties from mass spectrometry data using deep attention-based multitask network and uncertainty quantification. doi:10.1101/2024.08.21.609035

cited authors

  • Tariq, Usman; Saeed, Fahad

abstract

  • Database search algorithms reduce the number of candidate peptides that must be scored by filtering on a single property (i.e., mass). While useful, filtering on one property may exclude non-abundant spectra and uncharacterized peptides, potentially exacerbating the streetlight effect. Here we present ProteoRift, a novel attention-based multitask deep network that predicts multiple peptide properties (length, missed cleavages, and modification status) directly from spectra. We demonstrate that ProteoRift predicts these properties with up to 97% accuracy, reducing the search space by more than 90%. As a result, our end-to-end pipeline exhibits 8x to 12x speedups with peptide deduction accuracy comparable to algorithmic techniques. We also formulate two uncertainty estimation metrics, which can distinguish between in-distribution and out-of-distribution data (ROC-AUC 0.99) and predict which mass spectra score highly against the correct peptide (ROC-AUC 0.94). These models and metrics are integrated in an end-to-end ML pipeline available at https://github.com/pcdslab/ProteoRift.
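
The search-space reduction described in the abstract can be illustrated with a minimal sketch. It is not ProteoRift's code: the `Candidate` class, both filter functions, the tolerance, and all masses are hypothetical toy values chosen for illustration, and the "predicted" properties are hard-coded rather than produced by a model. It only shows why filtering on predicted length, missed cleavages, and modification status leaves fewer candidates than a mass filter alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate peptide record (not a ProteoRift type)."""
    sequence: str
    mass: float             # toy mass in Da, not computed from the sequence
    missed_cleavages: int
    modified: bool


def filter_by_mass(candidates, precursor_mass, tol_da):
    """Keep candidates whose mass lies within +/- tol_da of the precursor mass."""
    return [c for c in candidates if abs(c.mass - precursor_mass) <= tol_da]


def filter_by_predicted(candidates, length, missed_cleavages, modified):
    """Keep candidates that match the properties predicted for the spectrum."""
    return [
        c for c in candidates
        if len(c.sequence) == length
        and c.missed_cleavages == missed_cleavages
        and c.modified == modified
    ]


# Toy database; every value here is an assumption made for the example.
database = [
    Candidate("AAAAAAK", 1000.2, 0, False),
    Candidate("CCCCKCCR", 1000.4, 1, False),
    Candidate("DDDDDDDR", 1000.1, 0, True),
    Candidate("EEEEEEEK", 999.9, 0, False),
    Candidate("GGGGGGGGGGK", 1500.0, 0, False),
]

# Step 1: conventional single-property (mass) filter.
by_mass = filter_by_mass(database, precursor_mass=1000.0, tol_da=0.5)

# Step 2: additional filter on properties that a model would predict
# from the spectrum (hard-coded here in place of a real prediction).
narrowed = filter_by_predicted(by_mass, length=8, missed_cleavages=0, modified=False)

print(len(database), len(by_mass), len(narrowed))
```

In this toy database the mass filter alone keeps four of the five candidates, while adding the three predicted properties leaves a single candidate to score.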

publication date

  • August 22, 2024

keywords

  • Bioinformatics
  • Deep Learning
  • Mass spectrometry
  • Proteomics
  • Uncertainty

Location

  • United States

Digital Object Identifier (DOI)