Impact of aliasing on deep CNN-based end-to-end acoustic models (Conference)

Gong, Y, Poellabauer, C. (2018). Impact of aliasing on deep CNN-based end-to-end acoustic models. 2018-September, 2698-2702. 10.21437/Interspeech.2018-1371

cited authors

  • Gong, Y; Poellabauer, C

abstract

  • A recent trend in audio and speech processing is to learn target labels directly from raw waveforms rather than hand-crafted acoustic features. Previous work has shown that deep convolutional neural networks (CNNs) as front-end can learn effective representations from the raw waveform. However, due to the large dimension of raw audio waveforms, pooling layers are usually used aggressively between temporal convolutional layers. In essence, these pooling layers perform operations that are similar to signal downsampling, which may lead to temporal aliasing according to the Nyquist-Shannon sampling theorem. This paper explores, using a series of experiments, if and how this aliasing effect impacts modern deep CNN-based models.
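The aliasing effect the abstract refers to can be illustrated with a minimal sketch. It is not the paper's experimental setup: plain strided subsampling (keeping every 4th sample) stands in for a pooling layer, and the sample rate, tone frequency, downsampling factor, and the helper name `dominant_frequency` are all illustrative assumptions.

```python
import numpy as np


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


# Illustrative values (assumptions, not taken from the paper).
sample_rate = 16000  # Hz
tone_hz = 3000       # Hz, below the original Nyquist limit of 8000 Hz
factor = 4           # keep every 4th sample, with no low-pass filter first

# One second of a pure tone.
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * tone_hz * t)

# Strided subsampling: the new rate is 4000 Hz, so the new Nyquist
# limit is 2000 Hz and the 3000 Hz tone can no longer be represented.
decimated = wave[::factor]
new_rate = sample_rate // factor

original_peak = dominant_frequency(wave, sample_rate)
aliased_peak = dominant_frequency(decimated, new_rate)

print(f"peak before subsampling: {original_peak:.0f} Hz")
print(f"peak after subsampling:  {aliased_peak:.0f} Hz")
```

In this sketch the spectral peak moves from about 3000 Hz to about 1000 Hz, because a tone above the new Nyquist limit folds back to |3000 - 4000| = 1000 Hz. Max or average pooling inside a CNN is not identical to this plain subsampling, so the sketch shows only the sampling-theorem argument, not the behaviour of any particular model.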

publication date

  • January 1, 2018

Digital Object Identifier (DOI)

  • 10.21437/Interspeech.2018-1371

start page

  • 2698

end page

  • 2702

volume

  • 2018-September