Impact of aliasing on deep CNN-based end-to-end acoustic models

Conference paper

Gong, Y., Poellabauer, C. (2018). Impact of aliasing on deep CNN-based end-to-end acoustic models. 2018-September, 2698-2702. 10.21437/Interspeech.2018-1371

cited authors

Gong, Y; Poellabauer, C

authors

Poellabauer, Christian

abstract

A recent trend in audio and speech processing is to learn target labels directly from raw waveforms rather than hand-crafted acoustic features. Previous work has shown that deep convolutional neural networks (CNNs) as front-end can learn effective representations from the raw waveform. However, due to the large dimension of raw audio waveforms, pooling layers are usually used aggressively between temporal convolutional layers. In essence, these pooling layers perform operations that are similar to signal downsampling, which may lead to temporal aliasing according to the Nyquist-Shannon sampling theorem. This paper explores, using a series of experiments, if and how this aliasing effect impacts modern deep CNN-based models.
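The aliasing mechanism the abstract refers to can be illustrated with a minimal sketch. This is not code from the paper: it assumes plain strided subsampling as a stand-in for a pooling layer (max pooling is nonlinear and is not modeled here), and the sample rate, tone frequency, downsampling factor, and function names are all hypothetical choices made for illustration. A tone above the new Nyquist frequency folds back to a lower frequency once samples are dropped without a low-pass filter.

```python
import numpy as np


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


def strided_downsample(signal, factor):
    """Keep every `factor`-th sample, with no anti-aliasing filter.

    Assumption: this stands in for the downsampling effect of a pooling
    layer; it is not the pooling operation used in the paper.
    """
    return signal[::factor]


# Hypothetical illustration values, not taken from the paper.
sample_rate = 16000   # Hz
tone_hz = 3000        # Hz, below the original Nyquist frequency (8000 Hz)
factor = 4            # downsampling factor

t = np.arange(sample_rate) / sample_rate          # one second of samples
tone = np.sin(2 * np.pi * tone_hz * t)

downsampled = strided_downsample(tone, factor)
new_rate = sample_rate // factor                  # 4000 Hz, Nyquist 2000 Hz

original_peak = dominant_frequency(tone, sample_rate)
aliased_peak = dominant_frequency(downsampled, new_rate)

print(f"peak before downsampling: {original_peak:.0f} Hz")
print(f"peak after downsampling:  {aliased_peak:.0f} Hz")
```

Under these assumptions the 3000 Hz tone exceeds the 2000 Hz Nyquist limit of the downsampled signal, so its energy appears at 4000 - 3000 = 1000 Hz instead. Applying a low-pass filter before dropping samples is the conventional way to avoid this folding.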

publication date

January 1, 2018

Digital Object Identifier (DOI)

https://doi.org/10.21437/interspeech.2018-1371

start page

2698

end page

2702

volume

2018-September
