Embedded Neural Networks in Resource-Constrained Hearing Instruments

Zuzana Jelcicová

Research output: Book/Report › Ph.D. thesis


Abstract

Deep neural networks have revolutionized many areas, including speech enhancement, speech recognition, and speech separation, all of which are relevant to hearing instrument users and professionals. At the same time, state-of-the-art neural networks are huge models that require megabytes of storage and millions of operations per input, which makes them extremely energy-intensive and thus difficult to deploy on resource-constrained devices such as hearing instruments. For these reasons, neural networks were previously executed only in a high-performance computing environment (cloud), with the results sent back to IoT (edge) devices afterwards. However, real-time applications such as hearing instruments require low-latency connections and processing so as not to compromise sound quality. Moreover, to communicate with the cloud, an edge device must be connected constantly, which is infeasible and quickly drains the device's battery. Last but not least, sharing data with the cloud is undesirable due to security concerns. Therefore, the focus in recent years has been on enabling the execution of neural networks directly on low-power devices.

To successfully accomplish this goal, it is imperative to develop computationally efficient hardware-aware deep learning algorithms that in turn should be executed on custom hardware accelerators optimized for neural network processing. Exploring the ways to achieve such algorithm-hardware co-optimization is the objective of this PhD thesis.

This work first proposes two novel dynamic pruning algorithms, called PeakRNN and StatsRNN, for reducing the number of multiply-accumulates and memory accesses during inference. Since our focus is on audio (speech enhancement and speech recognition), we primarily explore Recurrent Neural Networks (RNNs) and Transformer Neural Networks (TNNs) due to their ability to process temporal information. All our experiments demonstrate substantial reductions in computation while maintaining high performance on the evaluation metrics. PeakRNN is chosen for the next stage of the project because it prunes a layer by selecting a constant number of top elements at every timestep, which offers, among other benefits, determinism and worst-case execution time guarantees for the subsequent network operations.
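The core idea behind this style of dynamic pruning can be illustrated with a short sketch: at each timestep, keep only a fixed number k of the largest-magnitude activations and zero out the rest, so the downstream multiply-accumulates for the zeroed elements can be skipped. This is a minimal illustrative example of fixed-k top-element pruning, not the thesis's actual PeakRNN implementation; the function name and vector are hypothetical.

```python
import heapq

def top_k_prune(values, k):
    """Keep only the k largest-magnitude entries of an activation
    vector and zero the rest. Because k is constant per timestep,
    the number of surviving elements (and hence the worst-case
    downstream MAC count) is deterministic."""
    if k >= len(values):
        return list(values)
    # Indices of the k largest-magnitude elements.
    keep = set(heapq.nlargest(k, range(len(values)),
                              key=lambda i: abs(values[i])))
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

# Example: prune an 8-element activation vector down to its 3 peaks.
x = [0.1, -2.5, 0.3, 1.7, -0.2, 0.05, 3.1, -0.4]
print(top_k_prune(x, 3))  # only -2.5, 1.7, and 3.1 survive
```

Selecting a constant k (rather than thresholding by value) is what gives the worst-case execution time guarantee mentioned above: the cost of the following layer is bounded regardless of the input.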

The second part of the project focuses on efficient hardware support for neural networks. In total, three custom ASIC accelerators are presented. First, a small-footprint, low-power, configurable accelerator for speech recognition is proposed. It implements a novel deterministic two-step scaling method for reducing the number of activation memory accesses at runtime. The accelerator is compared against a typical digital signal processor and considerably outperforms it in all aspects, including lower power consumption, a smaller area, and fewer memory accesses; it can therefore be readily used in hearing instruments. Second, an energy-efficient min-heap accelerator is designed to realize the selection of the top elements for PeakRNN. It is also a part of the third and final accelerator, called PeakEngine, which is capable of executing inference for both dense and pruned layers. PeakEngine is configurable, and it represents the first RNN ASIC accelerator for hearing-instrument-relevant use cases that applies dynamic pruning by selecting a constant number of elements to guarantee deterministic inference. The co-optimization of PeakRNN and PeakEngine significantly reduces the energy and latency of the original dense network, making the execution of large RNNs viable in hearing instruments within the imposed time and energy budget.
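The min-heap approach to top-k selection can be sketched in software as follows: maintain a heap of size k whose root is the smallest magnitude currently kept, and compare each incoming element against that root, replacing it when the new element is larger. This is a hedged software analogue of what such a hardware selection unit might do, not the thesis's accelerator design; the function name and data are assumptions for illustration.

```python
import heapq

def topk_indices_streaming(stream, k):
    """Streaming top-k selection with a size-k min-heap.
    Each new element needs only one comparison against the heap root
    (the smallest magnitude kept so far), plus a log(k) sift on
    replacement, which is what makes a heap attractive for a small
    hardware selection unit. Returns the indices of the k
    largest-magnitude elements, in index order."""
    heap = []  # entries are (magnitude, index); heap[0] is the minimum
    for i, v in enumerate(stream):
        mag = abs(v)
        if len(heap) < k:
            heapq.heappush(heap, (mag, i))
        elif mag > heap[0][0]:
            heapq.heapreplace(heap, (mag, i))  # evict current minimum
    return sorted(i for _, i in heap)

# Example: find which 3 of 8 activations would survive pruning.
x = [0.1, -2.5, 0.3, 1.7, -0.2, 0.05, 3.1, -0.4]
print(topk_indices_streaming(x, 3))  # indices of -2.5, 1.7, 3.1
```

Because the heap never holds more than k entries, both the storage and the per-element work are bounded by k, independent of the layer width, which fits the deterministic-inference goal described above.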
Original language: English
Publisher: Technical University of Denmark
Number of pages: 186
Publication status: Published - 2022
