Herbert Taucher - Siemens AG
Machine learning is being applied to an increasing number of embedded systems. While some systems may be able to send inferencing tasks to the cloud, many will be required to perform inferencing on board. With constrained compute resources and limited power, inferencing algorithms must be carefully designed to deliver the required performance while remaining within a strict power budget.
Inferencing algorithms often rely on multi-dimensional convolutions, which generate large amounts of intermediate data. In fact, the design of the memory architecture and the management of the data movement in the system may be more impactful on performance and power consumption than the architecture of the computational units themselves.
However, waiting until RTL is available to evaluate the performance of the inferencing engine is often too late in the design cycle. Further, evaluating multiple architectures at the RTL level is too time-consuming and expensive for most projects.
This tutorial will teach developers to use NVIDIA MatchLib to perform throughput-accurate analysis of an inferencing system in C++/SystemC, prior to the development of RTL. NVIDIA MatchLib is a new open-source library of C++/SystemC components that can be used to model a system such that throughput can be verified early in the design cycle.
To concretely illustrate these concepts, an example based on the implementation of a “wake word” inferencing system will be explored. Wake word systems continuously monitor an incoming audio stream, using a trained neural network to identify a specific word or phrase. Once the word is identified, the embedded system is brought to life. For battery-powered systems, the continuously running convolutional neural network must be as power efficient as possible while meeting hard real-time requirements.