Towards Universal 1-bit Weight Quantization of Neural Networks with end-to-end deployment on ultra-low power sensors

Abstract

Machine learning has proven its strength and versatility in both academia and industry, and is expected to keep growing for the foreseeable future. Not without challenges at its frontier, however, machine learning has met the large world of tiny hardware. In the case of MEMS, embedded hardware has distinct objectives and challenges: it faces both memory and computation constraints (integer-only arithmetic, unsupported ops, …) and operates always-on under tight latency and power budgets, yet it must still process sensor data efficiently and produce the expected outputs. One of the first and biggest challenges of tinyML is to find feasible solutions that meet the tight constraints of low-power devices without sacrificing full-scale performance. Effective quantization, in terms of both preserving the original performance and integrating with hardware, is therefore a critical step towards robust tinyML. Hence, we favor methods that integrate easily into common low-power hardware over fancier algorithms that require building non-trivial support.

At TDK InvenSense, we currently have an end-to-end tinyML platform that enables training and deployment of machine learning models (decision trees and standard neural networks: CNN, RNN, and dense layers with various activations) on any low-power MCU platform for sensor applications. Neural network weights are quantized to 8 bits and run with full-integer inference. For our sensor applications (motion activity, audio detection, …), we obtain ultra-low-power-friendly models that deploy in less than ~10 kB total (code + model), with low latency and power consumption. We found that our 8-bit quantized models consistently stay close to their floating-point counterparts (within 1-2% accuracy in general), which means we can go further!

To push our quantization further, we focus on a promising 1-bit weight quantization approach for neural networks that optimizes the model and the weights at the same time during training. It has low training overhead and is hassle-free, scalable, and automatable. We show that this method can generalize to n-bit quantization, granted that sufficient int-n support is available on the edge device. The algorithm is model-agnostic and can integrate into the training phase of any TinyMLOps workflow, for any application. We show that almost all weights converge to binary values, or close to them, and that the large majority stabilizes during training. We also explore binary quantization trade-offs and challenges: binary model size vs. full-scale performance, the binary rounding problem, and hybrid quantization. We test our binary quantization algorithm on MNIST and on a bring-to-see (face-to-wake) dataset for smartwatches, compare it to related approaches, and discuss limitations. We find minimal accuracy loss (at most 2-3%) and very robust results across independent repetitions.
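To picture what "optimizing the model and the weights at the same time during training" can look like, here is a minimal, generic sketch of training-time weight quantization with a straight-through estimator, in the style of BinaryConnect/DoReFa. It is an illustration of the general technique, not TDK InvenSense's actual implementation; the names `QuantizeSTE` and `QuantLinear` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantizeSTE(torch.autograd.Function):
    """Quantize weights in the forward pass; straight-through gradients.

    bits == 1 maps each weight to {-1, +1}; bits == n performs uniform
    symmetric quantization in [-1, 1] (e.g. 127 levels per side for n = 8).
    The backward pass lets gradients flow unchanged wherever |w| <= 1, so
    the latent full-precision weights keep being optimized with the model.
    """

    @staticmethod
    def forward(ctx, w, bits):
        ctx.save_for_backward(w)
        if bits == 1:
            # sign() would map 0 to 0; force every weight into {-1, +1}.
            return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
        levels = 2 ** (bits - 1) - 1  # e.g. 127 for int8
        return torch.round(w.clamp(-1, 1) * levels) / levels

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: zero gradient outside [-1, 1].
        return grad_output * (w.abs() <= 1).float(), None


class QuantLinear(nn.Linear):
    """Linear layer whose weights are quantized on the fly during training."""

    def __init__(self, in_features, out_features, bits=1, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.bits = bits

    def forward(self, x):
        w_q = QuantizeSTE.apply(self.weight, self.bits)
        return F.linear(x, w_q, self.bias)
```

Swapping `nn.Linear` for `QuantLinear(bits=1)` leaves the rest of the training loop unchanged, which is consistent with the low overhead and model-agnostic integration described above. After training, binarized weights can be stored one bit each (a sign bitmap, typically with a per-layer scale), which is where the size reduction over int8 comes from; the same layer degrades gracefully to n bits when the target MCU has int-n support.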

Publication
Minh Tri LÊ, Ph.D.
