Exploring implementation of n-bit precision binarization convolutional neural networks

Project (M.S., Computer Engineering)--California State University, Sacramento, 2020.

Convolutional neural networks (CNNs) are essential to many computer vision tasks because they provide accurate predictions. However, achieving high accuracy with CNNs requires a vast amount of storage and computational power. One approach to reducing these memory and compute demands is to lower the precision of the neural network. Binary neural networks (BNNs) have become a promising method because they binarize weights and activations, resulting in a significant memory reduction. The primary goal of this project was to test the performance of a BNN on a complex dataset. Most performance testing in previous papers used simple test sets, which did not push the design to its limit. We assessed the weaknesses and strengths of the model using a Radio Modulation Recognition dataset that required a higher order of precision. Another goal was to replicate an existing BNN using an FPGA accelerator. The purpose of building the FPGA model was to analyze the possibility of increasing the accuracy of the neural network by migrating to N-bit precision. In this work, the BNN models tested on the complex dataset were implemented in Python. We developed two models using different machine learning frameworks, namely Lasagne and Keras, and used existing implementations of the binarized convolution and dense layers in both frameworks. We ran the models on Google Colaboratory since it provides CPUs, GPUs, and TPUs for runtime execution. Lasagne could not take advantage of the GPU in Google Colaboratory; hence, we built a second model using Keras, which was able to run with a GPU backend. We gathered the classification data using the radio machine learning (RML) dataset and reported the resulting accuracy. We then compared the results of the binarized and full-precision models.
We concluded that the binarized network achieved classification accuracies similar to those of the full-precision model. The FPGA implementation relied on open-source C++ code for the BNN. We replicated the accelerator design and synthesized it for an FPGA. During synthesis, we added optimization algorithms to decrease timing delays. After executing the model on the FPGA, we gathered timing and power-dissipation measurements. We then analyzed the source code to estimate the complexity of the changes required to transition to N-bit precision. Increasing to N-bit precision required new quantization, pop-count, and classification procedures. Few changes were made to the source code, since the analysis was the primary goal and had to be completed before implementation could proceed. An N-bit parameter was integrated into a new quantization function to automate the required structural updates.
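
To illustrate the ideas named above (binarization, pop-count, and a quantization function with an N-bit parameter), the following is a minimal Python sketch. It is not the project's code. The function names `binarize`, `xnor_popcount_dot`, and `quantize`, the input range [-1, 1], and the uniform quantization scheme are all assumptions made for illustration; the project's actual FPGA implementation is in C++ and may differ substantially.

```python
import numpy as np


def binarize(x):
    """Deterministic sign binarization: each value becomes -1 or +1 (0 maps to +1)."""
    return np.where(np.asarray(x) >= 0, 1, -1).astype(np.int8)


def xnor_popcount_dot(a, b):
    """Dot product of two {-1, +1} vectors using XNOR followed by a pop-count.

    Encoding +1 as bit 1 and -1 as bit 0, XNOR marks the positions where the
    two vectors agree. With m agreeing positions out of n, the dot product
    is m - (n - m) = 2*m - n.
    """
    a_bits = np.asarray(a) > 0
    b_bits = np.asarray(b) > 0
    matches = int(np.count_nonzero(~(a_bits ^ b_bits)))  # pop-count of XNOR
    return 2 * matches - a_bits.size


def quantize(x, n_bits):
    """Hypothetical uniform quantizer on [-1, 1] with 2**n_bits levels.

    n_bits is the assumed "N-bit parameter"; n_bits == 1 falls back to
    sign binarization, so one function covers both cases.
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    if n_bits == 1:
        return binarize(x).astype(np.float64)
    steps = 2 ** n_bits - 1                  # number of intervals between levels
    index = np.round((x + 1.0) / 2.0 * steps)  # nearest level index, 0..steps
    return index / steps * 2.0 - 1.0         # map the index back to [-1, 1]


if __name__ == "__main__":
    w = binarize([0.3, -0.7, 0.0, -0.1])
    a = binarize([-0.2, -0.5, 0.9, 0.4])
    # The XNOR/pop-count result should agree with the ordinary dot product.
    print(xnor_popcount_dot(w, a), int(np.dot(w.astype(int), a.astype(int))))
    print(quantize([0.3, -0.7, 0.0, -0.1], n_bits=2))
```

In this sketch, raising `n_bits` only changes the number of quantization levels. The XNOR/pop-count shortcut applies to the 1-bit case alone, which is consistent with the observation above that moving to N-bit precision calls for a new pop-count procedure.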