This paper presents a hardware inference accelerator architecture for quantized deep neural networks (DNNs). The proposed accelerator implements every computation in a quantized DNN: linear transformations such as matrix multiplication, nonlinear activation functions such as ReLU, and the quantization and dequantization operations. The accelerator consists of a matrix multiplication core, implemented as a systolic array, and a QDR core that performs quantization, dequantization, and ReLU. The architecture is implemented in Verilog hardware description language (HDL) using ModelSim. To validate the design, we simulated the quantized DNN in Python and compared its results with the outputs of the proposed hardware accelerator. The comparison shows only a very slight difference between the two, confirming the validity of our quantized DNN hardware accelerator.
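As an illustration of the data flow described above, the following minimal Python sketch models one quantized layer in software: operands are quantized, multiplied with integer arithmetic (the role of the matrix multiplication core), then dequantized and passed through ReLU (the role of the QDR core). It is a sketch under stated assumptions, not the implementation evaluated in this work: the symmetric per-tensor scale, the 8-bit signed range, and all function names are hypothetical choices made for the example.

```python
# Software model of one quantized layer. ASSUMPTIONS (not from the paper):
# symmetric per-tensor scaling, a signed 8-bit range, and all names below.

def quantize(values, scale, qmin=-128, qmax=127):
    """Map real values to clamped integers: q = clamp(round(x / scale)).
    Note: Python's round() rounds exact halves to the nearest even integer."""
    return [[max(qmin, min(qmax, round(v / scale))) for v in row]
            for row in values]

def int_matmul(a, b):
    """Integer matrix product, the multiply-accumulate work that a
    systolic array distributes over its processing elements."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def dequantize(acc, scale):
    """Convert integer accumulators back to real values."""
    return [[a * scale for a in row] for row in acc]

def relu(values):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in values]

def quantized_layer(x, w, x_scale, w_scale):
    """Quantize -> integer matmul -> dequantize -> ReLU."""
    acc = int_matmul(quantize(x, x_scale), quantize(w, w_scale))
    # The accumulator carries the product of both input scales.
    return relu(dequantize(acc, x_scale * w_scale))

def float_layer(x, w):
    """Floating-point reference used for comparison."""
    inner, cols = len(w), len(w[0])
    return relu([[sum(row[k] * w[k][j] for k in range(inner))
                  for j in range(cols)] for row in x])

if __name__ == "__main__":
    x = [[0.5, -1.0], [0.25, 0.75]]
    w = [[1.0, -0.5], [0.5, 0.25]]
    print(quantized_layer(x, w, 1 / 64, 1 / 64))
    print(float_layer(x, w))
```

With a scale of 1/64 the inputs in this example are exactly representable, so the quantized and floating-point outputs coincide. For inputs that are not multiples of the scale, the two outputs differ by a small rounding error, which is the kind of discrepancy a comparison between a reference simulation and a hardware model would be expected to show.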
Copyright © 2024