Fraunhofer-Gesellschaft
2020
Conference Paper
Title

A Low Power In-DRAM Architecture for Quantized CNNs using Fast Winograd Convolutions

Abstract
In recent years, the performance and memory bandwidth bottlenecks associated with memory-intensive applications have encouraged researchers to explore Processing in Memory (PIM) architectures. In this paper, we focus on a DRAM-based PIM architecture for Convolutional Neural Network (CNN) inference. The close proximity of the computation units and the memory cells in a PIM architecture reduces data movement costs and improves overall energy efficiency. In this context, CNN inference requires efficient implementations of the area-intensive arithmetic multipliers near the highly dense DRAM regions. Additionally, the multiplication units increase the overall latency and power consumption. For this reason, most previous works in this domain use binary or ternary weights, which replace the complicated multipliers with bitwise logical operations, resulting in efficient implementations. However, it is well known that binary and ternary weight networks considerably reduce accuracy and can therefore be used only for limited applications. In this work, we present a novel DRAM-based PIM architecture for quantized (8-bit weight and input) CNN inference that utilizes the complexity reduction offered by fast convolution algorithms. The Winograd convolution accelerates the widely used small convolution sizes by reducing the number of multiplications compared to direct convolution. In order to exploit data parallelism and minimize energy, the proposed architecture integrates the basic computation units at the output of the Primary Sense Amplifiers (PSAs) and the rest of the substantial logic near the Secondary Sense Amplifiers (SSAs), and fully complies with commodity DRAM technology and process. Commodity DRAMs are temperature-sensitive devices; hence, integrating additional logic is challenging due to the increase in overall power consumption.
In contrast to previous works, our architecture consumes 0.525 W, which is within the commodity DRAM thermal design power budget (i.e., < 1 W). For VGG16, the proposed architecture achieves 21.69 GOPS per device with an area overhead of 2.04% compared to a commodity 8 Gb DRAM. The architecture delivers a peak performance of 7.552 TOPS per memory channel while maintaining a high energy efficiency of 95.52 GOPS/W. We also demonstrate that our architecture consumes 10.1× less power and is 2.23× more energy efficient than prior DRAM-based PIM architectures.
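The multiplication savings the abstract attributes to Winograd convolution can be illustrated with the smallest 1D case, F(2,3): two outputs of a 3-tap convolution computed with 4 multiplications instead of the 6 needed directly. This is a generic sketch of the standard Winograd identity, not the paper's in-DRAM implementation; the function and variable names (`winograd_f23`, `d`, `g`) are ours.

```python
def direct_conv(d, g):
    # Direct 1D convolution: 2 outputs of a 3-tap filter,
    # costing 6 multiplications.
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]

def winograd_f23(d, g):
    # Winograd F(2,3): the filter transform (the divisions by 2) can be
    # precomputed once per filter, leaving 4 multiplications per tile.
    G0 = g[0]
    G1 = (g[0] + g[1] + g[2]) / 2
    G2 = (g[0] - g[1] + g[2]) / 2
    G3 = g[2]
    m1 = (d[0] - d[2]) * G0
    m2 = (d[1] + d[2]) * G1
    m3 = (d[2] - d[1]) * G2
    m4 = (d[1] - d[3]) * G3
    # Inverse transform: only additions/subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]
```

For 8-bit quantized weights and inputs, as targeted by the paper, the per-tile saving in multiplications is what allows the area-intensive multiplier logic near the sense amplifiers to shrink.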
Author(s)
Ghaffar, Muhammad Mohsin
Sudarshan, Chirag
Weis, Christian
Jung, Matthias  
Fraunhofer-Institut für Experimentelles Software Engineering IESE  
Wehn, Norbert
Mainwork
The International Symposium on Memory Systems, MEMSYS 2020. Proceedings  
Project(s)
OPRECOMP
Funder
European Commission  
Conference
International Symposium on Memory Systems (MEMSYS) 2020  
DOI
10.1145/3422575.3422790
Language
English
Keyword(s)
  • DRAM
  • Neural Networks
  • PIM architecture
  • Quantized CNN