Type of work:
Master's Thesis / Diploma Thesis (Diplomarbeit)
Memory-intensive applications such as neural networks have grown steadily in importance over the past decade. A wide variety of accelerators for such applications have been proposed to achieve higher performance and lower power consumption than general-purpose computing. An emerging computing paradigm in this context is Processing-in-Memory (PIM). The aim of PIM architectures is to eliminate the memory-bound challenges (i.e. high energy, long latency, and low external bandwidth) encountered in conventional accelerator platforms designed for such applications. The tight coupling of memory arrays and computation logic enables massive internal data parallelism (i.e. higher performance) and minimal data movement cost (i.e. lower power), which results in much higher energy efficiency.
Researchers have investigated a wide variety of memory types for PIM architectures, such as SRAM, RRAM, and DRAM. So far, the first two types (SRAM and RRAM) have received most of the attention. However, both have fundamental drawbacks for PIM architectures. In particular, SRAM-based PIM accelerators have a low memory capacity and are not suitable for networks with a large memory footprint, while RRAM is not as technologically mature as DRAM or SRAM. DRAM satisfies both requirements: technological maturity and high memory capacity. Hence, DRAM-based PIM architectures have been investigated more intensively in recent years. In 2021, Samsung published the first silicon-proven DRAM-based PIM architecture.
One of the major challenges in designing a DRAM-based PIM architecture is an accurate full-system analysis of power, performance, and throughput that includes the memory controller overhead (such as input/weight data storage, command issue latency, etc.). Most DRAM-PIM research works present optimistic results that exclude the memory controller overhead, and they do not provide a detailed analysis of the memory controller optimizations needed to achieve high throughput from these PIM devices. In this thesis, you will model one of the previously published DRAM-based PIM architectures (i.e. Samsung’s HBM-PIM) in DRAMSys, an open-source, cycle-accurate, SystemC/TLM-based DRAM simulator. You will perform the analysis at the memory subsystem level, rather than at the device level as many PIM papers do, and compare the results against the claims of the original paper. This model will serve as a framework for memory subsystem level analysis and evaluation of novel PIM architectures.