This course aims to present recent advancements that strive to achieve efficient processing of DNNs. Specifically, it will offer an overview of DNNs, examine techniques for distributing the workload, explore various architectures and systems that support DNNs, and highlight key trends in recent techniques for efficient processing. These techniques aim to reduce the computational and communication costs associated with DNNs through hardware and system optimizations. The course will also provide a summary of various development resources to help researchers and practitioners initiate DNN deployments swiftly. Additionally, it will emphasize crucial benchmarking metrics and design considerations for evaluating the rapidly expanding array of DNN hardware designs and system optimizations proposed in both academia and industry.
This course covers modern computer architecture, including out-of-order instruction execution, branch prediction, multi-level caches and cache optimizations, memory, cache coherence, memory consistency, and multi-core processors. By the conclusion of this course, you will appreciate major topics in the field of Computer Architecture and understand the main principles of operation of modern general-purpose computer hardware.
Prerequisites: An undergraduate computer architecture course covering basic computer organization: instruction sets, basic caching, pipelining, etc. (CS2200 or ECE3057). It may be beneficial to refresh your knowledge of these topics before taking this course. The course has a significant project component that relies heavily on C/C++ programming, so you need to be familiar with C/C++ and Linux.