A Homogeneous High-Performance Accelerator for Machine Learning Techniques

2021 ELE Engineering Design Project (FM02)


Faculty Lab Coordinator

Farah Mohammadi

Topic Category

Consumer Products/Applications

Preamble

Machine learning (ML) algorithms have shown significant advantages in artificial intelligence (AI) based applications such as pattern recognition, prediction, and control optimization. Because ML algorithms require massive parallel processing, the performance of existing computational platforms that run them is often limited by communication overhead and storage requirements. Architecting a low-overhead homogeneous framework for executing ML techniques is therefore a crucial task for improving future AI-based systems.

Objective

The goal of this work is to architect a high-performance homogeneous framework (pure multi-CPU or pure multi-GPU) for executing ML algorithms. Companies such as Intel and NVIDIA are looking for low-cost, high-performance hardware platforms to execute complex ML algorithms, and homogeneous architectures for running such algorithms are more accessible than heterogeneous platforms. The target of this work is therefore to compare existing homogeneous architectures and propose a low-overhead, high-performance homogeneous framework for future ML algorithms.

Partial Specifications

- Simulating multi-CPU architectures for each ML algorithm.
- Simulating multi-GPU architectures for each ML algorithm.
- Studying the energy, performance, and area of each ML algorithm on multi-CPU architectures.
- Studying the energy, performance, and area of each ML algorithm on multi-GPU architectures.
- Installing gem5.
- Installing GPGPU-Sim.
- Installing McPAT and HotSpot.
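The simulator installations above can typically be done from source; the sketch below assumes a Linux host with git, SCons, a C++ toolchain, and (for GPGPU-Sim) a compatible CUDA toolkit. The repository URLs and build flags are the commonly used ones and may differ for the specific versions your group adopts.

```shell
# Build gem5 (the X86 target and -j parallelism are typical choices).
git clone https://github.com/gem5/gem5.git
cd gem5 && scons build/X86/gem5.opt -j4 && cd ..

# Build GPGPU-Sim (requires a CUDA toolkit; the toolkit version must
# match what the GPGPU-Sim release supports).
git clone https://github.com/gpgpu-sim/gpgpu-sim_distribution.git
cd gpgpu-sim_distribution && source setup_environment && make && cd ..

# Build McPAT (power/area modeling) and HotSpot (thermal modeling).
git clone https://github.com/HewlettPackard/mcpat.git && make -C mcpat
git clone https://github.com/uvahotspot/HotSpot.git && make -C HotSpot
```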

Suggested Approach

1- Developing a framework for different multi-CPU topologies.
2- Simulating a multi-CPU architecture on gem5.
3- Simulating a multi-GPU architecture on GPGPU-Sim.
4- Comparing the energy consumption and performance of the multi-CPU and multi-GPU architectures under different ML workloads, and proposing an optimum architecture for each ML workload.
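As a rough illustration of steps 2 and 4, the commands below sketch one way to run a workload on a simulated multi-core CPU in gem5 and feed the results to McPAT. The `se.py` script and its flags come from gem5's classic example configurations; `ml_benchmark` and `processor.xml` are placeholder names for your own compiled workload and machine description (the XML must be generated from the gem5 config and stats before McPAT can use it).

```shell
# Simulate a 4-core CPU running the workload in syscall-emulation mode.
build/X86/gem5.opt configs/example/se.py -n 4 -c ./ml_benchmark

# Performance counters land in m5out/stats.txt (stat names vary by
# gem5 release: simSeconds/simInsts or sim_seconds/sim_insts).
grep -E "simSeconds|sim_seconds|simInsts|sim_insts" m5out/stats.txt

# Estimate energy and area from a McPAT XML machine description.
./mcpat -infile processor.xml -print_level 5
```

The GPU side follows the same pattern: the workload is linked against GPGPU-Sim's CUDA runtime library and run in a shell where `setup_environment` has been sourced, and the simulator prints performance and power counters as it runs.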

Group Responsibilities

- Studying the resource usage of ML algorithms in existing simulators such as gem5 and GPGPU-Sim, which companies such as Intel and AMD work with.
- Working on gem5.
- Working on GPGPU-Sim.
- Comparing multi-CPU and multi-GPU architectures in terms of energy consumption and performance.
- Proposing an optimum architecture for each ML workload.
- Preparing a technical report and presenting the results at the end of the program.

Student A Responsibilities

- Designing a resource-sharing model for each ML algorithm on multi-CPU architectures.
- Working with the gem5 simulator for the multi-CPU simulations.

Student B Responsibilities

- Designing a resource-sharing model for each ML algorithm on multi-GPU architectures.
- Working with the GPGPU-Sim simulator for the multi-GPU simulations.

Student C Responsibilities

- Installing the GPGPU-Sim simulator for the GPUs and extracting the related performance and power logs.
- Installing the gem5 simulator for the CPUs and extracting the related performance and power logs.
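Extracting gem5's performance numbers usually means parsing `m5out/stats.txt`, which is a plain `name value # description` text file. The sketch below fabricates a two-line sample file (the values are made up for illustration, and older gem5 releases name these stats `sim_seconds`/`sim_insts`) and pulls the values out with awk:

```shell
# Create an illustrative stats.txt fragment (values are fabricated).
cat > stats.txt <<'EOF'
simSeconds   0.002500   # Number of seconds simulated
simInsts     12500000   # Number of instructions simulated
EOF

# Extract each value by matching on the stat name in column 1.
seconds=$(awk '$1 == "simSeconds" {print $2}' stats.txt)
insts=$(awk '$1 == "simInsts" {print $2}' stats.txt)
echo "simulated time: ${seconds}s, instructions: ${insts}"
```

The same grep/awk approach applies to GPGPU-Sim's stdout log and McPAT's report, keyed on their respective counter names.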

Student D Responsibilities

-Simulating the final proposed architecture for each ML algorithm.

Course Co-requisites

Digital Systems, Programming in C, Microprocessors

To ALL EDP Students

Due to the COVID-19 pandemic, in the event the University is not open for in-class/in-lab activities during the Winter term, your EDP topic specifications, requirements, implementations, and assessment methods will be adjusted by your FLCs at their discretion.



FM02: A Homogeneous High-Performance Accelerator for Machine Learning Techniques | Farah Mohammadi | Sunday September 12th 2021 at 07:51 AM