Project

Improving the use of scheduling features on SoC GPUs

Graphics Processing Units (GPUs) used in embedded systems such as cars, robots, and mobile devices are often required to run multiple tasks at the same time, yet GPUs were traditionally designed to run only one task at a time. Prior studies have introduced schedulers that allow multiple tasks to run simultaneously. However, these schedulers were mostly designed for dedicated GPUs. Because embedded systems use System-on-Chip (SoC) GPUs, whose architecture differs from that of dedicated GPUs (the GPU and CPU reside on the same chip and share the same physical memory), previously developed schedulers are not directly applicable to embedded systems.
Previous studies have used the "concurrent kernel" and "dynamic parallelism" features of modern GPUs to build schedulers that run multiple tasks simultaneously. This project aims to improve the use of these scheduling features on the SoC architecture by applying the "zero copy" and "unified memory" techniques, which eliminate explicit copies between CPU and GPU memory. The results show that unified memory is more effective than zero copy for improving the scheduling features on the SoC architecture. The findings of this study may in the future enable schedulers customized for SoC GPUs.
This project uses CUDA, NVIDIA's GPU computing platform, and two NVIDIA SoC GPUs, the Tegra K1 and Tegra X1.
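The two techniques named above can be illustrated in CUDA. The sketch below is not taken from the project; it is a minimal example, using standard CUDA runtime calls, of how zero copy (pinned, mapped host memory accessed directly by the GPU) and unified memory (a single pointer valid on both CPU and GPU) each avoid explicit cudaMemcpy calls. On an SoC such as the Tegra K1/X1, where CPU and GPU share physical memory, both approaches avoid a real copy entirely.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: increment each element in place.
__global__ void inc(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 10;

    // Zero copy: pinned, mapped host memory. The GPU accesses the
    // host allocation directly through a device pointer, so no
    // explicit cudaMemcpy is issued.
    int *zc;
    cudaHostAlloc(&zc, n * sizeof(int), cudaHostAllocMapped);
    int *zc_dev;
    cudaHostGetDevicePointer(&zc_dev, zc, 0);
    inc<<<(n + 255) / 256, 256>>>(zc_dev, n);
    cudaDeviceSynchronize();

    // Unified memory: one managed pointer usable from both CPU and
    // GPU; the driver migrates pages on discrete GPUs, and on SoCs
    // the same physical memory is simply shared.
    int *um;
    cudaMallocManaged(&um, n * sizeof(int));
    inc<<<(n + 255) / 256, 256>>>(um, n);
    cudaDeviceSynchronize();

    printf("zc[0]=%d um[0]=%d\n", zc[0], um[0]);
    cudaFreeHost(zc);
    cudaFree(um);
    return 0;
}
```

The practical difference the project measures is that managed allocations give the scheduler one pointer and let the driver handle coherence, whereas zero copy requires uncached, mapped accesses on every use.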

Project (M.S., Computer Science)--California State University, Sacramento, 2018.

