Course Outline
Introduction
- What is GPU programming?
- Why use GPU programming?
- What are the challenges and trade-offs of GPU programming?
- Which frameworks and tools are available for GPU programming?
- How do you choose the right framework and tool for your application?
OpenCL
- What is OpenCL?
- What are the advantages and disadvantages of OpenCL?
- Setting up the development environment for OpenCL
- Creating a basic OpenCL program that performs vector addition
- Using the OpenCL API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Writing kernels in OpenCL C that execute on the device and manipulate data
- Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations
- Leveraging OpenCL memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
- Using the OpenCL execution model to control work-items, work-groups, and ND-ranges that define parallelism
- Debugging and testing OpenCL programs using tools like CodeXL
- Optimizing OpenCL programs using techniques such as coalescing, caching, prefetching, and profiling
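
As a rough illustration of the execution-model and vector-addition topics above, the sketch below emulates a one-dimensional ND-range on the CPU in plain C++. It is not OpenCL API code: the function name `vector_add_ndrange` and the `local_size` parameter are hypothetical, and the inner loops only stand in for work-groups and work-items that a real device would run in parallel. It assumes both inputs have the same length and that `local_size` is greater than zero.

```cpp
#include <cstddef>
#include <vector>

// CPU-side emulation of a 1D ND-range (illustrative only, not the OpenCL API).
// Each (group, local) pair stands in for one work-item.
// Assumes a.size() == b.size() and local_size > 0.
std::vector<float> vector_add_ndrange(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t local_size) {
    const std::size_t n = a.size();
    std::vector<float> c(n, 0.0f);

    // Round the number of work-groups up so that every element is covered.
    const std::size_t num_groups = (n + local_size - 1) / local_size;

    for (std::size_t group = 0; group < num_groups; ++group) {
        for (std::size_t local = 0; local < local_size; ++local) {
            // Plays the role of get_global_id(0) with a zero global offset.
            const std::size_t gid = group * local_size + local;

            // The last work-group may be padded, so guard the access.
            if (gid < n) {
                c[gid] = a[gid] + b[gid];
            }
        }
    }
    return c;
}
```

In an actual OpenCL kernel the two loops disappear: the runtime launches one work-item per index, and only the body with the bounds guard remains.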
CUDA
- What is CUDA?
- What are the advantages and disadvantages of CUDA?
- Setting up the development environment for CUDA
- Creating a basic CUDA program that performs vector addition
- Using the CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Writing kernels in CUDA C/C++ that execute on the device and manipulate data
- Using CUDA built-in functions, variables, and libraries to perform common tasks and operations
- Leveraging CUDA memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
- Using the CUDA execution model to control threads, blocks, and grids that define parallelism
- Debugging and testing CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight
- Optimizing CUDA programs using techniques such as coalescing, caching, prefetching, and profiling
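
To give a feel for why coalescing appears in the optimization topic above, here is a deliberately simplified model in plain C++ (not CUDA code). It counts how many memory segments one warp touches when thread `t` reads element `t * stride` of a `float` array. The 32-thread warp and 128-byte segment are assumptions chosen for illustration, the function name `segments_touched` is hypothetical, and real hardware behavior depends on the architecture and its caches.

```cpp
#include <cstddef>
#include <set>

// Simplified coalescing model (illustrative assumptions, not a hardware spec):
// thread t of a warp reads the float at index t * stride_elems, and memory is
// fetched in aligned segments of segment_bytes. Fewer segments per warp means
// the accesses combine into fewer memory transactions.
std::size_t segments_touched(std::size_t stride_elems,
                             std::size_t warp_size = 32,
                             std::size_t segment_bytes = 128) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < warp_size; ++t) {
        // Byte offset of the element this thread reads, from an aligned base.
        const std::size_t byte_addr = t * stride_elems * sizeof(float);
        segments.insert(byte_addr / segment_bytes);
    }
    return segments.size();
}
```

Under these assumptions and a 4-byte `float`, a stride of 1 keeps the whole warp inside a single segment, while a stride of 32 spreads it across 32 segments, which is the access pattern coalescing-oriented optimizations try to avoid.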
ROCm
- What is ROCm?
- What are the advantages and disadvantages of ROCm?
- Setting up the development environment for ROCm
- Creating a basic ROCm program that performs vector addition
- Using the ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Writing kernels for ROCm in HIP C++ that execute on the device and manipulate data
- Using ROCm built-in functions, variables, and libraries to perform common tasks and operations
- Leveraging ROCm memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
- Using the ROCm execution model to control threads, blocks, and grids that define parallelism
- Debugging and testing ROCm programs using tools like ROCm Debugger and ROCm Profiler
- Optimizing ROCm programs using techniques such as coalescing, caching, prefetching, and profiling
HIP
- What is HIP?
- What are the advantages and disadvantages of HIP?
- Setting up the development environment for HIP
- Creating a basic HIP program that performs vector addition
- Using the HIP language to write kernels that execute on the device and manipulate data
- Using HIP built-in functions, variables, and libraries to perform common tasks and operations
- Leveraging HIP memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
- Using the HIP execution model to control threads, blocks, and grids that define parallelism
- Debugging and testing HIP programs using tools like ROCm Debugger and ROCm Profiler
- Optimizing HIP programs using techniques such as coalescing, caching, prefetching, and profiling
Comparison
- Comparing features, performance, and compatibility of OpenCL, CUDA, ROCm, and HIP
- Evaluating GPU programs using benchmarks and metrics
- Learning best practices and tips for GPU programming
- Exploring current and future trends and challenges in GPU programming
Summary and Next Steps
Requirements
- Understanding of the C/C++ language and parallel programming concepts
- Foundational knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers seeking to learn the basics of GPU programming and the primary frameworks and tools for building GPU applications
- Developers aiming to write portable and scalable code compatible with various platforms and devices
- Programmers interested in exploring the benefits and challenges of GPU programming and optimization
Duration
- 21 hours