Course Outline
Introduction
- What is OpenCL?
- Comparison: OpenCL vs CUDA vs SYCL
- Overview of OpenCL features and architecture
- Setting up the Development Environment
Getting Started
- Creating a new OpenCL project with Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying output using printf and fprintf
OpenCL API
- Understanding the role of the OpenCL API in host programs
- Querying device information and capabilities via the OpenCL API
- Creating contexts, command queues, buffers, kernels, and events using the OpenCL API
- Enqueuing commands such as read, write, copy, map, unmap, execute, and wait using the OpenCL API
- Handling errors and exceptions with the OpenCL API
OpenCL C
- Understanding the role of OpenCL C in device programs
- Writing device-executed kernels and manipulating data with OpenCL C
- Using OpenCL C data types, qualifiers, operators, and expressions
- Utilizing OpenCL C built-in functions (e.g., math, geometric, relational)
- Leveraging OpenCL C extensions and libraries (e.g., atomic, image, cl_khr_fp16)
OpenCL Memory Model
- Distinguishing between host and device memory models
- Using OpenCL memory spaces: global, local, constant, and private
- Utilizing OpenCL memory objects: buffers, images, and pipes
- Applying OpenCL memory access modes (read-only, write-only, read-write, etc.)
- Implementing OpenCL memory consistency models and synchronization mechanisms
OpenCL Execution Model
- Differentiating between host and device execution models
- Defining parallelism using OpenCL work-items, work-groups, and ND-ranges
- Using OpenCL work-item functions (e.g., get_global_id, get_local_id, get_group_id)
- Using OpenCL work-group functions (e.g., barrier, work_group_reduce_add, work_group_scan_inclusive_add)
- Using OpenCL ND-range query functions (e.g., get_num_groups, get_global_size, get_local_size)
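How work-groups, local IDs, and global IDs relate in a one-dimensional ND-range can be sketched on the host in plain C. This is only an emulation under stated assumptions: it does not call the OpenCL API, it runs work-items sequentially, and it assumes a zero global offset and a global size that is a multiple of the local size. The names `vec_add_item` and `run_ndrange_1d` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel body: one call handles one work-item. */
void vec_add_item(size_t gid, const float *a, const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Emulates a 1-D ND-range sequentially on the host.
   Assumption: zero global offset, global_size divisible by local_size. */
void run_ndrange_1d(size_t global_size, size_t local_size,
                    const float *a, const float *b, float *out)
{
    assert(local_size > 0 && global_size % local_size == 0);

    /* Corresponds to what get_num_groups(0) would report. */
    size_t num_groups = global_size / local_size;

    for (size_t group_id = 0; group_id < num_groups; ++group_id) {
        for (size_t local_id = 0; local_id < local_size; ++local_id) {
            /* Corresponds to get_global_id(0) with a zero offset. */
            size_t global_id = group_id * local_size + local_id;
            vec_add_item(global_id, a, b, out);
        }
    }
}
```

Calling `run_ndrange_1d(8, 4, a, b, out)` would visit two groups of four work-items each. On a real device the work-items would run in parallel rather than in loop order.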
Debugging
- Identifying common errors and bugs in OpenCL programs
- Using the Visual Studio Code debugger to set breakpoints and inspect variables and call stacks
- Debugging and analyzing OpenCL programs on AMD devices using CodeXL
- Debugging and analyzing OpenCL programs on Intel devices using Intel VTune
- Debugging and analyzing OpenCL programs on NVIDIA devices using NVIDIA Nsight
Optimization
- Understanding factors affecting OpenCL program performance
- Improving arithmetic throughput using OpenCL vector data types and vectorization techniques
- Reducing control overhead and increasing locality with loop unrolling and tiling techniques
- Optimizing memory accesses and bandwidth using OpenCL local memory and related functions
- Measuring and improving execution time and resource utilization using profiling tools
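The tiling idea listed above can be illustrated with a matrix transpose written in host-side C. This is a minimal sketch and not an OpenCL kernel: the tile edge `TILE` and the function names are hypothetical, and in a device kernel the tile would typically be staged in local memory instead.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* hypothetical tile edge; a real value would be tuned per device */

/* Naive transpose: `in` is rows x cols, `out` is cols x rows, both row-major. */
void transpose_naive(const float *in, float *out, size_t rows, size_t cols)
{
    assert(in != NULL && out != NULL);
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
}

/* Tiled transpose: walks the matrix in TILE x TILE blocks so that the
   reads and writes of each block stay within a small memory region. */
void transpose_tiled(const float *in, float *out, size_t rows, size_t cols)
{
    assert(in != NULL && out != NULL);
    for (size_t r0 = 0; r0 < rows; r0 += TILE) {
        for (size_t c0 = 0; c0 < cols; c0 += TILE) {
            /* Clamp the block at the matrix edge when sizes are not multiples of TILE. */
            size_t r_end = (r0 + TILE < rows) ? r0 + TILE : rows;
            size_t c_end = (c0 + TILE < cols) ? c0 + TILE : cols;
            for (size_t r = r0; r < r_end; ++r)
                for (size_t c = c0; c < c_end; ++c)
                    out[c * rows + r] = in[r * cols + c];
        }
    }
}
```

Both functions produce the same result; only the traversal order differs. Whether tiling pays off depends on matrix size and the memory hierarchy, so it is worth measuring rather than assuming.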
Summary and Next Steps
Requirements
- Proficiency in C/C++ programming and concepts of parallel programming.
- Fundamental knowledge of computer architecture and memory hierarchy.
- Experience using command-line tools and code editors.
Target Audience
- Developers seeking to learn how to program heterogeneous devices using OpenCL and exploit their parallelism.
- Developers aiming to write portable and scalable code compatible with various platforms and devices.
- Programmers interested in exploring the low-level aspects of heterogeneous programming to optimize code performance.
28 Hours