Course Outline

Introduction

  • What is OpenCL?
  • Comparison: OpenCL vs CUDA vs SYCL
  • Overview of OpenCL features and architecture
  • Setting up the Development Environment

Getting Started

  • Creating a new OpenCL project with Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying output using printf and fprintf

OpenCL API

  • Understanding the role of the OpenCL API in host programs
  • Querying device information and capabilities via the OpenCL API
  • Creating contexts, command queues, buffers, kernels, and events using the OpenCL API
  • Enqueuing commands such as read, write, copy, map, unmap, execute, and wait using the OpenCL API
  • Handling errors and exceptions with the OpenCL API

OpenCL C

  • Understanding the role of OpenCL C in device programs
  • Writing device-executed kernels and manipulating data with OpenCL C
  • Using OpenCL C data types, qualifiers, operators, and expressions
  • Utilizing OpenCL C built-in functions (e.g., math, geometric, relational)
  • Leveraging optional OpenCL C features and extensions (e.g., atomics, images, cl_khr_fp16)

OpenCL Memory Model

  • Distinguishing between host and device memory models
  • Using OpenCL memory spaces: global, local, constant, and private
  • Utilizing OpenCL memory objects: buffers, images, and pipes
  • Applying OpenCL memory access modes (read-only, write-only, read-write, etc.)
  • Understanding the OpenCL memory consistency model and applying synchronization mechanisms

OpenCL Execution Model

  • Differentiating between host and device execution models
  • Defining parallelism using OpenCL work-items, work-groups, and ND-ranges
  • Using OpenCL work-item functions (e.g., get_global_id, get_local_id, get_group_id)
  • Using OpenCL work-group functions (e.g., barrier, work_group_reduce_add, work_group_scan_inclusive_add)
  • Querying ND-range and work-group dimensions (e.g., get_num_groups, get_global_size, get_local_size)

Debugging

  • Identifying common errors and bugs in OpenCL programs
  • Using the Visual Studio Code debugger to set breakpoints and inspect variables and call stacks
  • Debugging and analyzing OpenCL programs on AMD devices using CodeXL
  • Debugging and analyzing OpenCL programs on Intel devices using Intel VTune
  • Debugging and analyzing OpenCL programs on NVIDIA devices using NVIDIA Nsight

Optimization

  • Understanding factors affecting OpenCL program performance
  • Improving arithmetic throughput using OpenCL vector data types and vectorization techniques
  • Reducing control overhead and increasing locality with loop unrolling and tiling techniques
  • Optimizing memory accesses and bandwidth using OpenCL local memory and related functions
  • Measuring and improving execution time and resource utilization using profiling tools

Summary and Next Steps

Requirements

  • Proficiency in C/C++ programming and concepts of parallel programming.
  • Fundamental knowledge of computer architecture and memory hierarchy.
  • Experience using command-line tools and code editors.

Target Audience

  • Developers seeking to learn how to program heterogeneous devices using OpenCL and exploit their parallelism.
  • Developers aiming to write portable and scalable code compatible with various platforms and devices.
  • Programmers interested in exploring the low-level aspects of heterogeneous programming to optimize code performance.

Duration

  • 28 hours
