Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from February 25 to March 18, 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

LLVM and Clang

Discussions

Commits

  • New AMDGPU target definitions were added for gfx940 and gfx1036. D120688, D120846

MLIR

Discussions

Commits

  • gpu.global_id is added to the GPU dialect. D121548
  • spv.VectorTimesScalar and spv.AssumeTrueKHROp ops are defined in the SPIR-V dialect. D121247, D121601
  • A pass to unify aliased resource variables was added to the SPIR-V dialect. D119872
  • A canonicalization pass specifically for GLSL is exposed in the SPIR-V dialect. D121222
  • SPIR-V entry point ABI local size is made optional to support OpenCL. D120399
  • spv.GLSL.{U|S}Clamp op type checking is fixed. D121238
  • gpu.barrier is now lowered to spv.ControlBarrier. D120722
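gpu.global_id yields a thread's index across the whole grid rather than within its block. Conventionally that index is computed per dimension as block_id * block_dim + thread_id. The C++ sketch below only illustrates that arithmetic; `Dim3` and `globalId` are hypothetical names, and this is not how MLIR implements or lowers the op.

```cpp
#include <cstdint>

// Hypothetical 3-D index type for illustration; not an MLIR or GPU runtime API.
struct Dim3 {
  std::uint32_t x, y, z;
};

// Global index along each dimension: block_id * block_dim + thread_id.
Dim3 globalId(Dim3 blockId, Dim3 blockDim, Dim3 threadId) {
  return {blockId.x * blockDim.x + threadId.x,
          blockId.y * blockDim.y + threadId.y,
          blockId.z * blockDim.z + threadId.z};
}
```

For example, thread 5 of block 2 with 128 threads per block along x has global x index 2 * 128 + 5 = 261.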

OpenMP (Target Offloading)

Discussions

Commits

  • Worked around a bug in libomptarget where necessary globals were removed if the code contained no kernels. D121007

External Compilers

LLPC

oneAPI DPC++

CUDA/HIP support

  • Added bf16 builtins operating on storage types DPCPP#5748 and optimized half builtins for fma, fmin, fmax, and exp2 DPCPP#5724.
  • Optimized atomic operations DPCPP#5710, math functions DPCPP#5747, and async_work_group_copy DPCPP#5611 for the CUDA backend.
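"Storage types" here means carrying a bf16 value in a plain integer. Assuming that storage type is a 16-bit unsigned integer holding the upper half of an IEEE-754 single-precision value (the bf16 layout), conversion reduces to bit manipulation. The sketch below uses hypothetical names (`bf16_storage`, `floatToBf16`, `bf16ToFloat`) and is not the DPC++ builtin API; it rounds to nearest-even and keeps NaNs as NaNs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical alias: bf16 kept in a 16-bit integer storage type.
using bf16_storage = std::uint16_t;

// float -> bf16 storage, rounding to nearest, ties to even.
bf16_storage floatToBf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  if ((bits & 0x7fffffffu) > 0x7f800000u) {
    // NaN: truncate and set a quiet bit so the payload cannot become zero (infinity).
    return static_cast<bf16_storage>((bits >> 16) | 0x0040u);
  }
  // Add 0x7fff, plus 1 if the bit that survives as the new LSB is odd.
  const std::uint32_t roundingBias = 0x7fffu + ((bits >> 16) & 1u);
  return static_cast<bf16_storage>((bits + roundingBias) >> 16);
}

// bf16 storage -> float is exact: place the 16 bits in the high half.
float bf16ToFloat(bf16_storage b) {
  const std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```

A value exactly halfway between two bf16 values, such as 1.00390625f, rounds to the one with an even last bit (1.0f).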

SYCL 2020 support

Non-standard extensions

Explicit SIMD

FPGA

  • Exposed value_type and min_capacity from the SYCL pipes extension class. DPCPP#5471
  • Fixed argument checking for the max_work_group_size and reqd_work_group_size attributes. DPCPP#5592
  • Added support for the USM Buffer Location Properties extension. DPCPP#5634

Misc

  • Prepared the SYCL compiler for opaque pointers. DPCPP#5830
  • Optimized Level Zero plug-in:
    • Enabled round-robin submission to multiple compute command streamers (CCSs). DPCPP#5657
    • Set the minimum pooled USM allocation size on the device to 512. DPCPP#5635
  • Improved -lname static linking with shared objects. DPCPP#5790
  • Fixed a memory leak in reduction resources. DPCPP#5653
  • Added XPTI-based tooling for tracing, profiling, and sanitizing SYCL applications. DPCPP#5389
  • Fixed a bug with constexpr recursion. DPCPP#4257
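The Level Zero plug-in item above distributes submissions across several compute command streamers in round-robin order. The C++ sketch below shows only that selection policy under simple assumptions: `RoundRobinSelector` is a hypothetical class, not code from the plug-in, and real queue creation and submission are omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical selector handing out queue indices 0..count-1 in round-robin order.
class RoundRobinSelector {
 public:
  explicit RoundRobinSelector(std::size_t count) : count_(count) {
    assert(count_ > 0 && "need at least one queue");
  }

  // Index of the queue to use for the next submission; safe to call from
  // several threads because the counter is atomic.
  std::size_t next() {
    return next_.fetch_add(1, std::memory_order_relaxed) % count_;
  }

 private:
  std::size_t count_;
  std::atomic<std::size_t> next_{0};
};
```

With three streamers, successive submissions go to indices 0, 1, 2, 0, and so on, spreading work evenly without tracking per-queue load.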