Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from October 15 to October 28, 2021.

This issue brings news from a new external project, oneAPI DPC++, contributed by Alexey Bader.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

Commits

  • NVPTX now runs a late SROA pass to optimize away more allocas. D111471
  • AMDGPU now allows the whole register file on gfx90a to be used for VGPRs (Vector General Purpose Registers) in kernels that do not use AGPRs (Vector Accumulation Registers). D111764

MLIR

Discussions

Commits

  • GPU WMMA ops to NVVM conversion is relaxed to support 64-bit indices. D112479
  • SPIR-V utility scripts support automatically pulling in OpenCL definitions from the spec, and a few OpenCL ops were defined. D111886, D111884

OpenMP (Target Offloading)

Discussions

Commits

  • Improved debugging in the new device runtime, with documentation on enabling it. D112010, D112002
  • Fixes and improvements to the new device runtime in preparation for it to become the default runtime in D111946, D112144, and D112544.
  • The new device runtime library (DeviceRTL) is now built for AMDGPU targets. D112227, D111987

External Compilers

LLPC

  • The New Pass Manager is enabled by default for the frontend passes. LLPC#1419

oneAPI DPC++

CUDA/HIP support

  • Added Windows platform support for the CUDA backend.
  • Fixed the implementation of the mul_hi and frexp math functions for the CUDA backend.
  • Improved compiler diagnostics for a missing libspirv library for the CUDA and HIP backends.
  • Added get_sub_group_local_id() to the HIP backend.

SYCL 2020 support

  • Improved diagnostics for invalid kernel names.
  • Added definitions for missing feature test macros.
  • Fixed a few bugs in the specialization constants implementation.
  • Removed the program class and related APIs, as well as the half type declared in the global namespace.

Non-standard extensions

  • Improved printf support on devices without double-precision support.
  • Added support for using std::tuple on Intel devices.
  • Added support for the EXT_ONEAPI_max_work_groups extension, which adds new device information descriptors: max_global_work_groups and max_work_groups.
  • Made a number of improvements to the Explicit SIMD feature for Intel GPU devices, including fixes for SLM gather/scatter, support for the __esimd_svm_block_ld intrinsic, and more.

Upstream contributions to LLVM

  • Added support for the sycl_special_class attribute to address comments from D71016.