Issue #22

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from October 15 to October 28, 2021.

This issue brings news from a new external project, oneAPI DPC++, contributed by Alexey Bader.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

Nimit Singhania proposed adding two new static analyses to LLVM that detect performance issues in GPU programs, developed as part of their PhD thesis. The first analysis detects memory congestion issues across GPU threads, while the second determines whether the block-size parameter can be changed without affecting program correctness. The code is available on the GPU Drano project GitHub. There are no replies at the time of writing.
Jon Chesterfield observed that “AMDGPUOpenMP.cpp in Driver/ToolChains currently spawns an instance of llvm-link to stitch multiple input files together and splice in ~libm at the same time” and is looking for a solution that would avoid calling llvm-link from the driver. There are no replies at the time of writing.

Commits

NVPTX now runs a late SROA pass to optimize away more allocas. D111471
AMDGPU now allows kernels that do not use AGPRs (Vector Accumulation Registers) to use the whole register file on gfx90a for VGPRs (Vector General Purpose Registers). D111764

MLIR

Discussions

Commits

The conversion of GPU WMMA ops to NVVM has been relaxed to support 64-bit indices. D112479
SPIR-V utility scripts now support automatically pulling in OpenCL definitions from the spec, and a few OpenCL ops were defined. D111886, D111884

OpenMP (Target Offloading)

Discussions

Commits

Improved debugging in the new device runtime and added documentation on enabling it. D112010, D112002
The new device runtime received fixes and improvements in preparation for it becoming the default runtime. D111946, D112144, D112544
The new device runtime library (DeviceRTL) is now built for AMDGPU targets. D112227, D111987

External Compilers

LLPC

The New Pass Manager is now enabled by default for the frontend passes. LLPC#1419

oneAPI DPC++

CUDA/HIP support

Added Windows platform support for the CUDA backend.
Fixed the implementation of the mul_hi and frexp math functions for the CUDA backend.
Improved compiler diagnostics for a missing libspirv library in the CUDA and HIP backends.
Added get_sub_group_local_id() to the HIP backend.

SYCL 2020 support

Improved diagnostics for invalid kernel names.
Added definitions for missing feature test macros.
Fixed a few bugs in the specialization constants implementation.
Removed the program class and related APIs, as well as the half type declared in the global namespace.

Non-standard extensions

Improved printf support on devices without double-precision support.
Added support for using std::tuple on Intel devices.
Added support for the EXT_ONEAPI_max_work_groups extension, which adds new device information descriptors: max_global_work_groups and max_work_groups.
A number of improvements to the Explicit SIMD feature for Intel GPU devices, including fixes for SLM gather/scatter, support for the __esimd_svm_block_ld intrinsic, and more.

Upstream contributions to LLVM

Added support for the sycl_special_class attribute to address review comments from D71016.