Issue #22
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from October 15 to October 28, 2021.
This issue brings news from a new external project, oneAPI DPC++, contributed by Alexey Bader.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conferences
LLVM and Clang
Discussions
- Nimit Singhania proposed to add two new static analyses to LLVM to detect performance issues in GPU programs, developed as their PhD thesis. The first analysis detects memory congestion issues across GPU threads, while the second tells if the block-size parameter can be tweaked without affecting program correctness. The code is available on the GPU Drano project GitHub. There are no replies at the time of writing.
- Jon Chesterfield observed that “`AMDGPUOpenMP.cpp` in `Driver/ToolChains` currently spawns an instance of llvm-link to stitch multiple input files together and splice in ~libm at the same time” and is looking for a solution that would avoid calling llvm-link from the driver. There are no replies at the time of writing.
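To make the block-size question from the first discussion item concrete, here is a minimal CPU-side sketch. It is not the proposed analysis (which is static) and does not use any GPU Drano code; `simulate_launch` and both kernels are hypothetical names made up for illustration. It only shows the property being checked: a kernel that indexes data through the global thread id gives the same result for any block size, while one that uses the in-block thread id does not.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a 1-D GPU launch: calls
// body(block_idx, thread_idx, block_dim) once per simulated thread.
template <typename Body>
void simulate_launch(std::size_t total, std::size_t block_dim, Body body) {
    std::size_t grid_dim = (total + block_dim - 1) / block_dim;
    for (std::size_t b = 0; b < grid_dim; ++b)
        for (std::size_t t = 0; t < block_dim; ++t)
            body(b, t, block_dim);
}

// Block-size independent: the stored value depends only on the global id.
std::vector<int> scale_by_global_id(std::size_t n, std::size_t block_dim) {
    std::vector<int> out(n, 0);
    simulate_launch(n, block_dim,
                    [&](std::size_t b, std::size_t t, std::size_t dim) {
        std::size_t gid = b * dim + t;
        if (gid < n) out[gid] = static_cast<int>(gid) * 2;
    });
    return out;
}

// Block-size dependent: the stored value uses the thread's position
// inside its block, so changing block_dim changes the output.
std::vector<int> scale_by_local_id(std::size_t n, std::size_t block_dim) {
    std::vector<int> out(n, 0);
    simulate_launch(n, block_dim,
                    [&](std::size_t b, std::size_t t, std::size_t dim) {
        std::size_t gid = b * dim + t;
        if (gid < n) out[gid] = static_cast<int>(t) * 2;
    });
    return out;
}
```

Comparing `scale_by_global_id(8, 2)` with `scale_by_global_id(8, 4)` yields identical vectors, whereas the two `scale_by_local_id` runs differ. A static analysis would aim to establish this without running the program.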
Commits
- NVPTX now runs a late SROA pass to optimize away more `alloca`s. D111471
- AMDGPU now allows the use of a whole register file on gfx90a for VGPRs (Vector General Purpose Registers) with kernels that do not use AGPRs (Vector Accumulation Registers). D111764
MLIR
Discussions
Commits
- GPU WMMA ops to NVVM conversion is relaxed to support 64-bit indices. D112479
- SPIR-V utility scripts support automatically pulling in OpenCL definitions from the spec, and a few OpenCL ops were defined. D111886, D111884
OpenMP (Target Offloading)
Discussions
Commits
- Improved debugging in the new device runtime and documentation on enabling it. D112010, D112002
- Fixes and improvements to the new device runtime in preparation for it to become the default runtime in D111946, D112144, and D112544.
- New device runtime libraries are now built for AMDGPU targets in D112227 and D111987.
- The DeviceRTL library is now built for AMDGPU. D112227
External Compilers
LLPC
- The New Pass Manager is enabled by default for the frontend passes. LLPC#1419
oneAPI DPC++
CUDA/HIP support
- Added Windows platform support for CUDA backend.
- Fixed `mul_hi` and `frexp` math functions implementation for CUDA backend.
- Improved compiler diagnostics for missing libspirv library for CUDA and HIP backends.
- Added `get_sub_group_local_id()` to HIP backend.
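For readers unfamiliar with the two math functions mentioned in the list above, the following host-side sketch shows the results they are expected to produce. It is plain standard C++, not the DPC++ or CUDA backend implementation; `mul_hi_u32`, `FrexpResult`, and `frexp_parts` are hypothetical helper names, and only the unsigned 32-bit case of `mul_hi` is shown.

```cpp
#include <cmath>
#include <cstdint>

// Reference behaviour of an unsigned 32-bit mul_hi: widen both operands
// to 64 bits, multiply, and keep the upper 32 bits of the product.
std::uint32_t mul_hi_u32(std::uint32_t x, std::uint32_t y) {
    std::uint64_t wide = static_cast<std::uint64_t>(x) * y;
    return static_cast<std::uint32_t>(wide >> 32);
}

// Hypothetical holder for the two values frexp returns.
struct FrexpResult {
    double mantissa;  // magnitude in [0.5, 1) for finite non-zero input
    int exponent;     // value == mantissa * 2^exponent
};

// Thin wrapper over std::frexp that returns both parts by value.
FrexpResult frexp_parts(double value) {
    int exponent = 0;
    double mantissa = std::frexp(value, &exponent);
    return {mantissa, exponent};
}
```

For example, `mul_hi_u32(0x80000000u, 2u)` is `1` because the full product is 2^32, and `frexp_parts(8.0)` gives a mantissa of `0.5` with an exponent of `4`.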
SYCL 2020 support
- Improved diagnostics for invalid kernel names.
- Added definitions for missing feature test macros.
- Fixed a few bugs in specialization constants implementation.
- Removed the program class and related APIs, and the `half` type declared in the global namespace.
Non-standard extensions
- Improved `printf` support on devices w/o doubles support.
- Added support for using `std::tuple` on Intel devices.
- Added support for `EXT_ONEAPI_max_work_groups` extension adding new device information descriptors: `max_global_work_groups` and `max_work_groups`.
- A number of improvements for Explicit SIMD feature for Intel GPU device including fixes for SLM gather/scatter, adding support for `__esimd_svm_block_ld` intrinsic and more.
Upstream contributions to LLVM
- Added support for `sycl_special_class` attribute to address comments from D71016.