Issue #30
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from February 25 to March 18, 2022.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Community Events
- Nicolai Hähnle blogged about convergence control intrinsics.
- The Khronos Group released the Vulkan SC 1.0 standard for safety-critical graphics and compute. The standard is aimed at industries like automotive and avionics. One notable difference from regular Vulkan is that Vulkan SC does not support online (runtime) pipeline compilation: all shader/pipeline compilation happens offline (ahead-of-time).
LLVM and Clang
Discussions
- Chris B posted an RFC on ‘Adding HLSL and DirectX support to Clang & LLVM’. The plan assumes landing the implementation piece-by-piece, instead of directly merging with the DXC codebase. HLSL can target two intermediate languages: DXIL (based on LLVM IR 3.7) and SPIR-V. The proposal was also discussed in the 3rd LLVM GPU Working Group meeting on March 18. An initial stack of revisions is already available on Phabricator.
Commits
MLIR
Discussions
Commits
- `gpu.global_id` is added to the GPU dialect. D121548
- `spv.VectorTimesScalar` and `spv.AssumeTrueKHROp` ops are defined in the SPIR-V dialect. D121247 D121601
- A pass to unify aliased resource variables was added to SPIR-V dialect. D119872
- A canonicalization pass specifically for GLSL is exposed in SPIR-V dialect. D121222
- SPIR-V entry point ABI local size is made optional to support OpenCL. D120399
- `spv.GLSL.{U|S}Clamp` op type checking is fixed. D121238
- `gpu.barrier` is lowered to `spv.ControlBarrier` now. D120722
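
For readers unfamiliar with the term used in the `gpu.global_id` entry above: a global ID is conventionally a work item's position within the whole launch grid, derived from its block (workgroup) index, the block size, and its index within the block. The following is a minimal C++ sketch of that arithmetic along a single dimension, assuming this conventional definition. `globalId` is a hypothetical helper written for illustration, not an MLIR API; the op's exact semantics are defined by D121548.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of MLIR: computes the global index of a
// work item along one dimension from its block (workgroup) index, the
// block size, and its index within the block.
std::uint64_t globalId(std::uint64_t blockId, std::uint64_t blockDim,
                       std::uint64_t threadId) {
  // A thread index must lie inside its block.
  assert(threadId < blockDim && "thread index out of range for block");
  return blockId * blockDim + threadId;
}
```

With blocks of 128 threads, for instance, thread 5 of block 2 gets `globalId(2, 128, 5)`, which is 2 * 128 + 5 = 261.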
OpenMP (Target Offloading)
Discussions
Commits
- Worked around a bug in libomptarget where necessary globals were removed if the code contained no kernels. D121007
External Compilers
LLPC
oneAPI DPC++
CUDA/HIP support
- Added `bf16` builtins operating on storage types DPCPP#5748 and optimized `half` builtins for `fma`, `fmin`, `fmax`, and `exp2` DPCPP#5724.
- Optimized atomic operations DPCPP#5710, math functions DPCPP#5747, and `async_work_group_copy` DPCPP#5611 for the CUDA backend.
SYCL 2020 support
- Added default argument support for the `work_group_size_hint` attribute. DPCPP#5565
- Fixed link and compile options. DPCPP#5741, DPCPP#5476
Non-standard extensions
- Added compile-time properties extension support. DPCPP#4976
- Added Clang support for `device_global`. DPCPP#5597, DPCPP#5576
Explicit SIMD
- Moved a number of APIs from the `experimental` namespace. DPCPP#5729, DPCPP#5785, DPCPP#5773
- Improved `single_task` support by the ESIMD emulator. DPCPP#5671
- Enabled SVM gather/scatter for 1, 2, and 4 elements. DPCPP#5780
- Added support for LSC memory access APIs. DPCPP#5512
FPGA
- Exposed `value_type` and `min_capacity` from SYCL pipes extension class. DPCPP#5471
- Fixed `max_work_group_size` and `reqd_work_group_size` attribute arguments check. DPCPP#5592
- Added support for the USM Buffer Location Properties extension. DPCPP#5634
Misc
- Prepared SYCL compiler for opaque pointers. DPCPP#5830
- Optimized Level Zero plug-in:
  - Enabled round-robin submissions to multiple compute CCS. DPCPP#5657
  - Set minimum pooled USM allocation size on device to 512. DPCPP#5635
- Improved `-lname` static linking with shared objects. DPCPP#5790
- Fixed a memory leak in reduction resources. DPCPP#5653
- Added XPTI-based tooling for SYCL applications for tracing, profiling, and sanitization. DPCPP#5389
- Fixed a bug with `constexpr` recursion. DPCPP#4257