Issue #27
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from January 14 to January 27 2022.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conferences
- Alyssa Rosenzweig blogged about ‘Writing an open source GPU driver – without the hardware’. The post describes various testing techniques that allowed them to implement driver/compiler support for ‘Valhall’ ARM GPUs before they were able to test it on a real Linux system: ‘It only took me a few days after getting the hardware and a serial cable to pass hundreds of tests on the new architecture. Months of speculatively developing the driver came with a huge pay off.’
LLVM and Clang
Discussions
- Artem Belevich proposed to change the NVPTX calling convention for aggregate function arguments passed by value. This would make Clang pass aggregate values directly as opposed to via
byval
pointers. The motivation is to allow SROA (the Scalar Replacement of Aggregates pass) to eliminate more local copies. - Vasileios Porpodas posted an RFC on ‘Spill2Reg: Selectively replace spills to stack with spills to vector registers’. Although the optimization was primarily evaluated on x86 (Skylake), Madhur Amilkanthwar noted that the AMDGPU backend has a similar pass.
Commits
- Added
sycl_special_class
attribute for SYCL types that need additional processing as kernel parameters. D114483
MLIR
Discussions
Commits
- Replaced reference to
LLVMFuncOp
withFuncOpInterface
in the GPU dialect. D118172 - Added support for lowering
ExpM1
to both SPIR-V GLSL/OpenCL instructions. D118081 - Added support for lowering
math.fma
to SPIR-V. D117704 - Added support for size-1 vector inserts to SPIR-V conversion. D115517
OpenMP (Target Offloading)
Discussions
Commits
- The annoying incompatible triple warnings have been fixed. D118495, D117634
- The new OpenMP offloading driver and linker will be upstreamed and available in LLVM 14 with support for Nvidia and AMD offloading. D116541
- An optimization to remove redundant barriers will be upstreamed. D118002
- The old OpenMP device offloading runtime is not built by default anymore in preparation for it being removed. D118268
- Linked-in math libraries (libm) are now optimized on AMDGPU. D116906
External Compilers
LLPC
oneAPI DPC++
CUDA/HIP support
- Improved atomics support on the CUDA backend:
- Added atomic loads and stores with various memory orders and scopes. DPCPP#5191, DPCPP#4853
- Added remaining atomics. DPCPP#5192
- Added support for
bf16
, (u
)int8
, andhalf
data types in matrix operations on the CUDA backend. DPCPP#5009 - Disabled by default the image support on the CUDA backend. DPCPP#5256
- Fixed
PI_MEM_ALLOC_DEVICE
USM pointer query for CUDA and HIP backends. DPCPP#5325 CUDA_PATH
is used as a candidate for path discovery. DPCPP#5194- Fixed up and down shuffles and reductions on AMD GPU. DPCPP#5359
- Enabled part of hierarchical parallelism functionality on AMD GPU by applying an LLVM transformation pass. DPCPP#5149
SYCL 2020 support
- Added declaration of
sycl::decorated
enum class. DPCPP#5399
Non-standard extensions
- Updated
property_list
specification to revision 2 (DPCPP#5338 and DPCPP#5410) and added an implementation design document (DPCPP#4937 and DPCPP#5341). - Updated the
KernelProperties
extension. DPCPP#5343 - Clarified the interaction between
InvokeSIMD
andSYCL_EXTERNAL
in theInvokeSIMD
extension specification. DPCPP#5353 - Updated documentation for Explicit SIMD. DPCPP#5383, DPCPP#5368
- Fixed sampler/stream parameter handling for ESIMD kernels. DPCPP#5165
- Added the
esimd::merge
free function. DPCPP#5308
Misc
- Migrate a number of SYCL specific LLVM passes to the New Pass Manager. DPCPP#5324, DPCPP#5300, DPCPP#5337, DPCPP#5301, DPCPP#5291, DPCPP#5292, DPCPP#5323
- Added experimental support for L0 host ptr import to USM buffer. DPCPP#4891