Issue #26
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from December 17 to January 13 2022.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conferences
- The first LLVM GPU Working Group Meeting was on January 14. All people interested in GPU/offloading-related development in LLVM are welcome to join. See the document with the meeting scheduling information, agenda, and notes for more details.
- We are running polls on the meeting time and frequency.
LLVM and Clang
Discussions
- Discussion on how to improve math optimizations on GPUs as part of the first LLVM GPU working group meeting.
- Anastasia Stulova posted an RFC: ‘Add linking of separate translation units using
spirv-link
’.spirv-link
is an external tool provided by the SPIRV-Tools project. This landed as D116266. - ‘huoshanl’ asked why the AMDGPU backend dropped the
GCNRegBankReassign
pass. There are no replies at the time of writing.
Commits
- A SPIR-V toolchain was added to Clang. SPIR-V code is generated by the external SPIRV-LLVM translator tool
llvm-spirv
, as a temporary solution until a SPIR-V backend lands in LLVM. D112410 - The OpenCL documentation was updated with C++ for OpenCL 2021 support in Clang. D116271
- CUDA/HIP now allow
__int128
on the host side, even if not supported by the target device. D111047 - NVPTX intrinsics and builtins for CUDA PTX
cvt
instructions were added for sm80 architectures and above. D116673 - Enabled AMDGPU divergence predicates for
not
,min
/max
, andctlz
/cttz
. D115884, D115954, D116044
MLIR
Discussions
Commits
- The
gpu.printf
op is defined to support debugging. D110448 - GPU kernel outlining supports the datalayout spec attribute (
dlspec
) now. D115722 - A few SPIR-V serialization bugs regarding nested control flows were fixed. D115560, D115582
- SPIR-V serialization allows explicit control over debug information emission. D115531
OpenMP (Target Offloading)
Discussions
- The implementation of the new OpenMP offloading driver was discussed, slides.
Commits
- 100% of OvO tests pass on AMDGPU after applying D116906 with optimizations.
- The new OpenMP offloading driver is up for review at D116541 (click ‘Stack’ to see the full list of revisions) and can be pulled from here. It currently supports the following features:
- Support for AMDGPU and NVPTX offloading targets.
- New offloading object files are compatible with the host linker.
- Functional static linking using archive libraries.
- Device-side LTO for offloading applications.
- Embedding LLVM IR for future JIT functionality.
External Compilers
LLPC
- Added multi-threaded compilation support to the standalone compiler tool
amdllpc
. The main goal is to exercise threading in the compiler in the CI, without having to run the full AMDVLK Vulkan driver on a machine with a GPU. LLPC#1601 - A new GitHub Actions workflow was added to automatically produce code coverage reports. Links to coverage reports are posted as Pull Request comments. See a sample report. LLPC#1627, LLPC#1629
oneAPI DPC++
CUDA/HIP support
- Added group collective functions (reduce, scans, broadcast) for HIP. #5202
- Added HIP backend support to filter selector extension. #5171, #5176
- Improved queue barrier support on HIP backend. #4975
- Made a number of small functional fixes improving device information, support for stream, hierarchical parallelism, etc. #4951, #5168, #5293, #5115
- Added
-fcuda-prec-sqrt
flag enabling correctly rounded results of thesqrt
function on CUDA backend (equivalent to the nvcc-prec-sqrt
). #5141 - Enabled in-kernel asserts support for the CUDA backend. #5174
- Fixed the
nan()
builtin for double types on the CUDA backend. #5173 - Fixed out-of-bound behavior for
read_image
in none addressing mode on the CUDA backend. #5204
SYCL 2020 support
- Added basic support for the
generic_space
address space. #5148 - Added property list support to the
stream
class. #4898
Non-standard extensions
- ESIMD: Added infrastructure to support non-standard C++ types
sycl::half
,sycl::bfloat
, etc., and basic operations support forsycl::half
. #5123 - ESIMD: Enabled a number of math and conversion intrinsics for
sycl::half
. #5271 - Matrix: Enabled
joint_matrix_fill
(#4994 and #5277) andwi_slice
(#4979) operations forjoint_matrix
. - Added new SPIR-V specification extensions documentation to support “module” debug information (#3976) and composite types as joint matrix elements (#5228).
- Added new SYCL specification extensions documentation to support “device global” objects (#4686), compile time properties (#4937), and
std::complex
data type in group collective algorithms (#5108). - Added initial support for format strings in non-constant address space for the
printf
function. #5069 - Moved group sort extension to experimental namespace. #5169
Misc
- Enhanced SYCL accessor (#5249) and buffer (#5161) instrumentation with XPTI and reduced XPTI instrumentation overhead #5158.
- Turned on the
-fsycl-dead-args-optimization
flag by default. #3004 - Enabled SPIR-V device image format for fat objects in the driver. #4608, #5251, #4683
- Made a number of performance improvements in the runtime library and Level Zero plug-in like caching command lists (#5197), batching of copy commands (#5155).
- Added static linking of device code to the Level Zero backend. #5266, #5267