Issue #24
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from November 12 to December 2, 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conferences
LLVM and Clang
Discussions
- A group of contributors from UIUC, Intel, and AWS posted a proposal for TLX: Tensor LLVM eXtensions. The extension introduces target-agnostic intrinsics using ‘flat vectors’ that would require target-specific lowering. As noted by Florian Hahn, the proposal is large in both the text length and implementation effort. Chris Lattner expressed a concern that the RFC proposes a single tensor model for LLVM, which may be inappropriate for some frameworks and not general enough.
Commits
- (In-review) Anastasia Stulova submitted a patch adding a toolchain for SPIR-V in Clang. The toolchain is incomplete but functional enough to produce SPIR-V assembly and object code directly via Clang. D112410
- The AMDGPU GFX10 memory model was updated to account for the MALL (memory attached last-level) cache added in GFX10.3. D114076
MLIR
Discussions
Commits
- A chain of CLs has landed to better support GPU to NVVM MMA conversion. D112969, D113383, D113618
- GPU to ROCm now supports target chipset during conversion. D114107
- ROCm integration tests are runnable now. D114184
- SPIR-V dialect definitions were refreshed to catch up with the latest spec. D113667
- scf.while to SPIR-V conversion is now supported. D113007
- Math ops to SPIR-V conversion now can generate OpenCL extended instructions. D113780
- spv.AtomicFAddEXTOp is defined and capability bugs for atomics were fixed. D113764, D114551
OpenMP (Target Offloading)
Discussions
- Rewriting the target offloading driver and requiring LLD for OpenMP offloading were discussed. Slides.
Commits
- The new device runtime is now enabled by default for Nvidia offloading. To use the old runtime, -fno-openmp-target-new-runtime must be passed. D114890
External Compilers
LLPC
- The standalone compiler tool switched to handling recoverable errors with llvm::Error, using the standard LLVM error handling utilities. LLPC#1545, LLPC#1553
- Continued work towards supporting the New Pass Manager. LLPC#1519
- The pre-merge checks will now warn about typos using the typos CLI tool. LLPC#1516
oneAPI DPC++
CUDA/HIP support
- Enabled cuda-gpu-arch, cuda-path, nocudalib, and fno-sycl-libspirv options in MSVC compatible driver (clang-cl).
- Improved diagnostics for using unsupported work-group size with HIP backend.
- Added half precision floating point data type support for the nextafter function.
- Added atomics with scopes and memory orders to CUDA backend. Patch adding NVPTX intrinsics required for this implementation is uploaded for review. D112718
- Added HIP backend implementation for 40+ math functions.
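For readers unfamiliar with nextafter, the following is a minimal sketch of its semantics using the standard C++ std::nextafter on float. It is not the DPC++ half-precision implementation mentioned above, and stepToward is a hypothetical helper introduced only for illustration.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: signed distance from x to the next representable
// float in the direction of `toward`. Uses the standard std::nextafter,
// not the SYCL half-precision overload.
float stepToward(float x, float toward) {
  return std::nextafter(x, toward) - x;
}
```

Under these assumptions, stepping up from 1.0f moves by FLT_EPSILON, while stepping down moves by half that amount, because the spacing between adjacent floats halves just below a power of two.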
SYCL 2020 support
- Improved diagnostics for using non-forward declarable kernel name types and unsupported data types in device code (added is_device_copyable type trait check for SYCL buffers).
- Added user-defined and pointer types support to group_broadcast operation.
- Made sycl::marray implementation trivially copyable.
- Added info::device::built_in_kernel_ids information query support.
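As a rough illustration of what "trivially copyable" means for a fixed-size array type, here is a sketch in standard C++. MiniMArray and copyBytes are hypothetical names used only for this example; this is not the DPC++ sycl::marray implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a small fixed-size math array.
// It has no user-provided copy/move operations or destructor,
// so the compiler treats it as trivially copyable.
template <typename T, std::size_t N>
struct MiniMArray {
  T data[N];

  T &operator[](std::size_t i) { return data[i]; }
  const T &operator[](std::size_t i) const { return data[i]; }
};

// Compile-time check of the property.
static_assert(std::is_trivially_copyable<MiniMArray<float, 4>>::value,
              "MiniMArray<float, 4> should be trivially copyable");

// Byte-wise copy; well-defined only because the type is trivially copyable.
template <typename T, std::size_t N>
MiniMArray<T, N> copyBytes(const MiniMArray<T, N> &src) {
  MiniMArray<T, N> dst;
  std::memcpy(&dst, &src, sizeof(src));
  return dst;
}
```

A type with this property can be moved between memory regions with a plain byte copy, which is the kind of transfer the is_device_copyable check mentioned above is concerned with.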
Non-standard extensions
- [ESIMD] Added support for align flags to simd::copy_from/to operations.
- Added ESIMD-specific IR verification pass.
- Added specification for discard_events queue property.
Misc
- Fixed a number of memory leaks in the DPC++ runtime library.
- Added stripped PDB files for the DPC++ runtime library and plugins when building with MSVC.
- Reduced compiler memory consumption during link step.