Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from February 11 to February 24 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

LLVM and Clang

Discussions

Commits

  • SILoadStoreOptimizer was extended to handle global loads and stores. D120279, D120346
  • AMDGPU now promotes recursive loads from kernel arguments to constant address space. D119886
  • Exposed more NVPTX builtins/intrinsics. D117887, D118977

MLIR

Discussions

Commits

OpenMP (Target Offloading)

Discussions

  • LLVM GPU Working Group Meeting on February 18: initial work done towards OpenMP interoperability with other languages (e.g., CUDA, HIP) by porting them to use the new driver. An initial proof-of-concept is in development at D120273.

Commits

  • Added the -fopenmp-offload-mandatory to remove host-fallback code. D120353
  • Added the -fopenmp-assume-no-thread-state to statically remove thread-states in the device runtime. D120106
  • Support for offloading to the CPU added to the new driver. The new driver now passes 100% of the test suite. D119613
  • The new offloading driver now passes the AMDGPU math libraries to the linker wrapper. D119841
  • Initial support for the atomic compare compare-and-swap operation. D118632
  • Added HIP support for linking archive files of bundled bitcode with -fgpu-rdc. D120070
  • Introduced -fgpu-default-stream={legacy|per-thread} option to support per-thread default stream for the HIP runtime. D120298

External Compilers

LLPC

oneAPI DPC++

CUDA/HIP support

  • Optimized async_work_group_copy for sm_80+ NVIDIA GPUs. DPCPP#5611
  • Set minimum CUDA version for Turing devices (sm_75). DPCPP#5642

SYCL 2020 support

  • Added SYCL 2020 support for the group class. DPCPP#5447
  • Aligned sycl::make_event<backend::opencl> with rev 4 of the SYCL 2020 specification. DPCPP#5498

Non-standard extensions

Explicit SIMD
  • Added support for the dpas API DPCPP#5637 and named barrier APIs DPCPP#5583.
  • Enabled op(vector, scalar) variant of binary math functions. DPCPP#5651
FPGA
  • Silenced unknown attribute warnings on host compilation. DPCPP#5619

Misc

  • Added an experimental Windows build in the GitHub Actions nightly (DPCPP#5560) and post commit (DPCPP#5639) workflows.
  • Turned on nightly testing with the new PM enabled by default. DPCPP#5340
  • Level Zero backend improvements:
    • Honor property::queue::enable_profiling. DPCPP#5543
    • Fixed use of copy-engines in L0 interop queue. DPCPP#5579
    • Fixed timestamp calculation (in ns). DPCPP#5555
    • Tuned USM pooling parameters. DPCPP#5457
  • Optimized online compilation for sub-devices. DPCPP#5240
  • Finished documentation layout restructuring. DPCPP#5605, DPCPP#5607, DPCPP#5578, DPCPP#5556
  • Improved get_kernel_bundle performance. DPCPP#5496
  • Added a warning for converting ‘C’ input to ‘C++’ in the SYCL mode. DPCPP#5598
  • Fixed buffer creation from an rvalue iterator. DPCPP#5609
  • Silenced “unknown attribute” warnings for device_indirectly_callable. DPCPP#5591
  • Fixed sync of host task vs kernel for in-order queue. DPCPP#5551