path: root/clang/test/CodeGenOpenCL
Age · Commit message · Author · Files · Lines
40 hours · AMDGPU: Use ELF mangling in data layout (#163011) · Matt Arsenault · 1 · -1/+1
Closes #95219
4 days · [clang][CodeGen] Remove "unsafe-fp-math" attribute support (#162779) · paperchalice · 1 · -6/+0
These global flags block further improvements for clang; users should always use fast-math flags instead. See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797. Remove them incrementally; this is the clang part.
6 days · [clang][SPIR][SPIRV] Materialize non-generic null pointers via addrspacecast (#161773) · Wenju He · 1 · -3/+3
LLVM models ConstantPointerNull as all-zero, but some GPUs (e.g. AMDGPU and our downstream GPU target) use a non-zero sentinel for null in private / local address spaces. SPIR-V is a supported input for our GPU target. This PR preserves a canonical zero form in the generic AS while allowing later lowering to substitute the target’s real sentinel.
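To see why keeping null in the generic zero form matters, here is a small Python model of an address-space cast that has to translate null. It is a sketch, not the SPIR-V or AMDGPU lowering: the all-ones 32-bit sentinel is an assumption modeled on AMDGPU private/local pointers (where offset 0 is a valid address), and the function names and the `aperture_base` parameter are hypothetical.

```python
# Hypothetical model: generic pointers are 64-bit with null == 0, while
# private/local pointers are 32-bit offsets. Offset 0 is a valid address
# there, so null needs a different bit pattern (assumed all-ones here).
GENERIC_NULL = 0
SEGMENT_NULL = 0xFFFFFFFF  # assumed non-zero null sentinel
SEGMENT_MASK = 0xFFFFFFFF


def cast_generic_to_segment(ptr: int) -> int:
    """Cast a generic pointer to a private/local pointer."""
    if ptr == GENERIC_NULL:
        # Null must become the sentinel, not the all-zero offset.
        return SEGMENT_NULL
    return ptr & SEGMENT_MASK  # keep the low 32-bit offset


def cast_segment_to_generic(ptr: int, aperture_base: int) -> int:
    """Cast a private/local pointer to a generic pointer."""
    if ptr == SEGMENT_NULL:
        return GENERIC_NULL
    # aperture_base is a hypothetical 64-bit base whose low 32 bits are zero.
    return aperture_base | ptr
```

Emitting a private/local null directly as an all-zero constant would bake in offset 0; expressing it as a cast of the generic null leaves the choice of sentinel to the target's lowering.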
8 days · [NFC] Change spelling of cluster feature to "clusters" (#162103) · Shilei Tian · 1 · -2/+2
9 days · [AMDGPU] Make cluster a target feature (#162040) · Shilei Tian · 1 · -2/+2
This replaces the original arch check.
2025-09-29 · [AMDGPU][SPIRV] Use SPIR-V syncscopes for some AMDGCN BIs (#154867) · Alex Voicu · 3 · -22/+30
AMDGCN-flavoured SPIR-V allows AMDGCN-specific builtins, including those for scoped fences and some specific RMWs. However, at present we don't map syncscopes to their SPIR-V equivalents, but rather use the AMDGCN ones. This pessimises the resulting code, because system scope is used instead of device (agent) or subgroup (wavefront) scope, so we correct the behaviour to ensure that reverse translation does the right thing.
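The correction amounts to a scope-to-scope mapping. The Python sketch below shows one plausible correspondence between AMDGCN sync scope names and the SPIR-V `Scope` enumeration; the numeric values follow the SPIR-V specification, but the table and `spirv_scope_for` are illustrative assumptions rather than clang's code.

```python
from enum import IntEnum


class SpirvScope(IntEnum):
    """Scope values as defined in the SPIR-V specification."""
    CrossDevice = 0
    Device = 1
    Workgroup = 2
    Subgroup = 3
    Invocation = 4


# Assumed correspondence; "" is LLVM's default (system) scope and
# "singlethread" is LLVM's target-independent single-thread scope.
AMDGCN_TO_SPIRV = {
    "": SpirvScope.CrossDevice,
    "agent": SpirvScope.Device,
    "workgroup": SpirvScope.Workgroup,
    "wavefront": SpirvScope.Subgroup,
    "singlethread": SpirvScope.Invocation,
}


def spirv_scope_for(amdgcn_scope: str) -> SpirvScope:
    """Map an AMDGCN sync scope name to a SPIR-V scope.

    Unknown names fall back to the widest scope, which is always
    correct but may be slower (the pessimisation described above).
    """
    return AMDGCN_TO_SPIRV.get(amdgcn_scope, SpirvScope.CrossDevice)
```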
2025-09-24 · [AMDGPU] Add the support for 45-bit buffer resource (#159702) · Shilei Tian · 1 · -10/+18
On new targets like `gfx1250`, the buffer resource (V#) now uses this format:
```
base (57-bit): resource[56:0]
num_records (45-bit): resource[101:57]
reserved (6-bit): resource[107:102]
stride (14-bit): resource[121:108]
```
This PR changes the type of `num_records` from `i32` to `i64` in both the builtin and the intrinsic, and also adds support for lowering the new format. Fixes SWDEV-554034.
---------
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
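Because `num_records` spans 45 bits in this layout, it no longer fits in an `i32`. The following Python sketch packs and unpacks a 128-bit V# using the bit positions listed in this commit message; bits 122-127 are not described there and are simply left zero, and all function names are hypothetical.

```python
# (name, low bit, width) for each field of the gfx1250 V# layout above.
FIELDS = (
    ("base", 0, 57),
    ("num_records", 57, 45),
    ("reserved", 102, 6),
    ("stride", 108, 14),
)


def pack_vsharp(base: int, num_records: int, stride: int) -> int:
    """Pack the fields into a 128-bit integer; reserved bits stay zero."""
    values = {"base": base, "num_records": num_records,
              "reserved": 0, "stride": stride}
    word = 0
    for name, lo, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value:#x} does not fit in {width} bits")
        word |= value << lo
    return word


def unpack_vsharp(word: int) -> dict:
    """Extract each field from a 128-bit V# value."""
    return {name: (word >> lo) & ((1 << width) - 1)
            for name, lo, width in FIELDS}


def as_dwords(word: int) -> list:
    """Split into four 32-bit dwords; dword i holds bits [32*i+31:32*i]."""
    return [(word >> (32 * i)) & 0xFFFFFFFF for i in range(4)]
```

Note that `num_records` straddles dwords 1 through 3 (bits 57-101), which is why a single 32-bit operand can no longer carry it.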
2025-09-17 · [AMDGPU] Add gfx1251 runlines to cooperative atomics tests. NFC (#159437) · Stanislav Mekhanoshin · 1 · -0/+1
2025-09-17 · [AMDGPU] Add gfx1251 subtarget (#159430) · Stanislav Mekhanoshin · 1 · -0/+2
2025-09-16 · [AMDGPU] Add s_cluster_barrier on gfx1250 (#159175) · Stanislav Mekhanoshin · 1 · -0/+10
2025-09-16 · [AMDGPU] Add missing bf16-pk-insts feature to gfx1250 (#159167) · Stanislav Mekhanoshin · 1 · -1/+1
2025-09-15 · [Dwarf] Support heterogeneous DW_{OP,AT}s needed for AMDGPU CFI (#153883) · Scott Linder · 2 · -50/+51
These are defined in the user range until standard versions of them are adopted into DWARF, which is expected in DWARF 6. Some of these currently amount to reservations, as no code to use them is included. It would be very helpful to get them committed to avoid conflicts necessitating encoding changes while we are in the process of upstreaming.
---------
Co-authored-by: Juan Martinez Fernandez <juamarti@amd.com>
Co-authored-by: Emma Pilkington <Emma.Pilkington@amd.com>
2025-09-15 · [Clang] [Sema] Make `-Wincompatible-pointer-types` an error by default (#157364) · Sirraide · 1 · -1/+1
GCC 14 also made this an error by default, so we’re following suit. Fixes #74605
2025-09-15 · Revert "[HIP][Clang] Remove __AMDGCN_WAVEFRONT_SIZE macros" (#158566) · Fabian Ritter · 2 · -1/+9
Reverts llvm/llvm-project#157463 The PR breaks buildbots with old ROCm versions, so revert it and reapply when buildbots are updated.
2025-09-15 · [HIP][Clang] Remove __AMDGCN_WAVEFRONT_SIZE macros (#157463) · Fabian Ritter · 2 · -9/+1
Remove definitions, test uses, and documentation of the macros, which were deprecated in November 2024 with PR #112849 / #115507. Where required, the wavefront size should instead be queried via means provided by the HIP runtime: the (non-constexpr) `warpSize` variable in device code, or `hipGetDeviceProperties` in host code. This change passed AMD-internal testing. Implements SWDEV-522062.
2025-09-12 · [clang] Regenerate test checks including TBAA semantics (NFC) · Antonio Frighetto · 12 · -455/+574
Tests exercising TBAA metadata (both purposefully and not), and previously generated via UTC, have been regenerated and updated to version 6.
2025-09-10 · [AMDGPU] Add builtins and intrinsics for cluster attributes (#157877) · Shilei Tian · 1 · -0/+168
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-09-10 · [AMDGPU] Add builtins for wave reduction intrinsics (#150170) · Aaditya · 1 · -0/+378
2025-09-04 · [AMDGPU][gfx1250] Add 128B cooperative atomics (#156418) · Pierre van Houtryve · 1 · -0/+104
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using an MMO flag.
2025-09-03 · [AMDGPU] Support cluster_load_async_to_lds instructions on gfx1250 (#156595) · Changpeng Fang · 1 · -0/+40
2025-09-02 · [AMDGPU] Support cluster load instructions for gfx1250 (#156548) · Changpeng Fang · 1 · -0/+36
2025-08-27 · clang/AMDGPU: Add __builtin_amdgcn_inverse_ballot_w{32,64} (#155724) · Nicolai Hähnle · 2 · -0/+14
Add builtins that expose the underlying llvm.amdgcn.inverse.ballot intrinsic that we've had for a while. This allows more explicitly writing code that selects or branches in terms of lane masks, which can lead to better code quality.
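Semantically, an inverse ballot hands each lane the bit at its own index in a wave-uniform lane mask. A scalar Python model of that behaviour (not the builtin itself; `ballot` and `inverse_ballot` here are hypothetical helpers) makes the round trip explicit:

```python
def ballot(predicates: list) -> int:
    """Pack one boolean per lane into a lane mask (bit i belongs to lane i)."""
    mask = 0
    for lane, pred in enumerate(predicates):
        if pred:
            mask |= 1 << lane
    return mask


def inverse_ballot(mask: int, wave_size: int) -> list:
    """Give each lane the value of its own bit in a wave-uniform mask."""
    if wave_size not in (32, 64):
        raise ValueError("wave size must be 32 or 64")
    if mask >> wave_size:
        raise ValueError("mask has bits beyond the wave size")
    return [bool((mask >> lane) & 1) for lane in range(wave_size)]
```

The `w32` and `w64` suffixes of the builtins correspond to the two `wave_size` values modeled here.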
2025-08-27 · [AMDGPU] Refactor insertWaveSizeFeature (#154850) · Stanislav Mekhanoshin · 1 · -0/+2
If wavefrontsize32 or wavefrontsize64 is the only possible value, insert it into the feature list by default and use that value as an indication that the other wavefront size is not legal.
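The rule can be sketched as follows, assuming LLVM-style `+feature` strings. `insert_wave_size_feature` and its arguments are hypothetical and do not mirror the actual clang function signature.

```python
# Hypothetical sketch of the wave-size defaulting rule.
WAVE_FEATURES = ("wavefrontsize32", "wavefrontsize64")


def insert_wave_size_feature(supported: set, features: list) -> list:
    """Return `features` with a default wave size added when only one is possible.

    `supported` holds the wave-size features the target allows; `features`
    holds the explicitly requested target features, e.g. "+wavefrontsize64".
    """
    requested = {f[1:] for f in features
                 if f.startswith("+") and f[1:] in WAVE_FEATURES}
    illegal = requested - supported
    if illegal:
        raise ValueError(f"wave size not legal for this target: {sorted(illegal)}")
    result = list(features)
    if len(supported) == 1 and not requested:
        # The only possible wave size becomes a default feature.
        (only,) = supported
        result.append("+" + only)
    return result
```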
2025-08-26 · [AMDGCN] Add missing gfx1250 clang tests. NFC. (#155478) · Stanislav Mekhanoshin · 2 · -0/+192
2025-08-21 · [clang][CodeGen] cast addr space of ReturnValue if needed (#154380) · macurtis-amd · 2 · -0/+2
Fixes a bug on AMDGPU targets where a pointer was stored as address space 5, but then loaded as address space 0. Issue found as part of [Kokkos](https://github.com/kokkos/kokkos) testing, specifically `hip.atomics` (see [core/unit_test/TestAtomics.hpp](https://github.com/kokkos/kokkos/blob/develop/core/unit_test/TestAtomics.hpp)). Issue was introduced by commit [39ec9de7c230](https://github.com/llvm/llvm-project/commit/39ec9de7c230) - [clang][CodeGen] sret args should always point to the alloca AS, so use that (https://github.com/llvm/llvm-project/pull/114062).
2025-08-12 · [AMDGPU] Add s_barrier_init|join|leave instructions (#153296) · Stanislav Mekhanoshin · 2 · -0/+51
2025-08-12 · [VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138) · Leon Clark · 1 · -11/+11
Reopen #128938. Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions.
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
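The core question is which lanes of the loaded vector the shuffles actually read. A minimal Python sketch, assuming every user is a shuffle whose mask indexes only the loaded vector (the real pass also has to consider the second shuffle operand, legality and cost), computes how many leading elements still need to be loaded; `shrunk_load_width` is a hypothetical name.

```python
def shrunk_load_width(load_width: int, masks: list) -> int:
    """Smallest prefix of a loaded vector that every shuffle actually reads.

    `masks` holds the shuffle masks of all users of the load; each index
    refers to a lane of the loaded vector, and -1 marks an undefined lane.
    Returns the number of leading elements that must still be loaded.
    """
    used = 0
    for mask in masks:
        for idx in mask:
            if idx == -1:
                continue  # undefined lanes read nothing
            if not 0 <= idx < load_width:
                # This sketch does not model reads of a second operand.
                raise ValueError(f"lane {idx} is outside the loaded vector")
            used = max(used, idx + 1)
    return used
```

For example, an 8-element load whose only users broadcast lanes 0 and 1 needs just its first 2 elements.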
2025-08-08 · [IR] Remove size argument from lifetime intrinsics (#150248) · Nikita Popov · 4 · -34/+34
Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).
2025-08-05 · [AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194) · Stanislav Mekhanoshin · 1 · -0/+22
2025-08-05 · [Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216) · zGoldthorpe · 4 · -39/+102
This patch exposes builtins for atomic `add`, `max`, and `min` operations that operate over buffer resource pointers.
2025-08-04 · [AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036) · Stanislav Mekhanoshin · 1 · -0/+24
2025-08-04 · Revert "[VectorCombine] Shrink loads used in shufflevector rebroadcasts" (#151960) · Simon Pilgrim · 1 · -11/+11
Reverts llvm/llvm-project#128938 while a crash regression is investigated
2025-08-04 · [VectorCombine] Shrink loads used in shufflevector rebroadcasts (#128938) · Leon Clark · 1 · -11/+11
Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions.
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-02 · [AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (#151810) · Stanislav Mekhanoshin · 1 · -0/+42
2025-08-02 · [AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (#151807) · Stanislav Mekhanoshin · 1 · -0/+36
2025-08-02 · [AMDGPU] v_cvt_scale_pk16 gfx1250 instructions (#151804) · Stanislav Mekhanoshin · 1 · -0/+36
2025-08-01 · [AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773) · Stanislav Mekhanoshin · 2 · -1/+56
2025-08-01 · [AMDGPU] gfx1250 v_cvt_scalef32_sr_pk8_* instructions (#151765) · Stanislav Mekhanoshin · 1 · -0/+106
2025-08-01 · [AMDGPU] gfx1250 v_cvt_scalef32_pk8_* instructions (#151758) · Stanislav Mekhanoshin · 1 · -0/+94
2025-08-01 · [AMDGPU] gfx1250 v_permlane_* instructions (#151749) · Stanislav Mekhanoshin · 1 · -0/+126
2025-08-01 · [AMDGPU] Update tests (#151688) · Piotr Sobczak · 1 · -1/+1
Fix two minor issues:
- Add double quote
- Remove unused prefix
2025-07-31 · [AMDGPU] Add v_cvt_scale_pk8_* gfx1250 instructions (#151616) · Stanislav Mekhanoshin · 1 · -0/+111
2025-07-31 · [AMDGPU] Add gfx1250 cvt_pk|sr_fp8|bf8_f32 instructions (#151595) · Stanislav Mekhanoshin · 1 · -0/+54
2025-07-31 · [AMDGPU] v_cvt_sr_pk_f16_f32 gfx1250 instruction (#151482) · Stanislav Mekhanoshin · 1 · -0/+27
2025-07-30 · [AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415) · Stanislav Mekhanoshin · 1 · -0/+138
2025-07-30 · [AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385) · Stanislav Mekhanoshin · 2 · -1/+29
2025-07-29 · [AMDGPU] Support builtin/intrinsics for async loads/stores on gfx1250 (#151058) · Changpeng Fang · 1 · -0/+83
2025-07-29 · [AMDGPU] Allow readonly features to be written to IR when there is no target (#148141) · Changpeng Fang · 2 · -1/+17
Fixes: SWDEV-541399
2025-07-24 · [AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540) · Changpeng Fang · 1 · -0/+66
2025-07-24 · [AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466) · Stanislav Mekhanoshin · 2 · -1/+20