path: root/clang/lib/CodeGen
Age | Commit message | Author | Files | Lines
2024-11-14 | [HLSL] Adding HLSL `clip` function. (#114588) | joaosaffran | 1 | -0/+45
Adding HLSL `clip` function.
- adding llvm intrinsic
- adding sema checks
- adding dxil lowering
- adding spirv lowering
- adding sema tests
- adding codegen tests
- adding lowering tests

Closes #99093

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
2024-11-14 | [clang codegen] Add CreateRuntimeFunction overload that takes a clang type. (#113506) | Eli Friedman | 3 | -23/+67
Correctly computing the LLVM types/attributes is complicated in general, so add a variant which does that for you.
2024-11-12 | [AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190) | Shilei Tian | 1 | -0/+1
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 | Emit constrained atan2 intrinsic for clang builtin (#113636) | Tex Riddell | 1 | -0/+12
This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of atan2 for clang builtin to lib call calling convention check, now that atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise llvm.atan2 description.

Last part of Implement the atan2 HLSL Function. Fixes #70096.
2024-11-12 | [OpenACC] Implement AST/Sema for combined constructs | erichkeane | 2 | -0/+10
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for writing a `loop` construct immediately inside of a `compute` construct. However, this interaction requires we do additional work to ensure that we get the semantics between the two correct, as well as diagnostics. This patch adds the semantic analysis for the constructs (but no clauses), as well as the AST nodes.
2024-11-12 | [X86][AMX] Support AMX-MOVRS (#115151) | Malay Sanghi | 1 | -1/+17
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-11 | Fix for codegen Crash in Clang when using locator omp_all_memory with depobj construct (#114221) | CHANDRA GHALE | 1 | -6/+8
A codegen crash occurs when a depend object is initialized with omp_all_memory in the depobj directive: https://github.com/llvm/llvm-project/issues/114214

The root cause of the issue looks to be the improper handling of the dependency list when omp_all_memory is specified. The change introduces the use of OMPTaskDataTy to manage dependencies. The buildDependences function is called to construct the dependency list, and the list is iterated over to emit and store the dependencies.

Reduced Test Case:
```
#include <omp.h>

int main() {
  omp_depend_t obj;
#pragma omp depobj(obj) depend(inout: omp_all_memory)
}
```

```
#1 0x0000000003de6623 SignalHandler(int) Signals.cpp:0:0
#2 0x00007f8e4a6b990f (/lib64/libpthread.so.0+0x1690f)
#3 0x00007f8e4a117d2a raise (/lib64/libc.so.6+0x4ad2a)
#4 0x00007f8e4a1193e4 abort (/lib64/libc.so.6+0x4c3e4)
#5 0x00007f8e4a10fc69 __assert_fail_base (/lib64/libc.so.6+0x42c69)
#6 0x00007f8e4a10fcf1 __assert_fail (/lib64/libc.so.6+0x42cf1)
#7 0x0000000004114367 clang::CodeGen::CodeGenFunction::EmitOMPDepobjDirective(clang::OMPDepobjDirective const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4114367)
#8 0x00000000040f8fac clang::CodeGen::CodeGenFunction::EmitStmt(clang::Stmt const*, llvm::ArrayRef<clang::Attr const*>) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x40f8fac)
#9 0x00000000040ff4fb clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x40ff4fb)
#10 0x00000000041847b2 clang::CodeGen::CodeGenFunction::EmitFunctionBody(clang::Stmt const*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41847b2)
#11 0x0000000004199e4a clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4199e4a)
#12 0x00000000041f7b9d clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41f7b9d)
#13 0x00000000041f16a3 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41f16a3)
#14 0x00000000041fd954 clang::CodeGen::CodeGenModule::EmitDeferred() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41fd954)
#15 0x0000000004200277 clang::CodeGen::CodeGenModule::Release() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4200277)
#16 0x00000000046b6a49 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(clang::ASTContext&) ModuleBuilder.cpp:0:0
#17 0x00000000046b4cb6 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x46b4cb6)
#18 0x0000000006204d5c clang::ParseAST(clang::Sema&, bool, bool) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x6204d5c)
#19 0x000000000496b278 clang::FrontendAction::Execute() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x496b278)
#20 0x00000000048dd074 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x48dd074)
#21 0x0000000004a38092 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4a38092)
#22 0x0000000000fd4e9c cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xfd4e9c)
#23 0x0000000000fcca73 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#24 0x0000000000fd140c clang_main(int, char**, llvm::ToolContext const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xfd140c)
#25 0x0000000000ee2ef3 main (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xee2ef3)
#26 0x00007f8e4a10224c __libc_start_main (/lib64/libc.so.6+0x3524c)
#27 0x0000000000fcaae9 _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120:0
clang: error: unable to execute command: Aborted
```

---------

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2024-11-07 | [DXIL][SPIRV] Lower `WaveActiveCountBits` intrinsic (#113382) | Finn Plummer | 2 | -0/+8
```
- add codegen for llvm builtin to spirv/directx intrinsic in CGBuiltin.cpp
- add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp
- add lowering of directx intrinsic to dxil op in DXIL.td
- add test cases to illustrate passes
- add test case for semantic analysis
```

Resolves #80176
2024-11-07 | [HLSL][SPIRV] Added clamp intrinsic (#113394) | Adam Yang | 2 | -5/+15
Fixes #88052

- Added the following intrinsics:
  - `int_spv_uclamp`
  - `int_spv_sclamp`
  - `int_spv_fclamp`
- Updated DirectX counterparts to have the same three clamp intrinsics.
- Update the clamp.hlsl unit tests to include SPIRV
- Added the SPIRV specific tests
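
As a reading aid for the three clamp flavours listed above, here is a minimal scalar sketch in C. It assumes the usual definition `clamp(x, lo, hi) = min(max(x, lo), hi)`. The `ref_*` function names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical scalar reference helpers, one per clamp flavour.
   Each computes min(max(x, lo), hi) in the operand's own type. */

uint32_t ref_uclamp(uint32_t x, uint32_t lo, uint32_t hi) {
  uint32_t m = x > lo ? x : lo; /* max(x, lo), unsigned compare */
  return m < hi ? m : hi;       /* min(m, hi) */
}

int32_t ref_sclamp(int32_t x, int32_t lo, int32_t hi) {
  int32_t m = x > lo ? x : lo;  /* max(x, lo), signed compare */
  return m < hi ? m : hi;
}

float ref_fclamp(float x, float lo, float hi) {
  float m = x > lo ? x : lo;    /* NaN handling is not modelled here */
  return m < hi ? m : hi;
}
```

The same bit pattern can clamp differently depending on signedness: `0xFFFFFFFF` clamped to `[10, 20]` gives 20 when read as unsigned and 10 when read as the signed value -1. This is presumably why the unsigned and signed variants are separate intrinsics.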
2024-11-07 | [Clang] Add __builtin_counted_by_ref builtin (#114495) | Bill Wendling | 3 | -12/+50
The __builtin_counted_by_ref builtin is used on a flexible array pointer and returns a pointer to the "counted_by" attribute's COUNT argument, which is a field in the same non-anonymous struct as the flexible array member. This is useful for automatically setting the count field without needing the programmer's intervention. Otherwise it's possible to get this anti-pattern:

    ptr = alloc(<ty>, ..., COUNT);
    ptr->FAM[9] = 42; /* <<< Sanitizer will complain */
    ptr->count = COUNT;

To prevent this anti-pattern, the user can create an allocator that automatically performs the assignment:

    #define alloc(TY, FAM, COUNT) ({ \
        TY __p = alloc(get_size(TY, COUNT)); \
        if (__builtin_counted_by_ref(__p->FAM)) \
            *__builtin_counted_by_ref(__p->FAM) = COUNT; \
        __p; \
    })

The builtin's behavior is heavily dependent upon the "counted_by" attribute existing. Its main utility is during allocation to avoid the above anti-pattern. If the flexible array member doesn't have that attribute, the builtin becomes a no-op. Therefore, if the flexible array member has a "count" field not referenced by "counted_by", it must be set explicitly after the allocation as this builtin will return a "nullptr" and the assignment will most likely be elided.

---------

Co-authored-by: Bill Wendling <isanbard@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
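
The ordering problem described in this entry can be illustrated without the builtin. The sketch below is plain C under stated assumptions: `struct buf` and `buf_alloc` are hypothetical names, and the count is assigned explicitly rather than through `__builtin_counted_by_ref`. It shows only the "set the count inside the allocator" idea, not the builtin itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct: `count` plays the role of the field that a
   counted_by attribute on `data` would refer to. */
struct buf {
  size_t count;
  int data[]; /* flexible array member */
};

/* Allocate room for n elements and record n before returning, so no
   caller can index data[] while count is still unset.
   Returns NULL on overflow or allocation failure. */
struct buf *buf_alloc(size_t n) {
  if (n > (SIZE_MAX - sizeof(struct buf)) / sizeof(int))
    return NULL; /* size computation would overflow */
  struct buf *p = calloc(1, sizeof(struct buf) + n * sizeof(int));
  if (p)
    p->count = n; /* the store the builtin-based macro would perform */
  return p;
}
```

With this shape, a call such as `struct buf *p = buf_alloc(10); p->data[9] = 42;` never runs with a stale `count`, which is the property the macro in the commit message is after.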
2024-11-07 | [HLSL][SPIRV][DXIL] Implement `dot4add_u8packed` intrinsic (#115068) | Finn Plummer | 2 | -0/+11
```
- create a clang built-in in Builtins.td
- link dot4add_u8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OpUDot is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpUDot if applicable SPIRV version or SPV_KHR_integer_dot_product is enabled
- add dot4add_u8packed intrinsic to IntrinsicsDirectX.td and mapping to DXIL.td op Dot4AddU8Packed
- add tests for HLSL intrinsic lowering to dx/spv intrinsic in dot4add_u8packed.hlsl
- add tests for sema checks in dot4add_u8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_u8packed.ll
- add test to dxil lowering in DirectX/dot4add_u8packed.ll
```

Resolves #99219
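
For readers unfamiliar with the operation, here is a hedged scalar sketch of what `dot4add_u8packed` computes, assuming the HLSL definition: four unsigned 8-bit lanes per 32-bit operand, a lane-wise product sum, plus an accumulator. The helper name is hypothetical. It roughly mirrors the "expansion" path mentioned above in spirit only.

```c
#include <stdint.h>

/* Hypothetical reference helper: treat a and b as four packed unsigned
   8-bit lanes, multiply lane-wise, and add the four products to acc. */
uint32_t ref_dot4add_u8packed(uint32_t a, uint32_t b, uint32_t acc) {
  for (int i = 0; i < 4; ++i) {
    uint32_t ai = (a >> (8 * i)) & 0xFFu; /* lane i of a */
    uint32_t bi = (b >> (8 * i)) & 0xFFu; /* lane i of b */
    acc += ai * bi; /* unsigned arithmetic wraps modulo 2^32 */
  }
  return acc;
}
```

For example, lanes (1, 2, 3, 4) dotted with (1, 1, 1, 1) and an accumulator of 10 give 20.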
2024-11-06 | [HLSL] implement elementwise firstbithigh hlsl builtin (#111082) | Sarah Spall | 2 | -0/+20
Implements elementwise firstbithigh hlsl builtin.
Implements firstbituhigh intrinsic for spirv and directx, which handles unsigned integers.
Implements firstbitshigh intrinsic for spirv and directx, which handles signed integers.

Fixes #113486
Closes #99115
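
The split between the unsigned and signed intrinsics can be sketched per scalar as follows. This assumes the LSB-relative bit numbering and the -1 "not found" result used by SPIR-V's `FindUMsb`/`FindSMsb`. The `ref_*` names are hypothetical, and the elementwise (vector) behaviour is not modelled.

```c
#include <stdint.h>

/* Unsigned flavour: index (LSB = 0) of the most significant 1 bit,
   or -1 when x == 0. */
int32_t ref_firstbituhigh(uint32_t x) {
  for (int32_t i = 31; i >= 0; --i)
    if ((x >> i) & 1u)
      return i;
  return -1;
}

/* Signed flavour: for negative inputs, look for the most significant
   0 bit instead, i.e. the first bit that differs from the sign bit. */
int32_t ref_firstbitshigh(int32_t x) {
  uint32_t bits = (uint32_t)x;
  if (x < 0)
    bits = ~bits; /* turn "first 0 bit" into "first 1 bit" */
  return ref_firstbituhigh(bits);
}
```

Under these assumptions both 0 and -1 yield -1 in the signed flavour, since neither has a bit that differs from its sign bit.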
2024-11-05 | clang/AMDGPU: Emit grid size builtins with range metadata (#113038) | Matt Arsenault | 1 | -0/+6
These cannot be 0.
2024-11-05 | [CUDA] Add support for __grid_constant__ attribute (#114589) | Artem Belevich | 1 | -7/+29
LLVM support for the attribute has been implemented already, so this patch just plumbs it through to the CUDA front-end. One notable difference from NVCC is that the attribute can be used regardless of the targeted GPU. On older GPUs it will just be ignored. The attribute is a performance hint, and does not warrant a hard error if the compiler can't benefit from it on a particular GPU variant.
2024-11-05 | [HLSL][SPIRV][DXIL] Implement `dot4add_i8packed` intrinsic (#113623) | Finn Plummer | 2 | -1/+12
- create a clang built-in in Builtins.td
- link dot4add_i8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OpSDot is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpSDot in applicable SPIRV version or if SPV_KHR_integer_dot_product is enabled
- add dot4add_i8packed intrinsic to IntrinsicsDirectX.td and mapping to DXIL.td op Dot4AddI8Packed
- add tests for HLSL intrinsic lowering to dx/spv intrinsic in dot4add_i8packed.hlsl
- add tests for sema checks in dot4add_i8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_i8packed.ll
- add test to dxil lowering in DirectX/dot4add_i8packed.ll

Resolves #99220
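
A hedged scalar sketch of the signed variant follows. It assumes the HLSL definition of `dot4add_i8packed`: four signed 8-bit lanes per 32-bit operand and a signed accumulator. The helper name is hypothetical, and the sketch assumes the running sum stays within `int32_t`.

```c
#include <stdint.h>

/* Hypothetical reference helper: treat a and b as four packed signed
   8-bit lanes, multiply lane-wise, and add the products to acc.
   Assumes the running sum does not overflow int32_t. */
int32_t ref_dot4add_i8packed(uint32_t a, uint32_t b, int32_t acc) {
  for (int i = 0; i < 4; ++i) {
    int32_t ai = (int32_t)((a >> (8 * i)) & 0xFFu);
    int32_t bi = (int32_t)((b >> (8 * i)) & 0xFFu);
    if (ai >= 128) ai -= 256; /* sign-extend lane from 8 bits */
    if (bi >= 128) bi -= 256;
    acc += ai * bi;
  }
  return acc;
}
```

The only difference from an unsigned lane-wise dot product is the sign extension: the bit pattern `0x80808080` dotted with `0x01010101` yields -512 here, whereas reading the lanes as unsigned would give 512.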
2024-11-05 | Remove leftover uses of llvm::Type::getPointerTo() (#114993) | Youngsuk Kim | 1 | -16/+12
`llvm::Type::getPointerTo()` is to be deprecated. Replace remaining uses of it.
2024-11-04 | [ubsan] Suppression by type for `-fsanitize=enum` (#114754) | Vitaly Buka | 1 | -0/+4
Similar to #107332.
2024-11-04 | [HLSL][SPIRV] Add HLSL type translation for spirv. (#114273) | Steven Perron | 1 | -0/+79
This commit partially implements SPIRTargetCodeGenInfo::getHLSLType. It can now generate the spirv type for the following HLSL types:
1. RWBuffer
2. Buffer
3. Sampler

---------

Co-authored-by: Nathan Gauër <github@keenuts.net>
2024-11-03 | [Clang] Implement labelled type filtering for overflow/truncation sanitizers w/ SSCLs (#107332) | Justin Stitt | 1 | -4/+32
[Related RFC](https://discourse.llvm.org/t/rfc-support-globpattern-add-operator-to-invert-matches/80683/5?u=justinstitt)

### Summary

Implement type-based filtering via [Sanitizer Special Case Lists](https://clang.llvm.org/docs/SanitizerSpecialCaseList.html) for the arithmetic overflow and truncation sanitizers.

Currently, using the `type:` prefix with these sanitizers does nothing. I've hooked up the SSCL parsing with Clang codegen so that we don't emit the overflow/truncation checks if the arithmetic contains an ignored type.

### Usefulness

You can craft ignorelists that ignore specific types that are expected to overflow or wrap-around. For example, to ignore `my_type` from `unsigned-integer-overflow` instrumentation:

```bash
$ cat ignorelist.txt
[unsigned-integer-overflow]
type:my_type=no_sanitize

$ cat foo.c
#include <limits.h>
typedef unsigned long my_type;
void foo() {
  my_type a = ULONG_MAX;
  ++a;
}

$ clang foo.c -fsanitize=unsigned-integer-overflow -fsanitize-ignorelist=ignorelist.txt ; ./a.out
// --> no sanitizer error
```

If a type is functionally intended to overflow, like [refcount_t](https://kernsec.org/wiki/index.php/Kernel_Protections/refcount_t) and its associated APIs in the Linux kernel, then this type filtering would prove useful for reducing sanitizer noise. Currently, the Linux kernel deals with this by [littering](https://elixir.bootlin.com/linux/v6.10.8/source/include/linux/refcount.h#L139) `__attribute__((no_sanitize("signed-integer-overflow")))` annotations on all the `refcount_t` APIs. I think this serves as an example of how a codebase could be made cleaner. We could make custom types that are filtered out in an ignorelist, allowing for types to be more expressive -- without the need for annotations. This accomplishes a similar goal to https://github.com/llvm/llvm-project/pull/86618.

Yet another use case for this type filtering is whitelisting. We could ignore _all_ types, save a few.

```bash
$ cat ignorelist.txt
[implicit-signed-integer-truncation]
type:*=no_sanitize # ignore literally all types
type:short=sanitize # except `short`

$ cat bar.c
// compile with -fsanitize=implicit-signed-integer-truncation
void bar(int toobig) {
  char a = toobig;  // not instrumented
  short b = toobig; // instrumented
}
```

### Other ways to accomplish the goal of sanitizer allowlisting/whitelisting

* ignore list SSCL type support (this PR that you're reading)
* [my sanitize-allowlist branch](https://github.com/llvm/llvm-project/compare/main...JustinStitt:llvm-project:sanitize-allowlist) - this just implements a sibling flag `-fsanitize-allowlist=`, removing some of the double negative logic present with `skip`/`ignore` when trying to whitelist something.
* [Glob Negation](https://discourse.llvm.org/t/rfc-support-globpattern-add-operator-to-invert-matches/80683) - Implement a negation operator to the GlobPattern class so the ignorelist query can use them to simulate allowlisting

Please let me know which of the three options we like best. They are not necessarily mutually exclusive.

Here's [another related PR](https://github.com/llvm/llvm-project/pull/86618) which implements a `wraps` attribute. This can accomplish a similar goal to this PR but requires in-source changes to codebases and also covers a wider variety of integer definedness problems.

### CCs

@kees @vitalybuka @bwendling

---------

Signed-off-by: Justin Stitt <justinstitt@google.com>
2024-11-03 | [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (#114577) | Shilei Tian | 1 | -10/+12
2024-11-03 | [PassBuilder] Add `ThinOrFullLTOPhase` to early simplification EP call backs (#114547) | Shilei Tian | 1 | -1/+2
The early simplification pipeline is used in the non-LTO and (Thin/Full)LTO pre-link stages. There are some passes that we want to run in non-LTO mode, but not at the LTO pre-link stage. The control is missing currently. This PR adds the support. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in the pre-link stage causes some issues.
2024-11-01 | [X86][AMX] Support AMX-TRANSPOSE (#113532) | Phoebe Wang | 1 | -0/+52
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-31 | [RISCV] Pull __builtin_riscv_clz/ctz out of a nested switch. NFC | Craig Topper | 1 | -22/+19
The nested switch exists to share setting IntrinsicsTypes to {ResultType}. clz/ctz return before we reach that so they can just be in the top level switch.
2024-10-31 | Fix MSVC "signed/unsigned mismatch" warning. NFC. | Simon Pilgrim | 1 | -1/+1
2024-10-31 | [AMDGPU] Allow overload of __builtin_amdgcn_mov_dpp8 (#113610) | Stanislav Mekhanoshin | 1 | -6/+10
The same handling as for __builtin_amdgcn_mov_dpp.
2024-10-30 | [llvm] Allow always dropping all llvm.type.test sequences | Paul Kirth | 1 | -3/+4
Currently, the `DropTypeTests` parameter only fully works with phi nodes and llvm.assume instructions. However, we'd like CFI to work in conjunction with FatLTO, insofar as the bitcode section should be able to contain the CFI instrumentation, while any incompatible bits are dropped when compiling the object code. To do that, we need to drop the llvm.type.test instructions everywhere, and not just their uses in phi nodes. This patch updates the LowerTypeTest pass so that uses are removed, and replaced with `true` in all cases, and not just in phi nodes. Addressing this will allow us to fix #112053 by modifying the FatLTO pipeline.

Reviewers: pcc, nikic
Reviewed By: pcc
Pull Request: https://github.com/llvm/llvm-project/pull/112787
2024-10-30 | [HLSL] Remove old resource annotations for UAVs and SRVs (#114139) | Helena Kotas | 1 | -0/+10
UAVs and SRVs have already been converted to use LLVM target types, so we can disable generation of the `!hlsl.uavs` and `!hlsl.srvs` annotations. This will enable adding tests for structured buffers with user-defined types, which the old resource annotation code does not handle (it crashes).

Part 1 of #114126
2024-10-30 | [clang] Remove some uses of llvm::StructType::setBody. NFC. (#113691) | Jay Foad | 4 | -44/+37
It is simple to create the struct body up front, now that we have transitioned to opaque pointers.
2024-10-30 | [C++20] [Modules] Fix the duplicated static initializer problem (#114193) | Chuanqi Xu | 1 | -2/+2
Reproducer:
```
//--- a.cppm
export module a;
int func();
static int a = func();

//--- a.cpp
import a;
```

The `func()` should only execute once. However, before this patch we would incorrectly import `static int a` from a.cppm and initialize it again. This is very bad and can introduce serious runtime misbehavior. Surprisingly, the root cause of the problem looks to be simply an oversight in choosing APIs.
2024-10-29 | [Clang][RISCV] Support -fcf-protection=return for RISC-V (#112477) | Jesse Huang | 1 | -2/+5
Enables support for `-fcf-protection=return` on RISC-V, which requires Zicfiss. It also adds a string attribute "hw-shadow-stack" to every function if the option is set on RISC-V.
2024-10-28 | Adding splitdouble HLSL function (#109331) | joaosaffran | 4 | -12/+107
- Adding hlsl `splitdouble` intrinsics
- Adding DXIL lowering
- Adding SPIRV lowering
- Adding test

Fixes: #108901

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
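
As an illustration of what a `splitdouble`-style operation produces, the C sketch below splits a double into its low and high 32-bit halves. It assumes an IEEE-754 binary64 `double`. The helper name and out-parameter names are hypothetical, and the vector forms of the HLSL function are not modelled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helper: copy the 64-bit representation of
   `value` and return its low and high 32-bit halves. */
void ref_splitdouble(double value, uint32_t *lowbits, uint32_t *highbits) {
  uint64_t bits;
  memcpy(&bits, &value, sizeof bits); /* well-defined way to reinterpret */
  *lowbits = (uint32_t)(bits & 0xFFFFFFFFu);
  *highbits = (uint32_t)(bits >> 32);
}
```

For `1.0`, whose binary64 encoding is `0x3FF0000000000000`, the low half is 0 and the high half is `0x3FF00000`.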
2024-10-28 | [HLSL][SPIRV] Add convergence tokens to entry point wrapper (#112757) | Steven Perron | 2 | -4/+38
Inlining currently assumes that either all functions use controlled convergence or none of them do. This is why we need to have the entry point wrapper use controlled convergence.

https://github.com/llvm/llvm-project/blob/c85611e8583e6392d56075ebdfa60893b6284813/llvm/lib/Transforms/Utils/InlineFunction.cpp#L2431-L2439
2024-10-28 | Remove support for RenderScript (#112916) | Aaron Ballman | 5 | -51/+0
See https://discourse.llvm.org/t/rfc-deprecate-and-eventually-remove-renderscript-support/81284 for the RFC
2024-10-28 | [Clang][AArch64] Fix Pure Scalable Types argument passing and return (#112747) | Momchil Velikov | 2 | -66/+393
Pure Scalable Types are defined in AAPCS64 here: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#pure-scalable-types-psts

And should be passed according to Rule C.7 here: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules

This part of the ABI is completely unimplemented in Clang; instead it treats PSTs sometimes as HFAs/HVAs, sometimes as general composite types.

This patch implements the rules for passing PSTs by employing the `CoerceAndExpand` method and extending it to:
* allow array types in the `coerceToType`; now only `[N x i8]` are considered padding.
* allow mismatch between the elements of the `coerceToType` and the elements of the `unpaddedCoerceToType`; AArch64 uses this to map fixed-length vector types to SVE vector types.

Correctly passing a PST argument needs a decision in Clang about whether to pass it in memory or registers or, equivalently, whether to use the `Indirect` or `Expand/CoerceAndExpand` method. It was considered relatively harder (or not practically possible) to make that decision in the AArch64 backend. Hence this patch implements the register counting from AAPCS64 (cf. `NSRN`, `NPRN`) to guide Clang's decision.
2024-10-28 | Fix MSVC "signed/unsigned mismatch" warnings. NFC. | Simon Pilgrim | 1 | -1/+1
2024-10-27 | [NVPTX] Remove nvvm.ldg.global.* intrinsics (#112834) | Alex MacLean | 1 | -17/+30
Remove these intrinsics which can be better represented by load instructions with `!invariant.load` metadata:
- llvm.nvvm.ldg.global.i
- llvm.nvvm.ldg.global.f
- llvm.nvvm.ldg.global.p
2024-10-26 | [rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to sanitize_realtime_blocking (#113155) | davidtrevelyan | 1 | -1/+1
# What

This PR renames the newly-introduced llvm attribute `sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise, sibling variables such as `SanitizeRealtimeUnsafe` are renamed to `SanitizeRealtimeBlocking` respectively. There are no other functional changes.

# Why?

- There are a number of problems that can cause a function to be real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net - which may lead to future implementations that need extra contextual information passed through them in order to communicate meaningful reasons to users.
- We want to avoid this situation and make the runtime library boundary API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like `sanitize_realtime_blocking` is an effective means of doing so.

We also feel that the symmetry between `[[clang::blocking]]` and `sanitize_realtime_blocking` is easier to follow as a developer.

# Concerns

- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been part of the tree for a few weeks now (introduced here: https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't been released in version 20 yet, am I correct in considering this to not be a breaking change?
2024-10-25 | [AMDGPU] Add a type for the named barrier (#113614) | Gang Chen | 2 | -0/+11
2024-10-25 | [CLANG][AArch64] Add the modal 8 bit floating-point scalar type (#97277) | CarolineConcatto | 1 | -0/+7
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsics.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3 8-bit floating-point formats. It is a storage and interchange only type with no arithmetic operations other than intrinsic calls.
```

The type should be an opaque type and its format is undefined in Clang. It is only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to add the mfloat8_t scalar type. It has a parser and codegen for the new scalar type.

The patch lowers the type to an 8-bit unsigned integer, as it has no format. But maybe we should add another opaque type.

[1] https://github.com/ARM-software/acle/pull/323
2024-10-25 | [OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533) | Sergio Afonso | 3 | -38/+97
This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected` objects to replace the values returned by callbacks passed to `OMPIRBuilder` codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace. This prevents a failed callback from leaving the IR in an invalid state while the codegen process continues, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue to always return 'success'.
2024-10-25 | [Clang] Always forward sret parameters to musttail calls | Kiran | 1 | -1/+5
If a call using the musttail attribute returns its value through an sret argument pointer, we must forward an incoming sret pointer to it, instead of creating a new alloca. This is always possible because the musttail attribute requires the caller and callee to have the same argument and return types.
2024-10-24 | [clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399) | Jay Foad | 19 | -82/+75
Follow up to #109133.
2024-10-23 | [CLANG][AArch64] Add Neon vectors for mfloat8_t (#99865) | CarolineConcatto | 1 | -0/+2
This patch adds these new vector sizes for neon: mfloat8x16_t and mfloat8x8_t, according to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
2024-10-23 | [flang][OpenMP] Support `target enter|update|exit .. nowait` (#113305) | Kareem Ergawy | 1 | -2/+2
Extends `nowait` support for other device directives. This PR refactors the task generation utils used for the `target` directive so that they are general enough to be reused for other device directives as well.
2024-10-23 | [AMDGPU] Add a new target for gfx1153 (#113138) | Carl Ritson | 1 | -0/+1
2024-10-23 | [clang codegen] avoid crashing when emitting init func for global variable with flexible array init (#113336) | Congcong Cai | 1 | -3/+5
Fixes: #113187

Avoid creating an init function, since clang does not support global variables with flexible array init. Creating one causes an assertion failure later.
2024-10-22 | [TBAA] Extend pointer TBAA to pointers of non-builtin types. (#110569) | Florian Hahn | 2 | -18/+26
Extend the logic added in 123c036bd361d (https://github.com/llvm/llvm-project/pull/76612) to support pointers to non-builtin types by using the mangled name of the canonical type. PR: https://github.com/llvm/llvm-project/pull/110569
2024-10-22 | [clang][HIP] Don't use the OpenCLKernel CC when targeting AMDGCNSPIRV (#110447) | Alex Voicu | 1 | -2/+8
When compiling HIP source for AMDGCN flavoured SPIR-V that is expected to be consumed by the ROCm HIP RT, it's not desirable to set the OpenCL Kernel CC on `__global__` functions. On one hand, this is not an OpenCL RT, so it doesn't compose with e.g. OCL specific attributes. On the other it is a "noisy" CC that carries semantics, and breaks overload resolution when using [generic dispatchers such as those used by RAJA](https://github.com/LLNL/RAJAPerf/blob/186d4194a5719788ae96631c923f9ca337f56970/src/common/HipDataUtils.hpp#L39).
2024-10-22 | [clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS when `generic` is available (#112442) | Alex Voicu | 2 | -3/+11
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use `private` as the default address space. This is wrong for cases where the `generic` address space is available, and is corrected via this patch. In general, this AS map abuse is a bad hack and we should re-work it altogether, but at least after this patch we will stop being incorrect for e.g. OpenCL 2.0.
2024-10-22 | [clang codegen] fix crash emitting __array_rank (#113186) | Congcong Cai | 1 | -1/+1
Fixed: #113044

The type of `ArrayTypeTraitExpr` can be changed, so using i32 directly is incorrect.

---------

Co-authored-by: Eli Friedman <efriedma@quicinc.com>