path: root/clang/test/CodeGen
Age | Commit message | Author | Files | Lines
2024-05-01 | [clang codegen] Fix MS ABI detection of user-provided constructors. (#90151) [tag: llvmorg-18.1.5] | Eli Friedman | 1 | -0/+15
In the context of determining whether a class counts as an "aggregate", a constructor template counts as a user-provided constructor. Fixes #86384 (cherry picked from commit 3ab4ae9e58c09dfd8203547ba8916f3458a0a481)
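The language-level rule behind this fix can be checked in portable C++17. The sketch below is only an illustration of the "aggregate" notion, not the MS ABI code path the patch touches; the type names `Plain` and `WithCtorTemplate` are hypothetical.

```cpp
#include <type_traits>

// Hypothetical types, for illustration only.
struct Plain {
  int x;
};

struct WithCtorTemplate {
  // A constructor template is a user-provided constructor,
  // so this class is not an aggregate.
  template <typename T>
  WithCtorTemplate(T value);
  int x;
};

constexpr bool plainIsAggregate = std::is_aggregate_v<Plain>;
constexpr bool templatedIsAggregate = std::is_aggregate_v<WithCtorTemplate>;

static_assert(plainIsAggregate, "no constructors: aggregate");
static_assert(!templatedIsAggregate, "constructor template: not an aggregate");
```

The constructor template is only declared, never defined or called; the declaration alone is enough to change the classification.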
2024-04-29 | [Clang] Handle structs with inner structs and no fields (#89126) | Bill Wendling | 2 | -0/+61
A struct that declares an inner struct, but no fields, won't have a field count. So getting the offset of the inner struct fails. This happens in both C and C++: struct foo { struct bar { int Quantizermatrix[]; }; }; Here 'struct foo' has no fields. Closes: https://github.com/llvm/llvm-project/issues/88931
2024-03-27 | [clang][CodeGen] Allow `memcpy` replace with trivial auto var init | Antonio Frighetto | 3 | -24/+12
When emitting the storage (or memory copy operations) for constant initializers, the decision whether to split a constant structure or array store into a sequence of field stores or to use `memcpy` is based upon the optimization level and the size of the initializer. In afe8b93ffdfef5d8879e1894b9d7dda40dee2b8d, we extended this by allowing constants to be split when the array (or struct) type does not match the type of data the address to the object (constant) is expected to contain. This may happen when `emitStoresForConstant` is called by `EmitAutoVarInit`, as the element type of the address gets shrunk. When this occurs, let the initializer be split into a bunch of stores only under `-ftrivial-auto-var-init=pattern`. Fixes: https://github.com/llvm/llvm-project/issues/84178.
2024-03-12 | [Clang][LoongArch] Fix wrong return value type of __iocsrrd_h (#84100) | wanglei | 2 | -8/+8
Related: https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645016.html (cherry picked from commit 2f479b811274fede36535e34ecb545ac22e399c3)
2024-03-12 | [Clang][LoongArch] Precommit test for fix wrong return value type of __iocsrrd_h. NFC | wanglei | 2 | -10/+40
(cherry picked from commit aeda1a6e800e0dd6c91c0332b4db95094ad5b301)
2024-03-11 | [clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (#83159) | Paul Kirth | 1 | -1/+20
In addition to being rather hard to follow, there isn't a good reason why FatLTO shouldn't just share the same code for setting module flags for (Thin)LTO. This patch simplifies the logic and makes sure we set these flags in a consistent way, independent of FatLTO. Additionally, we now test that output in the .llvm.lto section actually matches the output from Full and Thin LTO compilation. (cherry picked from commit 7d8b50aaab8e0f935e3cb1f3f397e98b9e3ee241)
2024-02-27 | MIPS: Fix asm constraints "f" and "r" for softfloat (#79116) [tags: llvmorg-18.1.0-rc4, llvmorg-18.1.0] | YunQiang Su | 1 | -0/+11
This includes 2 fixes: 1. Disallow 'f' for softfloat. 2. Allow 'r' for softfloat. Currently, 'f' is accepted by clang, then LLVM meets an internal error. 'r' is rejected by LLVM by: couldn't allocate input reg for constraint 'r'. Fixes: #64241, #63632 --------- Co-authored-by: Fangrui Song <i@maskray.me> (cherry picked from commit c88beb4112d5bbf07d76a615ab7f13ba2ba023e6)
2024-02-26 | [AArch64] Make +pauth enabled in Armv8.3-a by default (#78027) | Anatoly Trosinenko | 1 | -5/+5
Add AEK_PAUTH to ARMV8_3A in TargetParser and let it propagate to ARMV8R, as it aligns with GCC defaults. After adding AEK_PAUTH, several tests from TargetParserTest.cpp crashed when trying to format an error message, thus update a format string in AssertSameExtensionFlags to account for bitmask being pre-formatted as std::string. The CHECK-PAUTH* lines in aarch64-target-features.c are updated to account for the fact that FEAT_PAUTH support and pac-ret can be enabled independently and all four combinations are possible. (cherry picked from commit a52eea66795018550e95c4b060165a7250899298)
2024-02-15 | [AArch64][SME] Implement inline-asm clobbers for za/zt0 (#79276) | Matthew Devereau | 1 | -0/+8
This enables specifying "za" or "zt0" in the clobber list for inline asm. This complies with the ACLE SME addition to the asm extension here: https://github.com/ARM-software/acle/pull/276 (cherry picked from commit d9c20e437fe110fb79b5ca73a52762e5b930b361)
2024-02-09 | [Clang][AArch64] Fix some target guards and remove +sve from tests. (#80681) | Sander de Smalen | 48 | -216/+216
The TargetGuard fields for 'svldr[_vnum]_za' and 'svstr[_vnum]_za' were incorrectly set to `+sve` instead of `+sme`. This means that compiling code that uses these intrinsics requires compiling for both `+sve` as well as `+sme`.

This PR also fixes the target guards for the `svadd` and `svsub` builtins that are enabled under `+sme2,+sme-i16i64` and `+sme2,+sme-f64f64`, as it initially did the following:

```
let TargetGuard = "+sme2" in {
  let TargetGuard = "+sme-i16i64" in {
    // Builtins defined here will be predicated only by
    // '+sme-i16i64', and not '+sme2,+sme-i16i64'.
  }
}
```

This PR also removes `-target-feature +sve` from all the SME tests, to ensure that the SME features are sufficient to build the tests. (cherry picked from commit 3d186a77cf1aa979014a6443cb423a633c167d9f)
2024-01-31 | [RISCV] Support __riscv_v_fixed_vlen for vbool types. (#76551) | Craig Topper | 6 | -4/+809
This adopts a similar behavior to AArch64 SVE, where bool vectors are represented as a vector of chars with 1/8 the number of elements. This ensures the vector always occupies a power of 2 number of bytes. A consequence of this is that vbool64_t, vbool32_t, and vbool16_t can only be used with a vector length that guarantees at least 8 bits.
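The "at least 8 bits" constraint can be worked through with a little arithmetic. The sketch below assumes that `vbool<N>_t` holds VLEN / N mask bits (an assumption not stated in the commit message); the helper names `maskBits` and `fixedMaskBytes` are hypothetical and are not part of clang.

```cpp
#include <cstddef>

// Assumption: vbool<ratio>_t holds VLEN / ratio mask bits.
constexpr std::size_t maskBits(std::size_t vlen, std::size_t ratio) {
  return vlen / ratio;
}

// Size in bytes of the char-vector representation (one eighth as many
// elements as mask bits). Returns 0 when fewer than 8 mask bits are
// available, i.e. when the type cannot be used at this vector length.
constexpr std::size_t fixedMaskBytes(std::size_t vlen, std::size_t ratio) {
  return maskBits(vlen, ratio) >= 8 ? maskBits(vlen, ratio) / 8 : 0;
}
```

Under that assumption, a 128-bit vector length gives vbool8_t two bytes and vbool16_t one byte, while vbool32_t and vbool64_t would need 256 and 512 bits respectively.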
2024-01-30 | Backport [RISCV] Graduate Zicond to non-experimental (#79811) (#80018) | Alex Bradbury | 1 | -1/+1
The Zicond extension was ratified in the last few months, with no changes that affect the LLVM implementation. Although there's surely more tuning that could be done about when to select Zicond or not, there are no known correctness issues. Therefore, we should mark support as non-experimental. (cherry-picked from commit d833b9d677c9dd0a35a211e2fdfada21ea9a464b)
2024-01-26 | [Driver,CodeGen] Support -mtls-dialect= (#79256) | Fangrui Song | 1 | -0/+14
GCC supports -mtls-dialect= for several architectures to select TLSDESC. This patch supports the following values:
* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)

AArch64 toolchains seem to support TLSDESC from the beginning, and the general dynamic model has poor support. Nobody seems to use the option -mtls-dialect= at all, so we don't bother with it. There also seems very little interest in AArch32's TLSDESC support.

TLSDESC does not change IR, but affects object file generation. Without a backend option the option is a no-op for in-process ThinLTO. There seems no motivation to have fine-grained control mixing trad/desc for TLS, so we just pass -mllvm, and don't bother with a modules flag metadata or function attribute.

Co-authored-by: Paul Kirth <paulkirth@google.com> (cherry picked from commit 36b4a9ccd9f7e04010476e6b2a311f2052a4ac20)
2024-01-23 | [clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061) | Paul Kirth | 1 | -18/+20
Currently, the UnifiedLTO pipeline seems to have trouble with several LTO features, like SplitLTO units, which means we cannot use important optimizations like Whole Program Devirtualization or security hardening instrumentation like CFI. This patch reverts FatLTO to using distinct pipelines for Full LTO and ThinLTO. It still avoids module cloning, since that was error prone.
2024-01-23 | [NFC] Size and element numbers are often swapped when calling calloc (#79081) | AtariDreams | 1 | -2/+2
gcc-14 will now throw a warning if size and elements are swapped.
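For reference, `calloc` takes the element count first and the element size second. A minimal sketch of the conventional order follows; the helper name `allocZeroedInts` is hypothetical and the precondition is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: allocate `count` zero-initialised ints.
// The caller releases the memory with std::free.
int *allocZeroedInts(std::size_t count) {
  assert(count > 0 && "illustrative precondition only");
  // calloc(number_of_elements, size_of_each_element).
  // calloc(sizeof(int), count) would request the same number of bytes,
  // but it is the swapped form that the warning is about.
  return static_cast<int *>(std::calloc(count, sizeof(int)));
}
```

Both orders request the same amount of memory, which is why the swap tends to go unnoticed until a compiler flags it.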
2024-01-23 | [clang] Add missing streaming attributes to SVE builtins (#79134) | Sam Tebbs | 1 | -34/+40
This patch adds `IsStreamingCompatible` or `IsStreamingOrSVE2p1` to the SVE builtins that missed them.
2024-01-23 | [Clang][AArch64] Add diagnostics for builtins that use ZT0. (#79140) | Sander de Smalen | 8 | -55/+55
Similar to what we did for ZA, this patch adds diagnostics to flag when using a ZT0 builtin in a function that does not have ZT0 state.
2024-01-23 | [AArch64][FMV] Support feature MOPS in Function Multi Versioning. (#78788) | Alexandros Lamprineas | 1 | -15/+15
The patch adds support for FEAT_MOPS (Memory Copy and Memory Set instructions) in Function Multi Versioning. The bits [19:16] of the system register ID_AA64ISAR2_EL1 indicate whether FEAT_MOPS is implemented in AArch64 state. This information is accessible via ELF hwcaps.
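The field extraction described above amounts to a shift and a mask. The sketch below takes the raw register value as a parameter rather than reading ID_AA64ISAR2_EL1 itself; the helper names are hypothetical, and treating any non-zero field value as "implemented" is an assumption.

```cpp
#include <cstdint>

// Extract bits [19:16] from a raw ID_AA64ISAR2_EL1 value.
constexpr std::uint64_t mopsField(std::uint64_t isar2) {
  return (isar2 >> 16) & 0xF;
}

// Assumption: any non-zero value in the field means FEAT_MOPS is implemented.
constexpr bool hasMops(std::uint64_t isar2) {
  return mopsField(isar2) != 0;
}
```

In user space the same information would normally come from the hwcaps mentioned in the commit message rather than from the register directly.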
2024-01-23 | [ARM] Introduce the v9.5-A architecture version to Arm targets (#78994) | Lucas Duarte Prates | 1 | -0/+2
This introduces the Armv9.5-A architecture version to the Arm backend, following on from the existing implementation for AArch64 targets. More details about the Armv9.5-A architecture version can be found at:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
* https://developer.arm.com/documentation/ddi0602/2023-09/
2024-01-23 | [AMDGPU] Change default AMDHSA Code Object version to 5 (#79038) | Saiyedul Islam | 1 | -1/+1
Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Corresponding llvm-objdump AMDGPU lit tests are updated in a follow-up PR.
2024-01-23 | [Clang] Amend SME attributes with support for ZT0. (#77941) | Sander de Smalen | 1 | -0/+57
This patch builds on top of #76971 and implements support for:
* __arm_new("zt0")
* __arm_in("zt0")
* __arm_out("zt0")
* __arm_inout("zt0")
* __arm_preserves("zt0")
2024-01-23 | [LoongArch] Add definitions and feature 'frecipe' for FP approximation intrinsics/builtins (#78962) | Ami-zhang | 7 | -0/+261
This PR adds definitions and the 'frecipe' feature for FP approximation intrinsics/builtins. In addition, this adds and complements related test cases.
2024-01-22 | Fix a bug in implementation of Smith's algorithm used in complex div. (#78330) | Zahira Ammarguellat | 2 | -0/+75
This patch fixes a bug in Smith's algorithm (thanks to @andykaylor who detected it) and makes sure that the last option on the command line takes precedence.
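For readers unfamiliar with it, Smith's algorithm divides complex numbers by scaling with the ratio of the smaller denominator component to the larger one, which avoids forming c*c + d*d directly. The sketch below is the textbook formulation, not clang's emitted code and not the specific bug fixed here; `Complex`, `absValue` and `smithDivide` are hypothetical names, and a zero denominator is not handled.

```cpp
// Hypothetical value type for illustration.
struct Complex {
  double re;
  double im;
};

constexpr double absValue(double v) { return v < 0 ? -v : v; }

// Computes (a + bi) / (c + di) with Smith's method.
constexpr Complex smithDivide(Complex num, Complex den) {
  if (absValue(den.re) >= absValue(den.im)) {
    // |c| >= |d|: scale by r = d / c.
    const double r = den.im / den.re;
    const double scale = den.re + den.im * r;
    return Complex{(num.re + num.im * r) / scale,
                   (num.im - num.re * r) / scale};
  }
  // |c| < |d|: scale by r = c / d.
  const double r = den.re / den.im;
  const double scale = den.re * r + den.im;
  return Complex{(num.re * r + num.im) / scale,
                 (num.im * r - num.re) / scale};
}
```

As a check, (1 + 2i) / (3 + 4i) takes the second branch and (1 + 2i) / (4 + 3i) takes the first; the expected quotients are 0.44 + 0.08i and 0.4 + 0.2i.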
2024-01-22 | [AArch64][Clang] Fix linker error for function multiversioning (#74358) | Dani | 2 | -201/+535
AArch64 part of https://github.com/llvm/llvm-project/pull/71706. The default version is now mangled with .default. The resolver for the TargetVersion needs to be emitted from CodeGenModule::EmitMultiVersionFunctionDefinition.
2024-01-22 | [AArch64][SME] Take arm_sme.h out of draft (#78961) | Matthew Devereau | 60 | -60/+60
2024-01-22 | [AArch64][SME2] Refine fcvtu/fcvts/scvtf/ucvtf (#77947) | Matthew Devereau | 1 | -16/+16
Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs. Use llvm_anyvector_ty for both multi vector returns and operands, therefore the return and operands can be specified in the intrinsic call, e.g. @llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
2024-01-22 | [MTE] Disable all MTE protection of globals in sections (#78443) | Mitch Phillips | 1 | -0/+8
Previous work in this area (#70186) disabled MTE in constructor sections. Looks like I missed one, ".preinit_array". Also, in the meantime, I found an exciting feature in the linker: globals placed into an explicit section whose name is a valid C identifier also get implicit '__start_<sectionname>' and '__stop_<sectionname>' symbols. This is convenient for iterating over some globals, but of course iteration over differently-tagged globals in MTE explodes. Thus, disable MTE globals for anything that has a section.
2024-01-20 | Warning for incorrect use of 'pure' attribute (#78200) | kelbon | 2 | -6/+6
This adds a warning when applying the `pure` attribute along with the `const` attribute, or when applying the `pure` attribute to a function with a `void` return type (including constructors and destructors). Fixes https://github.com/llvm/llvm-project/issues/77482
2024-01-18 | [clang] Fix parenthesized list initialization of arrays not working with `new` (#76976) | Alan Zhao | 1 | -0/+57
This bug is caused by parenthesized list initialization not being implemented in `CodeGenFunction::EmitNewArrayInitializer(...)`. Parenthesized list initialization of `struct`s with `operator new` already works in Clang and is not affected by this bug. Additionally, fix the test new-delete.cpp as it incorrectly assumes that using parentheses with operator new to initialize arrays is illegal for C++ versions >= C++17. Fixes #68198
2024-01-18 | [RISCV] Use regexp to check negative extensions in test. NFC | Luke Lau | 1 | -4/+5
Every time an extension is added, this test needs to have the negative extension appended to multiple CHECK lines where we're overriding the arch. This is quite time-consuming since it needs to be in the right order, so this replaces the explicit list of negative extensions with a regexp instead.
2024-01-18 | [Clang][SME] Add missing IsStreamingCompatible flag to svget, svcreate & svset (#78430) | Kerry McLaughlin | 24 | -150/+308
2024-01-18 | [X86] Support "f16c" and "avx512fp16" for __builtin_cpu_supports (#78384) | Freddy Ye | 1 | -0/+2
This resolves issue #65320. This also supports clarify sapphirerapids and cooperlake for cpu_specific/dispatch.
2024-01-16 | [X86] Add "Ws" constraint and "p" modifier for symbolic address/label reference (#77886) | Fangrui Song | 1 | -0/+11
Printing the raw symbol is useful in inline asm (e.g. getting the C++ mangled name, referencing a symbol in a custom way while ensuring it is not optimized out even if internal). Similar constraints are available in other targets (e.g. "S" for aarch64/riscv, "Cs" for m68k).

```
namespace ns { extern int var, a[4]; }
void foo() {
  asm(".pushsection .xxx,\"aw\"; .dc.a %p0; .popsection" :: "Ws"(&ns::var));
  asm(".reloc ., BFD_RELOC_NONE, %p0" :: "Ws"(&ns::a[3]));
}
```

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105576
2024-01-17 | [RISCV] Overwrite cpu target features for full arch string in target attribute (#77426) | Luke Lau | 1 | -4/+4
This patch reworks RISCVTargetInfo::initFeatureMap to fix the issue described in https://github.com/llvm/llvm-project/pull/74889#pullrequestreview-1773445559 (and is an alternative to #75804).

When a full arch string is specified, a "full" list of extensions is now passed after the __RISCV_TargetAttrNeedOverride marker feature, which includes any negative features that disable ISA extensions.

In initFeatureMap, there are now two code paths:
1. If the arch string was overridden, use the "full" list of override features, only adding back any non-ISA features that were specified. Using the full list of positive and negative features will mean that the target-cpu will have no effect on the final arch, e.g. __attribute__((target("arch=rv64i"))) with -mcpu=sifive-x280 will have the features for rv64i, not a mix of both.
2. Otherwise, parse and *append* the list of implied features. By appending, we turn back on any features that might have been disabled by a negative extension, i.e. this handles the case fixed in #74889.
2024-01-17 | [X86] Use vXi1 for `k` constraint in inline asm (#77733) | Phoebe Wang | 1 | -4/+4
Fixes #77172
2024-01-16 | Revert "[CloneFunction][DebugInfo] Avoid cloning DILocalVariables of inlined functions (#75385)" | Davide Italiano | 3 | -25/+21
This reverts commit fc6faa1113e9069f41b5500db051210af0eea843.
2024-01-16 | [Clang] Implement the 'counted_by' attribute (#76348) | Bill Wendling | 2 | -1/+1837
The 'counted_by' attribute is used on flexible array members. The argument for the attribute is the name of the field member holding the count of elements in the flexible array. This information is used to improve the results of the array bound sanitizer and the '__builtin_dynamic_object_size' builtin. The 'count' field member must be within the same non-anonymous, enclosing struct as the flexible array member. For example:

```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```

This example specifies that the flexible array member 'array' has the number of elements allocated for it in 'count':

```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```

This establishes a relationship between 'array' and 'count'; specifically that 'p->array' must have *at least* 'p->count' number of elements available. It's the user's responsibility to ensure that this relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements than what's specified by 'p->count'. This would result in an out-of-bounds access not being detected:

```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
  p->count = count + 42;
}
```

The next example updates 'p->count', breaking the relationship requirement that 'p->array' must have at least 'p->count' number of elements available:

```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```

In this example, an update to 'p->count' maintains the relationship requirement:

```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
2024-01-16 | [RISCV] CodeGen of RVE and ilp32e/lp64e ABIs (#76777) | Wang Pengcheng | 5 | -141/+434
This commit includes the necessary changes to clang and LLVM to support codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.

The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16 (x0-x15).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.

`RVE` can be combined with all current standard extensions.

The central changes in the ilp32e/lp64e ABI, compared to ilp32/lp64, are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A stack alignment of 32 bits (rather than 128 bits).
* ilp32e isn't compatible with the D ISA extension.

If `ilp32e` or `lp64e` is used with an ISA that has any of the registers x16-x31 and f0-f31, then these registers are considered temporaries.

To be compatible with the implementation of ilp32e in GCC, we don't use aligned registers to pass variadic arguments and set stack alignment to 4 bytes for types with length of 2*XLEN.

FastCC is also supported on RVE, while GHC isn't since there is only one available register.

Differential Revision: https://reviews.llvm.org/D70401
2024-01-16 | [Clang] Make sdot builtins available to SME (#77792) | Sander de Smalen | 1 | -6/+15
See the specification for more details: * https://github.com/ARM-software/acle/blob/main/main/acle.md#udot-sdot-fdot-vectors * https://github.com/ARM-software/acle/blob/main/main/acle.md#udot-sdot-fdot-indexed
2024-01-15 | Revert "[Clang] Implement the 'counted_by' attribute (#76348)" | Rashmi Mudduluru | 2 | -1837/+1
This reverts commit 164f85db876e61cf4a3c34493ed11e8f5820f968.
2024-01-15 | [TargetParser] Define AEK_FCMA and AEK_JSCVT for tsv110 (#75516) | Qi Hu | 1 | -6/+6
This patch defines AEK_JSCVT and AEK_FCMA for CPU features FEAT_JSCVT and FEAT_FCMA respectively, and adds them to the feature set of TSV110.
2024-01-15 | [Clang] Rename and enable boolean get, set, create and undef for sme2 (#77338) | Sam Tebbs | 7 | -48/+114
This patch renames the get, set, create and undef functions that deal with tuples of booleans to match the ACLE at https://github.com/ARM-software/acle/pull/257/files . It also enables them for SME2.
2024-01-15 | [Clang][AArch64] Change SME attributes for shared/new/preserved state. (#76971) | Sander de Smalen | 39 | -806/+806
This patch replaces the `__arm_new_za`, `__arm_shared_za` and `__arm_preserves_za` attributes with:
* `__arm_new("za")`
* `__arm_in("za")`
* `__arm_out("za")`
* `__arm_inout("za")`
* `__arm_preserves("za")`

as described in https://github.com/ARM-software/acle/pull/276.

One change is that `__arm_in/out/inout/preserves(S)` are all mutually exclusive, whereas previously it was fine to write `__arm_shared_za __arm_preserves_za`. This case is now represented with `__arm_in("za")`.

The current implementation uses the same LLVM attributes under the hood, since `__arm_in/out/inout` are all variations of "shared ZA", so can use the existing `aarch64_pstate_za_shared` attribute in LLVM.

#77941 will add support for the new "zt0" state as introduced with SME2.
2024-01-15 | [Clang][SME2] Fix PSEL builtin predicates (#77097) | Kerry McLaughlin | 2 | -90/+114
PSEL intrinsics which return a predicate-as-counter are available in SVE2p1 & SME2.
2024-01-15 | [PowerPC] Implement fence builtin (#76495) | Qiu Chaofan | 1 | -0/+24
2024-01-12 | [clang] Adjust -mlarge-data-threshold handling (#77958) | Arthur Eubanks | 1 | -1/+4
Make it apply to x86-64 medium and large code models since that's what the backend does. Limit logic to exclude x86-32. Default to 0, let the driver set it to 65536 for the medium code model if one is not passed. Set it to 0 for the large code model by default to match gcc and since some users make assumptions about the large code model that any small data will break.
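The defaulting rules above can be summarised as a small decision function. This is a sketch of the described behaviour only, not the driver's actual code; `CodeModel` and `effectiveThreshold` are hypothetical names, and returning 0 for the small code model is a simplifying assumption.

```cpp
#include <cstdint>

// Hypothetical enumeration of the x86-64 code models discussed above.
enum class CodeModel { Small, Medium, Large };

// Returns the large-data threshold that would be in effect.
// `userPassed` says whether -mlarge-data-threshold= was given explicitly.
constexpr std::uint64_t effectiveThreshold(CodeModel model, bool userPassed,
                                           std::uint64_t userValue) {
  if (model == CodeModel::Small)
    return 0; // Assumption: the option is not applied outside medium/large.
  if (userPassed)
    return userValue; // An explicit value wins.
  // Defaults from the commit message: 65536 for medium, 0 for large.
  return model == CodeModel::Medium ? 65536 : 0;
}
```

With no explicit option, the medium code model therefore gets 65536 and the large code model gets 0.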
2024-01-12 | [clang][test] Require x86 target for new tests | David Spickett | 2 | -0/+2
Fixes d199ab469949b104bc4fbb888251ee184fd53de1.
2024-01-12 | [LLVM][DWARF] Fix accelerator table switching between CU and TU (#77511) | Alexander Yermolovich | 2 | -0/+131
Bug 1 is triggered when a TU is already created, and we process the same DICompositeType at a top level. We would switch to the TU accelerator table, but would not switch back on early exit. As a result we would add CU entries to the TU accelerator table. When we try to write out TUs and normalize entries, the offsets for DIEs that are part of a CU would not have been computed, and it would assert on getOffset().

Bug 2 is triggered when processing nested TUs. When we exit from addDwarfTypeUnitType we switch back to the CU accelerator table. If we were processing nested TUs, the rest of the entries from TUs would be added to the CU accelerator table. When we write out TUs, all the DIE pointers will become invalid. Eventually it will assert during the normalization step after the CU is processed.
2024-01-12 | [AArch64][SME2] Fix SME2 mla/mls tests (#76711) | Matthew Devereau | 5 | -76/+76
The ACLE defines these builtins as svmla[_single]_za32[_f32]_vg1x2, which means the SVE_ACLE_FUNC macro should test the overloaded forms as SVE_ACLE_FUNC(svmla,_single,_za32,_f32,_vg1x2) https://github.com/ARM-software/acle/blob/b88cbf7e9c104100bb5016c848763171494dee44/main/acle.md?plain=1#L10170-L10205
2024-01-12 | [AArch64][SME] Fix multi vector cvt builtins (#77656) | Matthew Devereau | 1 | -16/+16
This fixes cvt multi vector builtins that erroneously had inverted return vectors and vector parameters. This caused the incorrect instructions to be emitted.