path: root/gcc
Age  Commit message  Author  Files  Lines
2024-11-04aarch64: Fix incorrect LS64 documentationRichard Sandiford1-3/+2
As Yuta Mukai pointed out, the manual wrongly said that LS64 is enabled by default for Armv8.7-A and above, and for Armv9.2-A and above. LS64 is not mandatory at any architecture level (and the code correctly implemented that). I think this was a leftover from an early version of the spec. gcc/ * doc/invoke.texi: Fix documentation of LS64 so that it's not implied by Armv8.7-A or Armv9.2-A.
2024-11-04aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPUYuta Mukai5-2/+69
This patch adds initial support for FUJITSU-MONAKA CPU. The cost model will be corrected in the future. 2024-11-04 Yuta Mukai <mukai.yuta@fujitsu.com> gcc/ChangeLog: * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add fujitsu-monaka. * config/aarch64/aarch64-tune.md: Regenerate. * config/aarch64/aarch64.cc: Include fujitsu-monaka tuning model. * doc/invoke.texi: Document -mcpu=fujitsu-monaka. * config/aarch64/tuning_models/fujitsu_monaka.h: New file.
2024-11-04Move vect_update_inits_of_drsRichard Biener1-5/+4
Move vect_update_inits_of_drs to after setting up the epilog metadata. * tree-vect-loop.cc (update_epilogue_loop_vinfo): Update DR inits after adjusting the epilog metadata.
2024-11-04Preserve ->move_dr behavior when adjusting epilogue infoRichard Biener1-1/+0
When update_epilogue_loop_vinfo relates the shared loop DRs with the epilogue stmts and infos it should not fiddle with how pattern recognition applied move_dr. * tree-vect-loop.cc (update_epilogue_loop_vinfo): A DRs main stmt vinfo dr_aux should refer to a pattern stmt which is how move_dr sets this up. We shouldn't undo this.
2024-11-04Move updated versioning threshold computeRichard Biener1-11/+12
The following moves computing the combined main + epilogue loop versioning threshold until we figured the epilogues to use rather than incrementally updating it with the chance to joust candidates after the fact. * tree-vect-loop.cc (vect_analyze_loop): Move lowest_th compute until after epilogue_vinfos is final.
2024-11-04Add regression testEric Botcazou1-0/+14
This is for the latest fix made to Selected_Length_Checks in Checks. gcc/testsuite * gnat.dg/specs/array7.ads: New test.
2024-11-04simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI (X)Kyrylo Tkachov2-0/+31
With recent patch to improve detection of vector rotates at RTL level combine now tries matching a V8HImode rotate by 8 in the example in the testcase. We can teach AArch64 to emit a REV16 instruction for such a rotate but really this operation corresponds to the RTL code BSWAP, for which we already have the right patterns. BSWAP is arguably a simpler representation than ROTATE here because it has only one operand, so let's teach simplify-rtx to generate it. With this patch the testcase now generates the simplest form: .L2: ldr q31, [x1, x0] rev16 v31.16b, v31.16b str q31, [x0, x2] add x0, x0, 16 cmp x0, 2048 bne .L2 instead of the previous: .L2: ldr q31, [x1, x0] shl v30.8h, v31.8h, 8 usra v30.8h, v31.8h, 8 str q30, [x0, x2] add x0, x0, 16 cmp x0, 2048 bne .L2 IMO ideally the bswap detection would have been done during vectorisation time and used the expanders for that, but teaching simplify-rtx to do this transformation is fairly straightforward and, unlike at tree level, we have the native RTL BSWAP code. This change is not enough to generate the equivalent sequence in SVE, but that is something that should be tackled separately. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Simplify (rotate:HI x:HI, 8) -> (bswap:HI x:HI). gcc/testsuite/ * gcc.target/aarch64/rot_to_bswap.c: New test.
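The identity behind this simplification can be checked on scalars. The following is a minimal sketch in plain C, not GCC code; the helper names `rotl16`, `bswap16` and `check_rotate8_is_bswap16` are made up for illustration. It verifies exhaustively that rotating a 16-bit value by 8 gives the same result as swapping its two bytes, which is what lets (rotate:HI x 8) be rewritten as (bswap:HI x).

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 16-bit value left by n bits, for 0 < n < 16. */
static uint16_t rotl16(uint16_t x, unsigned n)
{
    return (uint16_t)((x << n) | (x >> (16 - n)));
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t bswap16(uint16_t x)
{
    return (uint16_t)(((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8));
}

/* Exhaustively check that a rotate by 8 equals a byte swap. */
static void check_rotate8_is_bswap16(void)
{
    for (uint32_t v = 0; v <= 0xffffu; v++)
        assert(rotl16((uint16_t)v, 8) == bswap16((uint16_t)v));
}
```

Because 8 is exactly half of the 16-bit width, rotating left or right by 8 gives the same result, so the direction of the rotate does not matter for this case.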
2024-11-04aarch64: Emit XAR for vector rotates where possibleKyrylo Tkachov2-6/+121
We can make use of the integrated rotate step of the XAR instruction to implement most vector integer rotates, as long as we zero out one of the input registers for it. This allows for a lower-latency sequence than the fallback SHL+USRA, especially when we can hoist the zeroing operation away from loops and hot parts. This should be safe to do for 64-bit vectors as well even though the XAR instructions operate on 128-bit values, as the bottom 64-bit result is later accessed through the right subregs. This strategy is used whenever we have XAR instructions; the logic in aarch64_emit_opt_vec_rotate is adjusted to resort to expand_rotate_as_vec_perm only when it's expected to generate a single REV* instruction or when XAR instructions are not present. With this patch we can generate for the input: v4si G1 (v4si r) { return (r >> 23) | (r << 9); } v8qi G2 (v8qi r) { return (r << 3) | (r >> 5); } the assembly for +sve2: G1: movi v31.4s, 0 xar z0.s, z0.s, z31.s, #23 ret G2: movi v31.4s, 0 xar z0.b, z0.b, z31.b, #5 ret instead of the current: G1: shl v31.4s, v0.4s, 9 usra v31.4s, v0.4s, 23 mov v0.16b, v31.16b ret G2: shl v31.8b, v0.8b, 3 usra v31.8b, v0.8b, 5 mov v0.8b, v31.8b ret Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add generation of XAR sequences when possible. gcc/testsuite/ * gcc.target/aarch64/rotate_xar_1.c: New test.
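Why zeroing one input works can be seen from a scalar model of a single lane. The sketch below is an illustration only, under the assumption that XAR computes "XOR the two inputs, then rotate right by the immediate"; the functions `xar32`, `xar8`, `g1_lane`, `g2_lane` and `check_xar_with_zero` are hypothetical names, not GCC or ACLE APIs. With the second input set to zero, the XOR is a no-op and only the rotate remains.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane: XOR, then rotate right by imm
   (imm is assumed to be in 1..32; a rotate by 32 leaves the value as is). */
static uint32_t xar32(uint32_t a, uint32_t b, unsigned imm)
{
    uint32_t x = a ^ b;
    imm %= 32;
    return imm ? (x >> imm) | (x << (32 - imm)) : x;
}

/* Same model for one 8-bit lane (imm assumed to be in 1..8). */
static uint8_t xar8(uint8_t a, uint8_t b, unsigned imm)
{
    uint8_t x = a ^ b;
    imm %= 8;
    return imm ? (uint8_t)((x >> imm) | (x << (8 - imm))) : x;
}

/* One lane of G1 from the commit message: rotate right by 23. */
static uint32_t g1_lane(uint32_t r) { return (r >> 23) | (r << 9); }

/* One lane of G2 from the commit message: rotate left by 3,
   which is a rotate right by 5 on 8-bit elements. */
static uint8_t g2_lane(uint8_t r) { return (uint8_t)((r << 3) | (r >> 5)); }

/* With a zero second operand, the model matches the plain rotates. */
static void check_xar_with_zero(void)
{
    for (unsigned v = 0; v <= 0xffu; v++)
        assert(xar8((uint8_t)v, 0, 5) == g2_lane((uint8_t)v));
    assert(xar32(0x80000001u, 0, 23) == g1_lane(0x80000001u));
    assert(xar32(0xdeadbeefu, 0, 23) == g1_lane(0xdeadbeefu));
}
```

The immediates 23 and 5 used here are the ones that appear in the generated `xar` instructions quoted above.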
2024-11-04aarch64: Optimize vector rotates as vector permutes where possibleKyrylo Tkachov7-0/+232
Some vector rotate operations can be implemented in a single instruction rather than using the fallback SHL+USRA sequence. In particular, when the rotate amount is half the bitwidth of the element we can use a REV64, REV32 or REV16 instruction. More generally, rotates by a byte amount can be implemented using vector permutes. This patch adds such a generic routine in expmed.cc called expand_rotate_as_vec_perm that calculates the required permute indices and uses the expand_vec_perm_const interface. On aarch64 this ends up generating the single-instruction sequences above where possible and can use LDR+TBL sequences too, which are a good choice. With help from Richard, the routine should be VLA-safe. However, the only use of expand_rotate_as_vec_perm introduced in this patch is in aarch64-specific code that for now only handles fixed-width modes. A runtime aarch64 test is added to ensure the permute indices are not messed up. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * expmed.h (expand_rotate_as_vec_perm): Declare. * expmed.cc (expand_rotate_as_vec_perm): Define. * config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate): Declare prototype. * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement. * config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>): Call the above. gcc/testsuite/ * gcc.target/aarch64/vec-rot-exec.c: New test. * gcc.target/aarch64/simd/pr117048_2.c: New test.
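The idea of expressing a byte-amount rotate as a permute can be sketched outside the compiler. The code below is a hypothetical illustration, not the expand_rotate_as_vec_perm implementation: it assumes little-endian byte order within each element, and the names `rotate_perm_indices`, `apply_perm` and `rotl32_via_perm` are invented for this example. For a left rotate by k bytes, byte j of each result element is taken from byte (j - k) mod esize of the same source element.

```c
#include <assert.h>
#include <stdint.h>

/* Fill idx[0..nbytes-1] with byte-permute indices that rotate every
   esize-byte element left by rot_bits (a multiple of 8).
   Assumes little-endian byte order within each element. */
static void rotate_perm_indices(unsigned nbytes, unsigned esize,
                                unsigned rot_bits, uint8_t *idx)
{
    unsigned k = (rot_bits / 8) % esize;   /* rotate amount in bytes */
    for (unsigned i = 0; i < nbytes; i++) {
        unsigned base = i - i % esize;     /* first byte of this element */
        idx[i] = (uint8_t)(base + (i % esize + esize - k) % esize);
    }
}

/* Apply a byte permute: dst[i] = src[idx[i]]. */
static void apply_perm(const uint8_t *src, const uint8_t *idx,
                       unsigned nbytes, uint8_t *dst)
{
    for (unsigned i = 0; i < nbytes; i++)
        dst[i] = src[idx[i]];
}

/* Rotate one 32-bit value left by rot_bits using only the permute. */
static uint32_t rotl32_via_perm(uint32_t v, unsigned rot_bits)
{
    uint8_t src[4], dst[4], idx[4];
    for (unsigned j = 0; j < 4; j++)
        src[j] = (uint8_t)(v >> (8 * j));  /* explicit little-endian bytes */
    rotate_perm_indices(4, 4, rot_bits, idx);
    apply_perm(src, idx, 4, dst);
    return (uint32_t)dst[0] | ((uint32_t)dst[1] << 8)
         | ((uint32_t)dst[2] << 16) | ((uint32_t)dst[3] << 24);
}
```

For 2-byte elements rotated by 8 bits the indices come out as 1, 0, 3, 2, and so on, which is the byte-swap-within-halfword shuffle that a REV16 performs; this is the half-width special case the commit message mentions.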
2024-11-04PR 117048: aarch64: Add define_insn_and_split for vector ROTATEKyrylo Tkachov2-0/+102
The ultimate goal in this PR is to match the XAR pattern that is represented as a (ROTATE (XOR X Y) VCST) from the ACLE intrinsics code in the testcase. The first blocker for this was the missing recognition of ROTATE in simplify-rtx, which is fixed in the previous patch. The next problem is that once the ROTATE has been matched from the shifts and orr/xor/plus, it will try to match it in an insn before trying to combine the XOR into it. But as we don't have a backend pattern for a vector ROTATE this recog fails and combine does not try the followup XOR+ROTATE combination which would have succeeded. This patch solves that by introducing a sort of "scaffolding" pattern for vector ROTATE, which allows it to be combined into the XAR. If it fails to be combined into anything the splitter will break it back down into the SHL+USRA sequence that it would have emitted. By having this splitter we can special-case some rotate amounts in the future to emit more specialised instructions e.g. from the REV* family. This can be done if the ROTATE is not combined into something else. This optimisation is done in the next patch in the series. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ PR target/117048 * config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>): New define_insn_and_split. gcc/testsuite/ PR target/117048 * gcc.target/aarch64/simd/pr117048.c: New test.
2024-11-04aarch64: Use canonical RTL representation for SVE2 XAR and extend it to fixed-width modesKyrylo Tkachov13-59/+191
The MD pattern for the XAR instruction in SVE2 is currently expressed with non-canonical RTL by using a ROTATERT code with a constant rotate amount. Fix it by using the left ROTATE code. This necessitates splitting out the expander separately to translate the immediate coming from the intrinsic from a right-rotate to a left-rotate immediate. Additionally, as the SVE2 XAR instruction is unpredicated and can handle all element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE operation for Advanced SIMD modes where the TARGET_SHA3 XAR cannot be used (it can only handle V2DImode operands). Therefore let's extend the accepted modes of the SVE2 pattern to include the Advanced SIMD integer modes. This causes some tests for the svxar* intrinsics to fail because they now simplify to a plain EOR when the rotate amount is the width of the element. This simplification is desirable (EOR instructions have better or equal throughput than XAR, and they are non-destructive of their input) so the tests are adjusted. For V2DImode XAR operations we should prefer the Advanced SIMD version when it is available (TARGET_SHA3) because it is non-destructive, so restrict the SVE2 pattern accordingly. Tests are added to confirm this. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for mainline? Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator. * config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>): Use SVE_ASIMD_FULL_I modes. Use ROTATE code for the rotate step. Adjust output logic. * config/aarch64/aarch64-sve-builtins-sve2.cc (svxar_impl): Define. (svxar): Use the above. gcc/testsuite/ * gcc.target/aarch64/xar_neon_modes.c: New test. * gcc.target/aarch64/xar_v2di_nonsve.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather than XAR. * gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise. 
* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.
2024-11-04PR 117048: simplify-rtx: Simplify (X << C1) [+,^] (X >> C2) into ROTATEKyrylo Tkachov1-48/+156
simplify-rtx can transform (X << C1) | (X >> C2) into ROTATE (X, C1) when C1 + C2 == mode-width. But the transformation is also valid for PLUS and XOR. Indeed GIMPLE can also do the fold. Let's teach RTL to do it too. The motivating testcase for this is in AArch64 intrinsics: uint64x2_t G2(uint64x2_t a, uint64x2_t b) { uint64x2_t c = veorq_u64(a, b); return veorq_u64(vaddq_u64(c, c), vshrq_n_u64(c, 63)); } which I was hoping to fold to a single XAR (a ROTATE+XOR instruction) but GCC was failing to detect the rotate operation for two reasons: 1) The combination of the two arms of the expression is done under XOR rather than IOR that simplify-rtx currently supports. 2) The ASHIFT operation is actually a (PLUS X X) operation and thus is not detected as the LHS of the two arms we require. The patch fixes both issues. The analysis of the two arms of the rotation expression is factored out into a common helper simplify_rotate which is then used in the PLUS, XOR, IOR cases in simplify_binary_operation_1. The check-assembly testcase for this is added in the following patch because it needs some extra AArch64 backend work, but I've added self-tests in this patch to validate the transformation. Bootstrapped and tested on aarch64-none-linux-gnu Signed-off-by: Kyrylo Tkachov <ktachov@nvidia.com> PR target/117048 * simplify-rtx.cc (extract_ashift_operands_p): Define. (simplify_rotate_op): Likewise. (simplify_context::simplify_binary_operation_1): Use the above in the PLUS, IOR, XOR cases. (test_vector_rotate): Define. (test_vector_ops): Use the above.
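The reason PLUS and XOR are as good as IOR here is that, when C1 + C2 equals the mode width, the two shifted halves occupy disjoint bit positions, so there are no carries and no shared set bits. The following scalar sketch in C illustrates this for 64-bit values; the function names are hypothetical and the code is not taken from simplify-rtx.cc. It also covers the motivating shape from the message, where the left shift by 1 appears as c + c.

```c
#include <assert.h>
#include <stdint.h>

/* Three ways to combine the two arms of a rotate left by c1 (0 < c1 < 64). */
static uint64_t rot_ior(uint64_t x, unsigned c1)
{
    return (x << c1) | (x >> (64 - c1));
}

static uint64_t rot_xor(uint64_t x, unsigned c1)
{
    return (x << c1) ^ (x >> (64 - c1));
}

static uint64_t rot_plus(uint64_t x, unsigned c1)
{
    return (x << c1) + (x >> (64 - c1));
}

/* Scalar version of the motivating expression: (c + c) ^ (c >> 63). */
static uint64_t motivating(uint64_t c)
{
    return (c + c) ^ (c >> 63);
}

/* Check that all three forms agree for every valid rotate amount. */
static void check_rotate_forms(uint64_t x)
{
    for (unsigned c1 = 1; c1 < 64; c1++) {
        assert(rot_xor(x, c1) == rot_ior(x, c1));
        assert(rot_plus(x, c1) == rot_ior(x, c1));
    }
    assert(motivating(x) == rot_ior(x, 1));
}
```

The check is only a spot test on whichever inputs are passed in; the disjoint-bits argument above is what makes the equivalence hold for all values.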
2024-11-04Daily bump.GCC Administrator3-1/+26
2024-11-03docs: Document that __builtin_assoc_barrier also can be used for FMAs [PR115023]Andrew Pinski1-2/+14
I noticed that __builtin_assoc_barrier makes a difference for FMA formation but it was not documented. This adds that documentation even with a small example. Built the HTML documents to make sure everything looks correct. gcc/ChangeLog: PR middle-end/115023 * doc/extend.texi (__builtin_assoc_barrier): Document ffp-contract=fast and FMA usage. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-03match: Fix `a != 0 ? a - 1 : 0` pattern [PR117363]Andrew Pinski3-2/+56
There are a couple of things wrong with this pattern which I missed during the review. First, each nop_convert should be nop_convert1 or nop_convert2. Second, we need to do the minus in the same type as the minus was originally so we don't introduce extra undefined behavior (signed integer overflow). And we need a convert into the new type too. pr117363-1.c tests not introducing extra undefined behavior. pr117363-2.c tests the casting to the correct final type, ldist introduces the cond_expr here. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117363 gcc/ChangeLog: * match.pd (`a != 0 ? a - 1 : 0`): Fix type handling and nop_convert handling. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr117363-1.c: New test. * gcc.dg/torture/pr117363-2.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-03Fortran: Fix associate_69.f90 that fails on some platforms [PR115700]Paul Thomas1-5/+0
2024-11-03 Paul Thomas <pault@gcc.gnu.org> gcc/testsuite/ PR fortran/115700 * gfortran.dg/associate_69.f90: Remove the test that produces a variable string length because the optimized count depends on the platform. This is tested in associate_70.f90.
2024-11-03Daily bump.GCC Administrator6-1/+124
2024-11-02testsuite: Require fpic support for pr116887.cDimitar Dimitrov1-0/+1
Test case pr116887.c is passing -fpic, so mark it as such. With this patch the test is now properly marked as unsupported for pru-unknown-elf. Test still passes for x86_64-pc-linux-gnu. gcc/testsuite/ChangeLog: * gcc.dg/pr116887.c: Require effective target fpic. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2024-11-02testsuite: Require trampoline support for pr117245.cDimitar Dimitrov1-0/+1
Test case pr117245.c is using trampolines, so mark it as such. With this patch the test is now properly marked as unsupported for pru-unknown-elf. Test still passes for x86_64-pc-linux-gnu. gcc/testsuite/ChangeLog: * gcc.dg/pr117245.c: Require effective target with trampolines. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2024-11-02Add UMASKR and UMASKL intrinsics.Thomas Koenig9-9/+226
gcc/fortran/ChangeLog: * check.cc (gfc_check_mask): Handle BT_INSIGNED. * gfortran.h (enum gfc_isym_id): Add GFC_ISYM_UMASKL and GFC_ISYM_UMASKR. * gfortran.texi: List UMASKL and UMASKR, remove unsigned future unsigned arguments for MASKL and MASKR. * intrinsic.cc (add_functions): Add UMASKL and UMASKR. * intrinsic.h (gfc_simplify_umaskl): New function. (gfc_simplify_umaskr): New function. (gfc_resolve_umasklr): New function. * intrinsic.texi: Document UMASKL and UMASKR. * iresolve.cc (gfc_resolve_umasklr): New function. * simplify.cc (gfc_simplify_umaskr): New function. (gfc_simplify_umaskl): New function. gcc/testsuite/ChangeLog: * gfortran.dg/unsigned_39.f90: New test.
2024-11-02gimplify: Fix up RAW_DATA_CST related ICE [PR117384]Jakub Jelinek2-0/+41
Apparently tree_output_constant_def doesn't strictly guarantee that the returned VAR_DECL will have the same or uselessly convertible type as the type of the constant passed to it, compare_constants says: /* For arrays, check that mode, size and storage order match. */ /* For record and union constructors, require exact type equality. */ The older use of tree_output_constant_def in gimplify.cc was already handling this right: ctor = tree_output_constant_def (ctor); if (!useless_type_conversion_p (type, TREE_TYPE (ctor))) ctor = build1 (VIEW_CONVERT_EXPR, type, ctor); but the spot I've added for RAW_DATA_CST missed this. So, the following patch adds that. 2024-11-02 Jakub Jelinek <jakub@redhat.com> PR middle-end/117384 * gimplify.cc (gimplify_init_ctor_eval): Add VIEW_CONVERT_EXPR around rctor if it doesn't have expected type. * c-c++-common/init-7.c: New test.
2024-11-02c++/modules: Propagate TYPE_CANONICAL for partial specialisations [PR113814]Nathaniel Shead5-5/+60
In some cases, when we go to import a partial specialisation there might already be an incomplete implicit instantiation in the specialisation table. This causes ICEs described in the linked PR as we now have two separate matching specialisations for this same arguments with different TYPE_CANONICAL. We already support multiple specialisations with the same args however, as they may be differently constrained. So we can solve this by simply ensuring that the TYPE_CANONICAL of the new partial specialisation matches the existing specialisation. PR c++/113814 gcc/cp/ChangeLog: * pt.cc (add_mergeable_specialization): Propagate TYPE_CANONICAL. gcc/testsuite/ChangeLog: * g++.dg/modules/partial-6.h: New test. * g++.dg/modules/partial-6_a.H: New test. * g++.dg/modules/partial-6_b.H: New test. * g++.dg/modules/partial-6_c.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Co-authored-by: Jason Merrill <jason@redhat.com>
2024-11-02c++/modules: Fix recursive dependencies [PR116317]Nathaniel Shead3-25/+89
In cases like the linked PR we sometimes get mutually recursive dependencies that both rely on the other to have been streamed as part of their merge key information. In the linked PR, this causes an ICE. The root cause is that 'sort_cluster' is not correctly ordering the dependencies; both the element_t specialisation and the reverse_adaptor::first function decl depend on each other, but by streaming element_t first it ends up trying to stream itself recursively as part of calculating its own merge key, which apart from the checking ICE will also cause issues on stream-in, as the merge key will not properly stream. There is a comment already in 'sort_cluster' describing this issue, but it says: Finding the single cluster entry dep is very tricky and expensive. Let's just not do that. It's harmless in this case anyway. However in this case it was not harmless: it's just somewhat luck that the sorting happened to work for the existing cases in the testsuite. This patch solves the issue by noting any declarations that rely on deps first seen within their own merge key. This declaration gets marked as an "entry" dep; any of these deps that end up recursively referring back to that entry dep as part of their own merge key do not. Then within sort_cluster we can ensure that the entry dep is written to be streamed first of its cluster; this will ensure that any other deps are just emitted as back-references, and the mergeable dep itself will structurally decompose. PR c++/116317 gcc/cp/ChangeLog: * module.cc (depset::DB_MAYBE_RECURSIVE_BIT): New flag. (depset::DB_ENTRY_BIT): New flag. (depset::is_maybe_recursive): New accessor. (depset::is_entry): New accessor. (depset::hash::writing_merge_key): New field. (trees_out::decl_value): Inform dep_hash while we're writing the merge key information for a decl. (depset::hash::add_dependency): Find recursive deps and mark the entry point. (sort_cluster): Ensure that the entry dep is streamed first. 
gcc/testsuite/ChangeLog: * g++.dg/modules/late-ret-4_a.H: New test. * g++.dg/modules/late-ret-4_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2024-11-01[committed] ft32 doesn't support trampolines.Jeff Law1-0/+1
The ft32 has never supported trampolines, but the target supports bits weren't appropriately updated. Fixed thusly. gcc/testsuite * lib/target-supports.exp (check_effective_target_trampolines): ft32 does not support trampolines.
2024-11-01[committed] Make LRA default for ft32 and remove -mlra optionJeff Law3-17/+3
I was looking to clean up an old patch I'm carrying in my tester. My first thought was that ft32 was likely going to be deprecated because it wasn't using LRA -- which in turn would mean the patch in question could just be removed. But then I checked, ft32 has an LRA option and if turned on it gets the exact same test results as with reload. While the port mentions a failure with sieve.c, that's been there since the port was introduced in 2015. It's working well enough that I think just converting it is the right thing to do. The testsuite patch which precipitated this one will follow separately. I've kept the -mlra option for compatibility sake, but it's ignored. Pushing to the trunk. gcc/ * config/ft32/ft32.cc (ft32_lra_p): Remove. (TARGET_LRA_P): Likewise. * config/ft32/ft32.opt: Make -mlra ignored. * doc/invoke.texi: Adjust documentation for -mlra on ft32.
2024-11-01analyzer: use std::unique_ptr in "to_json" functionsDavid Malcolm28-151/+161
No functional change intended. gcc/analyzer/ChangeLog: * analyzer.cc: Include "make-unique.h". Convert "to_json" functions to use std::unique_ptr. * call-string.cc: Likewise. * constraint-manager.cc: Likewise. * diagnostic-manager.cc: Likewise. * engine.cc: Likewise. * program-point.cc: Likewise. * program-state.cc: Likewise. * ranges.cc: Likewise. * region-model.cc: Likewise. * region.cc: Likewise. * svalue.cc: Likewise. * sm.cc: Likewise. * store.cc: Likewise. * supergraph.cc: Likewise. * analyzer.h: Convert "to_json" functions to return std::unique_ptr. * call-string.h: Likewise. * constraint-manager.h: Likewise. (bounded_range::set_json_attr): Pass "obj" by reference. * diagnostic-manager.h: Convert "to_json" functions to return std::unique_ptr. * exploded-graph.h: Likewise. * program-point.h: Likewise. * program-state.h: Likewise. * ranges.h: Likewise. * region-model.h: Likewise. * region.h: Likewise. * sm.h: Likewise. * store.h: Likewise. * supergraph.h: Likewise. * svalue.h: Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-02Daily bump.GCC Administrator9-1/+953
2024-11-01builtins: Fix expand_builtin_prefetch [PR117407]Jakub Jelinek1-2/+2
On Fri, Nov 01, 2024 at 04:47:35PM +0800, Haochen Jiang wrote: > * builtins.cc (expand_builtin_prefetch): Use IN_RANGE to > avoid second usage of INTVAL. I doubt this has been actually tested. > --- a/gcc/builtins.cc > +++ b/gcc/builtins.cc > @@ -1297,7 +1297,7 @@ expand_builtin_prefetch (tree exp) > else > op1 = expand_normal (arg1); > /* Argument 1 must be 0, 1 or 2. */ > - if (INTVAL (op1) < 0 || INTVAL (op1) > 2) > + if (IN_RANGE (INTVAL (op1), 0, 2)) > { > warning (0, "invalid second argument to %<__builtin_prefetch%>;" > " using zero"); > @@ -1315,7 +1315,7 @@ expand_builtin_prefetch (tree exp) > else > op2 = expand_normal (arg2); > /* Argument 2 must be 0, 1, 2, or 3. */ > - if (INTVAL (op2) < 0 || INTVAL (op2) > 3) > + if (IN_RANGE (INTVAL (op2), 0, 3)) > { > warning (0, "invalid third argument to %<__builtin_prefetch%>; using zero"); > op2 = const0_rtx; because it inverts the tests, previously it was warning when op1 wasn't 0, 1, 2, now it warns when it is 0, 1 or 2, previously it was warning when op2 wasn't 0, 1, 2 or 3, now it warns when it is 0, 1, 2, or 3. Fixed thusly. 2024-11-01 Jakub Jelinek <jakub@redhat.com> PR bootstrap/117407 * builtins.cc (expand_builtin_prefetch): Use !IN_RANGE rather than IN_RANGE.
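The inversion described above is easy to reproduce in isolation. The sketch below uses a simplified stand-in macro, `IN_RANGE_SKETCH`, which is an assumption for illustration and not GCC's actual IN_RANGE definition; the three predicate functions are likewise hypothetical. It contrasts the original warning condition, the accidentally inverted one, and the fixed one for the second argument of __builtin_prefetch.

```c
#include <assert.h>

/* Simplified stand-in for a range check; NOT GCC's real IN_RANGE macro. */
#define IN_RANGE_SKETCH(v, lo, hi) ((v) >= (lo) && (v) <= (hi))

/* Original condition: warn when the value is outside 0..2. */
static int warns_original(long v) { return v < 0 || v > 2; }

/* Inverted rewrite: warns exactly when the value is valid. */
static int warns_broken(long v) { return IN_RANGE_SKETCH(v, 0, 2); }

/* Fixed rewrite: negating the range check restores the original meaning. */
static int warns_fixed(long v) { return !IN_RANGE_SKETCH(v, 0, 2); }

/* Compare the three predicates over a small window around the range. */
static void check_prefetch_range(void)
{
    for (long v = -3; v <= 5; v++) {
        assert(warns_fixed(v) == warns_original(v));
        assert(warns_broken(v) != warns_original(v));
    }
}
```

The same reasoning applies to the third argument with the range 0..3.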
2024-11-01Update bitwise_or op_range.Andrew MacLeod2-1/+49
If the LHS of a bitwise OR is positive, then so are both operands when using op1_range or op2_range. gcc/ * range-op.cc (operator_bitwise_or::op1_range): If LHS is signed positive, so are both operands. gcc/testsuite * g++.dg/cpp23/attr-assume-opt.C (f2b): Alternate flow test.
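The property used here follows from the sign bit of a | b being the OR of the two sign bits: if the result's sign bit is clear, both operands' sign bits must be clear. The following sketch checks this exhaustively for 8-bit signed values; it is an illustration only, reads "positive" as "non-negative", and the function name is hypothetical rather than anything from range-op.cc.

```c
#include <assert.h>

/* Returns 1 if, for all 8-bit signed a and b, a non-negative (a | b)
   implies that both a and b are non-negative; returns 0 otherwise. */
static int or_nonneg_implies_operands_nonneg(void)
{
    for (int a = -128; a <= 127; a++)
        for (int b = -128; b <= 127; b++) {
            int r = a | b;              /* stays within -128..127 */
            if (r >= 0 && (a < 0 || b < 0))
                return 0;
        }
    return 1;
}
```

This is the kind of backwards inference op1_range and op2_range perform: knowing something about the result of the OR narrows what the operands can be.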
2024-11-01Reimplement 'assume' processing pass.Andrew MacLeod6-271/+408
Rework the assume pass to work properly and fail conservatively when it does. Also move it to its own file. PR tree-optimization/117287 gcc/ * Makefile.in (IBJS): Add tree-assume.o * gimple-range.cc (assume_query::assume_range_p): Remove. (assume_query::range_of_expr): Remove. (assume_query::assume_query): Move to tree-assume.cc. (assume_query::~assume_query): Remove. (assume_query::calculate_op): Move to tree-assume.cc. (assume_query::calculate_phi): Likewise. (assume_query::check_taken_edge): Remove. (assume_query::calculate_stmt): Move to tree-assume.cc. (assume_query::dump): Remove. * gimple-range.h (class assume_query): Move to tree-assume.cc * tree-assume.cc: New * tree-vrp.cc (struct pass_data_assumptions): Move to tree-assume.cc. (class pass_assumptions): Likewise. (make_pass_assumptions): Likewise. gcc/testsuite/ * g++.dg/cpp23/pr117287-attr.C: New.
2024-11-01Make fur_edge accessible.Andrew MacLeod2-20/+14
Move the decl of fur_edge out of the source file into the header file. * gimple-range-fold.cc (class fur_edge): Relocate from here. (fur_edge::fur_edge): Also move to: * gimple-range-fold.h (class fur_edge): Relocate to here. (fur_edge::fur_edge): Likewise.
2024-11-01c++: Adjust docs and option descriptions for the publishing of C++23Jakub Jelinek3-16/+26
Now that C++23 has been finally published, the following patch attempts to mention it in the option descriptions and documentation. Given that it has been published about 1.5 years after being finalized and has the 14882:2024 document number pair rather than :2023, I wasn't sure when exactly to use 2023 (as informal name) and when 2024 (as year of publishing), so I've tried to use 2024 in standards.texi which talks more formally about the standards and a note that it has been published in 2024 when it is talked about more informally. I remember at least one older edition has been published in January too, but the ISO pages pretend it was published still in December of the previous year, in this case it doesn't. 2024-11-01 Jakub Jelinek <jakub@redhat.com> gcc/ * doc/standards.texi (C++ Language): Mention also the 2024 revision and -std=gnu++23 option. * doc/invoke.texi (-std=): Adjust description of c++23, c++2b, gnu++23 and gnu++2b now that ISO C++ 14882:2024 is published. gcc/c-family/ * c.opt (std=c++2b, std=c++23, std=gnu++2b, std=gnu++23): Adjust description now that ISO C++ 14882:2024 is published.
2024-11-01c++: Attempt to implement C++26 P3034R1 - Module Declarations Shouldn't be Macros [PR114461]Jakub Jelinek34-4/+232
This is an attempt to implement the https://wg21.link/p3034r1 paper, but I'm afraid the wording in the paper is bad for multiple reasons. I think I understand the intent, that the module name and partition if any shouldn't come from macros so that they can be scanned for without preprocessing, but on the other side doesn't want to disable macro expansion in pp-module altogether, because e.g. the optional attribute in module-declaration would be nice to come from macros as which exact attribute is needed might need to be decided based on preprocessor checks. The paper added https://eel.is/c++draft/cpp.module#2 which uses partly the wording from https://eel.is/c++draft/cpp.module#1 The first issue I see is that using that "defined as an object-like macro" from there means IMHO something very different in those 2 paragraphs. As per https://eel.is/c++draft/cpp.pre#7.sentence-1 preprocessing tokens in preprocessing directives aren't subject to macro expansion unless otherwise stated, and so the export and module tokens aren't expanded and so the requirement that they aren't defined as an object-like macro makes perfect sense. The problem with the new paragraph is that https://eel.is/c++draft/cpp.module#3.sentence-1 says that the rest of the tokens are macro expanded and after macro expansion none of the tokens can be defined as an object-like macro, if they would be, they'd be expanded to that. So, I think either the wording needs to change such that not all preprocessing tokens after module are macro expanded, only those which are after the pp-module-name and if any pp-module-partition tokens, or all tokens after module are macro expanded but none of the tokens in pp-module-name and pp-module-partition if any must come from macro expansion. 
The patch below implements it as if the former would be specified (but see later), so essentially scans the preprocessing tokens after module without expansion, if the first one is an identifier, it disables expansion for it and then if followed by . or : expects another such identifier (again with disabled expansion), but stops after second : is seen. Second issue is that while the global-module-fragment start is fine, matches the syntax of the new paragraph where the pp-tokens[opt] aren't present, there is also private-module-fragment in the syntax where module is followed by : private ; and in that case the colon doesn't match the pp-module-name grammar and appears now to be invalid. I think the https://eel.is/c++draft/cpp.module#2 paragraph needs to change so that it allows also that pp-tokens of a pp-module may also be : pp-tokens[opt] (and in that case, I think the colon shouldn't come from a macro and private and/or ; can). Third issue is that there are too many pp-tokens in https://eel.is/c++draft/cpp.module , one is all the tokens between module keyword and the semicolon and one is the optional extra tokens after pp-module-partition (if any, if missing, after pp-module). Perhaps introducing some other non-terminal would help talking about it? So in "where the pp-tokens (if any) shall not begin with a ( preprocessing token" it isn't obvious which pp-tokens it is talking about (my assumption is the latter) and also whether ( can't appear there just before macro expansion or also after expansion. The patch expects only before expansion, so #define F (); export module foo F would be valid during preprocessing but obviously invalid during compilation, but #define foo(n) n; export module foo (3) would be invalid already during preprocessing. The last issue applies only if the first issue is resolved to allow expansion of tokens after : if first token, or after pp-module-partition if present or after pp-module-name if present. 
When non-preprocessing scanner sees export module foo.bar:baz.qux; it knows nothing can come from preprocessing macros and is ok, but if it sees export module foo.bar:baz qux then it can't know whether it will be export module foo.bar:baz; or export module foo.bar:baz [[]]; or export module foo.bar:baz.freddy.garply; because qux could be validly a macro, which expands to ; or [[]]; or .freddy.garply; etc. So, either the non-preprocessing scanner would need to note it as possible export of foo.bar:baz* module partitions and preprocess if it needs to know the details or just compile, or if that is not ok, the wording would need to rule out that the expansion of (the second) pp-tokens if any can't start with . or : (colon would be only problematic if it isn't present in the tokens before it already). So, if e.g. defining qux above to . whatever is invalid, then the scanner can rely it sees the whole module name and partition. The patch below implements what is above described as the first variant of the first issue resolution, i.e. disables expansion of as many tokens as could be in the valid module name and module partition syntax, but as soon as it e.g. sees two adjacent identifiers, the second one can be macro expanded. If it is macro expanded though, the expansion can't start with . or :, and if it expands to nothing, tokens after it (whether they come from macro expansion or not) can't start with . or :. So, effectively: #define SEMI ; export module SEMI used to be valid and isn't anymore, #define FOO bar export module FOO; isn't valid, #define COLON : export module COLON private; isn't valid, #define BAR baz export module foo.bar:baz.qux.BAR; isn't valid, #define BAZ .qux export module foo BAZ; isn't valid, #define FREDDY :garply export module foo FREDDY; isn't valid, while #define QUX [[]] export module foo QUX; or #define GARPLY private module : GARPLY; etc. is. 
2024-11-01 Jakub Jelinek <jakub@redhat.com> PR c++/114461 libcpp/ * include/cpplib.h: Implement C++26 P3034R1 - Module Declarations Shouldn’t be Macros (or more precisely its expected intent). (NO_DOT_COLON): Define. * internal.h (struct cpp_reader): Add diagnose_dot_colon_from_macro_p member. * lex.cc (cpp_maybe_module_directive): For pp-module, if module keyword is followed by CPP_NAME, ensure all CPP_NAME tokens possibly matching module name and module partition syntax aren't expanded and aren't defined as object-like macros. Verify first token after that doesn't start with open paren. If the next token after module name/partition is CPP_NAME defined as macro, set NO_DOT_COLON flag on it. * macro.cc (cpp_get_token_1): Set pfile->diagnose_dot_colon_from_macro_p if token to be expanded has NO_DOT_COLON bit set in flags. Before returning, if pfile->diagnose_dot_colon_from_macro_p is true and not returning CPP_PADDING or CPP_COMMENT and not during macro expansion preparation, set pfile->diagnose_dot_colon_from_macro_p to false and diagnose if returning CPP_DOT or CPP_COLON. gcc/testsuite/ * g++.dg/modules/cpp-7.C: New test. * g++.dg/modules/cpp-8.C: New test. * g++.dg/modules/cpp-9.C: New test. * g++.dg/modules/cpp-10.C: New test. * g++.dg/modules/cpp-11.C: New test. * g++.dg/modules/cpp-12.C: New test. * g++.dg/modules/cpp-13.C: New test. * g++.dg/modules/cpp-14.C: New test. * g++.dg/modules/cpp-15.C: New test. * g++.dg/modules/cpp-16.C: New test. * g++.dg/modules/cpp-17.C: New test. * g++.dg/modules/cpp-18.C: New test. * g++.dg/modules/cpp-19.C: New test. * g++.dg/modules/cpp-20.C: New test. * g++.dg/modules/pmp-4.C: New test. * g++.dg/modules/pmp-5.C: New test. * g++.dg/modules/pmp-6.C: New test. * g++.dg/modules/token-6.C: New test. * g++.dg/modules/token-7.C: New test. * g++.dg/modules/token-8.C: New test. * g++.dg/modules/token-9.C: New test. * g++.dg/modules/token-10.C: New test. * g++.dg/modules/token-11.C: New test. * g++.dg/modules/token-12.C: New test. 
* g++.dg/modules/token-13.C: New test. * g++.dg/modules/token-14.C: New test. * g++.dg/modules/token-15.C: New test. * g++.dg/modules/token-16.C: New test. * g++.dg/modules/dir-only-3.C: Expect an error. * g++.dg/modules/dir-only-4.C: Expect an error. * g++.dg/modules/dir-only-5.C: New test. * g++.dg/modules/atom-preamble-2_a.C: In export module malcolm; replace malcolm with kevin. Don't define malcolm macro. * g++.dg/modules/atom-preamble-4.C: Expect an error. * g++.dg/modules/atom-preamble-5.C: New test.
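As a rough illustration of the last rule in the commit message above (an identifier after the module name or partition may be macro expanded, but the resulting tokens must not start with . or :), the following toy model checks an already-expanded token list. The function name and the vector-of-strings token representation are assumptions made for this sketch; per the ChangeLog, libcpp tracks this with the NO_DOT_COLON flag and diagnose_dot_colon_from_macro_p, not with anything like this helper.

```cpp
#include <string>
#include <vector>

// Toy model, not libcpp code: `tokens` is what follows the module name (and
// optional partition) once a trailing macro such as BAZ or QUX has been
// expanded. Padding and comments are assumed to be filtered out already.
inline bool
expansion_may_follow_module_name (const std::vector<std::string> &tokens)
{
  if (tokens.empty ())
    return true;  // Nothing left to diagnose.
  // The first real token must not extend the module name or partition.
  return tokens.front () != "." && tokens.front () != ":";
}
```

With #define BAZ .qux, export module foo BAZ; yields the tokens ".", "qux", ";" and the model rejects it, while #define QUX [[]] yields "[", "[", "]", "]", ";" and is accepted, matching the examples in the commit message.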
2024-11-02Use IN_RANGE in prefetch builtinHaochen Jiang1-2/+2
These are last-minute changes that should have been part of the MOVRS patch but were dropped from it. Using IN_RANGE avoids a second use of INTVAL in the prefetch check. gcc/ChangeLog: * builtins.cc (expand_builtin_prefetch): Use IN_RANGE to avoid second usage of INTVAL.
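A minimal sketch of why IN_RANGE lets the value expression be written only once: the range test is a single unsigned comparison. The macro below follows the shape of GCC's IN_RANGE from system.h, but it uses unsigned long long in place of unsigned HOST_WIDE_INT so that it compiles outside the GCC tree; valid_locality and its 0..3 bounds are hypothetical and only serve the illustration.

```cpp
// Stand-in for GCC's IN_RANGE; unsigned long long replaces
// unsigned HOST_WIDE_INT here (an assumption made for this sketch).
#define IN_RANGE(VALUE, LOWER, UPPER) \
  ((unsigned long long) (VALUE) - (unsigned long long) (LOWER) \
   <= (unsigned long long) (UPPER) - (unsigned long long) (LOWER))

// Hypothetical helper: VALUE appears once, so an expression such as
// INTVAL (op) would not need to be repeated for each bound.
constexpr bool
valid_locality (long long locality)
{
  return IN_RANGE (locality, 0, 3);
}

// Negative inputs wrap to huge unsigned values and fall outside the range.
static_assert (valid_locality (3), "upper bound is inclusive");
static_assert (!valid_locality (-1), "negative values are rejected");
```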
2024-11-02i386: Do not allow pointer conversion for CMPccXADD intrin under -O0Haochen Jiang2-3/+18
The pointer conversion to a wider type in the macro form of the intrinsic did not consider whether the upper bits are cleared, which can lead to an unexpected cmp result. After this change, it throws an incompatible pointer type error, just as -O2 currently does. gcc/ChangeLog: * config/i386/cmpccxaddintrin.h (_cmpccxadd_epi32): Do not do type conversion for pointer. (_cmpccxadd_epi64): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/cmpccxadd-1b.c: New test.
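One plausible reading of the problem, sketched outside the intrinsic header: if a pointer to a 32-bit object is silently treated as a pointer to a 64-bit object, the comparison also sees whatever lies in the neighbouring four bytes. The helper names below are hypothetical, and memcpy is used so the sketch itself stays well defined.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads 8 bytes starting at a 4-byte value, which is
// roughly what a silent int* to long long* conversion would cause.
inline std::int64_t
read_as_64 (const std::int32_t (&pair)[2])
{
  std::int64_t wide;
  std::memcpy (&wide, pair, sizeof wide);
  return wide;
}

// 32-bit comparison: looks only at the intended object.
inline bool
compare32 (const std::int32_t (&pair)[2], std::int32_t expected)
{
  return pair[0] == expected;
}

// 64-bit comparison: the neighbouring element leaks into the result.
inline bool
compare64 (const std::int32_t (&pair)[2], std::int64_t expected)
{
  return read_as_64 (pair) == expected;
}
```

For a pair holding 5 followed by a nonzero neighbour, compare32 against 5 succeeds while compare64 against 5 fails, which is the kind of surprise that rejecting the conversion at compile time avoids.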
2024-11-02testsuite: Fix up builtin-prefetch-1.c testsXi Ruoyao2-2/+2
How can "read-shared" be used as an identifier? No version of the C standard allows it. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/builtin-prefetch-1.c (rws): Use "read_shared" instead of "read-shared" as the identifier for enum value. * gcc.dg/builtin-prefetch-1.c (rws): Likewise.
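To make the point concrete, here is a small sketch of an enum using the corrected spelling. Only the name read_shared comes from the ChangeLog; the namespace, the other enumerators, and their values are assumptions for illustration and may not match the actual test.

```cpp
namespace sketch
{
// A hyphen is not an identifier character, so `read-shared` would be parsed
// as the expression `read - shared`; an underscore gives a valid enumerator.
enum rws
{
  read,
  write,
  read_shared
};
}
```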
2024-11-02LoongArch: testsuite: Add -O for jump-table-annotate.cXi Ruoyao1-1/+1
Without optimization, GCC does not emit a jump table for the test case. I'm not sure whether the test case was wrong from the start or something has changed in recent months... gcc/testsuite/ChangeLog: * gcc.target/loongarch/jump-table-annotate.c (dg-additional-options): Add -O.
2024-11-01c++: Add testcase for now fixed issue [PR101887]Simon Martin1-1/+2
The testcase in PR101887 has been working since the fix for PR104846, via r12-7599-gac8310dd122172. This patch simply adds the case to the testsuite. PR c++/101887 gcc/testsuite/ChangeLog: * g++.dg/init/delete5.C: Add testcase from PR c++/101887.
2024-11-02Always set SECTION_RELRO for .data.rel.ro{,.local} [PR116887]Xi Ruoyao2-6/+27
At least two ports (hppa and loongarch) need to set SECTION_RELRO for .data.rel.ro{,.local} in section_type_flags (PR52999 and PR116887), and I cannot see a reason not to just set it in the generic code. With this applied we can also remove the hppa-specific pa_section_type_flags in a future patch. gcc/ChangeLog: PR target/116887 * varasm.cc (default_section_type_flags): Always set SECTION_RELRO if name is .data.rel.ro{,.local}. gcc/testsuite/ChangeLog: PR target/116887 * gcc.dg/pr116887.c: New test.
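The idea of deciding the flag purely from the section name can be sketched as follows. This is not the code in varasm.cc: the flag constant and function name are hypothetical, and the real default_section_type_flags computes many other flags as well.

```cpp
#include <cstring>

// Hypothetical flag bit; GCC defines its own SECTION_RELRO elsewhere.
enum : unsigned
{
  SKETCH_SECTION_RELRO = 1u << 0
};

// Sets the RELRO bit for exactly the two names covered by the
// .data.rel.ro{,.local} shorthand, independent of the target port.
inline unsigned
sketch_section_type_flags (const char *name)
{
  unsigned flags = 0;
  if (std::strcmp (name, ".data.rel.ro") == 0
      || std::strcmp (name, ".data.rel.ro.local") == 0)
    flags |= SKETCH_SECTION_RELRO;
  return flags;
}
```

Doing the check once in generic code is what would make a port-specific hook such as pa_section_type_flags redundant, as the commit message notes.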
2024-11-01analyzer: fix -Wunused-parameter warning [PR117373]David Malcolm1-1/+1
gcc/analyzer/ChangeLog: PR analyzer/117373 * infinite-loop.cc (infinite_loop_diagnostic::describe_final_event): Fix -Wunused-parameter warning. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-01Use LC_ALL=C when running selftests [PR117361]David Malcolm4-9/+12
gcc/ChangeLog: PR bootstrap/117361 * Makefile.in (GCC_FOR_SELFTESTS): New. gcc/c/ChangeLog: PR bootstrap/117361 * Make-lang.in (s-selftest-c): Use GCC_FOR_SELFTESTS. (selftest-c-gdb): Likewise. (selftest-c-valgrind): Likewise. gcc/cp/ChangeLog: PR bootstrap/117361 * Make-lang.in (s-selftest-c++): Use GCC_FOR_SELFTESTS. (selftest-c++-gdb): Likewise. (selftest-c++-valgrind): Likewise. gcc/rust/ChangeLog: PR bootstrap/117361 * Make-lang.in (s-selftest-rust): Use GCC_FOR_SELFTESTS. (selftest-rust-gdb): Likewise. (selftest-rust-valgrind): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-01Fix -mod(unsigned, unsigned).Thomas Koenig3-3/+23
gcc/fortran/ChangeLog: * resolve.cc (resolve_operator): Also handle BT_UNSIGNED. gcc/testsuite/ChangeLog: * gfortran.dg/unsigned_38.f90: Add -pedantic and adjust error message. * gfortran.dg/unsigned_40.f90: New test.
2024-11-01openmp: Return error_mark_node from tsubst_attribute for erroneous varidJakub Jelinek2-1/+60
We incorrectly accept some invalid declare variant cases as if declare variant weren't there, in particular when a function template has dependent arguments and the variant name lookup fails. Because that happens during fn_type_unification with complain=tf_none, the lookup just sets the result to error_mark_node and doesn't complain further, and the caller doesn't know the substitution failed (we don't return error_mark_node from tsubst_attribute, we just create a TREE_LIST with error_mark_node as TREE_PURPOSE). The following patch fixes this by returning error_mark_node in that case, so the fn_type_unification caller can see that it failed and redo it with explain_p so that errors are reported. 2024-11-01 Jakub Jelinek <jakub@redhat.com> * pt.cc (tsubst_attribute): For "omp declare variant base" attribute if varid is error_mark_node, set val to error_mark_node rather than creating a TREE_LIST with error_mark_node TREE_PURPOSE. * g++.dg/gomp/declare-variant-10.C: New test.
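The control flow described above (a quiet first pass, then a second pass with diagnostics once the failure is actually visible to the caller) can be modelled with a toy example. Every name here is hypothetical; a bool return stands in for error_mark_node, and a null error list stands in for complain=tf_none.

```cpp
#include <string>
#include <vector>

// Toy stand-in for the substitution step: returns false (the role played by
// error_mark_node) when the variant cannot be resolved. Errors are recorded
// only when `errors` is non-null, mimicking a quiet pass.
inline bool
toy_substitute_variant (const std::string &variant, bool lookup_succeeds,
                        std::vector<std::string> *errors)
{
  if (lookup_succeeds)
    return true;
  if (errors)
    errors->push_back ("variant '" + variant + "' not found");
  return false;
}

// Toy caller: tries quietly first, and redoes the work with diagnostics
// enabled only if the quiet pass reported failure.
inline bool
toy_unify (const std::string &variant, bool lookup_succeeds,
           std::vector<std::string> &errors)
{
  if (toy_substitute_variant (variant, lookup_succeeds, nullptr))
    return true;
  toy_substitute_variant (variant, lookup_succeeds, &errors);
  return false;
}
```

If the substitution step hid the failure inside an otherwise valid-looking result, the second pass would never run and no error would be reported, which is the situation the patch addresses.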
2024-11-01Fortran: Fix problems with substring selectors in ASSOCIATE [PR115700]Paul Thomas3-27/+54
2024-11-01 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/115700 * resolve.cc (resolve_assoc_var): Extract a substring reference with missing as well as non-constant start or end. gcc/testsuite/ PR fortran/115700 * gfortran.dg/associate_69.f90: Activate commented out tests. * gfortran.dg/associate_70.f90: Test correct functioning of references in associate_69.f90 tests.
2024-11-01Support Intel AMX-MOVRSHu, Lin130-14/+334
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-MOVRS. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_MOVRS_SET): New. (OPTION_MASK_ISA2_AMX_MOVRS_UNSET): Ditto. (ix86_handle_option): Handle -mamx-movrs. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_AMX_MOVRS. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-movrs. * config.gcc: Add amxmovrsintrin.h. * config/i386/cpuid.h (bit_AMX_MOVRS): New. * config/i386/i386-c.cc (ix86_target_macros_internal): Define __AMX_MOVRS__. * config/i386/i386-isa.def (AMX_MOVRS): Add DEF_PTA(AMX_MOVRS). * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Handle amx-movrs. * config/i386/i386.opt: Add option -mamx-movrs. * config/i386/i386.opt.urls: Regenerated. * config/i386/immintrin.h: Include amxmovrsintrin.h * doc/extend.texi: Document amx-movrs. * doc/invoke.texi: Document -mamx-movrs. * doc/sourcebuild.texi: Document target amx-movrs. * config/i386/amxmovrsintrin.h: New file. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Add -mamx-movrs. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/amx-check.h: Add new check for amx-movrs. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-12.c: Add -mamx-movrs. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add amx-movrs. * gcc.target/i386/sse-23.c: Ditto. * lib/target-supports.exp (check_effective_target_amx_movrs): New. * gcc.target/i386/amxmovrs-asmatt-1.c: New test. * gcc.target/i386/amxmovrs-asmintel-1.c: Ditto. * gcc.target/i386/amxmovrs-t2rpntlvw-2.c: Ditto. * gcc.target/i386/amxmovrs-tileloaddrs-2.c: Ditto.
2024-11-01Support Intel MOVRSHu, Lin139-29/+762
gcc/ChangeLog: * builtins.cc (expand_builtin_prefetch): Expand for prefetchrst2. * common/config/i386/cpuinfo.h (get_available_features): Detect movrs. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_MOVRS_SET): New. (OPTION_MASK_ISA2_MOVRS_UNSET): Ditto. (ix86_handle_option): Handle -mmovrs. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_MOVRS. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for movrs. * config.gcc: Add movrsintrin.h * config/i386/cpuid.h (bit_MOVRS): New. * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE (CHAR, PCCHAR), (SHORT, PCSHORT), (INT, PCINT), (INT64, PCINT64). * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-c.cc (ix86_target_macros_internal): Add __MOVRS__. * config/i386/i386-expand.cc (ix86_expand_special_args_builtin): Define __MOVRS__. * config/i386/i386-isa.def (MOVRS): Add DEF_PTA(MOVRS) * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Handle movrs. * config/i386/i386.md (movrs<mode>): New. * config/i386/i386.opt: Add option -mmovrs. * config/i386/i386.opt.urls: Regenerated. * config/i386/immintrin.h: Include movrsintrin.h * config/i386/sse.md (unspecv): Add UNSPEC_VMOVRS. (VI1248_AVX10_2): New. (avx10_2_movrs_vmovrs<ssemodesuffix><mode><mask_name>): New define_insn. * config/i386/xmmintrin.h: Add prefetchrst2. * doc/extend.texi: Document movrs. * doc/invoke.texi: Document -mmovrs. * doc/rtl.texi: Document extension of prefetchrst2. * doc/sourcebuild.texi: Document target movrs. * config/i386/movrsintrin.h: New. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Add -mmovrs. * g++.dg/other/i386-3.C: Ditto. * gcc.c-torture/execute/builtin-prefetch-1.c: Expand rws. * gcc.dg/builtin-prefetch-1.c: Ditto. * gcc.target/i386/avx-1.c: Ditto. * gcc.target/i386/avx-2.c: Ditto. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-12.c: Add -mmovrs. * gcc.target/i386/sse-13.c: Ditto. 
* gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add movrs. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-512-movrs-1.c: New test. * gcc.target/i386/avx10_2-movrs-1.c: Ditto. * gcc.target/i386/movrs-1.c: Ditto. Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
2024-11-01Support Intel AMX-FP8Liwei Xu35-15/+973
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect amx-fp8. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_FP8_SET): New macros. (OPTION_MASK_ISA2_AMX_FP8_UNSET): Ditto. (ix86_handle_option): Handle -mamx-fp8. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_AMX_FP8. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-fp8. * config.gcc: Add amxfp8intrin.h. * config/i386/cpuid.h (bit_AMX_FP8): New. * config/i386/i386-c.cc (ix86_target_macros_internal): Define __AMX_FP8__. * config/i386/i386-isa.def (AMX_FP8): Add DEF_PTA for AMX_FP8. * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Add new ATTR. * config/i386/i386.opt: Add -mamx-fp8. * config/i386/i386.opt.urls: Regenerated. * config/i386/immintrin.h: Include amxfp8intrin.h. * doc/extend.texi: Document -mamx-fp8. * doc/invoke.texi: Document -mamx-fp8. * doc/sourcebuild.texi: Document -mamx-fp8. * config/i386/amxfp8intrin.h: New file. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Add -mamx-fp8. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/amx-check.h: Check for amx-fp8. * gcc.target/i386/amx-helper.h: Ditto. * gcc.target/i386/fp8-helper.h: Ditto. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-12.c: Add -mamx-fp8. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * lib/target-supports.exp: New proc. * gcc.target/i386/amxfp8-asmatt-1.c: New test. * gcc.target/i386/amxfp8-asmintel-1.c: Ditto. * gcc.target/i386/amxfp8-dpbf8ps-2.c: Ditto. * gcc.target/i386/amxfp8-dpbhf8ps-2.c: Ditto. * gcc.target/i386/amxfp8-dphbf8ps-2.c: Ditto. * gcc.target/i386/amxfp8-dphf8ps-2.c: Ditto. * gcc.target/i386/fp-emulation.h: Emulates NaN behaviour. Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>
2024-11-01Support Intel AMX-TRANSPOSEHaochen Jiang38-16/+857
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-TRANSPOSE. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_TRANSPOSE_SET, OPTION_MASK_ISA2_AMX_TRANSPOSE_UNSET): New. (ix86_handle_option): Handle -mamx-transpose. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_AMX_TRANSPOSE. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-transpose. * config.gcc: Add amxtransposeintrin.h. * config/i386/cpuid.h (bit_AMX_TRANSPOSE): New. * config/i386/i386-c.cc (ix86_target_macros_internal): Define __AMX_TRANSPOSE__. * config/i386/i386-isa.def (AMX_TRANSPOSE): Add DEF_PTA(AMX_TRANSPOSE). * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Handle amx-transpose. * config/i386/i386.opt: Add option -mamx-transpose. * config/i386/i386.opt.urls: Regenerated. * config/i386/immintrin.h: Include amxtransposeintrin.h. * doc/extend.texi: Document amx-transpose. * doc/invoke.texi: Document -mamx-transpose. * doc/sourcebuild.texi: Document target amx-transpose. * config/i386/amxtransposeintrin.h: New file. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Add -mamx-transpose. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/amx-check.h: Add new check for amx-transpose. (__tilepair): New. (zero_pair_tile_src): New. (check_pair_tile_register): New. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/amx-helper.h: Add amx-transpose support. (init_pair_tile_src): New function. * gcc.target/i386/sse-12.c: Add -mamx-transpose. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add amx-transpose. * gcc.target/i386/sse-23.c: Ditto. * lib/target-supports.exp (check_effective_target_amx_transposed): New. * gcc.target/i386/amxtranspose-asmatt-1.c: New test. * gcc.target/i386/amxtranspose-asmintel-1.c: Ditto. * gcc.target/i386/amxtranspose-2rpntlvw-2.c: Ditto. * gcc.target/i386/amxtranspose-conjtcmmimfp16ps-2.c: Ditto. 
* gcc.target/i386/amxtranspose-conjtfp16-2.c: Ditto. * gcc.target/i386/amxtranspose-tcmmimfp16ps-2.c: Ditto. * gcc.target/i386/amxtranspose-tcmmrlfp16ps-2.c: Ditto. * gcc.target/i386/amxtranspose-tdpbf16ps-2.c: Ditto. * gcc.target/i386/amxtranspose-tdpfp16ps-2.c: Ditto. * gcc.target/i386/amxtranspose-tmmultf32ps-2.c: Ditto. * gcc.target/i386/amxtranspose-transposed-2.c: Ditto.
2024-11-01Support Intel AMX-TF32Haochen Jiang30-15/+217
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-TF32. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_TF32_SET, OPTION_MASK_ISA2_AMX_TF32_UNSET): New. (ix86_handle_option): Handle -mamx-tf32. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_AMX_TF32. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-tf32. * config.gcc: Add amxtf32intrin.h * config/i386/cpuid.h (bit_AMX_TF32): New. * config/i386/i386-c.cc (ix86_target_macros_internal): Handle amx-tf32. * config/i386/i386-isa.def (AMX_TF32): Add DEF_PTA(AMX_TF32). * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Handle amx-tf32. * config/i386/i386.opt: Add option -mamx-tf32. * config/i386/i386.opt.urls: Regenerated. * config/i386/immintrin.h: Include amxtf32intrin.h. * doc/extend.texi: Document amx-tf32. * doc/invoke.texi: Document -mamx-tf32. * doc/sourcebuild.texi: Document target amx-tf32. * config/i386/amxtf32intrin.h: New file. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Add -mamx-tf32. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/amx-check.h: Add cpu check for AMX-TF32. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-12.c: Add -mamx-tf32. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add amx-tf32. * gcc.target/i386/sse-23.c: Ditto. * lib/target-supports.exp (check_effective_target_amx_tf32): New. * gcc.target/i386/amx-helper.h: New file for tf32 support. * gcc.target/i386/amxtf32-asmatt-1.c: New test. * gcc.target/i386/amxtf32-asmintel-1.c: Ditto. * gcc.target/i386/amxtf32-mmultf32ps-2.c: Ditto.
2024-11-01Support Intel AMX-AVX512Haochen Jiang33-19/+733
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-AVX512. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_AVX512_SET, OPTION_MASK_ISA2_AMX_AVX512_UNSET): New. (ix86_handle_option): Handle -mamx-avx512. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_AMX_AVX512. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-avx512. * config.gcc: Add amxavx512intrin.h * config/i386/cpuid.h (bit_AMX_AVX512): New. * config/i386/i386-c.cc (ix86_target_macros_internal): Handle amx-avx512. * config/i386/i386-isa.def (AMX_AVX512): Add DEF_PTA(AMX_AVX512). * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Handle amx-avx512. * config/i386/i386.opt: Add option -mamx-avx512. * config/i386/i386.opt.urls: Regenerated. * config/i386/immintrin.h: Include amxavx512intrin.h * doc/extend.texi: Document amx-avx512. * doc/invoke.texi: Document -mamx-avx512. * doc/sourcebuild.texi: Document target amx-avx512. * config/i386/amxavx512intrin.h: New file. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Add -mamx-avx512. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/amx-check.h: Add cpu check for AMX-AVX512. * gcc.target/i386/amx-helper.h: Support amx-avx512. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-12.c: Add -mamx-avx512. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add amx-avx512. * gcc.target/i386/sse-23.c: Ditto. * lib/target-supports.exp (check_effective_target_amx_avx512): New. * gcc.target/i386/amxavx512-asmatt-1.c: New test. * gcc.target/i386/amxavx512-asmintel-1.c: Ditto. * gcc.target/i386/amxavx512-cvtrowd2ps-2.c: Ditto. * gcc.target/i386/amxavx512-cvtrowps2pbf16-2.c: Ditto. * gcc.target/i386/amxavx512-cvtrowps2ph-2.c: Ditto. * gcc.target/i386/amxavx512-movrow-2.c: Ditto. Co-authored-by: Yu, Bing <bing1.yu@intel.com>