2023-10-13  RISC-V: Add test for FP llceil auto vectorization  (Pan Li, 3 files changed, -0/+114)

The below FP API is supported already by sharing the same standard name, as well as the machine mode:

  long long llceil (double);

This patch adds test cases to ensure correctness.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-llceil-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-llceil-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-llceil-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

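For orientation, a minimal sketch of the kind of loop such autovec tests exercise (mirroring the test_lfloor/test_lceil snippets quoted later in this log; the function name here is hypothetical, the committed tests are the files listed above):

  /* Expected to vectorize into a vfcvt-based loop on RVV.  */
  void
  test_llceil (long long *out, double *in, unsigned count)
  {
    for (unsigned i = 0; i < count; i++)
      out[i] = __builtin_llceil (in[i]);
  }
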
2023-10-13  C99 testsuite readiness: Some verified test case adjustments  (Florian Weimer, 7 files changed, -20/+19)

The updated test cases still reproduce the bugs with old compilers.

gcc/testsuite/

  * gcc.c-torture/compile/pc44485.c (func_21): Add missing cast.
  * gcc.c-torture/compile/pr106101.c: Use builtins to avoid calls to undeclared functions.  Change type of yyvsp to char ** and introduce yyvsp1 to avoid type errors.
  * gcc.c-torture/execute/pr111331-1.c: Add missing int.
  * gcc.dg/pr100512.c: Unreduce test case and suppress only -Wpointer-to-int-cast.
  * gcc.dg/pr103003.c: Likewise.
  * gcc.dg/pr103451.c: Add cast to long and suppress -Wdiv-by-zero only.
  * gcc.dg/pr68435.c: Avoid implicit int and missing static function implementation warning.

2023-10-13  C99 test suite readiness: Some unverified test case adjustments  (Florian Weimer, 8 files changed, -9/+20)

These changes are assumed not to interfere with the test objective, but it was not possible to reproduce the historic test case failures (with or without the modification here).

gcc/testsuite/

  * gcc.c-torture/compile/20000105-1.c: Add missing int return type.  Call __builtin_exit instead of exit.
  * gcc.c-torture/compile/20000105-2.c: Add missing void types.
  * gcc.c-torture/compile/20000211-1.c (Lstream_fputc, Lstream_write)
  (Lstream_flush_out, parse_doprnt_spec): Add missing function declaration.
  * gcc.c-torture/compile/20000224-1.c (call_critical_lisp_code): Declare.
  * gcc.c-torture/compile/20000314-2.c: Add missing void types.
  * gcc.c-torture/compile/980816-1.c (XtVaCreateManagedWidget)
  (XtAddCallback): Likewise.
  * gcc.c-torture/compile/pr49474.c: Use struct gfc_formal_arglist * instead of (implied) int type.
  * gcc.c-torture/execute/20001111-1.c (foo): Add cast to char *.
  (main): Call __builtin_abort and __builtin_exit.

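As a rough illustration of this class of cleanup (not one of the files above), pre-C99 test code often relied on implicit int and implicitly declared functions:

  /* Before (accepted under C89, invalid under C99 rules):
       main () { exit (0); }  */

  /* After: explicit return type, and __builtin_exit avoids an
     implicit declaration of exit.  */
  int
  main (void)
  {
    __builtin_exit (0);
  }
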
2023-10-13  C99 test suite readiness: Mark some C89 tests  (Florian Weimer, 5 files changed, -0/+5)

Add -std=gnu89 to some tests which evidently target C89-only language features.

gcc/testsuite/

  * gcc.c-torture/compile/920501-11.c: Compile with -std=gnu89.
  * gcc.c-torture/compile/920501-23.c: Likewise.
  * gcc.c-torture/compile/920501-8.c: Likewise.
  * gcc.c-torture/compile/920701-1.c: Likewise.
  * gcc.c-torture/compile/930529-1.c: Likewise.

2023-10-13  or1k: Fix -Wincompatible-pointer-types warning during libgcc build  (Florian Weimer, 1 file changed, -1/+1)

libgcc/

  * config/or1k/linux-unwind.h (or1k_fallback_frame_state): Add missing cast.

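This and the next three one-line fixes share the same shape.  A minimal, self-contained illustration of the warning class (the struct names here are made up; this is not the actual unwinder code):

  struct mcontext   { long gregs[32]; };
  struct sigcontext { long sc_regs[32]; };

  long
  first_reg (struct mcontext *mc)
  {
    /* Without the cast, GCC warns: initialization of
       'struct sigcontext *' from incompatible pointer type
       'struct mcontext *' [-Wincompatible-pointer-types].  */
    struct sigcontext *sc = (struct sigcontext *) mc;
    return sc->sc_regs[0];
  }
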
2023-10-13  arc: Fix -Wincompatible-pointer-types warning during libgcc build  (Florian Weimer, 1 file changed, -1/+1)

libgcc/

  * config/arc/linux-unwind.h (arc_fallback_frame_state): Add missing cast.

2023-10-13  riscv: Fix -Wincompatible-pointer-types warning during libgcc build  (Florian Weimer, 1 file changed, -1/+1)

libgcc/

  * config/riscv/linux-unwind.h (riscv_fallback_frame_state): Add missing cast.

2023-10-13  csky: Fix -Wincompatible-pointer-types warning during libgcc build  (Florian Weimer, 1 file changed, -1/+1)

libgcc/

  * config/csky/linux-unwind.h (csky_fallback_frame_state): Add missing cast.

2023-10-13  m68k: Avoid implicit function declaration in libgcc  (Florian Weimer, 1 file changed, -0/+1)

libgcc/

  * config/m68k/fpgnulib.c (__cmpdf2): Declare.

2023-10-13  libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc  (Jakub Jelinek, 2 files changed, -2/+2)

The following testcase started FAILing recently after the
https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554
glibc change, which marked vfscanf with the nonnull (1) attribute.  While vfwscanf (strangely) hasn't been marked similarly, the patch changes that test too.  Reading the value through va_arg hides it from the compiler (the volatile keyword would do too, as would making the FILE* stream a function argument, but then it might need to be guarded by #if or something).

2023-10-13  Jakub Jelinek  <jakub@redhat.com>

  * testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01): Initialize stream to va_arg(ap, FILE*) rather than 0.
  * testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01): Likewise.

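A compact sketch of the trick (the helper here is hypothetical; the actual tests do the va_arg read inline in test01):

  #include <stdarg.h>
  #include <stdio.h>

  static FILE *
  opaque_null (int dummy, ...)
  {
    va_list ap;
    va_start (ap, dummy);
    /* The compiler can no longer see that the value is a null
       constant, so the nonnull attribute on vfscanf cannot fire.  */
    FILE *f = va_arg (ap, FILE *);
    va_end (ap);
    return f;
  }

  int
  main (void)
  {
    FILE *stream = opaque_null (0, (FILE *) 0);
    (void) stream;  /* test01 passes this to the vfscanf-style calls */
    return 0;
  }
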
2023-10-13  tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA  (Richard Biener, 4 files changed, -6/+83)

The following handles byte-aligned, power-of-two and byte-multiple sized BIT_FIELD_REF reads in SRA.  In particular this should cover BIT_FIELD_REFs created by optimize_bit_field_compare.

For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF appearing there, leading to more DSE and fully eliding the aggregates.  This results in the same false positive -Wuninitialized as the older attempt to remove the folding from optimize_bit_field_compare, fixed by initializing part of the aggregate unconditionally.

  PR tree-optimization/111779

gcc/
  * tree-sra.cc (sra_handled_bf_read_p): New function.
  (build_access_from_expr_1): Handle some BIT_FIELD_REFs.
  (sra_modify_expr): Likewise.
  (make_fancy_name_1): Skip over BIT_FIELD_REF.

gcc/fortran/
  * trans-expr.cc (gfc_trans_assignment_1): Initialize lhs_caf_attr and rhs_caf_attr codimension flag to avoid false positive -Wuninitialized.

gcc/testsuite/
  * gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
  * gcc.dg/vect/vect-pr111779.c: New testcase.

2023-10-13  tree-optimization/111773 - avoid CD-DCE of noreturn special calls  (Richard Biener, 2 files changed, -0/+39)

The support for eliding calls to allocation functions in DCE runs into an issue when implementations are discovered to be noreturn: we end up DCEing the calls anyway, leaving blocks without termination and without outgoing edges, which is both invalid IL and wrong code when, as in the example, the noreturn call would throw.  The following avoids taking advantage of both noreturn and the ability to elide the allocation at the same time.

For the testcase it's valid to throw, or to return 10 by eliding the allocation.  But we have to do one or the other, where currently we'd run off the function.

  PR tree-optimization/111773
  * tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Do not elide noreturn calls that are reflected to the IL.
  * g++.dg/torture/pr111773.C: New testcase.

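A sketch of the situation (an illustration of the described testcase shape, not the exact pr111773.C):

  #include <new>

  // A replaceable operator new that never returns normally: once the
  // compiler discovers it is effectively noreturn, eliding the
  // allocation AND honoring noreturn at the same time would leave the
  // block with no outgoing edge at all.
  void *operator new (std::size_t) { throw std::bad_alloc (); }

  int
  f ()
  {
    int *p = new int (10);
    int r = *p;     // valid outcomes: throw bad_alloc, or return 10
    delete p;
    return r;
  }
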
2023-10-13  RISC-V: Add test for FP llround auto vectorization  (Pan Li, 3 files changed, -0/+114)

The below FP API is supported already by sharing the same standard name, as well as the machine mode:

  long long llround (double);

This patch adds test cases to ensure correctness.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-llround-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-llround-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-llround-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2023-10-13  RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV  (Juzhe-Zhong, 1 file changed, -2/+2)

Like ARM SVE and GCN, add RVV.

gcc/testsuite/ChangeLog:

  * gcc.dg/vect/bb-slp-pr69907.c: Add RVV.

2023-10-13  RISC-V: Add test for FP iroundf auto vectorization  (Pan Li, 3 files changed, -0/+112)

The below FP API is supported already by sharing the same standard name, as well as the machine mode:

  int iroundf (float);

This patch adds test cases to ensure correctness.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-iround-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-iround-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-iround-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2023-10-12  RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512  (Kito Cheng, 4 files changed, -3/+43)

riscv_legitimize_poly_move was expected to ensure the poly value is at most 32 times smaller than the minimal VLEN (32 being derived from 4096 / 128).  This assumption held when our mode modeling was not so precisely defined.  However, now that we have modeled the mode size according to the correct minimal VLEN info, the size difference between different RVV modes can be up to 64 times.  For instance, comparing RVVMF64BI and RVVMF1BI, the sizes are [1, 1] versus [64, 64] respectively.

gcc/ChangeLog:

  * config/riscv/riscv.cc (riscv_legitimize_poly_move): Bump max_power to 64.
  * config/riscv/riscv.h (MAX_POLY_VARIANT): New.

gcc/testsuite/ChangeLog:

  * g++.target/riscv/rvv/autovec/bug-01.C: New.
  * g++.target/riscv/rvv/rvv.exp: Add autovec folder.

2023-10-13  RISC-V: Leverage stdint-gcc.h for RVV test cases  (Pan Li, 3 files changed, -2/+2)

Leverage stdint-gcc.h for the int64_t types instead of a typedef; otherwise we may conflict with stdint-gcc.h included somewhere else.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: Include stdint-gcc.h for int types.
  * gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: Ditto.
  * gcc.target/riscv/rvv/autovec/unop/test-math.h: Remove int64_t typedef.

Signed-off-by: Pan Li <pan2.li@intel.com>

2023-10-13  RISC-V: Support FP lfloor/lfloorf auto vectorization  (Pan Li, 9 files changed, -0/+259)

This patch supports FP lfloor/lfloorf auto vectorization:

  * long lfloor (double) for rv64
  * long lfloorf (float) for rv32

Due to the limitation that only data types of the same size are allowed in the vectorizer, the standard name lfloormn2 only acts on DF => DI for rv64, and SF => SI for rv32.

Given we have code like:

  void
  test_lfloor (long *out, double *in, unsigned count)
  {
    for (unsigned i = 0; i < count; i++)
      out[i] = __builtin_lfloor (in[i]);
  }

Before this patch:

  .L3:
    ...
    fld      fa5,0(a1)
    fcvt.l.d a5,fa5,rdn
    sd       a5,-8(a0)
    ...
    bne      a1,a4,.L3

After this patch:

  frrm  a6
  ...
  fsrmi 2 // RDN
  .L3:
    ...
    vsetvli     a3,zero,e64,m1,ta,ma
    vfcvt.x.f.v v1,v1
    vsetvli     zero,a2,e64,m1,ta,ma
    vse32.v     v1,0(a0)
    ...
    bne         a2,zero,.L3
  ...
  fsrm  a6

The rest, like SF => DI / HF => DI / DF => SI / HF => SI, will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

  * config/riscv/autovec.md (lfloor<mode><v_i_l_ll_convert>2): New pattern for lfloor/lfloorf.
  * config/riscv/riscv-protos.h (enum insn_type): New enum value.
  (expand_vec_lfloor): New func decl for expanding lfloor.
  * config/riscv/riscv-v.cc (expand_vec_lfloor): New func impl for expanding lfloor.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2023-10-13  testsuite: Replace many dg-require-thread-fence with dg-require-atomic-cmpxchg-word  (Hans-Peter Nilsson, 9 files changed, -9/+9)

These tests actually use a form of atomic compare-and-exchange operation, not just atomic loading and storing.  Some targets (not supported by e.g. libatomic) have atomic loading and storing, but not compare-and-exchange, yielding linker errors for missing library functions.

This change covers only existing uses of dg-require-thread-fence.  It does not fix any other tests that should also be gated on dg-require-atomic-cmpxchg-word.

  * testsuite/29_atomics/atomic/compare_exchange_padding.cc,
  testsuite/29_atomics/atomic_flag/clear/1.cc,
  testsuite/29_atomics/atomic_flag/cons/value_init.cc,
  testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc,
  testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc,
  testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc,
  testsuite/29_atomics/atomic_ref/generic.cc,
  testsuite/29_atomics/atomic_ref/integral.cc,
  testsuite/29_atomics/atomic_ref/pointer.cc: Replace dg-require-thread-fence with dg-require-atomic-cmpxchg-word.

2023-10-13  testsuite: Add dg-require-atomic-cmpxchg-word  (Hans-Peter Nilsson, 2 files changed, -0/+46)

Some targets (armv6-m) support inline atomic load and store, i.e. dg-require-thread-fence matches, but not atomic operations like compare and exchange.  This directive can be used to replace uses of dg-require-thread-fence where an atomic operation is actually used.

  * testsuite/lib/dg-options.exp (dg-require-atomic-cmpxchg-word): New proc.
  * testsuite/lib/libstdc++.exp (check_v3_target_atomic_cmpxchg_word): Ditto.

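For context, a sketch of how such a directive would be used in a libstdc++ test that really performs a compare-and-exchange (assuming the usual dg-require-* argument convention; not one of the committed files):

  // { dg-do run }
  // { dg-require-atomic-cmpxchg-word "" }
  #include <atomic>

  int
  main ()
  {
    std::atomic<int> a{0};
    int expected = 0;
    a.compare_exchange_strong (expected, 1);  // needs real cmpxchg
    return a.load () == 1 ? 0 : 1;
  }
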
2023-10-13  Daily bump.  (GCC Administrator, 8 files changed, -1/+597)

2023-10-13  RISC-V: Support FP lceil/lceilf auto vectorization  (Pan Li, 9 files changed, -0/+259)

This patch supports FP lceil/lceilf auto vectorization:

  * long lceil (double) for rv64
  * long lceilf (float) for rv32

Due to the limitation that only data types of the same size are allowed in the vectorizer, the standard name lceilmn2 only acts on DF => DI for rv64, and SF => SI for rv32.

Given we have code like:

  void
  test_lceil (long *out, double *in, unsigned count)
  {
    for (unsigned i = 0; i < count; i++)
      out[i] = __builtin_lceil (in[i]);
  }

Before this patch:

  .L3:
    ...
    fld      fa5,0(a1)
    fcvt.l.d a5,fa5,rup
    sd       a5,-8(a0)
    ...
    bne      a1,a4,.L3

After this patch:

  frrm  a6
  ...
  fsrmi 3 // RUP
  .L3:
    ...
    vsetvli     a3,zero,e64,m1,ta,ma
    vfcvt.x.f.v v1,v1
    vsetvli     zero,a2,e64,m1,ta,ma
    vse32.v     v1,0(a0)
    ...
    bne         a2,zero,.L3
  ...
  fsrm  a6

The rest, like SF => DI / HF => DI / DF => SI / HF => SI, will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

  * config/riscv/autovec.md (lceil<mode><v_i_l_ll_convert>2): New pattern for lceil/lceilf.
  * config/riscv/riscv-protos.h (enum insn_type): New enum value.
  (expand_vec_lceil): New func decl for expanding lceil.
  * config/riscv/riscv-v.cc (expand_vec_lceil): New func impl for expanding lceil.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2023-10-12  PR111778, PowerPC: Do not depend on an undefined shift  (Michael Meissner, 1 file changed, -3/+26)

I was building a cross compiler to PowerPC on my x86_64 workstation with the latest version of GCC on October 11th.  I could not build the compiler on the x86_64 system as it died in building libgcc.  I looked into it, and I discovered the compiler was recursing until it ran out of stack space.  If I build a native compiler with the same sources on a PowerPC system, it builds fine.

I traced this down to a change made around October 10th:

  | commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
  | Author: Jiufu Guo <guojiufu@linux.ibm.com>
  | Date:   Tue Jan 10 20:52:33 2023 +0800
  |
  |     rs6000: build constant via li/lis;rldicl/rldicr
  |
  |     If a constant is possible left/right cleaned on a rotated value from
  |     a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
  |     to build the constant.

The code was doing a -1 << 64, which is undefined behavior because different machines produce different results.  On the x86_64 system, (-1 << 64) produces -1, while on a PowerPC 64-bit system, (-1 << 64) produces 0.  The x86_64 build then recurses until the stack runs out of space.

If I apply this patch, the compiler builds fine both on x86_64 as a PowerPC cross compiler and natively on a PowerPC system.

2023-10-12  Michael Meissner  <meissner@linux.ibm.com>

gcc/

  PR target/111778
  * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): Protect code from shifts that are undefined.
  (can_be_built_by_li_lis_and_rldicr): Likewise.
  (can_be_built_by_li_and_rldic): Protect code from shifts that are undefined.  Also replace uses of 1ULL with HOST_WIDE_INT_1U.

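The generic shape of such a guard (a hypothetical helper for illustration, not the actual rs6000.cc hunk):

  #include <stdint.h>

  /* Shifting a 64-bit value by 64 or more bits is undefined in C, and
     x86-64 and PowerPC hardware happen to disagree on the result, so
     check the count before shifting.  */
  static inline uint64_t
  safe_shl64 (uint64_t value, unsigned int count)
  {
    if (count >= 64)
      return 0;
    return value << count;
  }
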
2023-10-12  libgomp.texi: Clarify OMP_TARGET_OFFLOAD=mandatory  (Tobias Burnus, 1 file changed, -5/+16)

In OpenMP 5.0/5.1, the semantics of OMP_TARGET_OFFLOAD=mandatory were insufficiently specified; 5.2 clarified this with extensions and clarifications (omp_initial_device, omp_invalid_device, "conforming device number").  GCC's implementation matches OpenMP 5.2.

libgomp/ChangeLog:

  * libgomp.texi (OMP_DEFAULT_DEVICE): Update spec ref; add @ref to OMP_TARGET_OFFLOAD.
  (OMP_TARGET_OFFLOAD): Update spec ref; add @ref to OMP_DEFAULT_DEVICE; clarify MANDATORY behavior.

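A minimal illustration of what the environment variable governs (standard OpenMP API calls only; compile with -fopenmp and an offloading-enabled GCC):

  #include <omp.h>
  #include <stdio.h>

  int
  main (void)
  {
    int on_host = 1;
    /* With OMP_TARGET_OFFLOAD=mandatory set in the environment, this
       region must be offloaded (or the program must fail per the
       OpenMP 5.2 rules) instead of silently falling back to the host. */
  #pragma omp target map(tofrom: on_host)
    on_host = omp_is_initial_device ();
    printf ("ran on %s\n", on_host ? "host" : "device");
    return 0;
  }
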
2023-10-12  reg-notes.def: Fix up description of REG_NOALIAS  (Alex Coplan, 1 file changed, -2/+3)

The description of the REG_NOALIAS note in reg-notes.def isn't quite right.  It describes the note as being attached to call insns, but it is instead attached to a move insn receiving the return value from a call.

This can be seen by looking at the code in calls.cc:expand_call which attaches the note:

  emit_move_insn (temp, valreg);

  /* The return value from a malloc-like function cannot alias
     anything else.  */
  last = get_last_insn ();
  add_reg_note (last, REG_NOALIAS, temp);

gcc/ChangeLog:

  * reg-notes.def (NOALIAS): Correct comment.

2023-10-12  RISC-V: Make xtheadcondmov-indirect tests robust against instruction reordering  (Christoph Müllner, 1 file changed, -60/+29)

Fixes: c1bc7513b1d7 ("RISC-V: const: hide mvconst splitter from IRA")

A recent change broke the xtheadcondmov-indirect tests because the order of emitted instructions changed.  Since the tests are too strict when checking for a fixed instruction order, let's change them to simply count instructions, as is done for similar tests.

Reported-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/xtheadcondmov-indirect.c: Make robust against instruction reordering.

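In dejagnu terms, the robust style looks roughly like this (mnemonics and counts here are illustrative, not the committed patterns):

  /* Fragile: matches one exact, order-sensitive instruction sequence.
     { dg-final { scan-assembler {th\.mveqz\t[^\n]*\n\tret} } }  */

  /* Robust: only the number of conditional-move instructions matters.
     { dg-final { scan-assembler-times {th\.mveqz} 4 } }
     { dg-final { scan-assembler-times {th\.mvnez} 4 } }  */
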
2023-10-12  wide-int: Fix build with gcc < 12 or clang++ [PR111787]  (Jakub Jelinek, 3 files changed, -4/+21)

While my wide_int patch bootstrapped/regtested fine when I used GCC 12 as the system gcc, apparently it doesn't with GCC 11 and older or with clang++.  For GCC before the PR96555 C++ DR1315 implementation, the compiler complains about a template argument involving template parameters; clang++ does the same and additionally complains about a missing needs_write_val_arg static data member in some wi::int_traits specializations.

2023-10-12  Jakub Jelinek  <jakub@redhat.com>

  PR bootstrap/111787
  * tree.h (wi::int_traits <unextended_tree>::needs_write_val_arg): New static data member.
  (int_traits <extended_tree <N>>::needs_write_val_arg): Likewise.
  (wi::ints_for): Provide separate partial specializations for generic_wide_int <extended_tree <N>> and INL_CONST_PRECISION or that and CONST_PRECISION, rather than using int_traits <extended_tree <N> >::precision_type as the second template argument.
  * rtl.h (wi::int_traits <rtx_mode_t>::needs_write_val_arg): New static data member.
  * double-int.h (wi::int_traits <double_int>::needs_write_val_arg): Likewise.

2023-10-12  RISCV: Bugfix for incorrect documentation heading nesting  (Mary Bennett, 1 file changed, -1/+1)

  PR middle-end/111777

gcc/ChangeLog:

  * doc/extend.texi: Change subsubsection to subsection for CORE-V built-ins.

2023-10-12  AArch64: Fix Armv9-a warnings that get emitted whenever an ACLE header is used  (Tamar Christina, 2 files changed, -0/+6)

At the moment, trying to use -march=armv9-a with any ACLE header such as arm_neon.h results in rows and rows of warnings saying:

  <built-in>: warning: "__ARM_ARCH" redefined
  <built-in>: note: this is the location of the previous definition

This is obviously not useful and happens because the macro was defined for __ARM_ARCH == 8 and the command line changes it.

The Arm port solves this by undefining the macro during argument processing, and we do the same on AArch64 for the majority of macros.  However, we define this macro using a different helper, which requires a manual undef.

Thanks,
Tamar

gcc/ChangeLog:

  * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Add undef.

gcc/testsuite/ChangeLog:

  * gcc.target/aarch64/armv9_warning.c: New test.

2023-10-12  wide-int: Add simple CHECKING_P stack-protector-canary-like checking  (Jakub Jelinek, 1 file changed, -0/+5)

This patch adds hopefully-not-so-expensive --enable-checking=yes verification that the widest_int upper length bound estimates really are upper bounds and that nothing attempts to write more elements.  It is done only if the estimated upper length bound is smaller than WIDE_INT_MAX_INL_ELTS, but that should be the most common case unless large _BitInt is involved.

2023-10-12  Jakub Jelinek  <jakub@redhat.com>

  * wide-int.h (widest_int_storage <N>::write_val): If l is small and there is space in u.val array, store a canary value at the end when checking.
  (widest_int_storage <N>::set_len): Check the canary hasn't been overwritten.

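The scheme, in miniature (a simplified sketch: the struct, helper names and sentinel value are made up; the real code lives in widest_int_storage::write_val / set_len):

  #include <assert.h>

  #define MAX_INL_ELTS 9                  /* inline limb capacity */
  #define CANARY 0x5555555555555555ULL    /* arbitrary sentinel */

  struct demo_storage
  {
    unsigned long long val[MAX_INL_ELTS];
    unsigned len;
  };

  /* The caller passes an upper-bound estimate of how many limbs it
     will write; plant a canary just past that bound if there is room. */
  static unsigned long long *
  write_val (struct demo_storage *s, unsigned estimated_len)
  {
    if (estimated_len < MAX_INL_ELTS)
      s->val[estimated_len] = CANARY;
    s->len = estimated_len;
    return s->val;
  }

  /* At set_len time, a clobbered canary means the length estimate was
     not actually an upper bound.  */
  static void
  set_len (struct demo_storage *s, unsigned actual_len)
  {
    if (s->len < MAX_INL_ELTS)
      assert (s->val[s->len] == CANARY);
    s->len = actual_len;
  }
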
2023-10-12  wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]  (Jakub Jelinek, 35 files changed, -297/+1035)

As mentioned in the _BitInt support thread, _BitInt(N) is currently limited by the wide_int/widest_int maximum precision limitation, which is, depending on the target, 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION).  That is a fairly low limit for _BitInt, especially on the targets with the 191-bit limitation.

The following patch bumps that limit to 16319 bits on all arches (which support _BitInt at all), which is the limit imposed by the INTEGER_CST representation (unsigned char members holding the number of HOST_WIDE_INT limbs).

In order to achieve that, wide_int is changed from a trivially copyable type which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or 11 limbs depending on target) limbs into a non-trivially copy constructible, copy assignable and destructible type which for the usual small cases (up to WIDE_INT_MAX_INL_ELTS, which is the former WIDE_INT_MAX_ELTS) still uses an inline array of limbs, but for larger precisions uses a heap-allocated limb array.  This makes wide_int unusable in GC structures, so for dwarf2out, which was the only place that needed it, there is a new rwide_int type (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs inline and is trivially copyable (dwarf2out should never deal with large _BitInt constants; those should have been lowered earlier).

Similarly, widest_int has been changed from a trivially copyable type which also contained an inline array of WIDE_INT_MAX_ELTS limbs (but unlike wide_int didn't contain precision and assumed that to be WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy assignable and destructible type which always has WIDEST_INT_MAX_PRECISION precision (32640 bits currently, twice as much as the INTEGER_CST limitation allows) and, unlike wide_int, decides depending on the get_len () value whether it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or a heap-allocated one.

In wide-int.h this means we need to estimate an upper bound on how many limbs wide-int.cc (usually; sometimes wide-int.h) will need to write, heap allocate if needed based on that estimate, and upon set_len, which is done at the end, if we guessed over WIDE_INT_MAX_INL_ELTS and allocated dynamically while we actually need less than that, copy/deallocate.  The inexact guesses are needed because the exact computation of the length in wide-int.cc is sometimes quite complex, and especially canonicalize at the end can decrease it.

widest_int is, again because of this, not usable in GC structures, so cfgloop.h has been changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and punt if we'd have larger _BitInt-based iterators; programs having more than 128-bit iterators will hopefully be rare, and I think it is fine to treat loops with more than 2^127 iterations as effectively possibly infinite.  omp-general.cc is changed to use fixed_wide_int_storage <1024>, as it had better support scores with the same precision on all arches.

Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing wide_int/widest_int into a buffer had to be changed to use XALLOCAVEC for larger lengths.

On x86_64, the patch in an --enable-checking=yes,rtl,extra configured bootstrapped cc1plus enlarges the .text section by 1.01% (from 0x25725a5 to 0x25e5555) and similarly, at least when compiling insn-recog.cc with the usual bootstrap options, slows compilation down by 1.01%: user 4m22.046s and 4m22.384s on vanilla trunk vs. 4m25.947s and 4m25.581s on the patched trunk.

I'm afraid some code size growth and compile time slowdown is unavoidable in this case; we use wide_int and widest_int everywhere, and while the rare cases are marked with UNLIKELY macros, it still means extra checks for them.

The patch also regresses:

  +FAIL: gm2/pim/fail/largeconst.mod,  -O
  +FAIL: gm2/pim/fail/largeconst.mod,  -O -g
  +FAIL: gm2/pim/fail/largeconst.mod,  -O3 -fomit-frame-pointer
  +FAIL: gm2/pim/fail/largeconst.mod,  -O3 -fomit-frame-pointer -finline-functions
  +FAIL: gm2/pim/fail/largeconst.mod,  -Os
  +FAIL: gm2/pim/fail/largeconst.mod,  -g
  +FAIL: gm2/pim/fail/largeconst2.mod,  -O
  +FAIL: gm2/pim/fail/largeconst2.mod,  -O -g
  +FAIL: gm2/pim/fail/largeconst2.mod,  -O3 -fomit-frame-pointer
  +FAIL: gm2/pim/fail/largeconst2.mod,  -O3 -fomit-frame-pointer -finline-functions
  +FAIL: gm2/pim/fail/largeconst2.mod,  -Os
  +FAIL: gm2/pim/fail/largeconst2.mod,  -g

These tests previously were rejected with

  error: constant literal ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’ exceeds internal ZTYPE range

kind of errors, but now are accepted.  It seems the FE tries to parse constants into widest_int in that case and only diagnoses if widest_int overflows; that seems wrong.  It should at least punt if stuff doesn't fit into WIDE_INT_MAX_PRECISION, but perhaps far less than that; if it wants middle-end support for precisions above 128 bits, it had better be using BITINT_TYPE.  Will file a PR and defer to the Modula-2 maintainer.

2023-10-12  Jakub Jelinek  <jakub@redhat.com>

  PR c/102989
  * wide-int.h: Adjust file comment.
  (WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS.
  (WIDE_INT_MAX_INL_PRECISION): Define.
  (WIDE_INT_MAX_ELTS): Change to 255.  Assert that WIDE_INT_MAX_INL_ELTS is smaller than WIDE_INT_MAX_ELTS.
  (RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS, WIDEST_INT_MAX_PRECISION): Define.
  (WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers to pass 0 as a new argument.
  (class widest_int_storage): Likewise.
  (widest_int, widest2_int): Change typedefs to use widest_int_storage rather than fixed_wide_int_storage.
  (enum wi::precision_type): Add INL_CONST_PRECISION enumerator.
  (struct binary_traits): Add partial specializations for INL_CONST_PRECISION.
  (generic_wide_int): Add needs_write_val_arg static data member.
  (int_traits): Likewise.
  (wide_int_storage): Replace val non-static data member with a union u of it and HOST_WIDE_INT *valp.  Declare copy constructor, copy assignment operator and destructor.  Add unsigned int argument to write_val.
  (wide_int_storage::wide_int_storage): Initialize precision to 0 in the default ctor.  Remove unnecessary {}s around STATIC_ASSERTs.  Assert in non-default ctor T's precision_type is not INL_CONST_PRECISION and allocate u.valp for large precision.  Add copy constructor.
  (wide_int_storage::~wide_int_storage): New.
  (wide_int_storage::operator=): Add copy assignment operator.  In assignment operator remove unnecessary {}s around STATIC_ASSERTs, assert ctor T's precision_type is not INL_CONST_PRECISION and if precision changes, deallocate and/or allocate u.valp.
  (wide_int_storage::get_val): Return u.valp rather than u.val for large precision.
  (wide_int_storage::write_val): Likewise.  Add an unused unsigned int argument.
  (wide_int_storage::set_len): Use write_val instead of writing val directly.
  (wide_int_storage::from, wide_int_storage::from_array): Adjust write_val callers.
  (wide_int_storage::create): Allocate u.valp for large precisions.
  (wi::int_traits <wide_int_storage>::get_binary_precision): New.
  (fixed_wide_int_storage::fixed_wide_int_storage): Make default ctor defaulted.
  (fixed_wide_int_storage::write_val): Add unused unsigned int argument.
  (fixed_wide_int_storage::from, fixed_wide_int_storage::from_array): Adjust write_val callers.
  (wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New.
  (WIDEST_INT): Define.
  (widest_int_storage): New template class.
  (wi::int_traits <widest_int_storage>): New.
  (trailing_wide_int_storage::write_val): Add unused unsigned int argument.
  (wi::get_binary_precision): Use wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision rather than get_precision on get_binary_result.
  (wi::copy): Adjust write_val callers.  Don't call set_len if needs_write_val_arg.
  (wi::bit_not): If result.needs_write_val_arg, call write_val again with upper bound estimate of len.
  (wi::sext, wi::zext, wi::set_bit): Likewise.
  (wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not, wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc, wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc, wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round, wi::lshift, wi::lrshift, wi::arshift): Likewise.
  (wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg is false.
  (gt_ggc_mx, gt_pch_nx): Remove generic template for all generic_wide_int, instead add functions and templates for each storage of generic_wide_int.  Make functions for generic_wide_int <wide_int_storage> and templates for generic_wide_int <widest_int_storage <N>> deleted.
  (wi::mask, wi::shifted_mask): Adjust write_val calls.
  * wide-int.cc (zeros): Decrease array size to 1.
  (BLOCKS_NEEDED): Use CEIL.
  (canonize): Use HOST_WIDE_INT_M1.
  (wi::from_buffer): Pass 0 to write_val.
  (wi::to_mpz): Use CEIL.
  (wi::from_mpz): Likewise.  Pass 0 to write_val.  Use WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS.
  (wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec above WIDE_INT_MAX_INL_PRECISION estimate precision from lengths of operands.  Use XALLOCAVEC allocated buffers for prec above WIDE_INT_MAX_INL_PRECISION.
  (wi::divmod_internal): Likewise.
  (wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate it from xlen and skip.
  (rshift_large_common): Remove xprecision argument, add len argument with len computed in caller.  Don't return anything.
  (wi::lrshift_large, wi::arshift_large): Compute len here and pass it to rshift_large_common, for lengths above WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible.
  (assert_deceq, assert_hexeq): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
  (test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION.
  * wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION.
  * wide-int-print.cc (print_decs, print_decu, print_hex): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
  * tree.h (wi::int_traits<extended_tree <N>>): Change precision_type to INL_CONST_PRECISION for N == ADDR_MAX_PRECISION.
  (widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION.
  (wi::ints_for): Use int_traits <extended_tree <N> >::precision_type instead of hard coded CONST_PRECISION.
  (widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION.
  (wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather than WIDE_INT_MAX_PRECISION.
  (wi::ints_for::zero): Use wi::int_traits <wi::extended_tree <N> >::precision_type instead of wi::CONST_PRECISION.
  * tree.cc (build_replicated_int_cst): Formatting fix.  Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS.
  * print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on INTEGER_CSTs, TREE_VECs or SSA_NAMEs.
  * double-int.h (wi::int_traits <double_int>::precision_type): Change to INL_CONST_PRECISION from CONST_PRECISION.
  * poly-int.h (struct poly_coeff_traits): Add partial specialization for wi::INL_CONST_PRECISION.
  * cfgloop.h (bound_wide_int): New typedef.
  (struct nb_iter_bound): Change bound type from widest_int to bound_wide_int.
  (struct loop): Change nb_iterations_upper_bound, nb_iterations_likely_upper_bound and nb_iterations_estimate type from widest_int to bound_wide_int.
  * cfgloop.cc (record_niter_bound): Return early if wi::min_precision of i_bound is too large for bound_wide_int.  Adjustments for the widest_int to bound_wide_int type change in non-static data members.
  (get_estimated_loop_iterations, get_max_loop_iterations, get_likely_max_loop_iterations): Adjustments for the widest_int to bound_wide_int type change in non-static data members.
  * tree-vect-loop.cc (vect_transform_loop): Likewise.
  * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use XALLOCAVEC allocated buffer for i_bound len above WIDE_INT_MAX_INL_ELTS.
  (record_estimate): Return early if wi::min_precision of i_bound is too large for bound_wide_int.  Adjustments for the widest_int to bound_wide_int type change in non-static data members.
  (wide_int_cmp): Use bound_wide_int instead of widest_int.
  (bound_index): Use bound_wide_int instead of widest_int.
  (discover_iteration_bound_by_body_walk): Likewise.  Use widest_int::from to convert it to widest_int when passed to record_niter_bound.
  (maybe_lower_iteration_bound): Use widest_int::from to convert it to widest_int when passed to record_niter_bound.
  (estimate_numbers_of_iteration): Don't record upper bound if loop->nb_iterations has too large precision for bound_wide_int.
  (n_of_executions_at_most): Use widest_int::from.
  * tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for the widest_int to bound_wide_int changes.
  * match.pd (fold_sign_changed_comparison simplification): Use wide_int::from on wi::to_wide instead of wi::to_widest.
  * value-range.h (irange::maybe_resize): Avoid using memcpy on non-trivially copyable elements.
  * value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE.
  * fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc): Use wide_int::from on wi::to_wide instead of wi::to_widest.
  * tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width before calling wi::udiv_trunc.
  * lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to bound_wide_int type change in non-static data members.
  * lto-streamer-in.cc (input_cfg): Likewise.
  (lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS.  For length above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.  Formatting fix.
  * data-streamer-in.cc (streamer_read_wide_int, streamer_read_widest_int): Likewise.
  * tree-affine.cc (aff_combination_expand): Use placement new to construct name_expansion.
  (free_name_expansion): Destruct name_expansion.
  * gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change index type from widest_int to offset_int.
  (class incr_info_d): Change incr type from widest_int to offset_int.
  (alloc_cand_and_find_basis, backtrace_base_for_ref, restructure_reference, slsr_process_ref, create_mul_ssa_cand, create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand, slsr_process_add, cand_abs_increment, replace_mult_candidate, replace_unconditional_candidate, incr_vec_index, create_add_on_incoming_edge, create_phi_basis_1, replace_conditional_candidate, record_increment, record_phi_increments_1, phi_incr_cost_1, phi_incr_cost, lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis, nearest_common_dominator_for_cands, insert_initializers, all_phi_incrs_profitable_1, replace_one_candidate, replace_profitable_candidates): Use offset_int rather than widest_int and wi::to_offset rather than wi::to_widest.
  * real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than 2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC allocated buffer.
  * tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new to construct tree_niter_desc and destruct it on failure.
  (free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL.
  * gengtype.cc (main): Remove widest_int handling.
  * graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS.
  * gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and assert get_len () fits into it.
  * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
  * gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use wide_int::from on wi::to_wide instead of wi::to_widest.
  * omp-general.cc (score_wide_int): New typedef.
  (omp_context_compute_score): Use score_wide_int instead of widest_int and adjust for those changes.
  (struct omp_declare_variant_entry): Change score and score_in_declare_simd_clone non-static data member type from widest_int to score_wide_int.
  (omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use score_wide_int instead of widest_int and adjust for those changes.
  (omp_lto_output_declare_variant_alt): Likewise.
  (omp_lto_input_declare_variant_alt): Likewise.
  * godump.cc (go_output_typedef): Assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS.

gcc/c-family/
  * c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS.

gcc/testsuite/
  * gcc.dg/bitint-38.c: New test.

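To make the storage scheme concrete, here is a deliberately simplified sketch of the inline-vs-heap idea described above (an assumed simplification; the real wide_int_storage differs in many details).  The non-trivial copy constructor and destructor are exactly what makes such a type unusable in GC structures:

  #include <cstring>

  typedef long long HWI;
  static const unsigned MAX_INL = 9;   // the former inline-only capacity

  struct sbo_storage               // small-buffer-optimized limb storage
  {
    union { HWI val[MAX_INL]; HWI *valp; } u;
    unsigned precision, len;

    bool large () const { return precision > MAX_INL * 64; }

    explicit sbo_storage (unsigned prec) : precision (prec), len (0)
    { if (large ()) u.valp = new HWI[(prec + 63) / 64]; }

    sbo_storage (const sbo_storage &o) : precision (o.precision), len (o.len)
    {
      HWI *dst = large () ? (u.valp = new HWI[(precision + 63) / 64]) : u.val;
      std::memcpy (dst, o.large () ? o.u.valp : o.u.val, len * sizeof (HWI));
    }

    ~sbo_storage () { if (large ()) delete[] u.valp; }

    // Assignment omitted for brevity; it must additionally handle
    // precision changes by deallocating/reallocating the heap buffer.
    sbo_storage &operator= (const sbo_storage &) = delete;
  };
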
2023-10-12  LibF7: Implement atan2.  (Georg-Johann Lay, 3 files changed, -3/+61)

libgcc/config/avr/libf7/

  * libf7.c (F7MOD_atan2_, f7_atan2): New module and function.
  * libf7.h: Adjust comments.
  * libf7-common.mk (CALL_PROLOGUES): Add atan2.

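For reference, the textbook reduction such an implementation typically builds on (an assumption about the general approach, not necessarily f7_atan2's exact algorithm):

  #include <math.h>

  static double
  atan2_sketch (double y, double x)
  {
    if (x > 0)
      return atan (y / x);               /* right half-plane */
    if (x < 0)                           /* left half-plane: shift by pi */
      return y >= 0 ? atan (y / x) + M_PI : atan (y / x) - M_PI;
    /* x == 0: straight up or down; y == 0 and the IEEE special cases
       are left out of this sketch.  */
    return y > 0 ? M_PI / 2 : -M_PI / 2;
  }
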
2023-10-12  RISC-V: Support FP lround/lroundf auto vectorization  (Pan Li, 9 files changed, -0/+264)

This patch supports FP lround/lroundf auto vectorization:

  * long lround (double) for rv64
  * long lroundf (float) for rv32

Due to the limitation that only data types of the same size are allowed in the vectorizer, the standard name lroundmn2 only acts on DF => DI for rv64, and SF => SI for rv32.

Given we have code like:

  void
  test_lround (long *out, double *in, unsigned count)
  {
    for (unsigned i = 0; i < count; i++)
      out[i] = __builtin_lround (in[i]);
  }

Before this patch:

  .L3:
    ...
    fld      fa5,0(a1)
    fcvt.l.d a5,fa5,rmm
    sd       a5,-8(a0)
    ...
    bne      a1,a4,.L3

After this patch:

  frrm  a6
  ...
  fsrmi 4 // RMM
  .L3:
    ...
    vsetvli     a3,zero,e64,m1,ta,ma
    vfcvt.x.f.v v1,v1
    vsetvli     zero,a2,e64,m1,ta,ma
    vse32.v     v1,0(a0)
    ...
    bne         a2,zero,.L3
  ...
  fsrm  a6

The rest, like SF => DI / HF => DI / DF => SI / HF => SI, will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

  * config/riscv/autovec.md (lround<mode><v_i_l_ll_convert>2): New pattern for lround/lroundf.
  * config/riscv/riscv-protos.h (enum insn_type): New enum value.
  (expand_vec_lround): New func decl for expanding lround.
  * config/riscv/riscv-v.cc (expand_vec_lround): New func impl for expanding lround.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c: New test.
  * gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-lround-0.c: New test.
  * gcc.target/riscv/rvv/autovec/vls/math-lround-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2023-10-12  dwarf2out: Stop using wide_int in GC structures  (Jakub Jelinek, 2 files changed, -17/+55)

The planned wide_int/widest_int changes to support larger precisions make wide_int and widest_int unusable in GC structures, because they gain non-trivial destructors (and may point to heap-allocated memory).

dwarf2out.{h,cc} is the only user of wide_int in GC structures, for val_wide, but it actually doesn't need much: all those values are at one point created from a const wide_int_ref & and never changed afterwards, with just a couple of methods used on them.  So, this patch replaces the use of wide_int there with a new class, dw_wide_int, which contains just precision and len fields and the limbs in a trailing array.  Most needed methods are implemented directly; only for the most complicated cases does it temporarily construct a wide_int_ref from itself and call its methods.

2023-10-12  Jakub Jelinek  <jakub@redhat.com>

  * dwarf2out.h (wide_int_ptr): Remove.
  (dw_wide_int_ptr): New typedef.
  (struct dw_val_node): Change type of val_wide from wide_int_ptr to dw_wide_int_ptr.
  (struct dw_wide_int): New type.
  (dw_wide_int::elt): New method.
  (dw_wide_int::operator ==): Likewise.
  * dwarf2out.cc (get_full_len): Change argument type to const dw_wide_int & from const wide_int &.  Use CEIL.  Call get_precision method instead of calling wi::get_precision.
  (alloc_dw_wide_int): New function.
  (add_AT_wide): Change w argument type to const wide_int_ref & from const wide_int &.  Use alloc_dw_wide_int.
  (mem_loc_descriptor, loc_descriptor): Use alloc_dw_wide_int.
  (insert_wide_int): Change val argument type to const wide_int_ref & from const wide_int &.
  (add_const_value_attribute): Pass rtx_mode_t temporary directly to add_AT_wide instead of using a temporary variable.

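The trailing-array layout, in miniature (a sketch following the description above; the field layout matches the description, but the helper is hypothetical, not the exact dwarf2out declarations):

  #include <stdlib.h>

  /* Trivially destructible, so safe in GC structures: the limbs live
     in a trailing array sized at allocation time, not behind a
     separately owned pointer.  */
  struct dw_wide_int_sketch
  {
    unsigned int precision;
    unsigned int len;
    long long val[1];            /* really 'len' limbs, len >= 1 */
  };

  static struct dw_wide_int_sketch *
  alloc_dw_wide_int_sketch (unsigned int len)
  {
    return (struct dw_wide_int_sketch *)
      calloc (1, sizeof (struct dw_wide_int_sketch)
                 + (len - 1) * sizeof (long long));
  }
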
2023-10-12  tree-optimization/111764 - wrong reduction vectorization  (Richard Biener, 2 files changed, -12/+19)

The following removes a misguided attempt to allow x + x in a reduction path, which also allowed x * x, and that isn't valid.  x + x actually never arrives this way; instead it is canonicalized to 2 * x.  This makes the reduction path handling consistent with how we handle the single-stmt reduction case.

  PR tree-optimization/111764
  * tree-vect-loop.cc (check_reduction_path): Remove the attempt to allow x + x via special-casing of assigns.
  * gcc.dg/vect/pr111764.c: New testcase.

2023-10-12  Support Intel USER_MSR  (Hu, Lin1, 28 files changed, -10/+288)

gcc/ChangeLog:

  * common/config/i386/cpuinfo.h (get_available_features): Detect USER_MSR.
  * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_USER_MSR_SET): New.
  (OPTION_MASK_ISA2_USER_MSR_UNSET): Ditto.
  (ix86_handle_option): Handle -musermsr.
  * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_USER_MSR.
  * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for usermsr.
  * config.gcc: Add usermsrintrin.h.
  * config/i386/cpuid.h (bit_USER_MSR): New.
  * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE (VOID, UINT64, UINT64).
  * config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins): Add __builtin_urdmsr and __builtin_uwrmsr.
  * config/i386/i386-builtins.h (ix86_builtins): Add IX86_BUILTIN_URDMSR and IX86_BUILTIN_UWRMSR.
  * config/i386/i386-c.cc (ix86_target_macros_internal): Define __USER_MSR__.
  * config/i386/i386-expand.cc (ix86_expand_builtin): Handle new builtins.
  * config/i386/i386-isa.def (USER_MSR): Add DEF_PTA(USER_MSR).
  * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Handle usermsr.
  * config/i386/i386.md (urdmsr): New define_insn.
  (uwrmsr): Ditto.
  * config/i386/i386.opt: Add option -musermsr.
  * config/i386/x86gprintrin.h: Include usermsrintrin.h.
  * doc/extend.texi: Document usermsr.
  * doc/invoke.texi: Document -musermsr.
  * doc/sourcebuild.texi: Document target usermsr.
  * config/i386/usermsrintrin.h: New file.

gcc/testsuite/ChangeLog:

  * gcc.target/i386/funcspec-56.inc: Add new target attribute.
  * gcc.target/i386/x86gprintrin-1.c: Add -musermsr for 64bit target.
  * gcc.target/i386/x86gprintrin-2.c: Ditto.
  * gcc.target/i386/x86gprintrin-3.c: Ditto.
  * gcc.target/i386/x86gprintrin-4.c: Add musermsr for 64bit target.
  * gcc.target/i386/x86gprintrin-5.c: Ditto.
  * gcc.target/i386/user_msr-1.c: New test.
  * gcc.target/i386/user_msr-2.c: Ditto.

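A short usage sketch for the new builtins named in the ChangeLog above (compile with -musermsr on a 64-bit target; the MSR index used here is a made-up example value):

  unsigned long long
  bump_msr (void)
  {
    unsigned long long v = __builtin_urdmsr (0x1b0ULL); /* user-mode read */
    __builtin_uwrmsr (0x1b0ULL, v + 1);                 /* user-mode write */
    return v;
  }
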
2023-10-12  LoongArch: Modify check_effective_target_vect_int_mod according to SX/ASX capabilities.  (Chenghui Pan, 1 file changed, -0/+18)

gcc/testsuite/ChangeLog:

  * lib/target-supports.exp: Add LoongArch in check_effective_target_vect_int_mod according to SX/ASX capabilities.

2023-10-12  LoongArch: Enable vect.exp for LoongArch. [PR111424]  (Chenghui Pan, 1 file changed, -0/+31)

gcc/testsuite/ChangeLog:

  PR target/111424
  * lib/target-supports.exp: Enable vect.exp for LoongArch.

2023-10-12  LoongArch: Adjust makefile dependency for loongarch headers.  (Yang Yujie, 3 files changed, -6/+4)

gcc/ChangeLog:

  * config.gcc: Add loongarch-driver.h to tm_files.
  * config/loongarch/loongarch.h: Do not include loongarch-driver.h.
  * config/loongarch/t-loongarch: Append loongarch-multilib.h to $(GTM_H) instead of $(TM_H) for building generator programs.

2023-10-12  Fortran: Set hidden string length for pointer components [PR67740].  (Paul Thomas, 2 files changed, -4/+61)

2023-10-11  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
  PR fortran/67740
  * trans-expr.cc (gfc_trans_pointer_assignment): Set the hidden string length component for pointer assignment to character pointer components.

gcc/testsuite/
  PR fortran/67740
  * gfortran.dg/pr67740.f90: New test

2023-10-12  rs6000: Make 32 bit stack_protect support prefixed insn [PR111367]  (Kewen Lin, 2 files changed, -46/+49)

As PR111367 shows, with prefixed insns supported, some of the checks consider it possible to leverage a prefixed insn for stack-protect-related load/store, but since we don't actually change the emitted assembly for 32 bit, this can cause the assembler error as exposed.

Mike's commit r10-4547-gce6a6c007e5a98 has already handled the 64 bit case (DImode); this patch treats the 32 bit case (SImode) by making use of the mode iterator P and the ptrload attribute iterator, and also fixes the constraints to match the emitted operand formats.

  PR target/111367

gcc/ChangeLog:

  * config/rs6000/rs6000.md (stack_protect_setsi): Support prefixed instruction emission and incorporate to stack_protect_set<mode>.
  (stack_protect_setdi): Rename to ...
  (stack_protect_set<mode>): ... this, adjust constraint.
  (stack_protect_testsi): Support prefixed instruction emission and incorporate to stack_protect_test<mode>.
  (stack_protect_testdi): Rename to ...
  (stack_protect_test<mode>): ... this, adjust constraint.

gcc/testsuite/ChangeLog:

  * g++.target/powerpc/pr111367.C: New test.

2023-10-12  testsuite: Avoid uninit var in pr60510.f [PR111427]  (Kewen Lin, 1 file changed, -0/+1)

The uninitialized variable a in pr60510.f can cause some random failures, as exposed in PR111427.  This patch makes it initialized accordingly.

  PR testsuite/111427

gcc/testsuite/ChangeLog:

  * gfortran.dg/vect/pr60510.f (test): Init variable a.

2023-10-12  vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE  (Kewen Lin, 2 files changed, -27/+65)

For VMAT_CONTIGUOUS_REVERSE, the transform code in function vectorizable_store generates a VEC_PERM_EXPR stmt before storing, but it's never considered in costing.  This patch makes costing consider the vec_perm; it adjusts the order of the transform code a bit to make it easy to early return for costing_p.

gcc/ChangeLog:

  * tree-vect-stmts.cc (vectorizable_store): Consider generated VEC_PERM_EXPR stmt for VMAT_CONTIGUOUS_REVERSE in costing as vec_perm.

gcc/testsuite/ChangeLog:

  * gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c: New test.

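An illustrative loop shape that can lead to VMAT_CONTIGUOUS_REVERSE (an assumption for exposition, not the committed costmodel test): the store walks memory backwards, so the vectorizer emits a reversing VEC_PERM_EXPR before each contiguous vector store, and that permute now gets costed.

  void
  reverse_store (int *restrict out, int *restrict in, int n)
  {
    for (int i = 0; i < n; i++)
      out[n - 1 - i] = in[i] + 1;   /* backwards, contiguous stores */
  }
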
2023-10-12  vect: Get rid of vect_model_store_cost  (Kewen Lin, 1 file changed, -93/+44)

This patch finally gets rid of vect_model_store_cost; it adjusts the costing for the remaining memory access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE} by moving the costing close to the transform code.  Note that vect_model_store_cost had one special handling for vectorizing a store into the function result, since that's an extra penalty and the transform part doesn't have it; this patch keeps that handling alone.

gcc/ChangeLog:

  * tree-vect-stmts.cc (vect_model_store_cost): Remove.
  (vectorizable_store): Adjust the costing for the remaining memory access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE}.

2023-10-12  vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE  (Kewen Lin, 1 file changed, -54/+74)

This patch adjusts the cost handling of VMAT_CONTIGUOUS_PERMUTE in function vectorizable_store.  We don't call function vect_model_store_cost for it any more.  It's the case of interleaving stores, so it skips all stmts except for first_stmt_info and considers the whole group when costing first_stmt_info.  This patch shouldn't have any functional changes.

gcc/ChangeLog:

  * tree-vect-stmts.cc (vect_model_store_cost): Assert it will never get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE related handlings.
  (vectorizable_store): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost.

2023-10-12  vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES  (Kewen Lin, 1 file changed, -35/+75)

This patch adjusts the cost handling of VMAT_LOAD_STORE_LANES in function vectorizable_store.  We don't call function vect_model_store_cost for it any more.  It's the case of interleaving stores, so it skips all stmts except for first_stmt_info and considers the whole group when costing first_stmt_info.  This patch shouldn't have any functional changes.

gcc/ChangeLog:

  * tree-vect-stmts.cc (vect_model_store_cost): Assert it will never get VMAT_LOAD_STORE_LANES.
  (vectorizable_store): Adjust the cost handling on VMAT_LOAD_STORE_LANES without calling vect_model_store_cost.  Factor out new lambda function update_prologue_cost.

2023-10-12  vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP  (Kewen Lin, 2 files changed, -63/+120)

This patch adjusts the cost handling of VMAT_ELEMENTWISE and VMAT_STRIDED_SLP in function vectorizable_store.  We don't call function vect_model_store_cost for them any more.  Like what we improved for PR82255 on the load side, this change helps us get rid of unnecessary vec_to_scalar costing for some cases with VMAT_STRIDED_SLP.  One typical test case, gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c, has been associated.  And it helps some cases with inconsistent costing too.

Besides, this also special-cases the interleaving stores for these two affected memory access types: since for interleaving stores the whole chain is vectorized when the last store in the chain is reached, the other stores in the group would be skipped.  To keep consistent with this, and to follow the transforming handlings like iterating the whole group, it only costs for the first store in the group.  Ideally we could cost only for the last one, but that's not trivial, and using the first one is actually equivalent.

gcc/ChangeLog:

  * tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their related handlings.
  (vectorizable_store): Adjust the cost handling on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling vect_model_store_cost.

gcc/testsuite/ChangeLog:

  * gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c: New test.

2023-10-12  vect: Simplify costing on vectorizable_scan_store  (Kewen Lin, 1 file changed, -3/+15)

This patch simplifies the costing of the vectorizable_scan_store case so that it no longer calls function vect_model_store_cost.  I considered whether moving the costing into function vectorizable_scan_store is a good idea; to do that, we would have to pass several variables down which are only used for costing.  For now we just want to keep the costing as it was and haven't tried to make it consistent with what the transforming does, so I think we can leave it for now.

gcc/ChangeLog:

  * tree-vect-stmts.cc (vectorizable_store): Adjust costing on vectorizable_scan_store without calling vect_model_store_cost any more.

2023-10-12  vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER  (Kewen Lin, 1 file changed, -70/+118)

This patch adjusts the cost handling of VMAT_GATHER_SCATTER in function vectorizable_store (all three cases), so that we no longer depend on vect_model_store_cost for its costing.  This patch shouldn't have any functional changes.

gcc/ChangeLog:

  * tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get VMAT_GATHER_SCATTER any more, remove VMAT_GATHER_SCATTER related handlings and the related parameter gs_info.
  (vect_build_scatter_store_calls): Add the handlings on costing with one more argument cost_vec.
  (vectorizable_store): Adjust the cost handling on VMAT_GATHER_SCATTER without calling vect_model_store_cost any more.

2023-10-12  vect: Move vect_model_store_cost next to the transform in vectorizable_store  (Kewen Lin, 1 file changed, -19/+60)

This is an initial patch to move costing next to the transform: it still adopts vect_model_store_cost for costing, but moves and duplicates it down according to the handlings of different vect_memory_access_types or other special handling needs, in the hope of making the subsequent patches easy to review.  This patch shouldn't have any functional changes.

gcc/ChangeLog:

  * tree-vect-stmts.cc (vectorizable_store): Move and duplicate the call to vect_model_store_cost down to some different transform paths according to the handlings of different vect_memory_access_types or some special handling need.
