2023-06-09  fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024]  (Jakub Jelinek, 1 file changed, -1/+2)
The pr96024.f90 testcase ICEs on big-endian hosts. The problem is that length->val.integer is accessed after checking length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant which uses length->val.character union member instead and on big-endian we end up reading constant 0x100000000 rather than some small number on little-endian and if target doesn't have enough memory for 4 times that (i.e. 16GB allocation), it ICEs. 2023-06-09 Jakub Jelinek <jakub@redhat.com> PR fortran/96024 * primary.cc (gfc_convert_to_structure_constructor): Only do constant string ctor length verification and truncation/padding if constant length has INTEGER type.
2023-06-09  Explicitly view_convert_expr mask to signed type when folding pblendvb builtins.  (liuhongt, 2 files changed, -1/+17)
Since mask < 0 will be always false for vector char when -funsigned-char, but vpblendvb needs to check the most significant bit. The patch explicitly VCE to vector signed char. gcc/ChangeLog: PR target/110108 * config/i386/i386.cc (ix86_gimple_fold_builtin): Explicitly view_convert_expr mask to signed type when folding pblendvb builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110108-2.c: New test.
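To illustrate the underlying issue, here is a sketch in GNU C vector extensions (not the actual gimple the builtin is folded to): with -funsigned-char a plain char vector is unsigned, so a `mask < 0` test would fold to constant false even though vpblendvb selects on each byte's most significant bit; reinterpreting the mask as signed char first keeps the sign-bit test meaningful.

  typedef char v16qi __attribute__ ((vector_size (16)));
  typedef signed char v16qs __attribute__ ((vector_size (16)));

  v16qi
  blend (v16qi a, v16qi b, v16qi m)
  {
    /* View-convert the mask to a signed element type before testing the
       sign bit; with (possibly unsigned) plain char, m < 0 would be folded
       to an all-zeros mask.  */
    v16qs sel = ((v16qs) m < 0);                 /* all-ones where MSB set */
    return (v16qi) (((v16qs) b & sel) | ((v16qs) a & ~sel));
  }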
2023-06-09  Fold _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple ABSU_EXPR + VCE.  (liuhongt, 5 files changed, -11/+62)
r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for TYPE_MIN, but PABSB will store unsigned result into dst. The patch uses ABSU_EXPR + VCE instead of ABS_EXPR. Also don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT since 64-bit vector absm2 is guarded with TARGET_MMX_WITH_SSE. gcc/ChangeLog: PR target/110108 * config/i386/i386.cc (ix86_gimple_fold_builtin): Fold _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT. * config/i386/i386-builtin.def: Replace CODE_FOR_nothing with real codename for __builtin_ia32_pabs{b,w,d}. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110108.c: New test. * gcc.target/i386/pr110108-3.c: New test. * gcc.target/i386/pr109900.c: Adjust testcase.
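A scalar model of the difference (an illustrative sketch, not the folded gimple): PABSB of -128 produces 128, which is only representable in the unsigned element type, so the fold computes the absolute value in the unsigned type (ABSU_EXPR) and view-converts the result back to the vector type.

  /* ABSU_EXPR-style absolute value: computed in the unsigned type so that
     absu8 (-128) == 128 instead of hitting signed-overflow UB.  */
  static unsigned char
  absu8 (signed char x)
  {
    return x < 0 ? (unsigned char) -(unsigned int) x : (unsigned char) x;
  }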
2023-06-09  Daily bump.  (GCC Administrator, 6 files changed, -1/+248)
2023-06-09  PR modula2/110126 variables are reported as unused when referenced by ASM  (Gaius Mulley, 12 files changed, -181/+292)
This patch fixes two problems with the asm statement: gm2 -Wall -c fooasm3.mod generates an incorrect warning, and gm2 cannot concatenate strings before an ASM statement. The asm statement now accepts a constant expression (rather than a string) and it updates the variable read/write use lists as appropriate.

gcc/m2/ChangeLog:
        PR modula2/110126
        * gm2-compiler/M2GenGCC.mod (BuildTreeFromInterface): Remove tokenno parameter. Use object tok instead of tokenno. (BuildTrashTreeFromInterface): Use object tok instead of GetDeclaredMod. (CodeInline): Remove tokenno from parameter list to BuildTreeFromInterface.
        * gm2-compiler/M2Quads.def (BuildAsmElement): Exported and defined.
        * gm2-compiler/M2Quads.mod (BuildOptimizeOff): Reformatted. (BuildInline): Reformatted. (BuildLineNo): Reformatted. (UseLineNote): Reformatted. (BuildAsmElement): New procedure.
        * gm2-compiler/P0SyntaxCheck.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
        * gm2-compiler/P1Build.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
        * gm2-compiler/P2Build.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
        * gm2-compiler/P3Build.bnf (AsmOperands): Rewrite. (AsmOperandSpec): Rewrite. (AsmOutputList): New rule. (AsmInputList): New rule. (TrashList): Rewrite.
        * gm2-compiler/PCBuild.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
        * gm2-compiler/PHBuild.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
        * gm2-compiler/SymbolTable.def (PutRegInterface): Rewrite interface. (GetRegInterface): Rewrite interface.
        * gm2-compiler/SymbolTable.mod (SetFirstUsed): New procedure. (PutFirstUsed): New procedure. (PutRegInterface): Rewrite. (GetRegInterface): Rewrite.

gcc/testsuite/ChangeLog:
        PR modula2/110126
        * gm2/pim/pass/fooasm3.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-06-08  Provide a new dispatch mechanism for range-ops.  (Andrew MacLeod, 5 files changed, -280/+306)
Simplify range_op_handler to have a single range_operator pointer and provide a more flexible dispatch mechanism for calls via generic vrange classes. This is more extensible for adding new classes of range support. Any unsupported dispatch patterns will simply return FALSE now rather than generating compile-time exceptions, alleviating the need to constantly check for supported types.

        * gimple-range-op.cc (gimple_range_op_handler::gimple_range_op_handler): Adjust. (gimple_range_op_handler::maybe_builtin_call): Adjust.
        * gimple-range-op.h (operand1, operand2): Use m_operator.
        * range-op.cc (integral_table, pointer_table): Relocate. (get_op_handler): Rename from get_handler and handle all types. (range_op_handler::range_op_handler): Relocate. (range_op_handler::set_op_handler): Relocate and adjust. (range_op_handler::range_op_handler): Relocate. (dispatch_trio): New. (RO_III, RO_IFI, RO_IFF, RO_FFF, RO_FIF, RO_FII): New consts. (range_op_handler::dispatch_kind): New. (range_op_handler::fold_range): Relocate and use new dispatch value. (range_op_handler::op1_range): Ditto. (range_op_handler::op2_range): Ditto. (range_op_handler::lhs_op1_relation): Ditto. (range_op_handler::lhs_op2_relation): Ditto. (range_op_handler::op1_op2_relation): Ditto. (range_op_handler::set_op_handler): Use m_operator member.
        * range-op.h (range_op_handler::operator bool): Use m_operator. (range_op_handler::dispatch_kind): New. (range_op_handler::m_valid): Delete. (range_op_handler::m_int): Delete. (range_op_handler::m_float): Delete. (range_op_handler::m_operator): New. (range_op_table::operator[]): Relocate from .cc file. (range_op_table::set): Ditto.
        * value-range.h (class vrange): Make range_op_handler a friend.
2023-06-08  Unify range_operators to one class.  (Andrew MacLeod, 4 files changed, -202/+183)
Range_operator and range_operator_float are 2 different classes, making generalized dispatch difficult. The distinction between what is a float operator and what is an integral operator also blurs when some methods have multiple types, i.e., casts: INT = FLOAT and FLOAT = INT. This patch unifies all possible invocation patterns in one class, and switches the float table to use the general range_op_table.

        * gimple-range-op.cc (cfn_constant_float_p): Change base class. (cfn_pass_through_arg1): Adjust using statement. (cfn_signbit): Change base class, adjust using statement. (cfn_copysign): Ditto. (cfn_sqrt): Ditto. (cfn_sincos): Ditto.
        * range-op-float.cc (fold_range): Change class to range_operator. (rv_fold): Ditto. (op1_range): Ditto. (op2_range): Ditto. (lhs_op1_relation): Ditto. (lhs_op2_relation): Ditto. (op1_op2_relation): Ditto. (foperator_*): Ditto. (class float_table): New. Inherit from range_op_table. (floating_tree_table): Change to range_op_table pointer. (class floating_op_table): Delete.
        * range-op.cc (operator_equal): Adjust using statement. (operator_not_equal): Ditto. (operator_lt, operator_le, operator_gt, operator_ge): Ditto. (operator_minus, operator_cast): Ditto. (operator_bitwise_and, pointer_plus_operator): Ditto. (get_float_handle): Change return type.
        * range-op.h (range_operator_float): Delete. Relocate all methods into class range_operator. (range_op_handler::m_float): Change type to range_operator. (floating_op_table): Delete. (floating_tree_table): Change type.
2023-06-08  Remove tree_code from range-operator.  (Andrew MacLeod, 2 files changed, -37/+79)
Range_operator had a tree code added last release to facilitate bitmask operations. This removes the tree_code and replaces it with a virtual routine to perform the masking. Remove any duplicate instances which are no longer needed.

        * range-op.cc (range_operator::fold_range): Call virtual routine. (range_operator::update_bitmask): New. (operator_equal::update_bitmask): New. (operator_not_equal::update_bitmask): New. (operator_lt::update_bitmask): New. (operator_le::update_bitmask): New. (operator_gt::update_bitmask): New. (operator_ge::update_bitmask): New. (operator_plus::update_bitmask): New. (operator_minus::update_bitmask): New. (operator_pointer_diff::update_bitmask): New. (operator_min::update_bitmask): New. (operator_max::update_bitmask): New. (operator_mult::update_bitmask): New. (operator_div::operator_div): New. (operator_div::update_bitmask): New. (operator_div::m_code): New member. (operator_exact_divide::operator_exact_divide): New constructor. (operator_lshift::update_bitmask): New. (operator_rshift::update_bitmask): New. (operator_bitwise_and::update_bitmask): New. (operator_bitwise_or::update_bitmask): New. (operator_bitwise_xor::update_bitmask): New. (operator_trunc_mod::update_bitmask): New. (op_ident, op_unknown, op_ptr_min_max): New. (op_nop, op_convert): Delete. (op_ssa, op_paren, op_obj_type): Delete. (op_realpart, op_imagpart): Delete. (op_ptr_min, op_ptr_max): Delete. (pointer_plus_operator::update_bitmask): New. (range_op_table::set): Do not use m_code. (integral_table::integral_table): Adjust to single instances.
        * range-op.h (range_operator::range_operator): Delete. (range_operator::m_code): Delete. (range_operator::update_bitmask): New.
2023-06-08  Fix floating point bug in fold_range.  (Andrew MacLeod, 1 file changed, -1/+1)
We currently do not have any floating point operators where operand 1 is a different type than the LHS. When we eventually do there is a bug in fold_range. If either operand is a known NAN, it returns a NAN of the type of operand 1 instead of the result type. * range-op-float.cc (range_operator_float::fold_range): Return NAN of the result type.
2023-06-08  RISC-V: Add more test cases for RVV FP16  (Pan Li, 2 files changed, -2/+57)
This patch would like to add new test cases to make sure the RVV FP16 works well as expected. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new cases. * gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: New test.
2023-06-08  analyzer: Standalone OOB-warning [PR109437, PR109439]  (Benjamin Priour, 8 files changed, -26/+49)
This patch enhances -Wanalyzer-out-of-bounds so that it is no longer paired with a -Wanalyzer-use-of-uninitialized-value on out-of-bounds reads. This also fixes PR analyzer/109437. Before, there could always be at most one OOB-read warning per frame because -Wanalyzer-use-of-uninitialized-value always terminates the analysis path.

        PR 109439

gcc/analyzer/ChangeLog:
        * bounds-checking.cc (region_model::check_symbolic_bounds): Returns whether the BASE_REG region access was OOB. (region_model::check_region_bounds): Likewise.
        * region-model.cc (region_model::get_store_value): Creates an unknown svalue on OOB-read access to REG. (region_model::check_region_access): Returns whether an unknown svalue needs be created. (region_model::check_region_for_read): Passes check_region_access return value.
        * region-model.h: Update prior function definitions.

gcc/testsuite/ChangeLog:
        * gcc.dg/analyzer/out-of-bounds-2.c: Cleaned test for uninitialized-value warning.
        * gcc.dg/analyzer/out-of-bounds-5.c: Likewise.
        * gcc.dg/analyzer/pr101962.c: Likewise.
        * gcc.dg/analyzer/realloc-5.c: Likewise.
        * gcc.dg/analyzer/pr109439.c: New test.
2023-06-08  optabs: Implement double-word ctz and ffs expansion  (Jakub Jelinek, 3 files changed, -9/+72)
We have had expand_doubleword_clz for a couple of years, where we emit double-word CLZ as

  if (high_word == 0)
    return CLZ (low_word) + word_size;
  else
    return CLZ (high_word);

We can do something similar for CTZ and FFS IMHO, just with the 2 words swapped. So

  if (low_word == 0)
    return CTZ (high_word) + word_size;
  else
    return CTZ (low_word);

for CTZ and

  if (low_word == 0)
    return high_word ? FFS (high_word) + word_size : 0;
  else
    return FFS (low_word);

for FFS. The following patch implements that. Note, on some targets which implement both word_mode ctz and ffs patterns, it might be better to incrementally implement those double-word ffs expansion patterns in md files, because we aren't able to optimize it correctly; nothing can detect we have just made sure that argument is not 0 and so don't need to bother with handling that case. So, on ia32 just using CTZ patterns would be better there, but I think we can even do better and instead of doing the comparisons of the operands against 0 do the CTZ expansion followed by testing of flags.

2023-06-08  Jakub Jelinek  <jakub@redhat.com>

        * optabs.cc (expand_ffs): Add forward declaration. (expand_doubleword_clz): Rename to ... (expand_doubleword_clz_ctz_ffs): ... this. Add UNOPTAB argument, handle also doubleword CTZ and FFS in addition to CLZ. (expand_unop): Adjust caller. Also call it for doubleword ctz_optab and ffs_optab.
        * gcc.target/i386/ctzll-1.c: New test.
        * gcc.target/i386/ffsll-1.c: New test.
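A scalar model of the new expansion for a 64-bit value split into 32-bit low/high words (hypothetical helper names; the real change emits RTL from expand_doubleword_clz_ctz_ffs):

  unsigned int
  dw_ctz (unsigned int lo, unsigned int hi)
  {
    /* Like the doubleword CTZ expansion: CTZ of 0 remains undefined.  */
    return lo != 0 ? __builtin_ctz (lo) : __builtin_ctz (hi) + 32;
  }

  unsigned int
  dw_ffs (unsigned int lo, unsigned int hi)
  {
    /* FFS is defined for 0, so the high word needs the extra zero check.  */
    if (lo != 0)
      return __builtin_ffs (lo);
    return hi != 0 ? __builtin_ffs (hi) + 32 : 0;
  }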
2023-06-08  i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152]  (Jakub Jelinek, 1 file changed, -1/+1)
[PR110152] I'm getting +FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault signal terminated program cc1) +FAIL: gcc.target/i386/3dnow-1.c (test for excess errors) +FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault signal terminated program cc1) +FAIL: gcc.target/i386/3dnow-2.c (test for excess errors) +FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault signal terminated program cc1) +FAIL: gcc.target/i386/mmx-1.c (test for excess errors) +FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault signal terminated program cc1) +FAIL: gcc.target/i386/mmx-2.c (test for excess errors) regressions on i686-linux since r14-1166. The problem is when ix86_expand_vector_init_general is called with mmx_ok = true and mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode, but as mmx_ok is false and !TARGET_SSE, we recurse again with the same arguments (ok, fresh new tmp and vals) infinitely. The following patch fixes that by passing mmx_ok to that recursive call. For n_words == 4 it isn't needed, because we only care about mmx_ok for V2SImode or V2SFmode and no other modes. 2023-06-08 Jakub Jelinek <jakub@redhat.com> PR target/110152 * config/i386/i386-expand.cc (ix86_expand_vector_init_general): For n_words == 2 recurse with mmx_ok as first argument rather than false.
2023-06-08  Fortran: Fix some more blockers in associate meta-bug [PR87477]  (Paul Thomas, 10 files changed, -17/+113)
2023-06-08 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/87477 PR fortran/99350 PR fortran/107821 PR fortran/109451 * decl.cc (char_len_param_value): Simplify a copy of the expr and replace the original if there is no error. * gfortran.h : Remove the redundant field 'rankguessed' from 'gfc_association_list'. * resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'. (resolve_variable): Associate names with constant or structure constructor targets cannot have array refs. * trans-array.cc (gfc_conv_expr_descriptor): Guard expression character length backend decl before using it. Suppress the assignment if lhs equals rhs. * trans-io.cc (gfc_trans_transfer): Scalarize transfer of associate variables pointing to a variable. Add comment. * trans-stmt.cc (trans_associate_var): Remove requirement that the character length be deferred before assigning the value returned by gfc_conv_expr_descriptor. Also, guard the backend decl before testing with VAR_P. gcc/testsuite/ PR fortran/99350 * gfortran.dg/pr99350.f90 : New test. PR fortran/107821 * gfortran.dg/associate_5.f03 : Changed error message. * gfortran.dg/pr107821.f90 : New test. PR fortran/109451 * gfortran.dg/associate_61.f90 : New test
2023-06-08  [testsuite] bump some tsvc timeouts  (Alexandre Oliva, 8 files changed, -1/+9)
Several tests are timing out when targeting x86-*-vxworks with qemu. Bump their timeout factor. for gcc/testsuite/ChangeLog * gcc.dg/vect/tsvc/vect-tsvc-s116.c: Bump timeout factor. * gcc.dg/vect/tsvc/vect-tsvc-s241.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s271.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s2711.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s2712.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s276.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-vdotr.c: Likewise.
2023-06-08  Daily bump.  (GCC Administrator, 10 files changed, -1/+536)
2023-06-07  [Committed] Bug fix to new wi::bitreverse_large function.  (Roger Sayle, 1 file changed, -1/+1)
Richard Sandiford was, of course, right to be wary of new code without much test coverage. Converting the nvptx backend to use the BITREVERSE rtx infrastructure has resulted in far more exhaustive testing and revealed a subtle bug in the new wi::bitreverse implementation. The code needs to use HOST_WIDE_INT_1U (instead of 1) to avoid unintended sign extension.

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu (with a minor tweak to use BITREVERSE), where it fixes regressions of the 32-bit test vectors in gcc.target/nvptx/brev-2.c and the 64-bit test vectors in gcc.target/nvptx/brevll-2.c. Committed as obvious.

2023-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * wide-int.cc (wi::bitreverse_large): Use HOST_WIDE_INT_1U to avoid sign extension/undefined behaviour when setting each bit.
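For reference, a minimal stand-alone illustration (not the patched code itself) of why the unsigned 64-bit one-constant matters once the shift can reach or pass the sign bit of a plain int:

  /* With a 64-bit destination, building the mask from a plain int `1`
     sign-extends once bit 31 is set, and shifting by 32 or more is
     undefined; an unsigned 64-bit one keeps the high bits clear.  */
  unsigned long long
  set_bit (unsigned long long word, unsigned int bit)
  {
    return word | (1ULL << bit);   /* `1 << bit` would go wrong for bit >= 31 */
  }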
2023-06-07  Add support for stc and cmc instructions in i386.md  (Roger Sayle, 7 files changed, -4/+175)
This patch is the latest revision of my patch to add support for the STC (set carry flag) and CMC (complement carry flag) instructions to the i386 backend, incorporating Uros' previous feedback. The significant changes are (i) the inclusion of CMC, (ii) the use of UNSPEC for pattern, (iii) Use of a new X86_TUNE_SLOW_STC tuning flag to use alternate implementations on pentium4 (which has a notoriously slow STC) when not optimizing for size. An example of the use of the stc instruction is: unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) { return __builtin_ia32_addcarryx_u32 (1, a, b, c); } which previously generated: movl $1, %eax addb $-1, %al adcl %esi, %edi setc %al movl %edi, (%rdx) movzbl %al, %eax ret with this patch now generates: stc adcl %esi, %edi setc %al movl %edi, (%rdx) movzbl %al, %eax ret An example of the use of the cmc instruction (where the carry from a first adc is inverted/complemented as input to a second adc) is: unsigned int bar (unsigned int a, unsigned int b, unsigned int c, unsigned int d) { unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1); return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2); } which previously generated: movl $1, %eax addb $-1, %al adcl %esi, %edi setnc %al movl %edi, o1(%rip) addb $-1, %al adcl %ecx, %edx setc %al movl %edx, o2(%rip) movzbl %al, %eax ret and now generates: stc adcl %esi, %edi cmc movl %edi, o1(%rip) adcl %ecx, %edx setc %al movl %edx, o2(%rip) movzbl %al, %eax ret This version implements Uros' suggestions/refinements. (i) Avoid the UNSPEC_CMC by using the canonical RTL idiom for *x86_cmc, (ii) Use peephole2s to convert x86_stc and *x86_cmc into alternate forms on TARGET_SLOW_STC CPUs (pentium4), when a suitable QImode register is available, (iii) Prefer the addqi_cconly_overflow idiom (addb $-1,%al) over negqi_ccc_1 (neg %al) for setting the carry from a QImode value, These changes required two minor edits to i386.cc: ix86_cc_mode had to be tweaked to suggest CCCmode for the new *x86_cmc pattern, and *x86_cmc needed to be handled/parameterized in ix86_rtx_costs so that combine would appreciate that this complex RTL expression was actually a fast, single byte instruction [i.e. preferable]. 2022-06-07 Roger Sayle <roger@nextmovesoftware.com> Uros Bizjak <ubizjak@gmail.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>: Use new x86_stc instruction when the carry flag must be set. * config/i386/i386.cc (ix86_cc_mode): Use CCCmode for *x86_cmc. (ix86_rtx_costs): Provide accurate rtx_costs for *x86_cmc. * config/i386/i386.h (TARGET_SLOW_STC): New define. * config/i386/i386.md (UNSPEC_STC): New UNSPEC for stc. (x86_stc): New define_insn. (define_peephole2): Convert x86_stc into alternate implementation on pentium4 without -Os when a QImode register is available. (*x86_cmc): New define_insn. (define_peephole2): Convert *x86_cmc into alternate implementation on pentium4 without -Os when a QImode register is available. (*setccc): New define_insn_and_split for a no-op CCCmode move. (*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to recognize (and eliminate) the carry flag being copied to itself. (*setcc_qi_negqi_ccc_2_<mode>): Likewise. * config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag. gcc/testsuite/ChangeLog * gcc.target/i386/cmc-1.c: New test case. * gcc.target/i386/stc-1.c: Likewise.
2023-06-07  c++: allow NRV and non-NRV returns [PR58487]  (Jason Merrill, 8 files changed, -12/+121)
Now that we support NRV from an inner block, we can also support non-NRV returns from other blocks, since once the NRV is out of scope a later return expression can't possibly alias it. This fixes 58487 and half-fixes 53637: now one of the returns is elided, but not the other. Fixing the remaining xfails in these testcases will require a very different approach, probably involving a full tree/block walk from finalize_nrv, and check_return_expr only adding to a list of potential return variables. PR c++/58487 PR c++/53637 gcc/cp/ChangeLog: * cp-tree.h (INIT_EXPR_NRV_P): New. * semantics.cc (finalize_nrv_r): Check it. * name-lookup.h (decl_in_scope_p): Declare. * name-lookup.cc (decl_in_scope_p): New. * typeck.cc (check_return_expr): Allow non-NRV returns if the NRV is no longer in scope. gcc/testsuite/ChangeLog: * g++.dg/opt/nrv26.C: New test. * g++.dg/opt/nrv26a.C: New test. * g++.dg/opt/nrv27.C: New test.
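A sketch of the shape of code this enables (an illustration, not taken from the testsuite): once the named variable's scope has ended, a later return of a different value can no longer alias it, so the earlier return can still use the named return value optimization.

  struct T { T () {} T (const T&) {} ~T () {} };

  T
  f (bool cond)
  {
    if (cond)
      {
        T nrv;         // can be constructed directly in the return slot (NRV)
        return nrv;
      }
    return T ();       // nrv is already out of scope, so this return cannot alias it
  }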
2023-06-07  MATCH: Fix comment for `(zero_one ==/!= 0) ? y : z <op> y` patterns  (Andrew Pinski, 1 file changed, -2/+2)
The patterns match more than just `a & 1` so change the comment for these two patterns to say that. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. gcc/ChangeLog: * match.pd: Fix comment for the `(zero_one ==/!= 0) ? y : z <op> y` patterns.
2023-06-07  RISC-V: Eliminate extension after for *w instructions  (Jeff Law, 9 files changed, -35/+309)
This patch tries to prevent generating unnecessary sign extension after *w instructions like "addiw" or "divw". The main idea of it is to add SUBREG_PROMOTED fields during expanding. I have tested on SPEC2017 there is no regression. Only gcc.dg/pr30957-1.c test failed. To solve that I did some changes in loop-iv.cc, but not sure that it is suitable. gcc/ChangeLog: * config/riscv/bitmanip.md (rotrdi3, rotrsi3, rotlsi3): New expanders. (rotrsi3_sext): Expose generator. (rotlsi3 pattern): Hide generator. * config/riscv/riscv-protos.h (riscv_emit_binary): New function declaration. * config/riscv/riscv.cc (riscv_emit_binary): Removed static * config/riscv/riscv.md (addsi3, subsi3, negsi2): Hide generator. (mulsi3, <optab>si3): Likewise. (addsi3, subsi3, negsi2, mulsi3, <optab>si3): New expanders. (addv<mode>4, subv<mode>4, mulv<mode>4): Use riscv_emit_binary. (<u>mulsidi3): Likewise. (addsi3_extended, subsi3_extended, negsi2_extended): Expose generator. (mulsi3_extended, <optab>si3_extended): Likewise. (splitter for shadd feeding divison): Update RTL pattern to account for changes in how 32 bit ops are expanded for TARGET_64BIT. * loop-iv.cc (get_biv_step_1): Process src of extension when it PLUS. gcc/testsuite/ChangeLog: * gcc.target/riscv/shift-and-2.c: New tests. * gcc.target/riscv/shift-shift-2.c: Adjust expected output. * gcc.target/riscv/sign-extend.c: New test. * gcc.target/riscv/zbb-rol-ror-03.c: Adjust expected output. Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
2023-06-07  riscv: Fix scope for memory model calculation  (Dimitar Dimitrov, 1 file changed, -4/+9)
During libgcc configure stage for riscv32-none-elf, when "--enable-checking=yes,rtl" has been activated, the following error is observed: during RTL pass: final conftest.c: In function 'main': conftest.c:16:1: internal compiler error: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4462 16 | } | ^ 0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*) /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916 0x8ea823 riscv_print_operand /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:4462 0xde84b5 output_operand(rtx_def*, int) /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3632 0xde8ef8 output_asm_insn(char const*, rtx_def**) /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3544 0xded33b output_asm_insn(char const*, rtx_def**) /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3421 0xded33b final_scan_insn_1 /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2841 0xded6cb final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2887 0xded8b7 final_1 /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:1979 0xdee518 rest_of_handle_final /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4240 0xdee518 execute /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4318 Fix by moving the calculation of memmodel to the cases where it is used. Regression tested for riscv32-none-elf. No changes in gcc.sum and g++.sum. PR target/109725 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_print_operand): Calculate memmodel only when it is valid. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2023-06-07  riscv: Fix insn cost calculation  (Dimitar Dimitrov, 1 file changed, -1/+1)
When building riscv32-none-elf with "--enable-checking=yes,rtl", the following ICE is observed: cc1: internal compiler error: RTL check: expected code 'const_int', have 'const_double' in riscv_const_insns, at config/riscv/riscv.cc:1313 0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*) /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916 0x8eab61 riscv_const_insns(rtx_def*) /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:1313 0x15443bb riscv_legitimate_constant_p /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:826 0xdd3c71 emit_move_insn(rtx_def*, rtx_def*) /mnt/nvme/dinux/local-workspace/gcc/gcc/expr.cc:4310 0x15f28e5 run_const_vector_selftests /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:285 0x15f37bd selftest::riscv_run_selftests() /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:364 0x1f6fba9 selftest::run_tests() /mnt/nvme/dinux/local-workspace/gcc/gcc/selftest-run-tests.cc:111 0x11d1f39 toplev::run_self_tests() /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:2185 Fix by following the spirit of the adjacent comment, and using the dedicated riscv_const_insns() function to calculate cost for loading a constant element. Infinite recursion is not possible because the first invocation is on a CONST_VECTOR, whereas the second is on a single element of the vector (e.g. CONST_INT or CONST_DOUBLE). Regression tested for riscv32-none-elf. No changes in gcc.sum and g++.sum. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_const_insns): Recursively call for constant element of a vector. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2023-06-07  libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]  (Jakub Jelinek, 1 file changed, -0/+26)
[PR110145] This test apparently contains 3 problematic floating point constants, 1e126, 4.91e-6 and 5.547e-6. These constants suffer from double rounding when -fexcess-precision=standard evaluates double constants in the precision of Intel extended 80-bit long double. As written in the PR, e.g. the first one is 0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418 in the precision of GCC's internal format, 80-bit long double has 63-bit precision, so the above constant rounded to long double is 0x1.7a2ecc414a03f800p+418L (the least significant bit in the 0 before p isn't there already). 0x1.7a2ecc414a03f800p+418L rounded to IEEE double is 0x1.7a2ecc414a040p+418. Now, if excess precision doesn't happen and we round the GCC's internal format number directly to double, it is 0x1.7a2ecc414a03fp+418 and that is the number the test expects. One can see it on x86-64 (where excess precision to long double doesn't happen) where double(1e126L) != 1e126. The other two constants suffer from the same problem. The following patch tweaks the testcase, such that those problematic constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have guarantee the constants will be evaluated in double precision), plus adds corresponding tests with hexadecimal constants which don't suffer from this excess precision problem, they are exact in double and long double can hold all double values. 2023-06-07 Jakub Jelinek <jakub@redhat.com> PR libstdc++/110145 * testsuite/20_util/to_chars/double.cc: Include <cfloat>. (double_to_chars_test_cases, double_scientific_precision_to_chars_test_cases_2, double_fixed_precision_to_chars_test_cases_2): #if out 1e126, 4.91e-6 and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1. Add unconditional tests with corresponding double constants 0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and 0x1.7440bbff418b9p-18.
2023-06-07  match.pd: Improve zero_one_valued_p  (Jakub Jelinek, 1 file changed, -5/+2)
Recently zero_one_valued_p was changed to handle integer_zerop case specially, because tree_nonzero_bits (@0) == 1 only returns true for non-constant values with range [0, 1] or constant 1, constant 0 has tree_nonzero_bits (integer_zero_node) == 0. The following patch reverts that change and instead checks that tree_nonzero_bits is <= 1U. 2023-06-07 Jakub Jelinek <jakub@redhat.com> * match.pd (zero_one_valued_p): Don't handle integer_zerop specially, instead compare tree_nonzero_bits <= 1U rather than just == 1.
2023-06-07  aarch64: Allow compiler to define ls64 builtins [PR110132]  (Alex Coplan, 8 files changed, -39/+100)
This patch refactors the ls64 builtins to allow the compiler to define them directly instead of having wrapper functions in arm_acle.h. This should be not only easier to maintain, but it makes two important correctness fixes: - It fixes PR110132, where the builtins ended up getting declared with invisible bindings in the C FE, so the FE ended up synthesizing incompatible implicit definitions for these builtins. - It allows the builtins to be used with LTO, which didn't work previously. We also take the opportunity to add test coverage from C++ for these builtins. gcc/ChangeLog: PR target/110132 * config/aarch64/aarch64-builtins.cc (aarch64_general_simulate_builtin): New. Use it ... (aarch64_init_ls64_builtins): ... here. Switch to declaring public ACLE names for builtins. (aarch64_general_init_builtins): Ensure we invoke the arm_acle.h setup if in_lto_p, just like we do for SVE. * config/aarch64/arm_acle.h: (__arm_ld64b): Delete. (__arm_st64b): Delete. (__arm_st64bv): Delete. (__arm_st64bv0): Delete. gcc/testsuite/ChangeLog: PR target/110132 * lib/target-supports.exp (check_effective_target_aarch64_asm_FUNC_ok): Extend to ls64. * g++.target/aarch64/acle/acle.exp: New. * g++.target/aarch64/acle/ls64.C: New test. * g++.target/aarch64/acle/ls64_lto.C: New test. * gcc.target/aarch64/acle/ls64_lto.c: New test. * gcc.target/aarch64/acle/pr110132.c: New test.
2023-06-07  aarch64: Fix wrong code with st64b builtin [PR110100]  (Alex Coplan, 3 files changed, -2/+9)
The st64b pattern incorrectly had an output constraint on the register operand containing the destination address for the store, leading to wrong code. This patch fixes that. gcc/ChangeLog: PR target/110100 * config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_ls64): Use input operand for the destination address. * config/aarch64/aarch64.md (st64b): Fix constraint on address operand. gcc/testsuite/ChangeLog: PR target/110100 * gcc.target/aarch64/acle/pr110100.c: New test.
2023-06-07  aarch64: Fix whitespace in ls64 builtin implementation [PR110100]  (Alex Coplan, 2 files changed, -43/+43)
The ls64 builtin code was using incorrect GNU style with eight spaces where there should be a tab. Fixed thusly. gcc/ChangeLog: PR target/110100 * config/aarch64/aarch64-builtins.cc (aarch64_init_ls64_builtins_types): Replace eight consecutive spaces with tabs. (aarch64_init_ls64_builtins): Likewise. (aarch64_expand_builtin_ls64): Likewise. * config/aarch64/aarch64.md (ld64b): Likewise. (st64b): Likewise. (st64bv): Likewise (st64bv0): Likewise.
2023-06-07  libgcc: Fix eh_frame fast path in find_fde_tail  (Florian Weimer, 1 file changed, -1/+1)
The eh_frame value is only used by linear_search_fdes, not the binary search directly in find_fde_tail, so the bug is not immediately apparent with most programs. Fixes commit e724b0480bfa5ec04f39be8c7290330b495c59de ("libgcc: Special-case BFD ld unwind table encodings in find_fde_tail"). libgcc/ PR libgcc/109712 * unwind-dw2-fde-dip.c (find_fde_tail): Correct fast path for parsing eh_frame.
2023-06-07  libstdc++: Restore accidentally removed version in abi-check  (Jonathan Wakely, 1 file changed, -0/+1)
In r14-1583-g192665feef7129 I meant to add CXXABI_1.3.15 but instead I replaced CXXABI_1.3.14 with it. This restores the CXXABI_1.3.14 version. libstdc++-v3/ChangeLog: * testsuite/util/testsuite_abi.cc (check_version): Re-add CXXABI_1.3.14.
2023-06-07  libstdc++: Fix some tests that fail with -fno-exceptions  (Jonathan Wakely, 9 files changed, -1/+24)
libstdc++-v3/ChangeLog: * testsuite/18_support/nested_exception/rethrow_if_nested-term.cc: Require effective target exceptions_enabled instead of using dg-skip-if. * testsuite/23_containers/vector/capacity/constexpr.cc: Expect shrink_to_fit() to be a no-op without exceptions enabled. * testsuite/23_containers/vector/capacity/shrink_to_fit.cc: Likewise. * testsuite/ext/bitmap_allocator/check_allocate_max_size.cc: Require effective target exceptions_enabled. * testsuite/ext/malloc_allocator/check_allocate_max_size.cc: Likewise. * testsuite/ext/mt_allocator/check_allocate_max_size.cc: Likewise. * testsuite/ext/new_allocator/check_allocate_max_size.cc: Likewise. * testsuite/ext/pool_allocator/check_allocate_max_size.cc: Likewise. * testsuite/ext/throw_allocator/check_allocate_max_size.cc: Likewise.
2023-06-07  libstdc++: Fix some tests that fail with -fexcess-precision=standard  (Jonathan Wakely, 16 files changed, -24/+24)
libstdc++-v3/ChangeLog: * testsuite/20_util/duration/cons/2.cc: Use values that aren't affected by rounding. * testsuite/20_util/from_chars/5.cc: Cast arithmetic result to double before comparing for equality. * testsuite/20_util/from_chars/6.cc: Likewise. * testsuite/20_util/variant/86874.cc: Use values that aren't affected by rounding. * testsuite/25_algorithms/lower_bound/partitioned.cc: Compare to original value instead of to floating-point-literal. * testsuite/26_numerics/random/discrete_distribution/cons/range.cc: Cast arithmetic result to double before comparing for equality. * testsuite/26_numerics/random/piecewise_constant_distribution/cons/range.cc: Likewise. * testsuite/26_numerics/random/piecewise_linear_distribution/cons/range.cc: Likewise. * testsuite/26_numerics/valarray/transcend.cc (eq): Check that the absolute difference is less than 0.01 instead of comparing to two decimal places. * testsuite/27_io/basic_istream/extractors_arithmetic/char/01.cc: Cast arithmetic result to double before comparing for equality. * testsuite/27_io/basic_istream/extractors_arithmetic/char/09.cc: Likewise. * testsuite/27_io/basic_istream/extractors_arithmetic/char/10.cc: Likewise. * testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/01.cc: Likewise. * testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/09.cc: Likewise. * testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/10.cc: Likewise. * testsuite/ext/random/hoyt_distribution/cons/parms.cc: Likewise.
2023-06-07  RA: Constrain class of pic offset table pseudo to general regs  (Vladimir N. Makarov, 2 files changed, -0/+33)
On some targets an integer pseudo can be assigned to an FP reg. For the pic offset table pseudo it means we will reload the pseudo in this case and, as a consequence, memory containing the pseudo might be recognized as the wrong one. The patch fixes this problem.

        PR target/109541

gcc/ChangeLog:
        * ira-costs.cc (find_costs_and_classes): Constrain classes of pic offset table pseudo to a general reg subset.

gcc/testsuite/ChangeLog:
        * gcc.target/sparc/pr109541.c: New.
2023-06-07  aarch64: Represent SQXTUN with RTL operations  (Kyrylo Tkachov, 3 files changed, -14/+56)
This patch removes UNSPEC_SQXTUN and uses organic RTL codes to represent the operation. SQXTUN is an odd one. It's described in the architecture as "Signed saturating extract Unsigned Narrow". It's not a straightforward ss_truncate nor a us_truncate. It is a sort of truncating signed clamp operation with limits derived from the unsigned extrema of the narrow mode: (truncate:N (smin:M (smax:M (reg:M) (const_int 0)) (const_int <unsigned-max-for-mode-N>))) This patch implements these semantics. I've checked that the vqmovun tests in advsimd-intrinsics.exp now get constant-folded and still pass validation, so I'm pretty confident in the semantics. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode><vczle><vczbe>): Rename to... (*aarch64_sqmovun<mode>_insn<vczle><vczbe>): ... This. Reimplement with RTL codes. (aarch64_sqmovun<mode> [SD_HSDI]): Reimplement with RTL codes. (aarch64_sqxtun2<mode>_le): Likewise. (aarch64_sqxtun2<mode>_be): Likewise. (aarch64_sqxtun2<mode>): Adjust for the above. (aarch64_sqmovun<mode>): New define_expand. * config/aarch64/iterators.md (UNSPEC_SQXTUN): Delete. (half_mask): New mode attribute. * config/aarch64/predicates.md (aarch64_simd_umax_half_mode): New predicate.
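A scalar model of those semantics for the 16-bit to 8-bit case, mirroring the (truncate (smin (smax x 0) 255)) RTL above (an illustration, not the generated code):

  /* SQXTUN: clamp a signed value to the unsigned range of the narrow mode,
     then truncate.  */
  unsigned char
  sqxtun_h (short x)
  {
    int clamped = x < 0 ? 0 : x;
    return clamped > 255 ? 255 : (unsigned char) clamped;
  }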
2023-06-07  aarch64: Improve RTL representation of ADDP instructions  (Kyrylo Tkachov, 1 file changed, -7/+63)
Similar to the ADDLP instructions the non-widening ADDP ones can be represented by adding the odd lanes with the even lanes of a vector. These instructions take two vector inputs and the architecture spec describes the operation as concatenating them together before going through it with pairwise additions. This patch chooses to represent ADDP on 64-bit and 128-bit input vectors slightly differently, reasons explained in the comments in aarhc64-simd.md. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_addp<mode><vczle><vczbe>): Reimplement as... (aarch64_addp<mode>_insn): ... This... (aarch64_addp<mode><vczle><vczbe>_insn): ... And this. (aarch64_addp<mode>): New define_expand.
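For reference, a scalar sketch of the non-widening ADDP semantics the patch models, for a 4-lane case (illustrative only): the result is the pairwise sums over the concatenation of the two inputs.

  void
  addp4 (const int a[4], const int b[4], int r[4])
  {
    /* Pairwise add over the concatenation { a, b }.  */
    r[0] = a[0] + a[1];
    r[1] = a[2] + a[3];
    r[2] = b[0] + b[1];
    r[3] = b[2] + b[3];
  }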
2023-06-07  Revert "libstdc++: Use AS_IF in configure.ac"  (Jonathan Wakely, 2 files changed, -590/+578)
This reverts commit 97a5e8a2a48d162744a5bd60a012ce6fca13cbbe. libstdc++-v3/ChangeLog: * configure: Regenerate. * configure.ac:
2023-06-07  Fix expected test output on hppa  (Jeff Law, 1 file changed, -1/+1)
Recent changes in the hoisting code change the optimized gimple for the shadd-3 testcase on the PA. That in turn changes the number of expected shadd instructions. I'm not entirely sure the test is actually testing what we want anymore since I don't see a CSE for postreload to discover. But I did verify that the number of shadd instructions is sane, so I just changed the count in the obvious way. gcc/testsuite * gcc.target/hppa/shadd-3.c: Update expected output.
2023-06-07  testsuite/libgomp.*/target-present-*.{c,f90}: Improve and fix  (Tobias Burnus, 6 files changed, -25/+35)
One of the testcases lacked variables in a map clause such that the fail occurred too early. Additionally, it would have failed for all those non-host devices where 'present' is always true, i.e. non-host devices which can access all of the host memory (shared-memory devices). [There are currently none.] The commit now runs the code on all devices, which should succeed for host fallback and for shared-memory devices, finding potential issues that way. Additionally, a checkpoint (required stdout output) is used to ensure that the execution won't fail (with the same error) before reaching the expected fail location.

2023-06-07  Thomas Schwinge  <thomas@codesourcery.com>
            Tobias Burnus  <tobias@codesourcery.com>

libgomp/
        * testsuite/libgomp.c-c++-common/target-present-1.c: Run code also for non-offload_device targets; check that it runs successfully for those and for all until a checkpoint for all.
        * testsuite/libgomp.c-c++-common/target-present-2.c: Likewise.
        * testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
        * testsuite/libgomp.fortran/target-present-1.f90: Likewise.
        * testsuite/libgomp.fortran/target-present-3.f90: Likewise.
        * testsuite/libgomp.fortran/target-present-2.f90: Likewise; add missing vars to map clause.
2023-06-07  Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++ testing  (Thomas Schwinge, 1 file changed, -0/+12)
Verbatim copy of what was added to 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune' in Subversion r279246 (Git commit a9046e9853024206bec092dd63e21e152cb5cbca) "[MSP430] -Add fno-exceptions multilib". This greatly improves 'make check-target-libstdc++-v3' results for, for example, x86_64-pc-linux-gnu with: RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}' libstdc++-v3/ * testsuite/lib/prune.exp (libstdc++-dg-prune): Support 'UNSUPPORTED: [...]: exception handling disabled'.
2023-06-07  modula2: Fix bootstrap  (Jakub Jelinek, 1 file changed, -0/+2)
internal-fn.h since yesterday includes insn-opinit.h, which is a generated header. One of my bootstraps today failed because some m2 sources started compiling before insn-opinit.h has been generated. Normally, gcc/Makefile.in has # In order for parallel make to really start compiling the expensive # objects from $(OBJS) as early as possible, build all their # prerequisites strictly before all objects. $(ALL_HOST_OBJS) : | $(generated_files) rule which ensures that all the generated files are generated before any $(ALL_HOST_OBJS) objects start, but use order-only dependency for this because we don't want to rebuild most of the objects whenever one generated header is regenerated. After the initial build in an empty directory we'll have .deps/ files contain the detailed dependencies. $(ALL_HOST_OBJS) includes even some FE files, I think in the m2 case would be m2_OBJS, but m2/Make-lang.in doesn't define those. The following patch just adds a similar rule to m2/Make-lang.in. Another option would be to set m2_OBJS variable in m2/Make-lang.in to something, but not really sure to which exactly and why it isn't done. 2023-06-07 Jakub Jelinek <jakub@redhat.com> * Make-lang.in: Build $(generated_files) before building all $(GM2_C_OBJS).
2023-06-07  RISC-V: Support RVV VLA SLP auto-vectorization  (Juzhe-Zhong, 26 files changed, -37/+1010)
This patch enables basic VLA SLP auto-vectorization. Consider this following case: void f (uint8_t *restrict a, uint8_t *restrict b) { for (int i = 0; i < 100; ++i) { a[i * 8 + 0] = b[i * 8 + 7] + 1; a[i * 8 + 1] = b[i * 8 + 7] + 2; a[i * 8 + 2] = b[i * 8 + 7] + 8; a[i * 8 + 3] = b[i * 8 + 7] + 4; a[i * 8 + 4] = b[i * 8 + 7] + 5; a[i * 8 + 5] = b[i * 8 + 7] + 6; a[i * 8 + 6] = b[i * 8 + 7] + 7; a[i * 8 + 7] = b[i * 8 + 7] + 3; } } To enable VLA SLP auto-vectorization, we should be able to handle this following const vector: 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3. { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... } 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. { 1, 2, 8, 4, 5, 6, 7, 3, ... } And these vector can be generated at prologue. After this patch, we end up with this following codegen: Prologue: ... vsetvli a7,zero,e16,m2,ta,ma vid.v v4 vsrl.vi v4,v4,3 li a3,8 vmul.vx v4,v4,a3 ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... } ... li t1,67633152 addi t1,t1,513 li a3,50790400 addi a3,a3,1541 slli a3,a3,32 add a3,a3,t1 vsetvli t1,zero,e64,m1,ta,ma vmv.v.x v3,a3 ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... } ... LoopBody: ... min a3,... vsetvli zero,a3,e8,m1,ta,ma vle8.v v2,0(a6) vsetvli a7,zero,e8,m1,ta,ma vrgatherei16.vv v1,v2,v4 vadd.vv v1,v1,v3 vsetvli zero,a3,e8,m1,ta,ma vse8.v v1,0(a2) add a6,a6,a4 add a2,a2,a4 mv a3,a5 add a5,a5,t1 bgtu a3,a4,.L3 ... Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8 since "vrgatherei16.vv" can cover larger range than "vrgather.vv" (which only can maximum element index = 255). Epilogue: lbu a5,799(a1) addiw a4,a5,1 sb a4,792(a0) addiw a4,a5,2 sb a4,793(a0) addiw a4,a5,8 sb a4,794(a0) addiw a4,a5,4 sb a4,795(a0) addiw a4,a5,5 sb a4,796(a0) addiw a4,a5,6 sb a4,797(a0) addiw a4,a5,7 sb a4,798(a0) addiw a5,a5,3 sb a5,799(a0) ret There is one more last thing we need to do is the "Epilogue auto-vectorization" which needs VLS modes support. I will support VLS modes for "Epilogue auto-vectorization" in the future. gcc/ChangeLog: * config/riscv/riscv-protos.h (expand_vec_perm_const): New function. * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling. (rvv_builder::single_step_npatterns_p): New function. (rvv_builder::npatterns_all_equal_p): Ditto. (const_vec_all_in_range_p): Support POLY handling. (gen_const_vector_dup): Ditto. (emit_vlmax_gather_insn): Add vrgatherei16. (emit_vlmax_masked_gather_mu_insn): Ditto. (expand_const_vector): Add VLA SLP const vector support. (expand_vec_perm): Support POLY. (struct expand_vec_perm_d): New struct. (shuffle_generic_patterns): New function. (expand_vec_perm_const_1): Ditto. (expand_vec_perm_const): Ditto. * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto. (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer. * gcc.target/riscv/rvv/autovec/v-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test. 
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
2023-06-06  Handle const_int in expand_single_bit_test  (Andrew Pinski, 3 files changed, -3/+45)
After expanding directly to rtl instead of creating a tree, we could end up with a const_int which is not ready to be handled by extract_bit_field. So we need to do the constant folding here instead.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

        PR middle-end/110117

gcc/ChangeLog:
        * expr.cc (expand_single_bit_test): Handle const_int from expand_expr.

gcc/testsuite/ChangeLog:
        * gcc.dg/pr110117-1.c: New test.
        * gcc.dg/pr110117-2.c: New test.
2023-06-06  Improve do_store_flag for single bit when there is no non-zero bits  (Andrew Pinski, 1 file changed, -17/+11)
In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP or turn off the bit tracking part of CCP, so we would lose out on what TER was able to do beforehand. This moves around the TER code so that it is used instead of just the nonzero bits. It also makes it easier to remove the TER part of the code later on too.

OK? Bootstrapped and tested on x86_64-linux-gnu. Note it reintroduces PR 110117 (which was accidentally fixed after r14-1534-g908e5ab5c11c). The next patch in the series will fix that.

gcc/ChangeLog:
        * expr.cc (do_store_flag): Rearrange the TER code so that it overrides the nonzero bits info if we had `a & POW2`.
2023-06-06  For the `-A CMP -B -> B CMP A` pattern allow EQ/NE for all integer types  (Andrew Pinski, 5 files changed, -2/+80)
I noticed while looking at some code generation issue, that forwprop was not handling `-a == 0` for unsigned types and I was confused why it was not. r6-1814-g66e1cacf608045 removed these from fold because they were supposed to be already handled by the match.pd patterns but it was missed that the match.pd patterns checked TYPE_OVERFLOW_UNDEFINED while fold didn't do that for NE/EQ. This patch removes the restriction on NE/EQ on TYPE_OVERFLOW_UNDEFINED. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/110134 * match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all integer types. (-A CMP CST -> B CMP (-CST)): Likewise. gcc/testsuite/ChangeLog: PR tree-optimization/110134 * gcc.dg/tree-ssa/negneq-1.c: New test. * gcc.dg/tree-ssa/negneq-2.c: New test. * gcc.dg/tree-ssa/negneq-3.c: New test. * gcc.dg/tree-ssa/negneq-4.c: New test.
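A small example of what now folds (a sketch of the effect, not a testsuite listing): since negation is a bijection, equality of negated values does not depend on overflow behaviour, so the transform is valid for unsigned types too.

  int
  f (unsigned int a)
  {
    return -a == 0;      /* now simplified to a == 0, even though a is unsigned */
  }

  int
  g (unsigned int a, unsigned int b)
  {
    return -a != -b;     /* now simplified to a != b */
  }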
2023-06-06  libiberty: writeargv: Simplify function error mode.  (Costas Argyris, 1 file changed, -3/+1)
You are right, this is also a remnant of the old function design that I completely missed. Here is the follow-up patch for that. Thanks for pointing it out.

Costas

On Tue, 6 Jun 2023 at 04:12, Jeff Law <jeffreyalaw@gmail.com> wrote:

  On 6/5/23 08:37, Costas Argyris via Gcc-patches wrote:
  > writeargv can be simplified by getting rid of the error exit mode
  > that was only relevant many years ago when the function used
  > to open the file descriptor internally.
  [ ... ]
  Thanks.  I've pushed this to the trunk.

  You could (as a follow-up) simplify it even further.  There's no need for
  the status variable as far as I can tell.  You could just have the final
  return be "return 0;" instead of "return status;".

libiberty/
        * argv.c (writeargv): Constant propagate "0" for "status", simplifying the code slightly.
2023-06-06  Add match patterns for `a ? onezero : onezero` where one of the two operands are constant  (Andrew Pinski, 10 files changed, -11/+165)
This adds match patterns for boolean values that optimize `a ? onezero : 0` to `a & onezero` and `a ? 1 : onezero` to `a | onezero`. This was reported a few times and I thought I would finally add the match pattern for this. This hits a few times in GCC itself too.

Notes on the testcases:
  * phi-opt-2.c: This is now optimized to `a & b` in phiopt rather than ifcombine.
  * phi-opt-25b.c: The test part that was failing was parity, which now gets `x & y` treatment.
  * ssa-thread-21.c: there is no longer a threading opportunity, so we need to disable phiopt. Note PR 109957 is filed for the now-missing optimization in that testcase too.

gcc/ChangeLog:
        PR tree-optimization/89263
        PR tree-optimization/99069
        PR tree-optimization/20083
        PR tree-optimization/94898
        * match.pd: Add patterns to optimize `a ? onezero : onezero` when one of the operands is constant.

gcc/testsuite/ChangeLog:
        * gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase.
        * gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase.
        * gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt.
        * gcc.dg/tree-ssa/phi-opt-27.c: New test.
        * gcc.dg/tree-ssa/phi-opt-28.c: New test.
        * gcc.dg/tree-ssa/phi-opt-29.c: New test.
        * gcc.dg/tree-ssa/phi-opt-30.c: New test.
        * gcc.dg/tree-ssa/phi-opt-31.c: New test.
        * gcc.dg/tree-ssa/phi-opt-32.c: New test.
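A minimal sketch of the new folds on bool values (illustrative, not one of the new testcases):

  bool
  f (bool a, bool b)
  {
    return a ? b : false;   /* now folded to a & b */
  }

  bool
  g (bool a, bool b)
  {
    return a ? true : b;    /* now folded to a | b */
  }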
2023-06-06  Match: zero_one_valued_p should match 0 constants too  (Andrew Pinski, 1 file changed, -0/+5)
While working on `bool0 ? bool1 : bool2` I noticed that zero_one_valued_p does not match on the constant zero as in that case tree_nonzero_bits will return 0 and that is different from 1. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd (zero_one_valued_p): Match 0 integer constant too.
2023-06-07  RISC-V: Fix ICE when include riscv_vector.h with rv64gcv  (Pan Li, 1 file changed, -33/+33)
This patch would like to fix the incorrect requirement of the vector builtin types for the ZVFH/ZVFHMIN extension. The incorrect requirement will result in the ops mismatching with the iterators, and then an ICE will be triggered if ZVFH/ZVFHMIN is not given. Sorry for the inconvenience.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:
        * config/riscv/riscv-vector-builtins-types.def (vfloat32mf2_t): Take RVV_REQUIRE_ELEN_FP_16 as requirement. (vfloat32m1_t): Ditto. (vfloat32m2_t): Ditto. (vfloat32m4_t): Ditto. (vfloat32m8_t): Ditto. (vint16mf4_t): Ditto. (vint16mf2_t): Ditto. (vint16m1_t): Ditto. (vint16m2_t): Ditto. (vint16m4_t): Ditto. (vint16m8_t): Ditto. (vuint16mf4_t): Ditto. (vuint16mf2_t): Ditto. (vuint16m1_t): Ditto. (vuint16m2_t): Ditto. (vuint16m4_t): Ditto. (vuint16m8_t): Ditto. (vint32mf2_t): Ditto. (vint32m1_t): Ditto. (vint32m2_t): Ditto. (vint32m4_t): Ditto. (vint32m8_t): Ditto. (vuint32mf2_t): Ditto. (vuint32m1_t): Ditto. (vuint32m2_t): Ditto. (vuint32m4_t): Ditto. (vuint32m8_t): Ditto.
2023-06-06  c++: Add -Wnrvo  (Jason Merrill, 4 files changed, -2/+61)
While looking at PRs about cases where we don't perform the named return value optimization, it occurred to me that it might be useful to have a warning for that. This does not fix PR58487, but might be interesting to people watching it. PR c++/58487 gcc/c-family/ChangeLog: * c.opt: Add -Wnrvo. gcc/ChangeLog: * doc/invoke.texi: Document it. gcc/cp/ChangeLog: * typeck.cc (want_nrvo_p): New. (check_return_expr): Handle -Wnrvo. gcc/testsuite/ChangeLog: * g++.dg/opt/nrv25.C: New test.
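A hedged sketch of the sort of code the new warning is aimed at (a hypothetical example, not from the testsuite): two different named locals are returned on different paths, so at most one of them can be built directly in the return slot.

  struct T { T () {} T (const T&) {} };

  T
  f (bool c)
  {
    T a;
    if (c)
      return a;
    T b;
    return b;       // at most one of these copies can be elided; -Wnrvo flags the other
  }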
2023-06-06  c++: enable NRVO from inner block [PR51571]  (Jason Merrill, 6 files changed, -28/+64)
Our implementation of the named return value optimization has been limited to variables declared in the outermost block of the function, to avoid needing to handle the case where the variable needs to be destroyed due to going out of scope. PR92407 pointed out a case we were missing, where the variable goes out of scope due to a goto and we were failing to destroy it. It occurred to me that this problem is the flip side of PR33799, where we need to be sure to destroy the return value if a cleanup throws on return; here we want to avoid destroying the return value when exiting the variable's scope on return. We can use the same flag to indicate to both cleanups that we're returning. This implements the guaranteed copy elision specified by P2025 (which is not yet part of the draft standard). PR c++/51571 PR c++/92407 gcc/cp/ChangeLog: * decl.cc (finish_function): Simplify NRV handling. * except.cc (maybe_set_retval_sentinel): Also set if NRV. (maybe_splice_retval_cleanup): Don't add the cleanup region if we don't need it. * semantics.cc (nrv_data): Add simple field. (finalize_nrv): Set it. (finalize_nrv_r): Check it and retval sentinel. * cp-tree.h (finalize_nrv): Adjust declaration. * typeck.cc (check_return_expr): Remove named_labels check. gcc/testsuite/ChangeLog: * g++.dg/opt/nrv23.C: New test.
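A sketch of the inner-block case this now handles (illustrative only; the helper something() is a stand-in condition): the named variable lives in a nested scope, and the shared flag lets its scope-exit cleanup distinguish a plain scope exit from a return, so the return can still construct it directly in the return slot while a goto out of the block still destroys it.

  struct T { T () {} T (const T&) {} ~T () {} };
  static bool something () { return false; }   // stand-in condition

  T
  g ()
  {
    {
      T t;             // named return value declared in an inner block
      if (something ())
        goto out;      // leaves t's scope: t must be destroyed here (the PR92407 case) ...
      return t;        // ... but not when it is returned as the NRV
    }
   out:
    return T ();
  }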