path: root/gcc
Age | Commit message | Author | Files | Lines
2023-06-10 | Daily bump. | GCC Administrator | 6 | -1/+184
2023-06-10 | VECT: Add SELECT_VL support | Ju-Zhe Zhong | 7 | -16/+187
This patch addresses comments from Richard and Richi and rebases to trunk.

This patch adds SELECT_VL middle-end support, which allows a target to apply target-dependent optimization to the length calculation.

This patch is inspired by the RVV ISA and LLVM: https://reviews.llvm.org/D99750

SELECT_VL has the same behavior as LLVM "get_vector_length", with the following properties:

1. Only applies to single-rgroup.
2. Non-SLP.
3. Adjusts the loop control IV.
4. Adjusts the data reference IV.
5. Allows non-VF elements processing in a non-final iteration.

Code:

    # void vvaddint32(size_t n, const int*x, const int*y, int*z)
    # { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }

Take RVV codegen for example:

Before this patch:

    vvaddint32:
            ble     a0,zero,.L6
            csrr    a4,vlenb
            srli    a6,a4,2
    .L4:
            mv      a5,a0
            bleu    a0,a6,.L3
            mv      a5,a6
    .L3:
            vsetvli zero,a5,e32,m1,ta,ma
            vle32.v v2,0(a1)
            vle32.v v1,0(a2)
            vsetvli a7,zero,e32,m1,ta,ma
            sub     a0,a0,a5
            vadd.vv v1,v1,v2
            vsetvli zero,a5,e32,m1,ta,ma
            vse32.v v1,0(a3)
            add     a2,a2,a4
            add     a3,a3,a4
            add     a1,a1,a4
            bne     a0,zero,.L4
    .L6:
            ret

After this patch:

    vvaddint32:
            vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
            vle32.v v0, (a1)             # Get first vector
            sub     a0, a0, t0           # Decrement number done
            slli    t0, t0, 2            # Multiply number done by 4 bytes
            add     a1, a1, t0           # Bump pointer
            vle32.v v1, (a2)             # Get second vector
            add     a2, a2, t0           # Bump pointer
            vadd.vv v2, v0, v1           # Sum vectors
            vse32.v v2, (a3)             # Store result
            add     a3, a3, t0           # Bump pointer
            bnez    a0, vvaddint32       # Loop back
            ret                          # Finished

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Co-authored-by: Richard Biener <rguenther@suse.de>

gcc/ChangeLog:
* doc/md.texi: Add SELECT_VL support.
* internal-fn.def (SELECT_VL): Ditto.
* optabs.def (OPTAB_D): Ditto.
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
* tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto. (vectorizable_store): Ditto. (vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.
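The five properties listed for SELECT_VL can be modelled in plain scalar C. The following is only a sketch under stated assumptions: `select_vl` is a hypothetical stand-in for whatever length the target picks (here simply the minimum of the remaining count and an assumed vectorization factor of 4), not GCC's internal function. It shows the loop-control IV and the data-reference IVs all advancing by the selected length.

```c
#include <stddef.h>

/* Hypothetical stand-in for the target's length selection.  It never
   returns more than `remaining` or `vf`; a real target would also be
   allowed to return fewer than `vf` in a non-final iteration.  */
static size_t select_vl(size_t remaining, size_t vf)
{
    return remaining < vf ? remaining : vf;
}

/* Scalar model of the vectorized loop: the loop-control IV (n) and the
   data-reference IVs (x, y, z) are all adjusted by the selected length.  */
void vvaddint32(size_t n, const int *x, const int *y, int *z)
{
    const size_t vf = 4;              /* assumed vectorization factor */

    while (n > 0) {
        size_t vl = select_vl(n, vf);
        for (size_t i = 0; i < vl; i++)   /* models one vector op of length vl */
            z[i] = x[i] + y[i];
        n -= vl;                          /* adjust loop control IV */
        x += vl;                          /* adjust data reference IVs */
        y += vl;
        z += vl;
    }
}
```

Because every pointer bump uses `vl` rather than a fixed stride, the loop stays correct even if the selection returns a short length before the final iteration.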
2023-06-09 | analyzer: add caching to globals with initializers [PR110112] | David Malcolm | 3 | -31/+79
PR analyzer/110112 notes that -fanalyzer is extremely slow on a source file with large read-only static arrays, repeatedly building the same compound_svalue representing the full initializer, and repeatedly building svalues representing parts of the full initializer.

This patch adds caches for both of these; together they reduce the time taken by -fanalyzer -O2 on the testcase in the bug for an optimized build:

    91.2s : no caches (status quo)
    32.4s : cache in decl_region::get_svalue_for_constructor
     3.7s : cache in region::get_initial_value_at_main
     3.1s : both caches (this patch)

gcc/analyzer/ChangeLog:
PR analyzer/110112
* region-model.cc (region_model::get_initial_value_for_global): Move code to region::calc_initial_value_at_main.
* region.cc (region::get_initial_value_at_main): New function. (region::calc_initial_value_at_main): New function, based on code in region_model::get_initial_value_for_global. (region::region): Initialize m_cached_init_sval_at_main. (decl_region::get_svalue_for_constructor): Add a cache, splitting out body to... (decl_region::calc_svalue_for_constructor): ...this new function.
* region.h (region::get_initial_value_at_main): New decl. (region::calc_initial_value_at_main): New decl. (region::m_cached_init_sval_at_main): New field. (decl_region::decl_region): Initialize m_ctor_svalue. (decl_region::calc_svalue_for_constructor): New decl. (decl_region::m_ctor_svalue): New field.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-06-09 | Also check type being cast to | Andrew MacLeod | 1 | -0/+1
Before casting into an irange, make sure the type being cast into is also supported. PR ipa/109886 * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Check param type as well.
2023-06-09 | Relocate range_cast to header, and add a generic version. | Andrew MacLeod | 2 | -19/+43
Make range_cast inlinable by moving it to the header file. Also trap if the destination is not capable of representing the cast type. Add a generic version which can change range classes, i.e. float to int. * range-op.cc (range_cast): Move to... * range-op.h (range_cast): Here and add a generic version.
2023-06-09 | c++: fix 32-bit spaceship failures [PR110185] | Jason Merrill | 2 | -0/+2
Various spaceship tests failed after r14-1624. This turned out to be because the comparison category classes return in memory on 32-bit targets, and the synthesized operator<=> looks something like

    if (auto v = a.x <=> b.x, v == 0); else return v;
    if (auto v = a.y <=> b.y, v == 0); else return v;
    etc.

so check_return_expr was trying to do NRVO for all the 'v' variables, and now on subsequent returns we check to see if the previous NRV is still in scope. But the NRVs didn't have names, so looking up name bindings crashed.

Fixed both by giving 'v' a name so we can NRVO the first one, and fixing the test to give up if the old NRV has no name.

PR c++/110185
PR c++/58487

gcc/cp/ChangeLog:
* method.cc (build_comparison_op): Give retval a name.
* typeck.cc (check_return_expr): Fix for nameless variables.
2023-06-09 | c++: diagnose auto in template arg | Jason Merrill | 3 | -9/+25
We were failing to diagnose this Concepts TS feature that didn't make it into C++20 because the 'auto' was getting converted to a template parameter before we checked for it. So also check in cp_parser_simple_type_specifier. The code in cp_parser_template_type_arg that I initially expected to diagnose this seems unreachable because cp_parser_type_id_1 already checks auto. gcc/cp/ChangeLog: * parser.cc (cp_parser_simple_type_specifier): Check for auto in template argument. (cp_parser_template_type_arg): Remove auto checking. gcc/testsuite/ChangeLog: * g++.dg/concepts/auto7.C: New test. * g++.dg/concepts/auto7a.C: New test.
2023-06-09 | c++: init-list of uncopyable type [PR110102] | Jason Merrill | 2 | -0/+23
The maybe_init_list_as_range optimization is a form of copy elision, but we can only elide well-formed copies. PR c++/110102 gcc/cp/ChangeLog: * call.cc (maybe_init_list_as_array): Check that the element type is copyable. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist-opt1.C: New test.
2023-06-09 | doc: Clarification for -Wmissing-field-initializers | Marek Polacek | 1 | -2/+3
The manual is incorrect in saying that the option does not warn about designated initializers, which it does in C++. Whether the divergence in behavior is desirable is another thing, but let's at least make the manual match the reality. PR c/39589 PR c++/96868 gcc/ChangeLog: * doc/invoke.texi: Clarify that -Wmissing-field-initializers doesn't warn about designated initializers in C only.
2023-06-09 | Add Plus to the op list of `(zero_one == 0) ? y : z <op> y` pattern | Andrew Pinski | 3 | -2/+28
This adds plus to the op list of `(zero_one == 0) ? y : z <op> y` patterns which currently has bit_ior and bit_xor. This shows up now in GCC after the boolization work that Uroš has been doing. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/97711 PR tree-optimization/110155 gcc/ChangeLog: * match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op. ((zero_one != 0) ? z <op> y : y): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/branchless-cond-add-2.c: New test. * gcc.dg/tree-ssa/branchless-cond-add.c: New test.
2023-06-09 | Change the `(zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z` | Andrew Pinski | 1 | -4/+4
Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` already, it is better if we don't do a secondary transformation. This reduces the extra statements produced by match-and-simplify on the gimple level too. gcc/ChangeLog: * match.pd ((zero_one ==/!= 0) ? y : z <op> y): Use multiply rather than negation/bit_and.
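The equivalences these match.pd patterns rely on can be checked in ordinary C. This is an illustrative sketch only: the function names are invented here, unsigned arithmetic is used to sidestep signed overflow, and `b` is assumed to hold exactly 0 or 1 (the zero_one precondition).

```c
/* Branchy source form: (b == 0) ? y : z + y, with b assumed to be 0 or 1.  */
unsigned cond_add_branchy(unsigned b, unsigned y, unsigned z)
{
    return (b == 0) ? y : z + y;
}

/* Branchless form using a multiply: b * z is 0 when b == 0 and z when b == 1.  */
unsigned cond_add_branchless(unsigned b, unsigned y, unsigned z)
{
    return b * z + y;
}

/* The older mask idiom: -b is all-zeros for b == 0 and all-ones for b == 1.  */
unsigned mask_form(unsigned b, unsigned z)
{
    return (0u - b) & z;
}

/* The multiply that the mask idiom is canonicalized to.  */
unsigned mul_form(unsigned b, unsigned z)
{
    return b * z;
}
```

For `b` in {0, 1} the mask and multiply forms agree, which is why emitting the multiply directly avoids a second rewrite step.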
2023-06-09 | MATCH: Allow unsigned types for `X & -Y -> X * Y` pattern | Andrew Pinski | 2 | -4/+7
This allows unsigned types if the inner type where the negation is located has precision greater than or equal to that of the outer type. branchless-cond.c needs to be updated since now we change it to use a multiply rather than still having (-a)&c in there. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * match.pd (`X & -Y -> X * Y`): Allow for truncation and the same type for unsigned types. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
2023-06-09 | MATCH: Fix zero_one_valued_p not to match signed 1 bit integers | Andrew Pinski | 3 | -3/+71
So for the attached testcase, we assumed that zero_one_valued_p would be the value [0,1] but currently zero_one_valued_p matches also signed 1 bit integers. This changes that not to match that and fixes the 2 new testcases at all optimization levels. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Note the GCC 13 patch will be slightly different due to the changes made to zero_one_valued_p. PR tree-optimization/110165 PR tree-optimization/110166 gcc/ChangeLog: * match.pd (zero_one_valued_p): Don't accept signed 1-bit integers. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr110165-1.c: New test. * gcc.c-torture/execute/pr110166-1.c: New test.
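A small C sketch may help show why a signed 1-bit integer cannot be treated as zero_one valued. The struct and function names below are made up for illustration and are not taken from the new testcases; the point is only that a `signed int` bit-field of width 1 holds the values 0 and -1, so rewrites that are valid for [0,1] give different answers.

```c
/* A signed 1-bit field can represent only 0 and -1.  */
struct flag {
    signed int bit : 1;
};

/* Multiply form: for bit == -1 this yields -z, not z.  */
int flag_times(int raw, int z)
{
    struct flag f;
    f.bit = raw;            /* raw is expected to be 0 or -1 */
    return f.bit * z;
}

/* Mask form: for bit == -1 the negation is 1, so this yields z & 1.  */
int flag_mask(int raw, int z)
{
    struct flag f;
    f.bit = raw;
    return (-f.bit) & z;
}
```

With `raw == -1` and `z == 5`, the two forms return -5 and 1 respectively, so the `(-x) & z` to `x * z` equivalence that holds for 0/1 values does not carry over.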
2023-06-09 | testsuite: fix the condition bug in tsvc s176 | Lehua Ding | 2 | -3/+3
This patch fixes the problem that the loop in the tsvc s176 function is optimized and removed because `iterations/LEN_1D` is 0 (where iterations is set to 10000, LEN_1D is set to 32000 in tsvc.h). This testcase passed on x86 and AArch64 systems. Best, Lehua gcc/testsuite/ChangeLog: * gcc.dg/vect/tsvc/vect-tsvc-s176.c: Adjust iterations. * gcc.dg/vect/tsvc/tsvc.h: Adjust expected result for s176.
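The root cause is plain C integer division, which truncates toward zero. A minimal sketch (the function name is hypothetical; only the two constants come from the message above):

```c
/* Trip count of an outer loop bounded by iterations / len_1d.
   Integer division truncates, so a numerator smaller than the
   denominator gives zero trips.  */
int outer_trip_count(int iterations, int len_1d)
{
    return iterations / len_1d;
}
```

With 10000 iterations and a length of 32000 the quotient is 0, so a loop bounded by it never executes and the compiler is free to delete it.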
2023-06-09 | middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code | Richard Biener | 1 | -3/+3
When folding two conversions in a row we use TYPE_PRECISION but that's invalid for VECTOR_TYPE. The following fixes this by using element_precision instead. * match.pd (two conversions in a row): Use element_precision to DTRT for VECTOR_TYPE.
2023-06-09 | RISC-V: Refactor requirement of ZVFH and ZVFHMIN. | Pan Li | 3 | -18/+59
This patch refactors the requirement of both the ZVFH and ZVFHMIN. By default, the ZVFHMIN will enable FP16 for all the iterators of RVV. And then the ZVFH will leverage one define attr as the gate for whether FP16 is supported or not. Please note the ZVFH will cover the ZVFHMIN instructions. This patch adds one test for this. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai> Co-Authored by: Kito Cheng <kito.cheng@sifive.com> gcc/ChangeLog: * config/riscv/riscv.md (enabled): Move to another place, and add fp_vector_disabled to the cond. (fp_vector_disabled): New attr defined for disabling fp. * config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test for ZVFHMIN.
2023-06-09 | RISC-V: Fix one warning of frm enum. | Pan Li | 1 | -7/+10
This patch fixes a warning like the one below, and adds a link to where the values come from.

    ./gcc/config/riscv/riscv-protos.h:260:13: warning: binary constants are a C++14 feature or GCC extension
       FRM_RNE = 0b000,
                 ^~~~~

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum frm_field_enum): Adjust literal to int.
2023-06-09 | fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024] | Jakub Jelinek | 1 | -1/+2
The pr96024.f90 testcase ICEs on big-endian hosts. The problem is that length->val.integer is accessed after checking length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant which uses length->val.character union member instead and on big-endian we end up reading constant 0x100000000 rather than some small number on little-endian and if target doesn't have enough memory for 4 times that (i.e. 16GB allocation), it ICEs. 2023-06-09 Jakub Jelinek <jakub@redhat.com> PR fortran/96024 * primary.cc (gfc_convert_to_structure_constructor): Only do constant string ctor length verification and truncation/padding if constant length has INTEGER type.
2023-06-09 | Explicitly view_convert_expr mask to signed type when folding pblendvb builtins. | liuhongt | 2 | -1/+17
mask < 0 is always false for vector char under -funsigned-char, but vpblendvb needs to check the most significant bit, so the patch explicitly view-converts (VCE) the mask to vector signed char. gcc/ChangeLog: PR target/110108 * config/i386/i386.cc (ix86_gimple_fold_builtin): Explicitly view_convert_expr mask to signed type when folding pblendvb builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110108-2.c: New test.
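The signedness issue can be seen per byte in scalar C. The sketch below is a simplified model, not the code in i386.cc: the function names are invented, and the selection rule (take `b` when the mask byte's most significant bit is set, otherwise `a`) is the behaviour described above for the blend.

```c
#include <stdint.h>

/* Comparing an unsigned byte against zero can never be true, which is
   what goes wrong when char is unsigned.  */
int unsigned_lt_zero(uint8_t mask)
{
    return mask < 0;        /* always 0: mask promotes to a non-negative int */
}

/* Portable most-significant-bit test, equivalent to viewing the byte as
   signed and checking for a negative value.  */
int msb_set(uint8_t mask)
{
    return (mask & 0x80u) != 0;
}

/* Per-byte model of the blend: pick b when the mask's top bit is set.  */
uint8_t blend_byte(uint8_t a, uint8_t b, uint8_t mask)
{
    return msb_set(mask) ? b : a;
}
```

Reinterpreting the mask as signed before the `< 0` comparison restores the intended top-bit test without changing any bits.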
2023-06-09 | Fold _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple ABSU_EXPR + VCE. | liuhongt | 5 | -11/+62
r14-1145 folded the intrinsics into gimple ABS_EXPR, which has UB for TYPE_MIN, but PABSB stores an unsigned result into dst. The patch uses ABSU_EXPR + VCE instead of ABS_EXPR. Also don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT since 64-bit vector absm2 is guarded with TARGET_MMX_WITH_SSE. gcc/ChangeLog: PR target/110108 * config/i386/i386.cc (ix86_gimple_fold_builtin): Fold _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT. * config/i386/i386-builtin.def: Replace CODE_FOR_nothing with real codename for __builtin_ia32_pabs{b,w,d}. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110108.c: New test. * gcc.target/i386/pr110108-3.c: New test. * gcc.target/i386/pr109900.c: Adjust testcase.
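The difference between a signed absolute value and an unsigned one shows up only at the minimum value. A scalar C sketch of the unsigned variant follows; the names `absu` and `absu8` are chosen here for illustration and are not GCC functions.

```c
#include <limits.h>
#include <stdint.h>

/* Absolute value with an unsigned result.  The negation is done in
   unsigned arithmetic, so it is well defined even for INT_MIN, where a
   signed abs would overflow.  */
unsigned absu(int x)
{
    return x < 0 ? 0u - (unsigned)x : (unsigned)x;
}

/* Byte-sized variant: -128 maps to 128, which fits in an unsigned byte.  */
uint8_t absu8(int8_t x)
{
    return (uint8_t)(x < 0 ? -(int)x : (int)x);
}
```

The byte case mirrors what the message says about PABSB: the result for the minimum value is stored as an unsigned quantity rather than being undefined.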
2023-06-09 | Daily bump. | GCC Administrator | 6 | -1/+248
2023-06-09 | PR modula2/110126 variables are reported as unused when referenced by ASM | Gaius Mulley | 12 | -181/+292
This patch fixes two problems with the asm statement. gm2 -Wall -c fooasm3.mod generates an incorrect warning and gm2 cannot concatenate strings before an ASM statement. The asm statement now accepts a constant expression (rather than a string) and it updates the variable read/write use lists as appropriate.

gcc/m2/ChangeLog:
PR modula2/110126
* gm2-compiler/M2GenGCC.mod (BuildTreeFromInterface): Remove tokenno parameter. Use object tok instead of tokenno. (BuildTrashTreeFromInterface): Use object tok instead of GetDeclaredMod. (CodeInline): Remove tokenno from parameter list to BuildTreeFromInterface.
* gm2-compiler/M2Quads.def (BuildAsmElement): Exported and defined.
* gm2-compiler/M2Quads.mod (BuildOptimizeOff): Reformatted. (BuildInline): Reformatted. (BuildLineNo): Reformatted. (UseLineNote): Reformatted. (BuildAsmElement): New procedure.
* gm2-compiler/P0SyntaxCheck.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
* gm2-compiler/P1Build.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
* gm2-compiler/P2Build.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
* gm2-compiler/P3Build.bnf (AsmOperands): Rewrite. (AsmOperandSpec): Rewrite. (AsmOutputList): New rule. (AsmInputList): New rule. (TrashList): Rewrite.
* gm2-compiler/PCBuild.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
* gm2-compiler/PHBuild.bnf (AsmOperands): Use ConstExpression instead of string. (AsmElement): Use ConstExpression instead of string. (TrashList): Use ConstExpression instead of string.
* gm2-compiler/SymbolTable.def (PutRegInterface): Rewrite interface. (GetRegInterface): Rewrite interface.
* gm2-compiler/SymbolTable.mod (SetFirstUsed): New procedure. (PutFirstUsed): New procedure. (PutRegInterface): Rewrite. (GetRegInterface): Rewrite.

gcc/testsuite/ChangeLog:
PR modula2/110126
* gm2/pim/pass/fooasm3.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-06-08 | Provide a new dispatch mechanism for range-ops. | Andrew MacLeod | 5 | -280/+306
Simplify range_op_handler to have a single range_operator pointer and provide a more flexible dispatch mechanism for calls via generic vrange classes. This is more extensible for adding new classes of range support. Any unsupported dispatch patterns will simply return FALSE now rather than generating compile time exceptions, alleviating the need to constantly check for supported types.

* gimple-range-op.cc (gimple_range_op_handler::gimple_range_op_handler): Adjust. (gimple_range_op_handler::maybe_builtin_call): Adjust.
* gimple-range-op.h (operand1, operand2): Use m_operator.
* range-op.cc (integral_table, pointer_table): Relocate. (get_op_handler): Rename from get_handler and handle all types. (range_op_handler::range_op_handler): Relocate. (range_op_handler::set_op_handler): Relocate and adjust. (range_op_handler::range_op_handler): Relocate. (dispatch_trio): New. (RO_III, RO_IFI, RO_IFF, RO_FFF, RO_FIF, RO_FII): New consts. (range_op_handler::dispatch_kind): New. (range_op_handler::fold_range): Relocate and Use new dispatch value. (range_op_handler::op1_range): Ditto. (range_op_handler::op2_range): Ditto. (range_op_handler::lhs_op1_relation): Ditto. (range_op_handler::lhs_op2_relation): Ditto. (range_op_handler::op1_op2_relation): Ditto. (range_op_handler::set_op_handler): Use m_operator member.
* range-op.h (range_op_handler::operator bool): Use m_operator. (range_op_handler::dispatch_kind): New. (range_op_handler::m_valid): Delete. (range_op_handler::m_int): Delete (range_op_handler::m_float): Delete (range_op_handler::m_operator): New. (range_op_table::operator[]): Relocate from .cc file. (range_op_table::set): Ditto.
* value-range.h (class vrange): Make range_op_handler a friend.
2023-06-08 | Unify range_operators to one class. | Andrew MacLeod | 4 | -202/+183
Range_operator and range_operator_float are 2 different classes, making generalized dispatch difficult. The distinction between what is a float operator and what is an integral operator also blurs when some methods have multiple types, i.e. casts: INT = FLOAT and FLOAT = INT.

This patch unifies all possible invocation patterns in one class, and switches the float table to use the general range_op_table.

* gimple-range-op.cc (cfn_constant_float_p): Change base class. (cfn_pass_through_arg1): Adjust using statement. (cfn_signbit): Change base class, adjust using statement. (cfn_copysign): Ditto. (cfn_sqrt): Ditto. (cfn_sincos): Ditto.
* range-op-float.cc (fold_range): Change class to range_operator. (rv_fold): Ditto. (op1_range): Ditto (op2_range): Ditto (lhs_op1_relation): Ditto. (lhs_op2_relation): Ditto. (op1_op2_relation): Ditto. (foperator_*): Ditto. (class float_table): New. Inherit from range_op_table. (floating_tree_table) Change to range_op_table pointer. (class floating_op_table): Delete.
* range-op.cc (operator_equal): Adjust using statement. (operator_not_equal): Ditto. (operator_lt, operator_le, operator_gt, operator_ge): Ditto. (operator_minus, operator_cast): Ditto. (operator_bitwise_and, pointer_plus_operator): Ditto. (get_float_handle): Change return type.
* range-op.h (range_operator_float): Delete. Relocate all methods into class range_operator. (range_op_handler::m_float): Change type to range_operator. (floating_op_table): Delete. (floating_tree_table): Change type.
2023-06-08 | Remove tree_code from range-operator. | Andrew MacLeod | 2 | -37/+79
Range_operator had a tree code added last release to facilitate bitmask operations. This removes the tree_code and replaces it with a virtual routine to perform the masking. Remove any duplicate instances which are no longer needed.

* range-op.cc (range_operator::fold_range): Call virtual routine. (range_operator::update_bitmask): New. (operator_equal::update_bitmask): New. (operator_not_equal::update_bitmask): New. (operator_lt::update_bitmask): New. (operator_le::update_bitmask): New. (operator_gt::update_bitmask): New. (operator_ge::update_bitmask): New. (operator_ge::update_bitmask): New. (operator_plus::update_bitmask): New. (operator_minus::update_bitmask): New. (operator_pointer_diff::update_bitmask): New. (operator_min::update_bitmask): New. (operator_max::update_bitmask): New. (operator_mult::update_bitmask): New. (operator_div::operator_div): New. (operator_div::update_bitmask): New. (operator_div::m_code): New member. (operator_exact_divide::operator_exact_divide): New constructor. (operator_lshift::update_bitmask): New. (operator_rshift::update_bitmask): New. (operator_bitwise_and::update_bitmask): New. (operator_bitwise_or::update_bitmask): New. (operator_bitwise_xor::update_bitmask): New. (operator_trunc_mod::update_bitmask): New. (op_ident, op_unknown, op_ptr_min_max): New. (op_nop, op_convert): Delete. (op_ssa, op_paren, op_obj_type): Delete. (op_realpart, op_imagpart): Delete. (op_ptr_min, op_ptr_max): Delete. (pointer_plus_operator::update_bitmask): New. (range_op_table::set): Do not use m_code. (integral_table::integral_table): Adjust to single instances.
* range-op.h (range_operator::range_operator): Delete. (range_operator::m_code): Delete. (range_operator::update_bitmask): New.
2023-06-08 | Fix floating point bug in fold_range. | Andrew MacLeod | 1 | -1/+1
We currently do not have any floating point operators where operand 1 is a different type than the LHS. When we eventually do there is a bug in fold_range. If either operand is a known NAN, it returns a NAN of the type of operand 1 instead of the result type. * range-op-float.cc (range_operator_float::fold_range): Return NAN of the result type.
2023-06-08 | RISC-V: Add more test cases for RVV FP16 | Pan Li | 2 | -2/+57
This patch would like to add new test cases to make sure the RVV FP16 works well as expected. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new cases. * gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: New test.
2023-06-08 | analyzer: Standalone OOB-warning [PR109437, PR109439] | Benjamin Priour | 8 | -26/+49
This patch enhances -Wanalyzer-out-of-bounds so that it is no longer paired with a -Wanalyzer-use-of-uninitialized-value on out-of-bounds reads. This also fixes PR analyzer/109437. Before, there could be at most one OOB-read warning per frame because -Wanalyzer-use-of-uninitialized-value always terminates the analysis path. PR 109439 gcc/analyzer/ChangeLog: * bounds-checking.cc (region_model::check_symbolic_bounds): Returns whether the BASE_REG region access was OOB. (region_model::check_region_bounds): Likewise. * region-model.cc (region_model::get_store_value): Creates an unknown svalue on OOB-read access to REG. (region_model::check_region_access): Returns whether an unknown svalue needs be created. (region_model::check_region_for_read): Passes check_region_access return value. * region-model.h: Update prior function definitions. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/out-of-bounds-2.c: Cleaned test for uninitialized-value warning * gcc.dg/analyzer/out-of-bounds-5.c: Likewise. * gcc.dg/analyzer/pr101962.c: Likewise. * gcc.dg/analyzer/realloc-5.c: Likewise. * gcc.dg/analyzer/pr109439.c: New test.
2023-06-08 | optabs: Implement double-word ctz and ffs expansion | Jakub Jelinek | 3 | -9/+72
We have had expand_doubleword_clz for a couple of years, where we emit double-word CLZ as

    if (high_word == 0)
      return CLZ (low_word) + word_size;
    else
      return CLZ (high_word);

We can do something similar for CTZ and FFS IMHO, just with the 2 words swapped. So

    if (low_word == 0)
      return CTZ (high_word) + word_size;
    else
      return CTZ (low_word);

for CTZ and

    if (low_word == 0)
      return high_word ? FFS (high_word) + word_size : 0;
    else
      return FFS (low_word);

for FFS. The following patch implements that.

Note, on some targets which implement both word_mode ctz and ffs patterns, it might be better to incrementally implement those double-word ffs expansion patterns in md files, because we aren't able to optimize it correctly; nothing can detect we have just made sure that argument is not 0 and so don't need to bother with handling that case. So, on ia32 just using CTZ patterns would be better there, but I think we can even do better and instead of doing the comparisons of the operands against 0 do the CTZ expansion followed by testing of flags.

2023-06-08 Jakub Jelinek <jakub@redhat.com>

* optabs.cc (expand_ffs): Add forward declaration. (expand_doubleword_clz): Rename to ... (expand_doubleword_clz_ctz_ffs): ... this. Add UNOPTAB argument, handle also doubleword CTZ and FFS in addition to CLZ. (expand_unop): Adjust caller. Also call it for doubleword ctz_optab and ffs_optab.
* gcc.target/i386/ctzll-1.c: New test.
* gcc.target/i386/ffsll-1.c: New test.
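The double-word decomposition can be tried out in portable C with 32-bit words standing in for word_mode. This is only a sketch of the arithmetic, not the RTL expansion in optabs.cc; the helper names and the choice to return 32 from `ctz32(0)` (where a real CTZ is undefined) are assumptions made here to keep the example total.

```c
#include <stdint.h>

#define WORD_BITS 32

/* Count trailing zeros of one word.  Returns WORD_BITS for zero, an
   arbitrary choice made here because CTZ of zero is undefined.  */
static int ctz32(uint32_t w)
{
    int n = 0;
    while (n < WORD_BITS && (w & 1u) == 0) {
        w >>= 1;
        n++;
    }
    return n;
}

/* Find first set: 1-based index of the lowest set bit, 0 for zero.  */
static int ffs32(uint32_t w)
{
    return w ? ctz32(w) + 1 : 0;
}

/* Double-word CTZ: look at the low word first, fall back to the high word.  */
int ctz64(uint32_t high_word, uint32_t low_word)
{
    if (low_word == 0)
        return ctz32(high_word) + WORD_BITS;
    return ctz32(low_word);
}

/* Double-word FFS: same shape, but the all-zero input must yield 0.  */
int ffs64(uint32_t high_word, uint32_t low_word)
{
    if (low_word == 0)
        return high_word ? ffs32(high_word) + WORD_BITS : 0;
    return ffs32(low_word);
}
```

The extra `high_word ?` test in `ffs64` is the case the message calls out as hard to optimize away: FFS has a defined result for zero, so the high word has to be checked before the word size is added.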
2023-06-08 | i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152] | Jakub Jelinek | 1 | -1/+1
I'm getting

    +FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
    +FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
    +FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
    +FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
    +FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
    +FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
    +FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
    +FAIL: gcc.target/i386/mmx-2.c (test for excess errors)

regressions on i686-linux since r14-1166. The problem is when ix86_expand_vector_init_general is called with mmx_ok = true and mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode, but as mmx_ok is false and !TARGET_SSE, we recurse again with the same arguments (ok, fresh new tmp and vals) infinitely.

The following patch fixes that by passing mmx_ok to that recursive call. For n_words == 4 it isn't needed, because we only care about mmx_ok for V2SImode or V2SFmode and no other modes.

2023-06-08 Jakub Jelinek <jakub@redhat.com>

PR target/110152
* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For n_words == 2 recurse with mmx_ok as first argument rather than false.
2023-06-08 | Fortran: Fix some more blockers in associate meta-bug [PR87477] | Paul Thomas | 10 | -17/+113
2023-06-08 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/87477 PR fortran/99350 PR fortran/107821 PR fortran/109451 * decl.cc (char_len_param_value): Simplify a copy of the expr and replace the original if there is no error. * gfortran.h : Remove the redundant field 'rankguessed' from 'gfc_association_list'. * resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'. (resolve_variable): Associate names with constant or structure constructor targets cannot have array refs. * trans-array.cc (gfc_conv_expr_descriptor): Guard expression character length backend decl before using it. Suppress the assignment if lhs equals rhs. * trans-io.cc (gfc_trans_transfer): Scalarize transfer of associate variables pointing to a variable. Add comment. * trans-stmt.cc (trans_associate_var): Remove requirement that the character length be deferred before assigning the value returned by gfc_conv_expr_descriptor. Also, guard the backend decl before testing with VAR_P. gcc/testsuite/ PR fortran/99350 * gfortran.dg/pr99350.f90 : New test. PR fortran/107821 * gfortran.dg/associate_5.f03 : Changed error message. * gfortran.dg/pr107821.f90 : New test. PR fortran/109451 * gfortran.dg/associate_61.f90 : New test
2023-06-08 | [testsuite] bump some tsvc timeouts | Alexandre Oliva | 8 | -1/+9
Several tests are timing out when targeting x86-*-vxworks with qemu. Bump their timeout factor. for gcc/testsuite/ChangeLog * gcc.dg/vect/tsvc/vect-tsvc-s116.c: Bump timeout factor. * gcc.dg/vect/tsvc/vect-tsvc-s241.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s271.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s2711.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s2712.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s276.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-vdotr.c: Likewise.
2023-06-08 | Daily bump. | GCC Administrator | 6 | -1/+426
2023-06-07 | [Committed] Bug fix to new wi::bitreverse_large function. | Roger Sayle | 1 | -1/+1
Richard Sandiford was, of course, right to be wary of new code without much test coverage. Converting the nvptx backend to use the BITREVERSE rtx infrastructure has resulted in far more exhaustive testing and revealed a subtle bug in the new wi::bitreverse implementation. The code needs to use HOST_WIDE_INT_1U (instead of 1) to avoid unintended sign extension. This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu (with a minor tweak to use BITREVERSE), where it fixes regressions of the 32-bit test vectors in gcc.target/nvptx/brev-2.c and the 64-bit test vectors in gcc.target/nvptx/brevll-2.c. Committed as obvious. 2023-06-07 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * wide-int.cc (wi::bitreverse_large): Use HOST_WIDE_INT_1U to avoid sign extension/undefined behaviour when setting each bit.
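The class of bug described here is easy to reproduce outside GCC. The sketch below is a standalone 64-bit bit reversal, not the wi::bitreverse_large code; it assumes `uint64_t` plays the role of an unsigned HOST_WIDE_INT and shows the shifted constant being widened and made unsigned before the shift.

```c
#include <stdint.h>

/* Reverse the bit order of a 64-bit value.  The constant is written as
   (uint64_t)1 so the shift happens in a 64-bit unsigned type.  A plain
   `1 << i` would be an int shift: undefined for i >= 32 and, for i == 31,
   a negative value that sign-extends when widened to 64 bits.  */
uint64_t bitreverse64(uint64_t x)
{
    uint64_t result = 0;

    for (int i = 0; i < 64; i++) {
        if (x & ((uint64_t)1 << i))
            result |= (uint64_t)1 << (63 - i);
    }
    return result;
}
```

Bit 31 is the interesting test vector: it must land on bit 32 alone, whereas a sign-extended mask would set every bit above it as well.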
2023-06-07 | Add support for stc and cmc instructions in i386.md | Roger Sayle | 7 | -4/+175
This patch is the latest revision of my patch to add support for the STC (set carry flag) and CMC (complement carry flag) instructions to the i386 backend, incorporating Uros' previous feedback. The significant changes are (i) the inclusion of CMC, (ii) the use of UNSPEC for pattern, (iii) Use of a new X86_TUNE_SLOW_STC tuning flag to use alternate implementations on pentium4 (which has a notoriously slow STC) when not optimizing for size.

An example of the use of the stc instruction is:

    unsigned int foo (unsigned int a, unsigned int b, unsigned int *c)
    {
      return __builtin_ia32_addcarryx_u32 (1, a, b, c);
    }

which previously generated:

    movl $1, %eax
    addb $-1, %al
    adcl %esi, %edi
    setc %al
    movl %edi, (%rdx)
    movzbl %al, %eax
    ret

with this patch now generates:

    stc
    adcl %esi, %edi
    setc %al
    movl %edi, (%rdx)
    movzbl %al, %eax
    ret

An example of the use of the cmc instruction (where the carry from a first adc is inverted/complemented as input to a second adc) is:

    unsigned int bar (unsigned int a, unsigned int b, unsigned int c, unsigned int d)
    {
      unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
      return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
    }

which previously generated:

    movl $1, %eax
    addb $-1, %al
    adcl %esi, %edi
    setnc %al
    movl %edi, o1(%rip)
    addb $-1, %al
    adcl %ecx, %edx
    setc %al
    movl %edx, o2(%rip)
    movzbl %al, %eax
    ret

and now generates:

    stc
    adcl %esi, %edi
    cmc
    movl %edi, o1(%rip)
    adcl %ecx, %edx
    setc %al
    movl %edx, o2(%rip)
    movzbl %al, %eax
    ret

This version implements Uros' suggestions/refinements. (i) Avoid the UNSPEC_CMC by using the canonical RTL idiom for *x86_cmc, (ii) Use peephole2s to convert x86_stc and *x86_cmc into alternate forms on TARGET_SLOW_STC CPUs (pentium4), when a suitable QImode register is available, (iii) Prefer the addqi_cconly_overflow idiom (addb $-1,%al) over negqi_ccc_1 (neg %al) for setting the carry from a QImode value.

These changes required two minor edits to i386.cc: ix86_cc_mode had to be tweaked to suggest CCCmode for the new *x86_cmc pattern, and *x86_cmc needed to be handled/parameterized in ix86_rtx_costs so that combine would appreciate that this complex RTL expression was actually a fast, single byte instruction [i.e. preferable].

2022-06-07 Roger Sayle <roger@nextmovesoftware.com>
           Uros Bizjak <ubizjak@gmail.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>: Use new x86_stc instruction when the carry flag must be set.
* config/i386/i386.cc (ix86_cc_mode): Use CCCmode for *x86_cmc. (ix86_rtx_costs): Provide accurate rtx_costs for *x86_cmc.
* config/i386/i386.h (TARGET_SLOW_STC): New define.
* config/i386/i386.md (UNSPEC_STC): New UNSPEC for stc. (x86_stc): New define_insn. (define_peephole2): Convert x86_stc into alternate implementation on pentium4 without -Os when a QImode register is available. (*x86_cmc): New define_insn. (define_peephole2): Convert *x86_cmc into alternate implementation on pentium4 without -Os when a QImode register is available. (*setccc): New define_insn_and_split for a no-op CCCmode move. (*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to recognize (and eliminate) the carry flag being copied to itself. (*setcc_qi_negqi_ccc_2_<mode>): Likewise.
* config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.

gcc/testsuite/ChangeLog
* gcc.target/i386/cmc-1.c: New test case.
* gcc.target/i386/stc-1.c: Likewise.
2023-06-07c++: allow NRV and non-NRV returns [PR58487]Jason Merrill8-12/+121
Now that we support NRV from an inner block, we can also support non-NRV returns from other blocks, since once the NRV is out of scope a later return expression can't possibly alias it. This fixes 58487 and half-fixes 53637: now one of the returns is elided, but not the other. Fixing the remaining xfails in these testcases will require a very different approach, probably involving a full tree/block walk from finalize_nrv, and check_return_expr only adding to a list of potential return variables. PR c++/58487 PR c++/53637 gcc/cp/ChangeLog: * cp-tree.h (INIT_EXPR_NRV_P): New. * semantics.cc (finalize_nrv_r): Check it. * name-lookup.h (decl_in_scope_p): Declare. * name-lookup.cc (decl_in_scope_p): New. * typeck.cc (check_return_expr): Allow non-NRV returns if the NRV is no longer in scope. gcc/testsuite/ChangeLog: * g++.dg/opt/nrv26.C: New test. * g++.dg/opt/nrv26a.C: New test. * g++.dg/opt/nrv27.C: New test.
2023-06-07MATCH: Fix comment for `(zero_one ==/!= 0) ? y : z <op> y` patternsAndrew Pinski1-2/+2
The patterns match more than just `a & 1` so change the comment for these two patterns to say that. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. gcc/ChangeLog: * match.pd: Fix comment for the `(zero_one ==/!= 0) ? y : z <op> y` patterns.
2023-06-07RISC-V: Eliminate extension after for *w instructionsJeff Law9-35/+309
This patch tries to avoid generating unnecessary sign extensions after *w instructions such as "addiw" or "divw". The main idea is to set the SUBREG_PROMOTED fields during expansion.

I have tested on SPEC2017 and there is no regression. Only the gcc.dg/pr30957-1.c test failed. To fix that I made some changes in loop-iv.cc, but I am not sure that they are suitable.

gcc/ChangeLog:
	* config/riscv/bitmanip.md (rotrdi3, rotrsi3, rotlsi3): New expanders.
	(rotrsi3_sext): Expose generator.
	(rotlsi3 pattern): Hide generator.
	* config/riscv/riscv-protos.h (riscv_emit_binary): New function declaration.
	* config/riscv/riscv.cc (riscv_emit_binary): Removed static.
	* config/riscv/riscv.md (addsi3, subsi3, negsi2): Hide generator.
	(mulsi3, <optab>si3): Likewise.
	(addsi3, subsi3, negsi2, mulsi3, <optab>si3): New expanders.
	(addv<mode>4, subv<mode>4, mulv<mode>4): Use riscv_emit_binary.
	(<u>mulsidi3): Likewise.
	(addsi3_extended, subsi3_extended, negsi2_extended): Expose generator.
	(mulsi3_extended, <optab>si3_extended): Likewise.
	(splitter for shadd feeding division): Update RTL pattern to account for changes in how 32 bit ops are expanded for TARGET_64BIT.
	* loop-iv.cc (get_biv_step_1): Process src of extension when it is PLUS.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/shift-and-2.c: New tests.
	* gcc.target/riscv/shift-shift-2.c: Adjust expected output.
	* gcc.target/riscv/sign-extend.c: New test.
	* gcc.target/riscv/zbb-rol-ror-03.c: Adjust expected output.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
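Why a sign extension after a *w instruction is redundant can be sketched with a small C model of the architectural behaviour. It does not model SUBREG_PROMOTED or any GCC internals; the names sext32 and model_addw are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of X to 64 bits, as sext.w does.
   Written without signed overflow or implementation-defined casts.  */
int64_t
sext32 (uint64_t x)
{
  uint32_t low = (uint32_t) x;
  return (int64_t) (low ^ UINT32_C (0x80000000)) - INT64_C (0x80000000);
}

/* Hypothetical model of addw: a wrapping 32-bit add whose result is
   already sign-extended in the 64-bit register.  */
int64_t
model_addw (uint64_t a, uint64_t b)
{
  return sext32 ((uint32_t) a + (uint32_t) b);
}
```

Applying sext32 to the result of model_addw returns the same value, which is the property that lets the compiler drop the extra extension.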
2023-06-07riscv: Fix scope for memory model calculationDimitar Dimitrov1-4/+9
During libgcc configure stage for riscv32-none-elf, when "--enable-checking=yes,rtl" has been activated, the following error is observed:

    during RTL pass: final
    conftest.c: In function 'main':
    conftest.c:16:1: internal compiler error: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4462
       16 | }
          | ^
    0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
            /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
    0x8ea823 riscv_print_operand
            /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:4462
    0xde84b5 output_operand(rtx_def*, int)
            /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3632
    0xde8ef8 output_asm_insn(char const*, rtx_def**)
            /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3544
    0xded33b output_asm_insn(char const*, rtx_def**)
            /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3421
    0xded33b final_scan_insn_1
            /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2841
    0xded6cb final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
            /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2887
    0xded8b7 final_1
            /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:1979
    0xdee518 rest_of_handle_final
            /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4240
    0xdee518 execute
            /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4318

Fix by moving the calculation of memmodel to the cases where it is used.

Regression tested for riscv32-none-elf. No changes in gcc.sum and g++.sum.

PR target/109725

gcc/ChangeLog:
	* config/riscv/riscv.cc (riscv_print_operand): Calculate memmodel only when it is valid.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2023-06-07riscv: Fix insn cost calculationDimitar Dimitrov1-1/+1
When building riscv32-none-elf with "--enable-checking=yes,rtl", the following ICE is observed:

    cc1: internal compiler error: RTL check: expected code 'const_int', have 'const_double' in riscv_const_insns, at config/riscv/riscv.cc:1313
    0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
            /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
    0x8eab61 riscv_const_insns(rtx_def*)
            /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:1313
    0x15443bb riscv_legitimate_constant_p
            /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:826
    0xdd3c71 emit_move_insn(rtx_def*, rtx_def*)
            /mnt/nvme/dinux/local-workspace/gcc/gcc/expr.cc:4310
    0x15f28e5 run_const_vector_selftests
            /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:285
    0x15f37bd selftest::riscv_run_selftests()
            /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:364
    0x1f6fba9 selftest::run_tests()
            /mnt/nvme/dinux/local-workspace/gcc/gcc/selftest-run-tests.cc:111
    0x11d1f39 toplev::run_self_tests()
            /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:2185

Fix by following the spirit of the adjacent comment, and using the dedicated riscv_const_insns() function to calculate cost for loading a constant element. Infinite recursion is not possible because the first invocation is on a CONST_VECTOR, whereas the second is on a single element of the vector (e.g. CONST_INT or CONST_DOUBLE).

Regression tested for riscv32-none-elf. No changes in gcc.sum and g++.sum.

gcc/ChangeLog:
	* config/riscv/riscv.cc (riscv_const_insns): Recursively call for constant element of a vector.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2023-06-07match.pd: Improve zero_one_valued_pJakub Jelinek1-5/+2
Recently zero_one_valued_p was changed to handle integer_zerop case specially, because tree_nonzero_bits (@0) == 1 only returns true for non-constant values with range [0, 1] or constant 1, constant 0 has tree_nonzero_bits (integer_zero_node) == 0. The following patch reverts that change and instead checks that tree_nonzero_bits is <= 1U. 2023-06-07 Jakub Jelinek <jakub@redhat.com> * match.pd (zero_one_valued_p): Don't handle integer_zerop specially, instead compare tree_nonzero_bits <= 1U rather than just == 1.
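The difference between the two checks can be illustrated with a tiny C sketch. The parameter nonzero_bits is a stand-in for the mask that tree_nonzero_bits would return; the function names are hypothetical and nothing here is GCC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* NONZERO_BITS is an illustrative stand-in for the mask of bits that
   may be set in a value.  A constant 0 has mask 0; a value with range
   [0, 1] or the constant 1 has mask 1.  */

/* Old check: rejects the constant 0, hence the special case that the
   patch removes.  */
bool
zero_one_valued_eq (uint64_t nonzero_bits)
{
  return nonzero_bits == 1;
}

/* Check after the patch: accepts masks 0 and 1 without a special case.  */
bool
zero_one_valued_le (uint64_t nonzero_bits)
{
  return nonzero_bits <= 1U;
}
```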
2023-06-07aarch64: Allow compiler to define ls64 builtins [PR110132]Alex Coplan8-39/+100
This patch refactors the ls64 builtins to allow the compiler to define them directly instead of having wrapper functions in arm_acle.h. This should be not only easier to maintain, but it makes two important correctness fixes: - It fixes PR110132, where the builtins ended up getting declared with invisible bindings in the C FE, so the FE ended up synthesizing incompatible implicit definitions for these builtins. - It allows the builtins to be used with LTO, which didn't work previously. We also take the opportunity to add test coverage from C++ for these builtins. gcc/ChangeLog: PR target/110132 * config/aarch64/aarch64-builtins.cc (aarch64_general_simulate_builtin): New. Use it ... (aarch64_init_ls64_builtins): ... here. Switch to declaring public ACLE names for builtins. (aarch64_general_init_builtins): Ensure we invoke the arm_acle.h setup if in_lto_p, just like we do for SVE. * config/aarch64/arm_acle.h: (__arm_ld64b): Delete. (__arm_st64b): Delete. (__arm_st64bv): Delete. (__arm_st64bv0): Delete. gcc/testsuite/ChangeLog: PR target/110132 * lib/target-supports.exp (check_effective_target_aarch64_asm_FUNC_ok): Extend to ls64. * g++.target/aarch64/acle/acle.exp: New. * g++.target/aarch64/acle/ls64.C: New test. * g++.target/aarch64/acle/ls64_lto.C: New test. * gcc.target/aarch64/acle/ls64_lto.c: New test. * gcc.target/aarch64/acle/pr110132.c: New test.
2023-06-07aarch64: Fix wrong code with st64b builtin [PR110100]Alex Coplan3-2/+9
The st64b pattern incorrectly had an output constraint on the register operand containing the destination address for the store, leading to wrong code. This patch fixes that. gcc/ChangeLog: PR target/110100 * config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_ls64): Use input operand for the destination address. * config/aarch64/aarch64.md (st64b): Fix constraint on address operand. gcc/testsuite/ChangeLog: PR target/110100 * gcc.target/aarch64/acle/pr110100.c: New test.
2023-06-07aarch64: Fix whitespace in ls64 builtin implementation [PR110100]Alex Coplan2-43/+43
The ls64 builtin code was using incorrect GNU style with eight spaces where there should be a tab. Fixed thusly. gcc/ChangeLog: PR target/110100 * config/aarch64/aarch64-builtins.cc (aarch64_init_ls64_builtins_types): Replace eight consecutive spaces with tabs. (aarch64_init_ls64_builtins): Likewise. (aarch64_expand_builtin_ls64): Likewise. * config/aarch64/aarch64.md (ld64b): Likewise. (st64b): Likewise. (st64bv): Likewise (st64bv0): Likewise.
2023-06-07RA: Constrain class of pic offset table pseudo to general regsVladimir N. Makarov2-0/+33
On some targets an integer pseudo can be assigned to an FP reg. For the pic offset table pseudo this means we will reload the pseudo and, as a consequence, memory containing the pseudo might be recognized as the wrong one. The patch fixes this problem by constraining the class of the pic offset table pseudo to general regs.

PR target/109541

gcc/ChangeLog:
	* ira-costs.cc (find_costs_and_classes): Constrain classes of pic offset table pseudo to a general reg subset.

gcc/testsuite/ChangeLog:
	* gcc.target/sparc/pr109541.c: New.
2023-06-07aarch64: Represent SQXTUN with RTL operationsKyrylo Tkachov3-14/+56
This patch removes UNSPEC_SQXTUN and uses organic RTL codes to represent the operation. SQXTUN is an odd one. It's described in the architecture as "Signed saturating extract Unsigned Narrow". It's not a straightforward ss_truncate nor a us_truncate. It is a sort of truncating signed clamp operation with limits derived from the unsigned extrema of the narrow mode: (truncate:N (smin:M (smax:M (reg:M) (const_int 0)) (const_int <unsigned-max-for-mode-N>))) This patch implements these semantics. I've checked that the vqmovun tests in advsimd-intrinsics.exp now get constant-folded and still pass validation, so I'm pretty confident in the semantics. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode><vczle><vczbe>): Rename to... (*aarch64_sqmovun<mode>_insn<vczle><vczbe>): ... This. Reimplement with RTL codes. (aarch64_sqmovun<mode> [SD_HSDI]): Reimplement with RTL codes. (aarch64_sqxtun2<mode>_le): Likewise. (aarch64_sqxtun2<mode>_be): Likewise. (aarch64_sqxtun2<mode>): Adjust for the above. (aarch64_sqmovun<mode>): New define_expand. * config/aarch64/iterators.md (UNSPEC_SQXTUN): Delete. (half_mask): New mode attribute. * config/aarch64/predicates.md (aarch64_simd_umax_half_mode): New predicate.
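The clamp-then-truncate semantics described above can be sketched for one scalar lane in C. This assumes M is a 16-bit signed mode and N an 8-bit mode; the function name model_sqxtun_s16 is hypothetical.

```c
#include <stdint.h>

/* Scalar model of SQXTUN from signed 16 bits to unsigned 8 bits:
   (truncate:N (smin:M (smax:M x 0) umax_N)) with M = 16, N = 8.  */
uint8_t
model_sqxtun_s16 (int16_t x)
{
  /* smax with 0: negative inputs saturate to 0.  */
  int16_t low_clamped = x > 0 ? x : 0;
  /* smin with the unsigned maximum of the narrow mode.  */
  int16_t clamped = low_clamped < UINT8_MAX ? low_clamped : UINT8_MAX;
  /* The truncate is now lossless because 0 <= clamped <= 255.  */
  return (uint8_t) clamped;
}
```

A plain ss_truncate would clamp to [-128, 127] and a us_truncate would treat the input as unsigned, which is why neither matches this operation.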
2023-06-07aarch64: Improve RTL representation of ADDP instructionsKyrylo Tkachov1-7/+63
Similar to the ADDLP instructions, the non-widening ADDP ones can be represented by adding the odd lanes with the even lanes of a vector. These instructions take two vector inputs and the architecture spec describes the operation as concatenating them together before going through it with pairwise additions. This patch chooses to represent ADDP on 64-bit and 128-bit input vectors slightly differently, for reasons explained in the comments in aarch64-simd.md.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:
	* config/aarch64/aarch64-simd.md (aarch64_addp<mode><vczle><vczbe>): Reimplement as...
	(aarch64_addp<mode>_insn): ... This...
	(aarch64_addp<mode><vczle><vczbe>_insn): ... And this.
	(aarch64_addp<mode>): New define_expand.
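The architectural description (concatenate the two inputs, then add adjacent pairs) can be sketched in scalar C for vectors of four 32-bit lanes. This models the instruction's result only, not the RTL the patch uses; the name model_addp_4s is hypothetical.

```c
#include <stdint.h>

/* Scalar model of ADDP on two vectors of four 32-bit lanes: the inputs
   are treated as one concatenated vector {a, b} and each even lane is
   added to the odd lane that follows it.  */
void
model_addp_4s (const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
  uint32_t concat[8];
  for (int i = 0; i < 4; i++)
    {
      concat[i] = a[i];
      concat[i + 4] = b[i];
    }
  for (int i = 0; i < 4; i++)
    out[i] = concat[2 * i] + concat[2 * i + 1];
}
```

The low half of the result depends only on the first input and the high half only on the second.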
2023-06-07Fix expected test output on hppaJeff Law1-1/+1
Recent changes in the hoisting code change the optimized gimple for the shadd-3 testcase on the PA. That in turn changes the number of expected shadd instructions. I'm not entirely sure the test is actually testing what we want anymore since I don't see a CSE for postreload to discover. But I did verify that the number of shadd instructions is sane, so I just changed the count in the obvious way. gcc/testsuite * gcc.target/hppa/shadd-3.c: Update expected output.
2023-06-07modula2: Fix bootstrapJakub Jelinek1-0/+2
internal-fn.h since yesterday includes insn-opinit.h, which is a generated header. One of my bootstraps today failed because some m2 sources started compiling before insn-opinit.h had been generated.

Normally, gcc/Makefile.in has

    # In order for parallel make to really start compiling the expensive
    # objects from $(OBJS) as early as possible, build all their
    # prerequisites strictly before all objects.
    $(ALL_HOST_OBJS) : | $(generated_files)

rule which ensures that all the generated files are generated before any $(ALL_HOST_OBJS) objects start, but uses an order-only dependency for this because we don't want to rebuild most of the objects whenever one generated header is regenerated. After the initial build in an empty directory we'll have .deps/ files containing the detailed dependencies.

$(ALL_HOST_OBJS) includes even some FE files, which in the m2 case I think would be m2_OBJS, but m2/Make-lang.in doesn't define those.

The following patch just adds a similar rule to m2/Make-lang.in. Another option would be to set the m2_OBJS variable in m2/Make-lang.in to something, but I am not really sure to what exactly and why it isn't done.

2023-06-07 Jakub Jelinek <jakub@redhat.com>

	* Make-lang.in: Build $(generated_files) before building all $(GM2_C_OBJS).
2023-06-07RISC-V: Support RVV VLA SLP auto-vectorizationJuzhe-Zhong26-37/+1010
This patch enables basic VLA SLP auto-vectorization.

Consider the following case:

    void
    f (uint8_t *restrict a, uint8_t *restrict b)
    {
      for (int i = 0; i < 100; ++i)
        {
          a[i * 8 + 0] = b[i * 8 + 7] + 1;
          a[i * 8 + 1] = b[i * 8 + 7] + 2;
          a[i * 8 + 2] = b[i * 8 + 7] + 8;
          a[i * 8 + 3] = b[i * 8 + 7] + 4;
          a[i * 8 + 4] = b[i * 8 + 7] + 5;
          a[i * 8 + 5] = b[i * 8 + 7] + 6;
          a[i * 8 + 6] = b[i * 8 + 7] + 7;
          a[i * 8 + 7] = b[i * 8 + 7] + 3;
        }
    }

To enable VLA SLP auto-vectorization, we should be able to handle the following const vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
   { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
   { 1, 2, 8, 4, 5, 6, 7, 3, ... }

These vectors can be generated in the prologue.

After this patch, we end up with the following codegen:

Prologue:
    ...
    vsetvli a7,zero,e16,m2,ta,ma
    vid.v v4
    vsrl.vi v4,v4,3
    li a3,8
    vmul.vx v4,v4,a3 ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
    ...
    li t1,67633152
    addi t1,t1,513
    li a3,50790400
    addi a3,a3,1541
    slli a3,a3,32
    add a3,a3,t1
    vsetvli t1,zero,e64,m1,ta,ma
    vmv.v.x v3,a3 ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
    ...

LoopBody:
    ...
    min a3,...
    vsetvli zero,a3,e8,m1,ta,ma
    vle8.v v2,0(a6)
    vsetvli a7,zero,e8,m1,ta,ma
    vrgatherei16.vv v1,v2,v4
    vadd.vv v1,v1,v3
    vsetvli zero,a3,e8,m1,ta,ma
    vse8.v v1,0(a2)
    add a6,a6,a4
    add a2,a2,a4
    mv a3,a5
    add a5,a5,t1
    bgtu a3,a4,.L3
    ...

Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8 since "vrgatherei16.vv" can cover a larger index range than "vrgather.vv" (whose maximum element index is only 255).

Epilogue:
    lbu a5,799(a1)
    addiw a4,a5,1
    sb a4,792(a0)
    addiw a4,a5,2
    sb a4,793(a0)
    addiw a4,a5,8
    sb a4,794(a0)
    addiw a4,a5,4
    sb a4,795(a0)
    addiw a4,a5,5
    sb a4,796(a0)
    addiw a4,a5,6
    sb a4,797(a0)
    addiw a4,a5,7
    sb a4,798(a0)
    addiw a5,a5,3
    sb a5,799(a0)
    ret

One last thing we still need to do is the "Epilogue auto-vectorization", which needs VLS modes support. I will support VLS modes for "Epilogue auto-vectorization" in the future.

gcc/ChangeLog:
	* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
	* config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
	(rvv_builder::single_step_npatterns_p): New function.
	(rvv_builder::npatterns_all_equal_p): Ditto.
	(const_vec_all_in_range_p): Support POLY handling.
	(gen_const_vector_dup): Ditto.
	(emit_vlmax_gather_insn): Add vrgatherei16.
	(emit_vlmax_masked_gather_mu_insn): Ditto.
	(expand_const_vector): Add VLA SLP const vector support.
	(expand_vec_perm): Support POLY.
	(struct expand_vec_perm_d): New struct.
	(shuffle_generic_patterns): New function.
	(expand_vec_perm_const_1): Ditto.
	(expand_vec_perm_const): Ditto.
	* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
	(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
	* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
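How the two prologue constants in the RISC-V VLA SLP commit above come about can be checked with a scalar C sketch. It only reproduces the arithmetic of the quoted instruction sequences; the function names are hypothetical, and little-endian byte order within the 64-bit element is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Index vector: vid.v yields i, vsrl.vi by 3 yields i / 8, and vmul.vx
   by 8 yields { 0 x8, 8 x8, 16 x8, ... }.  */
void
model_gather_indices (uint16_t *idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    idx[i] = (uint16_t) ((i >> 3) * 8);
}

/* 64-bit scalar built by the li/addi/slli/add sequence and then
   broadcast with vmv.v.x at e64.  */
uint64_t
model_addend_scalar (void)
{
  uint64_t lo = UINT64_C (67633152) + 513;   /* li t1 ; addi t1,t1,513 */
  uint64_t hi = UINT64_C (50790400) + 1541;  /* li a3 ; addi a3,a3,1541 */
  return (hi << 32) + lo;                    /* slli a3,a3,32 ; add a3,a3,t1 */
}

/* Byte k % 8 of the broadcast scalar, i.e. the addend seen by 8-bit
   lane k when the register is reinterpreted at e8.  */
uint8_t
model_addend (size_t k)
{
  return (uint8_t) (model_addend_scalar () >> (8 * (k % 8)));
}
```

Packing the eight byte addends into one 64-bit scalar is what lets a single vmv.v.x materialize the repeating pattern for any vector length.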