path: root/gcc
Age | Commit message | Author | Files | Lines
2023-06-04 | Convert H8 port to LRA | Jeff Law | 3 | -34/+1
With Vlad's recent LRA fix to the elimination code, the H8 can be converted to LRA. This patch has two changes of note. First, this turns Zz into a standard constraint. This helps reloading for the H8/SX movqi pattern. Second, this drops the whole pattern for the SX bit memory operations. I can't see why those exist to begin with. They should be handled by the standard bit manipulation patterns. If someone wants to try and improve SX bit support, that'd be great and they can do so within the LRA framework :-) Pushed to the trunk... gcc/ * config/h8300/constraints.md (Zz): Make this a normal constraint. * config/h8300/h8300.cc (TARGET_LRA_P): Remove. * config/h8300/logical.md (H8/SX bit patterns): Remove.
2023-06-04 | xtensa: Optimize boolean evaluation or branching when EQ/NE to INT_MIN | Takayuki 'January June' Suwa | 1 | -0/+65
This patch optimizes both the boolean evaluation of and the branching on EQ/NE against INT_MIN (-2147483648), by taking advantage of the specification that the ABS machine instruction on Xtensa returns INT_MIN iff its input is INT_MIN, and a non-negative value otherwise.

    /* example */
    int test0(int x) {
      return (x == -2147483648);
    }
    int test1(int x) {
      return (x != -2147483648);
    }
    extern void foo(void);
    void test2(int x) {
      if(x == -2147483648)
        foo();
    }
    void test3(int x) {
      if(x != -2147483648)
        foo();
    }

    ;; before
    test0:
        movi.n  a9, -1
        slli    a9, a9, 31
        add.n   a2, a2, a9
        nsau    a2, a2
        srli    a2, a2, 5
        ret.n
    test1:
        movi.n  a9, -1
        slli    a9, a9, 31
        add.n   a9, a2, a9
        movi.n  a2, 1
        moveqz  a2, a9, a9
        ret.n
    test2:
        movi.n  a9, -1
        slli    a9, a9, 31
        bne     a2, a9, .L3
        j.l     foo, a9
    .L3:
        ret.n
    test3:
        movi.n  a9, -1
        slli    a9, a9, 31
        beq     a2, a9, .L5
        j.l     foo, a9
    .L5:
        ret.n

    ;; after
    test0:
        abs     a2, a2
        extui   a2, a2, 31, 1
        ret.n
    test1:
        abs     a2, a2
        srai    a2, a2, 31
        addi.n  a2, a2, 1
        ret.n
    test2:
        abs     a2, a2
        bbci    a2, 31, .L3
        j.l     foo, a9
    .L3:
        ret.n
    test3:
        abs     a2, a2
        bbsi    a2, 31, .L5
        j.l     foo, a9
    .L5:
        ret.n

gcc/ChangeLog:

    * config/xtensa/xtensa.md (*btrue_INT_MIN, *eqne_INT_MIN): New insn_and_split patterns.
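The ABS property that this patch relies on can be checked in portable C. The sketch below is only an illustration: `wrapping_abs` and `is_int_min` are hypothetical helpers that model a wrapping two's-complement ABS with unsigned arithmetic (calling the C library `abs` on INT_MIN would be undefined behavior). It is not the Xtensa instruction itself, nor the code GCC emits.

```c
#include <stdint.h>

/* Hypothetical model of a wrapping two's-complement ABS: the negation is
   done in unsigned arithmetic, so the INT_MIN bit pattern maps to itself. */
static uint32_t wrapping_abs(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return (x < 0) ? (0u - u) : u;
}

/* x == INT_MIN exactly when bit 31 of the wrapping ABS result is set;
   every other input yields a non-negative result with bit 31 clear. */
static int is_int_min(int32_t x)
{
    return (int)(wrapping_abs(x) >> 31);
}
```

Extracting bit 31 after the ABS corresponds to the `extui a2, a2, 31, 1` step in the "after" listing for test0.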
2023-06-04 | RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109 | Juzhe-Zhong | 4 | -144/+529
This patch fixes PR110109. The issue is caused by the following pattern:

    (define_insn_and_split "*vlmul_extx2<mode>"
      [(set (match_operand:<VLMULX2> 0 "register_operand" "=vr, ?&vr")
            (subreg:<VLMULX2>
              (match_operand:VLMULEXT2 1 "register_operand" " 0, vr") 0))]
      "TARGET_VECTOR"
      "#"
      "&& reload_completed"
      [(const_int 0)]
    {
      emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
      DONE;
    })

Such a pattern generates the following code in insn-recog.cc:

    static int
    pattern57 (rtx x1)
    {
      rtx * const operands ATTRIBUTE_UNUSED = &recog_data.operand[0];
      rtx x2;
      int res ATTRIBUTE_UNUSED;
      if (maybe_ne (SUBREG_BYTE (x1).to_constant (), 0))
        return -1;
      ...

PR110109 is an ICE at maybe_ne (SUBREG_BYTE (x1).to_constant (), 0), since the SUBREG_BYTE of a scalable RVV mode cannot be accessed with to_constant ().

I created these patterns to optimize the following test:

    vfloat32m2_t test_vlmul_ext_v_f32mf2_f32m2(vfloat32mf2_t op1) {
      return __riscv_vlmul_ext_v_f32mf2_f32m2(op1);
    }

codegen:

    test_vlmul_ext_v_f32mf2_f32m2:
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.i v2,0
        vsetvli a5,zero,e32,mf2,ta,ma
        vle32.v v2,0(a1)
        vs2r.v v2,0(a0)
        ret

There is a redundant 'vmv.v.i' here, since GCC has no undefined value in its IR (unlike LLVM, which has undef/poison). For the vlmul_ext_* RVV intrinsics, GCC will initialize the register to all zeros. However, I think this will not be a big issue once we support subreg liveness tracking.

    PR target/110109

gcc/ChangeLog:

    * config/riscv/riscv-vector-builtins-bases.cc: Change expand approach.
    * config/riscv/vector.md (@vlmul_extx2<mode>): Remove it.
    (@vlmul_extx4<mode>): Ditto.
    (@vlmul_extx8<mode>): Ditto.
    (@vlmul_extx16<mode>): Ditto.
    (@vlmul_extx32<mode>): Ditto.
    (@vlmul_extx64<mode>): Ditto.
    (*vlmul_extx2<mode>): Ditto.
    (*vlmul_extx4<mode>): Ditto.
    (*vlmul_extx8<mode>): Ditto.
    (*vlmul_extx16<mode>): Ditto.
    (*vlmul_extx32<mode>): Ditto.
    (*vlmul_extx64<mode>): Ditto.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/pr110109-1.c: New test.
    * gcc.target/riscv/rvv/base/pr110109-2.c: New test.
2023-06-04 | RISC-V: Support RVV FP16 ZVFHMIN intrinsic API | Pan Li | 4 | -1/+70
This patch supports the 2 intrinsic APIs of the FP16 ZVFHMIN extension, i.e. SEW=16 for the instructions below: vfwcvt.f.f.v and vfncvt.f.f.w. Users can then leverage the intrinsic APIs to convert between RVV vector single-precision and half-precision floating point. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-types.def (vfloat32mf2_t): Add vfloat32mf2_t type to vfncvt.f.f.w operations. (vfloat32m1_t): Likewise. (vfloat32m2_t): Likewise. (vfloat32m4_t): Likewise. (vfloat32m8_t): Likewise. * config/riscv/riscv-vector-builtins.def: Fix typo in comments. * config/riscv/vector-iterators.md: Add single to half machine mode conversion. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: New test.
2023-06-04 | RISC-V: Move optimization patterns into autovec-opt.md | Juzhe-Zhong | 3 | -91/+92
Move all optimization patterns into autovec-opt.md to make the organization easier to maintain. gcc/ChangeLog: * config/riscv/autovec-opt.md (*<optab>not<mode>): Move to autovec-opt.md. (*n<optab><mode>): Ditto. * config/riscv/autovec.md (*<optab>not<mode>): Ditto. (*n<optab><mode>): Ditto. * config/riscv/vector.md: Ditto.
2023-06-04 | PR target/110083: Fix-up REG_EQUAL notes on COMPARE in STV. | Roger Sayle | 2 | -0/+59
This patch fixes PR target/110083, an ICE-on-valid regression exposed by my recent PTEST improvements (to address PR target/109973). The latent bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update or delete REG_EQUAL notes attached to COMPARE instructions. As a result the operands of COMPARE would be mismatched, with the register transformed to V1TImode, but the immediate operand left as const_wide_int, which is valid for TImode but not V1TImode. This remained latent when the STV conversion converted the mode of the COMPARE to CCmode, with later passes recognizing the REG_EQUAL note is obviously invalid as the modes didn't match, but now that we (correctly) preserve the CCZmode on COMPARE, the mismatched operand modes trigger a sanity checking ICE downstream. Fixed by updating (or deleting) any REG_EQUAL notes in convert_compare. Before: (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ]) (const_wide_int 0x80000000000000000000000000000000)) After: (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ]) (const_vector:V1TI [ (const_wide_int 0x80000000000000000000000000000000) ])) 2023-06-04 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110083 * config/i386/i386-features.cc (scalar_chain::convert_compare): Update or delete REG_EQUAL notes, converting CONST_INT and CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR. gcc/testsuite/ChangeLog PR target/110083 * gcc.target/i386/pr110083.c: New test case.
2023-06-03 | c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720] | Jason Merrill | 6 | -4/+53
[except.handle]/7 says that when we enter std::terminate due to a throw, that is considered an active handler. We already implemented that properly for the case of not finding a handler (__cxa_throw calls __cxa_begin_catch before std::terminate) and the case of finding a callsite with no landing pad (the personality function calls __cxa_call_terminate which calls __cxa_begin_catch), but for the case of a throw in a try/catch in a noexcept function, we were emitting a cleanup that calls std::terminate directly without ever calling __cxa_begin_catch to handle the exception. A straightforward way to fix this seems to be calling __cxa_call_terminate instead. However, that requires exporting it from libstdc++, which we have not previously done. Despite the name, it isn't actually part of the ABI standard. Nor is __cxa_call_unexpected, as far as I can tell, but that one is also used by clang. For this case they use __clang_call_terminate; it seems reasonable to me for us to stick with __cxa_call_terminate. I also change __cxa_call_terminate to take void* for simplicity in the front end (and consistency with __cxa_call_unexpected) but that isn't necessary if it's undesirable for some reason. This patch does not fix the issue that representing the noexcept as a cleanup is wrong, and confuses the handler search; since it looks like a cleanup in the EH tables, the unwinder keeps looking until it finds the catch in main(), which it should never have gotten to. Without the try/catch in main, the unwinder would reach the end of the stack and say no handler was found. The noexcept is a handler, and should be treated as one, as it is when the landing pad is omitted. The best fix for that issue seems to me to be to represent an ERT_MUST_NOT_THROW after an ERT_TRY in an action list as though it were an ERT_ALLOWED_EXCEPTIONS (since indeed it is an exception-specification). 
The actual code generation shouldn't need to change (apart from the change made by this patch), only the action table entry. PR c++/97720 gcc/cp/ChangeLog: * cp-tree.h (enum cp_tree_index): Add CPTI_CALL_TERMINATE_FN. (call_terminate_fn): New macro. * cp-gimplify.cc (gimplify_must_not_throw_expr): Use it. * except.cc (init_exception_processing): Set it. (cp_protect_cleanup_actions): Return it. gcc/ChangeLog: * tree-eh.cc (lower_resx): Pass the exception pointer to the failure_decl. * except.h: Tweak comment. libstdc++-v3/ChangeLog: * libsupc++/eh_call.cc (__cxa_call_terminate): Take void*. * config/abi/pre/gnu.ver: Add it. gcc/testsuite/ChangeLog: * g++.dg/eh/terminate2.C: New test.
2023-06-04 | reload_cse_move2add: Handle trivial single_set:s | Hans-Peter Nilsson | 1 | -29/+36
The reload_cse_move2add part of "postreload" handled only insns whose PATTERN was a SET. That excludes insns that e.g. clobber a flags register, which it does only for "simplicity". The patch extends the "simplicity" to most single_set insns. For a subset of those insns there's still an assumption; that the single_set of a PARALLEL insn is the first element in the PARALLEL. If the assumption fails, it's no biggie; the optimization just isn't performed. Don't let the name deceive you, this optimization doesn't hit often, but as often (or as rarely) for LRA as for reload at least on e.g. cris-elf where the biggest effect was seen in reducing repeated addresses in copies from fixed-address arrays, like in gcc.c-torture/compile/pr78694.c. * postreload.cc (move2add_use_add2_insn): Handle trivial single_sets. Rename variable PAT to SET. (move2add_use_add3_insn, reload_cse_move2add): Similar.
2023-06-04 | RISC-V: Support RVV zvfh{min} vfloat16*_t mov and spill | Pan Li | 6 | -0/+257
This patch would like to allow the mov and spill operation for the RVV vfloat16*_t types. The involved machine mode includes VNx1HF, VNx2HF, VNx4HF, VNx8HF, VNx16HF, VNx32HF and VNx64HF. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-types.def (vfloat16mf4_t): Add the float16 type to DEF_RVV_F_OPS. (vfloat16mf2_t): Likewise. (vfloat16m1_t): Likewise. (vfloat16m2_t): Likewise. (vfloat16m4_t): Likewise. (vfloat16m8_t): Likewise. * config/riscv/riscv.md: Add vfloat16*_t to attr mode. * config/riscv/vector-iterators.md: Add vfloat16*_t machine mode to V, V_WHOLE, V_FRACT, VINDEX, VM, VEL and sew. * config/riscv/vector.md: Add vfloat16*_t machine mode to sew, vlmul and ratio. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/mov-14.c: New test. * gcc.target/riscv/rvv/base/spill-13.c: New test.
2023-06-04 | Daily bump. | GCC Administrator | 5 | -1/+79
2023-06-03 | [RISC-V] fix cfi issue in save-restore. | Fei Gao | 1 | -2/+2
This patch fixes a cfi issue introduced by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20

Test code:

    char my_getchar();
    float getf();
    int test_f0()
    {
      int s0 = my_getchar();
      float f0 = getf();
      int b = my_getchar();
      return f0+s0+b;
    }

cflags: -g -Os -march=rv32imafc -mabi=ilp32f -msave-restore -mcmodel=medlow

before patch:

    test_f0:
    ...
      .cfi_startproc
      call t0,__riscv_save_1
      .cfi_offset 8, -8
      .cfi_offset 1, -4
      .cfi_def_cfa_offset 16
    ...
      addi sp,sp,-16
      .cfi_def_cfa_offset 32
    ...
      addi sp,sp,16
      .cfi_def_cfa_offset 0 // issue here
    ...
      tail __riscv_restore_1
      .cfi_restore 8
      .cfi_restore 1
      .cfi_def_cfa_offset -16 // issue here
      .cfi_endproc

after patch:

    test_f0:
    ...
      .cfi_startproc
      call t0,__riscv_save_1
      .cfi_offset 8, -8
      .cfi_offset 1, -4
      .cfi_def_cfa_offset 16
    ...
      addi sp,sp,-16
      .cfi_def_cfa_offset 32
    ...
      addi sp,sp,16
      .cfi_def_cfa_offset 16 // corrected here
    ...
      tail __riscv_restore_1
      .cfi_restore 8
      .cfi_restore 1
      .cfi_def_cfa_offset 0 // corrected here
      .cfi_endproc

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_expand_epilogue): fix cfi issue with correct offset.
2023-06-03 | Remove unnecessary md pattern for TARGET_XTHEADCONDMOV | Die Li | 1 | -14/+1
This patch makes 2 small changes, neither of which affects the result. 1. Remove an unnecessary md pattern for TARGET_XTHEADCONDMOV in thead.md. The operands[4] in "if_then_else" is always a comparison operation, so the generated rtl never matches the pattern being deleted. 2. Change operands[4] from const0_rtx to operands[1] to maintain rtl consistency, although only the CODE of operands[4] affects the result when the assembly is output. Signed-off-by: Die Li <lidie@eswincomputing.com> gcc/ChangeLog: * config/riscv/thead.md (*th_cond_gpr_mov<GPR:mode><GPR2:mode>): Delete.
2023-06-03 | PR modula2/110003 Wrong source line listed for unused parameters | Gaius Mulley | 1 | -4/+5
Ensure that the parameter token position is recorded for both definition and implementation modules. The shadow variable is created inside BuildFormalParameterSection. The shadow variable needs to have the other definition or implementation module token position set when CheckFormalParameterSection is called. This allows the MetaError family of procedures to request the implementation module token position when reporting unused parameters. gcc/m2/ChangeLog: PR modula2/110003 * gm2-compiler/P2SymBuild.mod (GetParameterShadowVar): Import. (CheckFormalParameterSection): Call PutDeclared for the shadow variable associated with the parameter. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-06-03 | c++: is_specialization_of_friend confusion [PR109923] | Patrick Palka | 2 | -0/+21
The check for a non-template member function of a class template in is_specialization_of_friend is overbroad, and accidentally holds for a non-template hidden friend too, which for the testcase below causes the predicate to bogusly return true for decl = void non_templ_friend(A<int>, A<void>) friend_decl = void non_templ_friend(A<void>, A<void>) This patch refines the check appropriately. PR c++/109923 gcc/cp/ChangeLog: * pt.cc (is_specialization_of_friend): Fix overbroad check for a non-template member function of a class template. gcc/testsuite/ChangeLog: * g++.dg/template/friend79.C: New test.
2023-06-03 | c++: simplify TEMPLATE_TEMPLATE_PARM hashing | Patrick Palka | 1 | -12/+1
r10-7815-gaa576f2a860c82 added special hashing for TEMPLATE_TEMPLATE_PARM to work around non-lowered ttps having TYPE_CANONICAL set but lowered ttps did not. But ever since r13-737-gd0ef9e06197d14 this is no longer the case, and all ttps should now have TYPE_CANONICAL set. So this special hashing is now unnecessary and we can fall back to always using TYPE_CANONICAL. gcc/cp/ChangeLog: * pt.cc (iterative_hash_template_arg): Don't hash TEMPLATE_TEMPLATE_PARM specially.
2023-06-03 | c++: replace in_template_function | Patrick Palka | 7 | -26/+7
All uses of in_template_function except for the one in cp_make_fname_decl seem like they could be generalized to consider any template context. To that end this patch replaces the predicate with a generalized in_template_context predicate that returns true if we're inside any template context. If we legitimately need to consider only function contexts, as in cp_make_fname_decl, we can just additionally check e.g. current_function_decl. One concrete benefit of this, which the adjusted testcase below demonstrates, is that we no longer instantiate/odr-use entities based on uses within a non-function template. gcc/cp/ChangeLog: * class.cc (build_base_path): Check in_template_context instead of in_template_function. (resolves_to_fixed_type_p): Likewise. * cp-tree.h (in_template_context): Define. (in_template_function): Remove. * decl.cc (cp_make_fname_decl): Check current_function_decl and in_template_context instead of in_template_function. * decl2.cc (mark_used): Check in_template_context instead of in_template_function. * pt.cc (in_template_function): Remove. * semantics.cc (enforce_access): Check in_template_context instead of current_template_parms directly. gcc/testsuite/ChangeLog: * g++.dg/warn/Waddress-of-packed-member2.C: No longer expect a() to be marked as odr-used.
2023-06-03 | c++: mangle noexcept-expr [PR70790] | Patrick Palka | 2 | -0/+19
This implements noexcept(expr) mangling and demangling as per the Itanium ABI. PR c++/70790 gcc/cp/ChangeLog: * mangle.cc (write_expression): Handle NOEXCEPT_EXPR. libiberty/ChangeLog: * cp-demangle.c (cplus_demangle_operators): Add the noexcept operator. (d_print_comp_inner) <case DEMANGLE_COMPONENT_UNARY>: Always print parens around the operand of noexcept too. * testsuite/demangle-expected: Test noexcept operator demangling. gcc/testsuite/ChangeLog: * g++.dg/abi/mangle78.C: New test.
2023-06-03 | RISC-V: Fix warning in predicated.md | Juzhe-Zhong | 1 | -1/+1
There is a warning in predicates.md:

    ../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
    ../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       (match_test "INTVAL (op) == GET_MODE_MASK (HImode)
    ../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          || INTVAL (op) == GET_MODE_MASK (SImode)"))))

gcc/ChangeLog:

    * config/riscv/predicates.md: Change INTVAL into UINTVAL.
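The warning arises because INTVAL yields a signed HOST_WIDE_INT while GET_MODE_MASK yields an unsigned one. The sketch below illustrates the same kind of fix outside GCC. The typedefs `hwi` and `uhwi` and the helper `equals_mask` are hypothetical stand-ins, and the 64-bit width is an assumption, not something stated in the commit.

```c
#include <stdint.h>

/* Hypothetical stand-ins for GCC's signed and unsigned HOST_WIDE_INT,
   assumed here to be 64 bits wide. */
typedef int64_t hwi;
typedef uint64_t uhwi;

/* Compare a signed value against an unsigned mask. The explicit cast makes
   the conversion visible, so both operands have the same signedness and
   -Wsign-compare has nothing to report. */
static int equals_mask(hwi value, uhwi mask)
{
    return (uhwi)value == mask;
}
```

Reading the constant as unsigned in the first place, as the switch to UINTVAL does, has the same effect as the cast shown here.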
2023-06-03 | RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations | Juzhe-Zhong | 6 | -1/+206
This patch enhances the vwmul.vv combine optimizations. Consider the following code:

    void
    vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
                          int16_t *__restrict dst3, int16_t *__restrict dst4,
                          int8_t *__restrict a, int8_t *__restrict b,
                          int8_t *__restrict a2, int8_t *__restrict b2, int n)
    {
      for (int i = 0; i < n; i++)
        {
          dst[i] = (int16_t) a[i] * (int16_t) b[i];
          dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
          dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
          dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
        }
    }

In such a complicated case, an operand is not single-use; it is used by multiple statements. The GCC combine pass will iterate over the combinations of the operands.

We also add another pattern for vwmulsu.vv to enhance the vwmulsu.vv optimization. Currently, we have the format

    (mult: (sign_extend) (zero_extend))

in vector.md for the intrinsic calls. Now we add a new vwmulsu.ww with the format

    (mult: (zero_extend) (sign_extend))

to handle the following case (code mixing signed and unsigned widening multiplication):

    void
    vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
                          int16_t *__restrict dst3, int16_t *__restrict dst4,
                          int8_t *__restrict a, uint8_t *__restrict b,
                          uint8_t *__restrict a2, int8_t *__restrict b2, int n)
    {
      for (int i = 0; i < n; i++)
        {
          dst[i] = (int16_t) a[i] * (int16_t) b[i];
          dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
          dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
          dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
        }
    }

Before this patch:

    ...
    vsext.vf2 v6,v1
    add t0,a0,t4
    vzext.vf2 v4,v1
    vmul.vv v2,v4,v6
    add t0,a1,t4
    vzext.vf2 v2,v1
    vmul.vv v4,v2,v4
    add t0,a2,t4
    vmul.vv v2,v2,v6
    add t0,a3,t4
    sub t6,t6,t1
    vsext.vf2 v2,v1
    vmul.vv v2,v2,v6
    ...

After this patch:

    ...
    add t0,a0,t3
    vwmulsu.vv v2,v1,v3
    add t0,a1,t3
    vwmulu.vv v4,v3,v2
    add t0,a2,t3
    vwmulsu.vv v3,v1,v2
    add t0,a3,t3
    sub t4,t4,t1
    vwmul.vv v2,v1,v3
    ...

gcc/ChangeLog:

    * config/riscv/vector.md: Add vector-opt.md.
    * config/riscv/autovec-opt.md: New file.
gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.
2023-06-03 | Daily bump. | GCC Administrator | 6 | -1/+343
2023-06-03 | Don't try bswap + rotate when TYPE_PRECISION(n->type) > n->range. | liuhongt | 2 | -0/+80
For the testcase in the PR, we have

    br64 = br;
    br64 = ((br64 << 16) & 0x000000ff00000000ull) | (br64 & 0x0000ff00ull);

    n->n: 0x300000200.
    n->range: 32.
    n->type: uint64.

The original code assumes n->range is the same as TYPE_PRECISION (n->type), and tries to rotate the mask from 0x300000200 -> 0x20300, which is incorrect. The patch fixes this bug by not trying bswap + rotate when TYPE_PRECISION (n->type) is not equal to n->range.

gcc/ChangeLog:

    PR tree-optimization/110067
    * gimple-ssa-store-merging.cc (find_bswap_or_nop): Don't try bswap + rotate when TYPE_PRECISION(n->type) > n->range.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr110067.c: New test.
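The value 0x300000200 used as the rotate input in this message can be reproduced by hand. The sketch below assumes the byte-marker convention of the bswap pass, in which byte i of the source (counting from the least significant byte) is tagged with the value i + 1. `SRC_MARKERS` and `symbolic_number` are hypothetical names for this illustration, not identifiers from gimple-ssa-store-merging.cc.

```c
#include <stdint.h>

/* Markers for a 32-bit source widened to 64 bits: byte i carries i + 1
   (an assumed model of the pass's symbolic number). */
#define SRC_MARKERS UINT64_C(0x04030201)

/* Apply the testcase expression to the markers instead of to real data.
   The masks are whole-byte masks, so a plain AND keeps or drops markers. */
static uint64_t symbolic_number(void)
{
    uint64_t hi = (SRC_MARKERS << 16) & UINT64_C(0x000000ff00000000);
    uint64_t lo = SRC_MARKERS & UINT64_C(0x0000ff00);
    return hi | lo;
}
```

Under this model, marker 3 lands in byte 4 and marker 2 in byte 1 of the 64-bit result, so the populated bytes span more than the 32-bit range of the source.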
2023-06-03 | i386: Add missing vector truncate patterns [PR92658]. | liuhongt | 2 | -0/+48
Add missing insn patterns for v2si -> v2hi/v2qi and v2hi-> v2qi vector truncate. gcc/ChangeLog: PR target/92658 * config/i386/mmx.md (truncv2hiv2qi2): New define_insn. (truncv2si<mode>2): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr92658-avx512bw-trunc-2.c: New test.
2023-06-02 | rtl-optimization: [PR102733] DSE removing address which only differ by address space. | Andrew Pinski | 2 | -1/+29
The problem here is that DSE was not taking the address space into account, which meant that if you had two addresses, say `fs:0` and `gs:0` (on x86_64), DSE would think they were the same and remove the first store. This fixes the issue by also checking the address space. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR rtl-optimization/102733 gcc/ChangeLog: * dse.cc (store_info): Add addrspace field. (record_store): Record the address space and check to make sure they are the same. gcc/testsuite/ChangeLog: * gcc.target/i386/addr-space-6.c: New test.
2023-06-02 | Fix PR 110042: ifcvt regression due to paradoxical subregs | Andrew Pinski | 2 | -5/+36
After r14-1014-gc5df248509b489364c573e8, GCC started to emit directly a zero_extract for `(t1&0x8)!=0`. This introduced a small regression where ifcvt would not do the ifconversion as there is now a paradoxical subreg in the dest which was being rejected. Since paradoxical subreg set the whole register, we can treat it as the same as a reg in the two places. OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu. gcc/ChangeLog: PR rtl-optimization/110042 * ifcvt.cc (bbs_ok_for_cmove_arith): Allow paradoxical subregs. (bb_valid_for_noce_process_p): Strip the subreg for the SET_DEST. gcc/testsuite/ChangeLog: PR rtl-optimization/110042 * gcc.target/aarch64/csel_bfx_2.c: New test.
2023-06-02 | Darwin, PPC: Fix struct layout with pragma pack [PR110044]. | Iain Sandoe | 5 | -1/+108
This bug was essentially that darwin_rs6000_special_round_type_align() was ignoring externally-imposed capping of field alignment. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> PR target/110044 gcc/ChangeLog: * config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align): Make sure that we do not have a cap on field alignment before altering the struct layout based on the type alignment of the first entry. gcc/testsuite/ChangeLog: * gcc.target/powerpc/darwin-abi-13-0.c: New test. * gcc.target/powerpc/darwin-abi-13-1.c: New test. * gcc.target/powerpc/darwin-abi-13-2.c: New test. * gcc.target/powerpc/darwin-structs-0.h: New test.
2023-06-02 | Fortran: fix diagnostics for SELECT RANK [PR100607] | Steve Kargl | 2 | -6/+52
gcc/fortran/ChangeLog: PR fortran/100607 * resolve.cc (resolve_select_rank): Remove duplicate error. (resolve_fl_var_and_proc): Prevent NULL pointer dereference and suppress error message for temporary. gcc/testsuite/ChangeLog: PR fortran/100607 * gfortran.dg/select_rank_6.f90: New test.
2023-06-02 | btf: fix bootstrap -Wformat errors [PR110073] | David Faust | 1 | -4/+11
Commit 7aae58b04b9 "btf: improve -dA comments for testsuite" broke bootstrap on a number of architectures because it introduced some new -Wformat errors. Fix those errors by properly using PRIu64 and a small refactor to the offending code. Based on the suggested patch from Rainer Orth. PR debug/110073 gcc/ChangeLog: * btfout.cc (btf_absolute_func_id): New function. (btf_asm_func_type): Call it here. Change index parameter from size_t to ctf_id_t. Use PRIu64 formatter.
2023-06-02 | btf: Fix -Wformat errors | Alex Coplan | 1 | -2/+2
g:7aae58b04b92303ccda3ead600be98f0d4b7f462 introduced -Wformat errors breaking bootstrap on some targets. This patch fixes that. Committed as obvious. gcc/ChangeLog: * btfout.cc (btf_asm_type): Use PRIu64 instead of %lu for uint64_t. (btf_asm_datasec_type): Likewise.
2023-06-02 | c++: fix explicit/copy problem [PR109247] | Jason Merrill | 2 | -0/+46
In the testcase, the user wants the assignment to use the operator= declared in the class, but because [over.match.list] says that explicit constructors are also considered for list-initialization, as affirmed in CWG1228, we end up choosing the implicitly-declared copy assignment operator, using the explicit constructor template for the argument, which is ill-formed. Other implementations haven't implemented CWG1228, so we keep getting bug reports. Discussion in CWG led to the idea for this targeted relaxation: if we use an explicit constructor for the conversion to the argument of a copy or move special member function, that makes the candidate worse than another. DR 2735 PR c++/109247 gcc/cp/ChangeLog: * call.cc (sfk_copy_or_move): New. (joust): Add tiebreaker for explicit conv and copy ctor. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist-explicit3.C: New test.
2023-06-02 | rs6000: Fix arguments for __builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrhx | Carl Love | 2 | -2/+106
The third argument for __builtin_altivec_tr_stxvrhx should be short * not int *. Similarly, the third argument for __builtin_altivec_tr_stxvrwx should be int * not short *. This patch fixes the arguments in the two builtins. A runnable test case is added to test the __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx and __builtin_altivec_tr_stxvrdx builtins. gcc/ * config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx): Fix type of third argument. gcc/testsuite/ * gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c: New test for __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx.
2023-06-02 | c++: make initializer_list array static again [PR110070] | Jason Merrill | 14 | -8/+104
After the maybe_init_list_as_* patches, I noticed that we were putting the array of strings into .rodata, but then memcpying it into an automatic array, which is pointless; we should be able to use it directly. This doesn't happen automatically because TREE_ADDRESSABLE is set (since r12-657 for PR100464), and so gimplify_init_constructor won't promote the variable to static. Theoretically we could do escape analysis to recognize that the address, though taken, never leaves the function; that would allow promotion when we're only using the address for indexing within the function, as in initlist-opt2.C. But this would be a new pass. And in initlist-opt1.C, we're passing the array address to another function, so it definitely escapes; it's only safe in this case because it's calling a standard library function that we know only uses it for indexing. So, a flag seems needed. I first thought to put the flag on the TARGET_EXPR, but the VAR_DECL seems more appropriate. In a previous revision of the patch I called this flag DECL_NOT_OBSERVABLE, but I think DECL_MERGEABLE is a better name, especially if we're going to apply it to the backing array of initializer_list, which is observable. I then also check it in places that check for -fmerge-all-constants, so that multiple equivalent initializer-lists can also be combined. And then it seemed to make sense for [[no_unique_address]] to have this meaning for user-written variables. I think the note in [dcl.init.list]/6 intended to allow this kind of merging for initializer_lists, but it didn't actually work; for an explicit array with the same initializer, if the address escapes the program could tell whether the same variable in two frames have the same address. P2752 is trying to correct this defect, so I'm going to assume that this is the intent. PR c++/110070 PR c++/105838 gcc/ChangeLog: * tree.h (DECL_MERGEABLE): New. * tree-core.h (struct tree_decl_common): Mention it. 
* gimplify.cc (gimplify_init_constructor): Check it. * cgraph.cc (symtab_node::address_can_be_compared_p): Likewise. * varasm.cc (categorize_decl_for_section): Likewise. gcc/cp/ChangeLog: * call.cc (maybe_init_list_as_array): Set DECL_MERGEABLE. (convert_like_internal) [ck_list]: Set it. (set_up_extended_ref_temp): Copy it. * tree.cc (handle_no_unique_addr_attribute): Set it. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/initlist-opt1.C: Check for static array. * g++.dg/tree-ssa/initlist-opt2.C: Likewise. * g++.dg/tree-ssa/initlist-opt4.C: New test. * g++.dg/opt/icf1.C: New test. * g++.dg/opt/icf2.C: New test. * g++.dg/opt/icf3.C: New test. * g++.dg/tree-ssa/array-temp1.C: Revert r12-657 change.
2023-06-02 | reg-stack: Change return type of predicate functions from int to bool | Uros Bizjak | 2 | -75/+86
Also change some internal variables to bool and recode handling of boolean variables to not use bitwise or. gcc/ChangeLog: * rtl.h (stack_regs_mentioned): Change return type from int to bool. * reg-stack.cc (struct_block_info_def): Change "done" to bool. (stack_regs_mentioned_p): Change return type from int to bool and adjust function body accordingly. (stack_regs_mentioned): Ditto. (check_asm_stack_operands): Ditto. Change "malformed_asm" variable to bool. (move_for_stack_reg): Recode handling of control_flow_insn_deleted. (swap_rtx_condition_1): Change return type from int to bool and adjust function body accordingly. Change "r" variable to bool. (swap_rtx_condition): Change return type from int to bool and adjust function body accordingly. (subst_stack_regs_pat): Recode handling of control_flow_insn_deleted. (subst_stack_regs): Ditto. (convert_regs_entry): Change return type from int to bool and adjust function body accordingly. Change "inserted" variable to bool. (convert_regs_1): Recode handling of control_flow_insn_deleted. (convert_regs_2): Recode handling of cfg_altered. (convert_regs): Ditto. Change "inserted" variable to bool.
2023-06-02 | varasm: check float size | Jason Merrill | 1 | -5/+6
In PR95226, the testcase was failing because we tried to output_constant a NOP_EXPR to float from a double REAL_CST, and so we output a double where the caller wanted a float. That doesn't happen anymore, but with the output_constant hunk we will ICE in that situation rather than emit the wrong number of bytes. Part of the problem was that initializer_constant_valid_p_1 returned true for that NOP_EXPR, because it compared the sizes of integer types but not floating-point types. So the C++ front end assumed it didn't need to fold the initializer. PR c++/95226 gcc/ChangeLog: * varasm.cc (output_constant) [REAL_TYPE]: Check that sizes match. (initializer_constant_valid_p_1): Compare float precision.
2023-06-02 | analyzer: implement various atomic builtins [PR109015] | David Malcolm | 5 | -0/+983
This patch implements many of the __atomic_* builtins from sync-builtins.def as known_function subclasses within the analyzer. gcc/analyzer/ChangeLog: PR analyzer/109015 * kf.cc (class kf_atomic_exchange): New. (class kf_atomic_exchange_n): New. (class kf_atomic_fetch_op): New. (class kf_atomic_op_fetch): New. (class kf_atomic_load): New. (class kf_atomic_load_n): New. (class kf_atomic_store_n): New. (register_atomic_builtins): New function. (register_known_functions): Call register_atomic_builtins. gcc/testsuite/ChangeLog: PR analyzer/109015 * gcc.dg/analyzer/atomic-builtins-1.c: New test. * gcc.dg/analyzer/atomic-builtins-haproxy-proxy.c: New test. * gcc.dg/analyzer/atomic-builtins-qemu-sockets.c: New test. * gcc.dg/analyzer/atomic-types-1.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-06-02analyzer: regions in different memory spaces can't aliasDavid Malcolm1-0/+12
gcc/analyzer/ChangeLog: * store.cc (store::eval_alias_1): Regions in different memory spaces can't alias. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-06-02testsuite: Require LTO for pr107557-[12].cDavid Edelsohn2-0/+2
pr107557-[12].c invoke the -flto option but do not check that the target supports LTO. This patch adds dg-require lto to the testcases. * gcc.dg/pr107557-1.c: Require LTO support. * gcc.dg/pr107557-2.c: Require LTO support. Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
2023-06-02doc: clarify semantics of vector bitwise shiftsAlexander Monakov1-1/+8
Explicitly say that attempted shift past element bit width is UB for vector types. Mention that integer promotions do not happen. gcc/ChangeLog: * doc/extend.texi (Vector Extensions): Clarify bitwise shift semantics.
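The two points in this message can be seen in a short sketch using GCC's vector extension. The type name v16u8 and the helper lane0_shl are made up for illustration; the element width of 8 bits follows from the unsigned char element type.

```c
#include <assert.h>

/* Sixteen unsigned char lanes; the element bit width is 8.  */
typedef unsigned char v16u8 __attribute__ ((vector_size (16)));

/* Shift lane 0 of a vector left by 'count' and return that lane.
   The shift is done element-wise in the element type, so no integer
   promotion takes place.  A count of 8 or more would be undefined
   behavior here, hence the assertion.  */
static unsigned char
lane0_shl (unsigned char x, unsigned char count)
{
  assert (count < 8);
  v16u8 values = { x };      /* remaining lanes are zero */
  v16u8 counts = { count };  /* remaining lanes shift by zero */
  v16u8 result = values << counts;
  return result[0];
}
```

For comparison, the scalar expression (unsigned char) 0x81 << 1 is evaluated after promotion to int and yields 0x102, whereas lane0_shl (0x81, 1) yields 0x02 because the result stays in the 8-bit element type.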
2023-06-02VECT: Change flow of decrement IVJu-Zhe Zhong1-11/+25
Following Richi's suggestion, I change the current decrement IV flow from: do { remain -= MIN (vf, remain); } while (remain != 0); into: do { old_remain = remain; len = MIN (vf, remain); remain -= vf; } while (old_remain >= vf); to enhance SCEV. Includes fixes from Kewen. This patch will need to wait for Kewen's test feedback. Testing on x86 is ongoing. Co-Authored by: Kewen Lin <linkw@linux.ibm.com> PR tree-optimization/109971 gcc/ChangeLog: * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow. (vect_set_loop_condition_partial_vectors): Ditto.
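The two loop shapes quoted in this message can be modelled in plain C. This is only a scalar sketch of the pseudocode as written, not the GIMPLE the vectorizer emits; old_flow, new_flow and min_sz are made-up names, and the counters are unsigned so that the final "remain -= vf" may wrap without undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

static size_t
min_sz (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Old flow: the IV is decremented by the clamped length.
   Returns the total number of elements processed.  */
static size_t
old_flow (size_t remain, size_t vf, size_t *iterations)
{
  assert (vf > 0);
  size_t total = 0;
  *iterations = 0;
  do
    {
      size_t len = min_sz (vf, remain);
      total += len;
      remain -= len;
      ++*iterations;
    }
  while (remain != 0);
  return total;
}

/* New flow: the IV is decremented by the constant vf, which gives it a
   simple affine evolution.  Returns the total number of elements.  */
static size_t
new_flow (size_t remain, size_t vf, size_t *iterations)
{
  assert (vf > 0);
  size_t total = 0;
  size_t old_remain;
  *iterations = 0;
  do
    {
      old_remain = remain;
      size_t len = min_sz (vf, remain);
      total += len;
      remain -= vf;  /* unsigned, so wrapping on the last trip is defined */
      ++*iterations;
    }
  while (old_remain >= vf);
  return total;
}
```

With remain = 10 and vf = 4 both flows take three trips with lengths 4, 4 and 2. Taking the ">=" condition literally, an exact multiple such as remain = 8 makes the new flow run one extra trip with len = 0; the totals still agree. Whether the committed code uses ">=" or a strict ">" is not something this sketch asserts.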
2023-06-02target/110088: Improve operation of l-reg with const after move from d-reg.Georg-Johann Lay1-1/+40
After reload, there may be sequences like lreg = dreg lreg = lreg <op> const with an LD_REGS dreg, non-LD_REGS lreg, and <op> in PLUS, IOR, AND. If dreg dies after the first insn, it is possible to use dreg = dreg <op> const lreg = dreg instead which is more efficient. gcc/ PR target/110088 * config/avr/avr.md: Add an RTL peephole to optimize operations on non-LD_REGS after a move from LD_REGS. (piaop): New code iterator.
2023-06-02Support parallel testing in libgomp: fallback Perl 'flock' [PR66005]Thomas Schwinge1-0/+3
Follow-up to commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba "Support parallel testing in libgomp, part II [PR66005]" ("..., and enable if 'flock' is available for serializing execution testing"), where we saw: > On my Dell Precision 7530 laptop: > > $ uname -srvi > Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 > $ grep '^model name' < /proc/cpuinfo | uniq -c > 12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz > $ nvidia-smi -L > GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea) > > ... [...]: case (c) standard configuration, no offloading > configured, [...] > $ \time make check-target-libgomp > > Case (c), baseline; [...]: > > 1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k > 1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k > > Case (c), parallelized [using 'flock']: > > [...] > -j12 GCC_TEST_PARALLEL_SLOTS=12 > 2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k > 2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k Quite the same when instead of 'flock' using this fallback Perl 'flock': 2565.23user 194.35system 4:46.77elapsed 962%CPU (0avgtext+0avgdata 505216maxresident)k 2549.38user 200.20system 4:46.08elapsed 961%CPU (0avgtext+0avgdata 505216maxresident)k PR testsuite/66005 gcc/ * doc/install.texi: Document (optional) Perl usage for parallel testing of libgomp. libgomp/ * testsuite/lib/libgomp.exp: 'flock' through stdout. * testsuite/flock: New. * configure.ac (FLOCK): Point to that if no 'flock' available, but 'perl' is. * configure: Regenerate.
2023-06-02Back to requiring "Perl version 5.6.1 (or later)" [PR82856]Thomas Schwinge1-1/+1
With Subversion r265695 (Git commit 22e052725189a472e4e86ebb6595278a49f4bcdd) "Update GCC to autoconf 2.69, automake 1.15.1 (PR bootstrap/82856)" we're back to normal; per Automake 1.15.1 'configure.ac' still "[...] perl 5.6 or better is required [...]". PR bootstrap/82856 gcc/ * doc/install.texi (Perl): Back to requiring "Perl version 5.6.1 (or later)".
2023-06-02Fortran: Fix some problems blocking associate meta-bug [PR87477]Paul Thomas9-29/+310
2023-06-02 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/87477 * parse.cc (parse_associate): Replace the existing evaluation of the target rank with calls to gfc_resolve_ref and gfc_expression_rank. Identify untyped target function results with structure constructors by finding the appropriate derived type. * resolve.cc (resolve_symbol): Allow associate variables to be assumed shape. gcc/testsuite/ PR fortran/87477 * gfortran.dg/associate_54.f90 : Cope with extra error. PR fortran/102109 * gfortran.dg/pr102109.f90 : New test. PR fortran/102112 * gfortran.dg/pr102112.f90 : New test. PR fortran/102190 * gfortran.dg/pr102190.f90 : New test. PR fortran/102532 * gfortran.dg/pr102532.f90 : New test. PR fortran/109948 * gfortran.dg/pr109948.f90 : New test. PR fortran/99326 * gfortran.dg/pr99326.f90 : New test.
2023-06-02RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vidJuzhe-Zhong2-9/+13
Based on these: https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232 https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233 Add _mu C++ overloaded intrinsics for load && viota && vid. Co-authored-by: KuanLin Chen <best124612@gmail.com> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics. * config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto.
2023-06-02RISC-V: Optimize reverse series index vectorJuzhe-Zhong2-0/+19
This patch optimizes the following series vector: [nunits - 1, nunits - 2, ..., 0] Before this patch: vid vmul vadd After this patch: vid vrsub This patch is an obvious and simple optimization, ok for trunk? gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vec_series): Optimize reverse series index vector. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Add assembly check.
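The equivalence behind this change can be checked with a scalar model of the two instruction sequences. This is only a sketch: the function names are made up, each loop iteration stands for one vector lane, and the multiplier of -1 for the vmul step is an assumption about how the reverse series was formed before the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: vid (lane index), vmul by -1, vadd of nunits - 1.  */
static void
reverse_series_mul_add (int32_t *out, size_t nunits)
{
  assert (out != NULL && nunits > 0);
  for (size_t i = 0; i < nunits; i++)
    {
      int32_t vid = (int32_t) i;
      int32_t scaled = vid * -1;
      out[i] = scaled + (int32_t) (nunits - 1);
    }
}

/* After: vid, then a reverse subtract, i.e. scalar minus lane.  */
static void
reverse_series_rsub (int32_t *out, size_t nunits)
{
  assert (out != NULL && nunits > 0);
  for (size_t i = 0; i < nunits; i++)
    {
      int32_t vid = (int32_t) i;
      out[i] = (int32_t) (nunits - 1) - vid;
    }
}
```

Both functions fill out with nunits - 1 down to 0, which is why one reverse subtract can replace the multiply and add pair.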
2023-06-02RISC-V: Fix warning in predicates.mdJuzhe-Zhong1-1/+1
Notice there is a warning in predicates.md: ../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’: ../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] (match_test "INTVAL (op) == GET_MODE_MASK (HImode) ../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] || INTVAL (op) == GET_MODE_MASK (SImode)")))) gcc/ChangeLog: * config/riscv/predicates.md: Change INTVAL into UINTVAL.
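The nature of the warning and of the INTVAL to UINTVAL change can be sketched outside of GCC. The code below does not use GCC's rtx types or macros: HI_MASK, SI_MASK and is_hi_or_si_mask are made-up stand-ins, with int64_t playing the role of the signed value that INTVAL yields and uint64_t the role of the unsigned mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the unsigned HImode and SImode masks.  */
#define HI_MASK UINT64_C (0xffff)
#define SI_MASK UINT64_C (0xffffffff)

/* Comparing the signed 'value' directly against the unsigned masks is
   what draws -Wsign-compare.  Converting it to unsigned first keeps
   both sides of each comparison unsigned.  */
static bool
is_hi_or_si_mask (int64_t value)
{
  uint64_t uvalue = (uint64_t) value;
  return uvalue == HI_MASK || uvalue == SI_MASK;
}

/* Small self-check of the helper above.  */
static void
mask_demo (void)
{
  assert (is_hi_or_si_mask (INT64_C (0xffff)));
  assert (is_hi_or_si_mask (INT64_C (0xffffffff)));
  assert (!is_hi_or_si_mask (-1));
}
```

The result of the comparison does not change, since the signed operand would have been converted to unsigned implicitly anyway; only the warning goes away.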
2023-06-02RISC-V: Add test for vfloat16*_t (non tuple) typesPan Li2-0/+12
This patch adds some test cases for the vfloat16*_t (non-tuple) types; without 'zvfh' or 'zvfhmin', they are reported as unknown types. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-16.c: Add test cases. * gcc.target/riscv/rvv/base/user-7.c: Likewise.
2023-06-02RISC-V: Add __RISCV_ prefix to VXRM and FRM enumJuzhe-Zhong10-31/+31
According to doc: https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226 Add __RISCV_ prefix to VXRM and FRM enum. gcc/ChangeLog: * config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add __RISCV_ prefix. (DEF_RVV_FRM_ENUM): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/frm-1.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-1.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-10.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-11.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-12.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-6.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-7.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-8.c: Ditto. * gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
2023-06-02RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimizationJuzhe-Zhong8-6/+215
1. This patch optimizes the codegen of the following auto-vectorized code: void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n) { for (int i = 0; i < n; i++) c[i] = (int64_t)a[i] + b[i]; } Combine instructions from: ... vsext.vf2 vadd.vv ... into: ... vwadd.wv ... Since, for the PLUS operation, GCC prefers the following RTL operand order when combining: (plus: (sign_extend:..) (reg:) instead of (plus: (reg:..) (sign_extend:) which is different from the MINUS pattern, I split the vwadd/vwsub patterns and add dedicated patterns for them. 2. This patch not only optimizes the case mentioned in (1), but also enhances the vwadd.vv/vwsub.vv optimization for complicated PLUS/MINUS code. Consider the following code: __attribute__ ((noipa)) void vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2, int16_t *__restrict dst3, int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict a2, int8_t *__restrict b2, int n) { for (int i = 0; i < n; i++) { dst[i] = (int16_t) a[i] + (int16_t) b[i]; dst2[i] = (int16_t) a2[i] + (int16_t) b[i]; dst3[i] = (int16_t) a2[i] + (int16_t) a[i]; } } Before this patch: ... vsetvli zero,a6,e8,mf2,ta,ma vle8.v v2,0(a3) vle8.v v1,0(a4) vsetvli t1,zero,e16,m1,ta,ma vsext.vf2 v3,v2 vsext.vf2 v2,v1 vadd.vv v1,v2,v3 vsetvli zero,a6,e16,m1,ta,ma vse16.v v1,0(a0) vle8.v v4,0(a5) vsetvli t1,zero,e16,m1,ta,ma vsext.vf2 v1,v4 vadd.vv v2,v1,v2 ... After this patch: ... vsetvli zero,a6,e8,mf2,ta,ma vle8.v v3,0(a4) vle8.v v1,0(a3) vsetvli t4,zero,e8,mf2,ta,ma vwadd.vv v2,v1,v3 vsetvli zero,a6,e16,m1,ta,ma vse16.v v2,0(a0) vle8.v v2,0(a5) vsetvli t4,zero,e8,mf2,ta,ma vwadd.vv v4,v3,v2 vsetvli zero,a6,e16,m1,ta,ma vse16.v v4,0(a1) vsetvli t4,zero,e8,mf2,ta,ma sub a7,a7,a6 vwadd.vv v3,v2,v1 vsetvli zero,a6,e16,m1,ta,ma vse16.v v3,0(a2) ...
The reason current upstream GCC cannot fully optimize code using vwadd is that the combine pass needs an intermediate RTL IR (the pattern that extends one of the operands, vwadd.wv); based on this intermediate RTL IR, it then extends the other operand to generate vwadd.vv. So vwadd.wv/vwsub.wv definitely helps the vwadd.vv/vwsub.vv code optimizations. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv intrinsic API expander. * config/riscv/vector.md (@pred_single_widen_<plus_minus:optab><any_extend:su><mode>): Remove it. (@pred_single_widen_sub<any_extend:su><mode>): New pattern. (@pred_single_widen_add<any_extend:su><mode>): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.
2023-06-02RISC-V: Support RVV permutation auto-vectorizationJuzhe-Zhong19-0/+1217
This patch supports vector permutation for VLS only by vec_perm pattern. We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation in the future. Fixed following comments from Robin. gcc/ChangeLog: * config/riscv/autovec.md (vec_perm<mode>): New pattern. * config/riscv/predicates.md (vector_perm_operand): New predicate. * config/riscv/riscv-protos.h (enum insn_type): New enum. (expand_vec_perm): New function. * config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto. (gen_const_vector_dup): Ditto. (emit_vlmax_gather_insn): Ditto. (emit_vlmax_masked_gather_mu_insn): Ditto. (expand_vec_perm): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.
2023-06-02Daily bump.GCC Administrator5-1/+145