Age | Commit message | Author | Files | Lines
2023-06-26 | Merge from trunk revision 3a39a31b8ae9c6465434aefa657f7fcc86f905c0. [devel/gccgo] | Ian Lance Taylor | 356 | -1457/+18020
2023-06-26 | compiler: support -fgo-importcfg | Ian Lance Taylor | 10 | -7/+182
* lang.opt (fgo-importcfg): New option. * go-c.h (struct go_create_gogo_args): Add importcfg field. * go-lang.cc (go_importcfg): New static variable. (go_langhook_init): Set args.importcfg. (go_langhook_handle_option): Handle -fgo-importcfg. * gccgo.texi (Invoking gccgo): Document -fgo-importcfg. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/506095
2023-06-26 | aarch64: Use <DWI> instead of <V2XWIDE> in scalar SQRSHRUN pattern | Kyrylo Tkachov | 1 | -10/+10
In the scalar pattern for SQRSHRUN it's a bit clearer to use DWI instead of V2XWIDE, making it explicit that no vector modes are involved. No behavioural change intended. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_sqrshrun_n<mode>_insn): Use <DWI> instead of <V2XWIDE>. (aarch64_sqrshrun_n<mode>): Likewise.
2023-06-26 | aarch64: Clean up some rounding immediate predicates | Kyrylo Tkachov | 4 | -24/+20
aarch64_simd_rsra_rnd_imm_vec is now used for more than just RSRA and accepts more than just vectors so rename it to make it more truthful. The aarch64_simd_rshrn_imm_vec is now unused and can be deleted. No behavioural change intended. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_const_vec_rsra_rnd_imm_p): Rename to... (aarch64_rnd_imm_p): ... This. * config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec): Rename to... (aarch64_int_rnd_operand): ... This. (aarch64_simd_rshrn_imm_vec): Delete. * config/aarch64/aarch64-simd.md (aarch64_<sra_op>rsra_n<mode>_insn): Adjust for the above. (aarch64_<sra_op>rshr_n<mode><vczle><vczbe>_insn): Likewise. (*aarch64_<shrn_op>rshrn_n<mode>_insn): Likewise. (*aarch64_sqrshrun_n<mode>_insn<vczle><vczbe>): Likewise. (aarch64_sqrshrun_n<mode>_insn): Likewise. (aarch64_<shrn_op>rshrn2_n<mode>_insn_le): Likewise. (aarch64_<shrn_op>rshrn2_n<mode>_insn_be): Likewise. (aarch64_sqrshrun2_n<mode>_insn_le): Likewise. (aarch64_sqrshrun2_n<mode>_insn_be): Likewise. * config/aarch64/aarch64.cc (aarch64_const_vec_rsra_rnd_imm_p): Rename to... (aarch64_rnd_imm_p): ... This.
2023-06-26 | libstdc++: Fix std::format for pointers [PR110239] | Jonathan Wakely | 2 | -23/+15
The formatter for pointers was casting to uint64_t which sign extends a 32-bit pointer and produces a value that won't fit in the provided buffer. Cast to uintptr_t instead. There was also a bug in the __parse_integer helper when converting a wide string to a narrow string in order to use std::from_chars on it. The function would always try to read 32 characters, even if the format string was shorter than that. Fix that bug, and remove the constexpr implementation of __parse_integer by just using __from_chars_alnum instead of from_chars, because that's usable in constexpr even in C++20. libstdc++-v3/ChangeLog: PR libstdc++/110239 * include/std/format (__format::__parse_integer): Fix buffer overflow for wide chars. (formatter<const void*, C>::format): Cast to uintptr_t instead of uint64_t. * testsuite/std/format/string.cc: Test too-large widths.
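A hedged illustration of the affected usage (this is not the libstdc++ implementation or the PR 110239 testcase, only a sketch of the user-visible operation): formatting a pointer with std::format. The fix widens the pointer through uintptr_t, whose width matches the pointer, rather than through a 64-bit cast that could sign-extend a 32-bit pointer and overflow the output buffer.

#include <format>
#include <string>

std::string show(const void* p)
{
  // Default pointer formatting prints a hexadecimal address such as
  // "0xffffc0de"; with the old cast a 32-bit address could come out as a
  // sign-extended 16-digit value instead.
  return std::format("{}", p);
}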
2023-06-26 | libstdc++: Implement P2538R1 ADL-proof std::projected | Jonathan Wakely | 2 | -10/+67
This was recently approved for C++26, but there's no harm in implementing it unconditionally for C++20 and C++23. As it says in the paper, it doesn't change the meaning of any valid code. It only enables things that were previously ill-formed for questionable reasons. libstdc++-v3/ChangeLog: * include/bits/iterator_concepts.h (projected): Replace class template with alias template denoting an ADL-proofed helper. (incremental_traits<projected<Iter, Proj>>): Remove. * testsuite/24_iterators/indirect_callable/projected-adl.cc: New test.
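A hedged sketch of the ADL-proofing idiom behind P2538R1; all names below are illustrative and this is not the libstdc++ implementation. The point is that when projected is an alias for a class nested inside a helper, the iterator and projection types are no longer associated entities, so argument-dependent lookup on a projected<...> value cannot drag their namespaces in.

#include <iterator>
#include <type_traits>

template<std::indirectly_readable Iter,
         std::indirectly_regular_unary_invocable<Iter> Proj>
struct projected_helper
{
  struct type
  {
    using value_type =
      std::remove_cvref_t<std::indirect_result_t<Proj&, Iter>>;
    // Declared but not defined; only used in unevaluated contexts.
    std::indirect_result_t<Proj&, Iter> operator*() const;
  };
};

template<std::indirectly_readable Iter,
         std::indirectly_regular_unary_invocable<Iter> Proj>
using adl_proof_projected = typename projected_helper<Iter, Proj>::type;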
2023-06-26 | libstdc++: Qualify calls to debug mode helpers | Jonathan Wakely | 1 | -11/+21
These functions should be qualified to disable unwanted ADL. The overload of __check_singular_aux for safe iterators was previously being found by ADL, because it wasn't declared before __check_singular. Add a declaration so that it can be found by qualified lookup. libstdc++-v3/ChangeLog: * include/debug/helper_functions.h (__get_distance) (__check_singular, __valid_range_aux, __valid_range): Qualify calls to disable ADL. (__check_singular_aux(const _Safe_iterator_base*)): Declare overload that was previously found via ADL.
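A hedged, self-contained illustration of the general problem (the names are made up; this is not the libstdc++ code): inside a template, an unqualified call to a helper can be hijacked by argument-dependent lookup when the argument's namespace declares a better-matching function of the same name, which is why the debug-mode helpers are now called with qualification.

namespace lib {
  template<class It> bool check_singular(It) { return false; }

  template<class It>
  bool valid_range(It first, It)
  {
    // Unqualified: ADL can pick user::check_singular below instead.
    // Writing lib::check_singular(first) pins the intended helper.
    return !check_singular(first);
  }
}

namespace user {
  struct iter { int* p; };
  // Found by ADL for lib::valid_range(user::iter{}, ...), and preferred
  // because it is a non-template exact match.
  bool check_singular(iter) { return true; }
}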
2023-06-26 | IBM zSystems: Assume symbols without explicit alignment to be ok | Andreas Krebbel | 2 | -2/+36
A change we committed back in 2015 relies on the backend-requested ABI alignment being applied to ALL symbols by the middle-end. However, this does not appear to be the case for external symbols. With this commit we assume all symbols without explicit alignment to be aligned according to the ABI. That's the behavior we had before. This fixes a performance regression caused by the 2015 patch: since then the addresses of external char type symbols have been pushed to the literal pool, although it is safe to access them with larl (which requires symbols to reside at even addresses). gcc/ * config/s390/s390.cc (s390_encode_section_info): Set SYMBOL_FLAG_SET_NOTALIGN2 only if the symbol has explicitly been misaligned. gcc/testsuite/ * gcc.target/s390/larl-1.c: New test.
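A hedged example in source form of the case described above (the symbol name is illustrative): an external char object with no explicit alignment. Under the ABI assumption restored by this patch, the s390 backend can again materialize its address with larl, which requires an even address, instead of routing the address through the literal pool.

extern char external_flag;          // no explicit alignment specified

char* address_of_flag()
{
  return &external_flag;            // eligible for larl again
}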
2023-06-26 | Fix profile of forwarders produced by cd-dce | Jan Hubicka | 1 | -0/+3
Compiling the testcase from PR109849 (which uses a std::vector-based stack to drive a loop) with profile feedback leads to profile mismatches introduced by tree-ssa-dce, specifically by the new code that produces unified forwarder blocks for PHIs. I am not including the testcase itself since checking it for an Invalid sum is probably going to be too fragile, and this should show up in our LNT testers. The patch however fixes the mismatch. Bootstrapped/regtested x86_64-linux and plan to commit it shortly. gcc/ChangeLog: PR tree-optimization/109849 * tree-ssa-dce.cc (make_forwarders_with_degenerate_phis): Fix profile count of newly constructed forwarder block.
2023-06-26 | docs: Fix typo | Andrew Carlotti | 1 | -1/+1
gcc/ChangeLog: * doc/optinfo.texi: Fix "steam" -> "stream".
2023-06-26 | DSE: Add LEN_MASK_STORE analysis into DSE and fix LEN_STORE | Ju-Zhe Zhong | 1 | -16/+31
Hi, Richi. This patch adds LEN_MASK_STORE into DSE. My understanding is that LEN_MASK_STORE is predicated by both mask and len. Whether or not len is constant, the ao_ref should be the same as for MASK_STORE. Whereas for LEN_STORE, when len is constant we use (len - bias); otherwise it's the same as MASK_STORE/LEN_MASK_STORE. Not sure whether I am on the same page as you, feel free to correct me. Thanks. gcc/ChangeLog: * tree-ssa-dse.cc (initialize_ao_ref_for_dse): Add LEN_MASK_STORE and fix LEN_STORE. (dse_optimize_stmt): Add LEN_MASK_STORE.
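A hedged scalar model of the store being analyzed, written out only to make the aliasing argument concrete (this is not GCC code, and the bias adjustment used for constant-length LEN_STORE is deliberately omitted): a LEN_MASK_STORE writes element i only when i is below the effective length and the mask bit is set, so the region it may touch is bounded exactly like that of a MASK_STORE of the full vector.

void len_mask_store_model(int* dst, const int* src, long len,
                          const bool* mask, long vf /* vector factor */)
{
  for (long i = 0; i < vf; ++i)
    if (i < len && mask[i])         // both predicates must allow the lane
      dst[i] = src[i];
}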
2023-06-26 | GIMPLE_FOLD: Fix gimple fold for LEN_{MASK}_{LOAD,STORE} | Ju-Zhe Zhong | 2 | -2/+47
Hi, previously I made a mistake in the GIMPLE_FOLD handling of LEN_MASK_{LOAD,STORE}. We should fold a LEN_MASK_{LOAD,STORE} whose (bias+len) == vf (nunits, not the byte size) && whose mask is all trues into: MEM_REF [...]. This patch adds a testcase for the gimple fold of LEN_MASK_{LOAD,STORE}. Also, I fix LEN_LOAD/LEN_STORE to make them have the same behavior. Ok for trunk? gcc/ChangeLog: * gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Fix gimple fold of LOAD/STORE with length. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/gimple_fold-1.c: New test.
2023-06-26 | Avoid redundant GORI calculations. | Andrew MacLeod | 1 | -4/+17
When GORI evaluates a statement, if operands 1 and 2 are both in the dependency chain, GORI evaluates the name through both operands sequentially and combines the results. If either operand is in the dependency chain of the other, this evaluation will do the same work twice, for questionable gain. Instead, simply evaluate only the operand which depends on the other and keep the evaluation linear in time. * gimple-range-gori.cc (compute_operand1_and_operand2_range): Check for interdependence between operands 1 and 2.
2023-06-26 | vect: Cost intermediate conversions | Richard Sandiford | 1 | -2/+3
g:6f19cf7526168f8 extended N-vector to N-vector conversions to handle cases where an intermediate integer extension or truncation is needed. This patch adjusts the cost to account for these intermediate conversions. gcc/ * tree-vect-stmts.cc (vectorizable_conversion): Take multi_step_cvt into account when costing non-widening/truncating conversions.
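A hedged example (not the costing code or a testcase from the patch) of a loop whose conversion is typically done in more than one vector step, e.g. float -> int followed by a narrowing int -> short; this is the multi_step_cvt situation whose intermediate steps are now costed.

void narrow(short* __restrict dst, const float* __restrict src, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = (short) src[i];        // float -> int -> short when vectorized
}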
2023-06-26 | tree-optimization/110381 - preserve SLP permutation with in-order reductions | Richard Biener | 2 | -2/+56
The following fixes a bug that manifests itself during the fold-left reduction transform by picking not the last scalar def to replace and thus double-counting some elements. But the underlying issue is that we merge a load permutation into the in-order reduction, which is of course wrong. Now, reduction analysis has not yet been performed when optimizing permutations, so we have to resort to checking that ourselves. PR tree-optimization/110381 * tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts): Materialize permutes before fold-left reductions. * gcc.dg/vect/pr110381.c: New testcase.
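A hedged illustration, not the pr110381.c testcase: an ordered (fold-left) floating-point reduction fed by an interleaved access pattern, i.e. the kind of combination where the load permutation now has to be materialized before the reduction instead of being merged into it.

double sum_even_minus_odd(const double* a, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s = s + a[2 * i] - a[2 * i + 1];   // ordered FP sum over permuted lanes
  return s;
}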
2023-06-26 | RISC-V: Remove duplicated extern function_base decl | Pan Li | 1 | -5/+0
Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.h: Remove duplicated decl.
2023-06-26 | narrowing initializers and initializer_constant_valid_p_1 | Richard Biener | 1 | -0/+2
initializer_constant_valid_p_1 attempts to handle narrowing differences and sums but fails to handle when the overall value looks like VIEW_CONVERT_EXPR<long long int>(NON_LVALUE_EXPR <v> - VEC_COND_EXPR < { 0, 0 } == { 0, 0 } , { -1, -1 } , { 0, 0 } > ) where endtype is scalar integer but value is a vector type. In this particular case all is good and we recurse since two vector lanes is more than 64bits of long long. But still it compares apples and oranges. Fixed by appropriately also requiring the type of the value to be scalar integral. * varasm.cc (initializer_constant_valid_p_1): Also constrain the type of value to be scalar integral before dispatching to narrowing_initializer_constant_valid_p.
2023-06-26 | Avoid shorten_binary_op on VECTOR_TYPE | Richard Biener | 1 | -0/+4
When we disallow TYPE_PRECISION on VECTOR_TYPEs it shows that shorten_binary_op performs some checks on it that are likely harmless in the end. The following bails out early for VECTOR_TYPE operations to avoid those questionable checks. gcc/c-family/ * c-common.cc (shorten_binary_op): Exit early for VECTOR_TYPE operations.
2023-06-26 | Fix TYPE_PRECISION use in hashable_expr_equal_p | Richard Biener | 1 | -1/+1
While the checks look unnecessary they probably are quick and thus done early. The following avoids using TYPE_PRECISION on VECTOR_TYPEs by making the code match the comment which talks about precision and signedness. An alternative would be to only retain the ERROR_MARK and TYPE_MODE checks or use TYPE_PRECISION_RAW (but I like that least). * tree-ssa-scopedtables.cc (hashable_expr_equal_p): Use element_precision.
2023-06-26 | RISC-V: Remove redundant vcond patterns | Juzhe-Zhong | 3 | -61/+0
Previously, Richi suggested that vcond patterns are only needed when the target supports comparison + select in a single instruction. Now, having experimented with removing those "vcond" patterns, it works perfectly: all testcases PASS. Really appreciate Richi helping us recognize this issue. Now remove all "vcond" patterns as Richi suggested. gcc/ChangeLog: * config/riscv/autovec.md (vcond<V:mode><VI:mode>): Remove redundant vcond patterns. (vcondu<V:mode><VI:mode>): Ditto. * config/riscv/riscv-protos.h (expand_vcond): Ditto. * config/riscv/riscv-v.cc (expand_vcond): Ditto.
2023-06-26 | tree-optimization/110392 - ICE with predicate analysis | Richard Biener | 1 | -2/+2
Feeding in unoptimized IL can result in predicate normalization simplifying things, so a predicate can become true or false. The following re-orders the early exit in that case to come after simplification and normalization to take care of that. PR tree-optimization/110392 * gimple-predicate-analysis.cc (uninit_analysis::is_use_guarded): Do early exits on true/false predicate only after normalization.
2023-06-26 | SCCVN: Fix repeating variable name "len" | Ju-Zhe Zhong | 1 | -7/+7
Line 3292 has a variable named "len": tree mask = NULL_TREE, len = NULL_TREE, bias = NULL_TREE; Line 3349 also has a variable named "len": HOST_WIDE_INT start = 0, len = 0; Since they are never used simultaneously, this issue has gone unnoticed so far. However, I want to add LEN_MASK_{LOAD,STORE}, which will need both variables, so fix the naming in this path. Change HOST_WIDE_INT start = 0, len = 0; into HOST_WIDE_INT start = 0, length = 0; gcc/ChangeLog: * tree-ssa-sccvn.cc (vn_reference_lookup_3): Change name "len" into "length".
2023-06-26 | i386: New *ashl<dwi>3_doubleword_highpart define_insn_and_split. | Roger Sayle | 3 | -0/+67
This patch contains a pair of (related) optimizations in i386.md that allow us to generate better code for the example below (this is a step towards fixing a bugzilla PR, but I've forgotten the number). __int128 foo64(__int128 x, long long y) { __int128 t = (__int128)y << 64; return x ^ t; } The hidden issue is that the RTL currently seen by reload contains the sign extension of y from DImode to TImode, even though this is dead (not required) for left shifts by more than WORD_SIZE bits. (insn 11 8 12 2 (parallel [ (set (reg:TI 0 ax [orig:91 y ] [91]) (sign_extend:TI (reg:DI 1 dx [97]))) (clobber (reg:CC 17 flags)) (clobber (scratch:DI)) ]) {extendditi2} What makes this particularly undesirable is that the sign-extension pattern above requires an additional DImode scratch register, indicated by the clobber, which unnecessarily increases register pressure. The proposed solution is to add a define_insn_and_split for such left shifts (of sign or zero extensions) that only have a non-zero highpart, where the extension is redundant and eliminated, that can be split after reload, without scratch registers or early clobbers. This (late split) exposes a second optimization opportunity where setting the lowpart to zero can sometimes be combined/simplified with the following instruction during peephole2. For the test case above, we previously generated with -O2: foo64: xorl %eax, %eax xorq %rsi, %rdx xorq %rdi, %rax ret with this patch, we now generate: foo64: movq %rdi, %rax xorq %rsi, %rdx ret Likewise for the related -m32 test case, we go from: foo32: movl 12(%esp), %eax movl %eax, %edx xorl %eax, %eax xorl 8(%esp), %edx xorl 4(%esp), %eax ret to the improved: foo32: movl 12(%esp), %edx movl 4(%esp), %eax xorl 8(%esp), %edx ret 2023-06-26 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386.md (peephole2): Simplify zeroing a register followed by an IOR, XOR or PLUS operation on it, into a move. (*ashl<dwi>3_doubleword_highpart): New define_insn_and_split to eliminate (and hide from reload) unnecessary word to doubleword extensions that are followed by left shifts by sufficiently large, but valid, bit counts. gcc/testsuite/ChangeLog * gcc.target/i386/ashldi3-1.c: New 32-bit test case. * gcc.target/i386/ashlti3-2.c: New 64-bit test case.
2023-06-26 | Use cvt_op to save intermediate type operand instead of "subtle" vec_dest. | liuhongt | 2 | -4/+30
When there are multiple operands in vec_oprnds0, vec_dest will be overwritten to vectype_out, but in the multi_step_cvt case cvt_type is expected. This caused an ICE in verify_gimple_in_cfg. gcc/ChangeLog: PR tree-optimization/110371 PR tree-optimization/110018 * tree-vect-stmts.cc (vectorizable_conversion): Use cvt_op to save intermediate type operand instead of "subtle" vec_dest for case NONE. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr110371.c: New test.
2023-06-26 | Don't use intermediate type for FIX_TRUNC_EXPR when -ftrapping-math. | liuhongt | 3 | -3/+4
> > Hmm, good question. GENERIC has a direct truncation to unsigned char > > for example, the C standard generally says if the integral part cannot > > be represented then the behavior is undefined. So I think we should be > > safe here (0x1.0p32 doesn't fit an int). > > We should be following Annex F (unspecified value plus "invalid" exception > for out-of-range floating-to-integer conversions rather than undefined > behavior). But we don't achieve that very well at present (see bug 93806 > comments 27-29 for examples of how such conversions produce wobbly > values). That would mean guarding this with !flag_trapping_math would be the appropriate thing to do. gcc/ChangeLog: PR tree-optimization/110371 PR tree-optimization/110018 * tree-vect-stmts.cc (vectorizable_conversion): Don't use intermediate type for FIX_TRUNC_EXPR when -ftrapping-math. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110018-1.c: Add -fno-trapping-math to dg-options. * gcc.target/i386/pr110018-2.c: Ditto.
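A hedged illustration of the hazard being discussed (not one of the pr110018 testcases): a floating-point to integer conversion whose value does not fit the destination type. Under Annex F this raises the "invalid" exception and yields an unspecified value, so rewriting the conversion through a different intermediate integer type is only safe with -fno-trapping-math.

unsigned char trunc_to_uchar(double d)
{
  // For d = 0x1.0p32 the value fits neither unsigned char nor int, so an
  // intermediate int conversion could change what the user observes.
  return (unsigned char) d;
}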
2023-06-26 | i386: Sync tune_string with arch_string for target attribute arch=* | Hongyu Wang | 2 | -1/+16
For function with target attribute arch=*, current logic will set its tune to -mtune from command line so all target_clones will get same tuning flags which would affect the performance for each clone. Override tune with arch if tune was not explicitly specified to get proper tuning flags for target_clones. gcc/ChangeLog: * config/i386/i386-options.cc (ix86_valid_target_attribute_tree): Override tune_string with arch_string if tune_string is not explicitly specified. gcc/testsuite/ChangeLog: * gcc.target/i386/mvc17.c: New test.
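A hedged example of the scenario described above (the function and arch names are illustrative): with target_clones, each clone carries its own arch=... selection, and after this change its tuning follows that arch unless -mtune was given explicitly on the command line.

__attribute__((target_clones("arch=icelake-server", "arch=znver3", "default")))
int sum(const int* a, int n)
{
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[i];                       // each clone is tuned for its own arch
  return s;
}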
2023-06-26 | RISC-V: Fix one test failure of dg config. | Juzhe-Zhong | 1 | -1/+1
gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/vlmul_ext-2.c: Add -Wno-psabi for dg.
2023-06-26 | d: Suboptimal codegen for __builtin_expect(cond, false) | Iain Buclaw | 2 | -12/+41
Since PR96435, both boolean objects and expressions have been evaluated in the following way. (*(ubyte*)&obj_or_expr) & 1 It has been noted that sometimes this can cause the back-end to optimize in non-obvious ways - in particular with __builtin_expect. This @safe feature is now restricted to just when reading the value of a bool field that comes from a union. PR d/110359 gcc/d/ChangeLog: * d-convert.cc (convert_for_rvalue): Only apply the @safe boolean conversion to boolean fields of a union. (convert_for_condition): Call convert_for_rvalue in the default case. gcc/testsuite/ChangeLog: * gdc.dg/pr110359.d: New test.
2023-06-26 | Daily bump. | GCC Administrator | 6 | -1/+203
2023-06-26 | d: Merge upstream dmd, druntime a45f4e9f43, phobos 106038f2e. | Iain Buclaw | 46 | -207/+435
D front-end changes: - Import dmd v2.103.1. - Deprecated invalid special token sequences inside token strings. D runtime changes: - Import druntime v2.103.1. Phobos changes: - Import phobos v2.103.1. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd a45f4e9f43. * dmd/VERSION: Bump version to v2.103.1. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime a45f4e9f43. * src/MERGE: Merge upstream phobos 106038f2e.
2023-06-25 | RISC-V: Optimize VSETVL codegen of SELECT_VL with LEN_MASK_{LOAD, STORE} | Juzhe-Zhong | 4 | -4/+76
This patch is depending on LEN_MASK_{LOAD,STORE} patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622742.html After enabling the LEN_MASK_{LOAD,STORE}, I notice that there is a case that VSETVL PASS need to be optimized: void f (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict cond, int n) { for (int i = 0; i < 8; i++) if (cond[i]) a[i] = b[i]; } Before this patch: f: vsetivli a5,8,e8,mf4,tu,mu --> Propagate "8" to the following vsetvl vsetvli zero,a5,e32,m1,ta,ma vle32.v v0,0(a2) vsetvli a6,zero,e32,m1,ta,ma li a3,8 vmsne.vi v0,v0,0 vsetvli zero,a5,e32,m1,ta,ma vle32.v v1,0(a1),v0.t vse32.v v1,0(a0),v0.t sub a4,a3,a5 beq a3,a5,.L6 slli a5,a5,2 add a2,a2,a5 add a1,a1,a5 add a0,a0,a5 vsetvli a5,a4,e8,mf4,tu,mu --> Propagate "a4" to the following vsetvl vsetvli zero,a5,e32,m1,ta,ma vle32.v v0,0(a2) vsetvli a6,zero,e32,m1,ta,ma vmsne.vi v0,v0,0 vsetvli zero,a5,e32,m1,ta,ma vle32.v v1,0(a1),v0.t vse32.v v1,0(a0),v0.t .L6: ret Current VSETLV PASS only enable AVL propagation of VLMAX AVL ("zero"). Now, we enable AVL propagation of immediate && conservative non-VLMAX. After this patch: f: vsetivli a5,8,e8,mf4,ta,ma vle32.v v0,0(a2) vsetvli a6,zero,e32,m1,ta,ma li a3,8 vmsne.vi v0,v0,0 vsetivli zero,8,e32,m1,ta,ma vle32.v v1,0(a1),v0.t vse32.v v1,0(a0),v0.t sub a4,a3,a5 beq a3,a5,.L6 slli a5,a5,2 vsetvli a4,a4,e8,mf4,ta,ma add a2,a2,a5 vle32.v v0,0(a2) add a1,a1,a5 vsetvli a6,zero,e32,m1,ta,ma add a0,a0,a5 vmsne.vi v0,v0,0 vsetvli zero,a4,e32,m1,ta,ma vle32.v v1,0(a1),v0.t vse32.v v1,0(a0),v0.t .L6: ret gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (vector_insn_info::parse_insn): Ehance AVL propagation. * config/riscv/riscv-vsetvl.h: New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: Add dump checks. * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: New test.
2023-06-25 | RISC-V: fix expand function of vlmul_ext RVV intrinsic | Li Xu | 2 | -1/+9
Consider this following case: void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) { vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1); } Compilation fails with: test.c: In function 'test_vlmul_ext_v_i8mf8_i8mf4': test.c:5:1: error: unrecognizable insn: 5 | } | ^ (insn 30 29 0 2 (set (mem/c:VNx2QI (reg/f:DI 143) [0 x+0 S[2, 2] A32]) (mem/c:VNx2QI (reg/f:DI 148) [0 op1+0 S[2, 2] A16])) "test.c":4:18 -1 (nil)) during RTL pass: vregs test.c:5:1: internal compiler error: in extract_insn, at recog.cc:2791 0x7c61b8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) ../.././riscv-gcc/gcc/rtl-error.cc:108 0x7c61d7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) ../.././riscv-gcc/gcc/rtl-error.cc:116 0xed58a7 extract_insn(rtx_insn*) ../.././riscv-gcc/gcc/recog.cc:2791 0xb7f789 instantiate_virtual_regs_in_insn ../.././riscv-gcc/gcc/function.cc:1611 0xb7f789 instantiate_virtual_regs ../.././riscv-gcc/gcc/function.cc:1984 gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: change emit_insn to emit_move_insn gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/vlmul_ext-2.c: New test.
2023-06-25 | RISC-V: Enable len_mask{load, store} and remove len_{load, store} | Juzhe-Zhong | 12 | -15/+346
This patch enable len_mask_{load,store} to support flow-control in RVV auto-vectorization. Consider this following case: void f (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict cond, int n) { for (int i = 0; i < n; i++) if (cond[i]) a[i] = b[i]; } Before this patch: <source>:9:21: missed: couldn't vectorize loop <source>:9:21: missed: not vectorized: control flow in loop. After this patch: f: ble a3,zero,.L5 .L3: vsetvli a5,a3,e32,m1,ta,ma vle32.v v0,0(a2) vsetvli a6,zero,e32,m1,ta,ma slli a4,a5,2 vmsne.vi v0,v0,0 sub a3,a3,a5 vsetvli zero,a5,e32,m1,ta,ma vle32.v v1,0(a1),v0.t vse32.v v1,0(a0),v0.t add a2,a2,a4 add a1,a1,a4 add a0,a0,a4 bne a3,zero,.L3 .L5: ret gcc/ChangeLog: * config/riscv/autovec.md (len_load_<mode>): Remove. (len_maskload<mode><vm>): Remove. (len_store_<mode>): New pattern. (len_maskstore<mode><vm>): New pattern. * config/riscv/predicates.md (autovec_length_operand): New predicate. * config/riscv/riscv-protos.h (enum insn_type): New enum. (expand_load_store): New function. * config/riscv/riscv-v.cc (emit_vlmax_masked_insn): Ditto. (emit_nonvlmax_masked_insn): Ditto. (expand_load_store): Ditto. * config/riscv/riscv-vector-builtins.cc (function_expander::use_contiguous_store_insn): Add avl_type operand into pred_store. * config/riscv/vector.md: Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/single_rgroup-2.c: New test. * gcc.target/riscv/rvv/autovec/partial/single_rgroup-2.h: New test. * gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.c: New test. * gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h: New test. * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-2.c: New test. * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c: New test.
2023-06-25 | internal-fn: Fix bug of BIAS argument index | Ju-Zhe Zhong | 1 | -1/+1
When trying to enable LEN_MASK_{LOAD,STORE} in the RISC-V port, I found I had made a mistake with the argument index of BIAS. This patch is an obvious fix. gcc/ChangeLog: * internal-fn.cc (expand_partial_store_optab_fn): Fix bug of BIAS argument index.
2023-06-25 | MAINTAINERS: Add myself to write after approval | Lehua Ding | 1 | -0/+1
ChangeLog: * MAINTAINERS: Add Lehua Ding to write after approval
2023-06-25 | configure, Darwin: Ensure overrides to host-pie are passed to gcc configure. | Iain Sandoe | 4 | -35/+89
The latest versions of Darwin on the Aarch64 platform mandate PIE executables. On x86_64 it remains optional, but produces tool warnings after Darwin20, so we default to PIE executables there too. All (non-PowerPC) 64b Darwin platforms mandate PIC code and therefore force host_shared on (we issue a diagnostic if the user tries to configure them non-shared). However, this also means we cannot test the host_shared setting independently of the host_pie setting so that the logic for setting PICFLAG must be amended for Darwin. For Darwin versions required to have PIE executables, in the event that the user tries to configure these as --disable-host-pie, we issue a warning and override the setting. These versions must also switch host_pie on even if it is not given in the configure line. To cater for this we pass the current value of host_pie, as determined by top-level configure, to the GCC configure. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> ChangeLog: * Makefile.def: Pass the enable-host-pie value to GCC configure. * Makefile.in: Regenerate. * configure: Regenerate. * configure.ac: Adjust the logic for shared and PIE host flags to ensure that PIE is passed for hosts that require it.
2023-06-25 | Revert "RISC-V:Add float16 tuple type abi" | Pan Li | 9 | -630/+17
This reverts commit f9ab5d62c94547499de52c800ab914cc8e802212 due to the bootstrap failure on machine mode out of range memory access. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/vector.md: Revert. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-10.c: Revert. * gcc.target/riscv/rvv/base/abi-11.c: Ditto. * gcc.target/riscv/rvv/base/abi-12.c: Ditto. * gcc.target/riscv/rvv/base/abi-15.c: Ditto. * gcc.target/riscv/rvv/base/abi-8.c: Ditto. * gcc.target/riscv/rvv/base/abi-9.c: Ditto. * gcc.target/riscv/rvv/base/abi-17.c: Ditto. * gcc.target/riscv/rvv/base/abi-18.c: Ditto.
2023-06-25 | Revert "RISC-V:Add float16 tuple type support" | Pan Li | 12 | -366/+3
This reverts commit 8a96f240d71d367a2955ab9e0f0fef3a0b0e2a74 due to bootstrap failure on mode out of range access, will commit this patch after the issue addressed. gcc/ChangeLog: * config/riscv/genrvv-type-indexer.cc (valid_type): Revert changes. * config/riscv/riscv-modes.def (RVV_TUPLE_MODES): Ditto. (ADJUST_ALIGNMENT): Ditto. (RVV_TUPLE_PARTIAL_MODES): Ditto. (ADJUST_NUNITS): Ditto. * config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t): Ditto. (vfloat16mf4x3_t): Ditto. (vfloat16mf4x4_t): Ditto. (vfloat16mf4x5_t): Ditto. (vfloat16mf4x6_t): Ditto. (vfloat16mf4x7_t): Ditto. (vfloat16mf4x8_t): Ditto. (vfloat16mf2x2_t): Ditto. (vfloat16mf2x3_t): Ditto. (vfloat16mf2x4_t): Ditto. (vfloat16mf2x5_t): Ditto. (vfloat16mf2x6_t): Ditto. (vfloat16mf2x7_t): Ditto. (vfloat16mf2x8_t): Ditto. (vfloat16m1x2_t): Ditto. (vfloat16m1x3_t): Ditto. (vfloat16m1x4_t): Ditto. (vfloat16m1x5_t): Ditto. (vfloat16m1x6_t): Ditto. (vfloat16m1x7_t): Ditto. (vfloat16m1x8_t): Ditto. (vfloat16m2x2_t): Ditto. (vfloat16m2x3_t): Diito. (vfloat16m2x4_t): Diito. (vfloat16m4x2_t): Diito. * config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Ditto. (vfloat16mf4x3_t): Ditto. (vfloat16mf4x4_t): Ditto. (vfloat16mf4x5_t): Ditto. (vfloat16mf4x6_t): Ditto. (vfloat16mf4x7_t): Ditto. (vfloat16mf4x8_t): Ditto. (vfloat16mf2x2_t): Ditto. (vfloat16mf2x3_t): Ditto. (vfloat16mf2x4_t): Ditto. (vfloat16mf2x5_t): Ditto. (vfloat16mf2x6_t): Ditto. (vfloat16mf2x7_t): Ditto. (vfloat16mf2x8_t): Ditto. (vfloat16m1x2_t): Ditto. (vfloat16m1x3_t): Ditto. (vfloat16m1x4_t): Ditto. (vfloat16m1x5_t): Ditto. (vfloat16m1x6_t): Ditto. (vfloat16m1x7_t): Ditto. (vfloat16m1x8_t): Ditto. (vfloat16m2x2_t): Ditto. (vfloat16m2x3_t): Ditto. (vfloat16m2x4_t): Ditto. (vfloat16m4x2_t): Ditto. * config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto. * config/riscv/riscv.md: Ditto. * config/riscv/vector-iterators.md: Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/tuple-28.c: Removed. * gcc.target/riscv/rvv/base/tuple-29.c: Removed. * gcc.target/riscv/rvv/base/tuple-30.c: Removed. * gcc.target/riscv/rvv/base/tuple-31.c: Removed. * gcc.target/riscv/rvv/base/tuple-32.c: Removed. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-06-25 | GIMPLE_FOLD: Apply LEN_MASK_{LOAD,STORE} into GIMPLE_FOLD | Ju-Zhe Zhong | 1 | -5/+18
Hi, since we are going to have LEN_MASK_{LOAD,STORE} in the loop vectorizer: currently, 1. we can fold MASK_{LOAD,STORE} into MEM when the mask is all ones; 2. we can fold LEN_{LOAD,STORE} into MEM when (len - bias) is VF. Now, I think it makes sense to support folding LEN_MASK_{LOAD,STORE} into MEM when both the mask is all ones and (len - bias) is VF. gcc/ChangeLog: * gimple-fold.cc (arith_overflowed_p): Apply LEN_MASK_{LOAD,STORE}. (gimple_fold_partial_load_store_mem_ref): Ditto. (gimple_fold_partial_store): Ditto. (gimple_fold_call): Ditto.
2023-06-25 | Refine maskloadmn pattern with UNSPEC_MASKLOAD. | liuhongt | 2 | -14/+28
If mem_addr points to a memory region with fewer than whole-vector-size bytes of accessible memory, and k is a mask that would prevent reading the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent the load from being transformed into vpblendd. gcc/ChangeLog: PR target/110309 * config/i386/sse.md (maskload<mode><avx512fmaskmodelower>): Refine pattern with UNSPEC_MASKLOAD. (maskload<mode><avx512fmaskmodelower>): Ditto. (*<avx512>_load<mode>_mask): Extend mode iterator to VI12HFBF_AVX512VL. (*<avx512>_load<mode>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110309.c: New test.
2023-06-25 | SSA ALIAS: Apply LEN_MASK_STORE to 'ref_maybe_used_by_call_p_1' | Ju-Zhe Zhong | 1 | -0/+1
gcc/ChangeLog: * tree-ssa-alias.cc (call_may_clobber_ref_p_1): Add LEN_MASK_STORE.
2023-06-25 | SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis | Ju-Zhe Zhong | 1 | -0/+2
gcc/ChangeLog: * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Apply LEN_MASK_{LOAD,STORE}
2023-06-25 | RISC-V:Add float16 tuple type abi | yulong | 9 | -17/+630
gcc/ChangeLog: * config/riscv/vector.md: Add float16 attr at sew, vlmul and ratio. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case. * gcc.target/riscv/rvv/base/abi-11.c: Ditto. * gcc.target/riscv/rvv/base/abi-12.c: Ditto. * gcc.target/riscv/rvv/base/abi-15.c: Ditto. * gcc.target/riscv/rvv/base/abi-8.c: Ditto. * gcc.target/riscv/rvv/base/abi-9.c: Ditto. * gcc.target/riscv/rvv/base/abi-17.c: New test. * gcc.target/riscv/rvv/base/abi-18.c: New test.
2023-06-25 | Daily bump. | GCC Administrator | 5 | -1/+128
2023-06-24 | i386: Add alternate representation for {and,or,xor}b %ah,%dh. | Roger Sayle | 1 | -0/+22
A patch that I'm working on to improve RTL simplifications in the middle-end results in the regression of pr78904-1b.c, due to changes in the canonical representation of high-byte (%ah, %bh, %ch, %dh) logic. See also PR target/78904. This patch avoids/prevents those failures by adding support for the alternate representation, duplicating the existing *<code>qi_ext<mode>_2 as *<code>qi_ext<mode>_3 (the new version also replacing any_or with any_logic to provide *andqi_ext<mode>_3 in the same pattern). Removing the original pattern isn't trivial, as it's generated by define_split, but this can be investigated after the other pieces are approved. The current representation of this instruction is: (set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ]) (const_int 8 [0x8]) (const_int 8 [0x8])) (subreg:DI (xor:QI (subreg:QI (zero_extract:DI (reg:DI 94) (const_int 8 [0x8]) (const_int 8 [0x8])) 0) (subreg:QI (zero_extract:DI (reg/v:DI 87 [ aD.2763 ]) (const_int 8 [0x8]) (const_int 8 [0x8])) 0)) 0)) after my proposed middle-end improvement, we attempt to recognize: (set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ]) (const_int 8 [0x8]) (const_int 8 [0x8])) (zero_extract:DI (xor:DI (reg:DI 94) (reg/v:DI 87 [ aD.2763 ])) (const_int 8 [0x8]) (const_int 8 [0x8]))) 2023-06-24 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386.md (*<code>qi_ext<mode>_3): New define_insn.
2023-06-24 | Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360] | Harald Anlauf | 1 | -8/+13
gcc/fortran/ChangeLog: PR fortran/110360 * trans-expr.cc (gfc_conv_procedure_call): Truncate constant string argument of length > 1 passed to scalar CHARACTER(1),VALUE dummy.
2023-06-24 | RISC-V: Refactor the integer ternary autovec pattern | Juzhe-Zhong | 1 | -26/+28
A long time ago I encountered an ICE when trying to set the clobber register to Pmode, and I have forgotten the reason. So I clobbered an SI scratch and used PUT_MODE to make it Pmode after reload, which made the patterns look unreasonable. Following Jeff's comments I tried it again; it now works when we set the clobber register to Pmode, and the patterns look more reasonable. All tests pass. Ok for trunk? gcc/ChangeLog: * config/riscv/autovec.md (*fma<mode>): set clobber to Pmode in expand stage. (*fma<VI:mode><P:mode>): Ditto. (*fnma<mode>): Ditto. (*fnma<VI:mode><P:mode>): Ditto.
2023-06-24 | RISC-V: Support RVV floating-point auto-vectorization | Juzhe-Zhong | 40 | -34/+1386
This patch adds RVV floating-point auto-vectorization. Also, fix attribute bug of floating-point ternary operations in vector.md. gcc/ChangeLog: * config/riscv/autovec.md (fma<mode>4): New pattern. (*fma<mode>): Ditto. (fnma<mode>4): Ditto. (*fnma<mode>): Ditto. (fms<mode>4): Ditto. (*fms<mode>): Ditto. (fnms<mode>4): Ditto. (*fnms<mode>): Ditto. * config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New function. * config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto. * config/riscv/vector.md: Fix attribute bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: Adjust tests. * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop-10.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-11.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-12.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-7.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-8.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop-9.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-10.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-11.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-12.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-7.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-8.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run-9.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c: New test.
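A hedged example of the kind of loop the new fma/fnma/fms/fnms expanders are meant to auto-vectorize on RVV (the function is illustrative, not one of the listed testcases):

void fmacc(float* __restrict out, const float* __restrict a,
           const float* __restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] += a[i] * b[i];           // candidate for a vector fused multiply-add
}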
2023-06-24 | LOOP IVOPTS: Apply LEN_MASK_{LOAD,STORE} | Ju-Zhe Zhong | 1 | -3/+11
Hi, Jeff. I fixed the formatting as you suggested. Ok for trunk? gcc/ChangeLog: * tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Apply LEN_MASK_{LOAD,STORE}.
2023-06-24 | IVOPTS: Add LEN_MASK_{LOAD, STORE} into 'get_alias_ptr_type_for_ptr_address' | Ju-Zhe Zhong | 1 | -0/+2
gcc/ChangeLog: * tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Add LEN_MASK_{LOAD,STORE}.