Age  Commit message  Author  Files  Lines
2023-07-31  combine: Narrow comparison of memory and constant  Stefan Schulze Frielinghaus  8  -5/+200
Comparisons between memory and constants might be done in a smaller mode resulting in smaller constants which might finally end up as immediates instead of in the literal pool. For example, on s390x a non-symmetric comparison like x <= 0x3fffffffffffffff results in the constant being spilled to the literal pool and an 8 byte memory comparison is emitted. Ideally, an equivalent comparison x0 <= 0x3f where x0 is the most significant byte of x, is emitted where the constant is smaller and more likely to materialize as an immediate. Similarly, comparisons of the form x >= 0x4000000000000000 can be shortened into x0 >= 0x40. gcc/ChangeLog: * combine.cc (simplify_compare_const): Narrow comparison of memory and constant. (try_combine): Adapt new function signature. (simplify_comparison): Adapt new function signature. gcc/testsuite/ChangeLog: * gcc.dg/cmp-mem-const-1.c: New test. * gcc.dg/cmp-mem-const-2.c: New test. * gcc.dg/cmp-mem-const-3.c: New test. * gcc.dg/cmp-mem-const-4.c: New test. * gcc.dg/cmp-mem-const-5.c: New test. * gcc.dg/cmp-mem-const-6.c: New test. * gcc.target/s390/cmp-mem-const-1.c: New test.
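The equivalence the message relies on can be checked with a small sketch. It assumes an unsigned 64-bit x and extracts the most significant byte with a shift rather than a byte load, so it is independent of endianness; the function names are hypothetical and are not part of combine.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Full-width form: needs the 8-byte constant 0x3fffffffffffffff.  */
bool
le_wide (uint64_t x)
{
  return x <= UINT64_C (0x3fffffffffffffff);
}

/* Narrowed form: every bit below the top byte of the constant is set,
   so only the most significant byte x0 decides the result.  */
bool
le_narrow (uint64_t x)
{
  uint8_t x0 = (uint8_t) (x >> 56);
  return x0 <= 0x3f;
}

/* Full-width form of the second example in the message.  */
bool
ge_wide (uint64_t x)
{
  return x >= UINT64_C (0x4000000000000000);
}

/* Narrowed form: every bit below the top byte of the constant is
   clear, so again only the most significant byte matters.  */
bool
ge_narrow (uint64_t x)
{
  uint8_t x0 = (uint8_t) (x >> 56);
  return x0 >= 0x40;
}
```

The narrowing is only valid because the low 56 bits of the constants are all ones (for <=) or all zeros (for >=); an arbitrary constant would not allow it.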
2023-07-31  RISC-V: Drop unused variable  Kito Cheng  1  -2/+0
gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vec_series): Drop unused variable. (expand_vector_init_insert_elems): Ditto.
2023-07-31  AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]  Hao Liu  3  -4/+83
The new costs should only count reduction latency by multiplying count for single_defuse_cycle. For other situations, this will increase the reduction latency a lot and miss vectorization opportunities. Tested on aarch64-linux-gnu. gcc/ChangeLog: PR target/110625 * config/aarch64/aarch64.cc (count_ops): Only '* count' for single_defuse_cycle while counting reduction_latency. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr110625_1.c: New testcase. * gcc.target/aarch64/pr110625_2.c: New testcase.
2023-07-31  internal-fn: Refine macro define of COND_* and COND_LEN_* internal functions  Ju-Zhe Zhong  1  -67/+56
Hi, Richard and Richi.

Based on previous discussions, we should make COND_* and COND_LEN_* consistent. So this patch defines these internal functions together using these 2 wrappers:

#define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) \
  DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB, cond_##TYPE) \
  DEF_INTERNAL_OPTAB_FN (COND_LEN_##NAME, FLAGS, cond_len_##OPTAB, \
                         cond_len_##TYPE)

#define DEF_INTERNAL_SIGNED_COND_FN(NAME, FLAGS, SELECTOR, SIGNED_OPTAB, \
                                    UNSIGNED_OPTAB, TYPE) \
  DEF_INTERNAL_SIGNED_OPTAB_FN (COND_##NAME, FLAGS, SELECTOR, \
                                cond_##SIGNED_OPTAB, cond_##UNSIGNED_OPTAB, \
                                cond_##TYPE) \
  DEF_INTERNAL_SIGNED_OPTAB_FN (COND_LEN_##NAME, FLAGS, SELECTOR, \
                                cond_len_##SIGNED_OPTAB, \
                                cond_len_##UNSIGNED_OPTAB, cond_len_##TYPE)

Bootstrap and regression testing on x86 passed. OK for trunk?

gcc/ChangeLog: * internal-fn.def (DEF_INTERNAL_COND_FN): New macro. (DEF_INTERNAL_SIGNED_COND_FN): Ditto. (COND_ADD): Remove. (COND_SUB): Ditto. (COND_MUL): Ditto. (COND_DIV): Ditto. (COND_MOD): Ditto. (COND_RDIV): Ditto. (COND_MIN): Ditto. (COND_MAX): Ditto. (COND_FMIN): Ditto. (COND_FMAX): Ditto. (COND_AND): Ditto. (COND_IOR): Ditto. (COND_XOR): Ditto. (COND_SHL): Ditto. (COND_SHR): Ditto. (COND_FMA): Ditto. (COND_FMS): Ditto. (COND_FNMA): Ditto. (COND_FNMS): Ditto. (COND_NEG): Ditto. (COND_LEN_ADD): Ditto. (COND_LEN_SUB): Ditto. (COND_LEN_MUL): Ditto. (COND_LEN_DIV): Ditto. (COND_LEN_MOD): Ditto. (COND_LEN_RDIV): Ditto. (COND_LEN_MIN): Ditto. (COND_LEN_MAX): Ditto. (COND_LEN_FMIN): Ditto. (COND_LEN_FMAX): Ditto. (COND_LEN_AND): Ditto. (COND_LEN_IOR): Ditto. (COND_LEN_XOR): Ditto. (COND_LEN_SHL): Ditto. (COND_LEN_SHR): Ditto. (COND_LEN_FMA): Ditto. (COND_LEN_FMS): Ditto. (COND_LEN_FNMA): Ditto. (COND_LEN_FNMS): Ditto. (COND_LEN_NEG): Ditto. (ADD): New macro define. (SUB): Ditto. (MUL): Ditto. (DIV): Ditto. (MOD): Ditto. (RDIV): Ditto. (MIN): Ditto. (MAX): Ditto. (FMIN): Ditto. (FMAX): Ditto. (AND): Ditto. (IOR): Ditto. (XOR): Ditto. (SHL): Ditto. (SHR): Ditto. (FMA): Ditto. (FMS): Ditto. (FNMA): Ditto. (FNMS): Ditto. (NEG): Ditto.
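The wrapper technique itself, one macro that emits a COND_ entry and its COND_LEN_ twin, can be illustrated in isolation. The sketch below is not internal-fn.def: DEMO_FN, DEMO_COND_FN and the enum are hypothetical stand-ins, and the stand-in for DEF_INTERNAL_OPTAB_FN merely emits an enumerator.

```c
/* Hypothetical stand-in for DEF_INTERNAL_OPTAB_FN: emit one enumerator
   per internal function and ignore the optab name.  */
#define DEMO_FN(CODE, OPTAB) IFN_##CODE,

/* One wrapper produces both the COND_ and the COND_LEN_ variant, so the
   two lists cannot drift apart.  */
#define DEMO_COND_FN(NAME, OPTAB) \
  DEMO_FN (COND_##NAME, cond_##OPTAB) \
  DEMO_FN (COND_LEN_##NAME, cond_len_##OPTAB)

enum demo_internal_fn
{
  DEMO_COND_FN (ADD, add)
  DEMO_COND_FN (SUB, sub)
  IFN_LAST
};
```

Each DEMO_COND_FN line expands to two enumerators, so adding an operation in one place adds both variants.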
2023-07-31  Use substituted GDCFLAGS  Andreas Schwab  3  -1/+3
Use the substituted value for GDCFLAGS instead of hardcoding $(CFLAGS) so that the subdir configure scripts use the configured value. * configure.ac (GDCFLAGS): Set default from ${CFLAGS}. * configure: Regenerate. * Makefile.in (GDCFLAGS): Substitute @GDCFLAGS@.
2023-07-31  [Committed] PR target/110843: Check TARGET_AVX512VL for V2DI rotates in STV.  Roger Sayle  2  -3/+23
This patch resolves PR target/110843, an ICE caused by my enhancement to support AVX512 DImode and SImode rotates in the scalar-to-vector (STV) pass. Although the vprotate instructions are available on all TARGET_AVX512F microarchitectures, the V2DI and V4SI variants are only available on the TARGET_AVX512VL subset, leading to problems when command line options enable AVX512 (i.e. AVX512F) but not the required AVX512VL functionality. The simple fix is to update/correct the target checks. 2023-07-31 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110843 * config/i386/i386-features.cc (compute_convert_gain): Check TARGET_AVX512VL (not TARGET_AVX512F) when considering V2DImode and V4SImode rotates in STV. (general_scalar_chain::convert_rotate): Likewise. gcc/testsuite/ChangeLog PR target/110843 * gcc.target/i386/pr110843.c: New test case.
2023-07-31  RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC  Kito Cheng  4  -40/+37
We always want get_mask_mode to return a valid mode; it is a bug if it fails, so the `.require ()` can move into get_mask_mode instead of being called at every call site. The only exception is riscv_get_mask_mode, which might pass an unsupported mode into get_mask_mode, so a check with riscv_v_ext_mode_p was added to make sure only valid vector modes are passed to get_mask_mode. gcc/ChangeLog: * config/riscv/autovec.md (abs<mode>2): Remove `.require ()`. * config/riscv/riscv-protos.h (get_mask_mode): Update return type. * config/riscv/riscv-v.cc (rvv_builder::rvv_builder): Remove `.require ()`. (emit_vlmax_insn): Ditto. (emit_vlmax_fp_insn): Ditto. (emit_vlmax_ternary_insn): Ditto. (emit_vlmax_fp_ternary_insn): Ditto. (emit_nonvlmax_fp_ternary_tu_insn): Ditto. (emit_nonvlmax_insn): Ditto. (emit_vlmax_slide_insn): Ditto. (emit_nonvlmax_slide_tu_insn): Ditto. (emit_vlmax_merge_insn): Ditto. (emit_vlmax_masked_insn): Ditto. (emit_nonvlmax_masked_insn): Ditto. (emit_vlmax_masked_store_insn): Ditto. (emit_nonvlmax_masked_store_insn): Ditto. (emit_vlmax_masked_mu_insn): Ditto. (emit_nonvlmax_tu_insn): Ditto. (emit_nonvlmax_fp_tu_insn): Ditto. (emit_scalar_move_insn): Ditto. (emit_vlmax_compress_insn): Ditto. (emit_vlmax_reduction_insn): Ditto. (emit_vlmax_fp_reduction_insn): Ditto. (emit_nonvlmax_fp_reduction_insn): Ditto. (expand_vec_series): Ditto. (expand_vector_init_merge_repeating_sequence): Ditto. (expand_vec_perm): Ditto. (shuffle_merge_patterns): Ditto. (shuffle_compress_patterns): Ditto. (shuffle_decompress_patterns): Ditto. (expand_reduction): Ditto. (get_mask_mode): Update return type. * config/riscv/riscv.cc (riscv_get_mask_mode): Check vector type is valid, and use new get_mask_mode interface.
2023-07-31  RISC-V: Bugfix for RVV floating-point rm suffix sequence  Pan Li  3  -20/+20
According to the RVV intrinsic doc below, the suffix of an RVV floating-point intrinsic name with rounding mode should be _rm_m instead of _m_rm: https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226 This patch fixes the suffix ordering and adjusts the test cases. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-shapes.cc (struct alu_frm_def): Move rm suffix before mask. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: Adjust test cases. * gcc.target/riscv/rvv/base/float-point-frm.c: Ditto.
2023-07-31  RISC-V: Enable basic VLS auto-vectorization  Juzhe-Zhong  13  -6/+1034
Consider the following case:

void
foo (int8_t *in, int8_t *out, int8_t x)
{
  for (int i = 0; i < 16; i++)
    in[i] = x;
}

Compile option: --param=riscv-autovec-preference=scalable -fno-builtin

Before this patch:

foo:
        li      a5,16
        csrr    a4,vlenb
        vsetvli a3,zero,e8,m1,ta,ma
        vmv.v.x v1,a2
        bleu    a5,a4,.L2
        mv      a5,a4
.L2:
        vsetvli zero,a5,e8,m1,ta,ma
        vse8.v  v1,0(a0)
        ret

After this patch:

foo:
        vsetivli        zero,16,e8,mf8,ta,ma
        vmv.v.x v1,a2
        vse8.v  v1,0(a0)
        ret

gcc/ChangeLog: * config/riscv/autovec-vls.md (@vec_duplicate<mode>): New pattern. * config/riscv/riscv-v.cc (autovectorize_vector_modes): Add VLS autovec support. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/v-1.c: Adapt test. * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/dup-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-3.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-4.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-5.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-6.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-7.c: New test.
2023-07-31  MAINTAINERS: Add myself to write after approval  xuli  1  -0/+1
Signed-off-by: Li Xu <xuli1@eswincomputing.com> ChangeLog: * MAINTAINERS: Add myself.
2023-07-31  Daily bump.  GCC Administrator  2  -1/+8
2023-07-30  libstdc++: Fix several preprocessor directives  François Dumont  3  -4/+4
A wrong usage of #define in place of a #error seems to have been replicated at different places in source files. libstdc++-v3/ChangeLog: * src/c++11/compatibility-ldbl-facets-aliases.h: Replace #define with proper #error. * src/c++11/locale-inst-monetary.h: Likewise. * src/c++11/locale-inst-numeric.h: Likewise.
2023-07-30  Daily bump.  GCC Administrator  5  -1/+68
2023-07-29  [Committed] Use QImode for offsets in zero_extract/sign_extract in i386.md  Roger Sayle  3  -109/+149
As suggested by Uros, this patch changes the ZERO_EXTRACTs and SIGN_EXTRACTs in i386.md to consistently use QImode for bit offsets (i.e. third and fourth operands), matching the use of QImode for bit counts in shifts and rotates. This iteration also corrects the "ne:QI" vs "eq:QI" mistake in the previous version, which was responsible for PR 110787 and PR 110790 and so was rapidly reverted last weekend. New test cases have been added to check the correct behaviour. 2023-07-29 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110790 * config/i386/i386.md (extv<mode>): Use QImode for offsets. (extzv<mode>): Likewise. (insv<mode>): Likewise. (*testqi_ext_3): Likewise. (*btr<mode>_2): Likewise. (define_split): Likewise. (*btsq_imm): Likewise. (*btrq_imm): Likewise. (*btcq_imm): Likewise. (define_peephole2 x3): Likewise. (*bt<mode>): Likewise (*bt<mode>_mask): New define_insn_and_split. (*jcc_bt<mode>): Use QImode for offsets. (*jcc_bt<mode>_1): Delete obsolete pattern. (*jcc_bt<mode>_mask): Use QImode offsets. (*jcc_bt<mode>_mask_1): Likewise. (define_split): Likewise. (*bt<mode>_setcqi): Likewise. (*bt<mode>_setncqi): Likewise. (*bt<mode>_setnc<mode>): Likewise. (*bt<mode>_setncqi_2): Likewise. (*bt<mode>_setc<mode>_mask): New define_insn_and_split. (bmi2_bzhi_<mode>3): Use QImode offsets. (*bmi2_bzhi_<mode>3): Likewise. (*bmi2_bzhi_<mode>3_1): Likewise. (*bmi2_bzhi_<mode>3_1_ccz): Likewise. (@tbm_bextri_<mode>): Likewise. gcc/testsuite/ChangeLog PR target/110790 * gcc.target/i386/pr110790-1.c: New test case. * gcc.target/i386/pr110790-2.c: Likewise.
2023-07-29  libgomp: cuda.h and omp_target_memcpy_rect cleanup  Tobias Burnus  3  -42/+28
Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894 "OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect", namely: In that commit, the code was changed to handle shared-memory devices; however, as pointed out, omp_target_memcpy_check already set the pointer to NULL in that case. Hence, this commit reverts to the prior version. In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry with cuMemcpy3DPeer (all currently unused) and, in three structs, fixes reserved-member names and removes a bogus 'const'. It also changes a DLSYM to DLSYM_OPT, as not all plugins support the new functions yet. include/ChangeLog: * cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): Remove bogus 'const' from 'const void *dst' and fix reserved-member name in those structs. (cuMemcpyPeer, cuMemcpyPeerAsync): Add. libgomp/ChangeLog: * target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for GOMP_OFFLOAD_CAP_SHARED_MEM. (omp_target_memcpy_rect_copy): Likewise for lock condition. (gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for memcpy3d/memcpy2d. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
2023-07-29  Fix profile update after vectorize loop versioning  Jan Hubicka  1  -0/+9
gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vect-profile-upate-2.c: New test.
2023-07-29  Fix profile update after vectorize loop versioning  Jan Hubicka  3  -3/+75
The vectorizer, while doing loop versioning, produces a versioned loop guarded with two conditionals of the form

  if (cond1)
    goto scalar_loop
  else
    goto next_bb
next_bb:
  if (cond2)
    goto scalar_loop
  else
    goto vector_loop

It wants the combined test to be prob (which is set to likely) and uses profile_probability::split to determine the probability of cond1 and cond2. However, splitting turns:

  if (cond) goto lab; // ORIG probability

into

  if (cond1) goto lab; // FIRST = ORIG * CPROB probability
  if (cond2) goto lab; // SECOND probability

which is "or" instead of "and". As a result we get a pretty low probability of entering the vectorized loop.

This patch fixes this by introducing sqrt to profile probability (which is the correct way to split this) and also adding pow, which is needed elsewhere.

While loop versioning I now produce code as if there was only one combined conditional and then update the probability of the conditional produced (containing cond1). Later the edge is split and a new conditional is added. At that time it is necessary to update the probability of the BB containing the second conditional so everything matches.

gcc/ChangeLog: * profile-count.cc (profile_probability::sqrt): New member function. (profile_probability::pow): Likewise. * profile-count.h: (profile_probability::sqrt): Declare (profile_probability::pow): Likewise. * tree-vect-loop-manip.cc (vect_loop_versioning): Fix profile update.
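The arithmetic behind the sqrt split can be sketched numerically. This is an illustration under stated assumptions, not the profile_probability implementation: probabilities are plain doubles, the two guards are treated as independent and equally likely to fall through, and the helper names are hypothetical. The square root is computed by Newton iteration so the sketch does not depend on libm.

```c
/* Square root of a probability in [0, 1] by Newton iteration, starting
   from 1.0, which is never below the true root on this interval.  */
double
prob_sqrt (double p)
{
  if (p <= 0.0)
    return 0.0;
  double r = 1.0;
  for (int i = 0; i < 50; i++)
    r = 0.5 * (r + p / r);
  return r;
}

/* Probability of reaching the vector loop when two guards in sequence
   each let execution through with probability Q ("and" of both).  */
double
both_pass (double q)
{
  return q * q;
}
```

If the vector loop should be entered with probability 0.81, giving each guard a fall-through probability of prob_sqrt (0.81), about 0.9, makes both_pass return about 0.81 again, whereas giving each guard 0.81 would yield only about 0.66.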
2023-07-29  Daily bump.  GCC Administrator  7  -1/+313
2023-07-28  Add a merge_range to ssa_cache and use it. add empty_p and param tweaks.  Andrew MacLeod  3  -6/+51
* gimple-range-cache.cc (ssa_cache::merge_range): New. (ssa_lazy_cache::merge_range): New. * gimple-range-cache.h (class ssa_cache): Adjust prototypes. (class ssa_lazy_cache): Ditto. * gimple-range.cc (assume_query::calculate_op): Use merge_range.
2023-07-28  Remove value_query, push into sub&fold class  Andrew MacLeod  4  -48/+39
* tree-ssa-propagate.cc (substitute_and_fold_engine::value_on_edge): Move from value-query.cc. (substitute_and_fold_engine::value_of_stmt): Ditto. (substitute_and_fold_engine::range_of_expr): New. * tree-ssa-propagate.h (substitute_and_fold_engine): Inherit from range_query. New prototypes. * value-query.cc (value_query::value_on_edge): Relocate. (value_query::value_of_stmt): Ditto. * value-query.h (class value_query): Remove. (class range_query): Remove base class. Adjust prototypes.
2023-07-28  Fix some warnings  Andrew MacLeod  3  -26/+21
PR tree-optimization/110205 * gimple-range-cache.h (ranger_cache::m_estimate): Delete. * range-op-mixed.h (operator_bitwise_xor::op1_op2_relation_effect): Add final override. * range-op.cc (operator_lshift): Add missing final overrides. (operator_rshift): Ditto.
2023-07-28  Update gcc .po files  Joseph Myers  19  -61503/+61504
* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po, ja.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po, zh_TW.po: Update.
2023-07-28  bpf: disable tail call optimization in BPF targets  Jose E. Marchesi  1  -0/+3
clang disables tail call optimizations in BPF targets. Do the same in GCC. gcc/ChangeLog: * config/bpf/bpf.cc (bpf_option_override): Disable tail-call optimizations in BPF target.
2023-07-28  Fortran: do not pass hidden character length for TYPE(*) dummy [PR110825]  Harald Anlauf  3  -1/+55
gcc/fortran/ChangeLog: PR fortran/110825 * gfortran.texi: Clarify argument passing convention. * trans-expr.cc (gfc_conv_procedure_call): Do not pass the character length as hidden argument when the declared dummy argument is assumed-type. gcc/testsuite/ChangeLog: PR fortran/110825 * gfortran.dg/assumed_type_18.f90: New test.
2023-07-28  Cleanup profile updating code in unrolling and splitting  Honza  5  -163/+199
I have noticed that for all these three cases I need the same update of the loop exit probability. While my earlier patch unified it for unrollers, this patch makes it more general and also simplifies tree-ssa-loop-split.cc. I also refactored the code, since with all the special cases for corrupted profiles it gets relatively long. I now also handle multiple loop exits in the RTL unroller. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * cfgloopmanip.cc (loop_count_in): Break out from ... (loop_exit_for_scaling): Break out from ... (update_loop_exit_probability_scale_dom_bbs): Break out from ...; add more sanity check and debug info. (scale_loop_profile): ... here. (create_empty_loop_on_edge): Fix whitespace. * cfgloopmanip.h (update_loop_exit_probability_scale_dom_bbs): Declare. * loop-unroll.cc (unroll_loop_constant_iterations): Use update_loop_exit_probability_scale_dom_bbs. * tree-ssa-loop-manip.cc (update_exit_probability_after_unrolling): Remove. (tree_transform_and_unroll_loop): Use update_loop_exit_probability_scale_dom_bbs. * tree-ssa-loop-split.cc (split_loop): Use update_loop_exit_probability_scale_dom_bbs.
2023-07-28  RISC-V: Specify -mabi in rv64 autovec testcase  Patrick O'Neill  1  -1/+1
On rv32 targets, this patch fixes: FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test for excess errors) cc1: error: ABI requires '-march=rv32' gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/madd-split2-1.c: Add -mabi=lp64d to dg-options. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-07-28  c++: devirtualization of array destruction [PR110057]  Ng YongXiang  6  -6/+110
PR c++/110057 PR ipa/83054 gcc/cp/ChangeLog: * init.cc (build_vec_delete_1): Devirtualize array destruction. gcc/testsuite/ChangeLog: * g++.dg/warn/pr83054.C: Remove devirtualization warning. * g++.dg/lto/pr89335_0.C: Likewise. * g++.dg/tree-ssa/devirt-array-destructor-1.C: New test. * g++.dg/tree-ssa/devirt-array-destructor-2.C: New test. * g++.dg/warn/pr83054-2.C: New test. Signed-off-by: Ng Yong Xiang <yongxiangng@gmail.com>
2023-07-28  loop-split improvements, part 3  Jan Hubicka  2  -17/+80
Extend tree-ssa-loop-split to understand tests of the form if (i==0) and if (i!=0), which trigger only during the first iteration. Naturally we should also be able to handle the last iteration, or split into 3 cases if the test can indeed fire in the middle of the loop. The last iteration is a bit trickier to pattern match, so I want to do it incrementally, but I implemented the easy case using value ranges, which handles loops with constant iteration counts. The testcase gets a misupdated profile; I will also fix that incrementally. gcc/ChangeLog: PR middle-end/77689 * tree-ssa-loop-split.cc: Include value-query.h. (split_at_bb_p): Analyze cases where EQ/NE can be turned into LT/LE/GT/GE; return updated guard code. (split_loop): Use guard code. gcc/testsuite/ChangeLog: PR middle-end/77689 * g++.dg/tree-ssa/loop-split-1.C: New test.
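A hand-written sketch of the source-level effect may help. It assumes a loop counting up from 0, where i == 0 can be treated as i < 1; the functions and the constant 100 are hypothetical and are not taken from the new testcase.

```c
/* Original loop: the test i == 0 holds only in the first iteration.  */
int
sum_original (const int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      if (i == 0)
        sum += 100;   /* first-iteration-only work */
      sum += a[i];
    }
  return sum;
}

/* Split form: the first loop covers the iterations where i < 1 (the
   EQ test rewritten as LT), the second loop runs the remaining
   iterations without the test.  */
int
sum_split (const int *a, int n)
{
  int sum = 0;
  int i = 0;
  for (; i < n && i < 1; i++)
    {
      sum += 100;
      sum += a[i];
    }
  for (; i < n; i++)
    sum += a[i];
  return sum;
}
```

Both functions return the same value for every n, including n == 0, where neither loop body runs.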
2023-07-28  PR rtl-optimization/110587: Reduce useless moves in compile-time hog.  Roger Sayle  1  -9/+4
This patch is one of a series of fixes for PR rtl-optimization/110587, a compile-time regression with -O0, that attempts to address the underlying cause. As noted previously, the pathological test case pr28071.c contains a large number of useless register-to-register moves that can produce quadratic behaviour (in LRA). These moves are generated during RTL expansion in emit_group_load_1, where the middle-end attempts to simplify the source before calling extract_bit_field. This is reasonable if the source is a complex expression (from before the tree-ssa optimizers), or a SUBREG, or a hard register, but it's not particularly useful to copy a pseudo register into a new pseudo register. This patch eliminates that redundancy. The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains 777K lines, with this patch it contains 717K lines, i.e. saving about 60K lines (admittedly of debugging text output, but it makes the point). 2023-07-28 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR middle-end/28071 PR rtl-optimization/110587 * expr.cc (emit_group_load_1): Simplify logic for calling force_reg on ORIG_SRC, to avoid making a copy if the source is already in a pseudo register.
2023-07-28  loop-split improvements, part 2  Jan Hubicka  4  -14/+180
This patch fixes the profile update in the first case of loop splitting. The pass still gives up on very basic testcases:

__attribute__ ((noinline,noipa))
void
test1 (int n)
{
  if (n <= 0 || n > 100000)
    return;
  for (int i = 0; i <= n; i++)
    {
      if (i < n)
        do_something ();
      if (a[i])
        do_something2();
    }
}

Here I needed to add the conditional that enforces a sane value range of n. The reason is that it gives up on:

  !number_of_iterations_exit (loop1, exit1, &niter, false, true)

and without the conditional we get the assumption that n>=0 and not INT_MAX. I think from overflow we should derive that the INT_MAX test is not needed, and since the loop does nothing for n<0 it is also just paranoia. I am not sure how to fix this though :(. In general the pass does not really need to compute the iteration count. It only needs to know what direction the IVs go so it can detect tests that fire in the first part of the iteration space.

Rich, any idea what the correct test should be?

In testcase:

  for (int i = 0; i < 200; i++)
    if (i < 150)
      do_something ();
    else
      do_something2 ();

the old code did a wrong update of the exit condition probabilities. We know that the first loop iterates 150 times and the second loop 50 times, and we get it by simply scaling the loop body by the probability of the inner test.
With the patch we now get:

  <bb 2> [count: 1000]:

  <bb 3> [count: 150000]:    <- loop 1 correctly iterates 149 times
  # i_10 = PHI <i_7(8), 0(2)>
  do_something ();
  i_7 = i_10 + 1;
  if (i_7 <= 149)
    goto <bb 8>; [99.33%]
  else
    goto <bb 17>; [0.67%]

  <bb 8> [count: 149000]:
  goto <bb 3>; [100.00%]

  <bb 16> [count: 1000]:
  # i_15 = PHI <i_18(17)>

  <bb 9> [count: 49975]:    <- loop 2 should iterate 50 times but we are slightly wrong
  # i_3 = PHI <i_15(16), i_14(13)>
  do_something2 ();
  i_14 = i_3 + 1;
  if (i_14 != 200)
    goto <bb 13>; [98.00%]
  else
    goto <bb 7>; [2.00%]

  <bb 13> [count: 48975]:
  goto <bb 9>; [100.00%]

  <bb 17> [count: 1000]:    <- this test is always true because it is reached from bb 3
  # i_18 = PHI <i_7(3)>
  if (i_18 != 200)
    goto <bb 16>; [99.95%]
  else
    goto <bb 7>; [0.05%]

  <bb 7> [count: 1000]:
  return;

The reason why we are slightly wrong is the condition in bb17 that is always true but the pass does not know it.

Rich, any idea how to do that? I think connect_loops should work out the case where the loop exit condition is never satisfied at the time the split condition fails for the first time.
Before patch on hmmer we get a lot of mismatches. Profile report here claims:

dump id |static mismat|dynamic mismatch                                     |
        |in count     |in count                  |time                      |
lsplit  |      5    +5|   8151850567  +8151850567| 531506481006       +57.9%|
ldist   |      9    +4|  15345493501  +7193642934| 606848841056       +14.2%|
ifcvt   |     10    +1|  15487514871   +142021370| 689469797790       +13.6%|
vect    |     35   +25|  17558425961  +2070911090| 517375405715       -25.0%|
cunroll |     42    +7|  16898736178   -659689783| 452445796198        -4.9%|
loopdone|     33    -9|   2678017188 -14220718990| 330969127663             |
tracer  |     34    +1|   2678018710        +1522| 330613415364        +0.0%|
fre     |     33    -1|   2676980249     -1038461| 330465677073        -0.0%|
expand  |     28    -5|   2497468467   -179511782|--------------------------|

With patch:

lsplit  |      0      |            0             | 328723360744        -2.3%|
ldist   |      0      |            0             | 396193562452       +20.6%|
ifcvt   |      1    +1|     71010686    +71010686| 478743508522       +20.8%|
vect    |     14   +13|    697518955   +626508269| 299398068323       -37.5%|
cunroll |     13    -1|    489349408   -208169547| 257777839725       -10.5%|
loopdone|     11    -2|    402558559    -86790849| 201010712702             |
tracer  |     13    +2|    402977200      +418641| 200651036623        +0.0%|
fre     |     13      |    402622146      -355054| 200344398654        -0.2%|
expand  |     11    -2|    333608636    -69013510|--------------------------|

So there are no mismatches for lsplit and ldist, and lsplit also thinks it improves speed by 2.3% rather than regressing it by 57%. The update is still not perfect since we do not work out that the second loop never iterates. Ifcvt wrecks the profile by design since it inserts conditionals with both arms 100% that will be eliminated later after vect. It is not clear to me what happens in vect though. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: PR middle-end/106923 * tree-ssa-loop-split.cc (connect_loops): Change probability of the test preconditioning second loop to very_likely. (fix_loop_bb_probability): Handle correctly case where one of the arms of the conditional is empty.
(split_loop): Fold the test guarding first condition to see if it is constant true; set correct entry block probabilities of the split loops; determine correct loop exit probabilities. gcc/testsuite/ChangeLog: PR middle-end/106293 * gcc.dg/tree-prof/loop-split-1.c: New test. * gcc.dg/tree-prof/loop-split-2.c: New test. * gcc.dg/tree-prof/loop-split-3.c: New test.
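The iteration counts quoted for the i < 150 testcase (150 for the first loop, 50 for the second) can be reproduced with a hand-split version of the loop. This is a sketch of the transformation's effect only; the struct and function names are hypothetical, and counters stand in for the calls to do_something () and do_something2 ().

```c
/* Counts how often each arm of the inner test runs.  */
struct split_counts
{
  int first;    /* stands in for do_something ()  */
  int second;   /* stands in for do_something2 () */
};

/* The loop as written in the testcase.  */
struct split_counts
run_original (void)
{
  struct split_counts c = { 0, 0 };
  for (int i = 0; i < 200; i++)
    if (i < 150)
      c.first++;
    else
      c.second++;
  return c;
}

/* The loop after splitting at i == 150: each loop contains one arm
   and no inner test.  */
struct split_counts
run_split (void)
{
  struct split_counts c = { 0, 0 };
  int i = 0;
  for (; i < 150; i++)
    c.first++;
  for (; i < 200; i++)
    c.second++;
  return c;
}
```

With an entry count of 1000, these trip counts correspond to the body counts 150000 and 50000 that a perfect profile update would assign to the two loops.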
2023-07-28  ada: Elide the copy in extended returns for nonlimited by-reference types  Eric Botcazou  1  -3/+4
gcc/ada/ * gcc-interface/trans.cc (gnat_to_gnu): Restrict previous change to the case where the simple return statement has got no storage pool.
2023-07-28  ada: Add an assert in Posix Interrupt_Wait  Clément Chigot  1  -0/+1
All functions but Interrupt_Wait in s-inmaop__posix are checking the result of their syscalls with an assert. However, any return code of sigwait different than 0 means that something went wrong for it. From sigwait man: > RETURN VALUE > On success, sigwait() returns 0. On error, it returns a > positive error number (listed in ERRORS). gcc/ada/ * libgnarl/s-inmaop__posix.adb: Add assert after sigwait in Interrupt_Wait
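The same check can be sketched at the C level, where sigwait is declared. This is an illustration of the POSIX calling convention only, not the Ada runtime code; the function name is hypothetical, and the sketch assumes a single-threaded process so that sigprocmask is sufficient to block the signal.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Block SIG, make it pending with raise, then collect it with sigwait.
   Returns the signal number that sigwait reported.  */
int
wait_for_pending_signal (int sig)
{
  sigset_t set;
  int received = 0;
  int result;

  sigemptyset (&set);
  sigaddset (&set, sig);
  result = sigprocmask (SIG_BLOCK, &set, NULL);
  assert (result == 0);

  /* The signal is blocked, so it stays pending instead of being
     delivered to a handler.  */
  result = raise (sig);
  assert (result == 0);

  /* sigwait returns 0 on success and a positive error number on
     failure, so any nonzero result means something went wrong.  */
  result = sigwait (&set, &received);
  assert (result == 0);

  return received;
}
```

Calling wait_for_pending_signal (SIGUSR1) returns SIGUSR1 once the pending signal has been consumed.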
2023-07-28  ada: Fix unsupported dispatching constructor call  Javier Miranda  6  -151/+418
Add dummy build-in-place parameters when a BIP function does not require the BIP parameters but it is a dispatching operation that inherited them. gcc/ada/ * einfo-utils.adb (Underlying_Type): Protect recursion call against non-available attribute Etype. * einfo.ads (Protected_Subprogram): Fix typo in documentation. * exp_ch3.adb (BIP_Function_Call_Id): New subprogram. (Expand_N_Object_Declaration): Improve code that evaluates if the object is initialized with a BIP function call. * exp_ch6.adb (Is_True_Build_In_Place_Function_Call): New subprogram. (Add_Task_Actuals_To_Build_In_Place_Call): Add dummy actuals if the function does not require the BIP task actuals but it is a dispatching operation that inherited them. (Build_In_Place_Formal): Improve code to avoid never-ending loop if the BIP formal is not found. (Add_Dummy_Build_In_Place_Actuals): New subprogram. (Expand_Call_Helper): Add calls to Add_Dummy_Build_In_Place_Actuals. (Expand_N_Extended_Return_Statement): Adjust assertion. (Expand_Simple_Function_Return): Adjust assertion. (Make_Build_In_Place_Call_In_Allocator): No action needed if the called function inherited the BIP extra formals but it is not a true BIP function. (Make_Build_In_Place_Call_In_Assignment): Ditto. * exp_intr.adb (Expand_Dispatching_Constructor_Call): Remove code reporting unsupported case (since this patch adds support for it). * sem_ch6.adb (Analyze_Subprogram_Body_Helper): Adding assertion to ensure matching of BIP formals when setting the Protected_Formal field of a protected subprogram to reference the corresponding extra formal of the subprogram that implements it. (Might_Need_BIP_Task_Actuals): New subprogram. (Create_Extra_Formals): Improve code adding inherited extra formals.
2023-07-28  ada: Add support for binding to a specific network interface controller.  Pascal Obry  3  -2/+34
gcc/ada/ * s-oscons-tmplt.c: Add support for SO_BINDTODEVICE constant. * libgnat/g-socket.ads (Set_Socket_Option): Handle SO_BINDTODEVICE option. (Get_Socket_Option): Handle SO_BINDTODEVICE option. * libgnat/g-socket.adb: Likewise. (Get_Socket_Option): Handle the case where IF_NAMESIZE is not defined and so equal to -1.
2023-07-28  ada: Add missing SCO generation for quantified expressions in object decl  Léo Creuse  1  -2/+4
This change corrects the Has_Decision predicate in par_sco.adb to properly consider predicates of quantified expressions as decisions. gcc/ada/ * par_sco.adb (Has_Decision): Consider that quantified expressions contain decisions.
2023-07-28  ada: Fix race condition in protected entry call  Ronan Desplanques  1  -2/+8
This patch only affects the single-entry implementation of protected objects. Before this patch, there was a race condition where a task that called an entry could put itself to sleep right after another task had executed the entry as a proxy and signalled the not-yet-waiting first task, which caused the first task to enter a deadlock. Note that this race condition has been identified and fixed before for the implementations of the run-time that live under hie/. This patch reworks the locking sequence so that it is closer to the one that's used in the multiple-entry implementation of protected objects. The code for the multiple-entry implementation is spread across multiple subprograms. To draw a parallel with the section this patch modifies, one can read the following subprograms: - System.Tasking.Protected_Objects.Operations.Protected_Entry_Call - System.Tasking.Entry_Calls.Wait_For_Completion - System.Tasking.Entry_Calls.Check_Pending_Actions_For_Entry_Call This patch also adds a comment that explicitly states the locking constraint that must hold in the affected section. gcc/ada/ * libgnarl/s-tposen.adb: Fix race condition. Add comment to justify the locking timing.
2023-07-28  ada: Small refactor  Viljar Indus  1  -2/+3
gcc/ada/ * exp_util.adb (Find_Optional_Prim_Op): use "No" instead of "= Empty"
2023-07-28  ada: Add guard for detection of class-wide precondition subprograms  Piotr Trojanek  1  -1/+4
When skipping check on subprograms built for class-wide preconditions we must deal with the current scope not being a subprogram, e.g. it could be a declare-block. gcc/ada/ * sem_res.adb (Resolve_Actuals): Add guard for the call to Class_Preconditions_Subprogram.
2023-07-28  ada: Fix memory explosion on aggregate of nested packed array type  Eric Botcazou  1  -1/+3
It occurs at compile time on an aggregate of a 2-dimensional packed array type whose component type is itself a packed array, because the compiler is trying to pack the intermediate aggregate and ends up rewriting a bunch of subcomponents. This optimization was originally devised for the case of a scalar component type so the change adds this restriction. gcc/ada/ * exp_aggr.adb (Is_Two_Dim_Packed_Array): Return true only if the component type of the array is scalar.
2023-07-28  ada: Leave detection of missing return in functions to GNATprove  Piotr Trojanek  1  -9/+2
GNAT has a heuristic to warn about missing return statements in functions. This warning was escalated to errors when operating in GNATprove mode and SPARK_Mode was On. However, this heuristic was imprecise and caused spurious errors. Also, it was applied after the Push_Scope/End_Scope, so for functions acting as compilation units it was using the wrong SPARK_Mode. It is better to simply leave this detection to GNATprove. gcc/ada/ * sem_ch6.adb (Check_Statement_Sequence): Only warn about missing return statements and let GNATprove emit a check when needed.
2023-07-28  ada: Emit enums rather than defines for various constants  Tom Tromey  5  -44/+69
This patch changes xsnamest and gen_il-gen to emit various constants as enums rather than a sequence of preprocessor defines. This enables better debugging and somewhat better type safety. gcc/ada/ * fe.h (Convention): Now inline function. * gen_il-gen.adb (Put_C_Type_And_Subtypes.Put_Enum_Lit) (Put_C_Type_And_Subtypes.Put_Kind_Subtype, Put_C_Getter): Emit enum. * snames.h-tmpl (Name_Id, Name_, Attribute_Id, Attribute_) (Convention_Id, Convention_, Pragma_Id, Pragma_): Now enum. (Get_Attribute_Id, Get_Pragma_Id): Now inline functions. * types.h (Node_Kind, Entity_Kind, Convention_Id, Name_Id): Now enum. * xsnamest.adb (Output_Header_Line, Make_Value): Emit enum.
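To illustrate the debugging and type-safety point, here is a small sketch in C contrasting the two styles. The identifiers (`OLD_CONVENTION_*`, `Example_Convention_Id`, `example_convention_name`) are made up for this example and are not the names emitted into the generated headers.

```c
/* Old style (hypothetical names): plain preprocessor constants.  The
   names disappear after preprocessing, so a debugger only sees 0 and 1. */
#define OLD_CONVENTION_ADA 0
#define OLD_CONVENTION_C   1

/* New style (hypothetical names): an enumeration.  The enumerators
   survive into debug info and the type documents intent in signatures. */
enum Example_Convention_Id {
    Example_Convention_Ada = 0,
    Example_Convention_C   = 1
};

/* Taking the enum type lets the compiler warn about unhandled
   enumerators in the switch (for example with -Wswitch). */
static const char *example_convention_name(enum Example_Convention_Id c)
{
    switch (c) {
    case Example_Convention_Ada: return "Ada";
    case Example_Convention_C:   return "C";
    }
    return "unknown";
}
```

The numeric values are unchanged, so code that stores or compares the constants keeps working; only the way they are declared differs.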
2023-07-28 | ada: Fix typo in comment of Ada.Exceptions.Save_Occurrence | Piotr Trojanek | 1 | -1/+1
Minor typo in comment. gcc/ada/ * libgnat/a-except.ads (Save_Occurrence): Fix typo.
2023-07-28 | ada: Allow calls to Number_Formals when no formals are present | Piotr Trojanek | 4 | -5/+9
It is much simpler and safer for the routine Number_Formals to accept subprogram entities that have no formals. gcc/ada/ * einfo-utils.adb (Number_Formals): Change types in body. * einfo-utils.ads (Number_Formals): Change type in spec. * einfo.ads (Number_Formals): Change type in comment. * sem_ch13.adb (Is_Property_Function): Fix style in a caller of Number_Formals that was likely to crash because of missing guards.
2023-07-28 | ada: Improve defense against illegal code in check for infinite loops | Piotr Trojanek | 1 | -1/+3
Fix crash occurring when attribute System'To_Address is used without a WITH clause for package System. gcc/ada/ * sem_warn.adb (Check_Infinite_Loop_Warning): Don't look at the type of actual parameter when it has no type at all, e.g. because the entire subprogram call is illegal.
2023-07-28 | RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u] | xuli | 37 | -152/+233
The computation of `vsadd`, `vsaddu`, `vssub`, and `vssubu` does not need the rounding mode, so the intrinsics for these instructions do not take a rounding-mode control parameter. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Remove rounding mode of vsadd[u] and vssub[u]. * config/riscv/vector.md: Ditto. gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/base/bug-12.C: Adapt testcase. * g++.target/riscv/rvv/base/bug-14.C: Ditto. * g++.target/riscv/rvv/base/bug-18.C: Ditto. * g++.target/riscv/rvv/base/bug-19.C: Ditto. * g++.target/riscv/rvv/base/bug-20.C: Ditto. * g++.target/riscv/rvv/base/bug-21.C: Ditto. * g++.target/riscv/rvv/base/bug-22.C: Ditto. * g++.target/riscv/rvv/base/bug-23.C: Ditto. * g++.target/riscv/rvv/base/bug-3.C: Ditto. * g++.target/riscv/rvv/base/bug-8.C: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto. 
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto. * gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto. * gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test. * gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.
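As a rough scalar model of why the saturating add/subtract family has no use for a rounding mode, consider the sketch below. It is plain C, not the RVV intrinsics, and the helper names `sat_add_i8` and `sat_add_u8` are invented for illustration. The result is either exact or clamped to the range limit; no low-order bits are ever discarded, so there is nothing to round.

```c
#include <stdint.h>

/* Scalar model of a signed 8-bit saturating add (hypothetical helper).
   The sum is computed exactly in a wider type, then clamped. */
static int8_t sat_add_i8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX)
        return INT8_MAX;
    if (sum < INT8_MIN)
        return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned counterpart (hypothetical helper): clamp at UINT8_MAX. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}
```

Operations that shift or scale a result, such as averaging adds, do drop bits, and those are the ones for which a rounding mode is meaningful.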
2023-07-28 | loop-split improvements, part 1 | Jan Hubicka | 4 | -6/+13
While looking into a profile misupdate on hmmer, I noticed that the loop splitting pass is not able to handle the very loop it gives as an example of what it should apply to, namely the transformation of loops like: for (i = 0; i < 100; i++) { if (i < 50) A; else B; } into: for (i = 0; i < 50; i++) { A; } for (; i < 100; i++) { B; } The problem is that ivcanon turns the test into i != 100 and the pass explicitly gives up on any loop ending with a != test. It needs to know the direction of the induction variable in order to derive the right conditions, but that can also be determined from the step. It turns out that there are no testcases for basic loop splitting. I will add some with the profile update fix. gcc/ChangeLog: * tree-ssa-loop-split.cc (split_loop): Also support NE driven loops when IV test is not overflowing. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ifc-12.c: Disable loop splitting. * gcc.target/i386/avx2-gather-6.c: Likewise. * gcc.target/i386/avx2-vect-aggressive.c: Likewise.
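The transformation can be written out by hand as the following C sketch. The function names and the concrete bodies standing in for A and B are assumptions made for illustration; the first function uses the `i != N` exit test that, according to the message above, ivcanon produces. Because the induction variable starts at 0 and steps by +1 without overflowing, `i != N` and `i < N` exit on the same iteration, which is what makes the split legal.

```c
enum { N = 100, SPLIT = 50 };

/* Unsplit form, with the exit test already canonicalized to "!=".
   The branch on i is evaluated in every iteration. */
static long sum_unsplit(const int *a, const int *b)
{
    long s = 0;
    for (int i = 0; i != N; i++) {
        if (i < SPLIT)
            s += a[i];      /* stands in for "A" */
        else
            s -= b[i];      /* stands in for "B" */
    }
    return s;
}

/* Split form: two loops, each with a branch-free body.  The second
   loop continues from wherever the first one stopped. */
static long sum_split(const int *a, const int *b)
{
    long s = 0;
    int i;
    for (i = 0; i < SPLIT; i++)
        s += a[i];
    for (; i < N; i++)
        s -= b[i];
    return s;
}

/* Fills two arrays with arbitrary data and compares both forms. */
static int split_matches_unsplit(void)
{
    int a[N], b[N];
    for (int i = 0; i < N; i++) {
        a[i] = 3 * i + 1;
        b[i] = 7 - i;
    }
    return sum_unsplit(a, b) == sum_split(a, b);
}
```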
2023-07-28 | Add UNSPEC_MASKOP to vpbroadcastm pattern. | liuhongt | 2 | -2/+17
Prevent rtl optimization of vec_duplicate + zero_extend to vpbroadcastm since there could be an extra kmov after RA. gcc/ChangeLog: PR target/110788 * config/i386/sse.md (avx512cd_maskb_vec_dup<mode>): Add UNSPEC_MASKOP. (avx512cd_maskw_vec_dup<mode>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110788.c: New test.
2023-07-28 | Daily bump. | GCC Administrator | 6 | -1/+279
2023-07-27 | bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784] | David Faust | 8 | -1/+133
BPF ISA V4 introduces sign-extending move and load operations. This patch makes the BPF backend generate those instructions when they are enabled and useful. A new option, -m[no-]smov, gates generation of these instructions and is enabled by default for -mcpu=v4 and above. Tests for the new instructions and documentation for the new options are included. PR target/110782 PR target/110784 gcc/ * config/bpf/bpf.opt (msmov): New option. * config/bpf/bpf.cc (bpf_option_override): Handle it here. * config/bpf/bpf.md (*extendsidi2): New. (extendhidi2): New. (extendqidi2): New. (extendsisi2): New. (extendhisi2): New. (extendqisi2): New. * doc/invoke.texi (Option Summary): Add -msmov eBPF option. (eBPF Options): Add -m[no-]smov. Document that -mcpu=v4 also enables -msmov. gcc/testsuite/ * gcc.target/bpf/sload-1.c: New test. * gcc.target/bpf/sload-pseudoc-1.c: New test. * gcc.target/bpf/smov-1.c: New test. * gcc.target/bpf/smov-pseudoc-1.c: New test.
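For readers unfamiliar with the operation, the sketch below shows in portable C what a sign-extending move computes. It is only a semantic model with hypothetical helper names (`sext32`, `sext16`, `sext8`, `sext32_manual`) and makes no claim about the instruction sequences the backend emits with or without -msmov.

```c
#include <stdint.h>

/* Source-level view: widening a signed value sign-extends it.  These
   conversions are the kind of operation the extend* patterns describe. */
static int64_t sext32(int32_t x) { return (int64_t)x; }
static int64_t sext16(int16_t x) { return (int64_t)x; }
static int64_t sext8(int8_t x)   { return (int64_t)x; }

/* The same result computed from the raw 32-bit pattern with arithmetic
   only, i.e. what has to be done when no dedicated instruction is used.
   Flipping the sign bit and subtracting 2^31 maps bit patterns with the
   top bit set onto negative values. */
static int64_t sext32_manual(uint32_t bits)
{
    return (int64_t)(bits ^ UINT32_C(0x80000000)) - INT64_C(0x80000000);
}
```

A single sign-extending instruction replaces a multi-step sequence like the manual version with one operation.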
2023-07-27 | bpf: minor doc cleanup for command-line options | David Faust | 1 | -25/+23
This patch makes some minor cleanups to eBPF options documented in invoke.texi: - Delete some vestigial docs for removed -mkernel option - Add -mbswap and -msdiv to the option summary - Note the negative versions of several options - Note that -mcpu=v4 also enables -msdiv. gcc/ * doc/invoke.texi (Option Summary): Remove -mkernel eBPF option. Add -mbswap and -msdiv eBPF options. (eBPF Options): Remove -mkernel. Add -mno-{jmpext, jmp32, alu32, v3-atomics, bswap, sdiv}. Document that -mcpu=v4 also enables -msdiv.