riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2023-07-31	combine: Narrow comparison of memory and constant	Stefan Schulze Frielinghaus	8	-5/+200
	Comparisons between memory and constants might be done in a smaller mode resulting in smaller constants which might finally end up as immediates instead of in the literal pool. For example, on s390x a non-symmetric comparison like x <= 0x3fffffffffffffff results in the constant being spilled to the literal pool and an 8 byte memory comparison is emitted. Ideally, an equivalent comparison x0 <= 0x3f where x0 is the most significant byte of x, is emitted where the constant is smaller and more likely to materialize as an immediate. Similarly, comparisons of the form x >= 0x4000000000000000 can be shortened into x0 >= 0x40. gcc/ChangeLog: * combine.cc (simplify_compare_const): Narrow comparison of memory and constant. (try_combine): Adapt new function signature. (simplify_comparison): Adapt new function signature. gcc/testsuite/ChangeLog: * gcc.dg/cmp-mem-const-1.c: New test. * gcc.dg/cmp-mem-const-2.c: New test. * gcc.dg/cmp-mem-const-3.c: New test. * gcc.dg/cmp-mem-const-4.c: New test. * gcc.dg/cmp-mem-const-5.c: New test. * gcc.dg/cmp-mem-const-6.c: New test. * gcc.target/s390/cmp-mem-const-1.c: New test.
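The equivalence described above can be checked in plain C. This is only a sketch under stated assumptions: `x` is treated as an unsigned 64-bit value, the most significant byte is obtained with a shift rather than a one-byte memory load, and the helper names (`le_wide`, `le_narrow`, `ge_wide`, `ge_narrow`) are hypothetical. It does not reflect how `simplify_compare_const` is implemented.

```c
#include <stdint.h>

/* Full-width comparison: needs the 8-byte constant 0x3fffffffffffffff. */
int le_wide(uint64_t x)
{
    return x <= UINT64_C(0x3fffffffffffffff);
}

/* Narrowed comparison: only the most significant byte x0 is inspected,
   so the constant shrinks to 0x3f.  This works because all lower bits
   of the wide constant are ones. */
int le_narrow(uint64_t x)
{
    uint8_t x0 = (uint8_t)(x >> 56);
    return x0 <= 0x3f;
}

/* Full-width comparison against 0x4000000000000000. */
int ge_wide(uint64_t x)
{
    return x >= UINT64_C(0x4000000000000000);
}

/* Narrowed form: all lower bits of the wide constant are zeros,
   so comparing the top byte against 0x40 is sufficient. */
int ge_narrow(uint64_t x)
{
    uint8_t x0 = (uint8_t)(x >> 56);
    return x0 >= 0x40;
}
```

On a big-endian target such as s390x the most significant byte sits at the lowest address, which is what makes the narrowed form a single-byte memory comparison there.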
2023-07-31	RISC-V: Drop unused variable	Kito Cheng	1	-2/+0
	gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vec_series): Drop unused variable. (expand_vector_init_insert_elems): Ditto.
2023-07-31	AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]	Hao Liu	3	-4/+83
	The new costs should only count reduction latency by multiplying count for single_defuse_cycle. For other situations, this will increase the reduction latency a lot and miss vectorization opportunities. Tested on aarch64-linux-gnu. gcc/ChangeLog: PR target/110625 * config/aarch64/aarch64.cc (count_ops): Only '* count' for single_defuse_cycle while counting reduction_latency. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr110625_1.c: New testcase. * gcc.target/aarch64/pr110625_2.c: New testcase.
2023-07-31	internal-fn: Refine macro define of COND_* and COND_LEN_* internal functions	Ju-Zhe Zhong	1	-67/+56
	Hi, Richard and Richi. Based on previous discussions, we should make COND_* and COND_LEN_* consistent. So, this patch defines these internal functions together via these 2 wrappers: DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB, cond_##TYPE) \ DEF_INTERNAL_OPTAB_FN (COND_LEN_##NAME, FLAGS, cond_len_##OPTAB, \ cond_len_##TYPE) UNSIGNED_OPTAB, TYPE) \ DEF_INTERNAL_SIGNED_OPTAB_FN (COND_##NAME, FLAGS, SELECTOR, \ cond_##SIGNED_OPTAB, cond_##UNSIGNED_OPTAB, \ cond_##TYPE) \ DEF_INTERNAL_SIGNED_OPTAB_FN (COND_LEN_##NAME, FLAGS, SELECTOR, \ cond_len_##SIGNED_OPTAB, \ cond_len_##UNSIGNED_OPTAB, cond_len_##TYPE) Bootstrap and regression tests on X86 passed. OK for trunk? gcc/ChangeLog: * internal-fn.def (DEF_INTERNAL_COND_FN): New macro. (DEF_INTERNAL_SIGNED_COND_FN): Ditto. (COND_ADD): Remove. (COND_SUB): Ditto. (COND_MUL): Ditto. (COND_DIV): Ditto. (COND_MOD): Ditto. (COND_RDIV): Ditto. (COND_MIN): Ditto. (COND_MAX): Ditto. (COND_FMIN): Ditto. (COND_FMAX): Ditto. (COND_AND): Ditto. (COND_IOR): Ditto. (COND_XOR): Ditto. (COND_SHL): Ditto. (COND_SHR): Ditto. (COND_FMA): Ditto. (COND_FMS): Ditto. (COND_FNMA): Ditto. (COND_FNMS): Ditto. (COND_NEG): Ditto. (COND_LEN_ADD): Ditto. (COND_LEN_SUB): Ditto. (COND_LEN_MUL): Ditto. (COND_LEN_DIV): Ditto. (COND_LEN_MOD): Ditto. (COND_LEN_RDIV): Ditto. (COND_LEN_MIN): Ditto. (COND_LEN_MAX): Ditto. (COND_LEN_FMIN): Ditto. (COND_LEN_FMAX): Ditto. (COND_LEN_AND): Ditto. (COND_LEN_IOR): Ditto. (COND_LEN_XOR): Ditto. (COND_LEN_SHL): Ditto. (COND_LEN_SHR): Ditto. (COND_LEN_FMA): Ditto. (COND_LEN_FMS): Ditto. (COND_LEN_FNMA): Ditto. (COND_LEN_FNMS): Ditto. (COND_LEN_NEG): Ditto. (ADD): New macro define. (SUB): Ditto. (MUL): Ditto. (DIV): Ditto. (MOD): Ditto. (RDIV): Ditto. (MIN): Ditto. (MAX): Ditto. (FMIN): Ditto. (FMAX): Ditto. (AND): Ditto. (IOR): Ditto. (XOR): Ditto. (SHL): Ditto. (SHR): Ditto. (FMA): Ditto. (FMS): Ditto. (FNMA): Ditto. (FNMS): Ditto. (NEG): Ditto.
2023-07-31	Use substituted GDCFLAGS	Andreas Schwab	3	-1/+3
	Use the substituted value for GDCFLAGS instead of hardcoding $(CFLAGS) so that the subdir configure scripts use the configured value. * configure.ac (GDCFLAGS): Set default from ${CFLAGS}. * configure: Regenerate. * Makefile.in (GDCFLAGS): Substitute @GDCFLAGS@.
2023-07-31	[Committed] PR target/110843: Check TARGET_AVX512VL for V2DI rotates in STV.	Roger Sayle	2	-3/+23
	This patch resolves PR target/110843, an ICE caused by my enhancement to support AVX512 DImode and SImode rotates in the scalar-to-vector (STV) pass. Although the vprotate instructions are available on all TARGET_AVX512F microarchitectures, the V2DI and V4SI variants are only available on the TARGET_AVX512VL subset, leading to problems when command line options enable AVX512 (i.e. AVX512F) but not the required AVX512VL functionality. The simple fix is to update/correct the target checks. 2023-07-31 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110843 * config/i386/i386-features.cc (compute_convert_gain): Check TARGET_AVX512VL (not TARGET_AVX512F) when considering V2DImode and V4SImode rotates in STV. (general_scalar_chain::convert_rotate): Likewise. gcc/testsuite/ChangeLog PR target/110843 * gcc.target/i386/pr110843.c: New test case.
2023-07-31	RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC	Kito Cheng	4	-40/+37
	We always want get_mask_mode to return a valid mode; something is wrong if it fails, so I think we can just move the `.require ()` into get_mask_mode instead of calling it at every call site. The only exception is riscv_get_mask_mode, which might pass an unsupported mode to get_mask_mode, so a check with riscv_v_ext_mode_p was added to make sure get_mask_mode is only asked about valid vector modes. gcc/ChangeLog: * config/riscv/autovec.md (abs<mode>2): Remove `.require ()`. * config/riscv/riscv-protos.h (get_mask_mode): Update return type. * config/riscv/riscv-v.cc (rvv_builder::rvv_builder): Remove `.require ()`. (emit_vlmax_insn): Ditto. (emit_vlmax_fp_insn): Ditto. (emit_vlmax_ternary_insn): Ditto. (emit_vlmax_fp_ternary_insn): Ditto. (emit_nonvlmax_fp_ternary_tu_insn): Ditto. (emit_nonvlmax_insn): Ditto. (emit_vlmax_slide_insn): Ditto. (emit_nonvlmax_slide_tu_insn): Ditto. (emit_vlmax_merge_insn): Ditto. (emit_vlmax_masked_insn): Ditto. (emit_nonvlmax_masked_insn): Ditto. (emit_vlmax_masked_store_insn): Ditto. (emit_nonvlmax_masked_store_insn): Ditto. (emit_vlmax_masked_mu_insn): Ditto. (emit_nonvlmax_tu_insn): Ditto. (emit_nonvlmax_fp_tu_insn): Ditto. (emit_scalar_move_insn): Ditto. (emit_vlmax_compress_insn): Ditto. (emit_vlmax_reduction_insn): Ditto. (emit_vlmax_fp_reduction_insn): Ditto. (emit_nonvlmax_fp_reduction_insn): Ditto. (expand_vec_series): Ditto. (expand_vector_init_merge_repeating_sequence): Ditto. (expand_vec_perm): Ditto. (shuffle_merge_patterns): Ditto. (shuffle_compress_patterns): Ditto. (shuffle_decompress_patterns): Ditto. (expand_reduction): Ditto. (get_mask_mode): Update return type. * config/riscv/riscv.cc (riscv_get_mask_mode): Check vector type is valid, and use new get_mask_mode interface.
2023-07-31	RISC-V: Bugfix for RVV floating-point rm suffix sequence	Pan Li	3	-20/+20
	According to the RVV intrinsic doc below, the RVV floating-point intrinsic name with rounding mode should be: _rm_m instead of: _m_rm https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226 This patch fixes this naming sequence issue and adjusts the test cases. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-shapes.cc (struct alu_frm_def): Move rm suffix before mask. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: Adjust test cases. * gcc.target/riscv/rvv/base/float-point-frm.c: Ditto.
2023-07-31	RISC-V: Enable basic VLS auto-vectorization	Juzhe-Zhong	13	-6/+1034
	Consider the following case: void foo (int8_t *in, int8_t *out, int8_t x) { for (int i = 0; i < 16; i++) in[i] = x; } Compile option: --param=riscv-autovec-preference=scalable -fno-builtin Before this patch: foo: li a5,16 csrr a4,vlenb vsetvli a3,zero,e8,m1,ta,ma vmv.v.x v1,a2 bleu a5,a4,.L2 mv a5,a4 .L2: vsetvli zero,a5,e8,m1,ta,ma vse8.v v1,0(a0) ret After this patch: foo: vsetivli zero,16,e8,mf8,ta,ma vmv.v.x v1,a2 vse8.v v1,0(a0) ret gcc/ChangeLog: * config/riscv/autovec-vls.md (@vec_duplicate<mode>): New pattern. * config/riscv/riscv-v.cc (autovectorize_vector_modes): Add VLS autovec support. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/v-1.c: Adapt test. * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/dup-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-3.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-4.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-5.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-6.c: New test. * gcc.target/riscv/rvv/autovec/vls/dup-7.c: New test.
2023-07-31	MAINTAINERS: Add myself to write after approval	xuli	1	-0/+1
	Signed-off-by: Li Xu <xuli1@eswincomputing.com> ChangeLog: * MAINTAINERS: Add myself.
2023-07-31	Daily bump.	GCC Administrator	2	-1/+8

2023-07-30	libstdc++: Fix several preprocessor directives	François Dumont	3	-4/+4
	A wrong usage of #define in place of a #error seems to have been replicated at different places in source files. libstdc++-v3/ChangeLog: * src/c++11/compatibility-ldbl-facets-aliases.h: Replace #define with proper #error. * src/c++11/locale-inst-monetary.h: Likewise. * src/c++11/locale-inst-numeric.h: Likewise.
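The difference between the two directives can be sketched as follows. The guard macro and function name here are hypothetical and are not taken from the libstdc++ sources; the point is only that `#error` stops compilation while a mistaken `#define` in the same place would let the build continue.

```c
/* Hypothetical guard macro; the including file is expected to set it. */
#define EXAMPLE_INCLUDED_FROM_OWNER 1

#ifndef EXAMPLE_INCLUDED_FROM_OWNER
/* #error aborts compilation with this diagnostic.  Writing #define here
   instead would not report the misuse, so it would go unnoticed. */
# error "this file must only be included by its owning source file"
#endif

/* Reached only when the guard above is satisfied. */
int example_guard_passed(void)
{
    return 1;
}
```

Removing the `#define` on the first line makes the translation unit fail to compile, which is the behaviour the commit restores.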
2023-07-30	Daily bump.	GCC Administrator	5	-1/+68

2023-07-29	[Committed] Use QImode for offsets in zero_extract/sign_extract in i386.md	Roger Sayle	3	-109/+149
	As suggested by Uros, this patch changes the ZERO_EXTRACTs and SIGN_EXTRACTs in i386.md to consistently use QImode for bit offsets (i.e. third and fourth operands), matching the use of QImode for bit counts in shifts and rotates. This iteration also corrects the "ne:QI" vs "eq:QI" mistake in the previous version, which was responsible for PR 110787 and PR 110790 and so was rapidly reverted last weekend. New test cases have been added to check the correct behaviour. 2023-07-29 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110790 * config/i386/i386.md (extv<mode>): Use QImode for offsets. (extzv<mode>): Likewise. (insv<mode>): Likewise. (testqi_ext_3): Likewise. (btr<mode>_2): Likewise. (define_split): Likewise. (btsq_imm): Likewise. (btrq_imm): Likewise. (btcq_imm): Likewise. (define_peephole2 x3): Likewise. (bt<mode>): Likewise (bt<mode>_mask): New define_insn_and_split. (jcc_bt<mode>): Use QImode for offsets. (jcc_bt<mode>_1): Delete obsolete pattern. (jcc_bt<mode>_mask): Use QImode offsets. (jcc_bt<mode>_mask_1): Likewise. (define_split): Likewise. (bt<mode>_setcqi): Likewise. (bt<mode>_setncqi): Likewise. (bt<mode>_setnc<mode>): Likewise. (bt<mode>_setncqi_2): Likewise. (bt<mode>_setc<mode>_mask): New define_insn_and_split. (bmi2_bzhi_<mode>3): Use QImode offsets. (bmi2_bzhi_<mode>3): Likewise. (bmi2_bzhi_<mode>3_1): Likewise. (bmi2_bzhi_<mode>3_1_ccz): Likewise. (@tbm_bextri_<mode>): Likewise. gcc/testsuite/ChangeLog PR target/110790 gcc.target/i386/pr110790-1.c: New test case. * gcc.target/i386/pr110790-2.c: Likewise.
2023-07-29	libgomp: cuda.h and omp_target_memcpy_rect cleanup	Tobias Burnus	3	-42/+28
	Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894 "OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect", namely: In that commit, the code was changed to handle shared-memory devices; however, as pointed out, omp_target_memcpy_check already set the pointer to NULL in that case. Hence, this commit reverts to the prior version. In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry with cuMemcpy3DPeer (all currently unused) and, in three structs, fixes reserved-member names and removes a bogus 'const'. It also changes a DLSYM to DLSYM_OPT, as not all plugins support the new functions yet. include/ChangeLog: * cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): Remove bogus 'const' from 'const void *dst' and fix reserved-name name in those structs. (cuMemcpyPeer, cuMemcpyPeerAsync): Add. libgomp/ChangeLog: * target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for GOMP_OFFLOAD_CAP_SHARED_MEM. (omp_target_memcpy_rect_copy): Likewise for lock condition. (gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for memcpy3d/memcpy2d. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
2023-07-29	Fix profile update after vectorize loop versioning	Jan Hubicka	1	-0/+9
	gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vect-profile-upate-2.c: New test.
2023-07-29	Fix profile update after vectorize loop versioning	Jan Hubicka	3	-3/+75
	Vectorizer while loop versioning produces a versioned loop guarded with two conditionals of the form if (cond1) goto scalar_loop else goto next_bb next_bb: if (cond2) goto scalar_loop else goto vector_loop It wants the combined test to be prob (which is set to likely) and uses profile_probability::split to determine the probability of cond1 and cond2. However, splitting turns: if (cond) goto lab; // ORIG probability into if (cond1) goto lab; // FIRST = ORIG * CPROB probability if (cond2) goto lab; // SECOND probability which is an or instead of an and. As a result we get a pretty low probability of entering the vectorized loop. This patch fixes this by introducing sqrt to profile probability (which is the correct way to split this) and also adding pow, which is needed elsewhere. During loop versioning I now produce code as if there was only one combined conditional and then update the probability of the conditional produced (containing cond1). Later the edge is split and a new conditional is added. At that time it is necessary to update the probability of the BB containing the second conditional so everything matches. gcc/ChangeLog: * profile-count.cc (profile_probability::sqrt): New member function. (profile_probability::pow): Likewise. * profile-count.h: (profile_probability::sqrt): Declare. (profile_probability::pow): Likewise. * tree-vect-loop-manip.cc (vect_loop_versioning): Fix profile update.
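The arithmetic behind the square-root split can be sketched in a few lines of C. This assumes the two guards pass independently and with equal probability, which is a simplification made for illustration; the function names are hypothetical and this is not the `profile_probability` implementation. The square root is computed with Newton's iteration so the sketch needs no math library.

```c
/* Square root by Newton's iteration; adequate for probabilities in (0, 1]. */
double approx_sqrt(double v)
{
    if (v <= 0.0)
        return 0.0;
    double r = v > 1.0 ? v : 1.0;      /* start at or above the root */
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Probability of reaching the vector loop when both guards must pass
   and each passes with probability q. */
double both_pass(double q)
{
    return q * q;
}

/* Per-guard probability such that both guards pass with probability p. */
double split_and(double p)
{
    return approx_sqrt(p);
}
```

For example, if the vector loop should be entered with probability 0.81, each guard gets 0.9, and `both_pass(split_and(0.81))` recovers approximately 0.81.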
2023-07-29	Daily bump.	GCC Administrator	7	-1/+313

2023-07-28	Add a merge_range to ssa_cache and use it. add empty_p and param tweaks.	Andrew MacLeod	3	-6/+51
	* gimple-range-cache.cc (ssa_cache::merge_range): New. (ssa_lazy_cache::merge_range): New. * gimple-range-cache.h (class ssa_cache): Adjust prototypes. (class ssa_lazy_cache): Ditto. * gimple-range.cc (assume_query::calculate_op): Use merge_range.
2023-07-28	Remove value_query, push into sub&fold class	Andrew MacLeod	4	-48/+39
	* tree-ssa-propagate.cc (substitute_and_fold_engine::value_on_edge): Move from value-query.cc. (substitute_and_fold_engine::value_of_stmt): Ditto. (substitute_and_fold_engine::range_of_expr): New. * tree-ssa-propagate.h (substitute_and_fold_engine): Inherit from range_query. New prototypes. * value-query.cc (value_query::value_on_edge): Relocate. (value_query::value_of_stmt): Ditto. * value-query.h (class value_query): Remove. (class range_query): Remove base class. Adjust prototypes.
2023-07-28	Fix some warnings	Andrew MacLeod	3	-26/+21
	PR tree-optimization/110205 * gimple-range-cache.h (ranger_cache::m_estimate): Delete. * range-op-mixed.h (operator_bitwise_xor::op1_op2_relation_effect): Add final override. * range-op.cc (operator_lshift): Add missing final overrides. (operator_rshift): Ditto.
2023-07-28	Update gcc .po files	Joseph Myers	19	-61503/+61504
	* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po, ja.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po, zh_TW.po: Update.
2023-07-28	bpf: disable tail call optimization in BPF targets	Jose E. Marchesi	1	-0/+3
	clang disables tail call optimizations in BPF targets. Do the same in GCC. gcc/ChangeLog: * config/bpf/bpf.cc (bpf_option_override): Disable tail-call optimizations in BPF target.
2023-07-28	Fortran: do not pass hidden character length for TYPE(*) dummy [PR110825]	Harald Anlauf	3	-1/+55
	gcc/fortran/ChangeLog: PR fortran/110825 * gfortran.texi: Clarify argument passing convention. * trans-expr.cc (gfc_conv_procedure_call): Do not pass the character length as hidden argument when the declared dummy argument is assumed-type. gcc/testsuite/ChangeLog: PR fortran/110825 * gfortran.dg/assumed_type_18.f90: New test.
2023-07-28	Cleanup profile updating code in unrolling and splitting	Honza	5	-163/+199
	I have noticed that for all three of these cases I need the same update of loop exit probability. While my earlier patch unified it for unrollers, this patch makes it more general and also simplifies tree-ssa-loop-split.cc. I also refactored the code, since with all the special cases for corrupted profile it gets relatively long. I now also handle multiple loop exits in RTL unroller. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * cfgloopmanip.cc (loop_count_in): Break out from ... (loop_exit_for_scaling): Break out from ... (update_loop_exit_probability_scale_dom_bbs): Break out from ...; add more sanity check and debug info. (scale_loop_profile): ... here. (create_empty_loop_on_edge): Fix whitespace. * cfgloopmanip.h (update_loop_exit_probability_scale_dom_bbs): Declare. * loop-unroll.cc (unroll_loop_constant_iterations): Use update_loop_exit_probability_scale_dom_bbs. * tree-ssa-loop-manip.cc (update_exit_probability_after_unrolling): Remove. (tree_transform_and_unroll_loop): Use update_loop_exit_probability_scale_dom_bbs. * tree-ssa-loop-split.cc (split_loop): Use update_loop_exit_probability_scale_dom_bbs.
2023-07-28	RISC-V: Specify -mabi in rv64 autovec testcase	Patrick O'Neill	1	-1/+1
	On rv32 targets, this patch fixes: FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test for excess errors) cc1: error: ABI requires '-march=rv32' gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/madd-split2-1.c: Add -mabi=lp64d to dg-options. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-07-28	c++: devirtualization of array destruction [PR110057]	Ng YongXiang	6	-6/+110
	PR c++/110057 PR ipa/83054 gcc/cp/ChangeLog: * init.cc (build_vec_delete_1): Devirtualize array destruction. gcc/testsuite/ChangeLog: * g++.dg/warn/pr83054.C: Remove devirtualization warning. * g++.dg/lto/pr89335_0.C: Likewise. * g++.dg/tree-ssa/devirt-array-destructor-1.C: New test. * g++.dg/tree-ssa/devirt-array-destructor-2.C: New test. * g++.dg/warn/pr83054-2.C: New test. Signed-off-by: Ng Yong Xiang <yongxiangng@gmail.com>
2023-07-28	loop-split improvements, part 3	Jan Hubicka	2	-17/+80
	Extend tree-ssa-loop-split to understand tests of the form if (i==0) and if (i!=0), which trigger only during the first iteration. Naturally we should also be able to handle the last iteration, or split into 3 cases if the test can indeed fire in the middle of the loop. The last iteration is a bit trickier to pattern match, so I want to do it incrementally, but I implemented the easy case using value ranges, which handles loops with constant iteration counts. The testcase gets a misupdated profile; I will also fix that incrementally. gcc/ChangeLog: PR middle-end/77689 * tree-ssa-loop-split.cc: Include value-query.h. (split_at_bb_p): Analyze cases where EQ/NE can be turned into LT/LE/GT/GE; return updated guard code. (split_loop): Use guard code. gcc/testsuite/ChangeLog: PR middle-end/77689 * g++.dg/tree-ssa/loop-split-1.C: New test.
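What splitting on an `i == 0` test amounts to can be written out by hand. The two functions below are a hypothetical illustration and are not taken from the new testcase; they only show that the test holds for an initial range of the iteration space, so the loop can be divided into a part where the test is known true and a part where it is known false.

```c
/* Original form: the i == 0 test is evaluated on every iteration. */
int sum_with_test(int n)
{
    int s = 0;
    for (int i = 0; i < n; i++) {
        if (i == 0)
            s += 100;      /* fires only during the first iteration */
        s += i;
    }
    return s;
}

/* Split form: the first loop covers the iterations where i == 0 holds
   (at most one), the second loop runs the rest without the test. */
int sum_split(int n)
{
    int s = 0;
    int i = 0;
    for (; i < n && i < 1; i++) {
        s += 100;
        s += i;
    }
    for (; i < n; i++)
        s += i;
    return s;
}
```

Both functions return the same value for any `n`, including `n <= 0`, where neither loop body runs.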
2023-07-28	PR rtl-optimization/110587: Reduce useless moves in compile-time hog.	Roger Sayle	1	-9/+4
	This patch is one of a series of fixes for PR rtl-optimization/110587, a compile-time regression with -O0, that attempts to address the underlying cause. As noted previously, the pathological test case pr28071.c contains a large number of useless register-to-register moves that can produce quadratic behaviour (in LRA). These moves are generated during RTL expansion in emit_group_load_1, where the middle-end attempts to simplify the source before calling extract_bit_field. This is reasonable if the source is a complex expression (from before the tree-ssa optimizers), or a SUBREG, or a hard register, but it's not particularly useful to copy a pseudo register into a new pseudo register. This patch eliminates that redundancy. The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains 777K lines, with this patch it contains 717K lines, i.e. saving about 60K lines (admittedly of debugging text output, but it makes the point). 2023-07-28 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR middle-end/28071 PR rtl-optimization/110587 * expr.cc (emit_group_load_1): Simplify logic for calling force_reg on ORIG_SRC, to avoid making a copy if the source is already in a pseudo register.
2023-07-28	loop-split improvements, part 2	Jan Hubicka	4	-14/+180
	This patch fixes profile update in the first case of loop splitting. The pass still gives up on very basic testcases:

    __attribute__ ((noinline,noipa))
    void
    test1 (int n)
    {
      if (n <= 0 || n > 100000)
        return;
      for (int i = 0; i <= n; i++)
        {
          if (i < n)
            do_something ();
          if (a[i])
            do_something2();
        }
    }

Here I needed the conditional that enforces a sane value range of n. The reason is that it gives up on:

    !number_of_iterations_exit (loop1, exit1, &niter, false, true)

and without the conditional we get the assumption that n>=0 and not INT_MAX. I think from overflow we should derive that the INT_MAX test is not needed, and since the loop does nothing for n<0 it is also just paranoia. I am not sure how to fix this though :(. In general the pass does not really need to compute the iteration count. It only needs to know what direction the IVs go so it can detect tests that fire in the first part of the iteration space. Rich, any idea what the correct test should be?

In testcase:

    for (int i = 0; i < 200; i++)
      if (i < 150)
        do_something ();
      else
        do_something2 ();

the old code did a wrong update of the exit condition probabilities. We know that the first loop iterates 150 times and the second loop 50 times, and we get it by simply scaling the loop body by the probability of the inner test.

With the patch we now get:

    <bb 2> [count: 1000]:

    <bb 3> [count: 150000]:    <- loop 1 correctly iterates 149 times
    # i_10 = PHI <i_7(8), 0(2)>
    do_something ();
    i_7 = i_10 + 1;
    if (i_7 <= 149)
      goto <bb 8>; [99.33%]
    else
      goto <bb 17>; [0.67%]

    <bb 8> [count: 149000]:
    goto <bb 3>; [100.00%]

    <bb 16> [count: 1000]:
    # i_15 = PHI <i_18(17)>

    <bb 9> [count: 49975]:    <- loop 2 should iterate 50 times but we are slightly wrong
    # i_3 = PHI <i_15(16), i_14(13)>
    do_something2 ();
    i_14 = i_3 + 1;
    if (i_14 != 200)
      goto <bb 13>; [98.00%]
    else
      goto <bb 7>; [2.00%]

    <bb 13> [count: 48975]:
    goto <bb 9>; [100.00%]

    <bb 17> [count: 1000]:    <- this test is always true because it is reached from bb 3
    # i_18 = PHI <i_7(3)>
    if (i_18 != 200)
      goto <bb 16>; [99.95%]
    else
      goto <bb 7>; [0.05%]

    <bb 7> [count: 1000]:
    return;

The reason why we are slightly wrong is the condition in bb17 that is always true but the pass does not know it. Rich, any idea how to do that? I think connect_loops should work out the case where the loop exit condition is never satisfied at the time the split condition fails for the first time.

Before patch on hmmer we get a lot of mismatches. Profile report here claims:

    dump id |static mismat|dynamic mismatch                                     |
            |in count     |in count                  |time                      |
    lsplit  |      5    +5|   8151850567  +8151850567| 531506481006       +57.9%|
    ldist   |      9    +4|  15345493501  +7193642934| 606848841056       +14.2%|
    ifcvt   |     10    +1|  15487514871   +142021370| 689469797790       +13.6%|
    vect    |     35   +25|  17558425961  +2070911090| 517375405715       -25.0%|
    cunroll |     42    +7|  16898736178   -659689783| 452445796198        -4.9%|
    loopdone|     33    -9|   2678017188 -14220718990| 330969127663             |
    tracer  |     34    +1|   2678018710        +1522| 330613415364        +0.0%|
    fre     |     33    -1|   2676980249     -1038461| 330465677073        -0.0%|
    expand  |     28    -5|   2497468467   -179511782|--------------------------|

With patch:

    lsplit  |      0      |            0             | 328723360744        -2.3%|
    ldist   |      0      |            0             | 396193562452       +20.6%|
    ifcvt   |      1    +1|     71010686    +71010686| 478743508522       +20.8%|
    vect    |     14   +13|    697518955   +626508269| 299398068323       -37.5%|
    cunroll |     13    -1|    489349408   -208169547| 257777839725       -10.5%|
    loopdone|     11    -2|    402558559    -86790849| 201010712702             |
    tracer  |     13    +2|    402977200      +418641| 200651036623        +0.0%|
    fre     |     13      |    402622146      -355054| 200344398654        -0.2%|
    expand  |     11    -2|    333608636    -69013510|--------------------------|

So there are no mismatches for lsplit and ldist, and lsplit now thinks it improves speed by 2.3% rather than regressing it by 57%. The update is still not perfect since we do not work out that the second loop never iterates. Ifcvt wrecks the profile by design since it inserts conditionals with both arms 100% that will be eliminated later after vect. It is not clear to me what happens in vect though. Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog: PR middle-end/106923 * tree-ssa-loop-split.cc (connect_loops): Change probability of the test preconditioning second loop to very_likely. (fix_loop_bb_probability): Handle correctly case where one of the arms of the conditional is empty. (split_loop): Fold the test guarding first condition to see if it is constant true; Set correct entry block probabilities of the split loops; determine correct loop exit probabilities. gcc/testsuite/ChangeLog: PR middle-end/106293 * gcc.dg/tree-prof/loop-split-1.c: New test. * gcc.dg/tree-prof/loop-split-2.c: New test. * gcc.dg/tree-prof/loop-split-3.c: New test.
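The iteration counts quoted for the 200-iteration testcase can be reproduced by writing the split out manually. This is a sketch only: the `calls` struct and both function names are hypothetical, and the counters stand in for `do_something ()` and `do_something2 ()`.

```c
/* Counters standing in for do_something () and do_something2 (). */
struct calls {
    int first;
    int second;
};

/* Loop as written in the testcase, with the test inside the body. */
struct calls run_original(void)
{
    struct calls c = {0, 0};
    for (int i = 0; i < 200; i++) {
        if (i < 150)
            c.first++;
        else
            c.second++;
    }
    return c;
}

/* Loop after splitting at i == 150: no test is left in either body. */
struct calls run_split(void)
{
    struct calls c = {0, 0};
    int i = 0;
    for (; i < 150; i++)    /* loop 1: 150 iterations */
        c.first++;
    for (; i < 200; i++)    /* loop 2: 50 iterations */
        c.second++;
    return c;
}
```

The 150/50 division is what the profile update needs to reflect: the first loop's body count scales by 150/200 and the second loop's by 50/200.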
2023-07-28	ada: Elide the copy in extended returns for nonlimited by-reference types	Eric Botcazou	1	-3/+4
	gcc/ada/ * gcc-interface/trans.cc (gnat_to_gnu): Restrict previous change to the case where the simple return statement has got no storage pool.
2023-07-28	ada: Add an assert in Posix Interrupt_Wait	Clément Chigot	1	-0/+1
	All functions but Interrupt_Wait in s-inmaop__posix are checking the result of their syscalls with an assert. However, any return code of sigwait different than 0 means that something went wrong for it. From sigwait man: > RETURN VALUE > On success, sigwait() returns 0. On error, it returns a > positive error number (listed in ERRORS). gcc/ada/ * libgnarl/s-inmaop__posix.adb: Add assert after sigwait in Interrupt_Wait
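A C analogue of the check is sketched below, assuming a POSIX system. It is not the Ada code from `s-inmaop__posix.adb`, and the function name `wait_for_usr1` is hypothetical. The signal is blocked and raised first so that `sigwait` returns immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Blocks SIGUSR1, makes it pending, then waits for it.
   Returns the signal number delivered through sigwait. */
int wait_for_usr1(void)
{
    sigset_t set;
    int sig = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, NULL);   /* sigwait requires the signal blocked */
    raise(SIGUSR1);                       /* now pending, not delivered */

    int rc = sigwait(&set, &sig);
    /* On success sigwait returns 0; any other value is an error number. */
    assert(rc == 0);
    return sig;
}
```

Because `sigwait` reports failure through its return value rather than `errno`, the assertion is placed directly on `rc`.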
2023-07-28	ada: Fix unsupported dispatching constructor call	Javier Miranda	6	-151/+418
	Add dummy build-in-place parameters when a BIP function does not require the BIP parameters but it is a dispatching operation that inherited them. gcc/ada/ * einfo-utils.adb (Underlying_Type): Protect recursion call against non-available attribute Etype. * einfo.ads (Protected_Subprogram): Fix typo in documentation. * exp_ch3.adb (BIP_Function_Call_Id): New subprogram. (Expand_N_Object_Declaration): Improve code that evaluates if the object is initialized with a BIP function call. * exp_ch6.adb (Is_True_Build_In_Place_Function_Call): New subprogram. (Add_Task_Actuals_To_Build_In_Place_Call): Add dummy actuals if the function does not require the BIP task actuals but it is a dispatching operation that inherited them. (Build_In_Place_Formal): Improve code to avoid never-ending loop if the BIP formal is not found. (Add_Dummy_Build_In_Place_Actuals): New subprogram. (Expand_Call_Helper): Add calls to Add_Dummy_Build_In_Place_Actuals. (Expand_N_Extended_Return_Statement): Adjust assertion. (Expand_Simple_Function_Return): Adjust assertion. (Make_Build_In_Place_Call_In_Allocator): No action needed if the called function inherited the BIP extra formals but it is not a true BIP function. (Make_Build_In_Place_Call_In_Assignment): Ditto. * exp_intr.adb (Expand_Dispatching_Constructor_Call): Remove code reporting unsupported case (since this patch adds support for it). * sem_ch6.adb (Analyze_Subprogram_Body_Helper): Adding assertion to ensure matching of BIP formals when setting the Protected_Formal field of a protected subprogram to reference the corresponding extra formal of the subprogram that implements it. (Might_Need_BIP_Task_Actuals): New subprogram. (Create_Extra_Formals): Improve code adding inherited extra formals.
2023-07-28	ada: Add support for binding to a specific network interface controller.	Pascal Obry	3	-2/+34
	gcc/ada/ * s-oscons-tmplt.c: Add support for SO_BINDTODEVICE constant. * libgnat/g-socket.ads (Set_Socket_Option): Handle SO_BINDTODEVICE option. (Get_Socket_Option): Handle SO_BINDTODEVICE option. * libgnat/g-socket.adb: Likewise. (Get_Socket_Option): Handle the case where IF_NAMESIZE is not defined and so equal to -1.
2023-07-28	ada: Add missing SCO generation for quantified expressions in object decl	Léo Creuse	1	-2/+4
	This change corrects the Has_Decision predicate in par_sco.adb to properly consider predicates of quantified expressions as decisions. gcc/ada/ * par_sco.adb (Has_Decision): Consider that quantified expressions contain decisions.
2023-07-28	ada: Fix race condition in protected entry call	Ronan Desplanques	1	-2/+8
	This patch only affects the single-entry implementation of protected objects. Before this patch, there was a race condition where a task that called an entry could put itself to sleep right after another task had executed the entry as a proxy and signalled the not-yet-waiting first task, which caused the first task to enter a deadlock. Note that this race condition has been identified and fixed before for the implementations of the run-time that live under hie/. This patch reworks the locking sequence so that it is closer to the one that's used in the multiple-entry implementation of protected objects. The code for the multiple-entry implementation is spread across multiple subprograms. To draw a parallel with the section this patch modifies, one can read the following subprograms: - System.Tasking.Protected_Objects.Operations.Protected_Entry_Call - System.Tasking.Entry_Calls.Wait_For_Completion - System.Tasking.Entry_Calls.Check_Pending_Actions_For_Entry_Call This patch also adds a comment that explicitly states the locking constraint that must hold in the affected section. gcc/ada/ * libgnarl/s-tposen.adb: Fix race condition. Add comment to justify the locking timing.
2023-07-28	ada: Small refactor	Viljar Indus	1	-2/+3
	gcc/ada/ * exp_util.adb (Find_Optional_Prim_Op): use "No" instead of "= Empty"
2023-07-28	ada: Add guard for detection of class-wide precondition subprograms	Piotr Trojanek	1	-1/+4
	When skipping check on subprograms built for class-wide preconditions we must deal with the current scope not being a subprogram, e.g. it could be a declare-block. gcc/ada/ * sem_res.adb (Resolve_Actuals): Add guard for the call to Class_Preconditions_Subprogram.
2023-07-28	ada: Fix memory explosion on aggregate of nested packed array type	Eric Botcazou	1	-1/+3
	It occurs at compile time on an aggregate of a 2-dimensional packed array type whose component type is itself a packed array, because the compiler is trying to pack the intermediate aggregate and ends up rewriting a bunch of subcomponents. This optimization was originally devised for the case of a scalar component type so the change adds this restriction. gcc/ada/ * exp_aggr.adb (Is_Two_Dim_Packed_Array): Return true only if the component type of the array is scalar.
2023-07-28	ada: Leave detection of missing return in functions to GNATprove	Piotr Trojanek	1	-9/+2
	GNAT has a heuristic to warn about missing return statements in functions. This warning was escalated to errors when operating in GNATprove mode and SPARK_Mode was On. However, this heuristic was imprecise and caused spurious errors. Also, it was applied after the Push_Scope/End_Scope, so for functions acting as compilation units it was using the wrong SPARK_Mode. It is better to simply leave this detection to GNATprove. gcc/ada/ * sem_ch6.adb (Check_Statement_Sequence): Only warn about missing return statements and let GNATprove emit a check when needed.
2023-07-28	ada: Emit enums rather than defines for various constants	Tom Tromey	5	-44/+69
	This patch changes xsnamest and gen_il-gen to emit various constants as enums rather than a sequence of preprocessor defines. This enables better debugging and somewhat better type safety. gcc/ada/ * fe.h (Convention): Now inline function. * gen_il-gen.adb (Put_C_Type_And_Subtypes.Put_Enum_Lit) (Put_C_Type_And_Subtypes.Put_Kind_Subtype, Put_C_Getter): Emit enum. * snames.h-tmpl (Name_Id, Name_, Attribute_Id, Attribute_) (Convention_Id, Convention_, Pragma_Id, Pragma_): Now enum. (Get_Attribute_Id, Get_Pragma_Id): Now inline functions. * types.h (Node_Kind, Entity_Kind, Convention_Id, Name_Id): Now enum. * xsnamest.adb (Output_Header_Line, Make_Value): Emit enum.
2023-07-28	ada: Fix typo in comment of Ada.Exceptions.Save_Occurrence	Piotr Trojanek	1	-1/+1
	Minor typo in comment. gcc/ada/ * libgnat/a-except.ads (Save_Occurrence): Fix typo.
2023-07-28	ada: Allow calls to Number_Formals when no formals are present	Piotr Trojanek	4	-5/+9
	It is much simpler and safer for the routine Number_Formals to accept subprogram entities that have no formals. gcc/ada/ * einfo-utils.adb (Number_Formals): Change types in body. * einfo-utils.ads (Number_Formals): Change type in spec. * einfo.ads (Number_Formals): Change type in comment. * sem_ch13.adb (Is_Property_Function): Fix style in a caller of Number_Formals that was likely to crash because of missing guards.
2023-07-28	ada: Improve defense against illegal code in check for infinite loops	Piotr Trojanek	1	-1/+3
	Fix crash occurring when attribute System'To_Address is used without a WITH clause for package System. gcc/ada/ * sem_warn.adb (Check_Infinite_Loop_Warning): Don't look at the type of actual parameter when it has no type at all, e.g. because the entire subprogram call is illegal.
2023-07-28	RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]	xuli	37	-152/+233
	The computation of `vsadd`, `vsaddu`, `vssub`, and `vssubu` does not need the rounding mode; therefore the intrinsics of these instructions do not have the parameter for rounding mode control. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Remove rounding mode of vsadd[u] and vssub[u]. * config/riscv/vector.md: Ditto. gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/base/bug-12.C: Adapt testcase. * g++.target/riscv/rvv/base/bug-14.C: Ditto. * g++.target/riscv/rvv/base/bug-18.C: Ditto. * g++.target/riscv/rvv/base/bug-19.C: Ditto. * g++.target/riscv/rvv/base/bug-20.C: Ditto. * g++.target/riscv/rvv/base/bug-21.C: Ditto. * g++.target/riscv/rvv/base/bug-22.C: Ditto. * g++.target/riscv/rvv/base/bug-23.C: Ditto. * g++.target/riscv/rvv/base/bug-3.C: Ditto. * g++.target/riscv/rvv/base/bug-8.C: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto. * gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto. * gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto. * gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test. * gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.
2023-07-28	loop-split improvements, part 1	Jan Hubicka	4	-6/+13
	While looking into a profile misupdate on hmmer, I noticed that the loop splitting pass is not able to handle the very loop it gives as an example of what it should apply to. The pass is meant to transform loops like: for (i = 0; i < 100; i++) { if (i < 50) A; else B; } into: for (i = 0; i < 50; i++) { A; } for (; i < 100; i++) { B; } The problem is that ivcanon turns the test into i != 100 and the pass explicitly gives up on any loop ending with a != test. It needs to know the direction of the induction variable in order to derive the right conditions, but that can also be derived from the step. It turns out that there are no testcases for basic loop splitting. I will add some with the profile update fix. gcc/ChangeLog: * tree-ssa-loop-split.cc (split_loop): Also support NE driven loops when IV test is not overflowing. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ifc-12.c: Disable loop splitting. * gcc.target/i386/avx2-gather-6.c: Likewise. * gcc.target/i386/avx2-vect-aggressive.c: Likewise.
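The loop shape discussed in the loop-split commit above can be illustrated with a small C sketch. The function names `sum_unsplit` and `sum_split`, and the arithmetic standing in for A and B, are hypothetical and chosen only for illustration; this is not code from the patch or from tree-ssa-loop-split.cc.

```c
/* Original form: the branch on i is evaluated in every iteration.
   The exit test is written as i != 100, the form the commit message
   says ivcanon produces.  The step is +1, so the induction variable
   is known to count upwards even though the test is "!=".  */
long sum_unsplit(void)
{
    long acc = 0;
    for (int i = 0; i != 100; i++) {
        if (i < 50)
            acc += i;        /* stands in for A (hypothetical body) */
        else
            acc -= 2 * i;    /* stands in for B (hypothetical body) */
    }
    return acc;
}

/* Split form: two loops, each without the inner branch.  The second
   loop continues from the value of i left by the first one.  */
long sum_split(void)
{
    long acc = 0;
    int i;
    for (i = 0; i < 50; i++)
        acc += i;            /* A only */
    for (; i < 100; i++)
        acc -= 2 * i;        /* B only */
    return acc;
}
```

Both functions compute the same value; the split form merely shows by hand what the pass aims to do automatically. Whether a given compiler version actually splits `sum_unsplit` depends on the optimization options in effect.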
2023-07-28	Add UNSPEC_MASKOP to vpbroadcastm pattern.	liuhongt	2	-2/+17
	Prevent rtl optimization of vec_duplicate + zero_extend to vpbroadcastm since there could be an extra kmov after RA. gcc/ChangeLog: PR target/110788 * config/i386/sse.md (avx512cd_maskb_vec_dup<mode>): Add UNSPEC_MASKOP. (avx512cd_maskw_vec_dup<mode>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110788.c: New test.
2023-07-28	Daily bump.	GCC Administrator	6	-1/+279

2023-07-27	bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]	David Faust	8	-1/+133
	BPF ISA V4 introduces sign-extending move and load operations. This patch makes the BPF backend generate those instructions when they are enabled and useful. A new option, -m[no-]smov, gates generation of these instructions and is enabled by default for -mcpu=v4 and above. Tests for the new instructions and documentation for the new options are included. PR target/110782 PR target/110784 gcc/ * config/bpf/bpf.opt (msmov): New option. * config/bpf/bpf.cc (bpf_option_override): Handle it here. * config/bpf/bpf.md (extendsidi2): New. (extendhidi2): New. (extendqidi2): New. (extendsisi2): New. (extendhisi2): New. (extendqisi2): New. * doc/invoke.texi (Option Summary): Add -msmov eBPF option. (eBPF Options): Add -m[no-]smov. Document that -mcpu=v4 also enables -msmov. gcc/testsuite/ * gcc.target/bpf/sload-1.c: New test. * gcc.target/bpf/sload-pseudoc-1.c: New test. * gcc.target/bpf/smov-1.c: New test. * gcc.target/bpf/smov-pseudoc-1.c: New test.
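As a rough illustration of the source-level operations that sign-extending moves and loads implement, the following C sketch widens narrow signed values to 64 bits. The function names are hypothetical and are not taken from the new tests. Whether a particular build of the BPF backend emits the new instructions for these functions depends on -mcpu and -msmov, and that is not verified here.

```c
#include <stdint.h>

/* Register-to-register widening: the sign of the narrow value must be
   preserved in the wider result (a sign-extending move).  */
int64_t widen_s32(int32_t x) { return (int64_t) x; }
int64_t widen_s16(int16_t x) { return (int64_t) x; }
int64_t widen_s8(int8_t x)   { return (int64_t) x; }

/* Memory-to-register widening: a narrow signed value is read from
   memory and widened in one step (a sign-extending load).  */
int64_t load_s32(const int32_t *p) { return (int64_t) *p; }
```

In each case a negative input stays negative after widening, for example `widen_s8(-128)` yields -128 as a 64-bit value.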
2023-07-27	bpf: minor doc cleanup for command-line options	David Faust	1	-25/+23
	This patch makes some minor cleanups to eBPF options documented in invoke.texi: - Delete some vestigial docs for removed -mkernel option - Add -mbswap and -msdiv to the option summary - Note the negative versions of several options - Note that -mcpu=v4 also enables -msdiv. gcc/ * doc/invoke.texi (Option Summary): Remove -mkernel eBPF option. Add -mbswap and -msdiv eBPF options. (eBPF Options): Remove -mkernel. Add -mno-{jmpext, jmp32, alu32, v3-atomics, bswap, sdiv}. Document that -mcpu=v4 also enables -msdiv.