Age | Commit message | Author | Files | Lines
2023-10-12 | dwarf2out: Stop using wide_int in GC structures | Jakub Jelinek | 2 | -17/+55
The planned wide_int/widest_int changes to support larger precisions make wide_int and widest_int unusable in GC structures, because it has non-trivial destructors (and may point to heap allocated memory). dwarf2out.{h,cc} is the only user of wide_int in GC structures for val_wide, but actually doesn't really need much, all those are at one point created from const wide_int_ref & and never changed afterwards, with just a couple of methods used on it. So, this patch replaces use of wide_int there with a new class, dw_wide_int, which contains just precision, len field and the limbs in trailing array. Most needed methods are implemented directly, just for the most complicated cases it temporarily constructs a wide_int_ref from it and calls its methods. 2023-10-12 Jakub Jelinek <jakub@redhat.com> * dwarf2out.h (wide_int_ptr): Remove. (dw_wide_int_ptr): New typedef. (struct dw_val_node): Change type of val_wide from wide_int_ptr to dw_wide_int_ptr. (struct dw_wide_int): New type. (dw_wide_int::elt): New method. (dw_wide_int::operator ==): Likewise. * dwarf2out.cc (get_full_len): Change argument type to const dw_wide_int & from const wide_int &. Use CEIL. Call get_precision method instead of calling wi::get_precision. (alloc_dw_wide_int): New function. (add_AT_wide): Change w argument type to const wide_int_ref & from const wide_int &. Use alloc_dw_wide_int. (mem_loc_descriptor, loc_descriptor): Use alloc_dw_wide_int. (insert_wide_int): Change val argument type to const wide_int_ref & from const wide_int &. (add_const_value_attribute): Pass rtx_mode_t temporary directly to add_AT_wide instead of using a temporary variable.
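The layout described above (precision, a length field, and the limbs in a trailing array) can be sketched in plain C. This is only an illustration under stated assumptions: the names wide_val, wide_val_alloc, wide_val_elt and wide_val_equal are hypothetical and are not the dwarf2out API, malloc stands in for GC allocation, and the rule that limbs past len read as the sign extension of the top limb is assumed from the usual wide_int convention.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of the new type: no destructor and no pointer
   to separately allocated memory, so it is one plain block.  */
struct wide_val
{
  unsigned int precision;  /* width of the value in bits */
  unsigned int len;        /* number of limbs actually stored */
  int64_t val[];           /* trailing array, least significant limb first */
};

/* Allocate a value and copy LEN limbs into the trailing array.  */
struct wide_val *
wide_val_alloc (unsigned int precision, const int64_t *limbs, unsigned int len)
{
  assert (len > 0);
  struct wide_val *w = malloc (sizeof *w + len * sizeof w->val[0]);
  if (w == NULL)
    return NULL;
  w->precision = precision;
  w->len = len;
  memcpy (w->val, limbs, len * sizeof w->val[0]);
  return w;
}

/* Return limb I; limbs past LEN are assumed to be the sign extension
   of the most significant stored limb.  */
int64_t
wide_val_elt (const struct wide_val *w, unsigned int i)
{
  if (i < w->len)
    return w->val[i];
  return w->val[w->len - 1] < 0 ? -1 : 0;
}

/* Two values compare equal when precision, length and all limbs match.  */
int
wide_val_equal (const struct wide_val *a, const struct wide_val *b)
{
  return a->precision == b->precision
         && a->len == b->len
         && memcmp (a->val, b->val, a->len * sizeof a->val[0]) == 0;
}
```

Because the limbs live inside the same allocation as the header, copying or freeing the value never has to follow a pointer, which is the property the commit message asks for.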
2023-10-12 | tree-optimization/111764 - wrong reduction vectorization | Richard Biener | 2 | -12/+19
The following removes a misguided attempt to allow x + x in a reduction path, also allowing x * x which isn't valid. x + x actually never arrives this way but instead is canonicalized to 2 * x. This makes reduction path handling consistent with how we handle the single-stmt reduction case. PR tree-optimization/111764 * tree-vect-loop.cc (check_reduction_path): Remove the attempt to allow x + x via special-casing of assigns. * gcc.dg/vect/pr111764.c: New testcase.
2023-10-12 | Support Intel USER_MSR | Hu, Lin1 | 28 | -10/+288
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect USER_MSR. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_USER_MSR_SET): New. (OPTION_MASK_ISA2_USER_MSR_UNSET): Ditto. (ix86_handle_option): Handle -musermsr. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_USER_MSR. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for usermsr. * config.gcc: Add usermsrintrin.h * config/i386/cpuid.h (bit_USER_MSR): New. * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE (VOID, UINT64, UINT64). * config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins): Add __builtin_urdmsr and __builtin_uwrmsr. * config/i386/i386-builtins.h (ix86_builtins): Add IX86_BUILTIN_URDMSR and IX86_BUILTIN_UWRMSR. * config/i386/i386-c.cc (ix86_target_macros_internal): Define __USER_MSR__. * config/i386/i386-expand.cc (ix86_expand_builtin): Handle new builtins. * config/i386/i386-isa.def (USER_MSR): Add DEF_PTA(USER_MSR). * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Handle usermsr. * config/i386/i386.md (urdmsr): New define_insn. (uwrmsr): Ditto. * config/i386/i386.opt: Add option -musermsr. * config/i386/x86gprintrin.h: Include usermsrintrin.h * doc/extend.texi: Document usermsr. * doc/invoke.texi: Document -musermsr. * doc/sourcebuild.texi: Document target usermsr. * config/i386/usermsrintrin.h: New file. gcc/testsuite/ChangeLog: * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/x86gprintrin-1.c: Add -musermsr for 64bit target. * gcc.target/i386/x86gprintrin-2.c: Ditto. * gcc.target/i386/x86gprintrin-3.c: Ditto. * gcc.target/i386/x86gprintrin-4.c: Add musermsr for 64bit target. * gcc.target/i386/x86gprintrin-5.c: Ditto * gcc.target/i386/user_msr-1.c: New test. * gcc.target/i386/user_msr-2.c: Ditto.
2023-10-12 | LoongArch: Modify check_effective_target_vect_int_mod according to SX/ASX capabilities. | Chenghui Pan | 1 | -0/+18
gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add LoongArch in check_effective_target_vect_int_mod according to SX/ASX capabilities.
2023-10-12 | LoongArch: Enable vect.exp for LoongArch. [PR111424] | Chenghui Pan | 1 | -0/+31
gcc/testsuite/ChangeLog: PR target/111424 * lib/target-supports.exp: Enable vect.exp for LoongArch.
2023-10-12 | LoongArch: Adjust makefile dependency for loongarch headers. | Yang Yujie | 3 | -6/+4
gcc/ChangeLog: * config.gcc: Add loongarch-driver.h to tm_files. * config/loongarch/loongarch.h: Do not include loongarch-driver.h. * config/loongarch/t-loongarch: Append loongarch-multilib.h to $(GTM_H) instead of $(TM_H) for building generator programs.
2023-10-12 | Fortran: Set hidden string length for pointer components [PR67740]. | Paul Thomas | 2 | -4/+61
2023-10-11 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/67740 * trans-expr.cc (gfc_trans_pointer_assignment): Set the hidden string length component for pointer assignment to character pointer components. gcc/testsuite/ PR fortran/67740 * gfortran.dg/pr67740.f90: New test
2023-10-12 | rs6000: Make 32 bit stack_protect support prefixed insn [PR111367] | Kewen Lin | 2 | -46/+49
As PR111367 shows, with prefixed insns supported, some checks assume that prefixed insns can be used for stack-protect related loads/stores, but since we don't actually change the emitted assembly for 32 bit, this can cause the assembler error exposed there. Mike's commit r10-4547-gce6a6c007e5a98 has already handled the 64 bit case (DImode); this patch handles the 32 bit case (SImode) by making use of mode iterator P and the ptrload attribute iterator, and also fixes the constraints to match the emitted operand formats. PR target/111367 gcc/ChangeLog: * config/rs6000/rs6000.md (stack_protect_setsi): Support prefixed instruction emission and incorporate to stack_protect_set<mode>. (stack_protect_setdi): Rename to ... (stack_protect_set<mode>): ... this, adjust constraint. (stack_protect_testsi): Support prefixed instruction emission and incorporate to stack_protect_test<mode>. (stack_protect_testdi): Rename to ... (stack_protect_test<mode>): ... this, adjust constraint. gcc/testsuite/ChangeLog: * g++.target/powerpc/pr111367.C: New test.
2023-10-12 | testsuite: Avoid uninit var in pr60510.f [PR111427] | Kewen Lin | 1 | -0/+1
The uninitialized variable a in pr60510.f can cause some random failures as exposed in PR111427. This patch is to make it initialized accordingly. PR testsuite/111427 gcc/testsuite/ChangeLog: * gfortran.dg/vect/pr60510.f (test): Init variable a.
2023-10-12 | vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE | Kewen Lin | 2 | -27/+65
For VMAT_CONTIGUOUS_REVERSE, the transform code in function vectorizable_store generates a VEC_PERM_EXPR stmt before storing, but it's never considered in costing. This patch is to make it consider vec_perm in costing, it adjusts the order of transform code a bit to make it easy to early return for costing_p. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_store): Consider generated VEC_PERM_EXPR stmt for VMAT_CONTIGUOUS_REVERSE in costing as vec_perm. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c: New test.
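To see where the extra permute comes from, here is a hand-written model of a reverse store, assuming a 4-lane vector and an element count that is a multiple of 4. The function names and LANES are illustrative only; this is not vectorizer output, just a sketch of the shape of the transform (compute, reverse the lanes, then one contiguous store).

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed vector width in elements */

/* Scalar reference: the store address decreases as I increases.  */
void
store_reverse_scalar (int *dst, const int *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[n - 1 - i] = src[i] + 1;
}

/* Model of the vector form for N a multiple of LANES.  */
void
store_reverse_blocked (int *dst, const int *src, size_t n)
{
  assert (n % LANES == 0);
  for (size_t i = 0; i < n; i += LANES)
    {
      int vec[LANES], rev[LANES];
      for (int l = 0; l < LANES; l++)   /* vector compute */
        vec[l] = src[i + l] + 1;
      for (int l = 0; l < LANES; l++)   /* lane reversal: the permute */
        rev[l] = vec[LANES - 1 - l];
      for (int l = 0; l < LANES; l++)   /* one contiguous vector store */
        dst[n - LANES - i + l] = rev[l];
    }
}
```

Each iteration of the blocked loop performs one lane reversal in addition to the store, which is the operation the commit says was previously left out of the cost.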
2023-10-12 | vect: Get rid of vect_model_store_cost | Kewen Lin | 1 | -93/+44
This patch finally gets rid of vect_model_store_cost. It adjusts the costing for the remaining memory access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE} by moving the costing close to the transform code. Note that vect_model_store_cost has one special case for vectorizing a store into the function result; since that is an extra penalty with no counterpart in the transform part, this patch leaves it as is. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Remove. (vectorizable_store): Adjust the costing for the remaining memory access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE}.
2023-10-12 | vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE | Kewen Lin | 1 | -54/+74
This patch adjusts the cost handling on VMAT_CONTIGUOUS_PERMUTE in function vectorizable_store. We no longer call vect_model_store_cost for it. This is the interleaving-store case, so it skips all stmts except first_stmt_info and considers the whole group when costing first_stmt_info. This patch shouldn't have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Assert it will never get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE related handlings. (vectorizable_store): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost.
2023-10-12 | vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES | Kewen Lin | 1 | -35/+75
This patch adjusts the cost handling on VMAT_LOAD_STORE_LANES in function vectorizable_store. We no longer call vect_model_store_cost for it. This is the interleaving-store case, so it skips all stmts except first_stmt_info and considers the whole group when costing first_stmt_info. This patch shouldn't have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Assert it will never get VMAT_LOAD_STORE_LANES. (vectorizable_store): Adjust the cost handling on VMAT_LOAD_STORE_LANES without calling vect_model_store_cost. Factor out new lambda function update_prologue_cost.
2023-10-12 | vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP | Kewen Lin | 2 | -63/+120
This patch adjusts the cost handling on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP in function vectorizable_store. We no longer call vect_model_store_cost for them. As with the PR82255 improvement on the load side, this change gets rid of unnecessary vec_to_scalar costing in some VMAT_STRIDED_SLP cases; a typical test case, gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c, is added. It also helps some cases with inconsistent costing. Besides, this special-cases interleaving stores for these two memory access types: for interleaving stores the whole chain is vectorized when the last store in the chain is reached, and the other stores in the group are skipped. To stay consistent with this and to follow the transform handling, which iterates over the whole group, it only costs the first store in the group. Ideally we would cost only the last one, but that is not trivial, and using the first one is actually equivalent. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their related handlings. (vectorizable_store): Adjust the cost handling on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling vect_model_store_cost. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c: New test.
2023-10-12 | vect: Simplify costing on vectorizable_scan_store | Kewen Lin | 1 | -3/+15
This patch simplifies the costing for the vectorizable_scan_store case so that it no longer calls vect_model_store_cost. I considered moving the costing into vectorizable_scan_store, but that would require passing down several variables that are only used for costing. For now we just want to keep the costing as it was and haven't tried to make it consistent with what the transform does, so I think we can leave it for now. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_store): Adjust costing on vectorizable_scan_store without calling vect_model_store_cost any more.
2023-10-12 | vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER | Kewen Lin | 1 | -70/+118
This patch adjusts the cost handling on VMAT_GATHER_SCATTER in function vectorizable_store (all three cases), then we won't depend on vect_model_load_store for its costing any more. This patch shouldn't have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get VMAT_GATHER_SCATTER any more, remove VMAT_GATHER_SCATTER related handlings and the related parameter gs_info. (vect_build_scatter_store_calls): Add the handlings on costing with one more argument cost_vec. (vectorizable_store): Adjust the cost handling on VMAT_GATHER_SCATTER without calling vect_model_store_cost any more.
2023-10-12 | vect: Move vect_model_store_cost next to the transform in vectorizable_store | Kewen Lin | 1 | -19/+60
This is an initial patch to move costing next to the transform. It still uses vect_model_store_cost for costing, but moves and duplicates the call down according to the handling of the different vect_memory_access_types or special-case needs, which should make the subsequent patches easier to review. This patch should not have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_store): Move and duplicate the call to vect_model_store_cost down to some different transform paths according to the handlings of different vect_memory_access_types or some special handling need.
2023-10-12 | vect: Ensure vect store is supported for some VMAT_ELEMENTWISE case | Kewen Lin | 1 | -4/+12
When making and testing patches to move costing next to the transform code for vectorizable_store, some ICEs were exposed when I further refined the costing of VMAT_ELEMENTWISE. The apparent cause is triggering the assertion in the rs6000-specific costing function rs6000_builtin_vectorization_cost:

    if (TARGET_ALTIVEC)
      /* Misaligned stores are not supported.  */
      gcc_unreachable ();

I used vect_get_store_cost instead of the original record_stmt_cost with scalar_store for costing, that is, one unaligned_store instead. This matches what we use in transforming, which is a vector store as below:

    else if (group_size >= const_nunits
             && group_size % const_nunits == 0)
      {
        nstores = 1;
        lnel = const_nunits;
        ltype = vectype;
        lvectype = vectype;
      }

So IMHO costing it as a vector store is more consistent than costing it as a scalar store. With the given compilation option -mno-allow-movmisalign, a misaligned vector store is not expected to be used in the vectorizer, so why is it still adopted? In the current implementation of get_group_load_store_type, we always set the alignment support scheme to dr_unaligned_supported for VMAT_ELEMENTWISE. That is correct if we always adopt scalar stores, but as the above code shows, we could use vector stores in some cases, so we should use the correct alignment support scheme for them. This patch ensures the vector store is supported by further checking with vect_supportable_dr_alignment.

The ICEs were exposed by the patches moving costing next to the transform, which haven't landed yet; the test coverage will be there once they land. The affected test cases are:
- gcc.dg/vect/slp-45.c
- gcc.dg/vect/vect-alias-check-{10,11,12}.c

By the way, I tried to write a correctness test case, but I realized that -mno-allow-movmisalign mainly affects the movmisalign optab and doesn't guard the actual hardware vector memory access insns, so I couldn't construct one without also altering some of their conditions.
gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_store): Ensure the generated vector store for some case of VMAT_ELEMENTWISE is supported.
2023-10-12 | x86: set spincount 1 for x86 hybrid platform | Zhang, Jun | 3 | -1/+87
By test, we find that spincount 1 is better on hybrid platforms. Using '-march=native -Ofast -funroll-loops -flto', the results are as follows:

spec2017 speed        RPL        ADL
657.xz_s            0.00%      0.50%
603.bwaves_s       10.90%     26.20%
607.cactuBSSN_s     5.50%     72.50%
619.lbm_s           2.40%      2.50%
621.wrf_s          -7.70%      2.40%
627.cam4_s          0.50%      0.70%
628.pop2_s         48.20%    153.00%
638.imagick_s      -0.10%      0.20%
644.nab_s           2.30%      1.40%
649.fotonik3d_s     8.00%     13.80%
654.roms_s          1.20%      1.10%
Geomean-int         0.00%      0.50%
Geomean-fp          6.30%     21.10%
Geomean-all         5.70%     19.10%

omp2012               RPL        ADL
350.md             -1.81%     -1.75%
351.bwaves          7.72%     12.50%
352.nab            14.63%     19.71%
357.bt331          -0.20%      1.77%
358.botsalgn        0.00%      0.00%
359.botsspar        0.00%      0.65%
360.ilbdc           0.00%      0.25%
362.fma3d           2.66%     -0.51%
363.swim           10.44%      0.00%
367.imagick         0.00%      0.12%
370.mgrid331        2.49%     25.56%
371.applu331        1.06%      4.22%
372.smithwa         0.74%      3.34%
376.kdtree         10.67%     16.03%
GEOMEAN             3.34%      5.53%

include/ChangeLog: PR target/109812 * spincount.h: New file. libgomp/ChangeLog: * env.c (initialize_env): Use do_adjust_default_spincount. * config/linux/x86/spincount.h: New file.
2023-10-12 | RISC-V: Support FP llrint auto vectorization | Pan Li | 4 | -0/+109
This patch supports FP llrint auto-vectorization. * long long llrint (double) This is the CVT from DF => DI from the standard name's perspective, which has been covered by previous patches. Thus, this patch only adds some test cases. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/test-math.h: Add type int64_t. * gcc.target/riscv/rvv/autovec/unop/math-llrint-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-llrint-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-llrint-0.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-12 | [APX] Support Intel APX PUSH2POP2 | Mo, Zewei | 5 | -15/+365
This feature requires the stack to be aligned to 16 bytes. Therefore, in the prologue/epilogue, a standalone push/pop is emitted before any push2/pop2 if the stack is not 16-byte aligned. Also, the current implementation only supports push2/pop2 in the function prologue/epilogue for callee-saved registers. gcc/ChangeLog: * config/i386/i386.cc (gen_push2): New function to emit push2 and adjust cfa offset. (ix86_pro_and_epilogue_can_use_push2_pop2): New function to determine whether push2/pop2 can be used. (ix86_compute_frame_layout): Adjust preferred stack boundary and stack alignment needed for push2/pop2. (ix86_emit_save_regs): Emit push2 when available. (ix86_emit_restore_reg_using_pop2): New function to emit pop2 and adjust cfa info. (ix86_emit_restore_regs_using_pop2): New function to loop through the saved regs and call above. (ix86_expand_epilogue): Call ix86_emit_restore_regs_using_pop2 when push2pop2 available. * config/i386/i386.md (push2_di): New pattern for push2. (pop2_di): Likewise for pop2. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-push2pop2-1.c: New test. * gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise. * gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise. Co-authored-by: Hu Lin1 <lin1.hu@intel.com> Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
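The alignment rule above can be illustrated with a small planning function. This is a rough sketch, not the logic in i386.cc: save_plan and plan_register_saves are hypothetical names, and treating a leftover odd register as one more single push is an assumption the commit message does not state.

```c
#include <assert.h>

/* Hypothetical summary of how a set of register saves is emitted.  */
struct save_plan
{
  unsigned single_pushes;  /* standalone push instructions */
  unsigned push2_pairs;    /* push2 instructions, two registers each */
};

/* Plan the saves for NREGS callee-saved registers.  STACK_ALIGNED_16 is
   nonzero when the stack pointer is already 16-byte aligned.  */
struct save_plan
plan_register_saves (unsigned nregs, int stack_aligned_16)
{
  struct save_plan plan = { 0, 0 };

  /* One standalone 8-byte push brings a misaligned stack to 16 bytes.  */
  if (nregs > 0 && !stack_aligned_16)
    {
      plan.single_pushes++;
      nregs--;
    }

  /* Pair up the rest; an odd register left over is pushed on its own
     (an assumption made for this sketch).  */
  plan.push2_pairs = nregs / 2;
  plan.single_pushes += nregs % 2;
  return plan;
}
```

With five registers and a misaligned stack, the sketch yields one standalone push followed by two push2 pairs.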
2023-10-12 | RISC-V: Support FP irintf auto vectorization | Pan Li | 5 | -41/+149
This patch supports FP irintf auto-vectorization.

* int irintf (float)

Because the vectorizer only allows data types of the same size, the standard name lrintmn2 only acts on SF => SI. Given code like:

    void
    test_irintf (int *out, float *in, unsigned count)
    {
      for (unsigned i = 0; i < count; i++)
        out[i] = __builtin_irintf (in[i]);
    }

Before this patch:

    .L3:
      ...
      flw          fa5,0(a1)
      fcvt.w.s     a5,fa5,dyn
      sw           a5,-4(a0)
      ...
      bne          a1,a4,.L3

After this patch:

    .L3:
      ...
      vle32.v      v1,0(a1)
      vfcvt.x.f.v  v1,v1
      vse32.v      v1,0(a0)
      ...
      bne          a2,zero,.L3

The remaining cases, such as DF => SI and HF => SI, will be covered by the hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog: * config/riscv/autovec.md (lrint<mode><vlconvert>2): Rename from. (lrint<mode><v_i_l_ll_convert>2): Rename to. * config/riscv/vector-iterators.md: Rename and remove TARGET_64BIT. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-irint-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-irint-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-irint-0.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-12 | Daily bump. | GCC Administrator | 5 | -1/+264
2023-10-11 | RISC-V: Add TARGET_MIN_VLEN_OPTS to fix the build | Kito Cheng | 1 | -0/+6
gcc/ChangeLog: * config/riscv/riscv-opts.h (TARGET_MIN_VLEN_OPTS): New.
2023-10-11 | RISC-V Adjust long unconditional branch sequence | Jeff Law | 1 | -1/+1
Andrew and I independently noted the long unconditional branch sequence was using the "call" pseudo op. Technically it works, but it's a bit odd. This patch flips it to use the "jump" pseudo-op. This was tested with a hacked-up local compiler which forced all branches/jumps to be long jumps. Naturally it triggered some failures for scan-asm tests but no execution regressions (which is mostly what I was testing for). I've updated the long branch support item in the RISE wiki to indicate that we eventually want a register scavenging approach with a fallback to $ra in the future so that we don't muck up the return address predictors. It's not super-high priority and shouldn't be terrible to implement given we've got the $ra fallback when a suitable register can not be found. gcc/ * config/riscv/riscv.md (jump): Adjust sequence to use a "jump" pseudo op instead of a "call" pseudo op.
2023-10-11 | RISC-V: Extend riscv_subset_list, preparatory for target attribute support | Kito Cheng | 2 | -0/+220
Previously, riscv_subset_list only accepted a full arch string, but we need to parse a single extension when supporting the target attribute. We may also set a riscv_subset_list directly rather than re-parsing the ISA string. gcc/ChangeLog: * config/riscv/riscv-subset.h (riscv_subset_list::parse_single_std_ext): New. (riscv_subset_list::parse_single_multiletter_ext): Ditto. (riscv_subset_list::clone): Ditto. (riscv_subset_list::parse_single_ext): Ditto. (riscv_subset_list::set_loc): Ditto. (riscv_set_arch_by_subset_list): Ditto. * common/config/riscv/riscv-common.cc (riscv_subset_list::parse_single_std_ext): New. (riscv_subset_list::parse_single_multiletter_ext): Ditto. (riscv_subset_list::clone): Ditto. (riscv_subset_list::parse_single_ext): Ditto. (riscv_subset_list::set_loc): Ditto. (riscv_set_arch_by_subset_list): Ditto.
2023-10-11 | RISC-V: Refactor riscv_option_override and riscv_convert_vector_bits. [NFC] | Kito Cheng | 1 | -41/+52
Allow those functions to operate on a local gcc_options rather than the global options. This is preparatory work for the target attribute, separated out for easier review since it is NFC. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_convert_vector_bits): Get setting from argument rather than get setting from global setting. (riscv_override_options_internal): New, split from riscv_override_options, also take a gcc_options argument. (riscv_option_override): Split most part to riscv_override_options_internal.
2023-10-11 | options: Define TARGET_<NAME>_P and TARGET_<NAME>_OPTS_P macro for Mask and InverseMask | Kito Cheng | 2 | -8/+28
We have the TARGET_<NAME>_P macro to test a Mask or InverseMask against a user-specified target_variable; however, we may want to test against a specific gcc_options variable rather than target_variable. For example, RISC-V defines many Masks with TargetVariable, which are not easy to use because we need to know which Mask is associated with which TargetVariable, so taking a gcc_options variable is a better interface for this use case. gcc/ChangeLog: * doc/options.texi (Mask): Document TARGET_<NAME>_P and TARGET_<NAME>_OPTS_P. (InverseMask): Ditto. * opth-gen.awk (Mask): Generate TARGET_<NAME>_P and TARGET_<NAME>_OPTS_P macro. (InverseMask): Ditto.
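The difference between the two macro flavours can be sketched as follows. Everything here is an assumption made for illustration: the option name FOO, the mask value, the structure gcc_options_sketch and its field x_target_flags are stand-ins, and the real macro bodies are whatever opth-gen.awk generates.

```c
#include <assert.h>

/* Hypothetical stand-in for the options structure.  */
struct gcc_options_sketch
{
  int x_target_flags;
};

/* Hypothetical mask bit for an option named FOO.  */
#define MASK_FOO (1 << 0)

/* Test the mask against a flags variable supplied by the caller.  */
#define TARGET_FOO_P(flags) (((flags) & MASK_FOO) != 0)

/* Test the mask through an options structure supplied by the caller, so
   the caller need not know which variable holds the bit.  */
#define TARGET_FOO_OPTS_P(opts) (((opts)->x_target_flags & MASK_FOO) != 0)

/* Example use: query a local options structure, not a global one.  */
int
foo_enabled_in (const struct gcc_options_sketch *opts)
{
  return TARGET_FOO_OPTS_P (opts);
}
```

The _OPTS_P form hides the mapping from mask to variable inside the macro, which is the usability point the commit message makes.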
2023-10-11 | MATCH: [PR111282] Simplify `a & (b ^ ~a)` to `a & b` | Andrew Pinski | 3 | -3/+56
While `a & (b ^ ~a)` is optimized to `a & b` on the rtl level, it is always good to optimize this at the gimple level and allows us to match a few extra things including where a is a comparison. Note I had to update/change the testcase and-1.c to avoid matching this case as we can match -2 and 1 as bitwise inversions. PR tree-optimization/111282 gcc/ChangeLog: * match.pd (`a & ~(a ^ b)`, `a & (a == b)`, `a & ((~a) ^ b)`): New patterns. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/and-1.c: Update testcase to avoid matching `~1 & (a ^ 1)` simplification. * gcc.dg/tree-ssa/bitops-6.c: New test.
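The rewrites named above are easy to confirm by brute force. The sketch below checks the two bitwise forms over all pairs of 8-bit values and the comparison form over boolean values; the function name is made up for this example and the code is unrelated to how match.pd expresses the patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if every rewrite agrees with a & b on the tested inputs.  */
int
and_xor_identities_hold (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        uint8_t x = (uint8_t) a, y = (uint8_t) b;
        uint8_t expect = x & y;

        /* a & (b ^ ~a): where a has a 1 bit, ~a is 0, so b ^ ~a is b.  */
        if ((uint8_t) (x & (y ^ (uint8_t) ~x)) != expect)
          return 0;

        /* a & ~(a ^ b): where a has a 1 bit, ~(a ^ b) is b.  */
        if ((uint8_t) (x & (uint8_t) ~(x ^ y)) != expect)
          return 0;
      }

  /* Boolean form: a & (a == b) equals a & b when both are 0 or 1.  */
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      if ((a & (a == b)) != (a & b))
        return 0;

  return 1;
}
```

In each form the bits where a is 0 are masked away by the outer AND, so only the bits where a is 1 matter, and there the inner expression reduces to b.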
2023-10-11 | modula2: Narrow subranges to int or unsigned int if ZTYPE is the base type. | Gaius Mulley | 1 | -15/+54
This patch narrows the subrange base type to INTEGER or CARDINAL providing the range is satisfied. It only does this when the subrange base type is the ZTYPE. gcc/m2/ChangeLog: * gm2-compiler/M2GCCDeclare.mod (DeclareSubrange): Check the base type of the subrange against the ZTYPE and call DeclareSubrangeNarrow if necessary. (DeclareSubrangeNarrow): New procedure function. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-10-11 | [PATCH v4 2/2] RISC-V: Add support for XCValu extension in CV32E40P | Mary Bennett | 24 | -0/+863
Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md Contributors: Mary Bennett <mary.bennett@embecosm.com> Nandni Jamnadas <nandni.jamnadas@embecosm.com> Pietra Ferreira <pietra.ferreira@embecosm.com> Charlie Keaney Jessica Mills Craig Blackmore <craig.blackmore@embecosm.com> Simon Cook <simon.cook@embecosm.com> Jeremy Bennett <jeremy.bennett@embecosm.com> Helene Chelin <helene.chelin@embecosm.com> gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add the XCValu extension. * config/riscv/constraints.md: Add builtins for the XCValu extension. * config/riscv/predicates.md (immediate_register_operand): Likewise. * config/riscv/corev.def: Likewise. * config/riscv/corev.md: Likewise. * config/riscv/riscv-builtins.cc (AVAIL): Likewise. (RISCV_ATYPE_UHI): Likewise. * config/riscv/riscv-ftypes.def: Likewise. * config/riscv/riscv.opt: Likewise. * config/riscv/riscv.cc (riscv_print_operand): Likewise. * doc/extend.texi: Add XCValu documentation. * doc/sourcebuild.texi: Likewise. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add proc for the XCValu extension. * gcc.target/riscv/cv-alu-compile.c: New test. * gcc.target/riscv/cv-alu-fail-compile-addn.c: New test. * gcc.target/riscv/cv-alu-fail-compile-addrn.c: New test. * gcc.target/riscv/cv-alu-fail-compile-addun.c: New test. * gcc.target/riscv/cv-alu-fail-compile-addurn.c: New test. * gcc.target/riscv/cv-alu-fail-compile-clip.c: New test. * gcc.target/riscv/cv-alu-fail-compile-clipu.c: New test. * gcc.target/riscv/cv-alu-fail-compile-subn.c: New test. * gcc.target/riscv/cv-alu-fail-compile-subrn.c: New test. * gcc.target/riscv/cv-alu-fail-compile-subun.c: New test. * gcc.target/riscv/cv-alu-fail-compile-suburn.c: New test. * gcc.target/riscv/cv-alu-fail-compile.c: New test.
2023-10-11 | [PATCH v4 1/2] RISC-V: Add support for XCVmac extension in CV32E40P | Mary Bennett | 30 | -0/+1186
Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md Contributors: Mary Bennett <mary.bennett@embecosm.com> Nandni Jamnadas <nandni.jamnadas@embecosm.com> Pietra Ferreira <pietra.ferreira@embecosm.com> Charlie Keaney Jessica Mills Craig Blackmore <craig.blackmore@embecosm.com> Simon Cook <simon.cook@embecosm.com> Jeremy Bennett <jeremy.bennett@embecosm.com> Helene Chelin <helene.chelin@embecosm.com> gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add XCVmac. * config/riscv/riscv-ftypes.def: Add XCVmac builtins. * config/riscv/riscv-builtins.cc: Likewise. * config/riscv/riscv.md: Likewise. * config/riscv/riscv.opt: Likewise. * doc/extend.texi: Add XCVmac builtin documentation. * doc/sourcebuild.texi: Likewise. * config/riscv/corev.def: New file. * config/riscv/corev.md: New file. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add new effective target check. * gcc.target/riscv/cv-mac-compile.c: New test. * gcc.target/riscv/cv-mac-fail-compile-mac.c: New test. * gcc.target/riscv/cv-mac-fail-compile-machhsn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-machhsrn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-machhun.c: New test. * gcc.target/riscv/cv-mac-fail-compile-machhurn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-macsn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-macsrn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-macun.c: New test. * gcc.target/riscv/cv-mac-fail-compile-macurn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-msu.c: New test. * gcc.target/riscv/cv-mac-fail-compile-mulhhsn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-mulhhsrn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-mulhhun.c: New test. * gcc.target/riscv/cv-mac-fail-compile-mulhhurn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-mulsn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-mulsrn.c: New test. * gcc.target/riscv/cv-mac-fail-compile-mulun.c: New test. 
* gcc.target/riscv/cv-mac-fail-compile-mulurn.c: New test. * gcc.target/riscv/cv-mac-test-autogeneration.c: New test.
2023-10-11 | MAINTAINERS: Fix write after approval name order | Filip Kastl | 1 | -1/+1
ChangeLog: * MAINTAINERS: Fix name order. Signed-off-by: Filip Kastl <fkastl@suse.cz>
2023-10-11 | PR modula2/111675 Incorrect packed record field value passed to a procedure | Gaius Mulley | 10 | -54/+118
This patch allows a packed field to be extracted and passed to a procedure. It ensures that the subrange type is the same for both the procedure and record field. It also extends the <* bytealignment (0) *> to cover packed subrange types. gcc/m2/ChangeLog: PR modula2/111675 * gm2-compiler/M2CaseList.mod (appendTree): Replace InitStringCharStar with InitString. * gm2-compiler/M2GCCDeclare.mod: Import AreConstantsEqual. (DeclareSubrange): Add zero alignment test and call BuildSmallestTypeRange if necessary. (WalkSubrangeDependants): Walk the align expression. (IsSubrangeDependants): Test the align expression. * gm2-compiler/M2Quads.mod (BuildStringAdrParam): Correct end name. * gm2-compiler/P2SymBuild.mod (BuildTypeAlignment): Allow subranges to be zero aligned (packed). * gm2-compiler/SymbolTable.mod (Subrange): Add Align field. (MakeSubrange): Set Align to NulSym. (PutAlignment): Assign Subrange.Align to align. (GetAlignment): Return Subrange.Align. * gm2-gcc/m2expr.cc (noBitsRequired): Rewrite. (calcNbits): Rename ... (m2expr_calcNbits): ... to this and test for negative values. (m2expr_BuildTBitSize): Replace calcNBits with m2expr_calcNbits. * gm2-gcc/m2expr.def (calcNbits): Export. * gm2-gcc/m2expr.h (m2expr_calcNbits): New prototype. * gm2-gcc/m2type.cc (noBitsRequired): Remove. (m2type_BuildSmallestTypeRange): Call m2expr_calcNbits. (m2type_BuildSubrangeType): Create range_type from build_range_type (type, lowval, highval). gcc/testsuite/ChangeLog: PR modula2/111675 * gm2/extensions/run/pass/packedrecord3.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-10-11 | RISC-V: Fix incorrect index(offset) of gather/scatter | Juzhe-Zhong | 4 | -17/+39
I discovered a mistake of mine that, luckily, had not been exposed. https://godbolt.org/z/c3jzrh7or

GCC is using a 32-bit index offset:

    vsll.vi     v1,v1,2
    vsetvli     zero,a5,e32,m1,ta,ma
    vluxei32.v  v1,(a1),v1

This is wrong since v1 may overflow 32 bits after vsll.vi. After this patch:

    vsext.vf2   v8,v4
    vsll.vi     v8,v8,2
    vluxei64.v  v8,(a1),v8

This is the same as Clang. Regression tests passed. OK for trunk?

gcc/ChangeLog: * config/riscv/autovec.md: Fix index bug. * config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function. * config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug. (gather_scatter_valid_offset_mode_p): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.
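The overflow can be reproduced with scalar arithmetic. In this sketch, byte_offset_32 models scaling the index while it is still 32 bits wide, and byte_offset_64 models sign-extending first and scaling afterwards; both function names are invented for the example, and the element size of 4 bytes matches the shift by 2 in the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset computed entirely in 32 bits: the high bits are lost.  */
uint32_t
byte_offset_32 (int32_t index)
{
  return (uint32_t) index << 2;
}

/* Sign-extend the index to 64 bits first, then scale by the element
   size, so no bits are lost.  */
int64_t
byte_offset_64 (int32_t index)
{
  return (int64_t) index * 4;
}
```

For an index of 0x40000000 the 32-bit computation wraps to 0, while the 64-bit computation gives 0x100000000, so only the widened form addresses the intended element.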
2023-10-11 | RISC-V: Support FP lrint/lrintf auto vectorization | Pan Li | 12 | -0/+348
This patch supports FP lrint/lrintf auto-vectorization.

* long lrint (double) for rv64
* long lrintf (float) for rv32

Because the vectorizer only allows data types of the same size, the standard name lrintmn2 only acts on DF => DI for rv64, and SF => SI for rv32. Given code like:

    void
    test_lrint (long *out, double *in, unsigned count)
    {
      for (unsigned i = 0; i < count; i++)
        out[i] = __builtin_lrint (in[i]);
    }

Before this patch:

    .L3:
      ...
      fld          fa5,0(a1)
      fcvt.l.d     a5,fa5,dyn
      sd           a5,-8(a0)
      ...
      bne          a1,a4,.L3

After this patch:

    .L3:
      ...
      vsetvli      a3,zero,e64,m1,ta,ma
      vfcvt.x.f.v  v1,v1
      vsetvli      zero,a2,e64,m1,ta,ma
      vse32.v      v1,0(a0)
      ...
      bne          a2,zero,.L3

The remaining cases, such as SF => DI, HF => DI, DF => SI and HF => SI, will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog: * config/riscv/autovec.md (lrint<mode><vlconvert>2): New pattern for lrint/lrintf. * config/riscv/riscv-protos.h (expand_vec_lrint): New func decl for expanding lrint. * config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl for vfcvt.x.f.v. (expand_vec_lrint): New function impl for expanding lrint. * config/riscv/vector-iterators.md: New mode attr and iterator. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for CVT like test case. * gcc.target/riscv/rvv/autovec/vls/def.h: Ditto. * gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-11RISC-V: Remove XFAIL of ssa-dom-cse-2.cJuzhe-Zhong1-1/+1
Confirm RISC-V is able to CSE this case no matter whether we enable RVV or not. Remove XFAIL, to fix: XPASS: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;" gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove riscv.
2023-10-11tree-ssa-strlen: optimization skips clobbering store [PR111519]Jakub Jelinek2-22/+79
The following testcase is miscompiled, because count_nonzero_bytes incorrectly uses get_strinfo information on a pointer from which an earlier instruction loads SSA_NAME stored at the current instruction. get_strinfo shows a state right before the current store though, so if there are some stores in between the current store and the load, the string length information might have changed. The patch passes around gimple_vuse from the store and punts instead of using strinfo on loads from MEM_REF which have different gimple_vuse from that. 2023-10-11 Richard Biener <rguenther@suse.de> Jakub Jelinek <jakub@redhat.com> PR tree-optimization/111519 * tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Add vuse argument and pass it through to recursive calls and count_nonzero_bytes_addr calls. Don't shadow the stmt argument, but change stmt for gimple_assign_single_p statements for which we don't immediately punt. (strlen_pass::count_nonzero_bytes_addr): Add vuse argument and pass it through to recursive calls and count_nonzero_bytes calls. Don't use get_strinfo if gimple_vuse (stmt) is different from vuse. Don't shadow the stmt argument. * gcc.dg/torture/pr111519.c: New testcase.
2023-10-11Optimize (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) as (and:SI x 1).Roger Sayle2-0/+27
This patch is the middle-end piece of an improvement to PRs 101955 and 106245, that adds a missing simplification to the RTL optimizers. This transformation is to simplify (char)(x << 7) != 0 as x & 1. Technically, the cast can be any truncation, where the shift is by one less than the narrower type's precision, setting the most significant (only) bit from the least significant bit. This transformation applies to any target, but it's easy to see (and add a new test case) on x86, where the following function: int f(int a) { return (a << 31) >> 31; } currently gets compiled with -O2 to: foo: movl %edi, %eax sall $7, %eax sarb $7, %al movsbl %al, %eax ret but with this patch, we now generate the slightly simpler: foo: movl %edi, %eax sall $31, %eax sarl $31, %eax ret 2023-10-11 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR middle-end/101955 PR tree-optimization/106245 * simplify-rtx.cc (simplify_relational_operation_1): Simplify the RTL (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) to (and:SI x 1). gcc/testsuite/ChangeLog * gcc.target/i386/pr106245-1.c: New test case.
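The identity behind this simplification, (char)(x << 7) != 0 being the same as x & 1, can be checked at the C level. The sketch below is an illustration only, not the RTL code in simplify-rtx.cc; the function names are hypothetical, and unsigned arithmetic is used so that the shift and the truncation are well defined in C.

```c
/* Hypothetical helper: shift left by 7, truncate to 8 bits, then
   compare against zero. Only bit 0 of x survives, in bit 7. */
int trunc_shift_ne(int x)
{
  return (unsigned char) ((unsigned) x << 7) != 0;
}

/* Hypothetical helper: the simplified form, a test of the lowest bit. */
int low_bit(int x)
{
  return x & 1;
}
```

Both functions return 1 for odd inputs and 0 for even inputs, including negative values, because the truncation discards every bit of the shifted value except the one that came from bit 0.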
2023-10-11RISC-V: Enable full coverage vect testsJuzhe-Zhong1-12/+33
I have analyzed all existing FAILs. Only the following FAILs need to be addressed: FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test FAIL: gcc.dg/vect/slp-reduc-7.c execution test FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_(LEN_)?SUB" FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_(LEN_)?SUB" All other FAILs are dump failures that can be ignored (I confirmed that ARM SVE has the same FAILs, which were not fixed in either the tests or the implementation). Now it is time to enable full coverage vect tests including vec_unpack, vec_pack, vec_interleave, ... etc. To see what we are still missing: Before this patch: === gcc Summary === # of expected passes 182839 # of unexpected failures 79 # of unexpected successes 11 # of expected failures 1275 # of unresolved testcases 4 # of unsupported tests 4223 After this patch: === gcc Summary === # of expected passes 183411 # of unexpected failures 93 # of unexpected successes 7 # of expected failures 1285 # of unresolved testcases 4 # of unsupported tests 4157 I noticed one important new issue after this patch: FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts" It has a related PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111721 I am going to fix this first in the middle-end after committing this patch. Ok for trunk ? gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add RVV.
2023-10-11Refine predicate of operands[2] in divv4hf3 with register_operand.liuhongt2-1/+19
The expander emits the insn below. rtx tmp = gen_rtx_VEC_CONCAT (V4SFmode, operands[2], force_reg (V2SFmode, CONST1_RTX (V2SFmode))); but *vec_concat<mode> only allows register_operand. gcc/ChangeLog: PR target/111745 * config/i386/mmx.md (divv4hf3): Refine predicate of operands[2] with register_operand. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111745.c: New test.
2023-10-11RISC-V Regression: Make pattern match more accurate of vect-live-2.cJuzhe-Zhong1-1/+1
Like previous patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632400.html https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac11b@gmail.com/ gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-live-2.c: Make pattern match more accurate.
2023-10-11RISC-V Regression: Fix FAIL of vect-multitypes-16.c for RVVJuzhe-Zhong2-2/+11
As Richard suggested: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632288.html Add vect_ext_char_longlong to fix FAIL for RVV. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-multitypes-16.c: Adapt check for RVV. * lib/target-supports.exp: Add vect_ext_char_longlong property.
2023-10-11Daily bump.GCC Administrator6-1/+168
2023-10-10RISC-V: far-branch: Handle far jumps and branches for functions larger than 1MBAndrew Waterman5-24/+111
On RISC-V, branches further than +/-1MB require a longer instruction sequence (3 instructions): we can reuse the jump-construction in the assembler (which clobbers $ra) and a temporary to set up the jump destination. gcc/ChangeLog: * config/riscv/riscv.cc (struct machine_function): Track if a far-branch/jump is used within a function (and $ra needs to be saved). (riscv_print_operand): Implement 'N' (inverse integer branch). (riscv_far_jump_used_p): Implement. (riscv_save_return_addr_reg_p): New function. (riscv_save_reg_p): Use riscv_save_return_addr_reg_p. * config/riscv/riscv.h (FIXED_REGISTERS): Update $ra. (CALL_USED_REGISTERS): Update $ra. * config/riscv/riscv.md: Add new types "ret" and "jalr". (length attribute): Handle long conditional and unconditional branches. (conditional branch pattern): Handle case where jump can not reach the intended target. (indirect_jump, tablejump): Use new "jalr" type. (simple_return): Use new "ret" type. (simple_return_internal, eh_return_internal): Likewise. (gpr_restore_return, riscv_mret): Likewise. (riscv_uret, riscv_sret): Likewise. * config/riscv/generic.md (generic_branch): Also recognize jalr & ret types. * config/riscv/sifive-7.md (sifive_7_jump): Likewise. Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
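The +/-1MB limit mentioned above can be expressed as a simple range check. The sketch below is not the logic of the length attribute in riscv.md; the function name is hypothetical, and it assumes a direct jump encodes a signed 21-bit, even byte offset, giving a reach of [-2^20, 2^20 - 2] bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if byte_offset is reachable by a single
   direct jump, assuming a signed 21-bit offset whose lowest bit is
   always zero. Offsets outside this range would need the longer
   sequence described in the commit message. */
bool fits_direct_jump(int64_t byte_offset)
{
  const int64_t limit = INT64_C(1) << 20; /* 1 MiB */
  return byte_offset >= -limit
         && byte_offset < limit
         && (byte_offset & 1) == 0;
}
```

Under that assumption an offset of exactly 1 MiB no longer fits, which is the situation where a function larger than 1MB needs the far-jump sequence and therefore has to save $ra.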
2023-10-10c++: mangle multiple levels of template parms [PR109422]Jason Merrill3-0/+32
This becomes more important with concepts, but can also be seen with generic lambdas. PR c++/109422 gcc/cp/ChangeLog: * mangle.cc (write_template_param): Also mangle level. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-generic-mangle1.C: New test. * g++.dg/cpp2a/lambda-generic-mangle1a.C: New test.
2023-10-10MATCH: [PR111679] Add alternative simplification of `a | ((~a) ^ b)`Andrew Pinski2-0/+35
Currently we have a simplification for `a | ~(a ^ b)`, but it does not match the case where we originally had `(~a) | (a ^ b)`, so we need to add a new pattern that matches that and uses bitwise_inverted_equal_p, which also catches comparisons. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/111679 gcc/ChangeLog: * match.pd (`a | ((~a) ^ b)`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/bitops-5.c: New test.
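As a sketch of the algebra involved: since `(~a) ^ b` equals `~(a ^ b)`, the expression `a | ((~a) ^ b)` reduces bit by bit to `a | ~b`. The commit message does not spell out the simplified form, so treating `a | ~b` as the result is an assumption here, and the function names are hypothetical.

```c
/* Hypothetical helper: the expression form named in the commit title. */
unsigned original_form(unsigned a, unsigned b)
{
  return a | (~a ^ b);
}

/* Hypothetical helper: the assumed simplified form. Where a bit of a
   is 1 the result bit is 1; where it is 0 the result bit is ~b. */
unsigned simplified_form(unsigned a, unsigned b)
{
  return a | ~b;
}
```

Checking a single bit position is enough to see the equivalence: with a = 1 both forms give 1, and with a = 0 both forms give the inverse of b.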
2023-10-10RISC-V Regression: Make match patterns more accurateJuzhe-Zhong2-2/+2
This patch fixes the following 2 FAILs in RVV regression, which occur because the check is not accurate. It's inspired by Robin's previous patch: https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac11b@gmail.com/ gcc/testsuite/ChangeLog: * gcc.dg/vect/no-scevccp-outer-7.c: Adjust regex pattern. * gcc.dg/vect/no-scevccp-vect-iv-3.c: Ditto.
2023-10-10RISC-V Regression: Fix FAIL of predcom-2.cJuzhe-Zhong1-1/+1
Like GCN, add -fno-tree-vectorize. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/predcom-2.c: Add riscv.
2023-10-10RISC-V Regression: Fix FAIL of pr65947-8.c for RVVJuzhe-Zhong1-3/+3
This test exercises the fold_extract_last pattern, so it is more reasonable to use vect_fold_extract_last instead of listing targets. This is the vect_fold_extract_last property: proc check_effective_target_vect_fold_extract_last { } { return [expr { [check_effective_target_aarch64_sve] || [istarget amdgcn*-*-*] || [check_effective_target_riscv_v] }] } It includes ARM SVE/GCN/RVV, which matches exactly what we want and is easier to maintain. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr65947-8.c: Use vect_fold_extract_last.