2023-07-12  libstdc++: Fix --enable-cstdio=stdio_pure [PR110574]  (Jonathan Wakely, 4 files, -18/+133)

When configured with --enable-cstdio=stdio_pure we need to consistently use fseek and not mix seeks on the file descriptor with reads and writes on the FILE stream. There are also a number of bugs related to error handling and return values, because fread and fwrite return 0 on error, not -1, and fseek returns 0 on success, not the file offset.

libstdc++-v3/ChangeLog:

	PR libstdc++/110574
	* acinclude.m4 (GLIBCXX_CHECK_LFS): Check for fseeko and ftello and define _GLIBCXX_USE_FSEEKO_FTELLO.
	* config.h.in: Regenerate.
	* configure: Regenerate.
	* config/io/basic_file_stdio.cc (xwrite) [_GLIBCXX_USE_STDIO_PURE]: Check for fwrite error correctly.
	(__basic_file<char>::xsgetn) [_GLIBCXX_USE_STDIO_PURE]: Check for fread error correctly.
	(get_file_offset): New function.
	(__basic_file<char>::seekoff) [_GLIBCXX_USE_STDIO_PURE]: Use fseeko if available. Use get_file_offset instead of return value of fseek.
	(__basic_file<char>::showmanyc): Use get_file_offset.

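For illustration, a minimal C sketch of the error-handling convention the fix relies on (hypothetical helper names, not the actual libstdc++ code): fread/fwrite report failure with a short count of 0, while fseek reports success with 0 and the offset must be queried separately.

  #include <stdio.h>

  /* Hypothetical helper: write NBYTES from BUF.  fwrite signals an
     error by returning 0 (a short count), not by returning -1.  */
  static long xwrite_sketch (FILE *f, const char *buf, long nbytes)
  {
    long written = (long) fwrite (buf, 1, (size_t) nbytes, f);
    if (written == 0 && nbytes != 0)
      return -1;                    /* error, nothing written */
    return written;
  }

  /* Hypothetical helper: seek and report the new offset.  fseek
     returns 0 on success (not the offset), so the offset has to be
     obtained with ftell afterwards.  */
  static long seek_sketch (FILE *f, long off, int whence)
  {
    if (fseek (f, off, whence) != 0)
      return -1;                    /* seek failed */
    return ftell (f);               /* the actual file offset */
  }
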
2023-07-12  IRA+LRA: Change return type of predicate functions from int to bool  (Uros Bizjak, 2 files, -26/+26)

gcc/ChangeLog:

	* ira.cc (equiv_init_varies_p): Change return type from int to bool and adjust function body accordingly.
	(equiv_init_movable_p): Ditto.
	(memref_used_between_p): Ditto.
	* lra-constraints.cc (valid_address_p): Ditto.

2023-07-12  libstdc++: Use __is_enum built-in trait  (Ken Matsui, 1 file, -3/+3)

This patch replaces is_enum<T>::value with the __is_enum built-in trait in the type_traits header.

libstdc++-v3/ChangeLog:

	* include/std/type_traits (__make_unsigned_selector): Use __is_enum built-in trait.
	(__make_signed_selector): Likewise.
	(__underlying_type_impl): Likewise.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

2023-07-12  [range-op] Enable value/mask propagation in range-op.  (Aldy Hernandez, 2 files, -32/+23)

Throw the switch in range-ops to make full use of the value/mask information instead of only the nonzero bits. This will cause most of the operators implemented in range-ops to use the value/mask information calculated by CCP's bit_value_binop() function, which range-ops uses. This opens up more optimization opportunities.

In follow-up patches I will change the global range setter (set_range_info) to be able to save the value/mask pair, and make both CCP and IPA able to save the known-ones bit info, instead of throwing it away.

gcc/ChangeLog:

	* range-op.cc (irange_to_masked_value): Remove.
	(update_known_bitmask): Update irange value/mask pair instead of only updating nonzero bits.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr83073.c: Adjust testcase.

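As background, a value/mask pair encodes partially known bits: bits set in the mask are unknown, and the value supplies the bits wherever the mask is clear. A minimal C sketch of how such pairs combine for a bitwise AND (hypothetical names, illustrating the style of propagation, not GCC's actual implementation):

  #include <stdint.h>

  /* A value/mask pair: bits set in MASK are unknown; VALUE gives the
     bit values wherever MASK is clear.  */
  struct vm { uint64_t value, mask; };

  /* Known-bits AND: a result bit is known 1 only if both inputs are
     known 1; known 0 if either input is known 0; unknown otherwise.  */
  static struct vm vm_and (struct vm a, struct vm b)
  {
    uint64_t known1 = (a.value & ~a.mask) & (b.value & ~b.mask);
    uint64_t known0 = (~a.value & ~a.mask) | (~b.value & ~b.mask);
    struct vm r;
    r.mask = ~(known1 | known0);   /* still-unknown bits */
    r.value = known1;              /* values of the known bits */
    return r;
  }
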
2023-07-12  Improve profile update in loop-ch  (Jan Hubicka, 4 files, -80/+174)

Improve profile update in loop-ch to handle the situation where the duplicated header has a loop-invariant test. In this case we know that all the count of the exit edge belongs to the duplicated loop header edge and can update probabilities accordingly. Since we also do all the work to track this information from analysis to duplication, I also added code to turn those conditionals into constants so we do not need a later jump threading pass to clean up.

This made me work out that the propagation was buggy in a few aspects:
1) it handled every PHI as a PHI in the header and incorrectly assigned some PHIs to be IV-like when they are not
2) it did not check for novops calls that are not required to return the same value on every invocation.
3) I also added a check for asm statements since those are not necessarily reproducible either.

I would like to do more changes, but tried to prevent this patch from snowballing. The analysis of what statements will remain after duplication can be improved. I think we should use the ranger query for other than the first basic block, too, and possibly drop the IV heuristics then. Also it seems that a lot of this logic is pretty much the same as the analysis in the peeling pass, so unifying this would be nice. I also think I should move the profile update out of gimple_duplicate_sese_region (it is now very specific to ch) and rename it, since those regions are single entry multiple exit.

Bootstrapped/regtested x86_64-linux, OK?

Honza

gcc/ChangeLog:

	* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES parameter and rewrite profile updating code to handle edge elimination.
	* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
	* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
	(loop_iv_derived_p): New function.
	(should_duplicate_loop_header_p): Track invariant exit edges; fix handling of PHIs and propagation of IV derived variables.
	(ch_base::copy_headers): Pass around the invariant edges hash set.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/loop-ch-profile-1.c: Remove xfail.

2023-07-12  riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])  (Christoph Müllner, 3 files, -208/+118)

Recently, two identical XTheadCondMov tests have been added, which both fail. Let's fix that by changing the following:
* Merge both files into one (no need for separate tests for rv32 and rv64)
* Drop unrelated attribute check test (we already test for `th.mveqz` and `th.mvnez` instructions, so there is little additional value)
* Fix the pattern to allow matching

Fixes: a1806f0918c0 ("RISC-V: Optimize TARGET_XTHEADCONDMOV")

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
	* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
	* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2023-07-12  ifcvt: Change return type of predicate functions from int to bool  (Uros Bizjak, 1 file, -319/+321)

Also change some internal variables and function arguments from int to bool.

gcc/ChangeLog:

	* ifcvt.cc (cond_exec_changed_p): Change variable to bool.
	(last_active_insn): Change "skip_use_p" function argument to bool.
	(noce_operand_ok): Change return type from int to bool.
	(find_cond_trap): Ditto.
	(block_jumps_and_fallthru_p): Change "fallthru_p" and "jump_p" variables to bool.
	(noce_find_if_block): Change return type from int to bool.
	(cond_exec_find_if_block): Ditto.
	(find_if_case_1): Ditto.
	(find_if_case_2): Ditto.
	(dead_or_predicable): Ditto. Change "reversep" function arg to bool.
	(block_jumps_and_fallthru): Rename from block_jumps_and_fallthru_p.
	(cond_exec_process_insns): Change return type from int to bool. Change "mod_ok" function arg to bool.
	(cond_exec_process_if_block): Change return type from int to bool. Change "do_multiple_p" function arg to bool. Change "then_mod_ok" variable to bool.
	(noce_emit_store_flag): Change return type from int to bool. Change "reversep" function arg to bool. Change "cond_complex" variable to bool.
	(noce_try_move): Change return type from int to bool.
	(noce_try_ifelse_collapse): Ditto.
	(noce_try_store_flag): Ditto. Change "reversep" variable to bool.
	(noce_try_addcc): Change return type from int to bool. Change "subtract" variable to bool.
	(noce_try_store_flag_constants): Change return type from int to bool.
	(noce_try_store_flag_mask): Ditto. Change "reversep" variable to bool.
	(noce_try_cmove): Change return type from int to bool.
	(noce_try_cmove_arith): Ditto. Change "is_mem" variable to bool.
	(noce_try_minmax): Change return type from int to bool. Change "unsignedp" variable to bool.
	(noce_try_abs): Change return type from int to bool. Change "negate" variable to bool.
	(noce_try_sign_mask): Change return type from int to bool.
	(noce_try_move): Ditto.
	(noce_try_store_flag_constants): Ditto.
	(noce_try_cmove): Ditto.
	(noce_try_cmove_arith): Ditto.
	(noce_try_minmax): Ditto. Change "unsignedp" variable to bool.
	(noce_try_bitop): Change return type from int to bool.
	(noce_operand_ok): Ditto.
	(noce_convert_multiple_sets): Ditto.
	(noce_convert_multiple_sets_1): Ditto.
	(noce_process_if_block): Ditto.
	(check_cond_move_block): Ditto.
	(cond_move_process_if_block): Ditto. Change "success_p" variable to bool.
	(rest_of_handle_if_conversion): Change return type to void.

2023-07-12  VECT: Apply COND_LEN_* into vectorizable_operation  (Ju-Zhe Zhong, 3 files, -29/+91)

Hi, Richard and Richi.

As we discussed before, COND_LEN_* patterns were added for multiple situations. This patch applies COND_LEN_* to the following situation in "vectorizable_operation":

  /* If operating on inactive elements could generate spurious traps,
     we need to restrict the operation to active lanes.  Note that this
     specifically doesn't apply to unhoisted invariants, since they
     operate on the same value for every lane.

     Similarly, if this operation is part of a reduction, a fully-masked
     loop should only change the active lanes of the reduction chain,
     keeping the inactive lanes as-is.  */
  bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
                            || reduc_idx >= 0);

mask_out_inactive is true with length loop control, so we can handle these 2 following cases:

1. Integer division:

  #define TEST_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] = a[i] % b[i]; \
  }
  #define TEST_ALL() \
  TEST_TYPE(int8_t) \
  TEST_ALL()

With this patch:

  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

2. Floating-point arithmetic **WITHOUT** -ffast-math:

  #define TEST_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] = a[i] + b[i]; \
  }
  #define TEST_ALL() \
  TEST_TYPE(float) \
  TEST_ALL()

With this patch:

  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

With this patch, we can make sure operations won't trap for elements that "mask_out_inactive".

gcc/ChangeLog:

	* internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* support.
	(CASE): Ditto.
	(get_conditional_len_internal_fn): New function.
	* internal-fn.h (get_conditional_len_internal_fn): Ditto.
	* tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* support.

2023-07-12  libgomp.texi: add cross ref, remove duplicated entry  (Tobias Burnus, 1 file, -3/+1)

libgomp/

	* libgomp.texi (OpenMP 5.0): Replace '... stub' by @ref to 'Memory allocation' section which contains the full status.
	(TR11): Remove differently worded duplicated entry.

2023-07-12  i386: Fix FAIL of gcc.target/i386/pr91681-1.c  (Roger Sayle, 1 file, -1/+1)

I committed the wrong version of this patch (with a typo). Updating to the correct bootstrapped and regression tested version as obvious.

2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog

	PR target/91681
	* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): Typo.

2023-07-12  i386: Fix FAIL of gcc.target/i386/pr91681-1.c  (Roger Sayle, 1 file, -0/+33)

The recent change in TImode parameter passing on x86_64 results in the FAIL of pr91681-1.c. The issue is that with the extra flexibility, the combine pass is now spoilt for choice between using either the *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext patterns, when one operand is a *concat and the other is a zero_extend.

The solution proposed below is to provide an *add<dwi>3_doubleword_concat_zext define_insn_and_split, that can benefit both from the register allocation of *concat, and still avoid the xor normally required by zero extension. I'm investigating a follow-up refinement to improve register allocation further by avoiding the early clobber in the =&r, and handling (custom) reloads explicitly, but this piece resolves the testcase failure.

2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog

	PR target/91681
	* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New define_insn_and_split derived from *add<dwi>3_doubleword_concat and *add<dwi>3_doubleword_zext.

2023-07-12  PR target/110598: Fix rega = 0; rega ^= rega regression in i386.md  (Roger Sayle, 2 files, -2/+60)

This patch fixes the regression PR target/110598 caused by my recent addition of a peephole2. The intention of that optimization was to simplify zeroing a register, followed by an IOR, XOR or PLUS operation on it, into a move, or as described in the comment:

  ;; Peephole2 rega = 0; rega op= regb into rega = regb.

The issue is that I'd failed to consider the (rare and unusual) case where regb is rega, where the transformation leads to the incorrect "rega = rega", when it should be "rega = 0". The minimal fix is to add a !reg_mentioned_p check to the recent peephole2.

In addition to resolving the regression, I've added a second peephole2 to optimize the problematic case above, which contains a false dependency and is therefore tricky to optimize elsewhere. This is an improvement over GCC 13, for example, that generates the redundant:

  xorl %edx, %edx
  xorq %rdx, %rdx

2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog

	PR target/110598
	* config/i386/i386.md (peephole2): Check !reg_mentioned_p when optimizing rega = 0; rega op= regb for op in [XOR,IOR,PLUS].
	(peephole2): Simplify rega = 0; rega op= rega cases.

gcc/testsuite/ChangeLog

	PR target/110598
	* gcc.target/i386/pr110598.c: New test case.

2023-07-12  i386: Tweak ix86_expand_int_compare to use PTEST for vector equality.  (Roger Sayle, 1 file, -1/+18)

I've come up with an alternate/complementary/supplementary fix to the patch https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html for generating the PTEST during RTL expansion, rather than rely on this being caught/optimized later during STV.

You'll notice in this patch, the tests for TARGET_SSE4_1 and TImode appear last. When I was writing this, I initially also added support for AVX VPTEST and OImode, before realizing that x86 doesn't (yet) support 256-bit OImode (which also explains why we don't have an OImode to V1OImode scalar-to-vector pass). Retaining this clause ordering should minimize the lines changed if things change in future.

2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog

	* config/i386/i386-expand.cc (ix86_expand_int_compare): If testing a TImode SUBREG of a 128-bit vector register against zero, use a PTEST instruction instead of first moving it to a pair of scalar registers.

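For context, a minimal C example of the kind of code affected (my illustration, an assumption about what hits this path, not taken from the patch): a TImode view of a value already living in a 128-bit vector register, compared against zero.

  #include <immintrin.h>
  #include <string.h>

  /* A TImode SUBREG of a 128-bit vector: with the change, the == 0
     test can expand to a single PTEST on SSE4.1 instead of moving the
     value to a pair of scalar registers first.  */
  int is_zero (__m128i v)
  {
    unsigned __int128 x;
    memcpy (&x, &v, sizeof (x));
    return x == 0;
  }
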
2023-07-12  genopinit: Allow more than 256 modes.  (Robin Dapp, 3 files, -6/+5)

Upcoming changes for RISC-V will have us exceed 255 modes or 8 bits. This patch increases the limit to 10 bits and adjusts the hashing function for the gen* and optabs-query lookups accordingly. Consequently, the number of optabs is limited to 4095.

gcc/ChangeLog:

	* genopinit.cc (main): Adjust maximal number of optabs and machine modes.
	* gensupport.cc (find_optab): Shift optab by 20 and mode by 10 bits.
	* optabs-query.h (optab_handler): Ditto.
	(convert_optab_handler): Ditto.

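A sketch of the resulting key packing (illustrative only; the exact field layout lives in gensupport.cc and optabs-query.h): with mode fields widened to 10 bits each, an (optab, mode0, mode1) triple packs as follows, which is why optabs are now capped at 4095 (12 bits in a 32-bit key).

  #include <stdint.h>

  /* Illustrative packing of an optab lookup key: the optab number is
     shifted above two 10-bit mode fields, matching the "shift optab
     by 20 and mode by 10 bits" change described above.  */
  static inline uint32_t optab_key (unsigned optab, unsigned mode0,
                                    unsigned mode1)
  {
    return ((uint32_t) optab << 20) | (mode0 << 10) | mode1;
  }
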
2023-07-12  libgomp: Use libnuma for OpenMP's partition=nearest allocation trait  (Tobias Burnus, 5 files, -39/+708)

As with the memkind library, it is only used when found at runtime; it does not need to be present when building GCC.

The included testcase does not check whether the memory has been placed on the nearest node as the Linux kernel memory handling too often ignores that hint, using a different node for the allocation. However, when running with 'numactl --preferred=<node> ./executable', it is clearly visible that the feature works by comparing malloc/default vs. nearest placement (using get_mempolicy to obtain the node for a mem addr).

libgomp/ChangeLog:

	* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
	(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind; add GOMP_MEMKIND_LIBNUMA.
	(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
	(omp_init_allocator): Handle partition=nearest with libnuma if avail.
	(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add numa_alloc_local (+ memset), numa_free, and numa_realloc calls as needed.
	* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define.
	* libgomp.texi: Fix a typo; use 'fi' instead of its ligature char.
	(Memory allocation): Renamed from 'Memory allocation with libmemkind'; updated for libnuma usage.
	* testsuite/libgomp.c-c++-common/alloc-11.c: New test.
	* testsuite/libgomp.c-c++-common/alloc-12.c: New test.

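As a usage illustration (standard OpenMP 5.x allocator API, not code from the patch), requesting nearest-node placement looks like this; with the patch, libgomp implements the hint via libnuma (numa_alloc_local) when that library is available at runtime.

  #include <omp.h>
  #include <stdlib.h>

  int main (void)
  {
    /* Ask for memory partitioned to the NUMA node nearest to the
       allocating thread.  */
    omp_alloctrait_t traits[] = { { omp_atk_partition, omp_atv_nearest } };
    omp_allocator_handle_t al
      = omp_init_allocator (omp_default_mem_space, 1, traits);

    double *p = omp_alloc (1024 * sizeof (double), al);
    if (p)
      p[0] = 42.0;

    omp_free (p, al);
    omp_destroy_allocator (al);
    return 0;
  }
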
2023-07-12  gfortran: Allow ref'ing PDT's len() in parameter-initializer.  (Andre Vehreschild, 4 files, -16/+94)

Fix the case where declaring a parameter initialized using a pdt_len reference did not simplify the reference to a constant.

2023-07-12  Andre Vehreschild  <vehre@gcc.gnu.org>

gcc/fortran/ChangeLog:

	PR fortran/102003
	* expr.cc (find_inquiry_ref): Replace len of pdt_string by constant.
	(simplify_ref_chain): Ensure input to find_inquiry_ref is NULL.
	(gfc_match_init_expr): Prevent PDT analysis for function calls.
	(gfc_pdt_find_component_copy_initializer): Get the initializer value for given component.
	* gfortran.h (gfc_pdt_find_component_copy_initializer): New function.
	* simplify.cc (gfc_simplify_len): Replace len() of PDT with pdt component ref or constant.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pdt_33.f03: New test.

2023-07-12  tree-optimization/110630 - enhance SLP permute support  (Richard Biener, 3 files, -5/+25)

The following enhances the existing lowpart extraction support for SLP VEC_PERM nodes to cover all vector aligned extractions. This allows the existing bb-slp-pr95839.c testcase to be vectorized with mips -mpaired-single and the new bb-slp-pr95839-3.c testcase with SSE2.

	PR tree-optimization/110630
	* tree-vect-slp.cc (vect_add_slp_permutation): New offset parameter, honor that for the extract code generation.
	(vectorizable_slp_permutation_1): Handle offsetted identities.

	* gcc.dg/vect/bb-slp-pr95839.c: Make stricter.
	* gcc.dg/vect/bb-slp-pr95839-3.c: New variant testcase.

2023-07-12  RISC-V: Support integer mult highpart auto-vectorization  (Ju-Zhe Zhong, 5 files, -0/+141)

This patch adds an obvious missing mult_high auto-vectorization pattern.

Consider this following case:

  void __attribute__ ((noipa)) \
  mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count) \
  { \
    for (int i = 0; i < count; ++i) \
      dst[i] = src[i] / 17; \
  }
  T (int32_t) \
  TEST_ALL (DEF_LOOP)

Before this patch:

  mod_int32_t:
    ble a2,zero,.L5
    li a5,17
    vsetvli a3,zero,e32,m1,ta,ma
    vmv.v.x v2,a5
  .L3:
    vsetvli a5,a2,e8,mf4,ta,ma
    vle32.v v1,0(a1)
    vsetvli a3,zero,e32,m1,ta,ma
    slli a4,a5,2
    vdiv.vv v1,v1,v2
    sub a2,a2,a5
    vsetvli zero,a5,e32,m1,ta,ma
    vse32.v v1,0(a0)
    add a1,a1,a4
    add a0,a0,a4
    bne a2,zero,.L3
  .L5:
    ret

After this patch:

  mod_int32_t:
    ble a2,zero,.L5
    li a5,2021163008
    addiw a5,a5,-1927
    vsetvli a3,zero,e32,m1,ta,ma
    vmv.v.x v3,a5
  .L3:
    vsetvli a5,a2,e8,mf4,ta,ma
    vle32.v v2,0(a1)
    vsetvli a3,zero,e32,m1,ta,ma
    slli a4,a5,2
    vmulh.vv v1,v2,v3
    sub a2,a2,a5
    vsra.vi v2,v2,31
    vsra.vi v1,v1,3
    vsub.vv v1,v1,v2
    vsetvli zero,a5,e32,m1,ta,ma
    vse32.v v1,0(a0)
    add a1,a1,a4
    add a0,a0,a4
    bne a2,zero,.L3
  .L5:
    ret

Even though a single "vdiv" is lowered into "1 vmulh + 2 vsra + 1 vsub" and 4 more instructions are generated, we believe it's much better than before since division is very slow in the hardware.

gcc/ChangeLog:

	* config/riscv/autovec.md (smul<mode>3_highpart): New pattern.
	(umul<mode>3_highpart): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.

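For reference, a minimal C sketch of the multiply-high trick the generated code above implements (the constant matches the li/addiw pair in the "after" assembly; this is the standard signed magic-number division, not code from the patch):

  #include <stdint.h>
  #include <assert.h>

  /* Division by the constant 17 via a high multiply: 2021161081
     approximates 2^35 / 17, so the quotient is the high half of the
     product shifted right by 3, with the usual sign correction.  */
  static int32_t div17 (int32_t x)
  {
    int32_t hi = (int32_t) (((int64_t) x * 2021161081LL) >> 32); /* vmulh */
    return (hi >> 3) - (x >> 31);       /* vsra.vi 3, vsra.vi 31, vsub */
  }

  int main (void)
  {
    for (int32_t x = -100000; x <= 100000; x += 7)
      assert (div17 (x) == x / 17);
    return 0;
  }
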
2023-07-12  x86: improve fast bfloat->float conversion  (Jan Beulich, 1 file, -8/+14)

There's nothing AVX512BW-ish in here, so no reason to use Yw as the constraints for the AVX alternative. Furthermore by using the 512-bit form of VPSLLD (in a new alternative) all 32 registers can be used directly by the insn without AVX512VL needing to be enabled. Also adjust the originally last alternative's "prefix" attribute to maybe_evex.

gcc/

	* config/i386/i386.md (extendbfsf2_1): Add new AVX512F alternative. Adjust original last alternative's "prefix" attribute to maybe_evex.

2023-07-12  x86: make better use of VBROADCASTSS / VPBROADCASTD  (Jan Beulich, 5 files, -19/+104)

... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are never longer (yet sometimes shorter) than the corresponding VSHUFPS / VPSHUFD, due to the immediate operand of the shuffle insns balancing the (uniform) need for VEX3 in the broadcast ones. When EVEX encoding is required, the broadcast insns are always shorter.

Add new alternatives to cover the AVX2 and AVX512 cases as appropriate.

While touching this anyway, switch to consistently using "sseshuf1" in the "type" attributes for all shuffle forms.

gcc/

	* config/i386/sse.md (vec_dupv4sf): Make first alternative use vbroadcastss for AVX2. New AVX512F alternative.
	(*vec_dupv4si): New AVX2 and AVX512F alternatives using vpbroadcastd. Replace sselog1 by sseshuf1 in "type" attribute.

gcc/testsuite/

	* gcc.target/i386/avx2-dupv4sf.c: New test.
	* gcc.target/i386/avx2-dupv4si.c: Likewise.
	* gcc.target/i386/avx512f-dupv4sf.c: Likewise.
	* gcc.target/i386/avx512f-dupv4si.c: Likewise.

2023-07-12  riscv: thead: Factor out XThead*-specific peepholes  (Christoph Müllner, 3 files, -56/+76)

This patch moves the XThead*-specific peephole passes into thead-peephole.md with the intent of keeping vendor-specific code separated from RISC-V standard code.

This patch does not contain any functional changes.

gcc/ChangeLog:

	* config/riscv/peephole.md: Remove XThead* peephole passes.
	* config/riscv/thead.md: Include thead-peephole.md.
	* config/riscv/thead-peephole.md: New file.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2023-07-12  riscv: Prepare backend for index registers  (Christoph Müllner, 3 files, -2/+26)

RISC-V currently does not support index registers. However, there are some vendor extensions that specify them. Let's make the necessary changes in the backend so that we can add support for such a vendor extension in the future.

This is a non-functional change without any intended side-effects.

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (riscv_regno_ok_for_index_p): New prototype.
	(riscv_index_reg_class): Likewise.
	* config/riscv/riscv.cc (riscv_regno_ok_for_index_p): New function.
	(riscv_index_reg_class): New function.
	* config/riscv/riscv.h (INDEX_REG_CLASS): Call new function riscv_index_reg_class().
	(REGNO_OK_FOR_INDEX_P): Call new function riscv_regno_ok_for_index_p().

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2023-07-12  riscv: Move address classification info types to riscv-protos.h  (Christoph Müllner, 2 files, -43/+43)

enum riscv_address_type and struct riscv_address_info are used to store address classification information. Let's move these types into our common header file in order to share them with other compilation units.

This is a non-functional change without any intended side-effects.

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (enum riscv_address_type): New location of type definition.
	(struct riscv_address_info): Likewise.
	* config/riscv/riscv.cc (enum riscv_address_type): Old location of type definition.
	(struct riscv_address_info): Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2023-07-12  riscv: Define Xmode macro  (Christoph Müllner, 1 file, -0/+4)

Define an Xmode macro that specifies the register size (XLEN) similar to Pmode. This allows the backend code to write generic RV32/RV64 C code (under certain circumstances).

gcc/ChangeLog:

	* config/riscv/riscv.h (Xmode): New macro.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

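A sketch of what such a macro presumably looks like (an assumption based on how Pmode is conventionally defined; the actual definition is in riscv.h):

  /* Hypothetical sketch: the X-register width as a machine mode, so
     target code can express "an XLEN-sized operation" without
     hard-coding RV32 (SImode) or RV64 (DImode).  */
  #define Xmode (TARGET_64BIT ? DImode : SImode)
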
2023-07-12  riscv: Simplify output of MEM addresses  (Christoph Müllner, 1 file, -1/+1)

We have the following situation for MEM RTX objects:

* TARGET_PRINT_OPERAND expands to riscv_print_operand()
* This falls into the default case (unknown or no letter) of the outer switch-case-block and the MEM case of the inner switch-case-block and calls output_address() in final.cc with XEXP (op, 0) (the address)
* This calls targetm.asm_out.print_operand_address() which is riscv_print_operand_address()
* riscv_print_operand_address() is targeting the address of a MEM RTX
* riscv_print_operand_address() calls riscv_print_operand() for the offset and directly prints the register if the address is classified as ADDRESS_REG
* This falls into the default case (unknown or no letter) of the outer switch-case-block and the default case of the inner switch-case-block and calls output_addr_const().

However, since we know that the offset must be a CONST_INT (which will be followed by a '(<reg>)' string), there is no need to call riscv_print_operand() for the offset. Instead we can take the shortcut and use output_addr_const().

This change also brings the code in riscv_print_operand_address() in line with the other cases, where output_addr_const() is used to print offsets.

Tested with GCC regression test suite and SPEC intrate.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_print_operand_address): Use output_addr_const rather than riscv_print_operand.

2023-07-12  riscv: thead: Adjust constraints of th_addsl INSN  (Christoph Müllner, 1 file, -3/+2)

A recent change adjusted the constraints of ZBA's shNadd INSN. Let's mirror this change here as well.

gcc/ChangeLog:

	* config/riscv/thead.md: Adjust constraints of th_addsl.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2023-07-12  riscv: xtheadmempair: Fix doc for th_mempair_order_operands()  (Christoph Müllner, 1 file, -2/+2)

There is an incorrect sentence in the documentation of the function th_mempair_order_operands(). Let's remove it.

gcc/ChangeLog:

	* config/riscv/thead.cc (th_mempair_operands_p): Fix documentation of th_mempair_order_operands().

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2023-07-12  riscv: xtheadmempair: Fix CFA reg notes  (Christoph Müllner, 1 file, -2/+6)

The current implementation triggers an assertion in dwarf2out_frame_debug_cfa_offset() under certain circumstances. The standard code uses REG_FRAME_RELATED_EXPR notes instead of REG_CFA_OFFSET notes when saving registers on the stack. So let's do this as well.

gcc/ChangeLog:

	* config/riscv/thead.cc (th_mempair_save_regs): Emit REG_FRAME_RELATED_EXPR notes in prologue.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2023-07-12  riscv: xtheadbb: Add sign/zero extension support for th.ext and th.extu  (Christoph Müllner, 4 files, -3/+168)

The current support of the bitfield-extraction instructions th.ext and th.extu (XTheadBb extension) only covers sign_extract and zero_extract. This patch adds support for sign_extend and zero_extend to avoid any shifts for sign or zero extensions.

gcc/ChangeLog:

	* config/riscv/riscv.md: No base-ISA extension splitter for XThead*.
	* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext): New XThead extension INSN.
	(*zero_extendsidi2_th_extu): New XThead extension INSN.
	(*zero_extendhi<GPR:mode>2_th_extu): New XThead extension INSN.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/xtheadbb-ext-1.c: New test.
	* gcc.target/riscv/xtheadbb-extu-1.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

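For illustration, functions of this shape (a hypothetical reconstruction, not the new test files themselves) are the ones that can now use a single th.ext/th.extu instead of a shift pair:

  #include <stdint.h>

  /* Sign-extending a narrow value normally costs two shifts
     (slli + srai); with XTheadBb it becomes a single th.ext.  */
  int64_t sext16 (int16_t x)
  {
    return x;
  }

  /* Zero-extension likewise maps to a single th.extu instead of
     slli + srli.  */
  uint64_t zext32 (uint32_t x)
  {
    return x;
  }
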
2023-07-12  Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0'  (liuhongt, 4 files, -17/+168)

False dependency happens when destination is only updated by pternlog. There is no false dependency when destination is also used in source. So either a pxor should be inserted, or the input operand should be set with constraint '0'.

gcc/ChangeLog:

	PR target/110438
	PR target/110202
	* config/i386/predicates.md (int_float_vector_all_ones_operand): New predicate.
	* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): New define_insn.
	(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep): Ditto.
	(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep): Ditto.
	(*<avx512>_cvtmask2<ssemodesuffix><mode>): Adjust to define_insn_and_split to avoid false dependence.
	(*<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto.
	(<mask_codefor>one_cmpl<mode>2<mask_name>): Adjust constraint of operands 1 to '0' to avoid false dependence.
	(*andnot<mode>3): Ditto.
	(iornot<mode>3): Ditto.
	(*<nlogic><mode>3): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr110438.c: New test.
	* gcc.target/i386/pr100711-6.c: Adjust testcase.

2023-07-12  Initial Granite Rapids D Support  (Mo, Zewei; 10 files, -3/+39)

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_intel_cpu): Handle Granite Rapids D.
	* common/config/i386/i386-common.cc (processor_alias_table): Add graniterapids-d.
	* common/config/i386/i386-cpuinfo.h (enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D.
	* config.gcc: Add -march=graniterapids-d.
	* config/i386/driver-i386.cc (host_detect_local_cpu): Handle graniterapids-d.
	* config/i386/i386.h (PTA_GRANITERAPIDS_D): New.
	* doc/extend.texi: Add graniterapids-d.
	* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/i386/mv16.C: Add graniterapids-d.
	* gcc.target/i386/funcspec-56.inc: Handle new march.

2023-07-12  i386: Guard 128 bit VAES builtins with AVX512VL  (Haochen Jiang, 3 files, -5/+23)

Since commit 24a8acc, 128-bit intrinsics are enabled for VAES. However, AVX512VL is not checked until we reach the pattern, which results in an ICE. Add an AVX512VL guard at the builtin to report an error when checking ISA flags.

gcc/ChangeLog:

	* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins): Add OPTION_MASK_ISA_AVX512VL.
	* config/i386/i386-expand.cc (ix86_check_builtin_isa_match): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512vl-vaes-1.c: New test.

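As an illustration (my example, not the new testcase), code like the following, compiled with VAES enabled but AVX512VL disabled (e.g. -mvaes -mno-avx512vl), previously hit the ICE and now gets a proper error when the ISA flags are checked:

  #include <immintrin.h>

  /* 128-bit VAES intrinsic: requires both VAES and AVX512VL, so the
     builtin must carry the AVX512VL guard to be rejected cleanly.  */
  __m128i encrypt_round (__m128i state, __m128i key)
  {
    return _mm_aesenc_epi128 (state, key);
  }
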
2023-07-12  MAINTAINERS: Add myself to write after approval  (Hao Liu, 1 file, -0/+1)

ChangeLog:

	* MAINTAINERS: Add Hao Liu to write after approval.

2023-07-11  MAINTAINERS: Add myself to write after approval  (Ken Matsui, 1 file, -0/+1)

ChangeLog:

	* MAINTAINERS: Add Ken Matsui to write after approval.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

2023-07-12  Daily bump.  (GCC Administrator, 9 files, -1/+298)

2023-07-12  RISC-V: Optimize permutation codegen with vcompress  (Ju-Zhe Zhong, 14 files, -0/+1190)

This patch recognizes a specific permutation pattern to which the compress approach can be applied.

Consider this following case:

  typedef int8_t vnx64i __attribute__ ((vector_size (64)));
  #define MASK_64 \
    1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \
    37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \
    82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \
    100, 101, 102, 103, 104, 105, 106, 107

  void __attribute__ ((noinline, noclone))
  test_1 (int8_t *x, int8_t *y, int8_t *out)
  {
    vnx64i v1 = *(vnx64i*)x;
    vnx64i v2 = *(vnx64i*)y;
    vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
    *(vnx64i*)out = v3;
  }

https://godbolt.org/z/P33nev6cW

Before this patch:

  lui a4,%hi(.LANCHOR0)
  addi a4,a4,%lo(.LANCHOR0)
  vl4re8.v v4,0(a4)
  li a4,64
  vsetvli a5,zero,e8,m4,ta,mu
  vl4re8.v v20,0(a0)
  vl4re8.v v16,0(a1)
  vmv.v.x v12,a4
  vrgather.vv v8,v20,v4
  vmsgeu.vv v0,v4,v12
  vsub.vv v4,v4,v12
  vrgather.vv v8,v16,v4,v0.t
  vs4r.v v8,0(a2)
  ret

After this patch:

  lui a4,%hi(.LANCHOR0)
  addi a4,a4,%lo(.LANCHOR0)
  vsetvli a5,zero,e8,m4,ta,ma
  vl4re8.v v12,0(a1)
  vl4re8.v v8,0(a0)
  vlm.v v0,0(a4)
  vslideup.vi v4,v12,20
  vcompress.vm v4,v8,v0
  vs4r.v v4,0(a2)
  ret

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (enum insn_type): Add vcompress optimization.
	* config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto.
	(shuffle_compress_patterns): Ditto.
	(expand_vec_perm_const_1): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.

2023-07-11  testsuite: Skip failing analyzer tests on AIX.  (David Edelsohn, 6 files, -0/+6)

Some of the analyzer out-of-bounds-diagram tests fail on AIX.

gcc/testsuite/ChangeLog:

	* gcc.dg/analyzer/out-of-bounds-diagram-4.c: Skip on AIX.
	* gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c: Same.
	* gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c: Same.
	* gcc.dg/analyzer/out-of-bounds-diagram-7.c: Same.
	* gcc.dg/analyzer/out-of-bounds-diagram-13.c: Same.
	* gcc.dg/analyzer/out-of-bounds-diagram-15.c: Same.

2023-07-11  Fortran: formal symbol attributes for intrinsic procedures [PR110288]  (Harald Anlauf, 2 files, -0/+20)

gcc/fortran/ChangeLog:

	PR fortran/110288
	* symbol.cc (gfc_copy_formal_args_intr): When deriving the formal argument attributes from the actual ones for intrinsic procedure calls, take special care of CHARACTER arguments that we do not wrongly treat them formally as deferred-length.

gcc/testsuite/ChangeLog:

	PR fortran/110288
	* gfortran.dg/findloc_10.f90: New test.

2023-07-11  cfg+gcse: Change return type of predicate functions from int to bool  (Uros Bizjak, 6 files, -147/+155)

Also change some internal variables from int to bool.

gcc/ChangeLog:

	* cfghooks.cc (verify_flow_info): Change "err" variable to bool.
	* cfghooks.h (struct cfg_hooks): Change return type of verify_flow_info from integer to bool.
	* cfgrtl.cc (can_delete_note_p): Change return type from int to bool.
	(can_delete_label_p): Ditto.
	(rtl_verify_flow_info): Change return type from int to bool and adjust function body accordingly. Change "err" variable to bool.
	(rtl_verify_flow_info_1): Ditto.
	(free_bb_for_insn): Change return type to void.
	(rtl_merge_blocks): Change "b_empty" variable to bool.
	(try_redirect_by_replacing_jump): Change "fallthru" variable to bool.
	(verify_hot_cold_block_grouping): Change return type from int to bool. Change "err" variable to bool.
	(rtl_verify_edges): Ditto.
	(rtl_verify_bb_insns): Ditto.
	(rtl_verify_bb_pointers): Ditto.
	(rtl_verify_bb_insn_chain): Ditto.
	(rtl_verify_fallthru): Ditto.
	(rtl_verify_bb_layout): Ditto.
	(purge_all_dead_edges): Change "purged" variable to bool.
	* cfgrtl.h (free_bb_for_insn): Change return type from int to void.
	* postreload-gcse.cc (expr_hasher::equal): Change "equiv_p" to bool.
	(load_killed_in_block_p): Change return type from int to bool and adjust function body accordingly.
	(oprs_unchanged_p): Return true/false.
	(rest_of_handle_gcse2): Change return type to void.
	* tree-cfg.cc (gimple_verify_flow_info): Change return type from int to bool. Change "err" variable to bool.

2023-07-11  rs6000: Update the vsx-vector-6.* tests.  (Carl Love, 22 files, -282/+1267)

The vsx-vector-6.h file is included into the processor specific test files vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c. The .h file contains a large number of vsx vector built-in tests. The processor specific files contain the number of instructions that the tests are expected to generate for that processor. The tests are compile only.

This patch reworks the tests into a series of files for related tests. The new tests consist of a runnable test to verify the built-in argument types and the functional correctness of each built-in. There is also a compile only test that verifies the built-ins generate the expected number of instructions for the various built-in tests.

gcc/testsuite/

	* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
	* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
	* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
	* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
	* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
	* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.

2023-07-11  testsuite: Require vectors of doubles for pr97428.c  (Maciej W. Rozycki, 1 file, -0/+1)

The pr97428.c test assumes support for vectors of doubles, but some targets only support vectors of floats, causing this test to fail with such targets. Limit this test to targets that support vectors of doubles then.

gcc/testsuite/

	* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.

2023-07-12  [modula2] Improve uninitialized variable analysis by combining basic blocks  (Gaius Mulley, 24 files, -240/+979)

This patch combines basic blocks for static analysis of uninitialized variables providing that they are not the top of a loop, are not reached by a conditional and are not reached after a procedure call. It also avoids checking array accesses for static analysis. Finally the patch adds switch modifiers to allow static analysis to include conditional branches for subsequent basic block analysis.

gcc/ChangeLog:

	* doc/gm2.texi (-Wuninit-variable-checking=): New item.

gcc/m2/ChangeLog:

	* gm2-compiler/M2BasicBlock.def (InitBasicBlocksFromRange): New parameter ScopeSym.
	* gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): New parameter ScopeSym.
	(InitBasicBlocksFromRange): New parameter ScopeSym. Call ConvertQuads2BasicBlock with ScopeSym.
	(DisplayBasicBlocks): Uncomment.
	* gm2-compiler/M2Code.mod: Replace VariableAnalysis with ScopeBlockVariableAnalysis.
	(InitialDeclareAndOptiomize): Add parameter scope.
	(SecondDeclareAndOptimize): Add parameter scope.
	* gm2-compiler/M2GCCDeclare.mod (DeclareConstructor): Add scope parameter to DeclareTypesConstantsProceduresInRange.
	(DeclareTypesConstantsProceduresInRange): New parameter scope. Pass scope to DisplayQuadRange. Reformatted.
	* gm2-compiler/M2GenGCC.def (ConvertQuadsToTree): New parameter scope.
	* gm2-compiler/M2GenGCC.mod (ConvertQuadsToTree): New parameter scope.
	* gm2-compiler/M2Optimize.mod (KnownReachable): New parameter scope.
	* gm2-compiler/M2Options.def (SetUninitVariableChecking): Add arg parameter.
	* gm2-compiler/M2Options.mod (SetUninitVariableChecking): Add arg parameter and set boolean UninitVariableChecking and UninitVariableConditionalChecking.
	(UninitVariableConditionalChecking): New boolean set to FALSE.
	* gm2-compiler/M2Quads.def (IsGoto): New procedure function.
	(DisplayQuadRange): Add scope parameter.
	(LoopAnalysis): Add scope parameter.
	* gm2-compiler/M2Quads.mod: Import PutVarArrayRef.
	(IsGoto): New procedure function.
	(LoopAnalysis): Add scope parameter and use MetaErrorT1 instead of WarnStringAt.
	(BuildStaticArray): Call PutVarArrayRef.
	(BuildDynamicArray): Call PutVarArrayRef.
	(DisplayQuadRange): Add scope parameter.
	(GetM2OperatorDesc): Add relational condition cases.
	* gm2-compiler/M2Scope.def (ScopeProcedure): Add parameter.
	* gm2-compiler/M2Scope.mod (DisplayScope): Pass scopeSym to DisplayQuadRange.
	(ForeachScopeBlockDo): Pass scopeSym to p.
	* gm2-compiler/M2SymInit.def (VariableAnalysis): Rename to ...
	(ScopeBlockVariableAnalysis): ... this.
	* gm2-compiler/M2SymInit.mod (ScopeBlockVariableAnalysis): Add scope parameter.
	(bbEntry): New pointer to record.
	(bbArray): New array.
	(bbFreeList): New variable.
	(errorList): New list.
	(IssueConditional): New procedure.
	(GenerateNoteFlow): New procedure.
	(IssueWarning): New procedure.
	(IsUniqueWarning): New procedure.
	(CheckDeferredRecordAccess): Re-implement.
	(CheckBinary): Add warning and lst parameters.
	(CheckUnary): Add warning and lst parameters.
	(CheckXIndr): Add warning and lst parameters.
	(CheckIndrX): Add warning and lst parameters.
	(CheckBecomes): Add warning and lst parameters.
	(CheckComparison): Add warning and lst parameters.
	(CheckReadBeforeInitQuad): Add warning and lst parameters to all Check procedures. Add all case quadruple clauses.
	(FilterCheckReadBeforeInitQuad): Add warning and lst parameters.
	(CheckReadBeforeInitFirstBasicBlock): Add warning and lst parameters.
	(bbArrayKill): New procedure.
	(DumpBBEntry): New procedure.
	(DumpBBArray): New procedure.
	(DumpBBSequence): New procedure.
	(TestBBSequence): New procedure.
	(CreateBBPermultations): New procedure.
	(ScopeBlockVariableAnalysis): New procedure.
	(GetOp3): New procedure.
	(GenerateCFG): New procedure.
	(NewEntry): New procedure.
	(AppendEntry): New procedure.
	(init): Initialize bbFreeList and errorList.
	* gm2-compiler/SymbolTable.def (PutVarArrayRef): New procedure.
	(IsVarArrayRef): New procedure function.
	* gm2-compiler/SymbolTable.mod (SymVar): ArrayRef new field.
	(MakeVar): Set ArrayRef to FALSE.
	(PutVarArrayRef): New procedure.
	(IsVarArrayRef): New procedure function.
	* gm2-gcc/init.cc (_M2_M2SymInit_init): New prototype.
	(init_PerCompilationInit): Add call to _M2_M2SymInit_init.
	* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking): New definition.
	* gm2-lang.cc (gm2_langhook_handle_option): Add new case OPT_Wuninit_variable_checking_.
	* lang.opt: Wuninit-variable-checking= new entry.

gcc/testsuite/ChangeLog:

	* gm2/switches/uninit-variable-checking/cascade/fail/cascadedif.mod: New test.
	* gm2/switches/uninit-variable-checking/cascade/fail/switches-uninit-variable-checking-cascade-fail.exp: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

2023-07-11  libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space  (Tobias Burnus, 2 files, -3/+29)

libgomp/

	* allocator.c (omp_init_allocator): Use malloc for omp_high_bw_mem_space when the memkind lib is unavailable instead of returning omp_null_allocator.
	* libgomp.texi (OpenMP 5.0): Fix typo.
	(Memory allocation with libmemkind): Document implementation in more detail.

2023-07-11  c++: coercing variable template from current inst [PR110580]  (Patrick Palka, 2 files, -1/+19)

Here during ahead of time coercion of the variable template-id v1<int>, since we pass only the innermost arguments to coerce_template_parms (and outer arguments are still dependent at this point), substitution of the default template argument V=U just lowers U from level 2 to level 1 rather than replacing it with int as expected. Thus after coercion we incorrectly end up with (effectively) v1<int, T> instead of v1<int, int>.

Coercion of a class/alias template-id on the other hand always passes all levels of arguments, which avoids this issue. So this patch makes us do the same for variable template-ids.

	PR c++/110580

gcc/cp/ChangeLog:

	* pt.cc (lookup_template_variable): Pass all levels of arguments to coerce_template_parms, and use the parameters from the most general template.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1y/var-templ83.C: New test.

2023-07-11  Fix typo in the testcase.  (liuhongt, 1 file, -1/+1)

Antony Polukhin 2023-07-11 09:51:58 UTC:

> There's a typo at https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87
> It should be `|| !test3() || !test3r()` rather than `|| !test3() || !test4r()`

gcc/testsuite/ChangeLog:

	PR target/110170
	* g++.target/i386/pr110170.C: Fix typo.

2023-07-11  VECT: Add COND_LEN_* operations for loop control with length targets  (Ju-Zhe Zhong, 4 files, -0/+157)

Hi, Richard and Richi.

This patch adds cond_len_* operation patterns for targets that support loop control with length. These patterns will be used in the following cases:

1. Integer division:

  void
  f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        a[i] = b[i] / c[i];
      }
  }

ARM SVE IR:

  ...
  max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });

  Loop:
  ...
  # loop_mask_29 = PHI <next_mask_37(4), max_mask_36(3)>
  ...
  vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
  ...
  vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
  vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28);
  ...
  .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

For a target like RVV that supports loop control with length, we want to see IR as follows:

  Loop:
  ...
  # loop_len_29 = SELECT_VL
  ...
  vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
  ...
  vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
  vect__8.12_24 = .COND_LEN_DIV (dummy_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29, bias);
  ...
  .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

Notice here, we use dummy_mask = { -1, -1, ...., -1 }.

2. Integer conditional division, a similar case to (1) but with a condition:

  void
  f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        if (cond[i])
          a[i] = b[i] / c[i];
      }
  }

ARM SVE:

  ...
  max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });

  Loop:
  ...
  # loop_mask_55 = PHI <next_mask_77(5), max_mask_76(4)>
  ...
  vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
  mask__29.10_58 = vect__4.9_56 != { 0, ... };
  vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
  ...
  vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
  ...
  vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
  vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62);
  ...
  .MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
  ...
  next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });

Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to guarantee the correct result. However, a target with length control can not perform this elegant flow; for RVV, we would expect:

  Loop:
  ...
  loop_len_55 = SELECT_VL
  ...
  mask__29.10_58 = vect__4.9_56 != { 0, ... };
  ...
  vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55, bias);
  ...

Here we expect COND_LEN_DIV predicated by a real mask which is the outcome of the comparison:

  mask__29.10_58 = vect__4.9_56 != { 0, ... };

and a real length which is produced by the loop control:

  loop_len_55 = SELECT_VL

3. Conditional floating-point operations (no -ffast-math):

  void
  f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        if (cond[i])
          a[i] = b[i] + a[i];
      }
  }

ARM SVE IR:

  max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });
  ...
  # loop_mask_49 = PHI <next_mask_71(4), max_mask_70(3)>
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
  ...
  vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
  ...
  next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
  ...

For RVV, we would expect IR:

  ...
  loop_len_49 = SELECT_VL
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  ...
  vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49, bias);
  ...

4. Conditional un-ordered reduction:

  int32_t
  f (int32_t *restrict a, int32_t *restrict cond, int n)
  {
    int32_t result = 0;
    for (int i = 0; i < n; ++i)
      {
        if (cond[i])
          result += a[i];
      }
    return result;
  }

ARM SVE IR:

  Loop:
  # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
  ...
  # loop_mask_40 = PHI <next_mask_58(4), max_mask_57(3)>
  ...
  mask__17.11_43 = vect__4.10_41 != { 0, ... };
  vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
  ...
  vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37);
  ...
  next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
  ...

  Epilogue:
  _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

For RVV, we expect:

  Loop:
  # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
  ...
  loop_len_40 = SELECT_VL
  ...
  mask__17.11_43 = vect__4.10_41 != { 0, ... };
  ...
  vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40, bias);
  ...
  next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
  ...

  Epilogue:
  _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

I name these patterns "cond_len_*" since I want the length operand to come after the mask operand, and all other operands except the length operand in the same order as the "cond_*" patterns. Such an order will make life easier in the following loop vectorizer support.

gcc/ChangeLog:

	* doc/md.texi: Add COND_LEN_* operations for loop control with length.
	* internal-fn.cc (cond_len_unary_direct): Ditto.
	(cond_len_binary_direct): Ditto.
	(cond_len_ternary_direct): Ditto.
	(expand_cond_len_unary_optab_fn): Ditto.
	(expand_cond_len_binary_optab_fn): Ditto.
	(expand_cond_len_ternary_optab_fn): Ditto.
	(direct_cond_len_unary_optab_supported_p): Ditto.
	(direct_cond_len_binary_optab_supported_p): Ditto.
	(direct_cond_len_ternary_optab_supported_p): Ditto.
	* internal-fn.def (COND_LEN_ADD): Ditto.
	(COND_LEN_SUB): Ditto.
	(COND_LEN_MUL): Ditto.
	(COND_LEN_DIV): Ditto.
	(COND_LEN_MOD): Ditto.
	(COND_LEN_RDIV): Ditto.
	(COND_LEN_MIN): Ditto.
	(COND_LEN_MAX): Ditto.
	(COND_LEN_FMIN): Ditto.
	(COND_LEN_FMAX): Ditto.
	(COND_LEN_AND): Ditto.
	(COND_LEN_IOR): Ditto.
	(COND_LEN_XOR): Ditto.
	(COND_LEN_SHL): Ditto.
	(COND_LEN_SHR): Ditto.
	(COND_LEN_FMA): Ditto.
	(COND_LEN_FMS): Ditto.
	(COND_LEN_FNMA): Ditto.
	(COND_LEN_FNMS): Ditto.
	(COND_LEN_NEG): Ditto.
	* optabs.def (OPTAB_D): Ditto.

2023-07-11  tree-optimization/110614 - SLP splat and re-align (optimized)  (Richard Biener, 1 file, -4/+5)

The following properly guards the re-align (optimized) paths used on old power CPUs for the added case of SLP splats from non-grouped loads. Testcases already exist in dg-torture.

	PR tree-optimization/110614
	* tree-vect-data-refs.cc (vect_supportable_dr_alignment): SLP splats are not suitable for re-align ops.

2023-07-11  ada: Avoid renaming_decl in case of constrained array  (Bob Duff, 1 file, -1/+15)

This patch avoids rewriting "X: S := F(...);" as "X: S renames F(...);". That rewrite is incorrect if S is a constrained array subtype, because it changes the semantics. In the original, the bounds of X are that of S. But constraints are ignored in renamings, so the bounds of X would come from F'Result. This can cause spurious Constraint_Errors in some obscure cases. It causes unnecessary checks to be inserted, and even when such checks pass (more common case), they might be less efficient.

gcc/ada/

	* exp_ch3.adb (Expand_N_Object_Declaration): Avoid transforming to a renaming in case of constrained array that comes from source.

2023-07-11  ada: Fix wrong resolution for hidden discriminant in predicate  (Eric Botcazou, 1 file, -7/+42)

The problem occurs for hidden discriminants of private discriminated types.

gcc/ada/

	* sem_ch13.adb (Replace_Type_References_Generic.Visible_Component): In the case of private discriminated types, return a discriminant only if it is listed in the discriminant part of the declaration.

2023-07-11  testsuite: Unbreak pr110557.cc where long is 32-bit  (Xi Ruoyao, 1 file, -6/+8)

On ports with 32-bit long, the test produced excess errors:

  gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of 'Item::y' exceeds its type

Reported-by: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>

gcc/testsuite/ChangeLog:

	* g++.dg/vect/pr110557.cc: Use long long instead of long for 64-bit type.
	(test): Remove an unnecessary cast.