aboutsummaryrefslogtreecommitdiff
path: root/gcc
AgeCommit message (Collapse)AuthorFilesLines
2023-05-08[x86_64] Introduce insvti_highpart define_insn_and_split.Roger Sayle2-1/+38
This is a repost/respin of a patch that was conditionally approved: https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html This patch adds a convenient post-reload splitter for setting/updating the highpart of a TImode variable, using i386's previously added split_double_concat infrastructure. For the new test case below: __int128 foo(__int128 x, unsigned long long y) { __int128 t = (__int128)y << 64; __int128 r = (x & ~0ull) | t; return r; } mainline GCC with -O2 currently generates: foo: movq %rdi, %rcx xorl %eax, %eax xorl %edi, %edi orq %rcx, %rax orq %rdi, %rdx ret with this patch, GCC instead now generates the much better: foo: movq %rdi, %rcx movq %rcx, %rax ret It turns out that the -m32 equivalent of this testcase, already avoids using explict orl/xor instructions, as it gets optimized (in combine) by a completely different path. Given that this idiom isn't seen in 32-bit code (so this pattern doesn't match with -m32), and also that the shorter 32-bit AND bitmask is represented as a CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split is implemented for just TARGET_64BIT rather than contort a "generic" implementation using DWI mode iterators. 2023-05-08 Roger Sayle <roger@nextmovesoftware.com> Uros Bizjak <ubizjak@gmail.com> gcc/ChangeLog * config/i386/i386.md (any_or_plus): Move definition earlier. (*insvti_highpart_1): New define_insn_and_split to overwrite (insv) the highpart of a TImode register/memory. gcc/testsuite/ChangeLog * gcc.target/i386/insvti_highpart-1.c: New test case.
2023-05-08Fix cfg maintenance after inlining in AutoFDOEugene Rozenfeld1-9/+12
Todo from early_inliner needs to be propagated so that cleanup_tree_cfg () is called if necessary. This bug was causing an assert in get_loop_body during ipa-sra in autoprofiledbootstrap build since loops weren't fixed up and one of the loops had num_nodes set to 0. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (auto_profile): Check todo from early_inline to see if cleanup_tree_vfg needs to be called. (early_inline): Return todo from early_inliner.
2023-05-08Fix pr81192.c for int16 targetsAndrew Pinski1-4/+4
I had missed when converting this testcase to Gimple that there was a define for int/unsigned type specifically to get an INT32 type. This means when using a literal integer constant you need to use the `_Literal (type)` to form the types correctly on the constants. This fixes the issue and has been both tested on xstormy16-elf and x86_64-linux-gnu. Committed as obvious. gcc/testsuite/ChangeLog: PR testsuite/109776 * gcc.dg/pr81192.c: Fix integer constants for int16 targets.
2023-05-08RISC-V: Factor out vector manager code in vsetvli insertion pass. [NFC]Kito Cheng1-20/+67
gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info): New. (pass_vsetvl::get_block_info): New. (pass_vsetvl::update_vector_info): New. (pass_vsetvl::simple_vsetvl): Use get_vector_info. (pass_vsetvl::compute_local_backward_infos): Ditto. (pass_vsetvl::transfer_before): Ditto. (pass_vsetvl::transfer_after): Ditto. (pass_vsetvl::emit_local_forward_vsetvls): Ditto. (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto. (pass_vsetvl::cleanup_insns): Ditto. (pass_vsetvl::compute_local_backward_infos): Use update_vector_info.
2023-05-08RISC-V: Improve portability of testcasesKito Cheng3-2/+13
stdint.h will require having corresponding multi-lib existing, so using stdint-gcc.h instead, also added a riscv_vector.h wrapper to gcc.target/riscv/rvv/autovec/. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: Change stdint.h to stdint-gcc.h. * gcc.target/riscv/rvv/autovec/template-1.h: Ditto. * gcc.target/riscv/rvv/autovec/riscv_vector.h: New.
2023-05-08Fix minor length computation on stormy16Jeff Law1-1/+2
Today's build of xstormy16-elf failed due to a branch to an out of range target. Manual inspection of the assembly code for the affected function (divdi3) showed that the zero-extension patterns were claiming a length of 2, but clearly assembled into 4 bytes. This patch adds an explicit length to the zero extension pattern and appears to resolve the issue in my test builds. gcc/ * config/stormy16/stormy16.md (zero_extendhisi2): Fix length.
2023-05-08Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' ↵Thomas Schwinge7-34/+44
the 'LTO_TORTURE_OPTIONS' Otherwise, for example for 'RUNTESTFLAGS' of '--target_board=unix\{-m64,-m32\}' vs. '--target_board=unix\{-m32,-m64\}', both variants exercise testing with always the first flag variant's 'LTO_OPTIONS'/'LTO_TORTURE_OPTIONS', which results in unequal test results between the two 'RUNTESTFLAGS' variants if one of the flag variants has 'check_linker_plugin_available' but the other doesn't. Fix-up for r180245 (commit c1a7cdbbcca90ad5260bfc543f8c10f3514e76c1) "Update testsuite to run with slim LTO". gcc/testsuite/ * g++.dg/guality/guality.exp: Move 'torture-init' earlier. * gcc.dg/guality/guality.exp: Likewise. * gfortran.dg/guality/guality.exp: Likewise. * lib/c-torture.exp (LTO_TORTURE_OPTIONS): Don't set. * lib/gcc-dg.exp (LTO_TORTURE_OPTIONS): Don't set. * lib/lto.exp (lto_init, lto_finish): Let each 'lto_init' determine the default 'LTO_OPTIONS'. * lib/torture-options.exp (torture-init, torture-finish): Let each 'torture-init' determine the 'LTO_TORTURE_OPTIONS'.
2023-05-08c++: list CTAD and resolve_nondeduced_context [PR106214]Patrick Palka2-14/+41
This extends the PR93107 fix, which made us do resolve_nondeduced_context on the elements of an initializer list during auto deduction, to happen for CTAD as well. PR c++/106214 PR c++/93107 gcc/cp/ChangeLog: * pt.cc (do_auto_deduction): Move up resolve_nondeduced_context calls to happen before do_class_deduction. Add some error_mark_node tests. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/class-deduction114.C: New test.
2023-05-08Bump up precision size to 16 bits.Michael Meissner1-12/+12
The new __dmr type that is being added as a possible future PowerPC instruction set bumps into a structure field size issue. The size of the __dmr type is 1024 bits. The precision field in tree_type_common is currently 10 bits, so if you store 1,024 into field, you get a 0 back. When you get 0 in the precision field, the ccp pass passes this 0 to sext_hwi in hwint.h. That function in turn generates a shift that is equal to the host wide int bit size, which is undefined as machine dependent for shifting in C/C++. int shift = HOST_BITS_PER_WIDE_INT - prec; return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) src << shift)) >> shift; It turns out the x86_64 where I first did my tests returns the original input before the two shifts, while the PowerPC always returns 0. In the ccp pass, the original input is -1, and so it worked. When I did the runs on the PowerPC, the result was 0, which ultimately led to the failure. 2023-02-01 Richard Biener <rguenther@suse.de> Michael Meissner <meissner@linux.ibm.com> PR middle-end/108623 * tree-core.h (tree_type_common): Bump up precision field to 16 bits. Align bit fields > 1 bit to at least an 8-bit boundary.
2023-05-08fortran: Fix coding style around free()Bernhard Reutner-Fischer1-1/+1
Fix coding-style errors introduced in ca2f64d5d08c1699ca4b7cb2bf6a76692e809e0f gcc/fortran/ChangeLog: * resolve.cc (resolve_select_type): Fix coding style. libgfortran/ChangeLog: * caf/single.c (_gfortran_caf_register): Fix coding style. * io/async.c (update_pdt, async_io): Likewise. * io/format.c (free_format_data): Likewise. * io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise. * io/unix.c (mem_close): Likewise.
2023-05-08PHIOPT: factor out unary operations instead of just conversionsAndrew Pinski6-28/+71
After using factor_out_conditional_conversion with diamond bb, we should be able do use it also for all normal unary gimple and not just conversions. This allows to optimize PR 59424 for an example. This is also a start to optimize PR 64700 and a few others. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. An example of this is: ``` static inline unsigned long long g(int t) { unsigned t1 = t; return t1; } static int abs1(int a) { if (a < 0) a = -a; return a; } unsigned long long f(int c, int d, int e) { unsigned long long t; if (d > e) t = g(abs1(d)); else t = g(abs1(e)); return t; } ``` Which should be optimized to: _9 = MAX_EXPR <d_5(D), e_6(D)>; _4 = ABS_EXPR <_9>; t_3 = (long long unsigned intD.16) _4; gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ... (factor_out_conditional_operation): This and add support for all unary operations. (pass_phiopt::execute): Update call to factor_out_conditional_conversion to call factor_out_conditional_operation instead. PR tree-optimization/109424 PR tree-optimization/59424 gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/abs-2.c: Update tree scan for details change in wording. * gcc.dg/tree-ssa/minmax-17.c: Likewise. * gcc.dg/tree-ssa/pr103771.c: Likewise. * gcc.dg/tree-ssa/minmax-18.c: New test. * gcc.dg/tree-ssa/minmax-19.c: New test.
2023-05-08PHIOPT: Loop over calling factor_out_conditional_conversionAndrew Pinski2-12/+36
After adding diamond shaped bb support to factor_out_conditional_conversion, we can get a case where we have two conversions that needs factored out and then would have another phiopt happen. An example is: ``` static inline unsigned long long g(int t) { unsigned t1 = t; return t1; } unsigned long long f(int c, int d, int e) { unsigned long long t; if (c > d) t = g(c); else t = g(d); return t; } ``` In this case we should get a MAX_EXPR in phiopt1 with two casts. Before this patch, we would just factor out the outer cast and then wait till phiopt2 to factor out the inner cast. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (pass_phiopt::execute): Loop over factor_out_conditional_conversion. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-17.c: New test.
2023-05-08PHIOPT: Add diamond bb form to factor_out_conditional_conversionAndrew Pinski4-4/+42
So the function factor_out_conditional_conversion already supports diamond shaped bb forms, just need to be called for such a thing. harden-cond-comp.c needed to be changed as we would optimize out the conversion now and that causes the compare hardening not needing to split the block which it was testing. So change it such that there would be no chance of optimization. Also add two testcases that showed the improvement. PR 103771 is solved in ifconvert also for the vectorizer but now it is solved in a general sense. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/49959 PR tree-optimization/103771 gcc/ChangeLog: * tree-ssa-phiopt.cc (pass_phiopt::execute): Support Diamond shapped bb form for factor_out_conditional_conversion. gcc/testsuite/ChangeLog: * c-c++-common/torture/harden-cond-comp.c: Change testcase slightly to avoid the new phiopt optimization. * gcc.dg/tree-ssa/abs-2.c: New test. * gcc.dg/tree-ssa/pr103771.c: New test.
2023-05-08RISC-V: Fix ugly && incorrect codes of RVV auto-vectorizationJuzhe-Zhong7-47/+19
1. Add movmisalign pattern for TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT targethook, current RISC-V has supported this target hook, we can't make it supported without movmisalign pattern. 2. Remove global extern of get_mask_policy_no_pred && get_tail_policy_no_pred. These 2 functions are comming from intrinsic builtin frameworks. We are sure we don't need them in auto-vectorization implementation. 3. Refine mask mode implementation. 4. We should not have "riscv_vector_" in riscv_vector namspace since it makes the codes inconsistent and ugly. For example: Before this patch: static opt_machine_mode riscv_get_mask_mode (machine_mode mode) { machine_mode mask_mode = VOIDmode; if (TARGET_VECTOR && riscv_vector::riscv_vector_get_mask_mode (mode).exists (&mask_mode)) return mask_mode; .. After this patch: riscv_get_mask_mode (machine_mode mode) { machine_mode mask_mode = VOIDmode; if (TARGET_VECTOR && riscv_vector::get_mask_mode (mode).exists (&mask_mode)) return mask_mode; .. 5. Fix fail testcase fixed-vlmax-1.c. gcc/ChangeLog: * config/riscv/autovec.md (movmisalign<mode>): New pattern. * config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): Delete. (riscv_vector_get_mask_mode): Ditto. (get_mask_policy_no_pred): Ditto. (get_tail_policy_no_pred): Ditto. (get_mask_mode): New function. * config/riscv/riscv-v.cc (get_mask_policy_no_pred): Delete. (get_tail_policy_no_pred): Ditto. (riscv_vector_mask_mode_p): Ditto. (riscv_vector_get_mask_mode): Ditto. (get_mask_mode): New function. * config/riscv/riscv-vector-builtins.cc (use_real_merge_p): Remove global extern. (get_tail_policy_for_pred): Ditto. * config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred): Ditto. (get_mask_policy_for_pred): Ditto * config/riscv/riscv.cc (riscv_get_mask_mode): Refine codes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix typo.
2023-05-08RISC-V: Handle multi-lib path correclty for linuxKito Cheng4-41/+100
RISC-V Linux encodes the ABI into the path, so in theory, we can only use that to select multi-lib paths, and no way to use different multi-lib paths between `rv32i/ilp32` and `rv32ima/ilp32`, we'll mapping both to `/lib/ilp32`. It's hard to do that with GCC's builtin multi-lib selection mechanism; builtin mechanism did the option string compare and then enumerate all possible reuse rules during the build time. However, it's impossible to RISC-V; we have a huge number of combinations of `-march`, so implementing a customized multi-lib selection becomes the only solution. Multi-lib configuration is only used for determines which ISA should be used when compiling the corresponding ABI variant after this patch. During the multi-lib selection stage, only consider -mabi as the only key to select the multi-lib path. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_select_multilib_by_abi): New. (riscv_select_multilib): New. (riscv_compute_multilib): Extract logic to riscv_select_multilib and also handle select_by_abi. * config/riscv/elf.h (RISCV_USE_CUSTOMISED_MULTI_LIB): Change it to select_by_abi_arch_cmodel from 1. * config/riscv/linux.h (RISCV_USE_CUSTOMISED_MULTI_LIB): Define. * config/riscv/riscv-opts.h (enum riscv_multilib_select_kind): New.
2023-05-08Makefile.in: clean up match.pd-related dependenciesAlexander Monakov1-6/+3
Clean up confusing changes from the recent refactoring for parallel match.pd build. gimple-match-head.o is not built. Remove related flags adjustment. Autogenerated gimple-match-N.o files do not depend on gimple-match-exports.cc. {gimple,generic)-match-auto.h only depend on the prerequisites of the corresponding s-{gimple,generic}-match stamp file, not any .cc file. gcc/ChangeLog: * Makefile.in: (gimple-match-head.o-warn): Remove. (GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on gimple-match-exports.cc. (gimple-match-auto.h): Only depend on s-gimple-match. (generic-match-auto.h): Likewise.
2023-05-07Move substitute_and_fold over to use simple_dce_from_worklistAndrew Pinski10-51/+82
While looking into a different issue, I noticed that it would take until the second forwprop pass to do some forward proping and it was because the ssa name was used more than once but the second statement was "dead" and we don't remove that until much later. So this uses simple_dce_from_worklist instead of manually removing of the known unused statements instead. Propagate engine does not do a cleanupcfg afterwards either but manually cleans up possible EH edges so simple_dce_from_worklist needs to communicate that back to the propagate engine. Some testcases needed to be updated/changed even because of better optimization. gcc.dg/pr81192.c even had to be changed to be using the gimple FE so it would be less fragile in the future too. gcc.dg/tree-ssa/pr98737-1.c was failing because __atomic_fetch_ was being matched but in those cases, the result was not being used so both __atomic_fetch_ and __atomic_x_and_fetch_ are valid choices and would not make a code generation difference. evrp7.c, evrp8.c, vrp35.c, vrp36.c: just needed a slightly change as the removal message is different slightly. kernels-alias-8.c: ccp1 is able to remove an unused load which causes ealias to have one less load to analysis so update the expected scan #. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/109691 * tree-ssa-dce.cc (simple_dce_from_worklist): Add need_eh_cleanup argument. If the removed statement can throw, have need_eh_cleanup include the bb of that statement. * tree-ssa-dce.h (simple_dce_from_worklist): Update declaration. * tree-ssa-propagate.cc (struct prop_stats_d): Remove num_dce. (substitute_and_fold_dom_walker::substitute_and_fold_dom_walker): Initialize dceworklist instead of stmts_to_remove. (substitute_and_fold_dom_walker::~substitute_and_fold_dom_walker): Destore dceworklist instead of stmts_to_remove. (substitute_and_fold_dom_walker::before_dom_children): Set dceworklist instead of adding to stmts_to_remove. (substitute_and_fold_engine::substitute_and_fold): Call simple_dce_from_worklist instead of poping from the list. Don't update the stat on removal statements. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/evrp7.c: Update for output change. * gcc.dg/tree-ssa/evrp8.c: Likewise. * gcc.dg/tree-ssa/vrp35.c: Likewise. * gcc.dg/tree-ssa/vrp36.c: Likewise. * gcc.dg/tree-ssa/pr98737-1.c: Update scan-tree-dump-not to check for assignment too instead of just a call. * c-c++-common/goacc/kernels-alias-8.c: Update test for removal of load. * gcc.dg/pr81192.c: Rewrite testcase in gimple based test.
2023-05-08fortran: Remove conditionals around free()Bernhard Reutner-Fischer1-2/+1
gcc/fortran/ChangeLog: * resolve.cc (resolve_select_type): Call free() unconditionally. libgfortran/ChangeLog: * caf/single.c (_gfortran_caf_register): Call free() unconditionally. * io/async.c (update_pdt, async_io): Likewise. * io/format.c (free_format_data): Likewise. * io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise. * io/unix.c (mem_close): Likewise.
2023-05-08Fortran: Fix mpz and mpfr memory leaks [PR fortran/68800]Bernhard Reutner-Fischer2-4/+9
gcc/fortran/ChangeLog: PR fortran/68800 * expr.cc (find_array_section): Fix mpz memory leak. * simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in error paths.
2023-05-07Fortran: Reject semicolon after namelist name.Jerry DeLisle1-0/+15
PR fortran/109662 libgfortran/ChangeLog: * io/list_read.c: Add check for a semicolon after a namelist name in read input. Issue a runtime error message. gcc/testsuite/ChangeLog: * gfortran.dg/pr109662-a.f90: New test.
2023-05-08Daily bump.GCC Administrator4-1/+146
2023-05-07c++: fix pretty printing of 'alignof' vs '__alignof__' [PR85979]Patrick Palka3-5/+30
PR c++/85979 gcc/cp/ChangeLog: * cxx-pretty-print.cc (cxx_pretty_printer::unary_expression) <case ALIGNOF_EXPR>: Consider ALIGNOF_EXPR_STD_P. * error.cc (dump_expr) <case ALIGNOF_EXPR>: Likewise. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/alignof4.C: New test.
2023-05-07c++: goto entering scope of obj w/ non-trivial dtor [PR103091]Patrick Palka3-43/+42
It seems ever since DR 2256 goto is permitted to cross the initialization of a trivially initialized object with a non-trivial destructor. We already supported this as an -fpermissive extension, so this patch just makes us unconditionally support this. DR 2256 PR c++/103091 gcc/cp/ChangeLog: * decl.cc (decl_jump_unsafe): Return bool instead of int. Don't consider TYPE_HAS_NONTRIVIAL_DESTRUCTOR. (check_previous_goto_1): Simplify now that decl_jump_unsafe returns bool instead of int. (check_goto): Likewise. gcc/testsuite/ChangeLog: * g++.old-deja/g++.other/init9.C: Don't expect diagnostics for goto made valid by DR 2256. * g++.dg/init/goto4.C: New test.
2023-05-07c++: satisfaction of non-dep member alias template-idPatrick Palka2-3/+18
constraints_satisfied_p already carefully checks dependence of template arguments before proceeding with satisfaction, so the dependence check in instantiate_alias_template is unnecessary and overly conservative. Getting rid of it allows us to check satisfaction ahead of time in more cases as in the below testcase. gcc/cp/ChangeLog: * pt.cc (instantiate_alias_template): Exit early upon error from coerce_template_parms. Remove dependence test guarding constraints_satisfied_p. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-alias6.C: New test.
2023-05-07c++: various code cleanupsPatrick Palka4-25/+20
* Harden some tree accessor macros and fix a couple of bad PLACEHOLDER_TYPE_CONSTRAINTS accesses uncovered by this. * Use strip_innermost_template_args in outer_template_args. * Add !processing_template_decl early exit tests to some dependence predicates. gcc/cp/ChangeLog: * cp-tree.h (PLACEHOLDER_TYPE_CONSTRAINTS_INFO): Harden via TEMPLATE_TYPE_PARM_CHECK. (TPARMS_PRIMARY_TEMPLATE): Harden via TREE_VEC_CHECK. (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL): Harden via TEMPLATE_TEMPLATE_PARM_CHECK. * cxx-pretty-print.cc (cxx_pretty_printer::simple_type_specifier): Guard PLACEHOLDER_TYPE_CONSTRAINTS access. * error.cc (dump_type) <case TEMPLATE_TYPE_PARM>: Use separate variable to store CLASS_PLACEHOLDER_TEMPLATE result. * pt.cc (outer_template_args): Use strip_innermost_template_args. (any_type_dependent_arguments_p): Exit early if !processing_template_decl. Use range-based for. (any_dependent_template_arguments_p): Likewise.
2023-05-07c++: parenthesized -> resolving to static data member [PR98283]Patrick Palka3-3/+17
Here we're neglecting to propagate parenthesized-ness when the member access (this->m) resolves to a static data member (and thus finish_class_member_access_expr yields a VAR_DECL instead of a COMPONENT_REF). PR c++/98283 gcc/cp/ChangeLog: * pt.cc (tsubst_copy_and_build) <case COMPONENT_REF>: Propagate REF_PARENTHESIZED_P more generally via force_paren_expr. * semantics.cc (force_paren_expr): Document default argument. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/paren6.C: New test.
2023-05-07c++: bound ttp in lambda function type [PR109651]Patrick Palka3-8/+50
After r14-11-g2245459c85a3f4 we now coerce the template arguments of a bound ttp again after level-lowering it. Notably a level-lowered ttp doesn't have DECL_CONTEXT set, so during this coercion we fall back to using current_template_parms to obtain the relevant set of in-scope parameters. But it turns out current_template_parms isn't properly set when substituting the function type of a generic lambda, and so if the type contains bound ttps that need to be lowered we'll crash during their attempted coercion. Specifically in the first testcase below, current_template_parms during the lambda type substitution (with T=int) is "1 U" instead of the expected "2 TT, 1 U", and we crash when level lowering TT<int>. Ultimately the problem is that tsubst_lambda_expr does things in the wrong order: we ought to substitute (and install) the in-scope template parameters _before_ substituting anything that may use those template parameters (such as the function type of a generic lambda). This patch corrects this substitution order. PR c++/109651 gcc/cp/ChangeLog: * pt.cc (coerce_template_args_for_ttp): Mention we can hit the current_template_parms fallback when level-lowering a bound ttp. (tsubst_template_decl): Add lambda_tparms parameter. Prefer to use lambda_tparms instead of substituting DECL_TEMPLATE_PARMS. (tsubst_decl) <case TEMPLATE_DECL>: Pass NULL_TREE as lambda_tparms to tsubst_template_decl. (tsubst_lambda_expr): For a generic lambda, substitute DECL_TEMPLATE_PARMS and set current_template_parms to it before substituting the function type. Pass the substituted DECL_TEMPLATE_PARMS as lambda_tparms to tsubst_template_decl. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-generic-ttp1.C: New test. * g++.dg/cpp2a/lambda-generic-ttp2.C: New test.
2023-05-07Fix aarch64/109762: push_options/push_options does not work sometimesAndrew Pinski2-3/+3
aarch64_isa_flags (and aarch64_asm_isa_flags) are both aarch64_feature_flags (uint64_t) but since r12-8000-g14814e20161d, they are saved/restored as unsigned long. This does not make a difference for LP64 targets but on ILP32 and LLP64IL32 targets, it means it does not get restored correctly. This patch changes over to use aarch64_feature_flags instead of unsigned long. Committed as obvious after a bootstrap/test. gcc/ChangeLog: PR target/109762 * config/aarch64/aarch64-builtins.cc (aarch64_simd_switcher::aarch64_simd_switcher): Change argument type to aarch64_feature_flags. * config/aarch64/aarch64-protos.h (aarch64_simd_switcher): Change constructor argument type to aarch64_feature_flags. Change m_old_asm_isa_flags to be aarch64_feature_flags.
2023-05-07c++: non-dep init folding and access checking [PR109480]Patrick Palka2-1/+18
enforce_access currently checks processing_template_decl to decide whether to defer the given access check until instantiation time. But using this flag is unreliable because it gets cleared during e.g. non-dependent initializer folding, and so can lead to premature access check failures as in the below testcase. It seems better to check current_template_parms instead. PR c++/109480 gcc/cp/ChangeLog: * semantics.cc (enforce_access): Check current_template_parms instead of processing_template_decl when deciding whether to defer the access check. gcc/testsuite/ChangeLog: * g++.dg/template/non-dependent25a.C: New test.
2023-05-07c++: potentiality of templated memfn call [PR109480]Patrick Palka3-27/+21
Here we're incorrectly deeming the templated call a.g() inside b's initializer as potentially constant, despite g being non-constexpr, which leads to us needlessly instantiating the initializer ahead of time and which subsequently triggers a bug in access checking deferral (to be fixed by the follow-up patch). This patch fixes this by calling get_fns earlier during CALL_EXPR potentiality checking so that when we extract a FUNCTION_DECL out of a templated member function call (whose overall callee is typically a COMPONENT_REF) we do the usual constexpr-eligibility checking for it. In passing, I noticed the nearby special handling of the object argument of a non-static member function call is effectively the same as the generic argument handling a few lines below. So this patch just gets rid of this special handling; otherwise we'd have to adapt it to handle templated versions of such calls. PR c++/109480 gcc/cp/ChangeLog: * constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>: Reorganize to call get_fns sooner. Remove special handling of the object argument of a non-static member function call. Remove dead store to 'fun'. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept59.C: Make e() constexpr so that the expected "without object" diagnostic isn't replaced by a "call to non-constexpr function" diagnostic. * g++.dg/template/non-dependent25.C: New test.
2023-05-07rs6000: Load high and low part of 64bit constant independentlyJiufu Guo2-12/+54
Compare with previous version, this patch updates the comments only. https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608293.html For a complicate 64bit constant, below is one instruction-sequence to build: lis 9,0x800a ori 9,9,0xabcd sldi 9,9,32 oris 9,9,0xc167 ori 9,9,0xfa16 while we can also use below sequence to build: lis 9,0xc167 lis 10,0x800a ori 9,9,0xfa16 ori 10,10,0xabcd rldimi 9,10,32,0 This sequence is using 2 registers to build high and low part firstly, and then merge them. In parallel aspect, this sequence would be faster. (Ofcause, using 1 more register with potential register pressure). The instruction sequence with two registers for parallel version can be generated only if can_create_pseudo_p. Otherwise, the one register sequence is generated. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Generate more parallel code if can_create_pseudo_p. gcc/testsuite/ChangeLog: * gcc.target/powerpc/parall_5insn_const.c: New test.
2023-05-07Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.Roger Sayle2-3/+11
Following up on posts/reviews by Segher and Uros, there's some question over why the middle-end's lower subreg pass emits a clobber (of a multi-word register) into the instruction stream before emitting the sequence of moves of the word-sized parts. This clobber interferes with (LRA) register allocation, preventing the multi-word pseudo to remain in the same hard registers. This patch eliminates this (presumably superfluous) clobber and thereby improves register allocation. A concrete example of the observed improvement is PR target/43644. For the test case: __int128 foo(__int128 x, __int128 y) { return x+y; } on x86_64-pc-linux-gnu, gcc -O2 currently generates: foo: movq %rsi, %rax movq %rdi, %r8 movq %rax, %rdi movq %rdx, %rax movq %rcx, %rdx addq %r8, %rax adcq %rdi, %rdx ret with this patch, we now generate the much improved: foo: movq %rdx, %rax movq %rcx, %rdx addq %rdi, %rax adcq %rsi, %rdx ret 2023-05-07 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/43644 * lower-subreg.cc (resolve_simple_move): Don't emit a clobber immediately before moving a multi-word register by parts. gcc/testsuite/ChangeLog PR target/43644 * gcc.target/i386/pr43644.c: New test case.
2023-05-07Daily bump.GCC Administrator3-1/+209
2023-05-06Delete duplicated riscv definition.Jeff Law1-38/+0
gcc/ * config/riscv/riscv-v.cc (riscv_vector_preferred_simd_mode): Delete.
2023-05-06RISC-V: autovec: Verify that GET_MODE_NUNITS is a multiple of 2.Michael Collison1-2/+5
While working on autovectorizing for the RISCV port I encountered an issue where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode), where GET_MODE_NUNITS is equal to one. Tested on RISCV and x86_64-linux-gnu. Okay? gcc/ * tree-vect-slp.cc (can_duplicate_and_interleave_p): Check that GET_MODE_NUNITS is a multiple of 2.
2023-05-06RISC-V:autovec: Add target vectorization hooksMichael Collison1-0/+116
gcc/ * config/riscv/riscv.cc (riscv_estimated_poly_value): Implement TARGET_ESTIMATED_POLY_VALUE. (riscv_preferred_simd_mode): Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE. (riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE. (riscv_empty_mask_is_expensive): Implement TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE. (riscv_vectorize_create_costs): Implement TARGET_VECTORIZE_CREATE_COSTS. (riscv_support_vector_misalignment): Implement TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT. (TARGET_ESTIMATED_POLY_VALUE): Register target macro. (TARGET_VECTORIZE_GET_MASK_MODE): Ditto. (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto. (TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): Ditto.
2023-05-06Remove duplicated definition in risc-v vector support.Jeff Law1-11/+0
gcc/ * config/riscv/riscv-v.cc (autovec_use_vlmax_p): Remove duplicate definition.
2023-05-06RISC-V:autovec: Add auto-vectorization support functionsMichael Collison1-8/+93
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): New function. (riscv_vector_preferred_simd_mode): Ditto. (get_mask_policy_no_pred): Ditto. (get_tail_policy_no_pred): Ditto. (riscv_vector_mask_mode_p): Ditto. (riscv_vector_get_mask_mode): Ditto.
2023-05-06RISC-V: autovec: Export policy functions to global scopeMichael Collison2-2/+5
gcc/ * config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred): Remove static declaration to to make externally visible. (get_mask_policy_for_pred): Ditto. * config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred): New external declaration. (get_mask_policy_for_pred): Ditto.
2023-05-06RISC-V: autovec: Add new predicates and function prototypesMichael Collison1-0/+4
gcc/ * config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): New. (riscv_vector_get_mask_mode): Ditto. (get_mask_policy_no_pred): Ditto. (get_tail_policy_no_pred): Ditto.
2023-05-07LoongArch: Enable shrink wrappingXi Ruoyao3-3/+197
This commit implements the target macros for shrink wrapping of function prologues/epilogues shrink wrapping on LoongArch. Bootstrapped and regtested on loongarch64-linux-gnu. I don't have an access to SPEC CPU so I hope the reviewer can perform a benchmark to see if there is real benefit. gcc/ChangeLog: * config/loongarch/loongarch.h (struct machine_function): Add reg_is_wrapped_separately array for register wrapping information. * config/loongarch/loongarch.cc (loongarch_get_separate_components): New function. (loongarch_components_for_bb): Likewise. (loongarch_disqualify_components): Likewise. (loongarch_process_components): Likewise. (loongarch_emit_prologue_components): Likewise. (loongarch_emit_epilogue_components): Likewise. (loongarch_set_handled_components): Likewise. (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define. (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise. (TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise. (TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise. (TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise. (TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise. (loongarch_for_each_saved_reg): Skip registers that are wrapped separately. gcc/testsuite/ChangeLog: * gcc.target/loongarch/shrink-wrap.c: New test.
2023-05-07build: Use -nostdinc generating macro_list [PR109522]Xi Ruoyao1-1/+1
This prevents a spurious message building a cross-compiler when target libc is not installed yet: cc1: error: no include path in which to search for stdc-predef.h As stdc-predef.h was added to define __STDC_* macros by libc, it's unlikely the header will ever contain some bad definitions w/o "__" prefix so it should be safe. gcc/ChangeLog: PR other/109522 * Makefile.in (s-macro_list): Pass -nostdinc to $(GCC_FOR_TARGET).
2023-05-06RISC-V: Enable basic RVV auto-vectorization support.Juzhe-Zhong39-2/+559
gcc/ChangeLog: * config/riscv/riscv-protos.h (preferred_simd_mode): New function. * config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto. (preferred_simd_mode): Ditto. * config/riscv/riscv.cc (riscv_get_arg_info): Handle RVV type in function arg. (riscv_convert_vector_bits): Adjust for RVV auto-vectorization. (riscv_preferred_simd_mode): New function. (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New target hook support. * config/riscv/vector.md: Add autovec.md. * config/riscv/autovec.md: New file. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Add testcases for RVV auto-vectorization. * gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: New test. * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test. * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test. * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test. * gcc.target/riscv/rvv/autovec/scalable-1.c: New test. * gcc.target/riscv/rvv/autovec/template-1.h: New test. * gcc.target/riscv/rvv/autovec/v-1.c: New test. * gcc.target/riscv/rvv/autovec/v-2.c: New test. * gcc.target/riscv/rvv/autovec/zve32f-1.c: New test. * gcc.target/riscv/rvv/autovec/zve32f-2.c: New test. * gcc.target/riscv/rvv/autovec/zve32f-3.c: New test. * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test. * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test. * gcc.target/riscv/rvv/autovec/zve32x-1.c: New test. * gcc.target/riscv/rvv/autovec/zve32x-2.c: New test. * gcc.target/riscv/rvv/autovec/zve32x-3.c: New test. * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test. * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test. * gcc.target/riscv/rvv/autovec/zve64d-1.c: New test. * gcc.target/riscv/rvv/autovec/zve64d-2.c: New test. * gcc.target/riscv/rvv/autovec/zve64d-3.c: New test. * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test. * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test. * gcc.target/riscv/rvv/autovec/zve64f-1.c: New test. * gcc.target/riscv/rvv/autovec/zve64f-2.c: New test. * gcc.target/riscv/rvv/autovec/zve64f-3.c: New test. * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test. * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test. * gcc.target/riscv/rvv/autovec/zve64x-1.c: New test. * gcc.target/riscv/rvv/autovec/zve64x-2.c: New test. * gcc.target/riscv/rvv/autovec/zve64x-3.c: New test. * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test. * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.
2023-05-06Fortran: Namelist read with invalid input accepted.Jerry DeLisle1-0/+15
PR fortran/109662 libgfortran/ChangeLog: * io/list_read.c: Add a check for a comma after a namelist name in read input. Issue a runtime error message. gcc/testsuite/ChangeLog: * gfortran.dg/pr109662.f90: New test.
2023-05-06gimple-range-op: Improve handling of sin/cos rangesJakub Jelinek4-17/+178
Similarly to the earlier sqrt patch, this patch attempts to improve sin/cos ranges. As the functions are periodic, for the reverse range there is not much we can do (but I've discovered I forgot to take into account the boundary ulps for the discovery of impossible result ranges). For fold_range, we can do something only if the range is narrow enough (narrower than 2*pi). The patch computes the value of the functions (taking ulps into account) and also computes the derivative to find out if the function is growing or declining on the boundaries and from that it figures out if the result range should be [min (fn (lb), fn (ub)), max (fn (lb), fn (ub))] or if it needs to be extended to 1 (actually using +Inf) and/or -1 (actually using -Inf) because there must be a local minimum and/or maximum in the range. 2023-05-06 Jakub Jelinek <jakub@redhat.com> * real.h (dconst_pi): Define. (dconst_e_ptr): Formatting fix. (dconst_pi_ptr): Declare. * real.cc (dconst_pi_ptr): New function. * gimple-range-op.cc (cfn_sincos::fold_range): Intersect the generic boundaries range with range computed from sin/cos of the particular bounds if the argument range is shorter than 2*pi. (cfn_sincos::op1_range): Take bulps into account when determining which result ranges are always invalid or behave like known NAN. * gcc.dg/tree-ssa/range-sincos-2.c: New test.
2023-05-06Remove type from vrange_storage::equal_p.Aldy Hernandez3-21/+15
The equal_p method in vrange_storage is only used to compare ranges that are the same type. No sense passing the type if it can be determined from the range being compared. gcc/ChangeLog: * gimple-range-cache.cc (sbr_sparse_bitmap::set_bb_range): Do not pass type to vrange_storage::equal_p. * value-range-storage.cc (vrange_storage::equal_p): Remove type. (irange_storage::equal_p): Same. (frange_storage::equal_p): Same. * value-range-storage.h (class frange_storage): Same.
2023-05-06RISC-V: Fix incorrect demand info merge in local vsetvli optimization [PR109748]Juzhe-Zhong2-45/+93
This patch is fixing my recent optimization patch: https://github.com/gcc-mirror/gcc/commit/d51f2456ee51bd59a79b4725ca0e488c25260bbf In that patch, the new_info = parse_insn (i) is not correct. Since consider the following case: vsetvli a5,a4, e8,m1 .. vsetvli zero,a5, e32, m4 vle8.v vmacc.vv ... Since we have backward demand fusion in Phase 1, so the real demand of "vle8.v" is e32, m4. However, if we use parse_insn (vle8.v) = e8, m1 which is not correct. So this patch we change new_info = new_info.parse_insn (i) into: vector_insn_info new_info = m_vector_manager->vector_insn_infos[i->uid ()]; So that, we can correctly optimize codes into: vsetvli a5,a4, e32, m4 .. .. (vsetvli zero,a5, e32, m4 is removed) vle8.v vmacc.vv Since m_vector_manager->vector_insn_infos is the member variable of pass_vsetvl class. We remove static void function "local_eliminate_vsetvl_insn", and make it as the member function of pass_vsetvl class. PR target/109748 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Remove it. (pass_vsetvl::local_eliminate_vsetvl_insn): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr109748.c: New test.
2023-05-06Canonicalize vec_merge when mask is constant.liuhongt2-0/+29
Use swap_communattive_operands_p for canonicalization. When both value has same operand precedence value, then first bit in the mask should select first operand. The canonicalization should help backends for pattern match. .i.e. x86 backend has lots of vec_merge patterns, combine will create any form of vec_merge(mask, or inverted mask), then backend need to add 2 patterns to match exact 1 instruction. The canonicalization can simplify 2 patterns to 1. gcc/ChangeLog: * combine.cc (maybe_swap_commutative_operands): Canonicalize vec_merge when mask is constant. * doc/md.texi: Document vec_merge canonicalization.
2023-05-06gimple-range-op: Improve handling of sqrt rangesJakub Jelinek4-16/+182
The previous patch just added basic intrinsic ranges for sqrt ([-0.0, +Inf] +-NAN being the general result range of the function and [-0.0, +Inf] the general operand range if result isn't NAN etc.), the following patch intersects those ranges with particular range computed from argument or result's exact range with the expected error in ulps taken into account and adds a function (frange_arithmetic variant) which can be used by other functions as well as helper. 2023-05-06 Jakub Jelinek <jakub@redhat.com> * value-range.h (frange_arithmetic): Declare. * range-op-float.cc (frange_arithmetic): No longer static. * gimple-range-op.cc (frange_mpfr_arg1): New function. (cfn_sqrt::fold_range): Intersect the generic boundaries range with range computed from sqrt of the particular bounds. (cfn_sqrt::op1_range): Intersect the generic boundaries range with range computed from squared particular bounds. * gcc.dg/tree-ssa/range-sqrt-2.c: New test.
2023-05-06build: Replace seq for portability with GNU Make variantJakub Jelinek1-10/+12
Some hosts like AIX don't have seq command, this patch replaces it with something that uses just GNU make features we've been using for this already before for the parallel make check. 2023-05-06 Jakub Jelinek <jakub@redhat.com> * Makefile.in (check_p_numbers): Rename to one_to_9999, move earlier with helper variables also renamed. (MATCH_SPLUT_SEQ): Use $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_9999)) instead of $(shell seq 1 $(NUM_MATCH_SPLITS)). (check_p_subdirs): Use $(one_to_9999) instead of $(check_p_numbers).