riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2021-11-30	gimple-match: Add a gimple_extract_op function	Richard Sandiford	2	-110/+135
	code_helper and gimple_match_op seem like generally useful ways of summing up a gimple_assign or gimple_call (or gimple_cond). This patch adds a gimple_extract_op function that can be used for that. gcc/ * gimple-match.h (code_helper): Add functions for querying whether the code represents an internal_fn or a built_in_function. Provide explicit conversion operators for both cases. (gimple_extract_op): Declare. * gimple-match-head.c (gimple_extract): New function, extracted from... (gimple_simplify): ...here. (gimple_extract_op): New function.
2021-11-30	Fix -freorder-blocks-and-partition glitch with Windows SEH (continued)	Eric Botcazou	2	-4/+6
	This fixes a thinko in the fix for the -freorder-blocks-and-partition glitch with SEH on 64-bit Windows: https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565208.html Even if no exceptions are active, e.g. in C, we need to consider calls. gcc/ PR target/103274 * config/i386/i386.c (ix86_output_call_insn): Beef up comment about nops emitted with SEH. * config/i386/winnt.c (i386_pe_seh_unwind_emit): When switching to the cold section, emit a nop before the directive if the previous active instruction is a call.
2021-11-30	libcpp: Enable P1949R7 for C++11 and up as it was a DR [PR100977]	Jakub Jelinek	8	-21/+19
	Jonathan mentioned on IRC that: "Accept P1949R7 (C++ Identifier Syntax using Unicode Standard Annex 31) as a Defect Report and apply the changes therein to the C++ working paper." while I've actually implemented it only for -std={gnu,c}++{23,2b}. As the C++98 rules were significantly different, I'm not trying to change anything for C++98. 2021-11-30 Jakub Jelinek <jakub@redhat.com> PR c++/100977 * init.c (lang_defaults): Enable cxx23_identifiers for -std={gnu,c}++{11,14,17,20} too. * c-c++-common/cpp/ucnid-2011-1-utf8.c: Expect errors in C++. * c-c++-common/cpp/ucnid-2011-1.c: Likewise. * g++.dg/cpp/ucnid-4-utf8.C: Add missing space to dg-options. * g++.dg/cpp23/normalize3.C: Enable for c++11 rather than just c++23. * g++.dg/cpp23/normalize4.C: Likewise. * g++.dg/cpp23/normalize5.C: Likewise. * g++.dg/cpp23/normalize7.C: Expect errors rather than just warnings for c++11 and up rather than just c++23. * g++.dg/cpp23/ucnid-2-utf8.C: Expect errors even for c++11 .. c++20.
2021-11-30	c++: Small incremental tweak to source_location::current() folding	Jakub Jelinek	1	-8/+7
	I've already committed the patch, but perhaps we shouldn't do it in cp_fold where it will be folded even for warnings etc. and the locations might not be the final yet. This patch moves it to cp_fold_r so that it is done just once for each function and just once for each static initializer. 2021-11-30 Jakub Jelinek <jakub@redhat.com> * cp-gimplify.c (cp_fold_r): Perform folding of std::source_location::current() calls here... (cp_fold): ... rather than here.
2021-11-30	x86_64: PR target/100711: Splitters for pandn	Roger Sayle	3	-0/+96
	This patch addresses PR target/100711 by introducing define_split patterns so that not/broadcast/pand may be simplified (by combine) to broadcast/pandn. This introduces two splitters one for optimizing pandn on TARGET_SSE for V4SI and V2DI, and another for vpandn on TARGET_AVX2 for V16QI, V8HI, V32QI, V16HI and V8SI. Each splitter has its own new testcase. I've also confirmed that not/broadcast/pandn is already getting simplified to broadcast/pand by the middle-end optimizers. 2021-11-30 Roger Sayle <roger@nextmovesoftware.com> Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog PR target/100711 * config/i386/sse.md (define_split): New splitters to simplify not;vec_duplicate;and as vec_duplicate;andn. gcc/testsuite/ChangeLog PR target/100711 * gcc.target/i386/pr100711-1.c: New test case. * gcc.target/i386/pr100711-2.c: New test case.
2021-11-30	Only return after resetting type_param_spec_list	Richard Biener	1	-2/+2
	This fixes an apparent mistake in gfc_insert_parameter_exprs. 2021-11-29 Richard Biener <rguenther@suse.de> gcc/fortran/ * decl.c (gfc_insert_parameter_exprs): Only return after resetting type_param_spec_list.
2021-11-30	middle-end/103485 - fix conversion kind for vectors	Richard Biener	2	-1/+13
	This makes sure to use a VIEW_CONVERT_EXPR for converting vector signedness in the -((int)x >> (prec - 1)) to (unsigned)x >> (prec - 1) simplification. 2021-11-30 Richard Biener <rguenther@suse.de> PR middle-end/103485 * match.pd (-((int)x >> (prec - 1)) to (unsigned)x >> (prec - 1)): Use VIEW_CONVERT_EXPR for vectors. * gcc.dg/pr103485.c: New testcase.
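	The scalar identity behind this simplification can be checked with a small sketch. The function names are hypothetical, and the code assumes that right-shifting a negative int is an arithmetic shift, as GCC implements it (ISO C leaves this implementation-defined). It does not model the vector case that the commit actually fixes.

```c
#include <assert.h>
#include <limits.h>

/* Left-hand side of the simplification: -((int)x >> (prec - 1)).
   Assumes an arithmetic right shift for negative values (GCC behaviour). */
unsigned sign_bit_via_negate(int x)
{
    const int prec = (int)(sizeof(int) * CHAR_BIT);
    return (unsigned)-(x >> (prec - 1));
}

/* Right-hand side: (unsigned)x >> (prec - 1), a logical shift. */
unsigned sign_bit_via_unsigned(int x)
{
    const int prec = (int)(sizeof(int) * CHAR_BIT);
    return (unsigned)x >> (prec - 1);
}

/* Hypothetical helper: aborts if the two forms ever disagree for x. */
void check_sign_bit_forms(int x)
{
    assert(sign_bit_via_negate(x) == sign_bit_via_unsigned(x));
}
```

	Both forms yield 1 for negative inputs and 0 otherwise. For vectors only the signedness of the elements changes, which is presumably why the commit reinterprets the bits with VIEW_CONVERT_EXPR rather than performing a value conversion.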
2021-11-30	Avoid some -Wunreachable-code-ctrl	Richard Biener	14	-59/+14
	This cleans up unreachable code diagnosed by -Wunreachable-code-ctrl. It largely follows the previous series but discovers a few extra cases, namely dead code after break or continue or loops without exits. 2021-11-29 Richard Biener <rguenther@suse.de> gcc/c/ * gimple-parser.c (c_parser_gimple_postfix_expression): avoid unreachable code after break. gcc/ * cfgrtl.c (skip_insns_after_block): Refactor code to be more easily readable. * expr.c (op_by_pieces_d::run): Remove unreachable assert. * sched-deps.c (sched_analyze): Remove unreachable gcc_unreachable. * sel-sched-ir.c (in_same_ebb_p): Likewise. * tree-ssa-alias.c (nonoverlapping_refs_since_match_p): Remove unreachable code. * tree-vect-slp.c (vectorize_slp_instance_root_stmt): Refactor to avoid unreachable loop iteration. * tree.c (walk_tree_1): Remove unreachable break. * vec-perm-indices.c (vec_perm_indices::series_p): Remove unreachable return. gcc/cp/ * parser.c (cp_parser_postfix_expression): Remove unreachable code. * pt.c (tsubst_expr): Remove unreachable breaks. gcc/fortran/ * frontend-passes.c (gfc_expr_walker): Remove unreachable break. * scanner.c (skip_fixed_comments): Remove unreachable gcc_unreachable. * trans-expr.c (gfc_expr_is_variable): Refactor to make control flow more obvious.
2021-11-29	rs6000: Remove builtin mask check from builtin_decl [PR102347]	Kewen Lin	2	-10/+19
	As the discussion in PR102347, currently builtin_decl is invoked so early, it's when making up the function_decl for builtin functions, at that time the rs6000_builtin_mask could be wrong for those builtins sitting in #pragma/attribute target functions, though it will be updated properly later when LTO processes all nodes. This patch is to align with the practice i386 port adopts, also align with r10-7462 by relaxing builtin mask checking in some places. gcc/ChangeLog: PR target/102347 * config/rs6000/rs6000-call.c (rs6000_builtin_decl): Remove builtin mask check. gcc/testsuite/ChangeLog: PR target/102347 * gcc.target/powerpc/pr102347.c: New test.
2021-11-29	rs6000: Modify the way for extra penalized cost	Kewen Lin	1	-16/+19
	This patch follows the discussions here[1][2], where Segher pointed out the existing way to guard the extra penalized cost for strided/elementwise loads with a magic bound does not scale. The way with nunits * stmt_cost can get one much exaggerated penalized cost, such as: for V16QI on P8, it's 16 * 20 = 320, that's why we need one bound. To make it better and more readable, the penalized cost is simplified as: unsigned adjusted_cost = (nunits == 2) ? 2 : 1; unsigned extra_cost = nunits * adjusted_cost; For V2DI/V2DF, it uses 2 penalized cost for each scalar load while for the other modes, it uses 1. It's mainly concluded from the performance evaluations. One thing might be related is that: More units vector gets constructed, more instructions are used. It has more chances to schedule them better (even run in parallelly when enough available units at that time), so it seems reasonable not to penalize more for them. The SPEC2017 evaluations on Power8/Power9/Power10 at option sets O2-vect and Ofast-unroll show this change is neutral. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_cost_data::update_target_cost_per_stmt): Adjust the way to compute extra penalized cost. Remove useless parameter. (rs6000_cost_data::rs6000_add_stmt_cost): Adjust the call to function update_target_cost_per_stmt.
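	As a rough illustration of the simplified formula quoted above, the following sketch wraps it in a standalone function. The function name is hypothetical and is not part of rs6000.c; only the two-line computation comes from the commit message.

```c
#include <assert.h>

/* Hypothetical wrapper around the formula from the commit message:
   vectors with two units (V2DI/V2DF) are penalized by 2 per scalar load,
   all other modes by 1. */
unsigned extra_penalized_cost(unsigned nunits)
{
    assert(nunits > 0);
    unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
    unsigned extra_cost = nunits * adjusted_cost;
    return extra_cost;
}
```

	Under this formula a V16QI load gets an extra cost of 16 instead of the 16 * 20 = 320 mentioned above, so no separate bound is needed.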
2021-11-29	visium: Revert commit r12-5332	Kewen Lin	1	-25/+25
	This reverts commit b8ce19bb1a0592051e8f9a4c3252d12ae605b256 (r12-5332) "visium: Fix non-robust split condition in define_insn_and_split". Jeff found newlib failed to build for visium port since r12-5332, as Eric confirmed, those split conditions in the related define_insn_and_splits are intentional not to join with insn condition (&&), since insn condition won't hold after reload and the proposed concatenation will make the splitting never happen wrongly.
2021-11-29	Don't reuse reference after potential resize.	Andrew MacLeod	1	-3/+4
	When a new def chain is requested, any existing reference may no longer be valid, so just use the object directly. PR tree-optimization/103467 * gimple-range-gori.cc (range_def_chain::register_dependency): Don't use an object reference after a potential resize.
2021-11-30	Daily bump.	GCC Administrator	13	-1/+223

2021-11-29	analyzer: further false leak fixes due to overzealous state merging [PR103217]	David Malcolm	5	-2/+215
	Commit r12-5424-gf573d35147ca8433c102e1721d8c99fc432cb44b fixed a false positive from -Wanalyzer-malloc-leak due to overzealous state merging, erroneously merging two different svalues bound to a particular part of the store when one has sm-state. A further case was discovered by the reporter of PR analyzer/103217, which this patch fixes. In this variant, different states have set different fields of a struct, and on attempting to merge them, the states have a different set of binding keys, leading to one state having an svalue with sm-state, and its peer state having a NULL value for that binding key. The state merger code was erroneously treating them as mergeable to "UNKNOWN". This followup patch fixes things by rejecting such mergers if the non-NULL svalue is not mergeable with "UNKNOWN". gcc/analyzer/ChangeLog: PR analyzer/103217 * store.cc (binding_cluster::can_merge_p): For the "key is bound" vs "key is not bound" merger case, check that the bound svalue is mergeable before merging it to "unknown", rejecting the merger otherwise. gcc/testsuite/ChangeLog: PR analyzer/103217 * gcc.dg/analyzer/pr103217-2.c: New test. * gcc.dg/analyzer/pr103217-3.c: New test. * gcc.dg/analyzer/pr103217-4.c: New test. * gcc.dg/analyzer/pr103217-5.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-29	i386: Fix and improve movhi_internal and movhf_internal some more.	Uros Bizjak	1	-83/+128
	An (v,C) alternative can be added to movhi_internal to directly load HImode constant 0 to xmm register. Also, V4SFmode moves can be used for xmm->xmm moves instead of TImode moves when optimizing for size. Fix invalid %vpinsrw insn template, which needs to duplicate %xmm register for AVX targets. Optimize GPR moves in movhf_internal in the same way as in movhi_internal. Fix pinsrw and pextrw templates for AVX targets. Use sselog1 instead of sselog type. Also, handle TARGET_SSE_PARTIAL_REG_DEPENDENCY and TARGET_SSE_SPLIT_REGS targets. 2021-11-29 Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog: PR target/102811 * config/i386/i386.md (movhi_internal): Introduce (v,C) alternative. Do not allocate non-GPR registers. Optimize xmm->xmm moves when optimizing for size. Fix vpinsrw insn template. (*movhf_internal): Fix pinsrw and pextrw insn templates for AVX targets. Use sselog1 type instead of sselog. Optimize GPR moves. Optimize xmm->xmm moves for TARGET_SSE_PARTIAL_REG_DEPENDENCY and TARGET_SSE_SPLIT_REGS targets.
2021-11-29	Prune out valid -Winfinite-recursion [PR103469].	Martin Sebor	3	-0/+9
	gcc/testsuite/ChangeLog: PR testsuite/103469 * c-c++-common/attr-retain-5.c: Prune out valid warning. * c-c++-common/attr-retain-6.c: Same. * c-c++-common/attr-retain-9.c: Same.
2021-11-29	Fix autoconf regeneration slip-up.	Eric Gallager	1	-1/+0
	A stray _AC_FINALIZE somehow snuck into g:909b30a; this should fix it. gcc/ChangeLog: * configure: Re-regenerate.
2021-11-29	Make etags path used by build system configurable	Eric Gallager	11	-20/+55
	This commit allows users to specify a path to their "etags" executable for use when doing "make tags". I based this patch off of this one from upstream automake: https://git.savannah.gnu.org/cgit/automake.git/commit/m4?id=d2ccbd7eb38d6a4277d6f42b994eb5a29b1edf29 This means that I just supplied variables that the user can override for the tags programs, rather than having the configure scripts actually check for them. I handle etags and ctags separately because the intl subdirectory has separate targets for them. This commit only affects the subdirectories that use handwritten Makefiles; the ones that use automake will have to wait until we update the version of automake used to be 1.16.4 or newer before they'll be fixed. Addresses #103021 gcc/ChangeLog: PR other/103021 * Makefile.in: Substitute CTAGS, ETAGS, and CSCOPE variables. Use ETAGS variable in TAGS target. * configure: Regenerate. * configure.ac: Allow CTAGS, ETAGS, and CSCOPE variables to be overridden. gcc/ada/ChangeLog: PR other/103021 * gcc-interface/Make-lang.in: Use ETAGS variable in TAGS target. gcc/c/ChangeLog: PR other/103021 * Make-lang.in: Use ETAGS variable in TAGS target. gcc/cp/ChangeLog: PR other/103021 * Make-lang.in: Use ETAGS variable in TAGS target. gcc/d/ChangeLog: PR other/103021 * Make-lang.in: Use ETAGS variable in TAGS target. gcc/fortran/ChangeLog: PR other/103021 * Make-lang.in: Use ETAGS variable in TAGS target. gcc/go/ChangeLog: PR other/103021 * Make-lang.in: Use ETAGS variable in TAGS target. gcc/objc/ChangeLog: PR other/103021 * Make-lang.in: Use ETAGS variable in TAGS target. gcc/objcp/ChangeLog: PR other/103021 * Make-lang.in: Use ETAGS variable in TAGS target. intl/ChangeLog: PR other/103021 * Makefile.in: Use ETAGS variable in TAGS target, CTAGS variable in CTAGS target, and MKID variable in ID target. * configure: Regenerate. * configure.ac: Allow CTAGS, ETAGS, and MKID variables to be overridden. 
libcpp/ChangeLog: PR other/103021 * Makefile.in: Use ETAGS variable in TAGS target. * configure: Regenerate. * configure.ac: Allow ETAGS variable to be overridden. libiberty/ChangeLog: PR other/103021 * Makefile.in: Use ETAGS variable in TAGS target. * configure: Regenerate. * configure.ac: Allow ETAGS variable to be overridden.
2021-11-29	rs6000: Add Power10 optimization for most _mm_movemask*	Paul A. Clarke	2	-0/+12
	Power10 ISA added `vextract` instructions which are realized in the `vec_extractm` intrinsic. Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`. 2021-11-29 Paul A. Clarke <pc@us.ibm.com> gcc * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm when _ARCH_PWR10. * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise. (_mm_movemask_epi8): Likewise.
2021-11-29	Fix RTL FE issue with premature return	Richard Biener	1	-1/+2
	This fixes an issue discovered by -Wunreachable-code-return 2021-11-29 Richard Biener <rguenther@suse.de> * read-rtl-function.c (function_reader::read_rtx_operand): Return only after resetting m_in_call_function_usage.
2021-11-29	c++: redundant explicit 'this' capture before C++20 [PR100493]	Patrick Palka	4	-8/+19
	As described in detail in the PR, in C++20 implicitly capturing 'this' via a '=' capture default is deprecated, and in C++17 adding an explicit 'this' capture alongside a '=' capture default is diagnosed as redundant (and is strictly speaking ill-formed). This means it's impossible to write, in a forward-compatible way, a C++17 lambda that has a '=' capture default and that also captures 'this' (implicitly or explicitly): [=] { this; } // #1 deprecated in C++20, OK in C++17 // GCC issues a -Wdeprecated warning in C++20 mode [=, this] { } // #2 ill-formed in C++17, OK in C++20 // GCC issues an unconditional warning in C++17 mode This patch resolves this dilemma by downgrading the warning for #2 into a -pedantic one. In passing, move it into the -Wc++20-extensions class of warnings and adjust its wording accordingly. PR c++/100493 gcc/cp/ChangeLog: * parser.c (cp_parser_lambda_introducer): In C++17, don't diagnose a redundant 'this' capture alongside a by-copy capture default unless -pedantic. Move the diagnostic into -Wc++20-extensions and adjust wording accordingly. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/lambda-this1.C: Adjust expected diagnostics. * g++.dg/cpp1z/lambda-this8.C: New test. * g++.dg/cpp2a/lambda-this3.C: Compile with -pedantic in C++17 to continue to diagnose redundant 'this' captures.
2021-11-29	x86_64: Improved V1TImode rotations by non-constant amounts.	Roger Sayle	3	-3/+24
	This patch builds on the recent improvements to TImode rotations (and Jakub's fixes to shldq/shrdq patterns). Now that expanding a TImode rotation can never fail, it is safe to allow general_operand constraints on the QImode shift amounts in rotlv1ti3 and rotrv1ti3 patterns. I've also made an additional tweak to ix86_expand_v1ti_to_ti to use vec_extract via V2DImode, which avoids using memory and takes advantage of vpextrq on recent hardware. For the following test case: typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16))); uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); } GCC with -O2 -mavx2 would previously generate: rotr: vmovdqa %xmm0, -24(%rsp) movq -16(%rsp), %rdx movl %edi, %ecx xorl %esi, %esi movq -24(%rsp), %rax shrdq %rdx, %rax shrq %cl, %rdx testb $64, %dil cmovne %rdx, %rax cmovne %rsi, %rdx negl %ecx xorl %edi, %edi andl $127, %ecx vmovq %rax, %xmm2 movq -24(%rsp), %rax vpinsrq $1, %rdx, %xmm2, %xmm1 movq -16(%rsp), %rdx shldq %rax, %rdx salq %cl, %rax testb $64, %cl cmovne %rax, %rdx cmovne %rdi, %rax vmovq %rax, %xmm3 vpinsrq $1, %rdx, %xmm3, %xmm0 vpor %xmm1, %xmm0, %xmm0 ret with this patch, we now generate: rotr: movl %edi, %ecx vpextrq $1, %xmm0, %rax vmovq %xmm0, %rdx shrdq %rax, %rdx vmovq %xmm0, %rsi shrdq %rsi, %rax andl $64, %ecx movq %rdx, %rsi cmovne %rax, %rsi cmove %rax, %rdx vmovq %rsi, %xmm0 vpinsrq $1, %rdx, %xmm0, %xmm0 ret 2021-11-29 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Perform the conversion via V2DImode using vec_extractv2didi on TARGET_SSE2. * config/i386/sse.md (rotlv1ti3, rotrv1ti3): Change constraint on QImode shift amounts from const_int_operand to general_operand. gcc/testsuite/ChangeLog * gcc.target/i386/sse2-v1ti-rotate.c: New test case.
2021-11-29	Remove unreachable gcc_unreachable () at the end of functions	Richard Biener	5	-10/+0
	It seems to be a style to place gcc_unreachable () after a switch that handles all cases with every case returning. Those are unreachable (well, yes!), so they will be elided at CFG construction time and the middle-end will place another __builtin_unreachable "after" them to note the path doesn't lead to a return when the function is not declared void. So IMHO those explicit gcc_unreachable () calls serve no purpose; if anything, they could be replaced by a comment. Moreover, since all of these cases follow switches, a switch that fails to handle a case or to return will likely cause some diagnostic to be emitted, which is better than running into an ICE only at runtime. 2021-11-24 Richard Biener <rguenther@suse.de> * tree.h (reverse_storage_order_for_component_p): Remove spurious gcc_unreachable. * cfganal.c (dfs_find_deadend): Likewise. * fold-const-call.c (fold_const_logb): Likewise. (fold_const_significand): Likewise. * gimple-ssa-store-merging.c (lhs_valid_for_store_merging_p): Likewise. gcc/c-family/ * c-format.c (check_format_string): Remove spurious gcc_unreachable.
2021-11-29	Remove unreachable returns	Richard Biener	14	-39/+10
	This removes unreachable return statements as diagnosed by the -Wunreachable-code patch. Some cases are more obviously an improvement than others - in fact some may get you the idea to replace them with gcc_unreachable () instead, leading to cases of the 'Remove unreachable gcc_unreachable () at the end of functions' patch. 2021-11-25 Richard Biener <rguenther@suse.de> * vec.c (qsort_chk): Do not return the void return value from the noreturn qsort_chk_error. * ccmp.c (expand_ccmp_expr_1): Remove unreachable return. * df-scan.c (df_ref_equal_p): Likewise. * dwarf2out.c (is_base_type): Likewise. (add_const_value_attribute): Likewise. * fixed-value.c (fixed_arithmetic): Likewise. * gimple-fold.c (gimple_fold_builtin_fputs): Likewise. * gimple-ssa-strength-reduction.c (stmt_cost): Likewise. * graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_expr_op): Likewise. (gcc_expression_from_isl_expression): Likewise. * ipa-fnsummary.c (will_be_nonconstant_expr_predicate): Likewise. * lto-streamer-in.c (lto_input_mode_table): Likewise. gcc/c-family/ * c-opts.c (c_common_post_options): Remove unreachable return. * c-pragma.c (handle_pragma_target): Likewise. (handle_pragma_optimize): Likewise. gcc/c/ * c-typeck.c (c_tree_equal): Remove unreachable return. * c-parser.c (get_matching_symbol): Likewise. libgomp/ * oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable return.
2021-11-29	Optimize _Float16 usage for non AVX512FP16.	liuhongt	5	-8/+41
	1. No memory is needed to move HI/HFmode between GPR and SSE registers under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o AVX512FP16. 2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant initialization could be eliminated. gcc/ChangeLog: PR target/102811 * config/i386/i386.c (inline_secondary_memory_needed): HImode move between GPR and SSE registers is supported under TARGET_SSE2 and above. * config/i386/i386.md (extendhfsf2): Optimize expander. (truncsfhf2): Ditto. * config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to align with V8HImode. gcc/testsuite/ChangeLog: * gcc.target/i386/pr102811-2.c: New test. * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new scan-assembler-times.
2021-11-29	Fix regression introduced by r12-5536.	liuhongt	3	-18/+29
	There're several failures: 1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)" %vpextrw should be used in output templates. 2. ICE in get_attr_memory for movhi_internal since some alternatives are marked as TYPE_SSELOG. Use TYPE_SSELOG1 instead. Also this patch fixes a typo and some latent bugs which are related to moving HImode from/to sse register w/o TARGET_AVX512FP16. gcc/ChangeLog: PR target/102811 PR target/103463 * config/i386/i386.c (ix86_secondary_reload): Without TARGET_SSE4_1, General register is needed to move HImode from sse register to memory. * config/i386/sse.md (vec_extrachf): Use %vpextrw instead of pextrw in output templates. * config/i386/i386.md (movhi_internal): Ditto, also fix typo of MEM_P (operands[1]) and adjust mode/prefix/type attribute for alternatives related to sse register.
2021-11-29	tree-optimization/103458 - avoid creating new loops in CD-DCE	Richard Biener	2	-2/+27
	When creating forwarders in CD-DCE we have to avoid creating loops where we formerly did not consider those because of abnormal predecessors. At this point simply excuse us when there are any abnormal predecessors. 2021-11-29 Richard Biener <rguenther@suse.de> PR tree-optimization/103458 * tree-ssa-dce.c (make_forwarders_with_degenerate_phis): Do not create forwarders for blocks with abnormal predecessors. * gcc.dg/torture/pr103458.c: New testcase.
2021-11-29	Restore can_be_invalidated_p semantics to before refactoring	Richard Biener	1	-3/+5
	This restores the semantics of can_be_invalidated_p to the original semantics of the function this was split out from tree-ssa-uninit.c. The current semantics only ever look at the first predicate which cannot be correct. 2021-11-26 Richard Biener <rguenther@suse.de> * gimple-predicate-analysis.cc (can_be_invalidated_p): Restore semantics to the one before the split from tree-ssa-uninit.c.
2021-11-28	rs6000/test: Add emulated gather test case	Kewen Lin	1	-0/+20
	As verified, the emulated gather capability of vectorizer (r12-2733) can help to speed up SPEC2017 510.parest_r on Power8/9/10 by 5% ~ 9% with option sets Ofast unroll and Ofast lto. This patch is to add a test case similar to the one in i386 to add testing coverage for 510.parest_r hotspots. btw, different from the one in i386, this uses unsigned int as INDEXTYPE since the unpack support for unsigned int (r12-3134) also matters for the hotspots vectorization. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vect-gather-1.c: New test.
2021-11-29	Daily bump.	GCC Administrator	3	-1/+32

2021-11-28	Compare guessed and feedback frequencies during profile feedback stream-in	Jan Hubicka	1	-5/+73
	This patch adds simple code to dump and compare frequencies of basic blocks read from the profile feedback and frequencies guessed statically. It dumps basic blocks in the order of decreasing frequencies from feedback along with guessed frequencies and histograms. It makes it possible to spot basic blocks in hot regions that are considered cold by the guessed profile or vice versa. I am trying to figure out how realistic our profile estimate is compared to the read one on exchange2 (looking again into PR98782). There IRA now places spills into hot regions of code while with the older (and worse) profile it did not. The catch is that the function is very large and has 9 nested loops, so it is hard to figure out how to improve the profile estimate and/or IRA. gcc/ChangeLog: 2021-11-28 Jan Hubicka <hubicka@ucw.cz> * profile.c: Include sreal.h (struct bb_stats): New. (cmp_stats): New function. (compute_branch_probabilities): Output bb stats.
2021-11-28	Improve -fprofile-report	Jan Hubicka	5	-124/+269
	Profile-report was never properly updated after the switch to the new profile representation. This patch fixes the way profile mismatches are calculated: we used to collect separately count and freq mismatches, while now we have only counts & probabilities. So we verify - in count: that total count of incoming edges is close to actual count of the BB - out prob: that total sum of outgoing edge probabilities is close to 1 (except for BB containing noreturn calls or EH). Moreover I added dumping of absolute data which is useful to plot them: with Martin Liska we plan to set up regular testing so we keep optimizers' profile updates a bit under control. Finally I added both static and dynamic stats about mismatches - static one is simply number of inconsistencies in the cfg while dynamic is scaled by the profile - I think in order to keep an eye on optimizers the first number is quite relevant, while when tracking why code quality regressed the second number matters more. 2021-11-28 Jan Hubicka <hubicka@ucw.cz> * cfghooks.c: Include sreal.h, profile.h. (profile_record_check_consistency): Fix checking of count consistency; record also dynamic mismatches. * cfgrtl.c (rtl_account_profile_record): Similarly. * tree-cfg.c (gimple_account_profile_record): Likewise. * cfghooks.h (struct profile_record): Remove num_mismatched_freq_in, num_mismatched_freq_out, turn time to double, add dyn_mismatched_prob_out, dyn_mismatched_count_in, num_mismatched_prob_out; remove num_mismatched_count_out. * passes.c (account_profile_1): New function. (account_profile_in_list): New function. (pass_manager::dump_profile_report): Rewrite. (execute_one_ipa_transform_pass): Check profile consistency after running all passes. (execute_all_ipa_transforms): Remove cfun test; record all transform methods. (execute_one_pass): Fix collecting of profile stats.
2021-11-28	d: fix thinko in optimize attr parsing	Martin Liska	1	-1/+1
	gcc/d/ChangeLog: * d-attribs.cc (parse_optimize_options): Fix thinko.
2021-11-28	Daily bump.	GCC Administrator	4	-1/+36

2021-11-27	jit: Change printf specifiers for size_t to %zu	Petter Tomner	1	-2/+2
	Change four occurrences of %ld specifier for size_t to %zu for clean 32bit builds. Signed-off-by 2021-11-27 Petter Tomner <tomner@kth.se> gcc/jit/ * libgccjit.c: %ld -> %zu
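	A minimal sketch of why the specifier matters: on typical 32-bit targets size_t is narrower than long or simply a different type, so %ld mismatches the argument, while %zu is the C99 length modifier defined for size_t. The function name and message text below are hypothetical and unrelated to the strings in libgccjit.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a size_t portably with %zu.
   Returns the number of characters snprintf would have written. */
int format_count(char *buf, size_t buflen, size_t count)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%zu elements", count);
}
```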
2021-11-27	x86: Fix up x86_{,64_}sh{l,r}d patterns [PR103431]	Jakub Jelinek	2	-42/+281
	The following testcase is miscompiled because the x86_{,64_}sh{l,r}d patterns don't properly describe what the instructions do. One thing is left out, in particular that there is initial count &= 63 for sh{l,r}dq and initial count &= 31 for sh{l,r}d{l,w}. And another thing not described properly, in particular the behavior when count (after the masking) is 0. The pattern says it is e.g. res = (op0 << op2) | (op1 >> (64 - op2)) but that triggers UB on op1 >> 64. For op2 0 we actually want res = (op0 << op2) | 0 When constants are propagated to these patterns during RTL optimizations, both such problems trigger wrong-code issues. This patch represents the patterns as e.g. res = (op0 << (op2 & 63)) | (unsigned long long) ((uint128_t) op1 >> (64 - (op2 & 63))) so there is both the initial masking and op2 == 0 behavior results in zero being ored. The patch introduces alternate patterns for constant op2 where simplify-rtx.c will fold those expressions into simple numbers, and define_insn_and_split pre-reload splitter for how the patterns looked before into the new form, so that it can pattern match during combine even computations that assumed the shift amount will be in the range of 1 .. bitsize-1. 2021-11-27 Jakub Jelinek <jakub@redhat.com> PR middle-end/103431 * config/i386/i386.md (x86_64_shld, x86_shld, x86_64_shrd, x86_shrd): Change insn pattern to accurately describe the instructions. (x86_64_shld_1, x86_shld_1, x86_64_shrd_1, x86_shrd_1): New define_insn patterns. (x86_64_shld_2, x86_shld_2, x86_64_shrd_2, x86_shrd_2): New define_insn_and_split patterns. (ashl<dwi>3_doubleword_mask, ashl<dwi>3_doubleword_mask_1, <insn><dwi>3_doubleword_mask, <insn><dwi>3_doubleword_mask_1, ix86_rotl<dwi>3_doubleword, ix86_rotr<dwi>3_doubleword): Adjust splitters for x86_{,64_}sh{l,r}d pattern changes. * gcc.dg/pr103431.c: New test.
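	The corrected semantics can be written out as a small C model, shown here only as a sketch. The function name is hypothetical, and it relies on the unsigned __int128 extension that GCC and Clang provide on 64-bit targets. It mirrors the expression given in the commit message, not the RTL pattern itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the 64-bit shld result as described above:
   the count is masked to 6 bits first, and op1 is widened to 128 bits
   so that the right shift by (64 - c) stays defined even when c == 0. */
uint64_t shld64_model(uint64_t op0, uint64_t op1, unsigned count)
{
    unsigned c = count & 63;
    assert(c < 64);
    unsigned __int128 wide = op1;
    return (op0 << c) | (uint64_t)(wide >> (64 - c));
}
```

	With a masked count of 0 the widened shift contributes zero, so the result is simply op0, which matches the "res = (op0 << op2) | 0" case described above.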
2021-11-27	bswap: Fix UB in find_bswap_or_nop_finalize [PR103435]	Jakub Jelinek	1	-2/+8
	On gcc.c-torture/execute/pr103376.c in the following code we trigger UB in the compiler. n->range is 8 because it is 64-bit load and rsize is 0 because it is a bswap sequence with load and known to be 0: /* Find real size of result (highest non-zero byte). */ if (n->base_addr) for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++); else rsize = n->range; The shifts then shift uint64_t by 64 bits. For this case mask is 0 and we want both cmpxchg and cmpnop as 0, the operation can be done as both nop and bswap and callers will prefer nop. 2021-11-27 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/103435 * gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if n->range - rsize == 8, just clear both cmpnop and cmpxchg in that case.
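	The hazard can be sketched outside the compiler as follows. Both function names are hypothetical, and the sketch assumes 8 bits per byte marker. It illustrates the guard against a full-width shift and is not the code from gimple-ssa-store-merging.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogue of the rsize loop: number of byte markers up to
   and including the highest non-zero one (0 when n is 0). */
unsigned real_size(uint64_t n)
{
    unsigned rsize = 0;
    for (; n; n >>= 8)
        rsize++;
    return rsize;
}

/* Hypothetical guarded shift: dropping all 8 markers would mean shifting
   a uint64_t by 64 bits, which is undefined, so return 0 explicitly. */
uint64_t shift_right_markers(uint64_t value, unsigned markers)
{
    assert(markers <= 8);
    if (markers == 8)
        return 0;
    return value >> (markers * 8);
}
```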
2021-11-27	[Committed] Fix new ivopts-[89].c test cases for -m32.	Roger Sayle	2	-2/+2
	2021-11-27 Roger Sayle <roger@nextmovesoftware.com> gcc/testsuite/ChangeLog * gcc.dg/tree-ssa/ivopts-8.c: Fix new test case for -m32. * gcc.dg/tree-ssa/ivopts-9.c: Likewise.
2021-11-27	Daily bump.	GCC Administrator	6	-1/+146

2021-11-27	ipa: Fix CFG fix-up in IPA-CP transform phase (PR 103441)	Martin Jambor	1	-10/+8
	I forgot that IPA passes before ipa-inline must not return TODO_cleanup_cfg from their transformation function because ordinary CFG cleanup does not remove call graph edges associated with removed call statements but must use delete_unreachable_blocks_update_callgraph instead. This patch fixes that error. gcc/ChangeLog: 2021-11-26 Martin Jambor <mjambor@suse.cz> PR ipa/103441 * ipa-prop.c (ipcp_transform_function): Call delete_unreachable_blocks_update_callgraph instead of returning TODO_cleanup_cfg.
2021-11-26	Fortran: improve check of arguments to the RESHAPE intrinsic	Harald Anlauf	4	-37/+41
	gcc/fortran/ChangeLog: PR fortran/103411 * check.c (gfc_check_reshape): Improve check of size of source array for the RESHAPE intrinsic against the given shape when pad is not given, and shape is a parameter. Try other simplifications of shape. gcc/testsuite/ChangeLog: PR fortran/103411 * gfortran.dg/pr68153.f90: Adjust test to improved check. * gfortran.dg/reshape_7.f90: Likewise. * gfortran.dg/reshape_9.f90: New test.
2021-11-26	tree-object-size: Abstract object_sizes array	Siddhesh Poyarekar	1	-79/+98
	Put all accesses to object_sizes behind functions so that we can add dynamic capability more easily. gcc/ChangeLog: * tree-object-size.c (object_sizes_grow, object_sizes_release, object_sizes_unknown_p, object_sizes_get, object_size_set_force, object_sizes_set): New functions. (addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, unknown_object_size, merge_object_sizes, plus_stmt_object_size, cond_expr_object_size, collect_object_sizes_for, check_for_plus_in_loops_1, init_object_sizes, fini_object_sizes): Adjust. Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2021-11-26	tree-object-size: Replace magic numbers with enums	Siddhesh Poyarekar	1	-25/+34
	A simple cleanup to allow inserting dynamic size code more easily. gcc/ChangeLog: * tree-object-size.c: New enum. (object_sizes, computed, addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, merge_object_sizes, plus_stmt_object_size, collect_object_sizes_for, init_object_sizes, fini_object_sizes, object_sizes_execute): Replace magic numbers with enums. Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2021-11-26	ivopts: Improve code generated for very simple loops.	Roger Sayle	7	-7/+106
	This patch tidies up the code that GCC generates for simple loops, by selecting/generating a simpler loop bound expression in ivopts. The original motivation came from looking at the following loop (from gcc.target/i386/pr90178.c) int *find_ptr (int *mem, int sz, int val) { for (int i = 0; i < sz; i++) if (mem[i] == val) return &mem[i]; return 0; } which GCC currently compiles to: find_ptr: movq %rdi, %rax testl %esi, %esi jle .L4 leal -1(%rsi), %ecx leaq 4(%rdi,%rcx,4), %rcx jmp .L3 .L7: addq $4, %rax cmpq %rcx, %rax je .L4 .L3: cmpl %edx, (%rax) jne .L7 ret .L4: xorl %eax, %eax ret Notice the relatively complex leal/leaq instructions, that result from ivopts using the following expression for the loop bound: inv_expr 2: ((unsigned long) ((unsigned int) sz_8(D) + 4294967295) * 4 + (unsigned long) mem_9(D)) + 4 which results from NITERS being (unsigned int) sz_8(D) + 4294967295, i.e. (sz - 1), and the logic in cand_value_at determining the bound as BASE + NITERS*STEP at the start of the final iteration and as BASE + NITERS*STEP + STEP at the end of the final iteration. Ideally, we'd like the middle-end optimizers to simplify BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially when NITERS already has the form BOUND-1, but with type conversions and possible overflow to worry about, the above "inv_expr 2" is the best that can be done by fold (without additional context information). This patch improves ivopts' cand_value_at by passing it the data structure that explains how NITERS was derived, instead of just the tree expression for NITERS. This allows us to peek under the surface to check that NITERS+1 doesn't overflow, and in this patch to use the SSA_NAME already holding the required value. 
In the motivating loop above, inv_expr 2 now becomes: (unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D) And as a result, on x86_64 we now generate: find_ptr: movq %rdi, %rax testl %esi, %esi jle .L4 movslq %esi, %rsi leaq (%rdi,%rsi,4), %rcx jmp .L3 .L7: addq $4, %rax cmpq %rcx, %rax je .L4 .L3: cmpl %edx, (%rax) jne .L7 ret .L4: xorl %eax, %eax ret This improvement required one minor tweak to GCC's testsuite for gcc.dg/wrapped-binop-simplify.c, where we again generate better code, and therefore no longer find as many optimization opportunities in later passes (vrp2). Previously: void v1 (unsigned long *in, unsigned long *out, unsigned int n) { int i; for (i = 0; i < n; i++) { out[i] = in[i]; } } on x86_64 generated: v1: testl %edx, %edx je .L1 movl %edx, %edx xorl %eax, %eax .L3: movq (%rdi,%rax,8), %rcx movq %rcx, (%rsi,%rax,8) addq $1, %rax cmpq %rax, %rdx jne .L3 .L1: ret and now instead generates: v1: testl %edx, %edx je .L1 movl %edx, %edx xorl %eax, %eax leaq 0(,%rdx,8), %rcx .L3: movq (%rdi,%rax), %rdx movq %rdx, (%rsi,%rax) addq $8, %rax cmpq %rax, %rcx jne .L3 .L1: ret 2021-11-26 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * tree-ssa-loop-ivopts.c (cand_value_at): Take a class tree_niter_desc* argument instead of just a tree for NITER. If we require the iv candidate value at the end of the final loop iteration, try using the original loop bound as the NITER for sufficiently simple loops. (may_eliminate_iv): Update (only) call to cand_value_at. gcc/testsuite/ChangeLog * gcc.dg/wrapped-binop-simplify.c: Update expected test result. * gcc.dg/tree-ssa/ivopts-5.c: New test case. * gcc.dg/tree-ssa/ivopts-6.c: New test case. * gcc.dg/tree-ssa/ivopts-7.c: New test case. * gcc.dg/tree-ssa/ivopts-8.c: New test case. * gcc.dg/tree-ssa/ivopts-9.c: New test case.
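	The simplified loop bound for the motivating loop can be pictured at the source level. The sketch below is a hand-written equivalent of what the improved assembly does, not compiler output: the name find_ptr_with_end is hypothetical, and the end pointer mem + sz plays the role of BASE + BOUND*STEP, replacing BASE + (BOUND-1)*STEP + STEP.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written counterpart of find_ptr that walks a pointer up to a
   precomputed end address, mirroring the improved x86_64 code.
   find_ptr_with_end is a hypothetical name for this illustration.  */
int *
find_ptr_with_end (int *mem, int sz, int val)
{
  if (sz <= 0)
    return NULL;                /* testl %esi, %esi; jle .L4 */

  assert (mem != NULL);

  /* Loop bound computed once as BASE + BOUND*STEP, with no "- 1 ... + 4"
     round trip: this is the single leaq (%rdi,%rsi,4), %rcx.  */
  int *end = mem + sz;

  for (int *p = mem; p != end; p++)
    if (*p == val)
      return p;
  return NULL;
}
```

	The result matches the original indexed loop for every sz; only the form of the bound expression differs.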
2021-11-26	d: fix ASAN in option processing	Martin Liska	1	-1/+3
	Fixes: ==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178 READ of size 4 at 0x00000666ca5c thread T0 #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855 #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916 #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887 #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829 #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427 #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346 #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967 #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808 #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146 for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d. gcc/d/ChangeLog: * d-attribs.cc (parse_optimize_options): Check index before accessing cl_options.
2021-11-26	Minor ipa-modref tweaks	Jan Hubicka	1	-11/+13
	To make dumps easier to read, modref now dumps the cgraph_node name rather than the cfun name of the function being analysed. I also fixed a minor issue with ECF flags merging when updating the inline summary. gcc/ChangeLog: 2021-11-26 Jan Hubicka <hubicka@ucw.cz> * ipa-modref.c (analyze_function): Drop parameter F and dump cgraph node name rather than cfun name. (modref_generate): Update. (modref_summaries::insert): Update. (modref_summaries_lto::insert): Update. (pass_modref::execute): Update. (ipa_merge_modref_summary_after_inlining): Improve combining of ECF_FLAGS.
2021-11-26	Fix failure in inline-9.c testcase	Jan Hubicka	1	-1/+1
	gcc/testsuite/ChangeLog: 2021-11-26 Jan Hubicka <hubicka@ucw.cz> * gcc.dg/ipa/inline-9.c: Update template.c
2021-11-26	Fix handling of in_flags in update_escape_summary_1	Jan Hubicka	1	-1/+1
	update_escape_summary_1 has a thinko where it computes the proper min_flags but then stores the original value (ignoring whether there was a dereference in the escape point). PR ipa/102943 * ipa-modref.c (update_escape_summary_1): Fix handling of min_flags.
2021-11-26	c++: Fix up taking address of an immediate function diagnostics [PR102753]	Jakub Jelinek	9	-37/+165
	On Wed, Oct 20, 2021 at 07:16:44PM -0400, Jason Merrill wrote: > or an unevaluated operand, or a subexpression of an immediate invocation. > > Hmm...that suggests that in consteval23.C, bar(foo) should also be OK, The following patch handles that by removing the diagnostics about taking address of immediate function from cp_build_addr_expr_1, and instead diagnoses it in cp_fold_r. To do that with proper locations, the patch attempts to ensure that ADDR_EXPRs of immediate functions get EXPR_LOCATION set and adds a PTRMEM_CST_LOCATION for PTRMEM_CSTs. Also, evaluation of std::source_location::current() is moved from genericization to cp_fold. 2021-11-26 Jakub Jelinek <jakub@redhat.com> PR c++/102753 * cp-tree.h (struct ptrmem_cst): Add locus member. (PTRMEM_CST_LOCATION): Define. * tree.c (make_ptrmem_cst): Set PTRMEM_CST_LOCATION to input_location. (cp_expr_location): Return PTRMEM_CST_LOCATION for PTRMEM_CST. * typeck.c (build_x_unary_op): Overwrite PTRMEM_CST_LOCATION for PTRMEM_CST instead of calling maybe_wrap_with_location. (cp_build_addr_expr_1): Don't diagnose taking address of immediate functions here. Instead when taking their address make sure the returned ADDR_EXPR has EXPR_LOCATION set. (expand_ptrmemfunc_cst): Copy over PTRMEM_CST_LOCATION to ADDR_EXPR's EXPR_LOCATION. (convert_for_assignment): Use cp_expr_loc_or_input_loc instead of EXPR_LOC_OR_LOC. * pt.c (tsubst_copy): Use build1_loc instead of build1. Ensure ADDR_EXPR of immediate function has EXPR_LOCATION set. * cp-gimplify.c (cp_fold_r): Diagnose taking address of immediate functions here. For consteval if don't walk THEN_CLAUSE. (cp_genericize_r): Move evaluation of calls to std::source_location::current from here to... (cp_fold): ... here. Don't assert calls to immediate functions must be source_location_current_p, instead only constant evaluate calls to source_location_current_p. * g++.dg/cpp2a/consteval20.C: Add some extra tests. * g++.dg/cpp2a/consteval23.C: Likewise. 
* g++.dg/cpp2a/consteval25.C: New test. * g++.dg/cpp2a/srcloc20.C: New test.
2021-11-26	i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]	konglin1	5	-11/+83
	Add define_insn extendhfsf2 and truncsfhf2 for target_f16c. gcc/ChangeLog: PR target/102811 * config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit data in XMM register for TARGET_SSE2. * config/i386/i386.md (extendhfsf2): Add extendhfsf2 for TARGET_F16C. (extendhfdf2): Restrict extendhfdf for TARGET_AVX512FP16 only. (extendhf<mode>2): Rename from extendhf<mode>2. (truncsfhf2): Likewise. (truncdfhf2): Likewise. (trunc<mode>2): Likewise. gcc/testsuite/ChangeLog: PR target/102811 * gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw. * gcc.target/i386/pr90773-23.c: Ditto. * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.