2023-07-10  doc: Document arm_v8_1m_main_cde_mve_fp  (Christophe Lyon; 1 file, -1/+7)
The arm_v8_1m_main_cde_mve_fp family of effective targets was not documented when it was introduced. 2023-07-07 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * doc/sourcebuild.texi (arm_v8_1m_main_cde_mve_fp): Document.
2023-07-10  ada: Follow-up fix for compilation issue with recent MinGW-w64 versions  (Eric Botcazou; 1 file, -0/+3)
It turns out that adaint.c includes other Windows header files than just windows.h, so defining WIN32_LEAN_AND_MEAN is not sufficient for it. gcc/ada/ * adaint.c [_WIN32]: Undefine 'abort' macro.
2023-07-10  ada: Add typedefs to snames.h-tmpl  (Tom Tromey; 1 file, -2/+6)
A future patch will change snames.h-tmpl to use enums rather than preprocessor defines. In order to do this, first introduce some typedefs that can be used in gcc-interface.

gcc/ada/

    * snames.h-tmpl (Name_Id, Attribute_Id, Convention_Id)
    (Pragma_Id): New typedefs.
    (Get_Attribute_Id, Get_Pragma_Id): Use typedef.
2023-07-10  ada: Simplify assertion to remove CodePeer message  (Yannick Moy; 1 file, -3/+1)
CodePeer correctly warns about a test that is always true in an assertion. The assertion can be rewritten, without loss of proof, to avoid that message.

gcc/ada/

    * libgnat/s-aridou.adb (Lemma_Powers_Of_2_Commutation): Rewrite
    assertion.
2023-07-10  ada: Documentation for mixed declarations and statements  (Bob Duff; 3 files, -45/+91)
This patch documents the new feature that allows declarations mixed with statements, primarily by referring to the RFC. gcc/ada/ * doc/gnat_rm/gnat_language_extensions.rst (Local Declarations Without Block): Document the feature very briefly, and refer the reader to the RFC for details and examples. * gnat_rm.texi: Regenerate. * gnat_ugn.texi: Regenerate.
2023-07-10  ada: hardcfr: optionally disable in leaf functions  (Alexandre Oliva; 2 files, -0/+10)
Document -fhardcfr-skip-leaf. gcc/ada/ * doc/gnat_rm/security_hardening_features.rst (Control Flow Hardening): Document -fhardcfr-skip-leaf. * gnat_rm.texi: Regenerate.
2023-07-10  ada: hardcfr: mark throw-expected functions  (Alexandre Oliva; 2 files, -16/+18)
Adjust documentation to reflect the introduction of -fhardcfr-check-noreturn-calls=no-xthrow. gcc/ada/ * doc/gnat_rm/security_hardening_features.rst (Control Flow Redundancy): Add -fhardcfr-check-noreturn-calls=no-xthrow. * gnat_rm.texi: Regenerate.
2023-07-10  ada: Adapt proof of System.Arith_Double to remove CVC4  (Yannick Moy; 1 file, -9/+75)
The proof of System.Arith_Double still required the use of CVC4, now replaced by its successor cvc5. Adapt the proof to be able to remove CVC4 in the proof of run-time units. gcc/ada/ * libgnat/s-aridou.adb (Lemma_Div_Mult): New simple lemma. (Lemma_Powers_Of_2_Commutation): State post in else branch. (Lemma_Div_Pow2): Introduce local lemma and use it. (Scaled_Divide): Use cut operations in assertions, lemmas, new assertions. Introduce local lemma and use it.
2023-07-10  ada: Add leafy mode for zero-call-used-regs  (Alexandre Oliva; 2 files, -1/+13)
Document leafy mode. gcc/ada/ * doc/gnat_rm/security_hardening_features.rst (Register Scrubbing): Document leafy mode. * gnat_rm.texi: Regenerate.
2023-07-10  vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]  (Xi Ruoyao; 2 files, -16/+83)
If a bit-field is signed and it's wider than the output type, we must ensure the extracted result is sign-extended. But this was not handled correctly. For example:

    int x : 8;
    long y : 55;
    bool z : 1;

The vectorized extraction of y was:

    vect__ifc__49.29_110 = MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.27_108];
    vect_patt_38.30_112 = vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
    vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
    vect_patt_40.32_114 = VIEW_CONVERT_EXPR<vector(2) long int>(vect_patt_39.31_113);

This is obviously incorrect. This patch has implemented it as:

    vect__ifc__25.16_62 = MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.14_60];
    vect_patt_31.17_63 = VIEW_CONVERT_EXPR<vector(2) long int>(vect__ifc__25.16_62);
    vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
    vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

gcc/ChangeLog:

    PR tree-optimization/110557
    * tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Ensure
    the output sign-extended if necessary.

gcc/testsuite/ChangeLog:

    PR tree-optimization/110557
    * g++.dg/vect/pr110557.cc: New test.
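For context, a reproducer of the shape described above might look like the following sketch (hypothetical, not the actual g++.dg/vect/pr110557.cc test; the 55-bit long bit-field assumes an LP64 target):

    #include <cassert>

    struct Item
    {
      int x : 8;
      long y : 55;   // signed bit-field, wider than the extraction's intermediate type
      bool z : 1;
    };

    // A loop of the kind the vectorizer turns into BIT_FIELD_REF extractions
    // of 'y'; negative values must survive the extraction.
    long
    sum_y (const Item *a, int n)
    {
      long s = 0;
      for (int i = 0; i < n; i++)
        s += a[i].y;
      return s;
    }

    int
    main ()
    {
      Item a[2] = { { 0, -1, false }, { 0, -2, false } };
      assert (sum_y (a, 2) == -3);   // fails if 'y' is zero-extended
    }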
2023-07-10  i386: Add new insvti_lowpart_1 and insvdi_lowpart_1 patterns.  (Roger Sayle; 3 files, -0/+92)
This patch implements another of Uros' suggestions, to investigate a insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64. In PR 88873, the RTL the middle-end expands for passing V2DF in TImode is subtly different from what it does for V2DI in TImode, sufficiently so that my explanations for why insvti_lowpart_1 isn't required don't apply in this case. This patch adds an insvti_lowpart_1 pattern, complementing the existing insvti_highpart_1 pattern, and also a 32-bit variant, insvdi_lowpart_1. Because the middle-end represents 128-bit constants using CONST_WIDE_INT and 64-bit constants using CONST_INT, it's easiest to treat these as different patterns, rather than attempt <dwi> parameterization. This patch also includes a peephole2 (actually a pair) to transform xchg instructions into mov instructions, when one of the destinations is unused. This optimization is required to produce the optimal code sequences below. For the 64-bit case: __int128 foo(__int128 x, unsigned long long y) { __int128 m = ~((__int128)~0ull); __int128 t = x & m; __int128 r = t | y; return r; } Before: xchgq %rdi, %rsi movq %rdx, %rax xorl %esi, %esi xorl %edx, %edx orq %rsi, %rax orq %rdi, %rdx ret After: movq %rdx, %rax movq %rsi, %rdx ret For the 32-bit case: long long bar(long long x, int y) { long long mask = ~0ull << 32; long long t = x & mask; long long r = t | (unsigned int)y; return r; } Before: pushl %ebx movl 12(%esp), %edx xorl %ebx, %ebx xorl %eax, %eax movl 16(%esp), %ecx orl %ebx, %edx popl %ebx orl %ecx, %eax ret After: movl 12(%esp), %eax movl 8(%esp), %edx ret 2023-07-10 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386.md (peephole2): Transform xchg insn with a REG_UNUSED note to a (simple) move. (*insvti_lowpart_1): New define_insn_and_split. (*insvdi_lowpart_1): Likewise. gcc/testsuite/ChangeLog * gcc.target/i386/insvdi_lowpart-1.c: New test case. * gcc.target/i386/insvti_lowpart-1.c: Likewise.
2023-07-10  i386: Add AVX512 support for STV of SI/DImode rotation by constant.  (Roger Sayle; 2 files, -1/+42)
Following Uros' suggestion, this patch adds support for AVX512VL's vpro[lr][dq] instructions to the recently added scalar-to-vector (STV) enhancements to handle DImode and SImode rotations by a constant. For the test cases: unsigned long long rot1(unsigned long long x) { return (x>>1) | (x<<63); } void mem1(unsigned long long *p) { *p = rot1(*p); } with -m32 -O2 -mavx512vl, we currently generate: rot1: movl 4(%esp), %eax movl 8(%esp), %edx movl %eax, %ecx shrdl $1, %edx, %eax shrdl $1, %ecx, %edx ret mem1: movl 4(%esp), %eax vmovq (%eax), %xmm0 vpshufd $20, %xmm0, %xmm0 vpsrlq $1, %xmm0, %xmm0 vpshufd $136, %xmm0, %xmm0 vmovq %xmm0, (%eax) ret with this patch, we now generate: rot1: vmovq 4(%esp), %xmm0 vprorq $1, %xmm0, %xmm0 vmovd %xmm0, %eax vpextrd $1, %xmm0, %edx ret mem1: movl 4(%esp), %eax vmovq (%eax), %xmm0 vprorq $1, %xmm0, %xmm0 vmovq %xmm0, (%eax) ret 2023-07-10 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-features.cc (compute_convert_gain): Tweak gains/costs for ROTATE/ROTATERT by integer constant on AVX512VL. (general_scalar_chain::convert_rotate): On TARGET_AVX512F generate avx512vl_rolv2di or avx412vl_rolv4si when appropriate. gcc/testsuite/ChangeLog * gcc.target/i386/avx512vl-stv-rotatedi-1.c: New test case.
2023-07-10  d: Merge upstream dmd, druntime 17ccd12af3, phobos 8d3800bee.  (Iain Buclaw; 147 files, -902/+1822)
D front-end changes: - Import dmd v2.104.0. - Assignment-style syntax is now allowed for `alias this'. - Overloading `extern(C)' functions is now an error. D runtime changes: - Import druntime v2.104.0. Phobos changes: - Import phobos v2.104.0. - Better static assert messages when instantiating `std.algorithm.iteration.permutations' with wrong inputs. - Added `std.system.instructionSetArchitecture' and `std.system.ISA'. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 17ccd12af3. * dmd/VERSION: Bump version to v2.104.0. * Make-lang.in (D_FRONTEND_OBJS): Rename d/apply.o to d/postordervisitor.o. * d-codegen.cc (make_location_t): Update for new front-end interface. (build_filename_from_loc): Likewise. (build_assert_call): Likewise. (build_array_bounds_call): Likewise. (build_bounds_index_condition): Likewise. (build_bounds_slice_condition): Likewise. (build_frame_type): Likewise. (get_frameinfo): Likewise. * d-diagnostic.cc (d_diagnostic_report_diagnostic): Likewise. * decl.cc (build_decl_tree): Likewise. (start_function): Likewise. * expr.cc (ExprVisitor::visit (NewExp *)): Replace code generation of `new pointer' with front-end lowering. * runtime.def (NEWITEMT): Remove. (NEWITEMIT): Remove. * toir.cc (IRVisitor::visit (LabelStatement *)): Update for new front-end interface. * typeinfo.cc (check_typeinfo_type): Likewise. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 17ccd12af3. * src/MERGE: Merge upstream phobos 8d3800bee. gcc/testsuite/ChangeLog: * gdc.dg/asm4.d: Update test.
2023-07-10  Add pre_reload splitter to detect fp min/max pattern.  (liuhongt; 3 files, -0/+154)
We have ix86_expand_sse_fp_minmax to detect min/max semantics, but it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false. For the testcase in the PR, there is an extra move from cmp_op0 to if_true, so ix86_expand_sse_fp_minmax fails. This patch adds a pre_reload splitter to detect the min/max pattern.

Operand order in MINSS matters for signed zeros and NaNs, since the instruction always returns the second operand when either operand is a NaN or both operands are zero.

gcc/ChangeLog:

    PR target/110170
    * config/i386/i386.md (*ieee_max<mode>3_1): New pre_reload
    splitter to detect fp max pattern.
    (*ieee_min<mode>3_1): Ditto, but for fp min pattern.

gcc/testsuite/ChangeLog:

    * g++.target/i386/pr110170.C: New test.
    * gcc.target/i386/pr110170.c: New test.
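To illustrate why operand order matters, here is a scalar model of the selection rule described above (a sketch, not code from the patch):

    #include <cmath>
    #include <cstdio>

    // MINSS-style selection: the second operand is returned whenever the
    // comparison is false, i.e. for any NaN input and when both operands
    // are zero.
    static float minss_model (float a, float b)
    {
      return a < b ? a : b;
    }

    int main ()
    {
      std::printf ("%g\n", minss_model (NAN, 1.0f));    // 1 (second operand)
      std::printf ("%g\n", minss_model (1.0f, NAN));    // nan (second operand)
      std::printf ("%g\n", minss_model (0.0f, -0.0f));  // -0 (second operand)
    }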
2023-07-10  Daily bump.  (GCC Administrator; 5 files, -1/+40)
2023-07-09  d: Merge upstream dmd, druntime 28a3b24c2e, phobos 8ab95ded5.  (Iain Buclaw; 190 files, -3215/+6538)
D front-end changes: - Import dmd v2.104.0-beta.1. - Better error message when attribute inference fails down the call stack. - Using `;' as an empty statement has been turned into an error. - Using `in' parameters with non- `extern(D)' or `extern(C++)' functions is deprecated. - `in ref' on parameters has been deprecated in favor of `-preview=in'. - Throwing `immutable', `const', `inout', and `shared' qualified objects is now deprecated. - User Defined Attributes now parse Template Arguments. D runtime changes: - Import druntime v2.104.0-beta.1. Phobos changes: - Import phobos v2.104.0-beta.1. - Better static assert messages when instantiating `std.algorithm.comparison.clamp' with wrong inputs. - `std.typecons.Rebindable' now supports all types. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 28a3b24c2e. * dmd/VERSION: Bump version to v2.104.0-beta.1. * d-codegen.cc (build_bounds_slice_condition): Update for new front-end interface. * d-lang.cc (d_init_options): Likewise. (d_handle_option): Likewise. (d_post_options): Initialize global.compileEnv. * expr.cc (ExprVisitor::visit (CatExp *)): Replace code generation with new front-end lowering. (ExprVisitor::visit (LoweredAssignExp *)): New method. (ExprVisitor::visit (StructLiteralExp *)): Don't generate static initializer symbols for structs defined in C sources. * runtime.def (ARRAYCATT): Remove. (ARRAYCATNTX): Remove. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 28a3b24c2e. * src/MERGE: Merge upstream phobos 8ab95ded5. gcc/testsuite/ChangeLog: * gdc.dg/rtti1.d: Move array concat testcase to ... * gdc.dg/nogc1.d: ... here. New test.
2023-07-09  Improve dumping of profile_count  (Jan Hubicka; 3 files, -6/+6)
Dumps of profile_counts are quite hard to interpret since they are 64bit fixed point values. In many cases one looks at a single function and it is better to think of basic block frequency, that is how many times it is executed each invocatoin. This patch makes CFG dumps to also print this info. For example: main() { for (int i = 0; i < 10; i++) t(); } the -fdump-tree-optimized-blocks-details now prints: int main () { unsigned int ivtmp_1; unsigned int ivtmp_2; ;; basic block 2, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot ;; prev block 0, next block 3, flags: (NEW, VISITED) ;; pred: ENTRY [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE) ;; succ: 3 [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE) ;; basic block 3, loop depth 1, count 976138697 (estimated locally, freq 10.0011), maybe hot ;; prev block 2, next block 4, flags: (NEW, VISITED) ;; pred: 3 [90.0% (guessed)] count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE) ;; 2 [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE) # ivtmp_2 = PHI <ivtmp_1(3), 10(2)> t (); ivtmp_1 = ivtmp_2 + 4294967295; if (ivtmp_1 != 0) goto <bb 3>; [90.00%] else goto <bb 4>; [10.00%] ;; succ: 3 [90.0% (guessed)] count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE) ;; 4 [10.0% (guessed)] count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE) ;; basic block 4, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot ;; prev block 3, next block 1, flags: (NEW, VISITED) ;; pred: 3 [10.0% (guessed)] count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE) return 0; ;; succ: EXIT [always] count:97603128 (estimated locally, freq 1.0000) (EXECUTABLE) } Which makes it easier to see that the inner bb is executed 10 times per invocation gcc/ChangeLog: * cfg.cc (check_bb_profile): Dump counts with relative frequency. (dump_edge_info): Likewise. (dump_bb_info): Likewise. * profile-count.cc (profile_count::dump): Add comma between quality and freq. gcc/testsuite/ChangeLog: * gcc.dg/predict-22.c: Update template.
2023-07-09  Daily bump.  (GCC Administrator; 4 files, -1/+133)
2023-07-08  Add missing profile_dump check  (Jan Hubicka; 2 files, -3/+10)
gcc/ChangeLog:

    PR tree-optimization/110600
    * cfgloopmanip.cc (scale_loop_profile): Add missing profile_dump
    check.

gcc/testsuite/ChangeLog:

    PR tree-optimization/110600
    * gcc.c-torture/compile/pr110600.c: New test.
2023-07-08  Fortran: Fix default type bugs in gfortran [PR99139, PR99368]  (Paul Thomas; 4 files, -5/+80)
2023-07-08 Steve Kargl <sgk@troutmask.apl.washington.edu> gcc/fortran PR fortran/99139 PR fortran/99368 * match.cc (gfc_match_namelist): Check for host associated or defined types before applying default type. (gfc_match_select_rank): Apply default type to selector of unknown type if possible. * resolve.cc (resolve_fl_variable): Do not apply local default initialization to assumed rank entities. gcc/testsuite/ PR fortran/99139 * gfortran.dg/pr99139.f90 : New test PR fortran/99368 * gfortran.dg/pr99368.f90 : New test
2023-07-08  Fix tree-ssa/update-cunroll.c  (Jan Hubicka; 3 files, -7/+54)
In this testcase the profile is misupdated: the loop has two exits. The first exit is the one eliminated by complete unrolling while the second exit remains. We remove the first exit but forget that the source BB of the other exit will then have a higher frequency, making the other exit more likely. This patch fixes that in duplicate_loop_body_to_header_edge. While looking into the resulting profiles I also noticed that in some cases scale_loop_profile may drop probabilities to 0 incorrectly, either when trying to update an exit from a nested loop (which has a similar problem) or when the profile was inconsistent as described in the comment below.

gcc/ChangeLog:

    PR middle-end/110590
    * cfgloopmanip.cc (scale_loop_profile): Avoid scaling exits within
    inner loops and be more careful about inconsistent profiles.
    (duplicate_loop_body_to_header_edge): Fix profile update when
    eliminated exit is followed by other exit.

gcc/testsuite/ChangeLog:

    PR middle-end/110590
    * gcc.dg/tree-prof/update-cunroll-2.c: Remove xfail.
    * gcc.dg/tree-ssa/update-cunroll.c: Likewise.
2023-07-08  Fortran: fixes for procedures with ALLOCATABLE,INTENT(OUT) arguments [PR92178]  (Harald Anlauf; 4 files, -5/+215)
gcc/fortran/ChangeLog: PR fortran/92178 * trans-expr.cc (gfc_conv_procedure_call): Check procedures for allocatable dummy arguments with INTENT(OUT) and move deallocation of actual arguments after evaluation of argument expressions before the procedure is executed. gcc/testsuite/ChangeLog: PR fortran/92178 * gfortran.dg/intent_out_16.f90: New test. * gfortran.dg/intent_out_17.f90: New test. * gfortran.dg/intent_out_18.f90: New test. Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
2023-07-08  Fortran: simplification of FINDLOC for constant complex arguments [PR110585]  (Harald Anlauf; 2 files, -0/+24)
gcc/fortran/ChangeLog: PR fortran/110585 * arith.cc (gfc_compare_expr): Handle equality comparison of constant complex gfc_expr arguments. gcc/testsuite/ChangeLog: PR fortran/110585 * gfortran.dg/findloc_9.f90: New test.
2023-07-08  cprop: Change return type of predicate functions from int to bool  (Uros Bizjak; 1 file, -52/+61)
Also change some internal variables from int to bool. gcc/ChangeLog: * cprop.cc (reg_available_p): Change return type from int to bool. (reg_not_set_p): Ditto. (try_replace_reg): Ditto. Change "success" variable to bool. (cprop_jump): Change return type from int to void and adjust function body accordingly. (constprop_register): Ditto. (cprop_insn): Ditto. Change "changed" variable to bool. (local_cprop_pass): Change return type from int to void and adjust function body accordingly. (bypass_block): Ditto. Change "change", "may_be_loop_header" and "removed_p" variables to bool. (bypass_conditional_jumps): Change return type from int to void and adjust function body accordingly. Change "changed" variable to bool. (one_cprop_pass): Ditto.
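The conversion is mechanical; for illustration, a minimal invented example of the before/after shape (not actual cprop.cc code):

    #include <cassert>

    // Before: C-style predicate returning int 0/1 (hypothetical helper).
    static int reg_in_range_p_old (int regno, int lo, int hi)
    {
      return lo <= regno && regno < hi;
    }

    // After: the same logic with a bool return type, matching the style
    // the patch applies throughout cprop.cc.
    static bool reg_in_range_p_new (int regno, int lo, int hi)
    {
      return lo <= regno && regno < hi;
    }

    int main ()
    {
      assert (reg_in_range_p_old (3, 0, 8) == 1);
      assert (reg_in_range_p_new (9, 0, 8) == false);
    }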
2023-07-08  gcse: Change return type of predicate functions from int to bool  (Uros Bizjak; 1 file, -115/+119)
Also change some internal variables and function arguments from int to bool. gcc/ChangeLog: * gcse.cc (expr_equiv_p): Change return type from int to bool. (oprs_unchanged_p): Change return type from int to void and adjust function body accordingly. (oprs_anticipatable_p): Ditto. (oprs_available_p): Ditto. (insert_expr_in_table): Ditto. Change "antic_p" and "avail_p" arguments to bool. Change "found" variable to bool. (load_killed_in_block_p): Change return type from int to void and adjust function body accordingly. Change "avail_p" argument to bool. (pre_expr_reaches_here_p): Change return type from int to void and adjust function body accordingly. (pre_delete): Ditto. Change "changed" variable to bool. (pre_gcse): Change return type from int to void and adjust function body accordingly. Change "did_insert" and "changed" variables to bool. (one_pre_gcse_pass): Change return type from int to void and adjust function body accordingly. Change "changed" variable to bool. (should_hoist_expr_to_dom): Change return type from int to void and adjust function body accordingly. Change "visited_allocated_locally" variable to bool. (hoist_code): Change return type from int to void and adjust function body accordingly. Change "changed" variable to bool. (one_code_hoisting_pass): Ditto. (pre_edge_insert): Change return type from int to void and adjust function body accordingly. Change "did_insert" variable to bool. (pre_expr_reaches_here_p_work): Change return type from int to void and adjust function body accordingly. (simple_mem): Ditto. (want_to_gcse_p): Change return type from int to void and adjust function body accordingly. (can_assign_to_reg_without_clobbers_p): Update function body for bool return type. (hash_scan_set): Change "antic_p" and "avail_p" variables to bool. (pre_insert_copies): Change "added_copy" variable to bool.
2023-07-08  doc: Fix typos in Warning Options [PR110596]  (Jonathan Wakely; 1 file, -2/+2)
gcc/ChangeLog: PR c++/110595 PR c++/110596 * doc/invoke.texi (Warning Options): Fix typos.
2023-07-08  Daily bump.  (GCC Administrator; 7 files, -1/+238)
2023-07-07  Dump profile_count along with relative frequency  (Jan Hubicka; 2 files, -6/+12)
gcc/ChangeLog: * profile-count.cc (profile_count::dump): Add FUN parameter; print relative frequency. (profile_count::debug): Update. * profile-count.h (profile_count::dump): Update prototype.
2023-07-07  Fix fallout from re-enabling profile consistency checks.  (Jan Hubicka; 5 files, -9/+9)
gcc/testsuite/ChangeLog: * gcc.dg/pr43864-2.c: Avoid matching pre dump with details-blocks. * gcc.dg/pr43864-3.c: Likewise. * gcc.dg/pr43864-4.c: Likewise. * gcc.dg/pr43864.c: Likewise. * gcc.dg/unroll-7.c: xfail.
2023-07-07  Collect both user and kernel events for autofdo tests and autoprofiledbootstrap  (Eugene Rozenfeld; 3 files, -3/+3)
When we collect just user events for autofdo with lbr we get some events where branch sources are kernel addresses and branch targets are user addresses. Without kernel MMAP events create_gcov can't make sense of kernel addresses. Currently create_gcov fails if it can't map at least 95% of events. We sometimes get below this threshold with just user events. The change is to collect both user events and kernel events. Tested on x86_64-pc-linux-gnu. ChangeLog: * Makefile.in: Collect both kernel and user events for autofdo * Makefile.tpl: Collect both kernel and user events for autofdo gcc/testsuite/ChangeLog: * lib/target-supports.exp: Collect both kernel and user events for autofdo
2023-07-07  i386: Improve __int128 argument passing (in ix86_expand_move).  (Roger Sayle; 3 files, -0/+46)
Passing 128-bit integer (TImode) parameters on x86_64 can sometimes result in surprising code. Consider the example below (from PR 43644): unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { return x+y; } which currently results in 6 consecutive movq instructions: foo: movq %rsi, %rax movq %rdi, %rsi movq %rdx, %rcx movq %rax, %rdi movq %rsi, %rax movq %rdi, %rdx addq %rcx, %rax adcq $0, %rdx ret The underlying issue is that during RTL expansion, we generate the following initial RTL for the x argument: (insn 4 3 5 2 (set (reg:TI 85) (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1 (nil)) (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8) (reg:DI 87)) "pr43644-2.c":5:1 -1 (nil)) (insn 6 5 7 2 (set (reg/v:TI 84 [ x ]) (reg:TI 85)) "pr43644-2.c":5:1 -1 (nil)) which by combine/reload becomes (insn 25 3 22 2 (set (reg/v:TI 84 [ x ]) (const_int 0 [0])) "pr43644-2.c":5:1 -1 (nil)) (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0) (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 93) (nil))) (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8) (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 94) (nil))) where the heavy use of SUBREG SET_DESTs creates challenges for both combine and register allocation. The improvement proposed here is to avoid these problematic SUBREGs by adding (two) special cases to ix86_expand_move. For insn 4, which sets a TImode destination from a paradoxical SUBREG, to assign the lowpart, we can use an explicit zero extension (zero_extendditi2 was added in July 2022), and for insn 5, which sets the highpart of a TImode register we can use the *insvti_highpart_1 instruction (that was added in May 2023, after being approved for stage1 in January). This allows combine to work its magic, merging these insns into a *concatditi3 and from there into other optimized forms. So for the test case above, we now generate only a single movq: foo: movq %rdx, %rax xorl %edx, %edx addq %rdi, %rax adcq %rsi, %rdx ret But there is a little bad news. This patch causes two (minor) missed optimization regressions on x86_64; gcc.target/i386/pr82580.c and gcc.target/i386/pr91681-1.c. As shown in the test case above, we're no longer generating adcq $0, but instead using xorl. For the other FAIL, register allocation now has more freedom and is (arbitrarily) choosing a register assignment that doesn't match what the test is expecting. These issues are easier to explain and fix once this patch is in the tree. The good news is that this approach fixes a number of long standing issues, that need to checked in bugzilla, including PR target/110533 which was just opened/reported earlier this week. 2023-07-07 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/43644 PR target/110533 * config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of TImode destinations from paradoxical SUBREGs (setting the lowpart) into explicit zero extensions. Use *insvti_highpart_1 instruction to set the highpart of a TImode destination. gcc/testsuite/ChangeLog PR target/43644 PR target/110533 * gcc.target/i386/pr110533.c: New test case. * gcc.target/i386/pr43644-2.c: Likewise.
2023-07-07  d: Fix PR 108842: Cannot use enum array with -fno-druntime  (Iain Buclaw; 4 files, -15/+45)
Restrict the generating of CONST_DECLs for D manifest constants to just scalars without pointers. It shouldn't happen that a reference to a manifest constant has not been expanded within a function body during codegen, but it has been found to occur in older versions of the D front-end (PR98277), so if the decl of a non-scalar constant is requested, just return its initializer as an expression. PR d/108842 gcc/d/ChangeLog: * decl.cc (DeclVisitor::visit (VarDeclaration *)): Only emit scalar manifest constants. (get_symbol_decl): Don't generate CONST_DECL for non-scalar manifest constants. * imports.cc (ImportVisitor::visit (VarDeclaration *)): New method. gcc/testsuite/ChangeLog: * gdc.dg/pr98277.d: Add more tests. * gdc.dg/pr108842.d: New test.
2023-07-07  Simplify force_edge_cold.  (Jan Hubicka; 1 file, -14/+9)
gcc/ChangeLog: * predict.cc (force_edge_cold): Use set_edge_probability_and_rescale_others; improve dumps.
2023-07-07  Fix some profile consistency testcases  (Jan Hubicka; 25 files, -29/+33)
Information about profile mismatches is printed only with -details-blocks for some time. I think it should be printed even with default to make it easier to spot when someone introduces new transform that breaks the profile, but I will send separate RFC for that. This patch enables details in all testcases that greps for Invalid sum. There are 4 testcases which fails: gcc.dg/tree-ssa/loop-ch-profile-1.c here the problem is that loop header dulication introduces loop invariant conditoinal that is later updated by tree-ssa-dom but dom does not take care of updating profile. Since loop-ch knows when it duplicates loop invariant, we may be able to get this right. The test is still useful since it tests that right after ch profile is consistent. gcc.dg/tree-prof/update-cunroll-2.c This is about profile updating code in duplicate_loop_body_to_header_edge being wrong when optimized out exit is not last in the loop. In that case the probability of later exits needs to be accounted in. I will think about making this better - in general this does not seem to have easy solution, but for special case of chained tests we can definitely account for the later exits. gcc.dg/tree-ssa/update-unroll-1.c This fails after aprefetch invoked unrolling. I did not look into details yet. gcc.dg/tree-prof/update-unroll-2.c This one seems similar as previous I decided to xfail these tests and deal with them incrementally and filled in PR110590. gcc/testsuite/ChangeLog: * g++.dg/tree-prof/indir-call-prof.C: Add block-details to dump flags. * gcc.dg/pr43864-2.c: Likewise. * gcc.dg/pr43864-3.c: Likewise. * gcc.dg/pr43864-4.c: Likewise. * gcc.dg/pr43864.c: Likewise. * gcc.dg/tree-prof/cold_partition_label.c: Likewise. * gcc.dg/tree-prof/indir-call-prof.c: Likewise. * gcc.dg/tree-prof/update-cunroll-2.c: Likewise. * gcc.dg/tree-prof/update-tailcall.c: Likewise. * gcc.dg/tree-prof/val-prof-1.c: Likewise. * gcc.dg/tree-prof/val-prof-2.c: Likewise. * gcc.dg/tree-prof/val-prof-3.c: Likewise. * gcc.dg/tree-prof/val-prof-4.c: Likewise. * gcc.dg/tree-prof/val-prof-5.c: Likewise. * gcc.dg/tree-ssa/fnsplit-1.c: Likewise. * gcc.dg/tree-ssa/loop-ch-profile-2.c: Likewise. * gcc.dg/tree-ssa/update-threading.c: Likewise. * gcc.dg/tree-ssa/update-unswitch-1.c: Likewise. * gcc.dg/unroll-7.c: Likewise. * gcc.dg/unroll-8.c: Likewise. * gfortran.dg/pr25623-2.f90: Likewise. * gfortran.dg/pr25623.f90: Likewise. * gcc.dg/tree-ssa/loop-ch-profile-1.c: Likewise; xfail. * gcc.dg/tree-ssa/update-cunroll.c: Likewise; xfail. * gcc.dg/tree-ssa/update-unroll-1.c: Likewise; xfail.
2023-07-07  Fix epilogue loop profile  (Jan Hubicka; 3 files, -5/+21)
Fix two bugs in scale_loop_profile which crept in during my cleanups and curiously enoug did not show on the testcases we have so far. The patch also adds the missing call to cap iteration count of the vectorized loop epilogues. Vectorizer profile needs more work, but I am trying to chase out obvious bugs first so the profile quality statistics become meaningful and we can try to improve on them. Now we get: Pass dump id and name |static mismatcdynamic mismatch |in count |in count 107t cunrolli | 3 +3| 17251 +17251 116t vrp | 5 +2| 30908 +16532 118t dce | 3 -2| 17251 -13657 127t ch | 13 +10| 17251 131t dom | 39 +26| 17251 133t isolate-paths | 47 +8| 17251 134t reassoc | 49 +2| 17251 136t forwprop | 53 +4| 202501 +185250 159t cddce | 61 +8| 216211 +13710 161t ldist | 62 +1| 216211 172t ifcvt | 66 +4| 373711 +157500 173t vect | 143 +77| 9801947 +9428236 176t cunroll | 149 +6| 12006408 +2204461 183t loopdone | 146 -3| 11944469 -61939 195t fre | 142 -4| 11944469 197t dom | 141 -1| 13038435 +1093966 199t threadfull | 143 +2| 13246410 +207975 200t vrp | 145 +2| 13444579 +198169 204t dce | 143 -2| 13371315 -73264 206t sink | 141 -2| 13371315 211t cddce | 147 +6| 13372755 +1440 255t optimized | 145 -2| 13372755 256r expand | 141 -4| 13371197 -1558 258r into_cfglayout | 139 -2| 13371197 275r loop2_unroll | 143 +4| 16792056 +3420859 291r ce2 | 141 -2| 16811462 312r pro_and_epilogue | 161 +20| 16873400 +61938 315r jump2 | 167 +6| 20910158 +4036758 323r bbro | 160 -7| 16559844 -4350314 Vect still introduces 77 profile mismatches (same as without this patch) however subsequent cunroll works much better with 6 new mismatches compared to 78. Overall it reduces 229 mismatches to 160. Also overall runtime estimate is now reduced by 6.9%. Previously the overall runtime estimate grew by 11% which was result of the fat that the epilogue profile was pretty much the same as profile of the original loop. Bootstrapped/regtested x86_64-linux, comitted. gcc/ChangeLog: * cfgloopmanip.cc (scale_loop_profile): Fix computation of count_in and scaling blocks after exit. * tree-vect-loop-manip.cc (vect_do_peeling): Scale loop profile of the epilogue if bound is known. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vect-profile-upate.c: New test.
2023-07-07  IBM Z: Fix vec_init default expander  (Juergen Christ; 2 files, -5/+23)
Do not reinitialize vector lanes to zero since they are already initialized to zero. gcc/ChangeLog: * config/s390/s390.cc (vec_init): Fix default case gcc/testsuite/ChangeLog: * gcc.target/s390/vector/vec-init-3.c: New test.
2023-07-07  LRA: Refine reload pseudo class  (Vladimir N. Makarov; 3 files, -55/+104)
For the given testcase a reload pseudo happened to occur only in reload insns created on one constraint sub-pass. Therefore its initial class (ALL_REGS) was not refined and the reload insns were not processed on the next constraint sub-passes. This resulted in a wrong insn.

    PR rtl-optimization/110372

gcc/ChangeLog:

    * lra-assigns.cc (assign_by_spills): Add reload insns involving
    reload pseudos with non-refined class to be processed on the next
    sub-pass.
    * lra-constraints.cc (enough_allocatable_hard_regs_p): New func.
    (in_class_p): Use it.
    (print_curr_insn_alt): New func.
    (process_alt_operands): Use it.  Improve debug info.
    (curr_insn_transform): Use print_curr_insn_alt.  Refine reload
    pseudo class if it is not refined yet.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr110372.c: New.
2023-07-07  A singleton irange has all known bits.  (Aldy Hernandez; 1 file, -1/+18)
gcc/ChangeLog: * value-range.cc (irange::get_bitmask_from_range): Return all the known bits for a singleton. (irange::set_range_from_bitmask): Set a range of a singleton when all bits are known.
2023-07-07  The caller to irange::intersect (wide_int, wide_int) must normalize the range.  (Aldy Hernandez; 1 file, -2/+5)
Per the function comment, the caller to intersect(wide_int, wide_int) must handle the mask. This means it must also normalize the range if anything changed. gcc/ChangeLog: * value-range.cc (irange::intersect): Leave normalization to caller.
2023-07-07  Implement value/mask tracking for irange.  (Aldy Hernandez; 11 files, -122/+351)
Integer ranges (irange) currently track known 0 bits. We've wanted to track known 1 bits for some time, and instead of tracking known 0 and known 1's separately, it has been suggested we track a value/mask pair similarly to what we do for CCP and RTL. This patch implements such a thing. With this we now track a VALUE integer which are the known values, and a MASK which tells us which bits contain meaningful information. This allows us to fix a handful of enhancement requests, such as PR107043 and PR107053. There is a 4.48% performance penalty for VRP and 0.42% in overall compilation for this entire patchset. It is expected and in line with the loss incurred when we started tracking known 0 bits. This patch just provides the value/mask tracking support. All the nonzero users (range-op, IPA, CCP, etc), are still using the nonzero nomenclature. For that matter, this patch reimplements the nonzero accessors with the value/mask functionality. In follow-up patches I will enhance these passes to use the value/mask information, and fix the aforementioned PRs. gcc/ChangeLog: * data-streamer-in.cc (streamer_read_value_range): Adjust for value/mask. * data-streamer-out.cc (streamer_write_vrange): Same. * range-op.cc (operator_cast::fold_range): Same. * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): Same. * value-range-storage.cc (irange_storage::write_lengths_address): Same. (irange_storage::set_irange): Same. (irange_storage::get_irange): Same. (irange_storage::size): Same. (irange_storage::dump): Same. * value-range-storage.h: Same. * value-range.cc (debug): New. (irange_bitmask::dump): New. (add_vrange): Adjust for value/mask. (irange::operator=): Same. (irange::set): Same. (irange::verify_range): Same. (irange::operator==): Same. (irange::contains_p): Same. (irange::irange_single_pair_union): Same. (irange::union_): Same. (irange::intersect): Same. (irange::invert): Same. (irange::get_nonzero_bits_from_range): Rename to... (irange::get_bitmask_from_range): ...this. (irange::set_range_from_nonzero_bits): Rename to... (irange::set_range_from_bitmask): ...this. (irange::set_nonzero_bits): Rename to... (irange::update_bitmask): ...this. (irange::get_nonzero_bits): Rename to... (irange::get_bitmask): ...this. (irange::intersect_nonzero_bits): Rename to... (irange::intersect_bitmask): ...this. (irange::union_nonzero_bits): Rename to... (irange::union_bitmask): ...this. (irange_bitmask::verify_mask): New. * value-range.h (class irange_bitmask): New. (irange_bitmask::set_unknown): New. (irange_bitmask::unknown_p): New. (irange_bitmask::irange_bitmask): New. (irange_bitmask::get_precision): New. (irange_bitmask::get_nonzero_bits): New. (irange_bitmask::set_nonzero_bits): New. (irange_bitmask::operator==): New. (irange_bitmask::union_): New. (irange_bitmask::intersect): New. (class irange): Friend vrange_printer. (irange::varying_compatible_p): Adjust for bitmask. (irange::set_varying): Same. (irange::set_nonzero): Same. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr107009.c: Adjust irange dumping for value/mask changes. * gcc.dg/tree-ssa/vrp-unreachable.c: Same. * gcc.dg/tree-ssa/vrp122.c: Same.
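For readers unfamiliar with the value/mask representation, here is a small model of the semantics (an illustrative sketch with invented helper names, not the irange_bitmask API itself):

    #include <cstdint>
    #include <cassert>

    // Value/mask pair: a mask bit of 1 means "unknown"; where the mask bit
    // is 0, the corresponding bit of 'value' is the known bit value.
    struct bitmask64
    {
      std::uint64_t value;
      std::uint64_t mask;
    };

    // What the old nonzero-bits interface exposes: a bit may be nonzero if
    // it is known to be 1 or is unknown.
    static std::uint64_t nonzero_bits (bitmask64 b)
    {
      return b.value | b.mask;
    }

    // Union of two value/mask pairs: a bit stays known only if it is known
    // and equal in both operands (the same style of meet rule CCP uses).
    static bitmask64 union_ (bitmask64 a, bitmask64 b)
    {
      std::uint64_t m = a.mask | b.mask | (a.value ^ b.value);
      return { a.value & ~m, m };
    }

    int main ()
    {
      bitmask64 x = { 0x10, 0x0f };  // bit 4 known set, bits 0-3 unknown
      bitmask64 y = { 0x10, 0x03 };  // bit 4 known set, bits 0-1 unknown
      bitmask64 u = union_ (x, y);
      assert (u.value == 0x10 && u.mask == 0x0f);
      assert (nonzero_bits (u) == 0x1f);
    }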
2023-07-07  x86: slightly correct / simplify *vec_extractv2ti  (Jan Beulich; 1 file, -1/+1)
V2TImode values cannot appear in the upper 16 YMM registers without AVX512VL being enabled. Therefore forcing 512-bit mode (also not reflected in the "mode" attribute) is pointless. gcc/ * config/i386/sse.md (*vec_extractv2ti): Drop g modifiers.
2023-07-07  x86: correct / simplify @vec_extract_hi_<mode> and vec_extract_hi_v32qi  (Jan Beulich; 1 file, -12/+10)
The middle alternative of each was unusable without enabling AVX512DQ (in addition to AVX512VL), which is entirely unrelated here. The last alternative is usable with AVX512VL only (due to type restrictions on what may be put in the upper 16 YMM registers), and hence is pointlessly forcing 512-bit mode (without actually reflecting that in the "mode" attribute).

gcc/

    * config/i386/sse.md (@vec_extract_hi_<mode>): Drop last
    alternative.  Switch new last alternative's "isa" attribute to
    "avx512vl".
    (vec_extract_hi_v32qi): Likewise.
2023-07-07  Closing the GCC 10 branch  (Richard Biener; 2 files, -2/+1)
contrib/ * gcc-changelog/git_update_version.py: Remove GCC 10 from active_refs. maintainer-scripts/ * crontab: Remove entry for GCC 10.
2023-07-07  RISC-V: Fix one bug for floating-point static frm  (Pan Li; 2 files, -5/+53)
This patch fixes one bug, to align with the following items of the spec:

    RVV floating-point instructions always (implicitly) use the dynamic
    rounding mode.  This implies that rounding is performed according to
    the rounding mode set in the FRM register.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Robin Dapp <rdapp@ventanamicro.com>

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_emit_mode_set): Avoid emit insn
    when FRM_MODE_DYN.
    (riscv_mode_entry): Take FRM_MODE_DYN as entry mode.
    (riscv_mode_exit): Likewise for exit mode.
    (riscv_mode_needed): Likewise for needed mode.
    (riscv_mode_after): Likewise for after mode.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/float-point-frm-insert-6.c: New test.
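For background, these are the rounding-mode encodings used by the F extension's frm/rm fields (listed here as a reminder of the spec, not taken from the patch):

    // FRM field encodings; DYN (0b111) is only valid in an instruction's
    // rm field, selecting the mode held in FRM, and never a value stored
    // in the FRM register itself.
    enum frm : unsigned {
      RNE = 0b000,  // round to nearest, ties to even
      RTZ = 0b001,  // round towards zero
      RDN = 0b010,  // round down (towards -infinity)
      RUP = 0b011,  // round up (towards +infinity)
      RMM = 0b100,  // round to nearest, ties to max magnitude
      DYN = 0b111   // dynamic (rm field only)
    };

    static_assert (RDN != DYN, "distinct encodings");

    int main () { return 0; }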
2023-07-07  RISC-V: Fix one typo of FRM dynamic definition  (Pan Li; 1 file, -2/+2)
This patch fixes one typo that used rdn instead of dyn by mistake.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

    * config/riscv/vector.md: Fix typo.
2023-07-07  Daily bump.  (GCC Administrator; 7 files, -1/+308)
2023-07-06  libstdc++: Fix fwrite error parameter  (Tianqiang Shuai; 1 file, -1/+1)
The first parameter of fwrite should be the const char* __s that we want to write to FILE *__file, rather than the FILE *__file itself.

libstdc++-v3/ChangeLog:

    * config/io/basic_file_stdio.cc (xwrite) [USE_STDIO_PURE]:
    Fix first argument.
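For reference, the shape of the fix (a simplified sketch of the stdio-pure code path, not the exact basic_file_stdio.cc source):

    #include <cstdio>

    // Simplified model of the xwrite helper in the USE_STDIO_PURE path.
    // std::fwrite's first parameter is the data to write, not the stream.
    static std::size_t
    xwrite (std::FILE *file, const char *s, std::size_t n)
    {
      // before the fix (wrong): std::fwrite (file, 1, n, file);
      return std::fwrite (s, 1, n, file);   // after the fix
    }

    int main ()
    {
      return xwrite (stdout, "ok\n", 3) == 3 ? 0 : 1;
    }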
2023-07-06  Improve profile updates after loop-ch and cunroll  (Jan Hubicka; 3 files, -0/+27)
Extend loop-ch and loop unrolling to fix profile in case the loop is known to not iterate at all (or iterate few times) while profile claims it iterates more. While this is kind of symptomatic fix, it is best we can do incase profile was originally esitmated incorrectly. In the testcase the problematic loop is produced by vectorizer and I think vectorizer should know and account into its costs that vectorizer loop and/or epilogue is not going to loop after the transformation. So it would be nice to fix it on that side, too. The patch avoids about half of profile mismatches caused by cunroll. Pass dump id and name |static mismatcdynamic mismatch |in count |in count 107t cunrolli | 3 +3| 17251 +17251 115t threadfull | 3 | 14376 -2875 116t vrp | 5 +2| 30908 +16532 117t dse | 5 | 30908 118t dce | 3 -2| 17251 -13657 127t ch | 13 +10| 17251 131t dom | 39 +26| 17251 133t isolate-paths | 47 +8| 17251 134t reassoc | 49 +2| 17251 136t forwprop | 53 +4| 202501 +185250 159t cddce | 61 +8| 216211 +13710 161t ldist | 62 +1| 216211 172t ifcvt | 66 +4| 373711 +157500 173t vect | 143 +77| 9802097 +9428386 176t cunroll | 221 +78| 15639591 +5837494 183t loopdone | 218 -3| 15577640 -61951 195t fre | 214 -4| 15577640 197t dom | 213 -1| 16671606 +1093966 199t threadfull | 215 +2| 16879581 +207975 200t vrp | 217 +2| 17077750 +198169 204t dce | 215 -2| 17004486 -73264 206t sink | 213 -2| 17004486 211t cddce | 219 +6| 17005926 +1440 255t optimized | 217 -2| 17005926 256r expand | 210 -7| 19571573 +2565647 258r into_cfglayout | 208 -2| 19571573 275r loop2_unroll | 212 +4| 22992432 +3420859 291r ce2 | 210 -2| 23011838 312r pro_and_epilogue | 230 +20| 23073776 +61938 315r jump2 | 236 +6| 27110534 +4036758 323r bbro | 229 -7| 21826835 -5283699 W/o the patch cunroll does: 176t cunroll | 294 +151|126548439 +116746342 and we end up with 291 mismatches at bbro. Bootstrapped/regtested x86_64-linux. Plan to commit it after the scale_loop_frequency patch. gcc/ChangeLog: PR middle-end/25623 * tree-ssa-loop-ch.cc (ch_base::copy_headers): Scale loop frequency to maximal number of iterations determined. * tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Likewise. gcc/testsuite/ChangeLog: PR middle-end/25623 * gfortran.dg/pr25623-2.f90: New test.
2023-07-06  Improve scale_loop_profile  (Jan Hubicka; 3 files, -85/+102)
The original scale_loop_profile was implemented to handle only the very simple loops produced by the vectorizer at that time (basically loops with only one exit and no subloops). It also had not been updated to the new profile-count API very carefully. The function does two things:

1) It scales down the loop profile by a given probability. This is useful, for example, to scale down the profile after peeling, when the loop body is executed less often than before.

2) It updates the profile to cap the iteration count by the ITERATION_BOUND parameter. I changed ITERATION_BOUND to be the actual bound on the number of iterations as used elsewhere (i.e. the number of executions of the latch edge) rather than the number of iterations + 1 as it was before.

To do 2) one needs to do the following:

a) scale the loop's own profile so the frequency of the header is at most the sum of the in-edge counts * (iteration_bound + 1);

b) update the loop exit probabilities so their count is the same as before scaling;

c) reduce the frequencies of basic blocks after the loop exit.

The old code did b) by setting the probability to 1 / iteration_bound, which is correct only if the basic block containing the exit executes precisely once per iteration (i.e. it is not inside another conditional or an inner loop). This is now fixed by using set_edge_probability_and_rescale_others. Also, c) was implemented only for the special case when the exit was just before the latch basic block. I now use dominance info to get some of the additional cases right. I still did not try to do anything for multiple-exit loops, though the implementation could be generalized.

Bootstrapped/regtested x86_64-linux. Plan to commit it tonight if there are no complaints.

gcc/ChangeLog:

    * cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge
    probability update to be safe on loops with subloops.  Make bound
    parameter to be iteration bound.
    * tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call of
    scale_loop_profile.
    * tree-vect-loop-manip.cc (vect_do_peeling): Likewise.
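A small numeric illustration of the capping in a) (invented example numbers, not taken from the patch):

    #include <cassert>

    int main ()
    {
      // Sum of in-edge counts into the header and the new iteration bound
      // (number of latch executions).
      const long count_in = 100, iteration_bound = 3;
      long header_count = 1000;   // stale profile claims ~10 iterations

      // The header can execute at most (bound + 1) times per entry.
      const long cap = count_in * (iteration_bound + 1);   // 400
      if (header_count > cap)
        header_count = cap;

      assert (header_count == 400);
    }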
2023-07-06  Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)  (Hao Liu OS; 2 files, -3/+58)
If a loop is unrolled by n times during vectorization, two steps are used to calculate the induction variable:

    - The small step for the unrolled ith copy: vec_1 = vec_iv + (VF/n * Step)
    - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)

This patch calculates an extra vec_n to replace vec_loop:

    vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.

So we can save the large-step register and the related operations.

gcc/ChangeLog:

    PR tree-optimization/110449
    * tree-vect-loop.cc (vectorizable_induction): Use vec_n to replace
    vec_loop for the unrolled loop.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/pr110449.c: New testcase.
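As a scalar arithmetic check of the identity above (illustrative placeholder values for VF, n and Step):

    #include <cassert>

    int main ()
    {
      const int VF = 8, n = 4, step = 3;   // assumed example values
      const int vec_iv = 100;              // induction value entering the iteration

      // Chain of small steps, one per unrolled copy (vec_1 ... vec_n):
      int vec = vec_iv;
      for (int copy = 0; copy < n; copy++)
        vec += (VF / n) * step;

      // The last copy already equals the old large-step result vec_loop,
      // so no separate VF * Step increment is needed.
      assert (vec == vec_iv + VF * step);
    }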