path: root/gcc
2024-06-08  rs6000: Update ELFv2 stack frame comment showing the correct ROP save location  (Peter Bergner; 1 file, -8/+8)
The ELFv2 stack frame layout comment in rs6000-logue.cc shows the ROP hash
save slot in the wrong location.  Update the comment to show the correct
ROP hash save location in the frame.

2024-06-07  Peter Bergner  <bergner@linux.ibm.com>

gcc/
    * config/rs6000/rs6000-logue.cc (rs6000_stack_info): Update comment.

2024-06-08  c++: Make *_cast<*> parsing more robust to errors [PR108438]  (Simon Martin; 2 files, -1/+10)
We ICE upon the following when trying to emit a -Wlogical-not-parentheses
warning:

=== cut here ===
template <typename T> T foo (T arg, T& ref, T* ptr) {
  int a = 1;
  return static_cast<T!>(a);
}
=== cut here ===

This patch makes *_cast<*> parsing more robust by skipping to the closing
'>' upon error in the target type.

Successfully tested on x86_64-pc-linux-gnu.

    PR c++/108438

gcc/cp/ChangeLog:

    * parser.cc (cp_parser_postfix_expression): Use
    cp_parser_require_end_of_template_parameter_list to skip to the
    closing '>' upon error parsing the target type of *_cast<*>
    expressions.

gcc/testsuite/ChangeLog:

    * g++.dg/parse/crash75.C: New test.

2024-06-08  i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]  (Uros Bizjak; 2 files, -2/+54)
The following testcase:

unsigned
add_sat (unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_add_overflow (x, y, &z) ? -1u : z;
}

currently compiles (-O2) to:

add_sat:
    addl    %esi, %edi
    jc      .L3
    movl    %edi, %eax
    ret
.L3:
    orl     $-1, %eax
    ret

We can expand through the usadd{m}3 optab to use the carry flag from the
addition and generate branchless code using the SBB instruction,
implementing:

    unsigned res = x + y;
    res |= -(res < x);

add_sat:
    addl    %esi, %edi
    sbbl    %eax, %eax
    orl     %edi, %eax
    ret

    PR target/112600

gcc/ChangeLog:

    * config/i386/i386.md (usadd<mode>3): New expander.
    (x86_mov<mode>cc_0_m1_neg): Use SWI mode iterator.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr112600-a.c: New test.

2024-06-08  RISC-V: Implement .SAT_SUB for unsigned scalar int  (Pan Li; 20 files, -0/+414)
As the middle-end support for .SAT_SUB has been committed, implement the
unsigned scalar int form of .SAT_SUB for the riscv backend.  Consider the
example code below:

T __attribute__((noinline))         \
sat_u_sub_##T##_fmt_1 (T x, T y)    \
{                                   \
  return (x - y) & (-(T)(x >= y));  \
}

T __attribute__((noinline))         \
sat_u_sub_##T##_fmt_2 (T x, T y)    \
{                                   \
  return (x - y) & (-(T)(x > y));   \
}

DEF_SAT_U_SUB_FMT_1(uint64_t);
DEF_SAT_U_SUB_FMT_2(uint64_t);

Before this patch:
sat_u_sub_uint64_t_fmt_1:
    bltu    a0,a1,.L2
    sub     a0,a0,a1
    ret
.L2:
    li      a0,0
    ret

After this patch:
sat_u_sub_uint64_t_fmt_1:
    sltu    a5,a0,a1
    addi    a5,a5,-1
    sub     a0,a0,a1
    and     a0,a5,a0
    ret

ToDo: Only the above 2 forms of .SAT_SUB are supported for now; more forms
of .SAT_SUB will be supported in the middle-end in the near future.

The below test suites are passed for this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

    * config/riscv/riscv-protos.h (riscv_expand_ussub): Add new func
    decl for ussub expanding.
    * config/riscv/riscv.cc (riscv_expand_ussub): Ditto but for impl.
    * config/riscv/riscv.md (ussub<mode>3): Add new pattern ussub
    for scalar modes.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test macros and comments.
    * gcc.target/riscv/sat_u_sub-1.c: New test.
    * gcc.target/riscv/sat_u_sub-2.c: New test.
    * gcc.target/riscv/sat_u_sub-3.c: New test.
    * gcc.target/riscv/sat_u_sub-4.c: New test.
    * gcc.target/riscv/sat_u_sub-5.c: New test.
    * gcc.target/riscv/sat_u_sub-6.c: New test.
    * gcc.target/riscv/sat_u_sub-7.c: New test.
    * gcc.target/riscv/sat_u_sub-8.c: New test.
    * gcc.target/riscv/sat_u_sub-run-1.c: New test.
    * gcc.target/riscv/sat_u_sub-run-2.c: New test.
    * gcc.target/riscv/sat_u_sub-run-3.c: New test.
    * gcc.target/riscv/sat_u_sub-run-4.c: New test.
    * gcc.target/riscv/sat_u_sub-run-5.c: New test.
    * gcc.target/riscv/sat_u_sub-run-6.c: New test.
    * gcc.target/riscv/sat_u_sub-run-7.c: New test.
    * gcc.target/riscv/sat_u_sub-run-8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-06-08  analyzer: Restore g++ 4.8 bootstrap; use std::move to return std::unique_ptr.  (Roger Sayle; 6 files, -11/+11)
This patch restores bootstrap when using g++ 4.8 as a host compiler.
Returning a std::unique_ptr requires a std::move on C++ compilers
(pre-C++17) that don't guarantee copy elision/return value optimization.

2024-06-08  Roger Sayle  <roger@nextmovesoftware.com>

gcc/analyzer/ChangeLog
    * constraint-manager.cc (equiv_class::make_dump_widget): Use
    std::move to return a std::unique_ptr.
    (bounded_ranges_constraint::make_dump_widget): Likewise.
    (constraint_manager::make_dump_widget): Likewise.
    * program-state.cc (sm_state_map::make_dump_widget): Likewise.
    (program_state::make_dump_widget): Likewise.
    * region-model.cc (region_to_value_map::make_dump_widget): Likewise.
    (region_model::make_dump_widget): Likewise.
    * region.cc (region::make_dump_widget): Likewise.
    * store.cc (binding_cluster::make_dump_widget): Likewise.
    (store::make_dump_widget): Likewise.
    * svalue.cc (svalue::make_dump_widget): Likewise.

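As an illustration (a minimal sketch with hypothetical type names, not code
from the patch), this is the return shape g++ 4.8 rejects without an
explicit std::move: the converting return of a std::unique_ptr<Derived> as
a std::unique_ptr<Base> predates the implicit-move rules later compilers
apply.

    #include <memory>
    #include <utility>

    struct widget { virtual ~widget () {} };
    struct text_widget : widget {};

    // Older host compilers such as g++ 4.8 do not apply the implicit
    // move on this converting return, so "return w;" fails to compile
    // there; std::move makes it portable.
    std::unique_ptr<widget>
    make_widget ()
    {
      std::unique_ptr<text_widget> w (new text_widget ());
      return std::move (w);
    }
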
2024-06-08  Daily bump.  (GCC Administrator; 8 files, -1/+348)
2024-06-07  analyzer: add logging to get_representative_path_var  (David Malcolm; 5 files, -35/+109)
This was very helpful when debugging the cast_region::m_original_region
removal, but is probably too verbose to enable except by hand on specific
calls to get_representative_tree.

gcc/analyzer/ChangeLog:
    * engine.cc (impl_region_model_context::on_state_leak): Pass
    nullptr to get_representative_path_var.
    * region-model.cc (region_model::get_representative_path_var_1):
    Add logger param and use it in both overloads.
    (region_model::get_representative_path_var): Likewise.
    (region_model::get_representative_tree): Likewise.
    (selftest::test_get_representative_path_var): Pass nullptr to
    get_representative_path_var.
    * region-model.h (region_model::get_representative_tree): Add
    optional logger param to both overloads.
    (region_model::get_representative_path_var): Add logger param to
    both overloads.
    (region_model::get_representative_path_var_1): Likewise.
    * store.cc (binding_cluster::get_representative_path_vars): Add
    logger param and use it.
    (store::get_representative_path_vars): Likewise.
    * store.h (binding_cluster::get_representative_path_vars): Add
    logger param.
    (store::get_representative_path_vars): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

2024-06-07  analyzer: eliminate cast_region::m_original_region  (David Malcolm; 7 files, -85/+29)
cast_region had its own field m_original_region, rather than simply using
region::m_parent, leading to lots of pointless special-casing of RK_CAST.

Remove the field and simply use the parent region.  Doing so revealed a bug
(seen in gcc.dg/analyzer/taint-alloc-4.c) where
region_model::get_representative_path_var_1's RK_CAST case was always
failing, due to using the "parent region" (actually that of the original
region's parent), rather than the original region; the patch fixes the bug
by removing the distinction.

gcc/analyzer/ChangeLog:
    * call-summary.cc
    (call_summary_replay::convert_region_from_summary_1): Update for
    removal of cast_region::m_original_region.
    * region-model-manager.cc
    (region_model_manager::get_or_create_initial_value): Likewise.
    * region-model.cc (region_model::get_store_value): Likewise.
    * region.cc (region::get_base_region): Likewise.
    (region::descendent_of_p): Likewise.
    (region::maybe_get_frame_region): Likewise.
    (region::get_memory_space): Likewise.
    (region::calc_offset): Likewise.
    (cast_region::accept): Delete.
    (cast_region::dump_to_pp): Update for removal of
    cast_region::m_original_region.
    (cast_region::add_dump_widget_children): Delete.
    * region.h (struct cast_region::key_t): Rename "original_region"
    to "parent".
    (cast_region::cast_region): Likewise.  Update for removal of
    cast_region::m_original_region.
    (cast_region::accept): Delete.
    (cast_region::add_dump_widget_children): Delete.
    (cast_region::get_original_region): Delete.
    (cast_region::m_original_region): Delete.
    * sm-taint.cc (region_model::check_region_for_taint): Remove
    special-casing for RK_CAST.

gcc/testsuite/ChangeLog:
    * gcc.dg/analyzer/taint-alloc-4.c: Update expected result to
    reflect change in message due to
    region_model::get_representative_path_var_1 now handling RK_CAST.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

2024-06-07  analyzer: new warning: -Wanalyzer-undefined-behavior-ptrdiff (PR analyzer/105892)  (David Malcolm; 7 files, -2/+293)

Add a new warning to complain about pointer subtraction involving
different chunks of memory.

For example, given:

#include <stddef.h>

int arr[42];
int sentinel;

ptrdiff_t
test_invalid_calc_of_array_size (void)
{
  return &sentinel - arr;
}

this emits:

demo.c: In function ‘test_invalid_calc_of_array_size’:
demo.c:9:20: warning: undefined behavior when subtracting pointers [CWE-469] [-Wanalyzer-undefined-behavior-ptrdiff]
    9 |   return &sentinel - arr;
      |                    ^
  events 1-2
    │
    │    3 | int arr[42];
    │      |     ~~~
    │      |     |
    │      |     (2) underlying object for right-hand side of subtraction created here
    │    4 | int sentinel;
    │      |     ^~~~~~~~
    │      |     |
    │      |     (1) underlying object for left-hand side of subtraction created here
    │
    └──> ‘test_invalid_calc_of_array_size’: event 3
           │
           │    9 |   return &sentinel - arr;
           │      |                    ^
           │      |                    |
           │      |                    (3) ⚠️  subtraction of pointers has undefined behavior if they do not point into the same array object
           │

gcc/analyzer/ChangeLog:
    PR analyzer/105892
    * analyzer.opt (Wanalyzer-undefined-behavior-ptrdiff): New option.
    * analyzer.opt.urls: Regenerate.
    * region-model.cc (class undefined_ptrdiff_diagnostic): New.
    (check_for_invalid_ptrdiff): New.
    (region_model::get_gassign_result): Call it for POINTER_DIFF_EXPR.

gcc/ChangeLog:
    * doc/invoke.texi: Add -Wanalyzer-undefined-behavior-ptrdiff.

gcc/testsuite/ChangeLog:
    PR analyzer/105892
    * c-c++-common/analyzer/out-of-bounds-pr110387.c: Add expected
    warnings about pointer subtraction.
    * c-c++-common/analyzer/ptr-subtraction-1.c: New test.
    * c-c++-common/analyzer/ptr-subtraction-CWE-469-example.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

2024-06-07  c++: Handle erroneous DECL_LOCAL_DECL_ALIAS in duplicate_decls [PR107575]  (Simon Martin; 2 files, -4/+18)
We currently ICE upon the following because we don't properly handle local
functions with an error_mark_node as DECL_LOCAL_DECL_ALIAS in
duplicate_decls.

=== cut here ===
void f (void) {
  virtual int f (void) const;
  virtual int f (void);
}
=== cut here ===

This patch fixes this by checking for error_mark_node.

Successfully tested on x86_64-pc-linux-gnu.

    PR c++/107575

gcc/cp/ChangeLog:

    * decl.cc (duplicate_decls): Check for error_mark_node
    DECL_LOCAL_DECL_ALIAS.

gcc/testsuite/ChangeLog:

    * g++.dg/parse/crash74.C: New test.

2024-06-07  c++: -include and header unit translation  (Jason Merrill; 4 files, -4/+31)
Within a source file, #include is translated to import if a suitable
header unit is available, but this wasn't working with -include.  This
turned out to be because we suppressed the translation before the
beginning of the main file.  After removing that, I had to tweak libcpp
file handling to accommodate the way it moves from an -include to the
main file.

gcc/ChangeLog:
    * doc/invoke.texi (C++ Modules): Mention -include.

gcc/cp/ChangeLog:
    * module.cc (maybe_translate_include): Allow before the main file.

libcpp/ChangeLog:
    * files.cc (_cpp_stack_file): LC_ENTER for -include header unit.

gcc/testsuite/ChangeLog:
    * g++.dg/modules/dashinclude-1_b.C: New test.
    * g++.dg/modules/dashinclude-1_a.H: New test.

2024-06-07  c++: lambda in pack expansion [PR115378]  (Patrick Palka; 5 files, -4/+20)
Here find_parameter_packs_r is incorrectly treating the 'auto' return type
of a lambda as a parameter pack due to Concepts-TS specific logic added in
r6-4517, leading to confusion later when expanding the pattern.

Since we intend on removing Concepts TS support soon anyway, this patch
fixes this by restricting the problematic logic with flag_concepts_ts.
Doing so revealed that add_capture was relying on this logic to set
TEMPLATE_TYPE_PARAMETER_PACK for the 'auto' type of a pack expansion
init-capture, which we now need to do explicitly.

    PR c++/115378

gcc/cp/ChangeLog:

    * lambda.cc (lambda_capture_field_type): Set
    TEMPLATE_TYPE_PARAMETER_PACK on the auto type of an init-capture
    pack expansion.
    * pt.cc (find_parameter_packs_r) <case TEMPLATE_TYPE_PARM>:
    Restrict TEMPLATE_TYPE_PARAMETER_PACK promotion with
    flag_concepts_ts.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp1y/decltype-auto-103497.C: Adjust expected diagnostic.
    * g++.dg/template/pr95672.C: Likewise.
    * g++.dg/cpp2a/lambda-targ5.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

2024-06-07  lto: Fix build on MacOS  (Simon Martin; 1 file, -1/+1)
The build fails on x86_64-apple-darwin19.6.0 starting with 5b6d5a886ee
because <vector> is included after system.h and runs into poisoned
identifiers.  This patch fixes this by defining INCLUDE_VECTOR before
including system.h.

Validated by doing a full build on x86_64-apple-darwin19.6.0.

gcc/lto/ChangeLog:

    * lto-partition.cc: Define INCLUDE_VECTOR to avoid running into
    poisoned identifiers.

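For context, a minimal sketch of the convention the fix relies on (not the
patch itself): GCC's system.h only pulls in <vector> when INCLUDE_VECTOR is
defined beforehand, and including <vector> afterwards trips the poisoned
identifiers instead.

    /* At the top of a GCC source file such as lto-partition.cc: ask
       system.h to include <vector> itself, before the identifiers it
       uses get poisoned.  */
    #define INCLUDE_VECTOR
    #include "config.h"
    #include "system.h"
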
2024-06-07  i386: PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.  (Roger Sayle; 2 files, -0/+62)
This patch addresses PR target/115351, which is a code quality regression
on x86 when passing floating point complex numbers.  The ABI considers
these arguments to have TImode, requiring interunit moves to place the FP
values (which are actually passed in SSE registers) into the upper and
lower parts of a TImode pseudo, and then similar moves back again before
they can be used.

The cause of the regression is that changes in how TImode initialization
is represented in RTL now prevent the RTL optimizers from eliminating
these redundant moves.  The specific cause is that the *concatditi3
pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
rtx_cost, preventing fwprop1 from propagating it.  This pattern just sets
the hipart and lopart of a double-word register, typically two
instructions (less if reload can allocate things appropriately), but the
current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52.

propagating insn 5 into insn 6, replacing:
(set (reg:TI 110)
     (ior:TI (and:TI (reg:TI 110)
                     (const_wide_int 0x0ffffffffffffffff))
             (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
                        (const_int 64 [0x40]))))
successfully matched this instruction to *concatditi3_3:
(set (reg:TI 110)
     (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
                        (const_int 64 [0x40]))
             (zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0))))
change not profitable (cost 50 -> cost 52)

This issue is resolved by having ix86_rtx_costs return more reasonable
values for these (place-holder) patterns.

2024-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    PR target/115351
    * config/i386/i386.cc (ix86_rtx_costs): Provide estimates for
    the *concatditi3 and *insvti_highpart patterns, about two insns.

gcc/testsuite/ChangeLog
    PR target/115351
    * g++.target/i386/pr115351.C: New test case.

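As an illustration (not from the patch), the (zext(hi)<<64)|zext(lo) shape
that *concatditi3 represents corresponds to source like the following,
which simply packs two 64-bit halves into a 128-bit value:

    /* Illustrative only: set the high and low halves of a double-word
       (TImode on x86-64) value, the operation whose rtx cost the patch
       lowers to about two instructions.  */
    unsigned __int128
    concat_di (unsigned long long hi, unsigned long long lo)
    {
      return ((unsigned __int128) hi << 64) | lo;
    }
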
2024-06-07  i386: Improve handling of ternlog instructions in i386/sse.md  (Roger Sayle; 14 files, -14/+3491)
This patch improves the way that the x86 backend recognizes and expands
AVX512's bitwise ternary logic (vpternlog) instructions.  As a motivating
example, consider the following code which calculates the carry out from a
(binary) full adder:

typedef unsigned long long v4di __attribute((vector_size(32)));

v4di foo(v4di a, v4di b, v4di c)
{
  return (a & b) | ((a ^ b) & c);
}

with -O2 -march=cascadelake current mainline produces:

foo:
    vpternlogq  $96, %ymm0, %ymm1, %ymm2
    vmovdqa     %ymm0, %ymm3
    vmovdqa     %ymm2, %ymm0
    vpternlogq  $248, %ymm3, %ymm1, %ymm0
    ret

with the patch below, we now generate a single instruction:

foo:
    vpternlogq  $232, %ymm2, %ymm1, %ymm0
    ret

The AVX512 vpternlog[qd] instructions are a very cool addition to the x86
instruction set, in that they can calculate any Boolean function of three
inputs in a single fast instruction.  As the truth table for any
three-input function has 8 rows, any specific function can be represented
by specifying those bits, i.e. by an 8-bit byte, an immediate integer
between 0 and 255.

Examples of ternary functions and their indices are given below:

0x01   1: ~((b|a)|c)
0x02   2: (~(b|a))&c
0x03   3: ~(b|a)
0x04   4: (~(c|a))&b
0x05   5: ~(c|a)
0x06   6: (c^b)&~a
0x07   7: ~((c&b)|a)
0x08   8: (~a&c)&b  (~a&b)&c  (c&b)&~a
0x09   9: ~((c^b)|a)
0x0a  10: ~a&c
0x0b  11: ~((~c&b)|a)  (~b|c)&~a
0x0c  12: ~a&b
0x0d  13: ~((~b&c)|a)  (~c|b)&~a
0x0e  14: (c|b)&~a
0x0f  15: ~a
0x10  16: (~(c|b))&a
0x11  17: ~(c|b)
...
0xf4 244: (~c&b)|a
0xf5 245: ~c|a
0xf6 246: (c^b)|a
0xf7 247: (~(c&b))|a
0xf8 248: (c&b)|a
0xf9 249: (~(c^b))|a
0xfa 250: c|a
0xfb 251: (c|a)|~b  (~b|a)|c  (~b|c)|a
0xfc 252: b|a
0xfd 253: (b|a)|~c  (~c|a)|b  (~c|b)|a
0xfe 254: (b|a)|c  (c|a)|b  (c|b)|a

A naive implementation (in many compilers) might be to add define_insn
patterns for all 256 different functions.  The situation is even worse as
many of these Boolean functions don't have a "canonical form" (as produced
by simplify_rtx) and would each need multiple patterns.  See the
space-separated equivalent expressions in the table above.

This need to provide instruction "templates" might explain why GCC, LLVM
and ICC all exhibit similar coverage problems in their ability to
recognize x86 ternlog ternary functions.

Perhaps a unique feature of GCC's design is that in addition to regular
define_insn templates, machine descriptions can also perform pattern
matching via a match_operator (and its corresponding predicate).  This
patch introduces a ternlog_operand predicate that matches a (possibly
infinite) set of expression trees, identifying those that have at most
three unique operands.  This then allows a define_insn_and_split to
recognize suitable expressions and then transform them into the
appropriate UNSPEC_VTERNLOG as a pre-reload splitter.  This design allows
combine to smash together arbitrarily complex Boolean expressions, then
transform them into an UNSPEC before register allocation.

As an "optimization", where possible ix86_expand_ternlog generates a
simpler binary operation, using AND, XOR, IOR or ANDN, and in a few cases
attempts to "canonicalize" the ternlog by reordering or duplicating
operands, so that later CSE passes have a hope of spotting equivalent
values.

This patch leaves the existing ternlog patterns in sse.md (for now), many
of which are made obsolete by these changes.  In theory we now only need
one define_insn for UNSPEC_VTERNLOG.  One complication from these previous
variants was that they inconsistently used decimal vs. hexadecimal to
specify the immediate constant operand in assembly language, making the
list of tweaks to the testsuite with this patch larger than it might have
been.  I propose to remove the vestigial patterns in a follow-up patch,
once this approach has baked (proven to be stable) on mainline.

2024-06-07  Roger Sayle  <roger@nextmovesoftware.com>
            Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog
    * config/i386/i386-expand.cc (ix86_expand_args_builtin): Call
    fixup_modeless_constant before testing predicates.  Only call
    copy_to_mode_reg on memory operands (after the first one).
    (ix86_gen_bcst_mem): Helper function to convert a CONST_VECTOR
    into a VEC_DUPLICATE if possible.
    (ix86_ternlog_idx): Convert an RTX expression into a ternlog
    index between 0 and 255, recording the operands in ARGS, if
    possible, or return -1 if this is not possible/valid.
    (ix86_ternlog_leaf_p): Helper function to identify "leaves"
    of a ternlog expression, e.g. REG_P, MEM_P, CONST_VECTOR, etc.
    (ix86_ternlog_operand_p): Test whether an expression is suitable
    for and preferred as an UNSPEC_TERNLOG.
    (ix86_expand_ternlog_binop): Helper function to construct the
    binary operation corresponding to a sufficiently simple ternlog.
    (ix86_expand_ternlog_andnot): Helper function to construct an
    ANDN operation corresponding to a sufficiently simple ternlog.
    (ix86_expand_ternlog): Expand a 3-operand ternary logic
    expression, constructing either an UNSPEC_TERNLOG or simpler
    rtx expression.  Called from builtin expanders and pre-reload
    splitters.
    * config/i386/i386-protos.h (ix86_ternlog_idx): Prototype here.
    (ix86_ternlog_operand_p): Likewise.
    (ix86_expand_ternlog): Likewise.
    * config/i386/predicates.md (ternlog_operand): New predicate
    that calls ix86_ternlog_operand_p.
    * config/i386/sse.md (<avx512>_vpternlog<mode>_0): New
    define_insn_and_split that recognizes a SET_SRC of ternlog_operand
    and expands it via ix86_expand_ternlog pre-reload.
    (<avx512>_vternlog<mode>_mask): Convert from define_insn to
    define_expand.  Use ix86_expand_ternlog if the mask operand is
    ~0 (or 255 or -1).
    (*<avx512>_vternlog<mode>_mask): define_insn renamed from above.

gcc/testsuite/ChangeLog
    * gcc.target/i386/avx512f-vpternlogd-1.c: Update test case.
    * gcc.target/i386/avx512f-vpternlogq-1.c: Likewise.
    * gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise.
    * gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise.
    * gcc.target/i386/pr100711-4.c: Likewise.
    * gcc.target/i386/pr100711-5.c: Likewise.
    * gcc.target/i386/avx512f-vpternlogd-3.c: New 128-bit test case.
    * gcc.target/i386/avx512f-vpternlogd-4.c: New 256-bit test case.
    * gcc.target/i386/avx512f-vpternlogd-5.c: New 512-bit test case.
    * gcc.target/i386/avx512f-vpternlogq-3.c: New test case.

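To make the 8-bit index encoding concrete, here is a small illustrative
program (not part of the patch) that derives the vpternlog immediate for a
three-input Boolean function by evaluating it on all eight truth-table
rows; operand-numbering conventions in the actual backend may differ.

    #include <cstdio>

    /* Bit i of the immediate is f(a,b,c) where (a<<2)|(b<<1)|c == i.  */
    static unsigned
    ternlog_imm (int (*f) (int, int, int))
    {
      unsigned imm = 0;
      for (unsigned i = 0; i < 8; i++)
        if (f ((i >> 2) & 1, (i >> 1) & 1, i & 1))
          imm |= 1u << i;
      return imm;
    }

    /* The motivating full-adder carry-out from the commit message.  */
    static int
    carry_out (int a, int b, int c)
    {
      return (a & b) | ((a ^ b) & c);
    }

    int
    main ()
    {
      printf ("0x%02x\n", ternlog_imm (carry_out)); /* prints 0xe8 (232) */
      return 0;
    }

The printed value 232 matches the single vpternlogq $232 instruction shown
for the foo example above.
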
2024-06-07  lto: Implement cache partitioning  (Michal Jires; 6 files, -10/+605)
This patch implements new cache partitioning.  It tries to keep symbols
from a single source file together to minimize propagation of divergence.

It starts with symbols already grouped by source files.  If reasonably
possible, it only either combines several files into one final partition,
or, if a file is large, splits the file into several final partitions.

The intermediate representation is partition_set, which contains a set of
groups of symbols (each group corresponding to an original source file)
and the number of final partitions this partition_set should split into.

First, partition_fixed_split splits the partition_set into a constant
number of partition_sets with an equal number of symbol groups.  If for
example there are 39 source files, the resulting partition_sets will
contain 10, 10, 10, and 9 source files.  This splitting intentionally
ignores estimated instruction counts to minimize propagation of
divergence.

Second, partition_over_target_split separates too-large files and splits
them into individual symbols to be combined back into several smaller
files in the next step.

Third, partition_binary_split splits a partition_set into two halves until
it should be split into only one final partition, at which point the
remaining symbols are joined into one final partition (see the sketch
after this entry).

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

    * common.opt: Add cache partitioning.
    * flag-types.h (enum lto_partition_model): Likewise.

gcc/lto/ChangeLog:

    * lto-partition.cc (new_partition): Use new_partition_no_push.
    (new_partition_no_push): New.
    (free_ltrans_partition): New.
    (free_ltrans_partitions): Use free_ltrans_partition.
    (join_partitions): New.
    (split_partition_into_nodes): New.
    (is_partition_reorder): New.
    (class partition_set): New.
    (distribute_n_partitions): New.
    (partition_over_target_split): New.
    (partition_binary_split): New.
    (partition_fixed_split): New.
    (class partitioner_base): New.
    (class partitioner_default): New.
    (lto_cache_map): New.
    * lto-partition.h (lto_cache_map): New.
    * lto.cc (do_whole_program_analysis): Use lto_cache_map.

gcc/testsuite/ChangeLog:

    * gcc.dg/completion-2.c: Add -flto-partition=cache.

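A minimal sketch of the recursive binary-split step described above,
assuming a simplified model where each source file's symbols are reduced
to an id; this is illustrative pseudocode, not the patch's partition_set
machinery.

    #include <cstdio>
    #include <vector>

    // Recursively halve the file groups, dividing the target partition
    // count between the halves, until a subset maps to one partition.
    static void
    binary_split (const std::vector<int> &groups, int n_partitions,
                  std::vector<std::vector<int>> &out)
    {
      if (n_partitions <= 1 || groups.size () <= 1)
        {
          // Remaining groups are joined into one final partition.
          out.push_back (groups);
          return;
        }
      size_t mid = groups.size () / 2;
      std::vector<int> lo (groups.begin (), groups.begin () + mid);
      std::vector<int> hi (groups.begin () + mid, groups.end ());
      binary_split (lo, n_partitions / 2, out);
      binary_split (hi, n_partitions - n_partitions / 2, out);
    }

    int
    main ()
    {
      std::vector<int> groups = {0, 1, 2, 3, 4, 5, 6, 7, 8};
      std::vector<std::vector<int>> parts;
      binary_split (groups, 4, parts);
      for (const auto &p : parts)
        printf ("partition of %zu file groups\n", p.size ());
      return 0;
    }
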
2024-06-07  Fix fold-left reduction vectorization with multiple stmt copies  (Richard Biener; 1 file, -1/+1)
There's a typo when code generating the mask operand for conditional
fold-left reductions in the case we have multiple stmt copies.  The latter
is now allowed for SLP and possibly disabled for non-SLP by accident.
This fixes the observed run-FAIL for
gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c with AVX512 and
256bit sized vectors.

    * tree-vect-loop.cc (vectorize_fold_left_reduction): Fix mask
    vector operand indexing.

2024-06-07  Add finalizer creation to array constructor for functions of derived type.  (Andre Vehreschild; 2 files, -1/+80)
    PR fortran/90068

gcc/fortran/ChangeLog:

    * trans-array.cc (gfc_trans_array_ctor_element): Eval non-
    variable expressions once only.
    (gfc_trans_array_constructor_value): Add statements of
    final block.
    (trans_array_constructor): Detect when final block is
    required.

gcc/testsuite/ChangeLog:

    * gfortran.dg/finalize_57.f90: New test.

2024-06-07  bitint: Fix up lower_addsub_overflow [PR115352]  (Jakub Jelinek; 2 files, -5/+29)
The following testcase is miscompiled because of a flawed optimization.
If one changes the 65 in the testcase to e.g. 66, one gets:

...
  _25 = .USUBC (0, _24, _14);
  _12 = IMAGPART_EXPR <_25>;
  _26 = REALPART_EXPR <_25>;
  if (_23 >= 1)
    goto <bb 8>; [80.00%]
  else
    goto <bb 11>; [20.00%]

<bb 8> :
  if (_23 != 1)
    goto <bb 10>; [80.00%]
  else
    goto <bb 9>; [20.00%]

<bb 9> :
  _27 = (signed long) _26;
  _28 = _27 >> 1;
  _29 = (unsigned long) _28;
  _31 = _29 + 1;
  _30 = _31 > 1;
  goto <bb 11>; [100.00%]

<bb 10> :
  _32 = _26 != _18;
  _33 = _22 | _32;

<bb 11> :
  # _17 = PHI <_30(9), _22(7), _33(10)>
  # _19 = PHI <_29(9), _18(7), _18(10)>
...

so there is one path for limbs below the boundary (in this case there are
actually no limbs there; maybe we could consider optimizing that further,
say with simply folding that _23 >= 1 condition to 1 == 1 and letting cfg
cleanup handle it), another case where it is exactly the limb on the
boundary (that is the bb 9 handling, where it extracts the interesting
bits (the first 3 statements) and then checks if it is zero or all ones),
and finally the case of limbs above that, where it compares the current
result limb against the previously recorded 0 or all ones and ors
differences into the accumulated result.

Now, the optimization which the first hunk removes was based on the idea
that for that case the extraction of the interesting bits from the limb
doesn't need anything special, so the _27/_28/_29 statements above aren't
needed; the whole limb is interesting bits.  So it handled the >= 1 case
like bb 9 above without the first 3 statements, and bb 10 wasn't there at
all.  There are 2 problems with that: for the higher limbs it only checks
if the result limb bits are all zeros or all ones, but doesn't check if
they are the same as the other extension bits, and it forgets the previous
flag whether there was an overflow.

First I wanted to fix it just by adding the _33 = _22 | _30; statement to
the end of bb 9 above, which fixed the originally filed huge testcase and
the first 2 foo calls in the testcase included in the patch; it no longer
forgets about previously checked differences from 0/1.  But as the last 2
foo calls show, it still didn't check whether each even (or each odd,
depending on the exact position) result limb is equal to the first one, so
every second limb it could choose some other 0 vs. all ones value, and as
long as it repeated in another limb above it, it would be ok.

So, the optimization just can't work properly and the following patch
removes it.

2024-06-07  Jakub Jelinek  <jakub@redhat.com>

    PR middle-end/115352
    * gimple-lower-bitint.cc (lower_addsub_overflow): Don't disable
    single_comparison if cmp_code is GE_EXPR.

    * gcc.dg/torture/bitint-71.c: New test.

2024-06-07  go: Fix gccgo -v on Solaris with ld  (Rainer Orth; 1 file, -2/+5)
The Go testsuite's go.sum file ends in

Couldn't determine version of /var/gcc/regression/master/11.4-gcc-64/build/gcc/gccgo

on Solaris.  It turns out this happens because gccgo -v is confused:

[...]
gcc version 15.0.0 20240531 (experimental) [master a0d60660f2aae2d79685f73d568facb2397582d8] (GCC)
COMPILER_PATH=./:/usr/ccs/bin/
LIBRARY_PATH=./:/lib/amd64/:/usr/lib/amd64/:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-g1' '-B' './' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a.'
 ./collect2 -V -M ./libgcc-unwind.map -Qy /usr/lib/amd64/crt1.o ./crtp.o /usr/lib/amd64/crti.o /usr/lib/amd64/values-Xa.o /usr/lib/amd64/values-xpg6.o ./crtbegin.o -L. -L/lib/amd64 -L/usr/lib/amd64 -t -lgcc_s -lgcc -lc -lgcc_s -lgcc ./crtend.o /usr/lib/amd64/crtn.o
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.3297
Undefined                       first referenced
 symbol                             in file
main                            /usr/lib/amd64/crt1.o
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status

trying to invoke the linker without adding any object file.  This only
happens when Solaris ld is in use.  gccgo passes -t to the linker in that
case, but does it unconditionally, even with -v.  When configured to use
GNU ld, gccgo -v is fine instead.

This patch avoids this by restricting the -t to actually linking.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (ld and gld).

2024-06-05  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/go:
    * gospec.cc (lang_specific_driver) [TARGET_SOLARIS !USE_GLD]:
    Only add -t if linking.

2024-06-07  testsuite: go: Require split-stack support for go.test/test/index0.go [PR87589]  (Rainer Orth; 1 file, -1/+2)
The index0-out.go test FAILs on Solaris (SPARC and x86, 32 and 64-bit), as
well as several others:

FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments

The test SEGVs because it tries a stack access way beyond the stack area.
As Ian analyzed in the PR, the testcase currently requires split-stack
support, so this patch requires just that.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

2024-06-05  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
    PR go/87589
    * go.test/go-test.exp (go-gc-tests): Require split-stack support
    for index0.go.

2024-06-07  Fix returned type to be allocatable for user-functions.  (Andre Vehreschild; 3 files, -22/+109)
The returned type of a user-defined function returning a class object was
not detected and handled correctly, which led to memory leaks.

    PR fortran/90072

gcc/fortran/ChangeLog:

    * expr.cc (gfc_is_alloc_class_scalar_function): Detect
    allocatable class return types also for user-defined functions.
    * trans-expr.cc (gfc_conv_procedure_call): Same.
    (trans_class_vptr_len_assignment): Compute vptr len assignment
    correctly for user-defined functions.

gcc/testsuite/ChangeLog:

    * gfortran.dg/class_77.f90: New test.

2024-06-07  enable adjustment of return_pc debug attrs  (Alexandre Oliva; 4 files, -9/+35)
This patch introduces infrastructure for targets to add an offset to the
label issued after the call_insn to set the call_return_pc attribute.
This will be used on rs6000, which sometimes issues another instruction
after the call proper as part of a call insn.

for gcc/ChangeLog

    * target.def (call_offset_return_label): New hook.
    * doc/tm.texi.in (TARGET_CALL_OFFSET_RETURN_LABEL): Add
    placeholder.
    * doc/tm.texi: Rebuild.
    * dwarf2out.cc (struct call_arg_loc_node): Record call_insn
    instead of call_arg_loc_note.
    (add_AT_lbl_id): Add optional offset argument.
    (gen_call_site_die): Compute and pass on a return pc offset.
    (gen_subprogram_die): Move call_arg_loc_note computation...
    (dwarf2out_var_location): ... from here.  Set call_insn.

2024-06-07  Add additional option --param max-completely-peeled-insns=200 for power64*-*-*  (liuhongt; 1 file, -0/+1)
gcc/testsuite/ChangeLog:

    * gcc.dg/vect/pr112325.c: Add additional option
    --param max-completely-peeled-insns=200 for power64*-*-*.

2024-06-07  RISC-V: Add testcases for scalar unsigned SAT_ADD form 5  (Pan Li; 9 files, -0/+183)
After the middle-end supports form 5 of unsigned SAT_ADD and the RISC-V
backend implements the scalar .SAT_ADD, add more test cases to cover form
5 of unsigned .SAT_ADD.

Form 5:
  #define SAT_ADD_U_5(T)                   \
  T sat_add_u_5_##T(T x, T y)              \
  {                                        \
    return (T)(x + y) < x ? -1 : (x + y);  \
  }

Passed the riscv full regression tests.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test macro for form 5.
    * gcc.target/riscv/sat_u_add-21.c: New test.
    * gcc.target/riscv/sat_u_add-22.c: New test.
    * gcc.target/riscv/sat_u_add-23.c: New test.
    * gcc.target/riscv/sat_u_add-24.c: New test.
    * gcc.target/riscv/sat_u_add-run-21.c: New test.
    * gcc.target/riscv/sat_u_add-run-22.c: New test.
    * gcc.target/riscv/sat_u_add-run-23.c: New test.
    * gcc.target/riscv/sat_u_add-run-24.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-06-07  RISC-V: Add testcases for scalar unsigned SAT_ADD form 4  (Pan Li; 9 files, -0/+183)
After the middle-end supports form 4 of unsigned SAT_ADD and the RISC-V
backend implements the scalar .SAT_ADD, add more test cases to cover form
4 of unsigned .SAT_ADD.

Form 4:
  #define SAT_ADD_U_4(T)                                        \
  T sat_add_u_4_##T (T x, T y)                                  \
  {                                                             \
    T ret;                                                      \
    return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Passed the rv64gcv full regression test.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test macro for form 4.
    * gcc.target/riscv/sat_u_add-17.c: New test.
    * gcc.target/riscv/sat_u_add-18.c: New test.
    * gcc.target/riscv/sat_u_add-19.c: New test.
    * gcc.target/riscv/sat_u_add-20.c: New test.
    * gcc.target/riscv/sat_u_add-run-17.c: New test.
    * gcc.target/riscv/sat_u_add-run-18.c: New test.
    * gcc.target/riscv/sat_u_add-run-19.c: New test.
    * gcc.target/riscv/sat_u_add-run-20.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-06-07  RISC-V: Add testcases for scalar unsigned SAT_ADD form 3  (Pan Li; 9 files, -0/+185)
After the middle-end supports form 3 of unsigned SAT_ADD and the RISC-V
backend implements the scalar .SAT_ADD, add more test cases to cover form
3 of unsigned .SAT_ADD.

Form 3:
  #define SAT_ADD_U_3(T)                                   \
  T sat_add_u_3_##T (T x, T y)                             \
  {                                                        \
    T ret;                                                 \
    return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Passed the rv64gcv full regression test.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test macro for form 3.
    * gcc.target/riscv/sat_u_add-13.c: New test.
    * gcc.target/riscv/sat_u_add-14.c: New test.
    * gcc.target/riscv/sat_u_add-15.c: New test.
    * gcc.target/riscv/sat_u_add-16.c: New test.
    * gcc.target/riscv/sat_u_add-run-13.c: New test.
    * gcc.target/riscv/sat_u_add-run-14.c: New test.
    * gcc.target/riscv/sat_u_add-run-15.c: New test.
    * gcc.target/riscv/sat_u_add-run-16.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-06-07  RISC-V: Add testcases for scalar unsigned SAT_ADD form 2  (Pan Li; 9 files, -0/+185)
After the middle-end supports form 2 of unsigned SAT_ADD and the RISC-V
backend implements the scalar .SAT_ADD, add more test cases to cover form
2 of unsigned .SAT_ADD.

Form 2:
  #define SAT_ADD_U_2(T)                              \
  T sat_add_u_2_##T(T x, T y)                         \
  {                                                   \
    T ret;                                            \
    T overflow = __builtin_add_overflow (x, y, &ret); \
    return (T)(-overflow) | ret;                      \
  }

Passed the rv64gcv full regression test.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test macro for form 2.
    * gcc.target/riscv/sat_u_add-10.c: New test.
    * gcc.target/riscv/sat_u_add-11.c: New test.
    * gcc.target/riscv/sat_u_add-12.c: New test.
    * gcc.target/riscv/sat_u_add-9.c: New test.
    * gcc.target/riscv/sat_u_add-run-10.c: New test.
    * gcc.target/riscv/sat_u_add-run-11.c: New test.
    * gcc.target/riscv/sat_u_add-run-12.c: New test.
    * gcc.target/riscv/sat_u_add-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-06-07  RISC-V: Add testcases for scalar unsigned SAT_ADD form 1  (Pan Li; 9 files, -0/+183)
After the middle-end supports form 1 of unsigned SAT_ADD and the RISC-V
backend implements the scalar .SAT_ADD, add more test cases to cover form
1 of unsigned .SAT_ADD.

Form 1:
  #define SAT_ADD_U_1(T)                   \
  T sat_add_u_1_##T(T x, T y)              \
  {                                        \
    return (T)(x + y) >= x ? (x + y) : -1; \
  }

Passed the riscv full regression tests.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add helper macro for form 1.
    * gcc.target/riscv/sat_u_add-5.c: New test.
    * gcc.target/riscv/sat_u_add-6.c: New test.
    * gcc.target/riscv/sat_u_add-7.c: New test.
    * gcc.target/riscv/sat_u_add-8.c: New test.
    * gcc.target/riscv/sat_u_add-run-5.c: New test.
    * gcc.target/riscv/sat_u_add-run-6.c: New test.
    * gcc.target/riscv/sat_u_add-run-7.c: New test.
    * gcc.target/riscv/sat_u_add-run-8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-06-07  Daily bump.  (GCC Administrator; 6 files, -1/+258)
2024-06-07  Match: Support more forms for scalar unsigned SAT_ADD  (Pan Li; 4 files, -5/+236)
After we supported one gassign form of the unsigned .SAT_ADD, we would
like to support more forms, including both the branch and branchless
variants.  There are 5 other forms of .SAT_ADD, listed below:

Form 1:
  #define SAT_ADD_U_1(T)                   \
  T sat_add_u_1_##T(T x, T y)              \
  {                                        \
    return (T)(x + y) >= x ? (x + y) : -1; \
  }

Form 2:
  #define SAT_ADD_U_2(T)                              \
  T sat_add_u_2_##T(T x, T y)                         \
  {                                                   \
    T ret;                                            \
    T overflow = __builtin_add_overflow (x, y, &ret); \
    return (T)(-overflow) | ret;                      \
  }

Form 3:
  #define SAT_ADD_U_3(T)                                   \
  T sat_add_u_3_##T (T x, T y)                             \
  {                                                        \
    T ret;                                                 \
    return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Form 4:
  #define SAT_ADD_U_4(T)                                        \
  T sat_add_u_4_##T (T x, T y)                                  \
  {                                                             \
    T ret;                                                      \
    return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Form 5:
  #define SAT_ADD_U_5(T)                   \
  T sat_add_u_5_##T(T x, T y)              \
  {                                        \
    return (T)(x + y) < x ? -1 : (x + y);  \
  }

Take form 3 above as an example:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

Before this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  long unsigned int _2;
  uint64_t _3;
  __complex__ long unsigned int _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
    goto <bb 4>; [35.00%]
  else
    goto <bb 3>; [65.00%]
;;    succ:       4
;;                3

;;   basic block 3, loop depth 0
;;    pred:       2
  _1 = REALPART_EXPR <_6>;
;;    succ:       4

;;   basic block 4, loop depth 0
;;    pred:       3
;;                2
  # _3 = PHI <_1(3), 18446744073709551615(2)>
  return _3;
;;    succ:       EXIT
}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _12;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _12;
;;    succ:       EXIT
}

The flag '^' acting on a cond_expr will generate matching code similar to
the below:

else if (gphi *_a1 = dyn_cast <gphi *> (_d1))
  {
    basic_block _b1 = gimple_bb (_a1);
    if (gimple_phi_num_args (_a1) == 2)
      {
        basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
        basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
        basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
        basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
        gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
        if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
            && EDGE_COUNT (_other_db_1->succs) == 1
            && EDGE_PRED (_other_db_1, 0)->src == _db_1)
          {
            tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
            tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
            tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
            bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
            tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
            tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
            ....

The below test suites are passed for this patch.
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

    * doc/match-and-simplify.texi: Add doc for the matching flag '^'.
    * genmatch.cc (cmp_operand): Add match_phi comparison.
    (dt_node::gen_kids_1): Add cond_expr bool flag for phi match.
    (dt_operand::gen_phi_on_cond): Add new func to gen phi matching
    on cond_expr.
    (parser::parse_expr): Add handling for the expr flag '^'.
    * match.pd: Add more forms for unsigned .SAT_ADD.
    * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
    new func impl to build call for phi gimple.
    (match_unsigned_saturation_add): Add new func impl to match the
    .SAT_ADD for phi gimple.
    (math_opts_dom_walker::after_dom_children): Add phi matching
    try for all gimple phi stmt.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-06-06  c: Fix up pointer types to may_alias structures [PR114493]  (Jakub Jelinek; 3 files, -0/+60)
The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of

  gcc_assert (TYPE_CANONICAL (t2) != t2
              && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));

assertion doesn't hold.

This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and
TYPE_CANONICAL of a pointer type is a type created with the
can_alias_all = false argument), while later on, on the struct definition,
the may_alias attribute was used.  fld_incomplete_type_of then creates an
incomplete distinct copy of the structure (but with the original
attributes); however, pointers created for it are TYPE_REF_CAN_ALIAS_ALL
because of the "may_alias" attribute, including their TYPE_CANONICAL,
because while that is created with the !can_alias_all argument, we later
set it because of the "may_alias" attribute on the to_type.

This doesn't ICE with C++ since the PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when may_alias is added.  The following patch does that in the
C FE as well.

2024-06-06  Jakub Jelinek  <jakub@redhat.com>

    PR c/114493
    * c-decl.cc (c_fixup_may_alias): New function.
    (finish_struct): Call it if "may_alias" attribute is specified.

    * gcc.dg/pr114493-1.c: New test.
    * gcc.dg/pr114493-2.c: New test.

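An illustrative reduced shape of the situation (hypothetical, not the
committed testcase): a pointer type gets created while the struct is still
incomplete, and may_alias only appears at the definition.

    /* The pointer type for struct S is created while S is incomplete
       and without "may_alias"; the attribute only shows up at the
       definition, so previously created pointer types need fixing up.  */
    struct S;
    typedef struct S *S_ptr;                      /* pointer created now */
    struct __attribute__((may_alias)) S { int i; };  /* attribute added here */

    S_ptr global_p;                               /* uses the early pointer type */
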
2024-06-06  aarch64: Add vector floating point extend pattern [PR113880, PR113869]  (Pengxuan Zheng; 3 files, -1/+31)
This patch adds a vector floating point extend pattern for V2SF->V2DF and
V4HF->V4SF conversions by renaming the existing
aarch64_float_extend_lo_<Vwide> pattern to the standard optab one, i.e.,
extend<mode><Vwide>2.  This allows the vectorizer to vectorize certain
floating point widening operations for the aarch64 target.

    PR target/113880
    PR target/113869

gcc/ChangeLog:

    * config/aarch64/aarch64-builtins.cc (VAR1): Remap
    float_extend_lo_ builtin codes to standard optab ones.
    * config/aarch64/aarch64-simd.md (aarch64_float_extend_lo_<Vwide>):
    Rename to...
    (extend<mode><Vwide>2): ... This.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/extend-vec.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>

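As an illustration (hypothetical, not necessarily the committed test),
this is the kind of widening loop the standard extend optab lets the
vectorizer handle with vector float-extend instructions:

    /* float -> double widening; with the renamed extend<mode><Vwide>2
       pattern the vectorizer can use vector conversions such as
       V2SF->V2DF instead of scalar code.  */
    void
    widen (const float *src, double *dst, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = src[i];
    }
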
2024-06-06  modula2: Simplify REAL/LONGREAL/SHORTREAL node creation.  (Gaius Mulley; 1 file, -23/+7)
This patch simplifies the real type build functions by using the default
float_type_node and double_type_node rather than creating new nodes.  It
also uses the default GCC long_double_type_node or float128_type_node for
longreal.

gcc/m2/ChangeLog:

    * gm2-gcc/m2type.cc (build_m2_short_real_node): Rewrite to use
    the default float_type_node.
    (build_m2_real_node): Rewrite to use the default double_type_node.
    (build_m2_long_real_node): Rewrite to use the default
    long_double_type_node or float128_type_node.

Co-Authored-By: Kewen.Lin <linkw@linux.ibm.com>
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

2024-06-06  testsuite/i386: Add vector sat_sub testcases [PR112600]  (Uros Bizjak; 2 files, -0/+30)
    PR middle-end/112600

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr112600-2a.c: New test.
    * gcc.target/i386/pr112600-2b.c: New test.

2024-06-06  Plugins: Add label-text.h to CPPLIB_H so it will be installed [PR115288]  (Andrew Pinski; 1 file, -0/+1)
After r15-874-g9bda2c4c81b668, out-of-tree plugins won't compile as the
new libcpp header file label-text.h is not installed.  This adds the new
header file to CPPLIB_H, which is used for the plugin headers to install.

Committed as obvious after a build and install, making sure the new header
file is installed.

gcc/ChangeLog:

    PR plugins/115288
    * Makefile.in (CPPLIB_H): Add label-text.h.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-06-06  aarch64: Add missing ACLE macro for NEON-SVE Bridge  (Richard Ball; 1 file, -0/+1)
__ARM_NEON_SVE_BRIDGE was missed in the original patch and is added by
this patch.

gcc/ChangeLog:

    * config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
    Add missing __ARM_NEON_SVE_BRIDGE.

2024-06-06  arm: Fix CASE_VECTOR_SHORTEN_MODE for thumb2.  (Richard Ball; 2 files, -2/+146)
The CASE_VECTOR_SHORTEN_MODE query is missing some equals signs, which
causes suboptimal codegen due to missed optimisation opportunities.  This
patch also adds a test for thumb2 switch statements, as none exist
currently.

gcc/ChangeLog:

    PR target/115353
    * config/arm/arm.h (enum arm_auto_incmodes): Correct
    CASE_VECTOR_SHORTEN_MODE query.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/thumb2-switchstatement.c: New test.

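For context (an illustrative example, not the committed test),
CASE_VECTOR_SHORTEN_MODE decides how narrow the jump-table entries of a
dense switch can be; off-by-one range bounds make the backend pick wider
table entries than necessary:

    /* A dense switch lowered through a jump table; the mode chosen by
       CASE_VECTOR_SHORTEN_MODE determines whether the table entries can
       be byte- or halfword-sized offsets on thumb2.  */
    int
    dispatch (int op)
    {
      switch (op)
        {
        case 0: return 10;
        case 1: return 22;
        case 2: return 35;
        case 3: return 41;
        case 4: return 57;
        case 5: return 63;
        default: return -1;
        }
    }
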
2024-06-06  AArch64: correct constraint on Upl early clobber alternatives  (Tamar Christina; 2 files, -33/+33)
I made an oversight in the previous patch, where I added a ?Upa
alternative to the Upl cases.  This causes it to create the tie with the
larger register file rather than the constrained one.  This fixes the
affected patterns.

gcc/ChangeLog:

    * config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>,
    *cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
    @aarch64_pred_cmp<cmp_op><mode>_wide,
    *aarch64_pred_cmp<cmp_op><mode>_wide_cc,
    *aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Fix Upl tie
    alternative.
    * config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>):
    Fix Upl tie alternative.

2024-06-06  nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred'  (Thomas Schwinge; 4 files, -9/+39)

For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'): the
'0xffffffff' bitmask isn't correct if not all 32 threads of a warp are
active.  The same issue/fix, I suppose but have not verified, would apply
if we were to allow for OpenACC 'vector_length' smaller than 32, for
example for OpenACC 'serial'.

We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits
'bar.warp.sync 0xffffffff', which evidently appears to do the right thing.
(I've tested '-muniform-simt' code executing single-threaded.)

The change that I proposed on 2022-12-15 was to emit PTX code to calculate
'(1 << %ntid.x) - 1' as the actual bitmask to use instead of '0xffffffff'.
This works, but the PTX JIT generates SASS code to do this computation.
In turn, this change now uses PTX 'vote.all.pred' -- which even simplifies
upon the original code a little bit; see the following exemplary SASS
'diff' before vs. after this change:

    [...]
     /*[...]*/                   SYNC                    (*"BRANCH_TARGETS .L_x_332"*)
     }
     .L_x_332:
    -/*[...]*/                   VOTE.ANY R9, PT, PT ;
    +/*[...]*/                   VOTE.ALL P1, PT ;
    -/*[...]*/                   ISETP.NE.U32.AND P1, PT, R9, -0x1, PT ;
    -/*[...]*/              @!P1 BRA `(.L_x_333) ;
    +/*[...]*/               @P1 BRA `(.L_x_333) ;
     /*[...]*/                   BPT.TRAP 0x1 ;
     .L_x_333:
    -/*[...]*/               @P1 EXIT ;
    +/*[...]*/              @!P1 EXIT ;
    [...]

gcc/
    * config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
    non-full-warp execution, via 'vote.all.pred'.

gcc/testsuite/
    * gcc.target/nvptx/nvptx.exp
    (check_effective_target_default_ptx_isa_version_at_least_6_0): New.
    * gcc.target/nvptx/uniform-simt-2.c: Adjust.
    * gcc.target/nvptx/uniform-simt-5.c: New.

2024-06-06  Vect: Support IFN SAT_SUB for unsigned vector int  (Pan Li; 2 files, -15/+84)
This patch would like to support the .SAT_SUB for the unsigned vector int.
Given we have the below example code:

void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
}

Before this patch:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
  ivtmp_56 = _77 * 8;
  vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
  vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);
  mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
  _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });
  .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
  vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
  vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
  vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
  ivtmp_76 = ivtmp_75 - _77;
  ...
}

After this patch:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
  ivtmp_60 = _76 * 8;
  vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
  vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);
  vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);
  .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, vect_patt_37.11_68);
  vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
  vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
  vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
  ivtmp_75 = ivtmp_74 - _76;
  ...
}

The below test suites are passed for this patch.
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression tests.

gcc/ChangeLog:

    * match.pd: Add new form for vector mode recog.
    * tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
    new match func decl.
    (vect_recog_build_binary_gimple_call): Extract helper func to
    build gcall with given internal_fn.
    (vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-06-06  lto: Remove random_seed from section name.  (Michal Jires; 2 files, -2/+16)
This patch removes suffixes from section names during LTO linking.  These
suffixes were originally added for ld -r to work (PR lto/44992).  They
were added to all LTO object files, but are only useful before WPA.  After
that they waste space, and if kept random, make LTO caching impossible.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

    * lto-streamer.cc (lto_get_section_name): Remove suffixes after
    WPA.

gcc/lto/ChangeLog:

    * lto-common.cc (lto_section_with_id): Don't load suffix during
    LTRANS.

2024-06-06  lto: Skip flag OPT_fltrans_output_list_.  (Michal Jires; 1 file, -0/+1)
Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

    * lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.

2024-06-06  RISC-V: Regenerate opt urls.  (Robin Dapp; 1 file, -0/+6)
I wasn't aware that I needed to regenerate the opt urls when adding an
option.  This patch does that.

gcc/ChangeLog:

    * config/riscv/riscv.opt.urls: Regenerate.

2024-06-06  [APX CCMP] Support ccmp for float compare  (Hongyu Wang; 3 files, -7/+138)
The ccmp insn itself doesn't support fp compare, but x86 has the fp comi
insn that changes EFLAGS, which can be the scc input to ccmp.  Allow
scalar fp compare in ix86_gen_ccmp_first, except for ORDERED/UNORDERED
compares which cannot be identified in ccmp.

gcc/ChangeLog:

    * config/i386/i386-expand.cc (ix86_gen_ccmp_first): Add fp
    compare and check the allowed fp compare type.
    (ix86_gen_ccmp_next): Adjust compare_code input to ccmp for
    fp compare.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/apx-ccmp-1.c: Add test for fp compare.
    * gcc.target/i386/apx-ccmp-2.c: Likewise.

2024-06-06  [APX CCMP] Adjust strategy for selecting ccmp candidates  (Hongyu Wang; 1 file, -1/+9)
For the general ccmp scenario, the tree sequence is like:

  _1 = (a < b)
  _2 = (c < d)
  _3 = _1 & _2

Current ccmp expansion will try to swap the compare order for _1 and _2,
compare the expansion cost/cost2 for expanding _1 or _2 first, then return
the sequence with lower cost.

It is possible that one expansion succeeds and the other fails.  For
example, x86 has int ccmp but not fp ccmp, so a combined fp and int
comparison must be ordered such that the fp comparison happens first.  The
costs are not meaningful for failed expansions.

Check the expand_ccmp_next results ret and ret2, and return the valid one
before cost comparison.

gcc/ChangeLog:

    * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
    expand_ccmp_next, returns the valid one first instead of
    comparing cost.

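An illustrative source shape (hypothetical) where expansion order matters:
since x86 has int ccmp but no fp ccmp, the fp comparison must be expanded
first so that only the int comparison becomes the conditional compare.

    /* Combined fp and int comparison: the fp compare (e.g. comisd)
       must come first and the int compare becomes the ccmp; trying
       the opposite order fails to expand rather than merely costing
       more, which is why cost alone cannot pick the order.  */
    int
    both_less (double a, double b, int c, int d)
    {
      return a < b && c < d;
    }
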
2024-06-06  [APX CCMP] Support APX CCMP  (Hongyu Wang; 9 files, -4/+337)
The APX CCMP feature implements conditional compare, which executes a
compare when EFLAGS matches a certain condition.  CCMP introduces a
default flags value (dfv); when the conditional compare does not execute,
it directly sets the flags according to dfv.  The instruction goes like:

  ccmpeq {dfv=sf,of,cf,zf}  %rax, %r16

For this instruction, it tests EFLAGS against the conditional code EQ; if
it matches, %rax and %r16 are compared like a legacy cmp.  If not, EFLAGS
is updated according to dfv, which here means SF, OF, CF and ZF are set.
PF is set according to CF in dfv, and AF is always cleared.  The dfv part
can be a combination of sf,of,cf,zf, like {dfv=cf,zf} which sets CF and ZF
only and clears the others, or {dfv=} which clears all of EFLAGS.

To enable CCMP, we implemented the target hooks TARGET_GEN_CCMP_FIRST and
TARGET_GEN_CCMP_NEXT to reuse the current ccmp infrastructure.  Also we
extended the cstorem4 optab to support storing different CCmodes to fit
the current ccmp infrastructure.

gcc/ChangeLog:

    * config/i386/i386-expand.cc (ix86_gen_ccmp_first): New function
    that tests if the first compare can be generated.
    (ix86_gen_ccmp_next): New function to emit a single compare and
    ccmp sequence.
    * config/i386/i386-opts.h (enum apx_features): Add apx_ccmp.
    * config/i386/i386-protos.h (ix86_gen_ccmp_first): New proto
    declare.
    (ix86_gen_ccmp_next): Likewise.
    (ix86_get_flags_cc): Likewise.
    * config/i386/i386.cc (ix86_flags_cc): New enum.
    (ix86_ccmp_dfv_mapping): New string array to map conditional
    code to dfv.
    (ix86_print_operand): Handle special dfv flag for CCMP.
    (ix86_get_flags_cc): New function to return x86 CC enum.
    (TARGET_GEN_CCMP_FIRST): Define.
    (TARGET_GEN_CCMP_NEXT): Likewise.
    * config/i386/i386.h (TARGET_APX_CCMP): Define.
    * config/i386/i386.md (@ccmp<mode>): New define_insn to support
    ccmp.
    (UNSPEC_APX_DFV): New unspec for ccmp dfv.
    (ALL_CC): New mode iterator.
    (cstorecc4): Change to ...
    (cstore<mode>4): ... this, use ALL_CC to loop through all
    available CCmodes.
    * config/i386/i386.opt (apx_ccmp): Add enum value for ccmp.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/apx-ccmp-1.c: New compile test.
    * gcc.target/i386/apx-ccmp-2.c: New runtime test.

2024-06-06  [APX] Adjust target-support check [PR 115341]  (Hongyu Wang; 1 file, -1/+7)
The current target apxf check does not specify the sub-features that the
assembler supports, so the check with older binutils will fail at the
assemble stage for new apx features like NF, CCMP or CFCMOV.  Adjust the
assembler check for all apx sub-features.

gcc/testsuite/ChangeLog:

    PR target/115341
    * lib/target-supports.exp (check_effective_target_apxf): Check
    for all apx sub-features.

2024-06-06  Allow single-lane SLP in-order reductions  (Richard Biener; 1 file, -29/+19)
The single-lane case isn't different from non-SLP; no re-association is
implied.  But the transform stage cannot handle a conditional reduction
op, which isn't checked during analysis.  This makes it work, exercised
with a single-lane non-reduction-chain by gcc.target/i386/pr112464.c.

    * tree-vect-loop.cc (vectorizable_reduction): Allow single-lane
    SLP in-order reductions.
    (vectorize_fold_left_reduction): Handle SLP reduction with
    conditional reduction op.

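For context (illustrative, not the committed test), an in-order
(fold-left) reduction is one whose floating-point association order must
be preserved, such as:

    /* Without -ffast-math the additions must happen in source order,
       so the vectorizer emits a fold-left reduction rather than
       reassociating into parallel partial sums.  */
    double
    sum (const double *a, int n)
    {
      double s = 0.0;
      for (int i = 0; i < n; i++)
        s += a[i];
      return s;
    }
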
2024-06-06  Add double reduction support for SLP vectorization  (Richard Biener; 3 files, -11/+31)
The following makes double reduction vectorization work when using
(single-lane) SLP vectorization.

    * tree-vect-loop.cc (vect_analyze_scalar_cycles_1): Queue
    double reductions in LOOP_VINFO_REDUCTIONS.
    (vect_create_epilog_for_reduction): Remove asserts disabling
    SLP for double reductions.
    (vectorizable_reduction): Analyze SLP double reductions only
    once and start off the correct places.
    * tree-vect-slp.cc (vect_get_and_check_slp_defs): Allow
    vect_double_reduction_def.
    (vect_build_slp_tree_2): Fix condition for the ignored
    reduction initial values.
    * tree-vect-stmts.cc (vect_analyze_stmt): Allow
    vect_double_reduction_def.