path: root/gcc
Age | Commit message | Author | Files | Lines
2023-11-17 | rs6000: Fix regression cases caused by 16-byte by pieces move | Haochen Gui | 2 | -0/+39
The previous patch enabled 16-byte by pieces move. Originally, a 16-byte move was implemented via a pattern, and expand_block_move performed an optimization on P8 LE that leverages V2DI reversed load/store for memory-to-memory moves. Now the 16-byte move is implemented via by pieces move and is finally split into two DI loads/stores. This patch creates an insn_and_split pattern to regain the optimization.

gcc/
	PR target/111449
	* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
	PR target/111449
	* gcc.target/powerpc/pr111449-2.c: New.
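The following minimal C sketch shows the kind of source that goes through the 16-byte by pieces move path: a constant-size 16-byte memory-to-memory copy. The function name `copy16` is hypothetical. Whether a particular -mcpu/endianness combination ends up with one V2DI load/store pair or two DI load/store pairs is an assumption you should check with `-O2 -S`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example: a fixed-size 16-byte copy.  GCC normally expands
   a constant-size memcpy like this inline (by pieces move) instead of
   calling the library.  On P8 LE, the new insn_and_split pattern is meant
   to turn the resulting memory-to-memory TImode move back into a single
   vector load/store pair.  */
void copy16(void *dst, const void *src)
{
  memcpy(dst, src, 16);
}
```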
2023-11-17 | rs6000: Enable vector mode for by pieces equality compare | Haochen Gui | 6 | -0/+77
This patch adds a new expand pattern, cbranchv16qi4, to enable vector-mode by pieces equality compare on rs6000. The macro MOVE_MAX_PIECES (COMPARE_MAX_PIECES) is set to 16 bytes when EFFICIENT_UNALIGNED_VSX is enabled and is otherwise unchanged. STORE_MAX_PIECES defaults to the value of MOVE_MAX_PIECES, so it is now defined explicitly so that its value stays unchanged.

gcc/
	PR target/111449
	* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
	* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn sequence for V16QImode equality compare.
	* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
	(STORE_MAX_PIECES): Define.

gcc/testsuite/
	PR target/111449
	* gcc.target/powerpc/pr111449-1.c: New.
	* gcc.dg/tree-ssa/sra-17.c: Add additional options for 32-bit powerpc.
	* gcc.dg/tree-ssa/sra-18.c: Likewise.
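The sketch below shows the kind of C code that uses by pieces equality compare: a 16-byte `memcmp` whose result is only tested against zero. The function name is hypothetical. The claim that rs6000 now performs this as one V16QImode compare via cbranchv16qi4 depends on EFFICIENT_UNALIGNED_VSX being enabled for the selected CPU, so treat it as an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example: an equality-only 16-byte comparison.  Because
   only "== 0" is used, GCC may expand the memcmp inline as a by pieces
   equality compare instead of calling the library.  With 16-byte
   MOVE_MAX_PIECES, that can be a single vector compare-and-branch.  */
int equal16(const void *a, const void *b)
{
  return memcmp(a, b, 16) == 0;
}
```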
2023-11-17 | LoongArch: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO | Li Wei | 2 | -0/+10
LoongArch already defines ctz and clz in the backend. However, for GCC to perform the CTZ transformation optimization in the forwprop2 pass, it needs to know the value of c[lt]z at zero. This may benefit some test cases (such as SPEC2017 deepsjeng_r). After implementing the macro, we measured the dynamic instruction count on deepsjeng_r:
- before: 1688423249186
- after:  1660311215745 (1.66% reduction)

gcc/ChangeLog:
	* config/loongarch/loongarch.h (CLZ_DEFINED_VALUE_AT_ZERO): Implement.
	(CTZ_DEFINED_VALUE_AT_ZERO): Same.

gcc/testsuite/ChangeLog:
	* gcc.dg/pr90838.c: Add clz/ctz test support on LoongArch.
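For context, this is the well-known de Bruijn table-lookup idiom for count-trailing-zeros. It is the kind of code that forwprop can replace with a single ctz instruction; the exact idioms GCC matches are not spelled out in this commit. The table yields 0 for an input of 0, so the replacement is only safe if the compiler knows what the hardware ctz returns at zero, and that is the information C[LT]Z_DEFINED_VALUE_AT_ZERO supplies. The names below are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Classic de Bruijn table for 32-bit count-trailing-zeros, using the
   multiplier 0x077CB531.  (x & -x) isolates the lowest set bit, and the
   top 5 bits of the product index the table.  For x == 0, the index is 0
   and the result is table[0] == 0.  */
static const int debruijn_ctz32[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

int ctz_table(uint32_t x)
{
  return debruijn_ctz32[(uint32_t)((x & -x) * 0x077CB531u) >> 27];
}
```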
2023-11-17 | Assert we don't create recursive DW_AT_{abstract_origin,specification} | Richard Biener | 1 | -0/+3
We have a support case showing that GCC 7 sometimes creates a DW_TAG_label that refers to itself via DW_AT_abstract_origin when using LTO. This, for example, triggers the sanity check added below during LTO bootstrap.

Making this check cover more than just DW_AT_abstract_origin breaks bootstrap on trunk for

      /* GNU extension: Record what type our vtable lives in.  */
      if (TYPE_VFIELD (type))
        {
          tree vtype = DECL_FCONTEXT (TYPE_VFIELD (type));

          gen_type_die (vtype, context_die);
          add_AT_die_ref (type_die, DW_AT_containing_type,
                          lookup_type_die (vtype));

so for now the check is restricted to DW_AT_abstract_origin and DW_AT_specification, both of which we follow within get_AT.

	* dwarf2out.cc (add_AT_die_ref): Assert we do not add a self-ref DW_AT_abstract_origin or DW_AT_specification.
2023-11-17 | LoongArch: Increase cost of vector aligned store/load. | Jiahao Xu | 1 | -2/+2
Based on SPEC2017 performance evaluation results, it is better to make them equal to the cost of unaligned store/load so as to avoid odd alignment peeling.

gcc/ChangeLog:
	* config/loongarch/loongarch.cc (loongarch_builtin_vectorization_cost): Adjust.
2023-11-17 | LoongArch: Fix scan-assembler-times of lasx/lsx test case. | Jiahao Xu | 4 | -48/+48
These tests failed when they were first added; this patch adjusts the scan-assembler-times counts to fix them.

gcc/testsuite/ChangeLog:
	* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler times.
	* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
	* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: Ditto.
	* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Ditto.
2023-11-17 | Daily bump. | GCC Administrator | 5 | -1/+291
2023-11-17 | Only allow (copysign x, NEG_CONST) -> (fneg (fabs x)) simplification for constant folding [PR112483] | Andrew Pinski | 1 | -1/+1
On targets with native copysign instructions, (copysign x, -1) is usually more efficient than (fneg (fabs x)). Since r14-5284, the middle end always optimizes (fneg (fabs x)) to (copysign x, -1), not vice versa. If the target does not support native fcopysign, expand_COPYSIGN will expand it as (fneg (fabs x)) anyway.

gcc/ChangeLog:
	PR rtl-optimization/112483
	* simplify-rtx.cc (simplify_binary_operation_1) <case COPYSIGN>: Call simplify_unary_operation for NEG instead of simplify_gen_unary.
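The two forms are interchangeable because, for a negative constant, they produce bit-identical results: both clear the sign bit of x and then set it. The commit only changes *when* GCC rewrites one into the other. The following bit-level C model makes that concrete. It assumes IEEE-754 binary64 doubles, and the helper names are hypothetical; this is not GCC's RTL code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIGN_BIT 0x8000000000000000ull

static uint64_t to_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double from_bits(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* copysign (x, y): magnitude bits of x, sign bit of y.  */
static double copysign_via_bits(double x, double y)
{
  return from_bits((to_bits(x) & ~SIGN_BIT) | (to_bits(y) & SIGN_BIT));
}

/* fneg (fabs x): clear the sign bit, then flip it (so it ends up set).  */
static double neg_fabs_via_bits(double x)
{
  return from_bits((to_bits(x) & ~SIGN_BIT) ^ SIGN_BIT);
}
```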
2023-11-16 | Fix warning on new Ada testcase | Eric Botcazou | 1 | -1/+2
gcc/testsuite/
	* gnat.dg/varsize4.adb (Func): Initialize Byte_Read parameter.
2023-11-16 | RISC-V: Change unaligned fast/slow/avoid macros to misaligned [PR111557] | Edwin Lu | 10 | -45/+45
Rename the __riscv_unaligned_fast/slow/avoid macros to __riscv_misaligned_fast/slow/avoid to be consistent with the RISC-V API spec.

	PR target/111557

gcc/ChangeLog:
	* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Update macro name.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/attribute-1.c: Update macro name.
	* gcc.target/riscv/attribute-4.c: Ditto.
	* gcc.target/riscv/attribute-5.c: Ditto.
	* gcc.target/riscv/predef-align-1.c: Ditto.
	* gcc.target/riscv/predef-align-2.c: Ditto.
	* gcc.target/riscv/predef-align-3.c: Ditto.
	* gcc.target/riscv/predef-align-4.c: Ditto.
	* gcc.target/riscv/predef-align-5.c: Ditto.
	* gcc.target/riscv/predef-align-6.c: Ditto.

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
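Here is a sketch of how user code would test the renamed predefined macros. Only the three `__riscv_misaligned_*` names come from this commit. The function name is hypothetical. On a non-RISC-V host, none of the macros is defined, so the sketch falls back to "unknown".

```c
#include <assert.h>
#include <string.h>

/* Report which misaligned-access hint the compiler predefines.  Code
   written against the old __riscv_unaligned_* spellings would always
   take the #else branch after this rename.  */
const char *misaligned_policy(void)
{
#if defined(__riscv_misaligned_fast)
  return "fast";
#elif defined(__riscv_misaligned_slow)
  return "slow";
#elif defined(__riscv_misaligned_avoid)
  return "avoid";
#else
  return "unknown";
#endif
}
```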
2023-11-16 | i386: Optimize QImode insn with high input registers | Uros Bizjak | 4 | -0/+239
Sometimes the compiler emits the following code with <insn>qi_ext<mode>_0:

    shrl    $8, %eax
    addb    %bh, %al

The patch introduces new low-part QImode insn patterns with both of their input arguments extracted from high registers. This invalid insn is split after reload into a move from the high register and a <insn>qi_ext<mode>_0 instruction. The combine pass is able to convert the shift into a zero/sign-extract sub-RTX, which we split into the optimal:

    movzbl  %bh, %edx
    addb    %ah, %dl

	PR target/78904

gcc/ChangeLog:
	* config/i386/i386.md (*addqi_ext2<mode>_0): New define_insn_and_split pattern.
	(*subqi_ext2<mode>_0): Ditto.
	(*<code>qi_ext2<mode>_0): Ditto.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/pr78904-10.c: New test.
	* gcc.target/i386/pr78904-10a.c: New test.
	* gcc.target/i386/pr78904-10b.c: New test.
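The following C function has the general shape these patterns target: an 8-bit operation whose two inputs are bits 8..15 of wider values, which x86 can hold in %ah/%bh/%ch/%dh. It is a hypothetical illustration, not the contents of pr78904-10.c, and the exact instruction sequence GCC emits for it is an assumption to verify with `-O2 -S`.

```c
#include <assert.h>

/* Add the second-lowest bytes (bits 8..15) of two values, with the result
   truncated to 8 bits.  Each input byte is a natural candidate for a
   high-register extract (%ah, %bh, ...).  */
unsigned char add_high_bytes(unsigned int a, unsigned int b)
{
  return (unsigned char)((a >> 8) + (b >> 8));
}
```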
2023-11-16 | hppa: Revise REG+D address support to allow long displacements before reload | John David Anglin | 4 | -28/+38
In analyzing PR rtl-optimization/112415, I realized that restricting REG+D offsets to 5 bits before reload results in very poor code and makes it harder to optimize these instructions after reload. The general problem is that long displacements are not allowed for floating-point accesses when generating PA 1.1 code. Even with PA 2.0, there is an ELF linker bug that prevents using long displacements for floating-point loads and stores.

In the past, enabling long displacements before reload caused issues in reload. However, there have since been fixes in the handling of reloads for floating-point accesses. This change allows long displacements before reload and corrects a couple of issues in the constraint handling for integer and floating-point accesses.

2023-11-16  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

	PR rtl-optimization/112415
	* config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit displacements before reload. Simplify logic flow. Revise comments.
	* config/pa/pa.h (TARGET_ELF64): New define.
	(INT14_OK_STRICT): Update define and comment.
	* config/pa/pa64-linux.h (TARGET_ELF64): Define.
	* config/pa/predicates.md (base14_operand): Don't check alignment of short displacements.
	(integer_store_memory_operand): Don't return true when reload_in_progress is true. Remove INT_5_BITS check.
	(floating_point_store_memory_operand): Don't return true when reload_in_progress is true. Use INT14_OK_STRICT to check whether long displacements are always okay.
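To make "5-bit" versus "14-bit" displacements concrete, here is a hypothetical range check. It is not GCC's INT_5_BITS/INT_14_BITS macros. It assumes signed displacement fields, which gives [-16, 15] for 5 bits and [-8192, 8191] for 14 bits. Before this change, an offset such as 100 had to be materialized separately before reload; afterwards it fits the 14-bit form.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: does displacement D fit in a signed BITS-bit
   field?  Assumes 1 <= bits <= 62 so the shifts stay in range.  */
static bool fits_signed(long d, int bits)
{
  long lo = -(1L << (bits - 1));
  long hi = (1L << (bits - 1)) - 1;
  return d >= lo && d <= hi;
}
```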
2023-11-16 | Fix internal error on function returning dynamically-sized type | Eric Botcazou | 4 | -0/+39
This is a tree sharing issue for the internal return type synthesized for a function returning a dynamically-sized type and taking an Out or In/Out parameter passed by copy.

gcc/ada/
	* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Also create a TYPE_DECL for the return type built for the CI/CO mechanism.

gcc/testsuite/
	* gnat.dg/varsize4.ads, gnat.dg/varsize4.adb: New test.
	* gnat.dg/varsize4_pkg.ads: New helper.
2023-11-16 | c++: Fix error recovery ICE [PR112365] | Jakub Jelinek | 2 | -1/+10
For DECL_C_BIT_FIELD FIELD_DECLs with an error_mark_node TREE_TYPE, check_field_decls continues early and doesn't call check_bitfield_decl, which would either set DECL_BIT_FIELD or clear DECL_C_BIT_FIELD. As a result, the following testcase ICEs after emitting tons of errors, because SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD asserts DECL_BIT_FIELD. The patch skips that for FIELD_DECLs with error_mark_node; another option would be to check DECL_BIT_FIELD in addition to DECL_C_BIT_FIELD.

2023-11-16  Jakub Jelinek  <jakub@redhat.com>

	PR c++/112365
	* class.cc (layout_class_type): Don't SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD on FIELD_DECLs with error_mark_node type.

	* g++.dg/cpp0x/pr112365.C: New test.
2023-11-16 | i386: Fix invalid RTX in split2 pass [PR112567] | Uros Bizjak | 1 | -20/+20
Also fix some indentation inconsistencies.

	PR target/112567

gcc/ChangeLog:
	* config/i386/i386.md (*<any_logic:code>qi_ext<mode>_1_slp): Fix generation of invalid RTX in split pattern.
2023-11-16 | c++: add fixed testcases [PR98614, PR104802] | Patrick Palka | 2 | -0/+27
Both of these PRs are fixed by r12-1403-gc4e50e500da7692a.

	PR c++/98614
	PR c++/104802

gcc/testsuite/ChangeLog:
	* g++.dg/cpp1z/nontype-auto22.C: New test.
	* g++.dg/cpp2a/concepts-partial-spec14.C: New test.
2023-11-16 | c++: constantness of call to function pointer [PR111703] | Patrick Palka | 3 | -1/+17
potential_constant_expression for CALL_EXPR tests FUNCTION_POINTER_TYPE_P on the callee rather than on the type of the callee. This means we always pass want_rval=any when recursing, and so may fail to identify a non-constant function pointer callee as such. Fixing this turns out to further work around PR111703.

	PR c++/111703
	PR c++/107939

gcc/cp/ChangeLog:
	* constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>: Fix FUNCTION_POINTER_TYPE_P test.

gcc/testsuite/ChangeLog:
	* g++.dg/cpp2a/concepts-fn8.C: Extend test.
	* g++.dg/diagnostic/constexpr4.C: New test.
2023-11-16 | diagnostics: make m_lang_mask private | David Malcolm | 4 | -10/+19
No functional change intended.

gcc/ChangeLog:
	* diagnostic.cc (diagnostic_context::set_option_hooks): Add "lang_mask" param.
	* diagnostic.h (diagnostic_context::option_enabled_p): Update for move of m_lang_mask.
	(diagnostic_context::set_option_hooks): Add "lang_mask" param.
	(diagnostic_context::get_lang_mask): New.
	(diagnostic_context::m_lang_mask): Move into m_option_callbacks, thus making private.
	* lto-wrapper.cc (main): Update for new lang_mask param of set_option_hooks.
	* toplev.cc (init_asm_output): Use get_lang_mask.
	(general_init): Move initialization of global_dc's lang_mask to new lang_mask param of set_option_hooks.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-11-16 | middle-end: skip checking loop exits if loop malformed [PR111878] | Tamar Christina | 2 | -0/+23
Before my refactoring, if loop->latch was incorrect, find_loop_location skipped checking the edges and would eventually return a dummy location.

It turns out that a loop can satisfy loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS) but still have no latch, in which case get_loop_exit_edges traps.

This restores the old behavior.

gcc/ChangeLog:
	PR tree-optimization/111878
	* tree-vect-loop-manip.cc (find_loop_location): Skip edges check if latch incorrect.

gcc/testsuite/ChangeLog:
	PR tree-optimization/111878
	* gcc.dg/graphite/pr111878.c: New test.
2023-11-16 | gcc.c-torture/execute/931004-13.c: Fix declaration of main | Florian Weimer | 1 | -2/+2
gcc/testsuite/
	* gcc.c-torture/execute/931004-13.c (main): Fix mistakenly swapped int/void types.
2023-11-16 | RISC-V: Implement target attribute | Kito Cheng | 25 | -45/+950
The target attribute, proposed in [1], allows the user to specify a local setting on a per-function basis.

The syntax of the target attribute is `__attribute__((target("<ATTR-STRING>")))`, and the syntax of `<ATTR-STRING>` is described below:
```
ATTR-STRING := ATTR-STRING ';' ATTR
             | ATTR

ATTR := ARCH-ATTR
      | CPU-ATTR
      | TUNE-ATTR

ARCH-ATTR := 'arch=' EXTENSIONS-OR-FULLARCH

EXTENSIONS-OR-FULLARCH := <EXTENSIONS>
                        | <FULLARCHSTR>

EXTENSIONS := <EXTENSION> ',' <EXTENSIONS>
            | <EXTENSION>

FULLARCHSTR := <full-arch-string>

EXTENSION := <OP> <EXTENSION-NAME> <VERSION>

OP := '+'

VERSION := [0-9]+ 'p' [0-9]+
         | [1-9][0-9]*
         |

EXTENSION-NAME := Naming rule is defined in RISC-V ISA manual

CPU-ATTR := 'cpu=' <valid-cpu-name>
TUNE-ATTR := 'tune=' <valid-tune-name>
```

Changes since v1:
- Use std::unique_ptr rather than alloca to prevent memory issues.
- Error rather than warning when an attribute is duplicated.

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35

gcc/ChangeLog:

	* config.gcc (riscv): Add riscv-target-attr.o.
	* config/riscv/riscv-protos.h (riscv_declare_function_size): New.
	(riscv_option_valid_attribute_p): New.
	(riscv_override_options_internal): New.
	(struct riscv_tune_info): New.
	(riscv_parse_tune): New.
	* config/riscv/riscv-target-attr.cc (class riscv_target_attr_parser): New.
	(struct riscv_attribute_info): New.
	(riscv_attributes): New.
	(riscv_target_attr_parser::parse_arch): New.
	(riscv_target_attr_parser::handle_arch): New.
	(riscv_target_attr_parser::handle_cpu): New.
	(riscv_target_attr_parser::handle_tune): New.
	(riscv_target_attr_parser::update_settings): New.
	(riscv_process_one_target_attr): New.
	(num_occurences_in_str): New.
	(riscv_process_target_attr): New.
	(riscv_option_valid_attribute_p): New.
	* config/riscv/riscv.cc: Include target-globals.h and riscv-subset.h.
	(struct riscv_tune_info): Move to riscv-protos.h.
	(get_tune_str): New.
	(riscv_parse_tune): New parameter null_p.
	(riscv_declare_function_size): New.
	(riscv_option_override): Build target_option_default_node and target_option_current_node.
	(riscv_save_restore_target_globals): New.
	(riscv_option_restore): New.
	(riscv_previous_fndecl): New.
	(riscv_set_current_function): Apply the target attribute.
	(TARGET_OPTION_RESTORE): Define.
	(TARGET_OPTION_VALID_ATTRIBUTE_P): Ditto.
	* config/riscv/riscv.h (SWITCHABLE_TARGET): Define to 1.
	(ASM_DECLARE_FUNCTION_SIZE): Define.
	* config/riscv/riscv.opt (mtune=): Add Save attribute.
	(mcpu=): Ditto.
	(mcmodel=): Ditto.
	* config/riscv/t-riscv: Add build rule for riscv-target-attr.o.
	* doc/extend.texi: Add doc for target attribute.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/target-attr-01.c: New.
	* gcc.target/riscv/target-attr-02.c: Ditto.
	* gcc.target/riscv/target-attr-03.c: Ditto.
	* gcc.target/riscv/target-attr-04.c: Ditto.
	* gcc.target/riscv/target-attr-05.c: Ditto.
	* gcc.target/riscv/target-attr-06.c: Ditto.
	* gcc.target/riscv/target-attr-07.c: Ditto.
	* gcc.target/riscv/target-attr-bad-01.c: Ditto.
	* gcc.target/riscv/target-attr-bad-02.c: Ditto.
	* gcc.target/riscv/target-attr-bad-03.c: Ditto.
	* gcc.target/riscv/target-attr-bad-04.c: Ditto.
	* gcc.target/riscv/target-attr-bad-05.c: Ditto.
	* gcc.target/riscv/target-attr-bad-06.c: Ditto.
	* gcc.target/riscv/target-attr-bad-07.c: Ditto.
	* gcc.target/riscv/target-attr-bad-08.c: Ditto.
	* gcc.target/riscv/target-attr-bad-09.c: Ditto.
	* gcc.target/riscv/target-attr-bad-10.c: Ditto.

Reviewed-by: Christoph Müllner <christoph.muellner@vrull.eu>
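The top level of the ATTR-STRING grammar above can be sketched in a few lines of C. This is a hypothetical simplification, not GCC's riscv_target_attr_parser. It splits on ';' and classifies each ATTR by its `arch=`, `cpu=` or `tune=` prefix. It rejects unknown prefixes, empty values and duplicated kinds (matching the v2 change to error on duplicates). For `arch=`, it only checks that the value starts with '+' (an EXTENSIONS list) or "rv" (a full arch string); extension names, versions, CPU names and tune names are not validated.

```c
#include <assert.h>
#include <string.h>

/* Which ATTR kinds appeared in an ATTR-STRING.  */
struct target_attrs {
  int has_arch, has_cpu, has_tune;
};

/* Returns 0 on success, -1 on a malformed or duplicated ATTR.  */
static int parse_target_attr(const char *s, struct target_attrs *out)
{
  memset(out, 0, sizeof *out);
  while (*s) {
    const char *end = strchr(s, ';');
    size_t len = end ? (size_t)(end - s) : strlen(s);
    int *seen;

    if (len > 5 && strncmp(s, "arch=", 5) == 0) {
      /* EXTENSIONS start with OP '+'; FULLARCHSTR starts with "rv".  */
      if (s[5] != '+' && strncmp(s + 5, "rv", 2) != 0)
        return -1;
      seen = &out->has_arch;
    } else if (len > 4 && strncmp(s, "cpu=", 4) == 0) {
      seen = &out->has_cpu;
    } else if (len > 5 && strncmp(s, "tune=", 5) == 0) {
      seen = &out->has_tune;
    } else {
      return -1; /* unknown ATTR or empty value */
    }

    if (*seen)
      return -1; /* duplicated attribute kind is an error */
    *seen = 1;

    if (!end)
      break;
    s = end + 1;
  }
  return 0;
}
```

For instance, an ATTR-STRING such as `"arch=+zba,+zbb;tune=rocket"` is accepted, while `"cpu=a;cpu=b"` is rejected as a duplicate.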
2023-11-16 | RISC-V: Save/restore ra register correctly [PR112478] | Kito Cheng | 2 | -0/+12
We now make ra a fixed register, but we still need to save/restore it in the prologue/epilogue if it has been used.

gcc/ChangeLog:
	PR target/112478
	* config/riscv/riscv.cc (riscv_save_return_addr_reg_p): Check whether ra is ever live.

gcc/testsuite/ChangeLog:
	PR target/112478
	* gcc.target/riscv/pr112478.c: New.

Reviewed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Tested-by: Christoph Müllner <christoph.muellner@vrull.eu>
2023-11-16 | Fix ICE of unrecognizable insn. | liuhongt | 2 | -4/+25
The newly added splitter generates

    (insn 58 56 59 2 (set (reg:V4HI 20 xmm0 [129])
            (vec_duplicate:V4HI (reg:HI 22 xmm2 [123]))) "testcase.c":16:21 -1

but we only have

    (define_insn "*vec_dupv4hi"
      [(set (match_operand:V4HI 0 "register_operand" "=y,Yw")
            (vec_duplicate:V4HI
              (truncate:HI
                (match_operand:SI 1 "register_operand" "0,Yw"))))]

The patch adds patterns for V4HI and V2HI.

gcc/ChangeLog:
	PR target/112532
	* config/i386/mmx.md (*vec_dup<mode>): Extend for V4HI and V2HI.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/pr112532.c: New test.
2023-11-16 | i386: Fix mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi peephole2 [PR112526] | Jakub Jelinek | 2 | -1/+31
The following testcase has been miscompiled on x86_64 since the PR110551 commit r14-4968. That commit added 2 peephole2s. The first one is

    mov imm,%rXX
    mov %rYY,%rax
    mulq %rXX
    ->
    mov imm,%rax
    mulq %rYY

which I believe is ok. The second one is

    mov imm,%rXX
    mov %rYY,%rdx
    mulx %rXX, %rZZ, %rWW
    ->
    mov imm,%rdx
    mulx %rYY, %rZZ, %rWW

which is wrong. Both peephole2s verify that %rXX above is dead at the end of the pattern, by checking that %rXX is one of the registers overwritten in the multiplication (%rdx:%rax in the first case, the 2 destination registers of mulx in the second case). This is needed because, when the peephole2 replaces the sequence, we no longer set %rXX to that immediate (we set %rax resp. %rdx to it instead). But we also need to ensure that the other register, previously set to the value of %rYY and now set to imm, isn't used after the multiplication, and neither of the peephole2s checks that.

For the first one, I think this is always the case (at least assuming that, in the % pattern, the matching operand, i.e. hardcoded %rax resp. %rdx, will always go first after RA). If operands[2] must be the %rax register, it will be overwritten by mulq writing to %rdx:%rax. But in the second case, there is no reason why %rdx couldn't be used after the pattern, and if it is (as in the testcase), we can't make those changes.

So, as it already does for operands[0], the patch checks that operands[2] (which ought to be %rdx if RA puts the % match_dup operand first and nothing swaps it afterwards) is either the same register as one of the destination registers of mulx or dies at the end of the multiplication.

2023-11-16  Jakub Jelinek  <jakub@redhat.com>

	PR target/112526
	* config/i386/i386.md (mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi): Verify in define_peephole2 that operands[2] dies or is overwritten at the end of multiplication.

	* gcc.target/i386/bmi2-pr112526.c: New test.
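As a rough illustration of the source shape involved, the sketch below multiplies a 64-bit value by a 64-bit constant and keeps both halves of the 128-bit product. With `-O2 -mbmi2`, x86_64 GCC may compute this with `mulx`, which takes one source implicitly in %rdx; that is exactly the register the second peephole2 loads the immediate into. The function name and constant are hypothetical, and `unsigned __int128` is a GCC/Clang extension that is available only on 64-bit targets. The miscompilation itself additionally required %rdx to be live after the multiply, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* 64x64->128-bit unsigned multiply by a constant.  Returns the high
   64 bits and stores the low 64 bits through LO.  */
uint64_t mul_const_hi(uint64_t x, uint64_t *lo)
{
  unsigned __int128 p = (unsigned __int128)x * 0x9E3779B97F4A7C15ull;
  *lo = (uint64_t)p;
  return (uint64_t)(p >> 64);
}
```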
2023-11-16 | slp: Fix handling of IFN_CLZ/CTZ [PR112536] | Jakub Jelinek | 2 | -0/+63
We ICE on the following testcase now that IFN_C[LT]Z calls can have one or two arguments (where 2 means it is well defined at zero). The following patch makes us create a child node only for the first argument, and compatible_calls_p ensures that the other argument is the same. At least according to the testcase, this seems sufficient because of vect patterns.

2023-11-16  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/112536
	* tree-vect-slp.cc (arg0_map): New variable.
	(vect_get_operand_map): For IFN_CLZ or IFN_CTZ, return arg0_map.

	* gcc.dg/pr112536.c: New test.
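The following element-wise loop has a count-leading-zeros result that is explicitly defined at zero. This is the kind of loop where the vectorizer may end up with a two-argument IFN_CLZ (the second argument being the value at zero). Whether a given target actually forms that IFN is an assumption. The sketch also assumes a 32-bit `unsigned int`, and the function name is hypothetical.

```c
#include <assert.h>

/* out[i] = number of leading zero bits of in[i], with 32 for zero.
   __builtin_clz itself is undefined at zero, hence the explicit test.  */
void clz_array(unsigned int *out, const unsigned int *in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i] ? (unsigned int)__builtin_clz(in[i]) : 32u;
}
```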
2023-11-16 | tree-optimization/112282 - fix testcase | Richard Biener | 1 | -3/+8
Avoid requiring a glibc-specific symbol.

	PR tree-optimization/112282
	* gcc.dg/torture/pr112282.c: Do not use __assert_fail.
2023-11-16 | VECT: Clear LOOP_VINFO_USING_SELECT_VL_P when loop is not partial vectorized | Juzhe-Zhong | 2 | -0/+25
This patch fixes an ICE: https://godbolt.org/z/z8T6o6qov

    <source>: In function 'b':
    <source>:2:6: error: missing definition
        2 | void b() {
          |      ^
    for SSA_NAME: loop_len_8 in statement:
    _1 = -loop_len_8;
    during GIMPLE pass: vect
    <source>:2:6: internal compiler error: verify_ssa failed
    0x7f1b56331082 __libc_start_main
            ???:0
    Please submit a full bug report, with preprocessed source (by using -freport-bug).
    Please include the complete backtrace with any bug report.
    See <https://gcc.gnu.org/bugs/> for instructions.
    Compiler returned: 1

The root cause is that we generate the following IR during vectorization:

    _1 = -loop_len_8;
    vect_cst__11 = {_1, _1};
    _18 = vect_vec_iv_.6_14 + vect_cst__11;

where loop_len_8 is an uninitialized value.

The statement _18 = vect_vec_iv_.6_14 + vect_cst__11; is generated because we add the result of SELECT_VL, instead of VF, to the induction variable. The code is:

    else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
      {
        /* When we're using loop_len produced by SELECT_VL, the non-final
           iterations are not always processing VF elements.  So vectorize
           induction variable instead of

             _21 = vect_vec_iv_.6_22 + { VF, ... };

           We should generate:

             _35 = .SELECT_VL (ivtmp_33, VF);
             vect_cst__22 = [vec_duplicate_expr] _35;
             _21 = vect_vec_iv_.6_22 + vect_cst__22;  */
        gcc_assert (!slp_node);
        gimple_seq seq = NULL;
        vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
        tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
        expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
                                                   unshare_expr (len)),
                                     &seq, true, NULL_TREE);
        new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr), expr,
                                 step_expr);
        gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
        step_iv_si = &si;
      }

LOOP_VINFO_USING_SELECT_VL_P is set before loop vectorization analysis, so at that point we don't yet know whether the loop will be partially vectorized. However, the induction variable code above relies on SELECT_VL_P being true only in that case. So clear SELECT_VL_P when the loop is not partially vectorized.

	PR middle-end/112554

gcc/ChangeLog:
	* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling): Clear SELECT_VL_P for non-partial vectorization.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/rvv/autovec/pr112554.c: New test.
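To see why the induction variable must advance by the SELECT_VL result rather than by VF, here is a scalar C model of a length-controlled loop computing `out[i] = start + i * step`. It is a simplified sketch, not GCC's vectorizer. `VF` and the `select_vl` policy are hypothetical, and the policy is chosen to mimic the RVV rule that vl may be anything from ceil(AVL/2) to VLMAX when AVL < 2*VLMAX. As a result, a non-final iteration can process fewer than VF elements.

```c
#include <assert.h>

#define VF 4 /* hypothetical number of lanes per vector */

/* Hypothetical SELECT_VL: all remaining elements if they fit, roughly
   half of them if fewer than 2*VF remain, otherwise VF.  */
static unsigned select_vl(unsigned remaining)
{
  if (remaining <= VF)
    return remaining;
  if (remaining < 2 * VF)
    return (remaining + 1) / 2;
  return VF;
}

/* iv[] models the vector induction variable.  It is advanced by
   len * step, the amount actually processed; advancing by VF * step
   would skip values whenever len < VF on a non-final iteration.  */
static void iota_select_vl(int *out, unsigned n, int start, int step)
{
  int iv[VF];
  for (unsigned lane = 0; lane < VF; lane++)
    iv[lane] = start + (int)lane * step;

  unsigned done = 0;
  while (done < n) {
    unsigned len = select_vl(n - done);
    for (unsigned lane = 0; lane < len; lane++)
      out[done + lane] = iv[lane];
    for (unsigned lane = 0; lane < VF; lane++)
      iv[lane] += (int)len * step;
    done += len;
  }
}
```

When the loop is not partially vectorized, no length is produced and every iteration processes exactly VF elements. That is why the commit clears SELECT_VL_P in that case instead of emitting the len-based step.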
2023-11-15 | c++: fix parsing with auto(x) [PR112410] | Marek Polacek | 3 | -0/+83
Here we are wrongly parsing

    int y(auto(42));

which uses the C++23 cast-to-prvalue feature and initializes y to 42. However, we were treating the auto as an implicit template parameter.

Fixing the auto{42} case is easy, but when auto is followed by a (, I found the fix to be much more involved. For instance, we cannot use cp_parser_expression, because that can give hard errors. It's also necessary to disambiguate 'auto(i)' as 'auto i', not a cast. auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc. are all function declarations.

This patch rectifies that by undoing the implicit function template modification. In the test above, we should notice that the parameter list is ill-formed, and since we've synthesized an implicit template parameter, we undo it by calling abort_fully_implicit_template. Then, we'll parse the "(auto(42))" as an initializer.

	PR c++/112410

gcc/cp/ChangeLog:
	* parser.cc (cp_parser_direct_declarator): Maybe call abort_fully_implicit_template if it turned out the parameter list was ill-formed.

gcc/testsuite/ChangeLog:
	* g++.dg/cpp23/auto-fncast13.C: New test.
	* g++.dg/cpp23/auto-fncast14.C: New test.
2023-11-16 | [i386] APX: Fix EGPR usage in several patterns. | Hongyu Wang | 2 | -13/+21
vextract/insert{if}128 cannot use EGPR in their memory operand, so all related patterns should be adjusted to disable EGPR usage for them. Also fix a wrong gpr16 attr for insertps.

gcc/ChangeLog:
	* config/i386/sse.md (vec_extract_hi_<mode>): Add noavx512vl alternative with attr addr gpr16 and "jm" constraint.
	(vec_extract_hi_<mode>): Likewise for SF vector modes.
	(@vec_extract_hi_<mode>): Likewise.
	(*vec_extractv2ti): Likewise.
	(vec_set_hi_<mode><mask_name>): Likewise.
	* config/i386/mmx.md (@sse4_1_insertps_<mode>): Correct gpr16 attr for each alternative.
2023-11-16 | Daily bump. | GCC Administrator | 4 | -1/+405
2023-11-15 | i386: Optimize strict_low_part QImode insn with high input registers | Uros Bizjak | 7 | -0/+376
The following testcase:

    struct S1
    {
      unsigned char val;
      unsigned char pad1;
      unsigned short pad2;
    };

    struct S2
    {
      unsigned char pad1;
      unsigned char val;
      unsigned short pad2;
    };

    struct S1 test_add (struct S1 a, struct S2 b, struct S2 c)
    {
      a.val = b.val + c.val;

      return a;
    }

compiles with -O2 to:

    movl    %edi, %eax
    movzbl  %dh, %edx
    movl    %esi, %ecx
    movb    %dl, %al
    addb    %ch, %al

The insert to %al can go directly from %dh:

    movl    %edi, %eax
    movl    %esi, %ecx
    movb    %dh, %al
    addb    %ch, %al

The patch introduces strict_low_part QImode insn patterns with both of their input arguments extracted from high registers. This invalid insn is split after reload into a lowpart insert from the high register and a <insn>qi_ext<mode>_1_slp instruction.

	PR target/78904

gcc/ChangeLog:
	* config/i386/i386.md (*movstrictqi_ext<mode>_1): New insn pattern.
	(*addqi_ext<mode>_2_slp): New define_insn_and_split pattern.
	(*subqi_ext<mode>_2_slp): Ditto.
	(*<any_logic:code>qi_ext<mode>_2_slp): Ditto.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/pr78904-8.c: New test.
	* gcc.target/i386/pr78904-8a.c: New test.
	* gcc.target/i386/pr78904-8b.c: New test.
	* gcc.target/i386/pr78904-9.c: New test.
	* gcc.target/i386/pr78904-9a.c: New test.
	* gcc.target/i386/pr78904-9b.c: New test.
2023-11-15 | RISC-V: Fix ICE in non-canonical march parsing | Patrick O'Neill | 3 | -4/+27
Passing in a base extension in non-canonical order (i, e, g) causes GCC to ICE:

    xgcc: error: '-march=rv64ge': ISA string is not in canonical order. 'e'
    xgcc: internal compiler error: in add, at common/config/riscv/riscv-common.cc:671
    ...

This is fixed by skipping to the next extension when a non-canonical order is detected.

gcc/ChangeLog:
	* common/config/riscv/riscv-common.cc (riscv_subset_list::parse_std_ext): Emit an error and skip to the next extension when a non-canonical ordering is detected.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/arch-27.c: New test.
	* gcc.target/riscv/arch-28.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-11-15 | c++, analyzer: Expand CAN_HAVE_LOCATION_P macro. | Bernhard Reutner-Fischer | 2 | -2/+2
r14-985-gca2007a9bb3074 used the collapsed macro definition CAN_HAVE_LOCATION_P in gcc-rich-location.cc, and r14-977-g8861c80733da5c used it in C++'s build_cplus_array_type (). Although otherwise correct, the use of CAN_HAVE_LOCATION_P in these two spots is misleading, so this patch reverts the aforementioned two hunks.

gcc/cp/ChangeLog:
	* tree.cc (build_cplus_array_type): Revert using the macro CAN_HAVE_LOCATION_P.

gcc/ChangeLog:
	* gcc-rich-location.cc (maybe_range_label_for_tree_type_mismatch::get_text): Revert using the macro CAN_HAVE_LOCATION_P.
2023-11-15 | RISC-V: fix vsetvli pass testsuite failure [PR/112447] | Juzhe-Zhong | 1 | -35/+35
Fixes: f0e28d8c1371 ("RISC-V: Fix failed hoist in LICM of vmv.v.x instruction")

Since the above commit, we have the following failures:

    FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test

The problem was not that commit itself; rather, it exposed an existing issue in the vsetvli pass.

Here's Juzhe's analysis:

We have 2 types of global vsetvl insertion. One is the earliest fusion at the end of each block. The other is the LCM-suggested edge vsetvls.

Before this patch, the insertion was as follows:

    | (insn 2817 2820 2818 361 (set (reg:SI 67 vtype)
    |       (unspec:SI [
    |               (const_int 8 [0x8])
    |               (const_int 7 [0x7])
    |               (const_int 1 [0x1]) repeated x2
    |           ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
    |    (nil))
    | (insn 2818 2817 999 361 (set (reg:SI 67 vtype)
    |       (unspec:SI [
    |               (const_int 32 [0x20])
    |               (const_int 1 [0x1]) repeated x3
    |           ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
    |    (nil))

After this patch:

    | (insn 2817 2820 2819 361 (set (reg:SI 67 vtype)
    |       (unspec:SI [
    |               (const_int 32 [0x20])
    |               (const_int 1 [0x1]) repeated x3
    |           ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
    |    (nil))
    | (insn 2819 2817 999 361 (set (reg:SI 67 vtype)
    |       (unspec:SI [
    |               (const_int 8 [0x8])
    |               (const_int 7 [0x7])
    |               (const_int 1 [0x1]) repeated x2
    |           ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
    |    (nil))

The original insertion order is incorrect. The earliest-fusion vsetvl should be inserted first: it represents vsetvl information that was already there and was seen by the later LCM, and we merely delay its insertion. So it should come before the LCM-suggested insertion.

	PR target/112447

gcc/ChangeLog:
	* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Insert local vsetvl info before LCM suggested one.

Tested-by: Patrick O'Neill <patrick@rivosinc.com> # pre-commit-CI #679
Co-developed-by: Vineet Gupta <vineetg@rivosinc.com>
2023-11-15 | RISC-V: elide unnecessary sign extend when expanding cmp_and_jump | Vineet Gupta | 1 | -2/+21
RV64 compare and branch instructions only support 64-bit operands. At expand time, the backend conservatively zero/sign-extends the operands even when this is unnecessary. An example is incoming function args, which the ABI/ISA already guarantee to be sign-extended (this is true for SI, HI and QI operands).

REE then fails to eliminate these extensions, reporting "missing definition(s)" or "multiple definition(s)", because function args don't have an explicit definition.

So during expand, in riscv_extend_comparands (), if an operand is a subreg-promoted SI with inner DI (which is representative of a function arg), just peel away the subreg to expose the DI, eliding the sign extension. As Jeff noted, this routine is also used in if-conversion, so it can potentially help there as well.

Note there are currently patches floating around to improve REE, and also a new pass to eliminate unnecessary extensions. Even so, it is still beneficial not to generate those extra extensions in the first place. It is obviously less work for post-reload passes such as REE. Earlier passes such as combine also benefit, since they have one less thing to deal with and fewer ensuing combinations to consider.

Many existing tests exhibited this issue, e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc. The patch eliminates the SEXT.W.

gcc/ChangeLog:
	* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
	(riscv_extend_comparands): Call new function on operands.

Tested-by: Patrick O'Neill <patrick@rivosinc.com> # pre-commit-CI #676
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
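The reasoning that makes this safe can be checked with a small C model of `sext.w`, which sign-extends the low 32 bits of a 64-bit register. Because the ABI already leaves a 32-bit argument's register in sign-extended form, and sign extension is idempotent, extending it again cannot change its value. The helper name is hypothetical, and this is a model of the instruction's semantics, not of GCC's expander.

```c
#include <assert.h>
#include <stdint.h>

/* Model of RV64 sext.w: sign-extend bits 0..31 of a 64-bit register.
   The XOR/subtract form avoids implementation-defined conversions.  */
static int64_t sext_w(uint64_t reg)
{
  int64_t low = (int64_t)(reg & 0xFFFFFFFFu);
  return (low ^ INT64_C(0x80000000)) - INT64_C(0x80000000);
}
```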
2023-11-15 | c++: direct enum init from type-dep elt [PR112515] | Patrick Palka | 2 | -0/+10
The NON_DEPENDENT_EXPR removal exposed that is_direct_enum_init can be called in a template context on a CONSTRUCTOR that isn't type-dependent but whose element is.

	PR c++/112515

gcc/cp/ChangeLog:
	* decl.cc (is_direct_enum_init): Check type-dependence of the single element.

gcc/testsuite/ChangeLog:
	* g++.dg/template/non-dependent30.C: New test.
2023-11-15 | c++: partially inst requires-expr in noexcept-spec [PR101043] | Patrick Palka | 2 | -7/+23
Here we're ICEing from strip_typedefs on the partially instantiated requires-expression when walking its REQUIRES_EXPR_EXTRA_ARGS. In this case, REQUIRES_EXPR_EXTRA_ARGS is a TREE_LIST with a non-empty TREE_PURPOSE (holding the captured local specialization 't', as per build_extra_args), which strip_typedefs doesn't expect.

We could probably skip walking REQUIRES_EXPR_EXTRA_ARGS altogether, since it shouldn't contain any typedefs in the first place. However, it seems safer and more generally useful to just teach strip_typedefs to handle a non-empty TREE_PURPOSE the obvious way. (The code has asserted that TREE_PURPOSE is empty ever since its inception, i.e. r189298.)

	PR c++/101043

gcc/cp/ChangeLog:
	* tree.cc (strip_typedefs_expr) <case TREE_LIST>: Handle non-empty TREE_PURPOSE.

gcc/testsuite/ChangeLog:
	* g++.dg/cpp2a/concepts-requires37.C: New test.
2023-11-15 | c++: non-dependent .* operand folding [PR112427] | Patrick Palka | 3 | -2/+28
Here, when building up the non-dependent .* expression, we crash in fold_convert on 'b.a'. This (templated) COMPONENT_REF has an IDENTIFIER_NODE operand instead of the FIELD_DECL that middle-end routines expect. As in r14-4899-gd80a26cca02587, this patch fixes the crash by replacing the problematic piecemeal folding with a single call to cp_fully_fold.

Also, don't bother building the POINTER_PLUS_EXPR in a template context. This means the returned non-dependent tree might not have TREE_SIDE_EFFECTS set when it used to. To compensate, build_min_non_dep now propagates TREE_SIDE_EFFECTS from the original arguments, as buildN and build_min do.

	PR c++/112427

gcc/cp/ChangeLog:
	* tree.cc (build_min_non_dep): Propagate TREE_SIDE_EFFECTS from the original arguments.
	(build_min_non_dep_call_vec): Likewise.
	* typeck2.cc (build_m_component_ref): Use cp_convert, build2 and cp_fully_fold instead of fold_build_pointer_plus and fold_convert. Don't build the POINTER_PLUS_EXPR in a template context.

gcc/testsuite/ChangeLog:
	* g++.dg/template/non-dependent29.C: New test.
2023-11-15c++: constantness of local var in constexpr fn [PR111703, PR112269]Patrick Palka2-2/+26
potential_constant_expression was incorrectly treating most local variables of a constexpr function as constant because it wasn't considering the 'now' parameter. This patch fixes this by relaxing its var_in_maybe_constexpr_fn checks accordingly, which turns out to partially fix two recently reported regressions. PR111703 is a regression caused by r11-550-gf65a3299a521a4, which restricted constexpr evaluation during warning-dependent folding. The mechanism is intended to restrict only constant evaluation of the instantiated non-dependent expression, but it also ends up restricting constant evaluation that occurs during instantiation of the expression, in particular when instantiating the converted argument 'x' (a VIEW_CONVERT_EXPR) into a copy constructor call. This seems like a flaw in the mechanism, though I don't know whether we want to fix the mechanism or get rid of it completely, since the original testcases that motivated it are fixed more simply by r13-1225-gb00b95198e6720. In any case, this patch partially fixes it by making us correctly treat 'x' as non-constant, which prevents the problematic warning-dependent folding from occurring at all. PR112269 is caused by r14-4796-g3e3d73ed5e85e7, which merged tsubst_copy into tsubst_copy_and_build. tsubst_copy used to exit early when 'args' was empty, behavior that commit deliberately didn't preserve. This early exit masked the fact that tsubst didn't handle COMPLEX_EXPR at all, a tree code we can apparently see during warning-dependent folding on some targets. A complete fix would be to add handling for this tree code in tsubst_expr, but this patch should fix the reported testsuite failures, since the COMPLEX_EXPRs that crop up in <complex> are considered non-constant expressions after this patch. PR c++/111703 PR c++/112269 gcc/cp/ChangeLog: * constexpr.cc (potential_constant_expression_1) <case VAR_DECL>: Only consider var_in_maybe_constexpr_fn if 'now' is false. 
<case INDIRECT_REF>: Likewise. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-fn8.C: New test.
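Loosely (our paraphrase, not GCC's documentation), with 'now' false the check asks whether an expression could become constant during some constant evaluation of the enclosing constexpr function; with 'now' true it asks whether the expression is constant as it stands. The hypothetical `twice` below shows why a local like 'x' should count as non-constant in the second sense: it only gets a known value while a call such as `twice(21)` is being constant-evaluated.

```cpp
#include <cassert>

// 'x' is an ordinary local of a constexpr function. The body is only
// potentially constant; 'x' has a known value only during constant
// evaluation of a particular call.
constexpr int twice(int n) {
  int x = n;     // not itself a constant expression
  return x + x;  // fine while twice(21) is constant-evaluated
}

static_assert(twice(21) == 42, "evaluated at compile time");

// The same function also runs with a non-constant argument.
int twice_at_runtime(int n) { return twice(n); }
```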
2023-11-15i386: Fix split condition of *<insn>qi_ext<mode>_1_slp patternsUros Bizjak1-3/+3
gcc/ChangeLog: * config/i386/i386.md (*addqi_ext<mode>_1_slp): Add "&& " before "reload_completed" in split condition. (*subqi_ext<mode>_1_slp): Ditto. (*<any_logic:code>qi_ext<mode>_1_slp): Ditto.
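In a define_insn_and_split, a split condition that begins with "&&" is appended to the insn condition; without the "&&" it is used on its own, so the split half can fire on RTL that the insn condition would reject. The string-based model below, with the hypothetical helper `effective_split_condition`, only illustrates that rule. It is not how genemit processes patterns.

```cpp
#include <cassert>
#include <string>

// Models the rule: "&& cond" means "insn condition AND cond"; anything
// else is a standalone split condition.
std::string effective_split_condition(const std::string& insn_cond,
                                      const std::string& split_cond) {
  if (split_cond.rfind("&&", 0) == 0)  // starts with "&&"
    return "(" + insn_cond + ") " + split_cond;
  return split_cond;
}
```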
2023-11-15i386: Fix strict_low_part QImode insn with high input register patterns [PR112540]Uros Bizjak1-12/+12
PR target/112540 gcc/ChangeLog: * config/i386/i386.md (*addqi_ext<mode>_1_slp): Correct operand numbers in split pattern. Replace !Q constraint of operand 1 with !qm. Add insn constraint. (*subqi_ext<mode>_1_slp): Ditto. (*<any_logic:code>qi_ext<mode>_1_slp): Ditto.
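Roughly, these patterns update only the low byte of the destination (strict_low_part) using the "high" byte of another register (bits 8..15, like %ah). The hypothetical `add_high_byte_to_low` below is a C++ model of that data flow for the add case, not the RTL itself. The remaining bits of the destination keep their values, and any carry out of the low byte is discarded.

```cpp
#include <cassert>
#include <cstdint>

// Low byte of dst = low byte of dst + bits 8..15 of src (mod 256);
// bits 8 and up of dst are preserved.
std::uint32_t add_high_byte_to_low(std::uint32_t dst, std::uint32_t src) {
  std::uint8_t lo = static_cast<std::uint8_t>(dst);       // like %al
  std::uint8_t hi = static_cast<std::uint8_t>(src >> 8);  // like %bh
  std::uint8_t sum = static_cast<std::uint8_t>(lo + hi);  // wraps mod 256
  return (dst & ~std::uint32_t{0xff}) | sum;              // only low byte changes
}
```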
2023-11-15nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' descriptionThomas Schwinge1-1/+0
Minor fix-up for commit c09471fbc7588db2480f036aa56a2403d3c03ae5 "nvptx: Add suppport for __builtin_nvptx_brev instrinsic". gcc/ * doc/extend.texi (Nvidia PTX Built-in Functions): Fix copy'n'paste-o in '__builtin_nvptx_brev' description.
2023-11-15Update nvptx's bitrev<mode>2 pattern to use BITREVERSE rtx.Roger Sayle3-50/+5
This minor tweak to the nvptx backend switches the representation of the brev instruction from an UNSPEC to the new BITREVERSE rtx. This enables various RTL optimizations, including evaluation (constant folding) of integer constant arguments at compile time. gcc/ * config/nvptx/nvptx.md (UNSPEC_BITREV): Delete. (bitrev<mode>2): Represent using bitreverse. gcc/testsuite/ * gcc.target/nvptx/brev-2-O2.c: Adjust. * gcc.target/nvptx/brevll-2-O2.c: Likewise. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
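An UNSPEC is opaque to the optimizers, whereas BITREVERSE has defined semantics: bit i of the input becomes bit (width - 1 - i). The portable `bitreverse32` below is a C++ analogy, not nvptx code. It shows the operation and how a call with a constant argument can be evaluated entirely at compile time, which is the kind of folding a transparent rtx makes possible.

```cpp
#include <cassert>
#include <cstdint>

// Bit i of x becomes bit 31 - i of the result.
constexpr std::uint32_t bitreverse32(std::uint32_t x) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);  // shift in the current lowest bit of x
    x >>= 1;
  }
  return r;
}

// Folded at compile time because the argument is a constant.
static_assert(bitreverse32(1u) == 0x80000000u, "constant-folded");
```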
2023-11-15nvptx: Extend 'brev' test casesThomas Schwinge6-4/+392
In order to observe effects of a later patch, extend the 'brev' test cases added in commit c09471fbc7588db2480f036aa56a2403d3c03ae5 "nvptx: Add suppport for __builtin_nvptx_brev instrinsic". gcc/testsuite/ * gcc.target/nvptx/brev-1.c: Extend. * gcc.target/nvptx/brev-2.c: Rename to... * gcc.target/nvptx/brev-2-O2.c: ... this, and extend. Copy to... * gcc.target/nvptx/brev-2-O0.c: ... this, and adapt for '-O0'. * gcc.target/nvptx/brevll-1.c: Extend. * gcc.target/nvptx/brevll-2.c: Rename to... * gcc.target/nvptx/brevll-2-O2.c: ... this, and extend. Copy to... * gcc.target/nvptx/brevll-2-O0.c: ... this, and adapt for '-O0'.
2023-11-15amdgcn: Add Accelerator VGPR registersAndrew Stubbs16-191/+818
Add the new CDNA register file. We don't support any of the specialized instructions that use these registers, but they're useful to relieve register pressure without spilling to stack. Co-authored-by: Andrew Jenner <andrew@codesourcery.com> gcc/ChangeLog: * config/gcn/constraints.md: Add "a" AVGPR constraint. * config/gcn/gcn-valu.md (*mov<mode>): Add AVGPR alternatives. (*mov<mode>_4reg): Likewise. (@mov<mode>_sgprbase): Likewise. (gather<mode>_insn_1offset<exec>): Likewise. (gather<mode>_insn_1offset_ds<exec>): Likewise. (gather<mode>_insn_2offsets<exec>): Likewise. (scatter<mode>_expr<exec_scatter>): Likewise. (scatter<mode>_insn_1offset_ds<exec_scatter>): Likewise. (scatter<mode>_insn_2offsets<exec_scatter>): Likewise. * config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define. (gcn_class_max_nregs): Handle AVGPR_REGS and ALL_VGPR_REGS. (gcn_hard_regno_mode_ok): Likewise. (gcn_regno_reg_class): Likewise. (gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA1_PLUS. (gcn_sgpr_move_p): Handle AVGPRs. (gcn_secondary_reload): Reload AVGPRs via VGPRs. (gcn_conditional_register_usage): Handle AVGPRs. (gcn_vgpr_equivalent_register_operand): New function. (gcn_valid_move_p): Check for validity of AVGPR moves. (gcn_compute_frame_offsets): Handle AVGPRs. (gcn_memory_move_cost): Likewise. (gcn_register_move_cost): Likewise. (gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI. (gcn_md_reorg): Handle AVGPRs. (gcn_hsa_declare_function_name): Likewise. (print_reg): Likewise. (gcn_dwarf_register_number): Likewise. * config/gcn/gcn.h (FIRST_AVGPR_REG): Define. (AVGPR_REGNO): Define. (LAST_AVGPR_REG): Define. (SOFT_ARG_REG): Update. (FRAME_POINTER_REGNUM): Update. (DWARF_LINK_REGISTER): Update. (FIRST_PSEUDO_REGISTER): Update. (AVGPR_REGNO_P): Define. (enum reg_class): Add AVGPR_REGS and ALL_VGPR_REGS. (REG_CLASS_CONTENTS): Add new register classes and add entries for AVGPRs to all classes. (REGISTER_NAMES): Add AVGPRs. * config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define. 
(AP_REGNUM, FP_REGNUM): Update. (define_attr "type"): Add vop3p_mai. (define_attr "unit"): Handle vop3p_mai. (define_attr "gcn_version"): Add "cdna2". (define_attr "enabled"): Handle cdna2. (*mov<mode>_insn): Add AVGPR alternatives. (*movti_insn): Likewise. * config/gcn/mkoffload.cc (isa_has_combined_avgprs): New. (process_asm): Process avgpr_count. * config/gcn/predicates.md (gcn_avgpr_register_operand): New. (gcn_avgpr_hard_register_operand): New. * doc/md.texi: Document the "a" constraint. gcc/testsuite/ChangeLog: * gcc.target/gcn/avgpr-mem-double.c: New test. * gcc.target/gcn/avgpr-mem-int.c: New test. * gcc.target/gcn/avgpr-mem-long.c: New test. * gcc.target/gcn/avgpr-mem-short.c: New test. * gcc.target/gcn/avgpr-spill-double.c: New test. * gcc.target/gcn/avgpr-spill-int.c: New test. * gcc.target/gcn/avgpr-spill-long.c: New test. * gcc.target/gcn/avgpr-spill-short.c: New test. libgomp/ChangeLog: * plugin/plugin-gcn.c (max_isa_vgprs): New. (run_kernel): CDNA2 devices have more VGPRs.
2023-11-15amdgcn: simplify secondary reload patternsAndrew Stubbs2-90/+4
Remove some unnecessary complexity; no functional change is intended, although LRA appears to use the constraints from the reload_in/out patterns, so it's probably an improvement for it to see the real sgprbase constraints. gcc/ChangeLog: * config/gcn/gcn-valu.md (mov<mode>_sgprbase): Add @ modifier. (reload_in<mode>): Delete. (reload_out<mode>): Delete. * config/gcn/gcn.cc (CODE_FOR): Delete. (get_code_for_##PREFIX##vN##SUFFIX): Delete. (CODE_FOR_OP): Delete. (get_code_for_##PREFIX): Delete. (gcn_secondary_reload): Replace "get_code_for" with "code_for".
2023-11-15s390: Fix generation of s390-gen-builtins.hStefan Schulze Frielinghaus1-1/+1
By default the preprocessed output includes linemarkers. This leads to an error when -pedantic is used, e.g. during bootstrap: s390-gen-builtins.h:1:3: error: style of line directive is a GCC extension [-Werror] Fixed by omitting linemarkers while generating s390-gen-builtins.h. gcc/ChangeLog: * config/s390/t-s390: Generate s390-gen-builtins.h without linemarkers.
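A linemarker has the form `# 12 "file.h" 1`. The standard spelling is `#line 12 "file.h"`, which is why -pedantic reports the `#` plus number form as a GCC extension when it appears in a file that is compiled again. The preprocessor can be told not to emit linemarkers at all (GCC's `-P` option does this). The hypothetical filter `strip_linemarkers` below only illustrates which lines are involved and is not the method used in t-s390.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Drops lines of the form '# <digits> ...' and keeps everything else,
// including ordinary directives such as '#define' or '#line'.
std::string strip_linemarkers(const std::string& text) {
  std::istringstream in(text);
  std::string line, out;
  while (std::getline(in, line)) {
    bool marker = line.size() > 2 && line[0] == '#' && line[1] == ' ' &&
                  std::isdigit(static_cast<unsigned char>(line[2]));
    if (!marker) out += line + '\n';
  }
  return out;
}
```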
2023-11-15tree-optimization/112282 - wrong-code with ifcvt hoistingRichard Biener2-23/+153
The following avoids hoisting invariants from conditionally executed parts of an if-converted loop. That now makes a difference, since we perform bitfield lowering even when we do not actually if-convert the loop. if-conversion already deals with resetting flow-sensitive info when necessary. PR tree-optimization/112282 * tree-if-conv.cc (ifcvt_hoist_invariants): Only hoist from the loop header. * gcc.dg/torture/pr112282.c: New testcase.
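Facts derived under a guard (such as range information) are flow-sensitive: they hold only where the guard holds. The hypothetical loop below is not the PR112282 testcase. `k >> 1` is loop-invariant but computed only when `k > 0`, so inside the block a compiler may record that `h` is non-negative. If that computation were moved to the loop header while keeping the same annotation, it would also run for `k <= 0`, where `h` can be negative.

```cpp
#include <cassert>
#include <vector>

long guarded_sum(const std::vector<int>& a, int k) {
  long s = 0;
  for (int x : a) {
    if (k > 0 && x > k) {  // conditionally executed part of the body
      int h = k >> 1;      // invariant; known >= 0 only under the guard
      s += x + h;
    }
  }
  return s;
}
```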
2023-11-15Fix ICE with SLP and -fdbg-cntRichard Biener1-3/+6
We have to clear the visited flag on stmts. * tree-vect-slp.cc (vect_slp_region): Also clear visited flag when we skipped an instance due to -fdbg-cnt.
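The bug pattern is a per-statement mark that is set during analysis and must be cleared on every exit path, including the one taken when a debug counter (like -fdbg-cnt) says to skip an instance. All names below (`Stmt`, `DebugCounter`, `process_instances`) are hypothetical and only model that pattern, not vect_slp_region itself.

```cpp
#include <cassert>
#include <vector>

struct Stmt { bool visited = false; };

// Allows only the first 'limit' queries, loosely like a debug counter.
struct DebugCounter {
  int limit, seen = 0;
  bool allow() { return seen++ < limit; }
};

int process_instances(std::vector<std::vector<Stmt*>>& instances,
                      DebugCounter& cnt) {
  int vectorized = 0;
  for (auto& inst : instances) {
    for (Stmt* s : inst) s->visited = true;  // analysis marks statements
    if (!cnt.allow()) {
      // The fix: clear the marks before skipping, not only on success.
      for (Stmt* s : inst) s->visited = false;
      continue;
    }
    ++vectorized;
    for (Stmt* s : inst) s->visited = false;
  }
  return vectorized;
}
```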
2023-11-15libsanitizer: Adjust the asan/sanity-check-pure-c-1.c testJakub Jelinek1-2/+2
The updated libasan doesn't print __interceptor_free (or __interceptor_malloc) but free (or malloc), so this patch adjusts the testcase to accept either form. 2023-11-15 Jakub Jelinek <jakub@redhat.com> * c-c++-common/asan/sanity-check-pure-c-1.c: Adjust for interceptor_ or wrap_ substrings possibly not being emitted in newer libasan.
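Making a prefix optional in the expected-output pattern lets one test accept both old and new runtimes. The regex in the hypothetical `is_free_frame` below is an illustration, not the pattern in sanity-check-pure-c-1.c. It matches a frame for `free` whether or not the symbol carries an `__interceptor_` or `wrap_` prefix, while `\b` keeps it from matching longer names.

```cpp
#include <cassert>
#include <regex>
#include <string>

bool is_free_frame(const std::string& line) {
  static const std::regex re(R"(#0 0x[0-9a-f]+ +in (__interceptor_|wrap_)?free\b)");
  return std::regex_search(line, re);
}
```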