path: root/gcc
Age | Commit message | Author | Files | Lines
10 days | RISC-V: Primary vector pipeline model for sifive 7 series | Kito Cheng | 1 | -1/+136
This commit introduces a primary vector pipeline model for the SiFive 7 series. The model is a simplified one: it only defines the vector command queue, the arithmetic unit, and the vector load/store unit. The latency of the real hardware is LMUL-aware, but modelling that would complicate the model a lot, so this version uses the same latency for every LMUL; we may improve it later once we find a meaningful performance difference. gcc/ChangeLog: * config/riscv/sifive-7.md: Add primary vector pipeline model for SiFive 7 series.
10 days | RISC-V: Adding B ext, fp16 and missing scalar instruction type for sifive-7 pipeline model [PR120659] | Kito Cheng | 2 | -3/+34
gcc/ChangeLog: PR target/120659 * config/riscv/sifive-7.md: Add B extension, fp16 and missing scalar instruction type for sifive-7 pipeline model. gcc/testsuite/ChangeLog: PR target/120659 * gcc.target/riscv/pr120659.c: New test.
10 days | Handle SLP build operand swapping for ternaries and calls | Richard Biener | 3 | -10/+60
The following adds SLP build operand swapping for .FMA which is a ternary operator and a call. The current code only handles binary operators in assignments, thus the patch extends this to handle both calls and assignments as well as binary and ternary operators. * tree-vect-slp.cc (vect_build_slp_2): Handle ternary and call operators when swapping operands. * gcc.target/i386/vect-pr82426.c: Pass explicit -ffp-contract=fast. * gcc.target/i386/vect-pr82426-2.c: New testcase variant with -ffp-contract=on.
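A hypothetical loop of the kind this patch targets (a sketch, not the vect-pr82426.c testcase): two SLP lanes per iteration whose multiplications list their operands in opposite orders. Assuming both statements are contracted into .FMA (the updated testcase passes -ffp-contract=fast explicitly), the vectorizer has to swap the first two .FMA operands of one lane before it can build a single SLP node for both lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: lane 0 computes b*c+d, lane 1 computes c*b+d.
   If both are contracted into .FMA, grouping them into one SLP node
   requires swapping the multiplication operands of one lane.  */
void
fma_pairs (double *restrict out, const double *restrict b,
           const double *restrict c, const double *restrict d, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      out[2 * i] = b[2 * i] * c[2 * i] + d[2 * i];
      out[2 * i + 1] = c[2 * i + 1] * b[2 * i + 1] + d[2 * i + 1];
    }
}
```

The small integer inputs in a quick check keep the results exact whether or not the compiler contracts the expressions.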
10 days | RISC-V: Vector-scalar negate-multiply-(subtract-)accumulate [PR119100] | Paul-Antoine Arras | 21 | -18/+187
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a (possibly negated) minus-mult RTL instruction. Before this patch, we have two instructions, e.g.: vfmv.v.f v6,fa0 vfnmacc.vv v2,v6,v4 After, we get only one: vfnmacc.vf v2,fa0,v4 PR target/119100 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfnmsub_<mode>,*vfnmadd_<mode>): Handle both add and acc variants. * config/riscv/vector.md (*pred_mul_neg_<optab><mode>_scalar_undef): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfnmacc and vfnmsac. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h (DEF_VF_MULOP_CASE_1): Fix return type. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f64.c: New test.
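As a rough illustration, the loops below have the shapes that, per the RVV specification, vfnmacc.vf (vd[i] = -(f[rs1] * vs2[i]) - vd[i]) and vfnmsac.vf (vd[i] = -(f[rs1] * vs2[i]) + vd[i]) compute, with the scalar x playing the role of the broadcast operand. The function names are hypothetical, and whether GCC actually emits the .vf forms depends on a V-enabled -march, the optimization level, and floating-point contraction settings; this only shows the source pattern, not the new combine pattern itself.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of vfnmacc.vf: acc = -(x * v) - acc, with x a scalar.  */
void
nmacc_vf (float *restrict acc, const float *restrict v, float x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] = -(x * v[i]) - acc[i];
}

/* Shape of vfnmsac.vf: acc = -(x * v) + acc, with x a scalar.  */
void
nmsac_vf (float *restrict acc, const float *restrict v, float x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    acc[i] = -(x * v[i]) + acc[i];
}
```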
10 days | aarch64: Add support for NVIDIA GB10 | Kyrylo Tkachov | 3 | -2/+5
This adds support for -mcpu=gb10. This is a big.LITTLE configuration involving Cortex-X925 and Cortex-A725 cores. The appropriate MIDR numbers are added to detect them in -mcpu=native. We did not add an -mcpu=cortex-x925.cortex-a725 option because GB10 does include the crypto instructions which we want on by default, and the current convention is to not enable such extensions for Arm Cortex cores in -mcpu where they are optional in the IP. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64-cores.def (gb10): New entry. * config/aarch64/aarch64-tune.md: Regenerate. * doc/invoke.texi (AArch64 Options): Document the above.
11 days | Extend nonnull_if_nonzero attribute [PR120520] | Jakub Jelinek | 18 | -58/+605
C2Y voted in the https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3466.pdf paper, which clarifies some of the conditional nonnull cases. For strncat/__strncat_chk no changes are necessary; we already use __attribute__((nonnull (1), nonnull_if_nonzero (2, 3))) attributes on the builtin and glibc can do the same too, meaning that the first argument must always be nonnull and the second must be nonnull if the third one is nonzero. The problem is with the fread/fwrite changes, where the paper adds: "If size or nmemb is zero, ptr may be a null pointer, fread returns zero and the contents of the array and the state of the stream remain unchanged." and ditto for fwrite, so the two-argument nonnull_if_nonzero attribute isn't usable to express that, because whether the pointer can be null depends on 2 integral arguments rather than one. The following patch extends the nonnull_if_nonzero attribute so that instead of requiring 2 arguments it allows 2 or 3: the first one is still the index of the pointer argument which sometimes must not be null, and the other one or two are integral arguments; if there are 2, the call is invalid only if the pointer is null and both integral arguments are nonzero. 2025-06-30 Jakub Jelinek <jakub@redhat.com> PR c/120520 PR c/117023 gcc/ * builtin-attrs.def (DEF_LIST_INT_INT_INT): Define it and use for 1,2,3. (ATTR_NONNULL_IF123_LIST): New DEF_ATTR_TREE_LIST. (ATTR_NONNULL_4_IF123_LIST): Likewise. * builtins.def (BUILT_IN_FWRITE): Use ATTR_NONNULL_4_IF123_LIST instead of ATTR_NONNULL_LIST. (BUILT_IN_FWRITE_UNLOCKED): Likewise. * gimple.h (infer_nonnull_range_by_attribute): Add another optional tree * argument defaulted to NULL. * gimple.cc (infer_nonnull_range_by_attribute): Add OP3 argument, handle 3 argument nonnull_if_nonzero attribute. * builtins.cc (validate_arglist): Handle 3 argument nonnull_if_nonzero attribute. * tree-ssa-ccp.cc (pass_post_ipa_warn::execute): Likewise. 
* ubsan.cc (instrument_nonnull_arg): Adjust infer_nonnull_range_by_attribute caller, handle 3 argument nonnull_if_nonzero attribute. * gimple-range-infer.cc (gimple_infer_range::gimple_infer_range): Handle 3 argument nonnull_if_nonzero attribute. * doc/extend.texi (nonnull_if_nonzero): Document 3 argument version of the attribute. gcc/c-family/ * c-attribs.cc (c_common_gnu_attributes): Allow 2 or 3 arguments for nonnull_if_nonzero attribute instead of only 2. (handle_nonnull_if_nonzero_attribute): Handle 3 argument nonnull_if_nonzero. * c-common.cc (struct nonnull_arg_ctx): Rename other member to other1, add other2 member. (check_function_nonnull): Clear a if nonnull attribute has an argument. Adjust for nonnull_arg_ctx changes. Handle 3 argument nonnull_if_nonzero attribute. (check_nonnull_arg): Adjust for nonnull_arg_ctx changes, emit different diagnostics for 3 argument nonnull_if_nonzero attributes. (check_function_arguments): Adjust ctx var initialization. gcc/analyzer/ * sm-malloc.cc (malloc_state_machine::on_stmt): Handle 3 argument nonnull_if_nonzero attribute. gcc/testsuite/ * gcc.dg/nonnull-9.c: Tweak for 3 argument nonnull_if_nonzero attribute support, add further tests. * gcc.dg/nonnull-12.c: New test. * gcc.dg/nonnull-13.c: New test. * gcc.dg/nonnull-14.c: New test. * c-c++-common/ubsan/nonnull-8.c: New test. * c-c++-common/ubsan/nonnull-9.c: New test.
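A sketch of how an fread-like declaration could use the three-argument form, together with a runtime model of the condition it expresses (the pointer may be null unless both integral arguments are nonzero). fill_items and nonnull_if_nonzero_violated are hypothetical names, and the GCC 16 version guard is an assumption about which compilers accept three arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the 3-argument form needs GCC 16 or later; older compilers
   either lack the attribute or accept only two arguments.  */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 16
# define NONNULL_IF_BOTH_NONZERO(p, n1, n2) \
    __attribute__ ((nonnull_if_nonzero (p, n1, n2)))
#else
# define NONNULL_IF_BOTH_NONZERO(p, n1, n2)
#endif

/* Hypothetical fread-like function: BUF may be null when SIZE or NMEMB
   is zero.  */
size_t fill_items (void *buf, size_t size, size_t nmemb)
  NONNULL_IF_BOTH_NONZERO (1, 2, 3);

/* Runtime model of the diagnosed condition: the call is invalid only when
   the pointer is null and both integral arguments are nonzero.  */
bool
nonnull_if_nonzero_violated (const void *ptr, size_t n1, size_t n2)
{
  return ptr == NULL && n1 != 0 && n2 != 0;
}

size_t
fill_items (void *buf, size_t size, size_t nmemb)
{
  assert (!nonnull_if_nonzero_violated (buf, size, nmemb));
  if (size == 0 || nmemb == 0)
    return 0;   /* Like fread: nothing to do, and BUF may be null.  */
  memset (buf, 0, size * nmemb);
  return nmemb;
}
```

With the attribute active, a call such as fill_items (NULL, 0, 4) stays valid, while a literal null buffer with two nonzero counts is the case the extended attribute is meant to diagnose.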
11 days | lra: Check for null lowpart_subregs [PR120733] | Richard Sandiford | 1 | -3/+6
lra-eliminations.cc:move_plus_up tries to:

    Transform (subreg (plus reg const)) to (plus (subreg reg) const)
    when it is possible.

Most of it is heavily conditional:

    if (!paradoxical_subreg_p (x)
        && GET_CODE (subreg_reg) == PLUS
        && CONSTANT_P (XEXP (subreg_reg, 1))
        && GET_MODE_CLASS (x_mode) == MODE_INT
        && GET_MODE_CLASS (subreg_reg_mode) == MODE_INT)
      {
        rtx cst = simplify_subreg (x_mode, XEXP (subreg_reg, 1), subreg_reg_mode,
                                   subreg_lowpart_offset (x_mode,
                                                          subreg_reg_mode));
        if (cst && CONSTANT_P (cst))

but the final:

          return gen_rtx_PLUS (x_mode, lowpart_subreg (x_mode,
                                                       XEXP (subreg_reg, 0),
                                                       subreg_reg_mode), cst);

assumed without checking that lowpart_subreg succeeded. In the PR, this led to creating a PLUS with a null operand. In more detail, the testcase had:

    (var_location a (plus:SI (subreg:SI (reg/f:DI 64 sfp) 0)
                             (const_int -4 [0xfffffffffffffffc])))

with sfp being eliminated to (plus:DI (reg:DI sp) (const_int 16)). Initially, during the !subst_p phase, lra_eliminate_regs_1 sees the PLUS and recurses into each operand. The recursive call sees the SUBREG and recurses into the SUBREG_REG. Since !subst_p, this final recursive call replaces (reg:DI sfp) with:

    (plus:DI (reg:DI sfp) (const_int 16))

(i.e. keeping the base register the same). So the SUBREG is eliminated to:

    (subreg:SI (plus:DI (reg:DI sfp) (const_int 16)) 0)

The PLUS handling in lra_eliminate_regs_1 then passes this to move_plus_up, which tries to push the SUBREG into the PLUS. This means trying to create:

    (plus:SI (simplify_gen_subreg:SI (reg:DI sfp) 0) (const_int 16))

The simplify_gen_subreg then returns null, because simplify_subreg_regno fails both with allow_stack_regs==false (when trying to simplify the SUBREG to a REG) and with allow_stack_regs=true (when validating whether the SUBREG can be generated).
And that in turn happens because aarch64 refuses to allow SImode to be stored in sfp:

    if (regno == SP_REGNUM)
      /* The purpose of comparing with ptr_mode is to support the
         global register variable associated with the stack pointer
         register via the syntax of asm ("wsp") in ILP32.  */
      return mode == Pmode || mode == ptr_mode;

    if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)
      return mode == Pmode;

This seems dubious. If the frame pointer can hold a DImode value then it can also hold an SImode value. There might be limited cases when the low 32 bits of the frame pointer are useful, but aarch64_hard_regno_mode_ok doesn't have the context to second-guess things like that. It seemed from a quick scan of other targets that they behave more as I'd expect. So there might be a target bug here too. But it seemed worth fixing the unchecked use of lowpart_subreg independently of that. The patch fixes an existing ICE in gcc.c-torture/compile/pass.c. gcc/ PR rtl-optimization/120733 * lra-eliminations.cc (move_plus_up): Check whether lowpart_subreg returns null.
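A toy model of the failure mode and the fix, using hypothetical toy_* types rather than GCC's RTL API: toy_lowpart_subreg can fail (here, whenever the stand-in frame pointer is narrowed, mirroring the aarch64 behaviour described above), and toy_move_plus_up only builds the PLUS when it gets a non-null operand. Returning the original expression on failure is this sketch's choice, and the constant is reused without narrowing for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression nodes; a hypothetical stand-in for RTL.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_PLUS, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  int bits;                   /* width of the node's mode */
  int regno;                  /* TOY_REG: register number */
  long value;                 /* TOY_CONST: constant value */
  struct toy_rtx *op0, *op1;  /* TOY_PLUS operands; TOY_SUBREG uses op0 */
};

static struct toy_rtx *
toy_new (enum toy_code code, int bits, struct toy_rtx *op0,
         struct toy_rtx *op1)
{
  struct toy_rtx *x = calloc (1, sizeof *x);
  assert (x != NULL);
  x->code = code;
  x->bits = bits;
  x->op0 = op0;
  x->op1 = op1;
  return x;
}

/* Hypothetical register that refuses narrower modes, like sfp in the PR.  */
enum { TOY_SFP = 64 };

/* Like lowpart_subreg, this can fail and return NULL.  */
static struct toy_rtx *
toy_lowpart_subreg (int bits, struct toy_rtx *reg)
{
  if (reg->code == TOY_REG && reg->regno == TOY_SFP && bits < reg->bits)
    return NULL;
  return toy_new (TOY_SUBREG, bits, reg, NULL);
}

/* (subreg (plus reg const)) -> (plus (subreg reg) const), keeping X
   unchanged when the narrower subreg cannot be formed.  */
static struct toy_rtx *
toy_move_plus_up (struct toy_rtx *x)
{
  if (x->code != TOY_SUBREG)
    return x;
  struct toy_rtx *inner = x->op0;
  if (inner->code != TOY_PLUS || inner->op1->code != TOY_CONST)
    return x;
  struct toy_rtx *lowpart = toy_lowpart_subreg (x->bits, inner->op0);
  if (lowpart == NULL)   /* the kind of check the patch adds */
    return x;
  return toy_new (TOY_PLUS, x->bits, lowpart, inner->op1);
}
```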
11 days | Re-add logic to mitigate some afdo profile inconsistencies | Jan Hubicka | 1 | -3/+40
This patch re-adds logic to increase counts of annotated basic blocks if otherwise Kirchhoff's law cannot be solved. This is done only in easy cases where the total count of in or out edges is smaller than the count of the BB, or when the BB has a single exit which is annotated with a small count. This helps to solve problems seen e.g. in parest, where the header of a loop gets too low a count because the vectorizer replaced the IV conditional and did not preserve debug info. We should solve the debug info issues as well, and similar problems can now be tracked in afdo debug dumps. gcc/ChangeLog: * auto-profile.cc (autofdo_source_profile::offline_external_functions): Add missing newline in dump. (afdo_propagate_edge): If annotated BB or edge has too small count bump it up to mitigate profile imprecisions caused by vectorizer. (afdo_propagate): Increase number of iterations and fix dump
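A simplified sketch of the two bump-ups the afdo_propagate_edge ChangeLog entry describes: a block count that is too small for its (known) incoming edges, and a single annotated exit edge whose count is below its block's. Counts are plain unsigned integers here and both function names are hypothetical; GCC's real code works on its own profile-count types and differs in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Flow conservation needs the block count to be at least the sum of its
   incoming edge counts; if it is smaller, raise it to that sum.  */
static unsigned long long
bump_block_count (unsigned long long bb_count,
                  const unsigned long long *in_edges, size_t n_in)
{
  unsigned long long sum = 0;
  for (size_t i = 0; i < n_in; i++)
    sum += in_edges[i];
  return sum > bb_count ? sum : bb_count;
}

/* A block with a single exit must pass its whole count through that
   edge, so a smaller annotated edge count is raised to the block count.  */
static unsigned long long
bump_single_exit (unsigned long long bb_count, unsigned long long exit_count)
{
  return exit_count < bb_count ? bb_count : exit_count;
}
```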
11 days | Improve diagnostics of mismatched discriminators in auto-profile | Jan Hubicka | 1 | -48/+78
We are missing discriminator info in auto-profiles, for example in exchange2. I am not sure why, since I see the info still present in dwarf2out, so it may be a bug on the create_gcov side. This patch makes the workaround output better diagnostics (to actually show the source location). This needs promotion of location info through the inline stack API, so I turned it from a pair into an actual structure. Overall I think pairs are overused in this source and make it harder to read. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * auto-profile.cc (struct decl_lineno): Turn to structure; add location. (dump_inline_stack): Update. (get_inline_stack): Update. (get_relative_location_for_locus): Fixup formatting. (function_instance::get_function_instance_by_decl): Add LOCATION parameter; improve dumping. (autofdo_source_profile::get_callsite_total_count): Improve dumping; update. (walk_block): Update. (autofdo_source_profile::offline_unrealized_inlines): Update. (autofdo_source_profile::get_count_info): Update.
11 days | x86: Preserve frame pointer for no_callee_saved_registers attribute | H.J. Lu | 27 | -114/+251
Update functions with the no_callee_saved_registers/preserve_none attribute to preserve the frame pointer, since the caller may use it to save the current stack:

    pushq %rbp
    movq %rsp, %rbp
    ...
    call function
    ...
    leave
    ret

If the callee changes the frame pointer without restoring it, the caller will fail to restore its stack after the callee returns, because LEAVE does

    mov %rbp, %rsp
    pop %rbp

The corrupted frame pointer will corrupt the stack pointer in the caller. There are no regressions on Linux/x86-64. Also tested with https://github.com/python/cpython configured with "./configure --with-tail-call-interp". gcc/ PR target/120840 * config/i386/i386-expand.cc (ix86_expand_call): Don't mark hard frame pointer as clobber. * config/i386/i386-options.cc (ix86_set_func_type): Use TYPE_NO_CALLEE_SAVED_REGISTERS instead of TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP. * config/i386/i386.cc (ix86_function_ok_for_sibcall): Remove the TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP check. (ix86_save_reg): Merge TYPE_NO_CALLEE_SAVED_REGISTERS and TYPE_PRESERVE_NONE with TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP. * config/i386/i386.h (call_saved_registers_type): Remove TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP. * doc/extend.texi: Update no_callee_saved_registers documentation. gcc/testsuite/ PR target/120840 * gcc.target/i386/no-callee-saved-1.c: Updated. * gcc.target/i386/no-callee-saved-2.c: Likewise. * gcc.target/i386/no-callee-saved-7.c: Likewise. * gcc.target/i386/no-callee-saved-8.c: Likewise. * gcc.target/i386/no-callee-saved-9.c: Likewise. * gcc.target/i386/no-callee-saved-10.c: Likewise. * gcc.target/i386/no-callee-saved-18.c: Likewise. * gcc.target/i386/no-callee-saved-19a.c: Likewise. * gcc.target/i386/no-callee-saved-19c.c: Likewise. * gcc.target/i386/no-callee-saved-19d.c: Likewise. * gcc.target/i386/pr119784a.c: Likewise. * gcc.target/i386/preserve-none-6.c: Likewise. * gcc.target/i386/preserve-none-7.c: Likewise. * gcc.target/i386/preserve-none-12.c: Likewise. * gcc.target/i386/preserve-none-13.c: Likewise.
* gcc.target/i386/preserve-none-14.c: Likewise. * gcc.target/i386/preserve-none-15.c: Likewise. * gcc.target/i386/preserve-none-23.c: Likewise. * gcc.target/i386/pr120840-1a.c: New test. * gcc.target/i386/pr120840-1b.c: Likewise. * gcc.target/i386/pr120840-1c.c: Likewise. * gcc.target/i386/pr120840-1d.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
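A toy simulation of why the frame pointer matters, assuming the caller sequence shown in this commit message: it models only %rsp, %rbp and a small stack, implements LEAVE as mov %rbp, %rsp; pop %rbp, and compares a callee that preserves %rbp with one that clobbers it. All names are hypothetical, and CALL's return-address push and RET's pop are omitted because they cancel out.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine state: the two registers LEAVE touches and a small
   downward-growing stack (stack[i] models address i * 8).  */
struct toy_cpu
{
  uint64_t rsp, rbp;
  uint64_t stack[64];
};

static void push (struct toy_cpu *c, uint64_t v) { c->rsp -= 8; c->stack[c->rsp / 8] = v; }
static uint64_t pop (struct toy_cpu *c) { uint64_t v = c->stack[c->rsp / 8]; c->rsp += 8; return v; }

/* LEAVE is "mov %rbp, %rsp; pop %rbp".  */
static void leave (struct toy_cpu *c) { c->rsp = c->rbp; c->rbp = pop (c); }

/* Caller: pushq %rbp; movq %rsp, %rbp; allocate locals; call; leave.
   Returns the caller's %rsp after LEAVE.  */
static uint64_t
run_caller (int callee_clobbers_rbp)
{
  struct toy_cpu c = { .rsp = 64 * 8, .rbp = 0x1234 };
  push (&c, c.rbp);
  c.rbp = c.rsp;
  c.rsp -= 32;                /* local frame */
  if (callee_clobbers_rbp)
    c.rbp = c.rsp - 16;       /* callee changes %rbp and never restores it */
  leave (&c);
  return c.rsp;
}
```

With a well-behaved callee, %rsp returns to its value before the prologue; with the clobbering callee, LEAVE restores %rsp from the bogus %rbp and the caller's stack pointer ends up elsewhere.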
11 days | RISC-V: Refactor the function bitmap_union_of_preds_with_entry | Jin Ma | 1 | -22/+19
The current implementation of this function is somewhat difficult to understand, as it uses a direct break statement within the for loop, rendering the loop meaningless. Additionally, during the Coverity check on the for loop, a warning appeared: "unreachable: Since the loop increment ix++; is unreachable, the loop body will never execute more than once." Therefore, I have made some simple refactoring to address these issues. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): Refactor. Signed-off-by: Jin Ma <jinma@linux.alibaba.com>
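A generic illustration of the Coverity complaint, not the actual riscv-vsetvl.cc code: when a for loop's body ends in an unconditional break, the increment can never run, so the loop is equivalent to a single guarded step. Both function names are hypothetical.

```c
#include <assert.h>

/* "Before" shape: the body always breaks, so ix++ is unreachable.  */
static int
first_or_default_loop (const int *v, int n, int dflt)
{
  int result = dflt;
  for (int ix = 0; ix < n; ix++)
    {
      result = v[ix];
      break;
    }
  return result;
}

/* Equivalent "after" shape without the pseudo-loop.  */
static int
first_or_default (const int *v, int n, int dflt)
{
  return n > 0 ? v[0] : dflt;
}
```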
11 days | RISC-V: Add pipeline-checker script | Kito Cheng | 1 | -0/+191
Pipeline checker utility for the RISC-V architecture that validates processor pipeline models. This tool analyzes machine description files to ensure all instruction types are properly handled by pipeline scheduling models. I wrote this tool while implementing the vector pipeline model for SiFive cores, because it was hard to find which instruction types were not handled by the pipeline scheduling models. This tool helps me find out which instruction types are not handled, so I can fix them. I think it may be useful for other RISC-V core developers too, so I decided to upstream it :) Usage:
```
./pipeline-checker <your-pipeline-model>
```
Example:
```
$ ./pipeline-checker sifive-7.md
Error: Some types are not consumed by the pipemodel
Missing types: {'vfclass', 'vimovxv', 'vmov', 'rdfrm', 'wrfrm', 'ghost', 'wrvxrm', 'crypto', 'vwsll', 'vfmovfv', 'vimovvx', 'sf_vc', 'vfmovvf', 'sf_vc_se', 'rdvlenb', 'vbrev', 'vrev8', 'sf_vqmacc', 'sf_vfnrclip', 'vsetvl_pre', 'rdvl', 'vsetvl'}
```
gcc/ChangeLog: * config/riscv/pipeline-checker: New file.
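The core check can be sketched as a set difference between all declared instruction types and the types consumed by the pipeline model's reservations. The real tool is a script that parses the machine description files (its output looks like a Python set); this C sketch, with the hypothetical report_missing_types, only shows the comparison step and assumes both name lists have already been extracted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print every type in ALL that does not appear in CONSUMED and return
   how many were missing.  Quadratic, which is fine for a few hundred
   type names.  */
static size_t
report_missing_types (const char *const *all, size_t n_all,
                      const char *const *consumed, size_t n_consumed)
{
  size_t missing = 0;
  for (size_t i = 0; i < n_all; i++)
    {
      int found = 0;
      for (size_t j = 0; j < n_consumed && !found; j++)
        found = strcmp (all[i], consumed[j]) == 0;
      if (!found)
        {
          if (missing++ == 0)
            printf ("Missing types:");
          printf (" %s", all[i]);
        }
    }
  if (missing)
    printf ("\n");
  return missing;
}
```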
11 days | Daily bump. | GCC Administrator | 4 | -1/+95
11 days | [PR modula2/117203] Followup add Delete procedure function | Gaius Mulley | 9 | -13/+428
This patch provides GetFileName procedure function for FIO.File, FileSystem.File and IOChan.ChanId. The return result from these procedures can be passed into StringFileSysOp.Unlink to complete the required delete. gcc/m2/ChangeLog: PR modula2/117203 * gm2-libs-log/FileSystem.def (GetFileName): New procedure function. (WriteString): New procedure. * gm2-libs-log/FileSystem.mod (GetFileName): New procedure function. (WriteString): New procedure. * gm2-libs/SFIO.def (GetFileName): New procedure function. * gm2-libs/SFIO.mod (GetFileName): New procedure function. * gm2-libs-iso/IOChanUtils.def: New file. * gm2-libs-iso/IOChanUtils.mod: New file. libgm2/ChangeLog: PR modula2/117203 * libm2iso/Makefile.am (M2DEFS): Add IOChanUtils.def. (M2MODS): Add IOChanUtils.mod. * libm2iso/Makefile.in: Regenerate. gcc/testsuite/ChangeLog: PR modula2/117203 * gm2/isolib/run/pass/testdelete2.mod: New test. * gm2/pimlib/logitech/run/pass/testdelete2.mod: New test. * gm2/pimlib/run/pass/testdelete.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
11 days | shrink_wrap_separate_check_lea.c: Scan lea(l|q) | H.J. Lu | 1 | -1/+1
Scan "lea(l|q)", instead of "leaq", to support x32. * gcc.target/i386/shrink_wrap_separate_check_lea.c: Scan "lea(l|q)", instead of "leaq". Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
11 days | cobol: Normalize generating and using function_decls. | Robert Dubner | 8 | -416/+569
Because COBOL doesn't require function prototypes, it is possible to, for example, CALL "getcwd" USING <parameters> and then later CALL "getcwd" USING <parameters> RETURNING <alphanumeric> The second call "knows" that the return value is a char*, but the first one does not. So, the first one gets a default return value type of SSIZE_t, which later needs to be replaced with CHAR_P. These [all too] extensive changes ensure that all references to a particular function use the same function_decl, and take measures to make sure that one function_decl is back-modified, if necessary, with the best return value type. gcc/cobol/ChangeLog: * Make-lang.in: Incorporate gcobol.clean. * except.cc (cbl_enabled_exceptions_t::dump): Update debug message. * genapi.cc (gg_attribute_bit_get): Formatting. (file_static_variable): Formatting. (trace1_init): Formatting. (build_main_that_calls_something): Normalize function_decl use. (parser_call_target): Likewise. (set_call_convention): Likewise. (parser_call_target_convention): Likewise. (parser_call_targets_dump): Likewise. (function_handle_from_name): Likewise. (function_pointer_from_name): Likewise. (parser_initialize_programs): Likewise. (parser_statement_begin): Formatting. (parser_leave_file): Use function_decl FIFO. (enter_program_common): Normalize function_decl use. (parser_enter_program): Normalize function_decl use. (tree_type_from_field_type): Normalize function_decl use. (is_valuable): Comment. (pe_stuff): Change name to program_end_stuff. (program_end_stuff): Likewise. (parser_exit): Likewise. (parser_division): Normalize function_decl use. (create_and_call): Normalize function_decl use. (parser_call): Normalize function_decl use. (parser_set_pointers): Normalize function_decl use. (parser_program_hierarchy): Normalize function_decl use. (psa_FldLiteralA): Defeat attempt to re-use literals. (Fails on some aarch64). (parser_symbol_add): Error message formatting. * genapi.h: Formatting. 
* gengen.cc (struct cbl_translation_unit_t): Add function_decl FIFO. (show_type): Rename to gg_show_type. (gg_show_type): Correct an error message. (gg_assign): Formatting; change error handling. (gg_modify_function_type): Normalize function_decl use. (gg_define_function_with_no_parameters): Fold into gg_define_function(). (function_decl_key): Normalize function_decl use. (gg_peek_fn_decl): Normalize function_decl use. (gg_build_fn_decl): Normalize function_decl use. (gg_define_function): Normalize function_decl use. (gg_tack_on_function_parameters): Remove. (gg_finalize_function): Normalize function_decl use. (gg_leaving_the_source_code_file): Normalize function_decl use. (gg_call_expr_list): Normalize function_decl use. (gg_trans_unit_var_decl): Normalize function_decl use. (gg_insert_into_assemblerf): New function; formatting. * gengen.h (struct gg_function_t): Eliminate "is_truly_nested" flag. (gg_assign): Incorporate return value. (gg_define_function): Normalize function_decl use. (gg_define_function_with_no_parameters): Eliminate. (gg_build_fn_decl): Normalize function_decl use. (gg_peek_fn_decl): Normalize function_decl use. (gg_modify_function_type): Normalize function_decl use. (gg_call_expr_list): Normalize function_decl use. (gg_get_function_decl): Normalize function_decl use. (location_from_lineno): Prefix with "extern". (gg_open): Likewise. (gg_close): Likewise. (gg_get_indirect_reference): Likewise. (gg_insert_into_assembler): Likewise. (gg_insert_into_assemblerf): Likewise. (gg_show_type): New declaration. (gg_leaving_the_source_code_file): New declaration. * parse.y: Format debugging message. * parse_ante.h: Normalize function_decl use.
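A minimal sketch of the "one declaration per called name" idea, with hypothetical names and a two-value enum standing in for real GCC trees: every CALL site goes through the same lookup, the first use creates an entry with the default return type, and a later CALL ... RETURNING upgrades that shared entry in place, so earlier references see the better type too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return types: the default a bare CALL implies, and the
   char* that a later RETURNING <alphanumeric> reveals.  */
enum ret_type { RET_DEFAULT_SSIZE, RET_CHAR_P };

struct fn_entry
{
  char name[32];
  enum ret_type ret;
};

static struct fn_entry table[16];
static size_t n_entries;

/* Return the single shared entry for NAME, creating it on first use.
   A call site with RETURNING back-modifies the entry's return type.  */
static struct fn_entry *
lookup_fn (const char *name, int has_returning, enum ret_type ret)
{
  struct fn_entry *e = NULL;
  for (size_t i = 0; i < n_entries && e == NULL; i++)
    if (strcmp (table[i].name, name) == 0)
      e = &table[i];
  if (e == NULL)
    {
      assert (n_entries < sizeof table / sizeof table[0]);
      assert (strlen (name) < sizeof table[0].name);
      e = &table[n_entries++];
      strcpy (e->name, name);
      e->ret = RET_DEFAULT_SSIZE;
    }
  if (has_returning)
    e->ret = ret;
  return e;
}
```

Using the commit's getcwd example, a bare CALL "getcwd" followed by CALL "getcwd" ... RETURNING yields the same entry both times, now typed as char*.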
12 days | Daily bump. | GCC Administrator | 5 | -1/+86
12 days | Add "void debug (tree)" | H.J. Lu | 2 | -0/+7
Add "void debug (tree)" to support: (gdb) call debug (expr) <parm_decl 0x7fffe9810bb0 f type <record_type 0x7fffe99cec78 c BLK size <integer_cst 0x7fffe98242d0 constant 256> unit-size <integer_cst 0x7fffe98243a8 constant 32> user align:256 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffe99cebd0 fields <field_decl 0x7fffe98318c0 a type <real_type 0x7fffe982a3f0 long double> XF x.c:2:15 size <integer_cst 0x7fffe9802fa8 constant 128> unit-size <integer_cst 0x7fffe9802fc0 constant 16> align:128 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 1 offset <integer_cst 0x7fffe9802f90 constant 0> bit-offset <integer_cst 0x7fffe9802fd8 constant 0> context <record_type 0x7fffe99cebd0> chain <field_decl 0x7fffe9831960 b>>> used read BLK x.c:7:6 size <integer_cst 0x7fffe98242d0 256> unit-size <integer_cst 0x7fffe98243a8 32> align:256 warn_if_not_align:0 context <function_decl 0x7fffe99d2900 e> arg-type <record_type 0x7fffe99cec78 c>> (gdb) PR debug/120849 * print-tree.cc (debug): New. * print-tree.h (debug): Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
12 days | Fix compilation of concatenation with illegal character constant | Eric Botcazou | 2 | -2/+18
This fixes an error recovery issue, whereby the compilation of a string concatenation with an illegal character constant hangs. gcc/ada/ PR ada/120854 * sem_eval.adb (Get_String_Val): Be prepared for an integer literal after a serious error is detected, and raise PE on other nodes. gcc/testsuite/ * gnat.dg/concat6.adb: New test.
12 days | c++/modules: Make bitfield storage unit detection more robust | Nathaniel Shead | 1 | -5/+12
Modules streaming needs to handle these differently from other unnamed FIELD_DECLs that are streamed for internal RECORD_DECLs, and there doesn't seem to be a good way to detect this case otherwise. This matters only to allow for compiler-generated type definitions that build FIELD_DECLs with no name, as otherwise they get confused. Currently the only such types left I hadn't earlier fixed by giving names to are contextless, for which we have an early check to mark their fields as MK_unique anyway, but there may be other cases in the future. gcc/cp/ChangeLog: * module.cc (trees_out::walking_bit_field_unit): New flag. (trees_out::trees_out): Initialize it. (trees_out::core_vals): Set it. (trees_out::get_merge_kind): Use it, move previous ad-hoc check into assertion. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
12 days | c++/modules: Ensure type of partial spec VAR_DECL is consistent with its template [PR120644] | Nathaniel Shead | 6 | -5/+38
We were erroring because the TEMPLATE_DECL of the existing partial specialisation has an undeduced return type, but the imported declaration did not. The root cause is similar to what was fixed in r13-2744-g4fac53d6522189, where modules streaming code assumes that a TEMPLATE_DECL and its DECL_TEMPLATE_RESULT will always have the same TREE_TYPE. That commit fixed the issue by ensuring that when the type of a variable is deduced the TEMPLATE_DECL is updated as well, but missed handling partial specialisations. This patch ensures that the same adjustment is made there as well. PR c++/120644 gcc/cp/ChangeLog: * decl.cc (cp_finish_decl): Also propagate type to partial templates. * module.cc (trees_out::decl_value): Add assertion that the TREE_TYPE of a streamed template decl matches its inner. (trees_in::is_matching_decl): Clarify function return type deduction should only occur for non-TEMPLATE_DECL. * pt.cc (template_for_substitution): Handle partial specs. gcc/testsuite/ChangeLog: * g++.dg/modules/auto-7.h: New test. * g++.dg/modules/auto-7_a.H: New test. * g++.dg/modules/auto-7_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
12 days | AVR: target/120856 - Deny R24:DI in avr_hard_regno_mode_ok with Reload. | Georg-Johann Lay | 1 | -1/+1
This fixes an ICE with -mno-lra when split2 tries to split the following zero_extendsidi2 insn: (set (reg:DI 24) (zero_extend:DI (reg:SI **))) The ICE is because avr_hard_regno_mode_ok allows R24:DI but disallows R28:SI when Reload is used. R28:SI is a result of zero_extendsidi2. This ICE only occurs with Reload (which will die before very long), but it occurs when building libgcc. gcc/ PR target/120856 * config/avr/avr.cc (avr_hard_regno_mode_ok) [-mno-lra]: Deny hard regs >= 4 bytes that overlap Y.
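A sketch of the overlap rule in the ChangeLog entry, assuming AVR's Y pointer is the register pair R28:R29 and that a value of SIZE bytes in hard register REGNO occupies REGNO through REGNO + SIZE - 1. overlaps_y and reload_mode_ok are hypothetical helpers that model only this new rule, not the rest of avr_hard_regno_mode_ok.

```c
#include <assert.h>
#include <stdbool.h>

/* Y is the register pair R28:R29.  */
enum { REG_Y = 28 };

/* Does [REGNO, REGNO + SIZE - 1] intersect [R28, R29]?  */
static bool
overlaps_y (int regno, int size)
{
  return regno <= REG_Y + 1 && regno + size > REG_Y;
}

/* With old Reload (-mno-lra), deny values of 4 bytes or more that
   overlap Y; under LRA this sketch applies no restriction.  */
static bool
reload_mode_ok (int regno, int size, bool using_lra)
{
  if (!using_lra && size >= 4 && overlaps_y (regno, size))
    return false;
  return true;
}
```

Under these assumptions, R24:DI (R24 to R31) and R28:SI are rejected with Reload, while R24:SI (R24 to R27) is still allowed.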
12 days | Relax the testcase check for Solaris [PR120818] | Lili Cui | 1 | -2/+1
gcc/testsuite/ChangeLog: PR target/120818 * g++.target/i386/shrink_wrap_separate.C: Relax the check.
13 days | Fix handling of dwarf name and duplicated names | Jan Hubicka | 4 | -154/+360
I have tested Kugan's patch on exchange2 and noticed multiple problems: 1) with LTO the translation from dwarf names to symbol names is disabled since we free lang data sooner. I moved the offline pass upstream, which however also may make us miss clones introduced between free lang data and annotation. This is not very important right now and may be further fixed by splitting off auto-profile-read and offline passes. 2) I noticed that we miss a lot of AFDO inlines because some code compares name indexes for equality in the belief that it compares symbol names. This is not true if we drop prefixes. For this reason I integrated get_original_name into the renaming machinery, which actually updates indexes so the string table continues to work as a symbol table. This lets me drop afdo_string_table->get_index (afdo_string_table->get_name (other->name ())) hops that were introduced at some places. Now after renaming all afdo instances should go by DECL_ASSEMBLER_NAME names. 3) Detection of realized offline instances had an ordering issue where we omitted marking of those that were offlined later. Since we can now look up assembler names, I simplified the logic into a single pass. autoprofiledbootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * auto-profile.cc (get_original_name): Only strip suffixes introduced after auto-fdo annotation. (string_table::get_index_by_decl): Simplify. (string_table::add_name): New member function. (string_table::read): Micro-optimize allocation. (function_instance::get_function_instance_by_decl): Dump reasons for failure; try to compensate lost discriminators. (function_instance::merge): Simplify sanity check; do not check for realized flag; fix merging of targets. (function_instance::offline_if_in_set): Simplify. (function_instance::dump): Sanity check that names are consistent. (autofdo_source_profile::offline_external_functions): Also handle stripping suffixes. (walk_block): Move up in source. 
(autofdo_source_profile::offline_unrealized_inlines): Also compute realized functions. (autofdo_source_profile::get_function_instance_by_name_index): Simplify. (autofdo_source_profile::add_function_instance): Simplify. (autofdo_source_profile::read): Do not strip suffixes; error on duplicates. (mark_realized_functions): Remove. (auto_profile): Do not call mark_realized_functions. * passes.def: Move auto_profile_offline before free_lang_data. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/clone-test.c: New test. * gcc.dg/tree-prof/clone-merge-1.c: Update template. Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
13 days | Daily bump. | GCC Administrator | 8 | -1/+653
13 days | c++: fix ICE with [[deprecated]] [PR120756] | Marek Polacek | 2 | -1/+15
Here we end up with "error reporting routines re-entered" because resolve_nondeduced_context isn't passing complain to mark_used. PR c++/120756 gcc/cp/ChangeLog: * pt.cc (resolve_nondeduced_context): Pass complain to mark_used. gcc/testsuite/ChangeLog: * g++.dg/warn/deprecated-22.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
13 days | testsuite: adjust for implicit constexpr | Jason Merrill | 3 | -3/+3
Jakub's constexpr virtual base patch allowed -fimplicit-constexpr to interfere with these tests. * g++.dg/abi/mangle81.C: Add -fno-implicit-constexpr. * g++.dg/init/vbase1.C: Likewise. * g++.dg/ipa/ipa-icf-4.C: Likewise.
13 days | Fix misoptimization of CONSTRUCTOR with reverse SSO | Eric Botcazou | 3 | -6/+38
fold_ctor_reference already punts on a CONSTRUCTOR whose type has reverse storage order, but it can be invoked in a couple of places on a CONSTRUCTOR with native storage order that has been wrapped in a VIEW_CONVERT_EXPR to a type with reverse storage order; this would require a post-adjustment that does not currently exist, so wrong code is generated for this admittedly quite pathological (but supported) case. gcc/ * gimple-fold.cc (fold_const_aggregate_ref_1) <COMPONENT_REF>: Bail out immediately if the reference has reverse storage order. * tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Likewise. gcc/testsuite/ * gnat.dg/sso20.adb: New test.
13 days | c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777] | Jakub Jelinek | 18 | -49/+506
The following patch implements the C++26 P3533R2 - constexpr virtual inheritance paper. The changes include not rejecting it for C++26, tweaking the error wording to show that it is valid in C++26, adjusting synthesized_method_walk not to make synthesized cdtors non-constexpr just because of virtual base classes in C++26, and various tweaks in constexpr.cc so that it can deal with the expressions used for virtual base member accesses or cdtor calls which need __in_chrg and/or __vtt_parm arguments to be passed in some cases implicitly when they aren't passed explicitly. There are also dynamic_cast constant evaluation tweaks so that it also handles expressions with types with virtual bases. 2025-06-27 Jakub Jelinek <jakub@redhat.com> PR c++/120777 gcc/ * gimple-fold.cc (gimple_get_virt_method_for_vtable): Revert 2018-09-18 changes. gcc/c-family/ * c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_constexpr_virtual_inheritance=202506L for C++26. gcc/cp/ * constexpr.cc: Implement C++26 P3533R2 - constexpr virtual inheritance. (is_valid_constexpr_fn): Don't reject constexpr cdtors in classes with virtual bases for C++26, adjust error wording. (cxx_bind_parameters_in_call): Add ORIG_FUN argument, add values for __in_chrg and __vtt_parm arguments when needed. (cxx_eval_dynamic_cast_fn): Adjust function comment, HINT -1 should be possible. For C++26 if obj is cast from POINTER_PLUS_EXPR, attempt to use cxx_fold_indirect_ref to simplify it and if successful, build ADDR_EXPR of that. (cxx_eval_call_expression): Add orig_fun variable, set it to fun before looking through clones, pass it to cxx_bind_parameters_in_call. (reduced_constant_expression_p): Add SZ argument, pass DECL_SIZE of FIELD_DECL e.index to recursive calls and don't return false if SZ is non-NULL and there are unfilled fields with bit position at or above SZ. (cxx_fold_indirect_ref_1): Handle reading of vtables using ptrdiff_t dynamic type instead of some pointer type. 
Set el_sz to DECL_SIZE_UNIT value rather than TYPE_SIZE_UNIT of DECL_FIELD_IS_BASE fields in classes with virtual bases. (cxx_fold_indirect_ref): In canonicalize_obj_off lambda look through COMPONENT_REFs with DECL_FIELD_IS_BASE in classes with virtual bases and adjust off correspondingly. Remove assertion that off is integer_zerop, pass tree_to_uhwi (off) instead of 0 to the cxx_fold_indirect_ref_1 call. * cp-tree.h (publicly_virtually_derived_p): Declare. (reduced_constant_expression_p): Add another tree argument defaulted to NULL_TREE. * method.cc (synthesized_method_walk): Don't clear *constexpr_p if there are virtual bases for C++26. * class.cc (build_base_path): Compute fixed_type_p and virtual_access before checks for build_simple_base_path instead of after that and conditional cp_build_addr_expr. Use build_simple_path if !virtual_access even when v_binfo is non-NULL. (layout_virtual_bases): For build_base_field calls use access_public_node rather than access_private_node if publicly_virtually_derived_p. (build_vtbl_initializer): Revert 2018-09-18 and 2018-12-11 changes. (publicly_virtually_derived_p): New function. gcc/testsuite/ * g++.dg/cpp26/constexpr-virt-inherit1.C: New test. * g++.dg/cpp26/constexpr-virt-inherit2.C: New test. * g++.dg/cpp26/constexpr-virt-inherit3.C: New test. * g++.dg/cpp26/feat-cxx26.C: Add __cpp_constexpr_virtual_inheritance tests. * g++.dg/cpp2a/constexpr-dtor3.C: Don't expect one error for C++26. * g++.dg/cpp2a/constexpr-dtor16.C: Don't expect errors for C++26. * g++.dg/cpp2a/constexpr-dynamic10.C: Likewise. * g++.dg/cpp0x/constexpr-ice21.C: Likewise. * g++.dg/cpp0x/constexpr-ice4.C: Likewise. * g++.dg/abi/mangle1.C: Guard the test on c++23_down. * g++.dg/abi/mangle81.C: New test. * g++.dg/ipa/ipa-icf-4.C (A::A): For __cpp_constexpr_virtual_inheritance >= 202506L add user provided non-constexpr constructor.
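A minimal sketch of the kind of code P3533R2 makes valid, guarded by the feature-test macro this patch predefines for C++26. The class names are made up for illustration; with a compiler or language mode that lacks the feature, only the fallback branch is compiled, so the example builds either way.

```cpp
#include <cassert>

// __cpp_constexpr_virtual_inheritance is predefined as 202506L for C++26.
#if defined(__cpp_constexpr_virtual_inheritance) && \
    __cpp_constexpr_virtual_inheritance >= 202506L
struct Base {
  int value;
  constexpr Base(int v) : value(v) {}
};

// constexpr constructors in classes with virtual bases: rejected before C++26.
struct Left : virtual Base {
  constexpr Left() : Base(1) {}
};
struct Right : virtual Base {
  constexpr Right() : Base(2) {}
};

// The most-derived class constructs the single shared Base subobject, so
// the Base initializers in Left and Right are skipped when building Diamond.
struct Diamond : Left, Right {
  constexpr Diamond() : Base(42) {}
};

constexpr int diamond_value() {
  Diamond d;
  return d.value;  // member access through a virtual base during constant evaluation
}
static_assert(diamond_value() == 42, "evaluated at compile time");
constexpr bool have_constexpr_virtual_bases = true;
#else
constexpr bool have_constexpr_virtual_bases = false;
#endif
```

Keying the guard on the feature-test macro rather than on __cplusplus avoids assuming that every C++26-mode compiler already implements the paper.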
13 daysFortran: follow-up fix to checking of renamed-on-use interface name [PR120784]Harald Anlauf2-1/+38
Commit r16-1633 introduced a regression for imported interfaces that were not renamed-on-use, since the related logic did not take into account that the absence of renaming could be represented by an empty string. PR fortran/120784 gcc/fortran/ChangeLog: * interface.cc (gfc_match_end_interface): Detect empty local_name. gcc/testsuite/ChangeLog: * gfortran.dg/interface_63.f90: Extend testcase.
13 daysc++: fix decltype_p handling for binary expressionsJason Merrill2-0/+17
With Jakub's constexpr virtual base patch, 23_containers/vector/bool/cmp_c++20.cc failed the assert I'm adding to fixed_type_or_null, meaning that it returned the wrong value. Let's fix the result as well as adding the assert, and fix cp_parser_binary_expression to properly wrap any class-type calls in the operands in TARGET_EXPR even within a decltype so we don't hit the assert. gcc/cp/ChangeLog: * class.cc (fixed_type_or_null): Handle class-type CALL_EXPR. * parser.cc (cp_parser_binary_expression): Fix decltype_p handling.
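For illustration, here is the shape of expression the parser fix concerns: a binary expression inside decltype whose operands are class-type calls. Only a call that is itself the whole decltype operand is exempt from temporary materialization; the operands of `==` are not, which is why they still need wrapping. The names are hypothetical, and this sketches the language rule rather than GCC's internals.

```cpp
#include <cassert>
#include <type_traits>

struct Flag {
  bool on;
  // Member operator== on a class type: both operands are class prvalues
  // that must be materialized as temporary objects.
  bool operator==(const Flag& other) const { return on == other.on; }
};

Flag make_flag(bool on) { return Flag{on}; }

// Unevaluated operand: only the type of the comparison is computed. The two
// make_flag() calls are operands of ==, not the whole operand of decltype,
// so they still denote temporaries.
using CmpResult = decltype(make_flag(true) == make_flag(false));
static_assert(std::is_same<CmpResult, bool>::value, "== on Flag yields bool");

bool flags_equal(bool a, bool b) { return make_flag(a) == make_flag(b); }
```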
13 daysc++/modules: Avoid name clashes when streaming internal labels ↵Nathaniel Shead13-23/+188
[PR98375,PR118904] The frontend creates some variables that need to be given unique names for the TU so that they can unambiguously be accessed. Historically this has been done with a global counter local to each place that needs an internal label, but this doesn't work with modules as depending on what declarations have been imported, some counter values may have already been used. This patch reworks the situation to instead have a single collection of counters for the TU, and a new function 'generate_internal_label' that gets the next label with given prefix using that counter. Modules streaming can then use this function to regenerate new names on stream-in for any such decls, guaranteeing uniqueness within the TU. These labels should only be used for internal entities so there should be no issues with the names differing from TU to TU; we will need to handle this if we ever start checking ODR of definitions we're merging but that's an issue for later. For proof of concept, this patch makes use of the new API for __builtin_source_location and ubsan; there are probably other places in the frontend where this change will need to be made as well. One other change this exposes is that both of these components rely on the definition of the VAR_DECLs they create, so stream that too for uncontexted variables. PR c++/98735 PR c++/118904 gcc/cp/ChangeLog: * cp-gimplify.cc (source_location_id): Remove. (fold_builtin_source_location): Use generate_internal_label. * module.cc (enum tree_tag): Add 'tt_internal_id' enumerator. (trees_out::tree_value): Adjust assertion, write definitions of uncontexted VAR_DECLs. (trees_in::tree_value): Read variable definitions. (trees_out::tree_node): Write internal labels, adjust assert. (trees_in::tree_node): Read internal labels. gcc/ChangeLog: * tree.cc (struct identifier_hash): New type. (struct identifier_count_traits): New traits. (internal_label_nums): New hash map. (generate_internal_label): New function. 
(prefix_for_internal_label): New function. * tree.h (IDENTIFIER_INTERNAL_P): New macro. (generate_internal_label): Declare. (prefix_for_internal_label): Declare. * ubsan.cc (ubsan_ids): Remove. (ubsan_type_descriptor): Use generate_internal_label. (ubsan_create_data): Likewise. gcc/testsuite/ChangeLog: * g++.dg/modules/src-loc-1.h: New test. * g++.dg/modules/src-loc-1_a.H: New test. * g++.dg/modules/src-loc-1_b.C: New test. * g++.dg/modules/src-loc-1_c.C: New test. * g++.dg/modules/ubsan-1_a.C: New test. * g++.dg/modules/ubsan-1_b.C: New test. * g++.dg/ubsan/module-1-aux.cc: New test. * g++.dg/ubsan/module-1.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
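A toy model of the scheme described above, assuming a hypothetical `<prefix>.<n>` spelling (GCC's real label format may differ). One table of counters per TU is keyed by prefix, and a label read back from a module is not reused as-is but regenerated from its prefix, so it cannot collide with labels this TU already produced. The class and function names only loosely mirror generate_internal_label and prefix_for_internal_label; they are not GCC's code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy per-TU label generator: one counter per prefix, shared by all clients.
class InternalLabels {
 public:
  // Hypothetical format "<prefix>.<n>".
  std::string generate(const std::string& prefix) {
    unsigned n = counters_[prefix]++;
    return prefix + "." + std::to_string(n);
  }
  // Recover the prefix so an imported label can be regenerated locally.
  static std::string prefix_of(const std::string& label) {
    return label.substr(0, label.rfind('.'));
  }

 private:
  std::unordered_map<std::string, unsigned> counters_;
};

// On stream-in, ignore the number chosen by the exporting TU and take the
// next free number for the same prefix in this TU.
std::string restream_label(InternalLabels& tu, const std::string& imported) {
  return tu.generate(InternalLabels::prefix_of(imported));
}
```

In the test below, an imported label that happens to reuse number 0 is renamed to the next free number instead of clashing.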
13 daysc++/modules: Support streaming new size cookie for constexpr [PR120040]Nathaniel Shead5-2/+47
The type built by build_new_constexpr_heap_type currently has a TYPE_NAME that is an IDENTIFIER_NODE rather than a TYPE_DECL. Although the documentation indicates this is legal, it confuses modules streaming, which expects all RECORD_TYPEs to have a TYPE_DECL that is used to determine the context, merge key, etc. PR c++/120040 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression): Handle TYPE_NAME now being a TYPE_DECL rather than just an IDENTIFIER_NODE. * init.cc (build_new_constexpr_heap_type): Build a TYPE_DECL for the returned type; mark the type as artificial. * module.cc (trees_out::type_node): Add some assertions. gcc/testsuite/ChangeLog: * g++.dg/modules/pr120040_a.C: New test. * g++.dg/modules/pr120040_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
13 daysc++/modules: Implement streaming of uncontexted TYPE_DECLs [PR98735]Nathaniel Shead1-5/+97
Currently, most declarations must have a DECL_CONTEXT for modules streaming to behave correctly, so that they can have an appropriate merge key generated and be correctly deduplicated on import. There are a few exceptions, however, for internally generated declarations that will never be merged and don't necessarily have an appropriate parent to key off for the context. One case that's come up a few times is TYPE_DECLs, especially temporary RECORD_TYPEs used as intermediaries within expressions. Previously I've tried to give all such types a DECL_CONTEXT, but in some cases that has ended up being infeasible, such as with the types generated by UBSan (which are shared with the C frontend and don't know their context, especially when created at global scope). Additionally, these types often lack many of the parts that a normal struct declaration created by parsing user code would have, which confuses module streaming. Given that these types are typically intended to be one-off and unique anyway, this patch instead adds support for by-value streaming of uncontexted TYPE_DECLs. The patch only supports streaming the bare minimum set of fields needed for the cases I've come across so far; in general the preference should still be to ensure that DECL_CONTEXT is set where possible. PR c++/98735 PR c++/120040 gcc/cp/ChangeLog: * module.cc (trees_out::tree_value): Write TYPE_DECLs. (trees_in::tree_value): Read TYPE_DECLs. (trees_out::tree_node): Support uncontexted TYPE_DECLs, and ensure that all parts of a by-value decl are marked for streaming. (trees_out::get_merge_kind): Treat members of uncontexted types as always unique. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
13 daysFix afdo profiles for functions that were not early-inlinedJan Hubicka2-58/+339
This patch should finish the offlining infrastructure by offlining (prior to AFDO annotation) all inline function instances that were not early inlined. This is mostly the case for recursive inlining, or when -fno-auto-profile-inlining is used, which should now produce comparable code. I also cleaned up offlining of self-recursive functions, which now happens through the worklist; this reduces problems with recursive invocation of function merging modifying data structures at unexpected places. gcc/ChangeLog: * auto-profile.cc (function_instance::set_name, function_instance::set_realized, function_instance::realized_p, function_instance::set_in_worklist, function_instance::clear_in_worklist, function_instance::in_worklist_p): New member functions. (function_instance::in_worklist, function_instance::realized_): New. (get_relative_location_for_locus): Break out from .... (get_relative_location_for_stmt): ... here. (function_instance::~function_instance): Sanity check that removed function is not in worklist. (function_instance::merge): Do not offline realized instances. (function_instance::offline): Make private; add duplicate functions to worklist rather than merging immediately. (function_instance::offline_if_in_set): Cleanup. (function_instance::remove_external_functions): Likewise. (function_instance::offline_if_not_realized): New member function. (autofdo_source_profile::offline_external_functions): Handle delayed functions. (autofdo_source_profile::offline_unrealized_inlines): New member function. (walk_block): New function. (mark_realized_functions): New function. (afdo_annotate_cfg): Fix dump. (auto_profile): Mark realized functions and offline rest; do not compute fn summary. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/afdo-crossmodule-1.c: Update template.
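A toy model of the two-phase, worklist-based offlining described above. The `Instance` type, its `realized` flag, and the rule that callees of an offlined instance are offlined as well are simplifications made up for illustration; they are not auto-profile.cc's function_instance or its exact policy. The point is the ordering: unrealized inline instances are first detached, and only then merged into the offline profile of the same name, so no merge modifies a tree that is still being walked.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy profile node: a function body with a sample count and the callee
// instances that were inlined into it in the profiled binary.
struct Instance {
  std::string name;
  long samples = 0;
  bool realized = false;  // does this inline instance exist in the current IR?
  std::vector<Instance> inlined;
};

// Offline (top-level) profiles keyed by function name.
using Profile = std::map<std::string, Instance>;

void offline_unrealized(Profile& profile) {
  std::vector<Instance> worklist;

  // Phase 1: detach every unrealized inline instance, descending only through
  // realized ones. Nothing is merged yet, so the walk never sees its own tree
  // change (the hazard with self-recursive functions).
  std::function<void(Instance&)> detach = [&](Instance& inst) {
    std::vector<Instance> kept;
    for (Instance& callee : inst.inlined) {
      if (callee.realized) {
        detach(callee);
        kept.push_back(std::move(callee));
      } else {
        worklist.push_back(std::move(callee));
      }
    }
    inst.inlined = std::move(kept);
  };
  for (auto& entry : profile)
    detach(entry.second);

  // Phase 2: merge each detached instance into the offline profile of the same
  // name, creating it if needed. In this toy, its callees are offlined too.
  while (!worklist.empty()) {
    Instance inst = std::move(worklist.back());
    worklist.pop_back();
    Instance& target = profile[inst.name];
    target.name = inst.name;
    target.samples += inst.samples;
    for (Instance& callee : inst.inlined)
      worklist.push_back(std::move(callee));
  }
}
```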
13 daysAVR: target/113934 - Use LRA per default.Georg-Johann Lay1-2/+2
Now that the patches for PR120424 are upstream, the last known bug associated with avr+lra has been fixed: PR118591. So we can flip the switch that turns on LRA by default. This patch only sets -mlra by default. It doesn't do any Reload-related cleanup or removal from the avr backend, hence -mno-lra still works. The only new problem is that gcc.dg/torture/pr64088.c fails with LRA but not with Reload, though that test case is awkward since it has UB but expects the compiler to behave in a specific way, which avr-gcc doesn't do: PR116780. This patch also avoids a relatively recent ICE that breaks building libgcc: R24:DI is allowed per hard_regno_mode_ok, but R26:SI is disallowed for Reload for historical reasons. The outcome is that a split2 pattern for R24:DI = zero_extend:DI (R22:SI) runs into an ICE. AVR-LibC builds fine with this patch. The AVR-LibC testsuite passes without errors. gcc/ PR target/113934 * config/avr/avr.opt (-mlra): Turn on per default.
13 days[RISC-V][PR target/119971] Avoid losing shift count maskingJeff Law1-1/+1
Fix typo spotted by Bernhard Reutner-Fischer. PR target/119971 gcc/testsuite/ * gcc.target/riscv/pr119971.c: Fix typo.
13 daystree-optimization/120808 - SLP patterns with FMA/FMSRichard Biener2-26/+71
The following amends the SLP addsub pattern to also match blends of .FMA/.FMS and form .FMADDSUB even when -ffp-contract=off. PR tree-optimization/120808 * tree-vect-slp-patterns.cc (vect_match_expression_p): Take a code_helper and also match calls. (addsub_pattern::recognize): Handle .FMA/.FMS pairs in addition to PLUS/MINUS. (addsub_pattern::build): Adjust. * gcc.dg/vect/bb-slp-pr120808.c: Now also expect FMADDSUB patterns to be matched.
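A scalar reference for the lane-wise blend the SLP pattern now matches: even lanes take .FMS (a*b - c) and odd lanes take .FMA (a*b + c), which together form one .FMADDSUB. This is only an illustration of the semantics, not vectorizer code; `std::fma` is used because, like the internal functions, it rounds once, and writing the calls explicitly mirrors how .FMA/.FMS can appear even with -ffp-contract=off.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Even lanes: fused a*b - c (.FMS). Odd lanes: fused a*b + c (.FMA).
std::vector<double> fmaddsub(const std::vector<double>& a,
                             const std::vector<double>& b,
                             const std::vector<double>& c) {
  std::vector<double> r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    r[i] = (i % 2 == 0) ? std::fma(a[i], b[i], -c[i])  // .FMS lane
                        : std::fma(a[i], b[i], c[i]);  // .FMA lane
  return r;
}
```

With a = {1, 2, 3, 4}, b = {10, 10, 10, 10} and c = {1, 1, 1, 1}, the result alternates subtract/add: {9, 21, 29, 41}.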
14 daysFixup vector epilog analysis skipping when not using partial vectorsRichard Biener3-7/+36
The following avoids re-analyzing the loop as an epilogue when not using partial vectors and the mode is the same as the autodetected vector mode, since that mode has too high a VF for a non-predicated loop. This situation almost always occurs on x86, so the change saves us one re-analysis unless --param vect-partial-vector-usage is non-default. * tree-vectorizer.h (vect_chooses_same_modes_p): New overload. * tree-vect-stmts.cc (vect_chooses_same_modes_p): Likewise. * tree-vect-loop.cc (vect_analyze_loop): Prune epilogue analysis further when not using partial vectors.
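A simplified sketch of why that analysis is wasted, using hypothetical function names and ignoring peeling and other complications. Without partial vectors, an epilogue only ever sees the scalar remainder of the main loop, which is at most main_vf - 1 iterations, so an epilogue whose VF is not smaller than the main loop's can never execute a full vector iteration.

```cpp
#include <cassert>

// Iterations left for the epilogue after a main loop with VF main_vf.
unsigned remainder_iterations(unsigned niters, unsigned main_vf) {
  return niters % main_vf;
}

// Toy predicate: can a vector epilogue with VF epilogue_vf ever run?
bool epilogue_worth_analyzing(unsigned main_vf, unsigned epilogue_vf,
                              bool using_partial_vectors) {
  // A masked (predicated) epilogue can cover any remainder.
  if (using_partial_vectors)
    return true;
  // Otherwise it needs at least epilogue_vf iterations, but the remainder
  // is always < main_vf.
  return epilogue_vf < main_vf;
}
```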
14 daysFixup partial_vectors_supported_p useRichard Biener1-2/+9
The following fixes the computation of supports_partial_vectors, which is used to prune the set of modes to iterate over for epilog vectorization. The partial_vectors_supported_p predicate used for this only looks for while_ult, whereas we also support predication when mask modes are integer modes, as for AVX512. I've noticed this pruning isn't very effective on x86_64 anyway, since if the main loop mode is autodetected we skip re-analyzing mode_i == 0, but then mode_i == 1 is usually the very same large mode. A patch for this will follow, but it would regress without the fix below. * tree-vect-loop.cc (vect_analyze_loop): Consider AVX512 style masking when computing supports_partial_vectors.
14 daysc++: Add fix note for how to declare main in a moduleNathaniel Shead1-1/+6
This patch adds a note to help users unfamiliar with modules terminology understand how to declare main in a named module since P3618. I couldn't find an easy, robust location for "the start of this declaration" to attach a fixit to, but the explanation should suffice. gcc/cp/ChangeLog: * decl.cc (grokfndecl): Add explanation of how to attach to global module. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
14 daysdocs: fix a typo in used attribute documentationTamar Christina1-1/+1
This fixes a small typo in the Label attributes docs. gcc/ChangeLog: * doc/extend.texi: Fix typo in unsed attribute docs.
14 daysx86: Handle vector broadcast sourceH.J. Lu2-0/+220
Use the inner scalar mode of vector broadcast source in: (set (reg:V8DF 394) (vec_duplicate:V8DF (reg:V2DF 190 [ alpha ]))) to compute the vector mode for broadcast from vector source. gcc/ PR target/120830 * config/i386/i386-features.cc (ix86_get_vector_cse_mode): Handle vector broadcast source. gcc/testsuite/ PR target/120830 * g++.target/i386/pr120830.C: New test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
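The mode arithmetic behind the fix, as a toy: the `Mode` struct and `broadcast_mode` are hypothetical stand-ins, not i386-features.cc code. For (vec_duplicate:V8DF (reg:V2DF)), the lane type comes from the inner scalar mode of the source (DF, 64 bits), not from the whole 128-bit V2DF, so a 512-bit result has 512 / 64 = 8 lanes, i.e. V8DF.

```cpp
#include <cassert>

// Toy machine-mode descriptor: element width in bits and number of lanes
// (nunits == 1 for scalar modes such as DF).
struct Mode {
  unsigned elem_bits;
  unsigned nunits;
  unsigned bits() const { return elem_bits * nunits; }
};

// Broadcast mode of vector_bits total width whose lanes use the inner
// scalar mode of the source, whether the source is a scalar or a vector.
Mode broadcast_mode(Mode source, unsigned vector_bits) {
  return Mode{source.elem_bits, vector_bits / source.elem_bits};
}
```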
14 days[lra] catch all to-sp eliminations with nonzero offsets [PR120424]Alexandre Oliva1-21/+25
An x86_64-linux-gnu native with ix86_frame_pointer_required modified to return true for nonzero frames, to exercise lra_update_fp2sp_elimination, reveals in stage1 testing that wrong code is generated for gcc.c-torture/execute/ieee/fp-cmp-8l.c: argp-to-sp eliminations are used for one_test to pass its arguments on to *pos, and the sp offsets survive the disabling of that elimination. We didn't really have to disable that elimination, but the x86 backend disables eliminations to sp if frame_pointer_needed. This change extends the catching of fp2sp eliminations to all (?) eliminations to sp with nonzero offsets, since none of them can be properly reversed and would silently lead to wrong code. By accepting nonzero offsets, we bootstrap with -maccumulate-outgoing-args on x86_64-linux-gnu (with ix86_frame_pointer_required modified to return true on nonzero frame size). for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (elimination_2sp_occurred_p): Rename from... (elimination_fp2sp_occured_p): ... this. Adjust all uses. (lra_eliminate_regs_1): Don't require a from-frame-pointer elimination to set it. (update_reg_eliminate): Likewise to test it.
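A toy model of the widened check. The register enum and tracker are hypothetical; only the idea of a flag like elimination_2sp_occurred_p comes from the message. The flag is now set by any elimination into sp with a nonzero offset (argp-to-sp as well as fp-to-sp), because none of those can be undone once the elimination is disabled.

```cpp
#include <cassert>

enum class Reg { FrameP, ArgP, SoftFrameP, StackP };

struct Elimination {
  Reg from;
  Reg to;
  long offset;
};

// Remembers whether any elimination *to* sp was applied with a nonzero offset.
struct EliminationTracker {
  bool to_sp_with_offset_seen = false;

  void note_applied(const Elimination& e) {
    // Previously only a from-frame-pointer elimination counted; now any
    // source register does, e.g. argp -> sp as in fp-cmp-8l.c.
    if (e.to == Reg::StackP && e.offset != 0)
      to_sp_with_offset_seen = true;
  }

  // Disabling eliminations to sp is only safe if none left offsets behind.
  bool can_disable_sp_eliminations() const { return !to_sp_with_offset_seen; }
};
```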
14 days[lra] apply elimination offsets to MEM in autoinc address [PR120424]Alexandre Oliva1-0/+6
When attempting to bootstrap arm-linux-gnueabihf with {BOOT_C,T}FLAGS='-g -O2 -fnon-call-exceptions -fstack-clash-protection', gmp fails to build in stage2: gen-fac's mpz_and gets miscompiled. A pseudo is initialized before a loop and used in a PRE_INC load inside a loop. It gets spilled just as the fp2sp elimination is disabled, and only the initialization gets adjusted with elimination offsets. The unadjusted stack slot within the PRE_INC load ends up reloaded later, but only when the FP offset has already missed its chance to be adjusted. Arrange for lra_eliminate_regs_1 to adjust autoinc addresses that are MEMs themselves. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_eliminate_regs_1): Adjust autoinc addresses that are MEMs.
14 days[lra] reorder operations in lra_update_fp2sp_elimination [PR120424]Alexandre Oliva1-7/+5
The various recent additions to lra_update_fp2sp_elimination rendered it somewhat confusing, with intermixed groups of statements pertaining to three different major actions: disabling the elimination, recomputing live ranges, and spilling uses of the frame pointer. Reorder them for readability. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Reorder and regroup related statements.
14 days[lra] rework deactivation of fp2sp elimination [PR120424]Alexandre Oliva1-2/+16
Deactivating the fp2sp elimination in lra_update_fp2sp_elimination prevents update_reg_eliminate from propagating the fp2sp elimination offset to the next chosen elimination, so it may retain -1 as the prev_offset, and prev_offset will be taken as an already-applied offset that needs to be compensated in the next round of spilling and reloading. This affects, for example, crtbegin.o's __do_global_dtors_aux on arm-linux-gnueabihf in a {BOOT_C,T}FLAGS='-O2 -g -fnon-call-exceptions -fstack-clash-protection' bootstrap. Alas, just retaining that elimination causes spills to use the fp2sp elimination, including applying sp offsets, which breaks e.g. an x86_64-linux-gnu native bootstrap with ix86_frame_pointer_required modified to return true on nonzero frame size. The middle-ground solution is to keep the elimination active, so that its offsets are applied and propagated on to the subsequent fp elimination, but without introducing sp offsets, so that e.g. pr103973-18.c on the modified x86_64-linux-gnu doesn't get adjacent argument pushes of two adjacent on-stack temporaries ending up pushing the same temporary because of undesired adjustments. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Avoid sp offsets in further fp2sp eliminations... (update_reg_eliminate): ... and restore to_rtx before assert checking.
14 days[lra] recompute ranges upon disabling fp2sp elimination [PR120424]Alexandre Oliva4-0/+70
If the frame size grows to nonzero, arm_frame_pointer_required may flip to true under -fstack-clash-protection -fnon-call-exceptions, and that may disable the fp2sp elimination part-way through lra. If pseudos had got assigned to the frame pointer register before that, they have to be spilled, and that requires complete live range information. If !lra_reg_spill_p, lra_spill won't have live ranges for such pseudos, and they could end up sharing spill slots with other pseudos whose live ranges actually overlap. This affects at least Ada.Strings.Wide_Superbounded.Super_Insert and .Super_Replace_Slice in libgnat/a-stwisu.adb, when compiled with -O2 -fstack-clash-protection -march=armv7 (implied Thumb2), causing acats-4's cdd2a01 to fail. Recomputing live ranges including registers may renumber and compress points, so we have to recompute the aggregated live ranges for already-assigned spill slots as well. As a safety net, reject empty live ranges when computing slot sharing. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Compute complete live ranges and recompute slots' live ranges if needed. * lra-lives.cc (lra_reset_live_range_list): New. (lra_complete_live_ranges): New. * lra-spills.cc (assign_spill_hard_regs): Reject empty live ranges. (add_pseudo_to_slot): Likewise. (lra_recompute_slots_live_ranges): New. * lra-int.h (lra_reset_live_range_list): Declare. (lra_complete_live_ranges): Declare. (lra_recompute_slots_live_ranges): Declare.
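A toy model of why empty live ranges are dangerous in slot sharing. The `Range` representation and `can_share_slot` are hypothetical, and this sketch simply refuses to share, whereas the real lra-spills.cc code may reject empty ranges differently (e.g. by asserting). A pseudo whose ranges were never computed looks "live nowhere", so it would appear compatible with any slot, which is how pseudos with overlapping lifetimes could end up in the same stack slot.

```cpp
#include <cassert>
#include <vector>

// Live range over inclusive program points [start, finish].
struct Range {
  int start;
  int finish;
};
using RangeList = std::vector<Range>;

bool ranges_overlap(const RangeList& a, const RangeList& b) {
  for (const Range& x : a)
    for (const Range& y : b)
      if (x.start <= y.finish && y.start <= x.finish)
        return true;
  return false;
}

// May a spilled pseudo with ranges `pseudo` join a slot whose current
// occupants have the combined ranges `slot`?
bool can_share_slot(const RangeList& pseudo, const RangeList& slot) {
  // Empty means "not computed", not "dead": refuse rather than guess.
  if (pseudo.empty())
    return false;
  return !ranges_overlap(pseudo, slot);
}
```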
14 days[genoutput] mark scratch outputs as eliminable [PR120424]Alexandre Oliva1-1/+1
acats' fdd2a00.read is miscompiled on arm-linux-gnu with -O2 -fstack-clash-protection -march=armv7-a -marm: a clobbered scratch register in a *iorsi3_compare0_scratch pattern gets initially assigned to the frame pointer register, but at some point during lra the frame size grows to nonzero, arm_frame_pointer_required flips to true, and the fp2sp elimination has to be disabled, so the scratch register gets spilled to a stack slot. It needs to get the sfp elimination at that point, because later rounds of elimination will assume the previous round's offset has already been applied. But since scratch matches are not regarded as eliminable by genoutput, we don't attempt elimination in the clobbered stack slot MEM rtx. Later on, lra issues a reload for that slot, using a new pseudo allocated to a hardware register, that gets stored in the stack slot after the original insn. Elimination in that reload store insn eventually updates the elimination offset, but it's an incremental update, assuming that the offset so far has already been applied. Without applying the initial offset, the store ends up overlapping with the function's register save area, corrupting a caller's call-saved register. AFAICT the old reload's elimination wouldn't be harmed by allowing elimination in scratch operands, so I'm enabling eliminable for them regardless. Should it be found to make a difference, we could presumably set a different bit in eliminable to enable reload and lra to tell them apart and behave accordingly. for gcc/ChangeLog PR rtl-optimization/120424 * genoutput.cc (scan_operands): Make MATCH_SCRATCHes eliminable.
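The arithmetic of the failure, as a toy model with hypothetical names: each elimination round adds only the change in offset since the previous round, assuming the earlier offset was already applied. If the first round is skipped for a slot (as happened for the non-eliminable scratch), the final address ends up short by exactly the initial offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A stack slot is addressed as sfp + slot_offset. Round i rewrites sfp as
// sp + round_offsets[i] but only adds the delta since round i-1.
long final_sp_offset(long slot_offset, const std::vector<long>& round_offsets,
                     bool first_round_applied) {
  long addr = slot_offset;
  long prev = 0;
  for (std::size_t i = 0; i < round_offsets.size(); ++i) {
    if (i > 0 || first_round_applied)
      addr += round_offsets[i] - prev;  // incremental update
    prev = round_offsets[i];
  }
  return addr;
}
```

With a slot at sfp + 8 and rounds with offsets 16 and then 24, the correct sp-relative address is 8 + 24 = 32; skipping the first round yields 16, which can overlap other parts of the frame, such as the register save area in the miscompilation above.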
14 days[lra] inactivate disabled fp2sp elimination [PR120424]Alexandre Oliva1-3/+12
Even after we disable the fp2sp elimination when it is the active elimination for the fp, spilling might use it before update_reg_eliminate runs and inactivates it for good. If it is used, update_reg_eliminate will fail the check that fp2sp was not used. Since we keep track of uses of this specific elimination, and lra_update_fp2sp_elimination checks it before disabling it, we know it hasn't been used, so we can inactivate it without any ill effects. This fixes the pr118591-1.c avr-none regression exposed by the PR120424 fix. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (lra_update_fp2sp_elimination): Inactivate the unused fp2sp elimination right away.