|
While testing a later patch, I found that create_degenerate_phi
had an inverted test for bitmap_set_bit. It was assuming that
the return value was the previous bit value, rather than a
"something changed" value. :(
Also, the call to add_live_out_use shouldn't be conditional
on the DF_LR_OUT operation, since the register could be live-out
because of uses later in the same EBB (which do not require a
live-out use to be added to the rtl-ssa instruction). Instead,
add_live_out_use should itself check whether a live-out use already exists.
gcc/
* rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Fix
inverted test of bitmap_set_bit. Call add_live_out_use even
if the register was previously live-out from the predecessor block.
Instead...
(function_info::add_live_out_use): ...check here whether a live-out
use already exists.
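For reference, bitmap_set_bit returns whether the bitmap changed, so the
corrected pattern looks roughly like this (a minimal sketch, not the exact
blocks.cc code; lr_out and regno are placeholder names):
  /* bitmap_set_bit returns true iff the bit was previously clear,
     i.e. "did something change", not the old bit value.  */
  if (bitmap_set_bit (lr_out, regno))
    {
      /* The register was not live-out before; DF_LR_OUT changed.  */
    }
  /* Independently of the bitmap result, add_live_out_use decides for
     itself whether a live-out use already exists.  */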
|
|
rtl-ssa already has a find_def function for finding the definition
of a particular resource (register or memory) at a particular point
in the program. This patch adds a similar function for looking
up uses. Both functions have amortised logarithmic complexity.
gcc/
* rtl-ssa/accesses.h (use_lookup): New class.
* rtl-ssa/functions.h (function_info::find_def): Expand comment.
(function_info::find_use): Declare.
* rtl-ssa/member-fns.inl (use_lookup::prev_use, use_lookup::next_use)
(use_lookup::matching_use, use_lookup::matching_or_prev_use)
(use_lookup::matching_or_next_use): New member functions.
* rtl-ssa/accesses.cc (function_info::find_use): Likewise.
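A rough picture of the intended use, based only on the names listed above
(the parameter list of find_use and the exact member semantics are
assumptions, not the real API):
  use_lookup lookup = ssa->find_use (resource, insn);
  if (use_info *use = lookup.matching_use ())
    ;  /* a use of the resource at exactly this point  */
  else if (use_info *before = lookup.matching_or_prev_use ())
    ;  /* the closest use at or before this point  */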
|
|
The testcase in the PR shows that it's worth splitting the processing
of the initial workset, which is def_blocks, from the main iteration.
This reduces SSA incremental update time from 44.7s to 32.9s. Changing
the workset bitmap of the main iteration to a vector speeds things up
further to 23.5s, nearly halving the SSA incremental update time and
giving an overall 12% compile-time saving at -O1.
Using bitmap_ior in the first loop or avoiding (immediate) re-processing
of blocks in def_blocks does not make a measurable difference for the
testcase so I left this as-is.
PR tree-optimization/114480
* cfganal.cc (compute_idf): Split processing of the initial
workset from the main iteration. Use a vector for the
workset of the main iteration.
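The new structure looks roughly like this (a simplified sketch, not the
actual cfganal.cc code; names abbreviated):
  /* Pass 1: seed the result from the dominance frontiers of the
     definition blocks only.  */
  auto_vec<unsigned> work;
  bitmap_iterator bi;
  unsigned b, c;
  EXECUTE_IF_SET_IN_BITMAP (def_blocks, 0, b, bi)
    {
      bitmap_iterator bj;
      EXECUTE_IF_SET_IN_BITMAP (&dfs[b], 0, c, bj)
        if (bitmap_set_bit (idf, c))
          work.safe_push (c);
    }
  /* Pass 2: iterate to a fixed point, using a vector rather than a
     bitmap as the worklist.  */
  while (!work.is_empty ())
    {
      b = work.pop ();
      bitmap_iterator bj;
      EXECUTE_IF_SET_IN_BITMAP (&dfs[b], 0, c, bj)
        if (bitmap_set_bit (idf, c))
          work.safe_push (c);
    }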
|
|
The linker rejects --relax in relocatable links (-r), hence only
add --relax when -r is not specified.
gcc/
PR target/121608
* config/avr/specs.h (LINK_RELAX_SPEC): Wrap in %{!r...}.
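The shape of the change, illustratively (the real LINK_RELAX_SPEC contents
in avr/specs.h differ; this only shows the %{!r:...} wrapping):
  /* Only request linker relaxation when not doing a relocatable link.  */
  #define LINK_RELAX_SPEC "%{!r:%{mrelax:--relax}}"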
|
|
vect_analyze_slp_instance still handles stores and reduction chains.
The following splits out the special handling of those two kinds,
duplicating vect_build_slp_instance into two specialized entries.
* tree-vect-slp.cc (vect_analyze_slp_reduc_chain): New,
copied from vect_analyze_slp_instance and only handle
slp_inst_kind_reduc_chain. Inline vect_build_slp_instance.
(vect_analyze_slp_instance): Only handle slp_inst_kind_store.
Inline vect_build_slp_instance.
(vect_build_slp_instance): Remove now unused stmt_info parameter,
remove special code for store groups and reduction chains.
(vect_analyze_slp): Call vect_analyze_slp_reduc_chain
for reduction chain SLP build and adjust.
|
|
The restriction no longer applies, so remove it.
* tree-vect-data-refs.cc (vect_check_gather_scatter):
Remove restriction on epilogue of epilogue vectorization.
|
|
The following removes the fixup we apply to pattern stmt operands
before code generating vector epilogues. This isn't necessary anymore
since the SLP graph now exclusively records the data flow. Similarly
fixing up of SSA references inside DR_REF of gather/scatter isn't
necessary since we now record the analysis result and avoid re-doing
it during transform.
What we still need to keep is the adjustment of the actual pointers
to gimple stmts from stmt_vec_info and the back-reference from the DRs.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove
fixing up pattern stmt operands and gather/scatter DR_REFs.
(find_in_mapping): Remove.
|
|
The following is a patch to make us record the get_load_store_type
results from load/store analysis and re-use them during transform.
In particular this moves where SLP_TREE_MEMORY_ACCESS_TYPE is stored.
A major hassle was (and still is, to some extent) gather/scatter
handling with its accompanying gather_scatter_info. As
get_load_store_type no longer fully re-analyzes them, and part of
the information is recorded in the SLP tree during SLP build, the
following eliminates the use of this data in vectorizable_load/store,
instead recording the other relevant part in the load-store info
(namely the IFN or decl chosen).
Strided load handling keeps the re-analysis but populates the
data back to the SLP tree and the load-store info. That's something
for further improvement. This also shows that early classifying
a SLP tree as load/store and allocating the load-store data might
be a way to move back all of the gather/scatter auxiliary data
into one place.
Rather than mass-replacing references to variables I've kept the
locals but made them read-only, only adjusting a few elsval setters
and adding a FIXME to strided SLP handling of alignment (allowing
local override there).
The FIXME shows that while a lot of analysis is done in
get_load_store_type that's far from all of it. There's also
a possibility that splitting up the transform phase into
separate load/store def types, based on the VMAT chosen, will make
the code more maintainable.
* tree-vectorizer.h (vect_load_store_data): New.
(_slp_tree::memory_access_type): Remove.
(SLP_TREE_MEMORY_ACCESS_TYPE): Turn into inline function.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Do not
initialize SLP_TREE_MEMORY_ACCESS_TYPE.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Remove gather_scatter_info pointer argument, instead get
info from the SLP node.
(vect_build_one_gather_load_call): Get SLP node and builtin
decl as argument and remove uses of gather_scatter_info.
(vect_build_one_scatter_store_call): Likewise.
(vect_get_gather_scatter_ops): Remove uses of gather_scatter_info.
(vect_get_strided_load_store_ops): Get SLP node and remove
uses of gather_scatter_info.
(get_load_store_type): Take pointer to vect_load_store_data
instead of individual pointers.
(vectorizable_store): Adjust. Re-use get_load_store_type
result from analysis time.
(vectorizable_load): Likewise.
|
|
gcc/cobol/ChangeLog:
* genutil.cc (get_binary_value): Fix a comment.
* parse.y (udf_args_valid): Fix loc calculation.
* symbols.cc (extend_66_capacity): Avoid assert(e < e2) in -O0 build
until symbol_table expansion is fixed.
libgcobol/ChangeLog:
* libgcobol.cc (format_for_display_internal): Handle NumericDisplay
properly.
(compare_88): Fix memory access error.
(__gg__unstring): Likewise.
|
|
gcc/fortran/ChangeLog:
* intrinsic.texi: Correct the example given for FRACTION.
Move the TEAM_NUMBER section to after the TANPI section to align
with the order given in the index.
|
|
We can't place a TLS call before a conditional jump in a basic block like
(code_label 13 11 14 4 2 (nil) [1 uses])
(note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(jump_insn 16 14 17 4 (set (pc)
(if_then_else (le (reg:CCNO 17 flags)
(const_int 0 [0]))
(label_ref 27)
(pc))) "x.c":10:21 discrim 1 1462 {*jcc}
(expr_list:REG_DEAD (reg:CCNO 17 flags)
(int_list:REG_BR_PROB 628353713 (nil)))
-> 27)
since the TLS call will clobber the flags register, nor can we place a TLS
call in a basic block if any live caller-saved registers aren't dead at the
end of the basic block:
;; live in 6 [bp] 7 [sp] 16 [argp] 17 [flags] 19 [frame] 104
;; live gen 0 [ax] 102 106 108 116 117 118 120
;; live kill 5 [di]
Instead, we should place such a call before all register-setting basic
blocks which dominate the current basic block.
Keep track of the replaced GNU and GNU2 TLS instructions. Use this info
to place the __tls_get_addr call and mark the FLAGS register as dead.
gcc/
PR target/121572
* config/i386/i386-features.cc (replace_tls_call): Add a bitmap
argument and put the updated TLS instruction in the bitmap.
(ix86_get_dominator_for_reg): New.
(ix86_check_flags_reg): Likewise.
(ix86_emit_tls_call): Likewise.
(ix86_place_single_tls_call): Add 2 bitmap arguments for updated
GNU and GNU2 TLS instructions. Call ix86_emit_tls_call to emit
TLS instruction. Correct debug dump for before instruction.
gcc/testsuite/
PR target/121572
* gcc.target/i386/pr121572-1a.c: New test.
* gcc.target/i386/pr121572-1b.c: Likewise.
* gcc.target/i386/pr121572-2a.c: Likewise.
* gcc.target/i386/pr121572-2b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
This testcase is testing the difference between functions that are or are
not declared constexpr.
gcc/testsuite/ChangeLog:
* g++.dg/cpp26/expansion-stmt16.C: Add -fno-implicit-constexpr.
|
|
This testcase caused an ICE when mangling the invalid type-constraint in
write_requirement since write_type_constraint expects a TEMPLATE_TYPE_PARM.
Setting the trailing return type to NULL_TREE when a
return-type-requirement is found in place of a type-constraint prevents the
failed assertion in write_requirement. It also allows the invalid
constraint to be satisfied in some contexts to prevent redundant errors,
e.g. in concepts-requires5.C.
Bootstrapped and tested on x86_64-linux-gnu.
PR c++/120618
gcc/cp/ChangeLog:
* parser.cc (cp_parser_compound_requirement): Set type to
NULL_TREE for invalid type-constraint.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires5.C: Don't require
redundant diagnostic in static assertion.
* g++.dg/concepts/pr120618.C: New test.
Suggested-by: Jason Merrill <jason@redhat.com>
|
|
When expanding malloc-like functions, we copy the return register into a temporary
and then mark that temporary register with a noalias regnote and the alignment.
This works fine unless you are calling the function with a return type of void.
At this point then the valreg will be null and a crash will happen.
A few cleanups are included in this patch because it was easier to do the fix
with the cleanups added.
The start_sequence/end_sequence for ECF_MALLOC is no longer needed; I can't tell
if it was ever needed.
The emit_move_insn function returns the last emitted instruction anyway, so
there is no reason to call get_last_insn as we can just use the return value
of emit_move_insn. This has been true since this code was originally added
so I don't understand why it was done that way beforehand.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/120024
gcc/ChangeLog:
* calls.cc (expand_call): Remove start_sequence/end_sequence
for ECF_MALLOC.
Check valreg before dereferencing it when it comes to malloc-like
functions. Use the return value of emit_move_insn instead of
calling get_last_insn.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/malloc-1.c: New test.
* gcc.dg/torture/malloc-2.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
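A minimal sketch of the shape after the fix (not the exact calls.cc code;
surrounding details elided):
  if ((flags & ECF_MALLOC) && valreg)
    {
      rtx temp = gen_reg_rtx (GET_MODE (valreg));
      /* The return value of a malloc-like function is a pointer;
         record its guaranteed alignment.  */
      if (TREE_CODE (rettype) == POINTER_TYPE)
        mark_reg_pointer (temp, MALLOC_ABI_ALIGNMENT);
      /* emit_move_insn returns the last insn it emitted, so there is
         no need for get_last_insn.  */
      rtx_insn *last = emit_move_insn (temp, valreg);
      /* The result cannot alias anything else.  */
      add_reg_note (last, REG_NOALIAS, temp);
      valreg = temp;
    }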
|
|
When comparing constraints during correspondence checking for a using
from a partial specialization, we need to substitute the partial
specialization arguments into the constraints rather than the primary
template arguments. Otherwise we incorrectly reject e.g. the below
testcase as ambiguous since we substitute T=int* instead of T=int
into #1's constraints and don't notice the correspondence.
This patch corrects the recent r16-2771-gb9f1cc4e119da9 fix by using
outer_template_args instead of TI_ARGS of the DECL_CONTEXT, which
should always give the correct outer arguments for substitution.
PR c++/121351
gcc/cp/ChangeLog:
* class.cc (add_method): Use outer_template_args when
substituting outer template arguments into constraints.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-using7.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
Historically SLP reduction chains were the only multi-stmt reductions
supported. But since we have check_reduction_path, more complicated
cases are handled. As parloops doesn't do any specific chain
processing, it can rely solely on that functionality instead.
* tree-parloops.cc (parloops_is_slp_reduction): Remove.
(parloops_is_simple_reduction): Do not call it.
|
|
The following fixes another few missed cases to pass a SLP node
instead of a stmt_info.
* tree-vect-loop.cc (vectorizable_reduction): Pass the
appropriate SLP node for costing of single-def-use-cycle
operations.
(vectorizable_live_operation): Pass the SLP node to the
costing hook.
* tree-vect-stmts.cc (vectorizable_bswap): Likewise.
(vectorizable_store): Likewise.
|
|
The testcase in the PR shows that when we have a reduction chain
with a wrapped conversion we fail to properly fall back to a
regular reduction, resulting in wrong-code. The following fixes
this by failing discovery. The testcase has other issues, so
I'm not including it here.
PR tree-optimization/121592
* tree-vect-slp.cc (vect_analyze_slp): When SLP reduction chain
discovery fails, fail overall when the tail of the chain
isn't also the entry for the non-SLP reduction.
|
|
Building riscv no longer works with python2:
> python ./config/riscv/arch-canonicalize -misa-spec=20191213 rv64gc
File "./config/riscv/arch-canonicalize", line 229
print(f"ERROR: Unhandled conditional dependency: '{ext_name}' with condition:", file=sys.stderr)
^
SyntaxError: invalid syntax
On systems that have python aliased to python2 we choose that, even
when python3 is available. Don't.
* config.gcc (riscv*-*-*): Look for python3, then fall back
to python. Never use python2.
|
|
SRA handles outermost VIEW_CONVERT_EXPRs, but it wrongly ignores
them when building an access, which leads to the wrong size being
used when the VIEW_CONVERT_EXPR does not have the same size as its
operand. The latter is valid GENERIC and is used by Ada upcasting.
PR tree-optimization/121527
* tree-sra.cc (build_access_from_expr_1): Do not strip an
outer VIEW_CONVERT_EXPR as it's relevant for the size of
the access.
(get_access_for_expr): Likewise.
|
|
commit g:1786be14e94bf1a7806b9dc09186f021737f0227 stops storing in
STMT_VINFO_VECTYPE the vectype of the current stmt being vectorized and instead
requires the use of SLP_TREE_VECTYPE for everything but data-refs.
This means that STMT_VINFO_VECTYPE (stmt_info) will always be NULL and so
aarch64_bool_compound_p will never properly cost predicate AND operations
anymore resulting in less vectorization.
This patch changes it to use SLP_TREE_VECTYPE and pass the slp_node to
aarch64_bool_compound_p.
gcc/ChangeLog:
PR target/121536
* config/aarch64/aarch64.cc (aarch64_bool_compound_p): Use
SLP_TREE_VECTYPE instead of STMT_VINFO_VECTYPE.
(aarch64_adjust_stmt_cost, aarch64_vector_costs::count_ops): Pass SLP
node to aarch64_bool_compound_p.
gcc/testsuite/ChangeLog:
PR target/121536
* g++.target/aarch64/sve/pr121536.cc: New test.
|
|
commit g:1786be14e94bf1a7806b9dc09186f021737f0227 stops storing in
STMT_VINFO_VECTYPE the vectype of the current stmt being vectorized and instead
requires the use of SLP_TREE_VECTYPE for everything but data-refs.
However, contrary to what the commit says, not all usages of STMT_VINFO_VECTYPE
have been purged from vectorizable_*, as the costing hooks which don't pass the
SLP tree as an argument will extract vectype using STMT_VINFO_VECTYPE.
This results in no vector type being passed to the backends and in a few
costing test failures on AArch64.
This commit replaces the last few cases I could find, all except the one in
vectorizable_reduction for single_defuse_cycle, where the stmt being costed is
not the representative of the PHI in the SLP tree but rather the out-of-tree
reduction statement. So I've left that alone, but it does mean vectype is NULL.
Most likely this needs to use the overload where we pass an explicit vectype but
I wasn't sure so left it for now.
gcc/ChangeLog:
PR target/121536
* tree-vect-loop.cc (vectorizable_phi, vectorizable_recurr,
vectorizable_nonlinear_induction, vectorizable_induction): Pass slp_node
instead of stmt_info to record_stmt_cost.
|
|
commit g:fb59c5719c17a04ecfd58b5e566eccd6d2ac583a stops passing the scalar type
(confusingly named vectype) to the costing hook when doing scalar costing.
As a result, we could no longer distinguish between FPR and GPR scalar stmts.
A later commit also removed STMT_VINFO_VECTYPE from stmt_info.
This leaves getting the type from the original stmt in the stmt_info as the
only remaining option. This patch does that when we're performing scalar
costing.
Ideally I'd refactor this a bit because a lot of the hooks just need to know if
it's FP or not, but this seems pointless with the ongoing costing churn. So for
now this restores our costing.
gcc/ChangeLog:
PR target/121536
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost): Set
vectype from type of lhs of gimple stmt.
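Roughly what the hook now does during scalar costing (a sketch; the actual
aarch64.cc code differs in detail):
  /* When costing scalar code there is no vector type, so derive the
     type from the statement itself to tell FP from integer stmts.  */
  if (!vectype && stmt_info)
    if (gimple *stmt = stmt_info->stmt)
      if (tree lhs = gimple_get_lhs (stmt))
        vectype = TREE_TYPE (lhs);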
|
|
This testcase (added in r16-3233-g7921bb4afcb7a3) mistakenly only
required C++14, but auto template parameters are a C++17 feature.
PR c++/121578
gcc/testsuite/ChangeLog:
* g++.dg/abi/mangle83.C: Requires C++17.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
We have logic to adjust a function decl if it gets re-declared as a
using-decl with different purviewness, but we also need to do the same
if it gets redeclared with different exportedness.
PR c++/120195
gcc/cp/ChangeLog:
* name-lookup.cc (do_nonmember_using_decl): Also handle change
in exportedness of a function.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-32_a.C: New test.
* g++.dg/modules/using-32_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
In r15-10183 I added a testcase for the (temporary) warning that we don't
currently support the 'gnu::optimize' or 'gnu::target' attributes; however,
some targets produce target nodes even when only an optimize attribute is
present. This adjusts the testcase to also accept the target warning.
PR c++/108080
PR c++/121396
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr108080.H: Also allow target warnings.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
|
|
This example used to work (with C) in GCC 14 before the
warning for different pointer types without a cast was changed
to an error.
The fix is to make the q variable `int*` rather than the current `char*`.
This also fixes the example for C++.
Pushed as obvious after doing a `make html`.
PR middle-end/121581
gcc/ChangeLog:
* doc/extend.texi (__builtin_object_size): Fix example.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
Currently, the data type of sanitizer flags is unsigned int, with
SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
enumerator for enum sanitize_code. Use the 'sanitize_code_type' data
type to allow more distinct instrumentation modes to be added when
needed.
gcc/ChangeLog:
* flag-types.h (sanitize_code_type): Define.
* asan.h (sanitize_flags_p): Use 'sanitize_code_type' instead of
'unsigned int'.
* common.opt: Likewise.
* dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
* opts.cc (find_sanitizer_argument): Likewise.
(report_conflicting_sanitizer_options): Likewise.
(parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* opts.h (parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* tree-cfg.cc (print_no_sanitize_attr_value): Likewise.
* tree.cc (tree_fits_sanitize_code_type_p): Define.
(tree_to_sanitize_code_type): Likewise.
* tree.h (tree_fits_sanitize_code_type_p): Declare.
(tree_to_sanitize_code_type): Likewise.
gcc/c-family/ChangeLog:
* c-attribs.cc (add_no_sanitize_value): Use 'sanitize_code_type'
instead of 'unsigned int'.
(handle_no_sanitize_attribute): Likewise.
(handle_no_sanitize_address_attribute): Likewise.
(handle_no_sanitize_thread_attribute): Likewise.
(handle_no_address_safety_analysis_attribute): Likewise.
* c-common.h (add_no_sanitize_value): Likewise.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_declaration_or_fndef): Use
'sanitize_code_type' instead of 'unsigned int'.
gcc/cp/ChangeLog:
* typeck.cc (get_member_function_from_ptrfunc): Use
'sanitize_code_type' instead of 'unsigned int'.
gcc/d/ChangeLog:
* d-attribs.cc (d_handle_no_sanitize_attribute): Use
'sanitize_code_type' instead of 'unsigned int'.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
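The core of the change is just widening the flag type, along these lines
in flag-types.h (the concrete underlying type is an assumption, not the
committed definition):
  /* Wide enough for more sanitizer bits than fit in a 32-bit
     unsigned int.  */
  typedef uint64_t sanitize_code_type;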
|
|
Define new constants to be used by the MTE pattern definitions.
gcc/
* config/aarch64/aarch64.md (MEMTAG_TAG_MASK): New define
constant.
(MEMTAG_ADDR_MASK): Likewise.
(irg, subp, ldg): Use new constants.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
|
|
This patch fixes an internal disagreement in gcse about how to
handle partial clobbers. Like many passes, gcse doesn't track
the modes of live values, so if a call clobbers only part of
a register, the pass has to make conservative assumptions.
As the comment in the patch says, this means:
(1) ignoring partial clobbers when computing liveness and reaching
definitions
(2) treating partial clobbers as full clobbers when computing
availability
DF is mostly concerned with (1), so ignores partial clobbers.
compute_hash_table_work did (2) when calculating kill sets,
but compute_transp didn't do (2) when computing transparency.
This led to a nonsensical situation of a register being in both
the transparency and kill sets.
gcc/
PR rtl-optimization/97497
* function-abi.h (predefined_function_abi::only_partial_reg_clobbers)
(function_abi::only_partial_reg_clobbers): New member functions.
* gcse-common.cc: Include regs.h and function-abi.h.
(compute_transp): Check for partially call-clobbered registers
and treat them as not being transparent in blocks with calls.
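The gist of the compute_transp change as a hedged sketch (the new
only_partial_reg_clobbers accessor is named in the ChangeLog above; its
return type and the surrounding code here are assumptions):
  if (CALL_P (insn))
    {
      function_abi abi = insn_callee_abi (insn);
      /* Treat a partial clobber like a full clobber for availability:
         the register is then not transparent in this block.  */
      if (TEST_HARD_REG_BIT (abi.only_partial_reg_clobbers (), regno))
        /* ... clear REGNO's transparency bit for this block ...  */;
    }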
|
|
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_TI_FETCH_ADD): New unspec.
(UNSPEC_TI_FETCH_SUB): Likewise.
(UNSPEC_TI_FETCH_AND): Likewise.
(UNSPEC_TI_FETCH_XOR): Likewise.
(UNSPEC_TI_FETCH_OR): Likewise.
(UNSPEC_TI_FETCH_NAND_MASK_INVERTED): Likewise.
(ALL_SC): New define_mode_iterator.
(_scq): New define_mode_attr.
(atomic_fetch_nand<mode>): Accept ALL_SC instead of only GPR.
(UNSPEC_TI_FETCH_DIRECT): New define_int_iterator.
(UNSPEC_TI_FETCH): New define_int_iterator.
(amop_ti_fetch): New define_int_attr.
(size_ti_fetch): New define_int_attr.
(atomic_fetch_<amop_ti_fetch>ti_scq): New define_insn.
(atomic_fetch_<amop_ti_fetch>ti): New define_expand.
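For illustration, with these patterns a 16-byte fetch-and-add like the
following can be expanded inline to an sc.q-based LL-SC loop instead of a
libatomic call (a usage example, not code from the patch; whether it is
inlined still depends on the target options and alignment):
  __int128 ti_fetch_add (__int128 *p, __int128 v)
  {
    return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
  }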
|
|
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_exchangeti_scq): New
define_insn.
(atomic_exchangeti): New define_expand.
|
|
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_compare_and_swapti_scq): New
define_insn.
(atomic_compare_and_swapti): New define_expand.
|
|
When LSX is not available but sc.q is (for example on LA664 where the
SIMD unit is not enabled), we can use a LL-SC loop for 16-byte atomic
store.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Accept "%t" for printing the number of the 64-bit machine
register holding the upper half of a TImode.
* config/loongarch/sync.md (atomic_storeti_scq): New
define_insn.
(atomic_storeti): Expand to atomic_storeti_scq if !ISA_HAS_LSX.
|
|
We'll use the sc.q instruction for some 16-byte atomic operations, but
it's only added in LoongArch 1.1 evolution so we need to gate it with
an option.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in (scq): New evolution
feature.
* config/loongarch/loongarch-evolution.cc: Regenerate.
* config/loongarch/loongarch-evolution.h: Regenerate.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* config/loongarch/loongarch-def.cc: Make -mscq the default for
-march=la664 and -march=la64v1.1.
* doc/invoke.texi (LoongArch Options): Document -m[no-]scq.
|
|
If the vector is naturally aligned, it cannot cross cache lines so the
LSX store is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic store, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_storeti_lsx): New
define_insn.
(atomic_storeti): New define_expand.
|
|
If the vector is naturally aligned, it cannot cross cache lines so the
LSX load is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic load, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_loadti_lsx): New define_insn.
(atomic_loadti): New define_expand.
|
|
Without atomic_fetch_nandsi and atomic_fetch_nanddi, __atomic_fetch_nand
is expanded to a loop containing a CAS in the body, and CAS itself is a
LL-SC loop so we have a nested loop. This is obviously not a good idea
as we just need one LL-SC loop in fact.
As ~(atom & mask) is (~mask) | (~atom), we can just invert the mask
first and the body of the LL-SC loop would be just one orn instruction.
gcc/ChangeLog:
* config/loongarch/sync.md
(atomic_fetch_nand_mask_inverted<GPR:mode>): New define_insn.
(atomic_fetch_nand<GPR:mode>): New define_expand.
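The identity being used, spelled out as a worked example (not code from
the patch):
  /* For the masked field we want ~(atom & mask).  By De Morgan,
     ~(atom & mask) == ~mask | ~atom, so with the mask inverted up
     front each LL-SC iteration needs just one or-not (orn) of the
     loaded value with the inverted mask.  */
  unsigned int nand_step (unsigned int atom, unsigned int inverted_mask)
  {
    return inverted_mask | ~atom;  /* == ~(atom & (~inverted_mask)) */
  }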
|
|
With -mlam-bh, we should negate the addend first, and use an amadd
instruction. Disabling the expander makes the compiler do it correctly.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_fetch_sub<SHORT:mode>):
Disable if ISA_HAS_LAM_BH.
|
|
We can just shift the mask and fill the other bits with 0 (for ior/xor)
or 1 (for and), and use an am*.w instruction to perform the atomic
operation, instead of using a LL-SC loop.
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND):
Remove.
(UNSPEC_COMPARE_AND_SWAP_XOR): Remove.
(UNSPEC_COMPARE_AND_SWAP_OR): Remove.
(atomic_test_and_set): Rename to ...
(atomic_fetch_<any_bitwise:amop><SHORT:mode>): ... this, and
adapt the expansion to use it for any bitwise operations and any
val, instead of just ior 1.
(atomic_test_and_set): New define_expand.
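Illustratively, building the full-word operand works like this (a sketch
of the idea, not the .md expansion):
  /* Place the subword value at its bit position and fill the remaining
     bits with the operation's identity, so a single full-word
     amor.w/amxor.w/amand.w leaves the neighbouring bytes untouched.  */
  unsigned int word_operand_or_xor (unsigned char val, unsigned int shift)
  {
    return (unsigned int) val << shift;                 /* other bits 0 */
  }
  unsigned int word_operand_and (unsigned char val, unsigned int shift)
  {
    return ((unsigned int) val << shift)
           | ~(0xffu << shift);                         /* other bits 1 */
  }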
|
|
On LoongArch, the sll.w and srl.w instructions only take bits [4:0] of
rk (the shift amount) into account, and we've already defined
SHIFT_COUNT_TRUNCATED to 1 so the compiler knows this fact; thus we
don't need the andi instruction.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_and_set): Remove
unneeded andi instruction from the expansion.
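A source-level analogy for why the masking is redundant (illustrative
only; the actual change is in the RTL expansion):
  /* With SHIFT_COUNT_TRUNCATED, sll.w/srl.w only look at bits [4:0] of
     the shift amount, so both functions compile to a bare shift and the
     explicit mask (the andi) is unnecessary.  */
  unsigned int with_mask (unsigned int n)    { return 0xffu << (n & 31); }
  unsigned int without_mask (unsigned int n) { return 0xffu << n; }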
|
|
The "b 3f" instruction is used to skip a redundant barrier when
-mno-ld-seq-sa is in effect or the memory model requires a barrier on
failure. But with -mld-seq-sa and other memory models the barrier may not
exist at all, and then we should remove the "b 3f" instruction as well.
The implementation uses a new operand modifier "%T" to output a comment
marker if the operand is a memory order for which the barrier won't be
generated. "%T", and also "%t", are not really used before and the code
for them in loongarch_print_operand_reloc is just some MIPS legacy.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Make "%T" output a comment marker if the operand is a memory
order for which the barrier won't be generated; remove "%t".
* config/loongarch/sync.md (atomic_cas_value_strong<mode>): Add
%T before "b 3f".
(atomic_cas_value_cmp_and_7_<mode>): Likewise.
|
|
For LL-SC loops, if the atomic operation has succeeded, the SC
instruction always implies a full barrier, so the barrier we manually
insert only needs to account for the failure memorder, not
the success memorder (the barrier is skipped with "b 3f" on success
anyway).
Note that if we use the AMCAS instructions, we indeed need to consider
both the success memorder and the failure memorder when deciding if the
"_db" suffix is needed. Thus the semantics of atomic_cas_value_strong<mode>
and atomic_cas_value_strong<mode>_amcas start to be different. To
prevent the compiler from being too clever, use a different unspec code
for AMCAS instructions.
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AMCAS): New
UNSPEC code.
(atomic_cas_value_strong<mode>): NFC, update the comment to note
we only need to consider failure memory order.
(atomic_cas_value_strong<mode>_amcas): Use
UNSPEC_COMPARE_AND_SWAP_AMCAS instead of
UNSPEC_COMPARE_AND_SWAP.
(atomic_compare_and_swap<mode:GPR>): Pass failure memorder to
gen_atomic_cas_value_strong<mode>.
(atomic_compare_and_swap<mode:SHORT>): Pass failure memorder to
gen_atomic_cas_value_cmp_and_7_si.
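The two memory orders involved, for illustration (a usage example, not
code from the patch):
  bool cas (long *p, long *expected, long desired)
  {
    /* Only the failure order matters for the manually inserted barrier:
       a successful sc.q already implies a full barrier, and the barrier
       is skipped with "b 3f" on success anyway.  */
    return __atomic_compare_exchange_n (p, expected, desired, false,
                                        __ATOMIC_SEQ_CST,    /* success */
                                        __ATOMIC_RELAXED);   /* failure */
  }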
|
|
We can use bstrins for masking the address here. As people are already
working on LA32R (which lacks bstrins instructions), for future-proofing
we check whether (const_int -4) is an and_operand and force it into an
register if not.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_and_set): Use bstrins
for masking the address if possible.
|
|
Atomic load does not modify the memory. Atomic store does not read the
memory, thus we can use "=" instead.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_load<mode>): Remove "+" for
the memory operand.
(atomic_store<mode>): Use "=" instead of "+" for the memory
operand.
|
|
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md: Use <size> instead of <amo>.
(amo): Remove.
|
|
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_optab): Remove.
(atomic_<atomic_optab><mode>): Change atomic_optab to amop.
(atomic_fetch_<atomic_optab><mode>): Likewise.
|
|
|