riscv-gnu-toolchain/gcc.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message	Author	Files	Lines
2024-09-23	dwarf2: add hooks for architecture-specific CFIs	Matthieu Longo	17	-45/+171
	Architecture-specific CFI directives are currently declared and processed alongside architecture-independent CFI directives in gcc/dwarf2* files. This approach creates confusion, specifically in the case of DWARF instructions in the vendor space that use the same instruction code. Such a clash currently happens between DW_CFA_GNU_window_save (used on SPARC) and DW_CFA_AARCH64_negate_ra_state (used on AArch64), which both have the instruction code 0x2d. As a result, AArch64 compilers generate a SPARC CFI directive (.cfi_window_save) instead of .cfi_negate_ra_state, contrary to what is expected in [DWARF for the Arm 64-bit Architecture (AArch64)](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst). This refactoring does not completely solve the problem, but improves the situation by moving some of the processing of those directives (more specifically their output in the assembly) to the backend via 2 target hooks: - DW_CFI_OPRND1_DESC: parse the first operand of the directive (if any). - OUTPUT_CFI_DIRECTIVE: output the CFI directive as a string. Additionally, this patch also contains a renaming of an enum used for return address mangling on AArch64. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_output_cfi_directive): New hook for CFI directives. (aarch64_dw_cfi_oprnd1_desc): Same. (TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive. (TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc. * config/sparc/sparc.cc (sparc_output_cfi_directive): New hook for CFI directives. (sparc_dw_cfi_oprnd1_desc): Same. (TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive. (TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc. * coretypes.h (struct dw_cfi_node): Forward declaration of CFI type from gcc/dwarf2out.h. (enum dw_cfi_oprnd_type): Same. (enum dwarf_call_frame_info): Same. * doc/tm.texi: Regenerated from doc/tm.texi.in. * doc/tm.texi.in: Add doc for new target hooks.
* dwarf2cfi.cc (struct dw_cfi_row): Update the description for window_save and ra_mangled. (dwarf2out_frame_debug_cfa_negate_ra_state): Use AArch64 CFI directive instead of the SPARC one. (change_cfi_row): Use the right CFI directive's name for RA mangling. (output_cfi): Remove explicit architecture-specific CFI directive DW_CFA_GNU_window_save that falls into default case. (output_cfi_directive): Use target hook as default. * dwarf2out.cc (dw_cfi_oprnd1_desc): Use target hook as default. * dwarf2out.h (enum dw_cfi_oprnd_type): Specify underlying type of enum to allow forward declaration. (dw_cfi_oprnd1_desc): Call target hook. (output_cfi_directive): Use dw_cfi_ref instead of struct dw_cfi_node *. * hooks.cc (hook_bool_dwcfi_dwcfioprndtyperef_false): New. (hook_bool_FILEptr_dwcfiptr_false): New. * hooks.h (hook_bool_dwcfi_dwcfioprndtyperef_false): New. (hook_bool_FILEptr_dwcfiptr_false): New. * target.def: Documentation for new hooks. include/ChangeLog: * dwarf2.h (enum dwarf_call_frame_info): Specify underlying type of enum to allow forward declaration. libffi/ChangeLog: * include/ffi_cfi.h (cfi_negate_ra_state): Declare AArch64 cfi directive. libgcc/ChangeLog: * config/aarch64/aarch64-asm.h (PACIASP): Replace SPARC CFI directive by AArch64 one. (AUTIASP): Same. libitm/ChangeLog: * config/aarch64/sjlj.S: Replace SPARC CFI directive by AArch64 one. gcc/testsuite/ChangeLog: * g++.target/aarch64/pr94515-1.C: Replace SPARC CFI directive by AArch64 one. * g++.target/aarch64/pr94515-2.C: Same.
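The division of labour described above (generic code handles the architecture-independent opcodes and defers everything else to a target hook) can be sketched roughly as follows. This is a simplified model, not GCC's code: the `Target` enum, both function names, their signatures and the "unknown" fallback string are hypothetical; only the opcode value 0x2d and the two directive names come from the commit message, and 0x0a is the standard DW_CFA_remember_state opcode.

```cpp
#include <string>

// Hypothetical, simplified model; GCC's real hooks take different arguments.
enum class Target { Sparc, AArch64, Other };

constexpr unsigned kRememberState = 0x0a;  // DW_CFA_remember_state (generic)
constexpr unsigned kVendor0x2d = 0x2d;     // shared by two vendor extensions

// Stand-in for a per-target hook: returns true and fills `out` only when
// the target claims the opcode.
bool target_output_cfi_directive(Target t, unsigned opcode, std::string &out)
{
  if (opcode != kVendor0x2d)
    return false;
  if (t == Target::Sparc) {
    out = ".cfi_window_save";
    return true;
  }
  if (t == Target::AArch64) {
    out = ".cfi_negate_ra_state";
    return true;
  }
  return false;
}

// Generic code: known opcodes first, the target hook as the default case.
std::string output_cfi_directive(Target t, unsigned opcode)
{
  switch (opcode) {
    case kRememberState:
      return ".cfi_remember_state";
    default: {
      std::string text;
      if (target_output_cfi_directive(t, opcode, text))
        return text;
      return "<unknown>";  // illustrative only
    }
  }
}
```

With this shape the same opcode yields a different directive per target, and the generic switch no longer needs to know about either vendor extension.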
2024-09-23	Rename REG_CFA_TOGGLE_RA_MANGLE to REG_CFA_NEGATE_RA_STATE	Matthieu Longo	4	-11/+11
	The current name REG_CFA_TOGGLE_RA_MANGLE is not representative of what it really is, i.e. a register to represent several states, not only a binary one. Same for dwarf2out_frame_debug_cfa_toggle_ra_mangle. gcc/ChangeLog: * combine-stack-adj.cc (no_unhandled_cfa): Rename. * config/aarch64/aarch64.cc (aarch64_expand_prologue): Rename. (aarch64_expand_epilogue): Rename. * dwarf2cfi.cc (dwarf2out_frame_debug_cfa_toggle_ra_mangle): Rename this... (dwarf2out_frame_debug_cfa_negate_ra_state): To this. (dwarf2out_frame_debug): Rename. * reg-notes.def (REG_CFA_NOTE): Rename REG_CFA_TOGGLE_RA_MANGLE.
2024-09-23	libgcc: hide CIE and FDE data for DWARF architecture extensions behind a handler.	Matthieu Longo	8	-15/+71
	This patch provides a new handler MD_ARCH_FRAME_STATE_T to hide an architecture-specific structure containing CIE and FDE data related to DWARF architecture extensions. Hiding the architecture-specific attributes behind a handler has the following benefits: 1. isolating those data from the generic ones in _Unwind_FrameState 2. avoiding casts to custom types. 3. preserving typing information when debugging with GDB, and so facilitating their printing. This approach required adding a new header md-unwind-def.h, included at the top of libgcc/unwind-dw2.h and redirecting to the corresponding architecture header via a symbolic link. An obvious drawback is the increase in complexity with macros and headers. It also caused a split of architecture definitions between md-unwind-def.h (type definitions used in unwind-dw2.h) and md-unwind.h (local type definitions and handler implementations). The naming of md-unwind.h with a .h extension is a bit misleading, as the file is only included in the middle of unwind-dw2.c. Changing this naming would require modifying other backends, which I preferred to abstain from. Overall the benefits are worth the added complexity from my perspective. libgcc/ChangeLog: * Makefile.in: New target for symbolic link to md-unwind-def.h * config.host: New parameter md_unwind_def_header. Set it to aarch64/aarch64-unwind-def.h for AArch64 targets, or no-unwind.h by default. * config/aarch64/aarch64-unwind.h (aarch64_pointer_auth_key): Move to aarch64-unwind-def.h (aarch64_cie_aug_handler): Update. (aarch64_arch_extension_frame_init): Update. (aarch64_demangle_return_addr): Update. * configure.ac: New substitute variable md_unwind_def_header. * unwind-dw2.h (defined): MD_ARCH_FRAME_STATE_T. * config/aarch64/aarch64-unwind-def.h: New file. * configure: Regenerate. * config/no-unwind.h: Updated comment
2024-09-23	aarch64: skip copy of RA state register into target context	Matthieu Longo	2	-0/+16
	The RA state register is local to a frame, so it should not be copied to the target frame during the context installation. This patch adds a new backend handler that checks whether a register needs to be skipped before its installation. libgcc/ChangeLog: * config/aarch64/aarch64-unwind.h (MD_FRAME_LOCAL_REGISTER_P): New handler checking whether a register from the current context needs to be skipped before installation into the target context. (aarch64_frame_local_register): Likewise. * unwind-dw2.c (uw_install_context_1): Use MD_FRAME_LOCAL_REGISTER_P.
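A rough sketch of the idea, under stated assumptions: contexts are modelled as plain arrays, and `frame_local_register_p`, `install_context`, `kNumRegs` and the register number used for the RA state are illustrative stand-ins, not libgcc's actual definitions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model of an unwind context: one slot per DWARF register.
constexpr std::size_t kNumRegs = 40;     // illustrative size
constexpr std::size_t kRaStateReg = 34;  // assumed number of the RA state register
using Context = std::array<long, kNumRegs>;

// Stand-in for a backend predicate: true for registers whose value is only
// meaningful within the frame that produced it.
bool frame_local_register_p(std::size_t reg)
{
  return reg == kRaStateReg;
}

// Copy registers from the current context into the target context,
// skipping the frame-local ones.
void install_context(const Context &current, Context &target)
{
  for (std::size_t reg = 0; reg < kNumRegs; ++reg) {
    if (frame_local_register_p(reg))
      continue;  // leave the target's own value untouched
    target[reg] = current[reg];
  }
}
```

Keeping the predicate in the backend means the generic copy loop stays architecture-neutral.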
2024-09-23	aarch64: store signing key and signing method in DWARF _Unwind_FrameState	Matthieu Longo	4	-48/+159
	This patch is only a refactoring of the existing implementation of PAuth and return-address signing. The existing behavior is preserved. _Unwind_FrameState already contains several pieces of CIE and FDE information (see the attributes below the comment "The information we care about from the CIE/FDE" in libgcc/unwind-dw2.h). The patch aims at moving the information from the DWARF CIE (signing key stored in the augmentation string) and FDE (the signing method used) into _Unwind_FrameState alongside the already-stored CIE and FDE information. Note: this information has to be saved in frame_state_reg_info instead of _Unwind_FrameState, as it needs to be savable by DW_CFA_remember_state and restorable by DW_CFA_restore_state, which both rely on the attribute "prev". This new information in _Unwind_FrameState simplifies the look-up of the signing key when the return address is demangled. It also allows future signing methods to be easily added. _Unwind_FrameState is not a part of the public API of libunwind, so the change is backward compatible. A new architecture-specific handler MD_ARCH_EXTENSION_FRAME_INIT allows resetting values (if needed) in the frame state and unwind context before changing the frame state to the caller context. A new architecture-specific handler MD_ARCH_EXTENSION_CIE_AUG_HANDLER isolates the architecture-specific augmentation strings in the AArch64 backend, and allows other architectures to reuse augmentation strings that would have clashed with AArch64 DWARF extensions. aarch64_demangle_return_addr, DW_CFA_AARCH64_negate_ra_state and DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h were documented to clarify where the value of the RA state register is stored (FS and CONTEXT respectively). libgcc/ChangeLog: * config/aarch64/aarch64-unwind.h (AARCH64_DWARF_RA_STATE_MASK): The mask for RA state register. (aarch64_ra_signing_method_t): The diversifiers used to sign a function's return address. 
(aarch64_pointer_auth_key): The key used to sign a function's return address. (aarch64_cie_signed_with_b_key): Deleted as the signing key is available now in _Unwind_FrameState. (MD_ARCH_EXTENSION_CIE_AUG_HANDLER): New CIE augmentation string handler for architecture extensions. (MD_ARCH_EXTENSION_FRAME_INIT): New architecture-extension initialization routine for DWARF frame state and context before execution of DWARF instructions. (aarch64_context_ra_state_get): Read RA state register from CONTEXT. (aarch64_ra_state_get): Read RA state register from FS. (aarch64_ra_state_set): Write RA state register into FS. (aarch64_ra_state_toggle): Toggle RA state register in FS. (aarch64_cie_aug_handler): Handle AArch64 augmentation strings. (aarch64_arch_extension_frame_init): Initialize defaults for the signing key (PAUTH_KEY_A), and RA state register (RA_no_signing). (aarch64_demangle_return_addr): Rely on the frame registers and the signing_key attribute in _Unwind_FrameState. * unwind-dw2-execute_cfa.h: Use the right alias DW_CFA_AARCH64_negate_ra_state for __aarch64__ instead of DW_CFA_GNU_window_save. (DW_CFA_AARCH64_negate_ra_state): Save the signing method in RA state register. Toggle RA state register without resetting 'how' to REG_UNSAVED. * unwind-dw2.c (extract_cie_info): Save the signing key in the current _Unwind_FrameState while parsing the augmentation data. (uw_frame_state_for): Reset some attributes related to architecture extensions in _Unwind_FrameState. (uw_update_context): Move authentication code to AArch64 unwinding. * unwind-dw2.h (enum register_rule): Give a name to the existing enum for the register rules, and replace 'unsigned char' by 'enum register_rule' to facilitate debugging in GDB. (_Unwind_FrameState): Add a new architecture-extension attribute to store the signing key.
2024-09-23	OpenMP: Fix omp_get_device_from_uid, minor cleanup	Tobias Burnus	10	-15/+60
	In Fortran, omp_get_device_from_uid can also accept substrings, which are then not NUL terminated. Fixed by introducing a fortran.c wrapper function. Additionally, on failure the plugin functions now return NULL instead of failing fatally, so that a fall-back UID is generated. gcc/ChangeLog: * omp-general.cc (omp_runtime_api_procname): Strip "omp_" from string; move get_device_from_uid as now a '_' suffix exists. libgomp/ChangeLog: * fortran.c (omp_get_device_from_uid_): New function. * libgomp.map (GOMP_6.0): Add it. * oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'. * omp_lib.f90.in: Make it used by removing bind(C). * omp_lib.h.in: Likewise. * target.c (omp_get_device_from_uid): Ensure the device is initialized. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment; return NULL in case of an error. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise. * testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.
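The substring problem can be illustrated with a small wrapper sketch. Everything here is hypothetical (`lookup_device`, `lookup_device_fortran`, the "GPU-1" UID and the -1 failure value); it only shows the general technique of copying a length-delimited Fortran-style string into a NUL-terminated buffer before handing it to a C-style API, not libgomp's actual wrapper.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API that requires a NUL-terminated string.
int lookup_device(const char *uid)
{
  return std::strcmp(uid, "GPU-1") == 0 ? 1 : -1;
}

// Fortran-style entry point: a pointer plus an explicit length. A substring
// argument points into a larger buffer and has no terminator of its own.
int lookup_device_fortran(const char *uid, std::size_t uid_len)
{
  // Copy exactly uid_len characters; std::string appends the terminator.
  const std::string copy(uid, uid_len);
  return lookup_device(copy.c_str());
}
```

Passing the raw pointer straight through would make the callee read past the end of the substring.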
2024-09-23	arc: Remove mlra option [PR113954]	Claudiu Zissulescu	4	-18/+4
	The target dependent mlra option was designed to be able to quickly switch between LRA and reload. The reload register allocator step is scheduled for retirement, thus, remove the functionality of mlra, keeping it for backward compatibility. PR target/113954 gcc/ChangeLog: * config/arc/arc.cc (TARGET_LRA_P): Always return true. (arc_lra_p): Remove. * config/arc/arc.h (TARGET_LRA): Remove. * config/arc/arc.opt (mlra): Change it to do nothing. * doc/invoke.texi (mlra): Update option description. Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
2024-09-23	c++: Don't crash when mangling member with anonymous union or template type [PR100632, PR109790]	Simon Martin	6	-1/+90
	We currently crash upon mangling members that have an anonymous union or a template operator type. The problem is that before calling write_unqualified_name, write_member_name asserts that it has a declaration whose DECL_NAME is an identifier node that is not that of an operator. This is wrong: - In PR100632, it's an anonymous union declaration, hence a 0 DECL_NAME - In PR109790, it's a legitimate template declaration for an operator (this was accepted up to GCC 10) This assert was added via r11-6301, to be sure that we do write the "on" marker for operator members. This patch removes that assert and instead - Lets members with an anonymous union type go through - For operators, adds the missing "on" marker for ABI versions greater than the highest usable with GCC 10 PR c++/109790 PR c++/100632 gcc/cp/ChangeLog: * mangle.cc (write_member_name): Handle members whose type is an anonymous union member. Write missing "on" marker for operators when ABI version is at least 16. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/decltype83.C: New test. * g++.dg/cpp0x/decltype83a.C: New test. * g++.dg/cpp1y/lambda-ice3.C: New test. * g++.dg/cpp1y/lambda-ice3a.C: New test. * g++.dg/cpp2a/nontype-class67.C: New test.
2024-09-23	c++: Don't ICE due to artificial constructor parameters [PR116722]	Simon Martin	2	-1/+25
	The following code triggers an ICE
=== cut here ===
class base {};
class derived : virtual public base {
public:
  template<typename Arg> constexpr derived(Arg) {}
};
int main() {
  derived obj(1.);
}
=== cut here ===
The problem is that cxx_bind_parameters_in_call ends up attempting to convert a REAL_CST (the first non-artificial parameter) to INTEGER_TYPE (the type of the __in_chrg parameter), which ICEs. This patch changes cxx_bind_parameters_in_call to return early if it's called with a structor that has an __in_chrg or __vtt_parm parameter, since the expression won't be a constant expression. Note that in the test case, the constructor is not constexpr-suitable; however, it's OK since it's a template, according to my reading of paragraph (3) of [dcl.constexpr]. PR c++/116722 gcc/cp/ChangeLog: * constexpr.cc (cxx_bind_parameters_in_call): Leave early for {con,de}structors of classes with virtual bases. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/constexpr-ctor22.C: New test.
2024-09-23	Add myself to write after approval	Saurabh Jha	1	-0/+1
	ChangeLog: * MAINTAINERS: Add myself to write after approval.
2024-09-23	tree-optimization/116810 - out-of-bound access to matches[]	Richard Biener	1	-1/+1
	The following makes sure to apply forced splitting of groups for forced single-lane SLP only when the group being analyzed has more than one lane. This avoids an out-of-bound access to matches[]. PR tree-optimization/116810 * tree-vect-slp.cc (vect_build_slp_instance): Only force splitting for group_size > 1.
2024-09-23	tree-optimization/116796 - virtual LC SSA broken after unrolling	Richard Biener	1	-4/+6
	When the unroller unloops loops it tracks whether it changes any nesting relationship of remaining loops, but when scanning a loop's preheader it fails to pass down the LC-SSA-invalidated bitmap, losing the fact that an unrolled formerly inner loop can now be placed on an exit of its outer loop. The following fixes that. PR tree-optimization/116796 * cfgloopmanip.cc (fix_loop_placements): Get LC-SSA-invalidated bitmap and pass it on. (remove_path): Pass LC-SSA-invalidated to fix_loop_placements.
2024-09-23	middle-end: Insert invariant instructions before the gsi [PR116812]	Tamar Christina	2	-4/+19
	The new invariant statements should be inserted before the current statement and not after. This goes fine 99% of the time but when the current statement is a gcond the control flow gets corrupted. gcc/ChangeLog: PR tree-optimization/116812 * tree-vect-slp.cc (vect_slp_region): Fix insertion. gcc/testsuite/ChangeLog: PR tree-optimization/116812 * gcc.dg/vect/pr116812.c: New test.
2024-09-23	tree-optimization/116791 - Elementwise SLP vectorization	Richard Biener	2	-6/+37
	The following restricts the elementwise SLP vectorization to the single-lane case, which is the reason I enabled it: to avoid regressions with non-SLP. The PR shows that multi-lane SLP loads with elementwise accesses require work; I'll open a new bug to track this for the future. PR tree-optimization/116791 * tree-vect-stmts.cc (get_group_load_store_type): Only fall back to elementwise access for single-lane SLP, restore hard failure mode for other cases. * gcc.dg/vect/pr116791.c: New testcase.
2024-09-23	gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h	Tobias Burnus	1	-0/+6
	In commit r15-3629-g508ef585243d4674d06b0737bfe8769fc18f824f, #embed was added and the fprintf lines for '#include' that seemed to be no longer required were removed, somehow missing that with -mstack-size=, the generated configure_stack_size will use 'setenv' and 'true'. gcc/ChangeLog: * config/gcn/mkoffload.cc (process_asm): (Re)add the fprintf lines for stdlib.h/stdbool.h inclusion if gcn_stack_size is used.
2024-09-23	Genmatch: Fix ICE for binary phi cfg mismatching [PR116795]	Pan Li	2	-1/+15
	This patch fixes an ICE when trying to match the binary phi for the cfg below. We check that the first edge of the Phi block comes from b0, instead of checking that the only edge of b1 comes from b0 too. Thus, some code is recognized as .SAT_SUB when it is not, which finally results in a verify_ssa failure.

   +------+
   | b0:  |
   | def  |       +-----+
   | ...  |       | b1: |
   | cond |------>| def |
   +------+       | ... |
      |           +-----+
      |              |
      |              |
      v              |
   +-----+           |
   | b2: |           |
   | Phi |<----------+
   +-----+

The test suites below pass for this patch. * The rv64gcv full regression test. * The x86 bootstrap test. * The x86 full regression test. PR target/116795 gcc/ChangeLog: * gimple-match-head.cc (match_cond_with_binary_phi): Fix the incorrect cfg check as b0->b1 in above example. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr116795-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-23	gimple: Simplify gimple_seq_nondebug_singleton_p	Andrew Pinski	1	-21/+2
	The implementation of gimple_seq_nondebug_singleton_p was convoluted in how it determined whether the sequence was a singleton (which could contain debug statements). This simplifies the function into two calls: one to get the start after all of the debug statements, and then a check to see if it is at the one before the end (or there are only debug statements afterwards). Bootstrapped and tested on x86_64-linux-gnu (including ada). gcc/ChangeLog: * gimple-iterator.h (gimple_seq_nondebug_singleton_p): Rewrite to simply use gsi_start_nondebug and gsi_one_nondebug_before_end_p. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
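The two-step shape described above (skip leading debug statements, then check that nothing but debug statements follows) can be modelled on a plain vector. This sketch swaps GCC's gimple iterators for standard algorithms; `Stmt` and `nondebug_singleton_p` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a statement: only the debug flag matters here.
struct Stmt {
  bool is_debug;
};

// True when the sequence holds exactly one non-debug statement, with any
// number of debug statements before or after it.
bool nondebug_singleton_p(const std::vector<Stmt> &seq)
{
  const auto is_real = [](const Stmt &s) { return !s.is_debug; };

  // Step 1: find the first non-debug statement.
  const auto first = std::find_if(seq.begin(), seq.end(), is_real);
  if (first == seq.end())
    return false;  // empty, or debug statements only

  // Step 2: everything after it must be a debug statement.
  return std::none_of(std::next(first), seq.end(), is_real);
}
```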
2024-09-23	gimple: Remove custom remove_pointer	Andrew Pinski	1	-6/+2
	Since r11-2700-g22dc89f8073cd0, type_traits has been included via system.h so we don't need a custom version for gimple.h. Note a small C++14 cleanup is to use remove_pointer_t directly here instead of remove_pointer<t>::type. bootstrapped and tested on x86_64-linux-gnu gcc/ChangeLog: * gimple.h (remove_pointer): Remove. (GIMPLE_CHECK2): Use std::remove_pointer instead of custom one. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-23	Remove commented out PHI_ARG_DEF macro definition	Andrew Pinski	1	-3/+0
	This was commented out since r0-125500-g80560f9521f81a and a new definition was added at the same time. Let's remove the commented out version. gcc/ChangeLog: * tree-ssa-operands.h (PHI_ARG_DEF): Remove definition. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-23	Update email in MAINTAINERS file.	Aldy Hernandez	1	-4/+5
	ChangeLog: * MAINTAINERS: Update email and add myself to DCO.
2024-09-23	Match: Support form 2 for vector signed integer .SAT_ADD	Pan Li	1	-0/+16
	This patch adds support for form 2 of the vector signed integer .SAT_ADD, as in the example below.

Form 2:
  #define DEF_VEC_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
  void __attribute__((noinline)) \
  vec_sat_s_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        T sum = (UT)x + (UT)y; \
        if ((x ^ y) < 0 || (sum ^ x) >= 0) \
          out[i] = sum; \
        else \
          out[i] = x < 0 ? MIN : MAX; \
      } \
  }

  DEF_VEC_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:
  loop_len_79 = MIN_EXPR <ivtmp.51_53, POLY_INT_CST [16, 16]>;
  _50 = &MEM <vector([16,16]) signed char> [(int8_t *)vectp_op_1.9_77];
  vect_x_18.11_80 = .MASK_LEN_LOAD (_50, 8B, { -1, ... }, loop_len_79, 0);
  _70 = vect_x_18.11_80 >> 7;
  vect_x.12_81 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_x_18.11_80);
  _26 = (void *) ivtmp.47_20;
  _27 = &MEM <vector([16,16]) signed char> [(int8_t *)_26];
  vect_y_20.15_84 = .MASK_LEN_LOAD (_27, 8B, { -1, ... }, loop_len_79, 0);
  vect__7.21_90 = vect_x_18.11_80 ^ vect_y_20.15_84;
  mask__50.23_92 = vect__7.21_90 >= { 0, ... };
  vect_y.16_85 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_y_20.15_84);
  vect__6.17_86 = vect_x.12_81 + vect_y.16_85;
  vect_sum_21.18_87 = VIEW_CONVERT_EXPR<vector([16,16]) signed char>(vect__6.17_86);
  vect__8.19_88 = vect_x_18.11_80 ^ vect_sum_21.18_87;
  mask__45.20_89 = vect__8.19_88 < { 0, ... };
  mask__44.24_93 = mask__45.20_89 & mask__50.23_92;
  _40 = .COND_XOR (mask__44.24_93, _70, { 127, ... }, vect_sum_21.18_87);
  _60 = (void *) ivtmp.49_6;
  _61 = &MEM <vector([16,16]) signed char> [(int8_t *)_60];
  .MASK_LEN_STORE (_61, 8B, { -1, ... }, loop_len_79, 0, _40);
  vectp_op_1.9_78 = vectp_op_1.9_77 + POLY_INT_CST [16, 16];
  ivtmp.47_4 = ivtmp.47_20 + POLY_INT_CST [16, 16];
  ivtmp.49_21 = ivtmp.49_6 + POLY_INT_CST [16, 16];
  ivtmp.51_98 = ivtmp.51_53;
  ivtmp.51_8 = ivtmp.51_53 + POLY_INT_CST [18446744073709551600, 18446744073709551600];

After this patch:
  _103 = .SELECT_VL (ivtmp_101, POLY_INT_CST [16, 16]);
  vect_x_18.11_90 = .MASK_LEN_LOAD (vectp_op_1.9_88, 8B, { -1, ... }, _103, 0);
  vect_y_20.14_94 = .MASK_LEN_LOAD (vectp_op_2.12_92, 8B, { -1, ... }, _103, 0);
  vect_patt_49.15_95 = .SAT_ADD (vect_x_18.11_90, vect_y_20.14_94);
  .MASK_LEN_STORE (vectp_out.16_97, 8B, { -1, ... }, _103, 0, vect_patt_49.15_95);
  vectp_op_1.9_89 = vectp_op_1.9_88 + _103;
  vectp_op_2.12_93 = vectp_op_2.12_92 + _103;
  vectp_out.16_98 = vectp_out.16_97 + _103;
  ivtmp_102 = ivtmp_101 - _103;

The test suites below pass for this patch. * The rv64gcv full regression test. * The x86 bootstrap test. * The x86 full regression test. gcc/ChangeLog: * match.pd: Add the case 3 for signed .SAT_ADD matching. Signed-off-by: Pan Li <pan2.li@intel.com>
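To make the per-element arithmetic of form 2 concrete, here is a scalar sketch of the same branch logic for int8_t. The function name `sat_s_add_i8` is hypothetical and this is not part of the patch or its testsuite; the narrowing conversion back to int8_t is modular on GCC (and guaranteed so from C++20).

```cpp
#include <cstdint>

// Scalar model of one loop iteration of form 2 for T = int8_t, UT = uint8_t.
std::int8_t sat_s_add_i8(std::int8_t x, std::int8_t y)
{
  // Add in the unsigned type so that wrap-around is well defined, then
  // reinterpret the low 8 bits as a signed value.
  const std::uint8_t wrapped = static_cast<std::uint8_t>(
      static_cast<std::uint8_t>(x) + static_cast<std::uint8_t>(y));
  const std::int8_t sum = static_cast<std::int8_t>(wrapped);

  // No overflow if the operands differ in sign, or if the sum keeps the
  // sign of x.
  if ((x ^ y) < 0 || (sum ^ x) >= 0)
    return sum;

  // Overflow: saturate towards the sign of x.
  return x < 0 ? INT8_MIN : INT8_MAX;
}
```

For example, adding 100 and 100 wraps to a negative 8-bit value, the sign test detects it, and the result saturates to INT8_MAX.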
2024-09-23	RISC-V: Add testcases for form 2 of signed vector SAT_ADD	Pan Li	9	-0/+128
	Form 2:
  #define DEF_VEC_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
  void __attribute__((noinline)) \
  vec_sat_s_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        T x = op_1[i]; \
        T y = op_2[i]; \
        T sum = (UT)x + (UT)y; \
        if ((x ^ y) < 0 || (sum ^ x) >= 0) \
          out[i] = sum; \
        else \
          out[i] = x < 0 ? MIN : MAX; \
      } \
  }

  DEF_VEC_SAT_S_ADD_FMT_2 (int8_t, uint8_t, INT8_MIN, INT8_MAX)

The tests below pass for this patch. * The rv64gcv full regression test. It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macro. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-5.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-6.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-7.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-8.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-5.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-6.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-7.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-23	testsuite/gfortran.dg/unsigned_22.f90: Add missing close with delete, PR116701	Hans-Peter Nilsson	1	-0/+1
	Without this patch, gfortran.dg/unsigned_22.f90 fails for non-effective-target fd_truncate targets, i.e. targets that don't support chsize or ftruncate. See also libgfortran/io/unix.c:raw_truncate. It passes on the first run, but leaves behind a file "fort.10" which is then picked up by subsequent runs; since that file is to be rewritten, the libgfortran machinery tries to truncate it, which fails. The file is always left behind primarily because the test-case lacks a deleting close-statement, apparently accidentally. Incidentally, this "fort.10" artefact is also picked up by gfortran.dg/write_check3.f90, causing that test to fail too, observable as a regression for non-fd_truncate targets since the unsigned_22.f90 introduction. Also, when running e.g. the whole of gfortran.dg/dg.exp, the "fort.10" is later deleted by gfortran.dg/write_direct_eor.f90 (which passes regardless), erasing the clue to the cause of the write_check3 failure. Also, running just dg.exp=write_check3.f90 or manually repeating the commands in gfortran.log showed no error. N.B.: this close-statement will not help if unsigned_22 for some reason fails, executing one of the "stop" statements, but that's also the case for many other tests. PR testsuite/116701 * gfortran.dg/unsigned_22.f90: Add missing close with delete.
2024-09-23	Daily bump.	GCC Administrator	5	-1/+142

2024-09-23	RISC-V: Add testcases for form 4 of signed scalar SAT_ADD	Pan Li	9	-0/+200
	Form 4:
  #define DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX) \
  T __attribute__((noinline)) \
  sat_s_add_##T##_fmt_4 (T x, T y) \
  { \
    T sum; \
    bool overflow = __builtin_add_overflow (x, y, &sum); \
    return !overflow ? sum : x < 0 ? MIN : MAX; \
  }

  DEF_SAT_S_ADD_FMT_4 (int64_t, uint64_t, INT64_MIN, INT64_MAX)

The tests below pass for this patch. * The rv64gcv full regression test. It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat_s_add-13.c: New test. * gcc.target/riscv/sat_s_add-14.c: New test. * gcc.target/riscv/sat_s_add-15.c: New test. * gcc.target/riscv/sat_s_add-16.c: New test. * gcc.target/riscv/sat_s_add-run-13.c: New test. * gcc.target/riscv/sat_s_add-run-14.c: New test. * gcc.target/riscv/sat_s_add-run-15.c: New test. * gcc.target/riscv/sat_s_add-run-16.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-23	RISC-V: Add testcases for form 3 of signed scalar SAT_ADD	Pan Li	9	-0/+200
	This patch adds testcases of the signed scalar SAT_ADD for form 3, i.e.:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) \
  T __attribute__((noinline)) \
  sat_s_add_##T##_fmt_3 (T x, T y) \
  { \
    T sum; \
    bool overflow = __builtin_add_overflow (x, y, &sum); \
    return overflow ? x < 0 ? MIN : MAX : sum; \
  }

  DEF_SAT_S_ADD_FMT_3 (int64_t, uint64_t, INT64_MIN, INT64_MAX)

The tests below pass for this patch. * The rv64gcv full regression test. It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat_s_add-10.c: New test. * gcc.target/riscv/sat_s_add-11.c: New test. * gcc.target/riscv/sat_s_add-12.c: New test. * gcc.target/riscv/sat_s_add-9.c: New test. * gcc.target/riscv/sat_s_add-run-10.c: New test. * gcc.target/riscv/sat_s_add-run-11.c: New test. * gcc.target/riscv/sat_s_add-run-12.c: New test. * gcc.target/riscv/sat_s_add-run-9.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
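Forms 3 and 4 differ only in which arm of the conditional tests the overflow flag. A hand-expanded sketch for int64_t, assuming a compiler that provides `__builtin_add_overflow` (GCC and Clang do); the function names are illustrative, not the ones generated by the test macros.

```cpp
#include <cstdint>

// Form 3 shape: overflow ? saturated : sum
std::int64_t sat_s_add_fmt_3(std::int64_t x, std::int64_t y)
{
  std::int64_t sum;
  const bool overflow = __builtin_add_overflow(x, y, &sum);
  return overflow ? (x < 0 ? INT64_MIN : INT64_MAX) : sum;
}

// Form 4 shape: !overflow ? sum : saturated
std::int64_t sat_s_add_fmt_4(std::int64_t x, std::int64_t y)
{
  std::int64_t sum;
  const bool overflow = __builtin_add_overflow(x, y, &sum);
  return !overflow ? sum : (x < 0 ? INT64_MIN : INT64_MAX);
}
```

Both shapes compute the same value; the separate test forms exist because the pattern matcher has to recognise each spelling.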
2024-09-22	testsuite, coroutines: Add tests for non-suspension ramp returns.	Iain Sandoe	2	-0/+256
	Although it is most common for the ramp function to see a return when a coroutine first suspends, there are other possibilities. For example all the awaits could be ready - effectively the coroutine will then run to completion and deallocation. Another case is where the first active suspension point causes the current routine to be cancelled and thence destroyed. These cases are tested here. gcc/testsuite/ChangeLog: * g++.dg/coroutines/torture/special-termination-00-sync-completion.C: New test. * g++.dg/coroutines/torture/special-termination-01-self-destruct.C: New test. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-09-22	libgcc, Darwin: From macOS 11, make that the earliest supported.	Iain Sandoe	2	-1/+7
	For libgcc, we have (so far) supported building a DSO that supports earlier versions of the OS than the target. From macOS 11, there are APIs that do not exist on earlier OS versions, so limit the libgcc range to macOS11..current. libgcc/ChangeLog: * config.host: From macOS 11, limit earliest macOS support to macOS 11. * config/t-darwin-min-11: New file. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-09-22	libstdc++: Disable std::formatter<char8_t, C> specialization	Jonathan Wakely	2	-25/+45
	I noticed that char8_t was missing from the list of types that were prevented from using the std::formatter partial specialization for integer types. That partial specialization was also matching cv-qualified integer types, because std::integral<const int> is true. This change simplifies the constraints by introducing a new variable template which is only true for cv-unqualified integer types, with explicit specializations to exclude the character types. This should be slightly more efficient than the previous constraints that checked std::integral<T> and (!__is_one_of<T, char, wchar_t, ...>). It also avoids the need for a separate std::formatter specialization for 128-bit integers, as they can be handled by the new variable template too. libstdc++-v3/ChangeLog: * include/std/format (__format::__is_formattable_integer): New variable template and specializations. (template<integral, __char> struct formatter): Replace constraints on first arg with __is_formattable_integer. * testsuite/std/format/formatter/requirements.cc: Check that std::formatter specializations for char8_t and const int are disabled.
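The constraint style described above can be sketched with a variable template plus explicit specializations. The name `is_formattable_integer_v` and the exact primary definition are assumptions made for this sketch; libstdc++'s internal `__format::__is_formattable_integer` is defined differently in detail.

```cpp
#include <type_traits>

// Primary template: true only for cv-unqualified integral types.
template<typename T>
inline constexpr bool is_formattable_integer_v
  = std::is_integral_v<T> && std::is_same_v<T, std::remove_cv_t<T>>;

// Explicit specializations switch the character types off again.
template<> inline constexpr bool is_formattable_integer_v<char> = false;
template<> inline constexpr bool is_formattable_integer_v<wchar_t> = false;
#ifdef __cpp_char8_t
template<> inline constexpr bool is_formattable_integer_v<char8_t> = false;
#endif
template<> inline constexpr bool is_formattable_integer_v<char16_t> = false;
template<> inline constexpr bool is_formattable_integer_v<char32_t> = false;

static_assert(is_formattable_integer_v<int>);
static_assert(!is_formattable_integer_v<const int>);
static_assert(!is_formattable_integer_v<char>);
```

A partial specialization constrained on such a trait needs a single boolean lookup per type instead of evaluating a concept plus a list comparison.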
2024-09-22	libstdc++: Fix condition for ranges::copy to use memmove [PR116754]	Jonathan Wakely	1	-1/+1
	libstdc++-v3/ChangeLog: PR libstdc++/116754 * include/bits/ranges_algobase.h (__copy_or_move): Fix order of arguments to __memcpyable.
2024-09-22	libstdc++: Fix formatting of most negative chrono::duration [PR116755]	Jonathan Wakely	2	-2/+22
	When formatting chrono::duration<signed-integer-type, P>::min() we were causing undefined behaviour by trying to form the negative of the most negative value. If we convert negative durations with integer rep to the corresponding unsigned integer rep then we can safely represent all values. libstdc++-v3/ChangeLog: PR libstdc++/116755 * include/bits/chrono_io.h (formatter<duration<R,P>>::format): Cast negative integral durations to unsigned rep. * testsuite/20_util/duration/io.cc: Test the most negative integer durations.
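The general technique of taking the magnitude in the unsigned type, rather than negating in the signed type, can be sketched as below. `magnitude` is a hypothetical helper for durations with an integral rep, not the code used in chrono_io.h.

```cpp
#include <chrono>
#include <cstdint>
#include <type_traits>

// Magnitude of a duration's tick count, computed in the unsigned
// counterpart of Rep so that the most negative value is representable.
template<typename Rep, typename Period>
constexpr std::make_unsigned_t<Rep>
magnitude(std::chrono::duration<Rep, Period> d)
{
  using U = std::make_unsigned_t<Rep>;
  const U u = static_cast<U>(d.count());  // modular conversion
  // Unsigned negation wraps, so 0 - u yields |count| even for min().
  return d.count() < 0 ? static_cast<U>(U{0} - u) : u;
}
```

Writing `-d.count()` instead would overflow for `duration::min()`, which is the undefined behaviour the commit message refers to.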
2024-09-22	libstdc++: Use constexpr instead of _GLIBCXX20_CONSTEXPR in <vector>	Jonathan Wakely	1	-3/+3
	For the operator<=> overload we can use the 'constexpr' keyword directly, because we know the language dialect is at least C++20. libstdc++-v3/ChangeLog: * include/bits/stl_vector.h (operator<=>): Use constexpr instead of _GLIBCXX20_CONSTEXPR macro.
2024-09-22	libstdc++: Silence -Wattributes warning in exception_ptr	Jonathan Wakely	1	-2/+1
	libstdc++-v3/ChangeLog: * libsupc++/exception_ptr.h (__exception_ptr::_M_safe_bool_dummy): Remove __attribute__((const)) from function returning void.
2024-09-22	libstdc++: Silence -Woverloaded-virtual warning in cxx11-ios_failure.cc	Jonathan Wakely	1	-0/+2
	libstdc++-v3/ChangeLog: * src/c++11/cxx11-ios_failure.cc (__iosfail_type_info): Unhide the three-arg overload of __do_upcast.
2024-09-22	libstdc++: Reorder C++26 entries in version.def	Jonathan Wakely	2	-37/+37
	This puts the C++26 ftms definitions in alphabetical order. libstdc++-v3/ChangeLog: * include/bits/version.def: Sort C++26 entries alphabetically. * include/bits/version.h: Regenerate.
2024-09-22	libstdc++: add default template parameters to algorithms	Jonathan Wakely	19	-81/+392
	This implements P2248R8 + P3217R0, both approved for C++26. The changes are mostly mechanical; the struggle is to keep readability with the pre-P2248 signatures. * For containers, "classic STL" algorithms and their parallel versions, introduce a macro and amend their declarations/definitions with it. The macro either expands to the defaulted parameter or to nothing in pre-C++26 modes. * For range algorithms, we need to reorder their template parameters. I've done so unconditionally, because users cannot rely on template parameters of algorithms (this is explicitly authorized by [algorithms.requirements]/15). The defaults are then hidden behind another macro. libstdc++-v3/ChangeLog: * include/bits/iterator_concepts.h: Add projected_value_t. * include/bits/algorithmfwd.h: Add the default template parameter to the relevant forward declarations. * include/pstl/glue_algorithm_defs.h: Likewise. * include/bits/ranges_algo.h: Add the default template parameter to range-based algorithms. * include/bits/ranges_algobase.h: Likewise. * include/bits/ranges_util.h: Likewise. * include/bits/ranges_base.h: Add helper macros. * include/bits/stl_iterator_base_types.h: Add helper macro. * include/bits/version.def: Add the new feature-testing macro. * include/bits/version.h: Regenerate. * include/std/algorithm: Pull the feature-testing macro. * include/std/ranges: Likewise. * include/std/deque: Pull the feature-testing macro, add the default for std::erase. * include/std/forward_list: Likewise. * include/std/list: Likewise. * include/std/string: Likewise. * include/std/vector: Likewise. * testsuite/23_containers/default_template_value.cc: New test. * testsuite/25_algorithms/default_template_value.cc: New test. Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com> Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
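The macro approach for the "classic STL" algorithms might look roughly like the sketch below. The macro name `SKETCH_DEFAULT_VALUE`, the `__cplusplus` check, and the `count_equal` algorithm are all hypothetical and chosen for illustration; the actual patch keys off its own feature-testing macro.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical macro: expands to a default template argument in C++26
// mode and to nothing in earlier modes, so one declaration serves both.
#if __cplusplus > 202302L
# define SKETCH_DEFAULT_VALUE(...) = __VA_ARGS__
#else
# define SKETCH_DEFAULT_VALUE(...)
#endif

// With the default in place, T no longer has to be deduced from the
// third argument, which lets callers pass a braced initializer there.
template<typename Iter,
         typename T SKETCH_DEFAULT_VALUE(
             typename std::iterator_traits<Iter>::value_type)>
constexpr std::size_t count_equal(Iter first, Iter last, const T& value)
{
    std::size_t n = 0;
    for (; first != last; ++first)
        if (*first == value)
            ++n;
    return n;
}

// Sample data for trying the algorithm out.
inline constexpr int sample_values[] = {1, 2, 2, 3};
```

In pre-C++26 modes the signature is unchanged, so existing calls such as `count_equal(sample_values, sample_values + 4, 2)` behave as before.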
2024-09-22	middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern	Tamar Christina	9	-9/+139
	Currently the vectorizer cheats when lowering COND_EXPR during bool recog. In the cases where the conditional is loop invariant or non-boolean, it instead converts the operation back into GENERIC and hides much of the operation from the analysis part of the vectorizer. i.e. a ? b : c is transformed into: a != 0 ? b : c. However, by doing so we can't perform any optimization on the masks, as they aren't explicit until quite late during codegen. To fix this, this patch lowers booleans earlier and so ensures that we are always in GIMPLE. For when the value is a loop invariant boolean we have to generate an additional conversion from bool to the integer mask form. This is done by creating a loop invariant a ? -1 : 0 with the target mask precision and then doing a normal != 0 comparison on that. To support this, the patch also adds the ability to create, during pattern matching, a loop invariant pattern that won't be seen by the vectorizer and will instead be materialized inside the loop preheader in the case of loops, or, in the case of BB vectorization, in the first BB in the region. gcc/ChangeLog: * tree-vect-patterns.cc (append_inv_pattern_def_seq): New. (vect_recog_bool_pattern): Lower COND_EXPRs. * tree-vect-slp.cc (vect_slp_region): Materialize loop invariant statements. * tree-vect-loop.cc (vect_transform_loop): Likewise. * tree-vect-stmts.cc (vectorizable_comparison_1): Remove VECT_SCALAR_BOOLEAN_TYPE_P handling for vectype. * tree-vectorizer.cc (vec_info::vec_info): Initialize inv_pattern_def_seq. * tree-vectorizer.h (LOOP_VINFO_INV_PATTERN_DEF_SEQ): New. (class vec_info): Add inv_pattern_def_seq. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-conditional_store_1.c: New test. * gcc.dg/vect/vect-conditional_store_5.c: New test. * gcc.dg/vect/vect-conditional_store_6.c: New test.
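The shape of the lowering can be mimicked in scalar code, as in the sketch below: the invariant boolean is converted once, before the loop, into an integer that is either all ones or zero, and the loop body only performs an ordinary `!= 0` test on it. This is only an analogy for the GIMPLE the patch produces; the function and sample names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar analogy of `a ? b[i] : c[i]` with a loop-invariant bool `a`.
template<std::size_t N>
constexpr std::array<int, N> select_invariant(bool a,
                                              const std::array<int, N>& b,
                                              const std::array<int, N>& c)
{
    // Materialized once ahead of the loop (the "preheader"):
    // bool -> integer mask form, -1 (all bits set) or 0.
    const std::int32_t inv_mask = a ? -1 : 0;

    std::array<int, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        // Inside the loop only a normal comparison on the mask remains.
        out[i] = (inv_mask != 0) ? b[i] : c[i];
    }
    return out;
}

// Sample inputs for trying the function out.
inline constexpr std::array<int, 3> sample_b{1, 2, 3};
inline constexpr std::array<int, 3> sample_c{10, 20, 30};
```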
2024-09-22	aarch64: Take into account when VF is higher than known scalar iters	Tamar Christina	10	-29/+79
	Consider low overhead loops like: void foo (char *restrict a, int *restrict b, int *restrict c, int n) { for (int i = 0; i < 9; i++) { int res = c[i]; int t = b[i]; if (a[i] != 0) res = t; c[i] = res; } } For such loops we use latency-only costing since the loop bound is known and small. The current costing however does not consider the case where niters < VF. So when comparing the scalar vs vector costs it doesn't keep in mind that the scalar code can't perform VF iterations. This makes it overestimate the cost for the scalar loop and we incorrectly vectorize. This patch takes the minimum of the VF and niters in such cases. Before the patch we generate: note: Original vector body cost = 46 note: Vector loop iterates at most 1 times note: Scalar issue estimate: note: load operations = 2 note: store operations = 1 note: general operations = 1 note: reduction latency = 0 note: estimated min cycles per iteration = 1.000000 note: estimated cycles per vector iteration (for VF 32) = 32.000000 note: SVE issue estimate: note: load operations = 5 note: store operations = 4 note: general operations = 11 note: predicate operations = 12 note: reduction latency = 0 note: estimated min cycles per iteration without predication = 5.500000 note: estimated min cycles per iteration for predication = 12.000000 note: estimated min cycles per iteration = 12.000000 note: Low iteration count, so using pure latency costs note: Cost model analysis: vs after: note: Original vector body cost = 46 note: Known loop bounds, capping VF to 9 for analysis note: Vector loop iterates at most 1 times note: Scalar issue estimate: note: load operations = 2 note: store operations = 1 note: general operations = 1 note: reduction latency = 0 note: estimated min cycles per iteration = 1.000000 note: estimated cycles per vector iteration (for VF 9) = 9.000000 note: SVE issue estimate: note: load operations = 5 note: store operations = 4 note: general operations = 11 note: predicate operations = 12 note: reduction latency = 0 note: estimated min cycles per iteration without predication = 5.500000 note: estimated min cycles per iteration for predication = 12.000000 note: estimated min cycles per iteration = 12.000000 note: Increasing body cost to 1472 because the scalar code could issue within the limit imposed by predicate operations note: Low iteration count, so using pure latency costs note: Cost model analysis: gcc/ChangeLog: * config/aarch64/aarch64.cc (adjust_body_cost): Cap VF for low iteration loops. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/asrdiv_4.c: Update bounds. * gcc.target/aarch64/sve/cond_asrd_2.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_6.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_7.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_8.c: Likewise. * gcc.target/aarch64/sve/miniloop_1.c: Likewise. * gcc.target/aarch64/sve/spill_6.c: Likewise. * gcc.target/aarch64/sve/sve_iters_low_1.c: New test. * gcc.target/aarch64/sve/sve_iters_low_2.c: New test.
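The arithmetic behind the two dumps can be reproduced with the small sketch below: the scalar cost of "one vector iteration's worth of work" is the per-iteration scalar cost multiplied by the VF, and the patch caps that VF at the known iteration count. The function name and the convention that `0` means "iteration count unknown" are assumptions made for this illustration, not the signature of `adjust_body_cost`.

```cpp
#include <algorithm>

// Hypothetical helper: estimated scalar cycles needed to cover the work
// of one vector iteration. `known_niters == 0` means the bound is unknown.
constexpr double scalar_cycles_per_vector_iter(double scalar_cycles_per_iter,
                                               unsigned vf,
                                               unsigned known_niters)
{
    // The scalar loop can never run more than `known_niters` iterations,
    // so comparing against a full VF would overestimate the scalar cost.
    const unsigned effective_vf =
        known_niters != 0 ? std::min(vf, known_niters) : vf;
    return scalar_cycles_per_iter * effective_vf;
}
```

With the figures from the dumps (1.0 cycle per scalar iteration, VF 32, 9 iterations), the uncapped estimate is 32.0 cycles and the capped estimate is 9.0 cycles.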
2024-09-22	Daily bump.	GCC Administrator	6	-1/+193

2024-09-21	fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]	Mikael Morin	10	-5/+927
	Introduce the -finline-intrinsics flag to control from the command line whether to generate either inline code or calls to the functions from the library, for the MINLOC and MAXLOC intrinsics. The flag allows specifying inlining either independently for each intrinsic (either MINLOC or MAXLOC), or for all of them together. For each intrinsic, a default value is set if none was set. The default value depends on the optimization setting: inlining is avoided if not optimizing or if optimizing for size; otherwise inlining is preferred. There is no direct support for this behaviour provided by the .opt options framework. It is obtained by defining three different variants of the flag (finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=) all using the same underlying option variable. Each enum value (corresponding to an intrinsic function) uses two identical bits, and the variable is initialized with alternating bits, so that we can tell whether the value was set or not by checking whether the two bits have different values. PR fortran/90608 gcc/ChangeLog: * flag-types.h (enum gfc_inlineable_intrinsics): New type. gcc/fortran/ChangeLog: * invoke.texi(finline-intrinsics): Document new flag. * lang.opt (finline-intrinsics, finline-intrinsics=, fno-inline-intrinsics): New flags. * options.cc (gfc_post_options): If the option variable controlling the inlining of MAXLOC (respectively MINLOC) has not been set, set it or clear it depending on the optimization option variables. * trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false if inlining for the intrinsic is disabled according to the option variable.
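The two-bits-per-intrinsic trick can be sketched as follows. The enumerator names and bit positions are hypothetical (the real values live in `enum gfc_inlineable_intrinsics`), and `optimize_for_speed` stands in for the optimization checks done in `gfc_post_options`.

```cpp
// Hypothetical encoding: each intrinsic owns a pair of bits. Setting or
// clearing an option always writes both bits of the pair identically.
enum InlineIntrinsic : unsigned {
    kMaxloc = 0x3,               // bits 0 and 1
    kMinloc = 0xC,               // bits 2 and 3
    kAll    = kMaxloc | kMinloc,
    kNotSet = 0x5                // alternating bits: every pair disagrees
};

// A pair whose two bits differ was never written by the user.
// `which` must name a single intrinsic (one pair).
constexpr bool is_set(unsigned flags, unsigned which)
{
    const unsigned pair = flags & which;
    return pair == 0 || pair == which;
}

constexpr bool is_enabled(unsigned flags, unsigned which)
{
    return (flags & which) == which;
}

constexpr unsigned enable(unsigned flags, unsigned which)  { return flags | which; }
constexpr unsigned disable(unsigned flags, unsigned which) { return flags & ~which; }

// Fill in a default only when the user expressed no preference.
constexpr unsigned apply_default(unsigned flags, unsigned which,
                                 bool optimize_for_speed)
{
    if (is_set(flags, which))
        return flags;
    return optimize_for_speed ? enable(flags, which) : disable(flags, which);
}
```

Starting from `kNotSet`, an explicit request for one intrinsic leaves the other pair in its "disagreeing" state, so its default can still be chosen later from the optimization level.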
2024-09-21	fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608]	Mikael Morin	1	-2/+31
	Continue the second set of loops where the first one stopped in the generated inline MINLOC/MAXLOC code in the cases where the generated code contains two sets of loops. This fixes a regression that was introduced when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank greater than 1, no DIM argument, and either non-scalar MASK or floating-point ARRAY. In the cases where two sets of loops are generated as inline MINLOC/MAXLOC code, we previously generated code such as (for rank 2 ARRAY, so with two levels of nesting): for (idx11 in lower1..upper1) { for (idx12 in lower2..upper2) { ... if (...) { ... goto second_loop; } } } second_loop: for (idx21 in lower1..upper1) { for (idx22 in lower2..upper2) { ... } } which means we process the first elements twice, once in the first set of loops and once in the second one. This change avoids this duplicate processing by using a conditional as lower bound for the second set of loops, generating code like: second_loop_entry = false; for (idx11 in lower1..upper1) { for (idx12 in lower2..upper2) { ... if (...) { ... second_loop_entry = true; goto second_loop; } } } second_loop: for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1) { for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2) { ... second_loop_entry = false; } } It was expected that the compiler optimizations would be able to remove the state variable second_loop_entry. This is the case if ARRAY has rank 1 (so without loop nesting): the variable is removed and the loop bounds become unconditional, which restores the previously generated code, fully fixing the regression. For larger rank, unfortunately, the state variable and conditional loop bounds remain, but those cases were previously using library calls, so it's not a regression. PR fortran/90608 gcc/fortran/ChangeLog: * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set of index variables. Set them using the loop indexes before leaving the first set of loops. Generate a new loop entry predicate. Initialize it. Set it before leaving the first set of loops. Clear it in the body of the second set of loops. For the second set of loops, update each loop lower bound to use the corresponding index variable if the predicate variable is set.
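The control flow described above can be imitated in C++ for a rank 2 array, as in the sketch below. It is a simplified stand-in: it uses an integer array with an array mask, 0-based positions, a flag-and-break in place of the `goto`, and returns (0, 0) when no element is selected; all names are hypothetical and the real generated code differs in detail.

```cpp
#include <array>
#include <cstddef>
#include <utility>

template<std::size_t R, std::size_t C>
constexpr std::pair<std::size_t, std::size_t>
masked_minloc(const std::array<std::array<int, C>, R>& a,
              const std::array<std::array<bool, C>, R>& mask)
{
    std::size_t pos_i = 0, pos_j = 0;   // current best position
    std::size_t idx1 = 0, idx2 = 0;     // where the first loops stopped
    int limit = 0;                      // current best value
    bool second_loop_entry = false;

    // First set of loops: find the first selected element, then leave.
    for (std::size_t i = 0; i < R && !second_loop_entry; ++i)
        for (std::size_t j = 0; j < C; ++j)
            if (mask[i][j]) {
                limit = a[i][j];
                pos_i = idx1 = i;
                pos_j = idx2 = j;
                second_loop_entry = true;
                break;
            }

    // Second set of loops: resume at (idx1, idx2) on entry; once the
    // predicate is cleared, later rows start again from the lower bound.
    for (std::size_t i = second_loop_entry ? idx1 : 0; i < R; ++i)
        for (std::size_t j = second_loop_entry ? idx2 : 0; j < C; ++j) {
            if (mask[i][j] && a[i][j] < limit) {
                limit = a[i][j];
                pos_i = i;
                pos_j = j;
            }
            second_loop_entry = false;
        }

    return {pos_i, pos_j};
}

// Sample data: the smallest selected element is a[1][1] == 1.
inline constexpr std::array<std::array<int, 3>, 2> sample_a{{
    {{9, 5, 2}}, {{7, 1, 0}}}};
inline constexpr std::array<std::array<bool, 3>, 2> sample_mask{{
    {{false, true, true}}, {{true, true, false}}}};
```

Note how the inner lower bound is re-evaluated on every outer iteration, which is why the predicate has to be cleared inside the loop body rather than after it.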
2024-09-21	fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]	Mikael Morin	3	-48/+87
	Enable generation of inline MINLOC/MAXLOC code in the case where DIM is not present, and either ARRAY is of floating point type or MASK is an array. Those cases are the remaining bits to fully support inlining of non-CHARACTER MINLOC/MAXLOC without DIM. They are treated together because they generate similar code, the NaNs for REAL types being handled a bit like a second level of masking. These are the cases for which we generate two sets of loops. This change affects the code generating the second loop, which was previously reachable only when ARRAY has rank 1. The single variable initialization and update are changed to apply to multiple variables, one per dimension. The code generated is as follows (if ARRAY has rank 2): for (idx11 in lower1..upper1) { for (idx12 in lower2..upper2) { ... if (...) { ... goto second_loop; } } } second_loop: for (idx21 in lower1..upper1) { for (idx22 in lower2..upper2) { ... } } This code leads to processing the first elements redundantly, both in the first set of loops and in the second one. The loop over idx22 could start from idx12 the first time it is run, but as it has to start from lower2 for the rest of the runs, this change uses the same bounds for both sets of loops for simplicity. In the rank 1 case, this makes the generated code worse compared to the inline code that was generated before. A later change will introduce conditionals to avoid the duplicate processing and restore the generated code in that case. PR fortran/90608 gcc/fortran/ChangeLog: * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize and update all the variables. Put the label and goto in the outermost scalarizer loop. Don't start the second loop where the first stopped. (gfc_inline_intrinsic_function_p): Also return TRUE for array MASK or for any REAL type. gcc/testsuite/ChangeLog: * gfortran.dg/maxloc_bounds_5.f90: Additionally accept error messages reported by the scalarizer. * gfortran.dg/maxloc_bounds_6.f90: Ditto.
2024-09-21	fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608]	Mikael Morin	2	-7/+10
	Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY is of integral type, DIM is not present, and MASK is present and is scalar (only absent MASK or rank 1 ARRAY were inlined before). Scalar masks are implemented with a wrapping condition around the code one would generate if MASK wasn't present, so they are easy to support once inline code without MASK is working. PR fortran/90608 gcc/fortran/ChangeLog: * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate variable initialization for each dimension in the else branch of the toplevel condition. (gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK. gcc/testsuite/ChangeLog: * gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message reported by the scalarizer.
2024-09-21	fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608]	Mikael Morin	3	-57/+165
	Enable generation of inline code for the MINLOC and MAXLOC intrinsics, if the ARRAY argument is of integral type and of any rank (only the rank 1 case was previously inlined), and neither DIM nor MASK arguments are present. This needs a few adjustments in gfc_conv_intrinsic_minmaxloc, mainly to replace the single variables POS and OFFSET with collections of variables, one variable per dimension each. The restriction to integral ARRAY and absent MASK limits the scope of the change to the cases where we generate single loop inline code. The code generation for the second loop is only accessible with ARRAY of rank 1, so it can continue using a single variable. A later change will extend inlining to the double loop cases. There is some bounds checking code that was previously handled by the library, and that needed some changes in the scalarizer to avoid regressing. The bounds check code generation was already supported by the scalarizer, but it was only applying to array reference sections, checking both for array bound violation and for shape conformability between all the involved arrays. With this change, for MINLOC or MAXLOC, enable the conformability check between all the scalarized arrays, and disable the array bound violation check. PR fortran/90608 gcc/fortran/ChangeLog: * trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC result upper bound using the rank of the ARRAY argument. Adjust the error message for intrinsic result arrays. Only check array bounds for array references. Move bound check decision code... (bounds_check_needed): ... here as a new predicate. Allow bound check for MINLOC/MAXLOC intrinsic results. * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the result array upper bound to the rank of ARRAY. Update the NONEMPTY variable to depend on the non-empty extent of every dimension. Use one variable per dimension instead of a single variable for the position and the offset. Update their declaration, initialization, and update to affect the variable of each dimension. Use the first variable only in areas only accessed with rank 1 ARRAY argument. Set every element of the result using its corresponding variable. (gfc_inline_intrinsic_function_p): Return true for integral ARRAY and absent DIM and MASK. gcc/testsuite/ChangeLog: * gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error message emitted by the scalarizer.
2024-09-21	fortran: Outline array bound check generation code	Mikael Morin	1	-154/+143
	gcc/fortran/ChangeLog: * trans-array.cc (gfc_conv_ss_startstride): Move array bound check generation code... (add_check_section_in_array_bounds): ... here as a new function.
2024-09-21	fortran: Remove MINLOC/MAXLOC frontend optimization	Mikael Morin	1	-58/+0
	Remove the frontend pass rewriting calls of MINLOC/MAXLOC without DIM to calls with one-valued DIM enclosed in an array constructor. This transformation was circumventing the limitation of inline MINLOC/MAXLOC code generation to scalar cases only, allowing inline code to be generated if ARRAY had rank 1 and DIM was absent. As MINLOC/MAXLOC has gained support of inline code generation in that case, the limitation is no longer effective, and the transformation no longer necessary. gcc/fortran/ChangeLog: * frontend-passes.cc (optimize_minmaxloc): Remove. (optimize_expr): Remove dispatch to optimize_minmaxloc.
2024-09-21	fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 [PR90608]	Mikael Morin	2	-68/+181
	Enable inline code generation for the MINLOC and MAXLOC intrinsics, if the DIM argument is not present and ARRAY has rank 1. This case is similar to the case where the result is scalar (DIM present and rank 1 ARRAY), which already supports inline expansion of the intrinsic. Both cases return the same value, with the difference that the result is an array of size 1 if DIM is absent, whereas it's a scalar if DIM is present. So all there is to do for the new case to work is hook the inline expansion with the scalarizer. PR fortran/90608 gcc/fortran/ChangeLog: * trans-array.cc (gfc_conv_ss_startstride): Set the scalarization rank based on the MINLOC/MAXLOC rank if needed. Call the inline code generation and set up the scalarizer array descriptor info in the MINLOC and MAXLOC cases. * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Return the result array element if the scalarizer is set up and we are inside the loops. Restrict library function call dispatch to the case where inline expansion is not supported. Declare an array result if the expression isn't scalar. Initialize the array result single element and return the result variable if the expression isn't scalar. (walk_inline_intrinsic_minmaxloc): New function. (walk_inline_intrinsic_function): Add MINLOC and MAXLOC cases, dispatching to walk_inline_intrinsic_minmaxloc. (gfc_add_intrinsic_ss_code): Add MINLOC and MAXLOC cases. (gfc_inline_intrinsic_function_p): Return true if ARRAY has rank 1, regardless of DIM.
2024-09-21	fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]	Mikael Morin	2	-1/+25
	Disable rewriting of MINLOC/MAXLOC expressions for which inline code generation is supported. Update the gfc_inline_intrinsic_function_p predicate (already existing) for that, with the current state of MINLOC/MAXLOC inlining support, that is only the cases of a scalar result and non-CHARACTER argument for now. This change has no effect currently, as the MINLOC/MAXLOC front-end passes only change expressions of rank 1, but the inlining control predicate gfc_inline_intrinsic_function_p returns false for those. However, later changes will extend MINLOC/MAXLOC inline expansion support to array expressions and update the inlining control predicate, and this will become effective. PR fortran/90608 gcc/fortran/ChangeLog: * frontend-passes.cc (optimize_minmaxloc): Skip if we can generate inline code for the unmodified expression. * trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Add MINLOC and MAXLOC cases.
2024-09-21	fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]	Mikael Morin	6	-0/+1249
	Add the tests covering the various cases for which we are about to implement inline expansion of MINLOC and MAXLOC. Those are cases where the DIM argument is not present. PR fortran/90608 gcc/testsuite/ChangeLog: * gfortran.dg/ieee/maxloc_nan_1.f90: New test. * gfortran.dg/ieee/minloc_nan_1.f90: New test. * gfortran.dg/maxloc_7.f90: New test. * gfortran.dg/maxloc_with_mask_1.f90: New test. * gfortran.dg/minloc_8.f90: New test. * gfortran.dg/minloc_with_mask_1.f90: New test.
2024-09-21	modula2: Tidyup remove unnecessary parameters	Gaius Mulley	1	-6/+6
	This patch removes unused parameters from gm2-compiler/M2Comp.mod. gcc/m2/ChangeLog: * gm2-compiler/M2Comp.mod (GenerateDependencies): Remove unused parameter. (WriteDep): Remove parameter dep. (WritePhoneDep): Ditto. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>