path: root/gcc

2024-09-24  Fortran: Allow to nullify caf token when not in ultimate component. [PR101100]
Andre Vehreschild  (2 files changed, -1/+36)

gcc/fortran/ChangeLog:

    PR fortran/101100
    * trans-expr.cc (trans_caf_token_assign): Take caf-token from decl for non-ultimate coarray components.

gcc/testsuite/ChangeLog:

    * gfortran.dg/coarray/proc_pointer_assign_1.f90: New test.

2024-09-24  build: enable C++11 narrowing warnings
Jason Merrill  (14 files changed, -39/+40)

We've been using -Wno-narrowing since GCC 4.7, but at this point narrowing diagnostics seem like a stable part of C++ and we should adjust. This patch changes -Wno-narrowing to -Wno-error=narrowing so that narrowing issues will still not break bootstrap, but we can see them.

The rest of the patch fixes the narrowing warnings I see in an x86_64-pc-linux-gnu bootstrap, in most cases by adjusting the types of various declarations so that we store the values in the same types we compute them in, which seems worthwhile anyway. This also allowed us to remove a few -Wsign-compare casts.

gcc/ChangeLog:

    * configure.ac (CXX_WARNING_OPTS): Change -Wno-narrowing to -Wno-error=narrowing.
    * configure: Regenerate.
    * config/i386/i386.h (debugger_register_map, debugger64_register_map, svr4_debugger_register_map): Make unsigned.
    * config/i386/i386.cc: Likewise.
    * diagnostic-event-id.h (diagnostic_thread_id_t): Make int.
    * vec.h (vec::size): Make unsigned int.
    * ipa-modref.cc (escape_point::arg): Make unsigned.
    (modref_lattice::add_escape_point): Use eaf_flags_t.
    (update_escape_summary_1): Use eaf_flags_t, && for bool.
    * pair-fusion.cc (pair_fusion_bb_info::track_access): Make mem_size unsigned int.
    * pretty-print.cc (format_phase_2): Cast va_arg to char.
    * tree-ssa-loop-ch.cc (ch_base::copy_headers): Make nheaders unsigned, remove cast.
    * tree-ssa-structalias.cc (bitpos_of_field): Return unsigned.
    (push_fields_onto_fieldstack): Make offset unsigned, remove cast.
    * tree-vect-slp.cc (vect_prologue_cost_for_slp): Use nelt_limit.
    * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset): Make scale unsigned.
    (vectorizable_operation): Make ncopies unsigned.
    * rtl-ssa/member-fns.inl: Make num_accesses unsigned int.
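
For readers unfamiliar with the diagnostic class involved, a minimal example (not taken from the patch) of what -Wnarrowing reports: C++11 rejects implicit narrowing inside braced initialization, while plain copy-initialization narrows silently.

    int main ()
    {
      unsigned int u = 42;
      int a{u};    // diagnosed: narrowing conversion of 'u' from
                   // 'unsigned int' to 'int' [-Wnarrowing]
      int b = u;   // not diagnosed: copy-initialization may narrow
      return a + b;
    }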

2024-09-24  Fortran: Assign allocated caf-memory to scalar members [PR84870]
Andre Vehreschild  (2 files changed, -0/+26)

Allocating a coarray required an array descriptor. For scalars a temporary descriptor was created. Assigning the allocated memory from the temporary descriptor back to the scalar is now added.

gcc/fortran/ChangeLog:

    PR fortran/84870
    * trans-array.cc (duplicate_allocatable_coarray): For scalar allocatable components the memory allocated is now assigned to the component's pointer.

gcc/testsuite/ChangeLog:

    * gfortran.dg/coarray/alloc_comp_10.f90: New test.

2024-09-24  tree-optimization/114855 - more update_ssa speedup
Richard Biener  (1 file changed, -0/+5)

The following tackles another source of slow bitmap operations, namely populating blocks_to_update. We already have that in tree view around PHI insertion, but the initial population is slow as well. There's unfortunately a conditional list-view requirement in between, and the bitmap API doesn't allow opportunistic switching: it rejects tree -> tree or list -> list transitions. So the following patch wraps the early population in a tree-view section, with possibly one redundant tree -> list -> tree view transition.

This cuts tree SSA incremental from 228.25s (21%) to 65.05s (7%).

    PR tree-optimization/114855
    * tree-into-ssa.cc (update_ssa): Use tree view for the initial population of blocks_to_update.
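
The view-switching pattern described above can be sketched with GCC's bitmap API; bitmap_tree_view, bitmap_list_view, and bitmap_set_bit are real entry points, but the surrounding function is illustrative only, not the patch's actual code.

    /* Illustrative sketch (assumes GCC's bitmap.h): populate in tree
       view for fast membership tests, switch to list view for the
       section that requires it, then back -- the one possibly
       redundant tree -> list -> tree transition the commit mentions.  */
    static void
    populate_blocks_to_update (bitmap blocks_to_update,
                               unsigned first, unsigned last)
    {
      bitmap_tree_view (blocks_to_update);   /* logarithmic inserts */
      for (unsigned i = first; i <= last; ++i)
        bitmap_set_bit (blocks_to_update, i);
      bitmap_list_view (blocks_to_update);   /* section needing list view */
      bitmap_tree_view (blocks_to_update);   /* back for later population */
    }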

2024-09-24  OpenMP: Add support for 'self_maps' to the 'require' directive
Tobias Burnus  (17 files changed, -18/+118)

'self_maps' implies 'unified_shared_memory', except that the latter also permits explicit maps to copy data to device memory while self_maps does not. In GCC, currently, both are handled identically.

gcc/c/ChangeLog:

    * c-parser.cc (c_parser_omp_requires): Handle self_maps clause.

gcc/cp/ChangeLog:

    * parser.cc (cp_parser_omp_requires): Handle self_maps clause.

gcc/fortran/ChangeLog:

    * gfortran.h (enum gfc_omp_requires_kind): Add OMP_REQ_SELF_MAPS.
    (gfc_namespace): Enlarge omp_requires bitfield.
    * module.cc (enum ab_attribute, attr_bits): Add AB_OMP_REQ_SELF_MAPS.
    (mio_symbol_attribute): Handle it.
    * openmp.cc (gfc_check_omp_requires, gfc_match_omp_requires): Handle self_maps clause.
    * parse.cc (gfc_parse_file): Handle self_maps clause.

gcc/ChangeLog:

    * lto-cgraph.cc (output_offload_tables, omp_requires_to_name): Handle self_maps clause.
    * omp-general.cc (struct omp_ts_info, omp_context_selector_matches): Likewise for the associated trait.
    * omp-general.h (enum omp_requires): Add OMP_REQUIRES_SELF_MAPS.
    * omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_IMPLEMENTATION_SELF_MAPS.

include/ChangeLog:

    * gomp-constants.h (GOMP_REQUIRES_SELF_MAPS): #define.

libgomp/ChangeLog:

    * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept self_maps clause.
    * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise.
    * libgomp.texi (TR13 Impl. Status): Set to 'Y'.
    * target.c (gomp_requires_to_name, GOMP_offload_register_ver, gomp_target_init): Handle self_maps clause.
    * testsuite/libgomp.fortran/self_maps.f90: New test.

gcc/testsuite/ChangeLog:

    * c-c++-common/gomp/declare-variant-1.c: Add self_maps test.
    * c-c++-common/gomp/requires-4.c: Likewise.
    * gfortran.dg/gomp/declare-variant-3.f90: Likewise.
    * c-c++-common/gomp/requires-2.c: Update dg-error msg.
    * gfortran.dg/gomp/requires-2.f90: Likewise.
    * gfortran.dg/gomp/requires-self-maps-aux.f90: New.
    * gfortran.dg/gomp/requires-self-maps.f90: New.
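
A minimal C/C++ sketch of the new clause, based on the clause name and semantics above (not taken from the patch's testcases): with self_maps in effect, the explicit map below performs no copy; the device accesses host memory directly.

    #pragma omp requires self_maps

    int main (void)
    {
      int n = 0;
    #pragma omp target map(from: n)   /* no copy needed under self_maps */
      n = 42;
      return n != 42;
    }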

2024-09-24  Testsuite, darwin: account for macOS 15
Francois-Xavier Coudert  (1 file changed, -0/+1)

gcc/testsuite/ChangeLog:

    * gcc.dg/darwin-minversion-link.c: Account for macOS 15.

2024-09-24  tree-optimization/115372 - failed store-lanes in some cases
Richard Biener  (1 file changed, -0/+18)

The gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c testcase shows that we sometimes fail to use store-lanes even though it should be profitable. We're currently relying on vect_slp_prefer_store_lanes_p at the point we run into the first SLP discovery mismatch, with obviously limited information. For the case at hand we have 3, 5 or 7 lanes of VnDImode [2, 2] vectors with the first mismatch at lane 2, so the new group size is 1. The heuristic says that might be an OK split given the rest is a multiple of the vector lanes. Now we continue discovery, but in the end mismatches result in uniformly single-lane SLP instances, which we can handle via interleaving but which of course are prime candidates for store-lanes.

The following patch re-assesses with the extra knowledge, now relying just on whether the target supports store-lanes for the given group size.

    PR tree-optimization/115372
    * tree-vect-slp.cc (vect_build_slp_instance): Compute the number of lanes of the RHS sub-graphs feeding the store and, if uniformly one, use store-lanes if the target supports that.

2024-09-24  tree-optimization/114855 - high update_ssa time
Richard Biener  (1 file changed, -36/+8)

Part of the problem in PR114855 is high update_ssa time. When one fixes the backward jump threading issue, tree SSA incremental is at 439.91s (26%), mostly doing bitmap element searches for blocks_with_phis_to_rewrite. The following turns that bitmap to tree view, noticing the two-dimensional vector of PHIs it guards is excessive compared to what we actually save with it: walking all PHI nodes in a block, something we already do once to initialize stmt flags. So instead of optimizing that walk we use the stmt flag, saving allocations and global state that lives throughout the whole compilation.

This reduces the tree SSA incremental time to 203.13s (14%).

The array was added in r0-74758-g2ce798794df8e1 when we still possibly had a gazillion virtual operands for PR26830; I checked the testcase still behaves OK.

    PR tree-optimization/114855
    * tree-into-ssa.cc (phis_to_rewrite): Remove global var.
    (mark_phi_for_rewrite): Simplify.
    (rewrite_update_phi_arguments): Walk all PHIs, process those satisfying rewrite_uses_p.
    (delete_update_ssa): Simplify.
    (update_ssa): Likewise. Switch blocks_with_phis_to_rewrite to tree view.

2024-09-24  hosthooks.h: Fix GCC_HOST_HOOKS_H typo
Yangyu Chen  (1 file changed, -1/+1)

The comment on the final #endif in hosthooks.h is wrong: it should say GCC_HOST_HOOKS_H instead of GCC_LANG_HOOKS_H.

gcc/ChangeLog:

    * hosthooks.h (struct host_hooks): Fix GCC_HOST_HOOKS_H typo.

Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
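
Concretely, the include-guard shape involved (a sketch; the surrounding declarations are elided):

    #ifndef GCC_HOST_HOOKS_H
    #define GCC_HOST_HOOKS_H
    /* ... struct host_hooks and friends ... */
    #endif /* GCC_HOST_HOOKS_H */   /* was mislabeled GCC_LANG_HOOKS_H */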

2024-09-24  nvptx: Partial support for aliases to aliases.
Prathamesh Kulkarni  (2 files changed, -5/+25)

For the following test (adapted from pr96390.c):

    __attribute__((noipa)) int foo () { return 42; }
    int bar () __attribute__((alias ("foo")));
    int baz () __attribute__((alias ("bar")));

    int main ()
    {
      int n;
      #pragma omp target map(from:n)
        n = baz ();
      return n;
    }

gcc emits the following PTX for baz:

    .visible .func (.param.u32 %value_out) bar;
    .alias bar,foo;
    .visible .func (.param.u32 %value_out) baz;
    .alias baz,bar;

which is incorrect, since PTX requires the aliasee to be a defined function. The patch instead uses cgraph_node::get(name)->ultimate_alias_target, which generates the following PTX:

    .visible .func (.param.u32 %value_out) baz;
    .alias baz,foo;

gcc/ChangeLog:

    PR target/104957
    * config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls): Use cgraph_node::get(name)->ultimate_alias_target instead of value.

gcc/testsuite/ChangeLog:

    PR target/104957
    * gcc.target/nvptx/alias-to-alias-1.c: Adjust.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
Co-authored-by: Thomas Schwinge <tschwinge@baylibre.com>

2024-09-24  Daily bump.
GCC Administrator  (5 files changed, -1/+281)

2024-09-24  modula2: Add noreturn attribute to m2/gm2-libs/M2RTS.mod
Gaius Mulley  (3 files changed, -2/+4)

This patch removes a build warning by adding a noreturn attribute to the M2RTS.mod:HaltC procedure. Also add an infinite loop to gm2-libs-min/M2RTS.mod.

gcc/m2/ChangeLog:

    * Make-lang.in (m2/gm2-libs-boot/M2RTS.o): Remove --suppress-noreturn.
    * gm2-libs/M2RTS.mod (HaltC): Add noreturn attribute.
    * gm2-libs-min/M2RTS.mod (HALT): Add LOOP END.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

2024-09-23  c++: diagnose this specifier in requires expr [PR116798]
Marek Polacek  (2 files changed, -3/+18)

We don't detect an explicit object parameter in a requires expression. We can get there by way of requires-expression -> requirement-parameter-list -> parameter-declaration-clause -> ... -> parameter-declaration with this[opt]. But [dcl.fct]/5 doesn't allow an explicit object parameter in this context. So let's fix it like r14-9033 and not like r14-8832.

    PR c++/116798

gcc/cp/ChangeLog:

    * parser.cc (cp_parser_parameter_declaration): Detect an explicit object parameter in a requires expression.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp23/explicit-obj-diagnostics12.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
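
A sketch of the now-rejected construct, with invented names (the patch's own testcase may differ): writing an explicit object parameter in the parameter list of a requires-expression follows the grammar path above but violates [dcl.fct]/5.

    // C++23; expected to be diagnosed after this patch.
    template <typename T>
    concept callable_on_self = requires (this T self)  // error: explicit
    {                                                  // object parameter
      self;                                            // not allowed here
    };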

2024-09-23  aarch64: Add codegen support for AdvSIMD faminmax
Saurabh Jha  (5 files changed, -0/+693)

The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and mandatory from Armv9.5-a. It introduces instructions for computing the floating-point absolute maximum and minimum of two vectors element-wise.

This patch adds code generation support for famax and famin in terms of existing RTL operators. famax/famin is equivalent to first taking the abs of the operands and then taking smax/smin on the results:

    famax/famin (a, b) = smax/smin (abs (a), abs (b))

This fusion of operators is only possible when the -march=armv9-a+faminmax flags are passed. We also need to pass the -ffast-math flag; if we don't, then a statement like

    c[i] = __builtin_fmaxf16 (a[i], b[i]);

is RTL-expanded to UNSPEC_FMAXNM instead of smax (likewise for smin). This code generation is only available at -O2 or -O3, as that is when auto-vectorization is enabled.

gcc/ChangeLog:

    * config/aarch64/aarch64-simd.md (*aarch64_faminmax_fused): Instruction pattern for faminmax codegen.
    * config/aarch64/iterators.md: Attribute for faminmax codegen.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/simd/faminmax-codegen-no-flag.c: New test.
    * gcc.target/aarch64/simd/faminmax-codegen.c: New test.
    * gcc.target/aarch64/simd/faminmax-no-codegen.c: New test.
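
A sketch (not from the patch's testsuite) of a loop whose vectorized body should be eligible for the abs+max fusion described above when compiled with -march=armv9-a+faminmax -ffast-math -O3:

    void
    famax_loop (float *c, const float *a, const float *b, int n)
    {
      /* Per element: c[i] = max (|a[i]|, |b[i]|), i.e. the famax
         equivalence smax (abs (a), abs (b)).  */
      for (int i = 0; i < n; i++)
        c[i] = __builtin_fmaxf (__builtin_fabsf (a[i]),
                                __builtin_fabsf (b[i]));
    }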

2024-09-23  aarch64: Add AdvSIMD faminmax intrinsics
Saurabh Jha  (9 files changed, -0/+294)

The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and mandatory from Armv9.5-a. It introduces instructions for computing the floating-point absolute maximum and minimum of two vectors element-wise.

This patch introduces AdvSIMD faminmax intrinsics. The intrinsics of this extension are implemented as the following builtin functions:

    * vamax_f16
    * vamaxq_f16
    * vamax_f32
    * vamaxq_f32
    * vamaxq_f64
    * vamin_f16
    * vaminq_f16
    * vamin_f32
    * vaminq_f32
    * vaminq_f64

We are defining a new way to add AArch64 AdvSIMD intrinsics by listing all the intrinsics in a .def file and then using that .def file to initialise various data structures. This would lead to more concise code and easier addition of new AdvSIMD intrinsics in the future. The faminmax intrinsics are defined using the new approach.

gcc/ChangeLog:

    * config/aarch64/aarch64-builtins.cc (ENTRY): Macro to parse the contents of aarch64-simd-pragma-builtins.def.
    (ENTRY_VHSDF): Macro to parse the contents of aarch64-simd-pragma-builtins.def.
    (enum aarch64_builtins): New enum values for faminmax builtins via aarch64-simd-pragma-builtins.def.
    (enum class aarch64_builtin_signatures): Enum class to specify the number of operands a builtin will take.
    (struct aarch64_pragma_builtins_data): Struct to hold data from aarch64-simd-pragma-builtins.def.
    (aarch64_fntype): New function to define function types of intrinsics given an object of type aarch64_pragma_builtins_data.
    (aarch64_init_pragma_builtins): New function to define pragma builtins.
    (aarch64_get_pragma_builtin): New function to get a row of aarch64_pragma_builtins, given code.
    (handle_arm_neon_h): Modify to call aarch64_init_pragma_builtins.
    (aarch64_general_check_builtin_call): Modify to check whether the required flag is being used for pragma builtins.
    (aarch64_expand_pragma_builtin): New function to emit instructions of pragma_builtin.
    (aarch64_general_expand_builtin): Modify to call aarch64_expand_pragma_builtin.
    * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Introduce new flag for this extension.
    * config/aarch64/aarch64-simd.md (@aarch64_<faminmax_uns_op><mode>): Instruction pattern for faminmax intrinsics.
    * config/aarch64/aarch64.h (TARGET_FAMINMAX): Introduce new flag for this extension.
    * config/aarch64/iterators.md: New iterators and unspecs.
    * doc/invoke.texi: Document extension in AArch64 Options.
    * config/aarch64/aarch64-simd-pragma-builtins.def: New file to list pragma builtins.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test.
    * gcc.target/aarch64/simd/faminmax-builtins.c: New test.

2024-09-23  dwarf2: store the RA state in CFI row
Matthieu Longo  (1 file changed, -6/+18)

On AArch64, the RA state informs the unwinder whether, and how, the return address is mangled. This information is encoded as a boolean in the CFI row. This binary approach prevents expressing more complex configurations, as is the case with PAuth_LR introduced in Armv9.5-A. This patch addresses the limitation by replacing the boolean with an enum.

gcc/ChangeLog:

    * dwarf2cfi.cc (struct dw_cfi_row): Declare a new enum type to replace ra_mangled.
    (cfi_row_equal_p): Use ra_state instead of ra_mangled.
    (dwarf2out_frame_debug_cfa_negate_ra_state): Same.
    (change_cfi_row): Same.
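
Purely illustrative sketch of the shape of the change; the enumerator names below are invented, not the ones the patch introduces. The point is that an enum leaves room for more return-address states than a bool can express.

    /* Before: bool ra_mangled;  -- mangled or not, nothing else.  */
    enum ra_state_example
    {
      RA_EXAMPLE_UNSIGNED,   /* return address not mangled */
      RA_EXAMPLE_SIGNED      /* mangled, e.g. PAC-signed */
      /* further states become representable here, e.g. for PAuth_LR */
    };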

2024-09-23  aarch64 testsuite: explain expectations for pr94515* tests
Matthieu Longo  (2 files changed, -6/+41)

gcc/testsuite/ChangeLog:

    * g++.target/aarch64/pr94515-1.C: Improve test documentation.
    * g++.target/aarch64/pr94515-2.C: Same.

2024-09-23  dwarf2: add hooks for architecture-specific CFIs
Matthieu Longo  (13 files changed, -38/+157)

Architecture-specific CFI directives are currently declared and processed among other architecture-independent CFI directives in the gcc/dwarf2* files. This approach creates confusion, specifically in the case of DWARF instructions in the vendor space that use the same instruction code. Such a clash currently happens between DW_CFA_GNU_window_save (used on SPARC) and DW_CFA_AARCH64_negate_ra_state (used on AArch64), both of which have instruction code 0x2d. AArch64 compilers therefore generate a SPARC CFI directive (.cfi_window_save) instead of .cfi_negate_ra_state, contrary to what is expected in "DWARF for the Arm 64-bit Architecture (AArch64)" (https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst).

This refactoring does not completely solve the problem, but improves the situation by moving some of the processing of those directives (more specifically their output in the assembly) to the backend via two target hooks:

    - DW_CFI_OPRND1_DESC: parse the first operand of the directive (if any).
    - OUTPUT_CFI_DIRECTIVE: output the CFI directive as a string.

Additionally, this patch also contains a renaming of an enum used for return address mangling on AArch64.

gcc/ChangeLog:

    * config/aarch64/aarch64.cc (aarch64_output_cfi_directive): New hook for CFI directives.
    (aarch64_dw_cfi_oprnd1_desc): Same.
    (TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
    (TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
    * config/sparc/sparc.cc (sparc_output_cfi_directive): New hook for CFI directives.
    (sparc_dw_cfi_oprnd1_desc): Same.
    (TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
    (TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
    * coretypes.h (struct dw_cfi_node): Forward declaration of CFI type from gcc/dwarf2out.h.
    (enum dw_cfi_oprnd_type): Same.
    (enum dwarf_call_frame_info): Same.
    * doc/tm.texi: Regenerated from doc/tm.texi.in.
    * doc/tm.texi.in: Add doc for new target hooks.
    * dwarf2cfi.cc (struct dw_cfi_row): Update the description for window_save and ra_mangled.
    (dwarf2out_frame_debug_cfa_negate_ra_state): Use the AArch64 CFI directive instead of the SPARC one.
    (change_cfi_row): Use the right CFI directive's name for RA mangling.
    (output_cfi): Remove explicit architecture-specific CFI directive DW_CFA_GNU_window_save that falls into the default case.
    (output_cfi_directive): Use target hook as default.
    * dwarf2out.cc (dw_cfi_oprnd1_desc): Use target hook as default.
    * dwarf2out.h (enum dw_cfi_oprnd_type): Specify underlying type of enum to allow forward declaration.
    (dw_cfi_oprnd1_desc): Call target hook.
    (output_cfi_directive): Use dw_cfi_ref instead of struct dw_cfi_node *.
    * hooks.cc (hook_bool_dwcfi_dwcfioprndtyperef_false): New.
    (hook_bool_FILEptr_dwcfiptr_false): New.
    * hooks.h (hook_bool_dwcfi_dwcfioprndtyperef_false): New.
    (hook_bool_FILEptr_dwcfiptr_false): New.
    * target.def: Documentation for new hooks.

include/ChangeLog:

    * dwarf2.h (enum dwarf_call_frame_info): Specify underlying type of enum to allow forward declaration.

libffi/ChangeLog:

    * include/ffi_cfi.h (cfi_negate_ra_state): Declare AArch64 cfi directive.

libgcc/ChangeLog:

    * config/aarch64/aarch64-asm.h (PACIASP): Replace SPARC CFI directive by AArch64 one.
    (AUTIASP): Same.

libitm/ChangeLog:

    * config/aarch64/sjlj.S: Replace SPARC CFI directive by AArch64 one.

gcc/testsuite/ChangeLog:

    * g++.target/aarch64/pr94515-1.C: Replace SPARC CFI directive by AArch64 one.
    * g++.target/aarch64/pr94515-2.C: Same.

2024-09-23  Rename REG_CFA_TOGGLE_RA_MANGLE to REG_CFA_NEGATE_RA_STATE
Matthieu Longo  (4 files changed, -11/+11)

The current name REG_CFA_TOGGLE_RA_MANGLE is not representative of what it really is, i.e. a register representing several states, not only a binary one. The same goes for dwarf2out_frame_debug_cfa_toggle_ra_mangle.

gcc/ChangeLog:

    * combine-stack-adj.cc (no_unhandled_cfa): Rename.
    * config/aarch64/aarch64.cc (aarch64_expand_prologue): Rename.
    (aarch64_expand_epilogue): Rename.
    * dwarf2cfi.cc (dwarf2out_frame_debug_cfa_toggle_ra_mangle): Rename this...
    (dwarf2out_frame_debug_cfa_negate_ra_state): ...to this.
    (dwarf2out_frame_debug): Rename.
    * reg-notes.def (REG_CFA_NOTE): Rename REG_CFA_TOGGLE_RA_MANGLE.

2024-09-23  OpenMP: Fix omp_get_device_from_uid, minor cleanup
Tobias Burnus  (1 file changed, -2/+2)

In Fortran, omp_get_device_from_uid can also accept substrings, which are then not NUL-terminated. This is fixed by introducing a fortran.c wrapper function. Additionally, in case of failure the plugin functions now return NULL instead of failing fatally, so that a fall-back UID is generated.

gcc/ChangeLog:

    * omp-general.cc (omp_runtime_api_procname): Strip "omp_" from string; move get_device_from_uid as now a '_' suffix exists.

libgomp/ChangeLog:

    * fortran.c (omp_get_device_from_uid_): New function.
    * libgomp.map (GOMP_6.0): Add it.
    * oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'.
    * omp_lib.f90.in: Make it used by removing bind(C).
    * omp_lib.h.in: Likewise.
    * target.c (omp_get_device_from_uid): Ensure the device is initialized.
    * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment; return NULL in case of an error.
    * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise.
    * testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.
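
A sketch of what such a wrapper might look like, assuming the usual libgomp convention of a trailing hidden length argument for Fortran character dummies; the real wrapper's signature and name handling may differ.

    #include <stdlib.h>
    #include <string.h>

    extern int omp_get_device_from_uid (const char *uid);

    int
    omp_get_device_from_uid_ (const char *uid, size_t uid_len)
    {
      /* A Fortran substring need not be NUL-terminated, so copy it
         into a terminated buffer before calling the C entry point.  */
      char *buf = (char *) malloc (uid_len + 1);
      memcpy (buf, uid, uid_len);
      buf[uid_len] = '\0';
      int dev = omp_get_device_from_uid (buf);
      free (buf);
      return dev;
    }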

2024-09-23  arc: Remove mlra option [PR113954]
Claudiu Zissulescu  (4 files changed, -18/+4)

The target-dependent mlra option was designed to allow quick switching between LRA and reload. The reload register allocator step is scheduled for retirement; thus, remove the functionality of mlra, keeping the option for backward compatibility.

    PR target/113954

gcc/ChangeLog:

    * config/arc/arc.cc (TARGET_LRA_P): Always return true.
    (arc_lra_p): Remove.
    * config/arc/arc.h (TARGET_LRA): Remove.
    * config/arc/arc.opt (mlra): Change it to do nothing.
    * doc/invoke.texi (mlra): Update option description.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
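
A sketch of the usual way a port makes TARGET_LRA_P unconditionally true (hook_bool_void_true is GCC's stock always-true hook); whether the arc port spells it exactly like this is an assumption, not a quote from the patch.

    /* Illustrative only: select LRA unconditionally.  */
    #undef  TARGET_LRA_P
    #define TARGET_LRA_P hook_bool_void_true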

2024-09-23  c++: Don't crash when mangling member with anonymous union or template type [PR100632, PR109790]
Simon Martin  (6 files changed, -1/+90)

We currently crash upon mangling members that have an anonymous union or a template operator type. The problem is that before calling write_unqualified_name, write_member_name asserts that it has a declaration whose DECL_NAME is an identifier node that is not that of an operator. This is wrong:

    - In PR100632, it's an anonymous union declaration, hence a null DECL_NAME.
    - In PR109790, it's a legitimate template declaration for an operator (this was accepted up to GCC 10).

This assert was added via r11-6301, to be sure that we do write the "on" marker for operator members. This patch removes that assert and instead

    - lets members with an anonymous union type go through, and
    - for operators, adds the missing "on" marker for ABI versions greater than the highest usable with GCC 10.

    PR c++/109790
    PR c++/100632

gcc/cp/ChangeLog:

    * mangle.cc (write_member_name): Handle members whose type is an anonymous union member. Write missing "on" marker for operators when ABI version is at least 16.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp0x/decltype83.C: New test.
    * g++.dg/cpp0x/decltype83a.C: New test.
    * g++.dg/cpp1y/lambda-ice3.C: New test.
    * g++.dg/cpp1y/lambda-ice3a.C: New test.
    * g++.dg/cpp2a/nontype-class67.C: New test.

2024-09-23  c++: Don't ICE due to artificial constructor parameters [PR116722]
Simon Martin  (2 files changed, -1/+25)

The following code triggers an ICE:

    === cut here ===
    class base {};
    class derived : virtual public base {
    public:
      template<typename Arg> constexpr derived(Arg) {}
    };
    int main() {
      derived obj(1.);
    }
    === cut here ===

The problem is that cxx_bind_parameters_in_call ends up attempting to convert a REAL_CST (the first non-artificial parameter) to INTEGER_TYPE (the type of the __in_chrg parameter), which ICEs.

This patch changes cxx_bind_parameters_in_call to return early if it's called with a *structor that has an __in_chrg or __vtt_parm parameter, since the expression won't be a constant expression. Note that in the test case the constructor is not constexpr-suitable; however, it's OK since it's a template, according to my read of paragraph (3) of [dcl.constexpr].

    PR c++/116722

gcc/cp/ChangeLog:

    * constexpr.cc (cxx_bind_parameters_in_call): Leave early for {con,de}structors of classes with virtual bases.

gcc/testsuite/ChangeLog:

    * g++.dg/cpp0x/constexpr-ctor22.C: New test.

2024-09-23  tree-optimization/116810 - out-of-bound access to matches[]
Richard Biener  (1 file changed, -1/+1)

The following makes sure to apply forced splitting of groups for forced single-lane SLP only when the group being analyzed has more than one lane. This avoids an out-of-bound access to matches[].

    PR tree-optimization/116810
    * tree-vect-slp.cc (vect_build_slp_instance): Only force splitting for group_size > 1.

2024-09-23  tree-optimization/116796 - virtual LC SSA broken after unrolling
Richard Biener  (1 file changed, -4/+6)

When the unroller unloops loops, it tracks whether it changes any nesting relationship of the remaining loops, but when scanning a loop's preheader it fails to pass down the LC-SSA-invalidated bitmap, losing the fact that an unrolled, formerly inner loop can now be placed on an exit of its outer loop. The following fixes that.

    PR tree-optimization/116796
    * cfgloopmanip.cc (fix_loop_placements): Get the LC-SSA-invalidated bitmap and pass it on.
    (remove_path): Pass LC-SSA-invalidated to fix_loop_placements.

2024-09-23  middle-end: Insert invariant instructions before the gsi [PR116812]
Tamar Christina  (2 files changed, -4/+19)

The new invariant statements should be inserted before the current statement and not after. This goes fine 99% of the time, but when the current statement is a gcond the control flow gets corrupted.

gcc/ChangeLog:

    PR tree-optimization/116812
    * tree-vect-slp.cc (vect_slp_region): Fix insertion.

gcc/testsuite/ChangeLog:

    PR tree-optimization/116812
    * gcc.dg/vect/pr116812.c: New test.
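
An illustrative helper showing the shape of the fix (assumes GCC's gimple-iterator.h; the function is a sketch, not the patch's code): inserting after a gcond would place the invariant statements past the block's control transfer.

    static void
    emit_invariants_before (gimple_stmt_iterator *gsi, gimple_seq invs)
    {
      /* Before the current statement, never after: when *gsi points at
         a gcond, gsi_insert_seq_after would corrupt control flow.  */
      gsi_insert_seq_before (gsi, invs, GSI_SAME_STMT);
    }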

2024-09-23  tree-optimization/116791 - Elementwise SLP vectorization
Richard Biener  (2 files changed, -6/+37)

The following restricts elementwise SLP vectorization to the single-lane case, which is the reason I enabled it, to avoid regressions with non-SLP. The PR shows that multi-lane SLP loads with elementwise accesses require work; I'll open a new bug to track this for the future.

    PR tree-optimization/116791
    * tree-vect-stmts.cc (get_group_load_store_type): Only fall back to elementwise access for single-lane SLP, restore hard failure mode for other cases.
    * gcc.dg/vect/pr116791.c: New testcase.

2024-09-23  gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h
Tobias Burnus  (1 file changed, -0/+6)

In commit r15-3629-g508ef585243d4674d06b0737bfe8769fc18f824f, #embed was added and the no-longer-required fprintf '#include' lines were removed, missing somehow that with -mstack-size=, the generated configure_stack_size code uses 'setenv' and 'true'.

gcc/ChangeLog:

    * config/gcn/mkoffload.cc (process_asm): (Re)add the fprintf lines for stdlib.h/stdbool.h inclusion if gcn_stack_size is used.

2024-09-23  Genmatch: Fix ICE for binary phi cfg mismatching [PR116795]
Pan Li  (2 files changed, -1/+15)

This patch fixes an ICE when trying to match the binary phi for the CFG below. We check that the first edge of the Phi block comes from b0, instead of checking that the only edge of b1 comes from b0. Without this, some code would be recognized as .SAT_SUB when it is not, finally resulting in a verify_ssa failure.

    +------+
    | b0:  |
    | def  |        +-----+
    | ...  |        | b1: |
    | cond |------->| def |
    +------+        | ... |
       |            +-----+
       |               |
       v               |
    +-----+            |
    | b2: |            |
    | Phi |<-----------+
    +-----+

The below test suites are passed for this patch:

    * The rv64gcv fully regression test.
    * The x86 bootstrap test.
    * The x86 fully regression test.

    PR target/116795

gcc/ChangeLog:

    * gimple-match-head.cc (match_cond_with_binary_phi): Fix the incorrect cfg check as b0->b1 in the above example.

gcc/testsuite/ChangeLog:

    * gcc.dg/torture/pr116795-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-09-23  gimple: Simplify gimple_seq_nondebug_singleton_p
Andrew Pinski  (1 file changed, -21/+2)

The implementation of gimple_seq_nondebug_singleton_p was convoluted in how it determined whether the sequence was a singleton (which could contain debug statements). This simplifies the function into two calls: one to get the start after all of the debug statements, and one to check whether that is the statement before the end (or whether only debug statements come afterwards).

Bootstrapped and tested on x86_64-linux-gnu (including ada).

gcc/ChangeLog:

    * gimple-iterator.h (gimple_seq_nondebug_singleton_p): Rewrite simply in terms of gsi_start_nondebug/gsi_one_nondebug_before_end_p.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
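
A sketch of the simplified predicate, assuming the two helpers named in the ChangeLog keep their usual GSI signatures; the actual rewrite may differ in detail.

    static inline bool
    gimple_seq_nondebug_singleton_p (gimple_seq seq)
    {
      /* Skip leading debug statements, then require that the first
         non-debug statement is also the last one.  */
      gimple_stmt_iterator gsi = gsi_start_nondebug (seq);
      return !gsi_end_p (gsi) && gsi_one_nondebug_before_end_p (gsi);
    }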

2024-09-23  gimple: Remove custom remove_pointer
Andrew Pinski  (1 file changed, -6/+2)

Since r11-2700-g22dc89f8073cd0, type_traits has been included via system.h, so we don't need a custom version for gimple.h. Note a small C++14 cleanup is to use remove_pointer_t directly here instead of remove_pointer<T>::type.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

    * gimple.h (remove_pointer): Remove.
    (GIMPLE_CHECK2): Use std::remove_pointer instead of custom one.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
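
A self-contained illustration of the standard trait that replaces the local helper (the C++14 alias is shown in the comment):

    #include <type_traits>

    // std::remove_pointer strips exactly one level of pointer.
    static_assert (std::is_same<std::remove_pointer<int *>::type, int>::value,
                   "remove_pointer<int *> yields int");
    // C++14 spelling: std::remove_pointer_t<int *> is the same type.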

2024-09-23  Remove commented-out PHI_ARG_DEF macro definition
Andrew Pinski  (1 file changed, -3/+0)

This was commented out since r0-125500-g80560f9521f81a, when a new definition was added at the same time. Let's remove the commented-out version.

gcc/ChangeLog:

    * tree-ssa-operands.h (PHI_ARG_DEF): Remove commented-out definition.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-09-23  Match: Support form 2 for vector signed integer .SAT_ADD
Pan Li  (1 file changed, -0/+16)

This patch would like to support form 2 of the vector signed integer .SAT_ADD, as in the example below:

Form 2:

    #define DEF_VEC_SAT_S_ADD_FMT_2(T, UT, MIN, MAX)                     \
    void __attribute__((noinline))                                       \
    vec_sat_s_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
    {                                                                    \
      unsigned i;                                                        \
      for (i = 0; i < limit; i++)                                        \
        {                                                                \
          T x = op_1[i];                                                 \
          T y = op_2[i];                                                 \
          T sum = (UT)x + (UT)y;                                         \
          if ((x ^ y) < 0 || (sum ^ x) >= 0)                             \
            out[i] = sum;                                                \
          else                                                           \
            out[i] = x < 0 ? MIN : MAX;                                  \
        }                                                                \
    }
    DEF_VEC_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:

    loop_len_79 = MIN_EXPR <ivtmp.51_53, POLY_INT_CST [16, 16]>;
    _50 = &MEM <vector([16,16]) signed char> [(int8_t *)vectp_op_1.9_77];
    vect_x_18.11_80 = .MASK_LEN_LOAD (_50, 8B, { -1, ... }, loop_len_79, 0);
    _70 = vect_x_18.11_80 >> 7;
    vect_x.12_81 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_x_18.11_80);
    _26 = (void *) ivtmp.47_20;
    _27 = &MEM <vector([16,16]) signed char> [(int8_t *)_26];
    vect_y_20.15_84 = .MASK_LEN_LOAD (_27, 8B, { -1, ... }, loop_len_79, 0);
    vect__7.21_90 = vect_x_18.11_80 ^ vect_y_20.15_84;
    mask__50.23_92 = vect__7.21_90 >= { 0, ... };
    vect_y.16_85 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_y_20.15_84);
    vect__6.17_86 = vect_x.12_81 + vect_y.16_85;
    vect_sum_21.18_87 = VIEW_CONVERT_EXPR<vector([16,16]) signed char>(vect__6.17_86);
    vect__8.19_88 = vect_x_18.11_80 ^ vect_sum_21.18_87;
    mask__45.20_89 = vect__8.19_88 < { 0, ... };
    mask__44.24_93 = mask__45.20_89 & mask__50.23_92;
    _40 = .COND_XOR (mask__44.24_93, _70, { 127, ... }, vect_sum_21.18_87);
    _60 = (void *) ivtmp.49_6;
    _61 = &MEM <vector([16,16]) signed char> [(int8_t *)_60];
    .MASK_LEN_STORE (_61, 8B, { -1, ... }, loop_len_79, 0, _40);
    vectp_op_1.9_78 = vectp_op_1.9_77 + POLY_INT_CST [16, 16];
    ivtmp.47_4 = ivtmp.47_20 + POLY_INT_CST [16, 16];
    ivtmp.49_21 = ivtmp.49_6 + POLY_INT_CST [16, 16];
    ivtmp.51_98 = ivtmp.51_53;
    ivtmp.51_8 = ivtmp.51_53 + POLY_INT_CST [18446744073709551600, 18446744073709551600];

After this patch:

    _103 = .SELECT_VL (ivtmp_101, POLY_INT_CST [16, 16]);
    vect_x_18.11_90 = .MASK_LEN_LOAD (vectp_op_1.9_88, 8B, { -1, ... }, _103, 0);
    vect_y_20.14_94 = .MASK_LEN_LOAD (vectp_op_2.12_92, 8B, { -1, ... }, _103, 0);
    vect_patt_49.15_95 = .SAT_ADD (vect_x_18.11_90, vect_y_20.14_94);
    .MASK_LEN_STORE (vectp_out.16_97, 8B, { -1, ... }, _103, 0, vect_patt_49.15_95);
    vectp_op_1.9_89 = vectp_op_1.9_88 + _103;
    vectp_op_2.12_93 = vectp_op_2.12_92 + _103;
    vectp_out.16_98 = vectp_out.16_97 + _103;
    ivtmp_102 = ivtmp_101 - _103;

The below test suites are passed for this patch:

    * The rv64gcv fully regression test.
    * The x86 bootstrap test.
    * The x86 fully regression test.

gcc/ChangeLog:

    * match.pd: Add case 3 for signed .SAT_ADD matching.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-09-23  RISC-V: Add testcases for form 2 of signed vector SAT_ADD
Pan Li  (9 files changed, -0/+128)

Form 2:

    #define DEF_VEC_SAT_S_ADD_FMT_2(T, UT, MIN, MAX)                     \
    void __attribute__((noinline))                                       \
    vec_sat_s_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
    {                                                                    \
      unsigned i;                                                        \
      for (i = 0; i < limit; i++)                                        \
        {                                                                \
          T x = op_1[i];                                                 \
          T y = op_2[i];                                                 \
          T sum = (UT)x + (UT)y;                                         \
          if ((x ^ y) < 0 || (sum ^ x) >= 0)                             \
            out[i] = sum;                                                \
          else                                                           \
            out[i] = x < 0 ? MIN : MAX;                                  \
        }                                                                \
    }
    DEF_VEC_SAT_S_ADD_FMT_2 (int8_t, uint8_t, INT8_MIN, INT8_MAX)

The below tests pass for this patch:

    * The rv64gcv fully regression test.

This is a test-only patch and obvious up to a point; it will be committed directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macro.
    * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-5.c: New test.
    * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-6.c: New test.
    * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-7.c: New test.
    * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-8.c: New test.
    * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-5.c: New test.
    * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-6.c: New test.
    * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-7.c: New test.
    * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-09-23  testsuite/gfortran.dg/unsigned_22.f90: Add missing close with delete, PR116701
Hans-Peter Nilsson  (1 file changed, -0/+1)

Without this patch, gfortran.dg/unsigned_22.f90 fails for non-effective-target fd_truncate targets, i.e. targets that don't support chsize or ftruncate; see also libgfortran/io/unix.c:raw_truncate. It passes on the first run, but leaves behind a file "fort.10" which is then picked up by subsequent runs; since that file is to be rewritten, the libgfortran machinery tries to truncate it, which fails. The file is always left behind primarily because the test case lacks a deleting close statement, apparently accidentally.

Incidentally, this "fort.10" artefact is also picked up by gfortran.dg/write_check3.f90, causing that test to fail too, observable as a regression for non-fd_truncate targets since the introduction of unsigned_22. Also, when running e.g. the whole of gfortran.dg/dg.exp, the "fort.10" is later deleted by gfortran.dg/write_direct_eor.f90 (which passes regardless), erasing the clue to the cause of the write_check3 failure. Running just dg.exp=write_check3.f90 or manually repeating the commands in gfortran.log showed no error.

N.B.: this close statement will not help if unsigned_22 fails for some reason, executing one of the "stop" statements, but that's also the case for many other tests.

    PR testsuite/116701
    * gfortran.dg/unsigned_22.f90: Add missing close with delete.

2024-09-23  Daily bump.
GCC Administrator  (3 files changed, -1/+67)

2024-09-23  RISC-V: Add testcases for form 4 of signed scalar SAT_ADD
Pan Li  (9 files changed, -0/+200)

Form 4:

    #define DEF_SAT_S_ADD_FMT_4(T, UT, MIN, MAX)                 \
    T __attribute__((noinline))                                  \
    sat_s_add_##T##_fmt_4 (T x, T y)                             \
    {                                                            \
      T sum;                                                     \
      bool overflow = __builtin_add_overflow (x, y, &sum);       \
      return !overflow ? sum : x < 0 ? MIN : MAX;                \
    }
    DEF_SAT_S_ADD_FMT_4 (int64_t, uint64_t, INT64_MIN, INT64_MAX)

The below tests pass for this patch:

    * The rv64gcv fully regression test.

This is a test-only patch and obvious up to a point; it will be committed directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_add-13.c: New test.
    * gcc.target/riscv/sat_s_add-14.c: New test.
    * gcc.target/riscv/sat_s_add-15.c: New test.
    * gcc.target/riscv/sat_s_add-16.c: New test.
    * gcc.target/riscv/sat_s_add-run-13.c: New test.
    * gcc.target/riscv/sat_s_add-run-14.c: New test.
    * gcc.target/riscv/sat_s_add-run-15.c: New test.
    * gcc.target/riscv/sat_s_add-run-16.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-09-23  RISC-V: Add testcases for form 3 of signed scalar SAT_ADD
Pan Li  (9 files changed, -0/+200)

This patch would like to add testcases of the signed scalar SAT_ADD for form 3. Aka:

Form 3:

    #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)                 \
    T __attribute__((noinline))                                  \
    sat_s_add_##T##_fmt_3 (T x, T y)                             \
    {                                                            \
      T sum;                                                     \
      bool overflow = __builtin_add_overflow (x, y, &sum);       \
      return overflow ? x < 0 ? MIN : MAX : sum;                 \
    }
    DEF_SAT_S_ADD_FMT_3 (int64_t, uint64_t, INT64_MIN, INT64_MAX)

The below tests pass for this patch:

    * The rv64gcv fully regression test.

This is a test-only patch and obvious up to a point; it will be committed directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_add-10.c: New test.
    * gcc.target/riscv/sat_s_add-11.c: New test.
    * gcc.target/riscv/sat_s_add-12.c: New test.
    * gcc.target/riscv/sat_s_add-9.c: New test.
    * gcc.target/riscv/sat_s_add-run-10.c: New test.
    * gcc.target/riscv/sat_s_add-run-11.c: New test.
    * gcc.target/riscv/sat_s_add-run-12.c: New test.
    * gcc.target/riscv/sat_s_add-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-09-22  testsuite, coroutines: Add tests for non-suspension ramp returns.
Iain Sandoe  (2 files changed, -0/+256)

Although it is most common for the ramp function to see a return when a coroutine first suspends, there are other possibilities. For example, all the awaits could be ready; effectively, the coroutine will then run to completion and deallocation. Another case is where the first active suspension point causes the current routine to be cancelled and thence destroyed. These cases are tested here.

gcc/testsuite/ChangeLog:

    * g++.dg/coroutines/torture/special-termination-00-sync-completion.C: New test.
    * g++.dg/coroutines/torture/special-termination-01-self-destruct.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

2024-09-22  middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern
Tamar Christina  (9 files changed, -9/+139)

Currently the vectorizer cheats when lowering COND_EXPR during bool recog. In the cases where the conditional is loop-invariant or non-boolean, it instead converts the operation back into GENERIC and hides much of the operation from the analysis part of the vectorizer. I.e.

    a ? b : c

is transformed into:

    a != 0 ? b : c

However, by doing so we can't perform any optimization on the mask, as the masks aren't explicit until quite late during codegen.

To fix this, this patch lowers booleans earlier and so ensures that we are always in GIMPLE.

For when the value is a loop-invariant boolean, we have to generate an additional conversion from bool to the integer mask form. This is done by creating a loop-invariant a ? -1 : 0 with the target mask precision and then doing a normal != 0 comparison on that.

To support this, the patch also adds the ability, during pattern matching, to create a loop-invariant pattern that won't be seen by the vectorizer and will instead be materialized inside the loop preheader in the case of loops, or in the first BB in the region in the case of BB vectorization.

gcc/ChangeLog:

    * tree-vect-patterns.cc (append_inv_pattern_def_seq): New.
    (vect_recog_bool_pattern): Lower COND_EXPRs.
    * tree-vect-slp.cc (vect_slp_region): Materialize loop invariant statements.
    * tree-vect-loop.cc (vect_transform_loop): Likewise.
    * tree-vect-stmts.cc (vectorizable_comparison_1): Remove VECT_SCALAR_BOOLEAN_TYPE_P handling for vectype.
    * tree-vectorizer.cc (vec_info::vec_info): Initialize inv_pattern_def_seq.
    * tree-vectorizer.h (LOOP_VINFO_INV_PATTERN_DEF_SEQ): New.
    (class vec_info): Add inv_pattern_def_seq.

gcc/testsuite/ChangeLog:

    * gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
    * gcc.dg/vect/vect-conditional_store_5.c: New test.
    * gcc.dg/vect/vect-conditional_store_6.c: New test.

2024-09-22  aarch64: Take into account when VF is higher than known scalar iters
Tamar Christina  (10 files changed, -29/+79)

Consider low-overhead loops like:

    void
    foo (char *restrict a, int *restrict b, int *restrict c, int n)
    {
      for (int i = 0; i < 9; i++)
        {
          int res = c[i];
          int t = b[i];
          if (a[i] != 0)
            res = t;
          c[i] = res;
        }
    }

For such loops we use latency-only costing since the loop bounds are known and small.

The current costing however does not consider the case where niters < VF, so when comparing the scalar vs vector costs it doesn't keep in mind that the scalar code can't perform VF iterations. This makes it overestimate the cost for the scalar loop, and we incorrectly vectorize.

This patch takes the minimum of the VF and niters in such cases. Before the patch we generate:

    note: Original vector body cost = 46
    note: Vector loop iterates at most 1 times
    note: Scalar issue estimate:
    note:   load operations = 2
    note:   store operations = 1
    note:   general operations = 1
    note:   reduction latency = 0
    note:   estimated min cycles per iteration = 1.000000
    note:   estimated cycles per vector iteration (for VF 32) = 32.000000
    note: SVE issue estimate:
    note:   load operations = 5
    note:   store operations = 4
    note:   general operations = 11
    note:   predicate operations = 12
    note:   reduction latency = 0
    note:   estimated min cycles per iteration without predication = 5.500000
    note:   estimated min cycles per iteration for predication = 12.000000
    note:   estimated min cycles per iteration = 12.000000
    note: Low iteration count, so using pure latency costs
    note: Cost model analysis:

vs. after:

    note: Original vector body cost = 46
    note: Known loop bounds, capping VF to 9 for analysis
    note: Vector loop iterates at most 1 times
    note: Scalar issue estimate:
    note:   load operations = 2
    note:   store operations = 1
    note:   general operations = 1
    note:   reduction latency = 0
    note:   estimated min cycles per iteration = 1.000000
    note:   estimated cycles per vector iteration (for VF 9) = 9.000000
    note: SVE issue estimate:
    note:   load operations = 5
    note:   store operations = 4
    note:   general operations = 11
    note:   predicate operations = 12
    note:   reduction latency = 0
    note:   estimated min cycles per iteration without predication = 5.500000
    note:   estimated min cycles per iteration for predication = 12.000000
    note:   estimated min cycles per iteration = 12.000000
    note: Increasing body cost to 1472 because the scalar code could issue within the limit imposed by predicate operations
    note: Low iteration count, so using pure latency costs
    note: Cost model analysis:

gcc/ChangeLog:

    * config/aarch64/aarch64.cc (adjust_body_cost): Cap VF for low iteration loops.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/sve/asrdiv_4.c: Update bounds.
    * gcc.target/aarch64/sve/cond_asrd_2.c: Likewise.
    * gcc.target/aarch64/sve/cond_uxt_6.c: Likewise.
    * gcc.target/aarch64/sve/cond_uxt_7.c: Likewise.
    * gcc.target/aarch64/sve/cond_uxt_8.c: Likewise.
    * gcc.target/aarch64/sve/miniloop_1.c: Likewise.
    * gcc.target/aarch64/sve/spill_6.c: Likewise.
    * gcc.target/aarch64/sve/sve_iters_low_1.c: New test.
    * gcc.target/aarch64/sve/sve_iters_low_2.c: New test.

2024-09-22  Daily bump.
GCC Administrator  (5 files changed, -1/+153)

2024-09-21  fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]
Mikael Morin  (10 files changed, -5/+927)

Introduce the -finline-intrinsics flag to control from the command line whether to generate either inline code or calls to the functions from the library, for the MINLOC and MAXLOC intrinsics. The flag allows specifying inlining either independently for each intrinsic (either MINLOC or MAXLOC), or all together. For each intrinsic, a default value is set if none was given. The default value depends on the optimization setting: inlining is avoided if not optimizing or if optimizing for size; otherwise inlining is preferred.

There is no direct support for this behaviour provided by the .opt options framework. It is obtained by defining three different variants of the flag (finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=), all using the same underlying option variable. Each enum value (corresponding to an intrinsic function) uses two identical bits, and the variable is initialized with alternating bits, so that we can tell whether the value was set or not by checking whether the two bits have different values.

    PR fortran/90608

gcc/ChangeLog:

    * flag-types.h (enum gfc_inlineable_intrinsics): New type.

gcc/fortran/ChangeLog:

    * invoke.texi (finline-intrinsics): Document new flag.
    * lang.opt (finline-intrinsics, finline-intrinsics=, fno-inline-intrinsics): New flags.
    * options.cc (gfc_post_options): If the option variable controlling the inlining of MAXLOC (respectively MINLOC) has not been set, set it or clear it depending on the optimization option variables.
    * trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false if inlining for the intrinsic is disabled according to the option variable.

gcc/testsuite/ChangeLog:

    * gfortran.dg/minmaxloc_18.f90: New test.
    * gfortran.dg/minmaxloc_18a.f90: New test.
    * gfortran.dg/minmaxloc_18b.f90: New test.
    * gfortran.dg/minmaxloc_18c.f90: New test.
    * gfortran.dg/minmaxloc_18d.f90: New test.
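
An illustrative encoding (names invented, not GCC's actual flag-types.h contents) of the two-bits-per-intrinsic trick described above: real option processing always writes equal values to both bits of a pair, while the default initializer makes them differ, so unequal bits mean "never set".

    enum { EXAMPLE_MAXLOC_BITS = 0x3 };          /* bits 0 and 1 */
    unsigned example_flag_var = 0x1;             /* default: bits differ */

    static bool
    example_maxloc_was_set (void)
    {
      unsigned pair = example_flag_var & EXAMPLE_MAXLOC_BITS;
      /* Both bits clear or both bits set: the user chose a value.  */
      return pair == 0 || pair == EXAMPLE_MAXLOC_BITS;
    }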

2024-09-21  fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608]
Mikael Morin  (1 file changed, -2/+31)

Continue the second set of loops where the first one stopped in the generated inline MINLOC/MAXLOC code, in the cases where the generated code contains two sets of loops. This fixes a regression that was introduced when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank greater than 1, no DIM argument, and either non-scalar MASK or floating-point ARRAY.

In the cases where two sets of loops are generated as inline MINLOC/MAXLOC code, we previously generated code such as (for rank 2 ARRAY, so with two levels of nesting):

    for (idx11 in lower1..upper1)
      {
        for (idx12 in lower2..upper2)
          {
            ...
            if (...)
              {
                ...
                goto second_loop;
              }
          }
      }
    second_loop:
    for (idx21 in lower1..upper1)
      {
        for (idx22 in lower2..upper2)
          {
            ...
          }
      }

which means we process the first elements twice, once in the first set of loops and once in the second one. This change avoids this duplicate processing by using a conditional as the lower bound for the second set of loops, generating code like:

    second_loop_entry = false;
    for (idx11 in lower1..upper1)
      {
        for (idx12 in lower2..upper2)
          {
            ...
            if (...)
              {
                ...
                second_loop_entry = true;
                goto second_loop;
              }
          }
      }
    second_loop:
    for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1)
      {
        for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2)
          {
            ...
            second_loop_entry = false;
          }
      }

It was expected that the compiler optimizations would be able to remove the state variable second_loop_entry. That is the case if ARRAY has rank 1 (so without loop nesting): the variable is removed and the loop bounds become unconditional, which restores the previously generated code, fully fixing the regression. For larger ranks, unfortunately, the state variable and conditional loop bounds remain, but those cases were previously using library calls, so there is no regression.

    PR fortran/90608

gcc/fortran/ChangeLog:

    * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set of index variables. Set them using the loop indexes before leaving the first set of loops. Generate a new loop entry predicate. Initialize it. Set it before leaving the first set of loops. Clear it in the body of the second set of loops. For the second set of loops, update each loop lower bound to use the corresponding index variable if the predicate variable is set.

2024-09-21  fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]
Mikael Morin  (3 files changed, -48/+87)

Enable generation of inline MINLOC/MAXLOC code in the case where DIM is not present, and either ARRAY is of floating-point type or MASK is an array. Those cases are the remaining bits to fully support inlining of non-CHARACTER MINLOC/MAXLOC without DIM. They are treated together because they generate similar code, the NaNs for REAL types being handled a bit like a second level of masking.

These are the cases for which we generate two sets of loops. This change affects the code generating the second loop, which was previously accessible only in the cases where ARRAY has rank 1. The single variable initialization and update are changed to apply to multiple variables, one per dimension.

The code generated is as follows (if ARRAY has rank 2):

    for (idx11 in lower1..upper1)
      {
        for (idx12 in lower2..upper2)
          {
            ...
            if (...)
              {
                ...
                goto second_loop;
              }
          }
      }
    second_loop:
    for (idx21 in lower1..upper1)
      {
        for (idx22 in lower2..upper2)
          {
            ...
          }
      }

This code leads to processing the first elements redundantly, both in the first set of loops and in the second one. The loop over idx22 could start from idx12 the first time it is run, but as it has to start from lower2 for the rest of the runs, this change uses the same bounds for both sets of loops for simplicity. In the rank 1 case, this makes the generated code worse compared to the inline code that was generated before. A later change will introduce conditionals to avoid the duplicate processing and restore the generated code in that case.

    PR fortran/90608

gcc/fortran/ChangeLog:

    * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize and update all the variables. Put the label and goto in the outermost scalarizer loop. Don't start the second loop where the first stopped.
    (gfc_inline_intrinsic_function_p): Also return TRUE for array MASK or for any REAL type.

gcc/testsuite/ChangeLog:

    * gfortran.dg/maxloc_bounds_5.f90: Additionally accept error messages reported by the scalarizer.
    * gfortran.dg/maxloc_bounds_6.f90: Ditto.

2024-09-21  fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608]
Mikael Morin  (2 files changed, -7/+10)

Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY is of integral type, DIM is not present, and MASK is present and is scalar (only absent MASK or rank 1 ARRAY were inlined before).

Scalar masks are implemented with a wrapping condition around the code one would generate if MASK wasn't present, so they are easy to support once inline code without MASK is working.

    PR fortran/90608

gcc/fortran/ChangeLog:

    * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate variable initialization for each dimension in the else branch of the toplevel condition.
    (gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK.

gcc/testsuite/ChangeLog:

    * gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message reported by the scalarizer.

2024-09-21  fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608]
Mikael Morin  (3 files changed, -57/+165)

Enable generation of inline code for the MINLOC and MAXLOC intrinsics if the ARRAY argument is of integral type and of any rank (only the rank 1 case was previously inlined), and neither DIM nor MASK arguments are present.

This needs a few adjustments in gfc_conv_intrinsic_minmaxloc, mainly to replace the single variables POS and OFFSET with collections of variables, one per dimension. The restriction to integral ARRAY and absent MASK limits the scope of the change to the cases where we generate single-loop inline code. The code generation for the second loop is only accessible with ARRAY of rank 1, so it can continue using a single variable. A later change will extend inlining to the double-loop cases.

There is some bounds-checking code that was previously handled by the library and that needed some changes in the scalarizer to avoid regressions. The bounds-check code generation was already supported by the scalarizer, but it was only applying to array reference sections, checking both for array bound violations and for shape conformability between all the involved arrays. With this change, for MINLOC or MAXLOC, enable the conformability check between all the scalarized arrays, and disable the array bound violation check.

    PR fortran/90608

gcc/fortran/ChangeLog:

    * trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC result upper bound using the rank of the ARRAY argument. Adjust the error message for intrinsic result arrays. Only check array bounds for array references. Move bound check decision code...
    (bounds_check_needed): ...here as a new predicate. Allow bound check for MINLOC/MAXLOC intrinsic results.
    * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the result array upper bound to the rank of ARRAY. Update the NONEMPTY variable to depend on the non-empty extent of every dimension. Use one variable per dimension instead of a single variable for the position and the offset. Update their declaration, initialization, and update to affect the variable of each dimension. Use the first variable only in areas only accessed with rank 1 ARRAY argument. Set every element of the result using its corresponding variable.
    (gfc_inline_intrinsic_function_p): Return true for integral ARRAY and absent DIM and MASK.

gcc/testsuite/ChangeLog:

    * gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error message emitted by the scalarizer.

2024-09-21  fortran: Outline array bound check generation code
Mikael Morin  (1 file changed, -154/+143)

gcc/fortran/ChangeLog:

    * trans-array.cc (gfc_conv_ss_startstride): Move array bound check generation code...
    (add_check_section_in_array_bounds): ...here as a new function.

2024-09-21  fortran: Remove MINLOC/MAXLOC frontend optimization
Mikael Morin  (1 file changed, -58/+0)

Remove the frontend pass rewriting calls of MINLOC/MAXLOC without DIM to calls with one-valued DIM enclosed in an array constructor. This transformation circumvented the limitation of inline MINLOC/MAXLOC code generation to scalar cases only, allowing inline code to be generated if ARRAY had rank 1 and DIM was absent. As MINLOC/MAXLOC has gained support for inline code generation in that case, the limitation is no longer effective, and the transformation is no longer necessary.

gcc/fortran/ChangeLog:

    * frontend-passes.cc (optimize_minmaxloc): Remove.
    (optimize_expr): Remove dispatch to optimize_minmaxloc.

2024-09-21  fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 [PR90608]
Mikael Morin  (2 files changed, -68/+181)

Enable inline code generation for the MINLOC and MAXLOC intrinsics if the DIM argument is not present and ARRAY has rank 1. This case is similar to the case where the result is scalar (DIM present and rank 1 ARRAY), which already supports inline expansion of the intrinsic. Both cases return the same value, with the difference that the result is an array of size 1 if DIM is absent, whereas it's a scalar if DIM is present. So all there is to do for the new case to work is to hook the inline expansion up with the scalarizer.

    PR fortran/90608

gcc/fortran/ChangeLog:

    * trans-array.cc (gfc_conv_ss_startstride): Set the scalarization rank based on the MINLOC/MAXLOC rank if needed. Call the inline code generation and set up the scalarizer array descriptor info in the MINLOC and MAXLOC cases.
    * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Return the result array element if the scalarizer is set up and we are inside the loops. Restrict library function call dispatch to the case where inline expansion is not supported. Declare an array result if the expression isn't scalar. Initialize the array result single element and return the result variable if the expression isn't scalar.
    (walk_inline_intrinsic_minmaxloc): New function.
    (walk_inline_intrinsic_function): Add MINLOC and MAXLOC cases, dispatching to walk_inline_intrinsic_minmaxloc.
    (gfc_add_intrinsic_ss_code): Add MINLOC and MAXLOC cases.
    (gfc_inline_intrinsic_function_p): Return true if ARRAY has rank 1, regardless of DIM.