Age | Commit message | Author | Files | Lines
2024-10-14libstdc++: Implement LWG 3564 for ranges::transform_viewJonathan Wakely2-2/+26
The _Iterator<true> type returned by begin() const uses const F& to transform the elements, so it should use const F& to determine the iterator's value_type and iterator_category as well. This was accepted into the WP in July 2022. libstdc++-v3/ChangeLog: * include/std/ranges (transform_view::_Iterator): Use const F& to determine value_type and iterator_category of _Iterator<true>, as per LWG 3564. * testsuite/std/ranges/adaptors/transform.cc: Check value_type and iterator_category. Reviewed-by: Patrick Palka <ppalka@redhat.com>
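As an illustration (a minimal sketch, not taken from the commit or its test; the callable F below is hypothetical), a transformation function whose const and non-const call operators return different types shows the effect of LWG 3564 on the const view's value_type:

    #include <concepts>
    #include <ranges>
    #include <vector>

    struct F {
      int  operator()(int x)       { return x; }  // used by the non-const view
      long operator()(int x) const { return x; }  // used by the const view (LWG 3564)
    };

    void check()
    {
      std::vector<int> v{1, 2, 3};
      auto tv = v | std::views::transform(F{});
      using TV = decltype(tv);
      static_assert(std::same_as<std::ranges::range_value_t<TV>, int>);
      // With LWG 3564, the const view derives its value_type from const F&:
      static_assert(std::same_as<std::ranges::range_value_t<const TV>, long>);
    }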
2024-10-14c++: address deduction and concepts [CWG2918]Jason Merrill4-7/+36
CWG2918 changes deduction from an overload set for the case where multiple candidates succeed and have the same type; previously this made the overload set a non-deduced context, now it succeeds since the result is consistent between the candidates. This is needed for cases of overloading based on requirements, where we want to choose the most constrained overload. I also needed to adjust resolve_address_of_overloaded_function accordingly; we already handled the comparison for template candidates in most_specialized_instantiation, but need to also do the comparison for non-template candidates such as member functions of a class template. CWG 2918 (proposed) gcc/cp/ChangeLog: * cp-tree.h (most_constrained_function): Declare.. * class.cc (resolve_address_of_overloaded_function): Call it. * pt.cc (get_template_for_ordering): Handle list from resolve_address_of_overloaded_function. (most_constrained_function): No longer static. (resolve_overloaded_unification): Always compare type rather than decl. gcc/testsuite/ChangeLog: * g++.dg/DRs/dr2918.C: New test.
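A hedged illustration (hypothetical code, not from the commit or the new test) of the kind of overload set involved: two member functions of a class template with the same type, differing only in constraints, where deduction and address-of should now pick the more constrained one:

    #include <concepts>

    template<class T>
    struct S {
      void f() requires std::integral<T> {}         // #1
      void f() requires std::signed_integral<T> {}  // #2, subsumes #1's constraint
    };

    template<class P> void g(P) {}

    void use()
    {
      // Both candidates have type void (S<int>::*)(); with CWG2918, deduction
      // succeeds and the more constrained #2 is selected.
      g(&S<int>::f);
      // The same constraint comparison is needed when converting to a target type:
      void (S<int>::*p)() = &S<int>::f;
      (void) p;
    }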
2024-10-14OpenACC 'nohost' clause: harmonize ↵Thomas Schwinge1-0/+2
'libgomp.oacc-{c-c++-common,fortran}/routine-nohost-1.*' The test case 'libgomp.oacc-fortran/routine-nohost-1.f90' added in 2021 commit a61f6afbee370785cf091fe46e2e022748528307 "OpenACC 'nohost' clause" was dependent on inlining being enabled, and otherwise ('-fno-inline') failed to optimize/link: /tmp/ccb2hsPd.o: In function `MAIN__._omp_fn.0': routine-nohost-1.f90:(.text+0xf4): undefined reference to `fact_nohost_' However, as of recent commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11 "Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device", we're now properly handling OpenACC/Fortran 'acc_on_device', and may specify '-fno-inline', like done in 'libgomp.oacc-c-c++-common/routine-nohost-1.c'. libgomp/ * testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Add '-fno-inline'.
2024-10-14libstdc++: Populate generic std::time_get's wide %c format [PR117135]Jonathan Wakely1-4/+4
I missed out the __timepunct<wchar_t> specialization for the "generic" implementation when defining the %c format in r15-4016-gc534e37faccf48. libstdc++-v3/ChangeLog: PR libstdc++/117135 * config/locale/generic/time_members.cc (__timepunct<wchar_t>::_M_initialize_timepunct): Set _M_date_time_format for C locale. Set %Ex formats to the same values as the %x formats.
2024-10-14libstdc++: Constrain std::expected comparisons (P3379R0)Jonathan Wakely8-13/+213
This proposal of mine has been approved by LEWG and forwarded to LWG. I expect it to be voted into the draft without significant changes. libstdc++-v3/ChangeLog: * include/bits/version.def (constrained_equality): Bump value. * include/bits/version.h: Regenerate. * include/std/expected (operator==): Add constraints and noexcept specifiers. * testsuite/20_util/optional/relops/constrained.cc: Adjust check for feature test macro. * testsuite/20_util/pair/comparison_operators/constrained.cc: Likewise. * testsuite/20_util/tuple/comparison_operators/constrained.cc: Likewise. * testsuite/20_util/variant/relops/constrained.cc: Likewise. * testsuite/20_util/expected/equality_constrained.cc: New test.
2024-10-14fold-const: Fix BIT_INSERT_EXPR folding for BYTES_BIG_ENDIAN [PR116997]Andre Vieira2-0/+20
Fix constant folding of BIT_INSERT_EXPR for BYTES_BIG_ENDIAN targets. gcc/ChangeLog: PR middle-end/116997 * fold-const.cc (fold_ternary_loc): Fix BIT_INSERT_EXPR constant folding for BYTES_BIG_ENDIAN targets. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr116997.c: New test. Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-10-14dce: Use a common base class for pass_cd_dce and pass_dceAndrew Pinski1-41/+31
The classes pass_dce and pass_cd_dce share the same mechanism for their params and almost the same execute functionality, so let's create a new base class which will be used for these two classes and move the common code into it. Note update_address_taken_p was updated to be an NSDMI instead of initializing it explicitly in the constructor. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-dce.cc (tree_ssa_dce): Remove. (tree_ssa_cd_dce): Remove. (class pass_dce_base): New class. (class pass_dce): Use pass_dce_base as the base class. (class pass_cd_dce): Likewise. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-10-14dce: add remove_unused_locals conditionally to the todos [PR117096]Andrew Pinski3-12/+19
This is the updated patch with the suggestion from: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665217.html where we use a second arg/param to set which passes we want to have the remove_unused_locals on the dce. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/117096 * passes.def: Update some of the dce/cd-dce passes, setting the 2nd arg to true. Also remove comment about stdarg since dce does it. * tree-ssa-dce.cc (pass_dce): Add remove_unused_locals_p field. Update set_pass_param to allow for 2nd param. Use remove_unused_locals_p in execute to return TODO_remove_unused_locals. (pass_cd_dce): Likewise. * tree-stdarg.cc (pass_data_stdarg): Remove TODO_remove_unused_locals. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-10-14passes: Allow for second param for NEXT_PASSAndrew Pinski3-5/+28
Right now we only support 1 parameter for each pass in NEXT_PASS. We also don't error out if someone tries to use more than 1. This adds support for more than one, but only up to a max of max_number_args (which is currently 2). In the next patch, this will be used for DCE, adding a new parameter. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * gen-pass-instances.awk (END): Handle processing of multiple arguments to NEXT_PASS. Also error out if using more than max_number_args (2). * pass_manager.h (NEXT_PASS_WITH_ARG2): New define. * passes.cc (NEXT_PASS_WITH_ARG2): New define. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-10-14passes: Move #undef to pass-instances.defAndrew Pinski3-20/+7
Like what was done r6-4608-g0aad01985747ab for builtins.def/DEF_BUILTIN, the same should be done for the defines that are used for pass-instances.def. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * gen-pass-instances.awk: Print out the #undefs. * pass_manager.h: Don't #undef INSERT_PASSES_AFTER, PUSH_INSERT_PASSES_WITHIN, POP_INSERT_PASSES, NEXT_PASS, NEXT_PASS_WITH_ARG, and TERMINATE_PASS_LIST. * passes.cc: Likewise. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-10-14libcpp: Fix _Pragma("GCC system_header") [PR114436]Lewis Hyatt4-3/+17
_Pragma("GCC system_header") currently takes effect only partially. It does succeed in updating the line_map, so that checks like in_system_header_at() return correctly, but it does not update pfile->buffer->sysp. One result is that a subsequent #include does not set up the system header state properly for the newly included file, as pointed out in the PR. Fix by propagating the new system header state back to the buffer after processing the pragma. libcpp/ChangeLog: PR preprocessor/114436 * directives.cc (destringize_and_run): If the _Pragma changed the buffer system header state (e.g. because it was "GCC system_header"), propagate that change back to the actual buffer too. gcc/testsuite/ChangeLog: PR preprocessor/114436 * c-c++-common/cpp/pragma-system-header-1.h: New test. * c-c++-common/cpp/pragma-system-header-2.h: New test. * c-c++-common/cpp/pragma-system-header.c: New test.
2024-10-14libcpp: Support extended characters for #pragma {push,pop}_macro [PR109704]Lewis Hyatt11-151/+378
The implementation of #pragma push_macro and #pragma pop_macro has to date made use of an ad-hoc function, _cpp_lex_identifier(), which lexes an identifier out of a string. When support was added for extended characters in identifiers ($, UCNs, or UTF-8), that support was added only for the "normal" way of lexing identifiers out of a cpp_buffer (_cpp_lex_direct) and not for the ad-hoc way. Consequently, extended identifiers are not usable with these pragmas. The logic for lexing identifiers has become more complicated than it was when _cpp_lex_identifier() was written -- it now handles things like \N{} escapes in C++, for instance -- and it no longer seems practical to maintain a redundant code path for lexing identifiers. Address the issue by changing the implementation of #pragma {push,pop}_macro to lex identifiers in the expected way, i.e. by pushing a cpp_buffer and lexing the identifier from there. The existing implementation has some quirks because of the ad-hoc parsing logic. For example: #pragma push_macro("X ") ... #pragma pop_macro("X") will not restore macro X (note the extra space in the first string). However: #pragma push_macro("X ") ... #pragma pop_macro("X ") actually does sucessfully restore "X". This is because the key for looking up the saved macro on the push stack is the original string passed, so the string passed to pop_macro needs to match it exactly. It is not that easy to reproduce this logic in the world of extended characters, given that for example it should be valid to pass a UCN to push_macro, and the corresponding UTF-8 to pop_macro. Given that this aspect of the existing behavior seems unintentional and has no tests (and does not match other implementations), I opted to make the new logic more straightforward. The string passed needs to lex to one token, which must be a valid identifier, or else no action is taken and no error is generated. Any diagnostics encountered during lexing (e.g., due to a UTF-8 character not permitted to appear in an identifier) are also suppressed. It could be nice (for GCC 15) to also add a warning if a pop_macro does not match a previous push_macro. libcpp/ChangeLog: PR preprocessor/109704 * include/cpplib.h (class cpp_auto_suppress_diagnostics): New class. * errors.cc (cpp_auto_suppress_diagnostics::cpp_auto_suppress_diagnostics): New function. (cpp_auto_suppress_diagnostics::~cpp_auto_suppress_diagnostics): New function. * charset.cc (noop_diagnostic_cb): Remove. (cpp_interpret_string_ranges): Refactor diagnostic suppression logic into new class cpp_auto_suppress_diagnostics. (count_source_chars): Likewise. * directives.cc (cpp_pop_definition): Add cpp_hashnode argument. (lex_identifier_from_string): New static helper function. (push_pop_macro_common): Refactor common logic from do_pragma_push_macro and do_pragma_pop_macro; use lex_identifier_from_string instead of _cpp_lex_identifier. (do_pragma_push_macro): Reimplement using push_pop_macro_common. (do_pragma_pop_macro): Likewise. * internal.h (_cpp_lex_identifier): Remove. * lex.cc (lex_identifier_intern): Remove. (_cpp_lex_identifier): Remove. gcc/testsuite/ChangeLog: PR preprocessor/109704 * c-c++-common/cpp/pragma-push-pop-utf8.c: New test. * g++.dg/pch/pushpop-2.C: New test. * g++.dg/pch/pushpop-2.Hs: New test. * gcc.dg/pch/pushpop-2.c: New test. * gcc.dg/pch/pushpop-2.hs: New test.
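A minimal sketch (not one of the new tests) of the now-supported usage with an extended identifier:

    #define ä 1
    #pragma push_macro("ä")   // previously not handled: the ad-hoc string lexer
                              // had no support for UTF-8/UCN identifiers
    #undef ä
    #define ä 2
    #pragma pop_macro("ä")    // restores the saved definition
    static_assert(ä == 1);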
2024-10-14Allow for class type coarray parameters. [PR77871]Andre Vehreschild3-13/+58
gcc/fortran/ChangeLog: PR fortran/77871 * trans-expr.cc (gfc_conv_derived_to_class): Assign token when converting a coarray to class. (gfc_get_tree_for_caf_expr): For classes get the caf decl from the saved descriptor. (gfc_get_caf_token_offset): Assert that coarray=lib is set and cover more cases where the tree having the coarray token can be. * trans-intrinsic.cc (gfc_conv_intrinsic_caf_get): Use unified test for pointers. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/dummy_3.f90: New test.
2024-10-14middle-end: copy STMT_VINFO_STRIDED_P when DR is replaced [PR116956]Tamar Christina2-0/+13
When move_dr copies a DR from one statement to another, it seems we've forgotten to copy the STMT_VINFO_STRIDED_P flag. This leaves the new DR in a broken state where it has a non-constant stride but isn't marked as strided. This causes the ICE in the PR because dataref analysis fails during epilogue vectorization: there is an assumption in place that, while costing may fail for epilogue vectorization, DR analysis cannot if it succeeded for the main loop. gcc/ChangeLog: PR tree-optimization/116956 * tree-vectorizer.cc (vec_info::move_dr): Copy STMT_VINFO_STRIDED_P. gcc/testsuite/ChangeLog: PR tree-optimization/116956 * gfortran.dg/vect/pr116956.f90: New test.
2024-10-14simplify-rtx: Fix incorrect folding of shift and AND [PR117012]Tamar Christina2-2/+18
The optimization added in r15-1047-g7876cde25cbd2f is using the wrong operation to check for uniform constant vectors. The author intended to check that all the lanes in the vector are the same and so used CONST_VECTOR_DUPLICATE_P. However, this only checks that the vector is created from a pattern duplication, but doesn't say how many pattern alternatives make up the duplication. Normally we would need to check this separately or use const_vec_duplicate_p. Without this the optimization incorrectly triggers. gcc/ChangeLog: PR rtl-optimization/117012 * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Use const_vec_duplicate_p instead of CONST_VECTOR_DUPLICATE_P. gcc/testsuite/ChangeLog: PR rtl-optimization/117012 * gcc.target/aarch64/pr117012.c: New test.
2024-10-14AArch64: rename the SVE2 psel intrinsics to psel_lane [PR116371]Tamar Christina19-698/+698
The psel intrinsics, similar to the pext ones, should be named psel_lane. This corrects the naming. gcc/ChangeLog: PR target/116371 * config/aarch64/aarch64-sve-builtins-sve2.cc (class svpsel_impl): Renamed to ... (class svpsel_lane_impl): ... This and adjust initialization. * config/aarch64/aarch64-sve-builtins-sve2.def (svpsel): Renamed to ... (svpsel_lane): ... This. * config/aarch64/aarch64-sve-builtins-sve2.h (svpsel): Renamed to svpsel_lane. gcc/testsuite/ChangeLog: PR target/116371 * gcc.target/aarch64/sme2/acle-asm/psel_b16.c, gcc.target/aarch64/sme2/acle-asm/psel_b32.c, gcc.target/aarch64/sme2/acle-asm/psel_b64.c, gcc.target/aarch64/sme2/acle-asm/psel_b8.c, gcc.target/aarch64/sme2/acle-asm/psel_c16.c, gcc.target/aarch64/sme2/acle-asm/psel_c32.c, gcc.target/aarch64/sme2/acle-asm/psel_c64.c, gcc.target/aarch64/sme2/acle-asm/psel_c8.c: Renamed to ... * gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c, gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c, gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c, gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c, gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c, gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c, gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c, gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: ... These.
2024-10-14RISC-V: Add detailed comments on processing implied extensions. [NFC]Yangyu Chen1-3/+6
In some cases, we don't need to handle implied extensions. Add detailed comments to help developers understand what implied ISAs should be considered. libgcc/ChangeLog: * config/riscv/feature_bits.c (__init_riscv_features_bits_linux): Add detailed comments on processing implied extensions. Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
2024-10-14middle-end: support SLP early breakTamar Christina7-21/+257
This patch introduces feature parity for early break int the SLP only vectorizer. The approach taken here is to treat the early exits as root statements for an SLP tree. This means that we don't need any changes to build_slp to support gconds. Codegen for the gcond itself now has to be done out of line but the body of the SLP blocks itself is simply driven by SLP scheduling. There is a slight awkwardness in having re-used vectorizable_early_exit for both SLP and non-SLP but I've documented the differences and when I did try to refactor it it wasn't really worth it given that this is a temporary state anyway. This version is restricted to lane = 1, as such we can re-use the existing move_early_break function instead of having to do safety update through scheduling. I have a branch where I'm working on that but lane > 1 is out of scope for GCC 15 anyway. The only reason I will try to get moving through scheduling done as a stretch goal is so we get epilogue vectorization back for early break. The example: unsigned test4(unsigned x) { unsigned ret = 0; for (int i = 0; i < N; i++) { vect_b[i] = x + i; if (vect_a[i]*2 != x) break; vect_a[i] = x; } return ret; } builds the following SLP instance for early break: note: Analyzing vectorizable control flow: if (patt_6 != 0) note: Starting SLP discovery for note: patt_6 = _4 != x_9(D); note: starting SLP discovery for node 0x63abc80 note: Build SLP for patt_6 = _4 != x_9(D); note: precomputed vectype: vector(4) <signed-boolean:32> note: nunits = 4 note: vect_is_simple_use: operand x_9(D), type of def: external note: vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, +INF] MASK 0xffff _3 * 2, type of def: internal note: starting SLP discovery for node 0x63abdc0 note: Build SLP for _4 = _3 * 2; note: precomputed vectype: vector(4) unsigned int note: nunits = 4 note: vect_is_simple_use: operand # vect_aD.4416[i_15], type of def: internal note: vect_is_simple_use: operand 2, type of def: constant note: starting SLP discovery for node 0x63abe60 note: Build SLP for _3 = vect_a[i_15]; note: precomputed vectype: vector(4) unsigned int note: nunits = 4 note: SLP discovery for node 0x63abe60 succeeded note: SLP discovery for node 0x63abdc0 succeeded note: SLP discovery for node 0x63abc80 succeeded note: SLP size 3 vs. limit 10. note: Final SLP tree for instance 0x6474190: note: node 0x63abc80 (max_nunits=4, refcnt=2) vector(4) <signed-boolean:32> note: op template: patt_6 = _4 != x_9(D); note: stmt 0 patt_6 = _4 != x_9(D); note: children 0x63abd20 0x63abdc0 note: node (external) 0x63abd20 (max_nunits=1, refcnt=1) note: { x_9(D) } note: node 0x63abdc0 (max_nunits=4, refcnt=2) vector(4) unsigned int note: op template: _4 = _3 * 2; note: stmt 0 _4 = _3 * 2; note: children 0x63abe60 0x63abf00 note: node 0x63abe60 (max_nunits=4, refcnt=2) vector(4) unsigned int note: op template: _3 = vect_a[i_15]; note: stmt 0 _3 = vect_a[i_15]; note: load permutation { 0 } note: node (constant) 0x63abf00 (max_nunits=1, refcnt=1) note: { 2 } and during codegen: note: ------>vectorizing SLP node starting from: patt_6 = _4 != x_9(D); note: vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, +INF] MASK 0xffff _3 * 2, type of def: internal note: add new stmt: mask_patt_6.18_58 = _53 != vect__4.17_57; note: === vectorizable_early_exit === note: transform early-exit. note: vectorizing stmts using SLP. 
note: Vectorizing SLP tree: note: node 0x63abfa0 (max_nunits=4, refcnt=1) vector(4) int note: op template: i_12 = i_15 + 1; note: stmt 0 i_12 = i_15 + 1; note: children 0x63aba00 0x63ac040 note: node 0x63aba00 (max_nunits=4, refcnt=2) vector(4) int note: op template: i_15 = PHI <i_12(6), 0(14)> note: [l] stmt 0 i_15 = PHI <i_12(6), 0(14)> note: children (nil) (nil) note: node (constant) 0x63ac040 (max_nunits=1, refcnt=1) vector(4) int note: { 1 } gcc/ChangeLog: * tree-vect-loop.cc (vect_analyze_loop_2): Handle SLP trees with no children. * tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_gcond. (LOOP_VINFO_EARLY_BREAKS_LIVE_IVS): New. (vectorizable_early_exit): Expose. (class _loop_vec_info): Add early_break_live_stmts. * tree-vect-slp.cc (vect_build_slp_instance, vect_analyze_slp_instance): Support gcond instances. (vect_analyze_slp): Analyze gcond roots and early break live statements. (maybe_push_to_hybrid_worklist): Don't sink gconds. (vect_slp_analyze_operations): Support gconds. (vect_slp_check_for_roots): Update comments. (vectorize_slp_instance_root_stmt): Support gconds. (vect_schedule_slp): Pass vinfo to vectorize_slp_instance_root_stmt. * tree-vect-stmts.cc (vect_stmt_relevant_p): Record early break live statements. (vectorizable_early_exit): Support SLP. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-early-break_126.c: New test. * gcc.dg/vect/vect-early-break_127.c: New test. * gcc.dg/vect/vect-early-break_128.c: New test.
2024-10-14Add regression testEric Botcazou3-0/+38
gcc/testsuite/ PR ada/114593 * gnat.dg/specs/generic_inst2-child2.ads: New test. * gnat.dg/specs/generic_inst2.ads: New helper. * gnat.dg/specs/generic_inst2-child1.ads: Likewise.
2024-10-14libstdc++: Use std::move for iterator in ranges::fill [PR117094]Jonathan Wakely2-1/+35
Input iterators aren't required to be copyable. libstdc++-v3/ChangeLog: PR libstdc++/117094 * include/bits/ranges_algobase.h (__fill_fn): Use std::move for iterator that might not be copyable. * testsuite/25_algorithms/fill/constrained.cc: Check non-copyable iterator with sized sentinel.
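A rough sketch (similar in spirit to, but not identical with, the libstdc++ code) of why the move matters: when a sized sentinel lets fill delegate to a fill_n-style helper, the iterator has to be moved because such iterators are only required to be movable, not copyable:

    #include <algorithm>
    #include <concepts>
    #include <iterator>
    #include <utility>

    template<typename Out, typename Sent, typename T>
      requires std::output_iterator<Out, const T&>
            && std::sentinel_for<Sent, Out>
    Out fill_sketch(Out first, Sent last, const T& value)
    {
      if constexpr (std::sized_sentinel_for<Sent, Out>)
        // Copying `first` here would not be valid for a move-only iterator.
        return std::ranges::fill_n(std::move(first), last - first, value);
      else
        {
          for (; first != last; ++first)
            *first = value;
          return first;
        }
    }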
2024-10-14libstdc++: Enable memset optimizations for distinct character types [PR93059]Jonathan Wakely1-8/+12
Currently we only optimize std::fill to memset when the source and destination types are the same byte-sized type. This means that we fail to optimize cases like std::fill(buf, buf+n, 0) because the literal 0 is not the same type as the character buffer. Such cases can safely be optimized to use memset, because assigning an int (or other integer) to a narrow character type has the same effects as converting the integer to unsigned char then copying it with memset. This patch enables the optimized code path when the fill character is a memcpy-able integer (using the new __memcpyable_integer trait). We still need to check is_same<U, T> to enable the memset optimization for filling a range of std::byte with a std::byte value, because that isn't a memcpyable integer. libstdc++-v3/ChangeLog: PR libstdc++/93059 * include/bits/stl_algobase.h (__fill_a1(T*, T*, const T&)): Change template parameters and enable_if condition to allow the fill value to be an integer.
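For example (a sketch of the case described above), this call now qualifies for the memset path even though the literal 0 is an int rather than a char:

    #include <algorithm>
    #include <cstddef>

    void zero_buffer(char *buf, std::size_t n)
    {
      // The fill value is an int; assigning it to char has the same effect as
      // converting it to unsigned char and using memset.
      std::fill(buf, buf + n, 0);
    }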
2024-10-14libstdc++: Enable memcpy optimizations for distinct integral types [PR93059]Jonathan Wakely1-2/+87
Currently we only optimize std::copy, std::copy_n etc. to memmove when the source and destination types are the same. This means that we fail to optimize copying between distinct 1-byte types, e.g. copying from a buffer of unsigned char to a buffer of char8_t or vice versa. This patch adds more partial specializations of the __memcpyable trait so that we allow memcpy between integers of equal widths. This will enable memmove for copies between narrow character types and also between same-width types like int and unsigned. Enabling the optimization needs to be based on the width of the integer type, not just the size in bytes. This is because some targets define non-standard integral types such as __int20 in msp430, which has padding bits. It would not be safe to memcpy between e.g. __int20 and int32_t, even though sizeof(__int20) == sizeof(int32_t). A new trait is introduced to define the width, __memcpyable_integer, and then the __memcpyable trait compares the widths. It's safe to copy between signed and unsigned integers of the same width, because GCC only supports two's complement integers. I initially though it would be useful to define the specialization __memcpyable_integer<byte> to enable copying between narrow character types and std::byte. But that isn't possible with std::copy, because is_assignable<char&, std::byte> is false. Optimized copies using memmove will already happen for copying std::byte to std::byte, because __memcpyable<T*, T*> is true. libstdc++-v3/ChangeLog: PR libstdc++/93059 * include/bits/cpp_type_traits.h (__memcpyable): Add partial specialization for pointers to distinct types. (__memcpyable_integer): New trait to control which types can use cross-type memcpy optimizations.
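Two sketches of calls that can now take the memmove fast path (illustrations only, not the tests added by the commit):

    #include <algorithm>

    // Distinct 1-byte types of the same width.
    void to_utf8(const unsigned char *src, char8_t *dst, int n)
    {
      std::copy(src, src + n, dst);
    }

    // A same-width signed/unsigned pair; safe because GCC only supports
    // two's complement integers.
    void to_unsigned(const int *src, unsigned int *dst, int n)
    {
      std::copy(src, src + n, dst);
    }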
2024-10-14RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and ↵Kito Cheng2-0/+410
__riscv_vendor_feature_bits This provides a common abstraction layer to probe the available extensions at run-time. These functions can be used to implement function multi-versioning or to detect available extensions. The advantages of providing this abstraction layer are: - Easy to port to other new platforms. - Easier to maintain in GCC for function multi-versioning. - For example, maintaining platform-dependent code in C code/libgcc is much easier than maintaining it in GCC by creating GIMPLEs... This API is intended to provide the capability to query minimal common available extensions on the system. The API is defined in the riscv-c-api-doc: https://github.com/riscv-non-isa/riscv-c-api-doc/blob/main/src/c-api.adoc Proposal to use unsigned long long for marchid and mimpid: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/91 Full function multi-versioning implementation will come later. We are posting this first because we intend to backport it to the GCC 14 branch to unblock LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15. Changes since v7: - Remove vendorID field in __riscv_vendor_feature_bits. - Fix C implies Zcf only for RV32. - Add more comments to kernel versions. Changes since v6: - Implement __riscv_cpu_model. - Set new sub extension bits which implied from previous extensions. Changes since v5: - Minor fixes on indentation. Changes since v4: - Bump to newest riscv-c-api-doc with some new extensions like Zve*, Zc* Zimop, Zcmop, Zawrs. - Rename the return variable name of hwprobe syscall. - Minor fixes on indentation. Changes since v3: - Fix non-linux build. - Let __init_riscv_feature_bits become constructor Changes since v2: - Prevent it initialize more than once. Changes since v1: - Fix the format. - Prevented race conditions by introducing a local variable to avoid load/store operations during the computation of the feature bit. Co-Developed-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> libgcc/ChangeLog: * config/riscv/feature_bits.c: New. * config/riscv/t-elf (LIB2ADD): Add feature_bits.c.
2024-10-14MAINTAINERS (s390 port): Add myselfStefan Schulze Frielinghaus1-0/+1
ChangeLog: * MAINTAINERS (s390 port): Add myself.
2024-10-14middle-end: [PR middle-end/116926] Allow widening optabs for vec-mode -> ↵Victor Do Nascimento1-0/+6
scalar-mode The recent refactoring of the dot_prod optab to convert-type exposed a limitation in how `find_widening_optab_handler_and_mode' is currently implemented, owing to the fact that, while the function expects the GET_MODE_CLASS (from_mode) == GET_MODE_CLASS (to_mode) condition to hold, the c6x backend implements a dot product from V2HI to SI, which triggers an ICE. Consequently, this patch adds some logic to allow widening optabs which accumulate vector elements to a single scalar. gcc/ChangeLog: PR middle-end/116926 * optabs-query.cc (find_widening_optab_handler_and_mode): Add handling of vector -> scalar optab handling.
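For reference, the kind of reduction loop involved (a generic sketch, not the actual test case): the widening multiply-accumulate maps onto a dot-product optab that, on c6x, reduces a V2HI vector directly into an SImode scalar:

    int dot_product(const short *a, const short *b, int n)
    {
      int sum = 0;
      for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];   // widening multiply-accumulate
      return sum;
    }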
2024-10-14aarch64: Fix folding of degenerate svwhilele case [PR117045]Richard Sandiford4-2/+76
The svwhilele folder mishandled the degenerate case in which the second argument is the maximum integer. In that case, the result is all-true regardless of the first parameter: If the second scalar operand is equal to the maximum signed integer value then a condition which includes an equality test can never fail and the result will be an all-true predicate. This is because the conceptual "increment the first operand by 1 after each element" is done modulo the range of the operand. The GCC code was instead treating it as infinite precision. whilele_5.c even had a test for the incorrect behaviour. The easiest fix seemed to be to handle that case specially before doing constant folding. This also copes with variable first operands. gcc/ PR target/116999 PR target/117045 * config/aarch64/aarch64-sve-builtins-base.cc (svwhilelx_impl::fold): Check for WHILELTs of the minimum value and WHILELEs of the maximum value. Fold them to all-false and all-true respectively. gcc/testsuite/ PR target/116999 PR target/117045 * gcc.target/aarch64/sve/acle/general/whilele_5.c: Fix bogus expected result. * gcc.target/aarch64/sve/acle/general/whilele_11.c: New test. * gcc.target/aarch64/sve/acle/general/whilele_12.c: Likewise.
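A small example of the degenerate case (assuming an SVE-enabled compilation; this is not one of the new tests):

    #include <arm_sve.h>
    #include <cstdint>

    svbool_t all_true(int32_t start)
    {
      // The "<=" condition can never fail when the second operand is the
      // maximum value of its type, so this folds to an all-true predicate
      // regardless of `start`, even though `start` is a variable.
      return svwhilele_b8_s32(start, INT32_MAX);
    }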
2024-10-14Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' ↵Thomas Schwinge3-2/+11
__builtin_is_initial_device: Harmonize 'libgomp.oacc-fortran/acc_on_device-1-*' The test case 'libgomp.oacc-fortran/acc_on_device-1-1.f90' added in commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11 "Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device" was missing '-fno-builtin-acc_on_device', and all 'libgomp.oacc-fortran/acc_on_device-1-*' need comments explaining why that option is specified. PR testsuite/82250 libgomp/ * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Add '-fno-builtin-acc_on_device'. * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Comment. * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Comment.
2024-10-14Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' ↵Thomas Schwinge1-1/+1
__builtin_is_initial_device: Fix effective-target keyword in 'libgomp.oacc-fortran/acc_on_device-2.f90' The test case 'libgomp.oacc-fortran/acc_on_device-2.f90' added in commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11 "Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device" had a mismatch between dump file production and its scanning; the former needs to use 'offload_target_nvptx' (like 'offload_target_amdgcn'), not 'offload_device_nvptx'. PR testsuite/82250 libgomp/ * testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: Fix effective-target keyword.
2024-10-14middle-end/116891 - fix (negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)Richard Biener1-1/+1
Transforming -fma (-a, b, -c) to fma (a, b, c) is only valid when not rounding towards -inf or +inf as the sign of the multiplication changes. PR middle-end/116891 * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)): Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
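A sketch of why the guard is needed: under round-to-nearest the identity -fma(-a, b, -c) == fma(a, b, c) holds, but under a directed rounding mode the final negation turns a round-down into a round-up, so the two can differ by one ulp (the exact values printed depend on the target's FMA support and on optimization settings):

    #include <cfenv>
    #include <cmath>
    #include <cstdio>

    int main()
    {
      std::fesetround(FE_DOWNWARD);
      double a = 1e16, b = 1.0, c = 1.0;   // a*b + c is not exactly representable
      std::printf("%.17g\n", std::fma(a, b, c));     // rounded towards -inf
      std::printf("%.17g\n", -std::fma(-a, b, -c));  // effectively rounded towards +inf
    }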
2024-10-14RISC-V: Add testcases for form 4 of vector signed SAT_SUBPan Li9-0/+126
Form 4: #define DEF_VEC_SAT_S_SUB_FMT_4(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_sub_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ { \ T x = op_1[i]; \ T y = op_2[i]; \ T minus; \ bool overflow = __builtin_sub_overflow (x, y, &minus); \ out[i] = !overflow ? minus : x < 0 ? MIN : MAX; \ } \ } The below test are passed for this patch. * The rv64gcv fully regression test. It is test only patch and obvious up to a point, will commit it directly if no comments in next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-4-i16.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-4-i32.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-4-i64.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-4-i8.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-4-i16.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-4-i32.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-4-i64.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-4-i8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-14RISC-V: Add testcases for form 3 of vector signed SAT_SUBPan Li9-0/+126
Form 3: #define DEF_VEC_SAT_S_SUB_FMT_3(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_sub_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ { \ T x = op_1[i]; \ T y = op_2[i]; \ T minus; \ bool overflow = __builtin_sub_overflow (x, y, &minus); \ out[i] = overflow ? x < 0 ? MIN : MAX : minus; \ } \ } The below test are passed for this patch. * The rv64gcv fully regression test. It is test only patch and obvious up to a point, will commit it directly if no comments in next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-3-i16.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-3-i32.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-3-i64.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-3-i8.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-3-i16.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-3-i32.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-3-i64.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-3-i8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-14Match: Support form 3 for vector signed integer SAT_SUBPan Li1-0/+12
This patch would like to support the form 3 of the vector signed integer SAT_SUB. Aka below example: Form 3: #define DEF_VEC_SAT_S_SUB_FMT_3(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_sub_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ { \ T x = op_1[i]; \ T y = op_2[i]; \ T minus; \ bool overflow = __builtin_sub_overflow (x, y, &minus); \ out[i] = overflow ? x < 0 ? MIN : MAX : minus; \ } \ } Before this patch: 25 │ if (limit_11(D) != 0) 26 │ goto <bb 3>; [89.00%] 27 │ else 28 │ goto <bb 8>; [11.00%] 29 │ ;; succ: 3 30 │ ;; 8 31 │ 32 │ ;; basic block 3, loop depth 0 33 │ ;; pred: 2 34 │ _13 = (unsigned long) limit_11(D); 35 │ ;; succ: 4 36 │ 37 │ ;; basic block 4, loop depth 1 38 │ ;; pred: 3 39 │ ;; 7 40 │ # ivtmp.7_34 = PHI <0(3), ivtmp.7_30(7)> 41 │ _26 = op_1_12(D) + ivtmp.7_34; 42 │ x_29 = MEM[(int8_t *)_26]; 43 │ _1 = op_2_14(D) + ivtmp.7_34; 44 │ y_24 = MEM[(int8_t *)_1]; 45 │ _9 = .SUB_OVERFLOW (x_29, y_24); 46 │ _7 = IMAGPART_EXPR <_9>; 47 │ if (_7 != 0) 48 │ goto <bb 6>; [50.00%] 49 │ else 50 │ goto <bb 5>; [50.00%] 51 │ ;; succ: 6 52 │ ;; 5 53 │ 54 │ ;; basic block 5, loop depth 1 55 │ ;; pred: 4 56 │ _42 = REALPART_EXPR <_9>; 57 │ _2 = out_17(D) + ivtmp.7_34; 58 │ MEM[(int8_t *)_2] = _42; 59 │ ivtmp.7_27 = ivtmp.7_34 + 1; 60 │ if (_13 != ivtmp.7_27) 61 │ goto <bb 7>; [89.00%] 62 │ else 63 │ goto <bb 8>; [11.00%] 64 │ ;; succ: 7 65 │ ;; 8 66 │ 67 │ ;; basic block 6, loop depth 1 68 │ ;; pred: 4 69 │ _38 = x_29 < 0; 70 │ _39 = (signed char) _38; 71 │ _40 = -_39; 72 │ _41 = _40 ^ 127; 73 │ _33 = out_17(D) + ivtmp.7_34; 74 │ MEM[(int8_t *)_33] = _41; 75 │ ivtmp.7_25 = ivtmp.7_34 + 1; 76 │ if (_13 != ivtmp.7_25) 77 │ goto <bb 7>; [89.00%] 78 │ else 79 │ goto <bb 8>; [11.00%] After this patch: 77 │ _94 = .SELECT_VL (ivtmp_92, POLY_INT_CST [16, 16]); 78 │ vect_x_13.9_81 = .MASK_LEN_LOAD (vectp_op_1.7_79, 8B, { -1, ... }, _94, 0); 79 │ vect_y_15.12_85 = .MASK_LEN_LOAD (vectp_op_2.10_83, 8B, { -1, ... }, _94, 0); 80 │ vect_patt_49.13_86 = .SAT_SUB (vect_x_13.9_81, vect_y_15.12_85); 81 │ .MASK_LEN_STORE (vectp_out.14_88, 8B, { -1, ... }, _94, 0, vect_patt_49.13_86); 82 │ vectp_op_1.7_80 = vectp_op_1.7_79 + _94; 83 │ vectp_op_2.10_84 = vectp_op_2.10_83 + _94; 84 │ vectp_out.14_89 = vectp_out.14_88 + _94; 85 │ ivtmp_93 = ivtmp_92 - _94; The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. gcc/ChangeLog: * match.pd: Add matching pattern for vector signed SAT_SUB form 3. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-14RISC-V: Add testcases for form 2 of vector signed SAT_SUBPan Li9-0/+126
Form 2: #define DEF_VEC_SAT_S_SUB_FMT_2(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_sub_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ { \ T x = op_1[i]; \ T y = op_2[i]; \ T minus = (UT)x - (UT)y; \ out[i] = (x ^ y) >= 0 || (minus ^ x) >= 0 \ ? minus : x < 0 ? MIN : MAX; \ } \ } DEF_VEC_SAT_S_SUB_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX) The below test are passed for this patch. * The rv64gcv fully regression test. It is test only patch and obvious up to a point, will commit it directly if no comments in next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-2-i16.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-2-i32.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-2-i64.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-2-i8.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-2-i16.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-2-i32.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-2-i64.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-2-i8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-14tree-optimization/116290 - fix compare-debug issue in ldistRichard Biener3-4/+23
Loop distribution does different analysis with -g0/-g due to counting a debug stmt starting a BB against a limit, which will eventually lead to different IVOPTs choices. I've fixed a possible IVOPTs issue on the way even though it doesn't make a difference here. PR tree-optimization/116290 * tree-loop-distribution.cc (determine_reduction_stmt_1): PHIs have no debug variants. Start with first non-debug real stmt. * tree-ssa-loop-ivopts.cc (find_givs_in_bb): Do not analyze debug stmts. * gcc.dg/pr116290.c: New testcase.
2024-10-14SH: Fix cost estimation of mem load/storeOleg Endo1-7/+13
For memory loads/stores (that contain a MEM rtx) sh_rtx_costs would wrongly report a cost lower than 1 insn which is not accurate as it makes loads/stores appear cheaper than simple arithmetic insns. The cost of a load/store insn is at least 1 insn plus the cost of the address expression (some addressing modes can be considered more expensive than others due to additional constraints). gcc/ChangeLog: PR target/113533 * config/sh/sh.cc (sh_rtx_costs): Adjust cost estimation of MEM rtx to be always at least COST_N_INSNS (1). Forward speed argument to sh_address_cost. Co-authored-by: Roger Sayle <roger@nextmovesoftware.com>
2024-10-14SH: Add -fno-math-errno to fsca,fsrra tests.Oleg Endo5-5/+5
Without -fno-math-errno some of the tests might fail because the expected insns will not be generated. gcc/testsuite/ChangeLog: * gcc.target/sh/pr53512-1.c: Add -fno-math-errno option. * gcc.target/sh/pr53512-2.c: Likewise. * gcc.target/sh/pr53512-3.c: Likewise. * gcc.target/sh/pr53512-4.c: Likewise. * gcc.target/sh/pr54680.c: Likewise.
2024-10-14Daily bump.GCC Administrator8-1/+101
2024-10-13libstdc++: testsuite: adjust name_fortify test for pre-defined _FORTIFY_SOURCESam James1-0/+1
Otherwise we get failures with toolchains that have _FORTIFY_SOURCE defined already to a different value like 3. libstdc++-v3/ChangeLog: * testsuite/17_intro/names_fortify.cc: Undefine _FORTIFY_SOURCE.
2024-10-13libstdc++: Fix ranges::copy_backward for a single memcpyable element [PR117121]Jonathan Wakely4-2/+48
The result iterator needs to be decremented before writing to it. Improve the PR 108846 tests for all of std::copy, std::copy_n, std::copy_backward, and the std::ranges versions. libstdc++-v3/ChangeLog: PR libstdc++/117121 * include/bits/ranges_algobase.h (copy_backward): Decrement output iterator before assigning one element through it. * testsuite/25_algorithms/copy/108846.cc: Ensure the algorithm's effects are correct for a single memcpyable element. * testsuite/25_algorithms/copy_backward/108846.cc: Likewise. * testsuite/25_algorithms/copy_n/108846.cc: Likewise.
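A sketch of the element-wise backward copy this has to match (not the libstdc++ code itself): the output iterator is decremented before each write, and the single-element memcpy path must behave the same way:

    template<typename In, typename Out>
    Out copy_backward_sketch(In first, In last, Out result)
    {
      while (first != last)
        *--result = *--last;   // decrement, then assign through the iterator
      return result;
    }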
2024-10-13MAINTAINERS: Add myself to write after approvalJosef Melcr1-0/+1
ChangeLog: * MAINTAINERS: Add myself to write after approval Signed-off-by: Josef Melcr <melcrjos@fit.cvut.cz>
2024-10-13Revert "c++: Fix overeager Woverloaded-virtual with conversion operators ↵Simon Martin8-135/+33
[PR109918]" This reverts commit 60163c85730e6b7c566e219222403ac87ddbbddd.
2024-10-13m68k: replace reload_in_progress by reload_in_progress || lra_in_progressAndreas Schwab3-11/+20
For now assume that LRA needs the same treatment as reload. * config/m68k/m68k.md ("movsi", "movxf"): Replace reload_in_progress by reload_in_progress || lra_in_progress. * config/m68k/m68k.cc (m68k_legitimate_mem_p) (emit_move_sequence): Likewise. * config/m68k/predicates.md ("fp_src_operand"): Likewise.
2024-10-13tree-optimization/116481 - avoid building function_type[]Richard Biener2-0/+24
The following avoids building an array type with function or method element type while diagnosing an array bound violation, as this will result in an error, rejecting a program with a not too useful error message. Instead, build such an array type manually. PR tree-optimization/116481 * pointer-query.cc (build_printable_array_type): Build array types with function or method element type manually to avoid bogus diagnostic. * gcc.dg/pr116481.c: New testcase.
2024-10-13Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' ↵Tobias Burnus13-46/+164
__builtin_is_initial_device It turned out that 'if (omp_is_initial_device() .eqv. true)' gave an ICE due to comparing 'int' with 'logical(4)'. When digging deeper, it also turned out that when the procedure pointer is needed, the builtin cannot be used, either. (Follow up to r15-2799-gf1bfba3a9b3f31 ) Extend the code to also use the builtin acc_on_device with OpenACC, which was previously only used in C/C++. Additionally, fix folding when offloading is not enabled. Fixes additionally the BT_BOOL data type, which was 'char'/integer(1) instead of bool, backing the booleaness; use bool_type_node as the rest of GCC. gcc/fortran/ChangeLog: * gfortran.h (gfc_option_t): Add disable_acc_on_device. * options.cc (gfc_handle_option): Handle -fno-builtin-acc_on_device. * trans-decl.cc (gfc_get_extern_function_decl): Move __builtin_omp_is_initial_device handling to ... * trans-expr.cc (get_builtin_fn): ... this new function. (conv_function_val): Call it. (update_builtin_function): New. (gfc_conv_procedure_call): Call it. * types.def (BT_BOOL): Fix type by using bool_type_node. gcc/ChangeLog: * gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold when offloading is not configured. libgomp/ChangeLog: * libgomp.texi (TR13): Fix minor typos. (omp_is_initial_device): Improve wording. (acc_on_device): Note how to disable the builtin. * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove TODO. * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise. Add -fno-builtin-acc_on_device. * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Update dg- as !offloading_enabled now compile-time expands acc_on_device. * testsuite/libgomp.fortran/target-is-initial-device-3.f90: New test. * testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: New test.
2024-10-12[RISC-V] Avoid unnecessary extensions when value is already extendedJivan Hakobyan1-2/+18
This is a minor patch from Jivan from roughly a year ago. The basic idea here is similar to what we do when extending values for the sake of comparisons. Specifically, if the value is already known to be properly extended, then an extension is just a copy. The original idea was to use a similar patch, but with an abort added to identify cases where these unnecessary promotions were emitted. All that showed up when doing a testsuite run with that abort was the promotions created by the arithmetic-with-overflow patterns such as addv. Things like addv aren't *that* common so this never got high on my todo list, even after a minor issue in this space was raised in bugzilla. But with stage1 closing soon and no good reason not to go forward, I'm submitting this into the pre-commit tester now. My tester has been using it since roughly Feb :-) Plan would be to commit after the pre-commit tester renders its verdict. * config/riscv/riscv.md (zero_extendsidi2): If RHS is already zero extended, then this is just a copy. (extendsidi2): Similarly, but for sign extension.
2024-10-13Daily bump.GCC Administrator8-1/+358
2024-10-12Unsigned constants for ISO_FORTRAN_ENV and ISO_C_BINDING.Thomas Koenig9-8/+244
gcc/fortran/ChangeLog: * dump-parse-tree.cc (get_c_type_name): Also handle BT_UNSIGNED. * gfortran.h (NAMED_UINTCST): Define before inclusion of iso-c-binding.def and iso-fortran-env.def. (gfc_get_uint_kind_from_width_isofortranenv): Prototype. * gfortran.texi: Mention new constants in iso_c_binding and iso_fortran_env. * iso-c-binding.def: Handle NAMED_UINTCST. Add c_unsigned, c_unsigned_short,c_unsigned_char, c_unsigned_long, c_unsigned_long_long, c_uintmax_t, c_uint8_t, c_uint16_t, c_uint32_t, c_uint64_t, c_uint128_t, c_uint_least8_t, c_uint_least16_t, c_uint_least32_t, c_uint_least64_t, c_uint_least128_t, c_uint_fast8_t, c_uint_fast16_t, c_uint_fast32_t, c_uint_fast64_t and c_uint_fast128_t. * iso-fortran-env.def: Handle NAMED_UINTCST. Add uint8, uint16, uint32 and uint64. * module.cc (parse_integer): Whitespace fix. (write_module): Whitespace fix. (NAMED_UINTCST): Define before inclusion of iso-fortran-evn.def and iso-fortran-env.def. * symbol.cc: Likewise. * trans-types.cc (get_unsigned_kind_from_node): New function. (get_uint_kind_from_name): New function. (gfc_get_uint_kind_from_width_isofortranenv): New function. (get_uint_kind_from_width): New function. (gfc_init_kinds): Initialize gfc_c_uint_kind. gcc/testsuite/ChangeLog: * gfortran.dg/unsigned_36.f90: New test.
2024-10-12vect: Fix inconsistency in fully-masked lane-reducing op generation [PR116985]Feng Xue2-2/+28
To align vectorized def/use when a lane-reducing op is present in loop reduction, we may need to insert extra trivial pass-through copies, which would cause a mismatch between lane-reducing vector copy and loop mask index. This could be fixed by computing the right index around a new counter on effective lane-reducing vector copies. 2024-10-11 Feng Xue <fxue@os.amperecomputing.com> gcc/ PR tree-optimization/116985 * tree-vect-loop.cc (vect_transform_reduction): Compute loop mask index based on effective vector copies for reduction op. gcc/testsuite/ PR tree-optimization/116985 * gcc.dg/vect/pr116985.c: New testcase.
2024-10-12tree-optimization/117104 - add missed guards to max(a,b) != a simplificationRichard Biener2-1/+17
For vector types we have to make sure the comparison result is a vector type and the resulting compare operation is supported. As the resulting compare is never an equality compare I didn't bother to check for the cbranch case. PR tree-optimization/117104 * match.pd ((cmp:c (minmax:c @0 @1) @0) -> (out @0 @1)): Properly guard the vector case. * gcc.dg/pr117104.c: New testcase.
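For scalars the simplification is the familiar identity below (shown only to recall what the pattern does; the fix itself is solely about guarding the analogous vector form):

    #include <algorithm>

    // For integers, max(a, b) != a is equivalent to a < b.
    bool identity_holds(int a, int b)
    {
      return (std::max(a, b) != a) == (a < b);
    }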
2024-10-12RISC-V] Slightly improve broadcasting small constants into vectorsJeff Law2-6/+21
I probably spent way more time on this than it's worth... I was looking at the code we generate for vector SAD and noticed that we were being a bit silly. Specifically: li a4,0 # 272 [c=4 l=4] *movsi_internal/1 Followed shortly by: vmv.s.x v3,a4 # 261 [c=4 l=4] *pred_broadcastrvvm1si/6 And no other uses of a4. We could have used x0 trivially. First we adjust the expander so that it doesn't force the constant into a register. In the matching pattern we change the appropriate source constraints from "r" to "rJ" and the output template is changed to use %z for the operand. The net is we drop the li completely and emit vmv.s.x,v3,x0. But wait, there's more. If we're broadcasting a constant in the range [-16..15] into a vector, we currently load the constant into a register and use vmv.v.r. We can instead use vmv.v.i, which avoids loading the constant into a GPR. For that case we again avoid forcing the constant into a register in the expander and adjust the output template to emit vmv.v.x or vmv.v.i based on whether or not the appropriate operand is a constant or general purpose register. So again, we'll drop a load immediate into a scalar for this case. Whether or not we should use vmv.v.i vs vmv.s.x for loading [-16..15] into the 0th element is probably uarch dependent. The tradeoff is loading the GPR vs the broadcast in the vector unit. I didn't bother with this case. Tested in my tester (which tests rv64gcv as a default codegen option). Will wait for the pre-commit tester to render a verdict. gcc/ * config/riscv/constraints.md (P): New constraint. * config/riscv/vector.md (pred_broadcast<mode> expander): Do not force small integers into GPRs so aggressively. (pred_broadcast<mode> insn & splitter): Allow splatting small constants across the vector register directly. Allow splatting (const_int 0) into element 0 directly.