Age  Commit message  Author  Files  Lines
2024-11-20  Work in progress for refactoring simd intrinsic  [devel/existing-fp8]  Saurabh Jha  5  -797/+330
2024-11-14  aarch64: Add support for fp8fma instructions  Saurabh Jha  9  -4/+319
The AArch64 FEAT_FP8FMA extension introduces instructions for multiply-add of vectors. This patch introduces the following instructions: 1. {vmlalbq|vmlaltq}_f16_mf8_fpm. 2. {vmlalbq|vmlaltq}_lane{q}_f16_mf8_fpm. 3. {vmlallbbq|vmlallbtq|vmlalltbq|vmlallttq}_f32_mf8_fpm. 4. {vmlallbbq|vmlallbtq|vmlalltbq|vmlallttq}_lane{q}_f32_mf8_fpm. It introduces the fp8fma flag. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (check_simd_lane_bounds): Add support for new unspecs. (aarch64_expand_pragma_builtins): Add support for new unspecs. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): New flags. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): New flags. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_FMA_FPM): Macro to declare fma intrinsics. (REQUIRED_EXTENSIONS): Define to declare functions behind command line flags. * config/aarch64/aarch64-simd.md: (@aarch64_<fpm_uns_op><VQ_HSF:mode><VQ_HSF:mode><V16QI_ONLY:mode><V16QI_ONLY:mode>): Instruction pattern for fma intrinsics. (@aarch64_<fpm_uns_op><VQ_HSF:mode><VQ_HSF:mode><V16QI_ONLY:mode><VB:mode><SI_ONLY:mode>): Instruction pattern for fma intrinsics with lane. * config/aarch64/aarch64.h (TARGET_FP8FMA): New flag for fp8fma instructions. * config/aarch64/iterators.md: New attributes and iterators. * doc/invoke.texi: New flag for fp8fma instructions. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/fma_fpm.c: New test.
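A minimal usage sketch (only the intrinsic name is taken from this patch; the argument types, the fpm_t parameter, and the command-line flag are assumptions based on the ACLE FP8 proposal):

  #include <arm_neon.h>

  /* Sketch: multiply the even-indexed ("bottom") FP8 elements of a and b,
     widening to f16 and accumulating into acc, with the FP8 formats
     controlled by the fpm operand.  Compile with something like
     -march=armv9.4-a+fp8fma.  */
  float16x8_t
  fma_bottom (float16x8_t acc, mfloat8x16_t a, mfloat8x16_t b, fpm_t fpm)
  {
    return vmlalbq_f16_mf8_fpm (acc, a, b, fpm);
  }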
2024-11-14  aarch64: Add support for fp8dot2 and fp8dot4  Saurabh Jha  10  -15/+380
The AArch64 FEAT_FP8DOT2 and FEAT_FP8DOT4 extensions introduce instructions for dot product of vectors. This patch introduces the following intrinsics: 1. vdot{q}_{fp16|fp32}_mf8_fpm. 2. vdot{q}_lane{q}_{fp16|fp32}_mf8_fpm. It introduces two flags: fp8dot2 and fp8dot4. We had to add space for another type in the aarch64_pragma_builtins_data struct. The macros were updated to reflect that. We added a new aarch64_builtin_signature variant, quaternary, and added support for it in the functions aarch64_fntype and aarch64_expand_pragma_builtin. We added a new namespace, function_checker, to implement range checks for functions defined using the new pragma approach. The old intrinsic range checks will continue to work. All the new AdvSIMD intrinsics we define that need lane checks should be using the functions in this namespace to implement the checks. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (ENTRY): Change to handle extra type. (enum class): Added new variant. (struct aarch64_pragma_builtins_data): Add support for another type. (aarch64_get_number_of_args): Handle new signature. (require_integer_constant): New function to check whether the operand is an integer constant. (require_immediate_range): New function to validate index ranges. (check_simd_lane_bounds): New function to validate index operands. (aarch64_general_check_builtin_call): Call function_checker::check_simd_lane_bounds. (aarch64_expand_pragma_builtin): Handle new signature. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): New flags. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): New flags. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY): Change to handle extra type. (ENTRY_BINARY_FPM): Change to handle extra type. (ENTRY_UNARY_FPM): Change to handle extra type. (ENTRY_TERNARY_FPM_LANE): Macro to declare fpm ternary with lane intrinsics. (ENTRY_VDOT_FPM): Macro to declare vdot intrinsics. (REQUIRED_EXTENSIONS): Define to declare functions behind command line flags. * config/aarch64/aarch64-simd.md: (@aarch64_<fpm_uns_op><VHF:mode><VHF:mode><VB:mode><VB:mode>): Instruction pattern for vdot2 intrinsics. (@aarch64_<fpm_uns_op><VHF:mode><VHF:mode><VB:mode><VB2:mode><SI_ONLY:mode>): Instruction pattern for vdot2 intrinsics with lane. (@aarch64_<fpm_uns_op><VDQSF:mode><VDQSF:mode><VB:mode><VB:mode>): Instruction pattern for vdot4 intrinsics. (@aarch64_<fpm_uns_op><VDQSF:mode><VDQSF:mode><VB:mode><VB2:mode><SI_ONLY:mode>): Instruction pattern for vdot4 intrinsics with lane. * config/aarch64/aarch64.h (TARGET_FP8DOT2): New flag for fp8dot2 instructions. (TARGET_FP8DOT4): New flag for fp8dot4 instructions. * config/aarch64/iterators.md: New attributes and iterators. * doc/invoke.texi: New flag for fp8dot2 and fp8dot4 instructions. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/vdot2_fpmdot.c: New test. * gcc.target/aarch64/simd/vdot4_fpmdot.c: New test.
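A hypothetical call sketch, spelled with the intrinsic name as listed above (the argument types and the fpm_t parameter are assumptions):

  #include <arm_neon.h>

  /* Sketch: FP8 dot product accumulating into half-precision lanes,
     with the FP8 formats selected through the fpm operand.  */
  float16x8_t
  dot2 (float16x8_t acc, mfloat8x16_t a, mfloat8x16_t b, fpm_t fpm)
  {
    return vdotq_fp16_mf8_fpm (acc, a, b, fpm);
  }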
2024-11-14  aarch64: Add support for fp8 convert and scale  Saurabh Jha  8  -49/+587
The AArch64 FEAT_FP8 extension introduces instructions for conversion and scaling. This patch introduces the following intrinsics: 1. vcvt{1|2}_{bf16|high_bf16|low_bf16}_mf8_fpm. 2. vcvt{q}_mf8_f16_fpm. 3. vcvt_{high}_mf8_f32_fpm. 4. vscale{q}_{f16|f32|f64}. We introduced two aarch64_builtin_signatures enum variants, unary and ternary, and added support for these variants in the functions aarch64_fntype and aarch64_expand_pragma_builtin. We added new simd_types for integers (s32, s32q, and s64q) and for floating points (f8 and f8q). Because we added support for fp8 intrinsics here, we modified the check in acle/fp8.c that was checking that the __ARM_FEATURE_FP8 macro is not defined. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (ENTRY): Modified to support uses_fpmr flag. (enum class): New variants to support new signatures. (struct aarch64_pragma_builtins_data): Add a new boolean field, uses_fpmr. (aarch64_get_number_of_args): Helper function used in aarch64_fntype and aarch64_expand_pragma_builtin. (aarch64_fntype): Handle new signatures. (aarch64_expand_pragma_builtin): Handle new signatures. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): New flag for FP8. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY): Macro to declare binary intrinsics. (ENTRY_TERNARY): Macro to declare ternary intrinsics. (ENTRY_UNARY): Macro to declare unary intrinsics. (ENTRY_VHSDF): Macro to declare binary intrinsics. (ENTRY_VHSDF_VHSDI): Macro to declare binary intrinsics. (REQUIRED_EXTENSIONS): Define to declare functions behind command line flags. * config/aarch64/aarch64-simd.md (@aarch64_<fpm_unary_bf_uns_op><V8BF_ONLY:mode><VB:mode>): Unary pattern. (@aarch64_<fpm_unary_hf_uns_op><V8HF_ONLY:mode><VB:mode>): Unary pattern. (@aarch64_lower_<fpm_unary_bf_uns_op><V8BF_ONLY:mode><V16QI_ONLY:mode>): Unary pattern. (@aarch64_lower_<fpm_unary_hf_uns_op><V8HF_ONLY:mode><V16QI_ONLY:mode>): Unary pattern. (@aarch64_<fpm_uns_op><VB:mode><VCVTFPM:mode><VH_SF:mode>): Binary pattern. (@aarch64_<fpm_uns_op><V16QI_ONLY:mode><V8QI_ONLY:mode><V4SF_ONLY:mode><V4SF_ONLY:mode>): Unary pattern. (@aarch64_<fpm_uns_op><VHSDF:mode><VHSDI:mode>): Binary pattern. * config/aarch64/iterators.md: New attributes and iterators. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/fp8.c: Remove check that fp8 feature macro doesn't exist. * gcc.target/aarch64/simd/scale_fpm.c: New test. * gcc.target/aarch64/simd/vcvt_fpm.c: New test.
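A usage sketch for the scaling intrinsics (the name comes from the list above; the signature is an assumption):

  #include <arm_neon.h>

  /* Sketch: scale each float lane by 2^e; this should map to the
     FSCALE instruction and, unlike the conversions, takes no fpm
     operand.  */
  float32x4_t
  scale_by_pow2 (float32x4_t x, int32x4_t e)
  {
    return vscaleq_f32 (x, e);
  }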
2024-11-14  aarch64: Refactor infrastructure for advsimd intrinsics  Vladimir Miloserdov  2  -19/+77
This patch refactors the infrastructure for defining advsimd pragma intrinsics, adding support for more flexible type and signature handling in future SIMD extensions. A new simd_type structure is introduced, which allows for consistent mode and qualifier management across various advsimd operations. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (ENTRY): Modify to include modes and qualifiers for simd_type structure. (ENTRY_VHSDF): Move to aarch64-builtins.cc to decouple. (struct simd_type): New structure for managing mode and qualifier combinations for SIMD types. (struct aarch64_pragma_builtins_data): Replace mode with simd_type to support multiple argument types for intrinsics. (aarch64_fntype): Modify to handle different shape types. (aarch64_expand_pragma_builtin): Modify to handle different shape types. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY): Move from aarch64-builtins.cc. (ENTRY_VHSDF): Move from aarch64-builtins.cc. (REQUIRED_EXTENSIONS): New macro.
2024-11-14  i386: Fix cstorebf4 fp comparison operand [PR117495]  Hongyu Wang  2  -11/+33
cstorebf4 used comparison_operator for the BFmode compare, which is incorrect when it calls ix86_expand_setcc directly, as that does not canonicalize the input comparison by swapping operands to correct the compare code. The original code without AVX10.2 calls emit_store_flag_force, which actually calls emit_store_flags_1 and recursively calls this expander again with swapped operand and flag. Therefore, we can avoid the redundant recursive call by adjusting the comparison_operator to ix86_fp_comparison_operator and calling ix86_expand_setcc directly. gcc/ChangeLog: PR target/117495 * config/i386/i386.md (cstorebf4): Use ix86_fp_comparison_operator and call ix86_expand_setcc directly. gcc/testsuite/ChangeLog: PR target/117495 * gcc.target/i386/pr117495.c: New test.
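A reproducer of roughly this shape exercises the fixed expander (a guess at the spirit of pr117495.c, not the actual test):

  /* Compile with -mavx10.2: the BFmode compare goes through cstorebf4,
     which must hand a canonicalized ix86_fp_comparison_operator code
     to ix86_expand_setcc.  */
  int
  cmp (__bf16 a, __bf16 b)
  {
    return a < b;
  }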
2024-11-13  [PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector  Jin Ma  2  -2/+16
error: unrecognizable insn: (insn 35 34 36 2 (set (subreg:RVVM1SF (reg/v:RVVM1x4SF 142 [ _r ]) 0) (unspec:RVVM1SF [ (const_vector:RVVM1SF repeat [ (const_double:SF 0.0 [0x0.0p+0]) ]) (reg:DI 0 zero) (const_int 1 [0x1]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_TH_VWLDST)) -1 (nil)) during RTL pass: mode_sw PR target/116591 gcc/ChangeLog: * config/riscv/vector.md: Add restriction to call pred_th_whole_mov. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/pr116591.c: New test.
2024-11-13  libstdc++: Refactor std::hash specializations  Jonathan Wakely  12  -63/+344
This attempts to simplify and clean up our std::hash code. The primary benefit is improved diagnostics for users when they do something wrong involving std::hash or unordered containers. An additional benefit is that for the unstable ABI (--enable-symvers=gnu-versioned-namespace) we can reduce the memory footprint of several std::hash specializations. In the current design, __hash_enum is a base class of the std::hash primary template, but the partial specialization of __hash_enum for non-enum types is disabled. This means that if a user forgets to specialize std::hash for their class type (or forgets to use a custom hash function for unordered containers) they get error messages about std::__hash_enum not being constructible. This is confusing when there is no enum type involved: why should users care about __hash_enum not being constructible if they're not trying to hash enums? This change makes the std::hash primary template only derive from __hash_enum when the template argument type is an enum. Otherwise, it derives directly from a new class template, __hash_not_enabled. This new class template defines the deleted members that cause a given std::hash specialization to be a disabled specialization (as per P0513R0). Now when users try to use a disabled specialization, they get more descriptive errors that mention __hash_not_enabled instead of __hash_enum. Additionally, adjust __hash_base to remove the deprecated result_type and argument_type typedefs for C++20 and later. In the current code we use a __poison_hash base class in the std::hash specializations for std::unique_ptr, std::optional, and std::variant. The primary template of __poison_hash has deleted special members, which is used to conditionally disable the derived std::hash specialization. This can also result in confusing diagnostics, because seeing "poison" in an enabled specialization is misleading. Only some uses of __poison_hash actually "poison" anything, i.e. cause a specialization to be disabled. In other cases it's just an empty base class that does nothing. This change removes __poison_hash and changes the std::hash specializations that were using it to conditionally derive from __hash_not_enabled instead. When the std::hash specialization is enabled, there is no more __poison_hash base class. However, to preserve the ABI properties of those std::hash specializations, we need to replace __poison_hash with some other empty base class. This is needed because in the current code std::hash<std::variant<int, const int>> has two __poison_hash<int> base classes, which must have unique addresses, so sizeof(std::hash<std::variant<int, const int>>) == 2. To preserve this unfortunate property, a new __hash_empty_base class is used as a base class to re-introduce duplicate base classes that increase the class size. For the unstable ABI we don't use __hash_empty_base so the std::hash<std::variant<T...>> specializations are always size 1, and the class hierarchy is much simpler so will compile faster. Additionally, remove the result_type and argument_type typedefs from all disabled specializations of std::hash for std::unique_ptr, std::optional, and std::variant. Those typedefs are useless for disabled specializations, and although the standard doesn't say they must *not* be present for disabled specializations, it certainly only requires them for enabled specializations. Finally, for C++20 the typedefs are also removed from enabled specializations of std::hash for std::unique_ptr, std::optional, and std::variant.
libstdc++-v3/ChangeLog: * doc/xml/manual/evolution.xml: Document removal of nested types from std::hash specializations. * doc/html/manual/api.html: Regenerate. * include/bits/functional_hash.h (__hash_base): Remove deprecated nested types for C++20. (__hash_empty_base): Define new class template. (__is_hash_enabled_for): Define new variable template. (__poison_hash): Remove. (__hash_not_enabled): Define new class template. (__hash_enum): Remove partial specialization for non-enums. (hash): Derive from __hash_not_enabled for non-enums, instead of __hash_enum. * include/bits/unique_ptr.h (__uniq_ptr_hash): Derive from __hash_base. Conditionally derive from __hash_empty_base. (__uniq_ptr_hash<>): Remove disabled specialization. (hash): Do not derive from __hash_base unconditionally. Conditionally derive from either __uniq_ptr_hash or __hash_not_enabled. * include/std/optional (__optional_hash_call_base): Remove. (__optional_hash): Define new class template. (hash): Conditionally derive from either __optional_hash or __hash_not_enabled. Remove nested typedefs. * include/std/variant (_Base_dedup): Replace __poison_hash with __hash_empty_base. (__variant_hash_call_base_impl): Remove. (__variant_hash): Define new class template. (hash): Conditionally derive from either __variant_hash or __hash_not_enabled. Remove nested typedefs. * testsuite/20_util/optional/hash.cc: Check whether nested types are present. * testsuite/20_util/variant/hash.cc: Likewise. * testsuite/20_util/optional/hash_abi.cc: New test. * testsuite/20_util/unique_ptr/hash/abi.cc: New test. * testsuite/20_util/unique_ptr/hash/types.cc: New test. * testsuite/20_util/variant/hash_abi.cc: New test.
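The user-visible property of a disabled specialization can be checked directly; this snippet is an illustration, not part of the patch:

  #include <functional>
  #include <type_traits>

  struct Widget { };  // no std::hash specialization provided

  // Per P0513R0 a disabled std::hash specialization has deleted special
  // members, so it is not default-constructible; with this patch the
  // diagnostic for actually using one names __hash_not_enabled instead
  // of the misleading __hash_enum.
  static_assert (!std::is_default_constructible_v<std::hash<Widget>>,
                 "std::hash<Widget> is a disabled specialization");

  int main () { }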
2024-11-13  libstdc++: Add _Hashtable::_M_locate(const key_type&)  Jonathan Wakely  1  -188/+145
We have two overloads of _M_find_before_node but they have quite different performance characteristics, which isn't necessarily obvious. The original version, _M_find_before_node(bucket, key, hash_code), looks only in the specified bucket, doing a linear search within that bucket for an element that compares equal to the key. This is the typical fast lookup for hash containers, assuming the load factor is low so that each bucket isn't too large. The newer _M_find_before_node(key) was added in r12-6272-ge3ef832a9e8d6a and could be naively assumed to calculate the hash code and bucket for key and then call the efficient _M_find_before_node(bkt, key, code) function. But in fact it does a linear search of the entire container. This is potentially very slow and should only be used for a suitably small container, as determined by the __small_size_threshold() function. We don't even have a comment pointing out this O(N) performance of the newer overload. Additionally, the newer overload is only ever used in exactly one place, which would suggest it could just be removed. However there are several places that do the linear search of the whole container with an explicit loop each time. This adds a new member function, _M_locate, and uses it to replace most uses of _M_find_node and the loops doing linear searches. This new member function does both forms of lookup, the linear search for small sizes and the _M_find_node(bkt, key, code) lookup within a single bucket. The new function returns a __location_type which is a struct that contains a pointer to the first node matching the key (if such a node is present), or the hash code and bucket index for the key. The hash code and bucket index allow the caller to know where a new node with that key should be inserted, for the cases where the lookup didn't find a matching node. The result struct actually contains a pointer to the node *before* the one that was located, as that is needed for it to be useful in erase and extract members. There is a member function that returns the found node, i.e. _M_before->_M_nxt downcast to __node_ptr, which should be used in most cases. This new function greatly simplifies the functions that currently have to do two kinds of lookup and explicitly check the current size against the small size threshold. Additionally, now that try_emplace is defined directly in _Hashtable (not in _Insert_base) we can use _M_locate in there too, to speed up some try_emplace calls. Previously it did not do the small-size linear search. It would be possible to add a function to get a __location_type from an iterator, and then rewrite some functions like _M_erase and _M_extract_node to take a __location_type parameter. While that might be conceptually nice, it wouldn't really make the code any simpler or more readable than it is now. That isn't done in this change. libstdc++-v3/ChangeLog: * include/bits/hashtable.h (__location_type): New struct. (_M_locate): New member function. (_M_find_before_node(const key_type&)): Remove. (_M_find_node): Move variable initialization into condition. (_M_find_node_tr): Likewise. (operator=(initializer_list<T>), try_emplace, _M_reinsert_node) (_M_merge_unique, find, erase(const key_type&)): Use _M_locate for lookup.
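A simplified sketch of the idea (the names and layout here are illustrative, not libstdc++'s actual members):

  #include <cstddef>

  struct node { node* next; long key; };

  // Result of the combined lookup: either the node *before* the match
  // (matching the erase/extract-friendly design described above), or
  // the hash code and bucket index telling the caller where a new node
  // with this key would be inserted.
  struct location
  {
    node*       before = nullptr;  // non-null iff a match was found
    std::size_t hash   = 0;        // valid when no match was found
    std::size_t bucket = 0;        // likewise
    node* found () const { return before ? before->next : nullptr; }
  };

Callers that miss can reuse hash and bucket for the subsequent insertion instead of recomputing them.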
2024-11-13  libstdc++: Simplify _Hashtable merge functions  Jonathan Wakely  7  -28/+626
I realised that _M_merge_unique and _M_merge_multi call extract(iter) which then has to call _M_get_previous_node to iterate through the bucket to find the node before the one iter points to. Since the merge function is already iterating over the entire container, we had the previous node a moment ago. Walking the whole bucket to find it again is wasteful. We could just rewrite the loop in terms of node pointers instead of iterators, and then call _M_extract_node directly. However, this is only possible when the source container is the same type as the destination, because otherwise we can't access the source's private members (_M_before_begin, _M_begin, _M_extract_node etc.) Add overloads of _M_merge_unique and _M_merge_multi that work with source containers of the same type, to enable this optimization. For both overloads of _M_merge_unique we can also remove the conditional modifications to __n_elt and just consistently decrement it for every element processed. Use a multiplier of one or zero that dictates whether __n_elt is passed to _M_insert_unique_node or not. We can also remove the repeated calls to size() and just keep track of the size in a local variable. Although _M_merge_unique and _M_merge_multi should be safe for "self-merge", i.e. when doing c.merge(c), it's wasteful to search/insert every element when we don't need to do anything. Add 'this == &source' checks to the overloads taking an lvalue of the container's own type. Because those checks aren't needed for the rvalue overloads, change those to call the underlying _M_merge_xxx function directly instead of going through the lvalue overload that checks the address. I've also added more extensive tests for better coverage of the new overloads added in this commit. libstdc++-v3/ChangeLog: * include/bits/hashtable.h (_M_merge_unique): Add overload for merging from same type. (_M_merge_unique<Compatible>): Simplify size tracking. Add comment. (_M_merge_multi): Add overload for merging from same type. (_M_merge_multi<Compatible>): Add comment. * include/bits/unordered_map.h (unordered_map::merge): Check for self-merge in the lvalue overload. Call _M_merge_unique directly for the rvalue overload. (unordered_multimap::merge): Likewise. * include/bits/unordered_set.h (unordered_set::merge): Likewise. (unordered_multiset::merge): Likewise. * testsuite/23_containers/unordered_map/modifiers/merge.cc: Add more tests. * testsuite/23_containers/unordered_multimap/modifiers/merge.cc: Likewise. * testsuite/23_containers/unordered_multiset/modifiers/merge.cc: Likewise. * testsuite/23_containers/unordered_set/modifiers/merge.cc: Likewise.
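The self-merge guard is the one-line kind of check described above (an illustrative sketch, not the actual member):

  #include <memory>

  template<typename Container>
  void merge_from_same_type (Container& dest, Container& source)
  {
    if (std::addressof (dest) == std::addressof (source))
      return;  // c.merge(c) is a no-op; skip the per-element work
    // ...otherwise walk source's nodes by pointer and relink them via
    // the private node-extraction path, which is only accessible
    // because source has the same type as dest.
  }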
2024-11-13  libstdc++: Remove _Hashtable_base::_S_equals  Jonathan Wakely  1  -24/+24
This removes the overloaded _S_equals and _S_node_equals functions, replacing them with 'if constexpr' in the handful of places they're used. libstdc++-v3/ChangeLog: * include/bits/hashtable_policy.h (_Hashtable_base::_S_equals): Remove. (_Hashtable_base::_S_node_equals): Remove. (_Hashtable_base::_M_key_equals_tr): Fix inaccurate static_assert string. (_Hashtable_base::_M_equals, _Hashtable_base::_M_equals_tr): Use 'if constexpr' instead of _S_equals. (_Hashtable_base::_M_node_equals): Use 'if constexpr' instead of _S_node_equals.
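The shape of the change, in miniature (illustrative, not the real members):

  #include <cstddef>

  // Before: overloaded helpers selected by tag types.
  // After: one function using 'if constexpr'.
  template<bool CacheHashCode, typename Eq, typename Key, typename Node>
  bool node_equals (const Eq& eq, const Key& k, std::size_t code,
                    const Node& n)
  {
    if constexpr (CacheHashCode)
      if (code != n.hash_code)
        return false;      // cheap rejection via the cached hash code
    return eq (k, n.key);  // full key comparison otherwise
  }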
2024-11-13  libstdc++: Remove _Equality base class from _Hashtable  Jonathan Wakely  2  -164/+94
libstdc++-v3/ChangeLog: * include/bits/hashtable.h (_Hashtable): Remove _Equality base class. (_Hashtable::_M_equal): Define equality comparison here instead of in _Equality::_M_equal. * include/bits/hashtable_policy.h (_Equality): Remove.
2024-11-13  libstdc++: Remove _Insert base class from _Hashtable  Jonathan Wakely  2  -215/+144
There's no reason to have a separate base class defining the insert member functions now. They can all be moved into the _Hashtable class, which simplifies them slightly. libstdc++-v3/ChangeLog: * include/bits/hashtable.h (_Hashtable): Remove inheritance from __detail::_Insert and move its members into _Hashtable. * include/bits/hashtable_policy.h (__detail::_Insert): Remove. Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
2024-11-13  libstdc++: Use RAII in _Hashtable  Jonathan Wakely  1  -44/+55
Use scoped guard types to clean up if an exception is thrown. This allows some try-catch blocks to be removed. libstdc++-v3/ChangeLog: * include/bits/hashtable.h (operator=(const _Hashtable&)): Use RAII instead of try-catch. (_M_assign(_Ht&&, _NodeGenerator&)): Likewise. Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
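The pattern is the usual dismissable scope guard (a sketch, not the guard type the patch adds):

  template<typename F>
  struct scope_guard
  {
    F cleanup;
    bool active = true;
    ~scope_guard () { if (active) cleanup (); }
    void dismiss () { active = false; }
  };

  // Usage sketch: undo a partially-completed assignment if copying throws.
  //   scope_guard<...> guard{[&] { clear_all_nodes (); }};
  //   ...copy elements (may throw)...
  //   guard.dismiss ();  // success: keep the new contents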
2024-11-13  libstdc++: Replace _Hashtable::__fwd_value_for with cast  Jonathan Wakely  1  -9/+5
We can just use a cast to the appropriate type instead of calling a function to do it. This gives the compiler less work to compile and optimize, and at -O0 avoids a function call per element. libstdc++-v3/ChangeLog: * include/bits/hashtable.h (_Hashtable::__fwd_value_for): Remove. (_Hashtable::_M_assign): Use static_cast instead of __fwd_value_for. Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
2024-11-13  libstdc++: Add _Hashtable::_M_assign for the common case  Jonathan Wakely  2  -16/+19
This adds a convenient _M_assign overload for the common case where the node generator is the _AllocNode type. Only two places need to call _M_assign with a _ReuseOrAllocNode node generator, so all the other calls to _M_assign can use the new overload instead of manually constructing a node generator. The _AllocNode::operator(Args&&...) function doesn't need to be a variadic template. It is only ever called with a single argument of type const value_type& or value_type&&, so could be simplified. That isn't done in this commit. libstdc++-v3/ChangeLog: * include/bits/hashtable.h (_Hashtable): Remove typedefs for node generators. (_Hashtable::_M_assign(_Ht&&)): Add new overload. (_Hashtable::operator=(initializer_list<value_type>)): Add local typedef for node generator. (_Hashtable::_M_assign_elements): Likewise. (_Hashtable::operator=(const _Hashtable&)): Use new _M_assign overload. (_Hashtable(const _Hashtable&)): Likewise. (_Hashtable(const _Hashtable&, const allocator_type&)): Likewise. (_Hashtable(_Hashtable&&, __node_alloc_type&&, false_type)): Likewise. * include/bits/hashtable_policy.h (_Insert): Remove typedef for node generator. Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
2024-11-13  libstdc++: Refactor Hashtable erasure  Jonathan Wakely  1  -74/+39
This reworks the internal member functions for erasure from unordered containers, similarly to the earlier commit doing it for insertion. Instead of multiple overloads of _M_erase which are selected via tag dispatching, the erase(const key_type&) member can use 'if constexpr' to choose an appropriate implementation (returning after erasing a single element for unique keys, or continuing to erase all equivalent elements for non-unique keys). libstdc++-v3/ChangeLog: * include/bits/hashtable.h (_Hashtable::_M_erase): Remove overloads for erasing by key, moving logic to ... (_Hashtable::erase): ... here. Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
2024-11-13  libstdc++: Refactor Hashtable insertion [PR115285]  Jonathan Wakely  4  -322/+177
This completely reworks the internal member functions for insertion into unordered containers. Currently we use a mixture of tag dispatching (for unique vs non-unique keys) and template specialization (for maps vs sets) to correctly implement insert and emplace members. This removes a lot of complexity and indirection by using 'if constexpr' to select the appropriate member function to call. Previously there were four overloads of _M_emplace, for unique keys and non-unique keys, and for hinted insertion and non-hinted. However two of those were redundant, because we always ignore the hint for unique keys and always use a hint for non-unique keys. Those four overloads have been replaced by two new non-overloaded function templates: _M_emplace_uniq and _M_emplace_multi. The former is for unique keys and doesn't take a hint, and the latter is for non-unique keys and takes a hint. In the body of _M_emplace_uniq there are special cases to handle emplacing values from which a key_type can be extracted directly. This means we don't need to allocate a node and construct a value_type that might be discarded if an equivalent key is already present. The special case applies when emplacing the key_type into std::unordered_set, or when emplacing std::pair<cv key_type, X> into std::unordered_map, or when emplacing two values into std::unordered_map where the first has type cv key_type. For the std::unordered_set case, obviously if we're inserting something that's already the key_type, we can look it up directly. For the std::unordered_map cases, we know that the inserted std::pair<const key_type, mapped_type> would have its first element initialized from first member of a std::pair value, or from the first of two values, so if that is a key_type, we can look that up directly. All the _M_insert overloads used a node generator parameter, but apart from the one case where _M_insert_range was called from _Hashtable::operator=(initializer_list<value_type>), that parameter was always the _AllocNode type, never the _ReuseOrAllocNode type. Because operator=(initializer_list<value_type>) was rewritten in an earlier commit, all calls to _M_insert now use _AllocNode, so there's no reason to pass the generator as a template parameter when inserting. The multiple overloads of _Hashtable::_M_insert can all be removed now, because the _Insert_base::insert members now call either _M_emplace_uniq or _M_emplace_multi directly, only passing a hint to the latter. Which one to call is decided using 'if constexpr (__unique_keys::value)' so there is no unnecessary code instantiation, and overload resolution is much simpler. The partial specializations of the _Insert class template can be entirely removed, moving the minor differences in 'insert' member functions into the common _Insert_base base class. The different behaviour for maps and sets can be implemented using enable_if constraints and 'if constexpr'. With the _Insert class template no longer needed, the _Insert_base class template can be renamed to _Insert. This is a minor simplification for the complex inheritance hierarchy used by _Hashtable, removing one base class. It also means one less class template instantiation, and no need to match the right partial specialization of _Insert. The _Insert base class could be removed entirely by moving all its 'insert' members into _Hashtable, because without any variation in specializations of _Insert there is no reason to use a base class to define those members. That is left for a later commit. 
Consistently using _M_emplace_uniq or _M_emplace_multi for insertion means we no longer attempt to avoid constructing a value_type object to find its key, removing the PR libstdc++/96088 optimizations. This fixes the bugs caused by those optimizations, such as PR libstdc++/115285, but causes regressions in the expected number of allocations and temporary objects constructed for the PR 96088 tests. It should be noted that the "regressions" in the 96088 tests put us exactly level with the number of allocations done by libc++ for those same tests. To mitigate this to some extent, _M_emplace_uniq detects when the emplace arguments already contain a key_type (either as the sole argument, for unordered_set, or as the first part of a pair of arguments, for unordered_map). In that specific case we don't need to allocate a node and construct a value type to check for an existing element with equivalent key. The remaining regressions in the number of allocations and temporaries should be addressed separately, with more conservative optimizations specific to std::string. That is not part of this commit. libstdc++-v3/ChangeLog: PR libstdc++/115285 * include/bits/hashtable.h (_Hashtable::_M_emplace): Replace with _M_emplace_uniq and _M_emplace_multi. (_Hashtable::_S_forward_key, _Hashtable::_M_insert_unique) (_Hashtable::_M_insert_unique_aux, _Hashtable::_M_insert): Remove. * include/bits/hashtable_policy.h (_ConvertToValueType): Remove. (_Insert_base::_M_insert_range): Remove overload for unique keys and rename overload for non-unique keys to ... (_Insert_base::_M_insert_range_multi): ... this. (_Insert_base::insert): Call _M_emplace_uniq or _M_emplace_multi instead of _M_insert. Add insert overloads from _Insert. (_Insert_base): Rename to _Insert. (_Insert): Remove * testsuite/23_containers/unordered_map/96088.cc: Adjust expected number of allocations. * testsuite/23_containers/unordered_set/96088.cc: Likewise.
2024-11-13  libstdc++: Allow unordered_set assignment to assign to existing nodes  Jonathan Wakely  3  -17/+35
Currently the _ReuseOrAllocNode::operator(Args&&...) function always destroys the value stored in recycled nodes and constructs a new value. The _ReuseOrAllocNode type is only ever used for implementing assignment, either from another unordered container of the same type, or from std::initializer_list<value_type>. Consequently, the parameter pack Args only ever consists of a single parameter of type const value_type& or value_type. We can replace the variadic parameter pack with a single forwarding reference parameter, and when the value_type is assignable from that type we can use assignment instead of destroying the existing value and then constructing a new one. Using assignment is typically only possible for sets, because for maps the value_type is std::pair<const key_type, mapped_type> and in most cases std::is_assignable_v<const key_type&, const key_type&> is false. libstdc++-v3/ChangeLog: * include/bits/hashtable_policy.h (_ReuseOrAllocNode::operator()): Replace parameter pack with a single parameter. Assign to existing value when possible. * testsuite/23_containers/unordered_multiset/allocator/move_assign.cc: Adjust expected count of operations. * testsuite/23_containers/unordered_set/allocator/move_assign.cc: Likewise. Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
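The core of the change is a compile-time choice between assignment and destroy-plus-construct (a sketch with assumed names):

  #include <memory>
  #include <type_traits>

  template<typename Alloc, typename Value, typename Arg>
  void recycle (Alloc& a, Value& stored, Arg&& arg)
  {
    if constexpr (std::is_assignable_v<Value&, Arg&&>)
      stored = std::forward<Arg> (arg);  // reuse the existing value (sets)
    else
      {
        // Maps: value_type is pair<const Key, T>, not assignable, so
        // destroy and reconstruct in place as before.
        std::allocator_traits<Alloc>::destroy (a, std::addressof (stored));
        std::allocator_traits<Alloc>::construct (a, std::addressof (stored),
                                                 std::forward<Arg> (arg));
      }
  }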
2024-11-13  libstdc++: Refactor _Hashtable::operator=(initializer_list<value_type>)  Jonathan Wakely  1  -3/+32
This replaces a call to _M_insert_range with open coding the loop. This will allow removing the node generator parameter from _M_insert_range in a later commit. libstdc++-v3/ChangeLog: * include/bits/hashtable.h (operator=(initializer_list)): Refactor to not use _M_insert_range. Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
2024-11-13  libstdc++: Fix calculation of system time in performance tests  Jonathan Wakely  1  -1/+4
The system_time() function used the wrong element of the splits array. Also add a comment about the units for time measurements. libstdc++-v3/ChangeLog: * testsuite/util/testsuite_performance.h (time_counter): Add comment about times. (time_counter::system_time): Use correct split value.
2024-11-13  libstdc++: Write timestamp to libstdc++-performance.sum file  Jonathan Wakely  1  -0/+3
The results of 'make check-performance' are appended to the .sum file, with no indication where one set of results ends and the next begins. We could just remove the file when starting a new run, but appending makes it a little easier to compare with previous runs, without having to copy and store old files. This adds a header containing a timestamp to the file when starting a new run. libstdc++-v3/ChangeLog: * scripts/check_performance: Add timestamp to output file at start of run.
2024-11-13  libstdc++: Use __is_single_threaded() in performance tests  Jonathan Wakely  1  -6/+3
With recent glibc releases the __gthread_active_p() function is always true, so we always append "-thread" onto performance benchmark names. Use the __gnu_cxx::__is_single_threaded() function instead. libstdc++-v3/ChangeLog: * testsuite/util/testsuite_performance.h: Use __gnu_cxx::__is_single_threaded instead of __gthread_active_p().
2024-11-13  libstdc++: Stop using std::unary_function in perf tests  Jonathan Wakely  2  -2/+8
This fixes some -Wdeprecated-declarations warnings. libstdc++-v3/ChangeLog: * testsuite/performance/ext/pb_ds/hash_int_erase_mem.cc: Replace std::unary_function with result_type and argument_type typedefs. * testsuite/util/performance/assoc/multimap_common_type.hpp: Likewise.
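The mechanical fix looks like this (illustrative):

  #include <cstddef>

  // Before (deprecated in C++11, removed in C++17):
  //   struct int_hash : std::unary_function<int, std::size_t> { ... };
  // After: spell out the two typedefs unary_function used to provide.
  struct int_hash
  {
    typedef std::size_t result_type;    // was inherited
    typedef int         argument_type;  // was inherited
    std::size_t operator() (int v) const { return std::size_t (v); }
  };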
2024-11-13  libstdc++: Fix nodiscard warnings in perf test for memory pools  Jonathan Wakely  1  -3/+12
The use of unnamed std::lock_guard temporaries was intentional here, as they were used like barriers (but std::barrier isn't available until C++20). But that gives nodiscard warnings, because unnamed temporary locks are usually unintentional. Use named variables in new block scopes instead. libstdc++-v3/ChangeLog: * testsuite/performance/20_util/memory_resource/pools.cc: Fix -Wunused-value warnings about unnamed std::lock_guard objects.
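The difference between the warning-prone and fixed forms (illustrative):

  #include <mutex>

  std::mutex m;

  void before_and_after ()
  {
    // Before: an unnamed temporary used as a barrier; it locks and
    // unlocks immediately and now triggers a nodiscard warning.
    //   std::lock_guard<std::mutex>{m};
    // After: a named lock in its own block scope.
    {
      std::lock_guard<std::mutex> lock (m);
      // ...section that must run under the lock...
    }
  }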
2024-11-13  aarch64: Relax add_overloaded_function assert  Richard Sandiford  4  -18/+28
There are some SVE intrinsics that support one set of suffixes for one extension (E1, say) and another set of suffixes for another extension (E2, say). It is usually the case that, mutatis mutandis, E2 extends E1. Listing E1 first would then ensure that the manual C overload would also require E1, making it suitable for resolving both the E1 forms and, where appropriate, the E2 forms. However, there was one exception: the I8MM, F32MM, and F64MM extensions to SVE each added variants of svmmla, but there was no svmmla for SVE itself. This was handled by adding an SVE entry for svmmla that only defined the C overload; it had no variants of its own. This situation occurs more often with upcoming patches. Rather than keep adding these dummy entries, it seemed better to make the code automatically compute the lowest common denominator for all definitions that share the same C overload. gcc/ * config/aarch64/aarch64-protos.h (aarch64_required_extensions::common_denominator): New member function. * config/aarch64/aarch64-sve-builtins-base.def: Remove zero-variant entry for mmla. * config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): Remove support for it. * config/aarch64/aarch64-sve-builtins.cc (function_builder::add_overloaded): Relax the assert for duplicate definitions and instead calculate the common denominator of all requirements.
2024-11-13  i386: Add -mveclibabi=aocl [PR56504]  Filip Kastl  7  -16/+418
We currently support generating vectorized math calls to the AMD core math library (ACML) (-mveclibabi=acml). That library is end-of-life and its successor is the math library from AMD Optimizing CPU Libraries (AOCL). This patch adds support for AOCL (-mveclibabi=aocl). That significantly broadens the range of vectorized math functions optimized for AMD CPUs that GCC can generate calls to. See the edit to invoke.texi for a complete list of added functions. Compared to the list of functions in AOCL LibM docs I left out these vectorized function families: - sincos and all functions working with arrays ... Because these functions have pointer arguments and that would require a bigger rework of ix86_veclibabi_aocl(). Also, I'm not sure if GCC even ever generates calls to these functions. - linearfrac ... Because these functions are specific to the AMD library. There's no equivalent glibc function nor GCC internal function nor GCC built-in. - powx, sqrt, fabs ... Because GCC doesn't vectorize these functions into calls and uses instructions instead. I also left out amd_vrd2_expm1() (the AMD docs list the function but I wasn't able to link calls to it with the current version of the library). gcc/ChangeLog: PR target/56504 * config/i386/i386-options.cc (ix86_option_override_internal): Add ix86_veclibabi_type_aocl case. * config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern ix86_veclibabi_aocl(). * config/i386/i386-opts.h (enum ix86_veclibabi): Add ix86_veclibabi_type_aocl into the ix86_veclibabi enum. * config/i386/i386.cc (ix86_veclibabi_aocl): New function. * config/i386/i386.opt: Add the 'aocl' type. * doc/invoke.texi: Document -mveclibabi=aocl. gcc/testsuite/ChangeLog: PR target/56504 * gcc.target/i386/vectorize-aocl1.c: New test. Signed-off-by: Filip Kastl <fkastl@suse.cz>
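A usage sketch (the exact AOCL symbol emitted is an assumption based on the library's amd_vr* naming scheme):

  /* Compile with something like
       gcc -O2 -ftree-vectorize -mveclibabi=aocl -march=znver3
     and link against AOCL LibM; the vectorized sin() loop should then
     become a call to an amd_vrd*_sin-style entry point instead of
     scalar libm calls.  */
  #include <math.h>

  void
  vec_sin (double *out, const double *in, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = sin (in[i]);
  }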
2024-11-13  hppa: Remove inner `fix:SF/DF` from fixed-point patterns  John David Anglin  1  -8/+8
2024-11-13 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: PR target/117525 * config/pa/pa.md (fix_truncsfsi2): Remove inner `fix:SF`. (fix_truncdfsi2, fix_truncsfdi2, fix_truncdfdi2, fixuns_truncsfsi2, fixuns_truncdfsi2, fixuns_truncsfdi2, fixuns_truncdfdi2): Likewise.
2024-11-13  diagnostics: avoid using global_dc in path-printing  David Malcolm  6  -35/+53
gcc/analyzer/ChangeLog: * checker-path.cc (checker_path::debug): Explicitly use global_dc's reference printer. * diagnostic-manager.cc (diagnostic_manager::prune_interproc_events): Likewise. (diagnostic_manager::prune_system_headers): Likewise. gcc/ChangeLog: * diagnostic-path.cc (diagnostic_event::get_desc): Add param "ref_pp" and use instead of global_dc. (class path_label): Likewise, adding field m_ref_pp. (event_range::event_range): Add param "ref_pp" and pass to m_path_label. (path_summary::path_summary): Add param "ref_pp" and pass to event_range ctor. (diagnostic_text_output_format::print_path): Pass *pp to path_summary ctor. (selftest::test_empty_path): Pass *event_pp to pass_summary ctor. (selftest::test_intraprocedural_path): Likewise. (selftest::test_interprocedural_path_1): Likewise. (selftest::test_interprocedural_path_2): Likewise. (selftest::test_recursion): Likewise. (selftest::test_control_flow_1): Likewise. (selftest::test_control_flow_2): Likewise. (selftest::test_control_flow_3): Likewise. (selftest::assert_cfg_edge_path_streq): Likewise. (selftest::test_control_flow_5): Likewise. (selftest::test_control_flow_6): Likewise. * diagnostic-path.h (diagnostic_event::get_desc): Add param "ref_pp". * lazy-diagnostic-path.cc (selftest::test_intraprocedural_path): Pass *event_pp to get_desc. * simple-diagnostic-path.cc (selftest::test_intraprocedural_path): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-13  Match: Fold pow calls to ldexp when possible [PR57492]  Soumya AR  3  -0/+101
This patch transforms the following POW calls to equivalent LDEXP calls, as discussed in PR57492: powi (powof2, i) -> ldexp (1.0, i * log2 (powof2)) powof2 * ldexp (x, i) -> ldexp (x, i + log2 (powof2)) a * ldexp(1., i) -> ldexp (a, i) This is especially helpful for SVE architectures as LDEXP calls can be implemented using the FSCALE instruction, as seen in the following patch: https://gcc.gnu.org/g:9b2915d95d855333d4d8f66b71a75f653ee0d076 SPEC2017 was run with this patch; while there are no noticeable improvements, there are no non-noise regressions either. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. Signed-off-by: Soumya AR <soumyaa@nvidia.com> gcc/ChangeLog: PR target/57492 * match.pd: Added patterns to fold calls to pow to ldexp and optimize specific ldexp calls. gcc/testsuite/ChangeLog: PR target/57492 * gcc.dg/tree-ssa/ldexp.c: New test. * gcc.dg/tree-ssa/pow-to-ldexp.c: New test.
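Concretely, the three folds correspond to source forms like these (illustrative, following the rules listed above):

  #include <math.h>

  double f1 (int i)           { return __builtin_powi (4.0, i); }
  /* -> ldexp (1.0, i * 2), since log2 (4) == 2 */

  double f2 (double x, int i) { return 8.0 * ldexp (x, i); }
  /* -> ldexp (x, i + 3), since log2 (8) == 3 */

  double f3 (double a, int i) { return a * ldexp (1.0, i); }
  /* -> ldexp (a, i) */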
2024-11-13  RISC-V: Add Multi-Versioning Test Cases  Yangyu Chen  9  -0/+458
This patch adds test cases for the Function Multi-Versioning (FMV) feature for RISC-V, reusing the existing test cases from aarch64 and porting them to RISC-V. Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/testsuite/ChangeLog: * g++.target/riscv/mv-symbols1.C: New test. * g++.target/riscv/mv-symbols2.C: New test. * g++.target/riscv/mv-symbols3.C: New test. * g++.target/riscv/mv-symbols4.C: New test. * g++.target/riscv/mv-symbols5.C: New test. * g++.target/riscv/mvc-symbols1.C: New test. * g++.target/riscv/mvc-symbols2.C: New test. * g++.target/riscv/mvc-symbols3.C: New test. * g++.target/riscv/mvc-symbols4.C: New test.
2024-11-13  RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER  Yangyu Chen  1  -0/+587
This patch implements the TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER hooks for RISC-V. These are used to generate the dispatcher function and get the dispatcher function for function multiversioning. This patch copies much code from commit 0cfde688e213 ("[aarch64] Add function multiversioning support") and modifies it to fit the RISC-V port. A key difference is that the feature-bits data structure in the RISC-V C-API is an array of unsigned long long, while on AArch64 it is not an array, so we need to generate an array reference for each feature-bits element in the dispatcher function. Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv.cc (add_condition_to_bb): New function. (dispatch_function_versions): New function. (get_suffixed_assembler_name): New function. (make_resolver_func): New function. (riscv_generate_version_dispatcher_body): New function. (riscv_get_function_versions_dispatcher): New function. (TARGET_GENERATE_VERSION_DISPATCHER_BODY): Implement it. (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Implement it.
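A hand-written analogue of the kind of resolver this generates (hypothetical: the mask value, structure layout, and initialization protocol are assumptions, simplified from libgcc's riscv feature-bits interface):

  #define MASK_ZBA_ZBB 0x30ULL  /* made-up feature mask */

  extern struct
  {
    unsigned length;
    unsigned long long features[2];  /* an array, unlike AArch64 */
  } __riscv_feature_bits;

  extern int foo_default (void);
  extern int foo_zba_zbb (void);

  static int (*foo_resolver (void)) (void)
  {
    /* The generated body must index each element of the features
       array when testing a version's requirements.  */
    if ((__riscv_feature_bits.features[0] & MASK_ZBA_ZBB) == MASK_ZBA_ZBB)
      return foo_zba_zbb;
    return foo_default;
  }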
2024-11-13  RISC-V: Implement TARGET_MANGLE_DECL_ASSEMBLER_NAME  Yangyu Chen  1  -0/+39
This patch implements the TARGET_MANGLE_DECL_ASSEMBLER_NAME for RISC-V. This is used to add function multiversioning suffixes to the assembler name. Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv.cc (riscv_mangle_decl_assembler_name): New function. (TARGET_MANGLE_DECL_ASSEMBLER_NAME): Define.
2024-11-13  RISC-V: Implement TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS  Yangyu Chen  1  -0/+127
This patch implements the TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS hooks for RISC-V. TARGET_COMPARE_VERSION_PRIORITY is implemented to compare the priority of two function versions based on the rules defined in RISC-V C-API Doc PR #85: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85/files#diff-79a93ca266139524b8b642e582ac20999357542001f1f4666fbb62b6fb7a5824R721 If multiple versions have equal priority, we select the function with the most feature bits generated by riscv_minimal_hwprobe_feature_bits. When the number of feature bits is also equal, we diff the two versions and select the one with the lowest set bit, since a feature that appears earlier in feature_bits might be more important to performance. TARGET_OPTION_FUNCTION_VERSIONS is implemented to check whether two function versions are the same. This implementation reuses the code in TARGET_COMPARE_VERSION_PRIORITY and checks that it returns 0, which means equal priority. Co-Developed-by: Hank Chang <hank.chang@sifive.com> Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv.cc (parse_features_for_version): New function. (compare_fmv_features): New function. (riscv_compare_version_priority): New function. (riscv_common_function_versions): New function. (TARGET_COMPARE_VERSION_PRIORITY): Implement it. (TARGET_OPTION_FUNCTION_VERSIONS): Implement it.
2024-11-13  RISC-V: Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P  Yangyu Chen  4  -13/+115
This patch implements the TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P hook for RISC-V. This hook is used to process attribute ((target_version ("..."))). Since this is the first patch that introduces the target_version attribute, we also set TARGET_HAS_FMV_TARGET_ATTRIBUTE to 0 to use "target_version" for function versioning. Co-Developed-by: Hank Chang <hank.chang@sifive.com> Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_process_target_attr): Remove as it is not used. (riscv_option_valid_version_attribute_p): Declare. (riscv_process_target_version_attr): Declare. * config/riscv/riscv-target-attr.cc (riscv_target_attrs): Renamed from riscv_attributes. (riscv_target_version_attrs): New attributes for target_version. (riscv_process_one_target_attr): New arguments to select attrs. (riscv_process_target_attr): Likewise. (riscv_option_valid_attribute_p): Likewise. (riscv_process_target_version_attr): New function. (riscv_option_valid_version_attribute_p): New function. * config/riscv/riscv.cc (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): Implement it. * config/riscv/riscv.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Define it to 0 to use "target_version" for function versioning.
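With this hook in place, versioned functions are written as follows (a sketch; the extension strings are arbitrary examples):

  /* Picked when Zbb is available.  */
  __attribute__ ((target_version ("arch=+zbb")))
  int foo (void) { return 1; }

  /* Fallback version.  */
  __attribute__ ((target_version ("default")))
  int foo (void) { return 0; }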
2024-11-13  RISC-V: Implement riscv_minimal_hwprobe_feature_bits  Yangyu Chen  4  -0/+226
This patch implements the riscv_minimal_hwprobe_feature_bits feature for the RISC-V target. The feature bits are defined in libgcc/config/riscv/feature_bits.c to provide bitmasks of the ISA extensions defined in the RISC-V C-API. Thus, we need a function to generate the feature bits for the IFUNC resolver to dispatch between different functions based on the hardware features. "Minimal" feature bits means using the earliest extensions that appeared in Linux hwprobe to cover the given ISA string, to allow older kernels that cannot probe some implied extensions to run the FMV dispatcher correctly. For example, V implies Zve32x, but Zve32x appears in the Linux kernel only since v6.11. If we used the ISA string directly to generate the FMV dispatcher for functions with the "arch=+v" extension, then since V implies Zve32x, the FMV dispatcher would check whether the Zve32x extension is supported by the host. If the Linux kernel is older than v6.11, the FMV dispatcher would fail to detect the Zve32x extension even though it is implied by the V extension, and thus fail to dispatch the correct function. Thus, we need to generate the minimal feature bits to cover the given ISA string to allow the FMV dispatcher to work correctly on older kernels. Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * common/config/riscv/riscv-common.cc (RISCV_EXT_BITMASK): New macro. (struct riscv_ext_bitmask_table_t): New struct. (riscv_minimal_hwprobe_feature_bits): New function. * common/config/riscv/riscv-ext-bitmask.def: New file. * config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H): Include riscv-feature-bits.h. (riscv_minimal_hwprobe_feature_bits): Declare the function. * config/riscv/riscv-feature-bits.h: New file.
2024-11-13  RISC-V: Implement Priority syntax parser for Function Multi-Versioning  Yangyu Chen  2  -0/+27
This patch adds the priority syntax parser to support the Function Multi-Versioning (FMV) feature in RISC-V. This feature allows users to specify the priority of a function version in the attribute syntax. The changes are based on RISC-V C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85 Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::handle_priority): New function. (riscv_target_attr_parser::update_settings): Update priority attribute. * config/riscv/riscv.opt: Add TargetVariable riscv_fmv_priority.
2024-11-13  Introduce TARGET_CLONES_ATTR_SEPARATOR for RISC-V  Yangyu Chen  7  -15/+48
Some architectures may use ',' in the attribute string, where it is not used as the separator between different targets. To avoid conflicts, we introduce a new macro, TARGET_CLONES_ATTR_SEPARATOR, to separate different clones. As an example, according to the RISC-V C-API Specification [1], RISC-V allows ',' in the attribute string in the "arch=" option to specify one or more ISA extensions for the same target function, which conflicts with the default separator used to separate different clones. This patch introduces TARGET_CLONES_ATTR_SEPARATOR for RISC-V and chooses '#' as the separator, since '#' is not allowed in the target_clones option string. [1] https://github.com/riscv-non-isa/riscv-c-api-doc/blob/c6c5d6d9cf96b342293315a5dff3d25e96ef8191/src/c-api.adoc#__attribute__targetattr-string Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * defaults.h (TARGET_CLONES_ATTR_SEPARATOR): Define new macro. * multiple_target.cc (get_attr_str): Use TARGET_CLONES_ATTR_SEPARATOR to separate attributes. (separate_attrs): Likewise. (expand_target_clones): Likewise. * attribs.cc (attr_strcmp): Likewise. (sorted_attr_string): Likewise. * tree.cc (get_target_clone_attr_len): Likewise. * config/riscv/riscv.h (TARGET_CLONES_ATTR_SEPARATOR): Define TARGET_CLONES_ATTR_SEPARATOR for RISC-V. * doc/tm.texi: Document TARGET_CLONES_ATTR_SEPARATOR. * doc/tm.texi.in: Likewise.
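The motivating case, sketched (the extension strings are arbitrary examples):

  /* Each string names one clone; the ',' inside "arch=+zba,+zbb" is
     now safe because GCC joins the versions with '#' internally
     rather than with ','.  */
  __attribute__ ((target_clones ("default", "arch=+zba,+zbb")))
  int bar (void) { return 42; }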
2024-11-13  Fortran: Fix failing character pointer fcn assignment [PR105054]  Paul Thomas  2  -0/+100
2024-11-14 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/105054 * resolve.cc (get_temp_from_expr): If the pointer function has a deferred character length, generate a new deferred charlen for the temporary. gcc/testsuite/ PR fortran/105054 * gfortran.dg/ptr_func_assign_6.f08: New test.
2024-11-13  c: add Wzero-as-null-pointer-constant [PR117059]  Martin Uecker  4  -4/+148
Add warnings for the use of zero as a null pointer constant to the C FE. PR c/117059 gcc/c-family/ChangeLog: * c.opt (Wzero-as-null-pointer-constant): Enable for C and ObjC. gcc/c/ChangeLog: * c-typeck.cc (parse_build_binary_op): Add warning. (build_conditional_expr): Add warning. (convert_for_assignment): Add warning. gcc/ChangeLog: * doc/invoke.texi (Wzero-as-null-pointer-constant): Adapt description. gcc/testsuite/ChangeLog: * gcc.dg/Wzero-as-null-pointer-constant.c: New test. Suggested-by: Alejandro Colomar <alx@kernel.org> Acked-by: Alejandro Colomar <alx@kernel.org> Reviewed-by: Joseph Myers <josmyers@redhat.com>
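For the C front end the new warning fires on cases like these (illustrative, compiled as C with -Wzero-as-null-pointer-constant):

  char *p = 0;      /* warning: zero used as null pointer constant */

  int is_null (char *q)
  {
    return q == 0;  /* likewise for comparisons against literal 0 */
  }

  /* nullptr (C23) or the (void *)0 spelling should be the quiet forms. */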
2024-11-13  c: Handle C23 floating constant {d,D}{32,64,128} suffixes like {df,dd,dl}  Jakub Jelinek  6  -5/+99
C23 roughly says that the {d,D}{32,64,128} floating point constant suffixes are alternate spellings of the {df,dd,dl} suffixes in Annex H. So, the following patch allows that alternate spelling. Or is it intentional that it isn't enabled and we need to do everything in there first before trying to define __STDC_IEC_60559_DFP__? Like add support for _Decimal32x and _Decimal64x types (including the d32x and d64x suffixes) etc. 2024-11-13 Jakub Jelinek <jakub@redhat.com> libcpp/ * expr.cc (interpret_float_suffix): Handle d32 and D32 suffixes for C like df, d64 and D64 like dd and d128 and D128 like dl. gcc/c-family/ * c-lex.cc (interpret_float): Subtract 3 or 4 from copylen rather than 2 if last character of CPP_N_DFLOAT is a digit. gcc/testsuite/ * gcc.dg/dfp/c11-constants-3.c: New test. * gcc.dg/dfp/c11-constants-4.c: New test. * gcc.dg/dfp/c23-constants-3.c: New test. * gcc.dg/dfp/c23-constants-4.c: New test.
2024-11-13  c: Implement C2Y N3298 - Introduce complex literals [PR117029]  Jakub Jelinek  26  -35/+594
The following patch implements the C2Y N3298 paper Introduce complex literals by providing different (or no) diagnostics on imaginary constants (except for integer ones). For _DecimalN constants we don't support _Complex _DecimalN and error on any i/j suffixes mixed with DD/DL/DF, so nothing changed there. 2024-11-13 Jakub Jelinek <jakub@redhat.com> PR c/117029 libcpp/ * include/cpplib.h (struct cpp_options): Add imaginary_constants member. * init.cc (struct lang_flags): Add imaginary_constants bitfield. (lang_defaults): Add column for imaginary_constants. (cpp_set_lang): Copy over imaginary_constants. * expr.cc (cpp_classify_number): Diagnose CPP_N_IMAGINARY non-CPP_N_FLOATING constants differently for C. gcc/testsuite/ * gcc.dg/cpp/pr7263-3.c: Adjust expected diagnostic wording. * gcc.dg/c23-imaginary-constants-1.c: New test. * gcc.dg/c23-imaginary-constants-2.c: New test. * gcc.dg/c23-imaginary-constants-3.c: New test. * gcc.dg/c23-imaginary-constants-4.c: New test. * gcc.dg/c23-imaginary-constants-5.c: New test. * gcc.dg/c23-imaginary-constants-6.c: New test. * gcc.dg/c23-imaginary-constants-7.c: New test. * gcc.dg/c23-imaginary-constants-8.c: New test. * gcc.dg/c23-imaginary-constants-9.c: New test. * gcc.dg/c23-imaginary-constants-10.c: New test. * gcc.dg/c2y-imaginary-constants-1.c: New test. * gcc.dg/c2y-imaginary-constants-2.c: New test. * gcc.dg/c2y-imaginary-constants-3.c: New test. * gcc.dg/c2y-imaginary-constants-4.c: New test. * gcc.dg/c2y-imaginary-constants-5.c: New test. * gcc.dg/c2y-imaginary-constants-6.c: New test. * gcc.dg/c2y-imaginary-constants-7.c: New test. * gcc.dg/c2y-imaginary-constants-8.c: New test. * gcc.dg/c2y-imaginary-constants-9.c: New test. * gcc.dg/c2y-imaginary-constants-10.c: New test. * gcc.dg/c2y-imaginary-constants-11.c: New test. * gcc.dg/c2y-imaginary-constants-12.c: New test.
2024-11-13  aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]  Soumya AR  4  -7/+72
This patch uses the FSCALE instruction provided by SVE to implement the standard ldexp family of functions. Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the following code: float test_ldexpf (float x, int i) { return __builtin_ldexpf (x, i); } double test_ldexp (double x, int i) { return __builtin_ldexp(x, i); } GCC Output: test_ldexpf: b ldexpf test_ldexp: b ldexp Since SVE has support for an FSCALE instruction, we can use this to process scalar floats by moving them to a vector register and performing an fscale call, similar to how LLVM tackles an ldexp builtin as well. New Output: test_ldexpf: fmov s31, w0 ptrue p7.b, vl4 fscale z0.s, p7/m, z0.s, z31.s ret test_ldexp: sxtw x0, w0 ptrue p7.b, vl8 fmov d31, x0 fscale z0.d, p7/m, z0.d, z31.d ret This is a revision of an earlier patch, and now uses the extended definition of aarch64_ptrue_reg to generate predicate registers with the appropriate set bits. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Soumya AR <soumyaa@nvidia.com> gcc/ChangeLog: PR target/111733 * config/aarch64/aarch64-sve.md (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar floating modes and expand to the existing pattern for FSCALE. * config/aarch64/iterators.md: (SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well as their scalar equivalents. (VPRED): Extended the attribute to handle GPF_HF modes. * internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/fscale.c: New test.
2024-11-13  RISC-V: Bugfix for max_sew_overlap_and_next_ratio_valid_for_prev_sew_p [PR117483]  xuli  2  -2/+29
This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117483. If prev and next satisfy the following rule, we should forbid the case (next.get_sew() < prev.get_sew() && (!next.get_ta() || !next.get_ma())) in the compatible function max_sew_overlap_and_next_ratio_valid_for_prev_sew_p. Otherwise, the tail elements of next will be polluted. DEF_SEW_LMUL_RULE (ge_sew, ratio_and_ge_sew, ratio_and_ge_sew, max_sew_overlap_and_next_ratio_valid_for_prev_sew_p, always_false, use_max_sew_and_lmul_with_next_ratio) Passed the rv64gcv full regression test. Signed-off-by: Li Xu <xuli1@eswincomputing.com> PR target/117483 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Fix bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117483.c: New test.
2024-11-12  [RISC-V] Fix costing of LO_SUM expressions  Xianmiao Qu  1  -1/+2
This is a rewrite of a patch originally from Xianmiao Qu. Xianmiao noticed that the costs we compute for LO_SUM expressions were incorrect. Essentially we costed based solely on the first input to the LO_SUM. In a LO_SUM, the first input is almost always going to be a REG and thus isn't interesting. The second argument is almost always going to be some kind of symbolic operand, which is much more interesting from a costing standpoint. The right way to fix this is to sum the cost of the two operands. I've verified this produces the same code as Xianmiao Qu's original patch. This has been tested on rv32 and rv64 in my tester. It missed today's bootstrap of riscv64 though :( Naturally I'll wait on the pre-commit CI tester to render a verdict, but I don't expect any problems. -- From Xianmiao Qu's original submission -- Currently, the cost of the LO_SUM expression is based on the cost of calculating the first subexpression. When the first subexpression is a register, the cost result will be zero. It seems a bit unreasonable for a SET expression to have a zero cost when its source is LO_SUM. Moreover, having a cost of zero for the expression will lead the loop invariant pass to calculate its benefit of being moved outside the loop as zero, thus preventing the out-of-loop placement of the loop invariant. As an example, consider the following test case: long a; long b[]; long *c; foo () { for (;;) *c = b[a]; } When compiling with -march=rv64gc -mabi=lp64d -Os, the following code is generated: .cfi_startproc lui a5,%hi(c) ld a4,%lo(c)(a5) lui a2,%hi(b) lui a1,%hi(a) .L2: ld a5,%lo(a)(a1) addi a3,a2,%lo(b) slli a5,a5,3 add a5,a5,a3 ld a5,0(a5) sd a5,0(a4) j .L2 After adjusting the cost of the LO_SUM expression, the addi instruction will be moved outside the loop: .cfi_startproc lui a5,%hi(c) ld a3,%lo(c)(a5) lui a4,%hi(b) lui a2,%hi(a) addi a4,a4,%lo(b) .L2: ld a5,%lo(a)(a2) slli a5,a5,3 add a5,a5,a4 ld a5,0(a5) sd a5,0(a3) j .L2 gcc/ * config/riscv/riscv.cc (riscv_rtx_costs): Correct costing of LO_SUM expressions. Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
2024-11-12  Reapply "[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]"  Jeff Law  2  -0/+36
This reverts commit de3b277247ce98d189f121155b75f490725a42f6.
2024-11-13  i386: Zero extend 32-bit address to 64-bit with option -mx32 -maddress-mode=long [PR117418]  Hu, Lin1  2  -0/+36
-maddress-mode=long makes Pmode DImode, so zero-extend the 32-bit address to 64 bits and use a 64-bit register as the pointer to avoid raising an ICE. gcc/ChangeLog: PR target/117418 * config/i386/i386-expand.cc (ix86_expand_builtin): Convert pointer's mode according to Pmode. gcc/testsuite/ChangeLog: PR target/117418 * gcc.target/i386/pr117418-1.c: New test.
2024-11-13  Daily bump.  GCC Administrator  5  -1/+1075
2024-11-12  Revert "[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]"  Jeff Law  2  -36/+0
This reverts commit 69bd93c167fefbdff0cb88614275358b7a2b2941.
2024-11-12  RISC-V: Fix target-attr-norelax.c testcase  Yangyu Chen  1  -3/+4
The target-attr-norelax.c testcase was failing due to a redundant "\t" check in the assembly output, and because it forgot to skip the check for LTO builds. gcc/testsuite/ChangeLog: * gcc.target/riscv/target-attr-norelax.c: Fix testcase.