aboutsummaryrefslogtreecommitdiff
path: root/gcc/doc
AgeCommit message (Collapse)AuthorFilesLines
2024-12-11gimple: Add limit after which slower switchlower algs are used [PR117091] ↵Filip Kastl1-0/+3
[PR117352] This patch adds a limit on the number of cases of a switch. When this limit is exceeded, switch lowering decides to use faster but less powerful algorithms. In particular this means that for finding bit tests switch lowering decides between the old dynamic programming O(n^2) algorithm and the new greedy algorithm that Andi Kleen recently added but then reverted due to PR117352. It also means that switch lowering may bail out on finding jump tables if the switch is too large (Btw it also may not bail! It can happen that the greedy algorithms finds some bit tests which then basically split the switch into multiple smaller switches and those may be small enough to fit under the limit.) The limit is implemented as --param switch-lower-slow-alg-max-cases. Exceeding the limit is reported through -Wdisabled-optimization. This patch fixes the issue with the greedy algorithm described in PR117352. The problem was incorrect usage of the is_beneficial() heuristic. gcc/ChangeLog: PR middle-end/117091 PR middle-end/117352 * doc/invoke.texi: Add switch-lower-slow-alg-max-cases. * params.opt: Add switch-lower-slow-alg-max-cases. * tree-switch-conversion.cc (jump_table_cluster::find_jump_tables): Note in a comment that we are looking for jump tables in case sequences delimited by the already found bit tests. (bit_test_cluster::find_bit_tests): Decide between find_bit_tests_fast() and find_bit_tests_slow(). (bit_test_cluster::find_bit_tests_fast): New function. (bit_test_cluster::find_bit_tests_slow): New function. (switch_decision_tree::analyze_switch_statement): Report exceeding the limit. * tree-switch-conversion.h: Add find_bit_tests_fast() and find_bit_tests_slow(). Co-Authored-By: Andi Kleen <ak@gcc.gnu.org> Signed-off-by: Filip Kastl <fkastl@suse.cz>
2024-12-11middle-end: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE [PR96342]Andre Vieira1-4/+4
This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the target can reject a simd_clone based on the vector mode it is using. This is needed because for VLS SVE vectorization the vectorizer accepts Advanced SIMD simd clones when vectorizing using SVE types because the simdlens might match. This will cause type errors later on. Other targets do not currently need to use this argument. gcc/ChangeLog: PR target/96342 * target.def (TARGET_SIMD_CLONE_USABLE): Add argument. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to call TARGET_SIMD_CLONE_USABLE. * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument and use it to reject the use of SVE simd clones with Advanced SIMD modes. * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument. * config/i386/i386.cc (ix86_simd_clone_usable): Likewise. * doc/tm.texi: Regenerate Co-authored-by: Victor Do Nascimento <victor.donascimento@arm.com> Co-authored-by: Tamar Christina <tamar.christina@arm.com>
2024-12-10Remove vcond{,u,eq} optabsRichard Sandiford1-26/+20
This patch removes the remaining traces of the vcond{,u,eq} optabs. Earlier patches removed the target-independent uses and I couldn't find any direct references to either the *_optabs or the ifns in target-specific code. gcc/ * doc/md.texi (vcond@var{m}@var{n}, vcondu@var{m}@var{n}) (vcondeq@var{m}@var{n}): Delete. (vcond_mask_@var{m}@var{n}): Redocument in standalone form. * internal-fn.def (VCOND, VCONDU, VCONDEQ): Delete. * internal-fn.cc (expand_vec_cond_optab_fn): Delete. * optabs.def (vcond_optab, vcondu_optab, vcondeq_optab): Delete.
2024-12-09c++: Allow overloaded builtins to be used in SFINAE contextMatthew Malcomson1-3/+8
This commit newly introduces the ability to use overloaded builtins in C++ SFINAE context. The goal behind this is in order to ensure there is a single mechanism that libstdc++ can use to determine whether a given type can be used in the atomic fetch_add (and similar) builtins. I am working on another patch that hopes to use this mechanism to identify whether fetch_add (and similar) work on floating point types. Current state of the world: GCC currently exposes resolved versions of these builtins to the user, so for GCC it's currently possible to use tests similar to the below to check for atomic loads on a 2 byte sized object. #if __has_builtin(__atomic_load_2) Clang does not expose resolved versions of the atomic builtins. clang currently allows SFINAE on builtins, so that C++ code can check whether a builtin is available on a given type. GCC does not (and that is what this patch aims to change). C libraries like libatomic can check whether a given atomic builtin can work on a given type by using autoconf to check for a miscompilation when attempting such a use. My goal: I would like to enable floating point fetch_add (and similar) in GCC, in order to use those overloads in libstdc++ implementation of atomic<float>::fetch_add. This should allow compilers targeting GPU's which have floating point fetch_add instructions to emit optimal code. In order to do that I need some consistent mechanism that libstdc++ can use to identify whether the fetch_add builtins have floating point overloads (and for which types these exist). I would hence like to enable SFINAE on builtins, so that libstdc++ can use that mechanism for the floating point fetch_add builtins. Implementation follows the existing mechanism for handling SFINAE contexts in c-common.cc. A boolean is passed into the c-common.cc function indicating whether these functions should emit errors or not. This boolean comes from `complain & tf_error` in the C++ frontend. (Similar to other functions like valid_array_size_p and c_build_vec_perm_expr). This is done both for resolve_overloaded_builtin and check_builtin_function_arguments, both of which can be used in SFINAE contexts. I attempted to trigger something using the `reject_gcc_builtin` function in an SFINAE context. Given the context where this function is called from the C++ frontend it looks like it may be possible, but I did not manage to trigger this in template context by attempting to do something similar to the testcases added around those calls. - I would appreciate any feedback on whether this is something that can happen in a template context, and if so some help writing a relevant testcase for it. Both of these functions have target hooks for target specific builtins that I have updated to take the extra boolean flag. I have not adjusted the functions implementing those target hooks (except to update the declarations) so target specific builtins will still error in SFINAE contexts. - I could imagine not updating the target hook definition since nothing would use that change. However I figure that allowing targets to decide this behaviour would be the right thing to do eventually, and since this is the target-independent part of the change to do that this patch should make that change. Could adjust if others disagree. Other relevant points that I'd appreciate reviewers check: - I did not pass this new flag through atomic_bitint_fetch_using_cas_loop since the _BitInt type is not available in the C++ frontend and I didn't want if conditions that can not be executed in the source. - I only test non-compile-time-constant types with SVE types, since I do not know of a way to get a VLA into a SFINAE context. - While writing tests I noticed a few differences with clang in this area. I don't think they are problematic but am mentioning them for completeness and to allow others to judge if these are a problem). - atomic_fetch_add on a boolean is allowed by clang. - When __atomic_load is passed an invalid memory model (i.e. too large), we give an SFINAE failure while clang does not. Bootstrap and regression tested on AArch64 and x86_64. Built first stage on targets whose target hook declaration needed updated (though did not regtest etc). Targets triplets I built in order to check the backend specific changes I made: - arm-none-linux-gnueabihf - avr-linux-gnu - riscv-linux-gnu - powerpc-linux-gnu - s390x-linux-gnu Ok for commit to trunk? gcc/c-family/ChangeLog: * c-common.cc (builtin_function_validate_nargs, check_builtin_function_arguments, speculation_safe_value_resolve_call, speculation_safe_value_resolve_params, sync_resolve_size, sync_resolve_params, get_atomic_generic_size, resolve_overloaded_atomic_exchange, resolve_overloaded_atomic_compare_exchange, resolve_overloaded_atomic_load, resolve_overloaded_atomic_store, resolve_overloaded_builtin): Add `complain` boolean parameter and determine whether to emit errors based on its value. * c-common.h (check_builtin_function_arguments, resolve_overloaded_builtin): Mention `complain` boolean parameter in declarations. Give it a default of `true`. gcc/ChangeLog: * config/aarch64/aarch64-c.cc (aarch64_resolve_overloaded_builtin,aarch64_check_builtin_call): Add new unused boolean parameter to match target hook definition. * config/arm/arm-builtins.cc (arm_check_builtin_call): Likewise. * config/arm/arm-c.cc (arm_resolve_overloaded_builtin): Likewise. * config/arm/arm-protos.h (arm_check_builtin_call): Likewise. * config/avr/avr-c.cc (avr_resolve_overloaded_builtin): Likewise. * config/riscv/riscv-c.cc (riscv_check_builtin_call, riscv_resolve_overloaded_builtin): Likewise. * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin): Likewise. * config/rs6000/rs6000-protos.h (altivec_resolve_overloaded_builtin): Likewise. * config/s390/s390-c.cc (s390_resolve_overloaded_builtin): Likewise. * doc/tm.texi: Regenerate. * target.def (TARGET_RESOLVE_OVERLOADED_BUILTIN, TARGET_CHECK_BUILTIN_CALL): Update prototype to include a boolean parameter that indicates whether errors should be emitted. Update documentation to mention this fact. gcc/cp/ChangeLog: * call.cc (build_cxx_call): Pass `complain` parameter to check_builtin_function_arguments. Take its value from the `tsubst_flags_t` type `complain & tf_error`. * semantics.cc (finish_call_expr): Pass `complain` parameter to resolve_overloaded_builtin. Take its value from the `tsubst_flags_t` type `complain & tf_error`. gcc/testsuite/ChangeLog: * g++.dg/template/builtin-atomic-overloads.def: New test. * g++.dg/template/builtin-atomic-overloads1.C: New test. * g++.dg/template/builtin-atomic-overloads2.C: New test. * g++.dg/template/builtin-atomic-overloads3.C: New test. * g++.dg/template/builtin-atomic-overloads4.C: New test. * g++.dg/template/builtin-atomic-overloads5.C: New test. * g++.dg/template/builtin-atomic-overloads6.C: New test. * g++.dg/template/builtin-atomic-overloads7.C: New test. * g++.dg/template/builtin-atomic-overloads8.C: New test. * g++.dg/template/builtin-sfinae-check-function-arguments.C: New test. * g++.dg/template/builtin-speculation-overloads.def: New test. * g++.dg/template/builtin-speculation-overloads1.C: New test. * g++.dg/template/builtin-speculation-overloads2.C: New test. * g++.dg/template/builtin-speculation-overloads3.C: New test. * g++.dg/template/builtin-speculation-overloads4.C: New test. * g++.dg/template/builtin-speculation-overloads5.C: New test. * g++.dg/template/builtin-validate-nargs.C: New test. Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>
2024-12-09docs: Clarify -fsanitize=hwaddress target support [PR117960]Jakub Jelinek1-2/+4
Since GCC 13 -fsanitize=hwaddress is not supported just on AArch64, but also on x86_64 (but only with -mlam=u48 or -mlam=u57). 2024-12-09 Jakub Jelinek <jakub@redhat.com> PR sanitizer/117960 * doc/invoke.texi (fsanitize=hwaddress): Clarify on which targets it is supported.
2024-12-09nvptx: Switch default from '-march=sm_30' to '-march=sm_52'Thomas Schwinge1-2/+2
In preparation of GCC/nvptx code changes that require sm_52 features, this commit raises nvptx code generation from sm_30 "Kepler" to sm_52 "Maxwell". The latter has been supported as of CUDA 6.5 (2014-08), and is thus supported by most Nvidia GPUs of the last decade, approximately. (This commit doesn't change the use of PTX ISA 6.0, which already requires CUDA 9.0 anyway.) To continue building sm_30 multilib variants (for use via building/linking with '-march=sm_30'), specify '--with-multilib-list=default,sm_30', for example. Or, to continue defaulting to sm_30 multilib variants, specify '--with-arch=sm_30' (plus '--without-multilib-list', if applicable). See the documentation, <https://gcc.gnu.org/install/specific.html#nvptx-x-none>. (Note that after a long deprecation time, eventually the sm_3x "Kepler architecture support is removed from CUDA 12.0", 2022-12.) gcc/ * config.gcc [nvptx-*]: Switch default from '-march=sm_30' to '-march=sm_52'. * doc/install.texi (Nvidia PTX Options): Update.
2024-12-08Support for 64-bit location_t: Activate 64-bit location_tLewis Hyatt1-16/+1
Change location_t to be a 64-bit integer instead of a 32-bit integer in libcpp. Also included in this change are the two other patches in the original series which depended on this one; I am committing them all at once in case it needs to be reverted later: -Support for 64-bit location_t: gimple parts The size of struct gimple increased by 8 bytes with the change in size of location_t from 32- to 64-bit; adjust the WORD markings in the comments accordingly. It seems that most of the WORD markings were off by one already, probably not having been updated after a previous reduction in the size of a gimple, so they have become retroactively correct again, and only a couple needed adjustment actually. Also add a comment that there is now 32 bits of unused padding available in struct gimple for 64-bit hosts. -Support for 64-bit location_t: Remove -flarge-source-files The option -flarge-source-files became unnecessary with 64-bit location_t and harms performance compared to the new default setting, so silently ignore it. libcpp/ChangeLog: * include/cpplib.h (struct cpp_token): Adjust comment about the struct size. * include/line-map.h (location_t): Change typedef from 32-bit to 64-bit integer. (LINE_MAP_MAX_COLUMN_NUMBER): Increase size to be appropriate for 64-bit location_t. (LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES): Likewise. (LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise. (LINE_MAP_MAX_LOCATION): Likewise. (MAX_LOCATION_T): Likewise. (line_map_suggested_range_bits): Likewise. (struct line_map): Adjust comment about the struct size. (struct line_map_macro): Likewise. (struct line_map_ordinary): Likewise. Rearrange fields to optimize padding. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/pr77949.C: Adapt the test for 64-bit location_t, when the previously expected failure doesn't actually happen. * g++.dg/modules/loc-prune-4.C: Adjust the expected output for the 64-bit location_t case. * gcc.dg/plugin/expensive_selftests_plugin.cc: Don't try to test the maximum supported column number in 64-bit location_t mode. * gcc.dg/plugin/location_overflow_plugin.cc: Adjust the base_location so it can effectively test 64-bit location_t. gcc/ChangeLog: * gimple.h (struct gphi): Update word marking comments to reflect the new size of location_t. (struct gimple): Likewise. Add a comment about padding. * common.opt: Mark -flarge-source-files as Ignored. * common.opt.urls: Regenerate. * doc/invoke.texi: Remove -flarge-source-files. * toplev.cc (process_options): Remove support for -flarge-source-files.
2024-12-06diagnostics: UX: add doc URLs for attributes (v2)David Malcolm3-2/+9
This is v2 of the patch; v1 was here: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655541.html Changed in v2: * added a new TARGET_DOCUMENTATION_NAME hook for figuring out which documentation URL to use when there are multiple per-target docs, such as for __attribute__((interrupt)); implemented this for all targets that have target-specific attributes * moved attribute_urlifier and its support code to a new gcc-attribute-urlifier.cc since it needs to use targetm for the above; gcc-urlifier.o is used by the driver. * fixed extend.texi so that some attributes that failed to appear in attr-urls.def now do so (affected nvptx "kernel" and "shared" attrs) * regenerated attr-urls.def for the above fix, and bringing in attributes added since v1 of the patch In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add documentation URLs to quoted strings in diagnostics. In r14-6920-g9e49746da303b8 I added a mechanism to generate URLs for mentions of command-line options in quoted strings in diagnostics. This patch does a similar thing for attributes. It adds a new Python 3 script to scrape the generated HTML looking for documentation of attributes, and uses this to (re)generate a new gcc/attr-urls.def file. Running "make regenerate-attr-urls" after rebuilding the HTML docs will regenerate gcc/attr-urls.def in the source directory. The patch uses this to optionally add doc URLs for attributes in any diagnostic emitted during the lifetime of a auto_urlify_attributes instance, and adds such instances everywhere that a diagnostic refers to a diagnostic within quotes (based on grepping the source tree for references to attributes in strings and in code). For example, given: $ ./xgcc -B. -S ../../src/gcc/testsuite/gcc.dg/attr-access-2.c ../../src/gcc/testsuite/gcc.dg/attr-access-2.c:14:16: warning: attribute ‘access(read_write, 2, 3)’ positional argument 2 conflicts with previous designation by argument 1 [-Wattributes] with this patch the quoted text `access(read_write, 2, 3)' automatically gains the URL for our docs for "access": https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute in a sufficiently modern terminal. Like r14-6920-g9e49746da303b8 this avoids the Makefile target depending on the generated HTML, since a missing URL is a minor problem, whereas requiring all users to build HTML docs seems more involved. Doing so also avoids Python 3 as a build requirement for everyone, but instead just for developers addding attributes. Like the options, we could add a CI test for this. The patch gathers both general and target-specific attributes. For example, the function attribute "interrupt" has 19 URLs within our docs: one common, and 18 target-specific ones. The patch adds a new target hook used when selecting the most appropriate one. Signed-off-by: David Malcolm <dmalcolm@redhat.com> gcc/ChangeLog: * Makefile.in (OBJS): Add -attribute-urlifier.o. (ATTR_URLS_HTML_DEPS): New. (regenerate-attr-urls): New. (regenerate-attr-urls-unit-test): New. * attr-urls.def: New file. * attribs.cc: Include "gcc-urlifier.h". (decl_attributes): Use auto_urlify_attributes. * config/aarch64/aarch64.cc (TARGET_DOCUMENTATION_NAME): New. * config/arc/arc.cc (TARGET_DOCUMENTATION_NAME): New. * config/arm/arm.cc (TARGET_DOCUMENTATION_NAME): New. * config/bfin/bfin.cc (TARGET_DOCUMENTATION_NAME): New. * config/bpf/bpf.cc (TARGET_DOCUMENTATION_NAME): New. * config/epiphany/epiphany.cc (TARGET_DOCUMENTATION_NAME): New. * config/gcn/gcn.cc (TARGET_DOCUMENTATION_NAME): New. * config/h8300/h8300.cc (TARGET_DOCUMENTATION_NAME): New. * config/i386/i386.cc (TARGET_DOCUMENTATION_NAME): New. * config/ia64/ia64.cc (TARGET_DOCUMENTATION_NAME): New. * config/m32c/m32c.cc (TARGET_DOCUMENTATION_NAME): New. * config/m32r/m32r.cc (TARGET_DOCUMENTATION_NAME): New. * config/m68k/m68k.cc (TARGET_DOCUMENTATION_NAME): New. * config/mcore/mcore.cc (TARGET_DOCUMENTATION_NAME): New. * config/microblaze/microblaze.cc (TARGET_DOCUMENTATION_NAME): New. * config/mips/mips.cc (TARGET_DOCUMENTATION_NAME): New. * config/msp430/msp430.cc (TARGET_DOCUMENTATION_NAME): New. * config/nds32/nds32.cc (TARGET_DOCUMENTATION_NAME): New. * config/nvptx/nvptx.cc (TARGET_DOCUMENTATION_NAME): New. * config/riscv/riscv.cc (TARGET_DOCUMENTATION_NAME): New. * config/rl78/rl78.cc (TARGET_DOCUMENTATION_NAME): New. * config/rs6000/rs6000.cc (TARGET_DOCUMENTATION_NAME): New. * config/rx/rx.cc (TARGET_DOCUMENTATION_NAME): New. * config/s390/s390.cc (TARGET_DOCUMENTATION_NAME): New. * config/sh/sh.cc (TARGET_DOCUMENTATION_NAME): New. * config/stormy16/stormy16.cc (TARGET_DOCUMENTATION_NAME): New. * config/v850/v850.cc (TARGET_DOCUMENTATION_NAME): New. * config/visium/visium.cc (TARGET_DOCUMENTATION_NAME): New. gcc/analyzer/ChangeLog: * region-model.cc: Include "gcc-urlifier.h". (reason_attr_access::emit): Use auto_urlify_attributes. * sm-taint.cc: Include "gcc-urlifier.h". (tainted_access_attrib_size::emit): Use auto_urlify_attributes. gcc/c-family/ChangeLog: * c-attribs.cc: Include "gcc-urlifier.h". (positional_argument): Use auto_urlify_attributes. * c-common.cc: Include "gcc-urlifier.h". (parse_optimize_options): Use auto_urlify_attributes with OPT_Wattributes. (attribute_fallthrough_p): Use auto_urlify_attributes. * c-warn.cc: Include "gcc-urlifier.h". (diagnose_mismatched_attributes): Use auto_urlify_attributes. gcc/c/ChangeLog: * c-decl.cc: Include "gcc-urlifier.h". (start_decl): Use auto_urlify_attributes with OPT_Wattributes. (start_function): Likewise. * c-parser.cc: Include "gcc-urlifier.h". (c_parser_statement_after_labels): Use auto_urlify_attributes with OPT_Wattributes. * c-typeck.cc: Include "gcc-urlifier.h". (maybe_warn_nodiscard): Use auto_urlify_attributes with OPT_Wunused_result. gcc/cp/ChangeLog: * cp-gimplify.cc: Include "gcc-urlifier.h". (process_stmt_hotness_attribute): Use auto_urlify_attributes with OPT_Wattributes. * cvt.cc: Include "gcc-urlifier.h". (maybe_warn_nodiscard): Use auto_urlify_attributes with OPT_Wunused_result. * decl.cc: Include "gcc-urlifier.h". (start_decl): Use auto_urlify_attributes. (start_preparsed_function): Likewise. gcc/ChangeLog: * diagnostic.cc (diagnostic_context::override_urlifier): New. * diagnostic.h (diagnostic_context::override_urlifier): New decl. * doc/extend.texi (Nvidia PTX Function Attributes): Update @cindex to specify that "kernel" is a function attribute and "shared" is a variable attribute, so that these entries are recognized by the regex in regenerate-attr-urls.py. * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_DOCUMENTATION_NAME): New. * gcc-attribute-urlifier.cc: New file. * gcc-urlifier.cc: Include diagnostic.h. (gcc_urlifier::make_doc): Convert to... (make_doc_url): ...this. (auto_override_urlifier::auto_override_urlifier): New. (auto_override_urlifier::~auto_override_urlifier): New. (selftest::gcc_urlifier_cc_tests): Split out body into... (selftest::test_gcc_urlifier): ...this. * gcc-urlifier.h: Include "pretty-print-urlifier.h" and "label-text.h". (make_doc_url): New decl. (class auto_override_urlifier): New. (class attribute_urlifier): New. (class auto_urlify_attributes): New. * gimple-ssa-warn-access.cc: Include "gcc-urlifier.h". (pass_waccess::execute): Use auto_urlify_attributes. * gimplify.cc: Include "gcc-urlifier.h". (expand_FALLTHROUGH): Use auto_urlify_attributes. * internal-fn.cc: Define INCLUDE_MEMORY and include "gcc-urlifier.h. (expand_FALLTHROUGH): Use auto_urlify_attributes. * ipa-pure-const.cc: Include "gcc-urlifier.h. (suggest_attribute): Use auto_urlify_attributes. * ipa-strub.cc: Include "gcc-urlifier.h. (can_strub_p): Use auto_urlify_attributes. * regenerate-attr-urls.py: New file. * selftest-run-tests.cc (selftest::run_tests): Call gcc_attribute_urlifier_cc_tests. * selftest.h (selftest::gcc_attribute_urlifier_cc_tests): New decl. * target.def (documentation_name): New DEFHOOKPOD. * tree-cfg.cc: Include "gcc-urlifier.h. (do_warn_unused_result): Use auto_urlify_attributes. * tree-ssa-uninit.cc: Include "gcc-urlifier.h. (maybe_warn_read_write_only): Use auto_urlify_attributes. (maybe_warn_pass_by_reference): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-12-06nvptx: Support '-march=sm_89'Thomas Schwinge1-1/+1
gcc/ * config/nvptx/nvptx-sm.def: Add '89'. * config/nvptx/nvptx-gen.h: Regenerate. * config/nvptx/nvptx-gen.opt: Likewise. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust. * config/nvptx/nvptx.opt (-march-map=sm_89, -march-map=sm_90) (march-map=sm_90a): Likewise. * config.gcc: Likewise. * doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_89'. * config/nvptx/gen-multilib-matches-tests: Extend. gcc/testsuite/ * gcc.target/nvptx/march-map=sm_89.c: Adjust. * gcc.target/nvptx/march-map=sm_90.c: Likewise. * gcc.target/nvptx/march-map=sm_90a.c: Likewise. * gcc.target/nvptx/march=sm_89.c: New. libgomp/ * testsuite/libgomp.c/declare-variant-3-sm89.c: New. * testsuite/libgomp.c/declare-variant-3.h: Adjust.
2024-12-06nvptx: Support '-mptx=7.8'Thomas Schwinge1-1/+1
gcc/ * config/nvptx/nvptx-opts.h (enum ptx_version): Add 'PTX_VERSION_7_8'. * config/nvptx/nvptx.cc (ptx_version_to_string) (ptx_version_to_number): Adjust. * config/nvptx/nvptx.h (TARGET_PTX_7_8): New. * config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue' '7.8' for 'PTX_VERSION_7_8'. * doc/invoke.texi (Nvidia PTX Options): Document '-mptx=7.8'. gcc/testsuite/ * gcc.target/nvptx/mptx=7.8.c: New.
2024-12-06nvptx: Support '-march=sm_52'Thomas Schwinge1-1/+1
gcc/ * config/nvptx/nvptx-sm.def: Add '52'. * config/nvptx/nvptx-gen.h: Regenerate. * config/nvptx/nvptx-gen.opt: Likewise. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust. * config/nvptx/nvptx.opt (-march-map=sm_52): Likewise. * config.gcc: Likewise. * doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_52'. * config/nvptx/gen-multilib-matches-tests: Extend. gcc/testsuite/ * gcc.target/nvptx/march-map=sm_52.c: Adjust. * gcc.target/nvptx/march=sm_52.c: New. libgomp/ * testsuite/libgomp.c/declare-variant-3-sm52.c: New. * testsuite/libgomp.c/declare-variant-3.h: Adjust.
2024-12-06nvptx: Support '-march=sm_37'Thomas Schwinge1-1/+1
gcc/ * config/nvptx/nvptx-sm.def: Add '37'. * config/nvptx/nvptx-gen.h: Regenerate. * config/nvptx/nvptx-gen.opt: Likewise. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust. * config/nvptx/nvptx.opt (-march-map=sm_37, -march-map=sm_50): Likewise. * config.gcc: Likewise. * doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_37'. * config/nvptx/gen-multilib-matches-tests: Extend. gcc/testsuite/ * gcc.target/nvptx/march-map=sm_37.c: Adjust. * gcc.target/nvptx/march-map=sm_50.c: Likewise. * gcc.target/nvptx/march-map=sm_52.c: Likewise. * gcc.target/nvptx/march=sm_37.c: New. libgomp/ * testsuite/libgomp.c/declare-variant-3-sm37.c: New. * testsuite/libgomp.c/declare-variant-3.h: Adjust.
2024-12-06nvptx: Support '-mptx=4.1'Thomas Schwinge1-1/+1
gcc/ * config/nvptx/nvptx-opts.h (enum ptx_version): Add 'PTX_VERSION_4_1'. * config/nvptx/nvptx.cc (ptx_version_to_string) (ptx_version_to_number): Adjust. * config/nvptx/nvptx.h (TARGET_PTX_4_1): New. * config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue' '4.1' for 'PTX_VERSION_4_1'. * doc/invoke.texi (Nvidia PTX Options): Document '-mptx=4.1'. gcc/testsuite/ * gcc.target/nvptx/mptx=4.1.c: New.
2024-12-06nvptx: Expose '-mptx=4.2'Thomas Schwinge1-3/+7
'PTX_VERSION_4_2' was added in commit decde11183bdccc46587d6614b75f3d56a2f2e4a "[nvptx] Choose -mptx default based on -misa" for use for '-march=sm_52' ('first_ptx_version_supporting_sm', 'PTX_ISA_SM53'), as documented by Nvidia. However, '-mptx=4.2' wasn't exposed to the user, but there's no reason not to. gcc/ * config/nvptx/nvptx.h (TARGET_PTX_4_2): New. * config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue' '4.2' for 'PTX_VERSION_4_2'. * doc/invoke.texi (Nvidia PTX Options): Document '-mptx=4.2'. gcc/testsuite/ * gcc.target/nvptx/mptx=4.2.c: New.
2024-12-06nvptx: Support '--with-multilib-list'Thomas Schwinge2-10/+42
No change in behavior unless specifying it. gcc/ * config.gcc: nvptx: Support '--with-multilib-list'. * config/nvptx/gen-multilib-matches.sh: Adjust. * configure.ac: Likewise. * configure: Regenerate. * doc/install.texi: Update. * doc/invoke.texi: Align. * config/nvptx/gen-multilib-matches-tests: Extend.
2024-12-06RISC-V: Add --with-cmodel configure optionHau Hsu2-2/+8
Sometimes we want to use default cmodel other than medlow. Add a GCC configure option for that. gcc/ChangeLog: * config.gcc (riscv*-*-*): Add support for --with-cmodel configure option. (all_defaults): Add cmodel. * config/riscv/riscv.h (TARGET_DEFAULT_CMODEL): Remove. * doc/install.texi: Document --with-cmodel configure option. * doc/invoke.texi (-mcmodel): Mention --with-cmodel configure option. Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
2024-12-05arm: Add CDE options for star-mc1 cpuArvin Zhong1-2/+4
This patch adds the CDE options support for the -mcpu=star-mc1. The star-mc1 is an Armv8-m Mainline CPU supporting CDE feature. gcc/ChangeLog: * config/arm/arm-cpus.in (star-mc1): Add CDE options. * doc/invoke.texi (cdecp options): Document for star-mc1. Signed-off-by: Qingxin Zhong <arvin.zhong@armchina.com>
2024-12-05AVR: target/107957 - Propagate zero_reg to store sources.Georg-Johann Lay1-2/+7
When -msplit-ldst is on, it may be possible to propagate __zero_reg__ to the sources of the new stores. For example, without this patch, unsigned long lx; void store_lsr17 (void) { lx >>= 17; } compiles to: store_lsr17: lds r26,lx+2 ; movqi_insn lds r27,lx+3 ; movqi_insn movw r24,r26 ; *movhi lsr r25 ; *lshrhi3_const ror r24 ldi r26,0 ; movqi_insn ldi r27,0 ; movqi_insn sts lx,r24 ; movqi_insn sts lx+1,r25 ; movqi_insn sts lx+2,r26 ; movqi_insn sts lx+3,r27 ; movqi_insn ret but with this patch it becomes: store_lsr17: lds r26,lx+2 ; movqi_insn lds r27,lx+3 ; movqi_insn movw r24,r26 ; *movhi lsr r25 ; *lshrhi3_const ror r24 sts lx,r24 ; movqi_insn sts lx+1,r25 ; movqi_insn sts lx+2,__zero_reg__ ; movqi_insn sts lx+3,__zero_reg__ ; movqi_insn ret gcc/ PR target/107957 * config/avr/avr-passes-fuse-move.h (bbinfo_t) <try_mem0_p>: Add static property. * config/avr/avr-passes.cc (bbinfo_t::try_mem0_p): Define it. (optimize_data_t::try_mem0): New method. (bbinfo_t::optimize_one_block) [bbinfo_t::try_mem0_p]: Run try_mem0. (bbinfo_t::optimize_one_function): Set bbinfo_t::try_mem0_p. * config/avr/avr.md (pushhi1_insn): Also allow zero as source. (define_split) [avropt_split_ldst]: Only run avr_split_ldst() when avr-fuse-move has been run at least once. * doc/invoke.texi (AVR Options) <-msplit-ldst>: Document it.
2024-12-05doc: Add store-forwarding-max-distance to invoke.texiFilip Kastl1-0/+5
gcc/ChangeLog: * doc/invoke.texi: Add store-forwarding-max-distance. Signed-off-by: Filip Kastl <fkastl@suse.cz>
2024-12-05Allow limited extended asm at toplevel [PR41045]Jakub Jelinek1-18/+14
In the Cauldron IPA/LTO BoF we've discussed toplevel asms and it was discussed it would be nice to tell the compiler something about what the toplevel asm does. Sure, I'm aware the kernel people said they aren't willing to use something like that, but perhaps other projects do. And for kernel perhaps we should add some new option which allows some dumb parsing of the toplevel asms and gather something from that parsing. The following patch is just a small step towards that, namely, allow some subset of extended inline asm outside of functions. The patch is unfinished, LTO streaming (out/in) of the ASM_EXPRs isn't implemented (it emits a sorry diagnostics), nor any cgraph/varpool changes to find out references etc. The patch allows something like: int a[2], b; enum { E1, E2, E3, E4, E5 }; struct S { int a; char b; long long c; }; asm (".section blah; .quad %P0, %P1, %P2, %P3, %P4; .previous" : : "m" (a), "m" (b), "i" (42), "i" (E4), "i" (sizeof (struct S))); Even for non-LTO, that could be useful e.g. for getting enumerators from C/C++ as integers into the toplevel asm, or sizeof/offsetof etc. The restrictions I've implemented are: 1) asm qualifiers aren't still allowed, so asm goto or asm inline can't be specified at toplevel, asm volatile has the volatile ignored for C++ with a warning and is an error in C like before 2) I see good use for mainly input operands, output maybe to make it clear that the inline asm may write some memory, I don't see a good use for clobbers, so the patch doesn't allow those (and of course labels because asm goto can't be specified) 3) the patch allows only constraints which don't allow registers, so typically "m" or "i" or other memory or immediate constraints; for memory, it requires that the operand is addressable and its address could be used in static var initializer (so that no code actually needs to be emitted for it), for others that they are constants usable in the static var initializers 4) the patch disallows + (there is no reload of the operands, so I don't see benefits of tying some operands together), nor % (who cares if something is commutative in this case), or & (again, no code is emitted around the asm), nor the 0-9 constraints Right now there is no way to tell the compiler that the inline asm defines some symbol, that is implemented in a later patch, as : constraint. Similarly, the c modifier doesn't work in all cases and the cc modifier is implemented separately. 2024-12-05 Jakub Jelinek <jakub@redhat.com> PR c/41045 gcc/ * output.h (insn_noperands): Declare. * final.cc (insn_noperands): No longer static. * varasm.cc (assemble_asm): Handle ASM_EXPR. * lto-streamer-out.cc (lto_output_toplevel_asms): Add sorry_at for non-STRING_CST toplevel asm for now. * doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document that extended asm is now allowed outside of functions with certain restrictions. gcc/c/ * c-parser.cc (c_parser_asm_string_literal): Add forward declaration. (c_parser_asm_definition): Parse also extended asm without clobbers/labels. * c-typeck.cc (build_asm_expr): Allow extended asm outside of functions and check extra restrictions. gcc/cp/ * cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument. * parser.cc (cp_parser_asm_definition): Parse also extended asm without clobbers/labels outside of functions. * semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set, check extra restrictions for extended asm outside of functions. * pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller. gcc/testsuite/ * c-c++-common/toplevel-asm-1.c: New test. * c-c++-common/toplevel-asm-2.c: New test. * c-c++-common/toplevel-asm-3.c: New test.
2024-12-04libgdiagnostics: documentation tweaksDavid Malcolm7-8/+9
gcc/ChangeLog: * doc/libgdiagnostics/topics/execution-paths.rst: Add '§' before references to section of SARIF spec. * doc/libgdiagnostics/topics/fix-it-hints.rst: Likewise. * doc/libgdiagnostics/tutorial/01-hello-world.rst: Fix typo. * doc/libgdiagnostics/tutorial/02-physical-locations.rst: Likewise. * doc/libgdiagnostics/tutorial/04-notes.rst: Likewise. * doc/libgdiagnostics/tutorial/06-fix-it-hints.rst: Add link to diagnostic_add_fix_it_hint_replace. * doc/libgdiagnostics/tutorial/07-execution-paths.rst: Add '§'. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-12-04sched1: parameterize pressure scheduling spilling aggressiveness [PR/114729]Vineet Gupta1-0/+10
sched1 computes ECC (Excess Change Cost) for each insn, which represents the register pressure attributed to the insn. Currently the pressure sensitive scheduling algorithm deliberately ignores negative ECC values (pressure reduction), making them 0 (neutral), leading to more spills. This happens due to the assumption that the compiler has a reasonably accurate processor pipeline scheduling model and thus tries to aggresively fill pipeline bubbles with spill slots. This however might not be true, as the model might not be available for certains uarches or even applicable especially for modern out-of-order cores. The existing heuristic induces spill frenzy on RISC-V, noticably so on SPEC2017 507.Cactu. If insn scheduling is disabled completely, the total dynamic icounts for this workload are reduced in half from ~2.5 trillion insns to ~1.3 (w/ -fno-schedule-insns). This patch adds --param=cycle-accurate-model={0,1} to gate the spill behavior. - The default (1) preserves existing spill behavior. - targets/uarches sensitive to spilling can override the param to (0) to get the reverse effect. RISC-V backend does so too. The actual perf numbers are very promising. (1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs: Before: ------ Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par': 4,917,712.97 msec task-clock:u # 1.000 CPUs utilized 5,314 context-switches:u # 1.081 /sec 3 cpu-migrations:u # 0.001 /sec 204,784 page-faults:u # 41.642 /sec 7,868,291,222,513 cycles:u # 1.600 GHz 2,615,069,866,153 instructions:u # 0.33 insn per cycle 10,799,381,890 branches:u # 2.196 M/sec 15,714,572 branch-misses:u # 0.15% of all branches After: ----- Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par': 4,552,979.58 msec task-clock:u # 0.998 CPUs utilized 205,020 context-switches:u # 45.030 /sec 2 cpu-migrations:u # 0.000 /sec 204,221 page-faults:u # 44.854 /sec 7,285,176,204,764 cycles:u (7.4% faster) # 1.600 GHz 2,145,284,345,397 instructions:u (17.96% fewer) # 0.29 insn per cycle 10,799,382,011 branches:u # 2.372 M/sec 16,235,628 branch-misses:u # 0.15% of all branches (2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs. gcc/ChangeLog: PR target/11472 * params.opt (--param=cycle-accurate-model=): New opt. * doc/invoke.texi (cycle-accurate-model): Document. * haifa-sched.cc (model_excess_group_cost): Return negative delta if param_cycle_accurate_model is 0. (model_excess_cost): Ceil negative baseECC to 0 only if param_cycle_accurate_model is 1. Dump the actual ECC value. * config/riscv/riscv.cc (riscv_option_override): Set param to 0. gcc/testsuite/ChangeLog: PR target/114729 * gcc.target/riscv/riscv.exp: Enable new tests to build. * gcc.target/riscv/sched1-spills/spill1.cpp: Add new test. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2024-12-03libgdiagnostics: fix docs metadataDavid Malcolm1-2/+1
gcc/ChangeLog: * doc/libgdiagnostics/conf.py: Remove "author". Change "copyright" field to the FSF. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-12-03aarch64: Add support for AdvSIMD lutSaurabh Jha1-0/+2
The AArch64 FEAT_LUT extension is optional from Armv9.2-A and mandatory from Armv9.5-A. It introduces instructions for lookup table reads with bit indices. This patch adds support for AdvSIMD lut intrinsics. The intrinsics for this extension are implemented as the following builtin functions: * vluti2{q}_lane{q}_{u8|s8|p8} * vluti2{q}_lane{q}_{u16|s16|p16|f16|bf16} * vluti4q_lane{q}_{u8|s8|p8} * vluti4q_lane{q}_{u16|s16|p16|f16|bf16}_x2 We also introduced a new approach to do lane checks for AdvSIMD. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (aarch64_builtin_signatures): Add binary_lane. (aarch64_fntype): Handle it. (simd_types): Add 16-bit x2 types. (aarch64_pragma_builtins_checker): New class. (aarch64_general_check_builtin_call): Use it. (aarch64_expand_pragma_builtin): Add support for lut unspecs. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Add lut option. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY_LANE): Modify to use new ENTRY macro. (ENTRY_TERNARY_VLUT8): Macro to declare lut intrinsics. (ENTRY_TERNARY_VLUT16): Macro to declare lut intrinsics. (REQUIRED_EXTENSIONS): Declare lut intrinsics. * config/aarch64/aarch64-simd.md (@aarch64_<vluti_uns_op><VLUT:mode><VB:mode>): Instruction pattern for luti2 and luti4 intrinsics. (@aarch64_lutx2<VLUT:mode><VB:mode>): Instruction pattern for luti4x2 intrinsics. * config/aarch64/aarch64.h (TARGET_LUT): lut flag. * config/aarch64/iterators.md: Iterators and attributes for lut. * doc/invoke.texi: Document extension in AArch64 Options. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/lut-incorrect-range.c: New test. * gcc.target/aarch64/simd/lut-no-flag.c: New test. * gcc.target/aarch64/simd/lut.c: New test. Co-authored-by: Vladimir Miloserdov <vladimir.miloserdov@arm.com> Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2024-12-02libgdiagnostics: fix a missing rename in the docsDavid Malcolm1-1/+1
gcc/ChangeLog: * doc/libgdiagnostics/tutorial/01-hello-world.rst: Update linker command for renaming. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-12-01[PATCH v7 08/12] Add a new pass for naive CRC loops detectionMariam Arutunian1-1/+15
This patch adds a new compiler pass aimed at identifying naive CRC implementations, characterized by the presence of a loop calculating a CRC (polynomial long division). Upon detection of a potential CRC, the pass prints an informational message. Performs CRC optimization if optimization level is >= 2 and if fno_gimple_crc_optimization given. This pass is added for the detection and optimization of naive CRC implementations, improving the efficiency of CRC-related computations. This patch includes only initial fast checks for filtering out non-CRCs, detected possible CRCs verification and optimization parts will be provided in subsequent patches. gcc/ * Makefile.in (OBJS): Add gimple-crc-optimization.o. * common.opt (foptimize-crc): New option. * common.opt.urls: Regenerate to add foptimize-crc. * doc/invoke.texi (-foptimize-crc): Add documentation. * gimple-crc-optimization.cc: New file. * opts.cc (default_options_table): Add OPT_foptimize_crc. (enable_fdo_optimizations): Enable optimize_crc. * passes.def (pass_crc_optimization): Add new pass. * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar. * tree-pass.h (make_pass_crc_optimization): New extern function declaration.
2024-11-30Support for 64-bit location_t: Backend partsLewis Hyatt1-1/+1
A few targets have been using "unsigned int" function arguments that need to receive a "location_t". Change to "location_t" to prepare for the possibility that location_t can be configured to be a different type. gcc/ChangeLog: * config/aarch64/aarch64-c.cc (aarch64_resolve_overloaded_builtin): Change "unsigned int" argument to "location_t". * config/avr/avr-c.cc (avr_resolve_overloaded_builtin): Likewise. * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): Likewise. * target.def: Likewise. * doc/tm.texi: Regenerate.
2024-11-30c++: Implement C++26 P3176R1 - The Oxford variadic commaJakub Jelinek1-2/+19
While we are already in stage3, I wonder if implementing this small paper wouldn't be useful even for GCC 15, so that we have in the GCC world one extra year of deprecation of variadic ellipsis without preceding comma. The paper just deprecates something, I'd hope most of the C++ code in the wild when it uses variadic functions at all uses the comma before the ellipsis. 2024-11-30 Jakub Jelinek <jakub@redhat.com> gcc/c-family/ * c.opt (Wdeprecated-variadic-comma-omission): New option. * c.opt.urls: Regenerate. * c-opts.cc (c_common_post_options): Default to -Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic. gcc/cp/ * parser.cc: Implement C++26 P3176R1 - The Oxford variadic comma. (cp_parser_parameter_declaration_clause): Emit -Wdeprecated-variadic-comma-omission warnings. gcc/ * doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document. gcc/testsuite/ * g++.dg/cpp26/variadic-comma1.C: New test. * g++.dg/cpp26/variadic-comma2.C: New test. * g++.dg/cpp26/variadic-comma3.C: New test. * g++.dg/cpp26/variadic-comma4.C: New test. * g++.dg/cpp26/variadic-comma5.C: New test. * g++.dg/cpp1z/fold10.C: Expect a warning for C++26. * g++.dg/ext/attrib33.C: Likewise. * g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise. * g++.dg/cpp2a/lambda-generic10.C: Likewise. * g++.dg/cpp0x/lambda/lambda-const3.C: Likewise. * g++.dg/cpp0x/variadic164.C: Likewise. * g++.dg/cpp0x/variadic17.C: Likewise. * g++.dg/cpp0x/udlit-args-neg.C: Likewise. * g++.dg/cpp0x/variadic28.C: Likewise. * g++.dg/cpp0x/gen-attrs-33.C: Likewise. * g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise. * g++.old-deja/g++.law/operators15.C: Likewise. * g++.old-deja/g++.mike/p811.C: Likewise. * g++.old-deja/g++.mike/p12306.C (printf): Add , before ... . * g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise. * g++.dg/cpp0x/vt-65790.C (printf): Likewise. libstdc++-v3/ * include/std/functional (_Bind_check_arity): Add , before ... . * include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl): Likewise. * include/tr1/type_traits (is_function): Likewise.
2024-11-29Rename "libdiagnostics" to "libgdiagnostics"David Malcolm27-53/+53
"libdiagnostics" clashes with an existing soname in Debian, as per: https://gcc.gnu.org/pipermail/gcc/2024-November/245175.html Rename it to "libgdiagnostics" for uniqueness. I am being deliberately vague about what the "g" stands for: it could be "gnu", "gcc", or "gpl-licensed" as the reader desires. ChangeLog: * configure.ac: Rename "libdiagnostics" to "libgdiagnostics". * configure: Regenerate. gcc/ChangeLog: * Makefile.in: Rename "libdiagnostics" to "libgdiagnostics". * configure.ac: Likewise. * configure: Regenerate. * doc/install.texi: Rename "libdiagnostics" to "libgdiagnostics". * doc/libdiagnostics/*: Rename to doc/libgdiagnostics, renaming "libdiagnostics" to "libgdiagnostics" throughout. * libdiagnostics++.h: Rename to... * libgdiagnostics++.h: ...this, renaming "libdiagnostics" to "libgdiagnostics" throughout. * libdiagnostics.cc: Rename to... * libgdiagnostics.cc: ...this, renaming "libdiagnostics" to "libgdiagnostics" throughout. * libdiagnostics.h: Rename to... * libgdiagnostics.h: ...this, renaming "libdiagnostics" to "libgdiagnostics" throughout. * libdiagnostics.map: Rename to... * libgdiagnostics.map: ...this, renaming "libdiagnostics" to "libgdiagnostics" throughout. * libsarifreplay.cc: Update for renaming of "libdiagnostics" to "libgdiagnostics". * libsarifreplay.h: Likewise. * sarif-replay.cc: Likewise. gcc/testsuite/ChangeLog: * libdiagnostics.dg/*: Rename to libgdiagnostics.dg, renaming "libdiagnostics" to "libgdiagnostics" throughout. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-29AVR: target/117726 - Better optimize shifts.Georg-Johann Lay1-5/+5
This patch splits 2-byte and 3-byte shifts after reload into a 3-operand byte shift and a residual 2-operand shift. The "2op" shift insn alternatives are not needed and removed because all shift insn already have a "r,0,n" alternative that does the job. PR target/117726 gcc/ * config/avr/avr-passes.cc (avr_shift_is_3op, avr_emit_shift): Also handle 2-byte and 3-byte shifts. (avr_split_shift4, avr_split_shift3, avr_split_shift2): New local helper functions. (avr_split_shift): Use them. * config/avr/avr-passes.def (avr_pass_split_after_peephole2): Adjust comments. * config/avr/avr.cc (avr_out_ashlpsi3, avr_out_ashrpsi3) (avr_out_lshrpsi3): Support offset 15. (ashrhi3_out): Support offset 7 as 3-op. (ashrsi3_out): Support offset 15. (avr_rtx_costs_1): Adjust shift costs. * config/avr/avr.md (2op): Remove attribute value and all such insn alternatives. (ashlhi3, *ashlhi3, *ashlhi3_const): Add 3-op alternatives like C2l. (ashrhi3, *ashrhi3, *ashrhi3_const): Add 3-op alternatives like C2a. (lshrhi3, *lshrhi3, *lshrhi3_const): Add 3-op alternatives like C2r. (*ashlpsi3_split, *ashlpsi3): Add 3-op alternatives C15 and C3l. (*ashrpsi3_split, *ashrpsi3): Add 3-op alternatives C15 and C3r. (*lshrpsi3_split, *lshrpsi3): Add 3-op alternatives C15 and C3r. (ashlsi3, *ashlsi3, *ashlsi3_const): Remove "2op" alternative. (ashrsi3, *ashrsi3, *ashrsi3_const): Same. (lshrsi3, *lshrsi3, *lshrsi3_const): Same. (constr_split_suffix): Code attr morphed from constr_split_shift4. * config/avr/constraints.md (C2a, C2r, C2l) (C3a, C3r, C3l): New constraints. * doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Adjust doc.
2024-11-29[PATCH v7 03/12] RISC-V: Add CRC expander to generate faster CRC.Mariam Arutunian1-0/+18
If the target is ZBC or ZBKC, it uses clmul instruction for the CRC calculation. Otherwise, if the target is ZBKB, generates table-based CRC, but for reversing inputs and the output uses bswap and brev8 instructions. Add new tests to check CRC generation for ZBC, ZBKC and ZBKB targets. gcc/ * expr.cc (gf2n_poly_long_div_quotient): New function. * expr.h (gf2n_poly_long_div_quotient): New function declaration. * hwint.cc (reflect_hwi): New function. * hwint.h (reflect_hwi): New function declaration. * config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): New expander for reversed CRC. (crc<SUBX1:mode><SUBX:mode>4): New expander for bit-forward CRC. * config/riscv/iterators.md (SUBX1, ANYI1): New iterators. * config/riscv/riscv-protos.h (generate_reflecting_code_using_brev): New function declaration. (expand_crc_using_clmul): Likewise. (expand_reversed_crc_using_clmul): Likewise. * config/riscv/riscv.cc (generate_reflecting_code_using_brev): New function. (expand_crc_using_clmul): Likewise. (expand_reversed_crc_using_clmul): Likewise. * config/riscv/riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV): New unspecs. * doc/sourcebuild.texi: Document new target selectors. gcc/testsuite * lib/target-supports.exp (check_effective_target_riscv_zbc): New target supports predicate. (check_effective_target_riscv_zbkb): Likewise. (check_effective_target_riscv_zbkc): Likewise. (check_effective_target_zbc_ok): Likewise. (check_effective_target_zbkb_ok): Likewise. (check_effective_target_zbkc_ok): Likewise. (riscv_get_arch): Add zbkb and zbkc support. * gcc.target/riscv/crc-builtin-zbc32.c: New file. * gcc.target/riscv/crc-builtin-zbc64.c: Likewise. Co-author: Jeff Law <jlaw@ventanamicro.com>
2024-11-29AArch64: Suppress default options when march or mcpu used is not affected by it.Tamar Christina1-2/+7
This patch makes it so that when you use any of the Cortex-A53 errata workarounds but have specified an -march or -mcpu we know is not affected by it that we suppress the errata workaround. This is a driver only patch as the linker invocation needs to be changed as well. The linker and cc SPECs are different because for the linker we didn't seem to add an inversion flag for the option. That said, it's also not possible to configure the linker with it on by default. So not passing the flag is sufficient to turn it off. For the compilers however we have an inversion flag using -mno-, which is needed to disable the workarounds when the compiler has been configured with it by default. In case it's unclear how the patch does what it does (it took me a while to figure out the syntax): * Early matching will replace any -march=native or -mcpu=native with their expanded forms and erases the native arguments from the buffer. * Due to the above if we ensure we handle the new code after this erasure then we only have to handle the expanded form. * The expanded form needs to handle -march=<arch>+extensions and -mcpu=<cpu>+extensions and so we can't use normal string matching but instead use strstr with a custom driver function that's common between native and non-native builds. * For the compilers we output -mno-<workaround> and for the linker we just erase the --fix-<workaround> option. * The extra internal matching, e.g. the duplicate match of mcpu inside: mcpu=*:%{%:is_local_not_armv8_base(%{mcpu=*:%*}) is so we can extract the glob using %* because the outer match would otherwise reset at the %{. The reason for the outer glob at all is to skip the block early if no matches are found. The workaround has the effect of suppressing certain inlining and multiply-add formation which leads to about ~1% SPECCPU 2017 Intrate regression on modern cores. This patch is needed because most distros configure GCC with the workaround enabled by default. Expected output: > gcc -mcpu=neoverse-v1 -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l 0 > gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l 5 > gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l 5 > gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l 0 > gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l 0 > gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l 1 > -gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l 1 gcc/ChangeLog: * config/aarch64/aarch64-errata.h (TARGET_SUPPRESS_OPT_SPEC, TARGET_TURN_OFF_OPT_SPEC, CA53_ERR_835769_COMPILE_SPEC, CA53_ERR_843419_COMPILE_SPEC): New. (CA53_ERR_835769_SPEC, CA53_ERR_843419_SPEC): Use them. * config/aarch64/aarch64-elf-raw.h (CC1_SPEC, CC1PLUS_SPEC): Add AARCH64_ERRATA_COMPILE_SPEC. * config/aarch64/aarch64-freebsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise. * config/aarch64/aarch64-gnu.h (CC1_SPEC, CC1PLUS_SPEC): Likewise. * config/aarch64/aarch64-linux.h (CC1_SPEC, CC1PLUS_SPEC): Likewise. * config/aarch64/aarch64-netbsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise. * common/config/aarch64/aarch64-common.cc (is_host_cpu_not_armv8_base): New. * config/aarch64/driver-aarch64.cc: Remove extra newline * config/aarch64/aarch64.h (is_host_cpu_not_armv8_base): New. (MCPU_TO_MARCH_SPEC_FUNCTIONS): Add is_local_not_armv8_base. (EXTRA_SPEC_FUNCTIONS): Add is_local_cpu_armv8_base. * doc/invoke.texi: Document it. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/info_30: New test. * gcc.target/aarch64/cpunative/info_31: New test. * gcc.target/aarch64/cpunative/info_32: New test. * gcc.target/aarch64/cpunative/info_33: New test. * gcc.target/aarch64/cpunative/native_cpu_30.c: New test. * gcc.target/aarch64/cpunative/native_cpu_31.c: New test. * gcc.target/aarch64/cpunative/native_cpu_32.c: New test. * gcc.target/aarch64/cpunative/native_cpu_33.c: New test. * gcc.target/aarch64/erratas_opt_0.c: New test. * gcc.target/aarch64/erratas_opt_1.c: New test. * gcc.target/aarch64/erratas_opt_10.c: New test. * gcc.target/aarch64/erratas_opt_11.c: New test. * gcc.target/aarch64/erratas_opt_12.c: New test. * gcc.target/aarch64/erratas_opt_13.c: New test. * gcc.target/aarch64/erratas_opt_14.c: New test. * gcc.target/aarch64/erratas_opt_15.c: New test. * gcc.target/aarch64/erratas_opt_2.c: New test. * gcc.target/aarch64/erratas_opt_3.c: New test. * gcc.target/aarch64/erratas_opt_4.c: New test. * gcc.target/aarch64/erratas_opt_5.c: New test. * gcc.target/aarch64/erratas_opt_6.c: New test. * gcc.target/aarch64/erratas_opt_7.c: New test. * gcc.target/aarch64/erratas_opt_8.c: New test. * gcc.target/aarch64/erratas_opt_9.c: New test.
2024-11-29aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsicsClaudio Bantaloukas1-0/+12
This patch adds support for the following intrinsics: - svdot[_f32_mf8]_fpm - svdot_lane[_f32_mf8]_fpm - svdot[_f16_mf8]_fpm - svdot_lane[_f16_mf8]_fpm The first two are available under a combination of the FP8DOT4 and SVE2 features. Alternatively under the SSVE_FP8DOT4 feature under streaming mode. The final two are available under a combination of the FP8DOT2 and SVE2 features. Alternatively under the SSVE_FP8DOT2 feature under streaming mode. gcc/ * config/aarch64/aarch64-option-extensions.def (fp8dot4, ssve-fp8dot4): Add new extensions. (fp8dot2, ssve-fp8dot2): Likewise. * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl): Support fp8. (svdotprod_lane_impl): Likewise. (svdot_lane): Provide an unspec for fp8 types. * config/aarch64/aarch64-sve-builtins-shapes.cc (ternary_mfloat8_def): Add new class. (ternary_mfloat8): Add new shape. (ternary_mfloat8_lane_group_selection_def): Add new class. (ternary_mfloat8_lane_group_selection): Add new shape. * config/aarch64/aarch64-sve-builtins-shapes.h (ternary_mfloat8, ternary_mfloat8_lane_group_selection): Declare. * config/aarch64/aarch64-sve-builtins-sve2.def (svdot, svdot_lane): Add new DEF_SVE_FUNCTION_GS_FPM, twice to deal with the combination of features providing support for 32 and 16 bit floating point. * config/aarch64/aarch64-sve2.md (@aarch64_sve_dot<mode>): Add new. (@aarch64_sve_dot_lane<mode>): Likewise. * config/aarch64/aarch64.h: (TARGET_FP8DOT4, TARGET_SSVE_FP8DOT4): Add new defines. (TARGET_FP8DOT2, TARGET_SSVE_FP8DOT2): Likewise. * config/aarch64/iterators.md (UNSPEC_DOT_FP8, UNSPEC_DOT_LANE_FP8): Add new unspecs. * doc/invoke.texi: Document fp8dot4, fp8dot2, ssve-fp8dot4, ssve-fp8dot2 extensions. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: Add new. gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Likewise. * lib/target-supports.exp: Add dg-require-effective-target support for aarch64_asm_fp8dot2_ok, aarch64_asm_fp8dot4_ok, aarch64_asm_ssve-fp8dot2_ok and aarch64_asm_ssve-fp8dot4_ok.
2024-11-29aarch64: add SVE2 FP8 multiply accumulate intrinsicsClaudio Bantaloukas1-0/+5
This patch adds support for the following intrinsics: - svmlalb[_f16_mf8]_fpm - svmlalb[_n_f16_mf8]_fpm - svmlalt[_f16_mf8]_fpm - svmlalt[_n_f16_mf8]_fpm - svmlalb_lane[_f16_mf8]_fpm - svmlalt_lane[_f16_mf8]_fpm - svmlallbb[_f32_mf8]_fpm - svmlallbb[_n_f32_mf8]_fpm - svmlallbt[_f32_mf8]_fpm - svmlallbt[_n_f32_mf8]_fpm - svmlalltb[_f32_mf8]_fpm - svmlalltb[_n_f32_mf8]_fpm - svmlalltt[_f32_mf8]_fpm - svmlalltt[_n_f32_mf8]_fpm - svmlallbb_lane[_f32_mf8]_fpm - svmlallbt_lane[_f32_mf8]_fpm - svmlalltb_lane[_f32_mf8]_fpm - svmlalltt_lane[_f32_mf8]_fpm These are available under a combination of the FP8FMA and SVE2 features. Alternatively under the SSVE_FP8FMA feature under streaming mode. gcc/ * config/aarch64/aarch64-option-extensions.def (fp8fma, ssve-fp8fma): Add new options. * config/aarch64/aarch64-sve-builtins-functions.h (unspec_based_function_base): Add unspec_for_mfp8. (unspec_for): Return unspec_for_mfp8 on fpm-using cases. (sme_1mode_function): Fix call to parent ctor. (sme_2mode_function_t): Likewise. (unspec_based_mla_function, unspec_based_mla_lane_function): Handle fpm-using cases. * config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type): Treat M as TYPE_SUFFIX_mf8 (ternary_mfloat8_lane_def): Add new class. (ternary_mfloat8_opt_n_def): Likewise. (ternary_mfloat8_lane): Add new shape. (ternary_mfloat8_opt_n): Likewise. * config/aarch64/aarch64-sve-builtins-shapes.h (ternary_mfloat8_lane, ternary_mfloat8_opt_n): Declare. * config/aarch64/aarch64-sve-builtins-sve2.cc (svmlalb_lane, svmlalb, svmlalt_lane, svmlalt): Update definitions with mfloat8_t unspec in ctor. (svmlallbb_lane, svmlallbb, svmlallbt_lane, svmlallbt, svmlalltb_lane, svmlalltb, svmlalltt_lane, svmlalltt, svmlal_impl): Add new FUNCTIONs. (svqrshr, svqrshrn, svqrshru, svqrshrun): Update definitions with nop mfloat8 unspec in ctor. * config/aarch64/aarch64-sve-builtins-sve2.def (svmlalb, svmlalt, svmlalb_lane, svmlalt_lane, svmlallbb, svmlallbt, svmlalltb, svmlalltt, svmlalltt_lane, svmlallbb_lane, svmlallbt_lane, svmlalltb_lane): Add new DEF_SVE_FUNCTION_GS_FPMs. * config/aarch64/aarch64-sve-builtins-sve2.h (svmlallbb_lane, svmlallbb, svmlallbt_lane, svmlallbt, svmlalltb_lane, svmlalltb, svmlalltt_lane, svmlalltt): Declare. * config/aarch64/aarch64-sve-builtins.cc (TYPES_h_float_mf8, TYPES_s_float_mf8): Add new types. (h_float_mf8, s_float_mf8): Add new SVE_TYPES_ARRAY. * config/aarch64/aarch64-sve2.md (@aarch64_sve_add_<sve2_fp8_fma_op_vnx8hf><mode>): Add new. (@aarch64_sve_add_<sve2_fp8_fma_op_vnx4sf><mode>): Add new. (@aarch64_sve_add_lane_<sve2_fp8_fma_op_vnx8hf><mode>): Likewise. (@aarch64_sve_add_lane_<sve2_fp8_fma_op_vnx4sf><mode>): Likewise. * config/aarch64/aarch64.h (TARGET_FP8FMA, TARGET_SSVE_FP8FMA): Likewise. * config/aarch64/iterators.md (VNx8HF_ONLY): Add new. (UNSPEC_FMLALB_FP8, UNSPEC_FMLALLBB_FP8, UNSPEC_FMLALLBT_FP8, UNSPEC_FMLALLTB_FP8, UNSPEC_FMLALLTT_FP8, UNSPEC_FMLALT_FP8): Likewise. (SVE2_FP8_TERNARY_VNX8HF, SVE2_FP8_TERNARY_VNX4SF): Likewise. (SVE2_FP8_TERNARY_LANE_VNX8HF, SVE2_FP8_TERNARY_LANE_VNX4SF): Likewise. (sve2_fp8_fma_op_vnx8hf, sve2_fp8_fma_op_vnx4sf): Likewise. * doc/invoke.texi: Document fp8fma and sve-fp8fma extensions. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_Z_REV, TEST_DUAL_LANE_REG, TEST_DUAL_ZD) Add fpm0 argument. * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_opt_n_1.c: Add new shape test. * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Add new test. * gcc.target/aarch64/sve2/acle/asm/mlalb_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlallbb_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlallbt_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalltb_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalltt_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/mlalt_mf8.c: Likewise. * lib/target-supports.exp: Add check_effective_target for fp8fma and ssve-fp8fma
2024-11-29__builtin_prefetch fixes [PR117608]Jakub Jelinek1-3/+5
The r15-4833-ge9ab41b79933 patch had among tons of config/i386 specific changes also important change to the generic code, allowing also 2 as valid value of the second argument of __builtin_prefetch: - /* Argument 1 must be either zero or one. */ - if (INTVAL (op1) != 0 && INTVAL (op1) != 1) + /* Argument 1 must be 0, 1 or 2. */ + if (INTVAL (op1) < 0 || INTVAL (op1) > 2) But the patch failed to document that change in __builtin_prefetch documentation, and more importantly didn't adjust any of the other backends to deal with it (my understanding is the expected behavior is that 2 will be silently handled as 0 unless backends have some more specific way). Some of the backends would ICE on it, in some cases gcc_assert failures/gcc_unreachable, in other cases crash later (e.g. accessing arrays with that value as index and due to accessing garbage after the array crashing at final.cc time), others treated 2 silently as 0, others treated 2 silently as 1. And even in the i386 backend there were bugs which caused ICEs. The patch added some if (write == 0) and write 2 handling into a (badly indented, maybe that is the reason, if (write == 1) body), rather than into the else side, so it would be always false. The new *prefetch_rst2 define_insn only accepts parameters 2 1 (i.e. read-shared with moderate degree of locality), so in order not to ICE the patch uses it only for __builtin_prefetch (ptr, 2, 1); or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values of the parameter. If that isn't what we want and we want it to be used also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and corresponding __builtin_ia32_prefetch, maybe the define_insn could match other values. And there was another problem that -mno-mmx -mno-sse -mmovrs compilation would ICE on most of the prefetches, so I had to add the FAIL; cases. 2024-11-29 Jakub Jelinek <jakub@redhat.com> PR target/117608 * doc/extend.texi (__builtin_prefetch): Document that second argument may be also 2 and its meaning. * config/i386/i386.md (prefetch): Remove unreachable code. Clear write set operands[1] to const0_rtx if !TARGET_MOVRS or of locality is not 1. Formatting fixes. * config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE. Call gen_prefetch even for TARGET_MOVRS. * config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0. * config/mips/mips.md (prefetch): Likewise. * config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise. * config/riscv/riscv.md (prefetch): Likewise. * config/loongarch/loongarch.md (prefetch): Likewise. * config/sparc/sparc.md (prefetch): Likewise. Use IN_RANGE. * config/ia64/ia64.md (prefetch): Likewise. * config/pa/pa.md (prefetch): Likewise. * config/aarch64/aarch64.md (prefetch): Likewise. * config/rs6000/rs6000.md (prefetch): Likewise. * gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument 2. * gcc.target/i386/pr117608-1.c: New test. * gcc.target/i386/pr117608-2.c: New test.
2024-11-28[PATCH v6 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs.Mariam Arutunian1-0/+109
This patch introduces new built-in functions to GCC for computing bit-forward and bit-reversed CRCs. These builtins aim to provide efficient CRC calculation capabilities. When the target architecture supports CRC operations (as indicated by the presence of a CRC optab), the builtins will utilize the expander to generate CRC code. In the absence of hardware support, the builtins default to generating code for a table-based CRC calculation. The built-ins are defined as follows: __builtin_rev_crc16_data8, __builtin_rev_crc32_data8, __builtin_rev_crc32_data16, __builtin_rev_crc32_data32 __builtin_rev_crc64_data8, __builtin_rev_crc64_data16, __builtin_rev_crc64_data32, __builtin_rev_crc64_data64, __builtin_crc8_data8, __builtin_crc16_data16, __builtin_crc16_data8, __builtin_crc32_data8, __builtin_crc32_data16, __builtin_crc32_data32, __builtin_crc64_data8, __builtin_crc64_data16, __builtin_crc64_data32, __builtin_crc64_data64 Each built-in takes three parameters: crc: The initial CRC value. data: The data to be processed. polynomial: The CRC polynomial without the leading 1. To validate the correctness of these built-ins, this patch also includes additions to the GCC testsuite. This enhancement allows GCC to offer developers high-performance CRC computation options that automatically adapt to the capabilities of the target hardware. gcc/ * builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define. (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise. (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise. (BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise. (BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise. (BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise. (BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise. (BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise. (BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise. (BT_FN_UINT64_UINT64_UINT64_CONST_SIZE): Likewise. * builtins.cc (associated_internal_fn): Handle CRC related builtins. (expand_builtin_crc_table_based): New function. (expand_builtin): Handle CRC related builtins. * builtins.def (BUILT_IN_CRC8_DATA8): New builtin. (BUILT_IN_CRC16_DATA8): Likewise. (BUILT_IN_CRC16_DATA16): Likewise. (BUILT_IN_CRC32_DATA8): Likewise. (BUILT_IN_CRC32_DATA16): Likewise. (BUILT_IN_CRC32_DATA32): Likewise. (BUILT_IN_CRC64_DATA8): Likewise. (BUILT_IN_CRC64_DATA16): Likewise. (BUILT_IN_CRC64_DATA32): Likewise. (BUILT_IN_CRC64_DATA64): Likewise. (BUILT_IN_REV_CRC8_DATA8): New builtin. (BUILT_IN_REV_CRC16_DATA8): Likewise. (BUILT_IN_REV_CRC16_DATA16): Likewise. (BUILT_IN_REV_CRC32_DATA8): Likewise. (BUILT_IN_REV_CRC32_DATA16): Likewise. (BUILT_IN_REV_CRC32_DATA32): Likewise. (BUILT_IN_REV_CRC64_DATA8): Likewise. (BUILT_IN_REV_CRC64_DATA16): Likewise. (BUILT_IN_REV_CRC64_DATA32): Likewise. (BUILT_IN_REV_CRC64_DATA64): Likewise. * builtins.h (expand_builtin_crc_table_based): New function declaration. * doc/extend.texi: Add documentation for new CRC builtins. gcc/testsuite/ * gcc.dg/crc-builtin-rev-target32.c: New test. * gcc.dg/crc-builtin-rev-target64.c: New test. * gcc.dg/crc-builtin-target32.c: New test. * gcc.dg/crc-builtin-target64.c: New test. Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com> Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com> Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
2024-11-28[PATCH v6 01/12] Implement internal functions for efficient CRC computation.Mariam Arutunian1-0/+14
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster CRC generation. One performs bit-forward and the other bit-reversed CRC computation. If CRC optabs are supported, they are used for the CRC computation. Otherwise, table-based CRC is generated. The supported data and CRC sizes are 8, 16, 32, and 64 bits. The polynomial is without the leading 1. A table with 256 elements is used to store precomputed CRCs. For the reflection of inputs and the output, a simple algorithm involving SHIFT, AND, and OR operations is used. gcc/ * doc/md.texi (crc@var{m}@var{n}4, crc_rev@var{m}@var{n}4): Document. * expr.cc (calculate_crc): New function. (assemble_crc_table): Likewise. (generate_crc_table): Likewise. (calculate_table_based_CRC): Likewise. (expand_crc_table_based): Likewise. (gen_common_operation_to_reflect): Likewise. (reflect_64_bit_value): Likewise. (reflect_32_bit_value): Likewise. (reflect_16_bit_value): Likewise. (reflect_8_bit_value): Likewise. (generate_reflecting_code_standard): Likewise. (expand_reversed_crc_table_based): Likewise. * expr.h (generate_reflecting_code_standard): New function declaration. (expand_crc_table_based): Likewise. (expand_reversed_crc_table_based): Likewise. * internal-fn.cc: (crc_direct): Define. (direct_crc_optab_supported_p): Likewise. (expand_crc_optab_fn): New function * internal-fn.def (CRC, CRC_REV): New internal functions. * optabs.def (crc_optab, crc_rev_optab): New optabs. Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com> Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com> Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
2024-11-28docs: Fix up __sync_* documentation [PR117642]Jakub Jelinek1-8/+3
The PR14311 commit which added support for __sync_* builtins documented that there is a warning if a particular operation cannot be implemented. But that commit nor anything later on implemented such warning, it was always silent generation of the mentioned calls (which can in most cases result in linker errors of course because those functions aren't implemented anywhere, in libatomic or elsewhere in code shipped in gcc). So, the following patch just adjust the documentation to match the implementation. 2024-11-28 Jakub Jelinek <jakub@redhat.com> PR target/117642 * doc/extend.texi: Remove documentation of warning for unimplemented __sync_* operations, such warning has never been implemented.
2024-11-28Add support for nonnull_if_nonzero attribute [PR117023]Jakub Jelinek1-3/+32
As mentioned in an earlier thread, C2Y voted in a change which made various library APIs callable with NULL arguments in certain cases, e.g. memcpy (NULL, NULL, 0); is now valid, although memcpy (NULL, NULL, 1); remains invalid. This affects various APIs, including several of GCC builtins; plus on the C library side those APIs are often declared with nonnull attribute(s) as well. Florian suggested using the access attribute for this, but our docs explicitly say that access attribute doesn't imply nonnull and it doesn't cover e.g. the qsort case where the comparison function pointer may be also NULL if nmemb is 0, but must be non-zero otherwise. As this case affects 21 APIs in C standard and I think is going to affect various wrappers around those in various packages as well, I think it is a common thing that should have its own attribute, because we should still warn when people use qsort (NULL, 1, 1, NULL); etc., and similarly want to have -fsanitize=null instrumentation for those. So, the following patch introduces nonnull_if_nonzero attribute (or would you prefer cond_nonnull or some other name?), which has always 2 arguments, argument index of a pointer argument (like one argument nonnull) and argument index of an associated integral argument. If that argument is non-zero, it is UB to pass NULL to the pointer argument, if that argument is zero, it is valid. And changes various spots which already handled the nonnull attribute to handle this one as well, with sometimes using the ranger (or for -fsanitize=nonnull explicitly checking the associated argument value, so instead of if (!ptr) __ubsan_... (...); it will now do if (!ptr && sz) __ubsan_... (...);). I've so far omitted changing gimple_infer_range (am not 100% sure how I can use the ranger inside of the ranger) and changing the analyzer to handle it. And I haven't changed builtins.def etc. to make use of that attribute instead of nonnull where appropriate. I'd then follow with the builtins.def changes (and eventually glibc etc. would need to be adjusted too). 2024-11-28 Jakub Jelinek <jakub@redhat.com> PR c/117023 gcc/ * gimple.h (infer_nonnull_range_by_attribute): Add a tree * argument defaulted to NULL. * gimple.cc (infer_nonnull_range_by_attribute): Add op2 argument. Handle also nonnull_if_nonzero attributes. * tree.cc (get_nonnull_args): Fix comment typo. * builtins.cc (validate_arglist): Handle nonnull_if_nonzero attribute. * tree-ssa-ccp.cc (pass_post_ipa_warn::execute): Handle nonnull_if_nonzero attributes. * ubsan.cc (instrument_nonnull_arg): Adjust infer_nonnull_range_by_attribute caller. If it returned true and filed in non-NULL arg2, check that arg2 is non-zero as another condition next to checking that arg is zero. * doc/extend.texi (nonnull_if_nonzero): Document new attribute. gcc/c-family/ * c-attribs.cc (handle_nonnull_if_nonzero_attribute): New function. (c_common_gnu_attributes): Add nonnull_if_nonzero attribute. (handle_nonnull_attribute): Fix comment typo. * c-common.cc (struct nonnull_arg_ctx): Add other member. (check_function_nonnull): Also check nonnull_if_nonzero attributes. (check_nonnull_arg): Use different warning wording if pctx->other is non-zero. (check_function_arguments): Initialize ctx.other. gcc/testsuite/ * gcc.dg/nonnull-8.c: New test. * gcc.dg/nonnull-9.c: New test. * gcc.dg/nonnull-10.c: New test. * c-c++-common/ubsan/nonnull-6.c: New test. * c-c++-common/ubsan/nonnull-7.c: New test.
2024-11-28inline-asm, i386: Add "redzone" clobber supportJakub Jelinek3-1/+23
The following patch adds a "redzone" clobber (recognized everywhere, even on on targets which don't do anything with it), with which one can mark the rare case where inline asm pushes something on the stack or uses call instruction without taking red zone into account (i.e. addq $-128, %rsp; and addq $128, %rsp around that). 2024-11-28 Jakub Jelinek <jakub@redhat.com> gcc/ * target.def (redzone_clobber): New target hook. * varasm.cc (decode_reg_name_and_count): Return -5 for "redzone". * cfgexpand.cc (expand_asm_stmt): Handle redzone clobber. * config/i386/i386.h (struct machine_function): Add asm_redzone_clobber_seen member. * config/i386/i386.cc (ix86_compute_frame_layout): Don't use red zone if cfun->machine->asm_redzone_clobber_seen. (ix86_redzone_clobber): New function. (TARGET_REDZONE_CLOBBER): Redefine. * doc/extend.texi (Clobbers and Scratch Registers): Document the "redzone" clobber. * doc/tm.texi.in: Add @hook TARGET_REDZONE_CLOBBER. * doc/tm.texi: Regenerate. gcc/testsuite/ * gcc.dg/asm-redzone-1.c: New test. * gcc.target/i386/asm-redzone-1.c: New test.
2024-11-28expr, c: Don't clear whole unions [PR116416]Jakub Jelinek1-1/+39
As discussed earlier, we currently clear padding bits even when we don't have to and that causes pessimization of emitted code, e.g. for union U { int a; long b[64]; }; void bar (union U *); void foo (void) { union U u = { 0 }; bar (&u); } we need to clear just u.a, not the whole union, but on the other side in cases where the standard requires padding bits to be zeroed, like for C23 {} initializers of aggregates with padding bits, or for C++11 zero initialization we don't do that. This patch a) moves some of the stuff into complete_ctor_at_level_p (but not all the *p_complete = 0; case, for that it would need to change so that it passes around the ctor rather than just its type) and changes the handling of unions b) introduces a new option, so that users can either get the new behavior (only what is guaranteed by the standards, the default), or previous behavior (union padding zero initialization, no such guarantees in structures) or also a guarantee in structures c) introduces a new CONSTRUCTOR flag which says that the padding bits (if any) should be zero initialized (and sets it for now in the C FE for C23 {} initializers). Am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed for C23, if there is just empty initializer, I think we already mark it as incomplete if there are any missing initializers. Maybe with some designated initializer games, say void foo () { struct S { char a; long long b; }; struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 }; ... } Is this supposed to initialize padding bits in C23 and then the .c.a = 1 and .c.b = 2 stores preserve those padding bits, so is that supposed to be different from struct T t2 = { .c = { 1, 2 } }; ? What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ? And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost on where exactly is zero-initialization done (vs. other types of initialization) and where is e.g. zero-initialization of a temporary then (member-wise) copied. Say struct S { char a; long long b; }; struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; }; void bar (T *); void foo () { T t (1, 2); bar (&t); } Is the c () value-initialization of t.c followed by c.a and c.b updates which preserve the zero initialized padding bits? Or is there some copy construction involved which does member-wise copying and makes the padding bits undefined? Looking at (older) clang++ with -O2, it initializes also the padding bits when c () is used and doesn't with c {}. For GCC, note that there is that optimization from Alex to zero padding bits for optimization purposes for small aggregates, so either one needs to look at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't optimized that way. 2024-11-28 Jakub Jelinek <jakub@redhat.com> PR c++/116416 gcc/ * flag-types.h (enum zero_init_padding_bits_kind): New type. * tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define. * common.opt (fzero-init-padding-bits=): New option. * expr.cc (categorize_ctor_elements_1): Handle CONSTRUCTOR_ZERO_PADDING_BITS or flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL. Fix up *p_complete = -1; setting for unions. (complete_ctor_at_level_p): Handle unions differently for flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD. * gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE handling, return also true for UNION_TYPE with no FIELD_DECLs and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE. * doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document. gcc/c/ * c-parser.cc (c_parser_braced_init): Set CONSTRUCTOR_ZERO_PADDING_BITS for flag_isoc23 empty initializers. * c-typeck.cc (constructor_zero_padding_bits): New variable. (struct constructor_stack): Add zero_padding_bits member. (really_start_incremental_init): Save and clear constructor_zero_padding_bits. (push_init_level): Save constructor_zero_padding_bits. Or into it CONSTRUCTOR_ZERO_PADDING_BITS from previous value if implicit. (pop_init_level): Set CONSTRUCTOR_ZERO_PADDING_BITS if constructor_zero_padding_bits and restore constructor_zero_padding_bits. gcc/testsuite/ * gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect diagnostics. * gcc.dg/c23-empty-init-5.c: New test. * gcc.dg/gnu11-empty-init-1.c: New test. * gcc.dg/gnu11-empty-init-2.c: New test. * gcc.dg/gnu11-empty-init-3.c: New test. * gcc.dg/gnu11-empty-init-4.c: New test.
2024-11-27c: Introduce -Wfree-labelsFlorian Weimer1-2/+13
This is another recent GCC extension whose use is apparently difficult to spot in code reviews. The name of the option is due to Jonathan Wakely. Part of it could apply to C++ as well (for labels at the end of a compound statement). gcc/c-family/ * c-opts.cc (c_common_post_options): Initialize warn_free_labels. * c.opt (Wfree-labels): New option. * c.opt.urls: Regenerate. gcc/c/ * c-parser.cc (c_parser_compound_statement_nostart): Use OPT_Wfree_labels for warning about labels on declarations. (c_parser_compound_statement_nostart): Use OPT_Wfree_labels for warning about labels at end of compound statements. gcc/ * doc/invoke.texi: Document -Wfree-labels. gcc/testsuite/ * gcc.dg/Wfree-labels-1.c: New test. * gcc.dg/Wfree-labels-2.c: New test. * gcc.dg/Wfree-labels-3.c: New test.
2024-11-25nios2: Remove all support for Nios II target.Sandra Loosemore4-629/+5
nios2 target support in GCC was deprecated in GCC 14 as the architecture has been EOL'ed by the vendor. This patch removes the entire port for GCC 15 There are still references to "nios2" in libffi and libgo. Since those libraries are imported into the gcc sources from master copies maintained by other projects, those will need to be addressed elsewhere. ChangeLog: * MAINTAINERS: Remove references to nios2. * configure.ac: Likewise. * configure: Regenerated. config/ChangeLog: * mt-nios2-elf: Deleted. contrib/ChangeLog: * config-list.mk: Remove references to Nios II. gcc/ChangeLog: * common/config/nios2/*: Delete entire directory. * config/nios2/*: Delete entire directory. * config.gcc: Remove references to nios2. * configure.ac: Likewise. * doc/extend.texi: Likewise. * doc/install.texi: Likewise. * doc/invoke.texi: Likewise. * doc/md.texi: Likewise. * regenerate-opt-urls.py: Likewise. * config.in: Regenerated. * configure: Regenerated. gcc/testsuite/ChangeLog: * g++.target/nios2/*: Delete entire directory. * gcc.target/nios2/*: Delete entire directory. * g++.dg/cpp0x/constexpr-rom.C: Remove refences to nios2. * g++.old-deja/g++.jason/thunk3.C: Likewise. * gcc.c-torture/execute/20101011-1.c: Likewise. * gcc.c-torture/execute/pr47237.c: Likewise. * gcc.dg/20020312-2.c: Likewise. * gcc.dg/20021029-1.c: Likewise. * gcc.dg/debug/btf/btf-datasec-1.c: Likewise. * gcc.dg/ifcvt-4.c: Likewise. * gcc.dg/stack-usage-1.c: Likewise. * gcc.dg/struct-by-value-1.c: Likewise. * gcc.dg/tree-ssa/reassoc-33.c: Likewise. * gcc.dg/tree-ssa/reassoc-34.c: Likewise. * gcc.dg/tree-ssa/reassoc-35.c: Likewise. * gcc.dg/tree-ssa/reassoc-36.c: Likewise. * lib/target-supports.exp: Likewise. libgcc/ChangeLog: * config/nios2/*: Delete entire directory. * config.host: Remove refences to nios2. * unwind-dw2-fde-dip.c: Likewise.
2024-11-25asan: Support dynamic shadow offsetKito Cheng2-1/+7
AddressSanitizer has supported dynamic shadow offsets since 2016[1], but GCC hasn't implemented this yet because targets using dynamic shadow offsets, such as Fuchsia and iOS, are mostly unsupported in GCC. However, RISC-V 64 switched to dynamic shadow offsets this year[2] because virtual memory space support varies across different RISC-V cores, such as Sv39, Sv48, and Sv57. We realized that the best way to handle this situation is by using a dynamic shadow offset to obtain the offset at runtime. We introduce a new target hook, TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P, to determine if the target is using a dynamic shadow offset, so this change won't affect the static offset path. Additionally, TARGET_ASAN_SHADOW_OFFSET continues to work even if TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P is non-zero, ensuring that KASAN functions as expected. This patch set has been verified on the Banana Pi F3, currently one of the most popular RISC-V development boards. All AddressSanitizer-related tests passed without introducing new regressions. It was also verified on AArch64 and x86_64 with no regressions in AddressSanitizer. [1] https://github.com/llvm/llvm-project/commit/130a190bf08a3d955d9db24dac936159dc049e12 [2] https://github.com/llvm/llvm-project/commit/da0c8b275564f814a53a5c19497669ae2d99538d gcc/ChangeLog: * asan.cc (asan_dynamic_shadow_offset_p): New. (asan_shadow_memory_dynamic_address): New. (asan_local_shadow_memory_dynamic_address): New. (get_asan_shadow_memory_dynamic_address_decl): New. (asan_maybe_insert_dynamic_shadow_at_function_entry): New. (asan_emit_stack_protection): Support dynamic shadow offset. (build_shadow_mem_access): Ditto. * asan.h (asan_maybe_insert_dynamic_shadow_at_function_entry): New. * doc/tm.texi (TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P): New. * doc/tm.texi.in (TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P): Ditto. * sanopt.cc (pass_sanopt::execute): Handle dynamic shadow offset. * target.def (asan_dynamic_shadow_offset_p): New. * toplev.cc (process_options): Handle dynamic shadow offset.
2024-11-25Add target-independent store forwarding avoidance passKonstantinos Eleftheriou4-0/+27
This pass detects cases of expensive store forwarding and tries to avoid them by reordering the stores and using suitable bit insertion sequences. For example it can transform this: strb w2, [x1, 1] ldr x0, [x1] # Expensive store forwarding to larger load. To: ldr x0, [x1] strb w2, [x1] bfi x0, x2, 0, 8 Assembly like this can appear with bitfields or type punning / unions. On stress-ng when running the cpu-union microbenchmark the following speedups have been observed. Neoverse-N1: +29.4% Intel Coffeelake: +13.1% AMD 5950X: +17.5% The transformation is rejected on cases that cause store_bit_field to generate subreg expressions on different register classes. Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain such cases and have been marked as XFAIL. Due to biasing of its operands in store_bit_field, there is a special handling for machines with BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN. The need for this was exosed by an issue exposed on the H8 architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN is false. In that case, the START parameter of store_bit_field needs to be calculated from the end of the destination register. gcc/ChangeLog: * Makefile.in (OBJS): Add avoid-store-forwarding.o. * common.opt (favoid-store-forwarding): New option. * common.opt.urls: Regenerate. * doc/invoke.texi: New param store-forwarding-max-distance. * doc/passes.texi: Document new pass. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Document new pass. * params.opt (store-forwarding-max-distance): New param. * passes.def: Add pass_rtl_avoid_store_forwarding before pass_early_remat. * target.def (avoid_store_forwarding_p): New DEFHOOK. * target.h (struct store_fwd_info): Declare. * targhooks.cc (default_avoid_store_forwarding_p): New function. * targhooks.h (default_avoid_store_forwarding_p): Declare. * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare. * avoid-store-forwarding.cc: New file. * avoid-store-forwarding.h: New file. * timevar.def (TV_AVOID_STORE_FORWARDING): New timevar. gcc/testsuite/ChangeLog: * gcc.target/aarch64/avoid-store-forwarding-1.c: New test. * gcc.target/aarch64/avoid-store-forwarding-2.c: New test. * gcc.target/aarch64/avoid-store-forwarding-3.c: New test. * gcc.target/aarch64/avoid-store-forwarding-4.c: New test. * gcc.target/aarch64/avoid-store-forwarding-5.c: New test. * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test. * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test. Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
2024-11-24Fix vectorization regressions on the SPARCEric Botcazou1-1/+1
This fixes the vectorization regressions present on the SPARC by switching from vcond[u] patterns to vec_cmp[u] + vcond_mask_ patterns. While I was at it, I merged the patterns for V4HI/V2SI and V8QI enabled with VIS 3/VIS 4 to follow the model of those enabled with VIS 4B, and standardized all the mnemonics to the version documented in the Oracle SPARC architecture 2015. gcc/ PR target/117715 * config/sparc/sparc-protos.h (sparc_expand_vcond): Rename to... (sparc_expand_vcond_mask): ...this. * config/sparc/sparc.cc (TARGET_VECTORIZE_GET_MASK_MODE): Define. (sparc_vis_init_builtins): Adjust the CODE_FOR_* identifiers. (sparc_get_mask_mode): New function. (sparc_expand_vcond): Rename to... (sparc_expand_vcond_mask): ...this and adjust. * config/sparc/sparc.md (unspec): Remove UNSPEC_FCMP & UNSPEC_FUCMP and rename UNSPEC_FPUCMPSHL into UNSPEC_FPCMPUSHL. (fcmp<gcond:code><GCM:gcm_name><P:mode>_vis): Merge into... (fpcmp<gcond:code>8<P:mode>_vis): Merge into... (fpcmp<fpcmpcond:code><FPCMP:vbits><P:mode>_vis): ...this. (fucmp<gcond:code>8<P:mode>_vis): Merge into... (fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis): Merge into... (fpcmpu<fpcmpucond:signed_code><FPCMP:vbits><P:mode>_vis): ...this. (vec_cmp<FPCMP:mode><P:mode>): New expander. (vec_cmpu<FPCMP:mode><P:mode>): Likewise. (vcond<GCM:mode><GCM:mode>): Delete. (vcondv8qiv8qi): Likewise. (vcondu<GCM:mode><GCM:mode>): Likewise. (vconduv8qiv8qi): Likewise. (vcond_mask_<FPCMP:mode><P:mode>): New expander. (fpcmp<fpcscond:code><FPCSMODE:vbits><P:mode>shl): Adjust. (fpcmpu<fpcsucond:code><FPCSMODE:vbits><P:mode>shl): Likewise. (fpcmpde<FPCSMODE:vbits><P:mode>shl): Likewise. (fpcmpur<FPCSMODE:vbits><P:mode>shl): Likewise. * doc/md.texi (vcond_mask_len_): Fix pasto. gcc/testsuite/ * gcc.target/sparc/20230328-1.c: Adjust to new mnemonics. * gcc.target/sparc/20230328-4.c: Likewise. * gcc.target/sparc/fcmp.c: Likewise. * gcc.target/sparc/fucmp.c: Likewise.
2024-11-24Adjust error message for initialized variable in .bssEric Botcazou1-1/+1
The current message does not make sense with -fno-zero-initialized-in-bss. gcc/ * doc/invoke.texi (-fno-zero-initialized-in-bss): Adjust for Ada. * varasm.cc (get_variable_section): Adjust the error message for an initialized variable in .bss to -fno-zero-initialized-in-bss. gcc/testsuite/ * gnat.dg/specs/bss1.ads: New test.
2024-11-22md-files: Add a note about escaped quotes in braced strings in md filesAndrew Pinski1-1/+4
While looking into PR 33532, It was noted that \" would be treated still as " for braced strings in the md file. I think that is still the correct thing to do. So let's just a note to the documentation on this behavior and NOT change read-md.cc (read_braced_string). Since this behavior has been there for the last 23 years and only one person ran into this behavior and helped with the conversion from using quoted strings to braced strings; that is you just need to remove the quote around the brace rather than change all of the code. Build the documentation to make sure it looks correct. gcc/ChangeLog: * doc/rtl.texi: Add a note about quotes in braced strings. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-22AVR: target/117726 - Tweak ashiftrt:SI and lshiftrt:SI insns.Georg-Johann Lay1-1/+10
This patch is similar to r15-5569 (tweak ashift:SI) but for ashiftrt and lshiftrt codes. It splits constant shift offsets > 16 into a 3-operand byte shift and a 2-operand residual bit shift. Moreover, some of the constraint alternatives have been promoted to 3-operand alternatives regardless of options. For example, ashift:HI and lshiftrt:HI can support 3 operands for offsets 9...12 without any overhead. Apart from that, it's a bit of code clean up for 2-byte and 4-byte shift insns: Use one RTL peephole with any_shift code iterator instead of 3 individual peepholes. It also removes some useless split insns; presumably introduced during the cc0 -> CCmode work. PR target/117726 gcc/ * config/avr/avr-passes.cc (avr_split_shift): Also handle ASHIFTRT and LSHIFTRT codes for 4-byte shifts. (constr_split_shift4): New code_attr. (avr_emit_shift): Adjust to new shift capabilities. * config/avr/predicates.md (scratch_or_d_register_operand): rename to scratch_or_dreg_operand. * config/avr/avr.md: Same. (define_peephole2): Write the RTL scratch peephole for 2-byte and 4-byte shifts that generates *sh*<mode>3_const insns using code iterator any_shift. (*ashlhi3_const_split, *ashrhi3_const_split, *ashrhi3_const_split) (*lshrsi3_const_split, *lshrhi3_const_split): Remove useless split insns. (define_split) [avropt_split_bit_shift]: Add splitters for 4-byte ASHIFTRT and LSHIFTRT insns using avr_split_shift(). (ashrsi3, *ashrsi3, *ashrsi3_const): Add "r,0,C4a" and "r,r,C4a" constraint alternatives depending on 2op, 3op. (lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,0,C4r" and "r,r,C4r" constraint alternatives depending on 2op, 3op. Add "r,r,C15". (lshrhi3, *lshrhi3, *lshrhi3_const, ashlhi3, *ashlhi3) (*ashlhi3_const): Add "r,r,C7c" alternative. (ashrpsi, *ashrpsi3): Add "r,r,C22" alternative. (ashlqi, *ashlqi): Turn C06 alternative into "r,r,C06". * config/avr/constraints.md (C14, C22, C30, C7c): New constraints. * config/avr/avr.cc (ashlhi3_out, lshrhi3_out) [case 7, 9, 10, 11, 12]: Support as 3-operand insn. (lshrsi3_out) [case 15]: Same. (ashrsi3_out) [case 30]: Same. (ashrhi3_out) [case 14]: Same. (ashrqi3_out) [case 6]: Same. (avr_out_ashrpsi3) [case 22]: Same. * config/avr/avr.h: Fix comment typo. * doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Document.
2024-11-22Add -f{,no-}assume-sane-operators-new-delete options [PR110137]Jakub Jelinek1-1/+32
The following patch adds a new option for optimizations related to replaceable global operators new/delete. The option isn't called -fassume-sane-operator-new (which clang++ implements), because 1) clang++ option means something different; initially it was an option to add malloc attribute to those declarations (but we have malloc attribute on all <new> calls already unconditionally); later it was changed to add noalias attribute rather than malloc, whatever it means, but it is certainly about the return value from the operator new (whether it can alias with other pointers); we already assume malloc-ish behavior that it doesn't alias any other pointers 2) the option only affects operator new, we want it affect also operator delete The option basically allows to choose between pre-PR101480 behavior (now the default, more optimistic) and post-PR101480 behavior (safer but penalizing most of the code in the wild for rare needs). I've tried to explain stuff in the documentation too. 2024-11-22 Jakub Jelinek <jakub@redhat.com> PR c++/110137 PR middle-end/101480 gcc/ * doc/invoke.texi (-fassume-sane-operators-new-delete, -fno-assume-sane-operators-new-delete): Document. * gimple.cc (gimple_call_fnspec): Handle -f{,no-}assume-sane-operators-new-delete. * ipa-inline-transform.cc (inline_call): Also clear flag_assume_sane_operators_new_delete on caller when inlining -fno-assume-sane-operators-new-delete callee into -fassume-sane-operators-new-delete caller. gcc/c-family/ * c.opt (fassume-sane-operators-new-delete): New option. gcc/testsuite/ * g++.dg/tree-ssa/pr110137-1.C: New test. * g++.dg/tree-ssa/pr110137-2.C: New test. * g++.dg/tree-ssa/pr110137-3.C: New test. * g++.dg/tree-ssa/pr110137-4.C: New test. * g++.dg/torture/pr10148.C: Add -fno-assume-sane-operators-new-delete as dg-additional-options. * g++.dg/warn/Warray-bounds-16.C: Revert 2021-11-10 changes.