|
[PR117352]
This patch adds a limit on the number of cases of a switch. When this
limit is exceeded, switch lowering decides to use faster but less
powerful algorithms.
In particular this means that for finding bit tests switch lowering
decides between the old dynamic programming O(n^2) algorithm and the
new greedy algorithm that Andi Kleen recently added but then reverted
due to PR117352. It also means that switch lowering may bail out on
finding jump tables if the switch is too large. (It may also not bail
out: the greedy algorithm can find bit tests that split the switch into
multiple smaller switches, and those may be small enough to fit under
the limit.)
The limit is implemented as --param switch-lower-slow-alg-max-cases.
Exceeding the limit is reported through -Wdisabled-optimization.
This patch fixes the issue with the greedy algorithm described in
PR117352. The problem was incorrect usage of the is_beneficial()
heuristic.
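The choice between the two bit-test algorithms can be sketched as follows.
This is a minimal illustration, not the code in tree-switch-conversion.cc:
every name, the placeholder limit value, and the simplified "value range
fits in 64 bits" test are assumptions, and the real code additionally
applies heuristics such as is_beneficial().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder standing in for --param switch-lower-slow-alg-max-cases;
// the value is arbitrary and not claimed to be GCC's default.
constexpr std::size_t kSlowAlgMaxCases = 1000;

// Simplified fit test (an assumption): cases[first..last] may share one
// bit test when their value range fits in a 64-bit mask.  Assumes the
// subtraction does not overflow.
static bool
fits (const std::vector<std::int64_t> &cases, std::size_t first,
      std::size_t last)
{
  return cases[last] - cases[first] < 64;
}

// Slow variant: O(n^2) dynamic programming.  best[i] is the minimum
// number of clusters needed to cover the first i cases.
std::size_t
count_clusters_slow (const std::vector<std::int64_t> &cases)
{
  std::size_t n = cases.size ();
  std::vector<std::size_t> best (n + 1, 0);
  for (std::size_t i = 1; i <= n; i++)
    {
      best[i] = best[i - 1] + 1;	// Case i-1 alone in its own cluster.
      for (std::size_t j = 0; j + 1 < i; j++)
	if (fits (cases, j, i - 1) && best[j] + 1 < best[i])
	  best[i] = best[j] + 1;
    }
  return best[n];
}

// Fast variant: one greedy pass that extends each cluster as far as the
// fit test allows.
std::size_t
count_clusters_fast (const std::vector<std::int64_t> &cases)
{
  std::size_t clusters = 0;
  for (std::size_t i = 0; i < cases.size (); clusters++)
    {
      std::size_t j = i;
      while (j + 1 < cases.size () && fits (cases, i, j + 1))
	j++;
      i = j + 1;
    }
  return clusters;
}

// Dispatcher: use the slow algorithm unless the switch exceeds the limit.
std::size_t
count_clusters (const std::vector<std::int64_t> &cases)
{
  assert (std::is_sorted (cases.begin (), cases.end ()));
  if (cases.size () > kSlowAlgMaxCases)
    return count_clusters_fast (cases);
  return count_clusters_slow (cases);
}
```

With this simplified fit test the two variants happen to produce the same
count; the sketch is only meant to show the shape of the size-based
dispatch and the difference in cost between the two passes.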
gcc/ChangeLog:
PR middle-end/117091
PR middle-end/117352
* doc/invoke.texi: Add switch-lower-slow-alg-max-cases.
* params.opt: Add switch-lower-slow-alg-max-cases.
* tree-switch-conversion.cc (jump_table_cluster::find_jump_tables):
Note in a comment that we are looking for jump tables in
case sequences delimited by the already found bit tests.
(bit_test_cluster::find_bit_tests): Decide between
find_bit_tests_fast() and find_bit_tests_slow().
(bit_test_cluster::find_bit_tests_fast): New function.
(bit_test_cluster::find_bit_tests_slow): New function.
(switch_decision_tree::analyze_switch_statement): Report
exceeding the limit.
* tree-switch-conversion.h: Add find_bit_tests_fast() and
find_bit_tests_slow().
Co-Authored-By: Andi Kleen <ak@gcc.gnu.org>
Signed-off-by: Filip Kastl <fkastl@suse.cz>
|
|
This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
target can reject a simd_clone based on the vector mode it is using.
This is needed because for VLS SVE vectorization the vectorizer accepts
Advanced SIMD simd clones when vectorizing using SVE types because the simdlens
might match. This will cause type errors later on.
Other targets do not currently need to use this argument.
gcc/ChangeLog:
PR target/96342
* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to
call TARGET_SIMD_CLONE_USABLE.
* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
and use it to reject the use of SVE simd clones with Advanced SIMD
modes.
* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
* doc/tm.texi: Regenerate.
Co-authored-by: Victor Do Nascimento <victor.donascimento@arm.com>
Co-authored-by: Tamar Christina <tamar.christina@arm.com>
|
|
This patch removes the remaining traces of the vcond{,u,eq} optabs.
Earlier patches removed the target-independent uses and I couldn't
find any direct references to either the *_optabs or the ifns
in target-specific code.
gcc/
* doc/md.texi (vcond@var{m}@var{n}, vcondu@var{m}@var{n})
(vcondeq@var{m}@var{n}): Delete.
(vcond_mask_@var{m}@var{n}): Redocument in standalone form.
* internal-fn.def (VCOND, VCONDU, VCONDEQ): Delete.
* internal-fn.cc (expand_vec_cond_optab_fn): Delete.
* optabs.def (vcond_optab, vcondu_optab, vcondeq_optab): Delete.
|
|
This commit newly introduces the ability to use overloaded builtins in
C++ SFINAE context.
The goal is to ensure there is a single mechanism
that libstdc++ can use to determine whether a given type can be used in
the atomic fetch_add (and similar) builtins. I am working on another
patch that hopes to use this mechanism to identify whether fetch_add
(and similar) work on floating point types.
Current state of the world:
GCC currently exposes resolved versions of these builtins to the
user, so for GCC it's currently possible to use tests similar to the
below to check for atomic loads on a 2 byte sized object.
#if __has_builtin(__atomic_load_2)
Clang does not expose resolved versions of the atomic builtins.
clang currently allows SFINAE on builtins, so that C++ code can
check whether a builtin is available on a given type.
GCC does not (and that is what this patch aims to change).
C libraries like libatomic can check whether a given atomic builtin
can work on a given type by using autoconf to check for a
miscompilation when attempting such a use.
My goal:
I would like to enable floating point fetch_add (and similar) in
GCC, in order to use those overloads in libstdc++ implementation of
atomic<float>::fetch_add.
This should allow compilers targeting GPUs which have floating
point fetch_add instructions to emit optimal code.
In order to do that I need some consistent mechanism that libstdc++
can use to identify whether the fetch_add builtins have floating
point overloads (and for which types these exist).
I would hence like to enable SFINAE on builtins, so that libstdc++
can use that mechanism for the floating point fetch_add builtins.
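A sketch of the kind of check this is meant to enable, written as a
detection trait. The trait name has_builtin_fetch_add is hypothetical and
is not something libstdc++ is known to provide; the sketch needs C++17
for std::void_t.

```cpp
#include <type_traits>
#include <utility>

// Primary template: assume the builtin is not usable with T.
template <typename T, typename = void>
struct has_builtin_fetch_add : std::false_type {};

// Selected only when the call expression is well-formed for T.
template <typename T>
struct has_builtin_fetch_add<
  T, std::void_t<decltype (__atomic_fetch_add (std::declval<T *> (),
					       std::declval<T> (),
					       __ATOMIC_RELAXED))>>
  : std::true_type {};

// Integer types are accepted by the builtin.
static_assert (has_builtin_fetch_add<int>::value,
	       "int should support __atomic_fetch_add");
```

With a compiler that performs SFINAE on builtins, the trait would simply
be false for a type the builtin rejects. Without that support,
instantiating it with such a type is expected to be a hard error, which is
why the sketch only instantiates it for int.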
Implementation follows the existing mechanism for handling SFINAE
contexts in c-common.cc. A boolean is passed into the c-common.cc
function indicating whether these functions should emit errors or not.
This boolean comes from `complain & tf_error` in the C++ frontend.
(Similar to other functions like valid_array_size_p and
c_build_vec_perm_expr).
This is done both for resolve_overloaded_builtin and
check_builtin_function_arguments, both of which can be used in SFINAE
contexts.
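The pattern of threading a `complain` boolean through a checking function
can be sketched like this. It is a self-contained miniature under stated
assumptions: the enum, the error list and both functions are hypothetical
stand-ins, not the declarations in c-common.cc or the C++ front end.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of the substitution flags.
enum tsubst_flags_sketch : unsigned { tf_none = 0, tf_error = 1u << 0 };

// Stand-in for the diagnostic machinery: errors are just recorded here.
static std::vector<std::string> emitted_errors;

// Returns false on an argument-count mismatch.  An error is recorded
// only if COMPLAIN is true; the default keeps existing callers noisy.
bool
validate_nargs_sketch (const std::string &fn, int nargs, int required,
		       bool complain = true)
{
  assert (nargs >= 0 && required >= 0);
  if (nargs == required)
    return true;
  if (complain)
    emitted_errors.push_back ("wrong number of arguments to " + fn);
  return false;
}

// Caller in the style of a front end: derive the boolean from the flags.
bool
check_call_sketch (const std::string &fn, int nargs, int required,
		   unsigned complain_flags)
{
  return validate_nargs_sketch (fn, nargs, required,
				(complain_flags & tf_error) != 0);
}
```

In a SFINAE context the caller passes flags without the error bit, so the
failure is reported only through the return value and substitution can
fail quietly.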
I attempted to trigger something using the `reject_gcc_builtin`
function in an SFINAE context. Given the context where this
function is called from the C++ frontend it looks like it may be
possible, but I did not manage to trigger this in template context
by attempting to do something similar to the testcases added around
those calls.
- I would appreciate any feedback on whether this is something that
can happen in a template context, and if so some help writing a
relevant testcase for it.
Both of these functions have target hooks for target specific builtins
that I have updated to take the extra boolean flag. I have not adjusted
the functions implementing those target hooks (except to update the
declarations) so target specific builtins will still error in SFINAE
contexts.
- I could imagine not updating the target hook definition since nothing
would use that change. However I figure that allowing targets to
decide this behaviour would be the right thing to do eventually, and
since this is the target-independent part of the change to do that
this patch should make that change.
Could adjust if others disagree.
Other relevant points that I'd appreciate reviewers check:
- I did not pass this new flag through
atomic_bitint_fetch_using_cas_loop since the _BitInt type is not
available in the C++ frontend and I didn't want if conditions that can
not be executed in the source.
- I only test non-compile-time-constant types with SVE types, since I do
not know of a way to get a VLA into a SFINAE context.
- While writing tests I noticed a few differences with clang in this
area. I don't think they are problematic, but I mention them for
completeness and so that others can judge whether they are a problem.
- atomic_fetch_add on a boolean is allowed by clang.
- When __atomic_load is passed an invalid memory model (i.e. too
large), we give an SFINAE failure while clang does not.
Bootstrap and regression tested on AArch64 and x86_64.
Built first stage on targets whose target hook declaration needed
updating (though I did not regtest them). Target triplets I built in
order to check the backend-specific changes I made:
- arm-none-linux-gnueabihf
- avr-linux-gnu
- riscv-linux-gnu
- powerpc-linux-gnu
- s390x-linux-gnu
Ok for commit to trunk?
gcc/c-family/ChangeLog:
* c-common.cc (builtin_function_validate_nargs,
check_builtin_function_arguments,
speculation_safe_value_resolve_call,
speculation_safe_value_resolve_params, sync_resolve_size,
sync_resolve_params, get_atomic_generic_size,
resolve_overloaded_atomic_exchange,
resolve_overloaded_atomic_compare_exchange,
resolve_overloaded_atomic_load, resolve_overloaded_atomic_store,
resolve_overloaded_builtin): Add `complain` boolean parameter
and determine whether to emit errors based on its value.
* c-common.h (check_builtin_function_arguments,
resolve_overloaded_builtin): Mention `complain` boolean
parameter in declarations. Give it a default of `true`.
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc
(aarch64_resolve_overloaded_builtin,aarch64_check_builtin_call):
Add new unused boolean parameter to match target hook
definition.
* config/arm/arm-builtins.cc (arm_check_builtin_call): Likewise.
* config/arm/arm-c.cc (arm_resolve_overloaded_builtin):
Likewise.
* config/arm/arm-protos.h (arm_check_builtin_call): Likewise.
* config/avr/avr-c.cc (avr_resolve_overloaded_builtin):
Likewise.
* config/riscv/riscv-c.cc (riscv_check_builtin_call,
riscv_resolve_overloaded_builtin): Likewise.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Likewise.
* config/rs6000/rs6000-protos.h (altivec_resolve_overloaded_builtin):
Likewise.
* config/s390/s390-c.cc (s390_resolve_overloaded_builtin):
Likewise.
* doc/tm.texi: Regenerate.
* target.def (TARGET_RESOLVE_OVERLOADED_BUILTIN,
TARGET_CHECK_BUILTIN_CALL): Update prototype to include a
boolean parameter that indicates whether errors should be
emitted. Update documentation to mention this fact.
gcc/cp/ChangeLog:
* call.cc (build_cxx_call): Pass `complain` parameter to
check_builtin_function_arguments. Take its value from the
`tsubst_flags_t` type `complain & tf_error`.
* semantics.cc (finish_call_expr): Pass `complain` parameter to
resolve_overloaded_builtin. Take its value from the
`tsubst_flags_t` type `complain & tf_error`.
gcc/testsuite/ChangeLog:
* g++.dg/template/builtin-atomic-overloads.def: New test.
* g++.dg/template/builtin-atomic-overloads1.C: New test.
* g++.dg/template/builtin-atomic-overloads2.C: New test.
* g++.dg/template/builtin-atomic-overloads3.C: New test.
* g++.dg/template/builtin-atomic-overloads4.C: New test.
* g++.dg/template/builtin-atomic-overloads5.C: New test.
* g++.dg/template/builtin-atomic-overloads6.C: New test.
* g++.dg/template/builtin-atomic-overloads7.C: New test.
* g++.dg/template/builtin-atomic-overloads8.C: New test.
* g++.dg/template/builtin-sfinae-check-function-arguments.C: New test.
* g++.dg/template/builtin-speculation-overloads.def: New test.
* g++.dg/template/builtin-speculation-overloads1.C: New test.
* g++.dg/template/builtin-speculation-overloads2.C: New test.
* g++.dg/template/builtin-speculation-overloads3.C: New test.
* g++.dg/template/builtin-speculation-overloads4.C: New test.
* g++.dg/template/builtin-speculation-overloads5.C: New test.
* g++.dg/template/builtin-validate-nargs.C: New test.
Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>
|
|
Since GCC 13, -fsanitize=hwaddress is supported not just on AArch64 but
also on x86_64 (but only with -mlam=u48 or -mlam=u57).
2024-12-09 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/117960
* doc/invoke.texi (fsanitize=hwaddress): Clarify on which targets
it is supported.
|
|
In preparation of GCC/nvptx code changes that require sm_52 features, this
commit raises nvptx code generation from sm_30 "Kepler" to sm_52 "Maxwell".
The latter has been supported as of CUDA 6.5 (2014-08), and is thus supported
by most Nvidia GPUs of the last decade, approximately. (This commit doesn't
change the use of PTX ISA 6.0, which already requires CUDA 9.0 anyway.)
To continue building sm_30 multilib variants (for use via building/linking with
'-march=sm_30'), specify '--with-multilib-list=default,sm_30', for example. Or,
to continue defaulting to sm_30 multilib variants, specify '--with-arch=sm_30'
(plus '--without-multilib-list', if applicable). See the documentation,
<https://gcc.gnu.org/install/specific.html#nvptx-x-none>.
(Note that after a long deprecation time, eventually the
sm_3x "Kepler architecture support is removed from CUDA 12.0", 2022-12.)
gcc/
* config.gcc [nvptx-*]: Switch default from '-march=sm_30' to
'-march=sm_52'.
* doc/install.texi (Nvidia PTX Options): Update.
|
|
Change location_t to be a 64-bit integer instead of a 32-bit integer in
libcpp.
Also included in this change are the two other patches in the original
series which depended on this one; I am committing them all at once in case
it needs to be reverted later:
-Support for 64-bit location_t: gimple parts
The size of struct gimple increased by 8 bytes with the change in size of
location_t from 32- to 64-bit; adjust the WORD markings in the comments
accordingly. It seems that most of the WORD markings were off by one already,
probably not having been updated after a previous reduction in the size of a
gimple, so they have become retroactively correct again, and only a couple
needed adjustment actually.
Also add a comment that there is now 32 bits of unused padding available in
struct gimple for 64-bit hosts.
-Support for 64-bit location_t: Remove -flarge-source-files
The option -flarge-source-files became unnecessary with 64-bit location_t
and harms performance compared to the new default setting, so silently
ignore it.
libcpp/ChangeLog:
* include/cpplib.h (struct cpp_token): Adjust comment about the
struct size.
* include/line-map.h (location_t): Change typedef from 32-bit to 64-bit
integer.
(LINE_MAP_MAX_COLUMN_NUMBER): Increase size to be appropriate for
64-bit location_t.
(LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES): Likewise.
(LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise.
(LINE_MAP_MAX_LOCATION): Likewise.
(MAX_LOCATION_T): Likewise.
(line_map_suggested_range_bits): Likewise.
(struct line_map): Adjust comment about the struct size.
(struct line_map_macro): Likewise.
(struct line_map_ordinary): Likewise. Rearrange fields to optimize
padding.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/pr77949.C: Adapt the test for 64-bit location_t,
when the previously expected failure doesn't actually happen.
* g++.dg/modules/loc-prune-4.C: Adjust the expected output for the
64-bit location_t case.
* gcc.dg/plugin/expensive_selftests_plugin.cc: Don't try to test
the maximum supported column number in 64-bit location_t mode.
* gcc.dg/plugin/location_overflow_plugin.cc: Adjust the base_location
so it can effectively test 64-bit location_t.
gcc/ChangeLog:
* gimple.h (struct gphi): Update word marking comments to reflect
the new size of location_t.
(struct gimple): Likewise. Add a comment about padding.
* common.opt: Mark -flarge-source-files as Ignored.
* common.opt.urls: Regenerate.
* doc/invoke.texi: Remove -flarge-source-files.
* toplev.cc (process_options): Remove support for
-flarge-source-files.
|
|
This is v2 of the patch; v1 was here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655541.html
Changed in v2:
* added a new TARGET_DOCUMENTATION_NAME hook for figuring out which
documentation URL to use when there are multiple per-target docs,
such as for __attribute__((interrupt)); implemented this for all
targets that have target-specific attributes
* moved attribute_urlifier and its support code to a new
gcc-attribute-urlifier.cc since it needs to use targetm for the
above; gcc-urlifier.o is used by the driver.
* fixed extend.texi so that some attributes that failed to appear in
attr-urls.def now do so (affected nvptx "kernel" and "shared" attrs)
* regenerated attr-urls.def for the above fix, and bringing in
attributes added since v1 of the patch
In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add
documentation URLs to quoted strings in diagnostics.
In r14-6920-g9e49746da303b8 I added a mechanism to generate URLs for
mentions of command-line options in quoted strings in diagnostics.
This patch does a similar thing for attributes. It adds a new Python 3
script to scrape the generated HTML looking for documentation of
attributes, and uses this to (re)generate a new gcc/attr-urls.def file.
Running "make regenerate-attr-urls" after rebuilding the HTML docs will
regenerate gcc/attr-urls.def in the source directory.
The patch uses this to optionally add doc URLs for attributes in any
diagnostic emitted during the lifetime of an auto_urlify_attributes
instance, and adds such instances everywhere that a diagnostic refers
to an attribute within quotes (based on grepping the source tree
for references to attributes in strings and in code).
For example, given:
$ ./xgcc -B. -S ../../src/gcc/testsuite/gcc.dg/attr-access-2.c
../../src/gcc/testsuite/gcc.dg/attr-access-2.c:14:16: warning:
attribute ‘access(read_write, 2, 3)’ positional argument 2 conflicts
with previous designation by argument 1 [-Wattributes]
with this patch the quoted text `access(read_write, 2, 3)'
automatically gains the URL for our docs for "access":
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute
in a sufficiently modern terminal.
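The "active for the lifetime of an instance" behaviour is an RAII scope
guard. A minimal sketch, assuming simplified stand-in types: none of the
class names below are the ones in gcc-urlifier.h, and the lookup knows
only the single URL quoted above.

```cpp
#include <cassert>
#include <string>

// Stand-in interface: map a quoted string to a documentation URL.
struct urlifier_sketch
{
  virtual ~urlifier_sketch () = default;
  virtual std::string url_for (const std::string &text) const = 0;
};

// Stand-in for the diagnostic context, which holds the active urlifier.
struct context_sketch
{
  const urlifier_sketch *current = nullptr;
};

// Knows about attributes only; returns "" when there is no URL.
struct attribute_urlifier_sketch : urlifier_sketch
{
  std::string url_for (const std::string &text) const override
  {
    if (text == "access")
      return "https://gcc.gnu.org/onlinedocs/gcc/"
	     "Common-Function-Attributes.html"
	     "#index-access-function-attribute";
    return "";
  }
};

// Installs a urlifier on construction, restores the old one on exit.
class scoped_urlifier_override
{
public:
  scoped_urlifier_override (context_sketch &ctx, const urlifier_sketch *u)
    : m_ctx (ctx), m_saved (ctx.current)
  {
    assert (u != nullptr);
    ctx.current = u;
  }
  ~scoped_urlifier_override () { m_ctx.current = m_saved; }
  scoped_urlifier_override (const scoped_urlifier_override &) = delete;
  scoped_urlifier_override &
  operator= (const scoped_urlifier_override &) = delete;

private:
  context_sketch &m_ctx;
  const urlifier_sketch *m_saved;
};
```

Any diagnostic emitted while the guard is in scope sees the attribute
urlifier; once the scope ends, the previous urlifier is back in place
even if the function returned early.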
Like r14-6920-g9e49746da303b8 this avoids the Makefile target
depending on the generated HTML, since a missing URL is a minor
problem, whereas requiring all users to build HTML docs seems more
involved. Doing so also avoids making Python 3 a build requirement for
everyone; only developers adding attributes need it.
Like the options, we could add a CI test for this.
The patch gathers both general and target-specific attributes.
For example, the function attribute "interrupt" has 19 URLs within our
docs: one common, and 18 target-specific ones.
The patch adds a new target hook used when selecting the most
appropriate one.
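Selecting among several URLs for one attribute name might look like the
sketch below: prefer the entry whose target documentation name matches
the current target, otherwise fall back to the common entry. The table
layout, the function name, the target names and the URL strings are all
placeholders invented for illustration, not the contents of
attr-urls.def.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row per documented attribute; target_doc_name is "" for the
// target-independent documentation.
struct attr_url_entry
{
  const char *attr_name;
  const char *target_doc_name;
  const char *url;
};

// Placeholder data, not real documentation URLs.
static const std::vector<attr_url_entry> attr_urls = {
  { "interrupt", "", "placeholder/common#interrupt" },
  { "interrupt", "AVR", "placeholder/avr#interrupt" },
  { "interrupt", "ARM", "placeholder/arm#interrupt" },
};

// Returns the best URL for ATTR given the target's documentation name,
// or "" if the attribute is not in the table.
std::string
find_attr_url (const std::string &attr, const std::string &target_doc_name)
{
  assert (!attr.empty ());
  std::string common;
  for (const attr_url_entry &e : attr_urls)
    {
      if (attr != e.attr_name)
	continue;
      if (target_doc_name == e.target_doc_name)
	return e.url;		// Exact match for this target wins.
      if (e.target_doc_name[0] == '\0')
	common = e.url;		// Remember the fallback.
    }
  return common;
}
```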
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/ChangeLog:
* Makefile.in (OBJS): Add gcc-attribute-urlifier.o.
(ATTR_URLS_HTML_DEPS): New.
(regenerate-attr-urls): New.
(regenerate-attr-urls-unit-test): New.
* attr-urls.def: New file.
* attribs.cc: Include "gcc-urlifier.h".
(decl_attributes): Use auto_urlify_attributes.
* config/aarch64/aarch64.cc (TARGET_DOCUMENTATION_NAME): New.
* config/arc/arc.cc (TARGET_DOCUMENTATION_NAME): New.
* config/arm/arm.cc (TARGET_DOCUMENTATION_NAME): New.
* config/bfin/bfin.cc (TARGET_DOCUMENTATION_NAME): New.
* config/bpf/bpf.cc (TARGET_DOCUMENTATION_NAME): New.
* config/epiphany/epiphany.cc (TARGET_DOCUMENTATION_NAME): New.
* config/gcn/gcn.cc (TARGET_DOCUMENTATION_NAME): New.
* config/h8300/h8300.cc (TARGET_DOCUMENTATION_NAME): New.
* config/i386/i386.cc (TARGET_DOCUMENTATION_NAME): New.
* config/ia64/ia64.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m32c/m32c.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m32r/m32r.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m68k/m68k.cc (TARGET_DOCUMENTATION_NAME): New.
* config/mcore/mcore.cc (TARGET_DOCUMENTATION_NAME): New.
* config/microblaze/microblaze.cc (TARGET_DOCUMENTATION_NAME):
New.
* config/mips/mips.cc (TARGET_DOCUMENTATION_NAME): New.
* config/msp430/msp430.cc (TARGET_DOCUMENTATION_NAME): New.
* config/nds32/nds32.cc (TARGET_DOCUMENTATION_NAME): New.
* config/nvptx/nvptx.cc (TARGET_DOCUMENTATION_NAME): New.
* config/riscv/riscv.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rl78/rl78.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rs6000/rs6000.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rx/rx.cc (TARGET_DOCUMENTATION_NAME): New.
* config/s390/s390.cc (TARGET_DOCUMENTATION_NAME): New.
* config/sh/sh.cc (TARGET_DOCUMENTATION_NAME): New.
* config/stormy16/stormy16.cc (TARGET_DOCUMENTATION_NAME): New.
* config/v850/v850.cc (TARGET_DOCUMENTATION_NAME): New.
* config/visium/visium.cc (TARGET_DOCUMENTATION_NAME): New.
gcc/analyzer/ChangeLog:
* region-model.cc: Include "gcc-urlifier.h".
(reason_attr_access::emit): Use auto_urlify_attributes.
* sm-taint.cc: Include "gcc-urlifier.h".
(tainted_access_attrib_size::emit): Use auto_urlify_attributes.
gcc/c-family/ChangeLog:
* c-attribs.cc: Include "gcc-urlifier.h".
(positional_argument): Use auto_urlify_attributes.
* c-common.cc: Include "gcc-urlifier.h".
(parse_optimize_options): Use auto_urlify_attributes with
OPT_Wattributes.
(attribute_fallthrough_p): Use auto_urlify_attributes.
* c-warn.cc: Include "gcc-urlifier.h".
(diagnose_mismatched_attributes): Use auto_urlify_attributes.
gcc/c/ChangeLog:
* c-decl.cc: Include "gcc-urlifier.h".
(start_decl): Use auto_urlify_attributes with OPT_Wattributes.
(start_function): Likewise.
* c-parser.cc: Include "gcc-urlifier.h".
(c_parser_statement_after_labels): Use auto_urlify_attributes with
OPT_Wattributes.
* c-typeck.cc: Include "gcc-urlifier.h".
(maybe_warn_nodiscard): Use auto_urlify_attributes with
OPT_Wunused_result.
gcc/cp/ChangeLog:
* cp-gimplify.cc: Include "gcc-urlifier.h".
(process_stmt_hotness_attribute): Use auto_urlify_attributes with
OPT_Wattributes.
* cvt.cc: Include "gcc-urlifier.h".
(maybe_warn_nodiscard): Use auto_urlify_attributes with
OPT_Wunused_result.
* decl.cc: Include "gcc-urlifier.h".
(start_decl): Use auto_urlify_attributes.
(start_preparsed_function): Likewise.
gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::override_urlifier): New.
* diagnostic.h (diagnostic_context::override_urlifier): New decl.
* doc/extend.texi (Nvidia PTX Function Attributes): Update
@cindex to specify that "kernel" is a function attribute and
"shared" is a variable attribute, so that these entries are
recognized by the regex in regenerate-attr-urls.py.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_DOCUMENTATION_NAME): New.
* gcc-attribute-urlifier.cc: New file.
* gcc-urlifier.cc: Include diagnostic.h.
(gcc_urlifier::make_doc): Convert to...
(make_doc_url): ...this.
(auto_override_urlifier::auto_override_urlifier): New.
(auto_override_urlifier::~auto_override_urlifier): New.
(selftest::gcc_urlifier_cc_tests): Split out body into...
(selftest::test_gcc_urlifier): ...this.
* gcc-urlifier.h: Include "pretty-print-urlifier.h" and "label-text.h".
(make_doc_url): New decl.
(class auto_override_urlifier): New.
(class attribute_urlifier): New.
(class auto_urlify_attributes): New.
* gimple-ssa-warn-access.cc: Include "gcc-urlifier.h".
(pass_waccess::execute): Use auto_urlify_attributes.
* gimplify.cc: Include "gcc-urlifier.h".
(expand_FALLTHROUGH): Use auto_urlify_attributes.
* internal-fn.cc: Define INCLUDE_MEMORY and include
"gcc-urlifier.h".
(expand_FALLTHROUGH): Use auto_urlify_attributes.
* ipa-pure-const.cc: Include "gcc-urlifier.h".
(suggest_attribute): Use auto_urlify_attributes.
* ipa-strub.cc: Include "gcc-urlifier.h".
(can_strub_p): Use auto_urlify_attributes.
* regenerate-attr-urls.py: New file.
* selftest-run-tests.cc (selftest::run_tests): Call
gcc_attribute_urlifier_cc_tests.
* selftest.h (selftest::gcc_attribute_urlifier_cc_tests): New
decl.
* target.def (documentation_name): New DEFHOOKPOD.
* tree-cfg.cc: Include "gcc-urlifier.h".
(do_warn_unused_result): Use auto_urlify_attributes.
* tree-ssa-uninit.cc: Include "gcc-urlifier.h".
(maybe_warn_read_write_only): Use auto_urlify_attributes.
(maybe_warn_pass_by_reference): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/
* config/nvptx/nvptx-sm.def: Add '89'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_89, -march-map=sm_90)
(march-map=sm_90a): Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_89'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_89.c: Adjust.
* gcc.target/nvptx/march-map=sm_90.c: Likewise.
* gcc.target/nvptx/march-map=sm_90a.c: Likewise.
* gcc.target/nvptx/march=sm_89.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm89.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.
|
|
gcc/
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
'PTX_VERSION_7_8'.
* config/nvptx/nvptx.cc (ptx_version_to_string)
(ptx_version_to_number): Adjust.
* config/nvptx/nvptx.h (TARGET_PTX_7_8): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'7.8' for 'PTX_VERSION_7_8'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=7.8'.
gcc/testsuite/
* gcc.target/nvptx/mptx=7.8.c: New.
|
|
gcc/
* config/nvptx/nvptx-sm.def: Add '52'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_52): Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_52'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_52.c: Adjust.
* gcc.target/nvptx/march=sm_52.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm52.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.
|
|
gcc/
* config/nvptx/nvptx-sm.def: Add '37'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_37, -march-map=sm_50):
Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_37'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_37.c: Adjust.
* gcc.target/nvptx/march-map=sm_50.c: Likewise.
* gcc.target/nvptx/march-map=sm_52.c: Likewise.
* gcc.target/nvptx/march=sm_37.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm37.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.
|
|
gcc/
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
'PTX_VERSION_4_1'.
* config/nvptx/nvptx.cc (ptx_version_to_string)
(ptx_version_to_number): Adjust.
* config/nvptx/nvptx.h (TARGET_PTX_4_1): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'4.1' for 'PTX_VERSION_4_1'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=4.1'.
gcc/testsuite/
* gcc.target/nvptx/mptx=4.1.c: New.
|
|
'PTX_VERSION_4_2' was added in commit decde11183bdccc46587d6614b75f3d56a2f2e4a
"[nvptx] Choose -mptx default based on -misa" for use for '-march=sm_52'
('first_ptx_version_supporting_sm', 'PTX_ISA_SM53'), as documented by Nvidia.
However, '-mptx=4.2' wasn't exposed to the user, but there's no reason not to.
gcc/
* config/nvptx/nvptx.h (TARGET_PTX_4_2): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'4.2' for 'PTX_VERSION_4_2'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=4.2'.
gcc/testsuite/
* gcc.target/nvptx/mptx=4.2.c: New.
|
|
No change in behavior unless specifying it.
gcc/
* config.gcc: nvptx: Support '--with-multilib-list'.
* config/nvptx/gen-multilib-matches.sh: Adjust.
* configure.ac: Likewise.
* configure: Regenerate.
* doc/install.texi: Update.
* doc/invoke.texi: Align.
* config/nvptx/gen-multilib-matches-tests: Extend.
|
|
Sometimes we want to use a default cmodel other than medlow. Add a GCC
configure option for that.
gcc/ChangeLog:
* config.gcc (riscv*-*-*): Add support for --with-cmodel configure option.
(all_defaults): Add cmodel.
* config/riscv/riscv.h (TARGET_DEFAULT_CMODEL): Remove.
* doc/install.texi: Document --with-cmodel configure option.
* doc/invoke.texi (-mcmodel): Mention --with-cmodel configure option.
Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
|
|
This patch adds CDE option support for -mcpu=star-mc1.
The star-mc1 is an Armv8-M Mainline CPU that supports the CDE feature.
gcc/ChangeLog:
* config/arm/arm-cpus.in (star-mc1): Add CDE options.
* doc/invoke.texi (cdecp options): Document for star-mc1.
Signed-off-by: Qingxin Zhong <arvin.zhong@armchina.com>
|
|
When -msplit-ldst is on, it may be possible to propagate __zero_reg__
to the sources of the new stores. For example, without this patch,
unsigned long lx;
void store_lsr17 (void)
{
lx >>= 17;
}
compiles to:
store_lsr17:
lds r26,lx+2 ; movqi_insn
lds r27,lx+3 ; movqi_insn
movw r24,r26 ; *movhi
lsr r25 ; *lshrhi3_const
ror r24
ldi r26,0 ; movqi_insn
ldi r27,0 ; movqi_insn
sts lx,r24 ; movqi_insn
sts lx+1,r25 ; movqi_insn
sts lx+2,r26 ; movqi_insn
sts lx+3,r27 ; movqi_insn
ret
but with this patch it becomes:
store_lsr17:
lds r26,lx+2 ; movqi_insn
lds r27,lx+3 ; movqi_insn
movw r24,r26 ; *movhi
lsr r25 ; *lshrhi3_const
ror r24
sts lx,r24 ; movqi_insn
sts lx+1,r25 ; movqi_insn
sts lx+2,__zero_reg__ ; movqi_insn
sts lx+3,__zero_reg__ ; movqi_insn
ret
gcc/
PR target/107957
* config/avr/avr-passes-fuse-move.h (bbinfo_t) <try_mem0_p>:
Add static property.
* config/avr/avr-passes.cc (bbinfo_t::try_mem0_p): Define it.
(optimize_data_t::try_mem0): New method.
(bbinfo_t::optimize_one_block) [bbinfo_t::try_mem0_p]: Run try_mem0.
(bbinfo_t::optimize_one_function): Set bbinfo_t::try_mem0_p.
* config/avr/avr.md (pushhi1_insn): Also allow zero as source.
(define_split) [avropt_split_ldst]: Only run avr_split_ldst()
when avr-fuse-move has been run at least once.
* doc/invoke.texi (AVR Options) <-msplit-ldst>: Document it.
|
|
gcc/ChangeLog:
* doc/invoke.texi: Add store-forwarding-max-distance.
Signed-off-by: Filip Kastl <fkastl@suse.cz>
|
|
In the Cauldron IPA/LTO BoF we discussed toplevel asms and noted that
it would be nice to tell the compiler something about what
the toplevel asm does. Sure, I'm aware the kernel people said they
aren't willing to use something like that, but perhaps other projects
would. And for the kernel perhaps we should add some new option which
allows some dumb parsing of the toplevel asms and gathers something
from that parsing.
The following patch is just a small step towards that, namely, allow
some subset of extended inline asm outside of functions.
The patch is unfinished: LTO streaming (out/in) of the ASM_EXPRs isn't
implemented (it emits a sorry diagnostic), nor are any cgraph/varpool
changes to find out references etc.
The patch allows something like:
int a[2], b;
enum { E1, E2, E3, E4, E5 };
struct S { int a; char b; long long c; };
asm (".section blah; .quad %P0, %P1, %P2, %P3, %P4; .previous"
: : "m" (a), "m" (b), "i" (42), "i" (E4), "i" (sizeof (struct S)));
Even for non-LTO, that could be useful e.g. for getting enumerators from
C/C++ as integers into the toplevel asm, or sizeof/offsetof etc.
The restrictions I've implemented are:
1) asm qualifiers still aren't allowed, so asm goto or asm inline can't be
specified at toplevel, asm volatile has the volatile ignored for C++ with
a warning and is an error in C like before
2) I see good use for mainly input operands, output maybe to make it clear
that the inline asm may write some memory, I don't see a good use for
clobbers, so the patch doesn't allow those (and of course labels because
asm goto can't be specified)
3) the patch allows only constraints which don't allow registers, so
typically "m" or "i" or other memory or immediate constraints; for
memory, it requires that the operand is addressable and its address
could be used in static var initializer (so that no code actually
needs to be emitted for it), for others that they are constants usable
in the static var initializers
4) the patch disallows + (there is no reload of the operands, so I don't
see benefits of tying some operands together), % (who cares if
something is commutative in this case), & (again, no code is emitted
around the asm) and the 0-9 constraints
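Restrictions 3) and 4) amount to a filter on the constraint string. The
sketch below is a simplified model under stated assumptions, not the check
in the C or C++ front end: the function name is hypothetical, only a few
generic constraint letters are recognised, a leading '=' is assumed to be
acceptable for an output operand, and the real check also has to consult
target-specific constraints and verify that the operand is usable in a
static initializer.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns true if CONSTRAINT uses only memory or immediate letters and
// none of the modifiers that are rejected outside of functions.
bool
toplevel_constraint_ok (std::string_view constraint)
{
  bool saw_letter = false;
  for (std::size_t i = 0; i < constraint.size (); i++)
    switch (constraint[i])
      {
      case '=':			// Output marker: only as the first char.
	if (i != 0)
	  return false;
	break;
      case ',':			// Separator between alternatives.
	break;
      case 'm': case 'o': case 'V':	// Memory constraints.
      case 'i': case 'n': case 's':	// Immediate constraints.
	saw_letter = true;
	break;
      default:			// '+', '%', '&', digits, registers...
	return false;
      }
  return saw_letter;
}

// A few illustrative checks.
void
toplevel_constraint_examples ()
{
  assert (toplevel_constraint_ok ("m"));
  assert (toplevel_constraint_ok ("i"));
  assert (!toplevel_constraint_ok ("r"));	// Allows a register.
  assert (!toplevel_constraint_ok ("+m"));	// Read-write operand.
}
```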
Right now there is no way to tell the compiler that the inline asm defines
some symbol; that is implemented in a later patch, as the : constraint.
Similarly, the c modifier doesn't work in all cases and the cc modifier
is implemented separately.
2024-12-05 Jakub Jelinek <jakub@redhat.com>
PR c/41045
gcc/
* output.h (insn_noperands): Declare.
* final.cc (insn_noperands): No longer static.
* varasm.cc (assemble_asm): Handle ASM_EXPR.
* lto-streamer-out.cc (lto_output_toplevel_asms): Add sorry_at
for non-STRING_CST toplevel asm for now.
* doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document
that extended asm is now allowed outside of functions with certain
restrictions.
gcc/c/
* c-parser.cc (c_parser_asm_string_literal): Add forward declaration.
(c_parser_asm_definition): Parse also extended asm without
clobbers/labels.
* c-typeck.cc (build_asm_expr): Allow extended asm outside of
functions and check extra restrictions.
gcc/cp/
* cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument.
* parser.cc (cp_parser_asm_definition): Parse also extended asm
without clobbers/labels outside of functions.
* semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set,
check extra restrictions for extended asm outside of functions.
* pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller.
gcc/testsuite/
* c-c++-common/toplevel-asm-1.c: New test.
* c-c++-common/toplevel-asm-2.c: New test.
* c-c++-common/toplevel-asm-3.c: New test.
|
|
gcc/ChangeLog:
* doc/libgdiagnostics/topics/execution-paths.rst: Add '§' before
references to section of SARIF spec.
* doc/libgdiagnostics/topics/fix-it-hints.rst: Likewise.
* doc/libgdiagnostics/tutorial/01-hello-world.rst: Fix typo.
* doc/libgdiagnostics/tutorial/02-physical-locations.rst: Likewise.
* doc/libgdiagnostics/tutorial/04-notes.rst: Likewise.
* doc/libgdiagnostics/tutorial/06-fix-it-hints.rst: Add link to
diagnostic_add_fix_it_hint_replace.
* doc/libgdiagnostics/tutorial/07-execution-paths.rst: Add '§'.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
sched1 computes ECC (Excess Change Cost) for each insn, which represents
the register pressure attributed to the insn.
Currently the pressure sensitive scheduling algorithm deliberately ignores
negative ECC values (pressure reduction), making them 0 (neutral), leading
to more spills. This happens due to the assumption that the compiler has
a reasonably accurate processor pipeline scheduling model and thus tries
to aggressively fill pipeline bubbles with spill slots.
This however might not be true, as the model might not be available for
certain uarches or even applicable, especially for modern out-of-order cores.
The existing heuristic induces spill frenzy on RISC-V, noticeably so on
SPEC2017 507.Cactu. If insn scheduling is disabled completely, the
total dynamic icounts for this workload are reduced in half from
~2.5 trillion insns to ~1.3 (w/ -fno-schedule-insns).
This patch adds --param=cycle-accurate-model={0,1} to gate the spill
behavior.
- The default (1) preserves existing spill behavior.
- targets/uarches sensitive to spilling can override the param to (0)
to get the reverse effect. RISC-V backend does so too.
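As a rough illustration of the gating described above, the effect of the
param on a single insn's ECC could be sketched as follows. This is only a
sketch: effective_ecc and its arguments are hypothetical names, not the
code in haifa-sched.cc.

```cpp
// Hypothetical sketch: effective_ecc and its parameters are illustrative
// names, not functions from haifa-sched.cc.
int effective_ecc(int raw_ecc, int cycle_accurate_model)
{
  // With the param at 1 (the default), a negative ECC (pressure reduction)
  // is clamped to 0; with the param at 0, the negative value is kept.
  if (cycle_accurate_model != 0 && raw_ecc < 0)
    return 0;
  return raw_ecc;
}
```

With the param at 0, an insn that reduces pressure by 3 keeps its ECC of -3
instead of being treated as neutral.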
The actual perf numbers are very promising.
(1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs:
Before:
------
Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':
4,917,712.97 msec task-clock:u # 1.000 CPUs utilized
5,314 context-switches:u # 1.081 /sec
3 cpu-migrations:u # 0.001 /sec
204,784 page-faults:u # 41.642 /sec
7,868,291,222,513 cycles:u # 1.600 GHz
2,615,069,866,153 instructions:u # 0.33 insn per cycle
10,799,381,890 branches:u # 2.196 M/sec
15,714,572 branch-misses:u # 0.15% of all branches
After:
-----
Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':
4,552,979.58 msec task-clock:u # 0.998 CPUs utilized
205,020 context-switches:u # 45.030 /sec
2 cpu-migrations:u # 0.000 /sec
204,221 page-faults:u # 44.854 /sec
7,285,176,204,764 cycles:u (7.4% faster) # 1.600 GHz
2,145,284,345,397 instructions:u (17.96% fewer) # 0.29 insn per cycle
10,799,382,011 branches:u # 2.372 M/sec
16,235,628 branch-misses:u # 0.15% of all branches
(2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs.
gcc/ChangeLog:
PR target/114729
* params.opt (--param=cycle-accurate-model=): New opt.
* doc/invoke.texi (cycle-accurate-model): Document.
* haifa-sched.cc (model_excess_group_cost): Return negative
delta if param_cycle_accurate_model is 0.
(model_excess_cost): Ceil negative baseECC to 0 only if
param_cycle_accurate_model is 1.
Dump the actual ECC value.
* config/riscv/riscv.cc (riscv_option_override): Set param
to 0.
gcc/testsuite/ChangeLog:
PR target/114729
* gcc.target/riscv/riscv.exp: Enable new tests to build.
* gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
gcc/ChangeLog:
* doc/libgdiagnostics/conf.py: Remove "author". Change
"copyright" field to the FSF.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
The AArch64 FEAT_LUT extension is optional from Armv9.2-A and mandatory
from Armv9.5-A. It introduces instructions for lookup table reads with
bit indices.
This patch adds support for AdvSIMD lut intrinsics. The intrinsics for
this extension are implemented as the following builtin functions:
* vluti2{q}_lane{q}_{u8|s8|p8}
* vluti2{q}_lane{q}_{u16|s16|p16|f16|bf16}
* vluti4q_lane{q}_{u8|s8|p8}
* vluti4q_lane{q}_{u16|s16|p16|f16|bf16}_x2
We also introduced a new approach to do lane checks for AdvSIMD.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(aarch64_builtin_signatures): Add binary_lane.
(aarch64_fntype): Handle it.
(simd_types): Add 16-bit x2 types.
(aarch64_pragma_builtins_checker): New class.
(aarch64_general_check_builtin_call): Use it.
(aarch64_expand_pragma_builtin): Add support for lut unspecs.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): Add lut option.
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_BINARY_LANE): Modify to use new ENTRY macro.
(ENTRY_TERNARY_VLUT8): Macro to declare lut intrinsics.
(ENTRY_TERNARY_VLUT16): Macro to declare lut intrinsics.
(REQUIRED_EXTENSIONS): Declare lut intrinsics.
* config/aarch64/aarch64-simd.md
(@aarch64_<vluti_uns_op><VLUT:mode><VB:mode>): Instruction
pattern for luti2 and luti4 intrinsics.
(@aarch64_lutx2<VLUT:mode><VB:mode>): Instruction pattern for
luti4x2 intrinsics.
* config/aarch64/aarch64.h
(TARGET_LUT): lut flag.
* config/aarch64/iterators.md: Iterators and attributes for lut.
* doc/invoke.texi: Document extension in AArch64 Options.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/lut-incorrect-range.c: New test.
* gcc.target/aarch64/simd/lut-no-flag.c: New test.
* gcc.target/aarch64/simd/lut.c: New test.
Co-authored-by: Vladimir Miloserdov <vladimir.miloserdov@arm.com>
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
|
|
gcc/ChangeLog:
* doc/libgdiagnostics/tutorial/01-hello-world.rst: Update linker
command for renaming.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
This patch adds a new compiler pass aimed at identifying naive CRC
implementations, characterized by the presence of a loop calculating
a CRC (polynomial long division). Upon detection of a potential CRC,
the pass prints an informational message.
Performs CRC optimization if the optimization level is >= 2 and
fno_gimple_crc_optimization is not given.
This pass is added for the detection and optimization of naive CRC
implementations, improving the efficiency of CRC-related computations.
This patch includes only the initial fast checks for filtering out non-CRCs;
the verification of detected possible CRCs and the optimization parts will be
provided in subsequent patches.
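For readers unfamiliar with the term, a "naive" CRC loop of the kind
described above might look like the sketch below. It is an assumed example
of a bit-forward CRC-8 step (crc8_naive is a made-up name) and is not taken
from the patch or its tests.

```cpp
#include <cstdint>

// Illustrative bit-by-bit CRC-8 step (polynomial long division over GF(2)).
// crc8_naive is a hypothetical name; poly is given without the leading 1.
std::uint8_t crc8_naive(std::uint8_t crc, std::uint8_t data, std::uint8_t poly)
{
  crc ^= data;
  for (int i = 0; i < 8; ++i)
    {
      if (crc & 0x80)
        // Top bit set: shift it out and subtract (XOR) the polynomial.
        crc = static_cast<std::uint8_t>((crc << 1) ^ poly);
      else
        crc = static_cast<std::uint8_t>(crc << 1);
    }
  return crc;
}
```

The loop shifts one bit per iteration and conditionally XORs the polynomial,
which is the shape such a pass would need to recognize.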
gcc/
* Makefile.in (OBJS): Add gimple-crc-optimization.o.
* common.opt (foptimize-crc): New option.
* common.opt.urls: Regenerate to add foptimize-crc.
* doc/invoke.texi (-foptimize-crc): Add documentation.
* gimple-crc-optimization.cc: New file.
* opts.cc (default_options_table): Add OPT_foptimize_crc.
(enable_fdo_optimizations): Enable optimize_crc.
* passes.def (pass_crc_optimization): Add new pass.
* timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
* tree-pass.h (make_pass_crc_optimization): New extern function
declaration.
|
|
A few targets have been using "unsigned int" function arguments that need to
receive a "location_t". Change to "location_t" to prepare for the
possibility that location_t can be configured to be a different type.
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc (aarch64_resolve_overloaded_builtin):
Change "unsigned int" argument to "location_t".
* config/avr/avr-c.cc (avr_resolve_overloaded_builtin): Likewise.
* config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): Likewise.
* target.def: Likewise.
* doc/tm.texi: Regenerate.
|
|
While we are already in stage3, I wonder if implementing this small paper
wouldn't be useful even for GCC 15, so that we have in the GCC world one
extra year of deprecation of variadic ellipsis without preceding comma.
The paper just deprecates something, I'd hope most of the C++ code in the
wild when it uses variadic functions at all uses the comma before the
ellipsis.
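A small sketch of the two spellings, assuming a made-up function sum_ints;
the deprecated form is shown only in a comment so the example compiles
without the new warning:

```cpp
#include <cstdarg>

// Deprecated by P3176R1 (no comma before the ellipsis):
//   int sum_ints(int count...);
// Preferred spelling, with the comma (sum_ints is a hypothetical example):
int sum_ints(int count, ...)
{
  va_list ap;
  va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}
```

Both spellings declare the same function type; only the one without the
comma is deprecated.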
2024-11-30 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c.opt (Wdeprecated-variadic-comma-omission): New option.
* c.opt.urls: Regenerate.
* c-opts.cc (c_common_post_options): Default to
-Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic.
gcc/cp/
* parser.cc: Implement C++26 P3176R1 - The Oxford variadic comma.
(cp_parser_parameter_declaration_clause): Emit
-Wdeprecated-variadic-comma-omission warnings.
gcc/
* doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document.
gcc/testsuite/
* g++.dg/cpp26/variadic-comma1.C: New test.
* g++.dg/cpp26/variadic-comma2.C: New test.
* g++.dg/cpp26/variadic-comma3.C: New test.
* g++.dg/cpp26/variadic-comma4.C: New test.
* g++.dg/cpp26/variadic-comma5.C: New test.
* g++.dg/cpp1z/fold10.C: Expect a warning for C++26.
* g++.dg/ext/attrib33.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise.
* g++.dg/cpp2a/lambda-generic10.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-const3.C: Likewise.
* g++.dg/cpp0x/variadic164.C: Likewise.
* g++.dg/cpp0x/variadic17.C: Likewise.
* g++.dg/cpp0x/udlit-args-neg.C: Likewise.
* g++.dg/cpp0x/variadic28.C: Likewise.
* g++.dg/cpp0x/gen-attrs-33.C: Likewise.
* g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise.
* g++.old-deja/g++.law/operators15.C: Likewise.
* g++.old-deja/g++.mike/p811.C: Likewise.
* g++.old-deja/g++.mike/p12306.C (printf): Add , before ... .
* g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise.
* g++.dg/cpp0x/vt-65790.C (printf): Likewise.
libstdc++-v3/
* include/std/functional (_Bind_check_arity): Add , before ... .
* include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl):
Likewise.
* include/tr1/type_traits (is_function): Likewise.
|
|
"libdiagnostics" clashes with an existing soname in Debian, as
per:
https://gcc.gnu.org/pipermail/gcc/2024-November/245175.html
Rename it to "libgdiagnostics" for uniqueness.
I am being deliberately vague about what the "g" stands for:
it could be "gnu", "gcc", or "gpl-licensed" as the reader desires.
ChangeLog:
* configure.ac: Rename "libdiagnostics" to "libgdiagnostics".
* configure: Regenerate.
gcc/ChangeLog:
* Makefile.in: Rename "libdiagnostics" to "libgdiagnostics".
* configure.ac: Likewise.
* configure: Regenerate.
* doc/install.texi: Rename "libdiagnostics" to
"libgdiagnostics".
* doc/libdiagnostics/*: Rename to doc/libgdiagnostics, renaming
"libdiagnostics" to "libgdiagnostics" throughout.
* libdiagnostics++.h: Rename to...
* libgdiagnostics++.h: ...this, renaming "libdiagnostics" to
"libgdiagnostics" throughout.
* libdiagnostics.cc: Rename to...
* libgdiagnostics.cc: ...this, renaming "libdiagnostics" to
"libgdiagnostics" throughout.
* libdiagnostics.h: Rename to...
* libgdiagnostics.h: ...this, renaming "libdiagnostics" to
"libgdiagnostics" throughout.
* libdiagnostics.map: Rename to...
* libgdiagnostics.map: ...this, renaming "libdiagnostics" to
"libgdiagnostics" throughout.
* libsarifreplay.cc: Update for renaming of "libdiagnostics"
to "libgdiagnostics".
* libsarifreplay.h: Likewise.
* sarif-replay.cc: Likewise.
gcc/testsuite/ChangeLog:
* libdiagnostics.dg/*: Rename to libgdiagnostics.dg, renaming
"libdiagnostics" to "libgdiagnostics" throughout.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
This patch splits 2-byte and 3-byte shifts after reload into
a 3-operand byte shift and a residual 2-operand shift.
The "2op" shift insn alternatives are no longer needed and are removed because
all shift insns already have a "r,0,n" alternative that does the job.
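The arithmetic behind such a split can be sketched for a 16-bit left shift
as below. This only models the values; shl16_split is a made-up name and
does not reflect the RTL the backend emits.

```cpp
#include <cstdint>

// Illustrative only: shl16_split models the arithmetic of splitting a
// 2-byte shift, not the AVR backend's insns.  Valid for 8 <= n <= 15.
std::uint16_t shl16_split(std::uint16_t x, unsigned n)
{
  // Byte shift: move the low byte into the high byte (x << 8).
  std::uint16_t moved = static_cast<std::uint16_t>(x << 8);
  // Residual shift by the remaining n - 8 bits.
  return static_cast<std::uint16_t>(moved << (n - 8));
}
```

For example, a shift by 11 becomes a byte move followed by a shift by 3.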
PR target/117726
gcc/
* config/avr/avr-passes.cc (avr_shift_is_3op, avr_emit_shift):
Also handle 2-byte and 3-byte shifts.
(avr_split_shift4, avr_split_shift3, avr_split_shift2): New
local helper functions.
(avr_split_shift): Use them.
* config/avr/avr-passes.def (avr_pass_split_after_peephole2):
Adjust comments.
* config/avr/avr.cc (avr_out_ashlpsi3, avr_out_ashrpsi3)
(avr_out_lshrpsi3): Support offset 15.
(ashrhi3_out): Support offset 7 as 3-op.
(ashrsi3_out): Support offset 15.
(avr_rtx_costs_1): Adjust shift costs.
* config/avr/avr.md (2op): Remove attribute value and all such insn
alternatives.
(ashlhi3, *ashlhi3, *ashlhi3_const): Add 3-op alternatives like C2l.
(ashrhi3, *ashrhi3, *ashrhi3_const): Add 3-op alternatives like C2a.
(lshrhi3, *lshrhi3, *lshrhi3_const): Add 3-op alternatives like C2r.
(*ashlpsi3_split, *ashlpsi3): Add 3-op alternatives C15 and C3l.
(*ashrpsi3_split, *ashrpsi3): Add 3-op alternatives C15 and C3r.
(*lshrpsi3_split, *lshrpsi3): Add 3-op alternatives C15 and C3r.
(ashlsi3, *ashlsi3, *ashlsi3_const): Remove "2op" alternative.
(ashrsi3, *ashrsi3, *ashrsi3_const): Same.
(lshrsi3, *lshrsi3, *lshrsi3_const): Same.
(constr_split_suffix): Code attr morphed from constr_split_shift4.
* config/avr/constraints.md (C2a, C2r, C2l)
(C3a, C3r, C3l): New constraints.
* doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Adjust doc.
|
|
If the target is ZBC or ZBKC, the clmul instruction is used for the CRC
calculation. Otherwise, if the target is ZBKB, a table-based CRC is
generated, but the bswap and brev8 instructions are used for reversing the
inputs and the output. Add new tests to check CRC generation for ZBC, ZBKC and
ZBKB targets.
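For reference, the operation a carry-less multiply performs can be modelled
in software as below. This is a plain sketch of the low 64 bits of the
product (clmul_low is a made-up name), not the expander added by the patch.

```cpp
#include <cstdint>

// Software model of a carry-less multiply: partial products are combined
// with XOR instead of addition.  Returns the low 64 bits of the product.
// clmul_low is a hypothetical name used only for this illustration.
std::uint64_t clmul_low(std::uint64_t a, std::uint64_t b)
{
  std::uint64_t result = 0;
  for (int i = 0; i < 64; ++i)
    if ((b >> i) & 1)
      result ^= a << i;
  return result;
}
```

Because XOR replaces addition, this is multiplication of polynomials over
GF(2), which is why it is useful for CRC computation.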
gcc/
* expr.cc (gf2n_poly_long_div_quotient): New function.
* expr.h (gf2n_poly_long_div_quotient): New function declaration.
* hwint.cc (reflect_hwi): New function.
* hwint.h (reflect_hwi): New function declaration.
* config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): New
expander for reversed CRC.
(crc<SUBX1:mode><SUBX:mode>4): New expander for bit-forward CRC.
* config/riscv/iterators.md (SUBX1, ANYI1): New iterators.
* config/riscv/riscv-protos.h (generate_reflecting_code_using_brev):
New function declaration.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.cc (generate_reflecting_code_using_brev): New
function.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV): New unspecs.
* doc/sourcebuild.texi: Document new target selectors.
gcc/testsuite
* lib/target-supports.exp (check_effective_target_riscv_zbc): New
target supports predicate.
(check_effective_target_riscv_zbkb): Likewise.
(check_effective_target_riscv_zbkc): Likewise.
(check_effective_target_zbc_ok): Likewise.
(check_effective_target_zbkb_ok): Likewise.
(check_effective_target_zbkc_ok): Likewise.
(riscv_get_arch): Add zbkb and zbkc support.
* gcc.target/riscv/crc-builtin-zbc32.c: New file.
* gcc.target/riscv/crc-builtin-zbc64.c: Likewise.
Co-author: Jeff Law <jlaw@ventanamicro.com>
|
|
This patch suppresses the Cortex-A53 errata workarounds when any of them is
requested but the specified -march or -mcpu is known not to be affected by
the erratum.
This is a driver only patch as the linker invocation needs to be changed as
well. The linker and cc SPECs are different because for the linker we didn't
seem to add an inversion flag for the option. That said, it's also not possible
to configure the linker with it on by default. So not passing the flag is
sufficient to turn it off.
For the compilers however we have an inversion flag using -mno-, which is needed
to disable the workarounds when the compiler has been configured with it by
default.
In case it's unclear how the patch does what it does (it took me a while to
figure out the syntax):
* Early matching will replace any -march=native or -mcpu=native with their
expanded forms and erases the native arguments from the buffer.
* Due to the above if we ensure we handle the new code after this erasure then
we only have to handle the expanded form.
* The expanded form needs to handle -march=<arch>+extensions and
-mcpu=<cpu>+extensions and so we can't use normal string matching but
instead use strstr with a custom driver function that's common between
native and non-native builds.
* For the compilers we output -mno-<workaround> and for the linker we just
erase the --fix-<workaround> option.
* The extra internal matching, e.g. the duplicate match of mcpu inside:
mcpu=*:%{%:is_local_not_armv8_base(%{mcpu=*:%*}) is so we can extract the glob
using %* because the outer match would otherwise reset at the %{. The reason
for the outer glob at all is to skip the block early if no matches are found.
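To see why exact string comparison is not enough for the expanded forms, a
simplified sketch is given below. names_armv8_base is a hypothetical helper
that only handles -march values; it is not the driver function added by this
patch.

```cpp
#include <cstring>

// Hypothetical helper, not the real driver function: reports whether an
// expanded -march value names the Armv8-A base architecture, with or
// without +extensions appended.
bool names_armv8_base(const char *march_value)
{
  // strstr finds "armv8-a" inside "armv8-a+crc" but not in "armv8.1-a".
  return std::strstr(march_value, "armv8-a") != nullptr;
}
```

A comparison against the literal "armv8-a" would miss "armv8-a+crc", which
is the case substring matching is meant to cover.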
The workaround has the effect of suppressing certain inlining and multiply-add
formation which leads to about ~1% SPECCPU 2017 Intrate regression on modern
cores. This patch is needed because most distros configure GCC with the
workaround enabled by default.
Expected output:
> gcc -mcpu=neoverse-v1 -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
0
> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
5
> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
5
> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
0
> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
0
> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
1
> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
1
gcc/ChangeLog:
* config/aarch64/aarch64-errata.h (TARGET_SUPPRESS_OPT_SPEC,
TARGET_TURN_OFF_OPT_SPEC, CA53_ERR_835769_COMPILE_SPEC,
CA53_ERR_843419_COMPILE_SPEC): New.
(CA53_ERR_835769_SPEC, CA53_ERR_843419_SPEC): Use them.
* config/aarch64/aarch64-elf-raw.h (CC1_SPEC, CC1PLUS_SPEC): Add
AARCH64_ERRATA_COMPILE_SPEC.
* config/aarch64/aarch64-freebsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-gnu.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-linux.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-netbsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* common/config/aarch64/aarch64-common.cc
(is_host_cpu_not_armv8_base): New.
* config/aarch64/driver-aarch64.cc: Remove extra newline.
* config/aarch64/aarch64.h (is_host_cpu_not_armv8_base): New.
(MCPU_TO_MARCH_SPEC_FUNCTIONS): Add is_local_not_armv8_base.
(EXTRA_SPEC_FUNCTIONS): Add is_local_cpu_armv8_base.
* doc/invoke.texi: Document it.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/info_30: New test.
* gcc.target/aarch64/cpunative/info_31: New test.
* gcc.target/aarch64/cpunative/info_32: New test.
* gcc.target/aarch64/cpunative/info_33: New test.
* gcc.target/aarch64/cpunative/native_cpu_30.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_31.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_32.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_33.c: New test.
* gcc.target/aarch64/erratas_opt_0.c: New test.
* gcc.target/aarch64/erratas_opt_1.c: New test.
* gcc.target/aarch64/erratas_opt_10.c: New test.
* gcc.target/aarch64/erratas_opt_11.c: New test.
* gcc.target/aarch64/erratas_opt_12.c: New test.
* gcc.target/aarch64/erratas_opt_13.c: New test.
* gcc.target/aarch64/erratas_opt_14.c: New test.
* gcc.target/aarch64/erratas_opt_15.c: New test.
* gcc.target/aarch64/erratas_opt_2.c: New test.
* gcc.target/aarch64/erratas_opt_3.c: New test.
* gcc.target/aarch64/erratas_opt_4.c: New test.
* gcc.target/aarch64/erratas_opt_5.c: New test.
* gcc.target/aarch64/erratas_opt_6.c: New test.
* gcc.target/aarch64/erratas_opt_7.c: New test.
* gcc.target/aarch64/erratas_opt_8.c: New test.
* gcc.target/aarch64/erratas_opt_9.c: New test.
|
|
This patch adds support for the following intrinsics:
- svdot[_f32_mf8]_fpm
- svdot_lane[_f32_mf8]_fpm
- svdot[_f16_mf8]_fpm
- svdot_lane[_f16_mf8]_fpm
The first two are available under a combination of the FP8DOT4 and SVE2 features.
Alternatively under the SSVE_FP8DOT4 feature under streaming mode.
The final two are available under a combination of the FP8DOT2 and SVE2 features.
Alternatively under the SSVE_FP8DOT2 feature under streaming mode.
gcc/
* config/aarch64/aarch64-option-extensions.def
(fp8dot4, ssve-fp8dot4): Add new extensions.
(fp8dot2, ssve-fp8dot2): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl): Support fp8.
(svdotprod_lane_impl): Likewise.
(svdot_lane): Provide an unspec for fp8 types.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(ternary_mfloat8_def): Add new class.
(ternary_mfloat8): Add new shape.
(ternary_mfloat8_lane_group_selection_def): Add new class.
(ternary_mfloat8_lane_group_selection): Add new shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(ternary_mfloat8, ternary_mfloat8_lane_group_selection): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svdot, svdot_lane): Add new DEF_SVE_FUNCTION_GS_FPM, twice to deal
with the combination of features providing support for 32 and 16 bit
floating point.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_dot<mode>): Add new.
(@aarch64_sve_dot_lane<mode>): Likewise.
* config/aarch64/aarch64.h:
(TARGET_FP8DOT4, TARGET_SSVE_FP8DOT4): Add new defines.
(TARGET_FP8DOT2, TARGET_SSVE_FP8DOT2): Likewise.
* config/aarch64/iterators.md
(UNSPEC_DOT_FP8, UNSPEC_DOT_LANE_FP8): Add new unspecs.
* doc/invoke.texi: Document fp8dot4, fp8dot2, ssve-fp8dot4, ssve-fp8dot2
extensions.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: Add new.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Likewise.
* lib/target-supports.exp: Add dg-require-effective-target support for
aarch64_asm_fp8dot2_ok, aarch64_asm_fp8dot4_ok,
aarch64_asm_ssve-fp8dot2_ok and aarch64_asm_ssve-fp8dot4_ok.
|
|
This patch adds support for the following intrinsics:
- svmlalb[_f16_mf8]_fpm
- svmlalb[_n_f16_mf8]_fpm
- svmlalt[_f16_mf8]_fpm
- svmlalt[_n_f16_mf8]_fpm
- svmlalb_lane[_f16_mf8]_fpm
- svmlalt_lane[_f16_mf8]_fpm
- svmlallbb[_f32_mf8]_fpm
- svmlallbb[_n_f32_mf8]_fpm
- svmlallbt[_f32_mf8]_fpm
- svmlallbt[_n_f32_mf8]_fpm
- svmlalltb[_f32_mf8]_fpm
- svmlalltb[_n_f32_mf8]_fpm
- svmlalltt[_f32_mf8]_fpm
- svmlalltt[_n_f32_mf8]_fpm
- svmlallbb_lane[_f32_mf8]_fpm
- svmlallbt_lane[_f32_mf8]_fpm
- svmlalltb_lane[_f32_mf8]_fpm
- svmlalltt_lane[_f32_mf8]_fpm
These are available under a combination of the FP8FMA and SVE2 features.
Alternatively under the SSVE_FP8FMA feature under streaming mode.
gcc/
* config/aarch64/aarch64-option-extensions.def
(fp8fma, ssve-fp8fma): Add new options.
* config/aarch64/aarch64-sve-builtins-functions.h
(unspec_based_function_base): Add unspec_for_mfp8.
(unspec_for): Return unspec_for_mfp8 on fpm-using cases.
(sme_1mode_function): Fix call to parent ctor.
(sme_2mode_function_t): Likewise.
(unspec_based_mla_function, unspec_based_mla_lane_function): Handle
fpm-using cases.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Treat M as TYPE_SUFFIX_mf8
(ternary_mfloat8_lane_def): Add new class.
(ternary_mfloat8_opt_n_def): Likewise.
(ternary_mfloat8_lane): Add new shape.
(ternary_mfloat8_opt_n): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.h
(ternary_mfloat8_lane, ternary_mfloat8_opt_n): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svmlalb_lane, svmlalb, svmlalt_lane, svmlalt): Update definitions
with mfloat8_t unspec in ctor.
(svmlallbb_lane, svmlallbb, svmlallbt_lane, svmlallbt, svmlalltb_lane,
svmlalltb, svmlalltt_lane, svmlalltt, svmlal_impl): Add new FUNCTIONs.
(svqrshr, svqrshrn, svqrshru, svqrshrun): Update definitions with
nop mfloat8 unspec in ctor.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svmlalb, svmlalt, svmlalb_lane, svmlalt_lane, svmlallbb, svmlallbt,
svmlalltb, svmlalltt, svmlalltt_lane, svmlallbb_lane, svmlallbt_lane,
svmlalltb_lane): Add new DEF_SVE_FUNCTION_GS_FPMs.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svmlallbb_lane, svmlallbb, svmlallbt_lane, svmlallbt, svmlalltb_lane,
svmlalltb, svmlalltt_lane, svmlalltt): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_h_float_mf8, TYPES_s_float_mf8): Add new types.
(h_float_mf8, s_float_mf8): Add new SVE_TYPES_ARRAY.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_add_<sve2_fp8_fma_op_vnx8hf><mode>): Add new.
(@aarch64_sve_add_<sve2_fp8_fma_op_vnx4sf><mode>): Add new.
(@aarch64_sve_add_lane_<sve2_fp8_fma_op_vnx8hf><mode>): Likewise.
(@aarch64_sve_add_lane_<sve2_fp8_fma_op_vnx4sf><mode>): Likewise.
* config/aarch64/aarch64.h
(TARGET_FP8FMA, TARGET_SSVE_FP8FMA): Likewise.
* config/aarch64/iterators.md
(VNx8HF_ONLY): Add new.
(UNSPEC_FMLALB_FP8, UNSPEC_FMLALLBB_FP8, UNSPEC_FMLALLBT_FP8,
UNSPEC_FMLALLTB_FP8, UNSPEC_FMLALLTT_FP8, UNSPEC_FMLALT_FP8): Likewise.
(SVE2_FP8_TERNARY_VNX8HF, SVE2_FP8_TERNARY_VNX4SF): Likewise.
(SVE2_FP8_TERNARY_LANE_VNX8HF, SVE2_FP8_TERNARY_LANE_VNX4SF): Likewise.
(sve2_fp8_fma_op_vnx8hf, sve2_fp8_fma_op_vnx4sf): Likewise.
* doc/invoke.texi: Document fp8fma and ssve-fp8fma extensions.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
(TEST_DUAL_Z_REV, TEST_DUAL_LANE_REG, TEST_DUAL_ZD): Add fpm0 argument.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_opt_n_1.c: Add
new shape test.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Add new test.
* gcc.target/aarch64/sve2/acle/asm/mlalb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_mf8.c: Likewise.
* lib/target-supports.exp: Add check_effective_target for fp8fma and
ssve-fp8fma
|
|
The r15-4833-ge9ab41b79933 patch had, among tons of config/i386
specific changes, also an important change to the generic code, allowing
2 as a valid value of the second argument of __builtin_prefetch:
- /* Argument 1 must be either zero or one. */
- if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
+ /* Argument 1 must be 0, 1 or 2. */
+ if (INTVAL (op1) < 0 || INTVAL (op1) > 2)
But the patch failed to document that change in __builtin_prefetch
documentation, and more importantly didn't adjust any of the other
backends to deal with it (my understanding is the expected behavior
is that 2 will be silently handled as 0 unless backends have some
more specific way). Some of the backends would ICE on it, in some
cases gcc_assert failures/gcc_unreachable, in other cases crash later
(e.g. accessing arrays with that value as index and due to accessing
garbage after the array crashing at final.cc time), others treated 2
silently as 0, others treated 2 silently as 1.
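The expected behavior described above can be sketched as two tiny helpers.
Both names are hypothetical and the code only illustrates the intent; it is
not taken from any backend.

```cpp
// Hypothetical helpers illustrating the intended handling of the second
// argument of __builtin_prefetch; not code from GCC itself.

// Generic check: 0 (read), 1 (write) and 2 (read-shared) are accepted.
bool prefetch_rw_in_range(int rw)
{
  return rw >= 0 && rw <= 2;
}

// A backend without a more specific way to handle 2 treats it like 0.
int backend_rw(int rw)
{
  return rw == 2 ? 0 : rw;
}
```

A backend that indexes an array with the raw value would instead read past a
two-element table, which matches the crashes described above.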
And even in the i386 backend there were bugs which caused ICEs.
The patch added some if (write == 0) and write 2 handling into the
if (write == 1) body (badly indented, maybe that is the reason)
rather than into the else side, so the condition would always be false.
The new *prefetch_rst2 define_insn only accepts parameters 2 1
(i.e. read-shared with moderate degree of locality), so in order
not to ICE the patch uses it only for __builtin_prefetch (ptr, 2, 1);
or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values
of the parameter. If that isn't what we want and we want it to be used
also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and
corresponding __builtin_ia32_prefetch, maybe the define_insn could match
other values.
And there was another problem that -mno-mmx -mno-sse -mmovrs compilation
would ICE on most of the prefetches, so I had to add the FAIL; cases.
2024-11-29 Jakub Jelinek <jakub@redhat.com>
PR target/117608
* doc/extend.texi (__builtin_prefetch): Document that second
argument may be also 2 and its meaning.
* config/i386/i386.md (prefetch): Remove unreachable code.
Clear write (set operands[1] to const0_rtx) if !TARGET_MOVRS or
if locality is not 1. Formatting fixes.
* config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE.
Call gen_prefetch even for TARGET_MOVRS.
* config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0.
* config/mips/mips.md (prefetch): Likewise.
* config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise.
* config/riscv/riscv.md (prefetch): Likewise.
* config/loongarch/loongarch.md (prefetch): Likewise.
* config/sparc/sparc.md (prefetch): Likewise. Use IN_RANGE.
* config/ia64/ia64.md (prefetch): Likewise.
* config/pa/pa.md (prefetch): Likewise.
* config/aarch64/aarch64.md (prefetch): Likewise.
* config/rs6000/rs6000.md (prefetch): Likewise.
* gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument
2.
* gcc.target/i386/pr117608-1.c: New test.
* gcc.target/i386/pr117608-2.c: New test.
|
|
This patch introduces new built-in functions to GCC for computing
bit-forward and bit-reversed CRCs.
These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by the
presence of a CRC optab),
the builtins will utilize the expander to generate CRC code.
In the absence of hardware support, the builtins default to generating code
for a table-based CRC calculation.
The built-ins are defined as follows:
__builtin_rev_crc16_data8,
__builtin_rev_crc32_data8, __builtin_rev_crc32_data16,
__builtin_rev_crc32_data32,
__builtin_rev_crc64_data8, __builtin_rev_crc64_data16,
__builtin_rev_crc64_data32, __builtin_rev_crc64_data64,
__builtin_crc8_data8,
__builtin_crc16_data16, __builtin_crc16_data8,
__builtin_crc32_data8, __builtin_crc32_data16, __builtin_crc32_data32,
__builtin_crc64_data8, __builtin_crc64_data16, __builtin_crc64_data32,
__builtin_crc64_data64
Each built-in takes three parameters:
crc: The initial CRC value.
data: The data to be processed.
polynomial: The CRC polynomial without the leading 1.
To validate the correctness of these built-ins, this patch also includes
additions to the GCC testsuite.
This enhancement allows GCC to offer developers high-performance CRC
computation options
that automatically adapt to the capabilities of the target hardware.
gcc/
* builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define.
(BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise.
(BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
(BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise.
(BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise.
(BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise.
(BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise.
(BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise.
(BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise.
(BT_FN_UINT64_UINT64_UINT64_CONST_SIZE): Likewise.
* builtins.cc (associated_internal_fn): Handle CRC related builtins.
(expand_builtin_crc_table_based): New function.
(expand_builtin): Handle CRC related builtins.
* builtins.def (BUILT_IN_CRC8_DATA8): New builtin.
(BUILT_IN_CRC16_DATA8): Likewise.
(BUILT_IN_CRC16_DATA16): Likewise.
(BUILT_IN_CRC32_DATA8): Likewise.
(BUILT_IN_CRC32_DATA16): Likewise.
(BUILT_IN_CRC32_DATA32): Likewise.
(BUILT_IN_CRC64_DATA8): Likewise.
(BUILT_IN_CRC64_DATA16): Likewise.
(BUILT_IN_CRC64_DATA32): Likewise.
(BUILT_IN_CRC64_DATA64): Likewise.
(BUILT_IN_REV_CRC8_DATA8): New builtin.
(BUILT_IN_REV_CRC16_DATA8): Likewise.
(BUILT_IN_REV_CRC16_DATA16): Likewise.
(BUILT_IN_REV_CRC32_DATA8): Likewise.
(BUILT_IN_REV_CRC32_DATA16): Likewise.
(BUILT_IN_REV_CRC32_DATA32): Likewise.
(BUILT_IN_REV_CRC64_DATA8): Likewise.
(BUILT_IN_REV_CRC64_DATA16): Likewise.
(BUILT_IN_REV_CRC64_DATA32): Likewise.
(BUILT_IN_REV_CRC64_DATA64): Likewise.
* builtins.h (expand_builtin_crc_table_based): New function
declaration.
* doc/extend.texi: Add documentation for new CRC builtins.
gcc/testsuite/
* gcc.dg/crc-builtin-rev-target32.c: New test.
* gcc.dg/crc-builtin-rev-target64.c: New test.
* gcc.dg/crc-builtin-target32.c: New test.
* gcc.dg/crc-builtin-target64.c: New test.
Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster
CRC generation.
One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.
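As a rough sketch of the two techniques named above (a 256-entry table of
precomputed CRCs, and reflection built from SHIFT, AND and OR), the
following C code computes a bit-forward CRC-16 by table lookup and reverses
the bits of a byte.  All function and variable names are hypothetical and
this is not the code the expander generates; it only mirrors the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not GCC's generated code.  */
static uint16_t crc16_table[256];

/* Fill the 256-entry table: entry I is the bit-forward CRC of byte I.
   POLY is given without the leading 1.  */
void
crc16_fill_table (uint16_t poly)
{
  for (unsigned i = 0; i < 256; i++)
    {
      uint16_t crc = (uint16_t) (i << 8);
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x8000) ? (uint16_t) ((crc << 1) ^ poly)
                             : (uint16_t) (crc << 1);
      crc16_table[i] = crc;
    }
}

/* Bit-forward table-based update: one table lookup per data byte.  */
uint16_t
crc16_table_based (uint16_t crc, const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    crc = (uint16_t) ((crc << 8) ^ crc16_table[(crc >> 8) ^ buf[i]]);
  return crc;
}

/* Reflect (bit-reverse) an 8-bit value using only SHIFT, AND and OR:
   swap nibbles, then bit pairs, then adjacent bits.  */
uint8_t
reflect8 (uint8_t x)
{
  x = (uint8_t) (((x & 0xF0) >> 4) | ((x & 0x0F) << 4));
  x = (uint8_t) (((x & 0xCC) >> 2) | ((x & 0x33) << 2));
  x = (uint8_t) (((x & 0xAA) >> 1) | ((x & 0x55) << 1));
  return x;
}
```

A bit-reversed CRC can then be formed by reflecting the inputs, running the
bit-forward computation, and reflecting the result, which is why the
reflection helper matters for the reversed variant.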
gcc/
* doc/md.texi (crc@var{m}@var{n}4, crc_rev@var{m}@var{n}4): Document.
* expr.cc (calculate_crc): New function.
(assemble_crc_table): Likewise.
(generate_crc_table): Likewise.
(calculate_table_based_CRC): Likewise.
(expand_crc_table_based): Likewise.
(gen_common_operation_to_reflect): Likewise.
(reflect_64_bit_value): Likewise.
(reflect_32_bit_value): Likewise.
(reflect_16_bit_value): Likewise.
(reflect_8_bit_value): Likewise.
(generate_reflecting_code_standard): Likewise.
(expand_reversed_crc_table_based): Likewise.
* expr.h (generate_reflecting_code_standard): New function declaration.
(expand_crc_table_based): Likewise.
(expand_reversed_crc_table_based): Likewise.
* internal-fn.cc (crc_direct): Define.
(direct_crc_optab_supported_p): Likewise.
(expand_crc_optab_fn): New function.
* internal-fn.def (CRC, CRC_REV): New internal functions.
* optabs.def (crc_optab, crc_rev_optab): New optabs.
Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
The PR14311 commit which added support for __sync_* builtins documented that
there is a warning if a particular operation cannot be implemented.
But neither that commit nor anything later implemented such a warning; the
compiler has always silently generated the mentioned calls (which in most
cases results in linker errors, of course, because those functions aren't
implemented anywhere, in libatomic or elsewhere in code shipped in gcc).
So, the following patch just adjusts the documentation to match the
implementation.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR target/117642
* doc/extend.texi: Remove documentation of warning for unimplemented
__sync_* operations, such warning has never been implemented.
|
|
As mentioned in an earlier thread, C2Y voted in a change which made
various library APIs callable with NULL arguments in certain cases,
e.g.
memcpy (NULL, NULL, 0);
is now valid, although
memcpy (NULL, NULL, 1);
remains invalid. This affects various APIs, including several of
GCC builtins; plus on the C library side those APIs are often declared
with nonnull attribute(s) as well.
Florian suggested using the access attribute for this, but our docs
explicitly say that access attribute doesn't imply nonnull and it doesn't
cover e.g. the qsort case where the comparison function pointer may be
also NULL if nmemb is 0, but must be non-NULL otherwise.
As this case affects 21 APIs in C standard and I think is going to affect
various wrappers around those in various packages as well, I think it
is a common thing that should have its own attribute, because we should
still warn when people use
qsort (NULL, 1, 1, NULL);
etc., and similarly want to have -fsanitize=null instrumentation for those.
So, the following patch introduces nonnull_if_nonzero attribute (or would
you prefer cond_nonnull or some other name?), which has always 2 arguments,
argument index of a pointer argument (like one argument nonnull) and
argument index of an associated integral argument. If that argument is
non-zero, it is UB to pass NULL to the pointer argument, if that argument
is zero, it is valid. And changes various spots which already handled the
nonnull attribute to handle this one as well, with sometimes using the
ranger (or for -fsanitize=nonnull explicitly checking the associated
argument value, so instead of if (!ptr) __ubsan_... (...); it will
now do if (!ptr && sz) __ubsan_... (...);).
I've so far omitted changing gimple_infer_range (am not 100% sure how I can
use the ranger inside of the ranger) and changing the analyzer to handle it.
And I haven't changed builtins.def etc. to make use of that attribute
instead of nonnull where appropriate.
I'd then follow with the builtins.def changes (and eventually glibc
etc. would need to be adjusted too).
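A minimal sketch of how a declaration might use the attribute, assuming a
compiler that may or may not know it yet.  The macro NONNULL_IF_NONZERO and
the function copy_bytes are hypothetical names made up for this example;
only the attribute name and its two index arguments come from the
description above.

```c
#include <assert.h>
#include <stddef.h>

/* Use the attribute only where the compiler reports support for it.  */
#if defined (__has_attribute)
# if __has_attribute (nonnull_if_nonzero)
#  define NONNULL_IF_NONZERO(ptr_idx, size_idx) \
     __attribute__ ((nonnull_if_nonzero (ptr_idx, size_idx)))
# endif
#endif
#ifndef NONNULL_IF_NONZERO
# define NONNULL_IF_NONZERO(ptr_idx, size_idx)
#endif

/* Hypothetical memcpy-like wrapper: DST (argument 1) and SRC (argument 2)
   may be NULL only when N (argument 3) is zero.  */
void *copy_bytes (void *dst, const void *src, size_t n)
  NONNULL_IF_NONZERO (1, 3) NONNULL_IF_NONZERO (2, 3);

void *
copy_bytes (void *dst, const void *src, size_t n)
{
  /* Same shape as the sanitizer check described above: a NULL pointer
     is only a problem when the associated size is non-zero.  */
  assert (n == 0 || (dst != NULL && src != NULL));
  unsigned char *d = dst;
  const unsigned char *s = src;
  for (size_t i = 0; i < n; i++)
    d[i] = s[i];
  return dst;
}
```

With such a declaration, copy_bytes (NULL, NULL, 0) is a valid call, while
copy_bytes (NULL, NULL, 1) is the kind of call the warning and the
sanitizer instrumentation are meant to catch.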
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c/117023
gcc/
* gimple.h (infer_nonnull_range_by_attribute): Add a tree *
argument defaulted to NULL.
* gimple.cc (infer_nonnull_range_by_attribute): Add op2 argument.
Handle also nonnull_if_nonzero attributes.
* tree.cc (get_nonnull_args): Fix comment typo.
* builtins.cc (validate_arglist): Handle nonnull_if_nonzero attribute.
* tree-ssa-ccp.cc (pass_post_ipa_warn::execute): Handle
nonnull_if_nonzero attributes.
* ubsan.cc (instrument_nonnull_arg): Adjust
infer_nonnull_range_by_attribute caller. If it returned true and
filled in non-NULL arg2, check that arg2 is non-zero as another
condition next to checking that arg is zero.
* doc/extend.texi (nonnull_if_nonzero): Document new attribute.
gcc/c-family/
* c-attribs.cc (handle_nonnull_if_nonzero_attribute): New
function.
(c_common_gnu_attributes): Add nonnull_if_nonzero attribute.
(handle_nonnull_attribute): Fix comment typo.
* c-common.cc (struct nonnull_arg_ctx): Add other member.
(check_function_nonnull): Also check nonnull_if_nonzero attributes.
(check_nonnull_arg): Use different warning wording if pctx->other
is non-zero.
(check_function_arguments): Initialize ctx.other.
gcc/testsuite/
* gcc.dg/nonnull-8.c: New test.
* gcc.dg/nonnull-9.c: New test.
* gcc.dg/nonnull-10.c: New test.
* c-c++-common/ubsan/nonnull-6.c: New test.
* c-c++-common/ubsan/nonnull-7.c: New test.
|
|
The following patch adds a "redzone" clobber (recognized everywhere,
even on targets which don't do anything with it),
with which one can mark the rare case where inline asm pushes
something on the stack or uses call instruction without taking
red zone into account (i.e. addq $-128, %rsp; and addq $128, %rsp
around that).
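A hedged sketch of what such an inline asm could look like.  The version
guard is an assumption (it presumes the clobber is available in GCC 15 and
later on x86-64), and roundtrip_through_stack is a hypothetical name; on
any other compiler or target the function falls back to plain C so that the
example still builds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the "redzone" clobber is accepted by GCC 15 and later.  */
#if defined (__x86_64__) && defined (__GNUC__) && !defined (__clang__) \
    && __GNUC__ >= 15
# define HAVE_REDZONE_CLOBBER 1
#else
# define HAVE_REDZONE_CLOBBER 0
#endif

/* Hypothetical example: push a value and pop it back.  The push writes
   below the stack pointer, which would overwrite anything the compiler
   keeps in the red zone, so the asm declares the "redzone" clobber.  */
uint64_t
roundtrip_through_stack (uint64_t in)
{
#if HAVE_REDZONE_CLOBBER
  uint64_t out;
  __asm__ volatile ("pushq %1\n\tpopq %0"
                    : "=r" (out)
                    : "r" (in)
                    : "redzone");
  assert (out == in);
  return out;
#else
  /* Fallback without inline asm for other compilers and targets.  */
  return in;
#endif
}
```

Without the clobber, the alternative is the manual adjustment mentioned
above: move the stack pointer past the 128-byte red zone before the push
and restore it afterwards.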
2024-11-28 Jakub Jelinek <jakub@redhat.com>
gcc/
* target.def (redzone_clobber): New target hook.
* varasm.cc (decode_reg_name_and_count): Return -5 for
"redzone".
* cfgexpand.cc (expand_asm_stmt): Handle redzone clobber.
* config/i386/i386.h (struct machine_function): Add
asm_redzone_clobber_seen member.
* config/i386/i386.cc (ix86_compute_frame_layout): Don't
use red zone if cfun->machine->asm_redzone_clobber_seen.
(ix86_redzone_clobber): New function.
(TARGET_REDZONE_CLOBBER): Redefine.
* doc/extend.texi (Clobbers and Scratch Registers): Document
the "redzone" clobber.
* doc/tm.texi.in: Add @hook TARGET_REDZONE_CLOBBER.
* doc/tm.texi: Regenerate.
gcc/testsuite/
* gcc.dg/asm-redzone-1.c: New test.
* gcc.target/i386/asm-redzone-1.c: New test.
|
|
As discussed earlier, we currently clear padding bits even when we
don't have to and that causes pessimization of emitted code,
e.g. for
union U { int a; long b[64]; };
void bar (union U *);
void
foo (void)
{
union U u = { 0 };
bar (&u);
}
we need to clear just u.a, not the whole union, but on the other side
in cases where the standard requires padding bits to be zeroed, like for
C23 {} initializers of aggregates with padding bits, or for C++11 zero
initialization we don't do that.
This patch
a) moves some of the stuff into complete_ctor_at_level_p (but not
all the *p_complete = 0; case, for that it would need to change
so that it passes around the ctor rather than just its type) and
changes the handling of unions
b) introduces a new option, so that users can either get the new
behavior (only what is guaranteed by the standards, the default),
or previous behavior (union padding zero initialization, no such
guarantees in structures) or also a guarantee in structures
c) introduces a new CONSTRUCTOR flag which says that the padding bits
(if any) should be zero initialized (and sets it for now in the C
FE for C23 {} initializers).
Am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed
for C23, if there is just empty initializer, I think we already mark
it as incomplete if there are any missing initializers. Maybe with
some designated initializer games, say
void foo () {
struct S { char a; long long b; };
struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 };
...
}
Is this supposed to initialize padding bits in C23 and then the .c.a = 1
and .c.b = 2 stores preserve those padding bits, so is that supposed
to be different from struct T t2 = { .c = { 1, 2 } };
? What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ?
And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost
on where exactly is zero-initialization done (vs. other types of
initialization) and where is e.g. zero-initialization of a temporary then
(member-wise) copied.
Say
struct S { char a; long long b; };
struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; };
void bar (T *);
void
foo ()
{
T t (1, 2);
bar (&t);
}
Is the c () value-initialization of t.c followed by c.a and c.b updates
which preserve the zero initialized padding bits? Or is there some
copy construction involved which does member-wise copying and makes the
padding bits undefined?
Looking at (older) clang++ with -O2, it initializes also the padding bits
when c () is used and doesn't with c {}.
For GCC, note that there is that optimization from Alex to zero padding bits
for optimization purposes for small aggregates, so either one needs to look
at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't
optimized that way.
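To make concrete which bytes are under discussion, this small C sketch
measures the padding in the struct S used in the examples above.  The
helper names are hypothetical and the amounts depend on the target ABI;
the code only reports where padding exists, it does not show what value
the padding holds after initialization.

```c
#include <assert.h>
#include <stddef.h>

struct S { char a; long long b; };

/* Bytes of padding between S.a and S.b on this target.  */
size_t
padding_after_a (void)
{
  return offsetof (struct S, b) - (offsetof (struct S, a) + sizeof (char));
}

/* Total bytes of S that belong to no member.  */
size_t
total_padding (void)
{
  assert (sizeof (struct S) >= sizeof (char) + sizeof (long long));
  return sizeof (struct S) - sizeof (char) - sizeof (long long);
}
```

On a typical LP64 target both functions return 7: those are the bytes
whose zeroing the new option and the CONSTRUCTOR flag control.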
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c++/116416
gcc/
* flag-types.h (enum zero_init_padding_bits_kind): New type.
* tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
* common.opt (fzero-init-padding-bits=): New option.
* expr.cc (categorize_ctor_elements_1): Handle
CONSTRUCTOR_ZERO_PADDING_BITS or
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL. Fix
up *p_complete = -1; setting for unions.
(complete_ctor_at_level_p): Handle unions differently for
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
* gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
handling, return also true for UNION_TYPE with no FIELD_DECLs
and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
* doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document.
gcc/c/
* c-parser.cc (c_parser_braced_init): Set CONSTRUCTOR_ZERO_PADDING_BITS
for flag_isoc23 empty initializers.
* c-typeck.cc (constructor_zero_padding_bits): New variable.
(struct constructor_stack): Add zero_padding_bits member.
(really_start_incremental_init): Save and clear
constructor_zero_padding_bits.
(push_init_level): Save constructor_zero_padding_bits. Or into it
CONSTRUCTOR_ZERO_PADDING_BITS from previous value if implicit.
(pop_init_level): Set CONSTRUCTOR_ZERO_PADDING_BITS if
constructor_zero_padding_bits and restore
constructor_zero_padding_bits.
gcc/testsuite/
* gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect
diagnostics.
* gcc.dg/c23-empty-init-5.c: New test.
* gcc.dg/gnu11-empty-init-1.c: New test.
* gcc.dg/gnu11-empty-init-2.c: New test.
* gcc.dg/gnu11-empty-init-3.c: New test.
* gcc.dg/gnu11-empty-init-4.c: New test.
|
|
This is another recent GCC extension whose use is apparently
difficult to spot in code reviews.
The name of the option is due to Jonathan Wakely. Part of it
could apply to C++ as well (for labels at the end of a compound
statement).
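A short sketch of the two constructs the new warning is about, a label
placed directly on a declaration and a label at the end of a compound
statement.  The function count_down is a hypothetical example; it assumes
a compiler that accepts free positioning of labels (C23, or as an
extension in earlier modes), and older compilers may reject it.

```c
#include <assert.h>

/* Hypothetical example: count from 0 up to N using only gotos.  */
int
count_down (int n)
{
  assert (n >= 0);
  int steps = 0;
  {
  again:
    int remaining = n - steps;  /* Label directly on a declaration.  */
    if (remaining <= 0)
      goto done;
    steps++;
    goto again;
  done:                         /* Label at end of compound statement.  */
  }
  return steps;
}
```

Both label positions are easy to miss in review because the code looks
like ordinary pre-C23 C with an empty statement or a statement label
omitted.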
gcc/c-family/
* c-opts.cc (c_common_post_options): Initialize
warn_free_labels.
* c.opt (Wfree-labels): New option.
* c.opt.urls: Regenerate.
gcc/c/
* c-parser.cc (c_parser_compound_statement_nostart): Use
OPT_Wfree_labels for warning about labels on declarations.
(c_parser_compound_statement_nostart): Use OPT_Wfree_labels
for warning about labels at end of compound statements.
gcc/
* doc/invoke.texi: Document -Wfree-labels.
gcc/testsuite/
* gcc.dg/Wfree-labels-1.c: New test.
* gcc.dg/Wfree-labels-2.c: New test.
* gcc.dg/Wfree-labels-3.c: New test.
|
|
nios2 target support in GCC was deprecated in GCC 14 as the
architecture has been EOL'ed by the vendor. This patch removes the
entire port for GCC 15.
There are still references to "nios2" in libffi and libgo. Since those
libraries are imported into the gcc sources from master copies maintained
by other projects, those will need to be addressed elsewhere.
ChangeLog:
* MAINTAINERS: Remove references to nios2.
* configure.ac: Likewise.
* configure: Regenerated.
config/ChangeLog:
* mt-nios2-elf: Deleted.
contrib/ChangeLog:
* config-list.mk: Remove references to Nios II.
gcc/ChangeLog:
* common/config/nios2/*: Delete entire directory.
* config/nios2/*: Delete entire directory.
* config.gcc: Remove references to nios2.
* configure.ac: Likewise.
* doc/extend.texi: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Likewise.
* doc/md.texi: Likewise.
* regenerate-opt-urls.py: Likewise.
* config.in: Regenerated.
* configure: Regenerated.
gcc/testsuite/ChangeLog:
* g++.target/nios2/*: Delete entire directory.
* gcc.target/nios2/*: Delete entire directory.
* g++.dg/cpp0x/constexpr-rom.C: Remove references to nios2.
* g++.old-deja/g++.jason/thunk3.C: Likewise.
* gcc.c-torture/execute/20101011-1.c: Likewise.
* gcc.c-torture/execute/pr47237.c: Likewise.
* gcc.dg/20020312-2.c: Likewise.
* gcc.dg/20021029-1.c: Likewise.
* gcc.dg/debug/btf/btf-datasec-1.c: Likewise.
* gcc.dg/ifcvt-4.c: Likewise.
* gcc.dg/stack-usage-1.c: Likewise.
* gcc.dg/struct-by-value-1.c: Likewise.
* gcc.dg/tree-ssa/reassoc-33.c: Likewise.
* gcc.dg/tree-ssa/reassoc-34.c: Likewise.
* gcc.dg/tree-ssa/reassoc-35.c: Likewise.
* gcc.dg/tree-ssa/reassoc-36.c: Likewise.
* lib/target-supports.exp: Likewise.
libgcc/ChangeLog:
* config/nios2/*: Delete entire directory.
* config.host: Remove references to nios2.
* unwind-dw2-fde-dip.c: Likewise.
|
|
AddressSanitizer has supported dynamic shadow offsets since 2016[1], but
GCC hasn't implemented this yet because targets using dynamic shadow
offsets, such as Fuchsia and iOS, are mostly unsupported in GCC.
However, RISC-V 64 switched to dynamic shadow offsets this year[2] because
virtual memory space support varies across different RISC-V cores, such as
Sv39, Sv48, and Sv57. We realized that the best way to handle this
situation is by using a dynamic shadow offset to obtain the offset at
runtime.
We introduce a new target hook, TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P, to
determine if the target is using a dynamic shadow offset, so this change
won't affect the static offset path. Additionally, TARGET_ASAN_SHADOW_OFFSET
continues to work even if TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P is non-zero,
ensuring that KASAN functions as expected.
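The difference between the two paths can be sketched in C as follows.
This assumes the usual AddressSanitizer mapping of one shadow byte per
8 application bytes; the function names and the variable
demo_dynamic_shadow_base are hypothetical stand-ins, the latter for a
value that the sanitizer runtime would publish at startup.

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_SCALE 3  /* Assumed: one shadow byte covers 8 bytes.  */

/* Hypothetical stand-in for a base address chosen by the runtime.  */
uint64_t demo_dynamic_shadow_base;

/* Static path: the offset is a compile-time constant for the target.  */
uint64_t
shadow_addr_static (uint64_t addr, uint64_t const_offset)
{
  assert ((addr >> SHADOW_SCALE) <= UINT64_MAX - const_offset);
  return (addr >> SHADOW_SCALE) + const_offset;
}

/* Dynamic path: the offset is loaded from a variable at run time, so the
   same binary works whatever address space layout the core provides.  */
uint64_t
shadow_addr_dynamic (uint64_t addr)
{
  assert ((addr >> SHADOW_SCALE) <= UINT64_MAX - demo_dynamic_shadow_base);
  return (addr >> SHADOW_SCALE) + demo_dynamic_shadow_base;
}
```

The only cost of the dynamic path is the extra load of the base, which is
why it makes sense to fetch it once per function rather than per access.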
This patch set has been verified on the Banana Pi F3, currently one of the
most popular RISC-V development boards. All AddressSanitizer-related tests
passed without introducing new regressions.
It was also verified on AArch64 and x86_64 with no regressions in
AddressSanitizer.
[1] https://github.com/llvm/llvm-project/commit/130a190bf08a3d955d9db24dac936159dc049e12
[2] https://github.com/llvm/llvm-project/commit/da0c8b275564f814a53a5c19497669ae2d99538d
gcc/ChangeLog:
* asan.cc (asan_dynamic_shadow_offset_p): New.
(asan_shadow_memory_dynamic_address): New.
(asan_local_shadow_memory_dynamic_address): New.
(get_asan_shadow_memory_dynamic_address_decl): New.
(asan_maybe_insert_dynamic_shadow_at_function_entry): New.
(asan_emit_stack_protection): Support dynamic shadow offset.
(build_shadow_mem_access): Ditto.
* asan.h (asan_maybe_insert_dynamic_shadow_at_function_entry): New.
* doc/tm.texi (TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P): New.
* doc/tm.texi.in (TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P): Ditto.
* sanopt.cc (pass_sanopt::execute): Handle dynamic shadow offset.
* target.def (asan_dynamic_shadow_offset_p): New.
* toplev.cc (process_options): Handle dynamic shadow offset.
|
|
This pass detects cases of expensive store forwarding and tries to
avoid them by reordering the stores and using suitable bit insertion
sequences. For example it can transform this:
strb w2, [x1]
ldr x0, [x1] # Expensive store forwarding to larger load.
To:
ldr x0, [x1]
strb w2, [x1]
bfi x0, x2, 0, 8
Assembly like this can appear with bitfields or type punning / unions.
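In C terms the idea is roughly the following.  This is a sketch with
hypothetical function names, not the pass itself: the first function does
the narrow store followed by the overlapping wide load, the second loads
first, performs the store, and then patches the stored byte into the
loaded value, which is the role of the bit insertion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit position of byte OFFSET inside a 64-bit value loaded from memory,
   determined for the host's byte order at run time.  */
static unsigned
byte_shift (unsigned offset)
{
  const uint16_t probe = 1;
  unsigned char first;
  memcpy (&first, &probe, 1);
  int little_endian = (first == 1);
  return 8 * (little_endian ? offset : 7 - offset);
}

/* Original order: narrow store, then a wider load that overlaps it.  */
uint64_t
store_then_load (unsigned char *p, unsigned offset, uint8_t b)
{
  assert (p != NULL && offset < 8);
  uint64_t x;
  p[offset] = b;
  memcpy (&x, p, sizeof x);
  return x;
}

/* Reordered: wide load first, then the store, then insert the byte into
   the already loaded value.  */
uint64_t
load_then_store_and_insert (unsigned char *p, unsigned offset, uint8_t b)
{
  assert (p != NULL && offset < 8);
  uint64_t x;
  memcpy (&x, p, sizeof x);
  p[offset] = b;
  unsigned shift = byte_shift (offset);
  x &= ~((uint64_t) 0xFF << shift);
  x |= (uint64_t) b << shift;
  return x;
}
```

Both functions leave memory in the same state and return the same value;
the second form simply never asks the hardware to forward a small pending
store into a larger load.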
On stress-ng when running the cpu-union microbenchmark the following
speedups have been observed.
Neoverse-N1: +29.4%
Intel Coffeelake: +13.1%
AMD 5950X: +17.5%
The transformation is rejected on cases that cause store_bit_field to
generate subreg expressions on different register classes. Files
avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain such
cases and have been marked as XFAIL.
Due to biasing of its operands in store_bit_field, there is special
handling for machines with BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN. The
need for this was exposed by an issue on the H8 architecture,
which uses big-endian ordering, but BITS_BIG_ENDIAN is false. In that
case, the START parameter of store_bit_field needs to be calculated
from the end of the destination register.
gcc/ChangeLog:
* Makefile.in (OBJS): Add avoid-store-forwarding.o.
* common.opt (favoid-store-forwarding): New option.
* common.opt.urls: Regenerate.
* doc/invoke.texi: New param store-forwarding-max-distance.
* doc/passes.texi: Document new pass.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document new pass.
* params.opt (store-forwarding-max-distance): New param.
* passes.def: Add pass_rtl_avoid_store_forwarding before
pass_early_remat.
* target.def (avoid_store_forwarding_p): New DEFHOOK.
* target.h (struct store_fwd_info): Declare.
* targhooks.cc (default_avoid_store_forwarding_p): New function.
* targhooks.h (default_avoid_store_forwarding_p): Declare.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.
* avoid-store-forwarding.h: New file.
* timevar.def (TV_AVOID_STORE_FORWARDING): New timevar.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
* gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test.
* gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test.
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
|
|
This fixes the vectorization regressions present on the SPARC by switching
from vcond[u] patterns to vec_cmp[u] + vcond_mask_ patterns. While I was
at it, I merged the patterns for V4HI/V2SI and V8QI enabled with VIS 3/VIS 4
to follow the model of those enabled with VIS 4B, and standardized all the
mnemonics to the version documented in the Oracle SPARC architecture 2015.
gcc/
PR target/117715
* config/sparc/sparc-protos.h (sparc_expand_vcond): Rename to...
(sparc_expand_vcond_mask): ...this.
* config/sparc/sparc.cc (TARGET_VECTORIZE_GET_MASK_MODE): Define.
(sparc_vis_init_builtins): Adjust the CODE_FOR_* identifiers.
(sparc_get_mask_mode): New function.
(sparc_expand_vcond): Rename to...
(sparc_expand_vcond_mask): ...this and adjust.
* config/sparc/sparc.md (unspec): Remove UNSPEC_FCMP & UNSPEC_FUCMP
and rename UNSPEC_FPUCMPSHL into UNSPEC_FPCMPUSHL.
(fcmp<gcond:code><GCM:gcm_name><P:mode>_vis): Merge into...
(fpcmp<gcond:code>8<P:mode>_vis): Merge into...
(fpcmp<fpcmpcond:code><FPCMP:vbits><P:mode>_vis): ...this.
(fucmp<gcond:code>8<P:mode>_vis): Merge into...
(fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis): Merge into...
(fpcmpu<fpcmpucond:signed_code><FPCMP:vbits><P:mode>_vis): ...this.
(vec_cmp<FPCMP:mode><P:mode>): New expander.
(vec_cmpu<FPCMP:mode><P:mode>): Likewise.
(vcond<GCM:mode><GCM:mode>): Delete.
(vcondv8qiv8qi): Likewise.
(vcondu<GCM:mode><GCM:mode>): Likewise.
(vconduv8qiv8qi): Likewise.
(vcond_mask_<FPCMP:mode><P:mode>): New expander.
(fpcmp<fpcscond:code><FPCSMODE:vbits><P:mode>shl): Adjust.
(fpcmpu<fpcsucond:code><FPCSMODE:vbits><P:mode>shl): Likewise.
(fpcmpde<FPCSMODE:vbits><P:mode>shl): Likewise.
(fpcmpur<FPCSMODE:vbits><P:mode>shl): Likewise.
* doc/md.texi (vcond_mask_len_): Fix pasto.
gcc/testsuite/
* gcc.target/sparc/20230328-1.c: Adjust to new mnemonics.
* gcc.target/sparc/20230328-4.c: Likewise.
* gcc.target/sparc/fcmp.c: Likewise.
* gcc.target/sparc/fucmp.c: Likewise.
|
|
The current message does not make sense with -fno-zero-initialized-in-bss.
gcc/
* doc/invoke.texi (-fno-zero-initialized-in-bss): Adjust for Ada.
* varasm.cc (get_variable_section): Adjust the error message for an
initialized variable in .bss to -fno-zero-initialized-in-bss.
gcc/testsuite/
* gnat.dg/specs/bss1.ads: New test.
|
|
While looking into PR 33532, it was noted that \" would still be treated
as " for braced strings in the md file. I think that is still
the correct thing to do. So let's just add a note to the documentation
on this behavior and NOT change read-md.cc (read_braced_string).
This behavior has been there for the last 23 years and only
one person has run into it. It also helps with the conversion
from quoted strings to braced strings: you just need
to remove the quotes around the braces rather than change all of the
code.
Built the documentation to make sure it looks correct.
gcc/ChangeLog:
* doc/rtl.texi: Add a note about quotes in braced strings.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This patch is similar to r15-5569 (tweak ashift:SI) but for
ashiftrt and lshiftrt codes. It splits constant shift offsets > 16
into a 3-operand byte shift and a 2-operand residual bit shift.
Moreover, some of the constraint alternatives have been promoted
to 3-operand alternatives regardless of options. For example,
ashift:HI and lshiftrt:HI can support 3 operands for offsets 9...12
without any overhead.
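The arithmetic behind the split can be sketched in C.  This assumes the
byte part is the shift offset rounded down to a multiple of 8 and the
residual is the remainder, e.g. 19 = 16 + 3; the function names are
hypothetical and the real splitting is done on RTL in avr_split_shift.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift done as a byte shift followed by a bit shift.  */
uint32_t
lshr32_split (uint32_t x, unsigned offset)
{
  assert (offset < 32);
  unsigned byte_part = offset & ~7u;  /* Whole bytes: 0, 8, 16 or 24.  */
  unsigned bit_part = offset & 7u;    /* Residual bits: 0..7.  */
  uint32_t tmp = x >> byte_part;      /* 3-operand style: tmp = x >> bytes.  */
  tmp >>= bit_part;                   /* 2-operand style: tmp >>= bits.  */
  return tmp;
}

/* Arithmetic variant.  Right-shifting a negative value is
   implementation-defined in ISO C; GCC shifts arithmetically.  */
int32_t
ashr32_split (int32_t x, unsigned offset)
{
  assert (offset < 32);
  int32_t tmp = x >> (offset & ~7u);
  tmp >>= (offset & 7u);
  return tmp;
}
```

The byte shift is cheap on AVR because it amounts to moving whole
registers, so only the small residual needs real bit shifting.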
Apart from that, it's a bit of code clean up for 2-byte and 4-byte
shift insns: Use one RTL peephole with any_shift code iterator
instead of 3 individual peepholes. It also removes some useless
split insns; presumably introduced during the cc0 -> CCmode work.
PR target/117726
gcc/
* config/avr/avr-passes.cc (avr_split_shift): Also handle
ASHIFTRT and LSHIFTRT codes for 4-byte shifts.
(constr_split_shift4): New code_attr.
(avr_emit_shift): Adjust to new shift capabilities.
* config/avr/predicates.md (scratch_or_d_register_operand):
rename to scratch_or_dreg_operand.
* config/avr/avr.md: Same.
(define_peephole2): Write the RTL scratch peephole for 2-byte and
4-byte shifts that generates *sh*<mode>3_const insns using code
iterator any_shift.
(*ashlhi3_const_split, *ashrhi3_const_split, *ashrhi3_const_split)
(*lshrsi3_const_split, *lshrhi3_const_split): Remove useless
split insns.
(define_split) [avropt_split_bit_shift]: Add splitters
for 4-byte ASHIFTRT and LSHIFTRT insns using avr_split_shift().
(ashrsi3, *ashrsi3, *ashrsi3_const): Add "r,0,C4a" and "r,r,C4a"
constraint alternatives depending on 2op, 3op.
(lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,0,C4r" and "r,r,C4r"
constraint alternatives depending on 2op, 3op. Add "r,r,C15".
(lshrhi3, *lshrhi3, *lshrhi3_const, ashlhi3, *ashlhi3)
(*ashlhi3_const): Add "r,r,C7c" alternative.
(ashrpsi, *ashrpsi3): Add "r,r,C22" alternative.
(ashlqi, *ashlqi): Turn C06 alternative into "r,r,C06".
* config/avr/constraints.md (C14, C22, C30, C7c): New constraints.
* config/avr/avr.cc (ashlhi3_out, lshrhi3_out)
[case 7, 9, 10, 11, 12]: Support as 3-operand insn.
(lshrsi3_out) [case 15]: Same.
(ashrsi3_out) [case 30]: Same.
(ashrhi3_out) [case 14]: Same.
(ashrqi3_out) [case 6]: Same.
(avr_out_ashrpsi3) [case 22]: Same.
* config/avr/avr.h: Fix comment typo.
* doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Document.
|
|
The following patch adds a new option for optimizations related to
replaceable global operators new/delete.
The option isn't called -fassume-sane-operator-new (which clang++
implements), because
1) clang++ option means something different; initially it was an
option to add malloc attribute to those declarations (but we have
malloc attribute on all <new> calls already unconditionally);
later it was changed to add noalias attribute rather than malloc,
whatever it means, but it is certainly about the return value
from the operator new (whether it can alias with other pointers);
we already assume malloc-ish behavior that it doesn't alias any
other pointers
2) the option only affects operator new, we want it to also affect
operator delete
The option basically allows choosing between pre-PR101480 behavior
(now the default, more optimistic) and post-PR101480 behavior (safer
but penalizing most of the code in the wild for rare needs).
I've tried to explain stuff in the documentation too.
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR c++/110137
PR middle-end/101480
gcc/
* doc/invoke.texi (-fassume-sane-operators-new-delete,
-fno-assume-sane-operators-new-delete): Document.
* gimple.cc (gimple_call_fnspec): Handle
-f{,no-}assume-sane-operators-new-delete.
* ipa-inline-transform.cc (inline_call): Also clear
flag_assume_sane_operators_new_delete on caller when inlining
-fno-assume-sane-operators-new-delete callee into
-fassume-sane-operators-new-delete caller.
gcc/c-family/
* c.opt (fassume-sane-operators-new-delete): New option.
gcc/testsuite/
* g++.dg/tree-ssa/pr110137-1.C: New test.
* g++.dg/tree-ssa/pr110137-2.C: New test.
* g++.dg/tree-ssa/pr110137-3.C: New test.
* g++.dg/tree-ssa/pr110137-4.C: New test.
* g++.dg/torture/pr10148.C: Add -fno-assume-sane-operators-new-delete
as dg-additional-options.
* g++.dg/warn/Warray-bounds-16.C: Revert 2021-11-10 changes.
|