Age | Commit message (Collapse) | Author | Files | Lines |
|
When cross-compiling GCC with target libc headers available and
configure option --enable-s390-excess-float-precision has been omitted,
identify whether they clamp float_t to double or respect
__FLT_EVAL_METHOD__ via a compile test that coerces the build-system
compiler to use the target headers. Then derive the setting from that.
gcc/ChangeLog:
2020-12-16 Marius Hillenbrand <mhillen@linux.ibm.com>
* configure.ac: Change --enable-s390-excess-float-precision
default behavior for cross compiles with target headers.
* configure: Regenerate.
* doc/install.texi: Adjust documentation.
|
|
This patch adds some documentation to rtl.texi about the SSA form.
It only really describes the high-level structure -- I think for
API-level stuff it's better to rely on function comments instead.
gcc/
* doc/rtl.texi (RTL SSA): New node.
|
|
Since g:7caa49706316e650fb67719e1a1bf3a35054b685 the option is ignored
as we print used command line for -fverbose-asm output.
gcc/ChangeLog:
* doc/options.texi: Remove Report keyword.
* opt-functions.awk: Print error when Report keyword
is used.
* optc-gen.awk: Do not handle Report keyword.
* opts.h (struct cl_option): Remove cl_report bitfield flag.
|
|
gcc/ChangeLog:
PR sanitizer/97868
* common.opt: Add new warning -Wtsan.
* doc/invoke.texi: Likewise.
* tsan.c (instrument_builtin_call): Warn users about unsupported
std::atomic_thread_fence.
gcc/testsuite/ChangeLog:
PR sanitizer/97868
* gcc.dg/tsan/atomic-fence.c: New test.
|
|
And here is the user-facing documentation.
gcc/
* doc/cppopts.texi: Document new cpp opt.
* doc/invoke.texi: Add C++20 module option & documentation.
|
|
If somebody has -march=x86-64-v2 (or -v3 or -v4) in $CFLAGS, $CXXFLAGS etc.,
then -m32 or -mabi=ms stops working.
What is worse, if one configures gcc --with-arch-64=x86-64-v2 (or -v3 or -v4),
then -mabi=ms stops working.
I think that is a nightmare user experience. It is ok that x86-64-v[234]
behave slightly different from other -march= options (in that they imply
unless overridden -mtune=generic rather then -mtune= equal to the -march
argument), but the error when one mixes it with -mabi=ms, or -m32 doesn't
improve anything.
It is true that the exact option set is only defined in the x86-64 psABI
(IMHO that is a mistake too, we should copy that into the GCC documentation
like we document it for any other -march= option), but there is no reason
why that exact set of CPU features can't be used for other ABIs, it is just
a set of CPU features. If we add micro-architecture levels to the 32-bit
ABI (I doubt anyone wants to do that, but just hypothetically), then those
micro-architecture levels wouldn't certainly be called x86-64-v* but perhaps
i386-v*.
In the tests, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 can't be expected on -m32
not because the CPU feature wouldn't be set, but because the instruction
is 64-bit only and 32-bit code doesn't have __int128 etc. support.
2020-12-15 Jakub Jelinek <jakub@redhat.com>
* config/i386/i386-options.c (ix86_option_override_internal): Don't
error on -march=x86-64-v[234] with -m32 or -mabi=ms.
* config.gcc: Don't reject --with-arch=x86-64-v[234] or
--with-arch_32=x86-64-v[234].
* doc/invoke.texi (-march=x86-64-v[234]): Document what the option
does for other ABIs.
* gcc.target/i386/x86-64-v2.c: Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v2-other.c: New test.
* gcc.target/i386/x86-64-v2-msabi.c: New test.
* gcc.target/i386/x86-64-v3.c: Fix a comment pasto. Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v3-other.c: New test.
* gcc.target/i386/x86-64-v3-msabi.c: New test.
* gcc.target/i386/x86-64-v4.c:Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v4-other.c: New test.
* gcc.target/i386/x86-64-v4-msabi.c: New test.
|
|
gcc/ChangeLog:
2020-12-15 Gerald Pfeifer <gerald@pfeifer.com>
* doc/invoke.texi (Instrumentation Options): Update link to
KernelAddressSanitizer.
|
|
PR middle-end/98160).
Resolves:
PR middle-end/98166 - bogus -Wmismatched-dealloc on user-defined allocator and inlining
PR c++/57111 - 57111 - Generalize -Wfree-nonheap-object to delete
PR middle-end/98160 - ICE in default_tree_printer at gcc/tree-diagnostic.c:270
gcc/ChangeLog:
PR middle-end/98166
PR c++/57111
PR middle-end/98160
* builtins.c (check_access): Call tree_inlined_location
fndecl_alloc_p): Handle BUILT_IN_ALIGNED_ALLOC and
BUILT_IN_GOMP_ALLOC.
call_dealloc_p): Remove unused function.
(new_delete_mismatch_p): Call valid_new_delete_pair_p and rework.
(matching_alloc_calls_p): Handle built-in deallocation functions.
(warn_dealloc_offset): Corrct the handling of user-defined operators
delete.
(maybe_emit_free_warning): Avoid assuming expression is a decl.
Simplify.
* doc/extend.texi (attribute malloc): Update.
* tree-ssa-dce.c (valid_new_delete_pair_p): Factor code out into
valid_new_delete_pair_p in tree.c.
* tree.c (tree_inlined_location): Define new function.
(valid_new_delete_pair_p): Define.
* tree.h (tree_inlined_location): Declare.
(valid_new_delete_pair_p): Declare.
gcc/c-family/ChangeLog:
PR middle-end/98166
PR c++/57111
PR middle-end/98160
* c-attribs.c (maybe_add_noinline): New function.
(handle_malloc_attribute): Call it. Use ATTR_FLAG_INTERNAL.
Implicitly add attribute noinline to functions not declared inline
and warn on those.
libstdc++-v3/ChangeLog:
* testsuite/ext/vstring/requirements/exception/basic.cc: Suppress
a false positive warning.
* testsuite/ext/vstring/requirements/exception/propagation_consistent.cc:
Same.
gcc/testsuite/ChangeLog:
PR middle-end/98166
PR c++/57111
PR middle-end/98160
* g++.dg/warn/Wmismatched-dealloc-2.C: Adjust test of expected warning.
* g++.dg/warn/Wmismatched-new-delete.C: Same.
* gcc.dg/Wmismatched-dealloc.c: Same.
* c-c++-common/Wfree-nonheap-object-2.c: New test.
* c-c++-common/Wfree-nonheap-object-3.c: New test.
* c-c++-common/Wfree-nonheap-object.c: New test.
* c-c++-common/Wmismatched-dealloc.c: New test.
* g++.dg/warn/Wfree-nonheap-object-3.C: New test.
* g++.dg/warn/Wfree-nonheap-object-4.C: New test.
* g++.dg/warn/Wmismatched-dealloc-2.C: New test.
* g++.dg/warn/Wmismatched-new-delete-2.C: New test.
* g++.dg/warn/Wmismatched-new-delete.C: New test.
* gcc.dg/Wmismatched-dealloc-2.c: New test.
* gcc.dg/Wmismatched-dealloc-3.c: New test.
* gcc.dg/Wmismatched-dealloc.c: New test.
|
|
This patch adds support for -mcpu=cortex-a78c command line option.
For more information about this processor, see [0]:
[0] https://developer.arm.com/ip-products/processors/cortex-a/cortex-a78c
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-A78C core.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Update docs.
|
|
This patch adds support for
* Complex Addition with rotation of 90 and 270.
Addition with rotation of the second argument around the Argand plane.
Supported rotations are 90 and 180.
c = a + (b * I) and c = a + (b * I * I * I)
gcc/ChangeLog:
* tree-vect-slp-patterns.c: New file.
* Makefile.in: Add it.
* doc/passes.texi: Document it.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* doc/md.texi: Document them.
* tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code.
* tree-vect-slp.c:
(vect_free_slp_instance, vect_create_new_slp_node): Export.
(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
(vect_analyze_slp): Use it.
* tree-vectorizer.h (vect_free_slp_tree): Export.
(enum _complex_operation): Forward declare.
(class vect_pattern): New
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
(check_effective_target_vect_complex_add_byte
,check_effective_target_vect_complex_add_int
,check_effective_target_vect_complex_add_short
,check_effective_target_vect_complex_add_long
,check_effective_target_vect_complex_add_half
,check_effective_target_vect_complex_add_float
,check_effective_target_vect_complex_add_double): New.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
* gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
* gcc.dg/vect/complex/complex-add-template.c: New test.
* gcc.dg/vect/complex/complex-operations-run.c: New test.
* gcc.dg/vect/complex/complex-operations.c: New test.
* gcc.dg/vect/complex/complex.exp: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
|
|
This patch addresses some of the issues that I found when looking into the
failures of the scan-assembler-symbol-section tests on Solaris/SPARC.
* The first issue was that on Solaris/SPARC, section names are
double-quoted, both with as and gas:
.section ".text"
When using as, the section flag and type syntax is completely
different from other ELF targets:
.section "my_named_section",#alloc,#execinstr,#progbits
This patch fixes this by stripping double quotes from section names.
* However, this didn't work initially (only the leading quote was
stripped), which is due to David's recent AIX patch: with the
introduction of the new capturing group to handle both .section (ELF)
and .csect (XCOFF), $full_section_directive would never be empty on
ELF and Mach-O targets, so the extraction of the section name didn't
work any longer. This had also broken the Darwin tests completely.
* With working double quote stripping, all but one of the tests PASSed
on Solaris/SPARC, the exception being:
FAIL: gcc.dg/20021029-1.c scan-assembler-symbol-section symbol ar (found __sparc_get_pc_thunk.l7) has section ^\\\\.(const|rodata)|\\\\[RO\\\\] (found .text.__sparc_get_pc_thunk.l7%__sparc_get_pc_thunk.l7)
This is due to the symbol name (ar) not being anchored in the test and
unexpectedly matchting __sparc_get_pc_thunk.l7.
* Next, I ran the tests on Darwin 11 and found two failing tests:
FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_a\$ (symbol not found) has section \\\\.data
FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_b\$ (symbol not found) has section \\\\.data
is due to Iain's recent "Darwin : Begin rework of zero-fill sections."
patch which emits
.globl _a
.zerofill __DATA,__common,_a,1,0
This is already scanned for, so the two scans above can just go.
The other failing test is
FAIL: g++.dg/gomp/tls-5.C -std=c++14 scan-assembler-symbol-section symbol ^_?_ZGR2ir_\$ (symbol not found) has section ^\\\\.tdata|\\\\[TL\\\\]
FAIL: g++.dg/gomp/tls-5.C -std=c++14 scan-assembler-symbol-section symbol ^_?ir\$ (symbol not found) has section ^\\\\.tbss|\\\\[TL\\\\]
Other scans are guarded by target tls_native, and indeed the assembler
output has
___emutls_v._ZGR2ir_:
___emutls_t._ZGR2ir_:
___emutls_v.ir:
Unfortunately scan-assembler-symbol-section doesn't support selects
yet, which this test implements both for the benefit of this test and
for symmetry.
With those changes, test results are clean now on sparc-sun-solaris2.11,
i386-pc-solaris2.11, i386-apple-darwin11.4.2, and
powerpc-ibm-aix7.2.4.0.
2020-12-03 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
* doc/sourcebuild.texi (Commands for use in dg-final, Scan the
assembly output, scan-assembler-symbol-section): Document.
(scan-symbol-section): Document.
gcc/testsuite:
* lib/scanasm.exp (scan-symbol-section): Pass args to
dg-scan-symbol-section.
(scan-assembler-symbol-section): Likewise.
(dg-scan-symbol-section): Handle selector from orig_args.
Get patterns from orig_args.
(parse_section_of_symbols): Fix section_pattern.
Strip double quotes from section name.
* g++.dg/gomp/tls-5.C: Restrict ir, _ZGR2ir_ scans to tls_native.
* gcc.dg/20021029-1.c: Anchor ar symbol.
* gcc.dg/darwin-sections.c: Remove obsolete scans for _a, _b in
.data.
|
|
gcc/ChangeLog
2020-12-01 Andrea Corallo <andrea.corallo@arm.com>
* doc/sourcebuild.texi (arm_softfloat): Improve documentation.
gcc/testsuite/ChangeLog
2020-12-01 Andrea Corallo <andrea.corallo@arm.com>
* lib/target-supports.exp (check_effective_target_arm_softfloat):
Improve documentation.
|
|
New +pauth (Pointer Authentication from Armv8.3-A) feature option for
-march command line option.
Please note that majority of PAUTH instructions are implemented behind HINT
instruction. PAUTH stays an Armv8.3-A feature but now can be assigned to other
architectures or CPUs.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): New +pauth option in -march for AArch64.
* config/aarch64/aarch64.h (AARCH64_FL_PAUTH): New pauth extension bitmask.
(AARCH64_ISA_PUATH): New ISA bitmask for PAUTH.
(AARCH64_FL_FOR_ARCH8_3): Add PAUTH to Armv8.3-A.
(TARGET_PAUTH): New target mask to isolate PAUTH instructions.
* config/aarch64/aarch64.md (do_return): Condition set to TARGET_PAUTH.
* doc/invoke.texi: Update docs for +flagm and +pauth.
|
|
gcc/ChangeLog:
* doc/extend.texi (used function attribute): Document saving
the declaration from linker garbage collection.
(used variable attribute): Likewise.
|
|
-mcet was removed by
commit 231baae28ef7ff784683fa5a80df119da2b9a99b
Author: H.J. Lu <hongjiu.lu@intel.com>
Date: Tue Apr 24 16:56:04 2018 +0000
x86/CET: Remove the -mcet command-lint option
PR target/98162
* doc/extend.texi: Remove -mcet.
|
|
The use of a constant double zero is required for post-reload compare
elimination to be able to discard redundant floating-point comparisons,
for example with a VAX RTL instruction stream like:
(insn 34 4 3 2 (parallel [
(set (reg/v:DF 0 %r0 [orig:24 x ] [24])
(mem/c:DF (plus:SI (reg/f:SI 12 %ap)
(const_int 4 [0x4])) [1 x+0 S8 A32]))
(clobber (reg:CC 16 %psl))
]) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":9:1 37 {*movdf}
(nil))
(note 3 34 35 2 NOTE_INSN_FUNCTION_BEG)
(insn 35 3 36 2 (set (reg:CCZ 16 %psl)
(compare:CCZ (reg/v:DF 0 %r0 [orig:24 x ] [24])
(const_double:DF 0.0 [0x0.0p+0]))) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":10:6 21 {*cmpdf_ccz}
(nil))
(jump_insn 36 35 9 2 (set (pc)
(if_then_else (eq (reg:CCZ 16 %psl)
(const_int 0 [0]))
(label_ref 11)
(pc))) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":10:6 537 {*branch_ccz}
(int_list:REG_BR_PROB 536870916 (nil))
-> 11)
that we want to transform into:
(insn 34 4 3 2 (parallel [
(set (reg:CCZ 16 %psl)
(compare:CCZ (mem/c:DF (plus:SI (reg/f:SI 12 %ap)
(const_int 4 [0x4])) [1 x+0 S8 A32])
(const_double:DF 0.0 [0x0.0p+0])))
(set (reg/v:DF 0 %r0 [orig:24 x ] [24])
(mem/c:DF (plus:SI (reg/f:SI 12 %ap)
(const_int 4 [0x4])) [1 x+0 S8 A32]))
]) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":9:1 40 {*movdf_ccz}
(nil))
(note 3 34 36 2 NOTE_INSN_FUNCTION_BEG)
(jump_insn 36 3 9 2 (set (pc)
(if_then_else (eq (reg:CCZ 16 %psl)
(const_int 0 [0]))
(label_ref 11)
(pc))) ".../gcc/testsuite/gcc.target/vax/cmpelim-eq-movdf.c":10:6 537 {*branch_ccz}
(int_list:REG_BR_PROB 536870916 (nil))
-> 11)
with the upcoming MODE_CC representation.
For this we need to express the `const_double:DF 0.0 [0x0.0p+0]' rtx as
recorded above in the relevant pattern(s) in machine description. The
way we represent double constants, as a host-dependent number of wide
integers, however means that we currently have no portable way to encode
a double zero constant in machine description.
Define a syntactic rtx alias then to represent `(const_double 0 0 ...)'
as if the suitable number of zeros have been supplied according to the
host-specific definition of CONST_DOUBLE_FORMAT.
gcc/
* read-rtl.c (rtx_reader::read_rtx_code): Handle syntactic
`const_double_zero' rtx.
* doc/rtl.texi (Constant Expression Types): Document it.
|
|
Add wide integer aka 'w' rtx format support to int iterators so that
machine description can iterate over `const_int' expressions.
This is made by expanding standard integer aka 'i' format support,
observing that any standard integer already present in any of our
existing RTL code will also fit into HOST_WIDE_INT, so there is no need
for a separate handler. Any truncation of the number parsed is made by
the caller. An assumption is made however that no place relies on
capping out of range values to INT_MAX.
Now the 'p' format is handled explicitly rather than being implied by
rtx being a SUBREG, so actually assert that it is, just to play safe.
gcc/
* read-rtl.c: Add a page-feed separator at the start of iterator
code.
(struct iterator_group): Change the return type to HOST_WIDE_INT
for the `find_builtin' member. Likewise the second parameter
type for the `apply_iterator' member.
(atoll) [!HAVE_ATOQ]: Reorder.
(find_mode, find_code): Change the return type to HOST_WIDE_INT.
(apply_mode_iterator, apply_code_iterator)
(apply_subst_iterator): Change the second parameter type to
HOST_WIDE_INT.
(find_int): Handle input suitable for HOST_WIDE_INT output.
(apply_int_iterator): Rewrite in terms of explicit format
interpretation.
(rtx_reader::read_rtx_operand) <'w'>: Fold into...
<'i', 'n', 'p'>: ... this.
* doc/md.texi (Int Iterators): Document 'w' rtx format support.
|
|
2020-12-03 Venkataramanan Kumar <Venkataramanan.Kumar@amd.com>
Sharavan Kumar <Shravan.Kumar@amd.com>
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_amd_cpu) recognize znver3.
* common/config/i386/i386-common.c (processor_names): Add
znver3.
(processor_alias_table): Add znver3 and AMDFAM19H entry.
* common/config/i386/i386-cpuinfo.h (processor_types): Add
AMDFAM19H.
(processor_subtypes): AMDFAM19H_ZNVER3.
* config.gcc (i[34567]86-*-linux* | ...): Likewise.
* config/i386/driver-i386.c: (host_detect_local_cpu): Let
-march=native recognize znver3 processors.
* config/i386/i386-c.c (ix86_target_macros_internal): Add
znver3.
* config/i386/i386-options.c (m_znver3): New definition.
(m_ZNVER): Include m_znver3.
(processor_cost_table): Add znver3.
* config/i386/i386.c (ix86_reassociation_width): Likewise.
* config/i386/i386.h (TARGET_znver3): New definition.
(enum processor_type): Add PROCESSOR_ZNVER3.
* config/i386/i386.md (define_attr "cpu"): Add znver3.
* config/i386/x86-tune-sched.c: (ix86_issue_rate): Likewise.
(ix86_adjust_cost): Likewise.
* config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS:
Likewise.
* config/i386/znver1.md: Add new reservations for znver3.
* doc/extend.texi: Add details about znver3.
* doc/invoke.texi: Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/funcspec-56.inc: Handle new march.
* g++.target/i386/mv29.C: New file.
|
|
PR94600
We say very little about reads and writes to aggregate /
compound objects, just scalar objects (i.e. assignments don't
cause reads). Let's lets say something safe about aggregate
objects, but only for those that are the same size as a scalar
type.
There's an equal-sounding section (Volatiles) in extend.texi,
but this seems a more appropriate place, as specifying the
behavior of a standard qualifier.
gcc:
2020-12-04 Hans-Peter Nilsson <hp@axis.com>
Martin Sebor <msebor@redhat.com>
PR middle-end/94600
* doc/implement-c.texi (Qualifiers implementation): Add blurb
about access to the whole of a volatile aggregate object, only for
same-size as a scalar object.
|
|
The following patch makes the choice between 32-bit and 64-bit DWARF formats
selectable by command line switch, rather than being hardcoded through
DWARF_OFFSET_SIZE macro.
The options themselves don't turn on debug info themselves, so one needs
to use -g -gdwarf64 or similar.
2020-12-04 Jakub Jelinek <jakub@redhat.com>
* common.opt (-gdwarf32, -gdwarf64): New options.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Default
dwarf_offset_size to 8 if not overridden from the command line.
* dwarf2out.c: Change all occurrences of DWARF_OFFSET_SIZE to
dwarf_offset_size.
* doc/invoke.texi (-gdwarf32, -gdwarf64): Document.
|
|
gcc/ChangeLog:
* doc/tm.texi: Change argument of the record_gcc_switches
hook and remove SWITCH_TYPE_* enum values.
* dwarf2out.c (gen_producer_string): Move to opts.c and remove
handling of the dwarf_record_gcc_switches option.
(dwarf2out_early_finish): Use moved gen_producer_string
function.
* opts.c (gen_producer_string): New.
* opts.h (gen_producer_string): New.
* target.def: Change type of record_gcc_switches.
* target.h (enum print_switch_type): Remove.
(elf_record_gcc_switches): Change first argument.
* toplev.c (MAX_LINE): Remove.
(print_to_asm_out_file): Likewise.
(print_to_stderr): Likewise.
(print_single_switch): Likewise.
(print_switch_values): Likewise.
(init_asm_output): Use new gen_producer_string function.
(process_options): Likewise.
* varasm.c (elf_record_gcc_switches): Just save the string argument
to the ELF container.
|
|
contrib/ChangeLog:
* check-params-in-docs.py: use flake8 and add some
tweaks to ignore aarch64 params.
gcc/ChangeLog:
* doc/invoke.texi: Add missing params.
|
|
PR c++/90629 - Support for -Wmismatched-new-delete
PR middle-end/94527 - Add an __attribute__ that marks a function as freeing an object
gcc/ChangeLog:
PR c++/90629
PR middle-end/94527
* builtins.c (access_ref::access_ref): Initialize new member.
(compute_objsize): Use access_ref::deref. Handle simple pointer
assignment.
(expand_builtin): Remove handling of the free built-in.
(call_dealloc_argno): Same.
(find_assignment_location): New function.
(fndecl_alloc_p): Same.
(gimple_call_alloc_p): Same.
(call_dealloc_p): Same.
(matching_alloc_calls_p): Same.
(warn_dealloc_offset): Same.
(maybe_emit_free_warning): Same.
* builtins.h (struct access_ref): Declare new member.
(maybe_emit_free_warning): Make extern. Make use of access_ref.
Handle -Wmismatched-new-delete.
* calls.c (initialize_argument_information): Call
maybe_emit_free_warning.
* doc/extend.texi (attribute malloc): Update.
* doc/invoke.texi (-Wfree-nonheap-object): Expand documentation.
(-Wmismatched-new-delete): Document new option.
(-Wmismatched-dealloc): Document new option.
gcc/c-family/ChangeLog:
PR c++/90629
PR middle-end/94527
* c-attribs.c (handle_dealloc_attribute): New function.
(handle_malloc_attribute): Handle argument forms of attribute.
* c.opt (-Wmismatched-dealloc): New option.
(-Wmismatched-new-delete): New option.
gcc/testsuite/ChangeLog:
PR c++/90629
PR middle-end/94527
* g++.dg/asan/asan_test.cc: Fix a bug.
* g++.dg/warn/delete-array-1.C: Add expected warning.
* g++.old-deja/g++.other/delete2.C: Add expected warning.
* g++.dg/warn/Wfree-nonheap-object-2.C: New test.
* g++.dg/warn/Wfree-nonheap-object.C: New test.
* g++.dg/warn/Wmismatched-new-delete.C: New test.
* g++.dg/warn/Wmismatched-dealloc-2.C: New test.
* g++.dg/warn/Wmismatched-dealloc.C: New test.
* gcc.dg/Wmismatched-dealloc.c: New test.
* gcc.dg/analyzer/malloc-1.c: Prune out expected warning.
* gcc.dg/attr-malloc.c: New test.
* gcc.dg/free-1.c: Adjust text of expected warning.
* gcc.dg/free-2.c: Same.
* gcc.dg/torture/pr71816.c: Prune out expected warning.
* gcc.dg/tree-ssa/pr19831-2.c: Add an expected warning.
* gcc.dg/Wfree-nonheap-object-2.c: New test.
* gcc.dg/Wfree-nonheap-object-3.c: New test.
* gcc.dg/Wfree-nonheap-object.c: New test.
libstdc++-v3/ChangeLog:
* testsuite/ext/vstring/modifiers/clear/56166.cc: Suppress a false
positive warning.
|
|
The following patch adds __builtin_bit_cast builtin, similarly to
clang or MSVC which implement std::bit_cast using such an builtin too.
It checks the various std::bit_cast requirements, when not constexpr
evaluated acts pretty much like VIEW_CONVERT_EXPR of the source argument
to the destination type and the hardest part is obviously the constexpr
evaluation.
I've left out PDP11 handling of those, couldn't figure out how exactly are
bitfields laid out there
2020-12-03 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/93121
* fold-const.h (native_encode_initializer): Add mask argument
defaulted to nullptr.
(find_bitfield_repr_type): Declare.
(native_interpret_aggregate): Declare.
* fold-const.c (find_bitfield_repr_type): New function.
(native_encode_initializer): Add mask argument and support for
filling it. Handle also some bitfields without integral
DECL_BIT_FIELD_REPRESENTATIVE.
(native_interpret_aggregate): New function.
* gimple-fold.h (clear_type_padding_in_mask): Declare.
* gimple-fold.c (struct clear_padding_struct): Add clear_in_mask
member.
(clear_padding_flush): Handle buf->clear_in_mask.
(clear_padding_union): Copy clear_in_mask. Don't error if
buf->clear_in_mask is set.
(clear_padding_type): Don't error if buf->clear_in_mask is set.
(clear_type_padding_in_mask): New function.
(gimple_fold_builtin_clear_padding): Set buf.clear_in_mask to false.
* doc/extend.texi (__builtin_bit_cast): Document.
* c-common.h (enum rid): Add RID_BUILTIN_BIT_CAST.
* c-common.c (c_common_reswords): Add __builtin_bit_cast.
* cp-tree.h (cp_build_bit_cast): Declare.
* cp-tree.def (BIT_CAST_EXPR): New tree code.
* cp-objcp-common.c (names_builtin_p): Handle RID_BUILTIN_BIT_CAST.
(cp_common_init_ts): Handle BIT_CAST_EXPR.
* cxx-pretty-print.c (cxx_pretty_printer::postfix_expression):
Likewise.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_BIT_CAST.
* semantics.c (cp_build_bit_cast): New function.
* tree.c (cp_tree_equal): Handle BIT_CAST_EXPR.
(cp_walk_subtrees): Likewise.
* pt.c (tsubst_copy): Likewise.
* constexpr.c (check_bit_cast_type, cxx_eval_bit_cast): New functions.
(cxx_eval_constant_expression): Handle BIT_CAST_EXPR.
(potential_constant_expression_1): Likewise.
* cp-gimplify.c (cp_genericize_r): Likewise.
* g++.dg/cpp2a/bit-cast1.C: New test.
* g++.dg/cpp2a/bit-cast2.C: New test.
* g++.dg/cpp2a/bit-cast3.C: New test.
* g++.dg/cpp2a/bit-cast4.C: New test.
* g++.dg/cpp2a/bit-cast5.C: New test.
|
|
New +flagm (Condition flag manipulation) feature option for -march command line
option.
Please note that FLAGM stays a Armv8.4-A feature but now can be
assigned to other architectures or CPUs.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): New +flagm option in -march for AArch64.
* config/aarch64/aarch64.h (AARCH64_FL_FLAGM): Add new flagm extension bit
mask.
(AARCH64_FL_FOR_ARCH8_4): Add flagm to Armv8.4-A.
* doc/invoke.texi: Update docs with +flagm.
|
|
This patch introduces maybe_emit_call_builtin___clear_cache for the
builtin expander machinery and the trampoline initializers to use to
clear the instruction cache, removing a source of inconsistencies and
subtle errors in low-level machinery.
I've adjusted all trampoline_init implementations that used to issue
explicit calls to __clear_cache or similar to use this new primitive.
Specifically on vxworks targets, we needed to drop the __clear_cache
symbol in libgcc, for reasons related with linking that I didn't need
to understand, and we wanted to call cacheTextUpdate directly, despite
the different calling conventions: the second argument is a length
rather than the end address.
So I introduced a target hook to enable target OS-level overriding of
builtin __clear_cache call emission, retaining nearly (*) the same
logic to govern the decision on whether to emit a call (or nothing, or
a machine-dependent insn) but enabling a call to a target
system-defined function with different calling conventions to be
issued, without having to modify .md files of the various
architectures supported by the target system to introduce or modify
clear_cache insns.
(*) I write "nearly" mainly because, when not optimizing, we'd issue a
call regardless, but since the call may now be overridden, I added it
to the set of builtins that are not directly turned into calls when
not optimizing, following the normal expansion path instead. It
wouldn't be hard to skip the emission of cache-clearing insns when not
optimizing, but it didn't seem very important, especially for the new
uses from trampoline init.
Another difference that might be relevant is that now we expand
the begin and end arguments unconditionally. This might make a
difference if they have side effects. That's prettty much impossible
at expand time, but I thought I'd mention it.
I have NOT modified targets that did not issue cache-clearing calls in
trampoline init to use the new clear_cache-calling infrastructure even
if it would expand to nothing. I have considered doing so, to have
__builtin___clear_cache and trampoline init call cacheTextUpdate on
all vxworks targets, but decided not to, since on targets that don't
do any cache clearing, cacheTextUpdate ought to be a no-op, even
though rs6000 seems to use icbi and dcbf instructions in the function
called to initialize a trampoline, but AFAICT not in the __clear_cache
builtin. Hopefully target maintainers will have a look and take
advantage of this new piece of infrastructure to remove such
(apparent?) inconsistencies. Not rs6000 and other that call asm-coded
trampoline setup instructions, for sure, but they might wish to
introduce a CLEAR_INSN_CACHE macro or a clear_cache expander if they
don't have one.
for gcc/ChangeLog
* builtins.c (default_emit_call_builtin___clear_cache): New.
(maybe_emit_call_builtin___clear_cache): New.
(expand_builtin___clear_cache): Split into the above.
(expand_builtin): Do not issue clear_cache call any more.
* builtins.h (maybe_emit_call_builtin___clear_cache): Declare.
* config/aarch64/aarch64.c (aarch64_trampoline_init): Use
maybe_emit_call_builtin___clear_cache.
* config/arc/arc.c (arc_trampoline_init): Likewise.
* config/arm/arm.c (arm_trampoline_init): Likewise.
* config/c6x/c6x.c (c6x_initialize_trampoline): Likewise.
* config/csky/csky.c (csky_trampoline_init): Likewise.
* config/m68k/linux.h (FInALIZE_TRAMPOLINE): Likewise.
* config/tilegx/tilegx.c (tilegx_trampoline_init): Likewise.
* config/tilepro/tilepro.c (tilepro_trampoline_init): Ditto.
* config/vxworks.c: Include rtl.h, memmodel.h, and optabs.h.
(vxworks_emit_call_builtin___clear_cache): New.
* config/vxworks.h (CLEAR_INSN_CACHE): Drop.
(TARGET_EMIT_CALL_BUILTIN___CLEAR_CACHE): Define.
* target.def (trampoline_init): In the documentation, refer to
maybe_emit_call_builtin___clear_cache.
(emit_call_builtin___clear_cache): New.
* doc/tm.texi.in: Add new hook point.
(CLEAR_CACHE_INSN): Remove duplicate 'both'.
* doc/tm.texi: Rebuilt.
* targhooks.h (default_meit_call_builtin___clear_cache):
Declare.
* tree.h (BUILTIN_ASM_NAME_PTR): New.
for libgcc/ChangeLog
* config/t-vxworks (LIB2ADD): Drop.
* config/t-vxworks7 (LIB2ADD): Likewise.
* config/vxcache.c: Remove.
|
|
This commit in GNU binutils 2.35:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=b7d072167715829eed0622616f6ae0182900de3e
added the section flag 'o' to .section directive:
.section __patchable_function_entries,"awo",@progbits,foo
which specifies the symbol name which the section references. Assembler
creates a unique __patchable_function_entries section with the section,
where foo is defined, as its linked-to section. Linker keeps a section
if its linked-to section is kept during garbage collection.
This patch checks assembler support for the section flag 'o' and uses
it to implement __patchable_function_entries section. Since Solaris may
use GNU assembler with Solairs ld. Even if GNU assembler supports the
section flag 'o', it doesn't mean that Solairs ld supports it. This
feature is disabled for Solairs targets.
gcc/
PR middle-end/93195
PR middle-end/93197
* configure.ac (HAVE_GAS_SECTION_LINK_ORDER): New. Define 1 if
the assembler supports the section flag 'o' for specifying
section with link-order.
* output.h (SECTION_LINK_ORDER): New. Defined to 0x8000000.
(SECTION_MACH_DEP): Changed from 0x8000000 to 0x10000000.
* targhooks.c (default_print_patchable_function_entry): Pass
SECTION_LINK_ORDER to switch_to_section if the section flag 'o'
works. Pass current_function_decl to switch_to_section.
* varasm.c (default_elf_asm_named_section): Use 'o' flag for
SECTION_LINK_ORDER if assembler supports it.
* config.in: Regenerated.
* configure: Likewise.
* doc/sourcebuild.texi: Document o_flag_in_section.
gcc/testsuite/
PR middle-end/93195
* g++.dg/pr93195a.C: New test.
* g++.dg/pr93195b.C: Likewise.
* lib/target-supports.exp
(check_effective_target_o_flag_in_section): New proc.
|
|
In assemly code, the section flag 'R' sets the SHF_GNU_RETAIN flag to
indicate that the section must be preserved by the linker.
Add SECTION_RETAIN to indicate a section should be retained by the linker
and set SECTION_RETAIN on section for the preserved symbol if assembler
supports SHF_GNU_RETAIN. All retained symbols are placed in separate
sections with
.section .data.rel.local.preserved_symbol,"awR"
preserved_symbol:
...
.section .data.rel.local,"aw"
not_preserved_symbol:
...
to avoid
.section .data.rel.local,"awR"
preserved_symbol:
...
not_preserved_symbol:
...
which places not_preserved_symbol definition in the SHF_GNU_RETAIN
section.
gcc/
2020-12-01 H.J. Lu <hjl.tools@gmail.com>
* configure.ac (HAVE_GAS_SHF_GNU_RETAIN): New. Define 1 if
the assembler supports marking sections with SHF_GNU_RETAIN flag.
* output.h (SECTION_RETAIN): New. Defined as 0x4000000.
(SECTION_MACH_DEP): Changed from 0x4000000 to 0x8000000.
(default_unique_section): Add a bool argument.
* varasm.c (get_section): Set SECTION_RETAIN for the preserved
symbol with HAVE_GAS_SHF_GNU_RETAIN.
(resolve_unique_section): Used named section for the preserved
symbol if assembler supports SHF_GNU_RETAIN.
(get_variable_section): Handle the preserved common symbol with
HAVE_GAS_SHF_GNU_RETAIN.
(default_elf_asm_named_section): Require the full declaration and
use the 'R' flag for SECTION_RETAIN.
* config.in: Regenerated.
* configure: Likewise.
* doc/sourcebuild.texi: Document R_flag_in_section.
gcc/testsuite/
2020-12-01 H.J. Lu <hjl.tools@gmail.com>
Jozef Lawrynowicz <jozef.l@mittosystems.com>
* c-c++-common/attr-used.c: Check the 'R' flag.
* c-c++-common/attr-used-2.c: Likewise.
* c-c++-common/attr-used-3.c: New test.
* c-c++-common/attr-used-4.c: Likewise.
* gcc.c-torture/compile/attr-used-retain-1.c: Likewise.
* gcc.c-torture/compile/attr-used-retain-2.c: Likewise.
* lib/target-supports.exp
(check_effective_target_R_flag_in_section): New proc.
|
|
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
levels:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250
Binutils has been updated to support GNU_PROPERTY_X86_ISA_1_V[234] marker:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13
with
commit b0ab06937385e0ae25cebf1991787d64f439bf12
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Oct 30 06:49:57 2020 -0700
x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker
and
commit 32930e4edbc06bc6f10c435dbcc63131715df678
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Oct 9 05:05:57 2020 -0700
x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker
in x86 ELF binaries.
Add -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property to indicate
the micro-architecture ISA level required to execute the binary.
gcc/
* config.gcc: Replace cet.o with gnu-property.o. Replace
i386/t-cet with i386/t-gnu-property.
* config/i386/cet.c: Renamed to ...
* config/i386/gnu-property.c: This.
(emit_gnu_property): New function.
(file_end_indicate_exec_stack_and_cet): Renamed to ...
(file_end_indicate_exec_stack_and_gnu_property): This. Call
emit_gnu_property to generate GNU_PROPERTY_X86_FEATURE_1_AND and
GNU_PROPERTY_X86_ISA_1_NEEDED properties.
* config/i386/i386.opt (mneeded): New.
* config/i386/linux-common.h (file_end_indicate_exec_stack_and_cet):
Renamed to ...
(file_end_indicate_exec_stack_and_gnu_property): This.
(TARGET_ASM_FILE_END): Updated.
* config/i386/t-cet: Renamed to ...
* config/i386/t-gnu-property: This.
(cet.o): Renamed to ...
(gnu-property.o): This.
* doc/invoke.texi: Document -mneeded.
gcc/testsuite/
* gcc.target/i386/x86-needed-1.c: New test.
* gcc.target/i386/x86-needed-2.c: Likewise.
* gcc.target/i386/x86-needed-3.c: Likewise.
|
|
encoding
gcc/c-family
* c-cppbuiltin.c (c_cpp_builtins): Add predefined
{__GNUC_EXECUTION_CHARSET_NAME} and
_WIDE_EXECUTION_CHARSET_NAME} macros.
gcc/
* doc/cpp.texi: Document new macros.
gcc/testsuite/
* c-c++-common/cpp/wide-narrow-predef-macros.c: New test.
libcpp/
* charset.c (init_iconv_desc): Initialize "to" and "from" fields.
* directives.c (cpp_get_narrow_charset_name): New function.
(cpp_get_wide_charset_name): Likewise.
* include/cpplib.h (cpp_get_narrow_charset_name): Prototype.
(cpp_get_wide_charset_name): Likewise.
* internal.h (cset_converter): Add "to" and "from" fields.
|
|
Historically, float_t has been defined as double on s390 and gcc would
emit double precision insns for evaluating float expressions when in
standard-compliant mode. Configure that behavior at compile-time as prep
for changes in glibc: When glibc ties float_t to double, keep the old
behavior; when glibc derives float_t from FLT_EVAL_METHOD (as on most
other archs), revert to the default behavior (i.e.,
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT). Provide a configure option
--enable-s390-excess-float-precision to override the check.
gcc/ChangeLog:
2020-12-01 Marius Hillenbrand <mhillen@linux.ibm.com>
* configure.ac: Add configure option
--enable-s390-excess-float-precision and check to derive default
from glibc.
* config/s390/s390.c: Guard s390_excess_precision with an ifdef
for ENABLE_S390_EXCESS_FLOAT_PRECISION.
* doc/install.texi: Document --enable-s390-excess-float-precision.
* configure: Regenerate.
* config.in: Regenerate.
|
|
This patch adds a new GCC plugin event: PLUGIN_ANALYZER_INIT, called
when -fanalyzer is starting, allowing for GCC plugins to register
additional state-machine-based checks within -fanalyzer. The idea
is that 3rd-party code might want to add domain-specific checks for
its own APIs - with the caveat that the analyzer is itself still
rather experimental.
As an example, the patch adds a proof-of-concept plugin to the testsuite
for checking CPython code: verifying that code that relinquishes
CPython's global interpreter lock doesn't attempt to do anything with
PyObjects in the sections where the lock isn't held. It also adds a
warning about nested releases of the lock, which is forbidden.
For example:
demo.c: In function 'foo':
demo.c:11:3: warning: use of PyObject '*(obj)' without the GIL
11 | Py_INCREF (obj);
| ^~~~~~~~~
'test': events 1-3
|
| 15 | void test (PyObject *obj)
| | ^~~~
| | |
| | (1) entry to 'test'
| 16 | {
| 17 | Py_BEGIN_ALLOW_THREADS
| | ~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (2) releasing the GIL here
| 18 | foo (obj);
| | ~~~~~~~~~
| | |
| | (3) calling 'foo' from 'test'
|
+--> 'foo': events 4-5
|
| 9 | foo (PyObject *obj)
| | ^~~
| | |
| | (4) entry to 'foo'
| 10 | {
| 11 | Py_INCREF (obj);
| | ~~~~~~~~~
| | |
| | (5) PyObject '*(obj)' used here without the GIL
|
Doing so requires adding some logic for ignoring macro expansions in
analyzer diagnostics, since the insides of Py_INCREF and
Py_BEGIN_ALLOW_THREADS are not of interest to the user for these cases.
gcc/analyzer/ChangeLog:
* analyzer-pass.cc (pass_analyzer::execute): Move sorry call to...
(sorry_no_analyzer): New.
* analyzer.h (class state_machine): New forward decl.
(class logger): New forward decl.
(class plugin_analyzer_init_iface): New.
(sorry_no_analyzer): New decl.
* checker-path.cc (checker_path::fixup_locations): New.
* checker-path.h (checker_event::set_location): New.
(checker_path::fixup_locations): New decl.
* diagnostic-manager.cc
(diagnostic_manager::emit_saved_diagnostic): Call
checker_path::fixup_locations, and call fixup_location
on the primary location.
* engine.cc: Include "plugin.h".
(class plugin_analyzer_init_impl): New.
(impl_run_checkers): Invoke PLUGIN_ANALYZER_INIT callbacks.
* pending-diagnostic.h (pending_diagnostic::fixup_location): New
vfunc.
gcc/ChangeLog:
* doc/plugins.texi (Plugin callbacks): Add PLUGIN_ANALYZER_INIT.
* plugin.c (register_callback): Likewise.
(invoke_plugin_callbacks_full): Likewise.
* plugin.def (PLUGIN_ANALYZER_INIT): New event.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/analyzer_gil_plugin.c: New test.
* gcc.dg/plugin/gil-1.c: New test.
* gcc.dg/plugin/gil.h: New header.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the new plugin
and test.
|
|
The optional target selector for the dg-require-effective-target
directive needs to be { target selector } not just { selector } as
currently documented.
gcc/ChangeLog:
* doc/sourcebuild.texi (Directives): Fix description of
dg-require-effective-target to include "target" in selector.
|
|
gcc/ChangeLog:
* config.gcc (*-*-darwin*): Set d_target_objs and target_has_targetdm.
* config/elfos.h (TARGET_D_MINFO_SECTION): New macro.
(TARGET_D_MINFO_START_NAME): New macro.
(TARGET_D_MINFO_END_NAME): New macro.
* config/t-darwin: Add darwin-d.o.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (D language and ABI): Add @hook for
TARGET_D_MINFO_SECTION, TARGET_D_MINFO_START_NAME, and
TARGET_D_MINFO_END_NAME.
* config/darwin-d.c: New file.
gcc/d/ChangeLog:
* d-target.def (d_minfo_section): New hook.
(d_minfo_start_name): New hook.
(d_minfo_end_name): New hook.
* modules.cc: Include d-target.h.
(register_moduleinfo): Update to use new targetdm hooks.
|
|
PR other/98027
* doc/install.texi: Default to --enable-cet=auto.
|
|
preference in backend
This is a patch that introduces the aarch64-autovec-preference that can
take values from 0 - 4, 0 being the default.
It can be used to override the autovectorisation preferences in the
backend:
0 - use default scheme
1 - only use Advanced SIMD
2 - only use SVE
3 - use Advanced SIMD and SVE, prefer Advanced SIMD in the event of a
tie (as determined by costs)
4 - use Advanced SIMD and SVE, prefer SVE in the event of a tie (as
determined by costs)
It can valuable for experimentation when comparing SVE and Advanced SIMD
autovectorisation strategies.
It achieves this adjusting the order of the interleaved SVE and Advanced
SIMD modes in aarch64_autovectorize_vector_modes.
It also adjusts aarch64_preferred_simd_mode to use the new comparison
function to pick Advanced SIMD or SVE to start with.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
* config/aarch64/aarch64.opt
(-param=aarch64-autovec-preference): Define.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set aarch64_sve_compare_costs to 0 when preferring only Advanced
SIMD.
(aarch64_cmp_autovec_modes): Define.
(aarch64_preferred_simd_mode): Adjust to use the above.
(aarch64_autovectorize_vector_modes): Likewise.
* doc/invoke.texi: Document aarch64-autovec-preference param.
|
|
Handling stack variables has three features.
1) Ensure HWASAN required alignment for stack variables
When tagging shadow memory, we need to ensure that each tag granule is
only used by one variable at a time.
This is done by ensuring that each tagged variable is aligned to the tag
granule representation size and also ensure that the end of each
object is aligned to ensure the start of any other data stored on the
stack is in a different granule.
This patch ensures the above by forcing the stack pointer to be aligned
before and after allocating any stack objects. Since we are forcing
alignment we also use `align_local_variable` to ensure this new alignment
is advertised properly through SET_DECL_ALIGN.
2) Put tags into each stack variable pointer
Make sure that every pointer to a stack variable includes a tag of some
sort on it.
The way tagging works is:
1) For every new stack frame, a random tag is generated.
2) A base register is formed from the stack pointer value and this
random tag.
3) References to stack variables are now formed with RTL describing an
offset from this base in both tag and value.
The random tag generation is handled by a backend hook. This hook
decides whether to introduce a random tag or use the stack background
based on the parameter hwasan-random-frame-tag. Using the stack
background is necessary for testing and bootstrap. It is necessary
during bootstrap to avoid breaking the `configure` test program for
determining stack direction.
Using the stack background means that every stack frame has the initial
tag of zero and variables are tagged with incrementing tags from 1,
which also makes debugging a bit easier.
Backend hooks define the size of a tag, the layout of the HWASAN shadow
memory, and handle emitting the code that inserts and extracts tags from a
pointer.
3) For each stack variable, tag and untag the shadow stack on function
prologue and epilogue.
On entry to each function we tag the relevant shadow stack region for
each stack variable. This stack region is tagged to match the tag added to
each pointer to that variable.
This is the first patch where we use the HWASAN shadow space, so we need
to add in the libhwasan initialisation code that creates this shadow
memory region into the binary we produce. This instrumentation is done
in `compile_file`.
When exiting a function we need to ensure the shadow stack for this
function has no remaining tags. Without clearing the shadow stack area
for this stack frame, later function calls could get false positives
when those later function calls check untagged areas (such as parameters
passed on the stack) against a shadow stack area with left-over tag.
Hence we ensure that the entire stack frame is cleared on function exit.
config/ChangeLog:
* bootstrap-hwasan.mk: Disable random frame tags for stack-tagging
during bootstrap.
gcc/ChangeLog:
* asan.c (struct hwasan_stack_var): New.
(hwasan_sanitize_p): New.
(hwasan_sanitize_stack_p): New.
(hwasan_sanitize_allocas_p): New.
(initialize_sanitizer_builtins): Define new builtins.
(ATTR_NOTHROW_LIST): New macro.
(hwasan_current_frame_tag): New.
(hwasan_frame_base): New.
(stack_vars_base_reg_p): New.
(hwasan_maybe_init_frame_base_init): New.
(hwasan_record_stack_var): New.
(hwasan_get_frame_extent): New.
(hwasan_increment_frame_tag): New.
(hwasan_record_frame_init): New.
(hwasan_emit_prologue): New.
(hwasan_emit_untag_frame): New.
(hwasan_finish_file): New.
(hwasan_truncate_to_tag_size): New.
* asan.h (hwasan_record_frame_init): New declaration.
(hwasan_record_stack_var): New declaration.
(hwasan_emit_prologue): New declaration.
(hwasan_emit_untag_frame): New declaration.
(hwasan_get_frame_extent): New declaration.
(hwasan_maybe_enit_frame_base_init): New declaration.
(hwasan_frame_base): New declaration.
(stack_vars_base_reg_p): New declaration.
(hwasan_current_frame_tag): New declaration.
(hwasan_increment_frame_tag): New declaration.
(hwasan_truncate_to_tag_size): New declaration.
(hwasan_finish_file): New declaration.
(hwasan_sanitize_p): New declaration.
(hwasan_sanitize_stack_p): New declaration.
(hwasan_sanitize_allocas_p): New declaration.
(HWASAN_TAG_SIZE): New macro.
(HWASAN_TAG_GRANULE_SIZE): New macro.
(HWASAN_STACK_BACKGROUND): New macro.
* builtin-types.def (BT_FN_VOID_PTR_UINT8_PTRMODE): New.
* builtins.def (DEF_SANITIZER_BUILTIN): Enable for HWASAN.
* cfgexpand.c (align_local_variable): When using hwasan ensure
alignment to tag granule.
(align_frame_offset): New.
(expand_one_stack_var_at): For hwasan use tag offset.
(expand_stack_vars): Record stack objects for hwasan.
(expand_one_stack_var_1): Record stack objects for hwasan.
(init_vars_expansion): Initialise hwasan state.
(expand_used_vars): Emit hwasan prologue and generate hwasan epilogue.
(pass_expand::execute): Emit hwasan base initialization if needed.
* doc/tm.texi (TARGET_MEMTAG_TAG_SIZE,TARGET_MEMTAG_GRANULE_SIZE,
TARGET_MEMTAG_INSERT_RANDOM_TAG,TARGET_MEMTAG_ADD_TAG,
TARGET_MEMTAG_SET_TAG,TARGET_MEMTAG_EXTRACT_TAG,
TARGET_MEMTAG_UNTAGGED_POINTER): Document new hooks.
* doc/tm.texi.in (TARGET_MEMTAG_TAG_SIZE,TARGET_MEMTAG_GRANULE_SIZE,
TARGET_MEMTAG_INSERT_RANDOM_TAG,TARGET_MEMTAG_ADD_TAG,
TARGET_MEMTAG_SET_TAG,TARGET_MEMTAG_EXTRACT_TAG,
TARGET_MEMTAG_UNTAGGED_POINTER): Document new hooks.
* explow.c (get_dynamic_stack_base): Take new `base` argument.
* explow.h (get_dynamic_stack_base): Take new `base` argument.
* sanitizer.def (BUILT_IN_HWASAN_INIT): New.
(BUILT_IN_HWASAN_TAG_MEM): New.
* target.def (target_memtag_tag_size,target_memtag_granule_size,
target_memtag_insert_random_tag,target_memtag_add_tag,
target_memtag_set_tag,target_memtag_extract_tag,
target_memtag_untagged_pointer): New hooks.
* targhooks.c (HWASAN_SHIFT): New.
(HWASAN_SHIFT_RTX): New.
(default_memtag_tag_size): New default hook.
(default_memtag_granule_size): New default hook.
(default_memtag_insert_random_tag): New default hook.
(default_memtag_add_tag): New default hook.
(default_memtag_set_tag): New default hook.
(default_memtag_extract_tag): New default hook.
(default_memtag_untagged_pointer): New default hook.
* targhooks.h (default_memtag_tag_size): New default hook.
(default_memtag_granule_size): New default hook.
(default_memtag_insert_random_tag): New default hook.
(default_memtag_add_tag): New default hook.
(default_memtag_set_tag): New default hook.
(default_memtag_extract_tag): New default hook.
(default_memtag_untagged_pointer): New default hook.
* toplev.c (compile_file): Call hwasan_finish_file when finished.
|
|
These flags can't be used at the same time as any of the other
sanitizers.
We add an equivalent flag to -static-libasan in -static-libhwasan to
ensure static linking.
The -fsanitize=kernel-hwaddress option is for compiling targeting the
kernel. This flag has defaults to match the LLVM implementation and
sets some other behaviors to work in the kernel (e.g. accounting for
the fact that the stack pointer will have 0xff in the top byte and to not
call the userspace library initialisation routines).
The defaults are that we do not sanitize variables on the stack and
always recover from a detected bug.
Since we are introducing a few more conflicts between sanitizer flags we
refactor the checking for such conflicts to use a helper function which
makes checking for such conflicts more easy and consistent.
We introduce a backend hook `targetm.memtag.can_tag_addresses` that
indicates to the mid-end whether a target has a feature like AArch64 TBI
where the top byte of an address is ignored.
Without this feature hwasan sanitization is not done.
gcc/ChangeLog:
* common.opt (flag_sanitize_recover): Default for kernel
hwaddress.
(static-libhwasan): New cli option.
* config/aarch64/aarch64.c (aarch64_can_tag_addresses): New.
(TARGET_MEMTAG_CAN_TAG_ADDRESSES): New.
* config/gnu-user.h (LIBHWASAN_EARLY_SPEC): hwasan equivalent of
asan command line flags.
* cppbuiltin.c (define_builtin_macros_for_compilation_flags):
Add hwasan equivalent of __SANITIZE_ADDRESS__.
* doc/invoke.texi: Document hwasan command line flags.
* doc/tm.texi: Document new hook.
* doc/tm.texi.in: Document new hook.
* flag-types.h (enum sanitize_code): New sanitizer values.
* gcc.c (STATIC_LIBHWASAN_LIBS): New macro.
(LIBHWASAN_SPEC): New macro.
(LIBHWASAN_EARLY_SPEC): New macro.
(SANITIZER_EARLY_SPEC): Update to include hwasan.
(SANITIZER_SPEC): Update to include hwasan.
(sanitize_spec_function): Use hwasan options.
* opts.c (finish_options): Describe conflicts between address
sanitizers.
(find_sanitizer_argument): New.
(report_conflicting_sanitizer_options): New.
(sanitizer_opts): Introduce new sanitizer flags.
(common_handle_option): Add defaults for kernel sanitizer.
* params.opt (hwasan--instrument-stack): New
(hwasan-random-frame-tag): New
(hwasan-instrument-allocas): New
(hwasan-instrument-reads): New
(hwasan-instrument-writes): New
(hwasan-instrument-mem-intrinsics): New
* target.def (HOOK_PREFIX): Add new hook.
(can_tag_addresses): Add new hook under memtag prefix.
* targhooks.c (default_memtag_can_tag_addresses): New.
* targhooks.h (default_memtag_can_tag_addresses): New decl.
* toplev.c (process_options): Ensure hwasan only on
architectures that advertise the possibility.
|
|
This is an analogous option to --bootstrap-asan to configure. It allows
bootstrapping GCC using HWASAN.
For the same reasons as for ASAN we have to avoid using the HWASAN
sanitizer when compiling libiberty and the lto-plugin.
Also add a function to query whether -fsanitize=hwaddress has been
passed.
ChangeLog:
* configure: Regenerate.
* configure.ac: Add --bootstrap-hwasan option.
config/ChangeLog:
* bootstrap-hwasan.mk: New file.
gcc/ChangeLog:
* doc/install.texi: Document new option.
libiberty/ChangeLog:
* configure: Regenerate.
* configure.ac: Avoid using sanitizer.
lto-plugin/ChangeLog:
* Makefile.am: Avoid using sanitizer.
* Makefile.in: Regenerate.
|
|
This reverts commit c4fa3728ab4f78984a549894e0e8c4d6a253e540,
which caused a regression in the default for flag_excess_precision.
2020-11-24 Ulrich Weigand <uweigand@de.ibm.com>
gcc/
PR tree-optimization/97970
* doc/invoke.texi (-ffast-math): Revert last change.
* opts.c: Revert last change.
|
|
This patch implements the following set of changes:
1. If a component flag of -ffast-math (or -funsafe-math-optimizations)
is explicitly set (or reset) on the command line, this should override
any implicit change due to -f(no-)fast-math, no matter in which order
the flags come on the command line. This change affects all flags.
2. Any component flag modified from its default by -ffast-math should
be reset to the default by -fno-fast-math. This was previously
not done for the following flags:
-fcx-limited-range
-fexcess-precision=
3. Once -ffinite-math-only is true, the -f(no-)signaling-nans flag has
no meaning (if we have no NaNs at all, it does not matter whether
there is a difference between quiet and signaling NaNs). Therefore,
it does not make sense for -ffast-math to imply -fno-signaling-nans.
(This is also a documentation change.)
4. -ffast-math is documented to imply -fno-rounding-math, however the
latter setting is the default anyway; therefore it does not make
sense to try to modify it from its default setting.
5. The __FAST_MATH__ preprocessor macro should be defined if and only
if all the component flags of -ffast-math are set to the value that
is documented as the effect of -ffast-math. The following flags
were currently *not* so tested:
-fcx-limited-range
-fassociative-math
-freciprocal-math
-frounding-math
(Note that we should still *test* for -fno-rounding-math here even
though it is not set as to 4. -ffast-math -frounding-math should
not set the __FAST_MATH__ macro.)
This is also a documentation change.
2020-11-24 Ulrich Weigand <uweigand@de.ibm.com>
gcc/
* doc/invoke.texi (-ffast-math): Remove mention of -fno-signaling-nans.
Clarify conditions when __FAST_MATH__ preprocessor macro is defined.
* opts.c (common_handle_option): Pass OPTS_SET to set_fast_math_flags
and set_unsafe_math_optimizations_flags.
(set_fast_math_flags): Add OPTS_SET argument, and use it to avoid
setting flags already explicitly set on the command line. In the !set
case, also reset x_flag_cx_limited_range and x_flag_excess_precision.
Never reset x_flag_signaling_nans or x_flag_rounding_math.
(set_unsafe_math_optimizations_flags): Add OPTS_SET argument, and use
it to avoid setting flags already explicitly set on the command line.
(fast_math_flags_set_p): Also test x_flag_cx_limited_range,
x_flag_associative_math, x_flag_reciprocal_math, and
x_flag_rounding_math.
|
|
gcc/
* doc/install.texi (Prerequisites) <Tcl>: Add comment.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-1.c: Avoid Tcl 8.5-specific
behavior.
* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Avoid
Tcl 8.5-specific behavior.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
Reported-by: David Edelsohn <dje.gcc@gmail.com>
|
|
The "persistent" attribute is now handled generically, and does not
need specific support in the MSP430 back end.
gcc/ChangeLog:
* config/msp430/msp430.c (msp430_section_attr): Don't warn for "lower"
attribute used with "noinit" or "persistent" attributes.
(msp430_persist_attr): Remove.
(attr_lower_exclusions): Remove ATTR_PERSIST exclusion.
(attr_upper_exclusions): Likewise.
(attr_either_exclusions): Likewise.
(attr_persist_exclusions): Remove.
(msp430_attribute_table): Remove ATTR_PERSIST handling.
(msp430_handle_generic_attribute): Remove ATTR_PERSIST section conflict
handling.
(TARGET_ASM_INIT_SECTIONS): Remove.
(msp430_init_sections): Remove.
(msp430_select_section): Use default_elf_select_section for decls with
the "persistent" attribute.
(msp430_section_type_flags): Remove ".persistent" section handling.
* doc/extend.texi (MSP430 Variable Attributes): Remove "noinit" and
"persistent" documentation.
gcc/testsuite/ChangeLog:
* g++.target/msp430/data-attributes.C: Remove expected warnings for
"lower" attribute conflicts.
Adjust expected wording for "persistent" attribute misuse.
* gcc.target/msp430/data-attributes-2.c: Likewise.
* gcc.target/msp430/pr78818-auto-warn.c: Likewise.
|
|
The "persistent" attribute is used for variables that are initialized
by the program loader, but are not initialized by the runtime startup
code. "persistent" variables are placed in a non-volatile area of
memory, which allows their value to "persist" between processor resets.
gcc/c-family/ChangeLog:
* c-attribs.c (handle_special_var_sec_attribute): New.
(handle_noinit_attribute): Remove.
(attr_noinit_exclusions): Rename to...
(attr_section_exclusions): ...this, and add "persistent" attribute
exclusion.
(c_common_attribute_table): Add "persistent" attribute.
gcc/ChangeLog:
* doc/extend.texi (Common Variable Attributes): Document the
"persistent" variable attribute.
* doc/sourcebuild.texi (Effective-Target Keywords): Document
the "persistent" effective target keyword.
* tree.h (DECL_PERSISTENT_P): Define.
* varasm.c (bss_initializer_p): Return false for a
DECL_PERSISTENT_P decl initialized to zero.
(default_section_type_flags): Handle the ".persistent" section.
(default_elf_select_section): Likewise.
(default_unique_section): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/noinit-attribute.c: Moved to...
* c-c++-common/torture/attr-noinit-main.inc: ...here.
* lib/target-supports.exp (check_effective_target_persistent): New.
* c-c++-common/torture/attr-noinit-1.c: New test.
* c-c++-common/torture/attr-noinit-2.c: New test.
* c-c++-common/torture/attr-noinit-3.c: New test.
* c-c++-common/torture/attr-noinit-invalid.c: New test.
* c-c++-common/torture/attr-persistent-1.c: New test.
* c-c++-common/torture/attr-persistent-2.c: New test.
* c-c++-common/torture/attr-persistent-3.c: New test.
* c-c++-common/torture/attr-persistent-invalid.c: New test.
* c-c++-common/torture/attr-persistent-main.inc: New test.
|
|
Document how to configure using asan (bootstrap-asan option to the
--with-build-config configure argument).
gcc/ChangeLog:
* doc/install.texi: Document bootstrap-asan option.
|
|
This patch finishes the second half of -Wrange-loop-construct I promised
to implement: it warns when a loop variable in a range-based for-loop is
initialized with a value of a different type resulting in a copy. For
instance:
int arr[10];
for (const double &x : arr) { ... }
where in every iteration we have to create and destroy a temporary value
of type double, to which we bind the reference. This could negatively
impact performance.
As per Clang, this doesn't warn when the range returns a copy, hence the
glvalue_p check.
gcc/ChangeLog:
PR c++/94695
* doc/invoke.texi: Update the -Wrange-loop-construct description.
gcc/cp/ChangeLog:
PR c++/94695
* parser.c (warn_for_range_copy): Warn when the loop variable is
initialized with a value of a different type resulting in a copy.
gcc/testsuite/ChangeLog:
PR c++/94695
* g++.dg/warn/Wrange-loop-construct2.C: New test.
|
|
I noticed a couple of places we used @code{program} instead of
@command{program}.
gcc/
* doc/invoke.texi: Replace a couple of @code with @command
|
|
The following patch implements __builtin_clear_padding builtin that clears
the padding bits in object representation (but preserves value
representation). Inside of unions it clears only those padding bits that
are padding for all the union members (so that it never alters value
representation).
It handles trailing padding, padding in the middle of structs including
bitfields (PDP11 unhandled, I've never figured out how those bitfields
work), VLAs (doesn't handle variable length structures, but I think almost
nobody uses them and it isn't worth the extra complexity). For VLAs and
sufficiently large arrays it uses runtime clearing loop instead of emitting
straight-line code (unless arrays are inside of a union).
The way I think this can be used for atomics is e.g. if the structures
are power of two sized and small enough that we use the hw atomics
for say compare_exchange __builtin_clear_padding could be called first on
the address of expected and desired arguments (for desired only if we want
to ensure that most of the time the atomic memory will have padding bits
cleared), then perform the weak cmpxchg and if that fails, we got the
value from the atomic memory; we can call __builtin_clear_padding on a copy
of that and then compare it with expected, and if it is the same with the
padding bits masked off, we can use the original with whatever random
padding bits in it as the new expected for next cmpxchg.
__builtin_clear_padding itself is not atomic and therefore it shouldn't
be called on the atomic memory itself, but compare_exchange*'s expected
argument is a reference and normally the implementation may store there
the current value from memory, so padding bits can be cleared in that,
and desired is passed by value rather than reference, so clearing is fine
too.
When using libatomic, we can use it either that way, or add new libatomic
APIs that accept another argument, pointer to the padding bit bitmask,
and construct that in the template as
alignas (_T) unsigned char _mask[sizeof (_T)];
std::memset (_mask, ~0, sizeof (_mask));
__builtin_clear_padding ((_T *) _mask);
which will have bits cleared for padding bits and set for bits taking part
in the value representation. Then libatomic could internally instead
of using memcmp compare
for (i = 0; i < N; i++) if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
2020-11-20 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/88101
gcc/
* builtins.def (BUILT_IN_CLEAR_PADDING): New built-in function.
* gimplify.c (gimplify_call_expr): Rewrite single argument
BUILT_IN_CLEAR_PADDING into two-argument variant.
* gimple-fold.c (clear_padding_unit, clear_padding_buf_size): New
const variables.
(struct clear_padding_struct): New type.
(clear_padding_flush, clear_padding_add_padding,
clear_padding_emit_loop, clear_padding_type,
clear_padding_union, clear_padding_real_needs_padding_p,
clear_padding_type_may_have_padding_p,
gimple_fold_builtin_clear_padding): New functions.
(gimple_fold_builtin): Handle BUILT_IN_CLEAR_PADDING.
* doc/extend.texi (__builtin_clear_padding): Document.
gcc/c-family/
* c-common.c (check_builtin_function_arguments): Handle
BUILT_IN_CLEAR_PADDING.
gcc/testsuite/
* c-c++-common/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-2.c: New test.
* c-c++-common/torture/builtin-clear-padding-3.c: New test.
* c-c++-common/torture/builtin-clear-padding-4.c: New test.
* c-c++-common/torture/builtin-clear-padding-5.c: New test.
* g++.dg/torture/builtin-clear-padding-1.C: New test.
* g++.dg/torture/builtin-clear-padding-2.C: New test.
* gcc.dg/builtin-clear-padding-1.c: New test.
|
|
Add builtins for HALT and LMBD, per Texas Instruments document
SPRUHV7C. Use the new LMBD pattern to define an expand for clz.
Binutils [1] and sim [2] support for LMBD instruction are merged now.
[1] https://sourceware.org/pipermail/binutils/2020-October/113901.html
[2] https://sourceware.org/pipermail/gdb-patches/2020-November/173141.html
gcc/ChangeLog:
* config/pru/alu-zext.md: Add lmbd patterns for zero_extend
variants.
* config/pru/pru.c (enum pru_builtin): Add HALT and LMBD.
(pru_init_builtins): Ditto.
(pru_builtin_decl): Ditto.
(pru_expand_builtin): Ditto.
* config/pru/pru.h (CLZ_DEFINED_VALUE_AT_ZERO): Define PRU
value for CLZ with zero value parameter.
* config/pru/pru.md: Add halt, lmbd and clz patterns.
* doc/extend.texi: Document PRU builtins.
gcc/testsuite/ChangeLog:
* gcc.target/pru/halt.c: New test.
* gcc.target/pru/lmbd.c: New test.
|
|
Currently we have three vector cost models: cheap, dynamic and
unlimited. -O2 -ftree-vectorize uses “cheap” by default, but that's
still relatively aggressive about peeling and aliasing checks,
and can lead to significant code size growth.
This patch adds an even more conservative choice, which for lack of
imagination I've called “very cheap”. It only allows vectorisation
if the vector code entirely replaces the scalar code. It also
requires one iteration of the vector loop to pay for itself,
regardless of how often the loop iterates. (If the vector loop
needs multiple iterations to be beneficial then things are
probably too close to call, and the conservative thing would
be to stick with the scalar code.)
The idea is that this should be suitable for -O2, although the patch
doesn't change any defaults itself.
I tested this by building and running a bunch of workloads for SVE,
with three options:
(1) -O2
(2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap
(3) -O2 -ftree-vectorize [-fvect-cost-model=cheap]
All three builds used the default -msve-vector-bits=scalable and
ran with the minimum vector length of 128 bits, which should give
a worst-case bound for the performance impact.
The workloads included a mixture of microbenchmarks and full
applications. Because it's quite an eclectic mix, there's not
much point giving exact figures. The aim was more to get a general
impression.
Code size growth with (2) was much lower than with (3). Only a
handful of tests increased by more than 5%, and all of them were
microbenchmarks.
In terms of performance, (2) was significantly faster than (1)
on microbenchmarks (as expected) but also on some full apps.
Again, performance only regressed on a handful of tests.
As expected, the performance of (3) vs. (1) and (3) vs. (2) is more
of a mixed bag. There are several significant improvements with (3)
over (2), but also some (smaller) regressions. That seems to be in
line with -O2 -ftree-vectorize being a kind of -O2.5.
The patch reorders vect_cost_model so that values are in order
of increasing aggressiveness, which makes it possible to use
range checks. The value 0 still represents “unlimited”,
so “if (flag_vect_cost_model)” is still a meaningful check.
gcc/
* doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
* common.opt (fvect-cost-model=): Add very-cheap as a possible option.
(fsimd-cost-model=): Likewise.
(vect_cost_model): Add very-cheap.
* flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
Put the values in order of increasing aggressiveness.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
range checks when comparing against VECT_COST_MODEL_CHEAP.
(vect_prune_runtime_alias_test_list): Do not allow any alias
checks for the very-cheap cost model.
* tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
any peeling for the very-cheap cost model. Also require one
iteration of the vector loop to pay for itself.
gcc/testsuite/
* gcc.dg/vect/vect-cost-model-1.c: New test.
* gcc.dg/vect/vect-cost-model-2.c: Likewise.
* gcc.dg/vect/vect-cost-model-3.c: Likewise.
* gcc.dg/vect/vect-cost-model-4.c: Likewise.
* gcc.dg/vect/vect-cost-model-5.c: Likewise.
* gcc.dg/vect/vect-cost-model-6.c: Likewise.
|