|
gcc/
* doc/invoke.texi (fprofile-info-section): Mention
__gcov_filename_to_gcfn(). Use "freestanding" to match with C11
standard language. Fix minor example code issues.
* gcov-io.h (GCOV_FILENAME_MAGIC): Define and document.
gcc/testsuite/
* gcc.dg/gcov-info-to-gcda.c: Test __gcov_filename_to_gcfn().
libgcc/
* gcov.h (__gcov_info_to_gcda): Mention __gcov_filename_to_gcfn().
(__gcov_filename_to_gcfn): Declare and document.
* libgcov-driver.c (dump_string): New.
(__gcov_filename_to_gcfn): Likewise.
(__gcov_info_to_gcda): Adjust comment to match C11 standard language.
|
|
I found this extension to -fdump-analyzer-feasibility very helpful when
debugging PR analyzer/105285.
gcc/analyzer/ChangeLog:
* diagnostic-manager.cc (epath_finder::process_worklist_item):
Call dump_feasible_path when a path that reaches the target
enode is found.
(epath_finder::dump_feasible_path): New.
* engine.cc (feasibility_state::dump_to_pp): New.
* exploded-graph.h (feasibility_state::dump_to_pp): New decl.
* feasible-graph.cc (feasible_graph::dump_feasible_path): New.
* feasible-graph.h (feasible_graph::dump_feasible_path): New
decls.
* program-point.cc (function_point::print): Fix missing trailing
newlines.
* program-point.h (program_point::print_source_line): Remove
unimplemented decl.
gcc/ChangeLog:
* doc/invoke.texi (-fdump-analyzer-feasibility): Mention the
fpath.txt output.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
The choice of ieee or ibm long double format is orthogonal to multilibs,
as the two sets of symbols co-exist and don't need a separate multilib.
gcc/ChangeLog:
* doc/install.texi (Configuration): Remove misleading text
around LE PowerPC Linux multilibs.
|
|
This patch documents the Solaris-specific D bootstrap requirements.
Tested by building and inspecting gccinstall.{pdf,info}.
2022-03-16 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
PR d/103528
* doc/install.texi (Tools/packages necessary for building GCC)
(GDC): Document libphobos requirement.
(Host/target specific installation notes for GCC, *-*-solaris2*):
Document libphobos and GDC specifics.
|
|
In commit a2a919aa501e3 (2003), built-ins for modf and modff were added.
In extend.texi, section "Other Builtins", "modf" was added to the paragraph
"There are also built-in versions of the ISO C99 functions [...]" and
"modf" was also added to the paragraph "The ISO C90 functions [...]".
"modff" was not added to either paragraph.
Based on the context clues about where "modfl" and other similar function
pairs like "powf/powl" appear, I believe the reference to "modf" in the
first paragraph (C99) should instead be "modff".
2022-04-25 Paul A. Clarke <pc@us.ibm.com>
gcc
* doc/extend.texi (Other Builtins): Correct reference to 'modff'.
|
|
2022-04-22 Paul A. Clarke <pc@us.ibm.com>
gcc
* doc/extend.texi: Correct "This" to "These".
|
|
That is, support for cris-linux-gnu was removed in gcc-11, but
install.texi wasn't adjusted accordingly. Also, unfortunately the
developer-related sites are gone with no replacements. And, CRIS is
used in other chip series as well, but allude rather than list.
The generated manpages, info, pdf and html were sanity-checked.
gcc:
* doc/install.texi <CRIS>: Remove references to removed websites and
adjust for cris-*-elf being the only remaining toolchain.
|
|
...and related options. These stale bits were overlooked when support
for "Linux/GNU" and CRIS v32 was removed, before the gcc-11 release.
Resulting pdf, html and info inspected for sanity.
gcc:
* doc/invoke.texi <CRIS>: Remove references to options for removed
subtarget cris-axis-linux-gnu and tweak wording accordingly.
|
|
gcc:
* doc/install.texi (Specific): Adjust mingw-w64 download link.
|
|
So far z16 was identified as arch14. After the machine has been
announced we can now add the real name.
gcc/ChangeLog:
* common/config/s390/s390-common.cc: Rename PF_ARCH14 to PF_Z16.
* config.gcc: Add z16 as march/mtune switch.
* config/s390/driver-native.cc (s390_host_detect_local_cpu):
Recognize z16 with -march=native.
* config/s390/s390-opts.h (enum processor_type): Rename
PROCESSOR_ARCH14 to PROCESSOR_3931_Z16.
* config/s390/s390.cc (PROCESSOR_ARCH14): Rename to ...
(PROCESSOR_3931_Z16): ... throughout the file.
(s390_processor processor_table): Add z16 as cpu string.
* config/s390/s390.h (enum processor_flags): Rename PF_ARCH14 to
PF_Z16.
(TARGET_CPU_ARCH14): Rename to ...
(TARGET_CPU_Z16): ... this.
(TARGET_CPU_ARCH14_P): Rename to ...
(TARGET_CPU_Z16_P): ... this.
(TARGET_ARCH14): Rename to ...
(TARGET_Z16): ... this.
(TARGET_ARCH14_P): Rename to ...
(TARGET_Z16_P): ... this.
* config/s390/s390.md (cpu_facility): Rename arch14 to z16 and
check TARGET_Z16 instead of TARGET_ARCH14.
* config/s390/s390.opt: Add z16 to processor_type.
* doc/invoke.texi: Document z16 and arch14.
|
|
gcc/ChangeLog:
* doc/invoke.texi: Document it.
|
|
gcc/ChangeLog:
* doc/extend.texi (Common Function Attributes): Document that
'access' does not imply 'nonnull'.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
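To illustrate the documented point that 'access' does not imply 'nonnull',
here is a minimal sketch. The function sum_ints and the macro READ_ONLY_BUF
are hypothetical names made up for this example; the contract chosen here
(a null buffer is accepted only together with a zero length) is likewise an
assumption, not something the patch prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* The access attribute is a GCC extension; expand to nothing elsewhere.  */
#if defined (__has_attribute)
# if __has_attribute (access)
#  define READ_ONLY_BUF(ptr_idx, size_idx) \
     __attribute__ ((access (read_only, ptr_idx, size_idx)))
# endif
#endif
#ifndef READ_ONLY_BUF
# define READ_ONLY_BUF(ptr_idx, size_idx)
#endif

/* Hypothetical function: sums N ints read through BUF.  */
READ_ONLY_BUF (1, 2)
long sum_ints (const int *buf, size_t n);

long
sum_ints (const int *buf, size_t n)
{
  /* Assumed contract: a null buffer is allowed only when it is empty.  */
  assert (buf != NULL || n == 0);

  /* 'access' alone does not promise a non-null pointer, so check.  */
  if (buf == NULL)
    return 0;

  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += buf[i];
  return total;
}
```

A function that unconditionally dereferences its pointer argument would
need 'nonnull' in addition to 'access'.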
|
|
This patch fixes some spelling and grammar issues in the match.pd
documentation.
gcc/ChangeLog:
* doc/match-and-simplify.texi: Fix typos.
|
|
We should make sure that the hard register set that is actually cleared by
the target hook zero_call_used_regs is a subset of all call used
registers.
At the same time, update the documentation for the target hook
TARGET_ZERO_CALL_USED_REGS.
This new assertion identified a bug in the i386 implementation, which
incorrectly set the zeroed_hardregs for stack registers. This bug is fixed
in the i386 implementation.
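The subset requirement can be sketched with a plain bitmask. This is only an
illustration: the type regset and the functions regset_subset_p and
check_zeroed_regs are hypothetical stand-ins, not GCC's hard register set
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a hard register set: one bit per register.  */
typedef uint64_t regset;

/* Return true if every register in SUB is also in SUPER.  */
static bool
regset_subset_p (regset sub, regset super)
{
  return (sub & ~super) == 0;
}

/* Check the set a (hypothetical) zeroing hook reports as cleared
   against the set of call-used registers.  */
static void
check_zeroed_regs (regset zeroed, regset call_used)
{
  assert (regset_subset_p (zeroed, call_used));
}
```

A hook that reports a register outside the call-used set would trip the
assertion, which is the kind of inconsistency the commit message describes
for the i386 stack registers.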
gcc/ChangeLog:
2022-04-01 Qing Zhao <qing.zhao@oracle.com>
* config/i386/i386.cc (zero_all_st_registers): Return the value of
num_of_st.
(ix86_zero_call_used_regs): Update zeroed_hardregs set according to
the return value of zero_all_st_registers.
* doc/tm.texi: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
* function.cc (gen_call_used_regs_seq): Add an assertion.
* target.def: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
|
|
gcc/
* doc/options.texi (Option file format): Clarifications around
option definition records' help texts.
|
|
This patch implements the costing function determine_suggested_unroll_factor
for aarch64.
It determines the unrolling factor by dividing the number of X operations we
can do per cycle by the number of X operations, taking this information from
the vec_ops analysis during vector costing and the available issue_info
information.
We multiply the dividend by a potential reduction_latency, to improve our
pipeline utilization if we are stalled waiting on a particular reduction
operation.
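A rough sketch of the arithmetic described above, for a single kind of
operation. The function name, its parameters, the rounding-up step and the
way the limit is applied are assumptions made for illustration; this is not
the code from aarch64.cc.

```c
#include <assert.h>

/* PER_CYCLE: how many operations of this kind can issue per cycle.
   IN_LOOP: how many operations of this kind the loop body contains.
   REDUCTION_LATENCY: latency of a stalling reduction, or 1 if none.
   LIMIT: upper bound, standing in for aarch64-vect-unroll-limit.  */
static unsigned
suggest_unroll_factor (unsigned per_cycle, unsigned in_loop,
                       unsigned reduction_latency, unsigned limit)
{
  assert (limit >= 1 && reduction_latency >= 1);

  if (in_loop == 0)
    return 1;

  /* Scale the dividend by the reduction latency, then divide by the
     number of operations, rounding up (the rounding is an assumption).  */
  unsigned factor = (per_cycle * reduction_latency + in_loop - 1) / in_loop;

  if (factor < 1)
    factor = 1;
  if (factor > limit)
    factor = limit;
  return factor;
}
```

For example, with 2 operations per cycle, 1 operation in the loop and no
reduction stall, the sketch suggests unrolling by 2; a reduction latency of
4 would raise that to 8 before the limit is applied.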
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_vector_costs): Define
determine_suggested_unroll_factor and m_has_avg.
(determine_suggested_unroll_factor): New function.
(aarch64_vector_costs::add_stmt_cost): Check for a qualifying pattern
to set m_nosve_pattern.
(aarch64_vector_costs::finish_costs): Use
determine_suggested_unroll_factor.
* config/aarch64/aarch64.opt (aarch64-vect-unroll-limit): New.
* doc/invoke.texi: (aarch64-vect-unroll-limit): Document new option.
|
|
Document predefined macros:
- __PTX_SM__ ,
- __PTX_ISA_VERSION_MAJOR__ and
- __PTX_ISA_VERSION_MINOR__ .
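As a hedged illustration of how code might consult these macros, the sketch
below tests for them with the preprocessor and falls back when they are
absent, so it also compiles on a host compiler. The function name
describe_ptx_target is hypothetical, and the sketch assumes the macros
expand to integer constants.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short description of the PTX target into BUF (LEN bytes).
   Returns 1 when all three macros are predefined, 0 otherwise.  */
static int
describe_ptx_target (char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
#if defined (__PTX_SM__) && defined (__PTX_ISA_VERSION_MAJOR__) \
    && defined (__PTX_ISA_VERSION_MINOR__)
  snprintf (buf, len, "__PTX_SM__=%d, PTX ISA version %d.%d",
            (int) __PTX_SM__, (int) __PTX_ISA_VERSION_MAJOR__,
            (int) __PTX_ISA_VERSION_MINOR__);
  return 1;
#else
  snprintf (buf, len, "not compiled for nvptx");
  return 0;
#endif
}
```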
gcc/ChangeLog:
2022-03-29 Tom de Vries <tdevries@suse.de>
* doc/invoke.texi (march): Document __PTX_SM__.
(mptx): Document __PTX_ISA_VERSION_MAJOR__ and
__PTX_ISA_VERSION_MINOR__.
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
|
|
Update nvptx documentation:
- Use meaningful terms: "PTX ISA target architecture" and "PTX ISA version".
- Remove invalid claim that "ISA strings must be lower-case".
- Add missing sm_xx entries.
- Fix misa default.
- Add march, copying misa doc.
- Declare misa an march alias.
- Add march-map.
- Fix "for given the specified" typo.
gcc/ChangeLog:
2022-03-29 Tom de Vries <tdevries@suse.de>
* doc/invoke.texi (misa, mptx): Update.
(march, march-map): Add.
|
|
2022-03-29 Chenghua Xu <xuchenghua@loongson.cn>
Lulu Cheng <chenglulu@loongson.cn>
gcc/ChangeLog:
* doc/install.texi: Add LoongArch options section.
* doc/invoke.texi: Add LoongArch options section.
* doc/md.texi: Add LoongArch options section.
contrib/ChangeLog:
* config-list.mk: Add LoongArch triplet.
|
|
These have been misdocumented since C++98 POD was split into C++11 trivial
and standard-layout in r149721.
PR c++/59426
gcc/ChangeLog:
* doc/extend.texi: Refer to __is_trivial instead of __is_pod.
|
|
With TeX output ("make pdf"), @gccoptlist's content ends up in a single
line such that TeX does not find the matching '@end ignore' for the
'@ignore' block, failing with a runaway error. The solution is to move
the @ignore block after the closing '}'.
(Follow up to r12-7808-g319ba7e241e7e21f9eb481f075310796f13d2035 )
gcc/
PR analyzer/103533
* doc/invoke.texi (Static Analyzer Options): Move
@ignore block after @gccoptlist's '}' for 'make pdf'.
|
|
PR analyzer/104954 tracks that -fanalyzer was taking a very long time
on a particular source file in the Linux kernel:
drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
One issue occurs with the repeated use of dynamic debug lines e.g. via
the DC_LOG_BANDWIDTH_CALCS macro, such as in print_bw_calcs_dceip in
drivers/gpu/drm/amd/display/dc/calcs/calcs_logger.h:
DC_LOG_BANDWIDTH_CALCS("#####################################################################");
DC_LOG_BANDWIDTH_CALCS("struct bw_calcs_dceip");
DC_LOG_BANDWIDTH_CALCS("#####################################################################");
[...snip dozens of lines...]
DC_LOG_BANDWIDTH_CALCS("[bw_fixed] dmif_request_buffer_size: %d",
bw_fixed_to_int(dceip->dmif_request_buffer_size));
When this is configured to use __dynamic_pr_debug, each of these becomes
code like:
do {
  static struct _ddebug __attribute__((__aligned__(8)))
  __attribute__((__section__("__dyndbg"))) __UNIQUE_ID_ddebug277 = {
    [...snip...]
  };
  if (arch_static_branch(&__UNIQUE_ID_ddebug277.key, false))
    __dynamic_pr_debug(&__UNIQUE_ID_ddebug277, [...the message...]);
} while (0);
The analyzer was naively seeing each call to __dynamic_pr_debug and noting
that the __UNIQUE_ID_nnnn object escapes. As successive __UNIQUE_ID_nnnn
objects escape, by the Nth call there are N escaped objects, and thus N
need clobbering, so we have O(N^2) clobbering of escaped objects overall,
leading to huge amounts of pointless work: print_bw_calcs_data has 225
uses of DC_LOG_BANDWIDTH_CALCS, many of which are in loops.
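To make the quadratic growth concrete, here is a small sketch that counts
clobber operations under a simplifying assumption: the calls are executed
once each in straight-line order, and the k-th call clobbers k escaped
objects. The function total_clobbers is hypothetical and ignores the loops
mentioned above, which would only make the count larger.

```c
#include <assert.h>

/* Sum 1 + 2 + ... + N_CALLS: the number of clobbers when the k-th
   call has to clobber k escaped objects.  */
static unsigned long
total_clobbers (unsigned long n_calls)
{
  unsigned long total = 0;
  for (unsigned long k = 1; k <= n_calls; k++)
    total += k;

  /* Cross-check against the closed form N * (N + 1) / 2.  */
  assert (total == n_calls * (n_calls + 1) / 2);
  return total;
}
```

Under that assumption, 225 calls give 225 * 226 / 2 = 25425 clobbers,
compared with 225 if each call only touched its own object.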
This patch adds a way to identify declarations that aren't interesting
to the analyzer, so that we don't attempt to create binding_clusters
for them (i.e. we don't store any state for them in our state objects).
This is implemented by adding a new region::tracked_p, implemented for
declarations by walking the existing IPA data the first time the
analyzer sees a declaration, setting it to false for global vars that
have no loads/stores/aliases, and "sufficiently safe" address-of
ipa-refs.
The patch gives a large speedup of -fanalyzer on the above kernel
source file:
                            Before  After
Total cc1 wallclock time:     180s    36s
analyzer wallclock time:      162s    17s
% spent in analyzer:           90%    47%
gcc/analyzer/ChangeLog:
PR analyzer/104954
* analyzer.opt (-fdump-analyzer-untracked): New option.
* engine.cc (impl_run_checkers): Handle it.
* region-model-asm.cc (region_model::on_asm_stmt): Don't attempt
to clobber regions with !tracked_p ().
* region-model-manager.cc (dump_untracked_region): New.
(region_model_manager::dump_untracked_regions): New.
(frame_region::dump_untracked_regions): New.
* region-model.h (region_model_manager::dump_untracked_regions):
New decl.
* region.cc (ipa_ref_requires_tracking): New.
(symnode_requires_tracking_p): New.
(decl_region::calc_tracked_p): New.
* region.h (region::tracked_p): New vfunc.
(frame_region::dump_untracked_regions): New decl.
(class decl_region): Note that this is also used for SSA names.
(decl_region::decl_region): Initialize m_tracked.
(decl_region::tracked_p): New.
(decl_region::calc_tracked_p): New decl.
(decl_region::m_tracked): New.
* store.cc (store::get_or_create_cluster): Assert that we
don't try to create clusters for base regions that aren't
trackable.
(store::mark_as_escaped): Don't mark base regions that we're not
tracking.
gcc/ChangeLog:
PR analyzer/104954
* doc/invoke.texi (Static Analyzer Options): Add
-fdump-analyzer-untracked.
gcc/testsuite/ChangeLog:
PR analyzer/104954
* gcc.dg/analyzer/asm-x86-dyndbg-1.c: New test.
* gcc.dg/analyzer/asm-x86-dyndbg-2.c: New test.
* gcc.dg/analyzer/many-unused-locals.c: New test.
* gcc.dg/analyzer/untracked-1.c: New test.
* gcc.dg/analyzer/unused-local-1.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/ChangeLog:
PR analyzer/103533
* doc/invoke.texi: Document that enabling taint analyzer
checker disables some warnings from `-fanalyzer`.
Signed-off-by: Avinash Sonawane <rootkea@gmail.com>
|
|
gcc/ChangeLog:
* doc/invoke.texi: Document min-pagesize parameter.
|
|
march=sapphirerapids should be based on icelake server not cooperlake.
gcc/ChangeLog:
PR target/104963
* config/i386/i386.h (PTA_SAPPHIRERAPIDS): change it to base on ICX.
* doc/invoke.texi: Update documents for Intel sapphirerapids.
gcc/testsuite/ChangeLog:
PR target/104963
* gcc.target/i386/pr104963.c: New test case.
|
|
A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info,
which persists even after the call gets inlined, for an operation that's
never interesting to debug.
This patch addresses this problem by folding calls to std::move/forward
and other cast-like functions into simple casts as part of the frontend's
general expression folding routine. This behavior is controlled by a
new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
users can enable this folding with -O0 (which implies -fno-inline).
After this patch with -O2 and a non-checking compiler, debug info size
for some testcases from range-v3 and cmcstl2 decreases by as much as ~10%
and overall compile time and memory usage decrease by ~2%.
PR c++/96780
gcc/ChangeLog:
* doc/invoke.texi (C++ Dialect Options): Document
-ffold-simple-inlines.
gcc/c-family/ChangeLog:
* c.opt: Add -ffold-simple-inlines.
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to
std::move/forward and other cast-like functions into simple
casts.
gcc/testsuite/ChangeLog:
* g++.dg/opt/pr96780.C: New test.
|
|
exception [PR102586]
As mentioned by Jason in the PR, non-trivially-copyable types (or non-POD
types for purposes of layout?) can be base classes of derived classes in
which the padding in those non-trivially-copyable types can be reused for
some real data members, or the layout can even change and data members can
be moved to other positions.
__builtin_clear_padding is right now used for multiple purposes. In
<atomic>, where it isn't used yet but was planned as the main spot, it can
be used for trivially copyable types only; ditto for std::bit_cast, where
we also use it. It is used for OpenMP long double atomics too, but
long double is trivially copyable, and lastly for -ftrivial-auto-var-init=.
The following patch restricts the builtin to pointers to trivially-copyable
types, with the exception when it is called directly on an address of a
variable, in that case already the FE can verify it is the complete object
type and so it is safe to clear all the paddings in it.
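The "address of a variable" case can be sketched in C as follows. The
struct padded and the function make_padded are hypothetical; the builtin is
guarded with __has_builtin so the sketch still compiles where
__builtin_clear_padding is unavailable, in which case the padding is simply
left as it was.

```c
#include <assert.h>
#include <string.h>

/* On most ABIs there are padding bytes between TAG and VALUE.  */
struct padded
{
  char tag;
  int value;
};

static struct padded
make_padded (char tag, int value)
{
  struct padded p;
  memset (&p, 0xff, sizeof p);  /* Fill everything, padding included.  */
  p.tag = tag;
  p.value = value;

#if defined (__has_builtin)
# if __has_builtin (__builtin_clear_padding)
  /* Called directly on the address of a variable, so the argument
     is known to point to a complete object.  */
  __builtin_clear_padding (&p);
# endif
#endif

  /* Clearing padding must not disturb the members themselves.  */
  assert (p.tag == tag && p.value == value);
  return p;
}
```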
2022-03-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/102586
gcc/
* doc/extend.texi (__builtin_clear_padding): Clarify that for C++
argument type should be pointer to trivially-copyable type unless it
is address of a variable or parameter.
gcc/cp/
* call.cc (build_cxx_call): Diagnose __builtin_clear_padding where
first argument's type is pointer to non-trivially-copyable type unless
it is address of a variable or parameter.
gcc/testsuite/
* g++.dg/cpp2a/builtin-clear-padding1.C: New test.
|
|
gcc/c-family/ChangeLog:
* c-target.def (check_string_object_format_arg): Fix description typo.
gcc/ChangeLog:
* doc/invoke.texi: Fix typos.
* doc/tm.texi.in: Remove duplicated word.
* doc/tm.texi: Regenerate.
libgomp/ChangeLog:
* libgomp.texi: Fix typo.
|
|
-fwrapv and C++20+ [PR104711]
As mentioned in the PR, different standards have different definitions
of what is a UB left shift. They all agree on out of bounds (including
negative) shift counts.
The rules used by ubsan are:
C99-C2x           ((unsigned) x >> (uprecm1 - y)) != 0 then UB
C++11-C++17       x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1 then UB
C++20 and later   everything is well defined
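The three rules above can be written out as a small predicate. This is a
sketch for int operands only, not the ubsan instrumentation itself; the
enum dialect and the function left_shift_is_ub are hypothetical names, and
UPRECM1 assumes unsigned int has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

enum dialect { DIALECT_C99, DIALECT_CXX11, DIALECT_CXX20 };

/* Precision of unsigned int minus one (31 for a 32-bit int).  */
#define UPRECM1 ((int) (sizeof (unsigned) * CHAR_BIT) - 1)

/* Return true if x << y is undefined under the rules for dialect D.  */
static bool
left_shift_is_ub (enum dialect d, int x, int y)
{
  /* All dialects agree: out of bounds (including negative) shift
     counts are undefined.  */
  if (y < 0 || y > UPRECM1)
    return true;

  unsigned high = (unsigned) x >> (UPRECM1 - y);
  switch (d)
    {
    case DIALECT_C99:    /* C99-C2x  */
      return high != 0;
    case DIALECT_CXX11:  /* C++11-C++17  */
      return x < 0 || high > 1;
    case DIALECT_CXX20:  /* C++20 and later  */
      return false;
    }
  assert (!"unknown dialect");
  return true;
}
```

For instance, shifting 1 into the sign bit is flagged under the C rule but
not under the C++11-C++17 rule, while shifting a negative value is flagged
by both and by neither under C++20.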
Now, for C++20, I added an early exit for the -Wshift-overflow* warnings
in the P1236R1 implementation so that they never warn, but apparently
-Wshift-negative-value remained as is. As it is well defined in C++20,
the following patch doesn't enable -Wshift-negative-value from -Wextra
anymore for C++20 and later. Users who want the warning for compatibility
with C++17 and earlier can still get it by using -Wshift-negative-value
explicitly.
Another thing is -fwrapv, that is an extension to the standards, so it is up
to us how exactly we define that case. Our ubsan code treats
TYPE_OVERFLOW_WRAPS (type0) and cxx_dialect >= cxx20 the same as only
diagnosing out of bounds shift count and nothing else and IMHO it is most
sensical to treat -fwrapv signed left shifts the same as C++20 treats
them, https://eel.is/c++draft/expr.shift#2
"The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N,
where N is the width of the type of the result.
[Note 1: E1 is left-shifted E2 bit positions; vacated bits are zero-filled.
— end note]"
with no UB dependent on the E1 values. The UB is only
"The behavior is undefined if the right operand is negative, or greater
than or equal to the width of the promoted left operand."
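The quoted C++20 wording can be mimicked in portable C by doing the shift on
the unsigned type. This is a sketch of the semantics only, not of how GCC
implements -fwrapv; the function wrapping_shl is hypothetical, and the
conversion back to int relies on the implementation-defined modulo
behaviour that GCC documents for such conversions.

```c
#include <assert.h>
#include <limits.h>

/* Compute E1 << E2 as the value congruent to E1 * 2^E2 modulo 2^N,
   where N is the width of int.  */
static int
wrapping_shl (int x, int y)
{
  /* The shift count itself must still be in bounds.  */
  assert (y >= 0 && y < (int) (sizeof (int) * CHAR_BIT));

  /* Unsigned shifts are defined modulo 2^N; vacated bits are
     zero-filled.  */
  return (int) ((unsigned) x << y);
}
```

Here the result depends only on the bit pattern of the left operand, so a
negative E1 needs no special treatment.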
Under the hood (except for FEs and ubsan from FEs) GCC middle-end doesn't
consider UB in left shifts dependent on the first operand's value, only
the out of bounds shifts.
While this change isn't a regression, I'd think it is useful for GCC 12,
it doesn't add new warnings, but just removes warnings that aren't
appropriate.
2022-03-09 Jakub Jelinek <jakub@redhat.com>
PR c/104711
gcc/
* doc/invoke.texi (-Wextra): Document that -Wshift-negative-value
is enabled by it only for C++11 to C++17 rather than for C++03 or
later.
(-Wshift-negative-value): Similarly (except here we stated
that it is enabled for C++11 or later).
gcc/c-family/
* c-opts.cc (c_common_post_options): Don't enable
-Wshift-negative-value from -Wextra for C++20 or later.
* c-ubsan.cc (ubsan_instrument_shift): Adjust comments.
* c-warn.cc (maybe_warn_shift_overflow): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
gcc/c/
* c-fold.cc (c_fully_fold_internal): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
* c-typeck.cc (build_binary_op): Likewise.
gcc/cp/
* constexpr.cc (cxx_eval_check_shift_p): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
* typeck.cc (cp_build_binary_op): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
gcc/testsuite/
* c-c++-common/Wshift-negative-value-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-negative-value-2.c: Likewise.
* c-c++-common/Wshift-negative-value-3.c: Likewise.
* c-c++-common/Wshift-negative-value-4.c: Likewise.
* c-c++-common/Wshift-negative-value-7.c: New test.
* c-c++-common/Wshift-negative-value-8.c: New test.
* c-c++-common/Wshift-negative-value-9.c: New test.
* c-c++-common/Wshift-negative-value-10.c: New test.
* c-c++-common/Wshift-overflow-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-overflow-2.c: Likewise.
* c-c++-common/Wshift-overflow-5.c: Likewise.
* c-c++-common/Wshift-overflow-6.c: Likewise.
* c-c++-common/Wshift-overflow-7.c: Likewise.
* c-c++-common/Wshift-overflow-8.c: New test.
* c-c++-common/Wshift-overflow-9.c: New test.
* c-c++-common/Wshift-overflow-10.c: New test.
* c-c++-common/Wshift-overflow-11.c: New test.
* c-c++-common/Wshift-overflow-12.c: New test.
|
|
This adds a --param to allow disabling of vectorization of
floating point inductions. On top of -Ofast this should allow
549.fotonik3d_r to not miscompare.
2022-03-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/84201
* params.opt (-param=vect-induction-float): Add.
* doc/invoke.texi (vect-induction-float): Document.
* tree-vect-loop.cc (vectorizable_induction): Honor
param_vect_induction_float.
* gcc.dg/vect/pr84201.c: New testcase.
|
|
As C++20 has already been published, we don't need to link to the draft
(which is now the C++23 draft anyway). And there's no need to say it's
part of the C++20 spec, or that there might be defect reports. That's
true for everything in C++20, so calling it out here just for Modules
isn't needed.
gcc/ChangeLog:
* doc/invoke.texi (C++ Modules): Remove anachronism.
|
|
At the same time, add -Wtrivial-auto-var-init and update the documentation.
For the following test case:
1 int g(int *);
2 int f1()
3 {
4   switch (0) {
5     int x;
6     default:
7       return g(&x);
8   }
9 }
compiling with -O -ftrivial-auto-var-init causes spurious warning:
warning: statement will never be executed [-Wswitch-unreachable]
    5 |     int x;
      |         ^
This is due to the compiler-generated initialization at the point of
the declaration.
We can avoid the warning by excluding the following statements
when
flag_auto_var_init > AUTO_INIT_UNINITIALIZED:
1) a call to .DEFERRED_INIT
2) a call to __builtin_clear_padding if the 2nd argument is present and non-zero
3) a gimple assign store right after the .DEFERRED_INIT call that has the LHS
of that call as its RHS
However, we still need to warn users about the incapability of the option
-ftrivial-auto-var-init by adding a new warning option -Wtrivial-auto-var-init
to report cases when it cannot initialize the auto variable. At the same
time, update documentation for -ftrivial-auto-var-init to connect it with
the new warning option -Wtrivial-auto-var-init, and add documentation
for -Wtrivial-auto-var-init.
gcc/ChangeLog:
PR middle-end/102276
* common.opt (-Wtrivial-auto-var-init): New option.
* doc/invoke.texi (-Wtrivial-auto-var-init): Document new option.
(-ftrivial-auto-var-init): Update option.
* gimplify.cc (emit_warn_switch_unreachable): New function.
(warn_switch_unreachable_r): Rename to ...
(warn_switch_unreachable_and_auto_init_r): This.
(maybe_warn_switch_unreachable): Rename to ...
(maybe_warn_switch_unreachable_and_auto_init): This.
(gimplify_switch_expr): Update calls to renamed function.
gcc/testsuite/ChangeLog:
PR middle-end/102276
* gcc.dg/auto-init-pr102276-1.c: New test.
* gcc.dg/auto-init-pr102276-2.c: New test.
* gcc.dg/auto-init-pr102276-3.c: New test.
* gcc.dg/auto-init-pr102276-4.c: New test.
|
|
PR gcov-profile/104677
gcc/ChangeLog:
* doc/invoke.texi: Document more .gcda file name generation.
|
|
The code generated by -mcmodel=medany is defined to be
position-independent, but is not guaranteed to function correctly when
linked into position-independent executables or libraries. See the
recent discussion at the psABI specification [1] for more details.
It would be better to reject these invalid sequences when linking, but
as pointed out in a recent LD bug [2] there may be some compatibility
issues related to the PCREL_HI20 relocations used to initialize GP.
Given the complexity here it's unlikely we'll be able to reject these
sequences any time soon, so instead just document that these may not
work.
[1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/245
[2]: https://sourceware.org/bugzilla/show_bug.cgi?id=28789
gcc/ChangeLog:
* doc/invoke.texi (RISC-V -mcmodel=medany): Document the degree
of position independence that -mcmodel=medany affords.
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The following patch avoids infinite recursion during generic folding.
The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
(bswap @1) actually being simplified, if it is not simplified, we just
move the bswap from one operand to the other and if @0 is also INTEGER_CST,
we apply the same rule next.
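The idea behind moving the bswap from one operand to the other can be
checked for the equality case with a small sketch. The function names are
hypothetical; __builtin_bswap32 is the existing GCC builtin. When the
second operand is a constant, the swapped constant can normally be computed
at compile time, which is what the simplification relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original form: compare the byte-swapped value against CST.  */
static bool
eq_bswap_lhs (uint32_t x, uint32_t cst)
{
  return __builtin_bswap32 (x) == cst;
}

/* Transformed form: swap the constant instead of the variable.  */
static bool
eq_bswap_rhs (uint32_t x, uint32_t cst)
{
  return x == __builtin_bswap32 (cst);
}

/* Both forms agree because bswap is its own inverse.  */
static void
check_equivalence (uint32_t x, uint32_t cst)
{
  assert (eq_bswap_lhs (x, cst) == eq_bswap_rhs (x, cst));
}
```

If the swapped constant is not actually folded, the transformed form still
contains a bswap and the same rule matches again, which is the recursion
described above.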
The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
such cases:
static inline bool
integer_cst_p (tree t)
{
  return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
}
The patch uses ! modifier to ensure the bswap is simplified and
extends support to GENERIC by means of requiring !EXPR_P which
is not perfect but a conservative approximation.
2022-02-22 Richard Biener <rguenther@suse.de>
PR tree-optimization/104644
* doc/match-and-simplify.texi: Amend ! documentation.
* genmatch.cc (expr::gen_transform): Code-generate ! support
for GENERIC.
(parser::parse_expr): Allow ! for GENERIC.
* match.pd (cmp (bswap @0) INTEGER_CST@1): Use ! modifier on
bswap.
* gcc.dg/pr104644.c: New test.
Co-Authored-by: Jakub Jelinek <jakub@redhat.com>
|
|
The ISAs documented for Intel architectures are inconsistent with the
ISAs actually supported, so update them all.
gcc/ChangeLog:
* doc/invoke.texi: Update documents for Intel architectures.
|
|
The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.
This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE. In turn, this
implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore. The new *<MVE_vpred> patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.
In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.
Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(!<Is_float_mode> || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).
Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.
In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.
Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.
Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:
float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}
fn1:
ldr r3, .L3+48
vldr.64 d4, .L3 // q2=(2.0,2.0,2.0,2.0)
vldr.64 d5, .L3+8
vldrw.32 q0, [r3] // q0=a(0..3)
adds r3, r3, #16
vcmp.f32 eq, q0, q2 // cmp a(0..3) == (2.0,2.0,2.0,2.0)
vldrw.32 q1, [r3] // q1=a(4..7)
vmrs r3, P0
vcmp.f32 eq, q1, q2 // cmp a(4..7) == (2.0,2.0,2.0,2.0)
vmrs r2, P0 @ movhi
ands r3, r3, r2 // r3=select(a(0..3)) & select(a(4..7))
vldr.64 d4, .L3+16 // q2=(5.0,5.0,5.0,5.0)
vldr.64 d5, .L3+24
vmsr P0, r3
vldr.64 d6, .L3+32 // q3=(4.0,4.0,4.0,4.0)
vldr.64 d7, .L3+40
vpsel q3, q3, q2 // q3=vcond_mask(4.0,5.0)
vmov.32 r2, q3[1] // keep the scalar max
vmov.32 r0, q3[3]
vmov.32 r3, q3[2]
vmov.f32 s11, s12
vmov s15, r2
vmov s14, r3
vmaxnm.f32 s15, s11, s15
vmaxnm.f32 s15, s15, s14
vmov s14, r0
vmaxnm.f32 s15, s15, s14
vmov r0, s15
bx lr
.L4:
.align 3
.L3:
.word 1073741824 // 2.0f
.word 1073741824
.word 1073741824
.word 1073741824
.word 1084227584 // 5.0f
.word 1084227584
.word 1084227584
.word 1084227584
.word 1082130432 // 4.0f
.word 1082130432
.word 1082130432
.word 1082130432
This patch adds tests that trigger an ICE without this fix.
The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks. In addition, since we should not
need these masks, the tests make sure they are not present.
Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.
2022-02-22 Christophe Lyon <christophe.lyon@arm.com>
PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel.
(arm_expand_vcond): Fix select operands.
(arm_get_mask_mode): New.
* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
(vec_cmpu<mode><MVE_vpred>): New.
(vcond_mask_<mode><MVE_vpred>): New.
* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): Move to ...
* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): ... here
and disable for MVE.
* doc/sourcebuild.texi (arm_mve): Document new effective-target.
gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.
* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
* lib/target-supports.exp (check_effective_target_arm_mve): New.
|
|
Currently supported internally are 3.1, 6.0, 6.3 and 7.0.
However, -mptx= supports 3.1, 6.3, 7.0 – but not the internal default 6.0.
Add -mptx=6.0 for consistency.
Tested on nvptx.
gcc/ChangeLog:
* config/nvptx/nvptx.opt (mptx): Add 6.0 alias PTX_VERSION_6_0.
* doc/invoke.texi (-mptx): Update for new values and defaults.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
|
|
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].
To enable SCS in user mode, other support is required in addition to
the compiler (as discussed in [2]). This patch only adds basic
support for SCS on the compiler side, and makes it convenient
for users to enable SCS.
For the Linux kernel, only compiler support is required.
[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768
Signed-off-by: Dan Li <ashimida@linux.alibaba.com>
gcc/ChangeLog:
* config/aarch64/aarch64.cc (SLOT_REQUIRED):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_layout_frame): Likewise, and
change callee_adjust when scs is enabled.
(aarch64_save_callee_saves):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_restore_callee_saves):
Change wb_candidate[12] to wb_pop_candidate[12].
(aarch64_get_separate_components):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 from SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
wb_pop_candidate[12], and rename wb_candidate[12] to
wb_push_candidate[12].
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.cc (parse_sanitizer_options): Add shadow-call-stack
and exclude SANITIZE_SHADOW_CALL_STACK.
* target.def: New hook.
* toplev.cc (process_options): Add SCS compile option check.
* ubsan.cc (ubsan_expand_null_ifn): Enum type conversion.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.
|
|
Resolves:
PR middle-end/104355 - Misleading and outdated -Warray-bounds documentation
gcc/ChangeLog:
PR middle-end/104355
* doc/invoke.texi (-Warray-bounds): Update documentation.
|
|
gcc:
* doc/install.texi (Specific): Change the www.bitwizard.nl
reference to use https.
|
|
Add -m[no-]direct-extern-access and nodirect_extern_access attribute.
-mdirect-extern-access is the default. With the nodirect_extern_access
attribute, the GOT is always used to access undefined data and function
symbols that carry the attribute, in both PIE and non-PIE.
With -mno-direct-extern-access:
1. Always use the GOT to access undefined data and function symbols,
in both PIE and non-PIE. This avoids copy relocations
in executables. It is compatible with existing executables and
shared libraries.
2. In executables and shared libraries, bind symbols with the STV_PROTECTED
visibility locally:
a. The address of a data symbol is the address of the data body.
b. For systems without function descriptors, the function pointer is
the address of the function body.
c. The resulting shared libraries may be incompatible with
executables which have copy relocations on protected symbols or
use executable PLT entries as function addresses for protected
functions in shared libraries.
3. Update asm_preferred_eh_data_format to select PC relative EH encoding
format with -mno-direct-extern-access to avoid copy relocation.
4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy
relocation with -mno-direct-extern-access.
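A minimal sketch of how the attribute might be applied, assuming an x86 GCC that includes this patch. The names NODIRECT, shared_counter, shared_bump and use_shared are hypothetical. The macro falls back to nothing where the compiler does not know the attribute, so the file still builds elsewhere. In a real program the two symbols would be defined in a shared library; they are defined in the same file here only so that the sketch links by itself.

```c
/* NODIRECT expands to the attribute only where the compiler recognizes
   it; on other compilers or targets it expands to nothing.  */
#if defined (__has_attribute)
# if __has_attribute (nodirect_extern_access)
#  define NODIRECT __attribute__ ((nodirect_extern_access))
# endif
#endif
#ifndef NODIRECT
# define NODIRECT
#endif

/* Declarations as a client of a shared library might write them.  */
extern int shared_counter NODIRECT;
extern int shared_bump (int) NODIRECT;

/* Uses both symbols; with the attribute in effect the intent is that
   undefined symbols are reached through the GOT.  */
int
use_shared (void)
{
  return shared_bump (shared_counter);
}

/* Stand-in definitions (assumption: normally these live in a shared
   library rather than in this file).  */
int shared_counter = 41;

int
shared_bump (int v)
{
  return v + 1;
}
```

Building the whole translation unit with -mno-direct-extern-access instead is the option-level counterpart described above.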
gcc/
PR target/35513
PR target/100593
* config/i386/gnu-property.cc: Include "i386-protos.h".
(file_end_indicate_exec_stack_and_gnu_property): Generate
a GNU_PROPERTY_1_NEEDED note for -mno-direct-extern-access or
nodirect_extern_access attribute.
* config/i386/i386-options.cc
(handle_nodirect_extern_access_attribute): New function.
(ix86_attribute_table): Add nodirect_extern_access attribute.
* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a
bool argument.
(ix86_has_no_direct_extern_access): New.
* config/i386/i386.cc (ix86_has_no_direct_extern_access): New.
(ix86_force_load_from_GOT_p): Add a bool argument to indicate
call operand. Force non-call load from GOT for
-mno-direct-extern-access or nodirect_extern_access attribute.
(legitimate_pic_address_disp_p): Avoid copy relocation in PIE
for -mno-direct-extern-access or nodirect_extern_access attribute.
(ix86_print_operand): Pass true to ix86_force_load_from_GOT_p
for call operand.
(asm_preferred_eh_data_format): Use PC-relative format for
-mno-direct-extern-access to avoid copy relocation. Check
ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4.
(ix86_binds_local_p): Set ix86_has_no_direct_extern_access to
true for -mno-direct-extern-access or nodirect_extern_access
attribute. Don't treat protected data as extern and avoid copy
relocation on common symbol with -mno-direct-extern-access or
nodirect_extern_access attribute.
(ix86_reloc_rw_mask): New to avoid copy relocation for
-mno-direct-extern-access.
(TARGET_ASM_RELOC_RW_MASK): New.
* config/i386/i386.opt: Add -mdirect-extern-access.
* doc/extend.texi: Document nodirect_extern_access attribute.
* doc/invoke.texi: Document -m[no-]direct-extern-access.
gcc/testsuite/
PR target/35513
PR target/100593
* g++.target/i386/pr35513-1.C: New file.
* g++.target/i386/pr35513-2.C: Likewise.
* gcc.target/i386/pr35513-1a.c: Likewise.
* gcc.target/i386/pr35513-1b.c: Likewise.
* gcc.target/i386/pr35513-2a.c: Likewise.
* gcc.target/i386/pr35513-2b.c: Likewise.
* gcc.target/i386/pr35513-3a.c: Likewise.
* gcc.target/i386/pr35513-3b.c: Likewise.
* gcc.target/i386/pr35513-4a.c: Likewise.
* gcc.target/i386/pr35513-4b.c: Likewise.
* gcc.target/i386/pr35513-5a.c: Likewise.
* gcc.target/i386/pr35513-5b.c: Likewise.
* gcc.target/i386/pr35513-6a.c: Likewise.
* gcc.target/i386/pr35513-6b.c: Likewise.
* gcc.target/i386/pr35513-7a.c: Likewise.
* gcc.target/i386/pr35513-7b.c: Likewise.
* gcc.target/i386/pr35513-8.c: Likewise.
* gcc.target/i386/pr35513-9a.c: Likewise.
* gcc.target/i386/pr35513-9b.c: Likewise.
* gcc.target/i386/pr35513-10a.c: Likewise.
* gcc.target/i386/pr35513-10b.c: Likewise.
* gcc.target/i386/pr35513-11a.c: Likewise.
* gcc.target/i386/pr35513-11b.c: Likewise.
* gcc.target/i386/pr35513-12a.c: Likewise.
* gcc.target/i386/pr35513-12b.c: Likewise.
|
|
We have recently updated the default for the `-misa-spec=' option, yet
we still have not documented it nor its `--with-isa-spec=' counterpart
in the GCC manuals. Fix that.
gcc/
* doc/install.texi (Configuration): Document `--with-isa-spec='
RISC-V option.
* doc/invoke.texi (Option Summary): List `-misa-spec=' RISC-V
option.
(RISC-V Options): Document it.
|
|
Due to a copy-and-paste error in the documentation, vec_replace_unaligned was
implemented with the same function prototypes as vec_replace_elt. It was
intended that vec_replace_unaligned always specify output vectors as having
type vector unsigned char, to emphasize that elements are potentially
misaligned by this built-in function. This patch corrects the
misimplementation.
2022-02-04 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
PR target/100808
* doc/extend.texi (Basic PowerPC Built-in Functions Available on ISA
3.1): Provide consistent type names. Remove unnecessary semicolons.
Fix bad line breaks.
|
|
gcc/ChangeLog:
* doc/cpp.texi (Variadic Macros): Replace C++2a with C++20.
|
|
gcc/ChangeLog:
PR analyzer/104270
* doc/invoke.texi (-ftrivial-auto-var-init=): Add reference to
-Wanalyzer-use-of-uninitialized-value to paragraph documenting that
-ftrivial-auto-var-init= doesn't suppress warnings.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
This patch boosts the analysis for complex mul, fma and fms in order to ensure
that it doesn't create an incorrect output.
Essentially it adds an extra verification to check that the two nodes it's going
to combine do the same operations on compatible values. The reason it needs to
do this is that if one computation differs from the other then with the current
implementation we have no way to deal with it since we have to remove the
permute.
When we can keep the permute around we can probably handle these by unrolling.
While implementing this, since I have to do the traversal anyway, I took advantage
of it by simplifying the code a bit. Previously we would determine whether
something is a conjugate and then try to figure out which conjugate it is and
then try to see if the permutes match what we expect.
Now the code that does the traversal will detect this in one go and return to us
whether the operation is something that can be combined and whether a conjugate
is present.
Secondly, because it does this, I can now simplify the checking code itself to
essentially just try to apply fixed patterns to each operation.
The patterns represent the order operations should appear in. For instance a
complex MUL operation combines:
Left 1 + Right 1
Left 2 + Right 2
with a permute on the nodes consisting of:
{ Even, Even } + { Odd, Odd }
{ Even, Odd } + { Odd, Even }
By abstracting over these patterns the checking code becomes quite simple.
As part of this I was checking the order of the operands, which was left in
"slp" order, that is, the same order they showed up in during SLP, which means
that the accumulator is first. However, it looks like I didn't document this
and the x86 optab was implemented assuming the same order as FMA, i.e. that
the accumulator is last.
I have thus changed the order to match that of FMA and FMS, which corrects the
x86 codegen, and will update the Arm targets. This has now also been
documented.
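For illustration, a scalar sketch of the even/odd lane structure described above, assuming complex numbers stored interleaved (even index real part, odd index imaginary part). The function name complex_fma and its signature are hypothetical. This is only the source-level shape of a complex multiply-accumulate, not a statement about which loops the vectorizer actually matches on a given target.

```c
#include <stddef.h>

/* acc[i] += a[i] * b[i] for n complex numbers stored interleaved:
   even index = real part, odd index = imaginary part.  */
void
complex_fma (float *acc, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float ar = a[2 * i], ai = a[2 * i + 1];
      float br = b[2 * i], bi = b[2 * i + 1];

      /* Real lane: { Even, Even } combined with { Odd, Odd }.  */
      acc[2 * i] += ar * br - ai * bi;

      /* Imaginary lane: { Even, Odd } combined with { Odd, Even }.  */
      acc[2 * i + 1] += ar * bi + ai * br;
    }
}
```

Each output lane reads both the even and the odd lane of its inputs, which is why the pattern matching has to reason about the permutes on the operand nodes.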
gcc/ChangeLog:
PR tree-optimization/102819
PR tree-optimization/103169
* doc/md.texi: Update docs for cfms, cfma.
* tree-data-ref.h (same_data_refs): Accept optional offset.
* tree-vect-slp-patterns.cc (is_linear_load_p): Fix issue with repeating
patterns.
(vect_normalize_conj_loc): Remove.
(is_eq_or_top): Change to take two nodes.
(enum _conj_status, compatible_complex_nodes_p,
vect_validate_multiplication): New.
(class complex_add_pattern, complex_add_pattern::matches,
complex_add_pattern::recognize, class complex_mul_pattern,
complex_mul_pattern::recognize, class complex_fms_pattern,
complex_fms_pattern::recognize, class complex_operations_pattern,
complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
new cache.
(complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
cache and use new validation code.
* tree-vect-slp.cc (vect_match_slp_patterns_2, vect_match_slp_patterns,
vect_analyze_slp): Pass along cache.
(compatible_calls_p): Expose.
* tree-vectorizer.h (compatible_calls_p, slp_node_hash,
slp_compat_nodes_map_t): New.
(class vect_pattern): Update signatures to include new cache.
gcc/testsuite/ChangeLog:
PR tree-optimization/102819
PR tree-optimization/103169
* g++.dg/vect/pr99149.cc: xfail for now.
* gcc.dg/vect/complex/pr102819-1.c: New test.
* gcc.dg/vect/complex/pr102819-2.c: New test.
* gcc.dg/vect/complex/pr102819-3.c: New test.
* gcc.dg/vect/complex/pr102819-4.c: New test.
* gcc.dg/vect/complex/pr102819-5.c: New test.
* gcc.dg/vect/complex/pr102819-6.c: New test.
* gcc.dg/vect/complex/pr102819-7.c: New test.
* gcc.dg/vect/complex/pr102819-8.c: New test.
* gcc.dg/vect/complex/pr102819-9.c: New test.
* gcc.dg/vect/complex/pr103169.c: New test.
|
|
This flips the default for the errata handling for an old version
(TL;DR: workaround: no multiply instruction last on a cache-line).
Newer versions of the CRIS cpu don't have that bug. While the impact
of the workaround is very marginal (coremark: less than .05% larger,
less than .0005% slower) it's an irritating pseudorandom factor when
assessing the impact of other changes.
Also, fix a wart requiring changes to more than TARGET_DEFAULT to flip
the default.
People building old kernels or operating systems to run on
ETRAX 100 LX are advised to pass "-mmul-bug-workaround".
gcc:
* config/cris/cris.h (TARGET_DEFAULT): Don't include MASK_MUL_BUG.
(MUL_BUG_ASM_DEFAULT): New macro.
(MAYBE_AS_NO_MUL_BUG_ABORT): Define in terms of MUL_BUG_ASM_DEFAULT.
* doc/invoke.texi (CRIS Options, -mmul-bug-workaround): Adjust
accordingly.
|
|
As reported at
https://gcc.gnu.org/pipermail/gcc/2022-February/238216.html,
multiprecision.org now uses https so this updates the documentation
to use https instead of http.
Committed as obvious.
gcc/ChangeLog:
* doc/install.texi:
|
|
As the minimal GCC version that can build the current master
is 4.8, it does not make sense to mention anything for older
versions.
gcc/ChangeLog:
* doc/install.texi: Remove option for GCC < 4.8.
|