Age | Commit message (Collapse) | Author | Files | Lines |
|
* sv.po: Update.
|
|
libbacktrace/ChangeLog:
* atomic.c (backtrace_atomic_store_int): Use int for old value.
|
|
Value-initialization built an AGGR_INIT_EXPR to set AGGR_INIT_ZERO_FIRST on.
Passing that AGGR_INIT_EXPR to maybe_constant_value returned a TARGET_EXPR,
which potential_constant_expression_1 mistook for a temporary.
We shouldn't add a TARGET_EXPR to the AGGR_INIT_EXPR in this case, just like
we already avoid adding it to CONSTRUCTOR or CALL_EXPR.
PR c++/119652
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_outermost_constant_expr): Also don't add a
TARGET_EXPR around AGGR_INIT_EXPR.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/constinit20.C: New test.
|
|
After discussion with the open source support team at Apple, we have
established that the cores conform to the 8.5 and 8.6 requirements.
One of the mandatory features (FEAT_SPECRES) is not exposed (or
available) in user-space code but is supported for privileged code.
The values for chip IDs and the LITTLE.big variants have been taken
from lists in the XNU and LLVM sources.
PR target/113257
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Apple-a12,
Apple-M1, Apple-M2, Apple-M3 with expanded names to allow for the
LITTLE.big versions.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add apple-m1,2 and 3 cores to the ones listed
for arch and tune selections.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
libstdc++-v3/ChangeLog:
PR libstdc++/119642
* include/bits/formatfwd.h: Remove stray pragma.
|
|
This adds the new C23 headers to the PCH, and also removes the
__has_include check for <stacktrace> because we provide that
unconditionally now.
libstdc++-v3/ChangeLog:
* include/precompiled/stdc++.h: Include <stdbit.h> and
<stdckdint.h>. Include <stacktrace> unconditionally.
|
|
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (INPUT): Add flat_map, flat_set,
text_encoding, stdbit.h and stdckdint.h.
|
|
Darwin/macOS installed libiconv does not accept // trailers on
conversion codes; this causes the init_iconv to fail - and then
that SEGVs later.
Remove the trailing // as it is not needed elsewhere.
Also print a warning if we fail to init the conversion.
gcc/cobol/ChangeLog:
* symbols.cc : Remove trailing // on standard_internal.
(cbl_field_t::internalize): Print a warning if we fail to
initialise iconv.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
For both GCN, nvptx, this gets rid of 'configure'-time:
configure: WARNING: No native atomic operations are provided for this platform.
configure: WARNING: They will be faked using a mutex.
configure: WARNING: Performance of certain classes will degrade as a result.
..., and changes:
-checking for lock policy for shared_ptr reference counts... mutex
+checking for lock policy for shared_ptr reference counts... atomic
That means, '[...]/[target]/libstdc++-v3/', 'Makefile's change:
-ATOMICITY_SRCDIR = config/cpu/generic/atomicity_mutex
+ATOMICITY_SRCDIR = config/cpu/generic/atomicity_builtins
..., and '[...]/[target]/libstdc++-v3/config.h' changes:
/* Defined if shared_ptr reference counting should use atomic operations. */
-/* #undef HAVE_ATOMIC_LOCK_POLICY */
+#define HAVE_ATOMIC_LOCK_POLICY 1
/* Define if the compiler supports C++11 atomics. */
-/* #undef _GLIBCXX_ATOMIC_BUILTINS */
+#define _GLIBCXX_ATOMIC_BUILTINS 1
..., and '[...]/[target]/libstdc++-v3/include/[target]/bits/c++config.h'
changes:
/* Defined if shared_ptr reference counting should use atomic operations. */
-/* #undef _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY */
+#define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1
/* Define if the compiler supports C++11 atomics. */
-/* #undef _GLIBCXX_ATOMIC_BUILTINS */
+#define _GLIBCXX_ATOMIC_BUILTINS 1
This means that '[...]/[target]/libstdc++-v3/libsupc++/atomicity.cc',
'[...]/[target]/libstdc++-v3/libsupc++/atomicity.o' then uses atomic
instructions for synchronization instead of C++ static local variables, which
in turn for their guard variables, via 'libstdc++-v3/libsupc++/guard.cc', used
'libgcc/gthr.h' recursive mutexes, which currently are unsupported for GCN.
For GCN, this turns ~500 libstdc++ execution test FAILs into PASSes, and also
progresses:
PASS: g++.dg/tree-ssa/pr20458.C -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++17 execution test
PASS: g++.dg/tree-ssa/pr20458.C -std=gnu++26 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++26 execution test
UNSUPPORTED: g++.dg/tree-ssa/pr20458.C -std=gnu++98: exception handling not supported
(For nvptx, there is no effective change, due to other misconfiguration.)
PR target/119645
libstdc++-v3/
* acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY) [GCN, nvptx]:
Hard-code results.
* configure: Regenerate.
* configure.host [GCN, nvptx] (atomicity_dir): Set to
'cpu/generic/atomicity_builtins'.
|
|
Follow-up to commit 1146410c0feb0e82c689b1333fdf530a2b34dc2b
"nvptx: Support '-mfake-ptx-alloca'". '-mfake-ptx-alloca' is applicable only
for configurations where PTX 'alloca' is not supported, where target libraries
are built with it enabled (that is, libstdc++, libgfortran).
This change progresses:
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++26 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++26 [-compilation failed to produce executable-]{+execution test+}
UNSUPPORTED: g++.dg/tree-ssa/pr20458.C -std=gnu++98: exception handling not supported
..., and "enables" a few test cases:
FAIL: g++.old-deja/g++.other/sibcall1.C -std=gnu++17 (test for excess errors)
[Etc.]
FAIL: g++.old-deja/g++.other/unchanging1.C -std=gnu++17 (test for excess errors)
[Etc.]
..., which now (unrelatedly to 'alloca', and in the same way as configurations
where PTX 'alloca' is supported) FAIL due to:
unresolved symbol _Unwind_DeleteException
collect2: error: ld returned 1 exit status
Most importantly, it progresses ~830 libstdc++ test cases:
[-FAIL:-]{+PASS:+} [...] (test for excess errors)
..., with (if applicable, for most of them):
[-UNRESOLVED:-]{+PASS:+} [...] [-compilation failed to produce executable-]{+execution test+}
..., or just a few 'FAIL: [...] execution test' where these test cases also
FAIL in configurations where PTX 'alloca' is supported, or ~120 instances of
'FAIL: [...] execution test' due to run-time
'GCC/nvptx: sorry, unimplemented: dynamic stack allocation not supported'.
This change also resolves the cases noted in
commit bac2d8a246892334e24dfa7d62be0cd0648c5606
"nvptx: Build libgfortran with '-mfake-ptx-alloca' [PR107635]":
| With '-mfake-ptx-alloca', libgfortran again succeeds to build, and compared
| to before, we've got only a small number of regressions due to nvptx 'ld'
| complaining about 'unresolved symbol __GCC_nvptx__PTX_alloca_not_supported':
|
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray/codimension_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
[-FAIL:-]{+PASS:+} gfortran.dg/coarray/codimension_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
| [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib -O2 -lcaf_single [-execution test-]{+compilation failed to produce executable+}
[-FAIL:-]{+PASS:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib -O2 -lcaf_single [-compilation failed to produce executable-]{+execution test+}
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
| [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib -O2 -lcaf_single [-execution test-]{+compilation failed to produce executable+}
[-FAIL:-]{+PASS:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib -O2 -lcaf_single [-compilation failed to produce executable-]{+execution test+}
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
| [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib -O2 -lcaf_single [-execution test-]{+compilation failed to produce executable+}
[-FAIL:-]{+PASS:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib -O2 -lcaf_single [-compilation failed to produce executable-]{+execution test+}
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray_43.f90 -O (test for excess errors)
[-FAIL:-]{+PASS:+} gfortran.dg/coarray_43.f90 -O (test for excess errors)
..., and further progresses:
[-FAIL:-]{+PASS:+} gfortran.dg/coarray_lib_comm_1.f90 -O0 (test for excess errors)
[-UNRESOLVED:-]{+FAIL:+} gfortran.dg/coarray_lib_comm_1.f90 -O0 [-compilation failed to produce executable-]{+execution test+}
[Etc.]
..., which now (unrelatedly to 'alloca', and in the same way as configurations
where PTX 'alloca' is supported) FAILs due to:
error : Prototype doesn't match for '_gfortran_caf_transfer_between_remotes' in 'input file 9 at offset 159897', first defined in 'input file 9 at offset 159897'
error : Prototype doesn't match for '_gfortran_caf_stop_numeric' in 'input file 9 at offset 159897', first defined in 'input file 9 at offset 159897'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
gcc/
* config/nvptx/nvptx.opt (-mfake-ptx-alloca): Update.
gcc/testsuite/
* gcc.target/nvptx/alloca-2-O0_-mfake-ptx-alloca.c: Adjust.
libgcc/
* config/nvptx/alloca.c: New.
* config/nvptx/t-nvptx (LIB2ADD): Add it.
|
|
Apparently Darwin sed doesn't like 's/\(foo\|bar\|baz\)/qux/' syntax,
simplified by using a pattern which matches all libgcobol header names
except possible config.h.
2025-04-07 Jakub Jelinek <jakub@redhat.com>
* Make-lang.in (cobol/charmaps.cc, cobol/valconv.cc): Use a BRE
only sed regex.
|
|
As mentioned in the PR, the COBOL documentation is currently not present
in onlinedocs at all.
While the script generates gcobol{,-io}.{pdf,html}, it generates them in
the gcc/gcc/cobol/ subdirectory of the update_web_docs_git temporary
directory and nothing find it there afterwards, all the processing is on
for file in */*.html *.ps *.pdf *.tar; do
So, this patch puts gcobol{,-io}.html into gcobol/ subdirectory and
gcobol{,-io}.pdf into the current directory, so that it is picked up.
With this it makes into onlinedocs:
find . -name \*cobol\*
./onlinedocs/gcobol.pdf.gz
./onlinedocs/gcobol.pdf
./onlinedocs/gcobol_io.pdf.gz
./onlinedocs/gcobol_io.pdf
./onlinedocs/gcobol
./onlinedocs/gcobol/gcobol_io.html.gz
./onlinedocs/gcobol/gcobol_io.html
./onlinedocs/gcobol/gcobol.html.gz
./onlinedocs/gcobol/gcobol.html
./onlinedocs/gnat_rm/gnat_005frm_002finterfacing_005fto_005fother_005flanguages-interfacing-to-cobol.html.gz
./onlinedocs/gnat_rm/gnat_005frm_002finterfacing_005fto_005fother_005flanguages-interfacing-to-cobol.html
./onlinedocs/gnat_rm/gnat_005frm_002fimplementation_005fadvice-rm-f-7-cobol-support.html.gz
./onlinedocs/gnat_rm/gnat_005frm_002fimplementation_005fadvice-rm-f-7-cobol-support.html
./onlinedocs/gnat_rm/gnat_005frm_002fimplementation_005fadvice-rm-b-4-95-98-interfacing-with-cobol.html.gz
./onlinedocs/gnat_rm/gnat_005frm_002fimplementation_005fadvice-rm-b-4-95-98-interfacing-with-cobol.html
2025-04-07 Jakub Jelinek <jakub@redhat.com>
PR web/119227
* update_web_docs_git: Rename mdoc2pdf_html to cobol_mdoc2pdf_html,
perform mkdir -p $DOCSDIR/gcobol gcobol, remove $d/ from pdf and in
html replace it with gcobol/; update uses of the renamed function.
|
|
What make html does for COBOL is quite inconsistent with all
other FEs. Normally make html creates HTML/gcc-15.0.1/
subdirectory and puts there subdirectories like gcc, cpp, gccint, gfortran
etc. and only those contain *.html files. COBOL puts gcobol.html and
gcobol-io.html into the current directory instead.
The following patch puts them into $(build_htmldir)/gcobol/ directory.
2025-04-07 Jakub Jelinek <jakub@redhat.com>
PR web/119227
* Make-lang.in (GCOBOL_HTML_FILES): New variable.
(cobol.install-html, cobol.html, cobol.srchtml): Use
$(GCOBOL_HTML_FILES) instead of gcobol.html gcobol-io.html.
(gcobol.html): Rename goal to ...
($(build_htmldir)/gcobol/gcobol.html): ... this. Run mkinstalldirs.
(gcobol-io.html): Rename goal to ...
($(build_htmldir)/gcobol/gcobol-io.html): ... this. Run mkinstalldirs.
|
|
(PR118924)
During analysis of PR 118924 it was discussed that total scalarization
invents access paths (strings of COMPONENT_REFs and possibly even
ARRAY_REFs) which did not exist in the program before which can have
unintended effects on subsequent AA queries. Although not doing that
does not mean that SRA cannot create such situations (see the bug for
more info), it has been agreed that not doing this is generally better.
This patch therfore makes SRA fall back on creating simple MEM_REFs when
accessing components of an aggregate corresponding to what a SRA
variable now represents.
gcc/ChangeLog:
2025-03-26 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/118924
* tree-sra.cc (create_total_scalarization_access): Set
grp_same_access_path flag to zero.
|
|
The testcase in PR 118924, when compiled on Aarch64, contains an
gimple aggregate assignment statement in between different types which
are types_compatible_p but behave differently for the purposes of
alias analysis.
SRA replaces the statement with a series of scalar assignments which
however have LHSs access chains modeled on the RHS type and so do not
alias with a subsequent reads and so are DSEd.
SRA clearly gets its "same_access_path" logic subtly wrong. One issue
is that the same_access_path_p function probably should be implemented
more along the lines of (parts of ao_compare::compare_ao_refs) instead
of internally relying on operand_equal_p. That is however not the
problem in the PR and so I will deal with it only later.
The issue here is that even when the access path is the same, it must
not be bolted on an aggregate type that does not match. This patch
does that, taking just one simple function from the
ao_compare::compare_ao_refs machinery and using it to detect the
situation. The rest is just merging the information in between
accesses of the same access group.
I looked at how many times we come across such assignment during
"make stage2-bubble" of GCC (configured with only c and C++ and
without multilib and libsanitizers) and on an x86_64 there were 87924
such assignments (though now I realize not all of them had to be
aggregate), so they do happen. The patch leads to about 5% increase
of cases where we don't use an "access path" but resort to a
MEM_REF (from 90209 to 95204). On an Aarch64, there were 92268 such
assignments and the increase of falling back to MEM_REFs was by
4% (but from a bigger base 132983 to 107991).
gcc/ChangeLog:
2025-04-04 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/118924
* tree-ssa-alias-compare.h (types_equal_for_same_type_for_tbaa_p):
Declare.
* tree-ssa-alias.cc: Include ipa-utils.h.
(types_equal_for_same_type_for_tbaa_p): New public overloaded variant.
* tree-sra.cc: Include tree-ssa-alias-compare.h.
(create_access): Initialzie grp_same_access_path to true.
(build_accesses_from_assign): Detect tbaa hazards and clear
grp_same_access_path fields of involved accesses when they occur.
(sort_and_splice_var_accesses): Take previous values of
grp_same_access_path into account.
gcc/testsuite/ChangeLog:
2025-03-25 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/118924
* g++.dg/tree-ssa/pr118924.C: New test.
|
|
When the whole shift is invariant but the shift amount needs
to be converted and a vector shift used we can mess up placement
of vector stmts because we do not make SLP scheduling aware of
the need to insert code for it. The following mitigates this
by more conservative placement of such code in vectorizable_shift.
PR tree-optimization/119640
* tree-vect-stmts.cc (vectorizable_shift): Always insert code
for one of our SLP operands before the code for the vector
shift itself.
* gcc.dg/vect/pr119640.c: New testcase.
|
|
When MUL is not available, then the __umulhisi3 and __mulhisi3
functions can use __mulhisi3_helper. This improves code size,
stack footprint and runtime on AVRrc.
libgcc/
* config/avr/lib1funcs.S (__mulhisi3, __umulhisi3): Use
__mulhisi3_helper for better performance on AVRrc.
|
|
The previous version of this test required arch v6+ (for sxth), and
the number of vmov depended on the float-point ABI (where softfp
needed more of them to transfer floating-point values to and from
general registers).
With this patch we require arch v7-a, vfp FPU and -mfloat-abi=hard, we
also use -O2 to clean the generated code and convert
scan-assembler-times directives into check-function-bodies.
Tested on arm-none-linux-gnueabihf and several flavours of
arm-none-eabi.
gcc/testsuite/ChangeLog:
PR target/119556
* gcc.target/arm/short-vfp-1.c: Improve dg directives.
|
|
The IPA-VRP workaround in the tailc/musttail passes was just comparing
the singleton constant from a tail call candidate return with the ret_val.
This unfortunately doesn't work in the following testcase, where we have
<bb 5> [local count: 152205050]:
baz (); [must tail call]
goto <bb 4>; [100.00%]
<bb 6> [local count: 762356696]:
_8 = foo ();
<bb 7> [local count: 1073741824]:
# _3 = PHI <0B(4), _8(6)>
return _3;
and in the unreduced testcase even more PHIs before we reach the return
stmt.
Normally when the call has lhs, whenever we follow a (non-EH) successor
edge, it calls propagate_through_phis and that walks the PHIs in the
destination bb of the edge and when it sees a PHI whose argument matches
that of the currently tracked value (ass_var), it updates ass_var to
PHI result of that PHI. I think it is theoretically dangerous that it
picks the first one, perhaps there could be multiple PHIs, so perhaps safer
would be walk backwards from the return value up to the call.
Anyway, this PR is about the IPA-VRP workaround, there ass_var is NULL
because the potential tail call has no lhs, but ret_var is not TREE_CONSTANT
but SSA_NAME with PHI as SSA_NAME_DEF_STMT. The following patch handles
it by pushing the edges we've walked through when ass_var is NULL into a
vector and if ret_var is SSA_NAME set to PHI result, it attempts to walk
back from the ret_var through arguments of PHIs corresponding to the
edges we've walked back until we reach a constant and compare that constant
against the singleton value as well.
2025-04-07 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119614
* tree-tailcall.cc (find_tail_calls): Remember edges which have been
walked through if !ass_var. Perform IPA-VRP workaround even when
ret_var is not TREE_CONSTANT, in that case check in a loop if it is
a PHI result and in that case look at the PHI argument from
corresponding edge in the edge vector.
* g++.dg/opt/pr119613.C: Change { c || c++11 } in obviously C++ only
test to just c++11.
* g++.dg/opt/pr119614.C: New test.
|
|
libgfortran and libquadmath [PR119408].
In GCC14, LoongArch added __float128 as an alias for _Float128.
In commit r15-8962, support for q/Q suffixes for 128-bit floating point
numbers. This will cause the compiler to automatically link libquadmath
when compiling Fortran programs. But on LoongArch `long double` is
IEEE quad, so there is no need to implement libquadmath.
This causes link failure.
PR target/119408
libgfortran/ChangeLog:
* acinclude.m4: When checking for __float128 support, determine
whether the current architecture is LoongArch. If so, return false.
* configure: Regenerate.
libquadmath/ChangeLog:
* configure.ac: When checking for __float128 support, determine
whether the current architecture is LoongArch. If so, return false.
* configure: Regenerate.
Sigend-off-by: Xi Ruoyao <xry111@xry111.site>
Sigend-off-by: Jakub Jelinek <jakub@redhat.com>
|
|
For technical reasons, the recently reimplemented finalization machinery
for controlled types requires arrays of controlled types to be allocated
with their bounds, including in the case where their nominal subtype is
constrained. However, in this case, the type of 'Access for the arrays
is pointer-to-constrained-array and, therefore, its value must designate
the array itself and not the bounds.
gcc/ada/
* gcc-interface/utils.cc (convert) <POINTER_TYPE>: Use fold_convert
to convert between thin pointers. If the source is a thin pointer
with zero offset from the base and the target is a pointer to its
array, displace the pointer after converting it.
* gcc-interface/utils2.cc (build_unary_op) <ATTR_ADDR_EXPR>: Use
fold_convert to convert the address before displacing it.
|
|
libgomp/ChangeLog:
* libgomp.texi (omp_target_memcpy_rect_async,
omp_target_memcpy_rect): Add @ref to 'Offload-Target Specifics'.
(AMD Radeon (GCN)): Document how memcpy_rect is implemented.
(nvptx): Move item about memcpy_rect item down; use present tense.
|
|
As noted in the previous patch, combine still takes >30% of
compile time in the original testcase for PR101523. The problem
is that try_combine uses linear insn searches for some dataflow
queries, so in the worst case, an unlimited number of 2->2
combinations for the same i2 can lead to quadratic behaviour.
This patch limits distribute_links to a certain number
of instructions when i2 is unchanged. As Segher said in the PR trail,
it would make more conceptual sense to apply the limit unconditionally,
but I thought it would be better to change as little as possible at
this development stage. Logically, in stage 1, the --param should
be applied directly by distribute_links with no input from callers.
As I mentioned in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
I think it's safe to drop log links even if a use exists. All
processing of log links seems to handle the absence of a link
for a particular register in a conservative way.
The initial set-up errs on the side of dropping links, since for example
create_log_links has:
/* flow.c claimed:
We don't build a LOG_LINK for hard registers contained
in ASM_OPERANDs. If these registers get replaced,
we might wind up changing the semantics of the insn,
even if reload can make what appear to be valid
assignments later. */
if (regno < FIRST_PSEUDO_REGISTER
&& asm_noperands (PATTERN (use_insn)) >= 0)
continue;
which excludes combinations by dropping log links, rather than during
try_combine. And:
/* If this register is being initialized using itself, and the
register is uninitialized in this basic block, and there are
no LOG_LINKS which set the register, then part of the
register is uninitialized. In that case we can't assume
anything about the number of nonzero bits.
??? We could do better if we checked this in
reg_{nonzero_bits,num_sign_bit_copies}_for_combine. Then we
could avoid making assumptions about the insn which initially
sets the register, while still using the information in other
insns. We would have to be careful to check every insn
involved in the combination. */
if (insn
&& reg_referenced_p (x, PATTERN (insn))
&& !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
REGNO (x)))
{
struct insn_link *link;
FOR_EACH_LOG_LINK (link, insn)
if (dead_or_set_p (link->insn, x))
break;
if (!link)
{
rsp->nonzero_bits = GET_MODE_MASK (mode);
rsp->sign_bit_copies = 1;
return;
}
}
treats the lack of a log link as a possible sign of uninitialised data,
but that would be a missed optimisation rather than a correctness issue.
One question is what the default --param value should be. I went with
Jakub's suggestion of 3000 from:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
Also, to answer Jakub's question in that comment, I tried bisecting:
int limit = atoi (getenv ("BISECT"));
(so applying the limit for all calls from try_combine) with an
abort in distribute_links if the limit caused a link to be skipped.
The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
was 142, so much lower than the parameter value. I realised too late
that --enable-checking=release would probably have been a more
interesting test.
The previous patch meant that distribute_links itself is now linear
for a given i2 definition, since each search starts at the previous
last use, rather than at i2 itself. This means that the limit has
to be applied cumulatively across all searches for the same link.
The patch does that by storing a counter in the insn_link structure.
There was a 32-bit hole there on LP64 hosts.
gcc/
PR testsuite/116398
* params.opt (-param=max-combine-search-insns=): New param.
* doc/invoke.texi: Document it.
* combine.cc (insn_link::insn_count): New field.
(alloc_insn_link): Initialize it.
(distribute_links): Add a limit parameter.
(try_combine): Use the new param to limit distribute_links
when only i3 has changed.
|
|
Another problem in PR101523 was that, after each successful 2->2
combination attempt, distribute_links would search further and further
for the next combinable use of the i2 destination. Each search would
start at i2 itself, making the search quadratic in the worst case.
In a 2->2 combination, if i2 is unchanged, the search can start at i3
instead of i2. The same thing applies to i2 when distributing i2's
links, since the only changes to earlier instructions are the deletion
of i0 and i1.
This change, combined with the previous split_i2i3 patch, gives a
34.6% speedup in combine for the testcase in PR101523. Combine
goes from being 41% to 34% of compile time.
gcc/
PR testsuite/116398
* combine.cc (distribute_links): Take an optional start point.
(try_combine): If only i3 has changed, only distribute i3's links,
not i2's. Start the search for the new use from i3 rather than
from the definition instruction. Likewise start the search for
the new use from i2 when distributing i2's links.
|
|
When combining a single-set i2 into a multi-set i3, combine
first tries to match the new multi-set in-place. If that fails,
combine considers splitting the multi-set so that one set goes in
i2 and the other set stays in i3. That moves a destination from i3
to i2 and so combine needs to update any associated log link for that
destination to point to i2 rather than i3.
However, that kind of split can also occur for 2->2 combinations.
For a 2-instruction combination in which i2 doesn't die in i3, combine
tries a 2->1 combination by turning i3 into a parallel of the original
i2 and the combined i3. If that fails, combine will split the parallel
as above, so that the first set goes in i2 and the second set goes in i3.
But that can often leave i2 unchanged, meaning that no destinations have
moved and so no search is necessary.
gcc/
PR testsuite/116398
* combine.cc (try_combine): Shortcut the split_i2i3 handling if
i2 is unchanged.
|
|
One of the problems in PR101523 was that, after each successful
2->2 combination attempt, try_combine would restart combination
attempts at i2 even if i2 hadn't changed. This led to quadratic
behaviour as the same failed combinations between i2 and i3 were
tried repeatedly.
The original patch for the PR dealt with that by disallowing 2->2
combinations. However, that led to various optimisation regressions,
so there was interest in allowing the combinations again, at least
until an alternative way of getting the same results is in place.
This patch is a variant of Richi's in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53
but limited to when we're combining 2 instructions.
This speeds up combine by 10x on the original PR101523 testcase
and reduces combine's memory footprint by 100x.
gcc/
PR testsuite/116398
* combine.cc (try_combine): Reallow 2->2 combinations. Detect when
only i3 has changed and restart from i3 in that case.
gcc/testsuite/
* gcc.target/aarch64/popcnt-le-1.c: Account for commutativity of TST.
* gcc.target/aarch64/popcnt-le-3.c: Likewise AND.
* gcc.target/aarch64/pr100056.c: Revert previous patch.
* gcc.target/aarch64/sve/pred-not-gen-1.c: Likewise.
* gcc.target/aarch64/sve/pred-not-gen-4.c: Likewise.
* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
* gcc.target/aarch64/sve/var_stride_4.c: Likewise.
Co-authored-by: Richard Biener <rguenther@suse.de>
|
|
This patch forestalls a regression in gcc.dg/rtl/x86_64/vector_eq.c
with the patch for PR116398. The test wants:
(cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)])))
(cinsn 5 (set (reg:V4SI <2>)
(eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>))))
to be folded to a vector of -1s. One unusual thing about the fold
is that the <1> in the second insn is uninitialised; it looks like
it should be replaced by <0>, or that there should be an insn 4 that
copies <0> to <1>.
As it stands, the test relies on init-regs to insert a zero
initialisation of <1>. This happens after all the cse/pre/fwprop
stuff, with only dce passes between init-regs and combine.
Combine therefore sees:
(insn 3 2 8 2 (set (reg:V4SI 98)
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])) 2403 {movv4si_internal}
(nil))
(insn 8 3 9 2 (clobber (reg:V4SI 99)) -1
(nil))
(insn 9 8 5 2 (set (reg:V4SI 99)
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])) -1
(nil))
(insn 5 9 7 2 (set (reg:V4SI 100)
(eq:V4SI (reg:V4SI 98)
(reg:V4SI 99))) 7874 {*sse2_eqv4si3}
(expr_list:REG_DEAD (reg:V4SI 99)
(expr_list:REG_DEAD (reg:V4SI 98)
(expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(reg:V4SI 99))
(nil)))))
It looks like the test should then pass through a 3, 9 -> 5 combination,
so that we get an (eq ...) between two zeros and fold it to a vector
of -1s. But although the combination is attempted, the fold doesn't
happen. Instead, combine is left to match the unsimplified (eq ...)
between two zeros, which rightly fails. The test only passes because
late_combine2 happens to try simplifying an (eq ...) between reg X and
reg X, which does fold to a vector of -1s.
The different handling of registers and constants is due to this
code in simplify_const_relational_operation:
if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx
&& (code == EQ || code == NE)
&& ! ((REG_P (op0) || CONST_INT_P (trueop0))
&& (REG_P (op1) || CONST_INT_P (trueop1)))
&& (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0
/* We cannot do this if tem is a nonzero address. */
&& ! nonzero_address_p (tem))
return simplify_const_relational_operation (signed_condition (code),
mode, tem, const0_rtx);
INTEGRAL_MODE_P matches vector integer modes, but everything else
about the condition is written for scalar integers only. Thus if
trueop0 and trueop1 are equal vector constants, we'll bypass all
the exclusions and try simplifying a subtraction. This will succeed,
giving a vector of zeros. The recursive call will then try to simplify
a comparison between the vector of zeros and const0_rtx, which isn't
well-formed. Luckily or unluckily, the ill-formedness doesn't trigger
an ICE, but it does prevent any simplification from happening.
The least-effort fix would be to replace INTEGRAL_MODE_P with
SCALAR_INT_MODE_P. But the fold does make conceptual sense for
vectors too, so it seemed better to keep the INTEGRAL_MODE_P and
generalise the rest of the condition to match.
gcc/
* simplify-rtx.cc (simplify_const_relational_operation): Generalize
the constant checks in the fold-via-minus path to match the
INTEGRAL_MODE_P condition.
|
|
|
|
gcc/ChangeLog
* doc/extend.texi (Boolean Type): Further clarify support for
_Bool in C23 and C++.
|
|
With some minor changes, 8-bit and 16-bit fixed-point operations
can be supported on the reduced core.
libgcc/
* config/avr/t-avr (LIB1ASMFUNCS): Add (and remove from
FUNCS_notiny): _mulhisi3, _umulhisi3, _mulqq3, _mulhq3, _muluhq3,
_mulha3, _muluha3 _muluha3_round, _usmuluha3, _ssmulha3,
_divqq3, _udivuqq3, _divqq_helper, _divhq3, _udivuhq3.
_divha3 _udivuha3, _ssneg_2, _ssabs_1, _ssabs_2,
_mask1, _ret, _roundqq3 _rounduqq3,
_round_s2, _round_u2, _round_2_const, _addmask_2.
* config/avr/lib1funcs.S (__umulhisi3, __mulhisi3): Make
work on AVRrc.
* config/avr/lib1funcs-fixed.S: Build 8-bit and 16-bit functions
on AVRrc, too.
|
|
The discovered paths already include the multilib and so there is
no need to add an extra library to COBOL_UNDER_TEST. Doing so makes
a duplicate, which causes test fails on Darwin, where the linker warns
when duplicate libraries are provided on the link line.
gcc/testsuite/ChangeLog:
* lib/cobol.exp: Simplify the setting of COBOL_UNDER_TEST.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
As discussed in the PR, the options had been added during development
to handle specific cases, they are no longer needed (and if they should
become necessary, we will need to guard them such that individual
platforms get the correct handling).
PR cobol/119414
gcc/cobol/ChangeLog:
* gcobolspec.cc (append_rdynamic,
append_allow_multiple_definition, append_fpic): Remove.
(lang_specific_driver): Remove platform-specific command
line option handling.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
gcc/ChangeLog
PR middle-end/78874
* doc/invoke.texi (Warning Options): Fix description of
-Wno-aggressive-loop-optimizations to reflect that this turns
off the warning, and the default is for it to be enabled.
|
|
Here during maybe_dependent_member_ref for accepted_type<_Up>, we
correctly don't strip the typedef because it's a complex one (its
defaulted template parameter isn't used in its definition) and so
we recurse to consider its corresponding TYPE_DECL.
We then incorrectly decide to not rewrite this use because of the
TYPENAME_TYPE shortcut. But I don't think this shortcut should apply to
a typedef TYPE_DECL.
PR c++/118626
gcc/cp/ChangeLog:
* pt.cc (maybe_dependent_member_ref): Restrict TYPENAME_TYPE
shortcut to non-typedef TYPE_DECL.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-alias25a.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
Here during maybe_dependent_member_ref (as part of CTAD rewriting
for the variant constructor) for __accepted_type<_Up> we strip this
alias all the way to type _Nth_type<__accepted_index<_Up>>, for which
we return NULL since _Nth_type is at namespace scope and so no
longer needs rewriting.
Note that however the template argument __accepted_index<_Up> of this
stripped type _does_ need rewriting (since it specializes a variable
template from the current instantiation). We end up not rewriting this
variable template reference at any point however because upon returning
NULL, the caller (tsubst) proceeds to substitute the original form of
the type __accepted_type<_Up>, which doesn't directly refer to
__accepted_index. This later leads to an ICE during subsequent alias
CTAD rewriting of this guide that contains a non-rewritten reference
to __accepted_index.
So when maybe_dependent_member_ref decides to not rewrite a class-scope
alias that's been stripped, the caller needs to commit to substituting
the stripped type rather than the original type. This patch essentially
implements that by making maybe_dependent_member_ref call tsubst itself
in that case.
PR c++/118626
gcc/cp/ChangeLog:
* pt.cc (maybe_dependent_member_ref): Substitute and return the
stripped type if we decided to not rewrite it directly.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-alias25.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
I keep forgetting to do this.... :-(
gcc/c-family/ChangeLog
* c.opt.urls: Regenerate.
gcc/d/ChangeLog
* lang.opt.urls: Regenerate.
|
|
Per the issue, there were a couple places in the manual where
-Wno-psabi was mentioned, but the option itself was not documented.
gcc/c-family/ChangeLog
PR c/81831
* c.opt (Wpsabi): Remove "Undocumented" modifier and add a
documentation string.
gcc/ChangeLog
PR c/81831
* doc/invoke.texi (Option Summary): Add -Wno-psabi.
(Warning Options): Document -Wpsabi separately from -Wabi.
Note it's enabled by default, not just implied by -Wabi.
Replace the detailed example for a GCC 4.4 change for x86
(which is unlikely to be very interesting nowadays) with
just a list of all targets that presently diagnose these
warnings.
(RS/6000 and PowerPC Options): Add cross-references for
-Wno-psabi.
|
|
|
|
Here the implicit use of 'this' in inner.size() template argument was
being rejected despite P2280R4 relaxations, due to the special *this
handling in the INDIRECT_REF case of potential_constant_expression_1.
This handling was originally added by r196737 as part of fixing PR56481,
and it seems obsolete especially in light of P2280R4. There doesn't
seem to be a good reason that we need to handle *this specially from
other dereferences.
This patch therefore removes this special handling. As a side benefit
we now correctly reject some *reinterpret_cast<...>(...) constructs
earlier, via p_c_e_1 rather than via constexpr evaluation (because the
removed STRIP_NOPS step meant we'd overlook such casts), which causes a
couple of diagnostic changes (for the better).
PR c++/118249
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1)
<case INDIRECT_REF>: Remove obsolete *this handling.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-reinterpret2.C: Expect error at
call site of the non-constexpr functions.
* g++.dg/cpp23/constexpr-nonlit12.C: Likewise.
* g++.dg/cpp0x/constexpr-ref14.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
gcc/ChangeLog
PR middle-end/112589
* common.opt (-fcf-protection): Add documentation string.
* doc/invoke.texi (Option Summary): Add entry for -fcf-protection
without argument.
(Instrumentation Options): Tidy the -fcf-protection entry and
and add documention for the form without an argument.
|
|
This conditionally adds a path for libgcobol when that contains
libgcobol.spec.
gcc/testsuite/ChangeLog:
* lib/cobol.exp: Conditionally add a path for libgcobol.spec.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
Ads support for using a library spec file (e.g. to include the
target requirements for non-standard libraries - or even libm
which we can now configure at the target side).
gcc/cobol/ChangeLog:
* gcobolspec.cc (SPEC_FILE): New.
(lang_specific_driver): Make the 'need libgcobol' flag global
so that the prelink callback can use it. Libm use is now handled
via the library spec.
(lang_specific_pre_link): Include libgcobol.spec where needed.
libgcobol/ChangeLog:
* Makefile.am: Add libgcobol.spec and dependency.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add libgcobol.spec handling.
* libgcobol.spec.in: New file.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
Currently, the library is configured as if it was written in C, however
all the sources are C++, so update to use C++ as the configure language
(and check the CXX instead of CC).
Reorder the configuration steps so that we setup the tools and environment
before carrying out tests.
Remove unused configuration machinery.
Also we configured extra ld flags but never used them. There is no need
to make these extra_ldflags darwin-specific, additions could be required
by other hosts.
libgcobol/ChangeLog:
* aclocal.m4: Regenerate.
* config.h.in: Regenerate.
* Makefile.am: Use the configured LIBS and extra_ldflags.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Shift configure to use c++. Order tests for tools
and environment before other tests.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
__umulhisi3 had an "rcall 1f" to save 6 bytes, which is an unreasonable
size gain vs. cycle cost. Just use the same code on all devices with MUL,
irrespective of program memory size.
libgcc/
* config/avr/lib1funcs.S (__umulhisi3) [Have MUL]: Reduce call
depth by 1.
|
|
In this testcase, the use of __FUNCTION__ is within a function parameter
scope, the lambda's. And P1787 changed __func__ to live in the parameter
scope. But [basic.scope.pdecl] says that the point of declaration of
__func__ is immediately before {, so in the trailing return type it isn't in
scope yet, so this __FUNCTION__ should refer to foo().
Looking first for a block scope, then a function parameter scope, gives us
the right result.
PR c++/118629
gcc/cp/ChangeLog:
* name-lookup.cc (pushdecl_outermost_localscope): Look for an
sk_block.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-__func__3.C: New test.
|
|
|
|
When adding TU_LOCAL_ENTITY in r15-6379 I neglected to add it to
cp_tree_node_structure, so garbage collection was crashing on it.
PR c++/119564
gcc/cp/ChangeLog:
* decl.cc (cp_tree_node_structure): Add TU_LOCAL_ENTITY; fix
formatting.
gcc/testsuite/ChangeLog:
* g++.dg/modules/gc-3_a.C: New test.
* g++.dg/modules/gc-3_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
Modules streaming walks decls multiple times, first as a non-streaming
walk to find dependencies, and then later to actually emit the decls.
The first walk needs to be done to note locations that will be emitted.
In the PR we are getting a checking ICE because we are streaming a decl
that we didn't initially walk when collecting dependencies, so the
location isn't in the noted locations map. This is because in decl_node
we have a branch where a PARM_DECL that hasn't previously been
referenced gets walked by value only if 'streaming_p ()' is true.
The true root cause here is that the decltype(v) in the testcase refers
to a different PARM_DECL from the one in the declaration that we're
streaming; it's the PARM_DECL from the initial forward-declaration, that
we're not streaming. A proper fix would be to ensure that it gets
remapped to the decl in the definition we're actually emitting, but for
now this workaround fixes the bug (and any other bugs that might
manifest similarly).
PR c++/119608
gcc/cp/ChangeLog:
* module.cc (trees_out::decl_node): Maybe require by-value
walking not just when streaming.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr119608_a.C: New test.
* g++.dg/modules/pr119608_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
In the linked PR, we're importing over a DECL_MAYBE_DELETED decl with a
decl that has already been instantiated. This patch ensures that the
needed bits are propagated across and that DECL_MAYBE_DELETED is cleared
from the existing decl, so that later synthesize_method doesn't crash
due to a definition unexpectedly already existing.
PR c++/119462
gcc/cp/ChangeLog:
* module.cc (trees_in::is_matching_decl): Propagate exception
spec and constexpr to DECL_MAYBE_DELETED; clear if appropriate.
gcc/testsuite/ChangeLog:
* g++.dg/modules/noexcept-3_a.C: New test.
* g++.dg/modules/noexcept-3_b.C: New test.
* g++.dg/modules/noexcept-3_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
This fix reverts the recent cobol_langhook_post_options change setting
flag_strict_aliasing = 0. It isn't necessary.
gcc/cobol
* cobol1.cc: Eliminate cobol_langhook_post_options.
* symbols.cc: Definition of RETURN-CODE special register sets
::attr member to signable_e.
|