Age | Commit message (Collapse) | Author | Files | Lines |
|
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Change it to no side effects.
* config/riscv/vector.md (@vsetvl<mode>_no_side_effects): New pattern.
|
|
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Remove side effects.
|
|
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix
incorrect annotations.
(available_occurrence_p): Ditto.
(backward_propagate_worthwhile_p): Ditto.
(can_backward_propagate_p): Ditto.
|
|
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (vlmax_avl_insn_p): Fix multi-line
conditional.
(vsetvl_insn_p): Ditto.
(same_bb_and_before_p): Ditto.
(same_bb_and_after_or_equal_p): Ditto.
|
|
PR fortran/106731
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_trans_auto_array_allocation): Remove gcc_assert (!TREE_STATIC()).
gcc/testsuite/ChangeLog:
* gfortran.dg/pr106731.f90: New test.
|
|
Make the output more readable. Don't output anything unless verbose
termination is enabled at configure-time.
The testsuite change was almost entirely mechanical. Save for two files
which had very short matches, these changes were produced by two seds and a
Perl script, for the more involved cases. The latter will be added in a
subsequent commit. The former are as follows:
sed -E -i "/dg-output/s/default std::handle_contract_violation called: \
(\S+) (\S+) (\S+(<[A-Za-z0-9, ]*)?>?)\
/contract violation in function \3 at \1:\2: /" *.C
sed -i '/dg-output/s/ */ /g'
Whichever files remained failing after the above changes were checked out,
re-run with output extracted, and run through dg-out-generator.pl.
Co-Authored-By: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/107792
PR libstdc++/107778
* src/experimental/contract.cc (handle_contract_violation): Make
output more readable.
gcc/testsuite/ChangeLog:
* g++.dg/contracts/contracts-access1.C: Convert to new default
violation handler.
* g++.dg/contracts/contracts-assume2.C: Ditto.
* g++.dg/contracts/contracts-config1.C: Ditto.
* g++.dg/contracts/contracts-constexpr1.C: Ditto.
* g++.dg/contracts/contracts-ctor-dtor1.C: Ditto.
* g++.dg/contracts/contracts-deduced2.C: Ditto.
* g++.dg/contracts/contracts-friend1.C: Ditto.
* g++.dg/contracts/contracts-multiline1.C: Ditto.
* g++.dg/contracts/contracts-post3.C: Ditto.
* g++.dg/contracts/contracts-pre10.C: Ditto.
* g++.dg/contracts/contracts-pre2.C: Ditto.
* g++.dg/contracts/contracts-pre2a2.C: Ditto.
* g++.dg/contracts/contracts-pre3.C: Ditto.
* g++.dg/contracts/contracts-pre4.C: Ditto.
* g++.dg/contracts/contracts-pre5.C: Ditto.
* g++.dg/contracts/contracts-pre7.C: Ditto.
* g++.dg/contracts/contracts-pre9.C: Ditto.
* g++.dg/contracts/contracts-redecl3.C: Ditto.
* g++.dg/contracts/contracts-redecl4.C: Ditto.
* g++.dg/contracts/contracts-redecl6.C: Ditto.
* g++.dg/contracts/contracts-redecl7.C: Ditto.
* g++.dg/contracts/contracts-tmpl-spec1.C: Ditto.
* g++.dg/contracts/contracts-tmpl-spec2.C: Ditto.
* g++.dg/contracts/contracts-tmpl-spec3.C: Ditto.
* g++.dg/contracts/contracts10.C: Ditto.
* g++.dg/contracts/contracts14.C: Ditto.
* g++.dg/contracts/contracts15.C: Ditto.
* g++.dg/contracts/contracts16.C: Ditto.
* g++.dg/contracts/contracts17.C: Ditto.
* g++.dg/contracts/contracts19.C: Ditto.
* g++.dg/contracts/contracts25.C: Ditto.
* g++.dg/contracts/contracts3.C: Ditto.
* g++.dg/contracts/contracts35.C: Ditto.
* g++.dg/contracts/contracts5.C: Ditto.
* g++.dg/contracts/contracts7.C: Ditto.
* g++.dg/contracts/contracts9.C: Ditto.
|
|
This script is a helper used to generate dg-output lines from an existing
program output conveniently. It takes care of escaping Tcl and ARE stuff.
contrib/ChangeLog:
* dg-out-generator.pl: New file.
|
|
|
|
mingw stdio.h plays horrible games with extern "C++", but it also seems
sloppy for coro.h to declare printf in testcases that will also include
standard headers.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/coro.h: #include <stdio.h> instead of
declaring puts/printf.
* g++.dg/coroutines/torture/mid-suspend-destruction-0.C:
#include <stdio.h>.
* g++.dg/coroutines/pr95599.C: Use PRINT instead of puts.
* g++.dg/coroutines/torture/call-00-co-aw-arg.C:
* g++.dg/coroutines/torture/call-01-multiple-co-aw.C:
* g++.dg/coroutines/torture/call-02-temp-co-aw.C:
* g++.dg/coroutines/torture/call-03-temp-ref-co-aw.C:
* g++.dg/coroutines/torture/co-await-00-trivial.C:
* g++.dg/coroutines/torture/co-await-01-with-value.C:
* g++.dg/coroutines/torture/co-await-02-xform.C:
* g++.dg/coroutines/torture/co-await-03-rhs-op.C:
* g++.dg/coroutines/torture/co-await-04-control-flow.C:
* g++.dg/coroutines/torture/co-await-05-loop.C:
* g++.dg/coroutines/torture/co-await-06-ovl.C:
* g++.dg/coroutines/torture/co-await-07-tmpl.C:
* g++.dg/coroutines/torture/co-await-08-cascade.C:
* g++.dg/coroutines/torture/co-await-09-pair.C:
* g++.dg/coroutines/torture/co-await-11-forwarding.C:
* g++.dg/coroutines/torture/co-await-12-operator-2.C:
* g++.dg/coroutines/torture/co-await-13-return-ref.C:
* g++.dg/coroutines/torture/co-await-14-return-ref-to-auto.C:
* g++.dg/coroutines/torture/pr95003.C: Likewise.
|
|
The commit r12-5877-g9e18a25331fa25 removed the incorrect
noexcept-specifier from std::condition_variable::wait and gave the new
symbol version @@GLIBCXX_3.4.30. It also redefined the original symbol
std::condition_variable::wait(unique_lock<mutex>&)@GLIBCXX_3.4.11 as an
alias for a new symbol, __gnu_cxx::__nothrow_wait_cv::wait, which still
has the incorrect noexcept guarantee. That __nothrow_wait_cv::wait is
just a wrapper around the real condition_variable::wait which adds
noexcept and so terminates on a __forced_unwind exception.
This doesn't work on uclibc, possibly due to a dynamic linker bug. When
__nothrow_wait_cv::wait calls the condition_variable::wait function it
binds to the alias symbol, which means it just calls itself recursively
until the stack overflows.
This change avoids the possibility of a recursive call by changing the
__nothrow_wait_cv::wait function so that instead of calling
condition_variable::wait it re-implements it. This requires accessing
the private _M_cond member of condition_variable, so we need to use the
trick of instantiating a template with the member-pointer of the private
member.
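A minimal sketch of that trick, for readers unfamiliar with it. The class
Counter, the tag CounterValueTag and the helper names are hypothetical and
stand in for condition_variable and its _M_cond member; this is not the code
from compatibility-condvar.cc. It relies on the rule that access checking is
not applied to names used in an explicit instantiation.

```cpp
#include <cassert>

// Hypothetical stand-in for a class with a private data member.
class Counter {
  int value_ = 7;
public:
  int value() const { return value_; }
};

// Tag that names the member-pointer type and declares the accessor.
struct CounterValueTag {
  using type = int Counter::*;
  friend type get_member(CounterValueTag);
};

// Instantiating this template defines the friend accessor declared above.
template<typename Tag, typename Tag::type Ptr>
struct MemberThief {
  friend typename Tag::type get_member(Tag) { return Ptr; }
};

// Access checking is not applied to the arguments of an explicit
// instantiation, so naming the private member here is permitted.
template struct MemberThief<CounterValueTag, &Counter::value_>;

// Use the obtained member pointer to reach the private member.
inline int& private_value(Counter& c) {
  return c.*get_member(CounterValueTag{});
}
```

With this in place, private_value(c) yields a reference to the private
member without Counter having to declare any friend.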
libstdc++-v3/ChangeLog:
PR libstdc++/105730
* src/c++11/compatibility-condvar.cc (__nothrow_wait_cv::wait):
Access private data member of base class and call its wait
member.
|
|
This adds the operator<< overloads and std::formatter specializations
required by C++20 so that <chrono> types can be written to ostreams and
printed with std::format.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/std/chrono (operator<<): Move to new header.
(nonexistent_local_time::_M_make_what_str): Define correctly.
(ambiguous_local_time::_M_make_what_str): Likewise.
* include/bits/chrono_io.h: New file.
* src/c++20/tzdb.cc (operator<<(ostream&, const Rule&)): Use
new ostream output for month and weekday types.
* testsuite/20_util/duration/io.cc: Test std::format support.
* testsuite/std/time/exceptions.cc: Check what() strings.
* testsuite/std/time/syn_c++20.cc: Uncomment local_time_format.
* testsuite/std/time/time_zone/get_info_local.cc: Enable check
for formatted output of local_info objects.
* testsuite/std/time/clock/file/io.cc: New test.
* testsuite/std/time/clock/gps/io.cc: New test.
* testsuite/std/time/clock/system/io.cc: New test.
* testsuite/std/time/clock/tai/io.cc: New test.
* testsuite/std/time/clock/utc/io.cc: New test.
* testsuite/std/time/day/io.cc: New test.
* testsuite/std/time/format.cc: New test.
* testsuite/std/time/hh_mm_ss/io.cc: New test.
* testsuite/std/time/month/io.cc: New test.
* testsuite/std/time/weekday/io.cc: New test.
* testsuite/std/time/year/io.cc: New test.
* testsuite/std/time/year_month_day/io.cc: New test.
|
|
Add a new __format::__write_padded_as_spec helper to remove duplicated
code in formatter specializations.
libstdc++-v3/ChangeLog:
* include/std/format (__format::__write_padded_as_spec): New
function.
(__format::__formatter_str, __format::__formatter_int::format)
(formatter<const void*, charT>): Use it.
|
|
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (StdChronoDurationPrinter)
(StdChronoTimePointPrinter, StdChronoZonedTimePrinter)
(StdChronoCalendarPrinter, StdChronoTimeZonePrinter)
(StdChronoLeapSecondPrinter, StdChronoTzdbPrinter)
(StdChronoTimeZoneRulePrinter): New printers.
|
|
This is the largest missing piece of C++20 support. Only the cxx11 ABI
is supported, due to the use of std::string in the API for time zones.
For the old gcc4 ABI, utc_clock and leap seconds are supported, but only
using a hardcoded list of leap seconds, no up-to-date tzdb::leap_seconds
information is available, and no time zones or zoned_time conversions.
The implementation currently depends on a tzdata.zi file being provided
by the OS or the user. The expected location is /usr/share/zoneinfo but
that can be changed using --with-libstdcxx-zoneinfo-dir=PATH. On targets
that support it there is also a weak symbol that users can override in
their own program (which also helps with testing):
extern "C++" const char* __gnu_cxx::zoneinfo_dir_override();
If no file is found, a fallback tzdb object will be created which only
contains the "Etc/UTC" and "Etc/GMT" time zones.
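As a sketch of how a program might use that hook, assuming the declaration
quoted above and a target where the weak symbol is supported: the program
provides its own (strong) definition. The directory path is purely
illustrative.

```cpp
#include <cassert>
#include <cstring>

// Assumption: the library's definition is weak, so this definition in the
// program takes precedence at link time on targets that support it.
namespace __gnu_cxx {
  const char* zoneinfo_dir_override() {
    // Hypothetical directory containing tzdata.zi (and leapseconds).
    return "/opt/myapp/zoneinfo";
  }
}
```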
A leapseconds file is also expected in the same directory, but if that
isn't present then a hardcoded list of leapseconds is used, which is
correct at least as far as 2023-06-28 (and it currently looks like no
leap second will be inserted for a few years).
The tzdata.zi and leapseconds files from https://www.iana.org/time-zones
are in the public domain, so shipping copies of them with GCC would be
an option. However, the tzdata.zi file will rapidly become outdated, so
users should really provide it themselves (or convince their OS vendor
to do so). It would also be possible to implement an alternative parser
for the compiled tzdata files (one per time zone) under
/usr/share/zoneinfo. Those files are present on more operating systems,
but do not contain all the information present in tzdata.zi.
Specifically, the "links" are not present, so that e.g. "UTC" and
"Universal" are distinct time zones, rather than both being links to the
canonical "Etc/UTC" zone. For some platforms those files are hard links
to the same file, but there's no indication which zone is the canonical
name and which is a link. Other platforms just store them in different
inodes anyway. I do not plan to add such an alternative parser for the
compiled files. That would need to be contributed by maintainers or
users of targets that require it, if making tzdata.zi available is not
an option. The library ABI would not need to change for a new tzdb
implementation, because everything in tzdb_list, tzdb and time_zone is
implemented as a pimpl (except for the shared_ptr links between nodes,
described below). That means the new exported symbols added by this
commit should be stable even if the implementation is completely
rewritten.
The information from tzdata.zi is parsed and stored in data structures
that closely model the info in the file. This is a space-efficient
representation that uses less memory than storing every transition for
every time zone. It also avoids spending time expanding that
information into time zone transitions that might never be needed by the
program. When a conversion to/from a local time to UTC is requested the
information will be processed to determine the time zone transitions
close to the time being converted.
There is a bug in some time zone transitions. When generating a sys_info
object immediately after one that was previously generated, we need to
find the previous rule that was in effect and note its offset and
letters. This is so that the start time and abbreviation of the new
sys_info will be correct. This only affects time zones that use a format
like "C%sT" where the LETTERS replacing %s are non-empty for standard
time, e.g. "Asia/Shanghai" which uses "CST" for standard time and "CDT"
for daylight time.
The tzdb_list structure maintains a linked list of tzdb nodes using
shared_ptr links. This allows the iterators into the list to share
ownership with the list itself. This offers a non-portable solution to a
lifetime issue in the API. Because tzdb objects can be erased from the
list using tzdb_list::erase_after, separate modules/libraries in a large
program cannot guarantee that any const tzdb& or const time_zone*
remains valid indefinitely. Holding onto a tzdb_list::const_iterator
will extend the tzdb object's lifetime, even if it's erased from the
list. An alternative design would be for the list iterator to hold a
weak_ptr. This would allow users to test whether the tzdb still exists
when the iterator is dereferenced, which is better than just having a
dangling raw pointer. That doesn't actually extend the tzdb's lifetime
though, and every use of it would need to be preceded by checking the
weak_ptr. Using shared_ptr adds a little bit of overhead but allows
users to solve the lifetime issue if they rely on the libstdc++-specific
iterator property.
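The ownership idea can be sketched with a toy list. The Node, Iterator and
List types below are hypothetical simplifications and not the actual
tzdb_list internals; they only show how an iterator holding a shared_ptr
keeps an erased node alive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified node; the link to the next node is a shared_ptr.
struct Node {
  std::string version;
  std::shared_ptr<Node> next;
};

// The iterator shares ownership of the node it refers to.
struct Iterator {
  std::shared_ptr<Node> node;
  const std::string& operator*() const { return node->version; }
};

struct List {
  std::shared_ptr<Node> head;

  void push_front(std::string v) {
    auto n = std::make_shared<Node>();
    n->version = std::move(v);
    n->next = head;
    head = std::move(n);
  }

  Iterator begin() const { return Iterator{head}; }

  // Unlink the node following pos. The node is destroyed only when the
  // last iterator referring to it goes away.
  void erase_after(const Iterator& pos) {
    if (pos.node && pos.node->next) {
      std::shared_ptr<Node> after = pos.node->next->next;
      pos.node->next = std::move(after);
    }
  }
};
```

An iterator taken before erase_after still dereferences safely afterwards,
which is the non-portable lifetime guarantee described above.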
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): New macro.
* config.h.in: Regenerate.
* config/abi/pre/gnu.ver: Export new symbols.
* configure: Regenerate.
* configure.ac (GLIBCXX_ZONEINFO_DIR): Use new macro.
* include/std/chrono (utc_clock::from_sys): Correct handling
of leap seconds.
(nonexistent_local_time::_M_make_what_str): Define.
(ambiguous_local_time::_M_make_what_str): Define.
(__throw_bad_local_time): Define new function.
(time_zone, tzdb_list, tzdb): Implement all members.
(remote_version, zoned_time, get_leap_second_info): Define.
* include/std/version: Add comment for __cpp_lib_chrono.
* src/c++20/Makefile.am: Add new file.
* src/c++20/Makefile.in: Regenerate.
* src/c++20/tzdb.cc: New file.
* testsuite/lib/libstdc++.exp: Define effective target tzdb.
* testsuite/std/time/clock/file/members.cc: Check file_time
alias and file_clock::now() member.
* testsuite/std/time/clock/gps/1.cc: Likewise for gps_clock.
* testsuite/std/time/clock/tai/1.cc: Likewise for tai_clock.
* testsuite/std/time/syn_c++20.cc: Uncomment everything except
parse.
* testsuite/std/time/clock/utc/leap_second_info.cc: New test.
* testsuite/std/time/exceptions.cc: New test.
* testsuite/std/time/time_zone/get_info_local.cc: New test.
* testsuite/std/time/time_zone/get_info_sys.cc: New test.
* testsuite/std/time/time_zone/requirements.cc: New test.
* testsuite/std/time/tzdb/1.cc: New test.
* testsuite/std/time/tzdb/leap_seconds.cc: New test.
* testsuite/std/time/tzdb_list/1.cc: New test.
* testsuite/std/time/tzdb_list/requirements.cc: New test.
* testsuite/std/time/zoned_time/1.cc: New test.
* testsuite/std/time/zoned_time/custom.cc: New test.
* testsuite/std/time/zoned_time/deduction.cc: New test.
* testsuite/std/time/zoned_time/req_neg.cc: New test.
* testsuite/std/time/zoned_time/requirements.cc: New test.
* testsuite/std/time/zoned_traits.cc: New test.
|
|
This avoids clang warnings:
gcc/go/gofrontend/escape.cc:1290:17: warning: private field 'fn_' is not used [-Wunused-private-field]
gcc/go/gofrontend/escape.cc:3478:19: warning: private field 'context_' is not used [-Wunused-private-field]
gcc/go/gofrontend/lex.h:564:15: warning: private field 'input_file_name_' is not used [-Wunused-private-field]
gcc/go/gofrontend/types.cc:5788:20: warning: private field 'call_' is not used [-Wunused-private-field]
gcc/go/gofrontend/wb.cc:206:9: warning: private field 'gogo_' is not used [-Wunused-private-field]
Patch by Martin Liška.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/458975
|
|
gcc/fortran/ChangeLog:
PR fortran/69604
* match.cc (chk_stmt_fcn_body): New function. Check for invalid uses
of statement functions arguments.
(gfc_match_st_function): Use above.
gcc/testsuite/ChangeLog:
PR fortran/69604
* gfortran.dg/statement_function_4.f90: New test.
|
|
The documentation for the DONE and FAIL macros was incorrectly inserted
between example code, and a remark attached to that example.
gcc/ChangeLog:
* doc/md.texi: Move example code remark next to its code block.
|
|
It is unclear why the example C function was renamed to
`commutative_integer_operator` as part of ec8e098d in 2004, while the
text and the example md were both left as `commutative_operator`. The
latter name appears to be more accurate, so revert the 2004 change.
gcc/ChangeLog:
* doc/md.texi: Fix inconsistent example name.
|
|
gcc/ChangeLog:
* doc/md.texi: Fix incorrect pxref.
|
|
There's no explicit mention of which GCC version supports C++11,
and the cross compiler build requirement mentions GCC 4.8 but not
GCC 4.8.3, which is the earliest version known not to run into
C++11 implementation bugs. The following adds explicit wording.
PR bootstrap/106482
* doc/install.texi (ISO C++11 Compiler): Document GCC version
known to work.
|
|
This adds the missing effective-target check for the permute
that recurrence vectorization requires.
PR testsuite/107809
* gcc.dg/vect/vect-recurr-1.c: Require vect_perm.
* gcc.dg/vect/vect-recurr-2.c: Likewise.
* gcc.dg/vect/vect-recurr-3.c: Likewise.
* gcc.dg/vect/vect-recurr-4.c: Likewise.
* gcc.dg/vect/vect-recurr-5.c: Likewise.
* gcc.dg/vect/vect-recurr-6.c: Likewise.
|
|
The place in value_replacement changed here is reached after proving that the
x == cst1 ? cst2 : x
phi result is only used in a comparison with a constant that doesn't
care whether it compares cst1 or cst2, and it replaces the phi result with x.
The testcase is miscompiled because we have after the replacement
incorrect range info for the phi result, we would need to
effectively union the phi result range with cst1 (oarg in the code)
because previously that constant might be missing in the range, but
newly it can appear (we've just verified that the single use stmt
of the phi result doesn't care about that value in particular).
The following patch just resets the info, bootstrapped/regtested
on x86_64-linux and i686-linux, ok for trunk?
Aldy/Andrew, how would one instead union the SSA_NAME_RANGE_INFO
with some INTEGER_CST and store it back into SSA_NAME_RANGE_INFO
(including adjusting non-zero bits and the like)?
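To illustrate what "union the range with a constant" means, here is a toy
closed-interval model. It is only a sketch of the idea behind the question
above: Interval and union_with_constant are hypothetical names, this is not
GCC's range API, and the patch itself resets the range info rather than
computing such a union.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy closed interval [lo, hi]; a single interval over-approximates the
// exact union when the constant lies outside the original range.
struct Interval {
  std::int64_t lo;
  std::int64_t hi;
};

// Smallest single interval that covers both r and the constant c.
constexpr Interval union_with_constant(Interval r, std::int64_t c) {
  return Interval{std::min(r.lo, c), std::max(r.hi, c)};
}

constexpr bool contains(Interval r, std::int64_t v) {
  return r.lo <= v && v <= r.hi;
}
```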
2022-12-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108166
* tree-ssa-phiopt.cc (value_replacement): For the maybe_equal_p
case turned into equal_p reset SSA_NAME_RANGE_INFO of phi result.
* g++.dg/torture/pr108166.C: New test.
|
|
The following testcase ICEs on aarch64, because insert_const_anchor
inserts invalid CONST_INT into the CSE tables - 0x80000000 for SImode.
The second hunk of the patch fixes that, the first one is to avoid
triggering undefined behavior at compile time during compute_const_anchors
computations - performing those additions and subtractions in
HOST_WIDE_INT means it can overflow for certain constants.
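A sketch of both points, under the assumption that a constant for a mode is
expected to be stored sign-extended from that mode's width: the arithmetic is
done in an unsigned type, where wraparound is well defined, and the result is
then sign-extended to the mode width. sign_extend is a hypothetical helper,
not a GCC function.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of v (1 <= bits <= 64).
// All intermediate arithmetic is unsigned, so it wraps instead of
// invoking undefined behavior on overflow.
inline std::int64_t sign_extend(std::uint64_t v, unsigned bits) {
  if (bits >= 64)
    return static_cast<std::int64_t>(v);
  const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
  const std::uint64_t sign = std::uint64_t{1} << (bits - 1);
  v &= mask;
  // Flip the sign bit, then subtract it again: values with the sign bit
  // set end up with all higher bits set as well.
  // The final conversion is modular in C++20 and on mainstream compilers.
  return static_cast<std::int64_t>((v ^ sign) - sign);
}
```

For a 32-bit mode, 0x80000000 becomes the negative value -2147483648 rather
than a positive 64-bit number.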
2022-12-22 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/108193
* cse.cc (compute_const_anchors): Change n type to
unsigned HOST_WIDE_INT, adjust comparison against it to avoid
warnings. Formatting fix.
(insert_const_anchor): Use gen_int_mode instead of GEN_INT.
* gfortran.dg/pr108193.f90: New test.
|
|
When vectorizing SLP loads with permutations we can access excess
elements when the load vector type is bigger than the group size
and the vectorization factor covers fewer groups than necessary
to fill it. Since we know the code will only access up to
group_size * VF elements in the unpermuted vector we can simply
fill the rest of the vector with whatever we want. For simplicity
this patch chooses to repeat the last group.
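A simplified lane-level model of "repeat the last group", as a sketch only.
The function lane_sources and its parameters are hypothetical; the real
change in vectorizable_load works on vector loads rather than on a list of
lane indices. Each lane is mapped to the element it would read, and any lane
whose group number is at or beyond the vectorization factor is redirected to
the last valid group.

```cpp
#include <cassert>
#include <vector>

// For a vector with `nunits` lanes, return the source element index for
// each lane. Assumes group_size > 0 and vf > 0.
inline std::vector<unsigned> lane_sources(unsigned nunits,
                                          unsigned group_size,
                                          unsigned vf) {
  std::vector<unsigned> src(nunits);
  for (unsigned lane = 0; lane < nunits; ++lane) {
    unsigned group = lane / group_size;
    if (group >= vf)
      group = vf - 1;  // beyond group_size * vf elements: reuse last group
    src[lane] = group * group_size + lane % group_size;
  }
  return src;
}
```

For a four-lane vector, a group size of 2 and a vectorization factor of 1,
only elements 0 and 1 are ever read.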
PR tree-optimization/107451
* tree-vect-stmts.cc (vectorizable_load): Avoid loading
SLP group members from group numbers in excess of the
vectorization factor.
* gcc.dg/torture/pr107451.c: New testcase.
|
|
The r13-2943-g11a113d501ff64 made aarch64.h include
aarch64-option-extensions.def, but that file isn't installed
for building plugins.
On Wed, Dec 21, 2022 at 09:56:33AM +0000, Richard Sandiford wrote:
> Should this (and aarch64-fusion-pairs.def and aarch64-tuning-flags.def)
> be in TM_H instead? The first two OPTIONS_H_EXTRA entries seem to be
> for aarch64-opt.h (included via aarch64.opt).
>
> I guess TM_H should also have aarch64-arches.def, since it's included
> for aarch64_feature.
gcc/Makefile.in has
TM_H = $(GTM_H) insn-flags.h $(OPTIONS_H)
and
OPTIONS_H = options.h flag-types.h $(OPTIONS_H_EXTRA)
which means that adding something into TM_H when it is already in
OPTIONS_H_EXTRA is unnecessary.
It is true that aarch64-fusion-pairs.def (included by aarch64-protos.h)
and aarch64-tuning-flags.def (ditto) and aarch64-option-extensions.def
(included by aarch64.h) aren't needed for options.h, so I think the
right patch would be following.
2022-12-22 Jakub Jelinek <jakub@redhat.com>
* config/aarch64/t-aarch64 (TM_H): Don't add aarch64-cores.def,
add aarch64-fusion-pairs.def, aarch64-tuning-flags.def and
aarch64-option-extensions.def.
(OPTIONS_H_EXTRA): Don't add aarch64-fusion-pairs.def nor
aarch64-tuning-flags.def.
|
|
This defines a variable template for the internal __is_duration helper
trait, defines a new __is_time_point_v variable template (to be used in
a subsequent commit), and adds explicit specializations of the standard
chrono::treat_as_floating_point trait for common types.
A fast path is added to chrono::duration_cast for the no-op case where
no conversion is needed.
Finally, some SFINAE constraints are simplified by using the
__enable_if_t alias, or by using variable templates.
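A sketch of the two ideas in user-level code, assuming C++17. The names
is_duration_v and cast_duration are hypothetical stand-ins; the library's own
helpers are the internal __is_duration_v and the standard duration_cast, and
their actual definitions may differ from this.

```cpp
#include <cassert>
#include <chrono>
#include <type_traits>

// Variable template that is true only for chrono::duration types.
template<typename T>
inline constexpr bool is_duration_v = false;

template<typename Rep, typename Period>
inline constexpr bool is_duration_v<std::chrono::duration<Rep, Period>> = true;

// Cast with a fast path: when source and target types are identical,
// return the argument unchanged instead of doing any arithmetic.
template<typename To, typename Rep, typename Period>
constexpr To cast_duration(const std::chrono::duration<Rep, Period>& d) {
  static_assert(is_duration_v<To>, "To must be a chrono::duration");
  if constexpr (std::is_same_v<To, std::chrono::duration<Rep, Period>>)
    return d;
  else
    return std::chrono::duration_cast<To>(d);
}
```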
libstdc++-v3/ChangeLog:
* include/bits/chrono.h (__is_duration_v, __is_time_point_v):
New variable templates.
(duration_cast): Add simplified definition for noconv case.
(treat_as_floating_point_v): Add explicit specializations.
(duration::operator%=, floor, ceil, round): Simplify SFINAE
constraints.
|
|
libstdc++-v3/ChangeLog:
* include/std/chrono: Use nodiscard attribute.
|
|
Adds tunes needed for zen4 microarchitecture. I added two new knobs.
TARGET_AVX512_SPLIT_REGS which is used to specify that internally 512 vectors
are split to 256 vectors. This affects vectorization costs and reassociation
width. It probably should also affect RTX costs however I doubt it is very useful
since RTL optimizers are usually not judging between 256 and 512 vectors.
I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in zen4 this
flag may not be a win except for very specific benchmarks. I am still doing some
more detailed testing here.
Otherwise I disabled gathers on zen4 for 2 parts and 4 parts. We can open code them
and since the latencies have only increased since zen3, open coding is better than
the actual instruction. This shows in 4 tsvc benchmarks.
I ended up setting AVX256_OPTIMAL. This is a compromise. There are some tsvc
benchmarks that increase noticeably (up to 250%) however there are also few
regressions. Most of these can be solved by increasing vec_perm cost in the
vectorizer. However this does not cure about 14% regression on x264 that is
quite important. Here we produce vectorized loops for avx512 that probably
would be faster if the loops in question had high enough iteration count.
We hit this problem with avx256 too: since the loop iterates few times, only
prologues/epilogues are used. Adding another round of prologue/epilogue
code does not make it better.
Finally I enabled avx stores for constant sized memcpy and memset. I am not
sure why this is an opt-in feature. I think for most hardware this is a win.
gcc/ChangeLog:
2022-12-22 Jan Hubicka <hubicka@ucw.cz>
* config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Add
TARGET_AVX512_SPLIT_REGS
* config/i386/i386-options.cc (ix86_option_override_internal):
Honor X86_TUNE_AVOID_256FMA_CHAINS.
* config/i386/i386.cc (ix86_vec_cost): Honor TARGET_AVX512_SPLIT_REGS.
(ix86_reassociation_width): Likewise.
* config/i386/i386.h (TARGET_AVX512_SPLIT_REGS): New tune.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable
for znver4.
(X86_TUNE_USE_GATHER_4PARTS): Likewise.
(X86_TUNE_AVOID_256FMA_CHAINS): Set for znver4.
(X86_TUNE_AVOID_512FMA_CHAINS): New tune; set for znver4.
(X86_TUNE_AVX256_OPTIMAL): Add znver4.
(X86_TUNE_AVX512_SPLIT_REGS): New tune.
(X86_TUNE_AVX256_MOVE_BY_PIECES): Add znver1-3.
(X86_TUNE_AVX256_STORE_BY_PIECES): Add znver1-3.
(X86_TUNE_AVX512_MOVE_BY_PIECES): Add znver4.
(X86_TUNE_AVX512_STORE_BY_PIECES): Add znver4.
|
|
This was missing.
gcc/lto/
* lto-common.cc (compare_tree_sccs_1): Compare DECL_NOT_FLEXARRAY.
|
|
Update cost of znver4 mostly based on data measured by Agner Fog.
Compared to previous generations x87 became a bit slower, which is probably not
a big deal (and we have minimal benchmarking coverage for it). One interesting
improvement is the reduction of FMA cost. I also updated costs of AVX256
loads/stores based on latencies (not throughput which is twice of avx256).
Overall AVX512 vectorization seems to improve noticeably some of TSVC
benchmarks but since internally 512 vectors are split to 256 vectors it is
somewhat risky and does not win in SPEC scores (mostly by regressing benchmarks
with loops that have small trip counts like x264 and exchange), so for now I am
going to set AVX256_OPTIMAL tune but I am still playing with it. We improved
since ZNVER1 on choosing vectorization size and also have vectorized
prologues/epilogues so it may be possible to make avx512 small win overall.
2022-12-22 Jan Hubicka <hubicka@ucw.cz>
* config/i386/x86-tune-costs.h (znver4_cost): Update costs of FP and SSE
moves, division multiplication, gathers, L2 cache size, and more
complex FP instructions.
|
|
|
|
This fixes the following on LLP64 mingw-w64 target:
Excess errors:
gcc/testsuite/gcc.c-torture/compile/pr55569.c:13:12: warning: overflow in conversion from 'long long unsigned int' to 'long int' changes value from '4611686018427387903' to '-1' [-Woverflow]
gcc/testsuite/gcc.c-torture/compile/pr55569.c:13:34: warning: iteration 2147483647 invokes undefined behavior [-Waggressive-loop-optimizations]
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr55569.c: fix excess errors.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
|
|
Even though this PR was reported with an ubsan issue, the problem is
tree_nonzero_bits is being called with an expression which is a vector type.
This fixes three patterns I noticed which do that.
And adds a testcase for one of the patterns.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions
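For context, the popcount(X) + popcount(Y) pattern depends on knowing that
the possibly-nonzero bits of X and Y do not overlap, which is why it queries
nonzero-bit information that only makes sense for integral types. A scalar
sketch of the underlying identity follows; popcount32 and disjoint are
hypothetical helpers, not GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count: clear the lowest set bit until none remain.
inline unsigned popcount32(std::uint32_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// True when no bit can be set in both operands, given masks of the bits
// that may be nonzero in each.
inline bool disjoint(std::uint32_t nonzero_x, std::uint32_t nonzero_y) {
  return (nonzero_x & nonzero_y) == 0;
}
```

When disjoint() holds, popcount32(x) + popcount32(y) equals
popcount32(x | y); when bits overlap, the sum counts the shared bits twice.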
gcc/ChangeLog:
PR tree-optimization/105532
* match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral
type before calling tree_nonzero_bits.
(popcount(X) + popcount(Y)): Likewise.
(popcount(X&C1)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/vector-shift-1.c: New test.
|
|
[Sync'ed from the binutils-gdb repo]
This patch uses the toplevel configure parts for GMP/MPFR for
gdb. The only thing is that gdb now requires MPFR for building.
Before it was a recommended but not required library.
Also this allows building of GMP and MPFR with the toplevel
directory just like how it is done for GCC.
We now error out in the toplevel configure if the version
of GMP or MPFR is wrong.
OK after GDB 13 branches? Build gdb 3 ways:
with GMP and MPFR in the toplevel (static library used at that point for both)
With only MPFR in the toplevel (GMP distro library used and MPFR built from source)
With neither GMP nor MPFR in the toplevel (distro libraries used)
Changes from v1:
* Updated gdb/README and gdb/doc/gdb.texinfo.
* Regenerated using unmodified autoconf-2.69
Thanks,
Andrew Pinski
ChangeLog:
* Makefile.def: Add configure-gdb dependencies
on all-gmp and all-mpfr.
* configure.ac: Split out MPC checking from MPFR.
Require GMP and MPFR if the gdb directory exists.
* Makefile.in: Regenerate.
* configure: Regenerate.
|
|
Instead of trying to have the GPU do CPU-with-OS-like things, this new barriers
implementation for NVPTX uses simplistic bar.* synchronization instructions.
Tasks are processed after threads have joined, and only if team->task_count != 0
It is noted that: there might be a little bit of performance forfeited for
cases where earlier arriving threads could've been used to process tasks ahead
of other threads, but that has the requirement of implementing complex
futex-wait/wake like behavior, which is what we're trying to avoid with this patch.
It is deemed that task processing is not what GPU target offloading is usually
used for.
Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in
the usual manner)
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
The main synchronization is done using a 'bar.red' instruction. This reduces
across all threads the condition (team->task_count != 0), to enable the task
processing down below if any thread created a task.
(this bar.red usage means that this patch is dependent on the prior NVPTX
bar.red GCC patch)
PR target/99555
libgomp/ChangeLog:
* config/nvptx/bar.c (generation_to_barrier): Remove.
(futex_wait,futex_wake,do_spin,do_wait): Remove.
(GOMP_WAIT_H): Remove.
(#include "../linux/bar.c"): Remove.
(gomp_barrier_wait_end): New function.
(gomp_barrier_wait): Likewise.
(gomp_barrier_wait_last): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
(gomp_team_barrier_wait_cancel): Likewise.
(gomp_team_barrier_cancel): Likewise.
* config/nvptx/bar.h (gomp_barrier_t): Remove waiters, lock fields.
(gomp_barrier_init): Remove init of waiters, lock fields.
(gomp_team_barrier_wake): Remove prototype, add new static inline
function.
|
|
This patch adds support for the PTX 'bar.red' (i.e. "barrier reduction")
instruction, in the form of nvptx-specific __builtin_nvptx_bar_red_[and/or/popc]
built-in functions.
gcc/ChangeLog:
* config/nvptx/nvptx.cc (nvptx_print_operand): Add 'p' case, adjust
comments.
(enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_RED_AND,
NVPTX_BUILTIN_BAR_RED_OR, and NVPTX_BUILTIN_BAR_RED_POPC.
(nvptx_expand_bar_red): New function.
(nvptx_init_builtins):
Add DEFs of __builtin_nvptx_bar_red_[and/or/popc].
(nvptx_expand_builtin): Use nvptx_expand_bar_red to expand
NVPTX_BUILTIN_BAR_RED_[AND/OR/POPC] cases.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_BARRED_AND, UNSPECV_BARRED_OR, and UNSPECV_BARRED_POPC.
(BARRED): New int iterator.
(barred_op,barred_mode,barred_ptxtype): New int attrs.
(nvptx_barred_<barred_op>): New define_insn.
|
|
Add the patch that fixes i686 Darwin build.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libffi/ChangeLog:
* LOCAL_PATCHES: Add patch to fix i686 darwin build.
|
|
This addresses a number of issues in the X86 Darwin 32b port for libffi.
1. The pic symbol stubs are weak definitions; the correct section placement
for these depends on the linker version in use. We do not have access
to that information, but we can use the target OS version (assumes that
the user has installed the latest version of xcode available).
When a coalesced section is in use (OS versions earlier than Darwin12 /
OSX 10.8), its name must differ from __TEXT,__text since otherwise that
would correspond to altering the attributes of the .text section (which
produces a diagnostic from the assembler).
Here we use __TEXT, __textcoal_nt for this which is what GCC emits for
these stubs.
For later versions than Darwin 12 (OS X 10.8) we can place the stubs in
the .text section (if we do not we get a diagnostic from clang -cc1as
saying that the use of coalesced sections for this is deprecated).
2. The EH frame is specified manually, since there is no support for .cfi_
directives in 'cctools' assemblers. The implementation needs to provide
offsets for CFA advance, code size and to the CIE as signed values
rather than relocations. However the cctools assembler will produce a
relocation for expressions like ' .long Lxx-Lyy' which then leads to a
link-time error. We correct this by forming the offset values using
' .set' directives and then assigning the results of them.
3. The register numbering used by m32 X86 Darwin EH frames is not the same
as the DWARF debug numbering (the Frame and Stack pointer numbers are
swapped).
4. The FDE address encoding used by the system tools is '0x10' (PCrel + abs)
where the value provided was PCrel + sdata4.
5. GCC does not use compact unwind at present, and it was not implemented
until Darwin10 / OSX 10.6. There were some issues with function location
in 10.6 so that the solution here suppresses emitting the compact unwind
section until Darwin11 / OSX 10.7.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libffi/ChangeLog:
* src/x86/sysv.S (COMDAT): Amend section use for Darwin, accounting
cases where coalesced is needed. (eh_frame): Rework to avoid relocs
that cause build failures on earlier Darwin. Adjust register numbers
to account for X86 m32 Darwin differences between EH and debug.
|
|
The following avoids passing down error_mark_node to fold_convert.
PR middle-end/107994
* gimplify.cc (gimplify_expr): Catch erroneous comparison
operand.
|
|
gcc/ChangeLog:
2022-12-21 Jan Hubicka <hubicka@ucw.cz>
* lto-opts.cc (lto_write_options): Also skip -fwhole-program.
|
|
* lto-cgraph.cc (lto_output_node): When doing WPA in incremental link
pass down resolution info.
|
|
Update the documentation of -fwhole-program, which wrongly claimed
that the option is useless with LTO, while it is useful for LTO without a plugin,
and extend -fwhole-program to also work with incremental linking.
This is useful when building the kernel, where the incremental link is the de-facto final
binary and only some explicitly marked symbols need to remain.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
2022-12-21 Jan Hubicka <hubicka@ucw.cz>
* doc/invoke.texi: Fix documentation of -fwhole-program with LTO
and document behaviour for incremental linking.
gcc/lto/ChangeLog:
2022-12-21 Jan Hubicka <hubicka@ucw.cz>
* lto-common.cc (lto_resolution_read): With incremental linking
and whole program, turn LDPR_PREVAILING_DEF_IRONLY into
LDPR_PREVAILING_DEF_IRONLY_EXP.
* lto-lang.cc (lto_post_options): Do not clear flag_whole_program
for incremental link
|
|
[PR108153]
Lto profiledbootstrap was failing for me on {powerpc64le,s390x}-linux with
modula 2 enabled, with:
cc1gm2: internal compiler error: the location value is corrupt
0x11a3d2d m2assert_AssertLocation(unsigned int)
../../gcc/m2/gm2-gcc/m2assert.cc:40
0x11a3d2d m2statement_BuildAssignmentTree
../../gcc/m2/gm2-gcc/m2statement.cc:177
ICE. The problem was that the caller (m2assert_AssertLocation) used the
location_t M2Options_OverrideLocation (location_t);
prototype with the libcpp/line-map.h
typedef unsigned int location_t;
typedef, but the callee defined in Modula 2 was using:
TYPE
location_t = INTEGER ;
and
PROCEDURE OverrideLocation (location: location_t) : location_t ;
Now, on powerpc64le-linux an unsigned int is returned and passed zero-extended
to 64 bits, while a signed int is returned and passed sign-extended to 64 bits,
and Modula 2 INTEGER is a signed 32-bit type, so when the caller then compared
M2Options_OverrideLocation (location) != location
and powerpc64le-linux performed the comparison as a 64-bit compare, there
was a mismatch for location_t of 0x80000007 or others with the MSB set.
Fixed by making Modula 2 location_t a CARDINAL, which is 32-bit unsigned type.
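The mismatch described above can be reproduced in isolation. The following is only a sketch of the arithmetic, not the code from m2assert.cc: widen_unsigned, widen_signed and registers_match are hypothetical names that model how a 32-bit value would sit in a 64-bit register when the caller treats it as unsigned int and the callee treats it as a signed 32-bit INTEGER.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: an unsigned 32-bit value is zero-extended
// when it is placed in a 64-bit register.
std::uint64_t widen_unsigned(std::uint32_t v)
{
  return static_cast<std::uint64_t>(v);
}

// Hypothetical model: a signed 32-bit value is sign-extended
// when it is placed in a 64-bit register.
std::uint64_t widen_signed(std::int32_t v)
{
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// Returns true if a 64-bit compare of the two register images agrees,
// i.e. if the caller's and the callee's view of the same 32 bits match.
// The uint32_t -> int32_t conversion is modular with GCC (and is
// guaranteed to be so since C++20).
bool registers_match(std::uint32_t loc)
{
  return widen_unsigned(loc) == widen_signed(static_cast<std::int32_t>(loc));
}
```

With this model, registers_match (0x00000007u) holds, while registers_match (0x80000007u) does not, because the sign-extended image has the upper 32 bits set. Declaring both sides with an unsigned 32-bit type removes the difference.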
2022-12-21 Jakub Jelinek <jakub@redhat.com>
PR modula2/108153
* gm2-gcc/m2linemap.def (location_t): Use CARDINAL instead of INTEGER.
|
|
contrib/ChangeLog:
* filter-clang-warnings.py: Simplify.
|
|
DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with DECL_VALUE_EXPR
of this->field used just during gimplification and omp lowering/expansion
to privatize individual fields in methods when needed.
As the following testcase shows, when not in templates, they were handled
right, but in templates we actually called cp_finish_decl on them and
that can result in their destruction, which is obviously undesirable;
we should only destruct the privatized copies of them created in omp
lowering.
Fixed thusly.
2022-12-21 Jakub Jelinek <jakub@redhat.com>
PR c++/108180
* pt.cc (tsubst_expr): Don't call cp_finish_decl on
DECL_OMP_PRIVATIZED_MEMBER vars.
* testsuite/libgomp.c++/pr108180.C: New test.
|
|
contrib/ChangeLog:
* filter-clang-warnings.py: Skip Makefile and libffi warnings.
|
|
This silences the following 2 warnings:
jit/jit-playback.h:785:16: warning: private field 'm_source_file' is not used [-Wunused-private-field]
jit/jit-playback.h:804:16: warning: private field 'm_line' is not used [-Wunused-private-field]
gcc/jit/ChangeLog:
* jit-playback.h: Use unused attribute.
|
|
In Fedora builds, libstdc++.so is built with assertions enabled and
FAIL: 20_util/to_chars/float128_c++23.cc execution test
was failing on all arches. The problem is that it called the 5-argument version
of to_chars with chars_format{}, which C++ says is invalid:
http://eel.is/c++draft/charconv.to.chars#12
Preconditions: fmt has the value of one of the enumerators of chars_format.
The following patch fixes it by skipping the second part of the test
which needs the 5 argument to_chars for chars_format{}, but because
it is strictly speaking invalid also for 4 argument to_chars, it uses
3 argument to_chars instead of 4 argument to_chars with last argument
chars_format{}.
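The overload selection described above can be sketched as follows. This is an illustration only, not the code from the testsuite: format_double is a hypothetical helper, and double stands in for the extended floating-point types the tests actually exercise.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: formats VALUE, avoiding the overloads that take
// a chars_format argument when FMT is not one of the enumerators.
std::string format_double(double value, std::chars_format fmt)
{
  char buf[128];
  std::to_chars_result res
    = (fmt == std::chars_format{})
      // Value-initialized fmt is not a valid enumerator, so use the
      // 3-argument overload instead of passing it through.
      ? std::to_chars(buf, buf + sizeof buf, value)
      : std::to_chars(buf, buf + sizeof buf, value, fmt);
  if (res.ec != std::errc{})
    return {};  // Buffer too small; report it as an empty string.
  return std::string(buf, res.ptr);
}
```

For example, format_double (1.5, std::chars_format{}) goes through the 3-argument overload and yields "1.5", while format_double (1.5, std::chars_format::scientific) passes the format through and yields "1.5e+00".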
2022-12-21 Jakub Jelinek <jakub@redhat.com>
* testsuite/20_util/to_chars/float16_c++23.cc (test): Use 3 argument
std::to_chars if fmt is std::chars_format{}, rather than 4 argument.
* testsuite/20_util/to_chars/float128_c++23.cc (test): Likewise, and
skip second part of testing that requires 5 argument std::to_chars.
|
|
On non-Cygwin Windows, use '.' and expect the documented fail when opening
a directory (EACCES). As gfortran does not set __WIN32__ this check is
done on the C side. (On __CYGWIN__, __WIN32__ is not set - but to make it
clear, !__CYGWIN__ is used in #if.)
On non-Windows, replace the 'call system' shell call by the POSIX functions
stat/mkdir/rmdir for better compatibility, especially on embedded systems;
additionally add some more checks. In particular, confirm that 'close' with
status='delete' indeed deleted the directory.
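A minimal sketch of the non-Windows side, assuming a POSIX system. The helper names below (make_dir, remove_dir, dir_exists, verify_not_exists) and their signatures are hypothetical and are not taken from read_dir-aux.c; they only show how stat/mkdir/rmdir can replace a shell call.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: true if PATH exists and is a directory.
bool dir_exists(const char *path)
{
  struct stat st;
  return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

// Hypothetical helper: create PATH as a directory; true on success.
bool make_dir(const char *path)
{
  return mkdir(path, 0755) == 0;
}

// Hypothetical helper: remove the empty directory PATH; true on success.
bool remove_dir(const char *path)
{
  return rmdir(path) == 0;
}

// Hypothetical helper: true only if stat reports that PATH does not
// exist, which is how a deletion can be confirmed afterwards.
bool verify_not_exists(const char *path)
{
  struct stat st;
  return stat(path, &st) != 0 && errno == ENOENT;
}
```

A test would typically call make_dir, run the code under test, and finish with verify_not_exists to confirm that the directory is really gone.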
gcc/testsuite/ChangeLog:
* gfortran.dg/read_dir-aux.c: New; provides my_mkdir, my_rmdir,
my_verify_not_exists and expect_open_to_fail.
* gfortran.dg/read_dir.f90: Call those; expect that opening a
directory fails on Windows.
|
|
Patch from Sören Tempel.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/458396
|