Age | Commit message (Collapse) | Author | Files | Lines |
|
We don't want to call build_offset_ref with an enum.
PR c++/101869
gcc/cp/ChangeLog:
* semantics.c (finish_qualified_id_expr): Don't try to build a
pointer-to-member if the scope is an enumeration.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/enum43.C: New test.
|
|
|
|
|
|
|
|
This adds a check for REG_P on SET_DEST for the new idiom recognizer
for AARCH64_FUSE_ADDSUB_2REG_CONST1. The reported ICE is only
observable with checking=rtl.
Bootstrapped/regtested aarch64-linux, committed.
PR target/108589
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Check
REG_P on SET_DEST.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr108589.c: New test.
(cherry picked from commit a39c6ec97906766ad65d15d4856fd41121ee7a45)
|
|
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).
Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.
This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this
These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
503.bwaves_r. -0.88%
507.cactuBSSN_r 0.35%
508.namd_r 3.09%
510.parest_r -2.99%
511.povray_r 5.54%
519.lbm_r 15.83%
521.wrf_r 0.56%
526.blender_r 2.47%
527.cam4_r 0.70%
538.imagick_r 0.00%
544.nab_r -0.33%
549.fotonik3d_r. -0.42%
554.roms_r 0.00%
-------------------------
= total 1.79%
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
(cherry picked from commit f200c56787f2c6f93ffb739d57d01a294ab72f68)
|
|
The original submission of AmpereOne (-mcpu=ampere1) costs occurred
prior to exhaustive testing of vectorizable workloads against
hardware.
Adjust the vector costs to achieve the best results and more closely
match the underlying hardware.
gcc/ChangeLog:
* config/aarch64/aarch64.c: Update vector costs for ampere1.
Co-Authored-By: Jiangning Liu <jiangning.liu@amperecomputing.com>
Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>
(cherry picked from commit ff1f2f2412bda118f7ddc10e69bd4284d9b24b9e)
|
|
This patch adds support for Ampere-1A CPU:
- recognize the name of the core and provide detection for -mcpu=native,
- updated extra_costs,
- adds a new fusion pair for (A+B+1 and A-B-1).
Ampere-1A and Ampere-1 have more timing difference than the extra
costs indicate, but these don't propagate through to the headline
items in our extra costs (e.g. the change in latency for scalar sqrt
doesn't have a corresponding table entry).
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
* config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
registers and then +1/-1).
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Implement
idiom-matcher for the new fusion pair.
* doc/invoke.texi: Add ampere1a.
(cherry picked from commit 590a06afbf0e96813b5879742f38f3665512c854)
|
|
This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.
Note that some early kernel versions on Ampere1 may misreport the
presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
and 'nopredres').
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
Ampere-1 core entry.
(cherry picked from commit db2f5d661239737157cf131de7d4df1c17d8d88d)
|
|
Fixes: 341573406b39
Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string. This issue would
cause individual characters (where the 128 byte buffers are stitched
together) to be lost.
gcc/ChangeLog:
* config/aarch64/driver-aarch64.c (readline): Fix off-by-one.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/info_18: New test.
* gcc.target/aarch64/cpunative/native_cpu_18.c: New test.
(cherry picked from commit b1cfbccc41de6aec950c0f662e7e85ab34bfff8a)
|
|
This adds support and a basic tuning model for the Ampere Computing
"Ampere-1" CPU.
The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
modelled as a 4-wide issue (as with all modern micro-architectures,
the chosen issue rate is a compromise between the maximum dispatch
rate and the maximum rate of uops issued to the scheduler).
This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1
core.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64-cost-tables.h: Add extra costs for
Ampere-1.
* config/aarch64/aarch64.c: Add tuning structures for Ampere-1.
* doc/invoke.texi: Add documentation for Ampere-1 core.
(cherry picked from commit 67b0d47e20e655c0dd53a76ea88aab60fafb2059)
|
|
|
|
The failures on the original failed case builtin-bitops-1.c
and the associated test case pr108699.c here show that the
current support of parity vector mode is wrong on Power.
The hardware insns vprtyb[wdq] which operate on the least
significant bit of each byte per element, they doesn't match
what RTL opcode parity needs, but the current implementation
expands it with them wrongly.
This patch is to fix the handling with one more insn vpopcntb.
PR target/108699
gcc/ChangeLog:
* config/rs6000/altivec.md (*p9v_parity<mode>2): Rename to ...
(rs6000_vprtyb<mode>2): ... this.
* config/rs6000/rs6000-builtin.def (VPRTYBD): Replace parityv2di2 with
rs6000_vprtybv2di2.
(VPRTYBW): Replace parityv4si2 with rs6000_vprtybv4si2.
(VPRTYBQ): Replace parityv1ti2 with rs6000_vprtybv1ti2.
* config/rs6000/vector.md (parity<mode>2 with VEC_IP): Expand with
popcountv16qi2 and the corresponding rs6000_vprtyb<mode>2.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/p9-vparity.c: Add scan-assembler-not for vpopcntb
to distinguish parity byte from parity.
* gcc.target/powerpc/pr108699.c: New test.
(cherry picked from commit cdd2d6643f7fef40e335a7027edfea7276cde608)
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/109511
* simplify.c (gfc_simplify_set_exponent): Fix implementation of
compile-time simplification of intrinsic SET_EXPONENT for argument
X < 1 and for I < 0.
gcc/testsuite/ChangeLog:
PR fortran/109511
* gfortran.dg/set_exponent_1.f90: New test.
(cherry picked from commit fa4cb42870df60deb8888dbd51e2ddc6d6ab9e6a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/109186
* simplify.c (gfc_simplify_nearest): Fix off-by-one error in setting
up real kind-specific maximum exponent for mpfr.
gcc/testsuite/ChangeLog:
PR fortran/109186
* gfortran.dg/nearest_6.f90: New test.
(cherry picked from commit 4410a08b80cc40342eeaa5b6af824cd4352b218c)
|
|
gcc/fortran/ChangeLog:
PR fortran/85877
* resolve.c (resolve_fl_procedure): Check for an explicit interface
of procedures with the BIND(C) attribute (F2018:15.4.2.2).
gcc/testsuite/ChangeLog:
PR fortran/85877
* gfortran.dg/pr85877.f90: New test.
(cherry picked from commit 5426ab34643d9e6502f3ee572891a03471fa33ed)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In the bounds check for copying of class expressions, the number of elements
determined from a descriptor, returned as type gfc_array_index_type (i.e. a
signed type), should be converted to the type of the passed element count,
which is of type size_type_node (i.e. unsigned), for use in comparisons.
gcc/fortran/ChangeLog:
PR fortran/106945
* trans-expr.c (gfc_copy_class_to_class): Convert element counts in
bounds check to common type for comparison.
gcc/testsuite/ChangeLog:
PR fortran/106945
* gfortran.dg/pr106945.f90: New test.
(cherry picked from commit 2cf5f485e0351bb1faf46196a99e524688f3966e)
|
|
gcc/fortran/ChangeLog:
PR fortran/104332
* resolve.c (resolve_symbol): Avoid NULL pointer dereference while
checking a symbol with the BIND(C) attribute.
gcc/testsuite/ChangeLog:
PR fortran/104332
* gfortran.dg/bind_c_usage_34.f90: New test.
(cherry picked from commit e20e5d9dc11b64e8eabce6803c91cb5768207083)
|
|
|