Age | Commit message (Collapse) | Author | Files | Lines |
|
|
For function arguments/return, when it's BLK mode, it's put in a
parallel with an expr_list, and the expr_list contains the real mode
and registers.
The current ix86_check_avx_upper_register only checks SSE_REG_P and
fails to handle that case. The patch extends the check to each subrtx.
gcc/ChangeLog:
PR target/116512
* config/i386/i386.cc (ix86_check_avx_upper_register): Iterate
subrtx to scan for avx upper register.
(ix86_check_avx_upper_stores): Inline old
ix86_check_avx_upper_register.
(ix86_avx_u128_mode_needed): Ditto, and replace
FOR_EACH_SUBRTX with call to new
ix86_check_avx_upper_register.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr116512.c: New test.
(cherry picked from commit ab214ef734bfc3dcffcf79ff9e1dd651c2b40566)
|
|
gcc/fortran/ChangeLog:
PR fortran/116530
* trans-io.cc (transfer_namelist_element): Prevent NULL pointer
dereference.
gcc/testsuite/ChangeLog:
PR fortran/116530
* gfortran.dg/use_rename_12.f90: New test.
(cherry picked from commit 6bfeba12c86b4d0dae27d99b484f64774dd49398)
|
|
* ka.po: New file.
|
|
After r14-811 "call *nop@GOTPCREL(%rip)" is only generated with
-mno-direct-extern-access even if --enable-default-pie. So the r13-1614
change to this file is not valid anymore.
gcc/testsuite/ChangeLog:
PR testsuite/70150
* gcc.target/i386/fentryname3.c (dg-final): Revert r13-1614
change.
(cherry picked from commit 8035619b7313d9503852e1c7c8c06cfddca4d648)
|
|
For a --enable-default-pie build, using -fno-pic (for compiler) but
not -no-pie (for linker) triggers some linker warnings counted as
excess errors:
/usr/bin/ld: /tmp/cc8MgxiR.o: warning: relocation in read-only
section `.text.startup'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
gcc/testsuite/ChangeLog:
PR testsuite/70150
* gcc.target/i386/pr113689-1.c (dg-options): Add -no-pie.
(cherry picked from commit 331f7d8a393af99afccdb2729d4ab45797fd7a86)
|
|
* zh_CN.po: Update.
|
|
mips16.S was missing since
commit 29b74545531f6afbee9fc38c267524326dbfbedf
Date: Thu Jun 1 10:14:24 2023 +0800
MIPS: Add speculation_barrier support
Without mips16.S included, some symbols are missing for mips16, and
so some software fails to build.
libgcc/ChangeLog:
* config/mips/lib1funcs.S: Includes mips16.S.
(cherry picked from commit 9522fc8bb7812f2ad50eb038e0938bfd958e730f)
|
|
When none of mprefer-vector-width, avx256_optimal/avx128_optimal,
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC sets
ix86_{move_max,store_max} to the maximum available vector length,
except in the AVX case:
  if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
      && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
    opts->x_ix86_move_max = PVW_AVX512;
  else
    opts->x_ix86_move_max = PVW_AVX128;
So for -mavx2, the vectorizer chooses 256-bit vectors, but 128-bit
moves are used for struct copies, which can cause a store-to-load
forwarding (STLF) stall because of the width mismatch.
The patch fixes that.
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_option_override_internal):
set ix86_{move_max,store_max} to PVW_AVX256 when TARGET_AVX
instead of PVW_AVX128.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pieces-memcpy-10.c: Add -mprefer-vector-width=128.
* gcc.target/i386/pieces-memcpy-6.c: Ditto.
* gcc.target/i386/pieces-memset-38.c: Ditto.
* gcc.target/i386/pieces-memset-40.c: Ditto.
* gcc.target/i386/pieces-memset-41.c: Ditto.
* gcc.target/i386/pieces-memset-42.c: Ditto.
* gcc.target/i386/pieces-memset-43.c: Ditto.
* gcc.target/i386/pieces-strcpy-2.c: Ditto.
* gcc.target/i386/pieces-memcpy-22.c: New test.
* gcc.target/i386/pieces-memset-51.c: New test.
* gcc.target/i386/pieces-strcpy-3.c: New test.
(cherry picked from commit 6ea25c041964bf63014fcf7bb68fb1f5a0a4e123)
|
|
The test was too optimistic, alas. We used to vectorize shifts by
clamping the shift counts below the bit width of the types (e.g. at 15
for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is
well defined (because of promotion to 32-bit int) and must yield 0,
not 1 (as before the fix).
Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.
So the test that expected the vectorization we no longer perform
needs to be adjusted. Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization, and Christophe Lyon suggested a further
simplification.
Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
for gcc/testsuite/ChangeLog
PR tree-optimization/113281
* gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
(cherry picked from commit 54d2339c9f87f702e02e571a5460e11c19e1c02f)
|
|
Here we ICE since r14-8291 in C++11/C++14 modes. Fortunately
this is an easy one.
The important bit of r14-8291 is this:
@@ -20056,9 +20071,12 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
             RETURN (retval);
           }
         if (IMPLICIT_CONV_EXPR_NONTYPE_ARG (t))
-          /* We'll pass this to convert_nontype_argument again, we don't need
-             to actually perform any conversion here. */
-          RETURN (expr);
+          {
+            tree r = convert_nontype_argument (type, expr, complain);
+            if (r == NULL_TREE)
+              r = error_mark_node;
+            RETURN (r);
+          }
which obviously means that instead of returning right away we go
to convert_nontype_argument. When type is error_mark_node and we're
in C++17, in convert_nontype_argument we go down this path:
  else if (INTEGRAL_OR_ENUMERATION_TYPE_P (type)
           || cxx_dialect >= cxx17)
    {
      expr = build_converted_constant_expr (type, expr, complain);
      if (expr == error_mark_node)
        return (complain & tf_error) ? NULL_TREE : error_mark_node;
      // ...
    }
but pre-C++17, we take a different route and end up crashing on
gcc_unreachable.
It would of course also work to check for error_mark_node early in
build_converted_constant_expr.
PR c++/116384
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) <case IMPLICIT_CONV_EXPR>: Bail if tsubst
returns error_mark_node.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/vt-116384.C: New test.
(cherry picked from commit 8191f15022b0ea44fcb549449b0458d07ae02e0a)
|
|
This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1315.
gcc/testsuite/ChangeLog:
* g++.dg/warn/pr33738-2.C: dg-prune arm linker messages about
size of enums.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 6d8b9b772e0b3969e6b3fcf0363d6afcce2e65c9)
|
|
PR target/116407
gcc/
* config/avr/avr.md (*dec-and-branchhi!=-1.l.clobber):
Increase the additional jump offset to 2 words.
(cherry picked from commit 22acd3c4d18dbd4d4d39d7770145fe3ec36073f6)
|
|
Some text peepholes output extra instructions prior to a branch
instruction, and that increases the jump offset of backward branches.
PR target/116407
gcc/
* config/avr/avr-protos.h (avr_jump_mode): Add an int argument.
* config/avr/avr.cc (avr_jump_mode): Add an int argument to increase
the computed jump offset of backwards branches.
* config/avr/avr.md (*dec-and-branchhi!=-1, *dec-and-branchsi!=-1):
Increase the jump offset used by avr_jump_mode() as needed.
gcc/testsuite/
* gcc.target/avr/torture/pr116407-2.c: New test.
* gcc.target/avr/torture/pr116407-4.c: New test.
(cherry picked from commit dfb2e8caa85d1059a0ab8ed4f19568c04c9f13a4)
|
|
PR target/116390
gcc/
* config/avr/avr.cc (avr_out_movsi_mr_r_reg_disp_tiny): Fix
output templates for the reg_base == reg_src and
reg_src == reg_base - 2 cases.
gcc/testsuite/
* gcc.target/avr/torture/pr116390.c: New test.
(cherry picked from commit 4065d163151b07b274241377e71dad028576db88)
|
|
gcc/
PR target/85624
* config/avr/avr.md (*clrmemqi*): Use HImode for alignment operand.
gcc/testsuite/
* gcc.target/avr/torture/pr85624.c: New test.
|
|
For some targets, like Cortex-M on arm-none-eabi, the -fshort-enums is
enabled by default. For these targets, the test case fails as
sizeof(Alpha) < sizeof(int).
To make the test case behave identically for targets that enable
-fshort-enums and those that do not, add -fno-short-enums to the test
case and verify that the warning is not emitted. Then also create a copy
and run the test with -fshort-enums and verify that the warning is
emitted.
Regtested on x86_64-pc-linux-gnu and arm-none-eabi.
gcc/testsuite/ChangeLog:
* g++.dg/warn/pr33738.C: Added -fno-short-enums.
* g++.dg/warn/pr33738-2.C: Duplicate g++.dg/warn/pr33738.C with
-fshort-enums and removed xfail.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 479dab62b828f93d6be48241178dbf654bdd33e7)
|
|
The test case assumes that sizeof(tree_code) >= 2. On some targets, like
Cortex-M on arm-none-eabi, -fshort-enums is enabled by default and in
that case, sizeof(tree_code) will be 1 and the following warning is
emitted:
.../pr97315-1.C:8:13: warning: width of 'tree_base::code' exceeds its type
Avoid the warning by forcing -fno-short-enums.
gcc/testsuite/ChangeLog:
* g++.dg/opt/pr97315-1.C: Add -fno-short-enums.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 10bf0357750972e20dc702997f2930eab1c1be17)
|
|
On Cortex-M55 with MVE, the test case fails due to -INT_MAX being
undefined. Adding -fwrapv solves the issue.
Regtested on x86_64-pc-linux-gnu and arm-none-eabi for
Cortex-M0/M3/M4/M7/M33/M55/M85/A7.
gcc/testsuite/ChangeLog:
* gcc.dg/signbit-5.c: Add -fwrapv and remove x86 exception.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Co-authored-by: Yvan ROUX <yvan.roux@foss.st.com>
(cherry picked from commit 5a3387938d4d95717cac29eecd0ba53e0ef9094d)
|
|
Although these VEX insns have EVEX counterparts, they should not
support APX EGPR when the explicit VEX prefix is used. This applies
to TARGET_AVXVNNI, TARGET_IFMA and TARGET_AVXNECONVERT.
TARGET_AVXVNNIINT8 and TARGET_AVXVNNIINT16 insns are also VEX insns
and should not support EGPR.
gcc/ChangeLog:
* config/i386/sse.md (vpmadd52<vpmadd52type><mode>):
Prohibit egpr for vex version.
(vpdpbusd_<mode>): Ditto.
(vpdpbusds_<mode>): Ditto.
(vpdpwssd_<mode>): Ditto.
(vpdpwssds_<mode>): Ditto.
(*vcvtneps2bf16_v4sf): Ditto.
(*vcvtneps2bf16_v8sf): Ditto.
(vpdp<vpdotprodtype>_<mode>): Ditto.
(vbcstnebf162ps_<mode>): Ditto.
(vbcstnesh2ps_<mode>): Ditto.
(vcvtnee<bf16_ph>2ps_<mode>): Ditto.
(vcvtneo<bf16_ph>2ps_<mode>): Ditto.
(vpdp<vpdpwprodtype>_<mode>): Ditto.
|
|
This patch includes the testcase from r15-1399 plus a minimal
fix for it, without the other proactive uses of force_subreg.
We can backport other force_subreg calls later if they're shown
to be needed.
gcc/
PR target/115464
* config/aarch64/aarch64-sve-builtins-base.cc
(svset_neonq_impl::expand): Use force_subreg instead of
lowpart_subreg.
gcc/testsuite/
PR target/115464
* gcc.target/aarch64/sve/acle/general/pr115464_2.c: New test.
|
|
The testcase extracts one arm_neon.h vector from a pair (one subreg)
and then reinterprets the result as an SVE vector (another subreg).
Each subreg makes sense individually, but we can't fold them together
into a single subreg: it's 32 bytes -> 16 bytes -> 16*N bytes,
but the interpretation of 32 bytes -> 16*N bytes depends on
whether N==1 or N>1.
Since the second subreg makes sense individually, simplify_subreg
should bail out rather than ICE on it. simplify_gen_subreg will
then do the same (because it already checks validate_subreg).
This leaves simplify_gen_subreg returning null, requiring the
caller to take appropriate action.
I think this is relatively likely to occur elsewhere, so the patch
adds a helper for forcing a subreg, allowing a temporary pseudo to
be created where necessary.
I'll follow up by using force_subreg in more places. This patch
is intended to be a minimal backportable fix for the PR.
gcc/
PR target/115464
* simplify-rtx.cc (simplify_context::simplify_subreg): Don't try
to fold two subregs together if their relationship isn't known
at compile time.
* explow.h (force_subreg): Declare.
* explow.cc (force_subreg): New function.
* config/aarch64/aarch64-sve-builtins-base.cc
(svset_neonq_impl::expand): Use it instead of simplify_gen_subreg.
gcc/testsuite/
PR target/115464
* gcc.target/aarch64/sve/acle/general/pr115464.c: New test.
(cherry picked from commit 0970ff46ba6330fc80e8736fc05b2eaeeae0b6a0)
|
|
pass_endbr_and_patchable_area.
gcc/ChangeLog:
PR target/116174
* config/i386/i386.cc (ix86_align_loops): Move this to ..
* config/i386/i386-features.cc (ix86_align_loops): .. here.
(class pass_align_tight_loops): New class.
(make_pass_align_tight_loops): New function.
* config/i386/i386-passes.def: Insert pass_align_tight_loops
after pass_insert_endbr_and_patchable_area.
* config/i386/i386-protos.h (make_pass_align_tight_loops): New
declare.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr116174.c: New test.
(cherry picked from commit c3c83d22d212a35cb1bfb8727477819463f0dcd8)
|
|
In r15-2210 we got rid of the unnecessary cast to lvalue reference when
passing *this to the promise type ctor, and as a drive-by change we also
simplified the code to use cp_build_fold_indirect_ref.
But it turns out cp_build_fold_indirect_ref does too much here, namely
it has a shortcut for returning current_class_ref if the operand is
current_class_ptr. The problem with that shortcut is current_class_ref
might have gotten clobbered earlier if it appeared in the function body,
since rewrite_param_uses walks and rewrites in-place all local variable
uses to their corresponding frame copy.
So later cp_build_fold_indirect_ref for *this will instead return the
clobbered current_class_ref i.e. *frame_ptr->this, which doesn't make
sense here since we're in the ramp function and not the actor function
where frame_ptr is in scope.
This patch fixes this by using the build_fold_indirect_ref instead of
cp_build_fold_indirect_ref.
PR c++/116327
PR c++/104981
PR c++/115550
gcc/cp/ChangeLog:
* coroutines.cc (morph_fn_to_coro): Use build_fold_indirect_ref
instead of cp_build_fold_indirect_ref.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr104981-preview-this.C: Improve coverage by
adding a non-static data member use within the coroutine member
function.
* g++.dg/coroutines/pr116327-preview-this.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 303bed670af962c01b77a4f0c51de97f70e8167e)
|
|
These tests check the sched2 dump, so skip them for optimization levels
that do not enable sched2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mcpu-6.c: Skip for -O0, -O1, -Og.
* gcc.target/riscv/mcpu-7.c: Likewise.
(cherry picked from commit 77f3b3419d476e90a2b82dff2204466aba3b9c2c)
|
|
While investigating support for early-break autovec, we noticed that
the test full-vec-move1.c is optimized to 'return 0;' in the body of
main. The value of the V type is somehow a compile-time constant, so
the second loop is treated as assert (true).
Thus, the ccp4 pass eliminates these statements and just returns 0.
  typedef int16_t V __attribute__((vector_size (128)));
  int main ()
  {
    V v;
    for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
      (v)[i] = i;
    V res = v;
    for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
      assert (res[i] == i); // will be optimized to assert (true)
  }
This patch introduces an extern function that uses res[i], which
prevents the ccp4 optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c:
Introduce extern func use to get rid of ccp4 optimization.
Signed-off-by: Pan Li <pan2.li@intel.com>
(cherry picked from commit b1520d2260c5e0cfcd7a4354fab70f66e2912ff2)
|
|
Starting with r14-9449-g9f2b16ce1efef0 builtins were streamlined with
those in LLVM. In particular s390_vgfm{,a}g have been changed from
UV16QI to UINT128 in order to match those in LLVM. However, these
low-level builtins are directly used by the high-level builtins
vec_gfmsum{,_accum}_128 which expect UV16QI instead. Therefore,
introduce new low-level builtins s390_vgfm{,a}g_128 and make use of
them, respectively.
gcc/ChangeLog:
* config/s390/s390-builtin-types.def (BT_FN_UV16QI_UV2DI_UV2DI):
New.
(BT_FN_UV16QI_UV2DI_UV2DI_UV16QI): New.
* config/s390/s390-builtins.def (s390_vgfmg_128): New.
(s390_vgfmag_128): New.
* config/s390/vecintrin.h (vec_gfmsum_128): Use s390_vgfmg_128.
(vec_gfmsum_accum_128): Use s390_vgfmag_128.
(cherry picked from commit e8a7142a697c5d2673adea33ba23af82a89c9559)
|
|
This patch adds the power11 option to the -mcpu= and -mtune= switches.
This patch treats the power11 like a power10 in terms of costs and reassociation
width.
This patch issues a ".machine power11" to the assembly file if you use
-mcpu=power11.
This patch defines _ARCH_PWR11 if the user uses -mcpu=power11.
This patch allows GCC to be configured with the --with-cpu=power11 and
--with-tune=power11 options.
This patch passes -mpwr11 to the assembler if the user uses -mcpu=power11.
This patch adds support for using "power11" in the __builtin_cpu_is built-in
function.
Backported from master: 2024-07-22
2024-08-13 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config.gcc (powerpc*-*-*): Add support for power11.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=power11.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/ppc-auxv.h (PPC_PLATFORM_POWER11): New define.
* config/rs6000/rs6000-builtin.cc (cpu_is_info): Add power11.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR11 if -mcpu=power11.
* config/rs6000/rs6000-cpus.def (POWER11_MASKS_SERVER): New define.
(POWERPC_MASKS): Add power11.
(power11 cpu): Add power11 definition.
* config/rs6000/rs6000-opts.h (PROCESSOR_POWER11): Add power11 processor.
* config/rs6000/rs6000-string.cc (expand_compare_loop): Likewise.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add power11
support.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000.md (cpu attribute): Add power11.
* config/rs6000/rs6000.opt (-mpower11): Add internal power11 flag.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mcpu=power11.
* config/rs6000/power10.md (all reservations): Add power11 support.
gcc/testsuite/
* gcc.target/powerpc/power11-1.c: New test.
* gcc.target/powerpc/power11-2.c: Likewise.
* gcc.target/powerpc/power11-3.c: Likewise.
|
|