Age | Commit message (Collapse) | Author | Files | Lines |
|
The previous patch enables 16-byte by pieces move. Originally 16-byte
move is implemented via pattern. expand_block_move does an optimization
on P8 LE to leverage V2DI reversed load/store for memory to memory move.
Now 16-byte move is implemented via by pieces move and finally split to
two DI load/store. This patch creates an insn_and_split pattern to
retake the optimization.
gcc/
PR target/111449
* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.
gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-2.c: New.
|
|
This patch adds a new expand pattern - cbranchv16qi4 to enable vector
mode by pieces equality compare on rs6000. The macro MOVE_MAX_PIECES
(COMPARE_MAX_PIECES) is set to 16 bytes when EFFICIENT_UNALIGNED_VSX
is enabled, otherwise keeps unchanged. The macro STORE_MAX_PIECES is
set to the same value as MOVE_MAX_PIECES by default, so now it's
explicitly defined and keeps unchanged.
gcc/
PR target/111449
* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
insn sequence for V16QImode equality compare.
* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
(STORE_MAX_PIECES): Define.
gcc/testsuite/
PR target/111449
* gcc.target/powerpc/pr111449-1.c: New.
* gcc.dg/tree-ssa/sra-17.c: Add additional options for 32-bit powerpc.
* gcc.dg/tree-ssa/sra-18.c: Likewise.
|
|
The LoongArch has defined ctz and clz on the backend, but if we want GCC
do CTZ transformation optimization in forwprop2 pass, GCC need to know
the value of c[lt]z at zero, which may be beneficial for some test cases
(like spec2017 deepsjeng_r).
After implementing the macro, we test dynamic instruction count on
deepsjeng_r:
- before 1688423249186
- after 1660311215745 (1.66% reduction)
gcc/ChangeLog:
* config/loongarch/loongarch.h (CLZ_DEFINED_VALUE_AT_ZERO):
Implement.
(CTZ_DEFINED_VALUE_AT_ZERO): Same.
gcc/testsuite/ChangeLog:
* gcc.dg/pr90838.c: add clz/ctz test support on LoongArch.
|
|
We have a support case that shows GCC 7 sometimes creates
DW_TAG_label refering to itself via a DW_AT_abstract_origin
when using LTO. This for example triggers the sanity check
added below during LTO bootstrap.
Making this check cover more than just DW_AT_abstract_origin
breaks bootstrap on trunk for
/* GNU extension: Record what type our vtable lives in. */
if (TYPE_VFIELD (type))
{
tree vtype = DECL_FCONTEXT (TYPE_VFIELD (type));
gen_type_die (vtype, context_die);
add_AT_die_ref (type_die, DW_AT_containing_type,
lookup_type_die (vtype));
so the check is for now restricted to DW_AT_abstract_origin
and DW_AT_specification both of which we follow within get_AT.
* dwarf2out.cc (add_AT_die_ref): Assert we do not add
a self-ref DW_AT_abstract_origin or DW_AT_specification.
|
|
Based on SPEC2017 performance evaluation results, it's better to make them equal
to the cost of unaligned store/load so as to avoid odd alignment peeling.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.
|
|
These tests fail when they are first added,this patch adjusts the scan-assembler-times
to fix them.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler times.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Ditto.
|
|
|
|
constant folding [PR112483]
On targets with native copysign instructions, (copysign x, -1) is
usually more efficient than (fneg (fabs x)). Since r14-5284, in the
middle end we always optimize (fneg (fabs x)) to (copysign x, -1), not
vice versa. If the target does not support native fcopysign,
expand_COPYSIGN will expand it as (fneg (fabs x)) anyway.
gcc/ChangeLog:
PR rtl-optimization/112483
* simplify-rtx.cc (simplify_binary_operation_1) <case COPYSIGN>:
Call simplify_unary_operation for NEG instead of
simplify_gen_unary.
|
|
gcc/testsuite/
* gnat.dg/varsize4.adb (Func): Initialize Byte_Read parameter.
|
|
Fix __riscv_unaligned_fast/slow/avoid macro name to
__riscv_misaligned_fast/slow/avoid to be consistent with the RISC-V API Spec
PR target/111557
gcc/ChangeLog:
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): update macro name
gcc/testsuite/ChangeLog:
* gcc.target/riscv/attribute-1.c: update macro name
* gcc.target/riscv/attribute-4.c: ditto
* gcc.target/riscv/attribute-5.c: ditto
* gcc.target/riscv/predef-align-1.c: ditto
* gcc.target/riscv/predef-align-2.c: ditto
* gcc.target/riscv/predef-align-3.c: ditto
* gcc.target/riscv/predef-align-4.c: ditto
* gcc.target/riscv/predef-align-5.c: ditto
* gcc.target/riscv/predef-align-6.c: ditto
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
|
|
Sometimes the compiler emits the following code with <insn>qi_ext<mode>_0:
shrl $8, %eax
addb %bh, %al
Patch introduces new low part QImode insn patterns with both of
their input arguments extracted from high register. This invalid
insn is split after reload to a move from the high register
and <insn>qi_ext<mode>_0 instruction. The combine pass is able to
convert shift to zero/sign-extract sub-RTX, which we split to the
optimal:
movzbl %bh, %edx
addb %ah, %dl
PR target/78904
gcc/ChangeLog:
* config/i386/i386.md (*addqi_ext2<mode>_0):
New define_insn_and_split pattern.
(*subqi_ext2<mode>_0): Ditto.
(*<code>qi_ext2<mode>_0): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr78904-10.c: New test.
* gcc.target/i386/pr78904-10a.c: New test.
* gcc.target/i386/pr78904-10b.c: New test.
|
|
In analyzing PR rtl-optimization/112415, I realized that restricting
REG+D offsets to 5-bits before reload results in very poor code and
complexities in optimizing these instructions after reload. The
general problem is long displacements are not allowed for floating
point accesses when generating PA 1.1 code. Even with PA 2.0, there
is a ELF linker bug that prevents using long displacements for
floating point loads and stores.
In the past, enabling long displacements before reload caused issues
in reload. However, there have been fixes in the handling of reloads
for floating-point accesses. This change allows long displacements
before reload and corrects a couple of issues in the constraint
handling for integer and floating-point accesses.
2023-11-16 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR rtl-optimization/112415
* config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
displacements before reload. Simplify logic flow. Revise
comments.
* config/pa/pa.h (TARGET_ELF64): New define.
(INT14_OK_STRICT): Update define and comment.
* config/pa/pa64-linux.h (TARGET_ELF64): Define.
* config/pa/predicates.md (base14_operand): Don't check
alignment of short displacements.
(integer_store_memory_operand): Don't return true when
reload_in_progress is true. Remove INT_5_BITS check.
(floating_point_store_memory_operand): Don't return true when
reload_in_progress is true. Use INT14_OK_STRICT to check
whether long displacements are always okay.
|
|
This is a tree sharing issue for the internal return type synthesized for
a function returning a dynamically-sized type and taking an Out or In/Out
parameter passed by copy.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Also create a
TYPE_DECL for the return type built for the CI/CO mechanism.
gcc/testsuite/
* gnat.dg/varsize4.ads, gnat.dg/varsize4.adb: New test.
* gnat.dg/varsize4_pkg.ads: New helper.
|
|
check_field_decls for DECL_C_BIT_FIELD FIELD_DECLs with error_mark_node
TREE_TYPE continues early and doesn't call check_bitfield_decl which would
either set DECL_BIT_FIELD, or clear DECL_C_BIT_FIELD. So, the following
testcase ICEs after emitting tons of errors, because
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD asserts DECL_BIT_FIELD.
The patch skips that for FIELD_DECLs with error_mark_node, another
option would be to check DECL_BIT_FIELD in addition to DECL_C_BIT_FIELD.
2023-11-16 Jakub Jelinek <jakub@redhat.com>
PR c++/112365
* class.cc (layout_class_type): Don't
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD on FIELD_DECLs with
error_mark_node type.
* g++.dg/cpp0x/pr112365.C: New test.
|
|
Also fix some indentitation inconsistencies.
PR target/112567
gcc/ChangeLog:
* config/i386/i386.md (*<any_logic:code>qi_ext<mode>_1_slp):
Fix generation of invalid RTX in split pattern.
|
|
Both of these PRs are fixed by r12-1403-gc4e50e500da7692a.
PR c++/98614
PR c++/104802
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/nontype-auto22.C: New test.
* g++.dg/cpp2a/concepts-partial-spec14.C: New test.
|
|
potential_constant_expression for CALL_EXPR tests FUNCTION_POINTER_TYPE_P
on the callee rather than on the type of the callee, which means we
always pass want_rval=any when recursing and so may fail to identify a
non-constant function pointer callee as such. Fixing this turns out to
further work around PR111703.
PR c++/111703
PR c++/107939
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>:
Fix FUNCTION_POINTER_TYPE_P test.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-fn8.C: Extend test.
* g++.dg/diagnostic/constexpr4.C: New test.
|
|
No functional change intended.
gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::set_option_hooks): Add
"lang_mask" param.
* diagnostic.h (diagnostic_context::option_enabled_p): Update for
move of m_lang_mask.
(diagnostic_context::set_option_hooks): Add "lang_mask" param.
(diagnostic_context::get_lang_mask): New.
(diagnostic_context::m_lang_mask): Move into m_option_callbacks,
thus making private.
* lto-wrapper.cc (main): Update for new lang_mask param of
set_option_hooks.
* toplev.cc (init_asm_output): Use get_lang_mask.
(general_init): Move initialization of global_dc's lang_mask to
new lang_mask param of set_option_hooks.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Before my refactoring if the loop->latch was incorrect then find_loop_location
skipped checking the edges and would eventually return a dummy location.
It turns out that a loop can have
loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS) but also not have a latch
in which case get_loop_exit_edges traps.
This restores the old behavior.
gcc/ChangeLog:
PR tree-optimization/111878
* tree-vect-loop-manip.cc (find_loop_location): Skip edges check if
latch incorrect.
gcc/testsuite/ChangeLog:
PR tree-optimization/111878
* gcc.dg/graphite/pr111878.c: New test.
|
|
gcc/testsuite/
* gcc.c-torture/execute/931004-13.c (main): Fix mistakenly swapped
int/void types.
|
|
The target attribute which proposed in [1], target attribute allow user
to specify a local setting per-function basis.
The syntax of target attribute is `__attribute__((target("<ATTR-STRING>")))`.
and the syntax of `<ATTR-STRING>` describes below:
```
ATTR-STRING := ATTR-STRING ';' ATTR
| ATTR
ATTR := ARCH-ATTR
| CPU-ATTR
| TUNE-ATTR
ARCH-ATTR := 'arch=' EXTENSIONS-OR-FULLARCH
EXTENSIONS-OR-FULLARCH := <EXTENSIONS>
| <FULLARCHSTR>
EXTENSIONS := <EXTENSION> ',' <EXTENSIONS>
| <EXTENSION>
FULLARCHSTR := <full-arch-string>
EXTENSION := <OP> <EXTENSION-NAME> <VERSION>
OP := '+'
VERSION := [0-9]+ 'p' [0-9]+
| [1-9][0-9]*
|
EXTENSION-NAME := Naming rule is defined in RISC-V ISA manual
CPU-ATTR := 'cpu=' <valid-cpu-name>
TUNE-ATTR := 'tune=' <valid-tune-name>
```
Changes since v1:
- Use std::unique_ptr rather than alloca to prevent memory issue.
- Error rather than warning when attribute duplicated.
[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35
gcc/ChangeLog:
* config.gcc (riscv): Add riscv-target-attr.o.
* config/riscv/riscv-protos.h (riscv_declare_function_size) New.
(riscv_option_valid_attribute_p): New.
(riscv_override_options_internal): New.
(struct riscv_tune_info): New.
(riscv_parse_tune): New.
* config/riscv/riscv-target-attr.cc
(class riscv_target_attr_parser): New.
(struct riscv_attribute_info): New.
(riscv_attributes): New.
(riscv_target_attr_parser::parse_arch): New.
(riscv_target_attr_parser::handle_arch): New.
(riscv_target_attr_parser::handle_cpu): New.
(riscv_target_attr_parser::handle_tune): New.
(riscv_target_attr_parser::update_settings): New.
(riscv_process_one_target_attr): New.
(num_occurences_in_str): New.
(riscv_process_target_attr): New.
(riscv_option_valid_attribute_p): New.
* config/riscv/riscv.cc: Include target-globals.h and
riscv-subset.h.
(struct riscv_tune_info): Move to riscv-protos.h.
(get_tune_str): New.
(riscv_parse_tune): New parameter null_p.
(riscv_declare_function_size): New.
(riscv_option_override): Build target_option_default_node and
target_option_current_node.
(riscv_save_restore_target_globals): New.
(riscv_option_restore): New.
(riscv_previous_fndecl): New.
(riscv_set_current_function): Apply the target attribute.
(TARGET_OPTION_RESTORE): Define.
(TARGET_OPTION_VALID_ATTRIBUTE_P): Ditto.
* config/riscv/riscv.h (SWITCHABLE_TARGET): Define to 1.
(ASM_DECLARE_FUNCTION_SIZE) Define.
* config/riscv/riscv.opt (mtune=): Add Save attribute.
(mcpu=): Ditto.
(mcmodel=): Ditto.
* config/riscv/t-riscv: Add build rule for riscv-target-attr.o
* doc/extend.texi: Add doc for target attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/target-attr-01.c: New.
* gcc.target/riscv/target-attr-02.c: Ditto.
* gcc.target/riscv/target-attr-03.c: Ditto.
* gcc.target/riscv/target-attr-04.c: Ditto.
* gcc.target/riscv/target-attr-05.c: Ditto.
* gcc.target/riscv/target-attr-06.c: Ditto.
* gcc.target/riscv/target-attr-07.c: Ditto.
* gcc.target/riscv/target-attr-bad-01.c: Ditto.
* gcc.target/riscv/target-attr-bad-02.c: Ditto.
* gcc.target/riscv/target-attr-bad-03.c: Ditto.
* gcc.target/riscv/target-attr-bad-04.c: Ditto.
* gcc.target/riscv/target-attr-bad-05.c: Ditto.
* gcc.target/riscv/target-attr-bad-06.c: Ditto.
* gcc.target/riscv/target-attr-bad-07.c: Ditto.
* gcc.target/riscv/target-attr-bad-08.c: Ditto.
* gcc.target/riscv/target-attr-bad-09.c: Ditto.
* gcc.target/riscv/target-attr-bad-10.c: Ditto.
Reviewed-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
We set ra to fixed register now, but we still need to save/restore that at
prologue/epilogue if that has used.
gcc/ChangeLog:
PR target/112478
* config/riscv/riscv.cc (riscv_save_return_addr_reg_p): Check ra
is ever lived.
gcc/testsuite/ChangeLog:
PR target/112478
* gcc.target/riscv/pr112478.c: New.
Reviewed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Tested-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
The new added splitter will generate
(insn 58 56 59 2 (set (reg:V4HI 20 xmm0 [129])
(vec_duplicate:V4HI (reg:HI 22 xmm2 [123]))) "testcase.c":16:21 -1
But we only have
(define_insn "*vec_dupv4hi"
[(set (match_operand:V4HI 0 "register_operand" "=y,Yw")
(vec_duplicate:V4HI
(truncate:HI
(match_operand:SI 1 "register_operand" "0,Yw"))))]
The patch add patterns for V4HI and V2HI.
gcc/ChangeLog:
PR target/112532
* config/i386/mmx.md (*vec_dup<mode>): Extend for V4HI and
V2HI.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112532.c: New test.
|
|
peephole2 [PR112526]
The following testcase is miscompiled on x86_64 since PR110551 r14-4968
commit. That commit added 2 peephole2s, one for
mov imm,%rXX; mov %rYY,%rax; mulq %rXX -> mov imm,%rax; mulq %rYY
which I believe is ok, and another one for
mov imm,%rXX; mov %rYY,%rdx; mulx %rXX, %rZZ, %rWW -> mov imm,%rdx; mulx %rYY, %rZZ, %rWW
which is wrong. Both peephole2s verify that %rXX above is dead at
the end of the pattern, by checking if %rXX is either one of the
registers overwritten in the multiplication (%rdx:%rax in the first
case, the 2 destination registers of mulx in the latter case), because
we no longer set %rXX to that immediate (we set %rax resp. %rdx to it
instead) when the peephole2 replaces it. But, we also need to ensure
that the other register previously set to the value of %rYY and newly
to imm isn't used after the multiplication, and neither of the peephole2s
does that. Now, for the first one (at least assuming in the % pattern
the matching operand (i.e. hardcoded %rax resp. %rdx) after RA will always go
first) I think it is always the case, because operands[2] if it must be %rax
register will be overwritten by mulq writing to %rdx:%rax. But in the
second case, there is no reason why %rdx couldn't be used after the pattern,
and if it is (like in the testcase), we can't make those changes.
So, the patch checks similarly to operands[0] that operands[2] (which ought
to be %rdx if RA puts the % match_dup operand first and nothing swaps it
afterwards) is either the same register as one of the destination registers
of mulx or dies at the end of the multiplication.
2023-11-16 Jakub Jelinek <jakub@redhat.com>
PR target/112526
* config/i386/i386.md
(mov imm,%rax; mov %rdi,%rdx; mulx %rax -> mov imm,%rdx; mulx %rdi):
Verify in define_peephole2 that operands[2] dies or is overwritten
at the end of multiplication.
* gcc.target/i386/bmi2-pr112526.c: New test.
|
|
We ICE on the following testcase now that IFN_C[LT]Z calls can have one or
two arguments (where 2 mean it is well defined at zero).
The following patch makes us create child node only for the first argument
and compatible_calls_p ensures the other argument is the same, which
at least according to the testcase seems sufficient because of vect
patterns.
2023-11-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112536
* tree-vect-slp.cc (arg0_map): New variable.
(vect_get_operand_map): For IFN_CLZ or IFN_CTZ, return arg0_map.
* gcc.dg/pr112536.c: New test.
|
|
Avoid requiring a glibc specific symbol.
PR tree-optimization/112282
* gcc.dg/torture/pr112282.c: Do not use __assert_fail.
|
|
This patch fixes ICE:
https://godbolt.org/z/z8T6o6qov
<source>: In function 'b':
<source>:2:6: error: missing definition
2 | void b() {
| ^
for SSA_NAME: loop_len_8 in statement:
_1 = -loop_len_8;
during GIMPLE pass: vect
<source>:2:6: internal compiler error: verify_ssa failed
0x7f1b56331082 __libc_start_main
???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1
The root cause is we generate such IR in vectorization:
_1 = -loop_len_8;
vect_cst__11 = {_1, _1};
_18 = vect_vec_iv_.6_14 + vect_cst__11;
loop_len_8 is uninitialized value.
The IR _18 = vect_vec_iv_.6_14 + vect_cst__11; is generated because of we are adding induction variable with
the result of SELECT_VL instead of VF.
The code is:
else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
{
/* When we're using loop_len produced by SELEC_VL, the non-final
iterations are not always processing VF elements. So vectorize
induction variable instead of
_21 = vect_vec_iv_.6_22 + { VF, ... };
We should generate:
_35 = .SELECT_VL (ivtmp_33, VF);
vect_cst__22 = [vec_duplicate_expr] _35;
_21 = vect_vec_iv_.6_22 + vect_cst__22; */
gcc_assert (!slp_node);
gimple_seq seq = NULL;
vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
unshare_expr (len)),
&seq, true, NULL_TREE);
new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr), expr,
step_expr);
gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
step_iv_si = &si;
}
LOOP_VINFO_USING_SELECT_VL_P is set before loop vectorization analysis so we don't know whether it is partial
vectorization or not but the induction variable depends on SELECT_VL_P is true.
So update SELECT_VL_P as false when it is not partial vectorization.
PR middle-end/112554
gcc/ChangeLog:
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Clear SELECT_VL_P for non-partial vectorization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112554.c: New test.
|
|
Here we are wrongly parsing
int y(auto(42));
which uses the C++23 cast-to-prvalue feature, and initializes y to 42.
However, we were treating the auto as an implicit template parameter.
Fixing the auto{42} case is easy, but when auto is followed by a (,
I found the fix to be much more involved. For instance, we cannot
use cp_parser_expression, because that can give hard errors. It's
also necessary to disambiguate 'auto(i)' as 'auto i', not a cast.
auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc.
are all function declarations.
This patch rectifies that by undoing the implicit function template
modification. In the test above, we should notice that the parameter
list is ill-formed, and since we've synthesized an implicit template
parameter, we undo it by calling abort_fully_implicit_template. Then,
we'll parse the "(auto(42))" as an initializer.
PR c++/112410
gcc/cp/ChangeLog:
* parser.cc (cp_parser_direct_declarator): Maybe call
abort_fully_implicit_template if it turned out the parameter list was
ill-formed.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-fncast13.C: New test.
* g++.dg/cpp23/auto-fncast14.C: New test.
|
|
For vextract/insert{if}128 they cannot adopt EGPR in their memory operand, all
related pattern should be adjusted to disable EGPR usage on them.
Also fix a wrong gpr16 attr for insertps.
gcc/ChangeLog:
* config/i386/sse.md (vec_extract_hi_<mode>): Add noavx512vl
alternative with attr addr gpr16 and "jm" constraint.
(vec_extract_hi_<mode>): Likewise for SF vector modes.
(@vec_extract_hi_<mode>): Likewise.
(*vec_extractv2ti): Likewise.
(vec_set_hi_<mode><mask_name>): Likewise.
* config/i386/mmx.md (@sse4_1_insertps_<mode>): Correct gpr16 attr for
each alternative.
|
|
|
|
Following testcase:
struct S1
{
unsigned char val;
unsigned char pad1;
unsigned short pad2;
};
struct S2
{
unsigned char pad1;
unsigned char val;
unsigned short pad2;
};
struct S1 test_add (struct S1 a, struct S2 b, struct S2 c)
{
a.val = b.val + c.val;
return a;
}
compiles with -O2 to:
movl %edi, %eax
movzbl %dh, %edx
movl %esi, %ecx
movb %dl, %al
addb %ch, %al
The insert to %al can go directly from %dh:
movl %edi, %eax
movl %esi, %ecx
movb %dh, %al
addb %ch, %al
Patch introduces strict_low_part QImode insn patterns with both of
their input arguments extracted from high register. This invalid
insn is split after reload to a lowpart insert from the high register
and <insn>qi_ext<mode>_1_slp instruction.
PR target/78904
gcc/ChangeLog:
* config/i386/i386.md (*movstrictqi_ext<mode>_1): New insn pattern.
(*addqi_ext<mode>_2_slp): New define_insn_and_split pattern.
(*subqi_ext<mode>_2_slp): Ditto.
(*<any_logic:code>qi_ext<mode>_2_slp): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr78904-8.c: New test.
* gcc.target/i386/pr78904-8a.c: New test.
* gcc.target/i386/pr78904-8b.c: New test.
* gcc.target/i386/pr78904-9.c: New test.
* gcc.target/i386/pr78904-9a.c: New test.
* gcc.target/i386/pr78904-9b.c: New test.
|
|
Passing in a base extension in non-canonical order (i, e, g) causes GCC
to ICE:
xgcc: error: '-march=rv64ge': ISA string is not in canonical order. 'e'
xgcc: internal compiler error: in add, at common/config/riscv/riscv-common.cc:671
...
This is fixed by skipping to the next extension when a non-canonical
order is detected.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_std_ext): Emit an error and skip to
the next extension when a non-canonical ordering is detected.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-27.c: New test.
* gcc.target/riscv/arch-28.c: New test.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
r14-985-gca2007a9bb3074 used the collapsed macro definition
CAN_HAVE_LOCATION_P in gcc-rich-location.cc and r14-977-g8861c80733da5c
in c++'s build_cplus_array_type ().
However, although otherwise correct, the usage of CAN_HAVE_LOCATION_P
in these two spots is misleading, so this patch reverts aforementioned
two hunks.
gcc/cp/ChangeLog:
* tree.cc (build_cplus_array_type): Revert using the macro
CAN_HAVE_LOCATION_P.
gcc/ChangeLog:
* gcc-rich-location.cc (maybe_range_label_for_tree_type_mismatch::get_text):
Revert using the macro CAN_HAVE_LOCATION_P.
|
|
Fixes: f0e28d8c1371 ("RISC-V: Fix failed hoist in LICM of vmv.v.x instruction")
Since above commit, we have following failure:
FAIL: gcc.c-torture/execute/memset-3.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/memset-3.c -O3 -g execution test
The issue was not the commit but rather it unravelled an issue in the
vsetvli pass.
Here's Juzhe's analysis:
We have 2 types of global vsetvls insertion.
One is earliest fusion of each end of the block.
The other is LCM suggested edge vsetvls.
So before this patch, insertion as follows:
| (insn 2817 2820 2818 361 (set (reg:SI 67 vtype)
| (unspec:SI [
| (const_int 8 [0x8])
| (const_int 7 [0x7])
| (const_int 1 [0x1]) repeated x2
| ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
| (insn 2818 2817 999 361 (set (reg:SI 67 vtype)
| (unspec:SI [
| (const_int 32 [0x20])
| (const_int 1 [0x1]) repeated x3
| ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
After this patch:
| (insn 2817 2820 2819 361 (set (reg:SI 67 vtype)
| (unspec:SI [
| (const_int 32 [0x20])
| (const_int 1 [0x1]) repeated x3
| ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
| (insn 2819 2817 999 361 (set (reg:SI 67 vtype)
| (unspec:SI [
| (const_int 8 [0x8])
| (const_int 7 [0x7])
| (const_int 1 [0x1]) repeated x2
| ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
The original insertion order is incorrect.
We should first insert earliest fusion since it is the vsetvls information
already there which was seen by later LCM. We just delay the insertion.
So it should be come before the LCM suggested insertion.
PR target/112447
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Insert
local vsetvl info before LCM suggested one.
Tested-by: Patrick O'Neill <patrick@rivosinc.com> # pre-commit-CI #679
Co-developed-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming function args
which ABI/ISA guarantee to be sign-extended already (this is true for
SI, HI, QI operands)
And subsequently REE fails to eliminate them as
"missing defintion(s)" or "multiple definition(s)
since function args don't have explicit definition.
So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
potentially can also help there.
Note there's currently patches floating around to improve REE and also a
new pass to eliminate unneccesary extensions, but it is still beneficial
to not generate those extra extensions in first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.
Way too many existing tests used to observe this issue.
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
It elimiates the SEXT.W
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.
Tested-by: Patrick O'Neill <patrick@rivosinc.com> # pre-commit-CI #676
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
The NON_DEPENDENT_EXPR removal exposed that is_direct_enum_init can be
called in a template context on a CONSTRUCTOR that isn't type-dependent
but whose element is.
PR c++/112515
gcc/cp/ChangeLog:
* decl.cc (is_direct_enum_init): Check type-dependence of the
single element.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent30.C: New test.
|
|
Here we're ICEing from strip_typedefs for the partially instantiated
requires-expression when walking its REQUIRES_EXPR_EXTRA_ARGS which
in this case is a TREE_LIST with non-empty TREE_PURPOSE (to hold the
captured local specialization 't' as per build_extra_args) which
strip_typedefs doesn't expect.
We can probably skip walking REQUIRES_EXPR_EXTRA_ARGS at all since it
shouldn't contain any typedefs in the first place, but it seems safer
and more generally useful to just teach strip_typedefs to handle non-empty
TREE_PURPOSE the obvious way. (The code asserts TREE_PURPOSE was empty
even since since its inception i.e. r189298.)
PR c++/101043
gcc/cp/ChangeLog:
* tree.cc (strip_typedefs_expr) <case TREE_LIST>: Handle
non-empty TREE_PURPOSE.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires37.C: New test.
|
|
Here when building up the non-dependent .* expression, we crash from
fold_convert on 'b.a' due to this (templated) COMPONENT_REF having an
IDENTIFIER_NODE instead of FIELD_DECL operand that middle-end routines
expect. Like in r14-4899-gd80a26cca02587, this patch fixes this by
replacing the problematic piecemeal folding with a single call to
cp_fully_fold. Also, don't bother building the POINTER_PLUS_EXPR in a
template context. This means the returned non-dependent tree might not
have TREE_SIDE_EFFECTS set when it used to, so we need to compensate
by making build_min_non_dep propagate TREE_SIDE_EFFECTS from the original
arguments like buildN and build_min do.
PR c++/112427
gcc/cp/ChangeLog:
* tree.cc (build_min_non_dep): Propagate TREE_SIDE_EFFECTS from
the original arguments.
(build_min_non_dep_call_vec): Likewise.
* typeck2.cc (build_m_component_ref): Use cp_convert, build2 and
cp_fully_fold instead of fold_build_pointer_plus and fold_convert.
Don't build the POINTER_PLUS_EXPR in a template context.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent29.C: New test.
|
|
potential_constant_expression was incorrectly treating most local
variables from a constexpr function as constant because it wasn't
considering the 'now' parameter. This patch fixes this by relaxing
its var_in_maybe_constexpr_fn checks accordingly, which turns out to
partially fix two recently reported regressions:
PR111703 is a regression caused by r11-550-gf65a3299a521a4 for restricting
constexpr evaluation during warning-dependent folding. The mechanism is
intended to restrict only constant evaluation of the instantiated
non-dependent expression, but it also ends up restricting constant
evaluation occurring during instantiation of the expression, in particular
when instantiating the converted argument 'x' (a VIEW_CONVERT_EXPR) into
a copy constructor call. This seems like a flaw in the mechanism, though
I don't know if we want to fix the mechanism or get rid of it completely
since the original testcases which motivated the mechanism are fixed more
simply by r13-1225-gb00b95198e6720. In any case, this patch partially
fixes this by making us correctly treat 'x' as non-constant which prevents
the problematic warning-dependent folding from occurring at all.
PR112269 is caused by r14-4796-g3e3d73ed5e85e7 for merging tsubst_copy
into tsubst_copy_and_build. tsubst_copy used to exit early when 'args'
was empty, behavior which that commit deliberately didn't preserve.
This early exit masked the fact that COMPLEX_EXPR wasn't handled by
tsubst at all, and is a tree code that apparently we could see during
warning-dependent folding on some targets. A complete fix is to add
handling for this tree code in tsubst_expr, but this patch should fix
the reported testsuite failures since the COMPLEX_EXPRs that crop up
in <complex> are considered non-constant expressions after this patch.
PR c++/111703
PR c++/112269
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case VAR_DECL>:
Only consider var_in_maybe_constexpr_fn if 'now' is false.
<case INDIRECT_REF>: Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-fn8.C: New test.
|
|
gcc/ChangeLog:
* config/i386/i386.md (*addqi_ext<mode>_1_slp):
Add "&& " before "reload_completed" in split condition.
(*subqi_ext<mode>_1_slp): Ditto.
(*<any_logic:code>qi_ext<mode>_1_slp): Ditto.
|
|
[PR112540]
PR target/112540
gcc/ChangeLog:
* config/i386/i386.md (*addqi_ext<mode>_1_slp):
Correct operand numbers in split pattern. Replace !Q constraint
of operand 1 with !qm. Add insn constrain.
(*subqi_ext<mode>_1_slp): Ditto.
(*<any_logic:code>qi_ext<mode>_1_slp): Ditto.
|
|
Minor fix-up for commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".
gcc/
* doc/extend.texi (Nvidia PTX Built-in Functions): Fix
copy'n'paste-o in '__builtin_nvptx_brev' description.
|
|
This minor tweak to the nvptx backend switches the representation of
of the brev instruction from an UNSPEC to instead use the new BITREVERSE
rtx. This allows various RTL optimizations including evaluation (constant
folding) of integer constant arguments at compile-time.
gcc/
* config/nvptx/nvptx.md (UNSPEC_BITREV): Delete.
(bitrev<mode>2): Represent using bitreverse.
gcc/testsuite/
* gcc.target/nvptx/brev-2-O2.c: Adjust.
* gcc.target/nvptx/brevll-2-O2.c: Likewise.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
|
|
In order to observe effects of a later patch, extend the 'brev' test cases
added in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".
gcc/testsuite/
* gcc.target/nvptx/brev-1.c: Extend.
* gcc.target/nvptx/brev-2.c: Rename to...
* gcc.target/nvptx/brev-2-O2.c: ... this, and extend. Copy to...
* gcc.target/nvptx/brev-2-O0.c: ... this, and adapt for '-O0'.
* gcc.target/nvptx/brevll-1.c: Extend.
* gcc.target/nvptx/brevll-2.c: Rename to...
* gcc.target/nvptx/brevll-2-O2.c: ... this, and extend. Copy to...
* gcc.target/nvptx/brevll-2-O0.c: ... this, and adapt for '-O0'.
|
|
Add the new CDNA register file. We don't support any of the specialized
instructions that use these registers, but they're useful to relieve
register pressure without spilling to stack.
Co-authored-by: Andrew Jenner <andrew@codesourcery.com>
gcc/ChangeLog:
* config/gcn/constraints.md: Add "a" AVGPR constraint.
* config/gcn/gcn-valu.md (*mov<mode>): Add AVGPR alternatives.
(*mov<mode>_4reg): Likewise.
(@mov<mode>_sgprbase): Likewise.
(gather<mode>_insn_1offset<exec>): Likewise.
(gather<mode>_insn_1offset_ds<exec>): Likewise.
(gather<mode>_insn_2offsets<exec>): Likewise.
(scatter<mode>_expr<exec_scatter>): Likewise.
(scatter<mode>_insn_1offset_ds<exec_scatter>): Likewise.
(scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
* config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define.
(gcn_class_max_nregs): Handle AVGPR_REGS and ALL_VGPR_REGS.
(gcn_hard_regno_mode_ok): Likewise.
(gcn_regno_reg_class): Likewise.
(gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA1_PLUS.
(gcn_sgpr_move_p): Handle AVGPRs.
(gcn_secondary_reload): Reload AVGPRs via VGPRs.
(gcn_conditional_register_usage): Handle AVGPRs.
(gcn_vgpr_equivalent_register_operand): New function.
(gcn_valid_move_p): Check for validity of AVGPR moves.
(gcn_compute_frame_offsets): Handle AVGPRs.
(gcn_memory_move_cost): Likewise.
(gcn_register_move_cost): Likewise.
(gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI.
(gcn_md_reorg): Handle AVGPRs.
(gcn_hsa_declare_function_name): Likewise.
(print_reg): Likewise.
(gcn_dwarf_register_number): Likewise.
* config/gcn/gcn.h (FIRST_AVGPR_REG): Define.
(AVGPR_REGNO): Define.
(LAST_AVGPR_REG): Define.
(SOFT_ARG_REG): Update.
(FRAME_POINTER_REGNUM): Update.
(DWARF_LINK_REGISTER): Update.
(FIRST_PSEUDO_REGISTER): Update.
(AVGPR_REGNO_P): Define.
(enum reg_class): Add AVGPR_REGS and ALL_VGPR_REGS.
(REG_CLASS_CONTENTS): Add new register classes and add entries for
AVGPRs to all classes.
(REGISTER_NAMES): Add AVGPRs.
* config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define.
(AP_REGNUM, FP_REGNUM): Update.
(define_attr "type"): Add vop3p_mai.
(define_attr "unit"): Handle vop3p_mai.
(define_attr "gcn_version"): Add "cdna2".
(define_attr "enabled"): Handle cdna2.
(*mov<mode>_insn): Add AVGPR alternatives.
(*movti_insn): Likewise.
* config/gcn/mkoffload.cc (isa_has_combined_avgprs): New.
(process_asm): Process avgpr_count.
* config/gcn/predicates.md (gcn_avgpr_register_operand): New.
(gcn_avgpr_hard_register_operand): New.
* doc/md.texi: Document the "a" constraint.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/avgpr-mem-double.c: New test.
* gcc.target/gcn/avgpr-mem-int.c: New test.
* gcc.target/gcn/avgpr-mem-long.c: New test.
* gcc.target/gcn/avgpr-mem-short.c: New test.
* gcc.target/gcn/avgpr-spill-double.c: New test.
* gcc.target/gcn/avgpr-spill-int.c: New test.
* gcc.target/gcn/avgpr-spill-long.c: New test.
* gcc.target/gcn/avgpr-spill-short.c: New test.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (max_isa_vgprs): New.
(run_kernel): CDNA2 devices have more VGPRs.
|
|
Remove some unnecessary complexity; no functional change is intended,
although LRA appears to use the constraints from the reload_in/out
patterns, so it's probably an improvement for it to see the real sgprbase
constraints.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (mov<mode>_sgprbase): Add @ modifier.
(reload_in<mode>): Delete.
(reload_out<mode>): Delete.
* config/gcn/gcn.cc (CODE_FOR): Delete.
(get_code_for_##PREFIX##vN##SUFFIX): Delete.
(CODE_FOR_OP): Delete.
(get_code_for_##PREFIX): Delete.
(gcn_secondary_reload): Replace "get_code_for" with "code_for".
|
|
By default the preprocessed output includes linemarkers. This leads to
an error if -pedantic is used as e.g. during bootstrap:
s390-gen-builtins.h:1:3: error: style of line directive is a GCC extension [-Werror]
Fixed by omitting linemarkers while generating s390-gen-builtins.h.
gcc/ChangeLog:
* config/s390/t-s390: Generate s390-gen-builtins.h without
linemarkers.
|
|
The following avoids hoisting of invariants from conditionally
executed parts of an if-converted loop. That now makes a difference
since we perform bitfield lowering even when we do not actually
if-convert the loop. if-conversion deals with resetting flow-sensitive
info when necessary already.
PR tree-optimization/112282
* tree-if-conv.cc (ifcvt_hoist_invariants): Only hoist from
the loop header.
* gcc.dg/torture/pr112282.c: New testcase.
|
|
We have to clear the visited flag on stmts.
* tree-vect-slp.cc (vect_slp_region): Also clear visited flag when
we skipped an instance due to -fdbg-cnt.
|
|
The updated libasan doesn't print __interceptor_free (or __interceptor_malloc)
but free (or malloc), the following patch adjusts the testcase so that it
accepts it.
2023-11-15 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/asan/sanity-check-pure-c-1.c: Adjust for interceptor_
or wrap_ substrings possibly not being emitted in newer libasan.
|