|
gcc/ada/
* sem_ch12.adb (Save_And_Reset): Fix value of low bound used to
reset table.
|
|
The compiler incorrectly accepts Some_Object'Unchecked_Access'Image.
gcc/ada/
* sem_attr.adb
(Analyze_Image_Attribute.Check_Image_Type): Check for
E_Access_Attribute_Type prefix type.
|
|
In some cases, a use clause (or a use type clause) occurring within a
protected operation is incorrectly ignored.
gcc/ada/
* exp_ch9.adb
(Expand_N_Protected_Body): Declare new procedure
Unanalyze_Use_Clauses and call it before analyzing the newly
constructed subprogram body.
|
|
If type T1 is a tagged null record with a Put_Image aspect specification
and type T2 is a null extension of T1 (with no aspect specifications), then
evaluation of a T2'Image call should include a call to the specified procedure
(as opposed to yielding "(NULL RECORD)").
gcc/ada/
* exp_put_image.adb
(Build_Record_Put_Image_Procedure): Declare new Boolean-valued
function Null_Record_Default_Implementation_OK; call it as part of
deciding whether to generate "(NULL RECORD)" text.
|
|
This patch modifies the experimental 'Size'Class feature such that objects of
mutably tagged types can be assigned qualified expressions featuring a
definite type (e.g. Mutable_Obj := Root_Child_T'(Root_T with others => <>)).
gcc/ada/
* sem_ch5.adb:
(Analyze_Assignment): Add special expansion for qualified expressions
in certain cases dealing with mutably tagged types.
|
|
GNAT crashes on an iterator with a filter inside an expression function
that is the completion of an earlier spec.
gcc/ada/
* freeze.adb (Freeze_Type_Refs): If Node is in N_Has_Etype,
check that it has had its Etype set, because this can be
called early for expression functions that are completions.
|
|
... implement support for ordering comparisons of discrete array types.
This extends the Support_Composite_Compare_On_Target feature to ordering
comparisons of discrete array types as specified by RM 4.5.2(26/3), when
the component type is a byte (unsigned).
Implement support for ordering comparisons of discrete array types
with a two-pronged approach: for types with a size known at compile time,
this lets the gimplifier generate the call to memcmp (or else an optimized
version of it); otherwise, this directly generates the call to memcmp.
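As a rough C++ illustration (not from the patch itself) of why memcmp suffices
when the component type is an unsigned byte: for operands of the same known
length, the sign of memcmp's lexicographic result directly gives the ordering;
differing lengths additionally need the usual prefix-then-length rule.
#include <cstddef>
#include <cstring>
bool
less_than (const unsigned char *a, const unsigned char *b, std::size_t n)
{
  /* Lexicographic "<" on two byte arrays of the same length n.  */
  return std::memcmp (a, b, n) < 0;
}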
gcc/ada/
* exp_ch4.adb (Expand_Array_Comparison): Remove the obsolete byte
addressability test. If Support_Composite_Compare_On_Target is true,
immediately return for a component size of 8, an unsigned component
type and aligned operands. Disable when Unnest_Subprogram_Mode is
true (for LLVM).
(Expand_N_Op_Eq): Adjust comment.
* targparm.ads (Support_Composite_Compare_On_Target): Replace bit by
byte in description and document support for ordering comparisons.
* gcc-interface/utils2.cc (compare_arrays): Rename into...
(compare_arrays_for_equality): ...this. Remove redundant lines.
(compare_arrays_for_ordering): New function.
(build_binary_op) <comparisons>: Call compare_arrays_for_ordering
to implement ordering comparisons for arrays.
|
|
Pragma/aspect Extensions_Visible should be analyzed before any
pre/post contracts on a subprogram, as the legality of conversions
of formal parameters to classwide type depends on the value of
Extensions_Visible. Now fixed.
gcc/ada/
* contracts.adb (Analyze_Pragmas_In_Declarations): Analyze
pragmas in two iterations over the list of declarations in
order to analyze some pragmas before others.
* einfo-utils.ads (Get_Pragma): Fix comment.
* sem_prag.ads (Pragma_Significant_To_Subprograms): Fix.
(Pragma_Significant_To_Subprograms_Analyzed_First): Add new
global array to identify the pragmas which should be analyzed
first; for now this concerns only Extensions_Visible.
|
|
The problem is that the call to Convert_View made from Make_Init_Call does
nothing because the Etype is not set on the second argument.
gcc/ada/
* exp_ch7.adb (Convert_View): Add third parameter Typ and use it if
the second parameter does not have an Etype.
(Make_Adjust_Call): Remove obsolete setting of Etype and pass Typ in
call to Convert_View.
(Make_Final_Call): Likewise.
(Make_Init_Call): Pass Typ in call to Convert_View.
|
|
When an array has several dimensions, and inner dimensions are
initialized using Ada 2022 null array aggregates, the compiler
crashes or reports spurious errors computing the bounds of the
null array aggregates. This patch fixes the problem and adds
new warnings reported when the index of null array aggregates is
an enumeration type or a modular type and it is known at compile
time that the program will raise Constraint_Error computing the
bounds of the aggregate.
gcc/ada/
* sem_aggr.adb (Cannot_Compute_High_Bound): New subprogram.
(Report_Null_Array_Constraint_Error): New subprogram.
(Collect_Aggr_Bounds): For null aggregates, build the bounds
of the inner dimensions.
(Has_Null_Aggregate_Raising_Constraint_Error): New subprogram.
(Subtract): New subprogram.
(Resolve_Array_Aggregate): Report a warning when the index of
null array aggregates is an enumeration type or a modular type
and we can statically determine that the program will raise CE
at runtime computing its high bound.
(Resolve_Null_Array_Aggregate): Ditto.
|
|
The problem is that the implementation of the No_Default_Initialization
restriction assumes that no type initialization routines are needed and,
therefore, builds a dummy version of them, which goes against their use
for box-initialized components in aggregates.
Therefore this use needs to be flagged as violating the restriction too.
gcc/ada/
* doc/gnat_rm/standard_and_implementation_defined_restrictions.rst
(No_Default_Initialization): Mention components alongside variables.
* exp_aggr.adb (Build_Array_Aggr_Code.Gen_Assign): Check that the
restriction No_Default_Initialization is not in effect for default
initialized components.
(Build_Record_Aggr_Code): Likewise.
* gnat_rm.texi: Regenerate.
|
|
gcc/ada/
* debug.adb (dJ): Add back as unused.
|
|
This is a first step towards having a device table so we can add new devices
more easily. It'll also make it easier to remove the deprecated GCN3 bits.
The patch should not change the behaviour of anything.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_GLOBAL_ADDRSPACE): New.
(TARGET_AVGPRS): New.
(TARGET_AVGPR_MEMOPS): New.
(TARGET_AVGPR_COMBINED): New.
(TARGET_FLAT_OFFSETS): New.
(TARGET_11BIT_GLOBAL_OFFSET): New.
(TARGET_CDNA2_MEM_COSTS): New.
(TARGET_WAVE64_COMPAT): New.
(TARGET_DPP_FULL): New.
(TARGET_DPP16): New.
(TARGET_DPP8): New.
(TARGET_AVGPR_CDNA1_NOPS): New.
(TARGET_VGPR_GRANULARITY): New.
(TARGET_ARCHITECTED_FLAT_SCRATCH): New.
(TARGET_EXPLICIT_CARRY): New.
(TARGET_MULTIPLY_IMMEDIATE): New.
(TARGET_SDWA): New.
(TARGET_WBINVL1_CACHE): New.
(TARGET_GLn_CACHE): New.
* config/gcn/gcn-valu.md (throughout): Change TARGET_GCN*,
TARGET_CDNA* and TARGET_RDNA* to use TARGET_<feature> instead.
* config/gcn/gcn.cc (throughout): Likewise.
* config/gcn/gcn.md (throughout): Likewise.
|
|
The case in the ticket is an ICE on invalid due to an assert in stabilize_expr,
but the underlying issue can actually trigger on this *valid* code:
=== cut here ===
struct TheClass {
TheClass() {}
TheClass(volatile TheClass& t) {}
TheClass operator=(volatile TheClass& t) volatile { return t; }
};
void the_func() {
volatile TheClass x, y, z;
(false ? x : y) = z;
}
=== cut here ===
The problem is that stabilize_expr asserts that it returns an expression
without TREE_SIDE_EFFECTS, which cannot be the case if the involved type is volatile.
This patch relaxes the assert to accept having TREE_THIS_VOLATILE on the
returned expression.
Successfully tested on x86_64-pc-linux-gnu.
PR c++/111160
gcc/cp/ChangeLog:
* tree.cc (stabilize_expr): Stabilized expressions can have
TREE_SIDE_EFFECTS if they're volatile.
gcc/testsuite/ChangeLog:
* g++.dg/overload/error8.C: New test.
* g++.dg/overload/volatile2.C: New test.
|
|
gcc/ChangeLog:
* config/i386/i386.md (*imulhi<mode>zu): Added APX
NF support.
(*imulhi<mode>zu<nf_name>): New define_insn.
(*mulsi3_1_zext<nf_name>): Ditto.
(*mul<mode><dwi>3_1<nf_name>): Ditto.
(*<u>mulqihi3_1<nf_name>): Ditto.
(*mul<mode>3_1<nf_name>): Added APX NDD support.
(*mulv<mode>4): Ditto.
(*mulvhi4): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ndd.c: Add test for imul ndd.
|
|
Commit r15-1594 removed the define of LONG_DOUBLE_TYPE_SIZE in
sparc.cc, based on the assumption that each OS has its
own define (see the comments in sparc.h), but it exposes an
issue on vxworks, which lacks the define.
We can bring back the default SPARC_LONG_DOUBLE_TYPE_SIZE to
sparc.cc, but according to the comments in sparc.h, I think
it's better to define this in vxworks.h. By the way, I also went
through all the sparc-supported triples; vxworks is the only
one that misses this define.
PR target/115739
gcc/ChangeLog:
* config/sparc/vxworks.h (SPARC_LONG_DOUBLE_TYPE_SIZE): New define.
|
|
The following two FAIL items have been fixed:
FAIL: gcc.target/loongarch/movcf2gr-via-fr.c scan-assembler movcf2fr\\t\\\\\$f[0-9]+,\\\\\$fcc
FAIL: gcc.target/loongarch/movcf2gr-via-fr.c scan-assembler movfr2gr\\\\.s\\t\\\\\$r4
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_insn_cost):
New function.
(TARGET_INSN_COST): New macro.
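For reference, a hedged sketch of how such a target hook is typically wired
up in a GCC backend (the body of loongarch_insn_cost is not shown in this
log; only the generic hook signature is assumed here):
/* Sketch only: declare the hook function and register it.  */
static int loongarch_insn_cost (rtx_insn *insn, bool speed);
#undef TARGET_INSN_COST
#define TARGET_INSN_COST loongarch_insn_cost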
|
|
After r15-1579, ADD and LD/ST pairs will be merged into LDX/STX,
causing these two tests to fail. To guarantee that these two tests pass,
add the compilation option '-fno-late-combine-instructions'.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c:
Add the compilation option '-fno-late-combine-instructions'.
* gcc.target/loongarch/explicit-relocs-tls-desc.c: Likewise.
|
|
As PR115659 shows, assuming c = x CMP y, there are some
folding chances for patterns r = c ? -1/z : z/0.
For r = c ? -1 : z, it can be folded into:
- r = c | z (with ior_optab supported)
- or r = c ? c : z
while for r = c ? z : 0, it can be folded into:
- r = c & z (with and_optab supported)
- or r = c ? z : c
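A rough sketch of these identities using GCC's generic vector extension
(the function names are illustrative only): each lane of c is all-ones or
all-zeros, so the bitwise form selects exactly like the conditional.
typedef int v4si __attribute__ ((vector_size (16)));
/* c = (x > y) yields -1 (all ones) or 0 per lane.  */
v4si sel_m1_z (v4si x, v4si y, v4si z) { return (x > y) | z; }  /* c ? -1 : z */
v4si sel_z_0  (v4si x, v4si y, v4si z) { return (x > y) & z; }  /* c ?  z : 0 */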
This patch is to teach ISEL to take care of them and also
remove the redundant gsi_replace as the caller of function
gimple_expand_vec_cond_expr will handle it.
PR tree-optimization/115659
gcc/ChangeLog:
* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
patterns x CMP y ? -1 : z and x CMP y ? z : 0.
|
|
This is a low-prio crash on invalid code where we ICE on a VAR_DECL
with erroneous type. I thought I'd try to avoid putting such decls
into ->names and ->names_in_scope but that sounds riskier than the
following cleanup.
PR c++/115469
gcc/cp/ChangeLog:
* decl.cc (automatic_var_with_nontrivial_dtor_p): New.
(poplevel_named_label_1): Use it.
(check_goto_1): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/ext/label17.C: New test.
|
|
I made sure that Wnarrowing22.C works fine on ILP32, but apparently
I didn't verify that spaceship-narrowing1.C works there as well. :(
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/spaceship-narrowing1.C: Use __INT64_TYPE__.
|
|
This works:
template<typename T>
int Func(T);
typedef int (*funcptrtype)(int);
funcptrtype fp0 = &Func<int>;
but this doesn't:
funcptrtype fp2 = (0, &Func<int>);
because we only call resolve_nondeduced_context on the LHS (via
convert_to_void) but not on the RHS, so cp_build_compound_expr's
type_unknown_p check issues an error.
PR c++/115430
gcc/cp/ChangeLog:
* typeck.cc (cp_build_compound_expr): Call resolve_nondeduced_context
on RHS.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/noexcept41.C: Remove dg-error.
* g++.dg/overload/addr3.C: New test.
|
|
This DR (https://cplusplus.github.io/CWG/issues/2627.html) says that
even if we are converting from an integer type or unscoped enumeration type
to an integer type that cannot represent all the values of the original
type, it's not narrowing if "the source is a bit-field whose width w is
less than that of its type (or, for an enumeration type, its underlying
type) and the target type can represent all the values of a hypothetical
extended integer type with width w and with the same signedness as the
original type".
DR 2627
PR c++/94058
PR c++/104392
gcc/cp/ChangeLog:
* typeck2.cc (check_narrowing): Don't warn if the conversion isn't
narrowing as per DR 2627.
gcc/testsuite/ChangeLog:
* g++.dg/DRs/dr2627.C: New test.
* g++.dg/cpp0x/Wnarrowing22.C: New test.
* g++.dg/cpp2a/spaceship-narrowing1.C: New test.
* g++.dg/cpp2a/spaceship-narrowing2.C: New test.
|
|
Besides VN and copy-prop, CCP and VRP as well as forwprop also
propagate out copies, and thus it's worthwhile to try to preserve
range and points-to info there when possible.
Note that this also fixes the testcase from PR115701 but that's
because we do not actually intersect info but only copy info when
there was no info present.
* tree-ssa-forwprop.cc (fwprop_set_lattice_val): Preserve
SSA info.
* tree-ssa-propagate.cc
(substitute_and_fold_dom_walker::before_dom_children): Likewise.
|
|
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 4. Aka:
Form 4:
#define DEF_SAT_U_ADD_IMM_FMT_4(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_4 (T x) \
{ \
T ret; \
return __builtin_add_overflow (x, 9, &ret) == 0 ? ret : -1; \
}
DEF_SAT_U_ADD_IMM_FMT_4(uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-13.c: New test.
* gcc.target/riscv/sat_u_add_imm-14.c: New test.
* gcc.target/riscv/sat_u_add_imm-15.c: New test.
* gcc.target/riscv/sat_u_add_imm-16.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-13.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-14.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-15.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-16.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 3. Aka:
Form 3:
#define DEF_SAT_U_ADD_IMM_FMT_3(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_3 (T x) \
{ \
T ret; \
return __builtin_add_overflow (x, 8, &ret) ? -1 : ret; \
}
DEF_SAT_U_ADD_IMM_FMT_3(uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-10.c: New test.
* gcc.target/riscv/sat_u_add_imm-11.c: New test.
* gcc.target/riscv/sat_u_add_imm-12.c: New test.
* gcc.target/riscv/sat_u_add_imm-9.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-10.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-11.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-12.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-9.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 2. Aka:
Form 2:
#define DEF_SAT_U_ADD_IMM_FMT_2(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_2 (T x) \
{ \
return (T)(x + 9) < x ? -1 : (x + 9); \
}
DEF_SAT_U_ADD_IMM_FMT_2(uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-5.c: New test.
* gcc.target/riscv/sat_u_add_imm-6.c: New test.
* gcc.target/riscv/sat_u_add_imm-7.c: New test.
* gcc.target/riscv/sat_u_add_imm-8.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 1. Aka:
Form 1:
#define DEF_SAT_U_ADD_IMM_FMT_1(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_1 (T x) \
{ \
return (T)(x + 9) >= x ? (x + 9) : -1; \
}
DEF_SAT_U_ADD_IMM_FMT_1(uint64_t)
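For reference, a small sketch (illustrative only, not one of the new
testcases) of the saturation behaviour this form encodes, assuming the
macro and its uint64_t instantiation above:
#include <stdint.h>
#include <assert.h>
int
main (void)
{
  assert (sat_u_add_imm_uint64_t_fmt_1 (0) == 9);                   /* no overflow */
  assert (sat_u_add_imm_uint64_t_fmt_1 (UINT64_MAX) == UINT64_MAX); /* wraps, so saturates */
  return 0;
}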
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-1.c: New test.
* gcc.target/riscv/sat_u_add_imm-2.c: New test.
* gcc.target/riscv/sat_u_add_imm-3.c: New test.
* gcc.target/riscv/sat_u_add_imm-4.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-4.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch fixes the 4 FAILs of gcc.target/i386/pr102464-vrndscaleph.c
with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this
AVX512 test includes the system math.h, and on older systems this provides
inline versions of floor, ceil and rint (for the 387). The workaround
is to define __NO_MATH_INLINES before #include <math.h> (or alternatively
use __builtin_floor, __builtin_ceil, etc.).
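A hedged sketch of the workaround (illustrative, not the actual testcase
diff); the macro just has to be defined before math.h is included:
#define __NO_MATH_INLINES   /* keep glibc's 387 inline floor/ceil/rint out */
#include <math.h>
double round_down (double x) { return floor (x); }  /* uses the compiler's own expansion */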
2024-07-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
PR middle-end/102464
* gcc.target/i386/pr102464-vrndscaleph.c: Define __NO_MATH_INLINES
to resolve FAILs with -m32 on older RedHat systems.
|
|
A common idiom for implementing an integer division that rounds upwards is
to write (x + y - 1) / y. Conveniently on x86, the two additions to form
the numerator can be performed by a single lea instruction, and indeed gcc
currently generates a lea when both x and y are registers.
int foo(int x, int y) {
return (x+y-1)/y;
}
generates with -O2:
foo: leal -1(%rsi,%rdi), %eax // 4 bytes
cltd
idivl %esi
ret
Oddly, however, if x is in memory, gcc currently uses two instructions:
int m;
int bar(int y) {
return (m+y-1)/y;
}
generates:
bar: movl m(%rip), %eax
addl %edi, %eax // 2 bytes
subl $1, %eax // 3 bytes
cltd
idivl %edi
ret
This discrepancy is caused by the late decision (in peephole2) to split
an addition with a memory operand, into a load followed by a reg-reg
addition. This patch improves this situation by adding a peephole2
to recognize consecutive additions and transform them into lea if
profitable.
My first attempt at fixing this was to use a define_insn_and_split:
(define_insn_and_split "*lea<mode>3_reg_mem_imm"
[(set (match_operand:SWI48 0 "register_operand")
(plus:SWI48 (plus:SWI48 (match_operand:SWI48 1 "register_operand")
(match_operand:SWI48 2 "memory_operand"))
(match_operand:SWI48 3 "x86_64_immediate_operand")))]
"ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (match_dup 4) (match_dup 2))
(set (match_dup 0) (plus:SWI48 (plus:SWI48 (match_dup 1) (match_dup 4))
(match_dup 3)))]
"operands[4] = gen_reg_rtx (<MODE>mode);")
using combine to combine instructions. Unfortunately, this approach
interferes with (reload's) subtle balance of deciding when to use/avoid lea,
which can be observed as a code size regression in CSiBE. The peephole2
approach (proposed here) uniformly improves CSiBE results.
2024-07-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform two consecutive
additions into a 3-component lea if !TARGET_AVOID_LEA_FOR_ADDR.
gcc/testsuite/ChangeLog
* gcc.target/i386/lea-3.c: New test case.
|
|
PR target/88236
PR target/115726
gcc/
* config/avr/avr.md (mov<mode>) [avr_mem_memx_p]: Expand in such a
way that the destination does not overlap with any hard register
clobbered / used by xload8qi_A resp. xload<mode>_A.
* config/avr/avr.cc (avr_out_xload): Avoid early-clobber
situation for Z by executing just one load when the output register
overlaps with Z.
gcc/testsuite/
* gcc.target/avr/torture/pr88236-pr115726.c: New test.
|
|
PR testsuite/52641
gcc/testsuite/
* gcc.dg/analyzer/pr109577.c: Use __SIZE_TYPE__ instead of "unsigned long".
* gcc.dg/analyzer/pr93032-mztools-signed-char.c: Requires int32plus.
* gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: Requires int32plus.
* gcc.dg/analyzer/putenv-1.c: Skip on avr.
* gcc.dg/torture/type-generic-1.c: Skip on avr.
|
|
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).
The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait. One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.
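A minimal usage sketch (using the names documented by this patch); because
of the null fallback trait, a failed pinned allocation yields a null pointer
instead of falling back to plain memory:
#include <omp.h>
int
main ()
{
  void *buf = omp_alloc (1 << 20, ompx_gnu_pinned_mem_alloc);
  if (buf == nullptr)   /* null fallback: no abort, just a null pointer */
    return 1;
  omp_free (buf, ompx_gnu_pinned_mem_alloc);
  return 0;
}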
gcc/fortran/ChangeLog:
* openmp.cc (is_predefined_allocator): Update valid ranges to
incorporate ompx_gnu_pinned_mem_alloc.
libgomp/ChangeLog:
* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-pinned-1.f90: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.
On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it is,
but now there's a message to explain why.
libgomp/ChangeLog:
* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-Linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
|
|
The libffi.closures/single_entry_structs2.c test FAILs on 32-bit SPARC:
FAIL: libffi.closures/single_entry_structs2.c -W -Wall -Wno-psabi -O0
execution test
The issue has been reported, analyzed and fixed upstream:
Several tests FAIL on 32-bit Solaris/SPARC
https://github.com/libffi/libffi/issues/841
Therefore this patch imports the fix into the GCC tree.
Tested on sparc-sun-solaris2.11.
2024-07-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libffi:
PR libffi/115681
* src/sparc/ffi.c (ffi_call_int): Copy structure arguments to
maintain call-by-value semantics.
|
|
The following fixes an ICE with a .COND_ADD discovered as reduction
even though its else value isn't the reduction chain link but a
constant. This would be wrong-code with --disable-checking I think.
PR tree-optimization/115723
* tree-vect-loop.cc (check_reduction_path): For a .COND_ADD
verify the else value also refers to the reduction chain op.
* gcc.dg/vect/pr115723.c: New testcase.
|
|
Update my email address.
ChangeLog:
* MAINTAINERS: Update claziss email address.
Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
|
|
The following adds a missed check when forwprop attempts to rewrite
a complex store.
PR tree-optimization/115694
* tree-ssa-forwprop.cc (pass_forwprop::execute): Check the
store is complex before rewriting it.
* g++.dg/torture/pr115694.C: New testcase.
|
|
gcc/ChangeLog:
PR target/115517
* config/i386/mmx.md (vcond<mode>v2sf): Removed.
(vcond<MMXMODE124:mode><MMXMODEI:mode>): Ditto.
(vcond<mode><mode>): Ditto.
(vcondu<MMXMODE124:mode><MMXMODEI:mode>): Ditto.
(vcondu<mode><mode>): Ditto.
* config/i386/sse.md (vcond<V_512:mode><VF_512:mode>): Ditto.
(vcond<V_256:mode><VF_256:mode>): Ditto.
(vcond<V_128:mode><VF_128:mode>): Ditto.
(vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): Ditto.
(vcond<V_512:mode><VI_AVX512BW:mode>): Ditto.
(vcond<V_256:mode><VI_256:mode>): Ditto.
(vcond<V_128:mode><VI124_128:mode>): Ditto.
(vcond<VI8F_128:mode>v2di): Ditto.
(vcondu<V_512:mode><VI_AVX512BW:mode>): Ditto.
(vcondu<V_256:mode><VI_256:mode>): Ditto.
(vcondu<V_128:mode><VI124_128:mode>): Ditto.
(vcondu<VI8F_128:mode>v2di): Ditto.
(vcondeq<VI8F_128:mode>v2di): Ditto.
|
|
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x < 0 ? 1 : 0 into (unsigned) x >> 31.
Add define_insn_and_split for the optimization done in
ix86_expand_int_vcond.
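The scalar identities behind the splitters, written out for one 32-bit
element (a rough illustration, not the actual testcases):
int      sel_all_ones (int x) { return x < 0 ? -1 : 0; }  /* == x >> 31 (arithmetic shift) */
unsigned sel_one      (int x) { return x < 0 ?  1 : 0; }  /* == (unsigned) x >> 31 */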
gcc/ChangeLog:
PR target/115517
* config/i386/sse.md ("*ashr<mode>3_1"): New
define_insn_and_split.
(*avx512_ashr<mode>3_1): Ditto.
(*avx2_lshr<mode>3_1): Ditto.
(*avx2_lshr<mode>3_2): Ditto and add 2 combine splitters after
it.
* config/i386/mmx.md (mmxscalarsize): New mode attribute.
(*mmx_ashr<mode>3_1): New define_insn_and_split.
(mmx_<insn><mode>3): Add a combine splitter after it.
(*mmx_ashrv2hi3_1): New define_insn_and_split, also add a
combine splitter after it.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr111023-2.c: Adjust testcase.
* gcc.target/i386/vect-div-1.c: Ditto.
|
|
> Richard suggests that we implement the "obvious" transforms like
> inversion in the middle-end but if for example unsigned compares
> are not supported the us_minus + eq + negative trick isn't on
> that list.
>
> The main reason to restrict vec_cmp would be to avoid
> a <= b ? c : d going with an unsupported vec_cmp but instead
> do a > b ? d : c - the alternative is trying to fix this
> on the RTL side via combine. I understand the non-native
Yes, I have a patch which can fix most regressions via pattern match
in combine.
Still there is a situation that is difficult to deal with, mainly the
optimization w/o sse4.1. Because pblendvb/blendvps/blendvpd only
exist under sse4.1, w/o sse4.1 it takes 3
instructions (pand, pandn, por) to simulate the vcond_mask, and
combine matches up to 4 instructions, which makes it currently
impossible to use combine to recover those optimizations in
vcond{,u,eq}, i.e. min/max.
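For illustration (hypothetical helper, SSE2 intrinsics): without pblendvb
the blend c ? a : b on byte vectors is open-coded as (a & c) | (b & ~c),
i.e. exactly the pand/pandn/por triple mentioned above.
#include <emmintrin.h>
/* c must be an all-ones/all-zeros byte mask, e.g. the result of pcmpeq*.  */
static inline __m128i
blend_bytes (__m128i a, __m128i b, __m128i c)
{
  return _mm_or_si128 (_mm_and_si128 (c, a),      /* pand  */
                       _mm_andnot_si128 (c, b));  /* pandn, then por */
}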
In the case of sse 4.1 and above, there is basically no regression anymore.
the regression testcases w/o sse4.1
FAIL: g++.target/i386/pr100637-1b.C -std=gnu++14 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C -std=gnu++17 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C -std=gnu++20 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C -std=gnu++98 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1w.C -std=gnu++14 scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C -std=gnu++17 scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C -std=gnu++20 scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C -std=gnu++98 scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr103861-1.C -std=gnu++14 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C -std=gnu++17 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C -std=gnu++20 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C -std=gnu++98 scan-assembler-times pcmpeqb 2
FAIL: gcc.target/i386/pr88540.c scan-assembler minpd
gcc/testsuite/ChangeLog:
PR target/115517
* g++.target/i386/pr100637-1b.C: Add xfail and -mno-sse4.1.
* g++.target/i386/pr100637-1w.C: Ditto.
* g++.target/i386/pr103861-1.C: Ditto.
* gcc.target/i386/pr88540.c: Ditto.
* gcc.target/i386/pr103941-2.c: Add -mno-avx512f.
* g++.target/i386/sse4_1-pr100637-1b.C: New test.
* g++.target/i386/sse4_1-pr100637-1w.C: New test.
* g++.target/i386/sse4_1-pr103861-1.C: New test.
* gcc.target/i386/sse4_1-pr88540.c: New test.
|
|
gcc/ChangeLog:
PR target/115517
* config/i386/sse.md
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt_avx512): New
define_insn_and_split.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt_avx512):
Ditto.
(*<sse2_avx2>_pmovmskb_lt_avx512): Ditto.
(*<sse2_avx2>_pmovmskb_zext_lt_avx512): Ditto.
(*sse2_pmovmskb_ext_lt_avx512): Ditto.
(*pmovsk_kmask_v16qi_avx512): Ditto.
(*pmovsk_mask_v32qi_avx512): Ditto.
(*pmovsk_mask_cmp_<mode>_avx512): Ditto.
(*pmovsk_ptest_<mode>_avx512): Ditto.
|
|
These versions of the min/max patterns implement exactly the operations
min = (op1 < op2 ? op1 : op2)
max = (!(op1 < op2) ? op1 : op2)
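In scalar form, a sketch of why the two are not symmetric; the difference
only shows up when the comparison is false both ways, e.g. with a NaN
operand:
static inline float
min_like (float a, float b) { return a < b ? a : b; }     /* unordered: a < b is false, picks b */
static inline float
max_like (float a, float b) { return !(a < b) ? a : b; }  /* unordered: !(a < b) is true, picks a */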
gcc/ChangeLog:
PR target/115517
* config/i386/sse.md (*minmax<mode>3_1): New pre_reload
define_insn_and_split.
(*minmax<mode>3_2): Ditto.
|
|
is vector -1/0.
gcc/ChangeLog
PR target/115517
* config/i386/sse.md
(*<avx512>_cvtmask2<ssemodesuffix><mode>_not): New pre_reload
splitter.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_not): Ditto.
(*avx2_pcmp<mode>3_6): Ditto.
(*avx2_pcmp<mode>3_7): Ditto.
|
|
UNSPEC_BLENDV)
These define_insn_and_split are needed after vcond{,u,eq} is obsolete.
gcc/ChangeLog:
PR target/115517
* config/i386/sse.md
(*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_gt): New
define_insn_and_split.
(*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_gtint):
Ditto.
(*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_gtint):
Ditto.
(*<sse4_1_avx2>_pblendvb_gt): Ditto.
(*<sse4_1_avx2>_pblendvb_gt_subreg_not): Ditto.
|
|
Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, and also
define target_insn_cost to prevent post_reload pass_late_combine from
reverting the optimization done in pass_rpad.
Adjust testcases since pass_late_combine generates better code but
breaks scan-assembly patterns.
I.e. under a 32-bit target, gcc used to generate a broadcast from the stack and
then do the real operation.
After late_combine, they're combined into embedded broadcast
operations.
gcc/ChangeLog:
* config/i386/i386-features.cc (ix86_rpad_gate): New function.
* config/i386/i386-options.cc (ix86_override_options_after_change):
Don't disable late_combine.
* config/i386/i386-passes.def: Move pass_stv2 and pass_rpad
after pre_reload pass_late_combine.
* config/i386/i386-protos.h (ix86_rpad_gate): New declare.
* config/i386/i386.cc (ix86_insn_cost): New function.
(TARGET_INSN_COST): Define.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjust
testcase.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto.
* gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto.
* gcc.target/i386/pr91333.c: Ditto.
* gcc.target/i386/vect-strided-4.c: Ditto.
|
|
late_combine will combine lshift + zero into *lshifrtsi3_1_zext, which
causes an extra mov between gpr and kmask; add ?k to the pattern.
gcc/ChangeLog:
PR target/115610
* config/i386/i386.md (*<insn>si3_zext): Add alternative ?k,
enable it only for lshiftrt and under avx512bw.
* config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and
add corresponding define_split after it.
|
|
The testcases are supposed to scan for vpopcnt{b,w,d,q} operations
with a k mask, but the mask is defined as an uninitialized local variable
which will be set to 0 at the RTL expand phase.
It is then further simplified away by late_combine, which caused scan-assembly failures.
Move the definition of mask outside the functions to make the testcases more stable.
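A rough sketch of the shape of the change (hypothetical code, not the
actual testcase): with a local, uninitialized mask the masking folds to a
constant and the k-masked instruction disappears, whereas an extern mask
keeps it live.
#include <immintrin.h>
extern __mmask16 mask;   /* was: an uninitialized local inside the function */
__m512i
count_bits (__m512i v)   /* compiled with -mavx512vpopcntdq */
{
  return _mm512_maskz_popcnt_epi32 (mask, v);   /* scan target: vpopcntd with {%k}{z} */
}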
gcc/testsuite/ChangeLog:
PR target/115610
* gcc.target/i386/avx512bitalg-vpopcntb.c: Define mask as
extern instead of an uninitialized local variable.
* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Ditto.
|
|
|