gcc/ada/
* Makefile.rtl (X86_TARGET_PAIRS): Use __builtin variant of
System.Atomic_Counters.
* libgnat/s-atocou__x86.adb: Remove.
|
|
My g:01f3e6a40e7202310abbeb41c345d325bd69554f broke the s390
build because the rclass variable was still needed by the
IRA_HARD_REGNO_ADD_COST_MULTIPLIER code.
gcc/
* ira-costs.c (ira_tune_allocno_costs): Fix missing rclass
definition in IRA_HARD_REGNO_ADD_COST_MULTIPLIER code.
|
|
This is the third iteration of a patch to recognize MULT_HIGHPART_EXPR
in the middle-end. As they say, "the third time's a charm". The first
version implemented this in match.pd, which was considered too early.
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551316.html
The second version attempted to do this during RTL expansion, and was
considered to be too late in the middle-end.
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576922.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576923.html
This latest version incorporates Richard Biener's feedback/suggestion
to recognize MULT_HIGHPART_EXPR in one of the "instruction selection
passes", specifically tree-ssa-math-opts, where the recognition of
highpart multiplications takes place in the same pass as widening
multiplications.
With each rewrite, the patch is also getting more aggressive in the
set of widening multiplications that it recognizes as highpart multiplies.
Currently any widening multiplication followed by a right shift (either
signed or unsigned) by a bit count sufficient to eliminate the lowpart
is recognized. The result of this shift doesn't need to be truncated.
As before, this patch confirms that the target provides a suitable
optab before introducing the MULT_HIGHPART_EXPR. This is the reason
the testcase is restricted to x86_64, as this pass doesn't do anything
on some platforms, but x86_64 should be sufficient to confirm that the
pass is working/continues to work.
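For illustration, a minimal sketch (hypothetical, in the spirit of the
mult-highpart.c testcase) of the shape that is now recognized:
typedef unsigned long long u64;
typedef unsigned __int128 u128;
u64 umulh (u64 x, u64 y)
{
  /* Widening multiply whose result is shifted right by the width of
     the narrower mode: becomes MULT_HIGHPART_EXPR when the target
     provides a suitable optab.  */
  return ((u128) x * y) >> 64;
}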
2022-01-11 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* tree-ssa-math-opts.c (struct widen_mul_stats): Add a
highpart_mults_inserted field.
(convert_mult_to_highpart): New function to convert right shift
of a widening multiply into a MULT_HIGHPART_EXPR.
(math_opts_dom_walker::after_dom_children) [RSHIFT_EXPR]:
Call new convert_mult_to_highpart function.
(pass_optimize_widening_mul::execute): Add a statistics counter
for tracking "highpart multiplications inserted" events.
gcc/testsuite/ChangeLog
* gcc.target/i386/mult-highpart.c: New test case.
|
|
Add a specialized version to combine two instructions, from:
9: {r123:CC=cmp(r124:DI&0x600000000,0);clobber scratch;}
REG_DEAD r124:DI
10: pc={(r123:CC==0)?L15:pc}
REG_DEAD r123:CC
to:
10: {pc={(r123:DI&0x600000000==0)?L15:pc};clobber scratch;clobber %0:CC;}
then split2 will split it into a single rotate dot instruction (saving
one rotate back instruction), as the shifted result doesn't matter when
comparing against 0 in CCEQmode.
Bootstrapped and regression tested on Power 8/9/10.
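For illustration, a hypothetical sketch of the kind of source this
targets (the actual test is gcc.target/powerpc/pr102239.c):
extern void bar (void);
void
foo (long a)
{
  /* Branch on a mask of high bits: combinable into a single rotate
     dot instruction, since only the comparison against 0 matters.  */
  if (a & 0x600000000L)
    bar ();
}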
gcc/ChangeLog:
PR target/102239
* config/rs6000/rs6000-protos.h (rs6000_is_valid_rotate_dot_mask): New
declaration.
* config/rs6000/rs6000.c (rs6000_is_valid_rotate_dot_mask): New
function.
* config/rs6000/rs6000.md (*branch_anddi3_dot): New.
gcc/testsuite/ChangeLog:
PR target/102239
* gcc.target/powerpc/pr102239.c: New test.
|
|
Since we now save the option in the "switches" table
to let specs use it more generally, we need to explicitly
state that the option was validated; otherwise the driver
will consider it "unrecognized".
2022-01-05 Olivier Hainque <hainque@adacore.com>
* gcc.c (driver_handle_option): State --sysroot as
validated.
|
|
Checking for one_only/weak support is better done
before deciding to turn references to __cxa_pure_virtual weak.
This helps at least on VxWorks where one_only / weak support
varies between kernel and rtp modes as well as across VxWorks
versions.
2021-12-30 Olivier Hainque <hainque@adacore.com>
gcc/cp/
* decl.c (cxx_init_decl_processing): Move code possibly
altering flag_weak before code testing it.
|
|
r12-6087 avoids moving a cold bb out of a hot loop, while the original
intent of this testcase is to hoist the divides out of the loop and CSE
them into a single divide. So increase the loop count to turn the cold
bb hot again, so that the three divides can be rewritten with the same
reciptmp.
Tested on Power-Linux {32,64}, x86 {64,32} and i686-linux.
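For illustration, a hypothetical sketch of the pattern the test checks
(under -ffast-math, the three divides share one reciprocal temporary):
float
f (float *p, int n, float x)
{
  float s = 0.0f;
  for (int i = 0; i < n; i++)
    /* 1.0f / x is computed once, hoisted out of the loop, and the
       three divides become multiplies by the reciptmp.  */
    s += p[i] / x + (p[i] + 1.0f) / x + (p[i] + 2.0f) / x;
  return s;
}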
gcc/testsuite/ChangeLog:
PR testsuite/103820
* gcc.dg/tree-ssa/recip-3.c: Adjust.
|
|
Option -mpower10 has been "WarnRemoved" since commit r11-2318, so
-mno-power10 no longer takes effect. This patch removes the one line
of code that still respects it.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_disable_incompatible_switches): Remove
useless code related to option -mno-power10.
|
|
Extend the predicate of operands[1] in *andnot<mode>3 from
register_operand to vector_operand so that the andnot insn can be
generated directly. This enables optimizations like:
- pcmpeqd %xmm0, %xmm0
- pxor g(%rip), %xmm0
- pand %xmm1, %xmm0
+ movdqa g(%rip), %xmm0
+ pandn %xmm1, %xmm0
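For illustration, a hypothetical sketch of the kind of source involved
(cf. gcc.target/i386/pr53652-1.c):
typedef unsigned int v4si __attribute__ ((vector_size (16)));
v4si g;
v4si
f (v4si y)
{
  /* AND with the complement of a memory operand: with the relaxed
     predicate this maps directly onto pandn.  */
  return ~g & y;
}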
gcc/ChangeLog:
PR target/53652
* config/i386/sse.md (*andnot<mode>3): Extend predicate of
operands[1] from register_operand to vector_operand.
gcc/testsuite/ChangeLog:
PR target/53652
* gcc.target/i386/pr53652-1.c: New test.
|
|
Add V2QImode vector compares with SSE registers.
2022-01-10 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/103861
* config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
Handle V2QImode.
* config/i386/mmx.md (<sat_plusminus:insn><mode>3):
Use VI1_16_32 mode iterator.
(*eq<mode>3): Ditto.
(*gt<mode>3): Ditto.
(*xop_maskcmp<mode>3): Ditto.
(*xop_maskcmp_uns<mode>3): Ditto.
(vec_cmp<mode><mode>): Ditto.
(vec_cmpu<mode><mode>): Ditto.
gcc/testsuite/ChangeLog:
PR target/103861
* gcc.target/i386/pr103861-2.c: New test.
|
|
r12-136 made us canonicalize an object/offset pair with negative offset
into one with a nonnegative offset, by iteratively absorbing the
innermost component into the offset and stopping as soon as the offset
becomes nonnegative.
This patch strengthens this transformation by making it keep on absorbing
even if the offset is already 0 as long as the innermost component is at
position 0 (and thus absorbing doesn't change the offset). This lets us
accept the two constexpr testcases below, which we'd previously reject
essentially because cxx_fold_indirect_ref would be unable to resolve
*(B*)&b.D123 (where D123 is the base A subobject at position 0) to just b.
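For illustration, a hypothetical sketch in the spirit of the new tests:
struct A { int n = 0; };
struct B : A { };
constexpr int
f ()
{
  B b;
  A *ap = &b;                       // base A subobject at position 0
  return static_cast<B *> (ap)->n;  // needs *(B*)&b.D123 to resolve to b
}
static_assert (f () == 0, "");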
PR c++/103879
gcc/cp/ChangeLog:
* constexpr.c (cxx_fold_indirect_ref): Split out object/offset
canonicalization step into a local lambda. Strengthen it to
absorb more components at position 0. Use it before both calls
to cxx_fold_indirect_ref_1.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-base2.C: New test.
* g++.dg/cpp1y/constexpr-base2a.C: New test.
|
|
Here we're rejecting the calls to g1 and g2 as ambiguous even though one
overload is more constrained than the other (and they're otherwise tied),
because the implicit 'this' parameter of the non-static overload causes
cand_parms_match to think the function parameter lists aren't equivalent.
This patch fixes this by making cand_parms_match skip over 'this'
appropriately. Note that this bug only affects partial ordering of
non-template member functions because for member function templates
more_specialized_fn seems to already skip over 'this' appropriately.
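For illustration, a hypothetical sketch of the g1/g2 situation (the
actual test is g++.dg/cpp2a/concepts-memfun2.C):
template<typename T> concept C = true;
template<typename T>
struct S {
  // Otherwise-tied overloads; the non-static one has an implicit
  // 'this' parameter that cand_parms_match must now skip over.
  static void g1 (int) requires C<T> {}  // more constrained, should win
  void g1 (int) {}
};
void
use (S<int> s)
{
  s.g1 (0);  // previously rejected as ambiguous
}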
PR c++/103783
gcc/cp/ChangeLog:
* call.c (cand_parms_match): Skip over 'this' when given one
static and one non-static member function.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-memfun2.C: New test.
|
|
Immediate functions should never be emitted into assembly; the FE
doesn't genericize them and does various things to ensure they aren't
gimplified. But the following testcase ICEs anyway, because the
consteval function returns a lambda, and the operator() of the lambda
has the consteval function as its decl_function_context. cgraphunit.c
then does:
  /* Preserve a functions function context node.  It will
     later be needed to output debug info.  */
  if (tree fn = decl_function_context (decl))
    {
      cgraph_node *origin_node = cgraph_node::get_create (fn);
      enqueue_node (origin_node);
    }
which enqueues the immediate function and then tries to gimplify it,
which results in ICE because it hasn't been genericized.
When I try a similar testcase with constexpr instead of consteval and
static constinit auto instead of auto in main, the functions are
gimplified; later ipa.c discovers they aren't reachable, sets
body_removed to true for them (and clears other flags), and we end up
with debug info that has the foo and bar functions without
DW_AT_low_pc and other code-specific attributes, just stuff from their
BLOCK structure, and in there the lambda with DW_AT_low_pc etc.
The following patch attempts to emulate that behavior early, so that
cgraph doesn't try to gimplify those functions and pretends they were
already gimplified, found unused and optimized away.
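For illustration, a hypothetical sketch of the ICEing pattern:
consteval auto
foo ()
{
  // operator() of the returned lambda has the immediate function
  // foo as its decl_function_context.
  return [] (int x) { return x + 1; };
}
int
main ()
{
  auto f = foo ();
  return f (41) == 42 ? 0 : 1;
}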
2022-01-10 Jakub Jelinek <jakub@redhat.com>
PR c++/103912
* semantics.c (expand_or_defer_fn): For immediate functions, set
node->body_removed to true and clear analyzed, definition and
force_output.
* decl2.c (c_parse_final_cleanups): Ignore immediate functions for
expand_or_defer_fn.
gcc/testsuite/
* g++.dg/cpp2a/consteval26.C: New test.
|
|
Currently, expand_vector_condition detects only the vcondMN and vconduMN
named RTX patterns. Teach it to also consider the vec_cmpMN and vec_cmpuMN
RTX patterns when an all-ones vector is returned for true and an all-zeros
vector for false.
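For illustration, a hypothetical sketch of such a condition using
generic vectors:
typedef int v4si __attribute__ ((vector_size (16)));
v4si
f (v4si a, v4si b)
{
  /* The comparison result is already all-ones/all-zeros per element,
     so vec_cmpMN suffices and no vcondMN blend is required.  */
  return a < b;
}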
2022-01-10 Richard Biener <rguenther@suse.de>
gcc/ChangeLog:
PR tree-optimization/103948
* tree-vect-generic.c (expand_vector_condition): Return true if
all ones vector is returned for true, all zeros vector for false
and the target defines corresponding vec_cmp{,u}MN named RTX pattern.
|
|
Power10 ISA added `xxblendv*` instructions which are realized in the
`vec_blendv` intrinsic.
Use `vec_blendv` for the `_mm_blendv_epi8`, `_mm_blendv_ps`, and
`_mm_blendv_pd` compatibility intrinsics when `_ARCH_PWR10`.
Update the original implementation of `_mm_blendv_epi8` to use signed
types, to better match the function parameters. The implementation is
otherwise unchanged.
Also, copy a test from i386 for testing `_mm_blendv_ps`.
This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
but was inadvertently omitted.
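For illustration, a hypothetical sketch (not the verbatim header
change) of the PWR10 path:
#ifdef _ARCH_PWR10
extern __inline __m128i
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
{
  /* Select bytes from __B where the mask's element sign bit is set.  */
  return (__m128i) vec_blendv ((__v16qi) __A, (__v16qi) __B,
                               (__v16qu) __mask);
}
#endif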
2022-01-10 Paul A. Clarke <pc@us.ibm.com>
gcc
* config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
when _ARCH_PWR10. Use signed types.
(_mm_blendv_ps): Use vec_blendv when _ARCH_PWR10.
(_mm_blendv_pd): Likewise.
gcc/testsuite
* gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
adjust dg directives to suit.
|
|
gcc/ChangeLog:
* tree-vectorizer.c (better_epilogue_loop_than_p): Round factors up for
epilogue costing.
* tree-vect-loop.c (vect_analyze_loop): Re-analyze all modes for
epilogues, unless we are guaranteed that we can't have partial vectors.
* genopinit.c (partial_vectors_supported): Generate new function.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/masked_epilogue.c: New test.
|
|
2022-01-10 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/103366
* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Allow unlimited
polymorphic actual argument passed to assumed type formal.
gcc/testsuite/
PR fortran/103366
* gfortran.dg/pr103366.f90: New test.
|
|
zero width bitfield ABI changes [PR102024]
For zero-width bitfields current GCC classify_argument does:
if (DECL_BIT_FIELD (field))
  {
    for (i = (int_bit_position (field)
              + (bit_offset % 64)) / 8 / 8;
         i < ((int_bit_position (field) + (bit_offset % 64))
              + tree_to_shwi (DECL_SIZE (field))
              + 63) / 8 / 8; i++)
      classes[i]
        = merge_classes (X86_64_INTEGER_CLASS, classes[i]);
  }
which I think means that for zero-width bitfields at bit positions
(in the toplevel aggregate) that are multiples of 64 bits it doesn't do
anything, as (int_bit_position (field) + (bit_offset % 64)) / 64 and
(int_bit_position (field) + (bit_offset % 64) + 63) / 64 should be equal.
But for zero-width bitfields at other bit positions it will call
merge_classes once. Now, the typical case is that the zero-width
bitfield is surrounded by some bitfields, in which case it doesn't
change anything, but it can be sandwiched in between floats too, as the
testcases show.
In C we had this behavior; in C++ the FE previously removed the
zero-width bitfields, and therefore they were ignored.
LLVM and ICC seem to ignore those bitfields in both C and C++ (== passing
struct S in an SSE register rather than in a GPR).
The x86-64 psABI has been recently clarified by
https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/1aa4398d26c250b252a0c4a0f777216c9a6789ec
that zero-width bitfields should always be ignored.
This patch implements that and, for C, emits a warning in the cases
where the ABI changed from GCC 11.
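For illustration, a hypothetical sketch of an affected layout:
struct S
{
  float a;
  int : 0;   /* zero-width bitfield sandwiched between floats */
  float b;
};
/* With the clarified psABI the bitfield is ignored, so S is passed in
   SSE registers rather than a GPR; for C, GCC warns where this differs
   from GCC 11.  */
struct S pass (struct S s) { return s; }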
2022-01-10 Jakub Jelinek <jakub@redhat.com>
PR target/102024
* config/i386/i386.c (classify_argument): Add zero_width_bitfields
argument, when seeing DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD bitfields,
always ignore them, when seeing other zero sized bitfields, either
set zero_width_bitfields to 1 and ignore it or if equal to 2 process
it. Pass it to recursive calls. Add wrapper
with old arguments and diagnose ABI differences for C structures
with zero width bitfields. Formatting fixes.
gcc/testsuite/
* gcc.target/i386/pr102024.c: New test.
* g++.target/i386/pr102024.C: New test.
|
|
This patch looks for allocno conflicts of the following form:
- One allocno (X) is a cap allocno for some non-cap allocno X2.
- X2 belongs to some loop L2.
- The other allocno (Y) is a non-cap allocno.
- Y is an ancestor of some allocno Y2 in L2.
- Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0).
- Y can use a different allocation from Y2.
In this case, Y's register is live across L2 but is not used within it,
whereas X's register is used only within L2. The conflict is therefore
only "soft", in that it can easily be avoided by spilling Y2 inside L2
without affecting any insn references.
In principle we could do this for ALLOCNO_NREFS (Y2) != 0 too, with the
callers then taking Y2's ALLOCNO_MEMORY_COST into account. There would
then be no "cliff edge" between a Y2 that has no references and a Y2 that
has (say) a single cold reference.
However, doing that isn't necessary for the PR and seems to give
variable results in practice. (fotonik3d_r improves slightly but
namd_r regresses slightly.) It therefore seemed better to start
with the higher-value zero-reference case and see how things go.
On top of the previous patches in the series, this fixes the exchange2
regression seen in GCC 11.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_soft_conflict): Declare.
* ira-color.c (max_soft_conflict_loop_depth): New constant.
(ira_soft_conflict): New function.
(spill_soft_conflicts): Likewise.
(assign_hard_reg): Use them to handle the case described by
the comment above ira_soft_conflict.
(improve_allocation): Likewise.
* ira.c (check_allocation): Allow allocnos with "soft" conflicts
to share the same register.
gcc/testsuite/
* gcc.target/aarch64/reg-alloc-4.c: New test.
|
|
If an allocno A in an inner loop L spans a call, a parent allocno AP
can choose to handle a call-clobbered/caller-saved hard register R
in one of two ways:
(1) save R before each call in L and restore R after each call
(2) spill R to memory throughout L
(2) can be cheaper than (1) in some cases, particularly if L does
not reference A.
Before the patch we always did (1). The patch adds support for
picking (2) instead, when it seems cheaper. It builds on the
earlier support for not propagating conflicts to parent allocnos.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_caller_save_cost): New function.
(ira_caller_save_loop_spill_p): Likewise.
* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
cheaper to spill a call-clobbered register throughout a loop rather
than spill it around each individual call. If so, treat all
call-clobbered registers as conflicts and...
(propagate_allocno_info): ...do not propagate call information
from the child to the parent.
* ira-color.c (move_spill_restore): Update accordingly.
* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.
gcc/testsuite/
* gcc.target/aarch64/reg-alloc-3.c: New test.
|
|
Suppose that:
- an inner loop L contains an allocno A
- L clobbers hard register R while A is live
- A's parent allocno is AP
Previously, propagate_allocno_info would propagate conflict sets up the
loop tree, so that the conflict between A and R would become a conflict
between AP and R (and so on for ancestors of AP).
However, when IRA treats loops as separate allocation regions, it can
decide on a loop-by-loop basis whether to allocate a register or spill
to memory. Conflicts in inner loops therefore don't need to become
hard conflicts in parent loops. Instead we can record that using the
“conflicting” registers for the parent allocnos has a higher cost.
In the example above, this higher cost is the sum of:
- the cost of saving R on entry to L
- the cost of keeping the pseudo register in memory throughout L
- the cost of reloading R on exit from L
This value is also a cap on the hard register cost that A can contribute
to AP in general (not just for conflicts). Whatever allocation we pick
for AP, there is always the option of spilling that register to memory
throughout L, so the cost to A of allocating a register to AP can't be
more than the cost of spilling A.
To take an extreme example: if allocating a register R2 to A is more
expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2]
could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100
times greater than ALLOCNO_MEMORY_COST (A). But this scale factor
doesn't matter to AP. All that matters is that R2 is more expensive
than memory for A, so that allocating R2 to AP should be costed as
spilling A to memory (again assuming that A and AP are in different
allocation regions). Propagating a factor of 100 would distort the
register costs for AP.
move_spill_restore tries to undo the propagation done by
propagate_allocno_info, so we need some extra processing there.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_allocno::might_conflict_with_parent_p): New field.
(ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro.
(ira_single_region_allocno_p): New function.
(ira_total_conflict_hard_regs): Likewise.
* ira-build.c (ira_create_allocno): Initialize
ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P.
(ira_propagate_hard_reg_costs): New function.
(propagate_allocno_info): Use it. Try to avoid propagating
hard register conflicts to parent allocnos if we can handle
the conflicts by spilling instead. Limit the propagated
register costs to the cost of spilling throughout the child loop.
* ira-color.c (color_pass): Use ira_single_region_allocno_p to
test whether a child and parent allocno can share the same
register.
(move_spill_restore): Adjust for the new behavior of
propagate_allocno_info.
gcc/testsuite/
* gcc.target/aarch64/reg-alloc-2.c: New test.
|
|
color_pass has two instances of the same code for propagating non-cap
assignments from parent loops to subloops. This patch adds a helper
function for testing when such propagations are required for correctness
and uses it to remove the duplicated code.
A later patch will use this in ira-build.c too, which is why the
function is exported to ira-int.h.
No functional change intended.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_subloop_allocnos_can_differ_p): New function,
extracted from...
* ira-color.c (color_pass): ...here.
|
|
This patch adds comments to describe each use of ira_loop_border_costs.
I think this highlights that move_spill_restore was using the wrong cost
in one case, which came from transposing [0] and [1] in the original
(pre-ira_loop_border_costs) ira_memory_move_cost expressions. The
difference would only be noticeable on targets that distinguish between
load and store costs.
gcc/
PR rtl-optimization/98782
* ira-color.c (color_pass): Add comments to describe the spill costs.
(move_spill_restore): Likewise. Fix reversed calculation.
|
|
The final index into (ira_)memory_move_cost is 1 for loads and
0 for stores. Thus the combination:
entry_freq * memory_cost[1] + exit_freq * memory_cost[0]
is the cost of loading a register on entry to a loop and
storing it back on exit from the loop. This is the cost to
use if the register is successfully allocated within the
loop but is spilled in the parent loop. Similarly:
entry_freq * memory_cost[0] + exit_freq * memory_cost[1]
is the cost of storing a register on entry to the loop and
restoring it on exit from the loop. This is the cost to
use if the register is spilled within the loop but is
successfully allocated in the parent loop.
The patch adds a helper class for calculating these values and
mechanically replaces the existing instances. There is no attempt to
editorialise the choice between using “spill inside” and “spill outside”
costs. (I think one of them is the wrong way round, but a later patch
deals with that.)
No functional change intended.
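For illustration, a minimal sketch (assumed shape, not the exact
interface) of the helper class:
class ira_loop_border_costs
{
public:
  ira_loop_border_costs (ira_allocno_t a);

  /* entry_freq * memory_cost[1] + exit_freq * memory_cost[0]:
     allocated within the loop, spilled in the parent.  */
  int spill_outside_loop_cost () const;

  /* entry_freq * memory_cost[0] + exit_freq * memory_cost[1]:
     spilled within the loop, allocated in the parent.  */
  int spill_inside_loop_cost () const;
};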
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_loop_border_costs): New class.
* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
New constructor.
(calculate_allocno_spill_cost): Use ira_loop_border_costs.
(color_pass): Likewise.
(move_spill_restore): Likewise.
|
|
The PR uncovered that -freorder-blocks-and-partition was working by accident
on 64-bit Windows, i.e. the middle-end was supposed to disable it with SEH.
After the change installed on mainline, the middle-end properly disables it,
which is too bad since a significant amount of work went into supporting it
for SEH.
gcc/
PR target/103465
* coretypes.h (unwind_info_type): Swap UI_SEH and UI_TARGET.
|
|
We use the issignaling macro, present in some libcs (notably glibc),
when it is available. Compile all IEEE-related files in the library
(both C and Fortran sources) with -fsignaling-nans to ensure maximum
compatibility.
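For illustration, a hypothetical sketch of the detection (the helper
name is made up; it assumes a libc that defines issignaling, e.g. glibc):
#include <math.h>
int
helper_issignaling (double x)
{
#ifdef issignaling
  return issignaling (x);
#else
  return 0;   /* no libc support: cannot distinguish sNaN from qNaN */
#endif
}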
libgfortran/ChangeLog:
PR fortran/82207
* Makefile.am: Pass -fsignaling-nans for IEEE files.
* Makefile.in: Regenerate.
* ieee/ieee_helper.c: Use issignaling macro to recognize
signaling NaNs.
gcc/testsuite/ChangeLog:
PR fortran/82207
* gfortran.dg/ieee/signaling_1.f90: New test.
* gfortran.dg/ieee/signaling_1_c.c: New file.
|
|
This makes __builtin_shufflevector lowering force the result
of the BIT_FIELD_REF lowpart operation to a temporary, to
fulfil the IL verifier constraint that BIT_FIELD_REFs should
always be in outermost handled-component position. Trying to
enforce this during gimplification isn't as straightforward
as doing it here, where we know we're dealing with an rvalue.
This fixes:
FAIL: c-c++-common/torture/builtin-shufflevector-1.c -O0 execution test
2022-01-05 Richard Biener <rguenther@suse.de>
PR middle-end/101530
gcc/c-family/
* c-common.c (c_build_shufflevector): Wrap the BIT_FIELD_REF
in a TARGET_EXPR to force a temporary.
gcc/testsuite/
* c-c++-common/builtin-shufflevector-3.c: New testcase.
|
|
This fixes a mistake made in r8-5008 when introducing
allow_peel to the unroll code. The intent was to allow
peeling that doesn't grow code, but the result was that
with -O3 and UL_ALL this wasn't done. The following
achieves the desired effect by adjusting ul to UL_NO_GROWTH
if peeling is not allowed.
2022-01-05 Richard Biener <rguenther@suse.de>
PR tree-optimization/100359
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely):
Allow non-growing peeling with !allow_peel and UL_ALL.
gcc/testsuite/
* gcc.dg/tree-ssa/pr100359.c: New testcase.
|
|
gcc/ada/
* gcc-interface/trans.c (Identifier_to_gnu): Use correct subtype.
(elaborate_profile): New function.
(Call_to_gnu): Call it on the formals and the result type before
retrieving the translated result type from the subprogram type.
|
|
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Record_Type>: Fix
computation of boolean result in the unchecked union case.
(components_to_record): Rename MAYBE_UNUSED parameter to IN_VARIANT
and remove local variable of the same name. Pass NULL recursively
as P_GNU_REP_LIST for nested variants in the unchecked union case.
|
|
gcc/ada/
* gcc-interface/trans.c (lvalue_required_p) <N_Pragma>: New case.
<N_Pragma_Argument_Association>: Likewise.
(Pragma_to_gnu) <Pragma_Inspection_Point>: Fetch the corresponding
variable of a constant before marking it as addressable.
|
|
gcc/ada/
* gcc-interface/Make-lang.in (ADA_GENERATED_FILES): Remove
s-casuti.ad?, s-crtl.ad?, s-os_lib.ad?. Update list of object
files accordingly.
|
|
gcc/ada/
* libgnat/s-atopri.ads (Atomic_Compare_Exchange): Replaces
deprecated Sync_Compare_And_Swap.
* libgnat/s-atopri.adb (Lock_Free_Try_Write): Switch from __sync
to __atomic builtins.
|
|
gcc/ada/
* libgnat/s-exponn.adb, libgnat/s-expont.adb,
libgnat/s-exponu.adb, libgnat/s-widthi.adb,
libgnat/s-widthu.adb: Remove CodePeer annotations for pragma
Loop_Variant.
|
|
gcc/ada/
* exp_prag.adb (Expand_Pragma_Loop_Variant): Disable expansion
in CodePeer mode.
|
|
gcc/ada/
* sem_util.adb (Is_Child_Or_Sibling): Fix typo in comment.
|
|
gcc/ada/
* exp_pakd.adb (Install_PAT): If the PAT is a scalar type, apply
the canonical adjustment to its alignment.
|
|
gcc/ada/
* libgnat/s-atocou__builtin.adb (Decrement, Increment): Switch
from __sync to __atomic builtins; use 'Address to be consistent
with System.Atomic_Primitives.
|
|
gcc/ada/
* exp_pakd.adb (Install_PAT): Do not reset the alignment here.
* layout.adb (Layout_Type): Call Adjust_Esize_Alignment after having
copied the RM_Size onto the Esize when the latter is too small.
|
|
gcc/ada/
* sem_warn.adb (Check_References): Handle arrays of tasks
similarly to task objects.
|
|
gcc/fortran/ChangeLog:
PR fortran/103777
* simplify.c (gfc_simplify_maskr): Check validity of argument 'I'
before simplifying.
(gfc_simplify_maskl): Likewise.
gcc/testsuite/ChangeLog:
PR fortran/103777
* gfortran.dg/masklr_3.f90: New test.
|
|
gcc/fortran/ChangeLog:
PR fortran/101762
* expr.c (gfc_check_pointer_assign): For pointer initialization
targets, check that subscripts and substring indices in
specifications are constant expressions.
gcc/testsuite/ChangeLog:
PR fortran/101762
* gfortran.dg/pr101762.f90: New test.
|
|
After PR97896 for which some code was added to ignore the KIND argument
of the INDEX intrinsics, and PR87711 for which that was extended to LEN_TRIM
as well, this propagates it further to MASKL, MASKR, SCAN and VERIFY.
PR fortran/103789
gcc/fortran/ChangeLog:
* trans-array.c (arg_evaluated_for_scalarization): Add MASKL, MASKR,
SCAN and VERIFY to the list of intrinsics whose KIND argument is to be
ignored.
gcc/testsuite/ChangeLog:
* gfortran.dg/maskl_1.f90: New test.
* gfortran.dg/maskr_1.f90: New test.
* gfortran.dg/scan_3.f90: New test.
* gfortran.dg/verify_3.f90: New test.
|
|
The nios2-elf target defaults to -fno-delete-null-pointer-checks, breaking
tests that implicitly depend on that optimization. Add the option
explicitly to these tests.
2022-01-08 Sandra Loosemore <sandra@codesourcery.com>
gcc/testsuite/
* g++.dg/cpp0x/constexpr-compare1.C: Add explicit
-fdelete-null-pointer-checks option.
* g++.dg/cpp0x/constexpr-compare2.C: Likewise.
* g++.dg/cpp0x/constexpr-typeid2.C: Likewise.
* g++.dg/cpp1y/constexpr-94716.C: Likewise.
* g++.dg/cpp1z/constexpr-compare1.C: Likewise.
* g++.dg/cpp1z/constexpr-if36.C: Likewise.
* gcc.dg/init-compare-1.c: Likewise.
libstdc++-v3/
* testsuite/18_support/type_info/constexpr.cc: Add explicit
-fdelete-null-pointer-checks option.
|
|
This patch improves the code generated when moving a 128-bit value
in TImode, represented by two 64-bit registers, to V1TImode, which
is a single SSE register.
Currently, the simple move:
typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
uv1ti foo(__int128 x) { return (uv1ti)x; }
is always transferred via memory, as:
foo: movq %rdi, -24(%rsp)
movq %rsi, -16(%rsp)
movdqa -24(%rsp), %xmm0
ret
with this patch, we now generate (with -msse2):
foo: movq %rdi, %xmm1
movq %rsi, %xmm2
punpcklqdq %xmm2, %xmm1
movdqa %xmm1, %xmm0
ret
and with -mavx2:
foo: vmovq %rdi, %xmm1
vpinsrq $1, %rsi, %xmm1, %xmm0
ret
Even more dramatic is the improvement for zero-extended transfers.
uv1ti bar(unsigned char c) { return (uv1ti)(__int128)c; }
Previously generated:
bar: movq $0, -16(%rsp)
movzbl %dil, %eax
movq %rax, -24(%rsp)
vmovdqa -24(%rsp), %xmm0
ret
Now generates:
bar: movzbl %dil, %edi
movq %rdi, %xmm0
ret
My first attempt at this functionality used a simple define_split,
but unfortunately this triggers very late during the compilation,
preventing some of the simplifications we'd like (in combine). For
example the foo case above becomes:
foo: movq %rsi, -16(%rsp)
movq %rdi, %xmm0
movhps -16(%rsp), %xmm0
transferring half directly, and the other half via memory.
And for the bar case above, GCC fails to appreciate that
movq/vmovq clears the high bits, resulting in:
bar: movzbl %dil, %eax
xorl %edx, %edx
vmovq %rax, %xmm1
vpinsrq $1, %rdx, %xmm1, %xmm0
ret
Hence the solution (i.e. this patch) is to add a special case
to ix86_expand_vector_move for TImode to V1TImode transfers.
2022-01-08 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.c (ix86_expand_vector_move): Add
special case for TImode to V1TImode moves, going via V2DImode.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-mov-1.c: New test case.
* gcc.target/i386/sse2-v1ti-zext.c: New test case.
|
|
== &var2 + 24 [PR89074]
The match.pd address_comparison simplification can only handle
ADDR_EXPR comparisons possibly converted to some other type (I wonder
if we shouldn't restrict it in address_compare to casts to pointer
types or pointer-sized integer types, I think we shouldn't optimize
(short) (&var) == (short) (&var2) because we really don't know whether
it will be true or false). On GIMPLE, most of pointer to pointer
casts are useless and optimized away and further we have in
gimple_fold_stmt_to_constant_1 an optimization that folds
&something p+ const_int
into
&MEM_REF[..., off]
On GENERIC, we don't do that, and e.g. for constant evaluation it
could be pretty harmful if such pointers are dereferenced, because it
loses track of which exact field the access started with etc.; all it
knows is the base, offset, type and alias set.
Instead of teaching the match.pd address_compare about 3 extra variants
where one or both compared operands are pointer_plus, this patch attempts
to fold operands of comparisons similarly to gimple_fold_stmt_to_constant_1
before calling fold_binary on it.
There is another thing though: while we do have the (x p+ y) p+ z to
x p+ (y + z) simplification, which works well on GIMPLE because of the
useless pointer conversions, on GENERIC we can have pointer casts in between
and at that point we can end up with large expressions like
((type3) (((type2) ((type1) (&var + 2) + 2) + 2) + 2))
etc. Pointer-plus doesn't really care what exact pointer type it has as
long as it is a pointer, so the following match.pd simplification, for
GENERIC only (it is useless for GIMPLE), also moves the cast so that
nested p+ can be simplified.
Note, I've noticed we don't really diagnose going out of bounds with
pointer_plus (unlike e.g. with ARRAY_REF) during constant evaluation; I
think another patch for cxx_eval_binary_expression with POINTER_PLUS will
be needed. But it isn't clear to me what exactly it should do in case of
subobjects. If we start with the address of a whole var, (&var), I guess we
should diagnose if the pointer_plus gets before start of the var (i.e.
"negative") or 1 byte past the end of the var, but what if we start with
&var.field or &var.field[3] ? For &var.field, shall we diagnose out of
bounds of field (except perhaps flexible members?) or the whole var?
For ARRAY_REFs, I assume we must at least strip all the outer ARRAY_REFs
and so start with &var.field too, right?
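For illustration, a hypothetical sketch in the spirit of the PR:
constexpr char var[32] = {};
constexpr char var2[32] = {};
// var2 + 24 points into a different object, so the comparison can be
// folded to false during constant evaluation instead of being rejected.
constexpr bool b = (var == var2 + 24);
static_assert (!b, "");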
2022-01-08 Jakub Jelinek <jakub@redhat.com>
PR c++/89074
gcc/
* match.pd ((ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z))): New GENERIC
simplification.
gcc/cp/
* constexpr.c (cxx_maybe_fold_addr_pointer_plus): New function.
(cxx_eval_binary_expression): Use it.
gcc/testsuite/
* g++.dg/cpp1y/constexpr-89074-2.C: New test.
* g++.dg/cpp1z/constexpr-89074-1.C: New test.
|
|
In the patch for PR92385 I added asserts to see if we tried to make a
vec_init of a vec_init, but didn't see any in regression testing. This
testcase is one such case, and it seems reasonable: we create a
VEC_INIT_EXPR for the aggregate initializer, and then again to express
the actual initialization of the member. We already do similar
collapsing of TARGET_EXPR, so let's just remove the asserts.
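For illustration, a hypothetical sketch in the spirit of the new test:
struct X
{
  X (int) {}
};
struct A
{
  // The array NSDMI yields a VEC_INIT_EXPR for the aggregate
  // initializer, wrapped in another for the member's initialization.
  X ar[2] = { 1, 2 };
};
int
main ()
{
  A a;
}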
PR c++/103946
gcc/cp/ChangeLog:
* init.c (build_vec_init): Remove assert.
* tree.c (build_vec_init_expr): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/nsdmi-array1.C: New test.
|