|
The following re-orders checks to make sure we check TYPE_PRECISION
on an integral type.
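For illustration, the shape of the re-ordered check (a sketch, not the
exact tree-vect-patterns.cc code):
/* TYPE_PRECISION may only be queried on integral types, so test
   INTEGRAL_TYPE_P first.  */
if (!INTEGRAL_TYPE_P (type))
  return NULL;
unsigned prec = TYPE_PRECISION (type);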
PR tree-optimization/110506
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Re-order
TYPE_PRECISION access with INTEGRAL_TYPE_P check.
* gcc.dg/pr110506-2.c: New testcase.
|
|
get_value_for_expr was blindly using TYPE_PRECISION to produce
a mask for vector typed entities, which the new tree checking now
catches.
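A sketch of the now-guarded mask computation (assumed shape, not the
exact tree-ssa-ccp.cc code):
/* Only integral types have a TYPE_PRECISION suitable for building
   the nonzero-bits mask.  */
if (INTEGRAL_TYPE_P (TREE_TYPE (expr)))
  mask = wi::mask <widest_int> (TYPE_PRECISION (TREE_TYPE (expr)), false);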
PR tree-optimization/110506
* tree-ssa-ccp.cc (get_value_for_expr): Check for integral
type before relying on TYPE_PRECISION to produce a nonzero mask.
* gcc.dg/pr110506.c: New testcase.
|
|
As discussed in the PR, the testcase needs
/* { dg-require-effective-target vect_float_strict } */
2023-02-03 Andrew Pinski <apinski@marvell.com>
PR tree-optimization/110381
gcc/testsuite/
* gcc.dg/vect/pr110381.c: Add vect_float_strict.
|
|
This patch makes mips16e2 behave the same at -O0 as at -O1 through -O3,
generating ZEB/ZEH instead of ANDI,
which shrinks the code size.
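For example (hypothetical testcase, compiled at -O0 for mips16e2):
unsigned char zext (unsigned int x)
{
  return x;   /* now ZEB rather than ANDI 0xff */
}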
gcc/ChangeLog:
* config/mips/mips.md (*and<mode>3_mips16): Generate
ZEB/ZEH instructions.
|
|
This patch adds the CACHE instruction from mips16e2,
with corresponding tests.
gcc/ChangeLog:
* config/mips/mips.cc (mips_9bit_offset_address_p): Restrict the
address register to M16_REGS for MIPS16.
(BUILTIN_AVAIL_MIPS16E2): Define a new macro.
(AVAIL_MIPS16E2_OR_NON_MIPS16): Same as above.
(AVAIL_NON_MIPS16 (cache..)): Update to
AVAIL_MIPS16E2_OR_NON_MIPS16.
* config/mips/mips.h (ISA_HAS_CACHE): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (mips_cache): Mark as extended MIPS16.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2-cache.c: New tests for mips16e2.
|
|
The MIPS16e2 ASE has PREF, LL and SC instructions;
they use a 9-bit immediate, like MIPS32r6.
Pre-R6 MIPS32 uses a 16-bit immediate.
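For example, an atomic operation that expands to an LL/SC loop
(hypothetical testcase):
int fetch_add (int *p)
{
  /* the LL and SC may now use the 9-bit displacement forms */
  return __atomic_fetch_add (p, 1, __ATOMIC_SEQ_CST);
}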
gcc/ChangeLog:
* config/mips/mips.h (ISA_HAS_9BIT_DISPLACEMENT): Add clause
for ISA_HAS_MIPS16E2.
(ISA_HAS_SYNC): Same as above.
(ISA_HAS_LL_SC): Same as above.
|
|
This patch adds LWL/LWR, SWL/SWR instructions with their
corresponding tests.
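For example, an unaligned access of the kind that expands to LWL/LWR
(hypothetical testcase):
struct __attribute__ ((packed)) P { int x; };
int get (struct P *p)
{
  return p->x;   /* unaligned word load: LWL + LWR */
}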
gcc/ChangeLog:
* config/mips/mips.cc (mips_expand_ins_as_unaligned_store):
Add logic for generating the instructions.
* config/mips/mips.h (ISA_HAS_LWL_LWR): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (mov_<load>l): Generate instructions.
(mov_<load>r): Same as above.
(mov_<store>l): Adjust for the conditions above.
(mov_<store>r): Same as above.
(mov_<store>l_mips16e2): New define_insn.
(mov_<store>r_mips16e2): New define_insn.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2.c: New tests for mips16e2.
|
|
This patch adds the LUI instruction from mips16e2,
with a corresponding test.
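For example (hypothetical testcase):
unsigned int high_part (void)
{
  return 0x12340000;   /* loadable with a single LUI */
}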
gcc/ChangeLog:
* config/mips/mips.cc (mips_symbol_insns_1): Generate the LUI instruction.
(mips_const_insns): Same as above.
(mips_output_move): Same as above.
(mips_output_function_prologue): Same as above.
* config/mips/mips.md: Same as above.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2.c: Add new tests for mips16e2.
|
|
There are shortened bitwise instructions in the mips16e2 ASE,
for instance ANDI, ORI/XORI, EXT, INS, etc.
This patch adds these instructions with corresponding tests.
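Hypothetical examples of code that can now use the short encodings:
unsigned ext (unsigned x)
{
  return (x >> 4) & 0xff;                       /* EXT */
}
unsigned ins (unsigned x, unsigned y)
{
  return (x & ~0xff0u) | ((y & 0xffu) << 4);    /* INS */
}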
gcc/ChangeLog:
* config/mips/constraints.md (Yz): New constraint for mips16e2.
* config/mips/mips-protos.h (mips_bit_clear_p): Declare new function.
(mips_bit_clear_info): Same as above.
* config/mips/mips.cc (mips_bit_clear_info): New function for
generating instructions.
(mips_bit_clear_p): Same as above.
* config/mips/mips.h (ISA_HAS_EXT_INS): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (extended_mips16): Generate EXT and INS instructions.
(*and<mode>3): Generate INS instruction.
(*and<mode>3_mips16): Generate EXT, INS and ANDI instructions.
(ior<mode>3): Add logic for ORI instruction.
(*ior<mode>3_mips16_asmacro): Generate ORI instruction.
(*ior<mode>3_mips16): Add logic for XORI instruction.
(*xor<mode>3_mips16): Generate XORI instruction.
(*extzv<mode>): Add logic for EXT instruction.
(*insv<mode>): Add logic for INS instruction.
* config/mips/predicates.md (bit_clear_operand): New predicate for
generating bitwise instructions.
(and_reg_operand): Add logic for generating bitwise instructions.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2.c: New tests for mips16e2.
|
|
The mips16e2 ASE uses eight general-purpose registers
from mips32 (s0-s1, v0-v1 and a0-a3) together with the
special-purpose registers t8, gp, sp and ra.
Among these, the special register gp, the global pointer
register, is used by some of the instructions in the ASE,
for instance ADDIU and LB/LBU.
This patch adds these instructions with corresponding tests.
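For example (hypothetical; assumes the variable is placed in the
small data section so that it is addressable off $gp):
int counter;
int get_counter (void)
{
  return counter;   /* gp-relative load */
}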
gcc/ChangeLog:
* config/mips/mips.cc (mips_regno_mode_ok_for_base_p): Generate
instructions that use the global pointer register.
(mips16_unextended_reference_p): Same as above.
(mips_pic_base_register): Same as above.
(mips_init_relocs): Same as above.
* config/mips/mips.h (MIPS16_GP_LOADS): Define a new macro.
(GLOBAL_POINTER_REGNUM): Move to mips.md.
* config/mips/mips.md (GLOBAL_POINTER_REGNUM): Move here from mips.h.
(*lowsi_mips16_gp): New define_insn.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2-gp.c: New tests for mips16e2.
|
|
This patch adds the MOVx instructions from mips16e2
(movn, movz, movtn, movtz) with corresponding tests.
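For example (hypothetical testcase):
int sel (int c, int a, int b)
{
  return c ? a : b;   /* can now be emitted as MOVZ/MOVN */
}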
gcc/ChangeLog:
* config/mips/mips.h (ISA_HAS_CONDMOVE): Add condition for ISA_HAS_MIPS16E2.
* config/mips/mips.md (*mov<GPR:mode>_on_<MOVECC:mode>): Add logic for
MOVx instructions.
(*mov<GPR:mode>_on_<MOVECC:mode>_mips16e2): Generate MOVx instructions.
(*mov<GPR:mode>_on_<GPR2:mode>_ne): Add logic for MOVx instructions.
(*mov<GPR:mode>_on_<GPR2:mode>_ne_mips16e2): Generate MOVx instructions.
* config/mips/predicates.md (reg_or_0_operand_mips16e2): New predicate
for MOVx instructions.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2-cmov.c: Added tests for MOVx instructions.
|
|
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.
It defines new special instructions for increasing
code density (e.g. Extend, PC-relative instructions, etc.).
This patch adds basic support for mips16e2 used by the
following series of patches.
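For example, the new predefined macro can be used to guard
mips16e2-specific code (hypothetical usage):
#ifdef __mips_mips16e2
/* mips16e2-only code path */
#endif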
gcc/ChangeLog:
* config/mips/mips.cc (mips_file_start): Add mips16e2 info
for output file.
* config/mips/mips.h (__mips_mips16e2): Define a new
predefined macro.
(ISA_HAS_MIPS16E2): Define a new macro.
(ASM_SPEC): Pass mmips16e2 to the assembler.
* config/mips/mips.opt: Add -m(no-)mips16e2 option.
* config/mips/predicates.md: Add clause for TARGET_MIPS16E2.
* doc/invoke.texi: Add -m(no-)mips16e2 option.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips.exp (mips_option_groups): Add -mmips16e2
option.
(mips-dg-init): Handle the recognition of mips16e2 targets.
(mips-dg-options): Add dependencies for mips16e2.
|
|
Seen at least on aarch64-*-darwin, the parameters used to instantiate
the shufflevector intrinsic meant the return type was __vector(int[1]),
which resulted in the error:
vector type '__vector(int[1])' is not supported on this platform.
All instantiations have been fixed so the expected warning/error is
now given by the compiler.
gcc/testsuite/ChangeLog:
* gdc.dg/Wbuiltin_declaration_mismatch2.d: Fix failed tests.
|
|
The match_uaddc_usubc matching doesn't require that the second
.{ADD,SUB}_OVERFLOW has REALPART_EXPR of its lhs used, only that there is
at most one. So, in the weird case where the REALPART_EXPR of it isn't
present, we shouldn't ICE trying to replace that REALPART_EXPR with
REALPART_EXPR of .U{ADD,SUB}C result.
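A sketch of the add-with-carry shape that gets matched (an
illustration, not the actual pr110508.c testcase); the ICE occurred in
a variant where the final sum itself goes unused:
unsigned uaddc (unsigned a, unsigned b, unsigned cin, unsigned *cout)
{
  unsigned s1 = a + b;
  unsigned c1 = s1 < a;
  unsigned s2 = s1 + cin;
  unsigned c2 = s2 < s1;
  *cout = c1 + c2;
  return s2;
}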
2023-07-02 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110508
* tree-ssa-math-opts.cc (match_uaddc_usubc): Only replace re2 with
REALPART_EXPR of nlhs if re2 is non-NULL.
* gcc.dg/pr110508.c: New test.
|
|
as TARGET_CLAMPS
This is because both smin and smax, which require TARGET_MINMAX,
are essential to the RTL representation.
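For illustration, the clamp idiom the pattern matches (hypothetical
testcase, clamping to a power-of-two range):
int clamp (int x)
{
  /* CLAMPS with immediate 7; built from smin/smax RTL, hence the
     TARGET_MINMAX requirement */
  return x < -128 ? -128 : (x > 127 ? 127 : x);
}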
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p):
Simplify.
* config/xtensa/xtensa.md (*xtensa_clamps):
Add TARGET_MINMAX to the condition.
|
|
gcc/ChangeLog:
* config/xtensa/xtensa.md (*eqne_INT_MIN):
Add missing ":SI" to the match_operator.
|
|
This supports the -fconstant-cfstrings option as used by clang (and
expected by some build scripts) as an alias for the target-specific
-mconstant-cfstrings.
The documentation is also updated to reflect that the 'f' option is
only available on Darwin, and to add the 'm' option to the Darwin
section of the invocation text.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/108743
gcc/ChangeLog:
* config/darwin.opt: Add fconstant-cfstrings alias to
mconstant-cfstrings.
* doc/invoke.texi: Amend invocation descriptions to reflect
that fconstant-cfstrings is a target-option alias and to
add the missing mconstant-cfstrings option description to the
Darwin section.
|
|
The issue was fixed in r14-2232.
PR d/108962
gcc/testsuite/ChangeLog:
* gdc.dg/pr108962.d: New test.
|
|
The first pass of code generation in the D front-end splits up all
compound expressions and discards expressions that have no side effects.
This included calls to the `volatileLoad' intrinsic if its result was
not used, causing such calls to be eliminated from the program.
We already set TREE_THIS_VOLATILE on the expression; however, the tree
documentation says that if this bit is set in an expression, so is
TREE_SIDE_EFFECTS. So set TREE_SIDE_EFFECTS on the expression too.
This prevents any early discarding from occurring.
PR d/110516
gcc/d/ChangeLog:
* intrinsics.cc (expand_volatile_load): Set TREE_SIDE_EFFECTS on the
expanded expression.
(expand_volatile_store): Likewise.
gcc/testsuite/ChangeLog:
* gdc.dg/torture/pr110516a.d: New test.
* gdc.dg/torture/pr110516b.d: New test.
|
|
Starts setting TREE_READONLY against specific kinds of VAR_DECLs, so
that the middle-end/optimization passes can more aggressively constant
fold D code that makes use of `immutable' or `const'.
PR d/110514
gcc/d/ChangeLog:
* decl.cc (get_symbol_decl): Set TREE_READONLY on certain kinds of
const and immutable variables.
* expr.cc (ExprVisitor::visit (ArrayLiteralExp *)): Set TREE_READONLY
on immutable dynamic array literals.
gcc/testsuite/ChangeLog:
* gdc.dg/pr110514a.d: New test.
* gdc.dg/pr110514b.d: New test.
* gdc.dg/pr110514c.d: New test.
* gdc.dg/pr110514d.d: New test.
|
|
`-fno-exceptions'
The version flags for RTMI, RTTI, and exceptions were unconditionally
predefined. These are now only predefined if the corresponding feature
flag is enabled. It was noticed that there was no `-fexceptions'
definition inside d/lang.opt, so the detection of the exceptions option
flag was only partially working. Once that was fixed, a few places in
the front-end implementation were found to fall foul of `nothrow'
rules; these have been fixed upstream and backported here as well.
Reviewed-on: https://github.com/dlang/dmd/pull/15357
https://github.com/dlang/dmd/pull/15360
PR d/110471
gcc/d/ChangeLog:
* d-builtins.cc (d_init_versions): Predefine D_ModuleInfo,
D_Exceptions, and D_TypeInfo only if feature is enabled.
* lang.opt: Add -fexceptions.
gcc/testsuite/ChangeLog:
* gdc.dg/pr110471a.d: New test.
* gdc.dg/pr110471b.d: New test.
* gdc.dg/pr110471c.d: New test.
|
|
gcc/testsuite/ChangeLog:
PR tree-optimization/25623
* gfortran.dg/pr25623.f90: New test.
|
|
The most common source of profile mismatches is now the copyheader pass.
The reason is that in the common case the duplicated header condition will
become constant true, and that needs changes in the loop exit condition
probability. While this could be done by jump threading, it is not, since
jump threading gives up on loops.
The copy header pass now has logic to prove that the first exit will become
true, so this patch adds the necessary plumbing to the profile updating.
This is done in gimple_duplicate_sese_region in a way that is specific to
this particular case. I think the general case is kind-of unsolvable and
loop-ch is the only user of the infrastructure. If we later invent some new
users, maybe we can export the region and region_copy arrays and let the
user do the update.
With the patch we now get:
Pass dump id and name |static mismatch|dynamic mismatch
                      |in count       |in count
107t cunrolli | 3 +3| 19237 +19237
127t ch | 13 +10| 19237
131t dom | 39 +26| 19237
133t isolate-paths | 47 +8| 19237
134t reassoc | 49 +2| 19237
136t forwprop | 53 +4| 226943 +207706
159t cddce | 61 +8| 242222 +15279
161t ldist | 62 +1| 242222
172t ifcvt | 66 +4| 415472 +173250
173t vect | 143 +77| 10859784 +10444312
176t cunroll | 294 +151| 150357763 +139497979
183t loopdone | 291 -3| 150289533 -68230
194t tracer | 322 +31| 153230990 +2941457
195t fre | 317 -5| 153230990
197t dom | 286 -31| 154448079 +1217089
199t threadfull | 293 +7| 154724763 +276684
200t vrp | 297 +4| 155042448 +317685
204t dce | 294 -3| 155017073 -25375
206t sink | 292 -2| 155017073
211t cddce | 298 +6| 155018657 +1584
255t optimized | 296 -2| 155018657
256r expand | 273 -23| 154592622 -426035
258r into_cfglayout | 268 -5| 154592661 +39
275r loop2_unroll | 272 +4| 159701866 +5109205
291r ce2 | 270 -2| 159723509
312r pro_and_epilogue | 290 +20| 159792505 +68996
315r jump2 | 296 +6| 164234016 +4441511
323r bbro | 294 -2| 159385430 -4848586
So ch introduces 10 new mismatches while originally it introduced 308. At
bbro the number of mismatches dropped from 432 to 294.
The biggest offender is now the cunroll pass. I think it is the case where
a loop has multiple exits and one of the exits becomes false in all but
the last peeled iteration.
This is another case where a non-trivial loop update is needed.
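For illustration, the kind of loop affected by header copying (a
sketch):
void f (int *a, int n)
{
  int i = 0;
  /* loop-ch copies the header test out of the loop; the copy guards
     entry, so the in-loop exit test's probability changes and the
     profile must be updated to match */
  while (i < n)
    a[i++] = 0;
}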
Honza
gcc/ChangeLog:
* tree-cfg.cc (gimple_duplicate_sese_region): Add elliminated_edge
parameter; update profile.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Rename to ...
(static_loop_exit): ... this; return the edge to be eliminated.
(ch_base::copy_headers): Handle profile updating for eliminated exits.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ifc-20040816-1.c: Reduce number of mismatches
from 2 to 1.
* gcc.dg/tree-ssa/loop-ch-profile-1.c: New test.
* gcc.dg/tree-ssa/loop-ch-profile-2.c: New test.
|
|
This patch implements scalar-to-vector (STV) support for DImode and SImode
rotations by constant bit counts. Scalar rotations are almost always
optimal on x86, requiring only one or two instructions, but it is also
possible to implement these efficiently with SSE2, requiring only one
or two instructions for SImode rotations and at most 3 instructions for
DImode rotations. This allows GCC to STV rotations with a small or no
penalty if there are other (net) benefits to converting a chain. An
example of the benefits is shown below, which is based upon the BLAKE2
cryptographic hash function:
unsigned long long a,b,c,d;
unsigned long rot(unsigned long long x, int y)
{
return (x<<y) | (x>>(64-y));
}
void foo()
{
d = rot(d ^ a,32);
c = c + d;
b = rot(b ^ c,24);
a = a + b;
d = rot(d ^ a,16);
c = c + d;
b = rot(b ^ c,63);
}
where with -m32 -O2 -msse2
Before (59 insns, 247 bytes):
foo: pushl %edi
xorl %edx, %edx
pushl %esi
pushl %ebx
subl $16, %esp
movq a, %xmm1
movq d, %xmm0
movq b, %xmm2
pxor %xmm1, %xmm0
psrlq $32, %xmm0
movd %xmm0, %eax
movd %edx, %xmm0
movd %eax, %xmm3
punpckldq %xmm0, %xmm3
movq c, %xmm0
paddq %xmm3, %xmm0
pxor %xmm0, %xmm2
movd %xmm2, %ecx
psrlq $32, %xmm2
movd %xmm2, %ebx
movl %ecx, %eax
shldl $24, %ebx, %ecx
shldl $24, %eax, %ebx
movd %ebx, %xmm4
movd %ecx, %xmm2
punpckldq %xmm4, %xmm2
movdqa .LC0, %xmm4
pand %xmm4, %xmm2
paddq %xmm2, %xmm1
movq %xmm1, a
pxor %xmm3, %xmm1
movd %xmm1, %esi
psrlq $32, %xmm1
movd %xmm1, %edi
movl %esi, %eax
shldl $16, %edi, %esi
shldl $16, %eax, %edi
movd %esi, %xmm1
movd %edi, %xmm3
punpckldq %xmm3, %xmm1
pand %xmm4, %xmm1
movq %xmm1, d
paddq %xmm1, %xmm0
movq %xmm0, c
pxor %xmm2, %xmm0
movd %xmm0, 8(%esp)
psrlq $32, %xmm0
movl 8(%esp), %eax
movd %xmm0, 12(%esp)
movl 12(%esp), %edx
shrdl $1, %edx, %eax
xorl %edx, %edx
movl %eax, b
movl %edx, b+4
addl $16, %esp
popl %ebx
popl %esi
popl %edi
ret
After (32 insns, 165 bytes):
movq a, %xmm1
xorl %edx, %edx
movq d, %xmm0
movq b, %xmm2
movdqa .LC0, %xmm4
pxor %xmm1, %xmm0
psrlq $32, %xmm0
movd %xmm0, %eax
movd %edx, %xmm0
movd %eax, %xmm3
punpckldq %xmm0, %xmm3
movq c, %xmm0
paddq %xmm3, %xmm0
pxor %xmm0, %xmm2
pshufd $68, %xmm2, %xmm2
psrldq $5, %xmm2
pand %xmm4, %xmm2
paddq %xmm2, %xmm1
movq %xmm1, a
pxor %xmm3, %xmm1
pshuflw $147, %xmm1, %xmm1
pand %xmm4, %xmm1
movq %xmm1, d
paddq %xmm1, %xmm0
movq %xmm0, c
pxor %xmm2, %xmm0
pshufd $20, %xmm0, %xmm0
psrlq $1, %xmm0
pshufd $136, %xmm0, %xmm0
pand %xmm4, %xmm0
movq %xmm0, b
ret
2023-07-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Provide
gains/costs for ROTATE and ROTATERT (by an integer constant).
(general_scalar_chain::convert_rotate): New helper function to
convert a DImode or SImode rotation by an integer constant into
SSE vector form.
(general_scalar_chain::convert_insn): Call the new convert_rotate
for ROTATE and ROTATERT.
(general_scalar_to_vector_candidate_p): Consider ROTATE and
ROTATERT to be candidates if the second operand is an integer
constant, valid for a rotation (or shift) in the given mode.
* config/i386/i386-features.h (general_scalar_chain): Add new
helper method convert_rotate.
gcc/testsuite/ChangeLog
* gcc.target/i386/rotate-6.c: New test case.
* gcc.target/i386/sse2-stv-1.c: Likewise.
|
|
Fix some of the profile mismatches caused by profile updating.
It seems that I misupdated update_bb_profile_for_threading in 2017, which
results in invalid updates from rtl threading and threadbackwards.
update_bb_profile_for_threading knows that some paths to BB are being
redirected elsewhere and those paths will exit from BB with E. So it needs
to determine the probability of the duplicated path and redistribute the
probabilities.
For some reason, however, the conditional probability of the redirected
path is computed after its count is subtracted, which is wrong and often
results in a probability greater than 100%.
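A sketch of the corrected ordering (a simplification, not the actual
cfg.cc code):
/* compute the path's conditional probability from the original
   count... */
profile_probability prob = path_count.probability_in (bb->count);
/* ...and only then subtract the redirected count */
bb->count -= path_count;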
I also fixed the error message. Compiling tramp3d, I now get the
following passes producing mismatches:
Pass dump id and name |static mismatch|dynamic mismatch
                      |in count       |in count
113t fre | 2 +2| 0
114t mergephi | 2 | 0
115t threadfull | 2 | 0
116t vrp | 2 | 0
127t ch | 307 +305| 347194302 +347194302
130t thread | 313 +6| 347221478 +27176
131t dom | 321 +8| 346841121 -380357
134t reassoc | 323 +2| 346841121
136t forwprop | 327 +4| 347026371 +185250
144t pre | 326 -1| 347040926 +14555
172t ifcvt | 338 +2| 347218249 +156280
173t vect | 409 +71| 356357418 +9139169
176t cunroll | 377 -32| 126071925 -230285493
183t loopdone | 376 -1| 126015489 -56436
194t tracer | 379 +3| 127258199 +1242710
197t dom | 375 -4| 128352165 +1093966
199t threadfull | 379 +4| 128526112 +173947
200t vrp | 381 +2| 128724673 +198561
204t dce | 374 -7| 128632495 -92178
206t sink | 370 -4| 128618043 -14452
211t cddce | 372 +2| 128632495 +14452
248t ehcleanup | 370 -2| 128618755 -13740
255t optimized | 362 -8| 128576810 -41945
256r expand | 356 -6| 128899768 +322958
258r into_cfglayout | 353 -3| 129051765 +151997
259r jump | 354 +1| 129051765
262r cse1 | 353 -1| 129051765
275r loop2_unroll | 355 +2| 132182110 +3130345
277r loop2_done | 354 -1| 132182109 -1
312r pro_and_epilogue | 371 +17| 132222324 +40215
323r bbro | 375 +4| 132095926 -126398
Without the patch, at jump2 time we get over 432 mismatches, so this is a
15% improvement. Some of the mismatches are unavoidable.
I think the ch mismatches are mostly due to loop header copying where the
header condition constant propagates. The most common case should be
threadable in early optimizations, and we also could do better on profile
updating here.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
PR tree-optimization/103680
* cfg.cc (update_bb_profile_for_threading): Fix profile update;
make message clearer.
gcc/testsuite/ChangeLog:
PR tree-optimization/103680
* gcc.dg/tree-ssa/pr103680.c: New test.
* gcc.dg/tree-prof/cmpsf-1.c: Un-xfail.
|
|
Due to level/depth mismatches between the template parameters of a level
lowered ttp and the original ttp, the ttp comparison check added by
r14-418-g0bc2a1dc327af9 never actually holds outside of erroneous cases.
Moreover, it'd be good to also cache the overall TEMPLATE_TEMPLATE_PARM
instead of only the TEMPLATE_PARM_INDEX.
It's tricky to cache all level lowered ttps since the result of level
lowering may depend on more than just the depth of the arguments, e.g.
for TT in
template<class T>
struct A {
template<template<T> class TT> void f();
}
the substitution T=int yields a different level-lowered ttp than T=char.
But these kinds of ttps seem to be rare in practice, and "simple" ttps
that don't depend on outer template parameters are easy enough to cache
like so. Unfortunately, this means we're back to expecting a duplicate
error in nontype12.C again since the ttp in question isn't "simple" so
caching of the (erroneous) lowered ttp doesn't happen.
gcc/cp/ChangeLog:
* cp-tree.h (TEMPLATE_PARM_DESCENDANTS): Harden.
(TEMPLATE_TYPE_DESCENDANTS): Define.
(TEMPLATE_TEMPLATE_PARM_SIMPLE_P): Define.
* pt.cc (reduce_template_parm_level): Revert
r14-418-g0bc2a1dc327af9 change.
(process_template_parm): Set TEMPLATE_TEMPLATE_PARM_SIMPLE_P
appropriately.
(uses_outer_template_parms): Determine the outer depth of
a template template parm without relying on DECL_CONTEXT.
(tsubst) <case TEMPLATE_TEMPLATE_PARM>: Cache lowering a
simple template template parm. Consistently use 'code'.
gcc/testsuite/ChangeLog:
* g++.dg/template/nontype12.C: Refine and XFAIL the dg-bogus
duplicate diagnostic check.
|
|
tree-optimization/101832]
__builtin_object_size should treat a struct with TYPE_INCLUDES_FLEXARRAY
as having flexible size.
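A sketch of the shape involved (an illustration, not the actual
testcase; placing a struct with a flexible array member inside another
struct is a GNU extension):
struct flex { int n; int data[]; };
struct wrap { int m; struct flex f; };   /* TYPE_INCLUDES_FLEXARRAY */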
gcc/ChangeLog:
PR tree-optimization/101832
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.
gcc/testsuite/ChangeLog:
PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
|
|
fold_ctor_reference attempts to use recursive local processing in order
to call native_encode_expr on the leaf nodes of the constructor, before
falling back to calling native_encode_initializer if this fails.
There are a couple of issues related to endianness present in it:
1) it does not specifically handle integral bit-fields; these are
left-justified on big-endian platforms so cannot be treated like
ordinary fields (see the sketch below);
2) it does not check that the constructor uses the native storage order.
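A sketch for issue 1) (hypothetical values):
struct S { unsigned char f : 3, g : 5; };
const struct S s = { 6, 21 };
/* on a big-endian target f occupies the most significant bits of the
   byte, so it encodes as 110 10101 == 0xd5; a value-justified encoding
   would wrongly give 0xae, the little-endian layout */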
gcc/
* gimple-fold.cc (fold_array_ctor_reference): Fix head comment.
(fold_nonarray_ctor_reference): Likewise. Specifically deal
with integral bit-fields.
(fold_ctor_reference): Make sure that the constructor uses the
native storage order.
gcc/testsuite/
* gcc.c-torture/execute/20230630-1.c: New test.
* gcc.c-torture/execute/20230630-2.c: Likewise.
* gcc.c-torture/execute/20230630-3.c: Likewise.
* gcc.c-torture/execute/20230630-4.c: Likewise.
|
|
gcc/testsuite/ChangeLog:
PR jit/110466
* jit.dg/jit.exp (jit-check-debug-info): Gracefully handle versions
of gdb that are too old to support our dwarf version, via
"unsupported".
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
r13-4531-gd2e782cb99c311 added test coverage to libgccjit's vector
support, but used __vector, which doesn't work on Power. Additionally
the size param to gcc_jit_type_get_vector was wrong.
Fixed thusly.
gcc/testsuite/ChangeLog:
PR jit/110466
* jit.dg/test-expressions.c (run_test_of_comparison): Fix size
param to gcc_jit_type_get_vector.
(verify_comparisons): Use a typedef rather than __vector.
Co-authored-by: Marek Polacek <polacek@redhat.com>
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
heuristics
While looking into the std::vector _M_realloc_insert codegen I noticed
that the call to __throw_bad_alloc is predicted with 10% probability. This
is because the conditional guarding it has __builtin_expect (cond, 0) on
it. This incorrectly takes precedence over more reliable heuristics
predicting that a call to a cold noreturn function is likely not going to
happen.
So I reordered the predictors so that __builtin_expect_with_probability
comes first, after the predictors that never make a mistake (so the user
can use it to always specify the outcome by hand). I also downgraded the
malloc predictor, since I do think user-defined malloc functions & new
operators may behave in funny ways, and moved the usual __builtin_expect
after the noreturn cold predictor.
This triggered a latent bug in expr_expected_value_1 where
if (*predictor < predictor2)
*predictor = predictor2;
should be:
if (predictor2 < *predictor)
*predictor = predictor2;
which eventually triggered an ICE on combining heuristics. This made me
notice that we can do slightly better while combining expected values in
the case where only one of the parameters (such as in a*b when we expect
a==0) can determine the overall result.
Note that the new code may pick weaker heuristics in the case that both
values are predicted. I am not sure if this scenario is worth the extra
CPU time: there is no correct way to combine the probabilities anyway,
since we do not know if the predictions are independent, so I think users
should not rely on it.
Fixing this issue uncovered another problem. In 2018 Martin Liska added
code predicting that MALLOC returns non-NULL, but instead of that he
predicts that it returns true (boolean 1). This sort of works for a
testcase testing
malloc (10) != NULL
but, for example, we will predict
malloc (10) == malloc (10)
as true, which is not right, and such a comparison may happen in real
code.
I think the proper way is to update expr_expected_value_1 to work with
value ranges, but that needs greater surgery, so I decided to postpone
this and only add a FIXME and file PR110499.
gcc/ChangeLog:
PR middle-end/109849
* predict.cc (estimate_bb_frequencies): Turn to static function.
(expr_expected_value_1): Fix handling of binary expressions with
predicted values.
* predict.def (PRED_MALLOC_NONNULL): Move later in the priority queue.
(PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost the top of the
priority queue.
* predict.h (estimate_bb_frequencies): No longer declare it.
gcc/testsuite/ChangeLog:
PR middle-end/109849
* gcc.dg/predict-18.c: Improve testcase.
|
|
When we make a select() call that fails, there is an attempt to (a)
diagnose why and (b) make a fallback. These actions were causing some
tests to hang on some Darwin versions; this is because the first action
that is
tried to assist in diagnosis/fallback handling is to replace the set
timeout with NIL (which causes select to wait forever, modulo other
reasons it might complete).
To fix this, call select with a zero timeout when checking for error
conditions. Also, as we check the possible failure conditions, if we
find a change that succeeds, then stop looking for errors.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR testsuite/108835
gcc/m2/ChangeLog:
* gm2-libs/RTint.mod: Do not use NIL timeout setting on select,
test failures sequentially, finishing on the first success.
|
|
Also change some internal variables and function arguments from int to bool.
gcc/ChangeLog:
* fold-const.h (multiple_of_p): Change return type from int to bool.
* fold-const.cc (split_tree): Change negl_p, neg_litp_p,
neg_conp_p and neg_var_p variables to bool.
(const_binop): Change sat_p variable to bool.
(merge_ranges): Change no_overlap variable to bool.
(extract_muldiv_1): Change same_p variable to bool.
(tree_swap_operands_p): Update function body for bool return type.
(fold_truth_andor): Change commutative variable to bool.
(multiple_of_p): Change return type
from int to bool and adjust function body accordingly.
* optabs.h (expand_twoval_unop): Change return type from int to bool.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
* optabs.cc (add_equal_note): Ditto.
(widen_operand): Change no_extend argument from int to bool.
(expand_binop): Ditto.
(expand_twoval_unop): Change return type
from int to bool and adjust function body accordingly.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
|
|
This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(vec_widen_<su>abdl_lo_<mode>, vec_widen_<su>abdl_hi_<mode>):
Expansions for abd vec widen optabs.
(aarch64_<su>abdl<mode>_insn): VQW based abdl RTL.
* config/aarch64/iterators.md (USMAX_EXT): Code attributes
that give the appropriate extend RTL for the max RTL.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
|
|
This updates vect_recog_abd_pattern to recognize the widening
variant of absolute difference (ABDL, ABDL2).
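For example, a loop that should now match the widening pattern
(hypothetical testcase):
void f (unsigned char *a, unsigned char *b, unsigned short *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
}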
gcc/ChangeLog:
* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
* optabs.def (vec_widen_sabd_optab,
vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
vec_widen_sabd_odd_optab, vec_widen_sabd_even_optab,
vec_widen_uabd_optab,
vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
vec_widen_uabd_odd_optab, vec_widen_uabd_even_optab):
New optabs.
* doc/md.texi: Document them.
* tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
build a VEC_WIDEN_ABD call if the input precision is smaller
than the precision of the output.
(vect_recog_widen_abd_pattern): Should an ABD expression be
found preceding an extension, replace the two with a
VEC_WIDEN_ABD.
|
|
This patch refactors the vxrm_mode attr to remove a duplicated
eq_attr condition. The common condition of the attr is extracted
to one place instead of being repeated in many places.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/vector.md: Refactor the common condition.
|
|
When store-merging looks for bswap opportunities we also handle
BIT_FIELD_REFs, where we verify the referenced object is of scalar
type but do not check the result type we eventually use. That is done
later, but only after we query TYPE_PRECISION.
The following re-orders this.
PR tree-optimization/110496
* gimple-ssa-store-merging.cc (find_bswap_or_nop_1): Re-order
verifying and TYPE_PRECISION query for the BIT_FIELD_REF case.
* gcc.dg/pr110496.c: New testcase.
|
|
When we call statistics_fini_pass we unconditionally allocate
the statistics hash and traverse it. When a TU has many small
functions this can take considerable time. The following avoids
this by never allocating the hash from this function.
PR middle-end/110489
* statistics.cc (curr_statistics_hash): Add argument
indicating whether we should allocate the hash.
(statistics_fini_pass): If the hash isn't allocated
only print the summary header.
|
|
... understanding that "turn on LRA" is an exaggeration here, given that nvptx
isn't actually doing register allocation ('TARGET_NO_REGISTER_ALLOCATION').
gcc/
* config/nvptx/nvptx.cc (TARGET_LRA_P): Remove.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
|
|
This adds a missing check_vect () to the execute testcase.
PR tree-optimization/110381
* gcc.dg/vect/pr110381.c: Add check_vect ().
|
|
This patch changes the alignment for typedef types when passed as
arguments, making the alignment equal to the alignment of the
original (aliased) type.
This change makes it impossible for a typedef type to have
an alignment that is less than its size.
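For illustration (hypothetical; on a typedef, the aligned attribute
may also decrease alignment):
typedef long double less_aligned_ld __attribute__ ((aligned (4)));
void f (int x, less_aligned_ld y);   /* y is now passed with the
                                        alignment of long double */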
2023-06-27 Jovan Dmitrović <jovan.dmitrovic@syrmia.com>
gcc/ChangeLog:
PR target/109435
* config/mips/mips.cc (mips_function_arg_alignment): Returns
the alignment of a function argument. In the case of a typedef type,
it returns the alignment of the aliased type.
(mips_function_arg_boundary): Relocate the calculation of the
alignment of function arguments.
gcc/testsuite/ChangeLog:
* gcc.target/mips/align-1-n64.c: New test.
* gcc.target/mips/align-1-o32.c: New test.
|
|
g++.dg/analyzer/PR100244.C was failing after a patch for PR109439.
The reason was a spurious preemptive return in get_store_value upon an
out-of-bounds read that was preventing further checks. Now instead,
a boolean value check_poisoned goes to false when an OOB is detected,
and is later on given to get_or_create_initial_value.
gcc/analyzer/ChangeLog:
PR analyzer/110198
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Take an
optional boolean value to bypass poisoning checks.
* region-model-manager.h: Update declaration of the above function.
* region-model.cc (region_model::get_store_value): No longer returns
on OOB, but rather gives a boolean to get_or_create_initial_value.
(region_model::check_region_access): Update docstring.
(region_model::check_region_for_write): Update docstring.
Signed-off-by: benjamin priour <priour.be@gmail.com>
|
|
The std::vector allocator looks as follows:
__attribute__((nodiscard))
struct pair * std::__new_allocator<std::pair<unsigned int, unsigned int> >::allocate (struct __new_allocator * const this, size_type __n, const void * D.27753)
{
bool _1;
long int _2;
long int _3;
long unsigned int _5;
struct pair * _9;
<bb 2> [local count: 1073741824]:
_1 = __n_7(D) > 1152921504606846975;
_2 = (long int) _1;
_3 = __builtin_expect (_2, 0);
if (_3 != 0)
goto <bb 3>; [10.00%]
else
goto <bb 6>; [90.00%]
<bb 3> [local count: 107374184]:
if (__n_7(D) > 2305843009213693951)
goto <bb 4>; [50.00%]
else
goto <bb 5>; [50.00%]
<bb 4> [local count: 53687092]:
std::__throw_bad_array_new_length ();
<bb 5> [local count: 53687092]:
std::__throw_bad_alloc ();
<bb 6> [local count: 966367641]:
_5 = __n_7(D) * 8;
_9 = operator new (_5);
return _9;
}
So there is a check for the allocated block size being greater than
max_size, which is wrapped in __builtin_expect. This makes ipa-fnsummary
give up analyzing the predicates, and it will miss the fact that the two
different calls to __throw will be optimized out if __n is already
smaller than 1152921504606846975, which it is after _M_check_len.
This patch extends ipa-fnsummary to understand functions that return
their parameter.
gcc/ChangeLog:
PR tree-optimization/109849
* ipa-fnsummary.cc (decompose_param_expr): Skip
functions returning their parameter.
(set_cond_stmt_execution_predicate): Return early
if predicate was constructed.
gcc/testsuite/ChangeLog:
PR tree-optimization/109849
* gcc.dg/ipa/pr109849.c: New test.
|
|
Certain downstream compilers (for example, in Fedora) default to
-freport-bug. The extra output breaks the following tests. We can use
-fno-report-bug to fix that. Patch verified with:
$ make check RUNTESTFLAGS='--target_board=unix\{,-freport-bug\} plugin.exp'
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/crash-test-ice-sarif.c: Use -fno-report-bug. Adjust
scan-sarif-file.
* gcc.dg/plugin/crash-test-ice-stderr.c: Use -fno-report-bug.
* gcc.dg/plugin/crash-test-write-though-null-sarif.c: Use
-fno-report-bug. Adjust scan-sarif-file.
* gcc.dg/plugin/crash-test-write-though-null-stderr.c: Use
-fno-report-bug.
|
|
These tests fail when the testsuite is executed with -fstack-protector-strong.
To avoid this, this patch adds -fno-stack-protector to dg-options.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr104610.c: Use -fno-stack-protector.
* gcc.target/i386/pr69482-1.c: Likewise.
|
|
Here we find ourselves instantiating the NSDMI for A<1>::m when
computing argument conversions during overload resolution, and
thus tf_conv is set. The flag causes mark_used for the constructor
used in the NSDMI to exit early and not instantiate its noexcept-spec,
which eventually leads to an ICE from nothrow_spec_p.
This patch fixes this by clearing any special tsubst flags during
instantiation of an NSDMI, since the result should be independent of
the context that requires the instantiation.
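A sketch of the kind of code involved (not the actual noexcept79.C
testcase):
struct X { X () noexcept; int i; };
template <int N> struct A { X m = X (); };
void g (A<1>);
void h ()
{
  g ({});   /* checking this conversion instantiates A<1>'s NSDMI */
}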
PR c++/110468
gcc/cp/ChangeLog:
* init.cc (maybe_instantiate_nsdmi_init): Mask out all
tsubst flags except for tf_warning_or_error.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/noexcept79.C: New test.
|