Age | Commit message | Author | Files | Lines |
|
gcc/ChangeLog:
PR target/113729
* config/i386/i386.md (*ashlqi3_1_zext<mode><nf_name>):
New define_insn.
(*ashlhi3_1_zext<mode><nf_name>): Ditto.
(*<insn>qi3_1_zext<mode><nf_name>): Ditto.
(*<insn>hi3_1_zext<mode><nf_name>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr113729.c: Add testcase for shift and rotate.
|
|
gcc/ChangeLog:
PR target/113729
* config/i386/i386.md (*andqi_1_zext<mode><nf_name>): New
define_insn.
(*andhi_1_zext<mode><nf_name>): Ditto.
(*<code>qi_1_zext<mode><nf_name>): Ditto.
(*<code>hi_1_zext<mode><nf_name>): Ditto.
(*negqi_1_zext<mode><nf_name>): Ditto.
(*neghi_1_zext<mode><nf_name>): Ditto.
(*one_cmplqi2_1_zext<mode>): Ditto.
(*one_cmplhi2_1_zext<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr113729.c: Add more tests.
|
|
gcc/ChangeLog:
PR target/113729
* config/i386/i386.md (*subqi_1_zext<mode><nf_name>): New
define_insn.
(*subhi_1_zext<mode><nf_name>): Ditto.
(*addqi3_carry_zext<mode>): Ditto.
(*addhi3_carry_zext<mode>): Ditto.
(*addqi3_carry_zext<mode>_0): Ditto.
(*addhi3_carry_zext<mode>_0): Ditto.
(*addqi3_carry_zext<mode>_0r): Ditto.
(*addhi3_carry_zext<mode>_0r): Ditto.
(*subqi3_carry_zext<mode>): Ditto.
(*subhi3_carry_zext<mode>): Ditto.
(*subqi3_carry_zext<mode>_0): Ditto.
(*subhi3_carry_zext<mode>_0): Ditto.
(*subqi3_carry_zext<mode>_0r): Ditto.
(*subhi3_carry_zext<mode>_0r): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr113729.c: Add more tests.
* gcc.target/i386/pr113729-adc-sbb.c: New test.
|
|
gcc/ChangeLog:
PR target/113729
* config/i386/i386.md (*addqi_1_zext<mode><nf_name>): New
define_insn.
(*addhi_1_zext<mode><nf_name>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr113729.c: New test.
|
|
The testcase uses -march=rv64gcv and dg-do run, so should be
restricted to a riscv_v target.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr116202-run-1.c (dg-do run):
Add target riscv_v.
|
|
A global variable is set when proc_ptr parsing in an associate is
expected. In the case of an error, that flag was not reset, which is
fixed now.
gcc/fortran/ChangeLog:
PR fortran/102973
* match.cc (gfc_match_associate): Reset proc_ptr parsing flag on
error.
|
|
Fix ICE by getting the vtype only when a derived or class type is
present. Also take care of the _len component for unlimited
polymorphics.
gcc/fortran/ChangeLog:
PR fortran/116292
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Get the vtab
only for derived types and classes and adjust _len for class
types.
gcc/testsuite/ChangeLog:
* gfortran.dg/move_alloc_19.f90: New test.
|
|
With the increase in the number of modes and patterns for some
backend architectures, the place_operands function becomes a
bottleneck in the speed of genoutput, and may even become a
bottleneck in the overall speed of building the GCC project.
This patch aims to accelerate the place_operands function.
The optimizations it includes are:
1. Use a hash table to store operand information,
improving the lookup time for the first operand.
2. Move mode comparison to the beginning to avoid most strcmp calls.
I tested the speed improvements for the following backends:
  Backend    Improvement ratio
  x86_64     197.9%
  aarch64    954.5%
  riscv      2578.6%
If the build machine is slow, then this improvement can save a lot of time.
I tested the genoutput output for x86_64/aarch64/riscv backends,
and there was no difference compared to before the optimization,
so this shouldn't introduce any functional issues.
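The two optimizations described above can be illustrated with a minimal sketch. This is not the genoutput.cc code: the Operand layout, hash_operand, same_operand and OperandTable are hypothetical names chosen for illustration, and the real place_operands works on runs of operands rather than on single records. The sketch only shows the idea of comparing the cheap mode field before any string comparison and of finding candidates through a hash table instead of scanning an array.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for an operand record.
struct Operand {
  int mode;                // cheap integer field, compared first
  std::string predicate;   // string fields, compared with strcmp
  std::string constraint;
};

// Compare the mode first so that most mismatches never reach strcmp.
inline bool same_operand(const Operand &a, const Operand &b) {
  if (a.mode != b.mode)
    return false;
  return std::strcmp(a.predicate.c_str(), b.predicate.c_str()) == 0
         && std::strcmp(a.constraint.c_str(), b.constraint.c_str()) == 0;
}

// Hash every field that takes part in the equality test.
inline std::size_t hash_operand(const Operand &op) {
  std::size_t h = std::hash<int>{}(op.mode);
  h = h * 31 + std::hash<std::string>{}(op.predicate);
  h = h * 31 + std::hash<std::string>{}(op.constraint);
  return h;
}

// Table of known operands. place() returns the index of an existing
// equal operand, or appends the operand and returns its new index.
class OperandTable {
public:
  std::size_t place(const Operand &op) {
    std::vector<std::size_t> &bucket = buckets_[hash_operand(op)];
    for (std::size_t idx : bucket)        // only operands with equal hash
      if (same_operand(operands_[idx], op))
        return idx;
    operands_.push_back(op);
    bucket.push_back(operands_.size() - 1);
    return operands_.size() - 1;
  }
  std::size_t size() const { return operands_.size(); }

private:
  std::vector<Operand> operands_;
  std::unordered_map<std::size_t, std::vector<std::size_t>> buckets_;
};
```

Placing the same operand twice returns the same index, so the table grows only when a genuinely new operand appears.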
gcc/
* genoutput.cc (struct operand_data): Add member 'eq_next' to
point to the next member with the same hash value in the
hash table.
(compare_operands): Move the comparison of the mode to the very
beginning to accelerate the comparison of the two operands.
(struct operand_data_hasher): New, a class that takes into account
the necessary elements for comparing the equality of two operands
in its hash value.
(operand_data_hasher::hash): New.
(operand_data_hasher::equal): New.
(operand_datas): New, hash table of known pattern operands.
(place_operands): Use a hash table instead of traversing the array
to find the same operand.
(main): Add initialization of the hash table 'operand_datas'.
|
|
This reverts commit e9738e77674e23f600315ca1efed7d1c7944d0cc.
|
|
As PR116148#c7 shows, fam-in-union-alone-in-struct-2.c still
fails on hppa, which is a BE environment. Further checking
(also confirmed by John in PR116148#c12) shows that the failure is
because plain char is signed on hppa, so the value
of with_fam_3_v.a[7], "8f", gets sign-extended to "ffffff8f" and
the verification fails. This patch changes plain char
to unsigned char to avoid that.
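The sign-extension effect can be reproduced in isolation. The sketch below is not the testcase itself. The helper names are hypothetical, and it assumes a 32-bit int and a two's complement target. It uses signed char explicitly so that the result does not depend on the signedness of plain char on the build machine.

```cpp
#include <cstdio>
#include <string>

// Format a signed byte as hex after the usual promotion to int.
// A negative value is sign-extended, so the upper bits become ones.
inline std::string hex_of_signed(signed char c) {
  char buf[16];
  std::snprintf(buf, sizeof buf, "%x",
                static_cast<unsigned int>(static_cast<int>(c)));
  return buf;
}

// Format an unsigned byte as hex. The promotion zero-extends,
// so only the low eight bits are printed.
inline std::string hex_of_unsigned(unsigned char c) {
  char buf[16];
  std::snprintf(buf, sizeof buf, "%x", static_cast<unsigned int>(c));
  return buf;
}
```

With the byte pattern 0x8f, the signed variant yields "ffffff8f" under the stated assumptions, while the unsigned variant yields "8f".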
PR testsuite/116148
gcc/testsuite/ChangeLog:
* c-c++-common/fam-in-union-alone-in-struct-2.c: Change the type of
member a[] of union with_fam_3 to unsigned char.
|
|
pass_endbr_and_patchable_area.
gcc/ChangeLog:
PR target/116174
* config/i386/i386.cc (ix86_align_loops): Move this to ..
* config/i386/i386-features.cc (ix86_align_loops): .. here.
(class pass_align_tight_loops): New class.
(make_pass_align_tight_loops): New function.
* config/i386/i386-passes.def: Insert pass_align_tight_loops
after pass_insert_endbr_and_patchable_area.
* config/i386/i386-protos.h (make_pass_align_tight_loops): New
declare.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr116174.c: New test.
|
|
The size of "struct only_fam_2" is dependent on the alignment of the
flexible array member "b", and not on the type of the preceding
bit-fields. For most targets the two are equal. But on default_packed
targets like pru-unknown-elf, the alignment of int is not equal to the
size of int, so the test failed.
Patch was suggested by Qing Zhao. Tested on pru-unknown-elf and
x86_64-pc-linux-gnu.
PR testsuite/116155
gcc/testsuite/ChangeLog:
* c-c++-common/fam-in-union-alone-in-struct-1.c: Adjust
check to account for default_packed targets.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Now that more operations are allowed for noce_convert_multiple_sets,
we need to check noce_can_force_operand on the sequence before calling
try_emit_cmove_seq. Otherwise an inappropriate argument may be given
to copy_to_mode_reg and result in an ICE.
PR tree-optimization/116353
gcc/ChangeLog:
* ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Check
noce_can_force_operand.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr116353.c: New test.
|
|
gcc/fortran/ChangeLog:
PR fortran/114308
* array.cc (resolve_array_list): Reject array constructor value if
its declared type is abstract (F2018:C7114).
gcc/testsuite/ChangeLog:
PR fortran/114308
* gfortran.dg/abstract_type_10.f90: New test.
Co-Authored-By: Steven G. Kargl <kargl@gcc.gnu.org>
|
|
This fixes the remainder of the typos I found when reading various parts of the
RISC-V backend.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (legitimize_move): extrac -> extract.
(expand_vec_cmp_float): Remove duplicate vmnor.mm.
* config/riscv/riscv-vector-builtins.cc: ins -> insns.
* config/riscv/riscv.cc (riscv_init_machine_status): mwrvv -> mrvv.
* config/riscv/vector-iterators.md: RVVM8QImde -> RVVM8QImode.
* config/riscv/vector.md: Replaced non-existent vsetivl with vsetivli.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
For some targets, like amdgcn-amdhsa, we need to handle
vector bool types before general vector mode types. Otherwise we may
see asm check failures like the ones below.
gcc.target/gcn/cond_smax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
gcc.target/gcn/cond_smin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
gcc.target/gcn/cond_umax_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
gcc.target/gcn/cond_umin_1.c scan-assembler-times \\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump-not ivopts "zero if "
The following test suites pass with this patch.
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.
4. The amdgcn test cases above.
PR target/116103
gcc/ChangeLog:
* internal-fn.cc (type_strictly_matches_mode_p): Add handling
for vector bool type.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Commit r15-2084 exposes one ICE in LRA. Firstly, before
r15-2084 KFmode has 126 bit precision while V1TImode has 128
bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is
paradoxical_subreg_p, which stops some passes from doing
some optimization. After r15-2084, KFmode has the same mode
precision as V1TImode, passes are able to optimize more, but
it causes this ICE in LRA as described below:
For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)),
which matches pattern
(define_insn "*vsx_le_perm_store_<mode>"
[(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q")
(match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))]
"!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
&& !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
"@
#
#"
[(set_attr "type" "vecstore,store")
(set_attr "length" "12,8")
(set_attr "isa" "<VSisa>,*")])
LRA makes equivalence substitution on r133 with const double
(const_double:KF 0.0), selects alternative 0 and fixes up
operand 1 for constraint "wa", because operand 1 is OP_INOUT,
so it considers assigning back to it as well, that is:
lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
But because old has been changed to const_double in equivalence
substitution, the move is actually assigning to const_double,
which is invalid and causes an ICE.
Considering reg:KF 133 is equivalent with (const_double:KF 0.0)
even though this operand is OP_INOUT, IMHO there should not be
any following uses of reg:KF 133, otherwise it doesn't have the
chance to be equivalent to (const_double:KF 0.0). So this patch
guards the lra_emit_move with !CONSTANT_P to exclude such
cases.
PR rtl-optimization/116170
gcc/ChangeLog:
* lra-constraints.cc (curr_insn_transform): Don't emit move back to
old operand if it's CONSTANT_P.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr116170.c: New test.
|
|
avr added an -mlra option, but the avr.opt.urls file wasn't
regenerated.
Note that commit 149a23ee2568 ("AVR: -mlra is not documeted in TEXI.")
did add the Undocumented flag, but that still needs the avr.opt.urls
file to be updated.
Fixes: 09a87ea666b2 ("AVR: ad target/113934 - Add option -mlra to enable LRA.")
gcc/ChangeLog:
* config/avr/avr.opt.urls: Regenerate.
|
|
Only disable shrink-wrapping when using -mrop-protect when we know we
will be emitting the ROP-protect hash instructions (ie, non-leaf functions).
2024-06-17 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/114759
* config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move
the disabling of shrink-wrapping from here....
* config/rs6000/rs6000-logue.cc (rs6000_emit_prologue): ...to here.
gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-1.c: New test.
|
|
The following test was failing when building on 32 bit targets
due to not overwriting the mabi arg. This resulted in dejagnu
attempting to run the test with -mabi=ilp32d -march=rv64gcv_zvl256b
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr116202-run-1.c: Add mabi arg.
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
|
|
So this is another nasty latent bug exposed by ext-dce.
Similar to the prior m68k failure it's another problem with how we handle
paradoxical subregs on big endian targets.
In this instance when we remove the hard subregs we take something like:
(subreg:DI (reg:SI 0) 0)
And turn it into
(reg:SI -1)
Which is clearly wrong. (reg:SI 0) is correct.
The transformation happens in alter_subreg, but I really wanted to fix this in
subreg_regno since we could have similar problems in some of the other callers
of subreg_regno.
Unfortunately reload depends on the current behavior of subreg_regno; in the
cases where the return value is an invalid register, the wrong half of a
register pair, etc., the resulting bogus value is detected by reload and triggers
reloading of the inner object. So that's the new comment in subreg_regno.
The second best place to fix is alter_subreg which is what this patch does. If
presented with a paradoxical subreg, then the base register number should
always be REGNO (SUBREG_REG (object)). It's just how paradoxicals are designed
to work.
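The wrong register number can be reproduced with a toy model. This is a deliberately simplified sketch, not the rtlanal.cc or final.cc logic. ToySubreg and the helper functions are hypothetical, every register is assumed to be one word wide, and only lowpart subregs on a big endian target are modelled.

```cpp
// Toy description of a lowpart subreg of a hard register.
struct ToySubreg {
  int inner_regno;  // register number of the inner object
  int inner_nregs;  // registers occupied by the inner mode
  int outer_nregs;  // registers occupied by the outer mode
};

// A subreg is paradoxical when the outer mode is wider than the inner one.
inline bool is_paradoxical(const ToySubreg &s) {
  return s.outer_nregs > s.inner_nregs;
}

// Naive big endian computation: the low part sits in the last register
// of the inner object, so the result is shifted by the size difference.
// For a paradoxical subreg that difference is negative.
inline int naive_big_endian_regno(const ToySubreg &s) {
  return s.inner_regno + (s.inner_nregs - s.outer_nregs);
}

// The approach described above: for a paradoxical subreg always use
// the register number of the inner object.
inline int fixed_regno(const ToySubreg &s) {
  if (is_paradoxical(s))
    return s.inner_regno;
  return naive_big_endian_regno(s);
}
```

For the case from the message, a one-register inner object in register 0 viewed through a two-register outer mode, the naive computation gives -1 while the guarded version gives 0.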
I haven't tried to fix the other places that call subreg_regno. After being
burned by reload, I'm more than a bit worried about unintended fallout.
I must admit I'm surprised we haven't stumbled over this before and that it
didn't fix any failures on the big endian embedded targets.
Bootstrapped & regression tested on x86_64, also went through all the embedded
targets in my tester and bootstrapped on m68k & s390x to get some additional
big endian testing.
Pushing to the trunk.
PR rtl-optimization/116244
gcc/
* rtlanal.cc (subreg_regno): Update comment.
* final.cc (alter_subreg): Always use REGNO (SUBREG_REG ()) to get
the base register for paradoxical subregs.
gcc/testsuite/
* g++.target/m68k/m68k.exp: New test driver.
* g++.target/m68k/pr116244.C: New test.
|
|
gcc/rust/ChangeLog:
* checks/errors/borrowck/rust-bir-builder.h: Cast size_t values to unsigned
long before printing.
* checks/errors/borrowck/rust-bir-fact-collector.h: Likewise.
|
|
On architectures where `size_t` is `unsigned int`, such as 32bit x86,
we encounter an issue with `PlaceId` and `FreeRegion` being aliases to
the same types. This poses an issue for overloading functions for these
two types, such as `push_subset` in that case. This commit renames one
of these `push_subset` functions to avoid the issue, but this should be
fixed with a newtype pattern for these two types.
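The newtype idea mentioned above can be sketched as follows. This is an illustration and not the rust-bir code. PlaceId, FreeRegion and push_subset are taken from the message, while the struct layouts and the Facts container are assumptions made for the example. Wrapping each id in its own struct keeps the two overloads distinct even when both ids have the same underlying integer type.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// With plain aliases that resolve to the same integer type, two
// push_subset overloads would have identical signatures and could not
// coexist. Distinct wrapper structs avoid that.
struct PlaceId {
  std::size_t value;
};
struct FreeRegion {
  std::size_t value;
};

// Hypothetical fact container with one overload per id type.
struct Facts {
  std::vector<std::pair<std::size_t, std::size_t>> place_subsets;
  std::vector<std::pair<std::size_t, std::size_t>> region_subsets;

  void push_subset(PlaceId lhs, PlaceId rhs) {
    place_subsets.emplace_back(lhs.value, rhs.value);
  }
  void push_subset(FreeRegion lhs, FreeRegion rhs) {
    region_subsets.emplace_back(lhs.value, rhs.value);
  }
};
```

A call such as `facts.push_subset(PlaceId{1}, PlaceId{2})` then selects the overload by type, independently of the width of size_t on the target.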
gcc/rust/ChangeLog:
* checks/errors/borrowck/rust-bir-fact-collector.h (points): Rename
`push_subset(PlaceId, PlaceId)` to `push_subset_place(PlaceId, PlaceId)`
|
|
The existing implementation of need_cmov_or_rewire and
noce_convert_multiple_sets_1 assumes that sets are either REG or SUBREG.
This commit enhances them so they can handle/rewire arbitrary set statements.
To do that a new helper struct noce_multiple_sets_info is introduced which is
used by noce_convert_multiple_sets and its helper functions. This results in
cleaner function signatures, improved efficiency (a number of vecs and hash
set/map are replaced with a single vec of struct) and simplicity.
gcc/ChangeLog:
* ifcvt.cc (need_cmov_or_rewire): Rename to init_noce_multiple_sets_info.
(init_noce_multiple_sets_info): Initialize noce_multiple_sets_info.
(noce_convert_multiple_sets_1): Use noce_multiple_sets_info and handle
rewiring of multiple registers.
(noce_convert_multiple_sets): Updated to use noce_multiple_sets_info.
* ifcvt.h (struct noce_multiple_sets_info): Introduce new struct
noce_multiple_sets_info to store info for noce_convert_multiple_sets.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: New test.
|
|
Currently the operations allowed for if conversion of a basic block
with multiple sets are few, namely REG, SUBREG and CONST_INT (as
controlled by bb_ok_for_noce_convert_multiple_sets).
This commit allows more operations (arithmetic, compare, etc) to
participate in if conversion. The target's profitability hook and
ifcvt's costing are expected to reject sequences that are unprofitable.
This is especially useful for targets which provide a rich selection
of conditional instructions (like aarch64 which has cinc, csneg,
csinv, ccmp, ...) which are currently not used in basic blocks with
more than a single set.
For targets that have a rich selection of conditional instructions,
like aarch64, we have seen a ~5x increase in profitable if
conversions for multiple set blocks in SPEC CPU 2017 benchmarks.
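The kind of source this targets can be sketched as below. This is an illustrative example, not one of the new tests, and the function names are hypothetical. The first function has a conditional block with two arithmetic sets. The second is a hand-written branchless equivalent of what if conversion aims for. Whether a compiler actually emits conditional instructions here depends on the target and its cost model, so no particular assembly output is claimed.

```cpp
struct Pair {
  unsigned a;
  unsigned b;
};

// Branchy form: the guarded block contains two sets, both arithmetic.
inline Pair branchy(bool cond, unsigned a, unsigned b) {
  if (cond) {
    a = a + 1;  // candidate for a conditional increment
    b = -b;     // candidate for a conditional negate
  }
  return {a, b};
}

// Branchless form: compute both results, then select per variable.
// Unsigned arithmetic keeps the unconditional computation well defined.
inline Pair branchless(bool cond, unsigned a, unsigned b) {
  unsigned a_new = a + 1;
  unsigned b_new = -b;
  return {cond ? a_new : a, cond ? b_new : b};
}
```

Both functions return the same values for every input, which is the property if conversion must preserve.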
gcc/ChangeLog:
* ifcvt.cc (try_emit_cmove_seq): Modify comments.
(noce_convert_multiple_sets_1): Modify comments.
(bb_ok_for_noce_convert_multiple_sets): Allow more operations.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.
|
|
This is an extension of what was done in PR106590.
Currently if a sequence generated in noce_convert_multiple_sets clobbers the
condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
(sequences that emit the comparison itself). Since this applies only from the
next iteration it assumes that the sequences generated (in particular seq2)
don't clobber the condition rtx itself before using it in the if_then_else,
which is only true in specific cases (currently only register/subregister moves
are allowed).
This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
the current iteration. It also checks whether the resulting sequence clobbers
the condition attached to the jump. This makes it possible to include arithmetic
operations in noce_convert_multiple_sets.
It also makes the code that checks whether the condition is used outside of the
if_then_else emitted more robust.
gcc/ChangeLog:
* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.
Punt if seq clobbers cond. Refactor the code that sets read_comparison.
|
|
The clrmem* patterns don't use the provided alignment information,
hence the setmemhi expander can just pass down 0 as alignment to
the clrmem* insns.
PR target/85624
gcc/
* config/avr/avr.md (setmemhi): Set alignment to 0.
gcc/testsuite/
* gcc.target/avr/torture/pr85624.c: New test.
|
|
gcc/testsuite/
* gcc.c-torture/execute/20021120-1.c: Skip if not size20plus or -Os.
* gcc.dg/fixed-point/convert-float-4.c: Require size20plus.
* gcc.dg/torture/pr112282.c: Skip if -O0 unless size20plus.
* g++.dg/lookup/pr21802.C: Require size20plus.
|
|
frame size on 16 bit targets.
Note: GCC has a limitation that a stack frame cannot exceed half the address space.
For two tests the decision to modify or skip them seems not so clear-cut;
I chose to modify gcc.dg/pr47893.c to use types that fit the numbers, as
that seemed to have little impact on the test, and skip gcc.dg/pr115646.c
for 16 bit, as layout of structs with bitfields members can have quite
subtle rules.
gcc/testsuite/
* gcc.dg/pr107523.c: Make sure variables can fit numbers.
* gcc.dg/pr47893.c: Add dg-require-effective-target size20plus clause.
* c-c++-common/torture/builtin-clear-padding-2.c:
dg-require-effective-target size20plus.
* gcc.dg/pr115646.c: dg-require-effective-target int32plus.
* c-c++-common/analyzer/coreutils-sum-pr108666.c:
For c++, expect a warning about exceeding maximum object size
if not size20plus.
* gcc.dg/torture/inline-mem-cpy-1.c:
Like the included file, dg-require-effective-target ptr32plus.
* gcc.dg/torture/inline-mem-cmp-1.c: Likewise.
|
|
the assign_params emitted code.
2024-08-06 Joern Rennecke <joern.rennecke@riscy-ip.com>
gcc/
* except.cc (sjlj_emit_function_enter):
Set fn_begin_outside_block again if encountering a jump instruction.
|
|
This patch is an attempt to gauge opinion on one way of fixing PR30920.
The PR points out that the libiberty splay tree implementation does
not implement the algorithm described by Sleator and Tarjan and has
unclear complexity bounds. (It's also somewhat dangerous in that
splay_tree_min and splay_tree_max walk the tree without splaying,
meaning that they are fully linear in the worst case, rather than
amortised logarithmic.) These properties have been carried over
to typed-splay-tree.h.
We could fix those problems directly in the existing implementations,
and probably should for libiberty. But when I added rtl-ssa, I also
added a third(!) splay tree implementation: splay-tree-utils.h.
In response to Jeff's understandable unease about having three
implementations, I was supposed to go back during the next stage 1
and reduce it to no more than two. I never did that. :-(
splay-tree-utils.h is so called because rtl-ssa uses splay trees
in structures that are relatively small and very size-sensitive.
I therefore wanted to be able to embed the splay tree links directly
in the structures, rather than pay the penalty of using separate
nodes with one-way or two-way links between them. There were also
operations for which it was convenient to treat the splay tree root
as an explicitly managed cursor, rather than treating the tree as
a pure ADT. The interface is therefore a bit more low-level than
for the other implementations.
I wondered whether the same trade-offs might apply to users of
the libiberty splay trees. The first one I looked at in detail
was SCC value numbering, which seemed like it would benefit from
using splay-tree-utils.h directly.
The patch does that. It also adds a couple of new helper routines
to splay-tree-utils.h.
I don't expect this approach to be the right one for every use
of splay trees. E.g. splay tree used for omp gimplification would
certainly need separate nodes.
gcc/
PR other/30920
* splay-tree-utils.h (rooted_splay_tree::insert_relative)
(rooted_splay_tree::lookup_le): New functions.
(rooted_splay_tree::remove_root_and_splay_next): Likewise.
* splay-tree-utils.tcc (rooted_splay_tree::insert_relative): New
function, extracted from...
(rooted_splay_tree::insert): ...here.
(rooted_splay_tree::lookup_le): New function.
(rooted_splay_tree::remove_root_and_splay_next): Likewise.
* tree-ssa-sccvn.cc (pd_range::m_children): New member variable.
(vn_walk_cb_data::vn_walk_cb_data): Initialize first_range.
(vn_walk_cb_data::known_ranges): Use a default_splay_tree.
(vn_walk_cb_data::~vn_walk_cb_data): Remove freeing of known_ranges.
(pd_range_compare, pd_range_alloc, pd_range_dealloc): Delete.
(vn_walk_cb_data::push_partial_def): Rewrite splay tree operations
to use splay-tree-utils.h.
* rtl-ssa/accesses.cc (function_info::add_use): Use insert_relative.
|
|
On many cores, including Neoverse V2, the throughput of vector ADD
instructions is higher than that of vector shifts like SHL. We can lean on that
to emit code like:
add v0.4s, v0.4s, v0.4s
instead of:
shl v0.4s, v0.4s, 1
LLVM already does this trick.
In RTL the code gets canonicalised from (plus x x) to (ashift x 1) so I
opted to instead do this at the final assembly printing stage, similar
to how we emit CMLT instead of SSHR elsewhere in the backend.
I'd like to also do this for SVE shifts, but those will have to be
separate patches.
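The equivalence the transformation relies on can be checked with a small scalar model of a four-lane vector. This is only a sketch. Vec4 and the two helpers are hypothetical names and nothing here touches the aarch64 backend.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;

// Model of a per-lane shift left by one.
inline Vec4 shl_by_one(Vec4 v) {
  for (std::uint32_t &lane : v)
    lane <<= 1;
  return v;
}

// Model of adding the vector to itself.
inline Vec4 add_to_self(Vec4 v) {
  for (std::uint32_t &lane : v)
    lane += lane;
  return v;
}
```

Both helpers produce identical lanes for any input, including lanes whose top bit is set, because unsigned arithmetic wraps modulo 2^32.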
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(aarch64_simd_imm_shl<mode><vczle><vczbe>): Rewrite to new
syntax. Add =w,w,vs1 alternative.
* config/aarch64/constraints.md (vs1): New constraint.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/advsimd_shl_add.c: New test.
|
|
PR fortran/85510
gcc/fortran/ChangeLog:
* resolve.cc (resolve_variable): Mark the variable as host
associated only, when it is not in an associate block.
* trans-decl.cc (generate_coarray_init): Remove incorrect unused
flag on parameter.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/pr85510.f90: New test.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features): Handle
avx10.2.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_2_256_SET): New.
(OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto.
(OPTION_MASK_ISA2_AVX10_1_256_UNSET):
Add OPTION_MASK_ISA2_AVX10_2_256_UNSET.
(OPTION_MASK_ISA2_AVX10_1_512_UNSET):
Add OPTION_MASK_ISA2_AVX10_2_512_UNSET.
(OPTION_MASK_ISA2_AVX10_2_256_UNSET): New.
(OPTION_MASK_ISA2_AVX10_2_512_UNSET): Ditto.
(ix86_handle_option): Handle avx10.2-256 and avx10.2-512.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX10_2_256 and FEATURE_AVX10_2_512.
* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for
avx10.2-256 and avx10.2-512.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__AVX10_2_256__ and __AVX10_2_512__.
* config/i386/i386-isa.def (AVX10_2): Add DEF_PTA(AVX10_2_256)
and DEF_PTA(AVX10_2_512).
* config/i386/i386-options.cc (isa2_opts): Add -mavx10.2-256 and
-mavx10.2-512.
(ix86_valid_target_attribute_inner_p): Handle avx10.2-256 and
avx10.2-512.
* config/i386/i386.opt: Add option -mavx10.2, -mavx10.2-256 and
-mavx10.2-512.
* config/i386/i386.opt.urls: Regenerated.
* doc/extend.texi: Document avx10.2, avx10.2-256 and avx10.2-512.
* doc/invoke.texi: Document -mavx10.2, -mavx10.2-256 and
-mavx10.2-512.
* doc/sourcebuild.texi: Document target avx10.2, avx10.2-256,
avx10.2-512.
gcc/testsuite/ChangeLog:
* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/sse-12.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
|
|
This patch resolves PR target/116275, a recent ICE-on-valid regression on
-m32 caused by my recent change to enable STV of DImode arithmetic right
shift on non-AVX512VL targets. The oversight is that the i386 backend
contains an *extenddi2_doubleword_highpart instruction (whose pattern
is an arithmetic right shift of a left shift) that optimizes the case where
sign-extension need only update the highpart word of a DImode value when
generating 32-bit code (!TARGET_64BIT). STV accepts this pattern as a
candidate, as there are patterns to handle this form of extension on SSE
using AVX512VL instructions (and previously ASHIFTRT was only allowed on
AVX512VL). Now that ASHIFTRT is a candidate on non-AVX512VL targets, we
either need to check that the first operand is a register, or as done
below provide the define_insn_and_split that provides a non-AVX512VL
implementation of *extendv2di_highpart_stv.
The new testcase only ICEed with -m32, so this test could be limited to
target ia32, but there's no harm also running this test on -m64 to
provide a little extra test coverage.
2024-08-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/116275
* config/i386/i386.md (*extendv2di2_highpart_stv_noavx512vl): New
define_insn_and_split to handle the STV conversion of the DImode
pattern *extendsi2_doubleword_highpart.
gcc/testsuite/ChangeLog
PR target/116275
* g++.target/i386/pr116275.C: New test case.
|
|
We support vashr, vlshr and vashl. However, r15-1638 added support for optimizing
x < 0 ? -1 : 0 into (signed) x >> 31 and x < 0 ? 1 : 0 into (unsigned) x >> 31.
To support this optimization, vector ashr, lshr and ashl need to be implemented.
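The two identities can be checked on scalars with the sketch below. The helper names are hypothetical. The signed variant relies on an arithmetic right shift of negative values, which C++20 guarantees and which earlier standards leave implementation-defined.

```cpp
#include <cstdint>

// x < 0 ? -1 : 0, written as a select.
inline std::int32_t neg_mask_select(std::int32_t x) {
  return x < 0 ? -1 : 0;
}

// The same value from an arithmetic shift that copies the sign bit
// into every bit position.
inline std::int32_t neg_mask_shift(std::int32_t x) {
  return x >> 31;
}

// x < 0 ? 1 : 0, written as a select.
inline std::uint32_t neg_flag_select(std::int32_t x) {
  return x < 0 ? 1u : 0u;
}

// The same value from a logical shift that moves the sign bit to bit 0.
inline std::uint32_t neg_flag_shift(std::int32_t x) {
  return static_cast<std::uint32_t>(x) >> 31;
}
```

The vector expanders apply the same shifts lane by lane.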
gcc/ChangeLog:
* config/loongarch/loongarch.md (insn): Added rotatert rotr pairs.
* config/loongarch/simd.md (rotr<mode>3): Remove to ...
(<optab><mode>3): This.
gcc/testsuite/ChangeLog:
* g++.target/loongarch/vect-ashr-lshr.C: New test.
|
|
Optabs vcond{,u} will be removed for GCC 15. Since regtest shows no
fallout, dropping the expanders, now.
gcc/ChangeLog:
PR target/114189
* config/loongarch/lasx.md (vcondu<LASX:mode><ILASX:mode>): Delete.
(vcond<LASX:mode><LASX_2:mode>): Likewise.
* config/loongarch/lsx.md (vcondu<LSX:mode><ILSX:mode>): Likewise.
(vcond<LSX:mode><LSX_2:mode>): Likewise.
|
|
r15-1890 introduced the new optabs iorc and andc and their corresponding
internal functions BIT_{ANDC,IORC}, which apply when a target defines such
optabs for vector modes. In r15-2258, iorc and andc were renamed to
iorn and andn.
So we changed the andn and iorn implementation templates to the standard
template names.
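A scalar sketch may help explain why an operand swap can be needed when an instruction is exposed under a standard name. The semantics below are assumptions stated for illustration, not taken from this commit: the standard andn and iorn operations are assumed to complement their second operand, and the vandn builtin is assumed to complement its first input. All function names are hypothetical.

```cpp
#include <cstdint>

// Assumed standard-name semantics: complement the second operand.
inline std::uint32_t optab_andn(std::uint32_t a, std::uint32_t b) {
  return a & ~b;
}
inline std::uint32_t optab_iorn(std::uint32_t a, std::uint32_t b) {
  return a | ~b;
}

// Assumed per-lane semantics of the vandn builtin: complement the
// first input.
inline std::uint32_t builtin_vandn(std::uint32_t j, std::uint32_t k) {
  return ~j & k;
}

// Under those assumptions, the builtin maps onto the standard
// operation only after its two operands are swapped.
inline std::uint32_t builtin_vandn_via_optab(std::uint32_t j,
                                             std::uint32_t k) {
  return optab_andn(k, j);
}
```

Under these assumptions the swapped call matches the builtin for every input, while an unswapped call would not.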
gcc/ChangeLog:
* config/loongarch/lasx.md (xvandn<mode>3): Rename to ...
(andn<mode>3): This.
(xvorn<mode>3): Rename to ...
(iorn<mode>3): This.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vandn_v): Defined as the modified name.
(CODE_FOR_lsx_vorn_v): Likewise.
(CODE_FOR_lasx_xvandn_v): Likewise.
(CODE_FOR_lasx_xvorn_v): Likewise.
(loongarch_expand_builtin_insn): When the builtin function to be
called is __builtin_lasx_xvandn or __builtin_lsx_vandn, swap the
two operands.
* config/loongarch/loongarch.md (<optab>n<mode>): Rename to ...
(<optab>n<mode>3): This.
* config/loongarch/lsx.md (vandn<mode>3): Rename to ...
(andn<mode>3): This.
(vorn<mode>3): Rename to ...
(iorn<mode>3): This.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/lasx-andn-iorn.c: New test.
* gcc.target/loongarch/lsx-andn-iorn.c: New test.
|
|
This patch fixes many ODR warnings which appear when compiling the
interface files found in gcc/m2/*-ch/ and gcc/m2/{pge,mc}-boot
directories.
gcc/m2/ChangeLog:
PR modula2/116181
* gm2-compiler/ppg.mod (FindStr): Initialize j.
* gm2-libs-ch/UnixArgs.cc (_M2_UnixArgs_ctor): Replace
M2RTS_RegisterModule with M2RTS_RegisterModule_Cstr.
* gm2-libs-ch/dtoa.cc (_M2_dtoa_ctor): Ditto.
* gm2-libs-ch/ldtoa.cc (ldtoa_strtold): Cast parameter s
for strtod.
(_M2_ldtoa_ctor): Replace M2RTS_RegisterModule with
M2RTS_RegisterModule_Cstr.
* gm2-libs-ch/m2rts.h (M2RTS_RegisterModule_Cstr): New
define.
(M2RTS_RegisterModule): Remove const.
* mc-boot-ch/GSelective.c (Selective_FdIsSet): Return bool
rather than int.
* mc-boot-ch/Gldtoa.cc (ldtoa_strtold): Change const char to
void.
Cast s before passing as a parameter to strtod.
* mc-boot-ch/Glibc.c (tracedb_open): Replace const char with const
void.
(libc_perror): Replace char with const char.
(libc_printf): Replace char with void.
(libc_snprintf): Replace char with void.
Add const_cast for parameter to index.
Add reinterpret_cast for parameter to vsnprintf.
(libc_open): Replace first parameter type char with void.
Add vararg for the third parameter.
* mc-boot-ch/Gm2rtsdummy.cc (M2RTS_RequestDependant): Remove #if 0 code.
(m2pim_M2RTS_RegisterModule): Change const char parameters to void.
(M2RTS_RegisterModule): Ditto.
(_M2_M2RTS_init): Remove #if 0 code.
(M2RTS_ConstructModules): Ditto.
(M2RTS_Terminate): Ditto.
(M2RTS_DeconstructModules): Ditto.
(M2RTS_Halt): Ditto.
* mc-boot-ch/Gtermios.cc (SetFlag): Return bool.
* mc-boot-ch/m2rts.h (M2RTS_RegisterModule_Cstr): New define.
(M2RTS_RegisterModule): Change const char parameters to void.
* mc-boot/Gdecl.cc: Regenerate.
* mc/decl.mod (getNextConstExp): Reimplement.
* pge-boot/GDynamicStrings.cc: Regenerate.
* pge-boot/GDynamicStrings.h: Ditto.
* pge-boot/GM2RTS.h (M2RTS_RegisterModule_Cstr): New function.
(M2RTS_RegisterModule): Reformat.
* pge-boot/GSymbolKey.cc: Regenerate.
* pge-boot/GSysExceptions.cc (_M2_SysExceptions_init): Add correct parameters.
(_M2_SysExceptions_fini): Ditto.
* pge-boot/GUnixArgs.cc (_M2_UnixArgs_ctor::_M2_UnixArgs_ctor):
Replace call to M2RTS_RegisterModule with M2RTS_RegisterModule_Cstr.
* pge-boot/Gerrno.cc (_M2_errno_init): Add correct parameters.
(_M2_errno_fini): Ditto.
* pge-boot/Gldtoa.cc (ldtoa_strtold): Replace const char with
void.
Use reinterpret_cast when passing s to strtod.
Replace true with TRUE.
* pge-boot/Gldtoa.h (ldtoa_strtold): Tidy up.
* pge-boot/Glibc.cc (libc_read): Use size_t as the return type.
(libc_write): Ditto.
(libc_strlen): Ditto.
(libc_perror): Replace char with const char.
(libc_printf): Replace char with const char.
Cast parameter to index using const_cast.
(libc_snprintf): Replace char with void.
Cast parameter to index using const_cast.
(libc_malloc): Replace parameter type with size_t.
(libc_memcpy): Replace third parameter type with size_t.
(libc_open): Use varargs.
* pge-boot/Glibc.h (libc_perror): Add _string_high parameter.
* pge-boot/Gpge.cc: Regenerate.
* pge-boot/Gtermios.cc (SetFlag): Replace return type with bool.
(_M2_termios_init): Add correct parameters.
(_M2_termios_fini): Ditto.
* pge-boot/m2rts.h (M2RTS_RegisterModule_Cstr): New define.
(M2RTS_RegisterModule): Replace const char with void.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
PR fortran/116221
gcc/fortran/ChangeLog:
* intrinsic.cc (gfc_get_intrinsic_sub_symbol): Initialize variable.
* symbol.cc (gfc_get_ha_symbol): Likewise.
|
|
gcc/
* config/avr/avr.opt (mlra): Set Undocumented flag.
|
|
It returns lra_in_progress or reload_in_progress, depending on avr_lra_p.
Currently, direct use of ra_in_progress() is only made with -mlog=.
gcc/
* config/avr/avr.cc (ra_in_progress): New static function.
(avr_legitimate_address_p, avr_addr_space_legitimate_address_p)
(extra_constraint_Q): Use it with -mlog=.
|
|
After r14-811 "call *nop@GOTPCREL(%rip)" is only generated with
-mno-direct-extern-access even if --enable-default-pie. So the r13-1614
change to this file is not valid anymore.
gcc/testsuite/ChangeLog:
PR testsuite/70150
* gcc.target/i386/fentryname3.c (dg-final): Revert r13-1614
change.
|
|
For a --enable-default-pie build, using -fno-pic (for compiler) but
not -no-pie (for linker) triggers some linker warnings counted as
excess errors:
/usr/bin/ld: /tmp/cc8MgxiR.o: warning: relocation in read-only
section `.text.startup'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
gcc/testsuite/ChangeLog:
PR testsuite/70150
* gcc.target/i386/pr113689-1.c (dg-options): Add -no-pie.
|
|
It is using a class now with a different name.
gcc/ChangeLog:
* doc/cfg.texi: Fix references to dom_walker.
|
|
The Close() procedure in MemStream is missing a guard to prevent it from
printing in non-debug mode.
gcc/gm2:
* gm2-libs-iso/MemStream.mod: Guard debug output.
Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
|