Age | Commit message | Author | Files | Lines |
|
[PR107424]
Before this commit, gfortran with OpenMP produced the following code
for 'do i = 1,10,2':
for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
i = count.0 * 2 + 1;
While such an inner loop can be collapsed, a non-rectangular one could not.
With this commit and for all constant loop steps, a simple loop such
as 'for (i = 1; i <= 10; i = i + 2)' is created. (Previously this was
done only for the constant steps 1 and -1.)
A constant step makes the loop direction (increasing/decreasing) known,
which is required for the loop condition.
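To illustrate the two loop shapes described above, here is a small Python
sketch. It is not the code gfortran emits; the function names are made up
for this illustration. The first loop function mirrors the old count-based
shape, the second the new direct shape, where the sign of the constant step
selects the comparison.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero, as Fortran integer division does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def count_based_loop(start, end, step):
    """Old shape: iterate a helper counter and derive i from it."""
    count = max(trunc_div(end - start + step, step), 0)
    return [count0 * step + start for count0 in range(count)]


def direct_loop(start, end, step):
    """New shape: iterate i itself; the step's sign picks '<=' or '>='."""
    assert step != 0  # a zero step has no direction
    values = []
    i = start
    while (i <= end) if step > 0 else (i >= end):
        values.append(i)
        i += step
    return values


print(count_based_loop(1, 10, 2))  # [1, 3, 5, 7, 9]
print(direct_loop(1, 10, 2))       # [1, 3, 5, 7, 9]
```

Both shapes visit the same values as long as 'i' does not overflow, which
Python integers never do; that is exactly the assumption the new code relies on.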
The new code is only valid if one assumes no overflow of the loop variable.
However, the Fortran standard can be read as requiring the user to ensure
this. Namely, the Fortran standard requires (F2023, 10.1.5.2.4):
"The execution of any numeric operation whose result is not defined by
the arithmetic used by the processor is prohibited."
And, for DO loops, F2023's "11.1.7.4.3 The execution cycle" has the
following: the number of loop iterations is handled by an iteration count,
which would permit code like 'do i = huge(i)-5, huge(i),4'. However,
in step (3), this count is not only decremented by one but also:
"... The DO variable, if any, is incremented by the value of the
incrementation parameter m3."
And for the example above, 'i' would be 'huge(i)+3' in the last
execution cycle, which exceeds the largest model number and should
render the example invalid.
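The arithmetic of that example can be traced with a short Python sketch of
the iteration-count model. It assumes a 32-bit default integer for huge(i)
and only covers positive steps; the helper name is hypothetical.

```python
HUGE = 2**31 - 1  # huge(i) for a 32-bit integer; the kind is an assumption


def trace_do_loop(m1, m2, m3):
    """Return (values of i seen by the loop body, value of i after the loop)."""
    assert m3 > 0  # this sketch only covers positive steps
    count = max((m2 - m1 + m3) // m3, 0)  # iteration count
    i = m1
    seen = []
    while count > 0:
        seen.append(i)
        count -= 1  # step (3): decrement the iteration count ...
        i += m3     # ... and increment the DO variable by m3
    return seen, i


seen, final_i = trace_do_loop(HUGE - 5, HUGE, 4)
print([v - HUGE for v in seen])  # [-5, -1]
print(final_i - HUGE)            # 3
```

The body only ever sees huge(i)-5 and huge(i)-1, but the increment in the
last execution cycle produces huge(i)+3, which a 32-bit integer cannot hold.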
PR fortran/107424
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_nonrect_loop_expr): Accept all
constant loop steps.
(gfc_trans_omp_do): Likewise; use sign to determine
loop direction.
libgomp/ChangeLog:
* libgomp.texi (Impl. Status 5.0): Add link to new PR110735.
* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: Enable
commented tests.
* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: Remove
test file; tests are in non-rectangular-loop-1.f90.
* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: Change
testcase to use a non-constant step to retain the 'sorry' test.
* testsuite/libgomp.fortran/non-rectangular-loop-6.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/linear-2.f90: Update dump to remove
the additional count variable.
(cherry picked from commit 85da0b40538fb0d17d89de1e7905984668e3dfef)
|
|
Later versions of the static linker support a more flexible flag to
describe the OS, OS version and SDK used to build the code. This
replaces the functionality of '-mmacosx_version_min' (which is now
deprecated, leading to the diagnostic described in the PR).
We now use the platform_version flag when available which avoids the
diagnostic.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/110624
gcc/ChangeLog:
* config/darwin.h (DARWIN_PLATFORM_ID): New.
(LINK_COMMAND_A): Use DARWIN_PLATFORM_ID to pass OS, OS version
and SDK data to the static linker.
(cherry picked from commit 032b5da1fc781bd3c23d9caa72fb09439e7f6f3a)
|
|
The addition of the multiply_defined suppress flag has been handled for some
considerable time now in the Darwin specs; remove it from the testsuite libs.
Avoid duplicates in the specs.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/darwin.h: Avoid duplicate multiply_defined specs on
earlier Darwin versions with shared libgcc.
libstdc++-v3/ChangeLog:
* testsuite/lib/libstdc++.exp: Remove additional flag handled
by Darwin specs.
gcc/testsuite/ChangeLog:
* lib/g++.exp: Remove additional flag handled by Darwin specs.
* lib/obj-c++.exp: Likewise.
(cherry picked from commit 3c776fdf1a825818ad7248d442e846f532574ff7)
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/95947
PR fortran/110658
* trans-expr.cc (gfc_conv_procedure_call): For intrinsic procedures
whose result characteristics depend on the first argument and which
can be of type character, the character length will not be deferred.
gcc/testsuite/ChangeLog:
PR fortran/95947
PR fortran/110658
* gfortran.dg/deferred_character_37.f90: New test.
(cherry picked from commit 95ddd2659849a904509067ec3a2770135149a722)
|
|
This patch is taken from part of the change below, which located a bug in
the RVV vsetvl pass during auto-vectorization.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624523.html
Unfortunately, it is not easy to reproduce this bug with the intrinsic APIs,
but it is worth backporting to GCC 13.
Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Add vl parameter.
(change_vsetvl_insn): Ditto.
(change_insn): Add validate change as well as assert.
(pass_vsetvl::backward_demand_fusion): Allow forward.
|
|
|
|
This fixes a crash when mangling an ADL-enabled call to a template-id
naming an unknown template (as per P0846R0).
PR c++/110524
gcc/cp/ChangeLog:
* mangle.cc (write_expression): Handle TEMPLATE_ID_EXPR
whose template is already an IDENTIFIER_NODE.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/fn-template26.C: New test.
(cherry picked from commit 97ceaa110e1607ec8f4f1223200868e1642f3cc7)
|
|
The matching code lacked a check that we end up with a PHI node
in the loop header. This caused us to match a random PHI argument
now caught by the extra PHI_ARG_DEF_FROM_EDGE checking.
PR tree-optimization/110669
* tree-scalar-evolution.cc (analyze_and_compute_bitop_with_inv_effect):
Check we matched a header PHI.
* gcc.dg/torture/pr110669.c: New testcase.
|
|
|
|
|
|
|
|
The cprop1 pass does not consider the paradoxical subreg and for (insn 22)
claims that it equals 8 elements of HImode by setting a REG_EQUAL note:
(insn 21 19 22 4 (set (reg:V4QI 98)
(mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0 S4 A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
(expr_list:REG_EQUAL (const_vector:V4QI [
(const_int -52 [0xffffffffffffffcc]) repeated x4
])
(nil)))
(insn 22 21 23 4 (set (reg:V8HI 100)
(zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
(expr_list:REG_EQUAL (const_vector:V8HI [
(const_int 204 [0xcc]) repeated x8
])
(expr_list:REG_DEAD (reg:V4QI 98)
(nil))))
We rely on the "undefined" values to have a specific value (from the earlier
REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
need to). That said, the issue isn't the constant folding per se but that
we do not actually constant fold but register an equality that doesn't hold.
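Why the equality does not hold can be seen in a small Python model of
insn 22. This is only an illustration with a hypothetical helper, not
compiler code: the four bytes of the V4QI register are defined, while the
remaining twelve bytes of the paradoxical V16QI view are whatever happens to
be in the register.

```python
def zero_extend_low8(v4qi, upper_bytes):
    """View a 4-byte vector as 16 bytes, select lanes 0..7, zero-extend each."""
    assert len(v4qi) == 4 and len(upper_bytes) == 12
    v16qi = [b & 0xFF for b in v4qi] + [b & 0xFF for b in upper_bytes]
    return v16qi[:8]  # zero-extension keeps the unsigned byte value


defined = [-52] * 4  # (const_int -52) repeated x4, i.e. 0xcc per byte

print(zero_extend_low8(defined, [0x00] * 12))  # [204, 204, 204, 204, 0, 0, 0, 0]
print(zero_extend_low8(defined, [0xCC] * 12))  # [204, 204, 204, 204, 204, 204, 204, 204]
```

Only lanes 0..3 are guaranteed to be 204; the note claiming eight such
elements is true only if the undefined bytes happen to hold 0xcc as well.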
PR target/110206
gcc/ChangeLog:
* fwprop.cc (contains_paradoxical_subreg_p): Move to ...
* rtlanal.cc (contains_paradoxical_subreg_p): ... here.
* rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
* cprop.cc (try_replace_reg): Do not set REG_EQUAL note
when the original source contains a paradoxical subreg.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110206.c: New test.
(cherry picked from commit 1815e313a8fb519a77c94a908eb6dafc4ce51ffe)
|
|
gcc/fortran/ChangeLog:
PR fortran/110288
* symbol.cc (gfc_copy_formal_args_intr): When deriving the formal
argument attributes from the actual ones for intrinsic procedure
calls, take special care of CHARACTER arguments so that we do not
wrongly treat them formally as deferred-length.
gcc/testsuite/ChangeLog:
PR fortran/110288
* gfortran.dg/findloc_10.f90: New test.
(cherry picked from commit 3b2c523ae31b68fc3b8363b458a55eec53a44365)
|
|
This patch enables the c-c++-common/gomp/declare-mapper-3.c test for C.
This was seemingly overlooked in commit 393fd99c90e.
2023-07-14 Julian Brown <julian@codesourcery.com>
gcc/testsuite/
* c-c++-common/gomp/declare-mapper-3.c: Enable for C.
|
|
This patch fixes a bug in non-contiguous 'target update' operations using
the new array-shaping operator for C and C++, processing the dimensions
of the array the wrong way round during the OpenMP lowering pass.
Fortran was also using the wrong ordering, but the second reversal in
omp-low.cc made it produce the correct result.
The C and C++ bug only affected array shapes where the dimension sizes
are different ([X][Y]) - several existing tests used the same value
for both/all dimensions ([X][X]), which masked the problem. Only the
array dimensions (extents) are affected, not e.g. the indices, lengths
or strides for array sections.
This patch reverses the order used in both omp-low.cc and the Fortran
front-end, so the order should now be correct for all supported base
languages.
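A minimal Python sketch of why equal extents masked the problem. It uses
plain row-major offset arithmetic with a made-up helper and is not the
omp-low.cc code: reversing the extents changes the computed element offset
only when the dimension sizes differ.

```python
def row_major_offset(extents, indices):
    """Linear element offset of 'indices' in a row-major array with 'extents'."""
    offset = 0
    for extent, index in zip(extents, indices):
        offset = offset * extent + index
    return offset


# [X][Y] with different sizes: the reversed extents give a different element.
print(row_major_offset([3, 5], [1, 2]))  # 7
print(row_major_offset([5, 3], [1, 2]))  # 5

# [X][X]: reversing the extents changes nothing, so the bug stays hidden.
print(row_major_offset([4, 4], [1, 2]))  # 6
```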
2023-07-14 Julian Brown <julian@codesourcery.com>
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_arrayshape_type): Reverse dimension
ordering for created array type.
gcc/
* omp-low.cc (lower_omp_target): Reverse iteration over array
dimensions.
libgomp/
* testsuite/libgomp.c-c++-common/array-shaping-14.c: New test.
|
|
gcc/ChangeLog:
PR target/101469
* config/sh/sh.md (peephole2): Handle case where eliminated reg
is also used by the address of the following memory operand.
|
|
|
|
PR target/106966
gcc/ChangeLog:
* config/alpha/alpha.cc (alpha_emit_set_long_const):
Always use DImode when constructing long const.
gcc/testsuite/ChangeLog:
* gcc.target/alpha/pr106966.c: New test.
(cherry picked from commit 337649c1660211db733c1ba34ae260b8c66a3578)
|
|
|
|
This patch fixes a bug with the calculation of array bounds in the
metadata for noncontiguous 'target update' directives. We record the
array base address, a bias and the array length to pass to libgomp --
but at present, we use the 'whole array size' for the last, which means
that at runtime we might look up an array with lower bound "base+bias"
and upper bound "base+bias+length", which for non-zero bias will overflow
the actual bounds of the array on the host and will (sometimes) return
an unrelated block instead of the correct one.
The fix is to instead calculate a size for the array that encloses the
elements to be transferred, and is guaranteed to be entirely within the
array (user errors excepted).
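The following Python sketch shows one way such an enclosing size could be
computed; the patch's exact calculation is not reproduced here, and the
helper names and the (start, length, stride) section encoding are
assumptions made for illustration. Offsets are in elements rather than bytes.

```python
from math import prod


def linear_offset(extents, indices):
    """Row-major element offset of 'indices' within an array of 'extents'."""
    offset = 0
    for extent, index in zip(extents, indices):
        offset = offset * extent + index
    return offset


def enclosing_range(extents, sections):
    """sections holds one (start, length, stride) triple per dimension.

    Returns (bias, length): the offset of the first transferred element and
    the number of elements from the first to the last transferred element.
    """
    first = linear_offset(extents, [s for s, _, _ in sections])
    last = linear_offset(extents, [s + (n - 1) * st for s, n, st in sections])
    return first, last - first + 1


extents = [4, 6]
bias, length = enclosing_range(extents, [(1, 2, 1), (2, 2, 2)])
whole = prod(extents)

print(bias, length)            # 8 9
print(bias + whole > whole)    # True: old upper bound runs past the array
print(bias + length <= whole)  # True: enclosing range stays inside
```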
2023-07-11 Julian Brown <julian@codesourcery.com>
gcc/
* omp-low.cc (lower_omp_target): Calculate volume enclosing
transferred elements instead of using whole array size for
noncontiguous 'target update' operations.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Granite Rapids D.
* common/config/i386/i386-common.cc:
(processor_alias_table): Add graniterapids-d.
* common/config/i386/i386-cpuinfo.h
(enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D.
* config.gcc: Add -march=graniterapids-d.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Handle graniterapids-d.
* config/i386/i386.h: (PTA_GRANITERAPIDS_D): New.
* doc/extend.texi: Add graniterapids-d.
* doc/invoke.texi: Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv16.C: Add graniterapids-d.
* gcc.target/i386/funcspec-56.inc: Handle new march.
|
|
|
|
It turns out that adaint.c includes other Windows header files than just
windows.h, so defining WIN32_LEAN_AND_MEAN is not sufficient for it.
gcc/ada/
* adaint.c [_WIN32]: Undefine 'abort' macro.
|
|
On ports with 32-bit long, the test produced excess errors:
gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
'Item::y' exceeds its type
Reported-by: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr110557.cc: Use long long instead of long for
64-bit type.
(test): Remove an unnecessary cast.
(cherry picked from commit 312839653b8295599c63cae90278a87af528edad)
|
|
|
|
Merge up to r13-7553-g1e6a948cd22f2f142cdc828296f78c7af9e283c8 (10th July 2023)
|
|
If a bit-field is signed and it's wider than the output type, we must
ensure the extracted result is sign-extended. But this was not handled
correctly.
For example:
int x : 8;
long y : 55;
bool z : 1;
The vectorized extraction of y was:
vect__ifc__49.29_110 =
MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.27_108];
vect_patt_38.30_112 =
vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
vect_patt_40.32_114 =
VIEW_CONVERT_EXPR<vector(2) long int>(vect_patt_39.31_113);
This is obviously incorrect. This patch implements it as:
vect__ifc__25.16_62 =
MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.14_60];
vect_patt_31.17_63 =
VIEW_CONVERT_EXPR<vector(2) long int>(vect__ifc__25.16_62);
vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;
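The difference between the two sequences can be checked with a Python model
of 64-bit arithmetic. This is a sketch only: it assumes the layout implied
by the mask above (x in bits 0..7, y in bits 8..62, z in bit 63), and the
helper names are invented for this example.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Reinterpret the low 64 bits of 'value' as a signed 64-bit integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def pack(x, y, z):
    """Build the 64-bit word holding x (8 bits), y (55 bits) and z (1 bit)."""
    return (x & 0xFF) | ((y & ((1 << 55) - 1)) << 8) | ((z & 1) << 63)


def extract_old(word):
    """Mask and logical shift: the 55-bit field ends up zero-extended."""
    return to_signed64((word & 9223372036854775552) >> 8)


def extract_new(word):
    """Shift left by 1 to drop bit 63, then arithmetic shift right by 9."""
    shifted = to_signed64(word << 1)
    return shifted >> 9  # '>>' on a negative Python int is arithmetic


word = pack(x=0xFF, y=-1, z=1)
print(extract_old(word))  # 36028797018963967, i.e. 2**55 - 1 instead of -1
print(extract_new(word))  # -1
```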
gcc/ChangeLog:
PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output is sign-extended if necessary.
gcc/testsuite/ChangeLog:
PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.
(cherry picked from commit 63ae6bc60c0f67fb2791991bf4b6e7e0a907d420)
|
|
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/110585
* arith.cc (gfc_compare_expr): Handle equality comparison of constant
complex gfc_expr arguments.
gcc/testsuite/ChangeLog:
PR fortran/110585
* gfortran.dg/findloc_9.f90: New test.
(cherry picked from commit 7ac1581d066a6f3a0d4acf1042a74634258b4966)
|
|
gcc/ChangeLog:
PR c++/110595
* doc/invoke.texi (Warning Options): Fix typo.
|
|
|
|
Restrict the generating of CONST_DECLs for D manifest constants to just
scalars without pointers. It shouldn't happen that a reference to a
manifest constant has not been expanded within a function body during
codegen, but it has been found to occur in older versions of the D
front-end (PR98277), so if the decl of a non-scalar constant is
requested, just return its initializer as an expression.
PR d/108842
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Only emit scalar
manifest constants.
(get_symbol_decl): Don't generate CONST_DECL for non-scalar manifest
constants.
* imports.cc (ImportVisitor::visit (VarDeclaration *)): New method.
gcc/testsuite/ChangeLog:
* gdc.dg/pr98277.d: Add more tests.
* gdc.dg/pr108842.d: New test.
(cherry picked from commit f934c5753849f7c48c6a3abfcd73b8f6008e8371)
|
|
The stmt comparison function for GIMPLE_ASSIGNs for tail merging
still looks like it deals with pre-tuples IL. The following
attempts to fix this, comparing not only the first operand (sic!)
of stmts but all of them, and also comparing the operation code.
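As a toy illustration of the difference (a Python sketch with a hypothetical
'Assign' record standing in for a GIMPLE_ASSIGN; the real gimple_equal_p
also handles valueization and stores, which are omitted here):

```python
from collections import namedtuple

# Hypothetical stand-in for an assignment statement: operation code + operands.
Assign = namedtuple("Assign", ["code", "operands"])


def equal_first_operand_only(s1, s2):
    """Simplified picture of the old check: look at the first operand only."""
    return s1.operands[0] == s2.operands[0]


def equal_code_and_all_operands(s1, s2):
    """Simplified picture of the new check: same code and same operands."""
    return s1.code == s2.code and s1.operands == s2.operands


add = Assign("PLUS_EXPR", ("a_1", "b_2"))
sub = Assign("MINUS_EXPR", ("a_1", "c_3"))

print(equal_first_operand_only(add, sub))     # True: wrongly considered equal
print(equal_code_and_all_operands(add, sub))  # False
```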
PR tree-optimization/110556
* tree-ssa-tail-merge.cc (gimple_equal_p): Check
assign code and all operands of non-stores.
* gcc.dg/torture/pr110556.c: New testcase.
(cherry picked from commit 7b16686ef882ab141276f0e36a9d4ce1d755f64a)
|
|
In this PR we face the issue that LIM speculates a load when
hoisting it out of the loop (since it knows it cannot trap).
Unfortunately this exposes undefined behavior when the load
accesses memory with the wrong dynamic type. This later
makes PRE use that representation instead of the original
which accesses the same memory location but using a different
dynamic type leading to a wrong disambiguation of that
original access against another and thus a wrong-code transform.
Fortunately there already is code in PRE dealing with a similar
situation for code hoisting but that left a small gap which
when fixed also fixes the wrong-code transform in this bug even
if it doesn't address the underlying issue of LIM speculating
that load.
The upside is this fix is trivially safe to backport and chances
of code generation regressions are very low.
PR tree-optimization/110515
* tree-ssa-pre.cc (compute_avail): Make code dealing
with hoisting loads with different alias-sets more
robust.
* g++.dg/opt/pr110515.C: New testcase.
(cherry picked from commit 9f4f833455bb35c11d03e93f802604ac7cd8b740)
|
|
Feeding unoptimized IL can cause predicate normalization to simplify
things so that a predicate becomes true or false. The following
re-orders the early exit in that case to come after simplification
and normalization to take care of that.
PR tree-optimization/110392
* gimple-predicate-analysis.cc (uninit_analysis::is_use_guarded):
Do early exits on true/false predicate only after normalization.
(cherry picked from commit ab6eac20f00761695c69b555f6b0a026bc25770d)
|
|
The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements. But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.
Now, reduction analysis has not yet been performed when optimizing
permutations so we have to resort to checking that ourselves.
PR tree-optimization/110381
* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.
* gcc.dg/vect/pr110381.c: New testcase.
(cherry picked from commit 53d6f57c1b20c6da52aefce737fb7d5263686ba3)
|
|
This patch fixes the following issue, which happens on GCC-13:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110560
This patch should be backported to GCC-13.
GCC-14 has rewritten this function, so there is no issue.
gcc/ChangeLog:
PR target/110560
* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Fix bug.
|
|
Ensure that container aggregate expressions are expanded as
such and not as records even if the type of the expression is a
record.
gcc/ada/
* exp_aggr.adb (Expand_N_Aggregate): Ensure that container
aggregate expressions do not get expanded as records but instead
as container aggregates.
|
|
This just applies the same fix to Expand_Array_Aggregate as the one that was
recently applied to Convert_To_Assignments.
gcc/ada/
* exp_aggr.adb (Convert_To_Assignments): Tweak comment.
(Expand_Array_Aggregate): Do not delay the expansion if the parent
node is a container aggregate.
|
|
Initializing a vector using
Vec : V.Vector := [Some_Type'(Some_Abstract_Type with F => 0)];
may crash the compiler. The expander marks the N_Extension_Aggregate for
delayed expansion which never happens and incorrectly ends up in gigi.
The delayed expansion is needed for nested aggregates, which the
original code is testing for, but container aggregates are handled
differently.
Such assignments to container aggregates are later transformed into
procedure calls to the procedures named in the Aggregate aspect
definition, for which the delayed expansion is not required/expected.
gcc/ada/
* exp_aggr.adb (Convert_To_Assignments): Do not mark node for
delayed expansion if parent type has the Aggregate aspect.
* sem_util.adb (Is_Container_Aggregate): Move...
* sem_util.ads (Is_Container_Aggregate): ... here and make it
public.
|
|
|
|
This patch allows 'declare mapper' mappers to be used on 'omp target
data', 'omp target enter data' and 'omp target exit data' directives.
For each of these, only explicit mappings are supported, unlike for
'omp target' directives where implicit uses of variables inside an
offload region might trigger mappers also.
C, C++ and Fortran are all supported.
The patch also adjusts 'map kind decay' to match OpenMP 5.2 semantics,
which is particularly important with regard to 'exit data' operations.
2023-07-06 Julian Brown <julian@codesourcery.com>
gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
C_ORT_OMP_EXIT_DATA.
(c_omp_instantiate_mappers): Add region type parameter.
* c-omp.cc (omp_split_map_kind, omp_join_map_kind,
omp_map_decayed_kind): New functions.
(omp_instantiate_mapper): Add ORT parameter. Implement map kind decay
for instantiated mapper clauses.
(c_omp_instantiate_mappers): Add ORT parameter, pass to
omp_instantiate_mapper.
gcc/c/
* c-parser.cc (c_parser_omp_target_data): Instantiate mappers for
'omp target data'.
(c_parser_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(c_parser_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.
(c_parser_omp_target): Add c_omp_region_type argument to
c_omp_instantiate_mappers call.
* c-tree.h (c_omp_instantiate_mappers): Remove spurious prototype.
gcc/cp/
* parser.cc (cp_parser_omp_target_data): Instantiate mappers for 'omp
target data'.
(cp_parser_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(cp_parser_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.
(cp_parser_omp_target): Add c_omp_region_type argument to
c_omp_instantiate_mappers call.
* pt.cc (tsubst_omp_clauses): Instantiate mappers for OMP regions other
than just C_ORT_OMP_TARGET.
(tsubst_expr): Update call to tsubst_omp_clauses for OMP_TARGET_UPDATE,
OMP_TARGET_ENTER_DATA, OMP_TARGET_EXIT_DATA stanza.
* semantics.cc (cxx_omp_map_array_section): Avoid calling
build_array_ref for non-array/non-pointer bases (error reported
already).
gcc/fortran/
* trans-openmp.cc (omp_split_map_op, omp_join_map_op,
omp_map_decayed_kind): New functions.
(gfc_trans_omp_instantiate_mapper): Add CD parameter. Implement map
kind decay.
(gfc_trans_omp_instantiate_mappers): Add CD parameter. Pass to above
function.
(gfc_trans_omp_target_data): Instantiate mappers for 'omp target data'.
(gfc_trans_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(gfc_trans_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.
gcc/testsuite/
* c-c++-common/gomp/declare-mapper-15.c: New test.
* c-c++-common/gomp/declare-mapper-16.c: New test.
* g++.dg/gomp/declare-mapper-1.C: Adjust expected scan output.
* gfortran.dg/gomp/declare-mapper-22.f90: New test.
* gfortran.dg/gomp/declare-mapper-23.f90: New test.
|
|
|
|
This change fixes PR target/105325. PR target/105325 is a bug where an
invalid lwa instruction is generated due to power10 fusion of a load
instruction to a GPR and a compare immediate instruction with the immediate
being -1, 0, or 1.
In some cases, when the load instruction is done, the GCC compiler would
generate a load instruction with an offset that was too large to fit into the
normal load instruction.
In particular, loads from the stack might originally have a small offset, so
that the load is not a prefixed load. However, after the stack is set up, and
register allocation has been done, the offset now is large enough that we would
have to use a prefixed load instruction.
The support for prefixed loads did not consider that patterns with a fused load
and compare might have a prefixed address. Without this support, the proper
prefixed load won't be generated.
In the original code, when the split2 pass is run after reload has finished the
ds_form_mem_operand predicate that was used for lwa and ld no longer returns
true. When the pattern was created, ds_form_mem_operand recognized the insn as
being valid since the offset was small. But after register allocation,
ds_form_mem_operand did not return true. Because it didn't return true, the
insn could not be split. Since the insn was not split and the prefix support
did not indicate a prefixed instruction was used, the wrong load is generated.
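The offset ranges involved can be sketched in Python as follows. This is a
simplification for illustration, not the rs6000 predicates themselves, and
the function names are hypothetical: a DS-form load such as ld or lwa takes
a signed 16-bit displacement whose low two bits are zero, while a prefixed
load takes a signed 34-bit displacement.

```python
def fits_ds_form(offset):
    """DS-form: signed 16-bit displacement that is a multiple of 4."""
    return -(1 << 15) <= offset < (1 << 15) and offset % 4 == 0


def fits_prefixed(offset):
    """Prefixed load: signed 34-bit displacement."""
    return -(1 << 33) <= offset < (1 << 33)


def needs_prefixed_load(offset):
    """Simplified rule: not encodable as DS-form, but within prefixed range."""
    return not fits_ds_form(offset) and fits_prefixed(offset)


# A stack slot may start with a small offset and grow once the frame is laid out.
print(needs_prefixed_load(16))      # False: fits the plain lwa encoding
print(needs_prefixed_load(100000))  # True: too large for DS-form
```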
The solution involves:
1) Don't use ds_form_mem_operand for ld and lwa, always use
non_update_memory_operand.
2) Delete ds_form_mem_operand since it is no longer used.
3) Use the "YZ" constraints for ld/lwa instead of "m".
4) If we don't need to sign extend the lwa, convert it to lwz, and use
cmpwi instead of cmpdi. Adjust the insn name to reflect the code
generated.
5) Ensure that the insn using lwa will be recognized as having a prefixed
operand (and hence the insn length will be 16 bytes instead of 8
bytes).
5a) Set the prefixed and maybe_prefix attributes to know that
fused_load_cmpi are also load insns;
5b) In the case where we are just setting CC and not using the memory
afterward, set the clobber to use a DI register, and put an
explicit sign_extend operation in the split;
5c) Set the sign_extend attribute to "yes" for lwa.
5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
ensure that lwa is treated as a ds-form instruction and not as
a d-form instruction (i.e. lwz).
6) Add a new test case for this case.
7) Adjust the insn counts in fusion-p10-ldcmpi.c. Because we are no
longer using ds_form_mem_operand, the ld and lwa instructions will fuse
x-form (reg+reg) addresses in addition to ds-form (reg+offset or reg).
2023-06-23 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/105325
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
allowed prefixed lwa to be generated.
* config/rs6000/fusion.md: Regenerate.
* config/rs6000/predicates.md (ds_form_mem_operand): Delete.
* config/rs6000/rs6000.md (prefixed attribute): Add support for load
plus compare immediate fused insns.
(maybe_prefixed): Likewise.
gcc/testsuite/
PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn counts.
(cherry picked from commit 370de1488a9a49956c47e5ec8c8f1489b4314a34)
Co-Authored-By: Aaron Sawdey <acsawdey@linux.ibm.com>
|
|
This makes the code more readable, more digestible, more maintainable,
more extensible. That kind of thing. It does that by pulling things
apart a bit, but also making what stays together more cohesive lumps.
The original function was a bunch of loops and early-outs, and then
quite a bit of stuff done per iteration, with the iterations essentially
independent of each other. This patch moves the stuff done for one
iteration to a new _one function.
The second big thing is the stuff printed to the .md file is done in
"here documents" now, which is a lot more readable than having to quote
and escape and double-escape pieces of text. Whitespace inside the
here-document is significant (will be printed as-is), which is a bit
awkward sometimes, or might take some getting used to, but it is also
one of the benefits of using them.
Local variables are declared at first use (or close to first use).
There also shouldn't be many at all, often you can write easier to
read and manage code by omitting to name something that is hard to name
in the first place.
Finally some things are done in more typical, more modern, and tighter
Perl style, for example REs in "if"s or "qw" for lists of constants.
2023-06-06 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): New, rewritten and
split out from...
(gen_ld_cmpi_p10): ... this.
(cherry picked from commit 19e5bf1d5fac00da0b8cd4144d5651b2979d8308)
|
|
The following replaces the simplistic gimple_uses_undefined_value_p
with the conservative mark_ssa_maybe_undefs approach as already
used by LIM and IVOPTs. This is to avoid exposing an unconditional
uninitialized read on a path from entry by if-combine.
PR tree-optimization/110228
* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute):
Mark SSA may-undefs.
(bb_no_side_effects_p): Check stmt uses for undefs.
* gcc.dg/torture/pr110228.c: New testcase.
* gcc.dg/uninit-pr101912.c: Un-XFAIL.
(cherry picked from commit b083203f053f1666e9cc1ded2abdf4e1688d1ec0)
|
|
|
|
Enable ENQCMD and UINTR for march=sierraforest according to Intel ISE
https://cdrdv2.intel.com/v1/dl/getContent/671368
gcc/ChangeLog
* config/i386/i386.h: Add PTA_ENQCMD and PTA_UINTR to PTA_SIERRAFOREST.
* doc/invoke.texi: Update new isa to march=sierraforest and grandridge.
|