Age | Commit message (Collapse) | Author | Files | Lines |
|
When iterating over actual and formal parameters, we should use
First_Actual/Next_Actual and not simply First/Next, because the
order of actual parameters might be different than the order of
formal parameters obtained with First_Formal/Next_Formal.
This patch fixes a glitch in validity checks for actual parameters
and applies the same fix to other misuses of First/Next as well.
gcc/ada/
* checks.adb (Ensure_Valid): Use First_Actual/Next_Actual.
* exp_ch6.adb (Is_Direct_Deep_Call): Likewise.
* exp_util.adb (Type_Of_Formal): Likewise.
* sem_util.adb (Is_Container_Element): Likewise; cleanup
membership test by using a subtype.
|
|
gcc/ada/
* sem_ch13.adb (Analyze_One_Aspect): Temporarily remove reporting
an error when the new aspect is set to True and the extensions are
not enabled.
|
|
The compiler does not report an error when 'access is applied to
a non-aliased class-wide interface type object.
gcc/ada/
* exp_util.ads (Is_Expanded_Class_Wide_Interface_Object_Decl): New
subprogram.
* exp_util.adb (Is_Expanded_Class_Wide_Interface_Object_Decl):
ditto.
* sem_util.adb (Is_Aliased_View): Handle expanded class-wide type
object declaration.
* checks.adb (Is_Aliased_Unconstrained_Component): Protect the
frontend against calling Is_Aliased_View with Empty. Found working
on this issue.
|
|
This patch adds support for a new GNAT aspect/pragma that modifies
the semantics of dispatching primitives. When a tagged type has
this aspect/pragma, only subprograms that have the first parameter
of this type will be considered dispatching primitives; this new
pragma/aspect is inherited by all descendant types.
gcc/ada/
* aspects.ads (Aspect_First_Controlling_Parameter): New aspect.
Defined as implementation defined aspect that has a static boolean
value and it is converted to pragma when the value is True.
* einfo.ads (Has_First_Controlling_Parameter): New attribute.
* exp_ch9.adb (Build_Corresponding_Record): Propagate the aspect
to the corresponding record type.
(Expand_N_Protected_Type_Declaration): Analyze the inherited
aspect to add the pragma.
(Expand_N_Task_Type_Declaration): ditto.
* freeze.adb (Warn_If_Implicitly_Inherited_Aspects): New
subprogram.
(Has_First_Ctrl_Param_Aspect): New subprogram.
(Freeze_Record_Type): Call Warn_If_Implicitly_Inherited_Aspects.
(Freeze_Subprogram): Check illegal subprograms of tagged types and
interface types that have this new aspect.
* gen_il-fields.ads (Has_First_Controlling_Parameter): New entity
field.
* gen_il-gen-gen_entities.adb (Has_First_Controlling_Parameter):
The new field is a semantic flag.
* gen_il-internals.adb (Image): Add
Has_First_Controlling_Parameter.
* par-prag.adb (Prag): No action for
Pragma_First_Controlling_Parameter since processing is handled
entirely in Sem_Prag.
* sem_ch12.adb (Validate_Private_Type_Instance): When the generic
formal has this new aspect, check that the actual type also has
this aspect.
* sem_ch13.adb (Analyze_One_Aspect): Check that the aspect is
applied to a tagged type or a concurrent type.
* sem_ch3.adb (Analyze_Full_Type_Declaration): Derived tagged
types inherit this new aspect, and also from their implemented
interface types.
(Process_Full_View): Propagate the aspect to the full view.
* sem_ch6.adb (Is_A_Primitive): New subprogram; used to factor
code and also clarify detection of primitives.
* sem_ch9.adb (Check_Interfaces): Propagate this new aspect to the
type implementing interface types.
* sem_disp.adb (Check_Controlling_Formals): Handle tagged type
that has the aspect and has subprograms overriding primitives of
tagged types that lack this aspect.
(Check_Dispatching_Operation): Warn on dispatching primitives
disallowed by this new aspect.
(Has_Predefined_Dispatching_Operation_Name): New subprogram.
(Find_Dispatching_Type): Handle dispatching functions of tagged
types that have the new aspect.
(Find_Primitive_Covering_Interface): For primitives of tagged
types that have the aspect and override a primitive of a parent
type that does not have the aspect, we must temporarily unset
attribute First_Controlling_ Parameter to properly check
conformance.
* sem_prag.ads (Aspect_Specifying_Pragma): Add new pragma.
* sem_prag.adb (Pragma_First_Controlling_Parameter): Handle new
pragma.
* snames.ads-tmpl (Name_First_Controlling_Parameter): New name.
* warnsw.ads (Warn_On_Non_Dispatching_Primitives): New warning.
* warnsw.adb (Warn_On_Non_Dispatching_Primitives): New warning;
not set by default when GNAT_Mode warnings are enabled, nor when
all warnings are enabled (-gnatwa).
|
|
gcc/fortran:
* invoke.texi (Code Gen Options): Add a missing word.
|
|
The generic GPL link redirects to GPL v3.0 right now, but may redirect
to a different version at one point. Specifically link to the version we
are using
gcc:
* doc/gm2.texi (License): Specifically link to GPL v3.0
|
|
This patch removes an unnecessary view_convert in trans_associate to
prevent hard to find runtime errors in the future. The view_convert was
erroneously introduced not understanding why ranks of the arrays to
assign are different. The ranks are fixed by PR86468 now and the
view_convert is obsolete.
gcc/fortran/ChangeLog:
PR fortran/86468
* trans-stmt.cc (trans_associate_var): Remove superfluous
view_convert.
|
|
The testcase cc.dg/vect/vect-mod-var.c has an division by 0
which is undefined. On some targets (aarch64), the scalar and
the vectorized version, the result of division by 0 is the same.
While on other targets (x86), we get a SIGFAULT. On other targets (powerpc),
the results are different.
The fix is to make sure the testcase does not test division by 0 (or really mod by 0).
Pushed as obvious after testing on x86_64-linux-gnu to make sure the testcase passes
now.
PR testsuite/116461
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-mod-var.c: Change the initialization loop so that
`b[i]` is never 0. Use 1 in those places.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
|
|
[PR116464]
This is an obvious fix to the gcc.dg/torture/pr116420.c testcase which simplier
changes from plain `char` to `signed char` so it works on targets where plain char defaults
to unsigned.
Pushed as obvious after a quick test for aarch64-linux-gnu to make sure the testcase
passes now.
PR testsuite/116464
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr116420.c:
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The DF framework provides us a way to run dataflow problems on sub-graphs.
Naturally a bitmap of interesting blocks is passed into those routines. At a
confluence point, the DF framework will not mark a block for re-processing if
it's not in that set of interesting blocks.
When ext-dce sets up that set of interesting blocks it's using the wrong
counter. ie, it's using n_basic_blocks rather than last_basic_block. If there
are holes in the block indices, some number of blocks won't get marked as
interesting.
In this case the block needing reprocessing has an index higher than
n_basic_blocks. It never gets reprocessed and the newly found live chunks
don't propagate further up the CFG -- ultimately resulting in a pseudo
appearing to have only the low 8 bits live, when in fact the low 32 bits are
actually live.
Fixed in the obvious way, by using last_basic_block instead.
Bootstrapped and regression tested on x86_64. Pushing to the trunk.
PR rtl-optimization/116420
gcc/
* ext-dce.cc (ext_dce_init): Fix loop iteration when setting up the
interesting block for DF to analyze.
gcc/testsuite
* gcc.dg/torture/pr116420.c: New test.
|
|
libstdc++-v3/ChangeLog:
* testsuite/25_algorithms/contains/1.cc: Verify value of
__cpp_lib_ranges_contains.
* testsuite/25_algorithms/find_last/1.cc: Verify value of
__cpp_lib_ranges_find_last.
* testsuite/26_numerics/iota/2.cc: Verify value of
__cpp_lib_ranges_iota.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
|
|
The patch streams out VOIDmode for aggregate types with offloading enabled,
and recomputes appropriate TYPE_MODE and DECL_MODE while streaming-in on accel
side. The rationale for this change is to avoid streaming out host-specific
modes that may be used for aggregate types, which may not be representable on
the accelerator. For eg, AArch64 uses OImode for ARRAY_TYPE whose size is 256-bits,
and nvptx doesn't have OImode, and thus ends up emitting an error from
lto_input_mode_table.
gcc/ChangeLog:
* lto-streamer-in.cc: (lto_read_tree_1): Set DECL_MODE (expr) to
TREE_TYPE (TYPE_MODE (expr)) if TREE_TYPE (expr) is aggregate type and
offloading is enabled.
* stor-layout.cc (layout_type): Move computation of mode for
ARRAY_TYPE from ...
(compute_array_mode): ... to here.
* stor-layout.h (compute_array_mode): Declare.
* tree-streamer-in.cc: Include stor-layout.h.
(unpack_ts_common_value_fields): Call compute_array_mode if offloading
is enabled.
* tree-streamer-out.cc (pack_ts_fixed_cst_value_fields): Stream out
VOIDmode if decl has aggregate type and offloading is enabled.
(pack_ts_type_common_value_fields): Stream out VOIDmode for aggregate
type if offloading is enabled.
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
|
|
The stack-clash code is generating wrong cfi directives in
riscv_v_adjust_scalable_frame because REG_CFA_DEF_CFA has a different
encoding than REG_FRAME_RELATED_EXPR, this patch fixes the offset sign
in prologue and starts using REG_CFA_DEF_CFA in the epilogue.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_v_adjust_scalable_frame): Add
epilogue code for stack-clash and fix prologue cfi note.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/stack-check-cfa-3.c: Fix the expected output.
|
|
Algorithms that are generalized to take projections typically default the
projection to std::identity, which is equivalent to no projection at all.
In that case, I believe we could shortcut the projection logic to return
the iterator unchanged rather than wrapping it. This should reduce compile
times especially after P2609R3 which made the indirect invocability
concepts more expensive to check when actual projections are involved.
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h (__detail::__projected): Define
an optimized partial specialization for when the projection is
std::identity.
* testsuite/24_iterators/indirect_callable/projected.cc: Verify the
optimization.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
|
|
This implements the changes of this C++26 paper as a DR against C++20.
In passing this patch removes the std/ranges/version_c++23.cc test which
is now mostly obsolete after the version.def FTM refactoring, and instead
expands the __cpp_lib_ranges checks in another test so that it verifies
the exact value of the FTM on a per language version basis.
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h (indirectly_unary_invocable):
Relax as per P2997R1.
(indirectly_regular_unary_invocable): Likewise.
(indirect_unary_predicate): Likewise.
(indirect_binary_predicate): Likewise.
(indirect_equivalence_relation): Likewise.
(indirect_strict_weak_order): Likewise.
* include/bits/version.def (ranges): Update value for C++26.
* include/bits/version.h: Regenerate.
* testsuite/24_iterators/indirect_callable/p2997r1.cc: New test.
* testsuite/std/ranges/version_c++23.cc: Remove.
* testsuite/std/ranges/headers/ranges/synopsis.cc: Refine the
__cpp_lib_ranges checks.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
|
|
This implements the changes of this C++23 paper as a DR against C++20.
Note that after the later P2538R1 "ADL-proof std::projected" (which we
already implement), we can't use a simple partial specialization to match
specializations of the 'projected' alias template. So instead we identify
such specializations using a pair of distinguishing member aliases.
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h (__detail::__indirect_value):
Define.
(__indirect_value_t): Define as per P2609R3.
(iter_common_reference_t): Adjust as per P2609R3.
(indirectly_unary_invocable): Likewise.
(indirectly_regular_unary_invocable): Likewise.
(indirect_unary_predicate): Likewise.
(indirect_binary_predicate): Likewise.
(indirect_equivalence_relation): Likewise.
(indirect_strict_weak_order): Likewise.
(__detail::__projected::__type): Define member aliases
__projected_Iter and __projected_Proj providing the
template arguments of the current specialization.
* include/bits/version.def (ranges): Update value.
* include/bits/version.h: Regenerate.
* testsuite/24_iterators/indirect_callable/p2609r3.cc: New test.
* testsuite/std/ranges/version_c++23.cc: Update expected value
of __cpp_lib_ranges macro.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
|
|
This hook allows the BFD linker plugin to distinguish calls to
claim_file_handler that know the object is being used by the linker
(from ldmain.c:add_archive_element), from calls that don't know it's
being used by the linker (from elf_link_is_defined_archive_symbol); in
the latter case, the plugin should avoid including the unused LTO archive
members in link output. To get the proper support for archives with LTO
common symbols, the linker fix
commit a6f8fe0a9e9cbe871652e46ba7c22d5e9fb86208
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Aug 14 20:50:02 2024 -0700
lto: Don't include unused LTO archive members in output
is required.
PR lto/116361
* lto-plugin.c (claim_file_handler_v2): Rename claimed to
can_be_claimed. Include the LTO object only if it is known to
be included in link output.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The problem here was a missing save_expr around arg0 since
it is used twice, once in REALPART_EXPR and once in IMAGPART_EXPR.
Thia adds the save_expr and reformats the code slightly so it is a
little easier to understand. It excludes the case when arg0 is
a COMPLEX_EXPR since in that case we'll end up with the distinct
real and imaginary parts. This is important to retain early
optimization in some testcases.
Bootstapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/116454
gcc/ChangeLog:
* fold-const.cc (fold_binary_loc): Fix `a * +-1i`
by wrapping arg0 with save_expr when it is not COMPLEX_EXPR.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr116454-1.c: New test.
* gcc.dg/torture/pr116454-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Co-Authored-By: Richard Biener <rguenther@suse.de>
|
|
Single argument static_assert is C++17 only.
libcpp/ChangeLog:
* lex.cc(search_line_ssse3): fix static_assert to use 2 arguments.
|
|
aarch64-autovec-preference=N
The param aarch64-autovec-preference=N is a useful tool for testing
auto-vectorisation in GCC as it allows the user to force a particular
strategy. So far, N could be a numerical value between 0 and 4.
This patch replaces the numerical values by more user-friendly
names to distinguish the options.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Ok for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR target/116365
* config/aarch64/aarch64-opts.h
(enum aarch64_autovec_preference_enum): New enum.
* config/aarch64/aarch64.cc (aarch64_cmp_autovec_modes):
Change numerical to enum values.
(aarch64_autovectorize_vector_modes): Change numerical to enum
values.
(aarch64_vector_costs::record_potential_advsimd_unrolling):
Change numerical to enum values.
* config/aarch64/aarch64.opt: Change param type to enum.
* doc/invoke.texi: Update documentation.
gcc/testsuite/
PR target/116365
* gcc.target/aarch64/autovec_param_asimd-only.c: New test.
* gcc.target/aarch64/autovec_param_default.c: Likewise.
* gcc.target/aarch64/autovec_param_prefer-asimd.c: Likewise.
* gcc.target/aarch64/autovec_param_prefer-sve.c: Likewise.
* gcc.target/aarch64/autovec_param_sve-only.c: Likewise.
* gcc.target/aarch64/neoverse_v1_2.c: Update parameter value.
* gcc.target/aarch64/neoverse_v1_3.c: Likewise.
* gcc.target/aarch64/sve/cond_asrd_1.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_5.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_5.c: Likewise.
* gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
* gcc.target/aarch64/sve/pr98268-1.c: Likewise.
* gcc.target/aarch64/sve/pr98268-2.c: Likewise.
|
|
This affects only the RISC-V targets, where the compiler options
-gvariable-location-views and consequently also -ginline-points
are disabled by default, which is unexpected and disables some
useful features of the generated debug info.
Due to a bug in the gas assembler the .loc statement
is not usable to generate location view debug info.
That is detected by configure:
configure:31500: checking assembler for dwarf2 debug_view support
configure:31509: .../riscv-unknown-elf/bin/as -o conftest.o conftest.s >&5
conftest.s: Assembler messages:
conftest.s:5: Error: .uleb128 only supports constant or subtract expressions
conftest.s:6: Error: .uleb128 only supports constant or subtract expressions
configure:31512: $? = 1
configure: failed program was
.file 1 "conftest.s"
.loc 1 3 0 view .LVU1
nop
.data
.uleb128 .LVU1
.uleb128 .LVU1
configure:31523: result: no
This results in dwarf2out_as_locview_support being set to false,
and that creates a sequence of events, with the end result that
most inlined functions either have no DW_AT_entry_pc, or one
with a wrong entry pc value.
But the location views can also be generated without using any
.loc statements, therefore we should enable the option
-gvariable-location-views by default, regardless of the status
of -gas-locview-support.
Note however, that the combination of the following compiler options
-g -O2 -gvariable-location-views -gno-as-loc-support
turned out to create invalid assembler intermediate files,
with lots of assembler errors like:
Error: leb128 operand is an undefined symbol: .LVU3
This affects all targets, except RISC-V of course ;-)
and is fixed by the changes in dwarf2out.cc
Finally the .debug_loclists created without assembler locview support
did contain location view pairs like v0000000ffffffff v000000000000000
which is the value from FORCE_RESET_NEXT_VIEW, but that is most likely
not as expected either, so change that as well.
gcc/ChangeLog:
* dwarf2out.cc (dwarf2out_maybe_output_loclist_view_pair,
output_loc_list): Correct handling of -gno-as-loc-support,
use ZERO_VIEW_P to output view number as zero value.
* toplev.cc (process_options): Do not automatically disable
-gvariable-location-views when -gno-as-loc-support or
-gno-as-locview-support is used, instead do automatically
disable -gas-locview-support if -gno-as-loc-support is used.
gcc/testsuite/ChangeLog:
* gcc.dg/debug/dwarf2/inline2.c: Add checks for inline entry_pc.
* gcc.dg/debug/dwarf2/inline6.c: Add -gno-as-loc-support and check
the resulting location views.
|
|
While this already works correctly for the case when an inlined
subroutine contains only one subrange, a redundant DW_TAG_lexical_block
is still emitted when the subroutine has multiple blocks.
Fixes: ac02e5b75451 ("re PR debug/37801 (DWARF output for inlined functions
doesn't always use DW_TAG_inlined_subroutine)")
gcc/ChangeLog:
PR debug/87440
* dwarf2out.cc (gen_inlined_subroutine_die): Handle the case
of multiple subranges correctly.
gcc/testsuite/ChangeLog:
* gcc.dg/debug/dwarf2/inline7.c: New test.
|
|
This patch adds a new vectorization pattern that detects the modulo
operation where the second operand is a variable.
It replaces the statement by division, multiplication, and subtraction.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Ok for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR tree-optimization/101390
* tree-vect-patterns.cc (vect_recog_mod_var_pattern): Add new pattern.
gcc/testsuite/
PR tree-optimization/101390
* gcc.dg/vect/vect-mod-var.c: New test.
* gcc.target/aarch64/sve/mod_1.c: Likewise.
* lib/target-supports.exp: New selector expression.
|
|
Dump ICF-unified decls, thunks, aliases and whatnot along with their
ultimate targets, with edges from the alias to the target.
Add support for dropping the source file's suffix when forming from
dump-base, so that auxiliary files can be scanned, such as the .ci
files generated by -fcallgraph-info, as in the testcase.
for gcc/ChangeLog
* toplev.cc (dump_final_alias_vcg): New.
(dump_final_node_vcg): Dump aliases along with node.
for gcc/testsuite/ChangeLog
* lib/scandump.exp (dump-base): Support {} in dump base suffix
to drop it.
* gcc.dg/callgraph-info-1.c: New.
|
|
* Makefile.in: Regenerate.
* Makefile.tpl: Fix whitespace.
Signed-off-by: Sam James <sam@gentoo.org>
|
|
intermodule supported was dropped in r0-103106-gde6ba7aee152a0 with some
remaining bits for Fortran removed in r14-1696-gecc96eb5d2a0e5.
Remove some small leftovers.
* Makefile.in: Regenerate.
* Makefile.tpl (STAGE1_CONFIGURE_FLAGS): Remove --disable-intermodule.
|
|
When none of mprefer-vector-width, avx256_optimal/avx128_optimal,
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will
set ix86_{move_max,store_max} as max available vector length except
for AVX part.
if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
&& TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
opts->x_ix86_move_max = PVW_AVX512;
else
opts->x_ix86_move_max = PVW_AVX128;
So for -mavx2, vectorizer will choose 256-bit for vectorization, but
128-bit is used for struct copy, there could be a potential STLF issue
due to this "misalign".
The patch fixes that.
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_option_override_internal):
set ix86_{move_max,store_max} to PVW_AVX256 when TARGET_AVX
instead of PVW_AVX128.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pieces-memcpy-10.c: Add -mprefer-vector-width=128.
* gcc.target/i386/pieces-memcpy-6.c: Ditto.
* gcc.target/i386/pieces-memset-38.c: Ditto.
* gcc.target/i386/pieces-memset-40.c: Ditto.
* gcc.target/i386/pieces-memset-41.c: Ditto.
* gcc.target/i386/pieces-memset-42.c: Ditto.
* gcc.target/i386/pieces-memset-43.c: Ditto.
* gcc.target/i386/pieces-strcpy-2.c: Ditto.
* gcc.target/i386/pieces-memcpy-22.c: New test.
* gcc.target/i386/pieces-memset-51.c: New test.
* gcc.target/i386/pieces-strcpy-3.c: New test.
|
|
|
|
This patch would like to add test cases for the unsigned vector
.SAT_TRUNC form 3. Aka:
Form 3:
#define DEF_VEC_SAT_U_TRUNC_FMT_3(NT, WT) \
void __attribute__((noinline)) \
vec_sat_u_trunc_##NT##_##WT##_fmt_3 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT max = (WT)(NT)-1; \
out[i] = in[i] <= max ? (NT)in[i] : (NT)max; \
} \
}
DEF_VEC_SAT_U_TRUNC_FMT_3 (uint32_t, uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-14.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-15.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-17.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-18.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-13.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-14.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-15.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-16.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-17.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-18.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to add test cases for the unsigned vector
.SAT_TRUNC form 2. Aka:
Form 2:
#define DEF_VEC_SAT_U_TRUNC_FMT_2(NT, WT) \
void __attribute__((noinline)) \
vec_sat_u_trunc_##NT##_##WT##_fmt_2 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
WT max = (WT)(NT)-1; \
out[i] = in[i] > max ? (NT)max : (NT)in[i]; \
} \
}
DEF_VEC_SAT_U_TRUNC_FMT_2 (uint32_t, uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-11.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-12.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-7.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-9.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-7.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-8.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-9.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Another RTL checking failure in ext-dce. An easy one to fix this time.
When we optimize an extension we have to go back and cleanup with
SUBREG_PROMOTED state. So we record the destination register into a bitmap as
we make changes, then later do a single pass over the IL fixing any associated
subreg expressions.
The optimization is changing the SET_SRC and largely ignores the destination.
The LHS could be a REG, SUBREG, or ZERO_EXTRACT. If the LHS is a SUBREG or
ZERO_EXTRACT we can just strip them.
Bootstrapped and ran the testsuite with an RTL checking compiler and verified
no ext-dce RTL checking failures tripped. Also bootstrapped and regression
tested x86_64 in the usual way.
Pushing to the trunk.
PR rtl-optimization/116437
gcc/
* ext-dce.cc (ext_dce_try_optimize_insn): Handle SUBREG and
ZERO_EXTRACT destinations.
|
|
The testcase contains a VNx2QImode pseudo that is live across a call
and that cannot be allocated a call-preserved register. LRA quite
reasonably tried to save it before the call and restore it afterwards.
Unfortunately, the target told it to do that in SImode, even though
punning between SImode and VNx2QImode is disallowed by both
TARGET_CAN_CHANGE_MODE_CLASS and TARGET_MODES_TIEABLE_P.
The natural class to use for SImode is GENERAL_REGS, so this led
to an unsalvageable situation in which we had:
(set (subreg:VNx2QI (reg:SI A) 0) (reg:VNx2QI B))
where A needed GENERAL_REGS and B needed FP_REGS. We therefore ended
up in a reload loop.
The hooks above should ensure that this situation can never occur
for incoming subregs. It only happened here because the target
explicitly forced it.
The decision to use SImode for modes smaller than 4 bytes dates
back to the beginning of the port, before 16-bit floating-point
modes existed. I'm not sure whether promoting to SImode really
makes sense for any FPR, but that's a separate performance/QoI
discussion. For now, this patch just disallows using SImode
when it is wrong for correctness reasons, since that should be
safer to backport.
gcc/
PR testsuite/116238
* config/aarch64/aarch64.cc (aarch64_hard_regno_caller_save_mode):
Only return SImode if we can convert to and from it.
gcc/testsuite/
PR testsuite/116238
* gcc.target/aarch64/sve/pr116238.c: New test.
|
|
When CSSC is not enabled, 128bit popcount can be implemented
just via the vector (v16qi) cnt instruction followed by a reduction,
like how the 64bit one is currently implemented instead of
splitting into 2 64bit popcount.
Changes since v1:
* v2: Make operand 0 be DImode instead of TImode and simplify.
Build and tested for aarch64-linux-gnu.
PR target/113042
gcc/ChangeLog:
* config/aarch64/aarch64.md (popcountti2): New define_expand.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/popcnt10.c: New test.
* gcc.target/aarch64/popcnt9.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The following does away with the idea to use non-symmetrical
testing of mode_can_transfer_bits in hash-table equality testing.
It isn't feasible to always control query order to maintain
consistency.
PR tree-optimization/116406
* tree-ssa-sccvn.cc (vn_reference_eq): Never equate
float and int when the float mode cannot transfer bits.
Do not try to anticipate which is the mode we actually load
from.
* gcc.dg/tree-ssa/pr116406.c: New testcase.
* gcc.dg/tree-ssa/ssa-pre-30.c: On x86 dd -msse -mfpmath=sse.
|
|
PR 58416 shows that storing non-floating point data to floating point
scalar registers can lead to miscompilations when the data is
normalized or otherwise processed upon loading to a register. To
avoid that risk, this patch detects situations where we have multiple
types and a we decide to represent the data in a type with a mode that
is known to not be able to transfer actual bits reliably using the new
TARGET_MODE_CAN_TRANSFER_BITS hook.
gcc/ChangeLog:
2024-08-19 Martin Jambor <mjambor@suse.cz>
PR target/58416
* tree-sra.cc (types_risk_mangled_binary_repr_p): New function.
(sort_and_splice_var_accesses): Use it.
(propagate_subaccesses_from_rhs): Likewise.
gcc/testsuite/ChangeLog:
2024-08-19 Martin Jambor <mjambor@suse.cz>
PR target/58416
* gcc.dg/torture/pr58416.c: New test.
|
|
When updating LC PHIs after copying loops we have to handle defs
defined outside of the loop appropriately (by not setting them to
NULL ...). This mimics how we handle this in the SSA updating
code of the vectorizer.
PR tree-optimization/116380
* tree-loop-distribution.cc (copy_loop_before): Handle
out-of-loop defs appropriately.
* gcc.dg/torture/pr116380.c: New testcase.
|
|
libstdc++-v3/ChangeLog:
PR tree-optimization/102958
* include/bits/char_traits.h (char_traits<char8_t>::length): Use
strlen.
|
|
This is LWG 4084 which I filed recently. LWG seems to support making the
change, so that std::num_put can use the %F format for floating-point
numbers.
libstdc++-v3/ChangeLog:
PR libstdc++/114862
* src/c++98/locale_facets.cc (__num_base::_S_format_float):
Check uppercase flag for fixed format.
* testsuite/22_locale/num_put/put/char/lwg4084.cc: New test.
|
|
The corank was propagated to array components in derived types. Fix
this by setting a zero corank when the array component is not a pointer.
For pointer typed array components propagate the corank of the derived
type to allow associating the component to a coarray.
gcc/fortran/ChangeLog:
PR fortran/86468
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Correct
comment.
* trans-types.cc (gfc_sym_type): Pass coarray rank, not false.
(gfc_get_derived_type): Only propagate codimension for coarrays
and pointers to array components in derived typed coarrays.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray_lib_this_image_2.f90: Fix array rank in
tree dump scan.
* gfortran.dg/coarray_lib_token_4.f90: Same.
* gfortran.dg/coarray/move_alloc_2.f90: New test.
|
|
libstdc++-v3/ChangeLog:
PR libstdc++/116381
* include/std/variant (variant): Fix conditions for
static_assert to match the spec.
* testsuite/20_util/variant/types_neg.cc: New test.
|
|
This performs the same basic check that is done by finish_function
to catch cases where the function is so badly malformed that we
do not have a consistent binding level.
gcc/cp/ChangeLog:
* coroutines.cc (split_coroutine_body_from_ramp): Check
that the binding level is as expected before attempting
to outline the function body.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
The new g++.target/i386/pr116275-2.C test FAILs on 32-bit Solaris/x86:
FAIL: g++.target/i386/pr116275-2.C scan-assembler vpslld
This happens because Solaris defaults to -mstackrealign, disabling -mstv.
Fixed by disabling the former and enabling the latter.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
2024-08-20 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* g++.target/i386/pr116275-2.C (dg-options): Add -mstv
-mno-stackrealign.
|
|
Use se's class_container where present in sizeof().
PR fortran/77518
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_sizeof): Use
class_container of se when set.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/sizeof_1.f90: New test.
|
|
Since *vsx_le_perm_store_* can be split into vector
permute and vector store, after reload_completed, we reuse
the operand 1 as the destination of vector permute, so we
set operand 1 with constraint modifier "+". But since
it's taken as pure input in DF and most passes as Richard
pointed out in [1], to ensure it's correct when operand 1
is still live, we actually restore the operand 1's value
after the store with vector permute, that is:
op1 = vector permute op1 (doubleword swapping)
op0 = op2
op1 = vector permute op1 (doubleword swapping)
, it means op1's value isn't changed by this insn.
So according to the comments from Richard and Segher in
that thread, this patch is to remove the "+" constraint
modifier of operand 1 from *vsx_le_perm_store_* insns.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660145.html
gcc/ChangeLog:
* config/rs6000/vsx.md (define_insn *vsx_le_perm_store_{<VSX_D:mode>,
<VSX_W:mode>,v8hi,v16qi,<VSX_LE_128:mode>}): Remove constraint modifier
"+" from operand 1.
|
|
For vsx_le_perm_store_* we have two splitters, one is for
!reload_completed and the other is for reload_completed.
As Richard pointed out in [1], operand 1 here is a pure
input for DF and most passes, but it could be used as the
vector rotation (64 bit) destination of itself, so we
re-compute the source (back to the original value) for
the case reload_completed, while for !reload_completed we
generate one new pseudo, so both cases are fine if operand
1 is still live after this insn. But according to the
source code, for !reload_completed case, it can logically
reuse the operand 1 as the new pseudo generation is
conditional on can_create_pseudo_p, then it can cause
wrong result once operand 1 is live. So considering this
and there is no splitting for this when reload_in_progress,
this patch is to fix the code to assert can_create_pseudo_p
there, so that both !reload_completed and reload_completed
cases would ensure operand 1 is unchanged (pure input), it
is also prepared for the following up patch which would
strip the unnecessary INOUT constraint modifier "+".
This also fixes an oversight in the splitter for VSX_LE_128
(!reload_completed), it should use operand 1 rather than
operand 0.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660145.html
gcc/ChangeLog:
* config/rs6000/vsx.md (*vsx_le_perm_store_{<VSX_D:mode>,<VSX_W:mode>,
v8hi,v16qi,<VSX_LE_128:mode>} !reload_completed splitters): Assert
can_create_pseudo_p and always generate one new pseudo for operand 1.
|
|
Similar to r15-710-g458b23bc8b3e2b which removed all uses of
powerpc-*-linux*paired*, this patch is to remove the remaining
powerpc-*paired* uses which I missed to catch with "*linux*"
in search keyword.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_vect_support_and_set_flags): Remove
the if arm checking powerpc-*paired*.
(check_750cl_hw_available): Remove.
(check_effective_target_vect_unpack): Remove the check on
powerpc-*paired*.
|
|
> It's not obvious to me why movv16qi requires a nonimmediate_operand
> > source, especially since ix86_expand_vector_mode does have code to
> > cope with constant operand[1]s. emit_move_insn_1 doesn't check the
> > predicates anyway, so the predicate will have little effect.
> >
> > A workaround would be to check legitimate_constant_p instead of the
> > predicate, but I'm not sure that that should be necessary.
> >
> > Has this already been discussed? If not, we should loop in the x86
> > maintainers (but I didn't do that here in case it would be a repeat).
>
> I also noticed it. Not sure why movv16qi requires a
> nonimmediate_operand, while ix86_expand_vector_mode could deal with
> constant op. Looking forward to Hongtao's comments.
The code has been there since 2005 before I'm involved.
It looks to me at the beginning both mov<mode> and
*mov<mode>_internal only support nonimmediate_operand for the
operands[1].
And r0-75606-g5656a184e83983 adjusted the nonimmediate_operand to
nonimmediate_or_sse_const_operand for *mov<mode>_internal, but not for
mov<mode>. I think we can align the predicate between mov<mode>
and *mov<mode>_internal.
gcc/ChangeLog:
* config/i386/sse.md (mov<mode>): Align predicates for
operands[1] between mov<mode> and *mov<mode>_internal.
* config/i386/mmx.md (mov<mode>): Ditto.
|
|
|
|
supports an optab for it
On aarch64 (without !CSSC instructions), since popcount is implemented using the SIMD instruction cnt,
instead of using two SIMD cnt (V8QI mode), it is better to use one 128bit cnt (V16QI mode). And only one
reduction addition instead of 2. Currently fold_builtin_bit_query will expand always without checking
if there was an optab for the type, so this changes that to check the optab to see if we should expand
or have the backend handle it.
Bootstrapped and tested on x86_64-linux-gnu and built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* builtins.cc (fold_builtin_bit_query): Don't expand double
`unsigned long long` typess if there is an optab entry for that
type.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|