|
Check_Scil failed due to not handling a type that came from a package that was
mentioned in a limited-with clause. Also, an aggregate with an uninitialized
component was not being pretty-printed properly.
gcc/ada/
* pprint.adb (List_Name): Check for "Box_Present" when displaying
a list, and emit "<>" if it returns True.
* sem_scil.adb (Check_SCIL_Node): Handle case when the type of a
parameter is from a package that was mentioned in a limited with
clause, and make no further checks, since this check routine does
not have all the logic to check such a usage.
|
|
The problem is that the freeze node for the class-wide subtype built for the
expression of the allocator escapes from the dependent expression instead of
being stored in its list of actions.
gcc/ada/
* freeze.adb (Freeze_Expression.Has_Decl_In_List): Deal specifically
with itypes that are class-wide subtypes.
|
|
This is modeled on the existing binding for __atomic_load_n.
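A minimal C sketch of the underlying GCC built-in that the new Atomic_Store
instantiations bind to (this only illustrates the built-in's semantics, not
the Ada-side API in s-atopri.ads; the memory order shown is illustrative).
#include <stdint.h>

static uint32_t flag;

void
publish (void)
{
  /* Atomically store 1 into flag.  */
  __atomic_store_n (&flag, 1, __ATOMIC_SEQ_CST);
}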
gcc/ada/
* libgnat/s-atopri.ads (Atomic_Store): New generic procedure.
(Atomic_Store_8): New instantiated procedure.
(Atomic_Store_16): Likewise.
(Atomic_Store_32): Likewise.
(Atomic_Store_64): Likewise.
* libgnat/s-atopri__32.ads (Atomic_Store): New generic procedure.
(Atomic_Store_8): New instantiated procedure.
(Atomic_Store_16): Likewise.
(Atomic_Store_32): Likewise.
* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Implement the
support for __atomic_store_n and __sync_bool_compare_and_swap_n.
* gcc-interface/gigi.h (list_second): New inline function.
|
|
Add missing support for RM 9.5.4(5.6/4): the target of a requeue
statement may be a procedure when its name denotes a renaming of
an entry.
gcc/ada/
* sem_ch6.adb (Analyze_Subprogram_Specification): Do not replace
the type of the formals with its corresponding record in
init-procs.
* sem_ch9.adb (Analyze_Requeue): Add missing support to requeue to
a procedure that denotes a renaming of an entry.
|
|
In GNATprove mode the removal of side effects is only needed in certain
syntactic contexts, which include subtype declarations. Now this removal
is limited to genuine subtype declarations and does not apply to itypes
coming from expressions where side effects are not expected.
gcc/ada/
* exp_util.adb (Possible_Side_Effect_In_SPARK): Refine handling of
itype declarations.
|
|
Previously if a subprogram call could not be inlined in GNATprove mode,
then all subsequent calls to the same subprogram were not inlined
either (because a failed attempt to inline clears flag Is_Inlined_Always
and we tested this flag when attempting to inline subsequent calls).
Now a failure in inlining of a particular call does not prevent inlining
of subsequent calls to the same subprogram, except when inlining failed
because the subprogram was detected to be recursive (which clears the
Is_Inlined flag that we now examine).
This change allows more checks to be proved and reduces interactions
between inlining and SPARK legality checks.
gcc/ada/
* sem_ch6.adb (Analyze_Subprogram_Specification): Set Is_Inlined
flag by default in GNATprove mode.
* sem_res.adb (Resolve_Call): Only look at flag which is cleared
when inlined subprogram is detected to be recursive.
|
|
Inlining of subprogram calls happens in routine Expand_Inlined_Call
which calls Establish_Actual_Mapping_For_Inlined_Call. Both routines
had detection of recursive calls. The detection in the second routine
was dead code.
gcc/ada/
* inline.adb (Establish_Actual_Mapping_For_Inlined_Call):
Remove detection of recursive calls.
|
|
The removed code was dead because it could only be executed when
Back_End_Inlining is True, and that flag is always False in
GNATprove_Mode.
gcc/ada/
* inline.adb (Cannot_Inline): Cleanup use of 'Length; remove
dead code.
|
|
Fix style violation reported by GNATcheck.
gcc/ada/
* sem_aggr.adb (Resolve_Container_Aggregate): Use "No".
* sem_ch8.adb (Find_Direct_Name): Likewise.
|
|
Fix Sem_Util.Enclosing_Declaration to not return an N_Subprogram_Specification
node. Remove code in various places that was formerly needed to cope with this
misbehavior.
gcc/ada/
* sem_util.adb (Enclosing_Declaration): Instead of returning a
subprogram specification node, return its parent (which is
presumably a subprogram declaration).
* contracts.adb (Insert_Stable_Property_Check): Remove code
formerly needed to compensate for incorrect behavior of
Sem_Util.Enclosing_Declaration.
* exp_attr.adb (In_Available_Context): Remove code formerly needed
to compensate for incorrect behavior of
Sem_Util.Enclosing_Declaration.
* sem_ch8.adb (Is_Actual_Subp_Of_Inst): Remove code formerly
needed to compensate for incorrect behavior of
Sem_Util.Enclosing_Declaration.
|
|
In some cases the compiler would crash or generate spurious errors
compiling a legal object renaming declaration that lacks a subtype mark.
In addition to fixing the immediate problem, change Atree.Copy_Slots
so that attempts to modify either the Empty or the Error nodes
(e.g., by passing one of them as the target in a call to Rewrite)
are ineffective. Cope with the consequences of this.
gcc/ada/
* sem_ch8.adb (Check_Constrained_Object): Before updating the
subtype mark of an object renaming declaration by calling Rewrite,
first check whether the destination of the Rewrite call exists.
* atree.adb (Copy_Slots): Return without performing any updates if
Destination equals Empty or Error, or if Source equals Empty. Any
of those conditions indicates an error case.
* sem_ch12.adb (Analyze_Formal_Derived_Type): Avoid cascading
errors.
* sem_ch3.adb (Analyze_Number_Declaration): In an error case, do
not pass Error as destination in a call to Rewrite.
(Find_Type_Of_Subtype_Indic): In an error case, do not pass Error
or Empty as destination in a call to Rewrite.
|
|
The preconditions of both Update procedures in Interfaces.C.Strings were
incorrect. This patch fixes them.
gcc/ada/
* libgnat/i-cstrin.ads (Update): Fix precondition.
|
|
The only functions using the BIP protocol are now those returning a limited
type: Is_Build_In_Place_Result_Type => Is_Inherently_Limited_Type.
gcc/ada/
* sem_aggr.adb (Resolve_Extension_Aggregate): Remove the unreachable
call to Transform_BIP_Assignment as well as the procedure.
|
|
For an actual passed as an 'in out' parameter of a type support
subprogram such as deep finalize, do not count it as a read
reference of the actual. Clearly these should not count.
Furthermore, counting them causes different warnings in -gnatc
mode than in normal mode, because the calls only exist in
normal mode, where they would disable the warnings. Such warnings now
occur in both modes, instead of just with -gnatc.
gcc/ada/
* lib-xref.adb (Generate_Reference): Do not count it as a read
reference if we're calling a TSS.
|
|
Add description of a recently added SPARK contract.
gcc/ada/
* doc/gnat_rm/implementation_defined_aspects.rst,
doc/gnat_rm/implementation_defined_pragmas.rst: Add sections for
Always_Terminates.
* gnat-style.texi: Regenerate.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
|
|
The r14-6524 changes created aarch64-builtins.h header and moved
struct aarch64_simd_type_info definition in there.
Unfortunately, the new header wasn't added to target_gtfiles, so the
trees and const char * pointer elements in the aarch64_simd_types
array aren't marked as GC roots anymore. That breaks e.g. PCH, when
the array elements then can refer to ggc_freed memory instead of the expected
types, but also any other GC collection could free them and further uses would
not work correctly.
Unfortunately, just adding the new header to target_gtfiles doesn't fix this,
because non-static variable definitions marked with GTY(()) aren't considered
by gengtype, it looks in those cases for an extern GTY(()) declaration, and
there was none - the aarch64-builtins.h header contains an extern declaration
without GTY(()). Adding GTY(()) to that extern declaration doesn't work, because
then gengtype attempts to emit the aarch64_simd_types GC roots in gtype-desc.cc
but the corresponding header isn't included there.
So, the patch instead adds another extern declaration in aarch64-builtins.cc
right before the actual definition, which makes sure the GC roots are registered
correctly in gt-aarch64-builtins.h (where we want them).
2024-01-09 Jakub Jelinek <jakub@redhat.com>
PR target/113270
* config.gcc (aarch64*-*-*): Add aarch64-builtins.h to target_gtfiles.
* config/aarch64/aarch64-builtins.cc (aarch64_simd_types): Add extern
GTY(()) declaration before the definition, drop GTY(()) from the
definition.
|
|
The late amendment with a limit based on VF was redundant and wrong
for peeled early exits. The following moves the adjustment done
when we don't have a skip edge down to the place where the already
existing VF based max iter check is done and removes the amendment.
PR tree-optimization/113026
* tree-vect-loop-manip.cc (vect_do_peeling): Remove
redundant and wrong niter bound setting. Move niter
bound adjustment down.
|
|
gcc/ada/
PR ada/78207
* libgnat/g-regexp.ads: Fix outdated comment.
|
|
In C you can have loops without a condition. The original version of the patch
rejected the use of #pragma GCC novector on such loops, but during review this
was changed so as not to give a compile error in these cases.
However, because annotations seem to only be allowed on conditions (unless
I'm mistaken?), the attached example ICEs because there's no condition.
This change has the parser ignore the pragma instead of ICEing. I don't know if
this is the best solution, but as far as I can tell we can't attach the
annotation to anything else.
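As a hedged illustration (not the PR testcase), a minimal C loop of the kind
now handled: it has no controlling condition, so there is nothing to attach
the annotation to and the pragma is simply ignored rather than causing an ICE.
/* Condition-less loop; #pragma GCC novector is now ignored here.  */
void
f (int *a)
{
  int i = 0;
#pragma GCC novector
  for (;;)
    {
      a[i] = i;
      if (++i == 16)
        break;
    }
}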
gcc/c/ChangeLog:
PR c/113267
* c-parser.cc (c_parser_for_statement): Skip the pragma if there is no condition.
gcc/testsuite/ChangeLog:
PR c/113267
* gcc.dg/pr113267.c: New test.
|
|
We can't support nonlinear inductions other than neg when vectorizing
early breaks and the iteration count is known.
For early break we currently require a peeled epilog but in these cases
we can't compute the remaining values.
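A hedged sketch (not the PR testcase) of the kind of loop affected: the shift
induction below is non-linear, so its live value cannot be recomputed for the
peeled epilogue and vectorization is now rejected.
/* Early-break loop with a non-linear (shift) induction.  */
unsigned int
first_match (int *a)
{
  unsigned int x = 1;
  for (int i = 0; i < 64; i++)
    {
      if (a[i] == 42)
        return x;
      x <<= 1;
    }
  return 0;
}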
gcc/ChangeLog:
PR middle-end/113163
* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p):
Reject non-linear inductions that aren't supported.
gcc/testsuite/ChangeLog:
PR middle-end/113163
* gcc.target/gcn/pr113163.c: New test.
|
|
One of the cool features of the H8 backend is its use of tables to select
optimal shift implementations for different CPU variants. This patch
borrows (plagiarizes) that idiom for SImode left shifts in the ARC backend
(for CPUs without a barrel-shifter). This provides a convenient mechanism
for both selecting the best implementation strategy (for speed vs. size),
and providing accurate rtx_costs [without duplicating a lot of logic].
Left shift RTX costs are especially important for use in synth_mult.
An example improvement is:
int foo(int x) { return 32768*x; }
which is now generated with -O2 -mcpu=em -mswap as:
foo: bmsk_s r0,r0,16
swap r0,r0
j_s.d [blink]
ror r0,r0
where previously the ARC backend would generate a loop:
foo: mov lp_count,15
lp 2f
add r0,r0,r0
nop
2: # end single insn loop
j_s [blink]
2024-01-09 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.cc (arc_shift_alg): New enumerated type for
left shift implementation strategies.
(arc_shift_info): Type for each entry of the shift strategy table.
(arc_shift_context_idx): Return an integer value for each code
generation context, used as an index.
(arc_ashl_alg): Table indexed by context and shifted bit count.
(arc_split_ashl): Use the arc_ashl_alg table to select SImode
left shift implementation.
(arc_rtx_costs) <case ASHIFT>: Use the arc_ashl_alg table to
provide accurate costs, when optimizing for speed or size.
|
|
As Robin suggested, remove the gimple_uid check; this is sufficient for our needs.
Tested on both RV32 and RV64 with no regressions; OK for trunk?
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (loop_invariant_op_p): Fix loop invariant check.
|
|
This patch supports "lvalue" parsing (or "locator list item type" parsing)
for several OpenMP clause types for C++, as required for OpenMP 5.0
and above.
This version has been rebased -- some things have changed around
template handling recently, e.g. removal of build_non_dependent_expr and
tsubst_copy. A new potential corner-case issue has shown up regarding
implicit mapping of references to pointer to pointers -- an interaction
with the post-review fixes/rework for the patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638602.html
which fixed the (new) tests baseptrs-[6789].C. I've noted that for now in
the patch, and adjusted the baseptrs-[46].C tests slightly to accommodate.
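As a hedged, illustrative sketch (in C/C++-common syntax, not one of the new
tests) of the kind of generalised-lvalue map operand this parsing enables:
/* Array section whose base is an lvalue reached through a struct member.  */
struct buf { int *data; };

void
scale (struct buf *b, int n)
{
  #pragma omp target map(tofrom: b->data[0:n])
  for (int i = 0; i < n; i++)
    b->data[i] *= 2;
}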
2024-01-08 Julian Brown <julian@codesourcery.com>
gcc/c-family/
* c-common.h (c_omp_address_inspector): Remove static from get_origin
and maybe_unconvert_ref methods.
* c-omp.cc (c_omp_split_clauses): Support OMP_ARRAY_SECTION.
(c_omp_address_inspector::map_supported_p): Handle OMP_ARRAY_SECTION.
(c_omp_address_inspector::get_origin): Avoid dereferencing possibly
NULL type when processing template decls.
(c_omp_address_inspector::maybe_unconvert_ref): Likewise.
gcc/cp/
* constexpr.cc (potential_constant_expression_1): Handle
OMP_ARRAY_SECTION.
* cp-tree.h (grok_omp_array_section, build_omp_array_section): Add
prototypes.
* decl2.cc (grok_omp_array_section): New function.
* error.cc (dump_expr): Handle OMP_ARRAY_SECTION.
* parser.cc (cp_parser_new): Initialize parser->omp_array_section_p.
(cp_parser_statement_expr): Disallow array sections.
(cp_parser_postfix_open_square_expression): Support OMP_ARRAY_SECTION
parsing.
(cp_parser_parenthesized_expression_list, cp_parser_lambda_expression,
cp_parser_braced_list): Disallow array sections.
(cp_parser_omp_var_list_no_open): Remove ALLOW_DEREF parameter, add
MAP_LVALUE in its place. Support generalised lvalue parsing for
OpenMP map, to and from clauses. Use OMP_ARRAY_SECTION
code instead of TREE_LIST to represent OpenMP array sections.
(cp_parser_omp_var_list): Remove ALLOW_DEREF parameter, add MAP_LVALUE.
Pass to cp_parser_omp_var_list_no_open.
(cp_parser_oacc_data_clause): Update call to cp_parser_omp_var_list.
(cp_parser_omp_clause_map): Add sk_omp scope around
cp_parser_omp_var_list_no_open call.
* parser.h (cp_parser): Add omp_array_section_p field.
* pt.cc (tsubst, tsubst_copy, tsubst_omp_clause_decl,
tsubst_copy_and_build): Add OMP_ARRAY_SECTION support.
* semantics.cc (handle_omp_array_sections_1, handle_omp_array_sections,
cp_oacc_check_attachments, finish_omp_clauses): Use OMP_ARRAY_SECTION
instead of TREE_LIST where appropriate. Handle more types of map
expression.
* typeck.cc (build_omp_array_section): New function.
gcc/
* gimplify.cc (gimplify_expr): Ensure OMP_ARRAY_SECTION has been
processed out before gimplification.
* tree-pretty-print.cc (dump_generic_node): Support OMP_ARRAY_SECTION.
* tree.def (OMP_ARRAY_SECTION): New tree code.
gcc/testsuite/
* c-c++-common/gomp/map-6.c: Update expected output.
* c-c++-common/gomp/target-enter-data-1.c: Update scan test.
* g++.dg/gomp/array-section-1.C: New test.
* g++.dg/gomp/array-section-2.C: New test.
* g++.dg/gomp/bad-array-section-1.C: New test.
* g++.dg/gomp/bad-array-section-2.C: New test.
* g++.dg/gomp/bad-array-section-3.C: New test.
* g++.dg/gomp/bad-array-section-4.C: New test.
* g++.dg/gomp/bad-array-section-5.C: New test.
* g++.dg/gomp/bad-array-section-6.C: New test.
* g++.dg/gomp/bad-array-section-7.C: New test.
* g++.dg/gomp/bad-array-section-8.C: New test.
* g++.dg/gomp/bad-array-section-9.C: New test.
* g++.dg/gomp/bad-array-section-10.C: New test.
* g++.dg/gomp/bad-array-section-11.C: New test.
* g++.dg/gomp/has_device_addr-non-lvalue-1.C: New test.
* g++.dg/gomp/pr67522.C: Update expected output.
* g++.dg/gomp/ind-base-3.C: New test.
* g++.dg/gomp/map-assignment-1.C: New test.
* g++.dg/gomp/map-inc-1.C: New test.
* g++.dg/gomp/map-lvalue-ref-1.C: New test.
* g++.dg/gomp/map-ptrmem-1.C: New test.
* g++.dg/gomp/map-ptrmem-2.C: New test.
* g++.dg/gomp/map-static-cast-lvalue-1.C: New test.
* g++.dg/gomp/map-ternary-1.C: New test.
* g++.dg/gomp/member-array-2.C: New test.
libgomp/
* testsuite/libgomp.c++/baseptrs-4.C: Remove commented-out cases that
now work.
* testsuite/libgomp.c++/baseptrs-6.C: New test.
* testsuite/libgomp.c++/ind-base-1.C: New test.
* testsuite/libgomp.c++/ind-base-2.C: New test.
* testsuite/libgomp.c++/lvalue-tofrom-1.C: New test.
* testsuite/libgomp.c++/lvalue-tofrom-2.C: New test.
* testsuite/libgomp.c++/map-comma-1.C: New test.
* testsuite/libgomp.c++/map-rvalue-ref-1.C: New test.
* testsuite/libgomp.c++/struct-ref-1.C: New test.
* testsuite/libgomp.c-c++-common/array-field-1.c: New test.
* testsuite/libgomp.c-c++-common/array-of-struct-1.c: New test.
* testsuite/libgomp.c-c++-common/array-of-struct-2.c: New test.
|
|
The problem occurs when this function call is the expression of a return in
a function returning the limited interface; in this peculiar case, there is
a mismatch between the callee, which has BIP formals but is not a BIP call,
and the caller, which is a BIP function, that is spotted by an assertion.
This is fixed by restoring the semantics of Is_Build_In_Place_Function_Call,
which returns again true only for calls to BIP functions, introducing the
Is_Function_Call_With_BIP_Formals predicate, which also returns true for
calls to functions with BIP formals that are not BIP functions, and moving
down the assertion in Expand_Simple_Function_Return.
gcc/ada/
PR ada/112781
* exp_ch6.ads (Is_Build_In_Place_Function): Adjust description.
* exp_ch6.adb (Is_True_Build_In_Place_Function_Call): Delete.
(Is_Function_Call_With_BIP_Formals): New predicate.
(Is_Build_In_Place_Function_Call): Restore original semantics.
(Expand_Call_Helper): Adjust conditions guarding the calls to
Add_Dummy_Build_In_Place_Actuals to above renaming.
(Expand_N_Extended_Return_Statement): Adjust to above renaming.
(Expand_Simple_Function_Return): Likewise. Move the assertion
to after the transformation into an extended return statement.
(Make_Build_In_Place_Call_In_Allocator): Remove unreachable code.
(Make_Build_In_Place_Call_In_Assignment): Likewise.
gcc/testsuite/
* gnat.dg/bip_prim_func2.adb: New test.
* gnat.dg/bip_prim_func2_pkg.ads, gnat.dg/bip_prim_func2_pkg.adb:
New helper package.
|
|
This is a regression present on the mainline and 13 branch, in the form of a
series of internal errors (3) on a function call returning the extension of
a limited interface.
This is only a partial fix for the first two assertion failures; the third
one is the most problematic and will be dealt with separately.
The first issue is in Instantiate_Type, where we use Base_Type in a specific
case to compute the ancestor of a derived type, which will later trigger the
assertion on line 16960 of sem_ch3.adb since Parent_Base and Generic_Actual
are the same node. This is changed to use Etype, as in the other cases around it.
The second issue is an unprotected use of Designated_Type on type T in
Analyze_Explicit_Dereference, while another use in an equivalent context
is guarded by Is_Access_Type a few lines above.
gcc/ada/
PR ada/112781
* sem_ch12.adb (Instantiate_Type): Use Etype instead of Base_Type
consistently to retrieve the ancestor for a derived type.
* sem_ch4.adb (Analyze_Explicit_Dereference): Test Is_Access_Type
consistently before accessing Designated_Type.
|
|
[PR113210]
On the following testcase e.g. on riscv64 or aarch64 (latter with
-O3 -march=armv8-a+sve ) we ICE, because while NITERS is INTEGER_CST,
NITERSM1 is a complex expression like
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
where a.0_1 is unsigned char. The condition is never true, so the above
is equivalent to just 0, but only when trying to fold the above with
PLUS_EXPR 1 we manage to simplify it (first
~(short unsigned int) (a.0_1 + 255)
to
-(short unsigned int) (a.0_1 + 255)
and then
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1
to
(short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1
and only at this point we fold the condition to be false.
But the vectorizer seems to assume that if NITERS is known (i.e. suitable
INTEGER_CST) then NITERSM1 also is, so the following hack ensures that if
NITERS folds into an INTEGER_CST, NITERSM1 will be one as well.
2024-01-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113210
* tree-vect-loop.cc (vect_get_loop_niters): If non-INTEGER_CST
value in *number_of_iterationsm1 PLUS_EXPR 1 is folded into
INTEGER_CST, recompute *number_of_iterationsm1 as the INTEGER_CST
minus 1.
* gcc.c-torture/compile/pr113210.c: New test.
|
|
This is a small regression present on the mainline and 13 branch, in the
form of an internal error in gigi on anonymous access type equality. We
now need to also accept them for anonymous access types that point to
compatible object subtypes in the language sense.
gcc/ada/
* gcc-interface/utils2.cc (build_binary_op) <EQ_EXPR>: Relax
assertion for regular pointer types.
gcc/testsuite/
* gnat.dg/specs/anon4.ads: New test.
|
|
This is a small regression present on the mainline and 13 branch, although
the underlying problem has probably been there for ages, in the form of a
segfault during the delay slot scheduling pass, for a function that falls
through to exit without any instruction generated for the end of function.
gcc/
PR rtl-optimization/113140
* reorg.cc (fill_slots_from_thread): If we are to branch after the
last instruction of the function, create an end label.
gcc/testsuite/
* g++.dg/opt/delay-slot-2.C: New test.
|
|
The issue addressed by this patch is that when initializing vectors by
broadcasting integer constants, the compiler has the flexibility to
select the most appropriate vector mode to perform the broadcast, as
long as the resulting vector has an identical bit pattern.
For example, the following constants are all equivalent:
V4SImode {0x01010101, 0x01010101, 0x01010101, 0x01010101 }
V8HImode {0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 }
V16QImode {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, ... 0x01 }
So instruction sequences that construct any of these can be used to
construct the others (with a suitable cast/SUBREG).
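As a hedged C illustration of that equivalence (using GCC's generic vector
extension, not one of the i386 testcases), the same 128-bit pattern can be
written as any of these types, and a same-size cast reinterprets the bits:
typedef int   v4si  __attribute__ ((vector_size (16)));
typedef short v8hi  __attribute__ ((vector_size (16)));
typedef char  v16qi __attribute__ ((vector_size (16)));

v4si  a = { 0x01010101, 0x01010101, 0x01010101, 0x01010101 };
v8hi  b = { 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 };
v16qi c = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };

v4si as_v4si (v16qi x) { return (v4si) x; }  /* same bits, different mode */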
On x86_64, it turns out that broadcasts of SImode constants are preferred,
as DImode constants often require a longer movabs instruction, and
HImode and QImode broadcasts require multiple uops on some architectures.
Hence, SImode is always at least as short and as fast as the alternatives.
Examples of this improvement can be seen in the testsuite.
gcc.target/i386/pr102021.c
Before:
0: 48 b8 0c 00 0c 00 0c movabs $0xc000c000c000c,%rax
7: 00 0c 00
a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0
10: c3 retq
After:
0: b8 0c 00 0c 00 mov $0xc000c,%eax
5: 62 f2 7d 28 7c c0 vpbroadcastd %eax,%ymm0
b: c3 retq
and
gcc.target/i386/pr90773-17.c:
Before:
0: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 7 <foo+0x7>
7: b8 0c 00 00 00 mov $0xc,%eax
c: 62 f2 7d 08 7a c0 vpbroadcastb %eax,%xmm0
12: 62 f1 7f 08 7f 02 vmovdqu8 %xmm0,(%rdx)
18: c7 42 0f 0c 0c 0c 0c movl $0xc0c0c0c,0xf(%rdx)
1f: c3 retq
After:
0: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 7 <foo+0x7>
7: b8 0c 0c 0c 0c mov $0xc0c0c0c,%eax
c: 62 f2 7d 08 7c c0 vpbroadcastd %eax,%xmm0
12: 62 f1 7f 08 7f 02 vmovdqu8 %xmm0,(%rdx)
18: c7 42 0f 0c 0c 0c 0c movl $0xc0c0c0c,0xf(%rdx)
1f: c3 retq
where according to Agner Fog's instruction tables broadcastd is slightly
faster on some microarchitectures, for example Knight's Landing.
2024-01-09 Roger Sayle <roger@nextmovesoftware.com>
Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
PR target/112992
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Allow call to
ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
(ix86_broadcast_from_constant): Revert recent change; Return a
suitable MEMREF independently of mode/target combinations.
(ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
to decide whether expansion is possible/preferable. Only try
forcing DImode constants to memory (and trying again) if calling
ix86_expand_vector_init_duplicate fails with a DImode immediate
constant.
(ix86_expand_vector_init_duplicate) <case E_V2DImode>: Try using
V4SImode for suitable immediate constants.
<case E_V4DImode>: Try using V8SImode for suitable constants.
<case E_V4HImode>: Fail for CONST_INT_P, i.e. use constant pool.
<case E_V2HImode>: Likewise.
<case E_V8HImode>: For CONST_INT_P try using V4SImode via widen.
<case E_V16QImode>: For CONST_INT_P try using V8HImode via widen.
<label widen>: Handle CONST_INTs via simplify_binary_operation.
Allow recursive calls to ix86_expand_vector_init_duplicate to fail.
<case E_V16HImode>: For CONST_INT_P try V8SImode via widen.
<case E_V32QImode>: For CONST_INT_P try V16HImode via widen.
(ix86_expand_vector_init): Move try using a broadcast for all_same
with ix86_expand_vector_init_duplicate before using constant pool.
gcc/testsuite/ChangeLog
* gcc.target/i386/auto-init-8.c: Update test case.
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr102021.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
|
|
Signed-off-by: Chung-Ju Wu <jasonwucj@gmail.com>
gcc/ChangeLog:
* doc/invoke.texi (Arm Options): Document Cortex-M52 options.
|
|
This patch adds the -mcpu support for the Arm Cortex-M52 CPU which is
an Armv8.1-M Mainline CPU supporting MVE and PACBTI by default.
-mcpu=cortex-m52 switch by default matches to -march=armv8.1-m.main+pacbti+mve.fp+fp.dp.
The cde feature is supported by specifying +cdecpN (e.g. -mcpu=cortex-m52+cdecp<N>),
where N is the coprocessor number 0 to 7.
Also, the following options are provided to disable default features.
+nomve.fp (disables MVE Floating point)
+nomve (disables MVE Integer and MVE Floating point)
+nodsp (disables dsp, MVE Integer and MVE Floating point)
+nopacbti (disables pacbti)
+nofp (disables floating point and MVE floating point)
Signed-off-by: Chung-Ju Wu <jasonwucj@gmail.com>
gcc/ChangeLog:
* config/arm/arm-cpus.in (cortex-m52): New cpu.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
|
|
After commit 01f4251b8775c832a92d55e2df57c9ac72eaceef, early break
vectorization is supported. The two testcases need to be fixed.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-xorsign-1.c: Fix testcase.
* gcc.target/i386/part-vect-absneghf.c: Ditto.
|
|
This patch implements more vec_init optabs that can handle two LSX vectors producing a LASX
vector by concatenating them. When an LSX vector is concatenated with an LSX const_vector of
zeroes, the vec_concatz pattern can be used effectively. For example:
typedef short v8hi __attribute__ ((vector_size (16)));
typedef short v16hi __attribute__ ((vector_size (32)));
v8hi a, b;
v16hi vec_initv16hiv8hi ()
{
return __builtin_shufflevector (a, b, 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15);
}
Before this patch:
vec_initv16hiv8hi:
addi.d $r3,$r3,-64
.cfi_def_cfa_offset 64
xvrepli.h $xr0,0
la.local $r12,.LANCHOR0
xvst $xr0,$r3,0
xvst $xr0,$r3,32
vld $vr0,$r12,0
vst $vr0,$r3,0
vld $vr0,$r12,16
vst $vr0,$r3,32
xvld $xr1,$r3,32
xvld $xr2,$r3,32
xvld $xr0,$r3,0
xvilvh.h $xr0,$xr1,$xr0
xvld $xr1,$r3,0
xvilvl.h $xr1,$xr2,$xr1
addi.d $r3,$r3,64
.cfi_def_cfa_offset 0
xvpermi.q $xr0,$xr1,32
jr $r1
After this patch:
vec_initv16hiv8hi:
la.local $r12,.LANCHOR0
vld $vr0,$r12,32
vld $vr2,$r12,48
xvilvh.h $xr1,$xr2,$xr0
xvilvl.h $xr0,$xr2,$xr0
xvpermi.q $xr1,$xr0,32
xvst $xr1,$r4,0
jr $r1
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_initv32qiv16qi): Rename to ..
(vec_init<mode><lasxhalf>): .. this, and extend to mode.
(@vec_concatz<mode>): New insn pattern.
* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init):
Handle VALS containing two vectors.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vec-init-2.c: New test.
|
|
We have supported segment load/store intrinsics.
Committed as it is obvious.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments.
(vundefined): Ditto.
|
|
Patch v8: Resubmit after fixing the rtl-checking issue. Passed all the riscv regression tests.
Patch v7: Add newline at the end of file.
Patch v6: Move intrinsic tests into rvv/base.
Patch v5: Rebase
Patch v4: Add some RV32 vx constraint testcase.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c
This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test.
* gcc.target/riscv/zvkb.c: New test.
|
|
This patch adds the intrinsic functions of the crypto vector extensions, based
on the intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob
/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
Co-Authored by: Songhe Zhu <zhusonghe@eswincomputing.com>
Co-Authored by: Ciyan Pan <panciyan@eswincomputing.com>
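A hedged sketch of calling one of the new intrinsics (vandn from Zvbb); the
exact name, prototype and required -march string (e.g. rv64gcv_zvbb) follow
the naming scheme of the linked intrinsic doc and are assumptions here.
#include <riscv_vector.h>

vuint32m1_t
andn (vuint32m1_t vs2, vuint32m1_t vs1, size_t vl)
{
  /* Bitwise and-not of two unsigned 32-bit element vectors.  */
  return __riscv_vandn_vv_u32m1 (vs2, vs1, vl);
}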
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto.
(class b_reverse):Ditto.
(class vwsll): Ditto.
(class clmul): Ditto.
(class vg_nhab): Ditto.
(class crypto_vv):Ditto.
(class crypto_vi):Ditto.
(class vaeskf2_vsm3c):Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add crypto vector intrinsic definition.
(vbrev): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vwsll): Ditto.
(vandn): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
(vghsh): Ditto.
(vgmul): Ditto.
(vaesef): Ditto.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
(vsm4k): Ditto.
(vsm4r): Ditto.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(registered_function::overloaded_hash): Processing size_t uimm for C overloaded func.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
|
|
|
2024-01-08 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-sink-18.c: xfail dg-final "Sunk statements: 5"
on hppa*64*-*-*.
|
|
hppa*-*-hpux* doesn't have any long double trig functions.
2024-01-08 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* gfortran.dg/dec_math.f90: Skip on hppa*-*-hpux*.
|
|
Commit 6271dd98 changed the default from -fcommon to -fno-common.
This silently changed the alignment of uninitialized BSS data on
hppa where the alignment of common data must be greater than or equal
to the alignment of the largest type that will fit in the block.
For example, the alignment of `double d[2];' changed from 16 to 8
on hppa64.
The hppa architecture requires strict alignment and the linker
warns about inconsistent alignment of variables. This change broke
the gfortran.dg/bind_c_coms.f90 and gfortran.dg/bind_c_vars.f90
tests. These tests check whether bind_c works between fortran
and C.
Adding the -fcommon option fixes the tests. Probably, gcc and HP
C are now by default inconsistent but that's water under the bridge.
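As an illustration taken from the description above, a single uninitialized
file-scope array is enough to see the difference.
/* BSS alignment dropped from 16 to 8 on hppa64 under -fno-common,
   which the linker warns about when mixed with -fcommon units.  */
double d[2];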
2024-01-08 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
PR testsuite/94253
* gfortran.dg/bind_c_coms.f90: Add -fcommon option on hppa*-*-*.
* gfortran.dg/bind_c_vars.f90: Likewise.
|
|
Using ASAN on i686-linux with -fPIC causes an ICE, because when
pc_thunks are generated, there is no current function anymore, but
asan_function_start () expects one.
Fix by not calling asan_function_start () without one.
A narrower fix would be to temporarily disable ASAN around pc_thunk
generation. However, the issue looks generic enough, and may affect
less often tested configurations, so go for a broader fix.
Fixes: e66dc37b299c ("asan: Align .LASANPC on function boundary")
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
gcc/ChangeLog:
PR sanitizer/113251
* varasm.cc (assemble_function_label_raw): Do not call
asan_function_start () without the current function.
|
|
This patch fixes a problem with the BTF information for kernel_helper
attributed functions, for which a BTF_KIND_FUNC entry was incorrectly
generated. Although such an entry is accurate for traditional extern
function declarations, once the function is attributed with kernel_helper
it is semantically incompatible with the kernel helpers in the BPF
infrastructure.
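A hedged sketch of the kind of declaration affected; the helper number and
signature below are purely illustrative, not taken from the testcase.
/* kernel_helper attributed declaration: no BTF_KIND_FUNC entry is
   emitted for it after this change.  */
void *helper_fn (void *map, const void *key)
  __attribute__ ((kernel_helper (5)));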
gcc/ChangeLog:
PR target/113225
* btfout.cc (btf_collect_datasec): Skip creating BTF info for
extern and kernel_helper attributed function decls.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/attr-kernel-helper.c: New test.
|
|
When using -dA, output_btf_strs was only printing btf_string or
btf_aux_string as a comment.
This patch changes the comment to also include the position of the
string within the section in hexadecimal format.
gcc/ChangeLog:
* btfout.cc (output_btf_strs): Include each string's offset within the section in the -dA comment.
|
|
gcc/fortran/ChangeLog:
PR fortran/113245
* trans-intrinsic.cc (gfc_conv_intrinsic_size): Use
gfc_conv_expr_present() for proper check of optional DIM argument.
gcc/testsuite/ChangeLog:
PR fortran/113245
* gfortran.dg/size_optional_dim_2.f90: New test.
|
|
Commit r14-6997-g78dff4c25c1b95 added an arch-dependent
SET_XNACK_OFF vs. SET_XNACK_ANY check; that was added
between writing and committing the add-gfx1100
commit r14-7005-g52a2c659ae6c21 - and I missed adding
it there.
gcc/ChangeLog:
* config/gcn/mkoffload.cc (main): Handle gfx1100
when setting the default XNACK.
|
|
ROCm since 5.7.1 supports gfx1100 (RDNA3) cards. This commit adds support
for it, mostly by assuming gfx1100 behaves identically to gfx1030. Like gfx1030,
gfx1100 support is neither documented nor the build of the multilib enabled by
default.
But contrary to gfx1030, gfx1100 has a known issue causing some libraries not
to build, including newlib: The sdwa variant of v_mov_b32_sdwa is not supported
by the hardware, but GCC currently generates this instruction.
This will be addressed in a later commit.
gcc/ChangeLog:
* config.gcc (amdgcn-*-amdhsa): Accept --with-arch=gfx1100.
* config/gcn/gcn-hsa.h (NO_XNACK): Add gfx1100.
(ASM_SPEC): Handle gfx1100.
* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1100.
(enum gcn_isa): Add ISA_RDNA3.
(TARGET_GFX1100, TARGET_RDNA2_PLUS, TARGET_RDNA3): Define.
* config/gcn/gcn-valu.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS.
* config/gcn/gcn.cc (gcn_option_override,
gcn_omp_device_kind_arch_isa, output_file_start): Handle gfx1100.
(gcn_global_address_p, gcn_addr_space_legitimate_address_p): Change
TARGET_RDNA2 to TARGET_RDNA2_PLUS.
(gcn_hsa_declare_function_name): Don't use '.amdhsa_reserve_flat_scratch'
with gfx1100.
* config/gcn/gcn.h (ASSEMBLER_DIALECT): Likewise.
(TARGET_CPU_CPP_BUILTINS): Define __RDNA3__, __gfx1030__ and
__gfx1100__.
* config/gcn/gcn.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS.
* config/gcn/gcn.opt (Enum gpu_type): Add gfx1100.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1100): Define.
(isa_has_combined_avgprs, main): Handle gfx1100.
* config/gcn/t-omp-device (isa): Add gfx1100.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (gcn_gfx1100_s): New const string.
(gcn_isa_name_len): Fix length.
(isa_hsa_name, isa_code, max_isa_vgprs): Handle gfx1100.
|
|
It was noticed that -mmovbe doesn't use movbe for __builtin_bswap{32,64}
when not optimizing. The following adjusts the documentation to say that
movbe will be used when optimizing and that it applies to all byte swaps,
not just those carried out via builtin function calls.
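A minimal sketch of such a byte swap; with optimization and -mmovbe the
swapped load below can be emitted as a single movbe instruction.
unsigned int
load_be32 (const unsigned int *p)
{
  return __builtin_bswap32 (*p);
}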
* doc/invoke.texi (-mmovbe): Clarify.
|
|
The following more consistently avoids creating a niter peeling epilog,
matching what peeling later uses for the skip_vector condition, in
particular when versioning is required which then also ensures the
vector loop is entered unless the epilog is vectorized. This should
ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed
later; some refactoring could make the two match better.
The patch also makes sure to adjust the upper bound of the epilogues
when we do not have a skip edge around the vector loop.
PR tree-optimization/113026
* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
Avoid an epilog in more cases.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
epilogues niter upper bounds and estimates.
* gcc.dg/torture/pr113026-1.c: New testcase.
* gcc.dg/torture/pr113026-2.c: Likewise.
|
|
The following testcase ICEs during regimplification since the addition of
(convert (eqne zero_one_valued_p@0 INTEGER_CST@1))
simplification. That simplification is novel in the sense that in
gimplify_expr it can turn an expression (comparison in particular) into
an SSA_NAME. Normally, when gimplify_expr sees an SSA_NAME from the start, it does
case SSA_NAME:
/* Allow callbacks into the gimplifier during optimization. */
ret = GS_ALL_DONE;
break;
and doesn't try to recalculate side effects because of that, but in this
case gimplify_expr normally enters the:
default:
switch (TREE_CODE_CLASS (TREE_CODE (*expr_p)))
{
case tcc_comparison:
then does
*expr_p = gimple_boolify (*expr_p);
and then
*expr_p = fold_convert_loc (input_location,
org_type, *expr_p);
with this new match.pd simplification turns that tcc_comparison class
into SSA_NAME. Unlike the outer SSA_NAME handling though, this falls
through into
recalculate_side_effects (*expr_p);
dont_recalculate:
break;
but unfortunately recalculate_side_effects doesn't handle SSA_NAME and ICEs
on it.
SSA_NAMEs don't ever have TREE_SIDE_EFFECTS set on those, so the following
patch fixes it by handling it similarly to the tcc_constant case.
2024-01-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113228
* gimplify.cc (recalculate_side_effects): Do nothing for SSA_NAMEs.
* gcc.c-torture/compile/pr113228.c: New test.
|
|
The PHI argument expansion of INTEGER_CSTs where bitint_min_cst_precision
returns significantly smaller precision than the PHI result precision is
optimized by loading the much smaller constant (if any) from memory and
then either setting the remaining limbs to {} or calling memset with -1.
The case where no constant is loaded (i.e. c == NULL) is when the
INTEGER_CST is 0 or all_ones - in that case we can just set all the limbs
to {} or call memset with -1 on everything.
While for the all ones extension case that is what the code was already
doing, I missed one spot in the zero extension case, where, when constructing
the offset of the MEM_REF lhs of the = {} store, it was unconditionally using
the byte size of c, which obviously doesn't work if c is NULL. In that case
we want to use zero offset.
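A hedged sketch (not the PR testcase, and assuming a target with wide _BitInt
support such as x86_64) of a construct that exercises this path: a PHI for a
very large _BitInt whose constant argument is zero.
_BitInt(575) g;

void
f (int c)
{
  _BitInt(575) x = 0;
  if (c)
    x = g;
  g = x + 1;   /* PHI merging the zero constant with g's value */
}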
2024-01-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113120
* gimple-lower-bitint.cc (gimple_lower_bitint): Fix handling of very
large _BitInt zero INTEGER_CST PHI argument.
* gcc.dg/bitint-62.c: New test.
|