Age | Commit message | Author | Files | Lines
2022-06-14 | syscall: gofmt | Ian Lance Taylor | 43 | -7/+53
Add blank lines after //sys comments where needed, and then run gofmt on the syscall package with the new formatter. This is the libgo version of CL 407136. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/412074
2022-06-14 | libstdc++: Check lengths first in operator== for basic_string [PR62187] | Jonathan Wakely | 1 | -14/+10
As confirmed by LWG 2852, the calls to traits_type::compare do not need to be observable, so we can make operator== compare string lengths first and return immediately for non-equal lengths. This avoids doing a slow string comparison for "abc...xyz" == "abc...xy". Previously we only did this optimization for std::char_traits<char>, but we can enable it unconditionally thanks to LWG 2852. For comparisons with a const char* we can call traits_type::length right away to do the same optimization. That strlen call can be folded away for constant arguments, making it very efficient. For the pre-C++20 operator== and operator!= overloads we can swap the order of the arguments to take advantage of the operator== improvements. libstdc++-v3/ChangeLog: PR libstdc++/62187 * include/bits/basic_string.h (operator==): Always compare lengths before checking string contents. [!__cpp_lib_three_way_comparison] (operator==, operator!=): Reorder arguments.
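The length-first check described in this commit can be sketched as follows. This is a minimal illustration and not the libstdc++ code: `equal_length_first` is a hypothetical helper name, and it is written against `std::string` rather than the `basic_string` template.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not libstdc++ code: compare the sizes first and
// only call traits_type::compare when they match.
inline bool equal_length_first(const std::string& lhs, const std::string& rhs) {
    using traits = std::string::traits_type;
    return lhs.size() == rhs.size()
        && traits::compare(lhs.data(), rhs.data(), lhs.size()) == 0;
}

// Same idea for a const char* operand: obtain its length up front with
// traits_type::length, then compare contents only if the lengths agree.
inline bool equal_length_first(const std::string& lhs, const char* rhs) {
    using traits = std::string::traits_type;
    const std::size_t n = traits::length(rhs);
    return lhs.size() == n && traits::compare(lhs.data(), rhs, n) == 0;
}
```

With strings of different lengths, neither overload reaches the character comparison.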
2022-06-14 | libstdc++: Inline all basic_string::compare overloads [PR59048] | Jonathan Wakely | 4 | -95/+123
Defining the compare member functions inline allows calls to traits_type::length and std::min to be inlined, taking advantage of constant expression arguments. When not inline, the compiler prefers to use the explicit instantiation definitions in libstdc++.so and can't take advantage of constant arguments. libstdc++-v3/ChangeLog: PR libstdc++/59048 * include/bits/basic_string.h (compare): Define inline. * include/bits/basic_string.tcc (compare): Remove out-of-line definitions. * include/bits/cow_string.h (compare): Define inline. * testsuite/21_strings/basic_string/operations/compare/char/3.cc: New test.
2022-06-14 | libstdc++: Fix indentation in allocator base classes | Jonathan Wakely | 2 | -6/+6
libstdc++-v3/ChangeLog: * include/bits/new_allocator.h: Fix indentation. * include/ext/malloc_allocator.h: Likewise.
2022-06-14 | libstdc++: Check for size overflow in constexpr allocation [PR105957] | Jonathan Wakely | 2 | -1/+24
libstdc++-v3/ChangeLog: PR libstdc++/105957 * include/bits/allocator.h (allocator::allocate): Check for overflow in constexpr allocation. * testsuite/20_util/allocator/105975.cc: New test.
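The commit message does not say how the overflow check is written. One common way to express such a check is shown below as a sketch; `allocation_size_overflows` is a hypothetical name and is not taken from include/bits/allocator.h.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: true when a request for n objects of type T
// cannot be expressed as a byte count in std::size_t.
template <typename T>
constexpr bool allocation_size_overflows(std::size_t n) {
    return n > std::numeric_limits<std::size_t>::max() / sizeof(T);
}
```

An allocator could reject the request when this returns true instead of multiplying `n` by `sizeof(T)` and wrapping around.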
2022-06-14 | regrename: Fix -fcompare-debug issue in check_new_reg_p [PR105041] | Surya Kumari Jangala | 2 | -2/+23
In check_new_reg_p, the nregs of a du chain is computed by obtaining the MODE of the first element in the chain, and then calling hard_regno_nregs() with the MODE. But the first element of the chain can be a DEBUG_INSN whose mode need not be the same as the rest of the elements in the du chain. This was resulting in fcompare-debug failure as check_new_reg_p was returning a different result with -g for the same candidate register. We can instead obtain nregs from the du chain itself. 2022-06-10 Surya Kumari Jangala <jskumari@linux.ibm.com> gcc/ PR rtl-optimization/105041 * regrename.cc (check_new_reg_p): Use nregs value from du chain. gcc/testsuite/ PR rtl-optimization/105041 * gcc.target/powerpc/pr105041.c: New test.
2022-06-14 | rs6000: Delete VS_scalar | Segher Boessenkool | 1 | -75/+66
It is just the same as VEC_base, which is a more generic name. 2022-06-14 Segher Boessenkool <segher@kernel.crashing.org> * config/rs6000/vsx.md (VS_scalar): Delete. (rest of file): Adjust.
2022-06-14 | c++: Elide calls to NOP module initializers | Nathan Sidwell | 6 | -32/+60
gcc/cp * cp-tree.h (fini_modules): Add has_inits parm. * decl2.cc (c_parse_final_cleanups): Check for inits, adjust fini_modules flags. * module.cc (module_state): Rename call_init_p to active_init_p. (module_state::write_config): Write active_init. (module_state::read_config): Read it. (module_determine_import_inits): Clear active_init_p of covered inits. (late_finish_module): Add has_init parm. Record it. (fini_modules): Adjust. gcc/testsuite/ * g++.dg/modules/init-2_a.C: Adjust. * g++.dg/modules/init-2_c.C: Adjust. * g++.dg/modules/init-2_d.C: New.
2022-06-14 | Fix ipa-cp wrt volatile loads | Jan Hubicka | 2 | -0/+34
Check for volatile flag to ipa_load_from_parm_agg. gcc/ChangeLog: 2022-06-10 Jan Hubicka <hubicka@ucw.cz> PR ipa/105739 * ipa-prop.cc (ipa_load_from_parm_agg): Punt on volatile loads. gcc/testsuite/ChangeLog: 2022-06-10 Jan Hubicka <hubicka@ucw.cz> * gcc.dg/ipa/pr105739.c: New test.
2022-06-14 | RISC-V: Split slli+sh[123]add.uw opportunities to avoid zext.w | Philipp Tomsich | 2 | -0/+57
When encountering a prescaled (biased) value as a candidate for sh[123]add.uw, the combine pass will present this as shifted by the aggregate amount (prescale + shift-amount) with an appropriately adjusted mask constant that has fewer than 32 bits set. E.g., here's the failing expression seen in combine for a prescale of 1 and a shift of 2 (note how 0x3fffffff8 >> 3 is 0x7fffffff). Trying 7, 8 -> 10: 7: r78:SI=r81:DI#0<<0x1 REG_DEAD r81:DI 8: r79:DI=zero_extend(r78:SI) REG_DEAD r78:SI 10: r80:DI=r79:DI<<0x2+r82:DI REG_DEAD r79:DI REG_DEAD r82:DI Failed to match this instruction: (set (reg:DI 80 [ cD.1491 ]) (plus:DI (and:DI (ashift:DI (reg:DI 81) (const_int 3 [0x3])) (const_int 17179869176 [0x3fffffff8])) (reg:DI 82))) To address this, we introduce a splitter handling these cases. Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-developed-by: Manolis Tsamis <manolis.tsamis@vrull.eu> gcc/ChangeLog: * config/riscv/bitmanip.md: Add split to handle opportunities for slli + sh[123]add.uw gcc/testsuite/ChangeLog: * gcc.target/riscv/zba-shadd.c: New test.
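The arithmetic behind the failing expression can be checked with a small sketch. It models the three insns from the dump (prescale of 1, shift of 2) and the single masked form that combine builds. The function names are hypothetical and the code only illustrates the equivalence; it is not the splitter itself.

```cpp
#include <cstdint>

// Models insns 7, 8 and 10: a 32-bit shift by the prescale, a zero
// extension to 64 bits, then a shift by 2 and an add.
inline std::uint64_t staged_form(std::uint64_t index, std::uint64_t base) {
    const std::uint32_t prescaled =
        static_cast<std::uint32_t>(static_cast<std::uint32_t>(index) << 1);
    const std::uint64_t extended = prescaled;  // zero_extend
    return (extended << 2) + base;
}

// Models the combined expression: shift by the aggregate amount (3) and
// mask with 0x3fffffff8, which has 31 bits set.
inline std::uint64_t combined_form(std::uint64_t index, std::uint64_t base) {
    return ((index << 3) & 0x3fffffff8ULL) + base;
}
```

Both forms agree for every input because the low three bits of `index << 3` are always zero, so the mask only has to clear bits 34 and above.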
2022-06-14 | RISC-V: add consecutive_bits_operand predicate | Philipp Tomsich | 1 | -0/+11
Provide an easy way to constrain for constants that are a single, consecutive run of ones. gcc/ChangeLog: * config/riscv/predicates.md (consecutive_bits_operand): Implement new predicate. Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
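A test for "a single, consecutive run of ones" can be written with two bit tricks. The sketch below is an assumption about one possible formulation; `is_single_run_of_ones` is a hypothetical name and the code is not copied from predicates.md.

```cpp
#include <cstdint>

// Hypothetical helper: true when value consists of exactly one
// contiguous run of set bits (for example 0x3fffffff8).
inline bool is_single_run_of_ones(std::uint64_t value) {
    if (value == 0) {
        return false;
    }
    // Fill the zero bits below the run, giving 0...01...1 when the run
    // is contiguous.
    const std::uint64_t filled = value | (value - 1);
    // A value of the form 2^k - 1 has no bits in common with itself + 1.
    return (filled & (filled + 1)) == 0;
}
```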
2022-06-14 | tree-optimization/105946 - avoid accessing excess args from uninit diag | Richard Biener | 1 | -0/+3
The uninit diagnostics use pass-by-reference and access attributes, but that iterates over the function type's arguments, which can in some cases apparently outrun the actual arguments, leading to ICEs. The following simply ignores arguments that are not present. 2022-06-14 Richard Biener <rguenther@suse.de> PR tree-optimization/105946 * tree-ssa-uninit.cc (maybe_warn_pass_by_reference): Do not look at arguments not specified in the function call.
2022-06-14 | middle-end/105965 - add missing v_c_e <{ el }> simplification | Richard Biener | 2 | -4/+25
When we got the simplification of bit-field-ref to view-convert we lost the ability to detect FMAs since we cannot look through _1 = {_10}; _11 = VIEW_CONVERT_EXPR<float>(_1); the following amends the (view_convert CONSTRUCTOR) pattern to handle this case. 2022-06-14 Richard Biener <rguenther@suse.de> PR middle-end/105965 * match.pd (view_convert CONSTRUCTOR): Handle single-element CTOR case. * gcc.target/i386/pr105965.c: New testcase.
2022-06-14 | Restore bootstrap on ARM | Eric Botcazou | 2 | -2/+21
The -Wuse-after-free warning is explicitly disabled for destructors on ARM because of the special ABI and the previous change to the warning machinery uncovered another case where the warning data would be incorrectly erased. gcc/ * warning-control.cc (copy_warning) [generic version]: Do not erase the warning data of the destination location when the no-warning bit is not set on the source. (copy_warning) [tree version]: Return early if TO is equal to FROM. (copy_warning) [gimple version]: Likewise. gcc/testsuite/ * g++.dg/warn/Wuse-after-free5.C: New test.
2022-06-14 | vect: Move suggested_unroll_factor applying [PR105940] | Kewen Lin | 1 | -3/+3
As PR105940 shows, when the rs6000 port tries to set m_suggested_unroll_factor to 4 or so, there is an ICE on: exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo), loop_vinfo->suggested_unroll_factor); In function vect_analyze_loop_2, the current place where suggested_unroll_factor is applied can't guarantee that it is applied in all cases. As the case shows, the vectorizer can retry with SLP forced off, and the vf is then reset from saved_vectorization_factor, which has not had suggested_unroll_factor applied. This means it can end up with a vf that neglects suggested_unroll_factor. I think that is contrary to the design; we should move the application of suggested_unroll_factor after start_over. PR tree-optimization/105940 gcc/ChangeLog: * tree-vect-loop.cc (vect_analyze_loop_2): Move the place of applying suggested_unroll_factor after start_over.
2022-06-13 | xtensa: Optimize bitwise AND operation with some specific forms of constants | Takayuki 'January June' Suwa | 2 | -0/+189
This patch offers several insn-and-split patterns for bitwise AND with register and constant that can be represented as: i. 1's least significant N bits and the others 0's (17 <= N <= 31) ii. 1's most significant N bits and the others 0's (12 <= N <= 31) iii. M 1's sequence of bits and trailing N 0's bits, that cannot fit into a "MOVI Ax, simm12" instruction (1 <= M <= 16, 1 <= N <= 30) And also offers shortcuts for conditional branch if each of the abovementioned operations is (not) equal to zero. gcc/ChangeLog: * config/xtensa/predicates.md (shifted_mask_operand): New predicate. * config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one): New insn-and-split pattern. (*andsi3_const_negative_pow2, *andsi3_const_shifted_mask, *masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2, *masktrue_const_shifted_mask): Ditto.
2022-06-13 | xtensa: Make use of BALL/BNALL instructions | Takayuki 'January June' Suwa | 2 | -0/+54
In the Xtensa ISA, there is no single machine instruction that calculates unary bitwise negation, but a few similar fused instructions exist: "BALL Ax, Ay, label" // if ((~Ax & Ay) == 0) goto label; "BNALL Ax, Ay, label" // if ((~Ax & Ay) != 0) goto label; These instructions have never been emitted before, but there seems to be no reason not to make use of them. gcc/ChangeLog: * config/xtensa/xtensa.md (*masktrue_bitcmpl): New insn pattern. gcc/testsuite/ChangeLog: * gcc.target/xtensa/BALL-BNALL.c: New.
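The branch conditions quoted in this commit can be written out as plain predicates. The function names below are hypothetical; the expressions are the ones given in the commit message.

```cpp
#include <cstdint>

// BALL: branch when every bit set in ay is also set in ax.
inline bool ball_taken(std::uint32_t ax, std::uint32_t ay) {
    return (~ax & ay) == 0;
}

// BNALL: branch when at least one bit set in ay is clear in ax.
inline bool bnall_taken(std::uint32_t ax, std::uint32_t ay) {
    return (~ax & ay) != 0;
}
```

Source code of the shape `if ((~a & b) == 0)` is the kind of expression the new insn pattern is meant to match.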
2022-06-13 | xtensa: Simplify conditional branch/move insn patterns | Takayuki 'January June' Suwa | 3 | -161/+70
No need to describe the "false side" conditional insn patterns anymore. gcc/ChangeLog: * config/xtensa/xtensa-protos.h (xtensa_emit_branch): Remove the first argument. (xtensa_emit_bit_branch): Remove it because now called only from the output statement of *bittrue insn pattern. * config/xtensa/xtensa.cc (gen_int_relational): Remove the last argument 'p_invert', and make so that the condition is reversed by itself as needed. (xtensa_expand_conditional_branch): Share the common path, and remove condition inversion code. (xtensa_emit_branch, xtensa_emit_movcc): Simplify by removing the "false side" pattern. (xtensa_emit_bit_branch): Remove it because of the abovementioned reason, and move the function body to *bittrue insn pattern. * config/xtensa/xtensa.md (*bittrue): Transplant the output statement from removed xtensa_emit_bit_branch(). (*bfalse, *ubfalse, *bitfalse, *maskfalse): Remove the "false side" insn patterns.
2022-06-13 | xtensa: Improve shift operations more | Takayuki 'January June' Suwa | 5 | -38/+213
This patch introduces funnel shifter utilization, and rearranges existing "per-byte shift" insn patterns. gcc/ChangeLog: * config/xtensa/predicates.md (logical_shift_operator, xtensa_shift_per_byte_operator): New predicates. * config/xtensa/xtensa-protos.h (xtensa_shlrd_which_direction): New prototype. * config/xtensa/xtensa.cc (xtensa_shlrd_which_direction): New helper function for funnel shift patterns. * config/xtensa/xtensa.md (ior_op): New code iterator. (*ashlsi3_1): Replace with new split pattern. (*shift_per_byte): Unify *ashlsi3_3x, *ashrsi3_3x and *lshrsi3_3x. (*shift_per_byte_omit_AND_0, *shift_per_byte_omit_AND_1): New insn-and-split patterns that redirect to *xtensa_shift_per_byte, in order to omit unnecessary bitwise AND operation. (*shlrd_reg_<code>, *shlrd_const_<code>, *shlrd_per_byte_<code>, *shlrd_per_byte_<code>_omit_AND): New insn patterns for funnel shifts. gcc/testsuite/ChangeLog: * gcc.target/xtensa/funnel_shifter.c: New.
2022-06-14 | Daily bump. | GCC Administrator | 10 | -1/+245
2022-06-14 | libphobos: Check in missing core.sync package module | Iain Buclaw | 1 | -0/+20
This was meant to be part of r13-1062 in the merge with upstream druntime 454471d8.
2022-06-13 | ubsan: -Wreturn-type and ubsan trap-on-error | Jason Merrill | 3 | -3/+15
I noticed that -fsanitize=undefined -fsanitize-undefined-trap-on-error was omitting the usual -Wreturn-type warning for control flowing off the end of a function. This was because the warning code was looking for calls either to __builtin_unreachable or the UBSan function, but these flags produce a call to __builtin_trap instead. gcc/c-family/ChangeLog: * c-ubsan.cc (ubsan_instrument_return): Use BUILTINS_LOCATION. gcc/ChangeLog: * tree-cfg.cc (pass_warn_function_return::execute): Also check BUILT_IN_TRAP. gcc/testsuite/ChangeLog: * g++.dg/ubsan/return-8.C: New test.
2022-06-13 | RISC-V: Reset the length to the default of 4 for FP comparisons | Maciej W. Rozycki | 1 | -2/+0
The default length for floating-point compare operations is overridden to 8, however the FEQ.fmt, FLT.fmt, FLE.fmt machine instructions and FGE.fmt, FGT.fmt assembly idioms the relevant RTL insns produce are all 4 bytes long each. And all the floating-point compare RTL insns that produce multiple machine instructions explicitly set their lengths. Remove the override then, letting the default of 4 apply for the single instruction case. gcc/ * config/riscv/riscv.md (length): Remove the explicit setting for "fcmp".
2022-06-13 | x86: Require AVX for F16C and VAES | H.J. Lu | 1 | -4/+4
Since F16C and VAES are only usable with AVX, require AVX for F16C and VAES. libgcc/105920 * common/config/i386/cpuinfo.h (get_available_features): Require AVX for F16C and VAES.
2022-06-13 | libstdc++: Rename __null_terminated to avoid collision with Apple SDK | Mark Mentovai | 1 | -6/+6
The macOS 13 SDK (and equivalent-version iOS and other Apple OS SDKs) contain this definition in <sys/cdefs.h>: 863 #define __null_terminated This collides with the use of __null_terminated in libstdc++'s experimental fs_path.h. As libstdc++'s use of this token is entirely internal to fs_path.h, the simplest workaround, renaming it, is most appropriate. Here, it's renamed to __nul_terminated, referencing the NUL ('\0') value that is used to terminate the strings in the context in which this tag structure is used. libstdc++-v3/ChangeLog: * include/experimental/bits/fs_path.h (__detail::__null_terminated): Rename to __nul_terminated to avoid colliding with a macro in Apple's SDK. Signed-off-by: Mark Mentovai <mark@mentovai.com>
2022-06-13 | libstdc++: Use type_identity_t for non-deducible std::atomic_xxx args | Jonathan Wakely | 2 | -1/+16
This is LWG 3220 which is about to become Tentatively Ready. libstdc++-v3/ChangeLog: * include/std/atomic (__atomic_val_t): Use __type_identity_t instead of atomic<T>::value_type, as per LWG 3220. * testsuite/29_atomics/atomic/lwg3220.cc: New test.
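The effect of a non-deducible parameter can be shown with a small stand-alone sketch. `identity_of`, `Cell` and `store` are hypothetical stand-ins chosen for illustration; they are not the libstdc++ `__type_identity_t` or the `std::atomic` free functions.

```cpp
// Stand-in for a type_identity-style alias: wrapping T in a nested type
// makes the parameter a non-deduced context.
template <typename T> struct identity_of { using type = T; };
template <typename T> using identity_of_t = typename identity_of<T>::type;

template <typename T> struct Cell { T value; };

// T is deduced from the first argument only; the second argument is
// converted to T instead of taking part in deduction.
template <typename T>
void store(Cell<T>& cell, identity_of_t<T> desired) {
    cell.value = desired;
}

inline long store_int_into_long_cell() {
    Cell<long> cell{0};
    store(cell, 42);  // 42 is an int; T is still deduced as long
    return cell.value;
}
```

If the second parameter were declared as plain `T`, the call `store(cell, 42)` would fail to compile because `T` would be deduced as both `long` and `int`.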
2022-06-13 | i386: Return true for (SUBREG (MEM....)) in register_no_elim_operand [PR105927] | Uros Bizjak | 2 | -0/+25
Under certain conditions register_operand predicate also allows subregs of memory operands. When RTL checking is enabled, these will fail with REGNO (op). Allow subregs of memory operands, these are guaranteed to be reloaded to a register. 2022-06-13 Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog: PR target/105927 * config/i386/predicates.md (register_no_elim_operand): Return true for subreg of a memory operand. gcc/testsuite/ChangeLog: PR target/105927 * gcc.target/i386/pr105927.c: New test.
2022-06-13 | d: Match function declarations of gcc built-ins from any module. | Iain Buclaw | 5 | -31/+203
Declarations of recognised gcc built-in functions are now matched from any module. Previously, only the `core.stdc' package was scanned. In addition to matching of the symbol, any user-applied `@attributes' or `pragma(mangle)' name will be applied to the built-in decl as well. Because there would now be no control over where built-in declarations are coming from, the warning option `-Wbuiltin-declaration-mismatch' has been implemented in the D front-end too. gcc/d/ChangeLog: * d-builtins.cc: Include builtins.h. (gcc_builtins_libfuncs): Remove. (strip_type_modifiers): New function. (matches_builtin_type): New function. (covariant_with_builtin_type_p): New function. (maybe_set_builtin_1): Set front-end built-in if identifier matches gcc built-in name. Apply user-specified attributes and assembler name overrides to the built-in. Warn about built-in declaration mismatches. (d_builtin_function): Set IDENTIFIER_DECL_TREE of built-in functions. * d-compiler.cc (Compiler::onParseModule): Scan all modules for any identifiers that match built-in function names. * lang.opt (Wbuiltin-declaration-mismatch): New option. gcc/testsuite/ChangeLog: * gdc.dg/Wbuiltin_declaration_mismatch.d: New test. * gdc.dg/builtins.d: New test.
2022-06-13 | Add a general mapping from internal fns to target insns | Richard Sandiford | 3 | -119/+87
Several existing internal functions map directly to an instruction defined in target-insns.def. This patch makes it easier to define more such functions in future. This should help to reduce cut-&-paste, but more importantly, it allows the difference between optab functions and target-insns.def functions to be abstracted away; both are now treated as “directly-mapped”. gcc/ * internal-fn.def (DEF_INTERNAL_INSN_FN): New macro. (GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT, GOMP_SIMT_LANE) (GOMP_SIMT_LAST_LANE, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_VOTE_ANY) (GOMP_SIMT_XCHG_BFLY, GOMP_SIMT_XCHG_IDX): Use it. * internal-fn.h (direct_internal_fn_info::directly_mapped): New member variable. (direct_internal_fn_info::vectorizable): Reduce to 1 bit. (direct_internal_fn_p): Also return true for internal functions that map directly to instructions defined target-insns.def. (direct_internal_fn): Adjust comment accordingly. * internal-fn.cc (direct_insn, optab1, optab2, vectorizable_optab1) (vectorizable_optab2): New local macros. (not_direct): Initialize directly_mapped. (mask_load_direct, load_lanes_direct, mask_load_lanes_direct) (gather_load_direct, len_load_direct, mask_store_direct) (store_lanes_direct, mask_store_lanes_direct, vec_cond_mask_direct) (vec_cond_direct, scatter_store_direct, len_store_direct) (vec_set_direct, unary_direct, binary_direct, ternary_direct) (cond_unary_direct, cond_binary_direct, cond_ternary_direct) (while_direct, fold_extract_direct, fold_left_direct) (mask_fold_left_direct, check_ptrs_direct): Use the macros above. (expand_GOMP_SIMT_ENTER_ALLOC, expand_GOMP_SIMT_EXIT): Delete (expand_GOMP_SIMT_LANE, expand_GOMP_SIMT_LAST_LANE): Likewise; (expand_GOMP_SIMT_ORDERED_PRED, expand_GOMP_SIMT_VOTE_ANY): Likewise. (expand_GOMP_SIMT_XCHG_BFLY, expand_GOMP_SIMT_XCHG_IDX): Likewise. (direct_internal_fn_types): Handle functions that map to instructions defined in target-insns.def. (direct_internal_fn_types): Likewise. (direct_internal_fn_supported_p): Likewise. 
(internal_fn_expanders): Likewise.
2022-06-13 | Factor out common internal-fn idiom | Richard Sandiford | 1 | -154/+89
internal-fn.c has quite a few functions that simply map the result of the call to an instruction's output operand (if any) and map each argument to an instruction's input operand, in order. This patch adds a single function for doing that. It's really just a generalisation of expand_direct_optab_fn, but with the output operand being optional. Unfortunately, it isn't possible to do this for vcond_mask because the internal function has a different argument order from the optab. gcc/ * internal-fn.cc (expand_fn_using_insn): New function, split out and adapted from... (expand_direct_optab_fn): ...here. (expand_GOMP_SIMT_ENTER_ALLOC): Use it. (expand_GOMP_SIMT_EXIT): Likewise. (expand_GOMP_SIMT_LANE): Likewise. (expand_GOMP_SIMT_LAST_LANE): Likewise. (expand_GOMP_SIMT_ORDERED_PRED): Likewise. (expand_GOMP_SIMT_VOTE_ANY): Likewise. (expand_GOMP_SIMT_XCHG_BFLY): Likewise. (expand_GOMP_SIMT_XCHG_IDX): Likewise.
2022-06-13 | d: Improve TypeInfo errors when compiling in -fno-rtti mode | Iain Buclaw | 4 | -30/+63
The existing TypeInfo errors can be cryptic. This alters the diagnostic to include which expression is requiring `object.TypeInfo'. gcc/d/ChangeLog: * d-tree.h (check_typeinfo_type): Add Expression* parameter. (build_typeinfo): Likewise. Declare new override. * expr.cc (ExprVisitor): Call build_typeinfo with Expression*. * typeinfo.cc (check_typeinfo_type): Include expression in the diagnostic message. (build_typeinfo): New override. gcc/testsuite/ChangeLog: * gdc.dg/rtti1.d: New test.
2022-06-13 | openmp: Conforming device numbers and omp_{initial,invalid}_device | Jakub Jelinek | 14 | -88/+223
OpenMP 5.2 changed once more what device numbers are allowed. In 5.1, valid device numbers were [0, omp_get_num_devices()]. 5.2 makes also -1 valid (calls it omp_initial_device), which is equivalent in behavior to the omp_get_num_devices() number but has the advantage that it is a constant. And it also introduces omp_invalid_device, which is also a constant with an implementation-defined value < -1. That value should act like sNaN: any time any device construct (GOMP_target*) or OpenMP runtime API routine is asked for such a device, the program is terminated. And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which is all but [-1, omp_get_num_devices()] other than omp_invalid_device) must be treated like omp_invalid_device. For device constructs, we have a compatibility problem: we've historically used 2 magic negative values to mean something special. GOMP_DEVICE_ICV (-1) means the device clause wasn't present, so pick the omp_get_default_device () number; GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for #pragma omp target if (cond), where we pass -2 if cond is false). But 5.2 requires that omp_initial_device is -1 (there were discussions about it; the advantage of -1 is that one can, say, iterate over the [-1, omp_get_num_devices()-1] range to get all devices starting with the host/initial one). And also, if the user passes -2, unless it is omp_invalid_device, we need to treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory. So, the patch does some number remapping on the compiler side: user_device_num >= -2U ? user_device_num - 1 : user_device_num. This remapping is done at compile time if the device clause has a constant argument, otherwise at runtime, and means that for user -1 (omp_initial_device) we pass -2 to GOMP_* in the runtime library, where it is treated like host fallback, while -2 is remapped to -3 (one of the non-conforming device numbers; for those it doesn't matter which one is which). omp_invalid_device is then -4. 
For the OpenMP device runtime APIs, no remapping is done. This patch doesn't deal with the initial default-device-var for OMP_TARGET_OFFLOAD=mandatory, the spec says that the initial ICV value for that should in that case depend on whether there are any offloading devices or not (if not, should be omp_invalid_device), but that means we can't determine the number of devices lazily (and let libraries have the possibility to register their offloading data etc.). 2022-06-13 Jakub Jelinek <jakub@redhat.com> gcc/ * omp-expand.cc (expand_omp_target): Remap user provided device clause arguments, -1 to -2 and -2 to -3, either at compile time if constant, or at runtime. include/ * gomp-constants.h (GOMP_DEVICE_INVALID): Define. libgomp/ * omp.h.in (omp_initial_device, omp_invalid_device): New enumerators. * omp_lib.f90.in (omp_initial_device, omp_invalid_device): New parameters. * omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise. * target.c (resolve_device): Add remapped argument, handle GOMP_DEVICE_ICV only if remapped is true (and clear remapped), for negative values, treat GOMP_DEVICE_FALLBACK as fallback only if remapped, otherwise treat omp_initial_device that way. For omp_invalid_device, always emit gomp_fatal, even when OMP_TARGET_OFFLOAD isn't mandatory. (GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext, GOMP_target_update, GOMP_target_update_ext, GOMP_target_enter_exit_data): Pass true as remapped argument to resolve_device. (omp_target_alloc, omp_target_free, omp_target_is_present, omp_target_memcpy_check, omp_target_associate_ptr, omp_target_disassociate_ptr, omp_get_mapped_ptr, omp_target_is_accessible): Pass false as remapped argument to resolve_device. Treat omp_initial_device the same as gomp_get_num_devices (). Don't bypass resolve_device calls if device_num is negative. (omp_pause_resource): Treat omp_initial_device the same as gomp_get_num_devices (). Call resolve_device. 
* icv-device.c (omp_set_default_device): Always set to device_num even when it is negative. * libgomp.texi: Document that Conforming device numbers, omp_initial_device and omp_invalid_device is implemented. * testsuite/libgomp.c/target-41.c (main): Add test with omp_initial_device. * testsuite/libgomp.c/target-45.c: New test. * testsuite/libgomp.c/target-46.c: New test. * testsuite/libgomp.c/target-47.c: New test. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add test with omp_initial_device. Use -5 instead of -1 for negative value test. * testsuite/libgomp.fortran/target-is-accessible-1.f90 (main): Likewise. Reorder stop numbers.
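The compiler-side remapping quoted in this commit message, `user_device_num >= -2U ? user_device_num - 1 : user_device_num`, can be tried out in isolation. The function name below is hypothetical, and the sketch only reproduces the arithmetic, not the code in omp-expand.cc.

```cpp
// Hypothetical helper reproducing the remapping expression from the
// commit message. The unsigned comparison is true only for -1 and -2.
inline int remap_user_device_num(int user_device_num) {
    return static_cast<unsigned>(user_device_num) >= -2U
               ? user_device_num - 1
               : user_device_num;
}
```

User value -1 (omp_initial_device) becomes -2, the historical host-fallback value, and user value -2 becomes -3. All other values, including non-negative device numbers and -4, pass through unchanged.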
2022-06-13 | Introduce -finstrument-functions-once | Eric Botcazou | 4 | -35/+133
The goal is to make it possible to use it in (large) production binaries to do function-level coverage, so the overhead must be minimal and, in particular, there is no protection against data races so the "once" moniker is imprecise. gcc/ * common.opt (finstrument-functions): Set explicit value. (-finstrument-functions-once): New option. * doc/invoke.texi (Program Instrumentation Options): Document it. * gimplify.cc (build_instrumentation_call): New static function. (gimplify_function_tree): Call it to emit the instrumentation calls if -finstrument-functions[-once] is specified. gcc/testsuite/ * gcc.dg/instrument-4.c: New test.
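The "once" behaviour, including why it is imprecise, can be illustrated with a hand-written equivalent. This is a sketch under the assumption that a plain per-function flag guards the hook; the names are hypothetical and the code is not what gimplify.cc emits.

```cpp
// Hypothetical stand-in for a profiling hook; it only counts calls.
int g_enter_calls = 0;
void profile_enter() { ++g_enter_calls; }

int instrumented_function(int x) {
    // Plain flag with no atomics or locks: two threads entering at the
    // same time could both see false and both call the hook.
    static bool seen = false;
    if (!seen) {
        seen = true;
        profile_enter();
    }
    return x * 2;
}

// Calls the function n times and reports how often the hook has run.
int hook_calls_after(int n) {
    for (int i = 0; i < n; ++i) {
        instrumented_function(i);
    }
    return g_enter_calls;
}
```

In a single thread the hook runs once no matter how often the function is called, which keeps the steady-state cost to one flag test.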
2022-06-13 | Do not erase warning data in gimple_set_location | Eric Botcazou | 4 | -7/+6
gimple_set_location is mostly invoked on newly built GIMPLE statements, so their location is UNKNOWN_LOCATION and setting it will clobber the warning data of the passed location, if any. gcc/ * dwarf2out.cc (output_one_line_info_table): Initialize prev_addr. * gimple.h (gimple_set_location): Do not copy warning data from the previous location when it is UNKNOWN_LOCATION. * optabs.cc (expand_widen_pattern_expr): Always set oprnd{1,2}. gcc/testsuite/ * c-c++-common/nonnull-1.c: Remove XFAIL for C++.
2022-06-13 | c++: Separate late stage module writing | Nathan Sidwell | 1 | -17/+30
This moves some module writing into a newly added write_end function, which is called after writing initializers. gcc/cp/ * module.cc (module_state::write): Separate to ... (module_state::write_begin, module_state::write_end): ... these. (module_state::write_readme): Drop extensions parameter. (struct module_processing_cookie): Add more fields. (finish_module_processing): Adjust state writing call. (late_finish_module): Call write_end.
2022-06-13 | d: Merge upstream dmd 821ed393d, druntime 454471d8, phobos 1206fc94f. | Iain Buclaw | 62 | -712/+1029
D front-end changes: - Import latest bug fixes to mainline. D runtime changes: - Fix duplicate Elf64_Dyn definitions on Solaris. - _d_newThrowable has been converted to a template. Phobos changes: - Import latest bug fixes to mainline. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 821ed393d. * expr.cc (ExprVisitor::visit (NewExp *)): Remove handling of allocating `@nogc' throwable object. * runtime.def (NEWTHROW): Remove. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 454471d8. * libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add core/sync/package.d. * libdruntime/Makefile.in: Regenerate. * src/MERGE: Merge upstream phobos 1206fc94f.
2022-06-13 | i386: Fix up *<dwi>3_doubleword_mask [PR105911] | Jakub Jelinek | 2 | -2/+20
Another regression caused by my recent patch. This time it is because define_insn_and_split only requires that the constant mask is const_int_operand. When it was only SImode, that wasn't a problem, nor for HImode, but for DImode, if we need to AND the shift count, we might run into the problem that the mask isn't representable as a signed 32-bit immediate. But we don't really care about the upper bits of the mask, so we can just mask the CONST_INT with the mode mask. 2022-06-13 Jakub Jelinek <jakub@redhat.com> PR target/105911 * config/i386/i386.md (*ashl<dwi>3_doubleword_mask, *<insn><dwi>3_doubleword_mask): Use operands[3] masked with (<MODE_SIZE> * BITS_PER_UNIT) - 1 as AND operand instead of operands[3] unmodified. * gcc.dg/pr105911.c: New test.
2022-06-13 | testsuite: Add -mtune=generic to dg-options for two testcases. | Cui,Lili | 2 | -2/+2
Use -mtune=generic to limit these two test cases, because configuring them with -mtune=cascadelake or znver3 will vectorize them. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Add -mtune=generic to dg-options. * gcc.target/i386/pr84101.c: Likewise.
2022-06-13 | Daily bump. | GCC Administrator | 3 | -1/+42
2022-06-12 | Darwin: Truncate kernel-provided version to OS major for Darwin >= 20. | Simon Wright | 1 | -11/+5
In common with system tools, GCC uses a version obtained from the kernel as the prevailing macOS target, when that is not overridden by command line or environment versions (i.e. -mmacosx-version-min=, MACOSX_DEPLOYMENT_TARGET). Presently, GCC assumes that if the OS version is >= 20, the value used should include both major and minor version identifiers. However, the system tools (for those versions) truncate the value to the major version. This leads to link errors when combining objects built with clang and GCC, for example: ld: warning: object file (null.o) was built for newer macOS version (12.2) than being linked (12.0) The change here truncates the values GCC uses to the major version. gcc/ChangeLog: PR target/104871 * config/darwin-driver.cc (darwin_find_version_from_kernel): If the OS version is darwin20 (macOS 11) or greater, truncate the version to the major number.
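The truncation can be sketched as a mapping from a Darwin kernel version to a macOS target string. This is an illustration only: the function name is hypothetical, the `N.0` output format is inferred from the linker warning quoted in the commit message, and the branch for Darwin releases before 20 is an assumption based on the historical Darwin N to macOS 10.(N-4) numbering rather than anything stated in the commit.

```cpp
#include <string>

// Hypothetical helper, not darwin_find_version_from_kernel itself.
// The kernel minor version is deliberately ignored.
inline std::string macos_target_from_darwin(int darwin_major, int /*darwin_minor*/) {
    if (darwin_major >= 20) {
        // Darwin 20 is macOS 11, Darwin 21 is macOS 12, and so on;
        // only the major number is kept.
        return std::to_string(darwin_major - 9) + ".0";
    }
    // Assumption: older releases follow the 10.x scheme.
    return "10." + std::to_string(darwin_major - 4);
}
```

With this mapping a Darwin 21.3 kernel yields 12.0, which matches the target the system tools would use in the warning shown in the commit message.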
2022-06-12 | Darwin: Future-proof -mmacosx-version-min | Mark Mentovai | 1 | -1/+2
f18cbc1ee1f4 (2021-12-18) updated various parts of gcc to not impose a Darwin or macOS version maximum of the current known release. Different parts of gcc accept, variously, Darwin version numbers matching darwin2*, and macOS major version numbers up to 99. The current released version is Darwin 21 and macOS 12, with Darwin 22 and macOS 13 expected for public release later this year. With one major OS release per year, this strategy is expected to provide another 8 years of headroom. However, f18cbc1ee1f4 missed config/darwin-c.c (now .cc), which continued to impose a maximum of macOS 12 on the -mmacosx-version-min compiler driver argument. This was last updated from 11 to 12 in 11b967577483 (2021-10-27), but kicking the can down the road one year at a time is not a viable strategy, and is not in line with the more recent technique from f18cbc1ee1f4. Prior to 556ab5125912 (2020-11-06), config/darwin-c.c did not impose a maximum that needed annual maintenance, as at that point, all macOS releases had used a major version of 10. The stricter approach imposed since then was valuable for a time until the particulars of the new versioning scheme were established and understood, but now that they are, it's prudent to restore a more permissive approach. gcc/ChangeLog: * config/darwin-c.cc: Make -mmacosx-version-min more future-proof. Signed-off-by: Mark Mentovai <mark@mentovai.com>
2022-06-11 | gcc: xtensa: fix pr95571 test for call0 ABI | Max Filippov | 1 | -0/+6
gcc/testsuite/ * g++.target/xtensa/pr95571.C (__xtensa_libgcc_window_spill): New definition.
2022-06-12PR96463: Optimise svld1rq from vectors for little endian AArch64 targets.Prathamesh Kulkarni5-40/+212
The patch folds:

  lhs = svld1rq({-1, -1, ...}, rhs)

into:

  tmp = mem_ref<vectype> [(elem_type * {ref-all}) rhs]
  lhs = vec_perm_expr<tmp, tmp, {0, 1, 2, 3 ...}>

which is then expanded using aarch64_expand_sve_dupq.

Example:

  svint32_t foo (int32x4_t x)
  {
    return svld1rq (svptrue_b8 (), &x[0]);
  }

code-gen:

  foo:
  .LFB4350:
          dup     z0.q, z0.q[0]
          ret

The patch relaxes type-checking for VEC_PERM_EXPR by allowing different vector types for lhs and rhs provided:
(1) rhs3 is constant and has integer type element.
(2) len(lhs) == len(rhs3) and len(rhs1) == len(rhs2)
(3) lhs and rhs have same element type.

gcc/ChangeLog:
  PR target/96463
  * config/aarch64/aarch64-sve-builtins-base.cc: Include ssa.h. (svld1rq_impl::fold): Define.
  * config/aarch64/aarch64.cc (expand_vec_perm_d): Define new members op_mode and op_vec_flags. (aarch64_evpc_reencode): Initialize newd.op_mode and newd.op_vec_flags. (aarch64_evpc_sve_dup): New function. (aarch64_expand_vec_perm_const_1): Gate existing calls to aarch64_evpc_* functions under d->vmode == d->op_mode, and call aarch64_evpc_sve_dup. (aarch64_vectorize_vec_perm_const): Remove assert d->vmode != d->op_mode, and initialize d.op_mode and d.op_vec_flags.
  * tree-cfg.cc (verify_gimple_assign_ternary): Allow different vector types for lhs and rhs in VEC_PERM_EXPR if rhs3 is constant.

gcc/testsuite/ChangeLog:
  PR target/96463
  * gcc.target/aarch64/sve/acle/general/pr96463-1.c: New test.
  * gcc.target/aarch64/sve/acle/general/pr96463-2.c: Likewise.
2022-06-12Daily bump.GCC Administrator3-1/+57
2022-06-11xtensa: Improve constant synthesis for both integer and floating-pointTakayuki 'January June' Suwa6-16/+247
This patch revises the previous implementation of constant synthesis.

First, it now uses a define_split machine description pattern and runs after the reload pass, in order not to interfere with optimizations such as loop invariant motion.

Second, floating-point constants are now processed as well as integer ones.

Third, several new synthesis patterns are added for when the constant cannot fit into a "MOVI Ax, simm12" instruction, but:

I. can be represented as a power of two minus one (e.g. 32767, 65535 or 0x7fffffffUL)
   => "MOVI(.N) Ax, -1" + "SRLI Ax, Ax, 1 ... 31" (or "EXTUI")
II. is between -34816 and 34559
   => "MOVI(.N) Ax, -2048 ... 2047" + "ADDMI Ax, Ax, -32768 ... 32512"
III. (existing case) can fit into a signed 12-bit if the trailing zero bits are stripped
   => "MOVI(.N) Ax, -2048 ... 2047" + "SLLI Ax, Ax, 1 ... 31"

The above sequences consist of 5 or 6 bytes and have a latency of 2 clock cycles, in contrast with "L32R Ax, <litpool>" (3 bytes and one clock latency, but may suffer an additional one-clock pipeline stall and an implementation-specific InstRAM/ROM access penalty) plus 4 bytes of constant value.

In addition, 3-instruction synthesis patterns (8 or 9 bytes, 3 clock latency) are also provided when optimizing for speed and the L32R instruction has a considerable access penalty:

IV. 2-instruction synthesis (any of I ... III) followed by "SLLI Ax, Ax, 1 ... 31"
V. 2-instruction synthesis followed by either "ADDX[248] Ax, Ax, Ax" or "SUBX8 Ax, Ax, Ax" (multiplying by 3, 5, 7 or 9)

gcc/ChangeLog:
  * config/xtensa/xtensa-protos.h (xtensa_constantsynth): New prototype.
  * config/xtensa/xtensa.cc (xtensa_emit_constantsynth, xtensa_constantsynth_2insn, xtensa_constantsynth_rtx_SLLI, xtensa_constantsynth_rtx_ADDSUBX, xtensa_constantsynth): New backend functions that process the abovementioned logic. (xtensa_emit_move_sequence): Revert the previous changes.
  * config/xtensa/xtensa.md: New split patterns for integer and floating-point, as the frontend part.
gcc/testsuite/ChangeLog: * gcc.target/xtensa/constsynth_2insns.c: New. * gcc.target/xtensa/constsynth_3insns.c: Ditto. * gcc.target/xtensa/constsynth_double.c: Ditto.
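The 2-instruction cases I to III of the xtensa constant synthesis commit above can be illustrated with a small classifier. This is a minimal sketch, assuming the conditions are tested in the order the commit message lists them; the names `synth_kind`, `fits_simm12` and `classify_constant` are hypothetical and do not appear in xtensa.cc, and the real backend may check the cases in a different order or apply further conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the synthesis cases named in the commit message.  */
enum synth_kind
{
  SYNTH_MOVI,   /* fits "MOVI Ax, simm12" directly */
  SYNTH_SRLI,   /* case I:   MOVI -1 + SRLI (power of two minus one) */
  SYNTH_ADDMI,  /* case II:  MOVI + ADDMI (-34816 ... 34559) */
  SYNTH_SLLI,   /* case III: MOVI + SLLI (trailing zeros stripped) */
  SYNTH_NONE    /* none of the 2-instruction patterns applies */
};

/* True if V fits the signed 12-bit MOVI immediate (-2048 ... 2047).  */
static bool
fits_simm12 (int32_t v)
{
  return v >= -2048 && v <= 2047;
}

/* Classify V, testing the cases in the order the commit message lists
   them (an assumption of this sketch).  */
static enum synth_kind
classify_constant (int32_t v)
{
  uint32_t u = (uint32_t) v;

  if (fits_simm12 (v))
    return SYNTH_MOVI;

  /* Case I: u has the form 2^n - 1 exactly when u & (u + 1) is zero.  */
  if ((u & (u + 1)) == 0)
    return SYNTH_SRLI;

  /* Case II: reachable by adding an ADDMI immediate to a MOVI immediate.  */
  if (v >= -34816 && v <= 34559)
    return SYNTH_ADDMI;

  /* Case III: strip trailing zero bits.  V is nonzero here, and exact
     division by 2 avoids right-shifting a negative value.  */
  int32_t t = v;
  assert (t != 0);
  while (t % 2 == 0)
    t /= 2;
  if (fits_simm12 (t))
    return SYNTH_SLLI;

  return SYNTH_NONE;
}
```

For instance, under these assumptions 65535 falls under case I, 30000 under case II, and 34560 (135 shifted left by 8) under case III, while an odd value such as 0x12345 matches none of the 2-instruction patterns.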
2022-06-11xtensa: Improve instruction cost estimation and suggestionTakayuki 'January June' Suwa3-15/+134
This patch implements a new target-specific relative RTL insn cost function, because the default cost estimation is suboptimal, and fixes several "length" insn attributes (related to the cost estimation). It also introduces a new machine-dependent option "-mextra-l32r-costs=" that tells the compiler the implementation-specific InstRAM/ROM access penalty for the L32R instruction (in clock-cycle units, 0 by default). gcc/ChangeLog: * config/xtensa/xtensa.cc (xtensa_rtx_costs): Correct wrong case for ABS and NEG, add missing case for BSWAP and CLRSB, and double the costs for integer divisions using libfuncs if optimizing for speed, in order to take advantage of fast constant division by multiplication. (TARGET_INSN_COST): New macro definition. (xtensa_is_insn_L32R_p, xtensa_insn_cost): New functions for calculating relative costs of RTL insns, for both speed and size. * config/xtensa/xtensa.md (return, nop, trap): Correct values of the attribute "length" that depends on TARGET_DENSITY. (define_asm_attributes, blockage, frame_blockage): Add missing attributes. * config/xtensa/xtensa.opt (-mextra-l32r-costs=): New machine-dependent option, however, preparatory work for now.
2022-06-11xtensa: Consider the Loop Option when setmemsi is expanded to small loopTakayuki 'January June' Suwa1-21/+50
This now applies to aligned blocks of almost any size under such circumstances. gcc/ChangeLog: * config/xtensa/xtensa.cc (xtensa_expand_block_set_small_loop): Pass through the block length / loop count conditions if zero-overhead looping is configured and active,
2022-06-11xtensa: Tweak some widen multiplicationsTakayuki 'January June' Suwa1-24/+32
umulsidi3 is faster than umuldi3 even as a library call, and is also a prerequisite for fast constant division by multiplication. gcc/ChangeLog: * config/xtensa/xtensa.md (mulsidi3, umulsidi3): Split into individual signedness, in order to use libcall "__umulsidi3" but not the other. (<u>mulhisi3): Merge into one by using code iterator. (<u>mulsidi3, mulhisi3, umulhisi3): Remove.
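To show why a 32x32 to 64-bit unsigned widening multiply matters for constant division, here is a minimal C sketch. It is an illustration only, not code from the xtensa backend: the function names `umulsidi`, `div3` and `check_div3` are hypothetical, and the divisor 3 with its multiplier 0xAAAAAAAB (the ceiling of 2^33 / 3) is a standard textbook example chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* 32x32 -> 64-bit unsigned widening multiply, the kind of operation the
   umulsidi3 pattern provides (hypothetical C stand-in).  */
static uint64_t
umulsidi (uint32_t a, uint32_t b)
{
  return (uint64_t) a * b;
}

/* Unsigned division by the constant 3 without a divide instruction:
   multiply by ceil(2^33 / 3) = 0xAAAAAAAB and keep the bits above 2^33.  */
static uint32_t
div3 (uint32_t x)
{
  return (uint32_t) (umulsidi (x, 0xAAAAAAABu) >> 33);
}

/* Cross-check the multiply-based result against ordinary division.  */
static void
check_div3 (uint32_t x)
{
  assert (div3 (x) == x / 3);
}
```

The full 64-bit product is needed because the quotient sits in the high half of the result, which is why a narrower multiply would not be enough for this technique.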
2022-06-11Disable generating load/store vector pairs for block copies.Michael Meissner1-1/+4
Testing has found that using load and store vector pair instructions for block copies can result in a slowdown on power10. This patch disables using the vector pair instructions for block copies if we are tuning for power10. 2022-06-11 Michael Meissner <meissner@linux.ibm.com> gcc/ * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do not generate block copies with vector pair instructions if we are tuning for power10.
2022-06-11Daily bump.GCC Administrator7-1/+163