riscv-gnu-toolchain/gcc.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2025-04-16	RISC-V: Put jump table in text for large code model	Kito Cheng	2	-1/+25
	Large code model assume the data or rodata may put far away from text section. So we need to put jump table in text section for large code model. gcc/ChangeLog: * config/riscv/riscv.h (JUMP_TABLES_IN_TEXT_SECTION): Check if large code model. gcc/testsuite/ChangeLog: * gcc.target/riscv/jump-table-large-code-model.c: New test.
2025-04-16	testsuite: Add testcase for already fixed PR [PR116093]	Jakub Jelinek	1	-0/+20
	This testcase got fixed with r15-9397 PR119722 fix. 2025-04-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/116093 * gcc.dg/bitint-122.c: New test.
2025-04-16	AArch64: Fix operands order in vec_extract expander	Tejas Belagod	1	-3/+3
	The operand order to gen_vcond_mask call in the vec_extract pattern is wrong. Fix the order where predicate is operand 3. Tested and bootstrapped on aarch64-linux-gnu. OK for trunk? gcc/ChangeLog * config/aarch64/aarch64-sve.md (vec_extract<vpred><Vel>): Fix operand order to gen_vcond_mask_*.
2025-04-16	aarch64: Disable sysreg feature gating	Alice Carlotti	2	-4/+13
	This applies to the sysreg read/write intrinsics __arm_[wr]sr. It does not depend on changes to Binutils, because GCC converts recognised sysreg names to an encoding based form, which is already ungated in Binutils. We have, however, agreed to make an equivalent change in Binutils (which would then disable feature gating for sysreg accesses in inline assembly), but this has not yet been posted upstream. In the future we may introduce a new flag to renable some checking, but these checks could not be comprehensive because many system registers depend on architecture features that don't have corresponding GCC/GAS --march options. This would also depend on addressing numerous inconsistencies in the existing list of sysreg feature dependencies. gcc/ChangeLog: config/aarch64/aarch64.cc (aarch64_valid_sysreg_name_p): Remove feature check. (aarch64_retrieve_sysreg): Ditto. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rwsr-ungated.c: New test.
2025-04-16	Daily bump.	GCC Administrator	13	-1/+458

2025-04-16	d: Fix ICE: type variant differs by TYPE_MAX_VALUE with -g [PR119826]	Iain Buclaw	3	-0/+42
	Forward referenced enum types were never fixed up after the main ENUMERAL_TYPE was finished. All flags set are now propagated to all variants after its mode, size, and alignment has been calculated. PR d/119826 gcc/d/ChangeLog: * types.cc (TypeVisitor::visit (TypeEnum )): Propagate flags of main enum types to all forward-referenced variants. gcc/testsuite/ChangeLog: gdc.dg/debug/imports/pr119826b.d: New test. * gdc.dg/debug/pr119826.d: New test.
2025-04-16	c++: Prune lambda captures from more places [PR119755]	Nathaniel Shead	3	-0/+48
	Currently, pruned lambda captures are still leftover in the function's BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal compilation, but does break modules streaming as we try to reconstruct a FIELD_DECL that no longer exists on the type itself. PR c++/119755 gcc/cp/ChangeLog: * lambda.cc (prune_lambda_captures): Remove pruned capture from function's BLOCK_VARS and BIND_EXPR_VARS. gcc/testsuite/ChangeLog: * g++.dg/modules/lambda-10_a.H: New test. * g++.dg/modules/lambda-10_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
2025-04-16	testsuite: Fix up completion-2.c test	Jakub Jelinek	1	-0/+1
	The r15-9487 change has added -flto-partition=default, which broke the completion-2.c testcase because that case is now also printed during completion. 2025-04-16 Jakub Jelinek <jakub@redhat.com> * gcc.dg/completion-2.c: Expect also -flto-partition=default line.
2025-04-15	libgomp.texi (gcn, nvptx): Mention self_maps alongside USM	Tobias Burnus	1	-2/+2
	libgomp/ChangeLog: * libgomp.texi (gcn, nvptx): Mention self_maps clause besides unified_shared_memory in the requirements item.
2025-04-15	c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]	Qing Zhao	2	-2/+30
	C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold. In c_fully_fold, it assumes that operands of function calls have already been folded. However, when we build call to .ACCESS_WITH_SIZE, all its operands are not fully folded. therefore the C FE specific operator is passed to middle-end. In order to fix this issue, fully fold the parameters before building the call to .ACCESS_WITH_SIZE. PR c/119717 gcc/c/ChangeLog: * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the parameters for call to .ACCESS_WITH_SIZE. gcc/testsuite/ChangeLog: * gcc.dg/pr119717.c: New test.
2025-04-15	OpenMP: omp.h omp::allocator C++ Allocator interface	waffl3x	3	-0/+422
	The implementation of each allocator is simplified by inheriting from __detail::__allocator_templ. At the moment, none of the implementations diverge in any way, simply passing in the allocator handle to be used when an allocation is made. In the future, const_mem will need special handling added to it to support constant memory space. libgomp/ChangeLog: * omp.h.in: Add omp::allocator::* and ompx::allocator::* allocators. (__detail::__allocator_templ<T, omp_allocator_handle_t>): New struct template. (null_allocator<T>): New struct template. (default_mem<T>): Likewise. (large_cap_mem<T>): Likewise. (const_mem<T>): Likewise. (high_bw_mem<T>): Likewise. (low_lat_mem<T>): Likewise. (cgroup_mem<T>): Likewise. (pteam_mem<T>): Likewise. (thread_mem<T>): Likewise. (ompx::allocator::gnu_pinned_mem<T>): Likewise. * testsuite/libgomp.c++/allocator-1.C: New test. * testsuite/libgomp.c++/allocator-2.C: New test. Signed-off-by: waffl3x <waffl3x@baylibre.com>
2025-04-15	x86: Update gcc.target/i386/apx-interrupt-1.c	H.J. Lu	1	-1/+1
	ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers pushed in red-zone. Since commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 13 12:20:42 2025 -0700 APX: Don't use red-zone with 32 GPRs and no caller-saved registers disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect 31 .cfi_restore directives. PR target/119784 * gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore directives. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-15	Docs: Address -fivopts, -O0, and -Q confusion [PR71094]	Sandra Loosemore	1	-0/+9
	There's a blurb at the top of the "Optimize Options" node telling people that most optimization options are completely disabled at -O0 and a similar blurb in the entry for -Og, but nothing at the entry for -O0. Since this is a continuing point of confusion it seems wise to duplicate the information in all the places users are likely to look for it. gcc/ChangeLog PR tree-optimization/71094 * doc/invoke.texi (Optimize Options): Document that -fivopts is enabled at -O1 and higher. Add blurb about -O0 causing GCC to completely ignore most optimization options.
2025-04-15	c++: constexpr, trivial, and non-alias target [PR111075]	Jason Merrill	1	-0/+3
	On Darwin and other targets with !can_alias_cdtor, we instead go to maybe_thunk_ctor, which builds a thunk function that calls the general constructor. And then cp_fold tries to constant-evaluate that call, and we ICE because we don't expect to ever be asked to constant-evaluate a call to a trivial function. No new test because this fixes g++.dg/torture/tail-padding1.C on affected targets. PR c++/111075 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_call_expression): Allow trivial call from a thunk.
2025-04-15	configure, Darwin: Recognise new naming for Xcode ld.	Iain Sandoe	2	-6/+8
	The latest editions of XCode have altered the identify reported by 'ld -v' (again). This means that GCC configure no longer detects the version. Fixed by adding the new name to the set checked. gcc/ChangeLog: * configure: Regenerate. * configure.ac: Recognise PROJECT:ld-mmmm.nn.aa as an identifier for Darwin's static linker. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-04-15	includes, Darwin: Handle modular use for macOS SDKs [PR116827].	Iain Sandoe	1	-0/+15
	Recent changes to the OS SDKs have altered the way in which include guards are used for a number of headers when C++ modules are enabled. Instead of placing the guards in the included header, they are being placed in the including header. This breaks the assumptions in the current GCC stddef.h specifically, that the presence of __PTRDIFF_T and __SIZE_T means that the relevant defs are already made. However in the case of the module-enabled C++ with these SDKs, that is no longer true. stddef.h has a large body of special-cases already, but it seems that the only viable solution here is to add a new one specifically for __APPLE__ and modular code. This fixes around 280 new fails in the modules test-suite; it is needed on all open branches that support modules. PR target/116827 gcc/ChangeLog: * ginclude/stddef.h: Undefine __PTRDIFF_T and __SIZE_T for module- enabled c++ on Darwin/macOS platforms. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-04-15	Regenerate common.opt.urls	Kyrylo Tkachov	1	-0/+3
	Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> * common.opt.urls: Regenerate.
2025-04-15	cobol/119302 - transform gcobol.3 name during install, install as gcobol-io.3	Richard Biener	1	-3/+4
	The following installs gcobol.3 as gcobol-io.3 and applies program-transform-name to the gcobol-io part. This follows naming of the pdf and the html variants. It also uses $(man1ext) and $(man3ext) consistently. PR cobol/119302 gcc/cobol/ * Make-lang.in (GCOBOLIO_INSTALL_NAME): Define. Use $(GCOBOLIO_INSTALL_NAME) for gcobol.3 manpage source upon install.
2025-04-15	Set znver5 issue rate to 4.	Jan Hubicka	1	-7/+8
	this patch sets issue rate of znver5 to 4. With current model, unless a reservation is missing, we will never issue more than 4 instructions per cycle since that is the limit of decoders and the model does not take into acount the fact that typically code is run from op cache. gcc/ChangeLog: * config/i386/x86-tune-sched.cc (ix86_issue_rate): Set to 4 for znver5.
2025-04-15	Set ADDSS cost to 3 for znver5	Jan Hubicka	1	-1/+1
	Znver5 has latency of addss 2 in typical case while all earlier versions has latency 3. Unforunately addss cost is used to cost many other SSE instructions than just addss and setting the cost to 2 makes us to vectorize 4 64bit stores into one 256bit store which in turn regesses imagemagick. This patch sets the cost back to 3. Next stage1 we can untie addss from the other operatoins and set it correctly. bootstrapped/regtested x86_64-linux and also benchmarked on SPEC2k17 gcc/ChangeLog: PR target/119298 * config/i386/x86-tune-costs.h (znver5_cost): Set ADDSS cost to 3.
2025-04-15	libstdc++: Do not define __cpp_lib_ranges_iota in <ranges>	Jonathan Wakely	1	-1/+0
	In r14-7153-gadbc46942aee75 we removed a duplicate definition of __glibcxx_want_range_iota from <ranges>, but __cpp_lib_ranges_iota should be defined in <ranges> at all. libstdc++-v3/ChangeLog: * include/std/ranges (__glibcxx_want_ranges_iota): Do not define.
2025-04-15	libstdc++: Do not declare namespace ranges in <numeric> unconditionally	Jonathan Wakely	2	-5/+7
	Move namespace ranges inside the feature test macro guard, because 'ranges' is not a reserved name before C++20. libstdc++-v3/ChangeLog: * include/std/numeric (ranges): Only declare namespace for C++23 and later. (ranges::iota_result): Fix indentation. * testsuite/17_intro/names.cc: Check ranges is not used as an identifier before C++20.
2025-04-15	RISC-V: vsetvl: elide abnormal edges from LCM computations [PR119533]	Vineet Gupta	3	-1/+168
	vsetvl phase4 uses LCM guided info to insert VSETVL insns, including a straggler loop for "mising vsetvls" on certain edges. Currently it asserts on encountering EDGE_ABNORMAL. When enabling go frontend with V enabled, libgo build hits the assert. The solution is to prevent abnormal edges from getting into LCM at all (my prior attempt at this just ignored them after LCM which is not right). Existing invalid_opt_bb_p () current does this for BB predecessors but not for successors which is what the patch adds. Crucially, the ICE/fix also depends on avoiding vsetvl hoisting past non-transparent blocks: That is taken care of by Robin's patch "RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547]" for a different yet related issue. Reported-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com> PR target/119533 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (invalid_opt_bb_p): Check for EDGE_ABNOMAL. (pre_vsetvl::compute_lcm_local_properties): Initialize kill bitmap. Debug dump skipped edge. gcc/testsuite/ChangeLog: * go.dg/pr119533-riscv.go: New test. * go.dg/pr119533-riscv-2.go: New test.
2025-04-15	RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547].	Robin Dapp	5	-3/+315
	When lifting up a vsetvl into a block we currently don't consider the block's transparency with respect to the vsetvl as in other parts of the pass. This patch does not perform the lift when transparency is not guaranteed. This condition is more restrictive than necessary as we can still perform a vsetvl lift if the conflicting register is only every used in vsetvls and no regular insns but given how late we are in the GCC 15 cycle it seems better to defer this. Therefore gcc.target/riscv/rvv/vsetvl/avl_single-68.c is XFAILed for now. This issue was found in OpenCV where it manifests as a runtime error. Zhijin Zeng debugged PR119547 and provided an initial patch. Reported-By: 曾治金 <zhijin.zeng@spacemit.com> PR target/119547 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): Do not perform lift if block is not transparent. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/avl_single-68.c: xfail. * g++.target/riscv/rvv/autovec/pr119547.C: New test. * g++.target/riscv/rvv/autovec/pr119547-2.C: New test. * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: Adjust.
2025-04-15	libstdc++: Implement formatter for ranges and range_formatter [PR109162]	Tomasz Kamiński	9	-65/+1125
	This patch implements formatter specialization for input_ranges and range_formatter class from P2286R8, as adjusted by P2585R1. The formatter for pair/tuple is not yet provided, making maps not formattable. This introduces an new _M_format_range member to internal __formatter_str, that formats range as _CharT as string, according to the format spec. This function transform any contiguous range into basic_string_view directly, by computing size if necessary. Otherwise, for ranges for which size can be computed (forward_range or sized_range) we use a stack buffer, if they are sufficiently small. Finally, we create a basic_string<_CharT> from the range, and format its content. In case when padding is specified, this is handled by firstly formatting the content of the range to the temporary string object. However, this can be only implemented if the iterator of the basic_format_context is internal type-erased iterator used by implementation. Otherwise a new basic_format_context would need to be created, which would require rebinding of handles stored in the arguments: note that format spec for element type could retrieve any format argument from format context, visit and use handle to format it. As basic_format_context provide no user-facing constructor, the user are not able to construct object of that type with arbitrary iterators. The signatures of the user-facing parse and format methods of the provided formatters deviate from the standard by constraining types of params: * _CharT is constrained __formatter::__char * basic_format_parse_context<_CharT> for parse argument * basic_format_context<_Out, _CharT> for format second argument The standard specifies last three of above as unconstrained types. These types are later passed to possibly user-provided formatter specializations, that are required via formattable concept to only accept above types. Finally, the formatter<input_range, _CharT> specialization is implemented without using specialization of range-default-formatter exposition only template as base class, while providing same functionality. PR libstdc++/109162 libstdc++-v3/ChangeLog: * include/std/format (__format::__has_debug_format, _Pres_type::_Pres_seq) (_Pres_type::_Pres_str, __format::__Stackbuf_size): Define. (_Separators::_S_squares, _Separators::_S_parens, _Separators::_S_comma) (_Separators::_S_colon): Define additional constants. (_Spec::_M_parse_fill_and_align): Define overload accepting list of excluded characters for fill, and forward existing overload. (__formatter_str::_M_format_range): Define. (__format::_Buf_sink) Use __Stackbuf_size for size of array. (__format::__is_map_formattable, std::range_formatter) (std::formatter<_Rg, _CharT>): Define. * src/c++23/std.cc.in (std::format_kind, std::range_format) (std::range_formatter): Export. * testsuite/std/format/formatter/lwg3944.cc: Guarded tests with __glibcxx_format_ranges. * testsuite/std/format/formatter/requirements.cc: Adjusted for standard behavior. * testsuite/23_containers/vector/bool/format.cc: Test vector<bool> formatting. * testsuite/std/format/ranges/format_kind.cc: New test. * testsuite/std/format/ranges/formatter.cc: New test. * testsuite/std/format/ranges/sequence.cc: New test. * testsuite/std/format/ranges/string.cc: New test. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-04-15	libgcobol: mark riscv64--linux as supported target	Andreas Schwab	1	-0/+5
	* configure.tgt: Set LIBGCOBOL_SUPPORTED for riscv64--linux with 64-bit multilib.
2025-04-15	Fortran/OpenMP: Support automatic mapping allocatable components (deep mapping)	Tobias Burnus	21	-111/+3208
	When mapping an allocatable variable (or derived-type component), explicitly or implicitly, all its allocated allocatable components will automatically be mapped. The patch implements the target hooks, added for this feature to omp-low.cc with commit r15-3895-ge4a58b6f28383c. Namely, there is a check whether there are allocatable components at all: gfc_omp_deep_mapping_p. Then gfc_omp_deep_mapping_cnt, counting the number of required mappings; this is a dynamic value as it depends on array bounds and whether an allocatable is allocated or not. And, finally, the actual mapping: gfc_omp_deep_mapping. Polymorphic variables are partially supported: the mapping of the _data component is fully supported, but only components of the declared type are processed for additional allocatables. Additionally, _vptr is not touched. This means that everything needing _vtab information requires unified shared memory; in particular, _size data is required when accessing elements of polymorphic arrays. However, for scalar arrays, accessing components of the declare type should work just fine. As polymorphic variables are not (really) supported and OpenMP 6 explicitly disallows them, there is now a warning (-Wopenmp) when they are encountered. Unlimited polymorphics are rejected (error). Additionally, PRIVATE and FIRSTPRIVATE are not quite supported for allocatable components, polymorphic components and as polymorphic variable. Thus, those are now rejected as well. gcc/fortran/ChangeLog: * f95-lang.cc (LANG_HOOKS_OMP_DEEP_MAPPING, LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT): Define. * openmp.cc (gfc_match_omp_clause_reduction): Fix location setting. (resolve_omp_clauses): Permit allocatable components, reject them and polymorphic variables in PRIVATE/FIRSTPRIVATE. * trans-decl.cc (add_clause): Set clause location. * trans-openmp.cc (gfc_has_alloc_comps): Add ptr_ok and shallow_alloc_only Boolean arguments. (gfc_omp_replace_alloc_by_to_mapping): New. (gfc_omp_private_outer_ref, gfc_walk_alloc_comps, gfc_omp_clause_default_ctor, gfc_omp_clause_copy_ctor, gfc_omp_clause_assign_op, gfc_omp_clause_dtor): Update call to it. (gfc_omp_finish_clause): Minor cleanups, improve location data, handle allocatable components. (gfc_omp_deep_mapping_map, gfc_omp_deep_mapping_item, gfc_omp_deep_mapping_comps, gfc_omp_gen_simple_loop, gfc_omp_get_array_size, gfc_omp_elmental_loop, gfc_omp_deep_map_kind_p, gfc_omp_deep_mapping_int_p, gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_do, gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New. (gfc_trans_omp_array_section): Save array descriptor in case deep-mapping lang hook will need it. (gfc_trans_omp_clauses): Likewise; use better clause location data. * trans.h (gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): Add function prototypes. libgomp/ChangeLog: * libgomp.texi (5.0 Impl. Status): Mark mapping alloc comps as 'Y'. * testsuite/libgomp.fortran/allocatable-comp.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-3.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-4.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-5.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-6.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-7.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-9.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/map-alloc-comp-1.f90: Remove dg-error. * gfortran.dg/gomp/polymorphic-mapping-2.f90: Update warn wording. * gfortran.dg/gomp/polymorphic-mapping.f90: Change expected diagnostic; some tests moved to ... * gfortran.dg/gomp/polymorphic-mapping-1.f90: ... here as new test. * gfortran.dg/gomp/polymorphic-mapping-3.f90: New test. * gfortran.dg/gomp/polymorphic-mapping-4.f90: New test. * gfortran.dg/gomp/polymorphic-mapping-5.f90: New test.
2025-04-15	Locality cloning pass: -fipa-reorder-for-locality	Kyrylo Tkachov	18	-11/+1423
	Implement partitioning and cloning in the callgraph to help locality. A new -fipa-reorder-for-locality flag is used to enable this. The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc The optimization has two components: * Partitioning the callgraph so as to group callers and callees that frequently call each other in the same partition * Cloning functions that straddle multiple callchains and allowing each clone to be local to the partition of its callchain. The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc. It creates a partitioning plan and does the prerequisite cloning. The partitioning is then implemented during the existing LTO partitioning pass. To guide these locality heuristics we use PGO data. In the absence of PGO data we use a static heuristic that uses the accumulated estimated edge frequencies of the callees for each function to guide the reordering. We are investigating some more elaborate static heuristics, in particular using the demangled C++ names to group template instantiatios together. This is promising but we are working out some kinks in the implementation currently and want to send that out as a follow-up once we're more confident in it. A new bootstrap-lto-locality bootstrap config is added that allows us to test this on GCC itself with either static or PGO heuristics. GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap). As this new pass enables a new partitioning scheme it is incompatible with explicit -flto-partition= options so an error is introduced when the user uses both flags explicitly. With this optimization we are seeing good performance gains on some large internal workloads that stress the parts of the processor that is sensitive to code locality, but we'd appreciate wider performance evaluation. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for mainline? Thanks, Kyrill Signed-off-by: Prachi Godbole <pgodbole@nvidia.com> Co-authored-by: Kyrylo Tkachov <ktkachov@nvidia.com> config/ChangeLog: * bootstrap-lto-locality.mk: New file. gcc/ChangeLog: * Makefile.in (OBJS): Add ipa-locality-cloning.o. * cgraph.h (set_new_clone_decl_and_node_flags): Declare prototype. * cgraphclones.cc (set_new_clone_decl_and_node_flags): Remove static qualifier. * common.opt (fipa-reorder-for-locality): New flag. (LTO_PARTITION_DEFAULT): Declare. (flto-partition): Change default to LTO_PARTITION_DFEAULT. * doc/invoke.texi: Document -fipa-reorder-for-locality. * flag-types.h (enum lto_locality_cloning_model): Declare. (lto_partitioning_model): Add LTO_PARTITION_DEFAULT. * lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping of node and index. * opts.cc (validate_ipa_reorder_locality_lto_partition): Define. (finish_options): Handle LTO_PARTITION_DEFAULT. * params.opt (lto_locality_cloning_model): New enum. (lto-partition-locality-cloning): New param. (lto-partition-locality-frequency-cutoff): Likewise. (lto-partition-locality-size-cutoff): Likewise. (lto-max-locality-partition): Likewise. * passes.def: Register pass_ipa_locality_cloning. * timevar.def (TV_IPA_LC): New timevar. * tree-pass.h (make_pass_ipa_locality_cloning): Declare. * ipa-locality-cloning.cc: New file. * ipa-locality-cloning.h: New file. gcc/lto/ChangeLog: * lto-partition.cc (add_node_references_to_partition): Define. (create_partition): Likewise. (lto_locality_map): Likewise. (lto_promote_cross_file_statics): Add extra dumping. * lto-partition.h (lto_locality_map): Declare prototype. * lto.cc (do_whole_program_analysis): Handle flag_ipa_reorder_for_locality.
2025-04-15	ipa-bit-cp: Fix adjusting value according to mask (PR119803)	Martin Jambor	2	-3/+19
	In my fix for PR 119318 I put mask calculation in ipcp_bits_lattice::meet_with_1 above a final fix to value so that all the bits in the value which are meaningless according to mask have value zero, which has tripped a validator in PR 119803. This patch fixes that by moving the adjustment down. Even thought the fix for PR 119318 did a similar thing in ipcp_bits_lattice::meet_with, the same is not necessary because that code path then feeds the new value and mask to ipcp_bits_lattice::set_to_constant which does the final adjustment correctly. In both places, however, Jakup proposed a better way of calculating cap_mask and so I have changed it accordingly. gcc/ChangeLog: 2025-04-15 Martin Jambor <mjambor@suse.cz> PR ipa/119803 * ipa-cp.cc (ipcp_bits_lattice::meet_with_1): Move m_value adjustmed according to m_mask below the adjustment of the latter according to cap_mask. Optimize the calculation of cap_mask a bit. (ipcp_bits_lattice::meet_with): Optimize the calculation of cap_mask a bit. gcc/testsuite/ChangeLog: 2025-04-15 Martin Jambor <mjambor@suse.cz> PR ipa/119803 * gcc.dg/ipa/pr119803.c: New test. Co-authored-by: Jakub Jelinek <jakub@redhat.com>
2025-04-15	d: Fix internal compiler error: in visit, at d/decl.cc:838 [PR119799]	Iain Buclaw	3	-5/+13
	This was caused by a check in the D front-end disallowing static VAR_DECLs with a size `0'. While empty structs in D are give the size `1', the same symbol coming from ImportC modules do infact have no size, so allow C variables to pass the check as well as array objects. PR d/119799 gcc/d/ChangeLog: * decl.cc (DeclVisitor::visit (VarDeclaration )): Check front-end type size before building the VAR_DECL. Allow C symbols to have a size of `0'. gcc/testsuite/ChangeLog: gdc.dg/import-c/pr119799.d: New test. * gdc.dg/import-c/pr119799c.c: New test.
2025-04-15	c++: prev declared hidden tmpl friend inst, cont [PR119807]	Patrick Palka	3	-0/+71
	When remapping existing specializations of a hidden template friend from a previous declaration to the new definition, we must remap only those specializations that match this new definition, but currently we remap all specializations (since they all appear in the same DECL_TEMPLATE_INSTANTIATIONS list of the most general template). Concretely, in the first testcase below, we form two specializations of the friend A::f, one with arguments {{0},{bool}} and another with arguments {{1},{bool}}. Later when instantiating B, we need to remap these specializations. During the B<0> instantiation we only want to remap the first specialization, and during the B<1> instantiation only the second specialization, but currently we remap both specializations twice. tsubst_friend_function needs to determine if an existing specialization matches the shape of the new definition, which is tricky in general, e.g. if the outer template parameters may not match up. Fortunately we don't have to reinvent the wheel here since is_specialization_of_friend seems to do exactly what we need. We can check this unconditionally, but I think it's only necessary when dealing with specializations formed from a class template scope previous declaration, hence the TMPL_ARGS_HAVE_MULTIPLE_LEVELS check. PR c++/119807 PR c++/112288 gcc/cp/ChangeLog: * pt.cc (tsubst_friend_function): Skip remapping an existing specialization if it doesn't match the shape of the new friend definition. gcc/testsuite/ChangeLog: * g++.dg/template/friend86.C: New test. * g++.dg/template/friend87.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
2025-04-15	d: Fix ICE in dwarf2out_imported_module_or_decl, at dwarf2out.cc:27676 ↵	Iain Buclaw	5	-1/+19
	[PR119817] The ImportVisitor method for handling the importing of overload sets was pushing NULL_TREE to the array of import decls, which in turn got passed to `debug_hooks->imported_module_or_decl', triggering the observed internal compiler error. NULL_TREE is returned from `build_import_decl' when the symbol was ignored for being non-trivial to represent in debug, for example, template or tuple declarations. So similarly "skip" adding the symbol when this is the case for overload sets too. PR d/119817 gcc/d/ChangeLog: * imports.cc (ImportVisitor::visit (OverloadSet )): Don't push NULL_TREE to vector of import symbols. gcc/testsuite/ChangeLog: gdc.dg/debug/imports/m119817/a.d: New test. * gdc.dg/debug/imports/m119817/b.d: New test. * gdc.dg/debug/imports/m119817/package.d: New test. * gdc.dg/debug/pr119817.d: New test.
2025-04-15	ipa-cp: Fix up ipcp_print_widest_int	Jakub Jelinek	1	-7/+17
	On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote: > This patch just introduces a form of dumping of widest ints that only > have zeros in the lowest 128 bits so that instead of printing > thousands of f's the output looks like: > > Bits: value = 0xffff, mask = all ones folled by 0xffffffffffffffffffffffffffff0000 > > and then makes sure we use the function not only to print bits but > also to print masks where values like these can also occur. Shouldn't that be followed by instead? And the widest_int checks seems to be quite expensive (especially for large widest_ints), I think for the first one we can just == -1 and for the second one wi::arshift (value, 128) == -1 and the zero extension by using wi::zext. Anyway, I wonder if it wouldn't be better to use something shorter, the variant patch uses 0xf..f prefix before the 128-bit hexadecimal number (maybe we could also special case the even more common bits 64+ are all ones case). Or it could be 0xff prefix. Or printing such numbers as -0x prefixed negative, though that is not a good idea for masks. This version doesn't print e.g. 0xf..fffffffffffffffffffffffffffff0000 but just 0xf..f0000 (of course, for say mask of 0xf..f0000000000000000000000000000ffff it prints it like that, doesn't try to shorten the 0 digits. But if the most significant bits aren't set, it will be just 0xffff. 2025-04-15 Jakub Jelinek <jakub@redhat.com> ipa-cp.cc (ipcp_print_widest_int): Print values with all ones in bits 128+ with "0xf..f" prefix instead of "all ones folled by ". Simplify wide_int check for -1 or all ones above least significant 128 bits.
2025-04-15	tailc: Fix up musttail calls vs. -fsanitize=thread [PR119801]	Jakub Jelinek	3	-11/+118
	Calls with musttail attribute don't really work with -fsanitize=thread in GCC. The problem is that TSan instrumentation adds __tsan_func_entry (__builtin_return_address (0)); calls at the start of each instrumented function and __tsan_func_exit (); call at the end of those and the latter stands in a way of normal tail calls as well as musttail tail calls. Looking at what LLVM does, for normal calls -fsanitize=thread also prevents tail calls like in GCC (well, the __tsan_func_exit () call itself can be tail called in GCC (and from what I see not in clang)). But for [[clang::musttail]] calls it arranges to move the __tsan_func_exit () before the musttail call instead of after it. The following patch handles it similarly. If we for -fsanitize=thread instrumented function detect __builtin_tsan_func_exit () call, we process it normally (so that the call can be tail called in function returning void) but set a flag that the builtin has been seen (only for cfun->has_musttail in the diag_musttail phase). And then let tree_optimize_tail_calls_1 call find_tail_calls again in a new mode where the __tsan_func_exit () call is ignored and so we are able to find calls before it, but only accept that if the call before it is actually a musttail. For C++ it needs to verify that EH cleanup if any also has the __tsan_func_exit () call and if all goes well, the musttail call is registered for tailcalling with a flag that it has __tsan_func_exit () after it and when optimizing that we emit __tsan_func_exit (); call before the musttail tail call (or musttail tail recursion). 2025-04-15 Jakub Jelinek <jakub@redhat.com> PR sanitizer/119801 * sanitizer.def (BUILT_IN_TSAN_FUNC_EXIT): Use BT_FN_VOID rather than BT_FN_VOID_PTR. * tree-tailcall.cc: Include attribs.h and asan.h. (struct tailcall): Add has_tsan_func_exit member. (empty_eh_cleanup): Add eh_has_tsan_func_exit argument, set what it points to to 1 if there is exactly one __tsan_func_exit call and ignore that call otherwise. Adjust recursive call. (find_tail_calls): Add RETRY_TSAN_FUNC_EXIT argument, pass it to recursive calls. When seeing __tsan_func_exit call with RETRY_TSAN_FUNC_EXIT 0, set it to -1. If RETRY_TSAN_FUNC_EXIT is 1, initially ignore __tsan_func_exit calls. Adjust empty_eh_cleanup caller. When looking through stmts after the call, ignore exactly one __tsan_func_exit call but remember it in t->has_tsan_func_exit. Diagnose if EH cleanups didn't have __tsan_func_exit and normal path did or vice versa. (optimize_tail_call): Emit __tsan_func_exit before the tail call or tail recursion. (tree_optimize_tail_calls_1): Adjust find_tail_calls callers. If find_tail_calls changes retry_tsan_func_exit to -1, set it to 1 and call it again with otherwise the same arguments. * c-c++-common/tsan/pr119801.c: New test.
2025-04-15	Wbuiltin-declaration-mismatch-4.c: accept long long in warning for llp64	Jonathan Yong	1	-2/+2
	llp64 targets like mingw-w64 will print: gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-4.c:80:17: warning: ‘memset’ argument 3 promotes to ‘ptrdiff_t’ {aka ‘long long int’} where ‘long long unsigned int’ is expected in a call to built-in function declared without prototype [- Wbuiltin-declaration-mismatch] Change the regex pattern to accept it. Signed-off-by: Jonathan Yong <10walls@gmail.com> gcc/testsuite/ChangeLog: * gcc.dg/Wbuiltin-declaration-mismatch-4.c: Make diagnostic accept long long.
2025-04-15	testsuite: Fix up ipa/pr119318.c test [PR119318]	Jakub Jelinek	1	-2/+1
	dg-additional-options followed by dg-options is ignored. I've added the -w from there to dg-options and removed dg-additional-options. 2025-04-15 Jakub Jelinek <jakub@redhat.com> PR ipa/119318 * gcc.dg/ipa/pr119318.c: Remove dg-additional-options, add -w to dg-options.
2025-04-15	libstdc++: Fix std::string construction from volatile char* [PR119748]	Jonathan Wakely	5	-13/+73
	My recent r15-9381-g648d5c26e25497 change assumes that a contiguous iterator with the correct value_type can be converted to a const charT* but that's not true for volatile charT. The optimization should only be done if it can be converted to the right pointer type. Additionally, some generic loops for non-contiguous iterators need an explicit cast to deal with iterator reference types that do not bind to the const charT& parameter of traits_type::assign. libstdc++-v3/ChangeLog: PR libstdc++/119748 include/bits/basic_string.h (_S_copy_chars): Only optimize for contiguous iterators that are convertible to const charT. Use explicit conversion to charT after dereferencing iterator. (_S_copy_range): Likewise for contiguous ranges. include/bits/basic_string.tcc (_M_construct): Use explicit conversion to charT after dereferencing iterator. * include/bits/cow_string.h (_S_copy_chars): Likewise. (basic_string(from_range_t, R&&, const Allocator&)): Likewise. Only optimize for contiguous iterators that are convertible to const charT. testsuite/21_strings/basic_string/cons/char/119748.cc: New test. * testsuite/21_strings/basic_string/cons/wchar_t/119748.cc: New test. Reviewed-by: Tomasz Kaminski <tkaminsk@redhat.com>
2025-04-15	libstdc++: Enable __gnu_test::test_container constructor for C++98	Jonathan Wakely	1	-3/+1
	The only reason this constructor wasn't defined for C++98 is that it uses constructor delegation, but that isn't necessary. libstdc++-v3/ChangeLog: * testsuite/util/testsuite_iterators.h (test_container): Define array constructor for C++98 as well.
2025-04-15	libgcobol: Handle long double as an alternate IEEE754 quad [PR119244]	Jakub Jelinek	5	-151/+137
	I think there should be consistency in what we use, so like libgcobol-fp.h specifies, IEEE quad long double should have highest priority, then _Float128 with f128 APIs, then libquadmath. And when we decide to use say long double, we shouldn't mix that with strfromf128/strtof128. Additionally, given that the l vs. f128 vs. q API decision is done solely in libgcobol and not in the compiler (which is different from the Fortran case where compiled code emits say sinq or sinf128 calls), I think libgcobol.spec should only have -lquadmath in any form only in the case when using libquadmath for everything. In the Fortran case it is for backwards compatibility purposes, if something has been compiled with older gfortran which used say sinq and link is done by gfortran which has been configured against new glibc with f128, linking would fail otherwise. 2025-04-15 Jakub Jelinek <jakub@redhat.com> Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> PR cobol/119244 acinclude.m4 (LIBGCOBOL_CHECK_FLOAT128): Ensure libgcob_cv_have_float128 is not yes on targets with IEEE quad long double. Don't check for --as-needed nor set LIBQUADSPEC on targets which USE_IEC_60559. * libgcobol-fp.h (FP128_FMT, strtofp128, strfromfp128): Define. * intrinsic.cc (strtof128): Don't redefine. (WEIRD_TRANSCENDENT_RETURN_VALUE): Use GCOB_FP128_LITERAL macro. (__gg__numval_f): Use strtofp128 instead of strtof128. * libgcobol.cc (strtof128): Don't redefine. (format_for_display_internal): Use strfromfp128 instead of strfromf128 or quadmath_snprintf and use FP128_FMT in the format string. (get_float128, __gg__compare_2, __gg__move, __gg__move_literala): Use strtofp128 instead of strtof128. * configure: Regenerate.
2025-04-15	Doc: always_inline attribute vs multiple TUs and LTO [PR113203]	Sandra Loosemore	1	-0/+7
	gcc/ChangeLog PR ipa/113203 * doc/extend.texi (Common Function Attributes): Explain how to use always_inline in programs that have multiple translation units, and that LTO inlining additionally needs optimization enabled.
2025-04-14	c++: shortcut constexpr vector ctor [PR113835]	Jason Merrill	2	-0/+17
	Since std::vector became usable in constant evaluation in C++20, a vector variable with static storage duration might be manifestly constant-evaluated, so we properly try to constant-evaluate its initializer. But it can never succeed since the result will always refer to the result of operator new, so trying is a waste of time. Potentially a large waste of time for a large vector, as in the testcase in the PR. So, let's recognize this case and skip trying constant-evaluation. I do this only for the case of an integer argument, as that's the case that's easy to write but slow to (fail to) evaluate. In the test, I use dg-timeout-factor to lower the default timeout from 300 seconds to 15; on my laptop, compilation without the patch takes about 20 seconds versus about 2 with the patch. PR c++/113835 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_outermost_constant_expr): Bail out early for std::vector(N). gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-vector1.C: New test.
2025-04-14	Revert documents from r11-344-g0fec3f62b9bfc0	liuhongt	1	-92/+2
	gcc/ChangeLog: PR target/108134 * doc/extend.texi: Remove documents from r11-344-g0fec3f62b9bfc0.
2025-04-15	Doc: clarify -march=pentiumpro has no MMX support [PR42683]	Sandra Loosemore	1	-1/+1
	gcc/ChangeLog PR target/42683 * doc/invoke.texi (x86 Options): Clarify that -march=pentiumpro doesn't include MMX.
2025-04-15	Daily bump.	GCC Administrator	10	-1/+638

2025-04-14	GCN, nvptx: Support '-mfake-exceptions', and use it for offloading ↵	Thomas Schwinge	38	-62/+331
	compilation [PR118794] With '-mfake-exceptions' enabled, the user-visible behavior in presence of exception handling constructs changes such that the compile-time 'sorry, unimplemented: exception handling not supported' is skipped, code generation proceeds, and instead, exception handling constructs 'abort' at run time. (..., or don't, if they're in dead code.) PR target/118794 gcc/ * config/gcn/gcn.opt (-mfake-exceptions): Support. * config/nvptx/nvptx.opt (-mfake-exceptions): Likewise. * config/gcn/gcn.md (define_expand "exception_receiver"): Use it. * config/nvptx/nvptx.md (define_expand "exception_receiver"): Likewise. * config/gcn/mkoffload.cc (main): Set it. * config/nvptx/mkoffload.cc (main): Likewise. * config/nvptx/nvptx.cc (nvptx_assemble_integer) <in_section == exception_section>: Special handling for 'SYMBOL_REF's. * except.cc (expand_dw2_landing_pad_for_region): Don't generate bogus code for (default) '#define EH_RETURN_DATA_REGNO(N) INVALID_REGNUM'. libgcc/ * config/gcn/unwind-gcn.c (_Unwind_Resume): New. * config/nvptx/unwind-nvptx.c (_Unwind_Resume): Likewise. gcc/testsuite/ * g++.target/gcn/exceptions-bad_cast-2.C: Set '-mno-fake-exceptions'. * g++.target/gcn/exceptions-pr118794-1.C: Likewise. * g++.target/gcn/exceptions-throw-2.C: Likewise. * g++.target/nvptx/exceptions-bad_cast-2.C: Likewise. * g++.target/nvptx/exceptions-pr118794-1.C: Likewise. * g++.target/nvptx/exceptions-throw-2.C: Likewise. * g++.target/gcn/exceptions-bad_cast-2_-mfake-exceptions.C: New. * g++.target/gcn/exceptions-pr118794-1_-mfake-exceptions.C: Likewise. * g++.target/gcn/exceptions-throw-2_-mfake-exceptions.C: Likewise. * g++.target/nvptx/exceptions-bad_cast-2_-mfake-exceptions.C: Likewise. * g++.target/nvptx/exceptions-pr118794-1_-mfake-exceptions.C: Likewise. * g++.target/nvptx/exceptions-throw-2_-mfake-exceptions.C: Likewise. libgomp/ * testsuite/libgomp.c++/target-exceptions-bad_cast-2-offload-sorry-GCN.C: Set '-foffload-options=-mno-fake-exceptions'. * testsuite/libgomp.c++/target-exceptions-bad_cast-2-offload-sorry-nvptx.C: Likewise. * testsuite/libgomp.c++/target-exceptions-pr118794-1-offload-sorry-GCN.C: Likewise. * testsuite/libgomp.c++/target-exceptions-pr118794-1-offload-sorry-nvptx.C: Likewise. * testsuite/libgomp.c++/target-exceptions-throw-2-offload-sorry-GCN.C: Likewise. * testsuite/libgomp.c++/target-exceptions-throw-2-offload-sorry-nvptx.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-2-offload-sorry-GCN.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-2-offload-sorry-nvptx.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-throw-2-offload-sorry-GCN.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-throw-2-offload-sorry-nvptx.C: Likewise. * testsuite/libgomp.c++/target-exceptions-bad_cast-2.C: Adjust. * testsuite/libgomp.c++/target-exceptions-pr118794-1.C: Likewise. * testsuite/libgomp.c++/target-exceptions-throw-2.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-2.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-throw-2.C: Likewise. * testsuite/libgomp.c++/target-exceptions-throw-2-O0.C: New.
2025-04-14	Add 'throw', dead code test cases for GCN, nvptx target and OpenACC, OpenMP ↵	Thomas Schwinge	4	-0/+84
	'target' offloading gcc/testsuite/ * g++.target/gcn/exceptions-throw-3.C: New. * g++.target/nvptx/exceptions-throw-3.C: Likewise. libgomp/ * testsuite/libgomp.c++/target-exceptions-throw-3.C: New. * testsuite/libgomp.oacc-c++/exceptions-throw-3.C: Likewise.
2025-04-14	Add 'throw', caught test cases for GCN, nvptx target and OpenACC, OpenMP ↵	Thomas Schwinge	8	-0/+160
	'target' offloading gcc/testsuite/ * g++.target/gcn/exceptions-throw-2.C: New. * g++.target/nvptx/exceptions-throw-2.C: Likewise. libgomp/ * testsuite/libgomp.c++/target-exceptions-throw-2.C: New. * testsuite/libgomp.c++/target-exceptions-throw-2-offload-sorry-GCN.C: Likewise. * testsuite/libgomp.c++/target-exceptions-throw-2-offload-sorry-nvptx.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-throw-2.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-throw-2-offload-sorry-GCN.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-throw-2-offload-sorry-nvptx.C: Likewise.
2025-04-14	Add 'throw' test cases for GCN, nvptx target and OpenACC, OpenMP 'target' ↵	Thomas Schwinge	5	-0/+126
	offloading gcc/testsuite/ * g++.target/gcn/exceptions-throw-1.C: New. * g++.target/nvptx/exceptions-throw-1.C: Likewise. libgomp/ * testsuite/libgomp.c++/target-exceptions-throw-1.C: New. * testsuite/libgomp.c++/target-exceptions-throw-1-O0.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-throw-1.C: Likewise.
2025-04-14	Add 'std::bad_cast' exception, dead code test cases for GCN, nvptx target ↵	Thomas Schwinge	4	-0/+86
	and OpenACC, OpenMP 'target' offloading gcc/testsuite/ * g++.target/gcn/exceptions-bad_cast-3.C: New. * g++.target/nvptx/exceptions-bad_cast-3.C: Likewise. libgomp/ * testsuite/libgomp.c++/target-exceptions-bad_cast-3.C: New. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-3.C: Likewise.
2025-04-14	Add 'std::bad_cast' exception, caught test cases for GCN, nvptx target and ↵	Thomas Schwinge	8	-0/+158
	OpenACC, OpenMP 'target' offloading gcc/testsuite/ * g++.target/gcn/exceptions-bad_cast-2.C: New. * g++.target/nvptx/exceptions-bad_cast-2.C: Likewise. libgomp/ * testsuite/libgomp.c++/target-exceptions-bad_cast-2.C: New. * testsuite/libgomp.c++/target-exceptions-bad_cast-2-offload-sorry-GCN.C: Likewise. * testsuite/libgomp.c++/target-exceptions-bad_cast-2-offload-sorry-nvptx.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-2.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-2-offload-sorry-GCN.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-2-offload-sorry-nvptx.C: Likewise.