path: root/gcc
Age | Commit message | Author | Files | Lines
2022-03-24 | hardcmp: split before dispatch edge | Alexandre Oliva | 2 | -3/+29
If we harden a compare at the end of a block with an edge to the abnormal dispatch block, it won't have a single successor. Arrange to split the block at its final stmt so as to have a single succ. for gcc/ChangeLog PR middle-end/104975 * gimple-harden-conditionals.cc (pass_harden_compares::execute): Force split in case of multiple edges. for gcc/testsuite/ChangeLog PR middle-end/104975 * gcc.dg/pr104975.c: New.
2022-03-24 | testsuite: Add compat.exp testcase for most common zero width bitfld ABI passing [PR102024] | Jakub Jelinek | 8 | -0/+102
On Tue, Mar 22, 2022 at 05:51:58PM +0100, Jakub Jelinek via Gcc wrote:
> I guess it would be nice to include the testcases we are talking about,
> like { float x; int : 0; float y; } and { float x; int : 0; } and
> { int : 0; float x; } into compat.exp testsuite so that we see ABI
> differences in compat testing.
Here is a patch that does that. It uses the struct-layout-1* framework, but isn't generated, because in this case we don't want pseudo-random structure layouts but particular ones we know cause or could cause problems on some targets. If other problematic cases are discovered, we can add further ones.
Tested on x86_64-linux with:
make check-gcc check-g++ RUNTESTFLAGS='ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++ compat.exp=pr102*'
and with
make check-gcc check-g++ RUNTESTFLAGS='compat.exp=pr102*'
The former as expected has:
FAIL: gcc.dg/compat/pr102024 c_compat_x_tst.o-c_compat_y_alt.o execute
FAIL: gcc.dg/compat/pr102024 c_compat_x_alt.o-c_compat_y_tst.o execute
fails because on x86_64 we've changed the C ABI but kept the C++ ABI here. E.g. on rs6000 it should be the g++.dg tests that fail (all assuming the alt gcc/g++ is GCC 4.5 through 11).
2022-03-24 Jakub Jelinek <jakub@redhat.com> PR target/102024 * gcc.dg/compat/pr102024_main.c: New test. * gcc.dg/compat/pr102024_test.h: New test. * gcc.dg/compat/pr102024_x.c: New test. * gcc.dg/compat/pr102024_y.c: New test. * g++.dg/compat/pr102024_main.C: New test. * g++.dg/compat/pr102024_test.h: New test. * g++.dg/compat/pr102024_x.C: New test. * g++.dg/compat/pr102024_y.C: New test.
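The three layouts named in the quoted mail can be written out as a small C sketch. This is not the content of the pr102024 test files; the struct tags and the helper functions `sum_s1`, `get_s2` and `get_s3` are hypothetical names chosen here only to show structs with a zero-width bit-field being passed by value, which is the situation where an ABI difference between two compilers would show up.

```c
/* Illustrative only: struct tags and helpers are assumptions,
   not taken from the GCC testsuite. */

struct S1 { float x; int : 0; float y; };  /* zero-width bit-field between members */
struct S2 { float x; int : 0; };           /* zero-width bit-field at the end */
struct S3 { int : 0; float x; };           /* zero-width bit-field at the start */

/* Each helper takes the struct by value, so caller and callee must
   agree on how the argument is passed. */
float sum_s1(struct S1 s) { return s.x + s.y; }
float get_s2(struct S2 s) { return s.x; }
float get_s3(struct S3 s) { return s.x; }
```

When caller and callee are built by the same compiler, the values round-trip; the compat tests exist to detect the case where the two sides come from compilers that disagree.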
2022-03-24 | fold-const: Handle C++ dependent COMPONENT_REFs in operand_equal_p [PR105035] | Jakub Jelinek | 2 | -2/+34
As mentioned in the PR, operand_equal_p already contains some hacks so that it can be called already on pre-instantiation C++ trees from templates, but the recent change to compare DECL_FIELD_OFFSET in the COMPONENT_REF case broke this. Many such COMPONENT_REFs are already punted on earlier because they have NULL TREE_TYPE, but in this case the code knows what type they have but still uses an IDENTIFIER_NODE as second operand of COMPONENT_REF (I think SCOPE_REF is something that could be used too). The following patch looks at those DECL_FIELD_*OFFSET fields only if both field[01] args are FIELD_DECLs and otherwise keeps it to the earlier OP_SAME (1) check that guards this whole block. 2022-03-24 Jakub Jelinek <jakub@redhat.com> PR c++/105035 * fold-const.cc (operand_equal_p) <case COMPONENT_REF>: If either field0 or field1 is not a FIELD_DECL, return false. * g++.dg/warn/Wduplicated-cond2.C: New test.
2022-03-24 | Properly reset the port handle when closing | Pascal Obry | 2 | -0/+2
When the serial port is closed, we need to ensure that the port handle is properly reset for it to be detected as closed. gcc/ada/ PR ada/104767 * libgnat/g-sercom__mingw.adb (Close): Reset port handle to -1. * libgnat/g-sercom__linux.adb (Close): Likewise.
2022-03-24 | Fix memory leaks | Richard Biener | 4 | -15/+19
When changing the predcom pass to use auto_vec leaks were introduced by failing to replace deallocation with C++ delete. The following does this. It also fixes leaks in vectorization and range folding. 2022-03-24 Richard Biener <rguenther@suse.de> * tree-predcom.cc (chain::chain): Add CTOR. (component::component): Likewise. (pcom_worker::release_chain): Use delete. (release_components): Likewise. (pcom_worker::filter_suitable_components): Likewise. (pcom_worker::split_data_refs_to_components): Use new. (make_invariant_chain): Likewise. (make_rooted_chain): Likewise. (pcom_worker::combine_chains): Likewise. * tree-vect-loop.cc (vect_create_epilog_for_reduction): Make sure to release previously constructed scalar_results. * tree-vect-stmts.cc (vectorizable_load): Use auto_vec for vec_offsets. * vr-values.cc (simplify_using_ranges::~simplify_using_ranges): Release m_flag_set_edges.
2022-03-24 | tree-optimization/104970: Limit size computation for access attribute | Siddhesh Poyarekar | 2 | -3/+79
Limit object size computation only to the simple case where access attribute has been explicitly specified. The object passed to __builtin_dynamic_object_size could either be a pointer or a VLA whose size has been described using access attribute. Further, return a valid size only if the object is a void * pointer or points to (or is a VLA of) a type that has a constant size. gcc/ChangeLog: PR tree-optimization/104970 * tree-object-size.cc (parm_object_size): Restrict size computation scenarios to explicit access attributes. gcc/testsuite/ChangeLog: PR tree-optimization/104970 * gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple2, test_parmsz_simple3, test_parmsz_extern, test_parmsz_internal, test_parmsz_internal2, test_parmsz_internal3): New tests. (main): Use them. Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2022-03-24 | c++: extern thread_local declarations in constexpr [PR104994] | Jakub Jelinek | 9 | -30/+37
C++14 to C++20 apparently should allow extern thread_local declarations in constexpr functions, however useless they are there (because accessing such vars is not valid in a constant expression, perhaps sizeof/decltype). P2242 changed that for C++23 to passing through declaration but https://cplusplus.github.io/CWG/issues/2552.html has been filed for it yesterday. The following patch implements the proposed wording of CWG 2552 in addition to fixing the C++14 - C++20 handling bug. If you'd like instead to keep the current pedantic C++23 wording for now, that would mean taking out the first hunk (cxx_eval_constant_expression) and g++.dg/cpp23/constexpr-nonlit2.C hunk. 2022-03-24 Jakub Jelinek <jakub@redhat.com> PR c++/104994 * constexpr.cc (cxx_eval_constant_expression): Don't diagnose passing through extern thread_local declarations. Change wording from declaration to definition. (potential_constant_expression_1): Don't diagnose extern thread_local declarations. Change wording from declared to defined. * decl.cc (start_decl): Likewise. * g++.dg/diagnostic/constexpr1.C: Change expected diagnostic wording from declared to defined. * g++.dg/cpp23/constexpr-nonlit1.C: Likewise. (garply): Change dg-error into dg-bogus. * g++.dg/cpp23/constexpr-nonlit2.C: Change expected diagnostic wording from declaration to definition. * g++.dg/cpp23/constexpr-nonlit6.C: Change expected diagnostic wording from declared to defined. * g++.dg/cpp23/constexpr-nonlit7.C: New test. * g++.dg/cpp2a/constexpr-try5.C: Change expected diagnostic wording from declared to defined. * g++.dg/cpp2a/consteval3.C: Likewise.
2022-03-23 | rs6000: Skip overload instances with NULL fntype [PR104967] | Kewen Lin | 1 | -0/+4
For some overload built-in function instance, if it requires a data type which isn't defined on the target, its fntype would be initialized as NULL. This patch is to consider this possibility in function find_instance, as shown in PR104967. PR target/104967 gcc/ChangeLog: * config/rs6000/rs6000-c.cc (find_instance): Skip instances with null function types.
2022-03-24 | Daily bump. | GCC Administrator | 7 | -1/+190
2022-03-23 | analyzer: fix accessing wrong stack frame on interprocedural return [PR104979] | David Malcolm | 8 | -71/+120
PR analyzer/104979 reports a leak false positive when handling an interprocedural return to a caller: LHS = CALL(ARGS); where the LHS is a certain non-trivial compound expression. The root cause is that parts of the LHS were being erroneously evaluated with respect to the stack frame of the called function, rather than that of the caller. When LHS contained a local variable within the caller as part of certain nested expressions, this local variable was looked for within the called frame, rather than that of the caller. This lookup in the wrong stack frame led to the local variable being treated as uninitialized, and thus the write to LHS was considered as writing to a garbage location, leading to the return value being lost, and thus being considered as a leak. The region_model code uses the analyzer's path_var class to try to extend the tree type with stack depth information. Based on the above, I think that the path_var class is fundamentally broken, but it's used in a few other places in the analyzer, so I don't want to rip it out until the next stage 1. In the meantime, this patch reworks how region_model::pop_frame works so that the destination region for an interprocedural return value is computed after the frame is popped, so that the region_model has the stack frame for the *caller* at that point. Doing so fixes the issue. I attempted a more ambitious fix which moved the storing of the return svalue into the destination region from region_model::pop_region into region_model::update_for_return_gcall, with pop_frame returning the return svalue. Unfortunately, this regressed g++.dg/analyzer/pr93212.C, which returns a pointer into a stale frame. unbind_region_and_descendents and poison_any_pointers_to_descendents are only set up to poison regions with bindings into the stale frame, not individual svalues, and updating that became more invasive than I'm comfortable with in stage 4.
The patch also adds assertions to verify that we have the correct function when looking up locals/SSA names in a stack frame. There doesn't seem to be a general-purpose way to get at the function of an SSA name, so the assertions go from SSA name to def-stmt to basic_block, and from there use the analyzer's supergraph to get the function from the basic_block. If there's a simpler way to do this, please let me know. gcc/analyzer/ChangeLog: PR analyzer/104979 * engine.cc (impl_run_checkers): Create the engine after the supergraph, and pass the supergraph to the engine. * region-model.cc (region_model::get_lvalue_1): Pass ctxt to frame_region::get_region_for_local. (region_model::update_for_return_gcall): Pass the lvalue for the result to pop_frame as a tree, rather than as a region. (region_model::pop_frame): Update for above change, determining the destination region after the frame is popped and thus with respect to the caller frame rather than the called frame. Likewise, set the value of the region to the return value after the frame is popped. (engine::engine): Add supergraph pointer. (selftest::test_stack_frames): Set the DECL_CONTEXT of PARM_DECLs. (selftest::test_get_representative_path_var): Likewise. (selftest::test_state_merging): Likewise. * region-model.h (region_model::pop_frame): Convert first param from a const region * to a tree. (engine::engine): Add param "sg". (engine::m_sg): New field. * region.cc: Include "analyzer/sm.h" and "analyzer/program-state.h". (frame_region::get_region_for_local): Add "ctxt" param. Add assertions that VAR_DECLs are locals, and that expr is for the correct function. * region.h (frame_region::get_region_for_local): Add "ctxt" param. gcc/testsuite/ChangeLog: PR analyzer/104979 * gcc.dg/analyzer/boxed-malloc-1-29.c: Deleted test, moving the now fixed test_29 to... * gcc.dg/analyzer/boxed-malloc-1.c: ...here. * gcc.dg/analyzer/stale-frame-1.c: Add test coverage. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-03-23 | c++: tweak PR103337 fix | Jason Merrill | 1 | -11/+21
Patrick suggested a way to implement the designated-init handling without (temporarily) modifying the CONSTRUCTOR being reshaped. PR c++/103337 gcc/cp/ChangeLog: * decl.cc (reshape_single_init): New. (reshape_init_class): Use it.
2022-03-23 | c++: tweak PR105006 fix | Jason Merrill | 1 | -1/+1
Checking dependent_type_p avoids needing to walk the overloads in cases where it would not be possible to find a dependent using. PR c++/105006 gcc/cp/ChangeLog: * name-lookup.cc (lookup_using_decl): Check that scope is a dependent type before looking for dependent using.
2022-03-23 | Fortran: Fix directory stat check for '.' [PR103560] | Tobias Burnus | 6 | -12/+12
MinGW does not like a call to 'stat' for './' via gfc_do_check_include_dir. Solution: Only append '/' when concatenating the path with the filename. gcc/fortran/ChangeLog: PR fortran/103560 * scanner.cc (add_path_to_list): Don't append '/' to the save include path. (open_included_file): Use '/' in concatenating path + file name. * module.cc (gzopen_included_file_1): Likewise. gcc/testsuite/ChangeLog: PR fortran/103560 * gfortran.dg/include_14.f90: Update dg-warning. * gfortran.dg/include_17.f90: Likewise. * gfortran.dg/include_18.f90: Likewise. * gfortran.dg/include_6.f90: Update dg-*.
2022-03-23 | target/102125 - alternative memcpy folding improvement | Richard Biener | 1 | -2/+10
The following extends the heuristic memcpy folding path with the ability to use misaligned accesses on strict-alignment targets, just like the size-based path does. That avoids regressing the following testcase on arm: uint64_t bar64(const uint8_t *rData1) { uint64_t buffer; memcpy(&buffer, rData1, sizeof(buffer)); return buffer; } when r12-3482-g5f6a6c91d7c592 is reverted. 2022-03-23 Richard Biener <rguenther@suse.de> PR target/102125 * gimple-fold.cc (gimple_fold_builtin_memory_op): Allow the use of movmisalign when either the source or destination decl is properly aligned.
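For reference, the testcase quoted in the message compiles as ordinary C once the two headers it relies on are included. The function body is the one from the commit message; only the includes and comments are added here. Whether the compiler folds the `memcpy` into a single (possibly misaligned) load depends on the target and optimization level, so nothing below asserts a particular instruction sequence.

```c
#include <stdint.h>
#include <string.h>

/* Load a 64-bit value from a byte pointer with no alignment guarantee.
   Going through memcpy keeps the access well defined in C; the folding
   discussed above is what may turn it into one load instruction. */
uint64_t bar64(const uint8_t *rData1)
{
    uint64_t buffer;
    memcpy(&buffer, rData1, sizeof(buffer));
    return buffer;
}
```

Calling `bar64(raw + 1)` on a byte array is the interesting case, since the source pointer is then not 8-byte aligned.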
2022-03-23 | rtl-optimization/105028 - fix compile-time hog in form_threads_from_copies | Richard Biener | 1 | -43/+28
form_threads_from_copies processes a sorted array of copies, skipping those with the same thread and conflicting threads and merging the first non-conflicting ones. After that it terminates the loop and gathers the remaining elements of the array, skipping same thread copies, re-starting the process. For a large number of copies this gathering of the rest takes considerable time and it also appears pointless. The following simply continues processing the array which should be equivalent as far as I can see. This takes form_threads_from_copies off the profile radar from previously taking ~50% of the compile-time. 2022-03-23 Richard Biener <rguenther@suse.de> PR rtl-optimization/105028 * ira-color.cc (form_threads_from_copies): Remove unnecessary copying of the sorted_copies tail.
2022-03-23 | c++: using from enclosing class template [PR105006] | Jason Merrill | 2 | -0/+28
Here, DECL_DEPENDENT_P was false for the second using because Row<eT> is "the current instantiation", so lookup succeeds. But since Row itself has a dependent using-decl for operator(), the set of functions imported by the second using is dependent, so we should set the flag. PR c++/105006 gcc/cp/ChangeLog: * name-lookup.cc (lookup_using_decl): Set DECL_DEPENDENT_P if lookup finds a dependent using. gcc/testsuite/ChangeLog: * g++.dg/template/using30.C: New test.
2022-03-23 | analyzer: use tainted_allocation_size::m_mem_space [PR105017] | David Malcolm | 2 | -26/+58
gcc/analyzer/ChangeLog: PR analyzer/105017 * sm-taint.cc (taint_diagnostic::subclass_equal_p): Check m_has_bounds as well as m_arg. (tainted_allocation_size::subclass_equal_p): Chain up to base class implementation. Also check m_mem_space. (tainted_allocation_size::emit): Add note showing stack-based vs heap-based allocations. gcc/testsuite/ChangeLog: PR analyzer/105017 * gcc.dg/analyzer/taint-alloc-1.c: Add expected messages relating to heap vs stack. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-03-23 | analyzer: fix ICE adding note to disabled diagnostic [PR104997] | David Malcolm | 4 | -14/+45
gcc/analyzer/ChangeLog: PR analyzer/104997 * diagnostic-manager.cc (diagnostic_manager::add_diagnostic): Convert return type from "void" to "bool", reporting success vs failure to caller, for both overloads. * diagnostic-manager.h (diagnostic_manager::add_diagnostic): Likewise. * engine.cc (impl_region_model_context::warn): Propagate return value from diagnostic_manager::add_diagnostic. gcc/testsuite/ChangeLog: PR analyzer/104997 * gcc.dg/analyzer/write-to-string-literal-4-disabled.c: New test, adapted from write-to-string-literal-4.c. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-03-23 | rs6000: Adjust error messages. | Martin Liska | 2 | -4/+5
gcc/ChangeLog: * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin): Use %qs in format. * config/rs6000/rs6000.cc (rs6000_option_override_internal): Reword the error message.
2022-03-23 | testsuite: Fix up sse2-v1ti-shift-3.c test [PR102986] | Jakub Jelinek | 1 | -4/+4
This test is dg-do run and invokes UB when these rotate functions are called with 0 as second argument. There are some other tests that do this but they are dg-do compile only and not even call those functions at all, so it IMHO doesn't matter that they are only well defined for [1,127] and not [0,127]. The following patch fixes it, we pattern recognize both forms as rotates and we emit identical assembly. 2022-03-23 Jakub Jelinek <jakub@redhat.com> PR target/102986 * gcc.target/i386/sse2-v1ti-shift-3.c (rotr_v1ti, rotl_v1ti, rotr_ti, rotl_ti): Use -i&127 instead of 128-i to avoid UB on i == 0.
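The idiom in the ChangeLog entry (`-i & 127` instead of `128 - i`) can be shown on a narrower type. The sketch below is a 64-bit analogue, not the code of sse2-v1ti-shift-3.c: the function names `rotl64` and `rotr64` and the use of `uint64_t` with a mask of 63 are assumptions made so that the example stays portable C. With `64 - i`, a rotate count of 0 would shift by the full width, which is undefined; masking the negated count yields 0 instead.

```c
#include <stdint.h>

/* Rotate left by i (0..63). For i == 0 the right shift count is
   (-0 & 63) == 0, so no shift by the full type width ever happens. */
uint64_t rotl64(uint64_t x, unsigned i)
{
    i &= 63;
    return (x << i) | (x >> (-i & 63));
}

/* Rotate right by i (0..63), using the same masked-negation idiom. */
uint64_t rotr64(uint64_t x, unsigned i)
{
    i &= 63;
    return (x >> i) | (x << (-i & 63));
}
```

Negating an unsigned value wraps modulo 2^N, so `-i & 63` equals `64 - i` for every `i` in 1..63 and equals 0 for `i == 0`.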
2022-03-23 | LTO: Fixes for renaming issues with offload/OpenMP [PR104285] | Tobias Burnus | 2 | -34/+41
gcc/lto/ChangeLog: PR middle-end/104285 * lto-partition.cc (maybe_rewrite_identifier): Use get_identifier for the returned string to be usable as hash key. (validize_symbol_for_target): Hence, use return value directly. (privatize_symbol_name_1): Track maybe_rewrite_identifier renames. * lto.cc (offload_handle_link_vars): Move function up before ... (do_whole_program_analysis): Call it after static renamings. (lto_main): Move call after static renamings. libgomp/ChangeLog: PR middle-end/104285 * testsuite/libgomp.c++/target-same-name-2-a.C: New test. * testsuite/libgomp.c++/target-same-name-2-b.C: New test. * testsuite/libgomp.c++/target-same-name-2.C: New test. * testsuite/libgomp.c-c++-common/target-same-name-1-a.c: New test. * testsuite/libgomp.c-c++-common/target-same-name-1-b.c: New test. * testsuite/libgomp.c-c++-common/target-same-name-1.c: New test.
2022-03-23 | Fix ICE caused by NULL_RTX returned by lowpart_subreg. | liuhongt | 6 | -65/+196
In validate_subreg, both (subreg:V2HF (reg:SI) 0) and (subreg:V8HF (reg:V2HF) 0) are valid, but (subreg:V8HF (reg:SI) 0) is not, which causes an ICE. Ideally it should be handled in validate_subreg to support subreg for all modes available in TARGET_CAN_CHANGE_MODE_CLASS, but that would be too risky in stage4, so the patch is a workaround in the backend to force_reg operands before lowpart_subreg for expanders or pre_reload splitters. gcc/ChangeLog: PR target/104976 * config/i386/sse.md (ssePSmodelower): New. (*avx_cmp<mode>3_ltint_not): Force_reg operand before lowpart_subreg to avoid NULL_RTX. (<avx512>_fmaddc_<mode>_mask1<round_expand_name>, <avx512>_fcmaddc_<mode>_mask1<round_expand_name>, fma_<mode>_fmaddc_bcst, fma_<mode>_fcmaddc_bcst, <avx512>_<complexopname>_<mode>_mask<round_name>, avx512fp16_fcmaddcsh_v8hf_mask1<round_expand_name>, avx512fp16_fcmaddcsh_v8hf_mask3<round_expand_name>, avx512fp16_fmaddcsh_v8hf_mask3<round_expand_name>, avx512fp16_fmaddcsh_v8hf_mask3<round_expand_name>, float<floatunssuffix><mode>v4hf2, float<floatunssuffix>v2div2hf2, fix<fixunssuffix>_truncv4hf<mode>2, fix<fixunssuffix>_truncv2hfv2di2, extendv4hf<mode>2, extendv2hfv2df2, trunc<mode>v4hf2, truncv2dfv2hf2, *avx512bw_permvar_truncv16siv16hi_1, *avx512bw_permvar_truncv16siv16hi_1_hf, *avx512f_permvar_truncv8siv8hi_1, *avx512f_permvar_truncv8siv8hi_1_hf, *avx512f_vpermvar_truncv8div8si_1, *avx512f_permvar_truncv32hiv32qi_1, *avx512f_permvar_truncv16hiv16qi_1, *avx512f_permvar_truncv4div4si_1, *avx512f_pshufb_truncv8hiv8qi_1, *avx512f_pshufb_truncv4siv4hi_1, *avx512f_pshufd_truncv2div2si_1, sdot_prod<mode>, avx2_pblend<ssemodesuffix>_1, ashrv2di3, ashrv2di3, usdot_prod<mode>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr104976.c: New test. * gcc.target/i386/avx512fp16-vfcmaddcph-1a.c: Scan either vblendps or masked vmovaps. * gcc.target/i386/avx512fp16-vfmaddcph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c: Ditto. * gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c: Ditto.
2022-03-23 | Daily bump. | GCC Administrator | 5 | -1/+244
2022-03-22 | c: -Wmissing-field-initializers and designated inits [PR82283, PR84685] | Marek Polacek | 6 | -4/+128
This patch fixes two kinds of wrong -Wmissing-field-initializers warnings. Our docs say that this warning "does not warn about designated initializers", but we give a warning for 1) the array case: struct S { struct N { int a; int b; } c[1]; } d = { .c[0].a = 1, .c[0].b = 1, // missing initializer for field 'b' of 'struct N' }; we warn because push_init_level, when constructing an array, clears constructor_designated (which the warning relies on), and we forget that we were in a designated initializer context. Fixed by the push_init_level hunk; and 2) the compound literal case: struct T { int a; int *b; int c; }; struct T t = { .b = (int[]){1} }; // missing initializer for field 'c' of 'struct T' where set_designator properly sets constructor_designated to 1, but the compound literal causes us to create a whole new initializer_stack in start_init, which clears constructor_designated. Then, after we've parsed the compound literal, finish_init flushes the initializer_stack entry, but doesn't restore constructor_designated, so we forget we were in a designated initializer context, which causes the bogus warning. (The designated flag is also tracked in constructor_stack, but in this case, we didn't perform push_init_level between set_designator and start_init so it wasn't saved anywhere.) PR c/82283 PR c/84685 gcc/c/ChangeLog: * c-typeck.cc (struct initializer_stack): Add 'designated' member. (start_init): Set it. (finish_init): Restore constructor_designated. (push_init_level): Set constructor_designated to the value of constructor_designated in the upper constructor_stack. gcc/testsuite/ChangeLog: * gcc.dg/Wmissing-field-initializers-1.c: New test. * gcc.dg/Wmissing-field-initializers-2.c: New test. * gcc.dg/Wmissing-field-initializers-3.c: New test. * gcc.dg/Wmissing-field-initializers-4.c: New test. * gcc.dg/Wmissing-field-initializers-5.c: New test.
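The two initializers discussed above can be collected into one translation unit. The struct definitions and initializers are the ones quoted in the message; placing both at file scope in a single file is an assumption made here for illustration and is not a copy of the new Wmissing-field-initializers tests. Compiling with `-Wmissing-field-initializers` on a compiler without the fix is expected to produce the bogus warnings described; the values themselves are well defined either way, since members without a designator are zero-initialized.

```c
/* 1) The array case: every field of c[0] is named by a designator. */
struct S {
    struct N { int a; int b; } c[1];
} d = {
    .c[0].a = 1,
    .c[0].b = 1,
};

/* 2) The compound literal case: only .b is designated; a and c are
      implicitly zero, which the warning is documented not to report
      for designated initializers. */
struct T { int a; int *b; int c; };
struct T t = { .b = (int[]){1} };
```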
2022-03-22 | Fortran: ensure intialization of stride array | Harald Anlauf | 1 | -0/+1
gcc/fortran/ChangeLog: PR fortran/104999 * simplify.cc (gfc_simplify_cshift): Ensure temporary holding source array stride is initialized.
2022-03-22 | testsuite: Add testcase for already fixed PR [PR102489] | Jakub Jelinek | 1 | -0/+16
This got broken with r12-3529 and fixed with r12-5255. 2022-03-22 Jakub Jelinek <jakub@redhat.com> PR c++/102489 * g++.dg/coroutines/pr102489.C: New test.
2022-03-22 | [nvptx] Use '%' as register prefix | Tom de Vries | 1 | -7/+8
The percentage sign as first character of a ptx identifier can be used to avoid name conflicts, e.g., between user-defined variable names and compiler-generated names. The insn nvptx_uniform_warp_check contains register names without '%' prefix, which potentially could lead to name conflicts with user-defined variable names. Fix this by adding a '%' prefix, more specifically a '%r_' prefix to avoid a name conflict with ptx special registers. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-03-20 Tom de Vries <tdevries@suse.de> PR target/104925 * config/nvptx/nvptx.md (define_insn "nvptx_uniform_warp_check"): Use % as register prefix.
2022-03-22 | [nvptx] Limit HFmode support to mexperimental | Tom de Vries | 7 | -2/+8
With PR104489 still open and end-of-stage-4 approaching, classify HFmode support as experimental, which is not enabled by default but can be enabled using -mexperimental. This fixes the nvptx build when the default sm_xx is set to sm_53 or higher. Note that we're not using -mfp16 or some such, because that might create expectations about being able to switch support on or off in the future, and at this point it's not clear why, once reaching non-experimental status, it shouldn't always be enabled. gcc/ChangeLog: 2022-03-19 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (nvptx_scalar_mode_supported_p) (nvptx_libgcc_floating_mode_supported_p): Only enable HFmode for mexperimental. gcc/testsuite/ChangeLog: 2022-03-19 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/float16-1.c: Add additional-options -mexperimental. * gcc.target/nvptx/float16-2.c: Same. * gcc.target/nvptx/float16-3.c: Same. * gcc.target/nvptx/float16-4.c: Same. * gcc.target/nvptx/float16-5.c: Same. * gcc.target/nvptx/float16-6.c: Same.
2022-03-22 | [nvptx] Add mexperimental | Tom de Vries | 1 | -0/+3
Add new option -mexperimental. Rather than developing a new feature to completion in a development branch, this allows developing it on trunk without disturbing trunk. The equivalent of the feature branch merge then becomes making the functionality available for -mno-experimental. If several features are developed at the same time, we can do something like -mexperimental=feature1,feature2, but for now that's not necessary. For now, the option has no effect. gcc/ChangeLog: 2022-03-19 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.opt (mexperimental): New option.
2022-03-22 | [nvptx] Use .alias directive for mptx >= 6.3 | Tom de Vries | 9 | -1/+182
Starting with ptx isa version 6.3, a ptx directive .alias is available. Use this directive to support symbol aliases, as far as possible. The alias support is off by default. It can be turned on using a switch -malias. Furthermore, for pre-sm_75, it's not effective unless the ptx version is bumped to 6.3 or higher using -mptx (given that the default for pre-sm_75 is 6.0). The alias support has the following limitations. Only function aliases are supported. Weak aliases are not supported. That is, if I disable the check in nvptx_asm_output_def_from_decls that disallows this, a weak alias is emitted and parsed by the driver. But the test gcc.dg/globalalias.c starts failing, with the behaviour matching the comment about "weird behavior of AIX's .set pseudo-op": a weak alias may resolve to different functions in different files. Aliases to weak symbols are not supported (see gcc.dg/localalias.c). This is currently not prohibited by the compiler, but with the driver link we run into: "error: Function test with .weak scope cannot be aliased". Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c). This is currently not prohibited by the compiler, but with the driver link we run into: "Internal error: alias to unknown symbol". Unreferenced aliases are not emitted (these can occur for instance when inlining a call to an alias). This avoids driver link error "Internal error: reference to deleted section". When enabling malias by default, libgomp detects alias support and consequently libgomp.a will contain a few uses of .alias. This however results in the aforementioned "Internal error: reference to deleted section" in many test-cases. Either there's some error with how .alias is used, or there's a driver bug. While this issue is not resolved, we keep malias off-by-default. At some point we may add support in the nvptx-tools linker for symbol aliases, and define for instance malias=ptx and malias=ld to choose between the two in the compiler.
An example of where this support is useful, is the OvO (OpenMP vs Offload) testsuite. The testsuite passes already at -O2. But at -O0, there are errors in some c++ test-cases due to missing symbol alias support. By compiling with -malias, the whole testsuite passes also at -O0. This patch causes a regression: ... -PASS: gcc.dg/pr60797.c (test for errors, line 4) +FAIL: gcc.dg/pr60797.c (test for errors, line 4) ... The test-case is skipped for effective target alias, and both without and with this patch the nvptx target is considered to not support it, so the test-case is executed. The test-case expects an error message along the lines of "alias definitions not supported in this configuration", but instead we run into: ... gcc.dg/pr60797.c:4:12: error: foo aliased to undefined symbol ... This is probably due to the fact that the nvptx backend now defines macros ASM_OUTPUT_DEF and ASM_OUTPUT_DEF_FROM_DECLS, so from the point of view of the common part of the compiler, aliases are supported. gcc/ChangeLog: 2022-03-18 Tom de Vries <tdevries@suse.de> PR target/104957 * config/nvptx/nvptx-protos.h (nvptx_asm_output_def_from_decls): Declare. * config/nvptx/nvptx.cc (write_fn_proto_1): Don't add function marker for alias. (SET_ASM_OP, NVPTX_ASM_OUTPUT_DEF): New macro def. (nvptx_asm_output_def_from_decls): New function. * config/nvptx/nvptx.h (ASM_OUTPUT_DEF): New macro def, define to gcc_unreachable (). (ASM_OUTPUT_DEF_FROM_DECLS): New macro def, define to nvptx_asm_output_def_from_decls. * config/nvptx/nvptx.opt (malias): New opt. gcc/testsuite/ChangeLog: 2022-03-18 Tom de Vries <tdevries@suse.de> PR target/104957 * gcc.target/nvptx/alias-1.c: New test. * gcc.target/nvptx/alias-2.c: New test. * gcc.target/nvptx/alias-3.c: New test. * gcc.target/nvptx/alias-4.c: New test. * gcc.target/nvptx/nvptx.exp (check_effective_target_runtime_ptx_isa_version_6_3): New proc.
2022-03-22 | [nvptx] Add warp sync at simt exit | Tom de Vries | 1 | -0/+4
Consider this code (with N defined to 1024): ... float v = 0.0; #pragma omp target map(tofrom: v) #pragma omp parallel for simd for (int i = 0 ; i < N; i++) { #pragma omp atomic update v = v + 1.0; } ... It hangs when executing on target board unix/-foffload=-misa=sm_75, using drivers 470.103.01 and 510.54 on a T400 board (sm_75). I'm tentatively identifying the problem as a bug in -muniform-simt for architectures that support Independent Thread Scheduling (sm_70 and later). The problem -muniform-simt is trying to address is to make sure that a register produced outside an openmp simd region is available when used in any lane inside an simd region. The solution is to, outside an simd region, execute in all warp lanes, thus producing consistent values in result registers in each warp thread. This approach doesn't work when executing in all warp lanes multiplies the side effects from 1 to 32 separate side effects, which is the case for atomic insns. So atomic insns are rewritten to execute only in lane 0, and if there are any results, those are propagated to the other threads in the warp. [ And likewise for system calls malloc, free, vprintf. ] Now, consider a non-atomic update: ld, add, store. The store has side effects, are those multiplied or not? Pre-sm_70 we can assume that at the end of an SIMT region, any divergent control flow has reconverged, and we have a uniform warp, executing in lock step. So: - the load will load the same value into the result register across the warp, - the add will write the same value into the result register across the warp, - the store will write the same value to the same memory location, 32 times, at once, having the result of a single store. So, no side-effect multiplication (well, at least that's the observation). Starting sm_70, the threads in a warp are no longer guaranteed to reconverge after divergence. 
There's a "Convergence Optimizer" that can identify that it is safe for a warp to reconverge, but that works only as long as the code does not contain "synchronizing operations". Consequently, the ld, add, store sequence can be executed by a non-uniform warp, which means the side effects can have multiplied, and the registers are no longer guaranteed to be in sync. The atomic update in the example above is translated using an atom.cas loop, which means that we have divergence (because only one thread is allowed to succeed at a time) and the "Convergence Optimizer" doesn't reconverge, probably because the atom.cas counts as a "synchronizing operation". So, it seems plausible that the root cause for the mentioned hang is the problem described above. Fix this by adding an explicit warp sync at simt exit. Note that we're assuming here that the warp will stay uniform until the next SIMT region entry. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-03-09 Tom de Vries <tdevries@suse.de> PR target/104916 PR target/104783 * config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp sync (or uniform warp check for mptx < 6.0). libgomp/ChangeLog: 2022-03-15 Tom de Vries <tdevries@suse.de> PR target/104916 PR target/104783 * testsuite/libgomp.c/pr104783-2.c: New test.
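The shape of a compare-and-swap loop for a floating-point atomic update can be sketched on the host with C11 atomics. This is not the PTX that the nvptx backend emits and says nothing about warp scheduling; the helper names `atomic_add_float` and `sum_ones` are hypothetical, and storing the float as raw bits in an `_Atomic uint32_t` is an assumption (32-bit IEEE 754 `float`, where all-zero bits mean 0.0f). It only shows why such a loop lets exactly one contender succeed per iteration while the others retry.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: atomically add `inc` to the float whose bit
   pattern is stored in *cell, returning the new value. */
static float atomic_add_float(_Atomic uint32_t *cell, float inc)
{
    uint32_t old_bits = atomic_load(cell);
    for (;;) {
        float old_val, new_val;
        uint32_t new_bits;

        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + inc;
        memcpy(&new_bits, &new_val, sizeof new_bits);

        /* Succeeds only if *cell still holds old_bits. On failure,
           old_bits is refreshed with the current contents and the
           loop retries with the newer value. */
        if (atomic_compare_exchange_weak(cell, &old_bits, new_bits))
            return new_val;
    }
}

/* Single-threaded driver mirroring the v = v + 1.0 loop in the example. */
float sum_ones(int n)
{
    _Atomic uint32_t cell = 0;  /* assumed bit pattern of 0.0f */
    float last = 0.0f;
    for (int i = 0; i < n; i++)
        last = atomic_add_float(&cell, 1.0f);
    return last;
}
```

With N = 1024 as in the message, every intermediate sum is exactly representable in a `float`, so the final value is exact.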
2022-03-22tree-optimization/105012 - fix ICE from local DSE of if-conversionRichard Biener1-1/+1
The following guards dse_classify_store with the same condition as the DSE pass does - availability of a virtual definition. For the PR we run into the fortran frontend generating a clobber for a FUNCTION_DECL lhs which is ignored by the operand scanner and has no virtual operands assigned. Apart from fixing the frontend the following fixes the ICE by adjusting if-conversion. 2022-03-22 Richard Biener <rguenther@suse.de> PR tree-optimization/105012 * tree-if-conv.cc (ifcvt_local_dce): Only call dse_classify_store when we have a VDEF.
2022-03-22nvptx: fix wrapping in an error message.Martin Liska1-2/+2
PR target/104902 gcc/ChangeLog: * config/nvptx/nvptx.cc (handle_ptx_version_option): Fix option wrapping in an error message.
2022-03-22rs6000: wrap const in an error message.Martin Liska1-2/+2
PR target/104903 gcc/ChangeLog: * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin): Wrap const keyword.
2022-03-22v850: fix typo in pragma nameMartin Liska1-1/+1
PR target/104904 gcc/ChangeLog: * config/v850/v850-c.cc (pop_data_area): Fix typo in pragma name.
2022-03-22rs6000: update error message format.Martin Liska1-1/+1
PR target/104898 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_option_override_internal): Use %qs instead of (%qs).
2022-03-22i386: update error message format.Martin Liska5-8/+8
Use '%qs' instead of '(%qs)'. PR target/104898 gcc/ChangeLog: * config/i386/i386-options.cc (ix86_option_override_internal): Use '%qs' instead of '(%qs)'. gcc/testsuite/ChangeLog: * gcc.target/i386/pr99753.c: Update test. * gcc.target/i386/spellcheck-options-1.c: Likewise. * gcc.target/i386/spellcheck-options-2.c: Likewise. * gcc.target/i386/spellcheck-options-4.c: Likewise.
2022-03-22aarch64: update error message format.Martin Liska5-11/+11
Use '%qs' and remove usage of '(%qs)'. PR target/104898 gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_handle_attr_arch): Use '%qs' and remove usage of '(%qs)'. (aarch64_handle_attr_cpu): Likewise. (aarch64_handle_attr_tune): Likewise. (aarch64_handle_attr_isa_flags): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/branch-protection-attr.c: Use '%qs' and remove usage of '(%qs)'. * gcc.target/aarch64/spellcheck_1.c: Likewise. * gcc.target/aarch64/spellcheck_2.c: Likewise. * gcc.target/aarch64/spellcheck_3.c: Likewise.
2022-03-22aarch64: Update regmove costs for neoverse-v1 and neoverse-512tvb tuningsAndre Vieira1-4/+14
This patch updates the register move tunings for -mcpu/-mtune={neoverse-v1,neoverse-512tvb}. gcc/ChangeLog: 2022-03-22 Tamar Christina <tamar.christina@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> * config/aarch64/aarch64.cc (neoversev1_regmove_cost): New tuning struct. (neoversev1_tunings): Use neoversev1_regmove_cost and update store_int cost. (neoverse512tvb_tunings): Likewise.
2022-03-22aarch64: Add Demeter tuning structsAndre Vieira3-2/+222
This patch adds tuning structs for -mcpu/-mtune=demeter. gcc/ChangeLog: 2022-03-22 Tamar Christina <tamar.christina@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> * config/aarch64/aarch64.cc (demeter_addrcost_table, demeter_regmove_cost, demeter_advsimd_vector_cost, demeter_sve_vector_cost, demeter_scalar_issue_info, demeter_advsimd_issue_info, demeter_sve_issue_info, demeter_vec_issue_info, demeter_vector_cost, demeter_tunings): New tuning structs. (aarch64_vec_op_count::rename_cycles_per_iter): Enable for demeter tuning. * config/aarch64/aarch64-cores.def: Add entry for demeter. * config/aarch64/aarch64-tune.md (tune): Add demeter to list.
2022-03-22aarch64: Update reg-costs to differentiate between memmove costsAndre Vieira2-27/+188
This patch introduces a struct to differentiate between different memmove costs to enable a better modeling of memory operations. These have been modelled for -mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb, for all other tunings all entries are equal to the old single memmove cost to ensure the behaviour remains the same. 2022-03-16 Tamar Christina <tamar.christina@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> gcc/ChangeLog: * config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New struct. (struct tune_params): Change type of memmov_cost to use cpu_memmov_cost. * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update all tunings to use cpu_memmov_cost struct.
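The idea of replacing one memmove cost with a struct of per-kind costs can be illustrated with a small sketch. Everything below is an assumption made for illustration: the field names, the `mem_kind` enum, the helper function and the numeric costs are hypothetical and are not taken from aarch64-protos.h; only the notion of a `cpu_memmov_cost` struct comes from the message.

```c
/* Hypothetical per-kind costs; the real struct's fields may differ.  */
struct cpu_memmov_cost_sketch
{
  int load_int;
  int store_int;
  int load_fp;
  int store_fp;
};

enum mem_kind { MEM_INT, MEM_FP };

/* Tunings that keep the old behaviour set every entry to the former
   single memmove cost.  */
#define UNIFORM_MEMMOV_COST(c) { (c), (c), (c), (c) }

/* Pick the cost matching the register kind and the direction of the move.  */
static int
memmov_cost_sketch (const struct cpu_memmov_cost_sketch *costs,
                    enum mem_kind kind, int is_load)
{
  if (kind == MEM_INT)
    return is_load ? costs->load_int : costs->store_int;
  return is_load ? costs->load_fp : costs->store_fp;
}
```

With a uniform initializer every query returns the old single value, which is how unchanged tunings would keep their behaviour; a tuning with distinct entries can then price loads and stores differently.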
2022-03-22aarch64: Add Neoverse-N2 tuning structsAndre Vieira1-5/+191
This patch adds tuning structures for Neoverse N2. 2022-03-22 Tamar Christina <tamar.christina@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> * config/aarch64/aarch64.cc (neoversen2_addrcost_table, neoversen2_regmove_cost, neoversen2_advsimd_vector_cost, neoversen2_sve_vector_cost, neoversen2_scalar_issue_info, neoversen2_advsimd_issue_info, neoversen2_sve_issue_info, neoversen2_vec_issue_info, neoversen2_tunings): New structs. (neoversen2_tunings): Use new structs and update tuning flags. (aarch64_vec_op_count::rename_cycles_per_iter): Enable for neoversen2 tuning.
2022-03-22aarch64: Enable FP16 feature by default for Armv9Andre Vieira1-1/+2
This patch adds the feature bit for FP16 to the feature set for Armv9 since Armv9 requires SVE to be implemented and SVE requires FP16 to be implemented. 2022-03-22 Andre Vieira <andre.simoesdiasvieira@arm.com> * config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH9): Add FP16 feature bit.
2022-03-22Extend splitter pattern to reversed condition by swapping then and else rtx. ↵liuhongt1-6/+8
[PR target/104982] Failed to match this instruction: (set (reg/v:SI 88 [ z ]) (if_then_else:SI (eq (zero_extract:SI (reg:SI 92) (const_int 1 [0x1]) (zero_extend:SI (subreg:QI (reg:SI 93) 0))) (const_int 0 [0])) (reg:SI 95) (reg:SI 94))) but it's equal to (set (reg/v:SI 88 [ z ]) (if_then_else:SI (ne (zero_extract:SI (reg:SI 92) (const_int 1 [0x1]) (zero_extend:SI (subreg:QI (reg:SI 93) 0))) (const_int 0 [0])) (reg:SI 94) (reg:SI 95))) which is the exact existing splitter. The patch will fix below regressions: On x86-64, r12-7687 caused: FAIL: gcc.target/i386/bt-5.c scan-assembler-not sar[lq][ \t] FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7 gcc/ChangeLog: PR target/104982 * config/i386/i386.md (*jcc_bt<mode>_mask): Extend the following splitter to reversed condition.
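The equivalence of the two RTL forms above can be checked with a small C model. This is a sketch, not GCC code: the function names are hypothetical, and the `& 31` on the shift count is an assumption added only to keep the C shift well defined.

```c
#include <stdint.h>

/* Model of extracting one bit of VALUE at the position held in the low
   byte of POS.  The "& 31" is an assumption of this sketch.  */
static uint32_t
extract_bit (uint32_t value, uint32_t pos)
{
  return (value >> ((pos & 0xff) & 31)) & 1u;
}

/* First form: (if_then_else (eq bit 0) r95 r94).  */
static uint32_t
select_eq_form (uint32_t value, uint32_t pos, uint32_t r94, uint32_t r95)
{
  return extract_bit (value, pos) == 0 ? r95 : r94;
}

/* Second form: (if_then_else (ne bit 0) r94 r95).  */
static uint32_t
select_ne_form (uint32_t value, uint32_t pos, uint32_t r94, uint32_t r95)
{
  return extract_bit (value, pos) != 0 ? r94 : r95;
}
```

Reversing the comparison and swapping the two arms selects the same value for every input, which is why the existing splitter can be reused for the reversed condition.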
2022-03-22testsuite: Add testcase for no longer failing PR [PR102645]Jakub Jelinek1-0/+18
This test started ICEing with r12-3876 but stopped with r12-5264. 2022-03-22 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/102645 * gcc.c-torture/compile/pr102645.c: New test.
2022-03-22calls: Fix error recovery after sorry differently [PR104989]Jakub Jelinek2-1/+17
On Mon, Feb 28, 2022 at 07:52:56AM -0000, Roger Sayle wrote: > This patch resolves PR c++/84964 which is an ICE in the middle-end after > emitting a "sorry, unimplemented" message, and is a regression from > earlier releases of GCC. This issue is that after encountering a > function call requiring an unreasonable amount of stack space, the > code continues and falls foul of an assert checking that stack pointer > has been correctly updated. The fix is to (locally) consider aborted > function calls as "no return", which skips this downstream sanity check. As can be seen on PR104989, just setting ECF_NORETURN after sorry is quite risky and leads to other ICEs. The problem is that ECF_NORETURN calls should be at the end of basic blocks that don't have any fallthru successor edges; otherwise we can ICE later. This patch instead sets sibcall_failure when pass == 0 (sibcall_failure means that the tail call sequence is not useful or not desirable, so it is thrown away) and otherwise sets a new bool variable that lets us pass the assertion and also throws away the whole call sequence, which I think is best for error recovery. 2022-03-22 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/104989 * calls.cc (expand_call): Don't set ECF_NORETURN in flags after sorry for passing too large argument, instead set sibcall_failure for pass == 0, or a new normal_failure flag otherwise. If normal_failure is set, don't assert all stack has been deallocated at the end and throw away the whole insn sequence. * g++.dg/other/pr104989.C: New test.
2022-03-22print-tree:Avoid warnings of overflowQian Jianhua1-2/+2
This patch avoids two warnings of "'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]" when building GCC. Tested on x86_64, and committed as obvious. gcc/ChangeLog: * print-tree.cc: Change array length.
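The class of warning mentioned here arises when a destination array is too short for the worst-case formatted output plus its terminating NUL. The sketch below is not the print-tree.cc code; the macro and function names are hypothetical, and it only illustrates sizing a buffer for the widest possible value and checking the result with snprintf.

```c
#include <limits.h>
#include <stdio.h>

/* Upper bound on the decimal digits of any unsigned int, plus room for
   the terminating NUL (each decimal digit covers more than 3 bits).  */
#define UINT_DECIMAL_BUFSZ (sizeof (unsigned int) * CHAR_BIT / 3 + 2)

/* Format IDX into BUF of SIZE bytes.  Returns 1 if the whole string,
   including the terminating NUL, fit; 0 otherwise.  */
static int
format_index (char *buf, size_t size, unsigned int idx)
{
  int n = snprintf (buf, size, "%u", idx);
  return n >= 0 && (size_t) n < size;
}
```

A buffer declared as `char buf[UINT_DECIMAL_BUFSZ]` is large enough for `UINT_MAX`, whereas a 4-byte buffer is reported as too small instead of being overrun.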
2022-03-22AVX512FP16: Fix wrong code for _mm_mask_f[c]madd.*sch [PR 104978]Hongyu Wang6-72/+42
For complex scalar intrinsics like _mm_mask_fcmadd_sch, the mask should be ANDed with 1 to ensure the mask is bound to the lowest byte. Use masked vmovss to perform the same operation, which omits the higher bits of the mask. gcc/ChangeLog: PR target/104978 * config/i386/sse.md (avx512fp16_fmaddcsh_v8hf_mask1<round_expand_name>): Use avx512f_movsf_mask instead of vmovaps or vblend, and force_reg before lowpart_subreg. (avx512fp16_fcmaddcsh_v8hf_mask1<round_expand_name>): Likewise. gcc/testsuite/ChangeLog: PR target/104978 * gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Adjust asm scan. * gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto. * gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c: Removed. * gcc.target/i386/avx512fp16-vfmaddcsh-1c.c: Ditto. * gcc.target/i386/pr104978.c: New test.
2022-03-22Daily bump.GCC Administrator7-1/+192
2022-03-21x86: Disable SSE in ISA2 for -mgeneral-regs-onlyH.J. Lu5-1/+45
Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET in OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET to disable SSE, AVX and AVX512 ISAs. gcc/ PR target/105000 * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET): Replace OPTION_MASK_ISA2_AVX512F_UNSET with OPTION_MASK_ISA2_SSE_UNSET. gcc/testsuite/ PR target/105000 * gcc.target/i386/pr105000-1.c: New test. * gcc.target/i386/pr105000-2.c: Likewise. * gcc.target/i386/pr105000-3.c: Likewise. * gcc.target/i386/pr105000-4.c: Likewise.