Age  Commit message  Author  Files  Lines
2021-12-06ranger: Add shortcuts for single-successor blocksRichard Sandiford2-0/+6
When compiling an optabs.ii at -O2 with a release-checking build, there were 6,643,575 calls to gimple_outgoing_range_stmt_p. 96.8% of them were for blocks with a single successor, which never have a control statement that generates new range info. This patch therefore adds a shortcut for that case. This gives a ~1% compile-time improvement for the test. I tried making the function inline (in the header) so that the single_succ_p didn't need to be repeated, but it seemed to make things slightly worse. gcc/ * gimple-range-edge.cc (gimple_outgoing_range::edge_range_p): Add a shortcut for blocks with single successors. * gimple-range-gori.cc (gori_map::calculate_gori): Likewise.
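The shortcut described above can be illustrated with a toy model. This is only a sketch: `ToyBlock`, `ToyEdgeRanges` and the `slow_calls` counter are hypothetical stand-ins, not the GCC data structures, and the "expensive" query is reduced to a counter so that the effect of the early exit is visible.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block: successor indices plus a flag
// saying whether the block ends in a control statement such as a condition.
struct ToyBlock {
  std::vector<int> succs;
  bool ends_in_branch;
};

inline bool single_succ_p(const ToyBlock &bb) { return bb.succs.size() == 1; }

struct ToyEdgeRanges {
  long slow_calls = 0;  // how often the expensive query was reached

  // Stand-in for the expensive look-up of the controlling statement.
  bool outgoing_range_stmt_p(const ToyBlock &bb) {
    ++slow_calls;
    return bb.ends_in_branch && bb.succs.size() > 1;
  }

  // Returns true if an edge leaving BB can generate new range information.
  bool edge_range_p(const ToyBlock &bb) {
    // Shortcut: a block with a single successor has no control statement
    // that could refine a range, so skip the expensive query entirely.
    if (single_succ_p(bb))
      return false;
    return outgoing_range_stmt_p(bb);
  }
};
```

With a fall-through block and a conditional block, only the conditional one reaches the slow path.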
2021-12-06ranger: Optimise irange_unionRichard Sandiford1-33/+13
When compiling an optabs.ii at -O2 with a release-checking build, the hottest function in the profile was irange_union. This patch tries to optimise it a bit. The specific changes are: - Use quick_push rather than safe_push, since the final number of entries is known in advance. - Avoid assigning wi::to_wide & co. to a temporary wide_int, such as in: wide_int val_j = wi::to_wide (res[j]); wi::to_wide returns a wide_int "view" of the in-place INTEGER_CST storage. Assigning the result to wide_int forces an unnecessary copy to temporary storage. This is one area where "auto" helps a lot. In the end though, it seemed more readable to inline the wi::to_*s rather than use auto. - Use to_widest_int rather than to_wide_int. Both are functionally correct, but to_widest_int is more efficient, for three reasons: - to_wide returns a wide-int representation in which the most significant element might not be canonically sign-extended. This is because we want to allow the storage of an INTEGER_CST like 0x1U << 31 to be accessed directly with both a wide_int view (where only 32 bits matter) and a widest_int view (where many more bits matter, and where the 32 bits are zero-extended to match the unsigned type). However, operating on uncanonicalised wide_int forms is less efficient than operating on canonicalised forms. - to_widest_int has a constant rather than variable precision and there are never any redundant upper bits to worry about. - Using widest_int avoids the need for an overflow check, since there is enough precision to add 1 to any IL constant without wrap-around. This gives a ~2% compile-time speed up with the test above. I also tried adding a path for two single-pair ranges, but it wasn't a win. gcc/ * value-range.cc (irange::irange_union): Use quick_push rather than safe_push. Use widest_int rather than wide_int. Avoid assigning wi::to_* results to wide*_int temporaries.
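Two of the ideas above (reserving storage once because the maximum number of entries is known, and widening before adding 1 so that no overflow check is needed) can be sketched outside GCC. The code below is a simplified illustration, not the irange_union implementation: it merges one already sorted list of inclusive 32-bit pairs, uses `std::vector::reserve` as a rough analogue of reserving space before quick_push, and uses a 64-bit integer where the patch uses widest_int. `Pair` and `merge_pairs` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Inclusive sub-range [first, second] over 32-bit values.
using Pair = std::pair<std::int32_t, std::int32_t>;

// Merge sub-ranges that are sorted by lower bound, joining pairs that
// overlap or are directly adjacent.
std::vector<Pair> merge_pairs(const std::vector<Pair> &in) {
  std::vector<Pair> out;
  out.reserve(in.size());  // the result can never have more entries than IN
  for (const Pair &p : in) {
    if (!out.empty()) {
      // Widen before adding 1: the maximum 32-bit value plus 1 cannot wrap
      // in 64 bits, so no separate overflow check is required.
      std::int64_t prev_hi_plus_1 =
          static_cast<std::int64_t>(out.back().second) + 1;
      if (static_cast<std::int64_t>(p.first) <= prev_hi_plus_1) {
        out.back().second = std::max(out.back().second, p.second);
        continue;
      }
    }
    out.push_back(p);
  }
  return out;
}
```

A pair whose upper bound is the largest representable value is handled without any special case, which is the point of the wider arithmetic.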
2021-12-06Use dominators to reduce cache-filling.Andrew MacLeod2-0/+74
Before walking the CFG and filling all cache entries, check if the same information is available in a dominator. * gimple-range-cache.cc (ranger_cache::fill_block_cache): Check for a range from dominators before filling the cache. (ranger_cache::range_from_dom): New. * gimple-range-cache.h (ranger_cache::range_from_dom): Add prototype.
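The idea of consulting dominators before filling the cache can be sketched as a walk up the immediate-dominator chain. This is a heavily simplified model under stated assumptions: `ToyDomCache`, `ToyRange` and the `idom` array are hypothetical, and the sketch ignores anything the real range_from_dom may have to consider on the way (for example, edges that refine the range).

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct ToyRange {
  long lo, hi;
};

struct ToyDomCache {
  // idom[b] is the immediate dominator of block b, or -1 for the entry block.
  std::vector<int> idom;
  // Blocks for which a range is already known.
  std::unordered_map<int, ToyRange> cached;

  // Walk up the dominator chain starting above BB.  On success, store the
  // first range found in OUT and return true; otherwise return false so the
  // caller can fall back to the full CFG walk.
  bool range_from_dom(int bb, ToyRange &out) const {
    for (int b = idom[bb]; b != -1; b = idom[b]) {
      auto it = cached.find(b);
      if (it != cached.end()) {
        out = it->second;
        return true;
      }
    }
    return false;
  }
};
```

In a straight-line chain of four blocks with a range cached in block 1, a query for block 3 finds it after two steps, while a query for block 1 itself finds nothing above it.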
2021-12-06Add BB option for outgoing_edge_range_p and may_recompute_p.Andrew MacLeod2-29/+51
There are times we only need to know if any edge from a block can calculate a range. * gimple-range-gori.h (class gori_compute): Add prototypes. * gimple-range-gori.cc (gori_compute::has_edge_range_p): Add alternate API for basic block. Call for edge alternative. (gori_compute::may_recompute_p): Ditto.
2021-12-06libsanitizer: Update LOCAL_PATCHESH.J. Lu1-0/+1
Add commit 70b043845d7c378c6a9361a6769885897d1018c2 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Nov 30 05:31:26 2021 -0800 libsanitizer: Use SSE to save and restore XMM registers to LOCAL_PATCHES. * LOCAL_PATCHES: Add commit 70b043845d7.
2021-12-06libsanitizer: Use SSE to save and restore XMM registersH.J. Lu1-64/+64
Use SSE, instead of AVX, to save and restore XMM registers to support processors without AVX. The affected code is unused in upstream since https://github.com/llvm/llvm-project/commit/66d4ce7e26a5 and will be removed in https://reviews.llvm.org/D112604 This fixed FAIL: g++.dg/tsan/pthread_cond_clockwait.C -O0 execution test FAIL: g++.dg/tsan/pthread_cond_clockwait.C -O2 execution test on machines without AVX. PR sanitizer/103466 * tsan/tsan_rtl_amd64.S (__tsan_trace_switch_thunk): Replace vmovdqu with movdqu. (__tsan_report_race_thunk): Likewise.
2021-12-06tree-optimization/103581 - fix masked gather on x86Richard Biener2-2/+61
The recent fix to PR103527 exposed an issue with how the various special casing for AVX512 masks in vect_build_gather_load_calls is handled. The following makes that more obvious, fixing the miscompile of 403.gcc. 2021-12-06 Richard Biener <rguenther@suse.de> PR tree-optimization/103581 * tree-vect-stmts.c (vect_build_gather_load_calls): Properly guard all the AVX512 mask cases. * gcc.dg/vect/pr103581.c: New testcase.
2021-12-06contrib: Filter out -Wreturn-type in fold-const-call.c.Martin Liska1-0/+1
contrib/ChangeLog: * filter-clang-warnings.py: Filter out one warning.
2021-12-06tree-optimization/103544 - SLP reduction chain as SLP reduction issueRichard Biener2-3/+33
When SLP reduction chain vectorization support added handling of an outer conversion in the chain, picking a failed reduction up as an SLP reduction broke the invariant that the whole reduction was forward reachable. The following plugs that hole, noting a future enhancement possibility. 2021-12-06 Richard Biener <rguenther@suse.de> PR tree-optimization/103544 * tree-vect-slp.c (vect_analyze_slp): Only add a SLP reduction opportunity if the stmt in question is the reduction root. (dot_slp_tree): Add missing check for NULL child. * gcc.dg/vect/pr103544.c: New testcase.
2021-12-06avr: Fix AVR build [PR71934]Jakub Jelinek1-2/+2
On Mon, Dec 06, 2021 at 11:00:30AM +0100, Martin Liška wrote: > Jakub, I think the patch broke avr-linux target: > > g++ -fno-PIE -c -g -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-erro > /home/marxin/Programming/gcc/gcc/config/avr/avr.c: In function ‘void avr_output_data_section_asm_op(const void*)’: > /home/marxin/Programming/gcc/gcc/config/avr/avr.c:10097:26: error: invalid conversion from ‘const void*’ to ‘const char*’ [-fpermissive] This patch fixes that. 2021-12-06 Jakub Jelinek <jakub@redhat.com> PR pch/71934 * config/avr/avr.c (avr_output_data_section_asm_op, avr_output_bss_section_asm_op): Change argument type from const void * to const char *.
2021-12-06cse: Make sure duplicate elements are not entered into the equivalence set [PR103404]Tamar Christina2-1/+38
CSE uses equivalence classes to keep track of expressions that all have the same values at the current point in the program. Normal equivalences through SETs only insert and perform lookups in this set but equivalence determined from comparisons, e.g. (insn 46 44 47 7 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 105 [ iD.2893 ]) (const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1} (expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ]) (nil))) creates the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]). This causes a merge to happen between the two equivalence sets denoted by (const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively. The operation happens through merge_equiv_classes; however, this function has an invariant that the classes to be merged not contain any duplicates. This is because it frees entries before merging. The given testcase, when using the supplied flags, triggers an ICE due to the equivalence set being (rr) p dump_class (class1) Equivalence chain for (reg:SI 105 [ iD.2893 ]): (reg:SI 105 [ iD.2893 ]) $3 = void (rr) p dump_class (class2) Equivalence chain for (const_int 0 [0]): (const_int 0 [0]) (reg:SI 97 [ _10 ]) (reg:SI 97 [ _10 ]) $4 = void This happens because the original INSN being recorded is (insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0) (const_vector:V1SI [ (const_int 0 [0]) ])) "cse.c":11:9 1363 {*movv1si_internal} (expr_list:REG_UNUSED (reg:SI 97 [ _10 ]) (nil))) and we end up generating two equivalences. The first one is simply that reg:SI 97 is 0. The second one is that 0 can be extracted from the V1SI, so subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0. This nested subreg gets folded away to just reg:SI 97 and we re-insert the same equivalence. This patch changes it so that if the nunits of a subreg is 1 then we don't generate a vec_select from the subreg, as the subreg will be folded away and we get a dup.
gcc/ChangeLog: PR rtl-optimization/103404 * cse.c (find_sets_in_insn): Don't select elements out of a V1 mode subreg. gcc/testsuite/ChangeLog: PR rtl-optimization/103404 * gcc.target/i386/pr103404.c: New test.
2021-12-06Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.liuhongt3-2/+38
When moves between integer and sse registers are cheap. 2021-12-06 Hongtao Liu <Hongtao.liu@intel.com> Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog: PR target/95740 * config/i386/i386.c (ix86_preferred_reload_class): Allow integer regs when moves between register units are cheap. * config/i386/i386.h (INT_SSE_CLASS_P): New. gcc/testsuite/ChangeLog: * gcc.target/i386/pr95740.c: New test.
2021-12-06RISC-V: jal cannot refer to a default visibility symbol for shared object.Nelson Chu2-7/+14
This is the original binutils bugzilla report, https://sourceware.org/bugzilla/show_bug.cgi?id=28509 And this is the first version of the proposed binutils patch, https://sourceware.org/pipermail/binutils/2021-November/118398.html After applying the binutils patch, I get the unexpected error when building libgcc, /scratch/nelsonc/riscv-gnu-toolchain/riscv-gcc/libgcc/config/riscv/div.S:42: /scratch/nelsonc/build-upstream/rv64gc-linux/build-install/riscv64-unknown-linux-gnu/bin/ld: relocation R_RISCV_JAL against `__udivdi3' which may bind externally can not be used when making a shared object; recompile with -fPIC Therefore, this patch adds an extra hidden alias symbol for __udivdi3, and then uses HIDDEN_JUMPTARGET to target a non-preemptible symbol instead. The solution is similar to glibc as follows, https://sourceware.org/git/?p=glibc.git;a=commit;h=68389203832ab39dd0dbaabbc4059e7fff51c29b libgcc/ChangeLog: * config/riscv/div.S: Add the hidden alias symbol for __udivdi3, and then use HIDDEN_JUMPTARGET to target it since it is non-preemptible. * config/riscv/riscv-asm.h: Added new macros HIDDEN_JUMPTARGET and HIDDEN_DEF.
2021-12-06Daily bump.GCC Administrator3-1/+14
2021-12-05Objective-C, NeXT: Reorganise meta-data declarations.Iain Sandoe4-23/+6
This moves the GTY declaration of the meta-data identifier array into the header that enumerates these and provides shorthand defines for them. This avoids a problem seen with a relocatable PCH implementation. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> gcc/objc/ChangeLog: * objc-next-metadata-tags.h (objc_rt_trees): Declare here. * objc-next-runtime-abi-01.c: Remove from here. * objc-next-runtime-abi-02.c: Likewise. * objc-runtime-shared-support.c: Reorder headers, provide a GTY declaration the definition of objc_rt_trees.
2021-12-04aix: Move AIX math builtins before new builtin machinery.David Edelsohn1-23/+23
The new builtin machinery has an early exit, so move the AIX-specific builtins before the new machinery. gcc/ChangeLog: * config/rs6000/rs6000-call.c (rs6000_init_builtins): Move AIX math builtin initialization before new_builtins_are_live.
2021-12-05Daily bump.GCC Administrator8-1/+103
2021-12-04c++: Add fixed test [PR93614]Marek Polacek1-0/+17
This was fixed by r11-86. PR c++/93614 gcc/testsuite/ChangeLog: * g++.dg/template/lookup18.C: New test.
2021-12-04Fortran/OpenMP: Support most of 5.1 atomic extensionsTobias Burnus21-262/+1250
Implements most of OpenMP 5.1 atomic extensions, except that 'compare' is parsed but rejected during resolution. (As the trans-openmp.c handling is missing.) gcc/fortran/ChangeLog: * dump-parse-tree.c (show_omp_clauses): Handle weak/compare/fail clause. * gfortran.h (gfc_omp_clauses): Add weak, compare, fail. * openmp.c (enum omp_mask1, gfc_match_omp_clauses, OMP_ATOMIC_CLAUSES): Update for new clauses. (gfc_match_omp_atomic): Update for 5.1 atomic changes. (is_conversion): Support widening in one go. (is_scalar_intrinsic_expr): New. (resolve_omp_atomic): Update for 5.1 atomic changes. * parse.c (parse_omp_oacc_atomic): Update for compare. * resolve.c (gfc_resolve_blocks): Update asserts. * trans-openmp.c (gfc_trans_omp_atomic): Handle new clauses. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/atomic-2.f90: Move now supported code to ... * gfortran.dg/gomp/atomic.f90: here. * gfortran.dg/gomp/atomic-10.f90: New test. * gfortran.dg/gomp/atomic-12.f90: New test. * gfortran.dg/gomp/atomic-15.f90: New test. * gfortran.dg/gomp/atomic-16.f90: New test. * gfortran.dg/gomp/atomic-17.f90: New test. * gfortran.dg/gomp/atomic-18.f90: New test. * gfortran.dg/gomp/atomic-19.f90: New test. * gfortran.dg/gomp/atomic-20.f90: New test. * gfortran.dg/gomp/atomic-22.f90: New test. * gfortran.dg/gomp/atomic-24.f90: New test. * gfortran.dg/gomp/atomic-25.f90: New test. * gfortran.dg/gomp/atomic-26.f90: New test. libgomp/ChangeLog * libgomp.texi (OpenMP 5.1): Update status.
2021-12-04libstdc++: Initialize member in std::match_results [PR103549]Jonathan Wakely1-2/+2
This fixes a -Wuninitialized warning for std::cmatch m1, m2; m1=m2; Also name the template parameters in the forward declaration, to get rid of the <template-parameter-1-1> noise in diagnostics. libstdc++-v3/ChangeLog: PR libstdc++/103549 * include/bits/regex.h (match_results): Give names to template parameters in first declaration. (match_results::_M_begin): Add default member-initializer.
2021-12-04libgomp.texi: Update OMP_PLACESTobias Burnus1-11/+19
libgomp/ChangeLog: * libgomp.texi (OMP_PLACES): Extend description for OMP 5.1 changes.
2021-12-04i386, ipa-modref: Comment spelling fixJakub Jelinek2-3/+3
This patch fixes spelling of prefer (misspelled as preffer). 2021-12-04 Jakub Jelinek <jakub@redhat.com> * config/i386/x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY): Fix comment typo, Preffer -> prefer. * ipa-modref-tree.c (modref_access_node::closer_pair_p): Likewise.
2021-12-04c++: Allow indeterminate unsigned char or std::byte in bit_cast - P1272R4Jakub Jelinek7-2/+411
P1272R4 has added to the std::byteswap new stuff a (to me quite unrelated) clarification for std::bit_cast. The patch treats it as DR, applying to all languages. We no longer diagnose if padding bits are stored into unsigned char or std::byte result, fields or bitfields, instead arrange for that result, those fields or bitfields to get indeterminate value (empty CONSTRUCTOR with CONSTRUCTOR_NO_ZEROING or just leaving the member's initializer out and setting CONSTRUCTOR_NO_ZEROING on parent). We still have a bug that we don't diagnose in lots of places lvalue-to-rvalue conversions of indeterminate values or class objects with some indeterminate members. 2021-12-04 Jakub Jelinek <jakub@redhat.com> * cp-tree.h (is_byte_access_type_not_plain_char): Declare. * tree.c (is_byte_access_type_not_plain_char): New function. * constexpr.c (clear_uchar_or_std_byte_in_mask): New function. (cxx_eval_bit_cast): Don't error about padding bits if target type is unsigned char or std::byte, instead return no clearing ctor. Use clear_uchar_or_std_byte_in_mask. * g++.dg/cpp2a/bit-cast11.C: New test. * g++.dg/cpp2a/bit-cast12.C: New test. * g++.dg/cpp2a/bit-cast13.C: New test. * g++.dg/cpp2a/bit-cast14.C: New test.
2021-12-04libcpp: Fix up handling of deferred pragmas [PR102432]Jakub Jelinek3-1/+61
The https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557903.html change broke the following testcases. The problem is when a pragma namespace allows expansion (i.e. p->is_nspace && p->allow_expansion), e.g. the omp or acc namespaces do, then when parsing the second pragma token we do it with pfile->state.in_directive set, pfile->state.prevent_expansion clear and pfile->state.in_deferred_pragma clear (the last one because we don't know yet if it will be a deferred pragma or not). If the pragma line only contains a single name and newline after it, and there exists a function-like macro with the same name, the preprocessor needs to peek in funlike_invocation_p the next token whether it isn't ( but in this case it will see a newline. As pfile->state.in_directive is set, we don't read anything after the newline, pfile->buffer->need_line is set and CPP_EOF is lexed, which funlike_invocation_p doesn't push back. Because name is a function-like macro and on the pragma line there is no ( after the name, it isn't expanded, and control flow returns to do_pragma. If name is valid deferred pragma, we set pfile->state.in_deferred_pragma (and really need it set so that e.g. end_directive later on doesn't eat all the tokens from the pragma line). Before Nathan's change (which unfortunately didn't contain rationale on why it is better to do it like that), this wasn't a problem, next _cpp_lex_direct called when we want next token would return CPP_PRAGMA_EOF when it saw buffer->need_line, which would turn off pfile->state.in_deferred_pragma and following get token would already read the next line. But Nathan's patch replaced it with an assertion failure that now triggers and CPP_PRAGMA_EOL is done only when lexing the '\n'. Except for this special case that works fine, but in this case it doesn't because when peeking the token we still didn't know that it will be a deferred pragma. 
I've tried to fix that up in do_pragma by detecting this and pushing CPP_PRAGMA_EOL as lookahead, but that doesn't work because end_directive still needs to see pfile->state.in_deferred_pragma set. So, this patch effectively reverts part of Nathan's change, CPP_PRAGMA_EOL addition isn't done only when parsing the '\n', but is now done in both places, in the first one instead of the assertion failure. 2021-12-04 Jakub Jelinek <jakub@redhat.com> PR preprocessor/102432 * lex.c (_cpp_lex_direct): If buffer->need_line while pfile->state.in_deferred_pragma, return CPP_PRAGMA_EOL token instead of assertion failure. * c-c++-common/gomp/pr102432.c: New test. * c-c++-common/goacc/pr102432.c: New test.
2021-12-04[PR103028] test ifcvt trap_if seq more strictly after reloadAlexandre Oliva2-1/+24
When -fif-conversion2 is enabled, we attempt to replace conditional branches around unconditional traps with conditional traps. That canonicalizes compares, which may change an immediate that barely fits into one that doesn't. The compare for the trap is first checked using the predicates of cbranch predicates, and then, compare and conditional trap insns are emitted and recognized. In the failing s390x testcase, i <=u 0xffff_ffff is canonicalized into i <u 0x1_0000_0000, and the latter immediate doesn't fit. The insn predicates (both cbranch and cmpdi_ccu) happily accept it, since the register allocator has no trouble getting them into registers. The problem is that ifcvt2 runs after reload, so we recognize the compare insn successfully, but later on we barf when we find that none of the constraints fit. This patch arranges for the trap_if-issuing bits in ifcvt to validate post-reload insns using a stricter test that also checks that operands fit the constraints. for gcc/ChangeLog PR rtl-optimization/103028 * ifcvt.c (find_cond_trap): Validate new insns more strictly after reload. for gcc/testsuite/ChangeLog PR rtl-optimization/103028 * gcc.dg/pr103028.c: New.
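The canonicalization mentioned above (i <=u 0xffff_ffff becoming i <u 0x1_0000_0000) preserves the comparison result but changes the constant, and the new constant needs one more bit. The sketch below shows just that arithmetic. The helper `fits_u32` is a hypothetical stand-in for an operand constraint; the commit message only says that the new immediate "doesn't fit", so the exact 32-bit limit is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The original and the canonicalized form of the comparison.
inline bool leu(std::uint64_t i, std::uint64_t c) { return i <= c; }
inline bool ltu(std::uint64_t i, std::uint64_t c) { return i < c; }

// Hypothetical constraint check: the immediate must fit in 32 unsigned bits.
inline bool fits_u32(std::uint64_t imm) { return imm <= 0xffffffffULL; }
```

Both forms agree for every input as long as c + 1 does not wrap, yet only the original constant passes the constraint check. A predicate that merely asks "is this a valid operand once it is in a register" would accept either form, which is why a stricter, constraint-aware test is needed after reload.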
2021-12-03testsuite: powerpc/vec_reve_1.c requires VSX.David Edelsohn1-2/+2
vector long long int and vector double require VSX not just Altivec. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vec_reve_1.c: Require VSX.
2021-12-04Daily bump.GCC Administrator8-1/+342
2021-12-03libstdc++: Simplify emplace member functions in _Rb_treeJonathan Wakely1-70/+78
This introduces a new RAII type to simplify the emplace members which currently use try-catch blocks to deallocate a node if an exception is thrown by the comparisons done during insertion. The new type is created on the stack and manages the allocation of a new node and deallocates it in the destructor if it wasn't inserted into the tree. It also provides helper functions for doing the insertion, releasing ownership of the node to the tree. Also, we don't need to use long qualified names if we put the return type after the nested-name-specifier. libstdc++-v3/ChangeLog: * include/bits/stl_tree.h (_Rb_tree::_Auto_node): Define new RAII helper for creating and inserting new nodes. (_Rb_tree::_M_insert_node): Use trailing-return-type to simplify out-of-line definition. (_Rb_tree::_M_insert_lower_node): Likewise. (_Rb_tree::_M_insert_equal_lower_node): Likewise. (_Rb_tree::_M_emplace_unique): Likewise. Use _Auto_node. (_Rb_tree::_M_emplace_equal): Likewise. (_Rb_tree::_M_emplace_hint_unique): Likewise. (_Rb_tree::_M_emplace_hint_equal): Likewise.
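The RAII pattern described above can be sketched on a toy container. This is not the libstdc++ code: `ToyTree`, `ToyNode`, `AutoNode` and the `live_nodes` counter are hypothetical, the "tree" is a plain vector, and the comparison throws for negative keys purely to exercise the clean-up path.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

static int live_nodes = 0;  // instrumentation: nodes currently allocated

struct ToyNode {
  int value;
  explicit ToyNode(int v) : value(v) { ++live_nodes; }
  ~ToyNode() { --live_nodes; }
};

class ToyTree {
  std::vector<ToyNode *> nodes_;  // owned by the tree

  // RAII guard: allocates a node and frees it in the destructor unless
  // ownership was handed to the tree via insert().
  struct AutoNode {
    ToyTree &tree;
    ToyNode *node;
    AutoNode(ToyTree &t, int v) : tree(t), node(new ToyNode(v)) {}
    ~AutoNode() { delete node; }  // deleting nullptr is a no-op
    AutoNode(const AutoNode &) = delete;
    AutoNode &operator=(const AutoNode &) = delete;
    void insert() {
      tree.nodes_.push_back(node);
      node = nullptr;  // release ownership only after the push succeeded
    }
  };

  // Stand-in for a user comparison that may throw.
  static bool keys_equal(int a, int b) {
    if (b < 0) throw std::invalid_argument("negative key");
    return a == b;
  }

 public:
  ToyTree() = default;
  ToyTree(const ToyTree &) = delete;
  ToyTree &operator=(const ToyTree &) = delete;
  ~ToyTree() {
    for (ToyNode *n : nodes_) delete n;
  }

  std::size_t size() const { return nodes_.size(); }

  // No try/catch needed: the guard frees the node on every early exit.
  bool emplace_unique(int v) {
    AutoNode z(*this, v);
    for (ToyNode *n : nodes_)
      if (keys_equal(n->value, z.node->value)) return false;
    z.insert();
    return true;
  }
};
```

Whether the insertion succeeds, finds a duplicate, or the comparison throws, the node count stays consistent without any explicit deallocation in `emplace_unique`.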
2021-12-03c++: avoid redundant scope in diagnosticsJason Merrill2-1/+21
We can make some function signatures shorter to print by omitting redundant nested-name-specifiers in the rest of the declarator. gcc/cp/ChangeLog: * error.c (current_dump_scope): New variable. (dump_scope): Check it. (dump_function_decl): Set it. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/scope1.C: New test.
2021-12-03Fix typos in libstdc++-v3/ChangeLogJonathan Wakely1-2/+2
2021-12-03rs6000: Fix up flag_shrink_wrap handling in presence of -mrop-protect [PR101324]Martin Liska2-4/+21
PR101324 shows a problem in disabling shrink-wrapping when using -mrop-protect when there is an attribute optimize/pragma. The fix involves moving the handling of flag_shrink_wrap so it gets re-disabled when we change or add options. 2021-12-03 Martin Liska <mliska@suse.cz> gcc/ PR target/101324 * config/rs6000/rs6000.c (rs6000_option_override_internal): Move the disabling of shrink-wrapping when using -mrop-protect from here... (rs6000_override_options_after_change): ...to here. 2021-12-03 Peter Bergner <bergner@linux.ibm.com> gcc/testsuite/ PR target/101324 * gcc.target/powerpc/pr101324.c: New test.
2021-12-03rs6000: testsuite: Add rop_ok effective-target functionPeter Bergner6-5/+12
This patch adds a new effective-target function that tests whether it is safe to emit the ROP-protect instructions and updates the ROP test cases to use it. 2021-12-03 Peter Bergner <bergner@linux.ibm.com> gcc/testsuite/ * lib/target-supports.exp (check_effective_target_rop_ok): New function. * gcc.target/powerpc/rop-1.c: Use it. * gcc.target/powerpc/rop-2.c: Likewise. * gcc.target/powerpc/rop-3.c: Likewise. * gcc.target/powerpc/rop-4.c: Likewise. * gcc.target/powerpc/rop-5.c: Likewise.
2021-12-03Fortran: improve checking of array specificationsHarald Anlauf4-0/+39
gcc/fortran/ChangeLog: PR fortran/103505 * array.c (match_array_element_spec): Try to simplify array element specifications to improve early checking. * expr.c (gfc_try_simplify_expr): New. Try simplification of an expression via gfc_simplify_expr. When an error occurs, roll back. * gfortran.h (gfc_try_simplify_expr): Declare it. gcc/testsuite/ChangeLog: PR fortran/103505 * gfortran.dg/pr103505.f90: New test. Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
2021-12-03c++: Fix for decltype(auto) and parenthesized expr [PR103403]Marek Polacek7-20/+133
In r11-4758, I tried to fix this problem: int &&i = 0; decltype(auto) j = i; // should behave like int &&j = i; error wherein do_auto_deduction was getting confused with a REFERENCE_REF_P and it didn't realize its operand was a name, not an expression, and deduced the wrong type. Unfortunately that fix broke this: int&& r = 1; decltype(auto) rr = (r); where 'rr' should be 'int &' since '(r)' is an expression, not a name. But because I stripped the INDIRECT_REF with the r11-4758 change, we deduced 'rr's type as if decltype had gotten a name, resulting in 'int &&'. I suspect I thought that the REF_PARENTHESIZED_P check when setting 'bool id' in do_auto_deduction would handle the (r) case, but that's not the case; while the documentation for REF_PARENTHESIZED_P specifically says it can be set in INDIRECT_REF, we don't actually do so. This patch sets REF_PARENTHESIZED_P even on REFERENCE_REF_P, so that do_auto_deduction can use it. It also removes code in maybe_undo_parenthesized_ref that I think is dead -- and we don't hit it while running dg.exp. To adduce more data, it also looks dead here: https://splichal.eu/lcov/gcc/cp/semantics.c.gcov.html (It's dead since r9-1417.) Also add a fixed test for c++/81176. PR c++/103403 gcc/cp/ChangeLog: * cp-gimplify.c (cp_fold): Don't recurse if maybe_undo_parenthesized_ref doesn't change its argument. * pt.c (do_auto_deduction): Don't strip REFERENCE_REF_P trees if they are REF_PARENTHESIZED_P. Use stripped_init when checking for id-expression. * semantics.c (force_paren_expr): Set REF_PARENTHESIZED_P on REFERENCE_REF_P trees too. (maybe_undo_parenthesized_ref): Remove dead code. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/decltype-auto2.C: New test. * g++.dg/cpp1y/decltype-auto3.C: New test. * g++.dg/cpp1y/decltype-auto4.C: New test. * g++.dg/cpp1z/decomp-decltype1.C: New test.
2021-12-03x86: Add -mmove-max=bits and -mstore-max=bitsH.J. Lu17-18/+276
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move and store, independent of -mprefer-vector-width=bits: 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES which are enabled for Intel Sapphire Rapids processor. 2. Add -mmove-max=bits to set the maximum number of bits can be moved from memory to memory efficiently. The default value is derived from X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES, and the preferred vector width. 3. Add -mstore-max=bits to set the maximum number of bits can be stored to memory efficiently. The default value is derived from X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the preferred vector width. gcc/ PR target/103269 * config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE and PVW_NONE to ix86_target_string. * config/i386/i386-options.c (ix86_target_string): Add arguments for move_max and store_max. (ix86_target_string::add_vector_width): New lambda. (ix86_debug_options): Pass ix86_move_max and ix86_store_max to ix86_target_string. (ix86_function_specific_print): Pass ptr->x_ix86_move_max and ptr->x_ix86_store_max to ix86_target_string. (ix86_valid_target_attribute_tree): Handle x_ix86_move_max and x_ix86_store_max. (ix86_option_override_internal): Set the default x_ix86_move_max and x_ix86_store_max. * config/i386/i386-options.h (ix86_target_string): Add prefer_vector_width and prefer_vector_width. * config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed. (TARGET_AVX256_STORE_BY_PIECES): Likewise. (MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max == PVW_AVX512. Use 32 if ix86_move_max or ix86_store_max >= PVW_AVX256. (STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512. Use 32 if ix86_store_max >= PVW_AVX256. * config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits. * config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New. (X86_TUNE_AVX512_STORE_BY_PIECES): Likewise. 
* doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits. gcc/testsuite/ PR target/103269 * gcc.target/i386/pieces-memcpy-17.c: New test. * gcc.target/i386/pieces-memcpy-18.c: Likewise. * gcc.target/i386/pieces-memcpy-19.c: Likewise. * gcc.target/i386/pieces-memcpy-20.c: Likewise. * gcc.target/i386/pieces-memcpy-21.c: Likewise. * gcc.target/i386/pieces-memset-45.c: Likewise. * gcc.target/i386/pieces-memset-46.c: Likewise. * gcc.target/i386/pieces-memset-47.c: Likewise. * gcc.target/i386/pieces-memset-48.c: Likewise. * gcc.target/i386/pieces-memset-49.c: Likewise.
2021-12-03rs6000: Fix use of wrong enum for built-in function codeBill Schmidt1-2/+2
I discovered this bug while working on patches to remove the old built-ins infrastructure. I missed a spot in converting from the rs6000_builtins enum to the rs6000_gen_builtins_enum. This fixes it. The fix is technically not right if new_builtins_are_enabled were to be set to zero, but we're not going to do that anymore, and the remnants of that code will be removed shortly. 2021-12-02 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ * config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Fix builtin identifiers.
2021-12-03x86: Scan leal in PR target/83782 tests for x32H.J. Lu2-2/+2
Update PR target/83782 tests to scan leal for x32 to fix: FAIL: gcc.target/i386/pr83782-1.c scan-assembler leaq[ \\t]foo\\(%rip\\),[ \\t]%rax FAIL: gcc.target/i386/pr83782-2.c scan-assembler leaq[ \\t]foo\\(%rip\\),[ \\t]%rax PR target/83782 * gcc.target/i386/pr83782-1.c: Also scan leal x32. * gcc.target/i386/pr83782-2.c: Likewise.
2021-12-04RISC-V: Add implied defines of Zk, Zkn and ZksSiYu Wu2-2/+30
gcc/ChangeLog: 2021-11-22 SiYu Wu <siyu@isrc.iscas.ac.cn> * common/config/riscv/riscv-common.c (riscv_implied_info): Add K-ext related entry. (riscv_supported_std_ext): Add 'k'. * config/riscv/arch-canonicalize (CANONICAL_ORDER): Add 'k'. (IMPLIED_EXT): Add K-ext related entry.
2021-12-04RISC-V: Add option defines for Scalar CryptographySiYu Wu3-0/+47
gcc/ChangeLog: 2021-11-21 SiYu Wu <siyu@isrc.iscas.ac.cn> * common/config/riscv/riscv-common.c (riscv_ext_version_table): Add zbk* and zk*. * config/riscv/riscv-opts.h (MASK_ZBKB): New. (MASK_ZBKC): Ditto. (MASK_ZBKX): Ditto. (MASK_ZKNE): Ditto. (MASK_ZKND): Ditto. (MASK_ZKNH): Ditto. (MASK_ZKR): Ditto. (MASK_ZKSED): Ditto. (MASK_ZKSH): Ditto. (MASK_ZKT): Ditto. (TARGET_ZBKB): Ditto. (TARGET_ZBKC): Ditto. (TARGET_ZBKX): Ditto. (TARGET_ZKNE): Ditto. (TARGET_ZKND): Ditto. (TARGET_ZKNH): Ditto. (TARGET_ZKR): Ditto. (TARGET_ZKSED): Ditto. (TARGET_ZKSH): Ditto. (TARGET_ZKT): Ditto. * config/riscv/riscv.opt (riscv_zk_subext): New.
2021-12-03sve: combine nested if predicatesTamar Christina3-15/+82
The following example void f5(float * restrict z0, float * restrict z1, float *restrict x, float * restrict y, float c, int n) { for (int i = 0; i < n; i++) { float a = x[i]; float b = y[i]; if (a > b) { z0[i] = a + b; if (a > c) { z1[i] = a - b; } } } } currently generates: ptrue p3.b, all ld1w z1.s, p1/z, [x2, x5, lsl 2] ld1w z2.s, p1/z, [x3, x5, lsl 2] fcmgt p0.s, p3/z, z1.s, z0.s fcmgt p2.s, p1/z, z1.s, z2.s fcmgt p0.s, p0/z, z1.s, z2.s and p0.b, p0/z, p1.b, p1.b The conditions for a > b and a > c become separate comparisons. After this patch we generate: ld1w z1.s, p0/z, [x2, x5, lsl 2] ld1w z2.s, p0/z, [x3, x5, lsl 2] fcmgt p1.s, p0/z, z1.s, z2.s fcmgt p1.s, p1/z, z1.s, z0.s where the conditions a > b && a > c are folded by using the predicate result of the previous compare, which allows the removal of one of the compares. Whenever a mask is being generated from a BIT_AND we mask the operands of the AND instead and then just AND the result. This allows us to be able to CSE the masks and generate the right combination. However because re-assoc will try to re-order the masks in the & we have to now perform a small local CSE on the vectorized loop if vectorization is successful. Note: This patch series is working incrementally towards generating the most efficient code for this and other loops in small steps. gcc/ChangeLog: * tree-vect-stmts.c (prepare_load_store_mask): Rename to... (prepare_vec_mask): ...This and record operations that have already been masked. (vectorizable_call): Use it. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * tree-vectorizer.h (class _loop_vec_info): Add vec_cond_masked_set. (vec_cond_masked_set_type, tree_cond_mask_hash): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pred-combine-and.c: New test.
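Why the second compare can reuse the first compare's result as its governing predicate can be seen in a scalar model. The sketch below is an illustration only: `Mask`, `cmpgt_z`, `separate_then_and` and `chained` are hypothetical names, and `cmpgt_z` models a zeroing-predicated compare in which inactive lanes produce false.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<bool>;

// Scalar model of a zeroing-predicated "greater than" compare: lanes where
// GOVERNING is false yield false, active lanes yield x[i] > y[i].
Mask cmpgt_z(const Mask &governing, const std::vector<float> &x,
             const std::vector<float> &y) {
  Mask out(x.size(), false);
  for (std::size_t i = 0; i < x.size(); ++i)
    out[i] = governing[i] && x[i] > y[i];
  return out;
}

// "Before": two independent compares under the loop mask, then an AND.
Mask separate_then_and(const Mask &loop_mask, const std::vector<float> &a,
                       const std::vector<float> &b,
                       const std::vector<float> &c) {
  Mask ab = cmpgt_z(loop_mask, a, b);
  Mask ac = cmpgt_z(loop_mask, a, c);
  Mask out(a.size(), false);
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = ab[i] && ac[i];
  return out;
}

// "After": the second compare is governed by the result of the first, so
// the AND is implied by the predication.
Mask chained(const Mask &loop_mask, const std::vector<float> &a,
             const std::vector<float> &b, const std::vector<float> &c) {
  return cmpgt_z(cmpgt_z(loop_mask, a, b), a, c);
}
```

Both formulations produce the same lane mask, which is what makes it safe to drop the separate AND.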
2021-12-03Add TARGET_IFUNC_REF_LOCAL_OKH.J. Lu9-3/+95
1. On some targets, like PowerPC, reference to ifunc function resolver must be non-local so that compiler will properly emit PLT call. Add TARGET_IFUNC_REF_LOCAL_OK to allow binding indirect function resolver locally for targets which don't require special PLT call sequence. 2. Add ix86_call_use_plt_p to call local ifunc function resolvers via PLT. gcc/ PR target/51469 PR target/83782 * target.def (ifunc_ref_local_ok): Add a target hook. * varasm.c (default_binds_local_p_3): Force indirect function resolver non-local only if targetm.ifunc_ref_local_ok returns false. * config/i386/i386-expand.c (ix86_expand_call): Call ix86_call_use_plt_p to check if PLT should be used. * config/i386/i386-protos.h (ix86_call_use_plt_p): New. * config/i386/i386.c (output_pic_addr_const): Call ix86_call_use_plt_p to check if "@PLT" is needed. (ix86_call_use_plt_p): New. (TARGET_IFUNC_REF_LOCAL_OK): New. * doc/tm.texi.in: Add TARGET_IFUNC_REF_LOCAL_OK. * doc/tm.texi: Regenerated. gcc/testsuite/ PR target/51469 PR target/83782 * gcc.target/i386/pr83782-1.c: New test. * gcc.target/i386/pr83782-2.c: Likewise.
2021-12-03testsuite: Fix up pr103456.c testcase [PR103456]Jakub Jelinek1-1/+1
ubsan.exp cycles through torture options, and that includes -O2 -flto -fno-fat-lto-objects. But with those options tree dump scans don't work for post-IPA passes, because for dg-do compile tests nothing after IPA is done. So we get an unresolved testcase: gcc.dg/ubsan/pr103456.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects : dump file does not exist UNRESOLVED: gcc.dg/ubsan/pr103456.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-tree-dump-not objsz1 "maximum object size 0" Fixed by adding -ffat-lto-objects so that we perform the post-IPA passes. 2021-12-03 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/103456 * gcc.dg/ubsan/pr103456.c: Add -ffat-lto-objects to dg-options.
2021-12-03x86: Speed up target attribute handling by using a cacheJakub Jelinek3-2/+27
The target attribute handling is very expensive and for the common case from x86intrin.h where many functions get implicitly the same target attribute, we can speed up compilation a lot by caching it. The following patches both create a single entry cache, where they cache for a particular target attribute argument list the resulting DECL_FUNCTION_SPECIFIC_TARGET and DECL_FUNCTION_SPECIFIC_OPTIMIZATION values from ix86_valid_target_attribute_p and use the cache if the args are the same as last time and we start either from NULL values of those, or from the recorded values for those from last time. Compiling a simple: #include <x86intrin.h> int i; testcase with ./cc1 -quiet -O2 -isystem include/ test.c takes on my WS without the patches ~0.392s and with either of the patches ~0.182s, i.e. roughly half the time as before. For ./cc1plus -quiet -O2 -isystem include/ test.c it is slightly worse, the speed up is from ~0.613s to ~0.403s. The difference between the 2 patches is that the first one uses copy_list while the second one uses a vec, so I think the second one has the advantage of creating less GC garbage. I've verified both patches achieve the same content of those DECL_FUNCTION_SPECIFIC_TARGET and DECL_FUNCTION_SPECIFIC_OPTIMIZATION nodes as before on x86intrin.h by doing debug_tree on those and comparing the stderr from without these patches to with these patches. 2021-12-03 Jakub Jelinek <jakub@redhat.com> * attribs.h (simple_cst_list_equal): Declare. * attribs.c (simple_cst_list_equal): No longer static. * config/i386/i386-options.c (target_attribute_cache): New variable. (ix86_valid_target_attribute_p): Cache DECL_FUNCTION_SPECIFIC_TARGET and DECL_FUNCTION_SPECIFIC_OPTIMIZATION based on args.
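The single-entry cache described in the commit above can be illustrated with a small C++ sketch. This is a simplified model rather than the code in i386-options.c: it remembers only the most recent argument list and its result, and it omits the extra condition about the starting DECL_FUNCTION_SPECIFIC_* values. The names TargetInfo, SingleEntryCache, lookup, expensive_parse and parse_count are hypothetical, and the "parse" is a stand-in that just sums string lengths.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the expensive-to-compute result (hypothetical).
struct TargetInfo {
  int summary;
};

class SingleEntryCache {
 public:
  // Returns the result for args, recomputing only when args differ
  // from the argument list seen on the previous call.
  const TargetInfo &lookup(const std::vector<std::string> &args) {
    assert(!args.empty());  // this sketch assumes a non-empty attribute list
    if (!valid_ || args != last_args_) {
      last_result_ = expensive_parse(args);
      last_args_ = args;
      valid_ = true;
    }
    return last_result_;
  }

  // Number of times the expensive path actually ran.
  int parse_count() const { return parse_count_; }

 private:
  // Placeholder for the costly attribute processing.
  TargetInfo expensive_parse(const std::vector<std::string> &args) {
    ++parse_count_;
    int total = 0;
    for (const std::string &a : args)
      total += static_cast<int>(a.size());
    return TargetInfo{total};
  }

  bool valid_ = false;
  int parse_count_ = 0;
  std::vector<std::string> last_args_;
  TargetInfo last_result_{};
};
```

A single entry is enough for the case the commit targets, where long runs of consecutive declarations carry the same attribute. Alternating between two different argument lists would miss on every call in this model.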
2021-12-03pch: Add support for PCH for relocatable executables [PR71934]Jakub Jelinek15-51/+175
So, if we want to make PCH work for PIEs, I'd say we can: 1) add a new GTY option, say callback, which would act like skip for non-PCH and for PCH would make us skip it but remember for address bias translation 2) drop the skip for tree_translation_unit_decl::language 3) change get_unnamed_section to have const char * as last argument instead of const void *, change unnamed_section::data also to const char * and update everything related to that 4) maybe add a host hook whether it is ok to support binaries changing addresses (the only thing I'm worried is if some host that uses function descriptors allocates them dynamically instead of having them somewhere in the executable) 5) maybe add a gengtype warning if it sees in GTY tracked structure a function pointer without that new callback option Here is 1), 2), 3) implemented. Note, on stdc++.h.gch/O2g.gch there are just those 10 relocations without the second patch, with it a few more, but nothing huge. And for non-PIEs there isn't really any extra work on the load side except freading two scalar values and fseek. 2021-12-03 Jakub Jelinek <jakub@redhat.com> PR pch/71934 gcc/ * ggc.h (gt_pch_note_callback): Declare. * gengtype.h (enum typekind): Add TYPE_CALLBACK. (callback_type): Declare. * gengtype.c (dbgprint_count_type_at): Handle TYPE_CALLBACK. (callback_type): New variable. (process_gc_options): Add CALLBACK argument, handle callback option. (set_gc_used_type): Adjust process_gc_options caller, if callback, set type to &callback_type. (output_mangled_typename): Handle TYPE_CALLBACK. (walk_type): Likewise. Handle callback option. (write_types_process_field): Handle TYPE_CALLBACK. (write_types_local_user_process_field): Likewise. (write_types_local_process_field): Likewise. (write_root): Likewise. (dump_typekind): Likewise. (dump_type): Likewise. * gengtype-state.c (type_lineloc): Handle TYPE_CALLBACK. (state_writer::write_state_callback_type): New method. (state_writer::write_state_type): Handle TYPE_CALLBACK. 
(read_state_callback_type): New function. (read_state_type): Handle TYPE_CALLBACK. * ggc-common.c (callback_vec): New variable. (gt_pch_note_callback): New function. (gt_pch_save): Stream out gt_pch_save function address and relocation table. (gt_pch_restore): Stream in saved gt_pch_save function address and relocation table and apply relocations if needed. * doc/gty.texi (callback): Document new GTY option. * varasm.c (get_unnamed_section): Change callback argument's type and last argument's type from const void * to const char *. (output_section_asm_op): Change argument's type from const void * to const char *, remove unnecessary cast. * tree-core.h (struct tree_translation_unit_decl): Drop GTY((skip)) from language member. * output.h (unnamed_section_callback): Change argument type from const void * to const char *. (struct unnamed_section): Use GTY((callback)) instead of GTY((skip)) for callback member. Change data member type from const void * to const char *. (struct noswitch_section): Use GTY((callback)) instead of GTY((skip)) for callback member. (get_unnamed_section): Change callback argument's type and last argument's type from const void * to const char *. (output_section_asm_op): Change argument's type from const void * to const char *. * config/avr/avr.c (avr_output_progmem_section_asm_op): Likewise. Remove unneeded cast. * config/darwin.c (output_objc_section_asm_op): Change argument's type from const void * to const char *. * config/pa/pa.c (som_output_text_section_asm_op): Likewise. (som_output_comdat_data_section_asm_op): Likewise. * config/rs6000/rs6000.c (rs6000_elf_output_toc_section_asm_op): Likewise. (rs6000_xcoff_output_readonly_section_asm_op): Likewise. Instead of dereferencing directive hardcode variable names and decide based on whether directive is NULL or not. (rs6000_xcoff_output_readwrite_section_asm_op): Change argument's type from const void * to const char *. (rs6000_xcoff_output_tls_section_asm_op): Likewise. 
Instead of dereferencing directive hardcode variable names and decide based on whether directive is NULL or not. (rs6000_xcoff_output_toc_section_asm_op): Change argument's type from const void * to const char *. (rs6000_xcoff_asm_init_sections): Adjust get_unnamed_section callers. gcc/c-family/ * c-pch.c (struct c_pch_validity): Remove pch_init member. (pch_init): Don't initialize v.pch_init. (c_common_valid_pch): Don't warn and punt if .text addresses change. libcpp/ * include/line-map.h (class line_maps): Add GTY((callback)) to reallocator and round_alloc_size members.
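The address bias translation mentioned in the PCH commit above can be sketched as follows. The ChangeLog says that gt_pch_save streams out its own function address together with a relocation table, and that gt_pch_restore applies relocations if needed. The C++ model below assumes the simplest reading of that: the difference between the saved and current address of one reference function is added to every recorded function pointer. The names SavedImage, anchor_at_save, callbacks and relocate_callbacks are hypothetical, and the real relocation table records locations inside the PCH image rather than a plain vector of values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for the saved state (hypothetical layout).
struct SavedImage {
  std::uintptr_t anchor_at_save;          // address of the reference function when saved
  std::vector<std::uintptr_t> callbacks;  // function-pointer values as saved
};

// Rebase every saved function pointer by the distance the executable moved.
void relocate_callbacks(SavedImage &img, std::uintptr_t anchor_now) {
  assert(img.anchor_at_save != 0);  // an image must have been saved first
  if (anchor_now == img.anchor_at_save)
    return;  // executable loaded at the same address: nothing to do

  // Unsigned arithmetic wraps, so this works whether the executable
  // moved to a higher or a lower address.
  const std::uintptr_t bias = anchor_now - img.anchor_at_save;
  for (std::uintptr_t &p : img.callbacks)
    p += bias;
  img.anchor_at_save = anchor_now;
}
```

When the anchor address is unchanged the function returns immediately, which matches the commit's remark that non-PIE builds do almost no extra work on the load side.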
2021-12-03fortran: Fix setting of array lower bound for named arraysChung-Lin Tang3-14/+35
This patch fixes a case of setting array low-bounds, found for particular uses of SOURCE=/MOLD=. This adjusts the relevant part in gfc_trans_allocate() to set e3_has_nodescriptor only for non-named arrays. 2021-12-03 Tobias Burnus <tobias@codesourcery.com> gcc/fortran/ChangeLog: * trans-stmt.c (gfc_trans_allocate): Set e3_has_nodescriptor to true only for non-named arrays. gcc/testsuite/ChangeLog: * gfortran.dg/allocate_with_source_26.f90: Adjust testcase. * gfortran.dg/allocate_with_mold_4.f90: New testcase.
2021-12-03Make sure that we get unique test names if several DejaGnu directives refer to the same line [PR102735]Thomas Schwinge1-2/+12
gcc/testsuite/ PR testsuite/102735 * lib/gcc-dg.exp (process-message): Make sure that we get unique test names.
2021-12-03[Committed] New testcase for C++/71792, bitfields and autoAndrew Pinski1-0/+42
This testcase used to fail before GCC 6.4.0 due to the wrong type being used for auto when used with bitfields; the C++ front end was using the "bitfield" type rather than the underlying type. Committed the testcase after a quick check. PR c++/71792 gcc/testsuite/ChangeLog: * g++.dg/torture/pr71792.C: New test.
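The expected behaviour for auto with bit-fields can be shown with a short C++ sketch. This is not the committed pr71792.C testcase; the struct Flags and the function bump are hypothetical names chosen for illustration. The point is that the bit-field width is not part of the member's type, so auto deduces the declared type and arithmetic on the copy is not truncated to the field width.

```cpp
#include <cassert>
#include <type_traits>

struct Flags {
  unsigned int low : 4;  // 4-bit field, declared type is unsigned int
};

unsigned int bump(const Flags &f) {
  auto v = f.low;  // v is a full unsigned int, not a 4-bit quantity
  static_assert(std::is_same<decltype(v), unsigned int>::value,
                "auto should deduce the declared type of the bit-field");
  v += 1;            // must not wrap at 4 bits
  assert(v <= 16u);  // the field holds at most 15, so v is at most 16
  return v;
}
```

With f.low set to 15, bump returns 16. A compiler that deduced a 4-bit type for v would wrap to 0 instead.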
2021-12-02gcc: Fix "argument list too long" from install-pluginsRichard Purdie1-1/+1
When building in longer build paths (200+ characters), the "echo $(PLUGIN_HEADERS)" from the install-plugins target would cause an "argument list too long" error on some systems. Avoid this by calling make's sort function on the list, which removes duplicates and stops the overflow from reaching the echo command. The original sort is left to handle the .h and .def files. 2021-10-26 Richard Purdie <richard.purdie@linuxfoundation.org> gcc/ChangeLog: * Makefile.in: Fix "argument list too long" from install-plugins. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2021-12-02build: Implement --with-multilib-list for avr targetMatt Jacobson5-3/+27
gcc * config.gcc: For the AVR target, populate TM_MULTILIB_CONFIG. * config/avr/genmultilib.awk: Add ability to filter generated multilib list. * config/avr/t-avr: Pass TM_MULTILIB_CONFIG to genmultilib.awk. * configure.ac: Update help string for --with-multilib-list. * configure: Regenerate.
2021-12-03Daily bump.GCC Administrator12-1/+770