2020-11-19  c++: Fix array new with value-initialization [PR97523]  (Marek Polacek, 3 files changed, -1/+64)

Since my r11-3092 the following is rejected with -std=c++20:

  struct T { explicit T(); };
  void fn(int n) { new T[1](); }

with "would use explicit constructor 'T::T()'".  This is because, since
that change, we go into the P1009 block in build_new (array_p is false,
but nelts is non-null and we're in C++20).  Since we only have (), we
build a {} and continue to build_new_1, which then calls build_vec_init,
and then we error because the {} isn't CONSTRUCTOR_IS_DIRECT_INIT.

For (), which is value-initializing, we want to do what we were doing
before: pass an empty init and let build_value_init take care of it.

For various reasons I wanted to dig a little bit deeper into this, and
as a result I'm adding a test for [expr.new]/24 (and checked that our
current behavior matches clang++).

gcc/cp/ChangeLog:

        PR c++/97523
        * init.c (build_new): When value-initializing an array new,
        leave the INIT as an empty vector.

gcc/testsuite/ChangeLog:

        PR c++/97523
        * g++.dg/expr/anew5.C: New test.
        * g++.dg/expr/anew6.C: New test.

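As a concrete illustration of what the fix re-accepts (a sketch in the
spirit of the new tests, not the committed anew5.C/anew6.C):

  // Value-initialization of an array new is accepted again even though
  // the default constructor is explicit, since () value-initializes
  // rather than going through copy-list-initialization.
  struct T { explicit T () {} };

  void
  fn ()
  {
    T *p = new T[1]();   // OK: value-initialization calls T::T() directly
    delete[] p;
  }
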
2020-11-19  c++: Fix crash with broken deduction from {} [PR97895]  (Marek Polacek, 2 files changed, -4/+17)

Unfortunately, the otherwise beautiful

  for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))

is not immune to an empty constructor, so we have to check
CONSTRUCTOR_ELTS first.

gcc/cp/ChangeLog:

        PR c++/97895
        * pt.c (do_auto_deduction): Don't crash when the constructor
        has zero elements.

gcc/testsuite/ChangeLog:

        PR c++/97895
        * g++.dg/cpp0x/auto54.C: New test.

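A minimal sketch of the guarded shape, using GCC's internal vec API
(an assumed form of the fix, not the exact hunk in pt.c):

  /* CONSTRUCTOR_ELTS can be a null vector for an empty {}, so it must
     be tested before the range-for dereferences it.  */
  if (vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (init))
    for (constructor_elt &elt : *elts)
      handle (elt);   /* 'handle' stands in for the real per-element work.  */
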
2020-11-19  config: Add tests for modules-desired features  (Nathan Sidwell, 3 files changed, -18/+268)

This adds configure tests for features that modules can take advantage
of -- and if they are not present, modules has reduced or fallback
functionality.

gcc/
        * configure.ac: Add tests for fstatat, sighandler_t, O_CLOEXEC,
        unix-domain and ipv6 sockets.
        * config.in: Rebuilt.
        * configure: Rebuilt.

2020-11-19  c++: Relax new assert [PR 97905]  (Nathan Sidwell, 2 files changed, -3/+9)

It turns out there are legitimate cases for the new decl to not have
lang-specific.

        PR c++/97905
gcc/cp/
        * decl.c (duplicate_decls): Relax new assert.
gcc/testsuite/
        * g++.dg/lookup/pr97905.C: New.

2020-11-19  pru: Add builtins for HALT and LMBD  (Dimitar Dimitrov, 7 files changed, -6/+201)

Add builtins for HALT and LMBD, per Texas Instruments document SPRUHV7C.
Use the new LMBD pattern to define an expand for clz.

Binutils [1] and sim [2] support for the LMBD instruction are merged now.

[1] https://sourceware.org/pipermail/binutils/2020-October/113901.html
[2] https://sourceware.org/pipermail/gdb-patches/2020-November/173141.html

gcc/ChangeLog:

        * config/pru/alu-zext.md: Add lmbd patterns for zero_extend
        variants.
        * config/pru/pru.c (enum pru_builtin): Add HALT and LMBD.
        (pru_init_builtins): Ditto.
        (pru_builtin_decl): Ditto.
        (pru_expand_builtin): Ditto.
        * config/pru/pru.h (CLZ_DEFINED_VALUE_AT_ZERO): Define PRU
        value for CLZ with zero value parameter.
        * config/pru/pru.md: Add halt, lmbd and clz patterns.
        * doc/extend.texi: Document PRU builtins.

gcc/testsuite/ChangeLog:

        * gcc.target/pru/halt.c: New test.
        * gcc.target/pru/lmbd.c: New test.

2020-11-19  vect: Add a “very cheap” cost model  (Richard Sandiford, 11 files changed, -10/+123)

Currently we have three vector cost models: cheap, dynamic and
unlimited.  -O2 -ftree-vectorize uses “cheap” by default, but that's
still relatively aggressive about peeling and aliasing checks, and can
lead to significant code size growth.

This patch adds an even more conservative choice, which for lack of
imagination I've called “very cheap”.  It only allows vectorisation if
the vector code entirely replaces the scalar code.  It also requires
one iteration of the vector loop to pay for itself, regardless of how
often the loop iterates.  (If the vector loop needs multiple iterations
to be beneficial then things are probably too close to call, and the
conservative thing would be to stick with the scalar code.)

The idea is that this should be suitable for -O2, although the patch
doesn't change any defaults itself.

I tested this by building and running a bunch of workloads for SVE,
with three options:

  (1) -O2
  (2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap
  (3) -O2 -ftree-vectorize [-fvect-cost-model=cheap]

All three builds used the default -msve-vector-bits=scalable and ran
with the minimum vector length of 128 bits, which should give a
worst-case bound for the performance impact.

The workloads included a mixture of microbenchmarks and full
applications.  Because it's quite an eclectic mix, there's not much
point giving exact figures.  The aim was more to get a general
impression.

Code size growth with (2) was much lower than with (3).  Only a
handful of tests increased by more than 5%, and all of them were
microbenchmarks.

In terms of performance, (2) was significantly faster than (1) on
microbenchmarks (as expected) but also on some full apps.  Again,
performance only regressed on a handful of tests.

As expected, the performance of (3) vs. (1) and (3) vs. (2) is more of
a mixed bag.  There are several significant improvements with (3) over
(2), but also some (smaller) regressions.  That seems to be in line
with -O2 -ftree-vectorize being a kind of -O2.5.

The patch reorders vect_cost_model so that values are in order of
increasing aggressiveness, which makes it possible to use range checks.
The value 0 still represents “unlimited”, so “if (flag_vect_cost_model)”
is still a meaningful check.

gcc/
        * doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
        * common.opt (fvect-cost-model=): Add very-cheap as a possible
        option.
        (fsimd-cost-model=): Likewise.
        (vect_cost_model): Add very-cheap.
        * flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
        Put the values in order of increasing aggressiveness.
        * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
        range checks when comparing against VECT_COST_MODEL_CHEAP.
        (vect_prune_runtime_alias_test_list): Do not allow any alias
        checks for the very-cheap cost model.
        * tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
        any peeling for the very-cheap cost model.  Also require one
        iteration of the vector loop to pay for itself.

gcc/testsuite/
        * gcc.dg/vect/vect-cost-model-1.c: New test.
        * gcc.dg/vect/vect-cost-model-2.c: Likewise.
        * gcc.dg/vect/vect-cost-model-3.c: Likewise.
        * gcc.dg/vect/vect-cost-model-4.c: Likewise.
        * gcc.dg/vect/vect-cost-model-5.c: Likewise.
        * gcc.dg/vect/vect-cost-model-6.c: Likewise.

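As a hedged illustration of the intent (not one of the committed
vect-cost-model-*.c tests): a loop with a compile-time trip count that
is a multiple of the vector length, and restrict-qualified pointers,
needs no peeling and no runtime alias checks, so its vector loop can
fully replace the scalar loop and even the very-cheap model can accept
it.

  /* Can be vectorized with
     -O2 -ftree-vectorize -fvect-cost-model=very-cheap
     on targets with unaligned vector access: nothing scalar remains.  */
  void
  f (int *restrict a, int *restrict b)
  {
    for (int i = 0; i < 1024; i++)
      a[i] += b[i];
  }
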
2020-11-19  AArch64: Add cost table for Cortex-A76  (Wilco Dijkstra, 2 files changed, -3/+106)

Add an initial cost table for Cortex-A76 - this is copied from
cortexa57_extra_costs but updated based on the Optimization Guide.
Use the new cost table on all Neoverse tunings and ensure the tunings
are consistent for all.  As a result more compact code is generated
with more combined shift+alu operations.  E.g. -mcpu=cortex-a76 will
now merge the shifts in:

  int f(int x, int y) { return (x & y << 3) * (x | y << 3); }

  and     w2, w0, w1, lsl 3
  orr     w0, w0, w1, lsl 3
  mul     w0, w2, w0
  ret

SPEC2017 codesize improves by 0.02% and SPECINT2017 shows a 0.24% gain.

2020-11-18  Wilco Dijkstra  <wdijkstr@arm.com>

gcc/
        * config/aarch64/aarch64.c (neoversen1_tunings): Use new
        cortexa76_extra_costs.
        (neoversev1_tunings): Likewise.
        (neoversen2_tunings): Likewise.
        * config/arm/aarch-cost-tables.h (cortexa76_extra_costs): Add
        new costs.

2020-11-19  AArch64: Improve inline memcpy expansion  (Wilco Dijkstra, 1 file changed, -36/+37)

Improve the inline memcpy expansion.  Use integer load/store for copies
<= 24 bytes instead of SIMD.  Set the maximum copy to expand to 256
bytes by default, except that -Os or no Neon expands up to 128 bytes.
When using LDP/STP of Q-registers, also use Q-register accesses for the
unaligned tail, saving 2 instructions (e.g. all sizes up to 48 bytes
emit exactly 4 instructions).  Clean up code and comments.

The codesize gain vs the GCC10 expansion is 0.05% on SPECINT2017.

2020-11-03  Wilco Dijkstra  <wdijkstr@arm.com>

gcc/
        * config/aarch64/aarch64.c (aarch64_expand_cpymem): Cleanup
        code and comments, tweak expansion decisions and improve tail
        expansion.

2020-11-19  Fix PR ada/97805  (Eric Botcazou, 1 file changed, -0/+7)

We need to include limits.h (or <climits>) in adaint.c because of
LLONG_MIN.

gcc/ada/ChangeLog:

        PR ada/97805
        * adaint.c: Include climits in C++ and limits.h otherwise.

2020-11-19  Fix bootstrap  (Richard Biener, 1 file changed, -1/+1)

This fixes a typo in the TREE_CODE compare which should compare
against TYPE_DECL, not TYPE_NAME.

2020-11-19  Richard Biener  <rguenther@suse.de>

        * fold-const.c (operand_compare::hash_operand): Fix typo.

2020-11-19  Fix gcc.dg/pr97897.c  (Richard Biener, 1 file changed, -0/+1)

This adds dg-options "" to avoid the pedantic error on _Complex int.

2020-11-19  Richard Biener  <rguenther@suse.de>

        * gcc.dg/pr97897.c: Add dg-options.

2020-11-19  refactor reassoc's get_rank  (Richard Biener, 1 file changed, -22/+24)

This refactors things so assigned ranks are dumped and the cache is
consistently used also for PHIs.

2020-11-19  Richard Biener  <rguenther@suse.de>

        * tree-ssa-reassoc.c (get_rank): Refactor to consistently use
        the cache and dump ranks assigned.

2020-11-19  Fix operand_equal_p hash and compare of OBJ_TYPE_REF  (Jan Hubicka, 1 file changed, -19/+30)

        * fold-const.c (operand_compare::operand_equal_p): Move
        OBJ_TYPE_REF matching to the correct place; drop OEP_ADDRESS_OF
        for TOKEN, OBJECT and class.
        (operand_compare::hash_operand): Hash ODR type for OBJ_TYPE_REF.

2020-11-19  [3/3] [AArch64][vect] vec_widen_lshift pattern  (Joel Hutton, 3 files changed, -2/+131)

Add aarch64 vec_widen_lshift_lo/hi patterns and fix the bug it triggers
in the mid-end.  This pattern takes one vector with N elements of size
S, shifts each element left by the element width, and stores the
results as N elements of size 2*S (in 2 result vectors).  The aarch64
backend implements this with the shll/shll2 instruction pair.

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md: Add vec_widen_lshift_hi/lo<mode>
        patterns.
        * tree-vect-stmts.c (vectorizable_conversion): Fix for
        widen_lshift case.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vect-widen-lshift.c: New test.

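A hedged sketch of the kind of source loop that maps onto the new
pattern (the committed vect-widen-lshift.c test may differ):

  /* Each 16-bit element is shifted left by the element width and
     produces a 32-bit result, matching vec_widen_lshift_lo/hi
     (shll/shll2 on aarch64).  */
  void
  f (unsigned short *restrict a, unsigned int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      b[i] = (unsigned int) a[i] << 16;
  }
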
2020-11-19  [2/3] [vect] Add widening add, subtract patterns  (Joel Hutton, 13 files changed, -4/+331)

Add widening add, subtract patterns to tree-vect-patterns.  Update the
widened code of patterns that detect PLUS_EXPR to also detect
WIDEN_PLUS_EXPR.  These patterns take 2 vectors with N elements of
size S and perform an add/subtract on the elements, storing the
results as N elements of size 2*S (in 2 result vectors).  This is
implemented in the aarch64 backend as addl,addl2 and subl,subl2
respectively.  Add aarch64 tests for the patterns.

gcc/ChangeLog:

        * doc/generic.texi: Document new widen_plus/minus_lo/hi tree
        codes.
        * doc/md.texi: Document new widening add/subtract hi/lo optabs.
        * expr.c (expand_expr_real_2): Add widen_add, widen_subtract
        cases.
        * optabs-tree.c (optab_for_tree_code): Add case for widening
        optabs.
        * optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
        * tree-cfg.c (verify_gimple_assign_binary): Add case for
        widening adds, subtracts.
        * tree-inline.c (estimate_operator_cost): Add case for widening
        adds, subtracts.
        * tree-vect-generic.c (expand_vector_operations_1): Add case
        for widening adds, subtracts.
        * tree-vect-patterns.c (vect_recog_widen_add_pattern): New
        recog pattern.
        (vect_recog_widen_sub_pattern): New recog pattern.
        (vect_recog_average_pattern): Update widened add code.
        * tree-vect-stmts.c (vectorizable_conversion): Add case for
        widened add, subtract.
        (supportable_widening_operation): Add case for widened add,
        subtract.
        * tree.def (WIDEN_PLUS_EXPR): New tree code.
        (WIDEN_MINUS_EXPR): New tree code.
        (VEC_WIDEN_PLUS_HI_EXPR): New tree code.
        (VEC_WIDEN_PLUS_LO_EXPR): New tree code.
        (VEC_WIDEN_MINUS_HI_EXPR): New tree code.
        (VEC_WIDEN_MINUS_LO_EXPR): New tree code.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.

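A sketch of a loop these patterns are aimed at (illustrative, not the
committed vect-widen-add.c test):

  /* N 16-bit elements are added pairwise to produce 32-bit results,
     matching WIDEN_PLUS_EXPR (addl/addl2 on aarch64); replacing + with
     - gives the WIDEN_MINUS_EXPR / subl,subl2 case.  */
  void
  f (short *restrict a, short *restrict b, int *restrict c, int n)
  {
    for (int i = 0; i < n; i++)
      c[i] = (int) a[i] + b[i];
  }
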
2020-11-19  [1/3] [aarch64] Add vec_widen patterns to aarch64  (Joel Hutton, 1 file changed, -0/+47)

Add widening add and subtract patterns to the aarch64 backend.  These
allow taking vectors of N elements of size S and performing an
add/subtract on the high or low half, widening the resulting elements
and storing N/2 elements of size 2*S.  These correspond to the
addl,addl2,subl,subl2 instructions.

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md: New patterns
        vec_widen_saddl_lo/hi_<mode>.

2020-11-19  tree-optimization/97901 - ICE propagating out LC PHIs  (Richard Biener, 2 files changed, -17/+20)

We need to fold the stmt to canonicalize MEM_REFs, which means we're
back to using replace_uses_by, which in turn means we need dominators
so as not to require a CFG cleanup upthread.

2020-11-19  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/97901
        * tree-ssa-propagate.c (clean_up_loop_closed_phi): Compute
        dominators and use replace_uses_by.

        * gcc.dg/torture/pr97901.c: New testcase.

2020-11-19  Enhance debug info for fixed-point types  (Eric Botcazou, 5 files changed, -92/+27)

The Ada language supports fixed-point types as first-class citizens, so
they need to be described as-is in the debug info.  We devised the
langhook get_fixed_point_type_info for this purpose a few years ago,
but it comes with a limitation for the representation of the scale
factor that we would need to lift in order to be able to represent more
fixed-point types.

gcc/ChangeLog:

        * dwarf2out.h (struct fixed_point_type_info) <scale_factor>:
        Turn numerator and denominator into a tree.
        * dwarf2out.c (base_type_die): In the case of a fixed-point
        type with arbitrary scale factor, call add_scalar_info on
        numerator and denominator to emit the appropriate attributes.

gcc/ada/ChangeLog:

        * exp_dbug.adb (Is_Handled_Scale_Factor): Delete.
        (Get_Encoded_Name): Do not call it.
        * gcc-interface/decl.c (gnat_to_gnu_entity) <Fixed_Point_Type>:
        Tidy up and always use a meaningful description for arbitrary
        scale factors.
        * gcc-interface/misc.c (gnat_get_fixed_point_type_info): Remove
        obsolete block and adjust the description of the scale factor.

2020-11-19  tree-optimization/97897 - complex lowering on abnormal edges  (Richard Biener, 3 files changed, -1/+23)

This fixes complex lowering to not put constants into abnormal edge
PHI values by making sure abnormally used SSA names are VARYING in the
propagation lattice.

2020-11-19  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/97897
        * tree-complex.c (complex_propagate::visit_stmt): Make sure
        abnormally used SSA names are VARYING.
        (complex_propagate::visit_phi): Likewise.
        * tree-ssa.c (verify_phi_args): Verify PHI arguments on
        abnormal edges are SSA names.

        * gcc.dg/pr97897.c: New testcase.

2020-11-19  i386: Disable *<absneg:code><mode>2_i387_1 for TARGET_SSE_MATH modes  (Uros Bizjak, 2 files changed, -1/+16)

This pattern interferes with *<absneg:code><mode>2_1 when
TARGET_SSE_MATH modes are active.  The combine pass is able to remove
(use) RTXes and transforms *<absneg:code><mode>2_1 into
*<absneg:code><mode>2_i387_1, where SSE alternatives are not available.

2020-11-19  Uroš Bizjak  <ubizjak@gmail.com>

gcc/
        * config/i386/i386.md (*<absneg:code><mode>2_i387_1): Disable
        for TARGET_SSE_MATH modes.

gcc/testsuite/
        * gcc.target/i386/pr97887.c: New test.

2020-11-18  Minor H8 shift code generation change in preparation for cc0 removal  (Jeff Law, 4 files changed, -18/+82)

So I didn't stay up late to work from Pago Pago this year and beat the
stage1 close, but I do want to flush out the removal of cc0 from the H8
port this cycle.  Given these patches only affect the H8, and the H8
would be killed this cycle without the conversion, I think this is
suitable even though we're past stage1 close.

This patch addresses an initial codegen issue that would have resulted
in regressions after removal of cc0.  The compare/test eliminate pass
is unable to handle multiple clobbers, so patterns that clobber a
scratch and also clobber a condition code are never used to eliminate
a compare/test.

The H8 can shift 1 or 2 bits at a time depending on the precise model.
Not surprisingly, we have multiple strategies to implement shifts, some
of which clobber scratch registers -- but we have a clobber on every
shift insn, and as a result they cannot participate in compare/test
removal once cc0 is removed from the port.

This patch removes the clobber in the initial code generation in cases
where it's obviously not needed, allowing those shifts to participate
in compare/test removal in a future patch.  It has the advantage that
it also generates slightly better code.  By installing this now the
removal of cc0 is a smaller patch, but more importantly, it allows for
a more direct comparison of the generated code before/after cc0
removal.

I've had my tester test before/after this patch with no regressions on
the major H8 multilibs.  I've also spot-checked the generated code and,
as expected, it's ever-so-slightly better after this patch.

I'll be installing this on the trunk momentarily.  More patches will
follow, though probably not in rapid succession as my time to push this
stuff is very limited.

gcc/
        * config/h8300/constraints.md (R constraint): Add argument to
        call to h8300_shift_needs_scratch_p.
        (S and T constraints): Similarly.
        * config/h8300/h8300-protos.h: Update h8300_shift_needs_scratch_p
        prototype.
        * config/h8300/h8300.c (expand_a_shift): Emit a different
        pattern if the shift does not require a scratch register.
        (h8300_shift_needs_scratch_p): Refine to be more accurate.
        * config/h8300/shiftrotate.md (shiftqi_noscratch): New pattern.
        (shifthi_noscratch, shiftsi_noscratch): Similarly.

2020-11-19  Daily bump.  (GCC Administrator, 17 files changed, -1/+385)

2020-11-18  Fix middle-end/85811: Introduce tree_expr_maybe_nan_p et al.  (Roger Sayle, 11 files changed, -13/+443)

The motivation for this patch is PR middle-end/85811, a wrong-code
regression entitled "Invalid optimization with fmax, fabs and nan".
The optimization involves assuming max(x,y) is non-negative if (say)
y is non-negative, i.e. max(x,2.0).  Unfortunately, this is an invalid
assumption in the presence of NaNs.  Hence max(x,+qNaN), with IEEE fmax
semantics, will always return x even though the qNaN is non-negative.
Worse, max(x,2.0) may return a negative value if x is -sNaN.

I'll quote Joseph Myers (many thanks) who describes things clearly as:

> (a) When both arguments are NaNs, the return value should be a qNaN,
> but sometimes it is an sNaN if at least one argument is an sNaN.
> (b) Under TS 18661-1 semantics, if either argument is an sNaN then the
> result should be a qNaN (whereas if one argument is a qNaN and the
> other is not a NaN, the result should be the non-NaN argument).
> Various implementations treat sNaNs like qNaNs here.

Under this logic, the tree_expr_nonnegative_p case for IEEE fmax
should be:

  CASE_CFN_FMAX:
  CASE_CFN_FMAX_FN:
    /* Usually RECURSE (arg0) || RECURSE (arg1) but NaNs complicate
       things.  In the presence of sNaNs, we're only guaranteed to be
       non-negative if both operands are non-negative.  In the presence
       of qNaNs, we're non-negative if either operand is non-negative
       and can't be a qNaN, or if both operands are non-negative.  */
    if (tree_expr_maybe_signaling_nan_p (arg0)
        || tree_expr_maybe_signaling_nan_p (arg1))
      return RECURSE (arg0) && RECURSE (arg1);
    return RECURSE (arg0) ? (!tree_expr_maybe_nan_p (arg0)
                             || RECURSE (arg1))
                          : (RECURSE (arg1)
                             && !tree_expr_maybe_nan_p (arg1));

which indeed resolves the wrong code in the PR.

The infrastructure that makes this possible is the two new functions
tree_expr_maybe_nan_p and tree_expr_maybe_signaling_nan_p, which test
whether a value may potentially be a NaN or a signaling NaN
respectively.  In fact, this patch adds seven new predicates to the
middle-end:

  bool tree_expr_finite_p (const_tree);
  bool tree_expr_infinite_p (const_tree);
  bool tree_expr_maybe_infinite_p (const_tree);
  bool tree_expr_signaling_nan_p (const_tree);
  bool tree_expr_maybe_signaling_nan_p (const_tree);
  bool tree_expr_nan_p (const_tree);
  bool tree_expr_maybe_nan_p (const_tree);

These functions correspond to the "must" and "may" operators in modal
logic, and allow us to triage expressions in the middle-end: definitely
a NaN, definitely not a NaN, unknown at compile-time, etc.  A prime
example of the utility of these functions is that an IEEE floating
point value promoted from an integer type can't be a NaN or infinite.
Hence (double)i+0.0 where i is an integer can be simplified to
(double)i even with -fsignaling-nans.  Currently in GCC optimizations
are enabled/disabled based on whether the expression's type supports
NaNs or sNaNs; with these new predicates they can be controlled by
whether the actual operands may or may not be NaNs.

Having added these extremely useful helper functions to the middle-end,
I couldn't help but use them in a few places in fold-const.c,
builtins.c and match.pd.  In the near term, these can/should be used in
places where the tree optimizers test for HONOR_NANS, HONOR_INFINITIES
or HONOR_SNANS, or explicitly test whether a REAL_CST is a NaN or Inf.
In the longer term (I'm not volunteering) these predicates could
perhaps be hooked into the middle-end's SSA chaining and/or VRP
machinery, allowing finiteness to be propagated around the CFG, much
like we currently propagate value ranges.

This patch has been tested on x86_64-pc-linux-gnu with a
"make bootstrap" and "make -k check".  Ok for mainline?

2020-08-15  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR middle-end/85811
        * fold-const.c (tree_expr_finite_p): New function to test
        whether a tree expression must be finite, i.e. not a FP NaN or
        infinity.
        (tree_expr_infinite_p): New function to test whether a tree
        expression must be infinite, i.e. a FP infinity.
        (tree_expr_maybe_infinite_p): New function to test whether a
        tree expression may be infinite, i.e. a FP infinity.
        (tree_expr_signaling_nan_p): New function to test whether a
        tree expression must evaluate to a signaling NaN (sNaN).
        (tree_expr_maybe_signaling_nan_p): New function to test whether
        a tree expression may be a signaling NaN (sNaN).
        (tree_expr_nan_p): New function to test whether a tree
        expression must evaluate to a (quiet or signaling) NaN.
        (tree_expr_maybe_nan_p): New function to test whether a tree
        expression may be a (quiet or signaling) NaN.
        (tree_binary_nonnegative_warnv_p) [MAX_EXPR]: In the presence
        of NaNs, MAX_EXPR is only guaranteed to be non-negative, if
        both operands are non-negative.
        (tree_call_nonnegative_warnv_p) [CASE_CFN_FMAX,CASE_CFN_FMAX_FN]:
        In the presence of signaling NaNs, fmax is only guaranteed to
        be non-negative if both operands are non-negative.  In the
        presence of quiet NaNs, fmax is non-negative if either operand
        is non-negative and not a qNaN, or both operands are
        non-negative.
        * fold-const.h (tree_expr_finite_p, tree_expr_infinite_p,
        tree_expr_maybe_infinite_p, tree_expr_signaling_nan_p,
        tree_expr_maybe_signaling_nan_p, tree_expr_nan_p,
        tree_expr_maybe_nan_p): Prototype new functions here.
        * builtins.c (fold_builtin_classify) [BUILT_IN_ISINF]: Fold to
        a constant if argument is known to be (or not to be) an
        Infinity.
        [BUILT_IN_ISFINITE]: Fold to a constant if argument is known to
        be (or not to be) finite.
        [BUILT_IN_ISNAN]: Fold to a constant if argument is known to be
        (or not to be) a NaN.
        (fold_builtin_fpclassify): Check tree_expr_maybe_infinite_p and
        tree_expr_maybe_nan_p instead of HONOR_INFINITIES and
        HONOR_NANS respectively.
        (fold_builtin_unordered_cmp): Fold UNORDERED_EXPR to a constant
        when its arguments are known to be (or not be) NaNs.  Check
        tree_expr_maybe_nan_p instead of HONOR_NANS when choosing
        between unordered and regular forms of comparison operators.
        * match.pd (ordered(x,y)->true/false): Constant fold
        ORDERED_EXPR if its operands are known to be (or not to be)
        NaNs.
        (unordered(x,y)->true/false): Constant fold UNORDERED_EXPR if
        its operands are known to be (or not to be) NaNs.
        (sqrt(x)*sqrt(x)->x): Check tree_expr_maybe_signaling_nan_p
        instead of HONOR_SNANS.

gcc/testsuite/ChangeLog
        PR middle-end/85811
        * gcc.dg/pr85811.c: New test.
        * gcc.dg/fold-isfinite-1.c: New test.
        * gcc.dg/fold-isfinite-2.c: New test.
        * gcc.dg/fold-isinf-1.c: New test.
        * gcc.dg/fold-isinf-2.c: New test.
        * gcc.dg/fold-isnan-1.c: New test.
        * gcc.dg/fold-isnan-2.c: New test.

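The hazard the PR describes can be shown with a small sketch (in the
spirit of, but not identical to, the committed gcc.dg/pr85811.c):

  #include <math.h>

  /* If x is a negative signaling NaN, fmax (x, 2.0) may return a
     negative quiet NaN, so folding away the fabs would be wrong.
     With the new predicates the fold survives only when neither
     operand can be an sNaN.  */
  double
  g (double x)
  {
    return fabs (fmax (x, 2.0));
  }
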
2020-11-18  lto: Fix typo in comment of gcc/lto/lto-symtab.c  (Jerry Clcanny, 1 file changed, -1/+1)

        * lto-symtab.c (lto_symtab_merge_symbols): Fix typos in comment.

2020-11-18  vrp: Fix operator_trunc_mod::op1_range [PR97888]  (Jakub Jelinek, 4 files changed, -4/+48)

As mentioned in the PR, in (x % y) >= 0 && y >= 0, we can't deduce x's
range to be x >= 0, as e.g. -7 % 7 is 0.  But we can deduce it from
(x % y) > 0.  The patch also fixes up the comments.

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

        PR tree-optimization/91029
        PR tree-optimization/97888
        * range-op.cc (operator_trunc_mod::op1_range): Only set op1
        range to >= 0 if lhs is > 0, rather than >= 0.  Fix up comments.

        * gcc.dg/pr91029.c: Add comment with PR number.
        (f2): Use > 0 rather than >= 0.
        * gcc.c-torture/execute/pr97888-1.c: New test.
        * gcc.c-torture/execute/pr97888-2.c: New test.

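A small worked example of why the old deduction was wrong and the
corrected one is safe:

  /* -7 % 7 == 0, so observing (x % y) >= 0 does not prove x >= 0.
     Observing (x % y) > 0 does: with C's truncating division the
     remainder takes the sign of the dividend, so a strictly positive
     remainder implies a positive x.  */
  int
  f (int x, int y)
  {
    if ((x % y) > 0)
      return x > 0;   /* always evaluates to 1 here */
    return -1;
  }
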
2020-11-18  plugins: Allow plugins to handle global_options changes  (Jakub Jelinek, 2 files changed, -0/+35)

Any time somebody adds or removes an option in some *.opt file (which
e.g. on the 10 branch after branching off 11 happened 7 times already),
many offsets in the global_options variable change, and so plugins that
ever access GCC options or other global_options values are ABI
dependent on it.  It is true we don't guarantee ABI stability for
plugins, but we change the most often used data structures on the
release branches only very rarely, and so the options changes are the
most problematic for ABI stability of plugins.

Annobin uses a way to remap accesses to some of the global_options.x_*
by looking them up in the cl_options array, where we have
offsetof (struct gcc_options, x_flag_lto) etc. remembered, but sadly it
doesn't do it for all options (e.g. some flag_* etc. option accesses
may be hidden in various macros like POINTER_SIZE), and more
importantly some struct gcc_options offsets are not covered at all.
E.g. there is no offsetof (struct gcc_options, x_optimize),
offsetof (struct gcc_options, x_flag_sanitize) etc.  Those are usually

  Variable
  int optimize

in the *.opt files.

The following patch allows the plugins to deal with reshuffling of even
the global_options fields that aren't tracked in cl_options by adding
another array that describes those, which adds an 816 bytes long array
and 1039 bytes in string literals, so 1855 .rodata bytes in total ATM.
And it adds it only if --enable-plugin (the default); with
--disable-plugin it will not be compiled in.

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

        * opts.h (struct cl_var): New type.
        (cl_vars): Declare.
        * optc-gen.awk: Generate cl_vars array.

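A minimal sketch of what such a descriptor array could look like; the
field names here are assumptions for illustration, not the committed
declaration in opts.h:

  /* Hypothetical shape of the new table (the real struct cl_var may
     differ).  Each entry maps a variable's name to its byte offset
     inside struct gcc_options, so a plugin can re-resolve offsets at
     load time instead of baking them in at build time.  */
  struct cl_var
  {
    const char *var_name;        /* e.g. "optimize" or "flag_sanitize" */
    unsigned short var_offset;   /* offsetof (struct gcc_options, x_...) */
  };

  extern const struct cl_var cl_vars[];
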
2020-11-18  analyzer: only use CWE-690 for unchecked return value [PR97893]  (David Malcolm, 2 files changed, -19/+19)

CWE-690 is only for dereferencing an unchecked return value; for other
kinds of NULL dereference, use the parent classification, CWE-476.

gcc/analyzer/ChangeLog:

        PR analyzer/97893
        * sm-malloc.cc (null_deref::emit): Use CWE-476 rather than
        CWE-690, as this isn't due to an unchecked return value.
        (null_arg::emit): Likewise.

gcc/testsuite/ChangeLog:

        PR analyzer/97893
        * gcc.dg/analyzer/malloc-1.c: Add CWE-690 and CWE-476 codes to
        expected output.

2020-11-18  Objective-C++: Avoid ICE on invalid code with empty attributes  (Iain Sandoe, 1 file changed, -2/+2)

Empty prefix attributes like:

  __attribute__ (())
  @interface MyClass
  @end

cause an ICE at present; check for that case and skip them.

gcc/cp/ChangeLog:

        * parser.c (cp_parser_objc_valid_prefix_attributes): Check for
        empty attributes.

2020-11-18  Optimize two patterns with three xors  (Eugene Rozenfeld, 1 file changed, -0/+10)

gcc/
        PR tree-optimization/96671
        * match.pd (three xor patterns): New patterns.

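The commit message does not spell the patterns out; two three-xor
identities that fit the description (a hedged guess at the exact
match.pd forms, verify against the committed patterns) are:

  /* Both identities hold bitwise; check the four (a,b) bit
     combinations:
       a ^ b ^ (a | b)  ==  a & b
       a ^ b ^ (a & b)  ==  a | b  */
  int f1 (int a, int b) { return a ^ b ^ (a | b); }   /* -> a & b */
  int f2 (int a, int b) { return a ^ b ^ (a & b); }   /* -> a | b */
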
2020-11-18  Update gcc zh_TW.po.  (Joseph Myers, 1 file changed, -4/+4)

        * zh_TW.po: Update.

2020-11-18  options, lto: Optimize streaming of optimization nodes  (Jakub Jelinek, 1 file changed, -8/+28)

Honza mentioned that especially for the new param machinery, most of
streamed values are probably going to be the default values.  Perhaps
somehow we could stream them more effectively.

This patch implements it and brings further savings; the size goes
down from 574 bytes to 273 bytes, i.e. less than half.  Not trying to
handle enums because the code doesn't know if (enum ...) 10 is even
valid, similarly non-parameters because those really generally don't
have large initializers, and params without Init (those are
0-initialized and thus don't need to be handled).

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

        * optc-save-gen.awk: Initialize var_opt_init.  In
        cl_optimization_stream_out for params with default values
        larger than 10, xor the default value with the actual parameter
        value.  In cl_optimization_stream_in repeat the above xor.

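The trick is simple enough to show generically (an illustration of the
encoding idea, not the generated cl_optimization_stream_out code):

  #include <stdint.h>

  /* When the value equals its default, value ^ dflt is 0, which a
     variable-length integer encoder then stores in a single byte.
     Decoding xors with the same default to recover the value.  */
  static inline uint64_t
  encode_param (uint64_t value, uint64_t dflt)
  {
    return value ^ dflt;   /* 0 for the (common) default case */
  }

  static inline uint64_t
  decode_param (uint64_t streamed, uint64_t dflt)
  {
    return streamed ^ dflt;
  }
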
2020-11-18  configury: --enable-link-serialization support  (Jakub Jelinek, 15 files changed, -31/+224)

When performing LTO bootstraps, especially when using tmpfs for /tmp,
one can grind a machine to a halt when using higher levels of
parallelism and a large number of FEs, because there are too many
concurrent LTO link commands running at the same time, and each one of
them puts most of the middle-end/backend objects into /tmp.

We have the --enable-link-mutex configure option, but it has a big
problem: it decreases the number of available jobs by the number of
link commands waiting for the lock, so e.g. when doing a make -j32
build with 11 different big programs linked with $(LLINKER) we end up
with just 22 effective jobs, and with e.g. make -j8 with those 11
different big programs we actually most likely serialize everything
during linking onto a single job.

The following patch implements a new configure option,
--enable-link-serialization, which implements different serialization;
as it doesn't use the mutex, just modifying the old option to be
implemented differently would be strange.  We can deprecate and later
remove the old option.  The new option doesn't use any shell mutexes,
but uses make dependencies.

The option is implemented inside of gcc/ configure and Makefiles, which
means that even inside of gcc/, make all (as well as e.g. make
lto-dump) will serialize and build all previous large binaries when
configured this way.  One can always do

  make -j32 cc1 DO_LINK_SERIALIZATION=

to avoid that.  Furthermore, I've implemented the idea I wrote about,
so that --enable-link-serialization is the same as
--enable-link-serialization=1 and means the large link commands are
serialized; one can use (the default) --disable-link-serialization,
which will cause all links to be parallelizable, but one can also use
--enable-link-serialization=3 etc., which says that at most 3 of the
large link commands can run concurrently.

And finally I've implemented (only if the serialization is enabled)
simple progress bars for the linking.  With --enable-link-serialization
and e.g. the 5 large links I have in my current tree (cc1, cc1plus,
f951, lto1 and lto-dump), before the linking it prints

  Linking |==-- | 20%

and after it

  Linking |==== | 40%

(each == stands for an already finished link, each -- for the link
being started).  With --enable-link-serialization=3 it will change the
way the start is printed; one will get

  Linking |-- | 0%

at the start of the cc1 link,

  Linking |>>-- | 0%

at the start of the second large link and

  Linking |>>>>-- | 0%

at the start of the third large link, where the >> characters stand
for already pending links.  The printing at the end of a link command
is the same as with the full serialization, i.e. for the above three:

  Linking |== | 20%
  Linking |==== | 40%
  Linking |====== | 60%

but one could actually get them in any order depending on which of
those 3 finishes first; to get it 100% accurate I'd need to add some
directory with files representing finished links or similar, which
doesn't seem worth it.

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

gcc/
        * configure.ac: Add $lang.prev rules, INDEX.$lang and
        SERIAL_LIST and SERIAL_COUNT variables to Make-hooks.
        (--enable-link-serialization): New configure option.
        * Makefile.in (DO_LINK_SERIALIZATION, LINK_PROGRESS): New
        variables.
        * doc/install.texi (--enable-link-serialization): Document.
        * configure: Regenerated.
gcc/c/
        * Make-lang.in (c.serial): New goal.
        (.PHONY): Add c.serial c.prev.
        (cc1$(exeext)): Call LINK_PROGRESS.
gcc/cp/
        * Make-lang.in (c++.serial): New goal.
        (.PHONY): Add c++.serial c++.prev.
        (cc1plus$(exeext)): Depend on c++.prev.  Call LINK_PROGRESS.
gcc/fortran/
        * Make-lang.in (fortran.serial): New goal.
        (.PHONY): Add fortran.serial fortran.prev.
        (f951$(exeext)): Depend on fortran.prev.  Call LINK_PROGRESS.
gcc/lto/
        * Make-lang.in (lto, lto1.serial, lto2.serial): New goals.
        (.PHONY): Add lto lto1.serial lto1.prev lto2.serial lto2.prev.
        (lto.all.cross, lto.start.encap): Remove dependencies.
        ($(LTO_EXE)): Depend on lto1.prev.  Call LINK_PROGRESS.
        ($(LTO_DUMP_EXE)): Depend on lto2.prev.  Call LINK_PROGRESS.
gcc/objc/
        * Make-lang.in (objc.serial): New goal.
        (.PHONY): Add objc.serial objc.prev.
        (cc1obj$(exeext)): Depend on objc.prev.  Call LINK_PROGRESS.
gcc/objcp/
        * Make-lang.in (obj-c++.serial): New goal.
        (.PHONY): Add obj-c++.serial obj-c++.prev.
        (cc1objplus$(exeext)): Depend on obj-c++.prev.  Call
        LINK_PROGRESS.
gcc/ada/
        * gcc-interface/Make-lang.in (ada.serial): New goal.
        (.PHONY): Add ada.serial ada.prev.
        (gnat1$(exeext)): Depend on ada.prev.  Call LINK_PROGRESS.
gcc/brig/
        * Make-lang.in (brig.serial): New goal.
        (.PHONY): Add brig.serial brig.prev.
        (brig1$(exeext)): Depend on brig.prev.  Call LINK_PROGRESS.
gcc/go/
        * Make-lang.in (go.serial): New goal.
        (.PHONY): Add go.serial go.prev.
        (go1$(exeext)): Depend on go.prev.  Call LINK_PROGRESS.
gcc/jit/
        * Make-lang.in (jit.serial): New goal.
        (.PHONY): Add jit.serial jit.prev.
        ($(LIBGCCJIT_FILENAME)): Depend on jit.prev.  Call
        LINK_PROGRESS.
gcc/d/
        * Make-lang.in (d.serial): New goal.
        (.PHONY): Add d.serial d.prev.
        (d21$(exeext)): Depend on d.prev.  Call LINK_PROGRESS.

2020-11-18  testsuite: Adjust bb-slp-pr68892.c for AArch64  (Richard Sandiford, 1 file changed, -3/+3)

AArch64 passes the "not profitable" test because it treats
vec_construct as having a high-enough cost.  This means that we can
try other vector modes, which in turn causes "BB vectorization with
gaps at the end of a load is not supported" to be printed more than
once.  The number of times that we print the message doesn't seem
important, so the patch converts it to a plain scan-tree-dump.

gcc/testsuite/
        * gcc.dg/vect/bb-slp-pr68892.c: Don't XFAIL the profitability
        test for aarch64*-*-*.  Allow the "BB vectorization with gaps"
        message to be printed more than once.

2020-11-18  testsuite: Adjust gcc.dg/vect/slp-21.c for Arm targets  (Richard Sandiford, 1 file changed, -1/+11)

On arm* and aarch64* targets, we can vectorise the second of the main
loops using SLP, not just the third.  As the comments say, whether this
is supported depends on a very specific permutation, so it seemed
better to use direct target selectors.

gcc/testsuite/
        * gcc.dg/vect/slp-21.c: Expect 4 SLP instances to be vectorized
        on arm* and aarch64* targets.

2020-11-18  testsuite: Add vect_perm3_int guards  (Richard Sandiford, 2 files changed, -5/+5)

SLP vectorisation of gcc.dg/vect/fast-math-vect-call-1.c involves a
group of 3 floats, which requires the same permutation as
vect_perm3_int.

The load/store_lanes XFAILs in gcc.dg/vect/slp-perm-6.c implicitly
assumed vect_perm3_int, which is true for Advanced SIMD but not for
VLA SVE.  Whether it's true for fixed-length SVE depends on the vector
length.

The xfail selector applies on top of the target selector, so it's not
necessary to make the xfail selector a strict subset of the target
selector.

gcc/testsuite/
        * gcc.dg/vect/fast-math-vect-call-1.c: Only expect SLP to be
        used on vect_perm3_int targets.
        * gcc.dg/vect/slp-perm-6.c: Likewise.  Only XFAIL the
        LOAD/STORE_LANES tests on vect_perm3_int targets.

2020-11-18  testsuite: Add a vect_partial_vectors_usage_2 guard  (Richard Sandiford, 1 file changed, -1/+1)

We don't need an epilogue loop if the main loop can operate on partial
vectors, so this patch disables an associated test.  The alternative
would be to force partial-vectors-usage=1 on the command line.

gcc/testsuite/
        * gcc.dg/vect/vect-epilogues.c: XFAIL test for epilogue loop
        vectorization if vect_partial_vectors_usage_2.

2020-11-18  testsuite: Fix vect/vect-sdiv-pow2-1.c  (Richard Sandiford, 1 file changed, -1/+4)

We're now able to vectorise the set-up loop:

  int p = power2 (fns[i].po2);
  for (int j = 0; j < N; j++)
    a[j] = ((p << 4) * j) / (N - 1) - (p << 5);

This patch adds an asm to stop the loop being vectorised.

gcc/testsuite/
        * gcc.dg/vect/vect-sdiv-pow2-1.c (main): Add an asm to the
        set-up loop.

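The usual idiom for defeating the vectorizer in such set-up loops is an
empty asm with a memory clobber; a sketch of the likely shape (the
exact committed hunk may differ):

  int p = power2 (fns[i].po2);
  for (int j = 0; j < N; j++)
    {
      a[j] = ((p << 4) * j) / (N - 1) - (p << 5);
      asm volatile ("" ::: "memory");   /* keep this loop scalar */
    }
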
2020-11-18  preprocessor: C++ module-directives  (Nathan Sidwell, 1 file changed, -1/+4)

C++20 modules introduces a new kind of preprocessor directive -- a
module directive.  These are directives, but without the leading '#'.
We have to detect them by sniffing the start of a logical line.  When
detected, we replace the initial identifiers with unspellable tokens
and pass them through to the language parser the same way deferred
pragmas are.  There's a PRAGMA_EOL at the logical end of line too.

One additional complication is that we have to do header-name lexing
after the initial tokens, and that requires changes in the macro-aware
piece of the preprocessor.  The above sniffer sets a counter in the
lexer state, and that triggers at the appropriate point.  We then do
the same header-name lexing that occurs on a #include directive or
has_include pseudo-macro, except that the header name ends up in the
token stream.  A couple of token emitters need to deal with the new
token possibility.

gcc/c-family/
        * c-lex.c (c_lex_with_flags): CPP_HEADER_NAMEs can now be seen.
libcpp/
        * include/cpplib.h (struct cpp_options): Add module_directives
        option.
        (NODE_MODULE): New node flag.
        (struct cpp_hashnode): Make rid-code a bitfield, increase bits
        in flags and swap with type field.
        * init.c (post_options): Create module-directive identifier
        nodes.
        * internal.h (struct lexer_state): Add directive_file_token &
        n_modules fields.  Add module node enumerator.
        * lex.c (cpp_maybe_module_directive): New.
        (_cpp_lex_token): Call it.
        (cpp_output_token): Add '"' around CPP_HEADER_NAME token.
        (do_peek_ident, do_peek_module): New.
        (cpp_directives_only): Detect module-directive lines.
        * macro.c (cpp_get_token_1): Deal with directive_file_token
        triggering.

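For reference, these are the kinds of lines the new sniffer must
recognize as directives even though they carry no leading '#'
(standard C++20 syntax, not tied to this commit's tests):

  export module quux;   // module-declaration: a directive without '#'
  import <vector>;      // header-unit import: needs header-name lexing
  import quux.helper;   // named-module import
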
2020-11-18  [PR97870] LRA: don't remove asm goto, just nullify it.  (Vladimir N. Makarov, 1 file changed, -3/+12)

2020-11-18  Vladimir Makarov  <vmakarov@redhat.com>

gcc/
        PR target/97870
        * lra-constraints.c (curr_insn_transform): Do not delete asm
        goto with wrong constraints.  Nullify it, saving the CFG.

2020-11-18  Fix PR ada/97859, building ada cross compiler targeting powerpc64le-linux-gnu  (Matthias Klose, 1 file changed, -1/+1)

2020-11-18  Matthias Klose  <doko@ubuntu.com>

        PR ada/97859
        * Makefile.rtl (powerpc% linux%): Also match powerpc64le cpu.

2020-11-18  MSP430: Add mul{hi,si} and {u,}mulsidi3 expanders  (Jozef Lawrynowicz, 1 file changed, -5/+56)

GCC generates better code when multiplication operations, which
require library functions to perform, are caught early in RTL rather
than leaving the operation to be mapped to a library function later
on.

When there is hardware multiply support, it is more efficient to
perform widening multiplication using the hardware multiplier instead
of letting GCC widen the arguments before calling the multiplication
routine in the wider mode.

gcc/ChangeLog:

        * config/msp430/msp430.md (mulhi3): New.
        (mulsi3): New.
        (mulsidi3): Rename to *mulsidi3_inline.
        (umulsidi3): Rename to *umulsidi3_inline.
        (mulsidi3): New define_expand.
        (umulsidi3): New define_expand.

2020-11-18  tree-optimization/97886 - deal with strange LC PHI nodes  (Richard Biener, 1 file changed, -0/+11)

This makes vectorization properly assign vector types to PHI nodes
that copy from externals on loop exit edges.

2020-11-18  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/97886
        * tree-vect-loop.c (vectorizable_lc_phi): Properly assign
        vector types to invariants for SLP.

2020-11-18  d: Fix LHS of array concatenation evaluated before the RHS.  (Iain Buclaw, 3 files changed, -1/+44)

In an array append expression:

  array ~= fun(array);

the array on the left-hand side of the expression was extended before
evaluating the result of the right-hand side, which resulted in the
newly added, uninitialized array element being used before being set.
This fixes that so that the result of the right-hand side is always
saved in a reusable temporary before assigning to the destination.

gcc/d/ChangeLog:

        PR d/97843
        * d-codegen.cc (build_assign): Evaluate TARGET_EXPR before use
        in the right hand side of an assignment.
        * expr.cc (ExprVisitor::visit (CatAssignExp *)): Force a
        TARGET_EXPR on the element to append if it is a CALL_EXPR.

gcc/testsuite/ChangeLog:

        PR d/97843
        * gdc.dg/torture/pr97843.d: New test.

2020-11-18  d: Fix a couple of ICEs found in the dmd front-end (PR97842)  (Iain Buclaw, 9 files changed, -1/+93)

- Segmentation fault on incomplete static if.
- Segmentation fault resolving typeof() expression when gagging is on.

Reviewed-on: https://github.com/dlang/dmd/pull/11971

gcc/d/ChangeLog:

        PR d/97842
        * dmd/MERGE: Merge upstream dmd b6a779e49

2020-11-18  d: Add dragonflybsd support for D compiler and runtime  (Iain Buclaw, 3 files changed, -0/+61)

gcc/ChangeLog:

        * config.gcc (*-*-dragonfly*): Add dragonfly-d.o and
        t-dragonfly.
        * config/dragonfly-d.c: New file.
        * config/t-dragonfly: New file.

libphobos/ChangeLog:

        * configure.tgt: Add *-*-dragonfly* as a supported target.
        * configure: Regenerate.
        * m4/druntime/os.m4 (DRUNTIME_OS_SOURCES): Add dragonfly* as a
        posix target.

2020-11-18  openmp: Fix ICE on non-rectangular loop with known 0 iterations  (Jakub Jelinek, 2 files changed, -1/+17)

The loops in the testcase are non-rectangular and have 0 iterations
(the outer loop iterates, but the inner one never).  In this case we
just have the overall number of iterations computed (0), and don't
have factor and other values computed.  We never need to map logical
iterations to the individual iterations in that case, and we were
crashing during expansion of that code.

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

        PR middle-end/97862
        * omp-expand.c (expand_omp_for_init_vars): Don't use the sqrt
        path if number of iterations is constant 0.

        * c-c++-common/gomp/pr97862.c: New test.

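An illustrative shape of such a loop nest (hedged; the committed
pr97862.c may differ): the outer loop runs, but the inner bound makes
the collapsed iteration count a compile-time zero.

  /* Non-rectangular collapsed nest whose total iteration count is a
     known 0: j < i - 8 never holds for 0 <= i < 8, so the body is
     unreachable.  */
  void
  f (void)
  {
  #pragma omp parallel for collapse(2)
    for (int i = 0; i < 8; i++)
      for (int j = 0; j < i - 8; j++)
        __builtin_abort ();
  }
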
2020-11-18  RISC-V: Support version controlling for ISA standard extensions  (Kito Cheng, 21 files changed, -88/+393)

- New option -misa-spec support: -misa-spec=[2.2|20190608|20191213]
  and corresponding configuration option --with-isa-spec.
- The current default ISA spec is set to 2.2, but we intend to bump
  this to 20191213 or later in the next release.

gcc/ChangeLog:

        * common/config/riscv/riscv-common.c (riscv_ext_version): New.
        (riscv_ext_version_table): Ditto.
        (get_default_version): Ditto.
        (riscv_subset_t::implied_p): New field.
        (riscv_subset_t::riscv_subset_t): Init implied_p.
        (riscv_subset_list::add): New.
        (riscv_subset_list::handle_implied_ext): Pass riscv_subset_t
        instead of separated argument.
        (riscv_subset_list::to_string): Handle zifencei and zicsr, and
        omit version if version is unknown.
        (riscv_subset_list::parsing_subset_version): New argument `ext`,
        remove default_major_version and default_minor_version, get
        default version info via get_default_version.
        (riscv_subset_list::parse_std_ext): Update argument for
        parsing_subset_version calls.  Handle 2.2 ISA spec, always
        enable zicsr and zifencei, they are included in baseline ISA in
        that time.
        (riscv_subset_list::parse_multiletter_ext): Update argument for
        `parsing_subset_version` and `add` calls.
        (riscv_subset_list::parse): Adjust argument for
        riscv_subset_list::handle_implied_ext call.
        * config.gcc (riscv*-*-*): Handle --with-isa-spec=.
        * config.in (HAVE_AS_MISA_SPEC): New.
        (HAVE_AS_MARCH_ZIFENCEI): Ditto.
        * config/riscv/riscv-opts.h (riscv_isa_spec_class): New.
        (riscv_isa_spec): Ditto.
        * config/riscv/riscv.h (HAVE_AS_MISA_SPEC): New.
        (ASM_SPEC): Pass -misa-spec if gas supported.
        * config/riscv/riscv.opt (riscv_isa_spec_class): New.
        * configure.ac (HAVE_AS_MARCH_ZIFENCEI): New test.
        (HAVE_AS_MISA_SPEC): Ditto.
        * configure: Regen.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-9.c: New.
        * gcc.target/riscv/arch-10.c: Ditto.
        * gcc.target/riscv/arch-11.c: Ditto.
        * gcc.target/riscv/attribute-6.c: Remove, we don't support G
        with version anymore.
        * gcc.target/riscv/attribute-8.c: Reorder arch string to fit
        canonical ordering.
        * gcc.target/riscv/attribute-9.c: We don't emit version for
        unknown extensions now.
        * gcc.target/riscv/attribute-11.c: Add -misa-spec=2.2 flags.
        * gcc.target/riscv/attribute-12.c: Ditto.
        * gcc.target/riscv/attribute-13.c: Ditto.
        * gcc.target/riscv/attribute-14.c: Ditto.
        * gcc.target/riscv/attribute-15.c: New.
        * gcc.target/riscv/attribute-16.c: Ditto.
        * gcc.target/riscv/attribute-17.c: Ditto.

2020-11-18  RISC-V: Support zicsr and zifencei extension for -march.  (Kito Cheng, 6 files changed, -2/+29)

- CSR-related instructions and fence instructions have to be split
  out of the baseline ISA; zicsr and zifencei are the corresponding
  sub-extensions.

gcc/ChangeLog:

        * common/config/riscv/riscv-common.c (riscv_implied_info): d
        and f implied zicsr.
        (riscv_ext_flag_table): Handle zicsr and zifencei.
        * config/riscv/riscv-opts.h (MASK_ZICSR): New.
        (MASK_ZIFENCEI): Ditto.
        (TARGET_ZICSR): Ditto.
        (TARGET_ZIFENCEI): Ditto.
        * config/riscv/riscv.md (clear_cache): Check TARGET_ZIFENCEI.
        (fence_i): Ditto.
        * config/riscv/riscv.opt (riscv_zi_subext): New.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-8.c: New.
        * gcc.target/riscv/attribute-14.c: Ditto.

2020-11-18  RISC-V: Handle implied extension in canonical ordering.  (Kito Cheng, 1 file changed, -5/+172)

- The ISA spec specifies the order between multi-letter extensions;
  implied extensions also need to be stored in canonical order, so the
  easiest way is to keep the list ordered during insertion.

gcc/ChangeLog:

        * common/config/riscv/riscv-common.c
        (single_letter_subset_rank): New.
        (multi_letter_subset_rank): Ditto.
        (subset_cmp): Ditto.
        (riscv_subset_list::add): Insert subext in canonical ordering.
        (riscv_subset_list::parse_std_ext): Move handle_implied_ext
        to ...
        (riscv_subset_list::parse): ... here.

2020-11-18  Clean up loop-closed PHIs after loop finalize  (guojiufu, 6 files changed, -3/+104)

This patch propagates loop-closed PHIs out at loop_optimizer_finalize.
For some cases, cleaning up loop-closed PHIs saves effort in the
optimization passes after loopdone.

gcc/ChangeLog:

2020-10-18  Jiufu Guo  <guojiufu@linux.ibm.com>

        * cfgloop.h (loop_optimizer_finalize): Add flag argument.
        * loop-init.c (loop_optimizer_finalize): Call
        clean_up_loop_closed_phi.
        * tree-cfgcleanup.h (clean_up_loop_closed_phi): New declaration.
        * tree-ssa-loop.c (tree_ssa_loop_done): Call
        loop_optimizer_finalize with flag argument.
        * tree-ssa-propagate.c (clean_up_loop_closed_phi): New function.

gcc/testsuite/ChangeLog:

2020-10-18  Jiufu Guo  <guojiufu@linux.ibm.com>

        * gcc.dg/tree-ssa/loopclosedphi.c: New test.