This patch generalizes the state machine in sm-malloc.cc to support
multiple allocator APIs, and adds just enough support for C++ new and
delete to demonstrate the feature, allowing for detection of code
paths where the result of new in C++ can leak - for some crude examples,
at least (bearing in mind that the analyzer doesn't yet know about
e.g. vfuncs, exceptions, inheritance, RTTI, etc.).
It also implements a new warning: -Wanalyzer-mismatching-deallocation.
For example:
demo.cc: In function 'void test()':
demo.cc:8:8: warning: 'f' should have been deallocated with 'delete'
but was deallocated with 'free' [CWE-762] [-Wanalyzer-mismatching-deallocation]
8 | free (f);
| ~~~~~^~~
'void test()': events 1-2
|
| 7 | foo *f = new foo;
| | ^~~
| | |
| | (1) allocated here (expects deallocation with 'delete')
| 8 | free (f);
| | ~~~~~~~~
| | |
| | (2) deallocated with 'free' here; allocation at (1) expects deallocation with 'delete'
|
The patch also adds just enough knowledge of exception-handling to
suppress a false positive from -Wanalyzer-malloc-leak on
g++.dg/analyzer/pr96723.C on the exception-handling CFG edge after
operator new. It does this by adding a constraint that the result is
NULL if an exception was thrown from operator new, since the result from
operator new is lost when following that exception-handling CFG edge.
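A minimal sketch of the kind of code the new warning fires on, reconstructed
from the diagnostic quoted above (the actual demo.cc/testcase may differ),
compiled with -fanalyzer:
  #include <cstdlib>
  struct foo { int x; };
  void test ()
  {
    foo *f = new foo;  /* (1) allocated here (expects deallocation with 'delete') */
    free (f);          /* (2) deallocated with 'free' here; triggers
                          -Wanalyzer-mismatching-deallocation */
  }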
gcc/analyzer/ChangeLog:
PR analyzer/94355
* analyzer.opt (Wanalyzer-mismatching-deallocation): New warning.
* region-model-impl-calls.cc
(region_model::impl_call_operator_new): New.
(region_model::impl_call_operator_delete): New.
* region-model.cc (region_model::on_call_pre): Detect operator new
and operator delete.
(region_model::on_call_post): Likewise.
(region_model::maybe_update_for_edge): Detect EH edges and call...
(region_model::apply_constraints_for_exception): New function.
* region-model.h (region_model::impl_call_operator_new): New decl.
(region_model::impl_call_operator_delete): New decl.
(region_model::apply_constraints_for_exception): New decl.
* sm-malloc.cc (enum resource_state): New.
(struct allocation_state): New state subclass.
(enum wording): New.
(struct api): New.
(malloc_state_machine::custom_data_t): New typedef.
(malloc_state_machine::add_state): New decl.
(malloc_state_machine::m_unchecked)
(malloc_state_machine::m_nonnull)
(malloc_state_machine::m_freed): Delete these states in favor
of...
(malloc_state_machine::m_malloc)
(malloc_state_machine::m_scalar_new)
(malloc_state_machine::m_vector_new): ...these new api instances,
which own their own versions of these states.
(malloc_state_machine::on_allocator_call): New decl.
(malloc_state_machine::on_deallocator_call): New decl.
(api::api): New ctor.
(dyn_cast_allocation_state): New.
(as_a_allocation_state): New.
(get_rs): New.
(unchecked_p): New.
(nonnull_p): New.
(freed_p): New.
(malloc_diagnostic::describe_state_change): Use unchecked_p and
nonnull_p.
(class mismatching_deallocation): New.
(double_free::double_free): Add funcname param for initializing
m_funcname.
(double_free::emit): Use m_funcname in warning message rather
than hardcoding "free".
(double_free::describe_state_change): Likewise. Use freed_p.
(double_free::describe_call_with_state): Use freed_p.
(double_free::describe_final_event): Use m_funcname in message
rather than hardcoding "free".
(double_free::m_funcname): New field.
(possible_null::describe_state_change): Use unchecked_p.
(possible_null::describe_return_of_state): Likewise.
(use_after_free::use_after_free): Add param for initializing m_api.
(use_after_free::emit): Use m_api->m_dealloc_funcname in message
rather than hardcoding "free".
(use_after_free::describe_state_change): Use freed_p. Change the
wording of the message based on the API.
(use_after_free::describe_final_event): Use
m_api->m_dealloc_funcname in message rather than hardcoding
"free". Change the wording of the message based on the API.
(use_after_free::m_api): New field.
(malloc_leak::describe_state_change): Use unchecked_p. Update
for renaming of m_malloc_event to m_alloc_event.
(malloc_leak::describe_final_event): Update for renaming of
m_malloc_event to m_alloc_event.
(malloc_leak::m_malloc_event): Rename...
(malloc_leak::m_alloc_event): ...to this.
(free_of_non_heap::free_of_non_heap): Add param for initializing
m_funcname.
(free_of_non_heap::emit): Use m_funcname in message rather than
hardcoding "free".
(free_of_non_heap::describe_final_event): Likewise.
(free_of_non_heap::m_funcname): New field.
(allocation_state::dump_to_pp): New.
(allocation_state::get_nonnull): New.
(malloc_state_machine::malloc_state_machine): Update for changes
to state fields and new api fields.
(malloc_state_machine::add_state): New.
(malloc_state_machine::on_stmt): Move malloc/calloc handling to
on_allocator_call and call it, passing in the API pointer.
Likewise for free, moving it to on_deallocator_call. Handle calls
to operator new and delete in an analogous way. Use unchecked_p
when testing for possibly-null-arg and possibly-null-deref, and
transition to the non-null state for the correct API. Remove redundant
node param from call to on_zero_assignment. Use freed_p for
use-after-free check, and pass in API.
(malloc_state_machine::on_allocator_call): New, based on code in
on_stmt.
(malloc_state_machine::on_deallocator_call): Likewise.
(malloc_state_machine::on_phi): Mark node param with
ATTRIBUTE_UNUSED; don't pass it to on_zero_assignment.
(malloc_state_machine::on_condition): Mark node param with
ATTRIBUTE_UNUSED. Replace on_transition calls with get_state and
set_next_state pairs, transitioning to the non-null state for the
appropriate API.
(malloc_state_machine::can_purge_p): Port to new state approach.
(malloc_state_machine::on_zero_assignment): Replace on_transition
calls with get_state and set_next_state pairs. Drop redundant
node param.
* sm.h (state_machine::add_custom_state): New.
gcc/ChangeLog:
PR analyzer/94355
* doc/invoke.texi: Document -Wanalyzer-mismatching-deallocation.
gcc/testsuite/ChangeLog:
PR analyzer/94355
* g++.dg/analyzer/new-1.C: New test.
* g++.dg/analyzer/new-vs-malloc.C: New test.
|
|
This patch is yet more preliminary work towards generalizing sm-malloc.cc
beyond just malloc/free.
It eliminates sm_context::warn_for_state in favor of a new sm_context::warn
vfunc, with call sites guarded by sm_context::get_state calls.
gcc/analyzer/ChangeLog:
* diagnostic-manager.cc
(null_assignment_sm_context::warn_for_state): Replace with...
(null_assignment_sm_context::warn): ...this.
* engine.cc (impl_sm_context::warn_for_state): Replace with...
(impl_sm_context::warn): ...this.
* sm-file.cc (fileptr_state_machine::on_stmt): Replace
warn_for_state and on_transition calls with a get_state
test guarding warn and set_next_state calls.
* sm-malloc.cc (malloc_state_machine::on_stmt): Likewise.
* sm-pattern-test.cc (pattern_test_state_machine::on_condition):
Replace warn_for_state call with warn call.
* sm-sensitive.cc
(sensitive_state_machine::warn_for_any_exposure): Replace
warn_for_state call with a get_state test guarding a warn call.
* sm-signal.cc (signal_state_machine::on_stmt): Likewise.
* sm-taint.cc (taint_state_machine::on_stmt): Replace
warn_for_state and on_transition calls with a get_state
test guarding warn and set_next_state calls.
* sm.h (sm_context::warn_for_state): Replace with...
(sm_context::warn): ...this.
|
|
This patch is further preliminary work towards generalizing sm-malloc.cc
beyond just malloc/free.
Reimplement sm_context's on_transition vfunc in terms of new get_state
and set_next_state vfuncs, so that in followup patches we can implement
richer transitions (e.g. where the states are parametrized by
allocator).
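A hedged sketch of the resulting shape, with stand-in types and trimmed
parameter lists (see gcc/analyzer/sm.h for the real interface):
  class state;
  typedef const state *state_t;
  class sm_context_sketch  /* illustrative stand-in for sm_context */
  {
  public:
    virtual state_t get_state (const char *var) = 0;
    virtual void set_next_state (const char *var, state_t to) = 0;
    /* No longer a pure virtual function: a regular member implemented
       in terms of the two new vfuncs.  */
    void on_transition (const char *var, state_t from, state_t to)
    {
      if (get_state (var) == from)
        set_next_state (var, to);
    }
  };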
gcc/analyzer/ChangeLog:
* diagnostic-manager.cc
(null_assignment_sm_context::null_assignment_sm_context): Add old_state
and ext_state params, initializing m_old_state and m_ext_state.
(null_assignment_sm_context::on_transition): Split into...
(null_assignment_sm_context::get_state): ...this new vfunc
implementation and...
(null_assignment_sm_context::set_next_state): ...this new vfunc
implementation.
(null_assignment_sm_context::m_old_state): New field.
(null_assignment_sm_context::m_ext_state): New field.
(diagnostic_manager::add_events_for_eedge): Pass in old state and
ext_state when creating sm_ctxt.
* engine.cc (impl_sm_context::on_transition): Split into...
(impl_sm_context::get_state): ...this new vfunc
implementation and...
(impl_sm_context::set_next_state): ...this new vfunc
implementation.
* sm.h (sm_context::get_state): New pure virtual function.
(sm_context::set_next_state): Likewise.
(sm_context::on_transition): Convert from a pure virtual function
to a regular function implemented in terms of get_state and
set_next_state.
|
|
This patch is preliminary work towards generalizing sm-malloc.cc so that
it can check APIs other than just malloc/free (and e.g. detect
mismatching alloc/dealloc pairs).
Generalize states in state machines so that, rather than state_t being
just an "unsigned", it becomes a "const state *", where the underlying
state objects are immutable objects managed by the state machine in
question, and can e.g. have vfuncs and extra fields. The start state
m_start becomes a member of the state_machine base_class.
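A hedged, heavily simplified sketch of the direction (not the actual
gcc/analyzer/sm.h declarations):
  class pretty_printer;
  class state
  {
  public:
    state (const char *name, unsigned id) : m_name (name), m_id (id) {}
    virtual ~state () {}
    const char *get_name () const { return m_name; }
    unsigned get_id () const { return m_id; }
    /* Subclasses can override this and carry extra fields
       (e.g. sm-malloc.cc's allocation_state in a later patch).  */
    virtual void dump_to_pp (pretty_printer *pp) const;
  private:
    const char *m_name;
    unsigned m_id;
  };
  /* state_t becomes a pointer to an immutable state object rather than
     the old "typedef unsigned state_t;".  */
  typedef const state *state_t;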
gcc/analyzer/ChangeLog:
* checker-path.cc (state_change_event::get_desc): Update
state_machine::get_state_name calls to state::get_name.
(warning_event::get_desc): Likewise.
* diagnostic-manager.cc
(null_assignment_sm_context::on_transition): Replace comparison
against 0 with a comparison against m_sm.get_start_state.
(diagnostic_manager::prune_for_sm_diagnostic): Update
state_machine::get_state_name calls to state::get_name.
* engine.cc (impl_sm_context::on_transition): Likewise.
(exploded_node::get_dot_fillcolor): Use get_id when summing
the sm states.
* program-state.cc (sm_state_map::sm_state_map): Don't hardcode
0 as the start state when initializing m_global_state.
(sm_state_map::print): Use dump_to_pp rather than get_state_name
when dumping states.
(sm_state_map::is_empty_p): Don't hardcode 0 as the start state
when examining m_global_state.
(sm_state_map::hash): Use get_id when hashing states.
(selftest::test_sm_state_map): Use state objects rather than
arbitrary hardcoded integers.
(selftest::test_program_state_merging): Likewise.
(selftest::test_program_state_merging_2): Likewise.
* sm-file.cc (fileptr_state_machine::m_start): Move to base class.
(file_diagnostic::describe_state_change): Use get_start_state.
(fileptr_state_machine::fileptr_state_machine): Drop m_start
initialization.
* sm-malloc.cc (malloc_state_machine::m_start): Move to base
class.
(malloc_diagnostic::describe_state_change): Use get_start_state.
(possible_null::describe_state_change): Likewise.
(malloc_state_machine::malloc_state_machine): Drop m_start
initialization.
* sm-pattern-test.cc (pattern_test_state_machine::m_start): Move
to base class.
(pattern_test_state_machine::pattern_test_state_machine): Drop
m_start initialization.
* sm-sensitive.cc (sensitive_state_machine::m_start): Move to base
class.
(sensitive_state_machine::sensitive_state_machine): Drop m_start
initialization.
* sm-signal.cc (signal_state_machine::m_start): Move to base
class.
(signal_state_machine::signal_state_machine): Drop m_start
initialization.
* sm-taint.cc (taint_state_machine::m_start): Move to base class.
(taint_state_machine::taint_state_machine): Drop m_start
initialization.
* sm.cc (state_machine::state::dump_to_pp): New.
(state_machine::state_machine): Move here from sm.h. Initialize
m_next_state_id and m_start.
(state_machine::add_state): Reimplement in terms of state objects.
(state_machine::get_state_name): Delete.
(state_machine::get_state_by_name): Reimplement in terms of state
objects. Make const.
(state_machine::validate): Delete.
(state_machine::dump_to_pp): Reimplement in terms of state
objects.
* sm.h (state_machine::state): New class.
(state_machine::state_t): Convert typedef from "unsigned" to
"const state_machine::state *".
(state_machine::state_machine): Move to sm.cc.
(state_machine::get_default_state): Use m_start rather than
hardcoding 0.
(state_machine::get_state_name): Delete.
(state_machine::get_state_by_name): Make const.
(state_machine::get_start_state): New accessor.
(state_machine::alloc_state_id): New.
(state_machine::m_state_names): Drop in favor of...
(state_machine::m_states): New field.
(state_machine::m_start): New field.
(start_start_p): Delete.
|
|
omp reductions are modeled as nested functions, which is a thing C++
doesn't have; this led to much confusion until I figured out what was
happening, and it wasn't helped by some duplicate code and
inconsistencies between the dependent and non-dependent paths. This
patch removes the parser duplication, fixes up some bookkeeping, and
adds some asserts and comments too.
gcc/cp/
* parser.c (cp_parser_omp_declare_reduction): Refactor to avoid
code duplication. Update DECL_TI_TEMPLATE's context.
* pt.c (tsubst_expr): For OMP reduction function, set context to
global_namespace before pushing.
(tsubst_omp_udr): Assert current_function_decl, add comment about
decl context.
|
|
This test uses C++14 features so is failing with -std=c++11.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wnonnull6.C: Use target c++14.
|
|
This test uses a C++14 feature so fails with -std=c++11. Therefore
I've moved it to cpp1y/ and used target c++14.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/auto-96647.C: Moved to...
* g++.dg/cpp1y/auto-96647.C: ...here. Use target c++14.
|
|
Update gcc.target/i386/builtin_thread_pointer.c for x32. For
int
foo3 (int i)
{
int* p = (int*) __builtin_thread_pointer ();
return p[i];
}
we can't generate:
movl %fs:0(,%edi,4), %eax
ret
for x32, since the address of %fs:0(,%edi,4) is %fs plus 0(,%edi,4)
zero-extended to 64 bits. Instead, we generate:
movl %fs:0, %eax
movl (%eax,%edi,4), %eax
PR target/96955
* gcc.target/i386/builtin_thread_pointer.c: Update scan-assembler
for x32.
|
|
When the compgotos pass copies the tail of blocks ending in an indirect
jump, there is a micro-optimization to not copy the last one, since the
original block will then just be deleted. This does not work properly
if cleanup_cfg does not merge all pairs of blocks we expect it to. It
also does not work if that last block can be merged into multiple
predecessors.
2020-09-09 Segher Boessenkool <segher@kernel.crashing.org>
PR rtl-optimization/96475
* bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p
micro-optimization.
|
|
I'm running into this warning:
...
src/gcc/config/nvptx/nvptx.c: In function \
‘void nvptx_assemble_decl_begin(FILE*, const char*, const char*, \
const_tree, long int, unsigned int, bool)’:
src/gcc/config/nvptx/nvptx.c:2229:29: warning: format ‘%d’ expects argument \
of type ‘int’, but argument 5 has type ‘long unsigned int’ [-Wformat=]
elt_size * BITS_PER_UNIT);
^
...
which I seem to have introduced in commit b9c7fe59f9f "[nvptx] Fix array
dimension in nvptx_assemble_decl_begin", but not noticed due to configuring
with --disable-build-format-warnings.
Fix this by using the appropriate format.
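One typical way to resolve such a mismatch, shown purely as an illustration
(the actual nvptx.c change may use a different format or type):
  fprintf (file, "%lu", (unsigned long) (elt_size * BITS_PER_UNIT));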
Rebuild cc1 on nvptx.
gcc/ChangeLog:
* config/nvptx/nvptx.c (nvptx_assemble_decl_begin): Fix Wformat
warning.
|
|
In resolve_address_of_overloaded_function, currently only the second
pass over the overload set (which considers just the function templates
in the overload set) checks constraints and performs return type
deduction when necessary. But as the testcases below show, we need to
do the same when considering non-template functions during the first
pass.
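For illustration, the kind of code affected looks roughly like this (a
hedged sketch loosely modeled on the PR, not one of the actual new tests;
it only shows the return-type-deduction half of the problem):
  struct A
  {
    static auto f (int)  { return 0;  }   // non-template, deduced return type
    static auto f (char) { return 0L; }
  };
  // Resolving &A::f against this pointer type has to consider the
  // non-template candidates, deducing their return types in the process.
  int (*p)(int) = &A::f;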
gcc/cp/ChangeLog:
PR c++/96647
* class.c (resolve_address_of_overloaded_function): Check
constraints_satisfied_p and perform return-type deduction via
maybe_instantiate_decl when considering non-template functions
in the overload set.
* cp-tree.h (maybe_instantiate_decl): Declare.
* decl2.c (maybe_instantiate_decl): Remove static.
gcc/testsuite/ChangeLog:
PR c++/96647
* g++.dg/cpp0x/auto-96647.C: New test.
* g++.dg/cpp0x/error9.C: New test.
* g++.dg/cpp2a/concepts-fn6.C: New test.
|
|
This avoids unsharing the SLP tree when optimizing load permutations
for reductions but there is no actual permute taking place.
2020-09-09 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do
nothing when the permutation doesn't permute.
|
|
When running this libgomp testcase for nvptx accelerator:
...
/* { dg-do run } */
__uint128_t v;
int main () {
#pragma omp target
{
__uint128_t exp = 2;
__atomic_compare_exchange_n (&v, &exp, 7, false, __ATOMIC_RELEASE,
__ATOMIC_ACQUIRE);
}
}
...
we run into this assert in write_fn_proto:
...
913 gcc_assert (type == boolean_type_node);
...
This happens when doing some special-handling code for
__atomic_compare_exchange_1/2/4/8/16. The function decls have a parameter
called weak of type bool, which is skipped when writing the decl because
the corresponding libatomic functions do not have that parameter. The assert
is there to verify that we skip the correct parameter.
However, we assert because we have different type of bools:
...
(gdb) call debug_generic_expr (type)
_Bool
(gdb) call debug_generic_expr (global_trees[TI_BOOLEAN_TYPE])
bool
...
Fix this by checking for TREE_CODE (type) == BOOLEAN_TYPE instead.
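Concretely, the check changes along these lines (a sketch based on the
description above, not the verbatim diff):
  -  gcc_assert (type == boolean_type_node);
  +  gcc_assert (TREE_CODE (type) == BOOLEAN_TYPE);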
Tested libgomp on x86_64-linux with nvptx accelerator.
Likewise, tested that the test-case above does not ICE anymore.
gcc/ChangeLog:
PR target/96991
* config/nvptx/nvptx.c (write_fn_proto): Fix boolean type check.
|
|
This removes a check preventing vectorization of live results of
vectorized comparisons. I tested it with AVX512 mask registers
(inspecting assembly) and traditional vector masks.
2020-09-09 Richard Biener <rguenther@suse.de>
* tree-vect-stmts.c (vectorizable_comparison): Allow
STMT_VINFO_LIVE_P stmts.
* gcc.dg/vect/vect-live-6.c: New testcase.
|
|
nvptx has additional 'omp simd' lines with '_simt_' at -O1 and higher.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/combined-if.f90: Update scan-tree-dump-times for
'omp simd.*if' for nvptx even more.
|
|
This removes a check preventing vectorization of live results of
vectorized conditions.
2020-09-09 Richard Biener <rguenther@suse.de>
* tree-vect-stmts.c (vectorizable_condition): Allow
STMT_VINFO_LIVE_P stmts.
* gcc.dg/vect/vect-cond-13.c: New testcase.
* gcc.target/i386/pr87007-4.c: Adjust.
* gcc.target/i386/pr87007-5.c: Likewise.
|
|
This avoids looking at STMT_VINFO_LIVE_P when vectorizing BBs.
2020-09-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/96978
* tree-vect-stmts.c (vectorizable_condition): Do not
look at STMT_VINFO_LIVE_P for BB vectorization.
(vectorizable_comparison): Likewise.
|
|
gcc/ChangeLog:
PR target/96955
* config/i386/i386.md (get_thread_pointer<mode>): New
expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/builtin_thread_pointer.c: New test.
|
|
This commit also fixes a gfortran.dg/gomp/target1.f90 regression;
target1.f90 tests the resolve.c and openmp.c changes.
gcc/fortran/ChangeLog:
PR fortran/95109
PR fortran/94690
* resolve.c (gfc_resolve_code): Also call
gfc_resolve_omp_parallel_blocks for 'distribute parallel do (simd)'.
* openmp.c (gfc_resolve_omp_parallel_blocks): Handle it.
(gfc_resolve_do_iterator): Remove special code for SIMD, which is
not needed.
* trans-openmp.c (gfc_trans_omp_target): For TARGET_PARALLEL_DO_SIMD,
call the simd processing function, not the do one.
gcc/testsuite/ChangeLog:
PR fortran/95109
PR fortran/94690
* gfortran.dg/gomp/combined-if.f90: Update scan-tree-dump-times for
'omp simd.*if'.
* gfortran.dg/gomp/openmp-simd-5.f90: New test.
|
|
|
|
Data-share write (ds_write) instructions do not necessarily complete
the write to LDS immediately. When a write completes, LGKM_CNT is
decremented. For now, we wait until LGKM_CNT reaches zero after each
ds_write instruction.
This fixes a race condition in the case where LDS is read immediately
after being written. This can happen with broadcast operations.
2020-09-08 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn-valu.md (scatter<mode>_insn_1offset_ds<exec_scatter>):
Add waitcnt.
* config/gcn/gcn.md (*mov<mode>_insn, *movti_insn): Add waitcnt to
ds_write alternatives.
|
|
If an offload kernel uses a large number of VGPRs, AMD GCN hardware may
need to limit the number of threads/workers launched for that kernel.
The number of SGPRs/VGPRs in use is detected by mkoffload and recorded in
the processed output. The patterns emitted detailing SGPR/VGPR occupancy
changed between HSACO v2 and v3 though, so this patch updates parsing
to account for that.
2020-09-08 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/mkoffload.c (process_asm): Initialise regcount. Update
scanning for SGPR/VGPR usage for HSACO v3.
|
|
PR analyzer/96949 reports an ICE with
--param analyzer-max-svalue-depth=0, where the param value leads
to INTEGER_CST values in a RANGE_EXPR being treated as unknown
symbolic values.
This patch replaces implicit assumptions that these values are
concrete (and thus have concrete bit offsets), adding
error-handling for symbolic cases instead of assertions.
gcc/analyzer/ChangeLog:
PR analyzer/96949
* store.cc (binding_map::apply_ctor_val_to_range): Add
error-handling for the cases where we have symbolic offsets.
gcc/testsuite/ChangeLog:
PR analyzer/96949
* gfortran.dg/analyzer/pr96949.f90: New test.
|
|
gcc/analyzer/ChangeLog:
PR analyzer/96950
* store.cc (binding_map::apply_ctor_to_region): Handle RANGE_EXPR
where min_index == max_index.
(binding_map::apply_ctor_val_to_range): Replace assertion that we
don't have a CONSTRUCTOR value with error-handling.
|
|
In g:ee7bfbe5eb70a23bbf3a2cedfdcbd2ea1a20c3f2 I added a
switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
to region_model::on_call_pre guarded by
fndecl_built_in_p (callee_fndecl).
I meant to handle only normal built-ins, whereas this
single-argument overload of fndecl_built_in_p returns true for any
kind of built-in.
PR analyzer/96962 reports a case where this matches for a
machine-specific builtin, leading to an ICE. Fixed thusly.
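The shape of the fix is roughly the following (a sketch, not the verbatim
diff):
  -  if (fndecl_built_in_p (callee_fndecl))
  +  if (fndecl_built_in_p (callee_fndecl, BUILT_IN_NORMAL))
       switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
         ...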
gcc/analyzer/ChangeLog:
PR analyzer/96962
* region-model.cc (region_model::on_call_pre): Fix guard on switch
on built-ins to only consider BUILT_IN_NORMAL, rather than other
kinds of built-ins.
|
|
PR tree-optimization/96967
* tree-vrp.c (find_case_label_range): Cast label range to
type of switch operand.
|
|
The assembly code ".mspabi_attribute 4,1" uses the object attribute
mechanism to indicate that the 430 ISA is in use. However, the default
ISA is 430X, so GAS fails to assemble this since the ISA wasn't also set
to 430 on the command line.
gcc/ChangeLog:
* config/msp430/msp430.c (msp430_file_end): Fix jumbled
HAVE_AS_MSPABI_ATTRIBUTE and HAVE_AS_GNU_ATTRIBUTE checks.
* configure: Regenerate.
* configure.ac: Use ".mspabi_attribute 4,2" to check for assembler
support for this object attribute directive.
|
|
The -mcpu= option accepts only a handful of string values.
Using enums instead of strings to handle the accepted values removes the
need to have specific processing of the strings in the backend, and
simplifies any comparisons which need to be performed on the value.
It also allows the default value to have semantic equivalence to a user-set
value, whilst retaining the ability to differentiate between them.
Practically, this allows a user-set -mcpu= value to override the ISA set by
-mmcu=, whilst the default -mcpu= value can still have an explicit meaning.
gcc/ChangeLog:
* common/config/msp430/msp430-common.c (msp430_handle_option): Remove
OPT_mcpu_ handling.
Set target_cpu value to new enum values when parsing certain -mmcu=
values.
* config/msp430/msp430-opts.h (enum msp430_cpu_types): New.
* config/msp430/msp430.c (msp430_option_override): Handle new
target_cpu enum values.
Set target_cpu using extracted value for given MCU when -mcpu=
option is not passed by the user.
* config/msp430/msp430.opt: Handle -mcpu= values using enums.
gcc/testsuite/ChangeLog:
* gcc.target/msp430/mcpu-is-430.c: New test.
* gcc.target/msp430/mcpu-is-430x.c: New test.
* gcc.target/msp430/mcpu-is-430xv2.c: New test.
|
|
gcc/fortran/ChangeLog:
* intrinsic.texi: Fix description of FINDLOC result.
|
|
|
|
When rounding a real to the nearest integer, temporarily convert the real
argument to a longer real kind when the result is of type/kind integer(16).
gcc/fortran/ChangeLog:
* trans-intrinsic.c (build_round_expr): Use temporary with
appropriate kind for conversion before rounding to nearest
integer when the result precision is 128 bits.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr96711.f90: New test.
|
|
This PR is about LRA cycling for a reload of the form:
----------------------------------------------------------------------------
Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287
Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288
103: r203:SI=r288:SI<<0x1+r196:DI#0
REG_DEAD r196:DI
Inserting slow/invalid mem reload before:
316: r287:DI=[r105:DI*0x8+r140:DI]
317: r288:SI=r287:DI#0
----------------------------------------------------------------------------
The problem is with r287. We rightly give it a broad starting class of
POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
However, we never make forward progress towards narrowing it down to
a specific choice of class (POINTER_REGS or FP_REGS).
I think in practice we rely on two things to narrow a reload pseudo's
class down to a specific choice:
(1) a restricted class is specified when the pseudo is created
This happens for input address reloads, where the class is taken
from the target's chosen base register class. It also happens
for simple REG reloads, where the class is taken from the chosen
alternative's constraints.
(2) uses of the reload pseudo as a direct input operand
In this case get_reload_reg tries to reuse the existing register
and narrow its class, instead of creating a new reload pseudo.
However, neither occurs here. As described above, r287 rightly
starts out with a wide choice of class, ultimately derived from
ALL_REGS, so we don't get (1). And as the comments in the PR
explain, r287 is never used as an input reload, only the subreg is,
so we don't get (2):
----------------------------------------------------------------------------
Choosing alt 13 in insn 317: (0) r (1) w {*movsi_aarch64}
Creating newreg=291, assigning class FP_REGS to r291
317: r288:SI=r291:SI
Inserting insn reload before:
320: r291:SI=r287:DI#0
----------------------------------------------------------------------------
IMO, in this case we should rely on the reload in insn 316 to narrow
down the class of r287. Currently we do:
----------------------------------------------------------------------------
Choosing alt 7 in insn 316: (0) r (1) m {*movdi_aarch64}
Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289
316: r289:DI=[r105:DI*0x8+r140:DI]
Inserting insn reload after:
318: r287:DI=r289:DI
---------------------------------------------------
i.e. we create a new pseudo register r289 and give *that* pseudo
GENERAL_REGS instead. This is because get_reload_reg only narrows
down the existing class for OP_IN and OP_INOUT, not OP_OUT.
But if we have a reload pseudo in a reload instruction and have chosen
a specific class for the reload pseudo, I think we should simply install
it for OP_OUT reloads too, if the class is a subset of the existing class.
We will need to pick such a register whatever happens (for r289 in the
example above). And as explained in the PR, doing this actually avoids
an unnecessary move via the FP registers too.
The patch is quite aggressive in that it does this for all reload
pseudos in all reload instructions. I wondered about reusing the
condition for a reload move in in_class_p:
    INSN_UID (curr_insn) >= new_insn_uid_start
    && curr_insn_set != NULL
    && ((OBJECT_P (SET_SRC (curr_insn_set))
         && ! CONSTANT_P (SET_SRC (curr_insn_set)))
        || (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG
            && OBJECT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
            && ! CONSTANT_P (SUBREG_REG (SET_SRC (curr_insn_set)))))))
but I can't really justify that on first principles. I think we
should apply the rule consistently until we have a specific reason
for doing otherwise.
gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter. Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.
gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.
|
|
gcc/ChangeLog
2020-09-07 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Revert
dead-code removal introduced by 09fa6acd8d9 + add a comment to
clarify.
|
|
In d8487c949ad5, MODE_PARTIAL_INT modes were changed from having an
unknown number of undefined bits to having a known number of undefined
bits; however, the documentation on using SUBREG expressions with
MODE_PARTIAL_INT modes was not updated to reflect this.
gcc/ChangeLog:
* doc/rtl.texi (subreg): Fix documentation to state there is a known
number of undefined bits in regs and subregs of MODE_PARTIAL_INT modes.
|
|
430X is the default ISA under normal operation, so even when the MCU name
passed to -mmcu= is unrecognized, the ISA should not be overridden.
gcc/ChangeLog:
* config/msp430/msp430.c (msp430_option_override): Don't set the
ISA to 430 when the MCU is unrecognized.
gcc/testsuite/ChangeLog:
* gcc.target/msp430/430x-default-isa.c: New test.
|
|
Recent changes in debug output have resulted in a change
in the length of the pub types info. This updates the tests to
reflect the new length.
gcc/testsuite/ChangeLog:
* gcc.dg/pubtypes-2.c: Amend Pub Info Length.
* gcc.dg/pubtypes-3.c: Likewise.
* gcc.dg/pubtypes-4.c: Likewise.
|
|
Darwin libc has sincos from 10.9 (darwin13) onwards.
gcc/ChangeLog:
* config/darwin.c (darwin_libc_has_function): Report sincos
available from 10.9.
|
|
Following on from the previous commit to fix up the syntax for
add/sub/adds/subs and friends with a sign/zero-extended operand, this
patch removes the "mult" variants of these patterns which are all
redundant.
This patch removes the following patterns from the AArch64 backend:
*adds_mul_imm_<mode>
*subs_mul_imm_<mode>
*adds_<optab><mode>_multp2
*subs_<optab><mode>_multp2
*add_mul_imm_<mode>
*add_<optab><ALLX:mode>_mult_<GPI:mode>
*add_<optab><SHORT:mode>_mult_si_uxtw
*add_<optab><mode>_multp2
*add_<optab>si_multp2_uxtw
*add_uxt<mode>_multp2
*add_uxtsi_multp2_uxtw
*sub_mul_imm_<mode>
*sub_mul_imm_si_uxtw
*sub_<optab><mode>_multp2
*sub_<optab>si_multp2_uxtw
*sub_uxt<mode>_multp2
*sub_uxtsi_multp2_uxtw
*neg_mul_imm_<mode>2
*neg_mul_imm_si2_uxtw
Together with the following predicates which were used only by these
patterns:
aarch64_pwr_imm3
aarch64_pwr_2_si
aarch64_pwr_2_di
These patterns are all redundant since multiplications by powers of two
should be represented as shifts outside a (mem).
---
gcc/ChangeLog:
* config/aarch64/aarch64.md (*adds_mul_imm_<mode>): Delete.
(*subs_mul_imm_<mode>): Delete.
(*adds_<optab><mode>_multp2): Delete.
(*subs_<optab><mode>_multp2): Delete.
(*add_mul_imm_<mode>): Delete.
(*add_<optab><ALLX:mode>_mult_<GPI:mode>): Delete.
(*add_<optab><SHORT:mode>_mult_si_uxtw): Delete.
(*add_<optab><mode>_multp2): Delete.
(*add_<optab>si_multp2_uxtw): Delete.
(*add_uxt<mode>_multp2): Delete.
(*add_uxtsi_multp2_uxtw): Delete.
(*sub_mul_imm_<mode>): Delete.
(*sub_mul_imm_si_uxtw): Delete.
(*sub_<optab><mode>_multp2): Delete.
(*sub_<optab>si_multp2_uxtw): Delete.
(*sub_uxt<mode>_multp2): Delete.
(*sub_uxtsi_multp2_uxtw): Delete.
(*neg_mul_imm_<mode>2): Delete.
(*neg_mul_imm_si2_uxtw): Delete.
* config/aarch64/predicates.md (aarch64_pwr_imm3): Delete.
(aarch64_pwr_2_si): Delete.
(aarch64_pwr_2_di): Delete.
|
|
Given the following C function:
double *f(double *p, unsigned x)
{
return p + x;
}
prior to this patch, GCC at -O2 would generate:
f:
add x0, x0, x1, uxtw 3
ret
but this add instruction uses architecturally-invalid syntax: the width
of the third operand conflicts with the width of the extension
specifier. The third operand is only permitted to be an x register when
the extension specifier is (u|s)xtx.
This instruction, and analogous insns for adds, sub, subs, and cmp, are
rejected by clang, but accepted by binutils. Assembling and
disassembling such an insn with binutils gives the architecturally-valid
version in the disassembly:
0: 8b214c00 add x0, x0, w1, uxtw #3
This patch fixes several patterns in the AArch64 backend to use the
standard syntax as specified in the Arm ARM such that GCC's output can
be assembled by assemblers other than GAS.
---
gcc/ChangeLog:
* config/aarch64/aarch64.md
(*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
agrees with width of extension specifier.
(*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
(*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
(*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
(*add_uxt<mode>_shift2): Likewise.
(*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
(*sub_uxt<mode>_shift2): Likewise.
(*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
(*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
* gcc.target/aarch64/cmp.c: Likewise.
* gcc.target/aarch64/subs3.c: Likewise.
* gcc.target/aarch64/subsp.c: Likewise.
* gcc.target/aarch64/extend-syntax.c: New test.
|
|
This adds additional dumping that helps in particular with reading
basic-block vectorization SLP dumps, plus shows what we actually
generate code from.
2020-09-07 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): Dump
stmts we start SLP analysis from, failure and splitting.
(vect_schedule_slp): Dump SLP graph entry and root stmt
we are about to emit code for.
|
|
This fixes compilation of codepaths for dos-like filesystems
with Clang. When GCC is built with clang, clang treats the C input files
as C++ if the compiler driver is invoked in C++ mode, triggering errors
when the return value of strchr() on a pointer to const is assigned
to a pointer to non-const.
This matches similar variables outside of the ifdefs for dos-like
path handling.
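The pattern being fixed looks roughly like this (a sketch, not the exact
dwarf2out.c/files.c code):
  #include <cstring>
  const char *
  find_sep (const char *name)
  {
    /* In C++, strchr on a 'const char *' returns 'const char *', so the
       result must be stored in a pointer to const, not in a 'char *'.  */
    const char *sep = std::strchr (name, '/');
    return sep;
  }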
2020-09-07 Martin Storsjö <martin@martin.st>
gcc/
* dwarf2out.c (file_name_acquire): Make a strchr return value
pointer to const.
libcpp/
* files.c (remap_filename): Make a strchr return value pointer
to const.
|
|
gcc/fortran/ChangeLog:
PR fortran/96896
* resolve.c (get_temp_from_expr): Also reset proc_pointer +
use_assoc attribute.
(resolve_ptr_fcn_assign): Use information from the LHS.
gcc/testsuite/ChangeLog:
PR fortran/96896
* gfortran.dg/ptr_func_assign_4.f08: Update dg-error.
* gfortran.dg/ptr-func-3.f90: New test.
|
|
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-46.c: Add --param vect-epilogues-nomask=0 to
avoid backend interference.
|
|
The following patch adds streaming of edge goto_locus (both LOCATION_LOCUS
and LOCATION_BLOCK from it), the PR shows a testcase (inappropriate for
gcc testsuite) where the lack of streaming of goto_locus results in worse
debug info.
An earlier version of the patch (without the output_function changes) failed
miserably because of the order mismatch: input_function would
first call input_cfg, then input_eh_regions and then input_bb (all of which
now have locations), while output_function used output_eh_regions, then
output_bb and then output_cfg, with *_cfg going to a separate stream...
Now, is there a reason why the order is different?
If the intent is that the cfg could be read separately from the rest of the
function or vice versa, we'd alternatively need to call clear_line_info ()
before output_eh_regions and before/after output_cfg to make them
independent.
2020-09-07 Jakub Jelinek <jakub@redhat.com>
PR debug/94235
* lto-streamer-out.c (output_cfg): Also stream goto_locus for edges.
Use bp_pack_var_len_unsigned instead of streamer_write_uhwi to stream
e->dest->index and e->flags.
(output_function): Call output_cfg before output_ssa_name, rather than
after streaming all bbs.
* lto-streamer-in.c (input_cfg): Stream in goto_locus for edges.
Use bp_unpack_var_len_unsigned instead of streamer_read_uhwi to stream
in dest_index and edge_flags.
|
|
The following adds the capability to code-generate live lanes in
basic-block vectorization using lane extracts from vector stmts
rather than keeping the original scalar code around for those.
This eventually makes previously unprofitable vectorizations
profitable (the live scalar code was appropriately costed, as
are the lane extracts now); leaving the cost model aside,
this patch doesn't add or remove any basic-block vectorization
capabilities.
The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization
mode to tell whether a live lane is vectorized or whether it is
provided by means of keeping the scalar code live.
The patch is a first step towards vectorizing sequences of
stmts that do not end up in stores or vector constructors though.
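As a hedged illustration of what a "live" lane means here (this is not one
of the new testcases), consider a block whose four additions form an SLP
group while one intermediate result is also used outside it:
  int a[4], b[4];
  int
  f (void)
  {
    int t0 = b[0] + 1;   /* this lane is "live": its scalar value escapes */
    a[0] = t0;
    a[1] = b[1] + 1;
    a[2] = b[2] + 1;
    a[3] = b[3] + 1;
    /* With this patch the vectorizer can extract lane 0 from the vector
       result instead of keeping the scalar statement for t0.  */
    return t0;
  }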
Bootstrapped and tested on x86_64-unknown-linux-gnu.
2020-09-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vectorizable_live_operation): Adjust.
* tree-vect-loop.c (vectorizable_live_operation): Vectorize
live lanes out of basic-block vectorization nodes.
* tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function.
(vect_slp_analyze_operations): Analyze live lanes and their
vectorization possibility after the whole SLP graph is final.
(vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
* tree-vect-stmts.c (can_vectorize_live_stmts): Adjust.
(vect_transform_stmt): Call can_vectorize_live_stmts also for
basic-block vectorization.
* gcc.dg/vect/bb-slp-46.c: New testcase.
* gcc.dg/vect/bb-slp-47.c: Likewise.
* gcc.dg/vect/bb-slp-32.c: Adjust.
|
|
gcc/fortran/ChangeLog
* trans-types.c (gfc_get_derived_type): Fix argument types.
|
|
gcc/fortran/ChangeLog
* resolve.c (resolve_select_type): Provide a formal arg list.
|
|
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr92658-avx512bw-trunc.c: Add
-mprefer-vector-width=512 to avoid the impact of the different default
tuning that gcc is built with.
|
|
|
|
gcc/fortran/ChangeLog
* trans-types.c (gfc_get_ppc_type): Add comment.
|