commit 0782b01c9ea43d43648071faa9c65a101f5068a2
parent 04aa0edcace22a7815cfc57575f1f7b1f166ac10
Author:    Matthew Malcomson <matthew.malcomson@arm.com>
Date:      2023-08-04 11:26:47 +0100
mid-end: Use integral time intervals in timevar.cc
On some AArch64 bootstrapped builds, we were getting a flaky test
because the floating point operations in `get_time` were being fused
with the floating point operations in `timevar_accumulate`.
This meant that the rounding behaviour of our multiplication with
`ticks_to_msec` differed between `timer::start` and `timer::stop`.
These extra inaccuracies led to the
testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
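For illustration (with made-up numbers, not values from the failing
runs): contracting the multiply and the add into a single fused
multiply-add skips the intermediate rounding of the product, which is
exactly the difference between the two inlined copies of the code.

    #include <cmath>
    #include <cstdio>

    int main ()
    {
      // Made-up tick count and conversion factor, for illustration only.
      double ticks = 1000000001.0;
      double ticks_to_msec = 1.0 / 3.0;
      double accumulated = 42.125;

      // Uncontracted: the product is rounded to double before the add.
      double separate = (ticks * ticks_to_msec) + accumulated;
      // Contracted: the product feeds the add at full precision.
      double fused = std::fma (ticks, ticks_to_msec, accumulated);

      // The two results can differ in the last bit, which is enough for
      // an "any time elapsed?" comparison to flip between call sites.
      printf ("separate=%a fused=%a\n", separate, fused);
      return 0;
    }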
------------------------------
Avoiding the inlining was agreed to be undesirable, which leaves three
alternative approaches:
1) Use `-ffp-contract=on` to avoid this particular optimisation.
2) Adjusting the code so that the "tolerance" is always of the order of
a "tick".
3) Recording times and elapsed differences in integral values.
- Could be in terms of a standard measurement (e.g. nanoseconds or
microseconds).
- Could be in terms of whatever integral value ("ticks" /
seconds & microseconds / "clock ticks") is returned from the syscall
chosen at configure time.
While `-ffp-contract=on` removes the problem that I bumped into, there
has been a similar bug on x86 caused by a different floating point
problem that also arose when `get_time` and `timevar_accumulate` were
both inlined into the same function. Hence it seems worth choosing a
different approach.
Of the two other solutions, recording measurements in integral values
seems the most robust against slightly "off" measurements being
presented to the user -- even though the tolerance-based approach could
equally avoid the ICE that creates the flaky test.
I considered storing time in whatever units our syscall returns and
normalising them at the time we print out rather than normalising them
to nanoseconds at the point we record our "current time". The logic
being that normalisation could have some rounding effect (e.g. if
TICKS_PER_SECOND is 3) that would then be carried into calculations.
I decided against it in order to give the values recorded in
`timevar_time_def` some interpretive value, making the code easier to
read. Compared to the small rounding error -- which represents a tiny
amount of time and, as I understand it, cannot trigger the same kind of
ICEs we are attempting to fix -- that interpretive value seems more
valuable.
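To make the rounding in question concrete (TICKS_PER_SECOND of 3 is a
hypothetical host value, chosen only because it divides unevenly):

    #include <cstdint>
    #include <cstdio>

    int main ()
    {
      const uint64_t NANOSEC_PER_SEC = 1000000000;
      const uint64_t ticks_per_second = 3;  // Hypothetical host value.
      const uint64_t ticks_to_nanosec = NANOSEC_PER_SEC / ticks_per_second;

      // Each tick becomes 333333333ns, dropping a third of a nanosecond
      // per tick -- i.e. about 1ns per second of measured time.
      printf ("%llu ns per tick, %llu ns dropped per second\n",
              (unsigned long long) ticks_to_nanosec,
              (unsigned long long) (NANOSEC_PER_SEC % ticks_per_second));
      return 0;
    }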
Recording time in microseconds seemed reasonable since all obvious
values for ticks and `getrusage` are at microsecond granularity or less
precise. That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both
values given to us by the host system, I was not sure enough of that to
base the decision on it.
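As a sketch of why the `getrusage` path loses nothing at nanosecond
granularity (the helper name here is mine, for illustration; the
committed code may be shaped differently):

    #include <cstdint>
    #include <sys/time.h>

    static const uint64_t NANOSEC_PER_SEC = 1000000000;

    // getrusage reports seconds plus microseconds, so widening to a
    // nanosecond count is exact -- no precision is lost on this path.
    static uint64_t
    timeval_to_nanosec (const struct timeval &tv)
    {
      return (uint64_t) tv.tv_sec * NANOSEC_PER_SEC
             + (uint64_t) tv.tv_usec * 1000;
    }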
------------------------------
timer::all_zero ignores rows which are inconsequential to the user and
would be printed out as all zeros. Since, upon printing rows, we
convert to the same double value and print at the same precision as
before, we return true/false based on the same amount of time as
before.
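The shape of that check, as a sketch (the 5e-3 second cutoff below is
illustrative only, not the exact constant from timevar.cc):

    #include <cstdint>

    struct timevar_time_def
    {
      uint64_t user, sys, wall;  // Nanoseconds after this patch.
    };

    // Illustrative cutoff: 5e-3 seconds expressed as a nanosecond count,
    // so the same interval bounds the row as before the conversion.
    static bool
    all_zero (const timevar_time_def &elapsed)
    {
      const uint64_t tiny_ns = 5000000;
      return elapsed.user < tiny_ns
             && elapsed.sys < tiny_ns
             && elapsed.wall < tiny_ns;
    }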
timer::print_row casts to a floating point measurement in units of
seconds, as was printed out before.
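The conversion helper named in the ChangeLog plausibly looks something
like the following (a sketch; the committed definition may differ):

    #include <cstdint>

    // Stored values stay exact integer nanosecond counts; conversion to
    // floating point seconds happens only at the point of display.
    static double
    nanosec_to_floating_sec (uint64_t nanosec)
    {
      return (double) nanosec / 1000000000.0;
    }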
timer::validate_phases -- I'm printing out nanoseconds here rather than
floating point seconds. Since this is an error message for when things
have "gone wrong", printing out the actual nanoseconds that have been
recorded seems like the best approach.
N.b. since we now print out nanoseconds instead of a floating point
value, the padding requirements are different. Originally we were
padding to 24 characters and printing 18 decimal places. This looked
odd with the now visually smaller values being printed. I judged 13
characters (corresponding to about 2 hours) to be a reasonable point at
which our alignment could start to degrade, and this provides a more
compact output for the majority of cases (checked by triggering the
error case via GDB).
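For reference, 2 hours is 7.2e12ns, a 13-digit number, hence the field
width. A sketch of the printing (helper name and exact format string
are mine, for illustration):

    #include <cinttypes>
    #include <cstdint>
    #include <cstdio>

    // Hypothetical reporting helper: a 13-column field stays aligned up
    // to roughly 2 hours of recorded time (7200s == 7.2e12ns).
    static void
    report_phase_time (const char *name, uint64_t user_ns)
    {
      fprintf (stderr, "%-24s %13" PRIu64 "\n", name, user_ns);
    }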
------------------------------
N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC". I believe this
would fit in an integer on all hosts that GCC supports, but since I am
not certain there are no strange integer sizes we support, I am
pointing it out for special attention during review.
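For what it's worth, 1000000000 < 2^31 - 1 (2147483647), so the literal
fits in any 32-bit (or wider) int. One could even check it statically
(the assertion below is my addition, not part of the patch):

    #include <cstdint>

    // 1e9 is below INT32_MAX, so the literal is representable even on
    // 32-bit-int hosts; widening to uint64_t happens before any
    // multiplication can overflow.
    #define NANOSEC_PER_SEC 1000000000
    static_assert (NANOSEC_PER_SEC <= INT32_MAX,
                   "NANOSEC_PER_SEC must fit in a 32-bit int");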
------------------------------
No expected change in generated code.
Bootstrapped and regtested on AArch64 with no regressions.
Hope this is acceptable -- I had originally planned to use
`-ffp-contract` as agreed, until I saw mention of the old x86 bug in
the same area, which was not to do with floating point contraction of
operations (PR 99903).
gcc/ChangeLog:

	PR middle-end/110316
	PR middle-end/99903
	* timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC,
	CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New.
	(TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros.
	(timer::validate_phases): Use integral arithmetic to check
	validity.
	(timer::print_row, timer::print): Convert from integral
	nanoseconds to floating point seconds before printing.
	(timer::all_zero): Change limit to nanosec count instead of
	fractional count of seconds.
	(make_json_for_timevar_time_def): Convert from integral
	nanoseconds to floating point seconds before recording.
	* timevar.h (struct timevar_time_def): Update all measurements
	to use uint64_t nanoseconds rather than seconds stored in a
	double.
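N.b. for reviewers, the timevar.h change amounts to roughly the
following (a sketch based on the existing field names; the exact
comments and surrounding members in the patch may differ):

    #include <cstddef>
    #include <cstdint>

    // Before the patch the three time members were doubles holding
    // fractional seconds; after it they are integral nanosecond counts.
    struct timevar_time_def
    {
      uint64_t user;   // User time, in nanoseconds.
      uint64_t sys;    // System time, in nanoseconds.
      uint64_t wall;   // Wall-clock time, in nanoseconds.
      size_t ggc_mem;  // GGC memory allocated; not a time, unchanged.
    };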