From 126793971bee0e92bea237823bdc51a594951faa Mon Sep 17 00:00:00 2001 From: Cassio Neri Date: Wed, 24 Feb 2021 17:37:36 +0000 Subject: libstdc++: More efficient is_leap This patch reimplements std::chrono::year::is_leap(). Leap year check is ubiquitously implemented (including here) as: y % 4 == 0 && (y % 100 != 0 || y % 400 == 0). The rationale being that testing divisibility by 4 first implies an earlier return for 75% of the cases, therefore, avoiding the needless calculations of y % 100 and y % 400. Although this fact is true, it does not take into account the cost of branching. This patch, instead, tests divisibility by 100 first: (y % 100 != 0 || y % 400 == 0) && y % 4 == 0. It is certainly counterintuitive that this could be more efficient since among the three divisibility tests (4, 100 and 400) the one by 100 is the only one that can never provide a definitive answer and a second divisibility test (by 4 or 400) is always required. However, measurements [1] in x86_64 suggest this is 3x more efficient! A possible explanation is that checking divisibility by 100 first implies a split in the execution path with probabilities of (1%, 99%) rather than (25%, 75%) when divisibility by 4 is checked first. This decreases the entropy of the branching distribution which seems to help prediction. Given that y belongs to [-32767, 32767] [time.cal.year.members], a more efficient algorithm [2] to check divisibility by 100 is used (instead of y % 100 != 0). Measurements suggest that this optimization improves performance by 20%. The patch adds a test that exhaustively compares the result of this implementation with the ubiquitous one for all y in [-32767, 32767]. Although its completeness, the test completes in a matter of seconds. References: [1] https://stackoverflow.com/a/60646967/1137388 [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16 libstdc++-v3/ChangeLog: * include/std/chrono (year::is_leap): New implementation. * testsuite/std/time/year/2.cc: New test. --- libstdc++-v3/include/std/chrono | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) (limited to 'libstdc++-v3/include') diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index b031678..3ba35a5 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1597,7 +1597,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr bool is_leap() const noexcept - { return _M_y % 4 == 0 && (_M_y % 100 != 0 || _M_y % 400 == 0); } + { + // Testing divisibility by 100 first gives better performance, that is, + // return (_M_y % 100 != 0 || _M_y % 400 == 0) && _M_y % 4 == 0; + + // It gets even faster if _M_y is in [-536870800, 536870999] + // (which is the case here) and _M_y % 100 is replaced by + // __is_multiple_of_100 below. + + // References: + // [1] https://github.com/cassioneri/calendar + // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16 + + constexpr uint32_t __multiplier = 42949673; + constexpr uint32_t __bound = 42949669; + constexpr uint32_t __max_dividend = 1073741799; + constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; + const bool __is_multiple_of_100 + = __multiplier * (_M_y + __offset) < __bound; + return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0; + } explicit constexpr operator int() const noexcept -- cgit v1.1