author     Eric <eric@efcs.ca>  2024-04-13 03:16:11 -0400
committer  GitHub <noreply@github.com>  2024-04-13 03:16:11 -0400
commit     11f22f1a963ab3c8949cb723a63c07d7a409c8a8 (patch)
tree       257709dccd655f68bb84255e180aa49929ec4d73 /libcxx/benchmarks
parent     6d66db3890a18e3926a49cbfeb28e99c464cfcd5 (diff)
[tzdb] Replace shared_mutex with mutex. (#87929)
The overhead of taking a std::mutex is much lower than taking a reader
lock on a std::shared_mutex, even under heavy contention.
The benefit of shared_mutex only materializes once the time spent in the
critical section grows large enough to amortize its higher locking cost.
In our case all we do is read a pointer and release the lock.
As a result, using a shared lock can be ~50%-100% slower.

Here are the results for the provided benchmark on my machine:
```
2024-04-07T12:48:51-04:00
Running ./libcxx/benchmarks/shared_mutex_vs_mutex.libcxx.out
Run on (12 X 400 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x6)
L1 Instruction 32 KiB (x6)
L2 Unified 1024 KiB (x6)
L3 Unified 32768 KiB (x1)
Load Average: 2.70, 2.70, 1.63
---------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------
BM_shared_mutex/threads:1 13.9 ns 13.9 ns 50533700
BM_shared_mutex/threads:2 34.5 ns 68.9 ns 9957784
BM_shared_mutex/threads:4 38.4 ns 137 ns 4987772
BM_shared_mutex/threads:8 51.1 ns 358 ns 1974160
BM_shared_mutex/threads:32 57.1 ns 682 ns 1043648
BM_mutex/threads:1 5.54 ns 5.53 ns 125867422
BM_mutex/threads:2 15.5 ns 30.9 ns 21830116
BM_mutex/threads:4 15.4 ns 57.2 ns 12136920
BM_mutex/threads:8 19.3 ns 140 ns 4997080
BM_mutex/threads:32 20.8 ns 252 ns 2859808
```
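The pattern the commit message describes (a critical section that only reads a pointer) can be sketched as below. This is a hypothetical illustration, not the tzdb implementation; the names `cache_mutex`, `cached`, and `get_cached` are invented for the example:

```cpp
#include <mutex>
#include <string>

// Hypothetical cache mirroring the pattern described in the commit: the
// critical section only reads (and lazily initializes) a pointer, so a
// plain std::mutex is the cheaper choice compared to taking a reader
// lock on a std::shared_mutex.
namespace {
std::mutex cache_mutex;               // protects `cached`
const std::string* cached = nullptr;  // lazily-initialized shared state

const std::string& get_cached() {
  // The lock is held only long enough to test and read a pointer.
  std::lock_guard<std::mutex> lock(cache_mutex);
  if (cached == nullptr) {
    static const std::string instance = "example-data";
    cached = &instance;
  }
  return *cached;  // lock released when `lock` goes out of scope
}
}  // namespace
```

With a critical section this short, the per-acquisition cost of the lock dominates, which is why the plain mutex wins in the numbers above even though all callers are readers.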
Diffstat (limited to 'libcxx/benchmarks')
-rw-r--r--  libcxx/benchmarks/CMakeLists.txt                    1
-rw-r--r--  libcxx/benchmarks/shared_mutex_vs_mutex.bench.cpp  41
2 files changed, 42 insertions, 0 deletions
diff --git a/libcxx/benchmarks/CMakeLists.txt b/libcxx/benchmarks/CMakeLists.txt
index 928238c..527a2acf 100644
--- a/libcxx/benchmarks/CMakeLists.txt
+++ b/libcxx/benchmarks/CMakeLists.txt
@@ -221,6 +221,7 @@ set(BENCHMARK_TESTS
   map.bench.cpp
   monotonic_buffer.bench.cpp
   ordered_set.bench.cpp
+  shared_mutex_vs_mutex.bench.cpp
   stop_token.bench.cpp
   std_format_spec_string_unicode.bench.cpp
   string.bench.cpp
diff --git a/libcxx/benchmarks/shared_mutex_vs_mutex.bench.cpp b/libcxx/benchmarks/shared_mutex_vs_mutex.bench.cpp
new file mode 100644
index 0000000..19d13b7
--- /dev/null
+++ b/libcxx/benchmarks/shared_mutex_vs_mutex.bench.cpp
@@ -0,0 +1,41 @@
+//===----------------------------------------------------------------------===//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// This benchmark compares the performance of std::mutex and std::shared_mutex in contended scenarios.
+// it's meant to establish a baseline overhead for std::shared_mutex and std::mutex, and to help inform decisions about
+// which mutex to use when selecting a mutex type for a given use case.
+
+#include <atomic>
+#include <mutex>
+#include <numeric>
+#include <shared_mutex>
+#include <thread>
+
+#include "benchmark/benchmark.h"
+
+int global_value = 42;
+std::mutex m;
+std::shared_mutex sm;
+
+static void BM_shared_mutex(benchmark::State& state) {
+  for (auto _ : state) {
+    std::shared_lock<std::shared_mutex> lock(sm);
+    benchmark::DoNotOptimize(global_value);
+  }
+}
+
+static void BM_mutex(benchmark::State& state) {
+  for (auto _ : state) {
+    std::lock_guard<std::mutex> lock(m);
+    benchmark::DoNotOptimize(global_value);
+  }
+}
+
+BENCHMARK(BM_shared_mutex)->Threads(1)->Threads(2)->Threads(4)->Threads(8)->Threads(32);
+BENCHMARK(BM_mutex)->Threads(1)->Threads(2)->Threads(4)->Threads(8)->Threads(32);
+
+BENCHMARK_MAIN();