Commit graph

13 commits

Author SHA1 Message Date
MarcoFalke
fa0074e2d8
scripted-diff: Bump copyright headers
-BEGIN VERIFY SCRIPT-
./contrib/devtools/copyright_header.py update ./
-END VERIFY SCRIPT-
2020-12-31 09:45:41 +01:00
Martin Ankerl
78c312c983 Replace current benchmarking framework with nanobench
This replaces the current benchmarking framework with nanobench [1], an
MIT-licensed single-header benchmarking library, of which I am the
author. In my opinion this has several advantages, especially on Linux:

* fast: Running all benchmarks takes ~6 seconds instead of 4m13s on
  an Intel i7-8700 CPU @ 3.20GHz.

* accurate: I ran e.g. the benchmark for SipHash_32b 10 times and
  calculated standard deviation / mean = coefficient of variation (see
  the sketch after this list):

  * 0.57% CV for the old benchmarking framework
  * 0.20% CV for nanobench

  So the benchmark results with nanobench seem to vary less than with
  the old framework.

* It automatically determines the runtime based on clock precision, so
  there is no need to specify the number of evaluations.

* Measures instructions, cycles, branches, instructions per cycle, and
  branch misses (Linux only, when performance counters are available).

* Outputs results in markdown table format.

* Warns about an unstable environment (frequency scaling, turbo, ...).

* For better profiling, the environment variable NANOBENCH_ENDLESS can
  be set to force endless running of a particular benchmark without the
  need to recompile. This makes it possible to e.g. run "perf top" and
  look at hotspots.
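
As a minimal sketch of that coefficient-of-variation calculation (a
hypothetical helper, not part of either framework):

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: coefficient of variation (stddev / mean) of
// repeated benchmark results, e.g. ten ns/op measurements.
double CoefficientOfVariation(const std::vector<double>& runs)
{
    double mean = 0.0;
    for (double r : runs) mean += r;
    mean /= runs.size();

    double var = 0.0;
    for (double r : runs) var += (r - mean) * (r - mean);
    var /= runs.size();

    return std::sqrt(var) / mean; // ~0.0020 (0.20%) for the nanobench runs above
}
```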

Here is an example, copied and pasted from the terminal output:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                2.52 |      396,529,415.94 |    0.6% |           25.42 |            8.02 |  3.169 |           0.06 |    0.0% |      0.03 | `bench/crypto_hash.cpp RIPEMD160`
|                1.87 |      535,161,444.83 |    0.3% |           21.36 |            5.95 |  3.589 |           0.06 |    0.0% |      0.02 | `bench/crypto_hash.cpp SHA1`
|                3.22 |      310,344,174.79 |    1.1% |           36.80 |           10.22 |  3.601 |           0.09 |    0.0% |      0.04 | `bench/crypto_hash.cpp SHA256`
|                2.01 |      496,375,796.23 |    0.0% |           18.72 |            6.43 |  2.911 |           0.01 |    1.0% |      0.00 | `bench/crypto_hash.cpp SHA256D64_1024`
|                7.23 |      138,263,519.35 |    0.1% |           82.66 |           23.11 |  3.577 |           1.63 |    0.1% |      0.00 | `bench/crypto_hash.cpp SHA256_32b`
|                3.04 |      328,780,166.40 |    0.3% |           35.82 |            9.69 |  3.696 |           0.03 |    0.0% |      0.03 | `bench/crypto_hash.cpp SHA512`

[1] https://github.com/martinus/nanobench
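
For reference, a benchmark under nanobench boils down to a lambda passed
to `Bench::run`. A minimal sketch against the upstream nanobench API (the
benchmark name and loop body are made up for illustration):

```cpp
#include <cstdint>
#include <nanobench.h>

int main()
{
    uint64_t x = 1;
    ankerl::nanobench::Bench().run("cheap PRNG step", [&] {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL; // stand-in workload
        ankerl::nanobench::doNotOptimizeAway(x); // keep the result observable
    });
    // Setting NANOBENCH_ENDLESS=<benchmark name> makes that benchmark loop
    // forever, so "perf top" can be attached to inspect hotspots.
}
```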

* Adds support for asymptotes

  This adds support for calculating the asymptotic complexity of a
  benchmark. This is similar to #17375, but currently only one asymptote
  is supported, and I have added support in the benchmark `ComplexMemPool`
  as an example.

  Usage is e.g. like this:

  ```
  ./bench_bitcoin -filter=ComplexMemPool -asymptote=25,50,100,200,400,600,800
  ```

  This runs the benchmark `ComplexMemPool` several times, each with a
  different complexityN setting. The benchmark can extract that number
  and use it accordingly. Here, it's used for `childTxs`. The output is
  this:

  | complexityN |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |     total | benchmark
  |------------:|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|----------:|:----------
  |          25 |        1,064,241.00 |              939.64 |    1.4% |    3,960,279.00 |    2,829,708.00 |  1.400 |      0.01 | `ComplexMemPool`
  |          50 |        1,579,530.00 |              633.10 |    1.0% |    6,231,810.00 |    4,412,674.00 |  1.412 |      0.02 | `ComplexMemPool`
  |         100 |        4,022,774.00 |              248.58 |    0.6% |   16,544,406.00 |   11,889,535.00 |  1.392 |      0.04 | `ComplexMemPool`
  |         200 |       15,390,986.00 |               64.97 |    0.2% |   63,904,254.00 |   47,731,705.00 |  1.339 |      0.17 | `ComplexMemPool`
  |         400 |       69,394,711.00 |               14.41 |    0.1% |  272,602,461.00 |  219,014,691.00 |  1.245 |      0.76 | `ComplexMemPool`
  |         600 |      168,977,165.00 |                5.92 |    0.1% |  639,108,082.00 |  535,316,887.00 |  1.194 |      1.86 | `ComplexMemPool`
  |         800 |      310,109,077.00 |                3.22 |    0.1% |1,149,134,246.00 |  984,620,812.00 |  1.167 |      3.41 | `ComplexMemPool`

  |   coefficient |   err% | complexity
  |--------------:|-------:|------------
  |   4.78486e-07 |   4.5% | O(n^2)
  |   6.38557e-10 |  21.7% | O(n^3)
  |   3.42338e-05 |  38.0% | O(n log n)
  |   0.000313914 |  46.9% | O(n)
  |     0.0129823 | 114.4% | O(log n)
  |     0.0815055 | 133.8% | O(1)

  The best-fitting curve is O(n^2), so the algorithm seems to scale
  quadratically with `childTxs` in the range 25 to 800.
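
  A sketch of how a benchmark might pick up the `-asymptote` values via
  nanobench's `complexityN()` (illustrative, not the actual `ComplexMemPool`
  code; the fallback of 800 is made up):

  ```cpp
  #include <nanobench.h>

  static void ComplexMemPool(ankerl::nanobench::Bench& bench)
  {
      // With -asymptote=25,50,..., the harness runs this benchmark once per
      // value, and complexityN() returns the current one.
      const int childTxs = bench.complexityN() > 1 ? static_cast<int>(bench.complexityN()) : 800;
      bench.run([&] {
          // ... build and evaluate a mempool package with `childTxs` children ...
      });
  }
  ```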
2020-06-13 12:24:18 +02:00
MarcoFalke
aaaaad6ac9
scripted-diff: Bump copyright of files changed in 2019
-BEGIN VERIFY SCRIPT-
./contrib/devtools/copyright_header.py update ./
-END VERIFY SCRIPT-
2019-12-30 10:42:20 +13:00
practicalswift
084e17cebd Remove unused includes 2019-10-15 22:56:43 +00:00
João Barbosa
d2dbc7da26 bench: Add benchmark for CRollingBloomFilter::reset 2019-05-22 15:55:50 +01:00
MarcoFalke
fea4e9eca5
Merge #13767: Remove redundant assignments (dead stores)
dd777f3e12 Remove unused variable (practicalswift)
cdf4089457 Remove redundant assignments (dead stores) (practicalswift)

Pull request description:

  Remove redundant assignments (dead stores).
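
  For illustration, a dead store is an assignment whose value is never read
  before the variable is reassigned (hypothetical example):

  ```cpp
  int ComputeSomething() { return 42; }

  int Example()
  {
      int n = ComputeSomething(); // dead store: overwritten before it is read
      n = 0;
      return n;
  }
  ```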

Tree-SHA512: e852059b22a161c34a0f18a6a6ed798e2b35e6d2b9f23c526af0ec33e01f6a5bb1fa5ada6671ba183d7b02393ff0d397be5aa4b4e2edbd5e604c9a76ac48d249
2018-08-27 13:39:46 -04:00
practicalswift
cdf4089457 Remove redundant assignments (dead stores) 2018-08-02 14:30:53 +02:00
DrahtBot
eb7daf4d60 Update copyright headers to 2018 2018-07-27 07:15:02 -04:00
Akira Takizawa
595a7bab23 Increment MIT Licence copyright header year on files modified in 2017 2018-01-03 02:26:56 +09:00
Martin Ankerl
00721e69f8 Improved microbenchmarking with multiple features.
* inline performance critical code
* Average runtime is specified and used to calculate iterations.
* Console: show median of multiple runs
* plot: show box plot
* filter benchmarks
* specify scaling factor
* ignore src/test and src/bench in command line check script
* number of iterations instead of time
* Replaced runtime in the BENCHMARK macro with a number of iterations
  (see the sketch below).
* Added -? to bench_bitcoin
* The benchmark plotly.js URL, width, and height can be customized
* Fixed incorrect precision warning
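
A sketch of how a benchmark might be registered under this scheme, assuming
the `benchmark::State`/`BENCHMARK` interface of the bench framework at the
time (the loop body and iteration count are made up):

```cpp
#include <bench/bench.h>

#include <cstdint>

static void TrivialAdd(benchmark::State& state)
{
    uint64_t x = 0;
    while (state.KeepRunning()) { // one measured iteration per pass
        x += 1;
    }
}
// The second argument is the number of iterations per evaluation,
// replacing the old target-runtime argument.
BENCHMARK(TrivialAdd, 1000 * 1000);
```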
2017-12-23 11:03:17 +01:00
MeshCollider
1a445343f6 scripted-diff: Replace #include "" with #include <> (ryanofsky)
-BEGIN VERIFY SCRIPT-
for f in \
  src/*.cpp \
  src/*.h \
  src/bench/*.cpp \
  src/bench/*.h \
  src/compat/*.cpp \
  src/compat/*.h \
  src/consensus/*.cpp \
  src/consensus/*.h \
  src/crypto/*.cpp \
  src/crypto/*.h \
  src/crypto/ctaes/*.h \
  src/policy/*.cpp \
  src/policy/*.h \
  src/primitives/*.cpp \
  src/primitives/*.h \
  src/qt/*.cpp \
  src/qt/*.h \
  src/qt/test/*.cpp \
  src/qt/test/*.h \
  src/rpc/*.cpp \
  src/rpc/*.h \
  src/script/*.cpp \
  src/script/*.h \
  src/support/*.cpp \
  src/support/*.h \
  src/support/allocators/*.h \
  src/test/*.cpp \
  src/test/*.h \
  src/wallet/*.cpp \
  src/wallet/*.h \
  src/wallet/test/*.cpp \
  src/wallet/test/*.h \
  src/zmq/*.cpp \
  src/zmq/*.h
do
  base=${f%/*}/ relbase=${base#src/} sed -i "s:#include \"\(.*\)\"\(.*\):if test -e \$base'\\1'; then echo \"#include <\"\$relbase\"\\1>\\2\"; else echo \"#include <\\1>\\2\"; fi:e" $f
done
-END VERIFY SCRIPT-
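
The sed expression tests whether the quoted path exists relative to the
including file's own directory; if so, it prefixes the src-relative
directory, otherwise it only swaps the quotes for angle brackets. A
hypothetical before/after for a file in src/bench/:

```cpp
// Before, in src/bench/example.cpp (path resolved relative to the file itself):
#include "bench.h"

// After (path now relative to src/, in angle brackets):
#include <bench/bench.h>
```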
2017-11-16 08:23:01 +13:00
Cory Fields
c515d266ec bench: switch to std::chrono for time measurements
std::chrono removes portability issues.

Rather than storing doubles, store the untouched time_points. Then
convert to nanoseconds for display. This allows for maximum precision, while
keeping results comparable between differing hardware/operating systems.

Also, display full nanosecond counts rather than sub-second floats.
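
A minimal sketch of that pattern (names are illustrative): keep the raw
time_points while measuring and convert to an integral nanosecond count
only when printing:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

int main()
{
    const auto start = std::chrono::high_resolution_clock::now();
    // ... code under measurement ...
    const auto stop = std::chrono::high_resolution_clock::now();

    // Convert the stored time_points to nanoseconds only for display.
    const int64_t ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::cout << ns << " ns\n";
}
```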
2017-11-07 17:15:58 -05:00
Pieter Wuille
aa62b68745 Benchmark rolling bloom filter 2016-04-28 14:56:32 +02:00