This replaces the current benchmarking framework with nanobench [1], an
MIT-licensed single-header benchmarking library, of which I am the
author. In my opinion this has several advantages, especially on Linux:
* fast: Running all benchmarks takes ~6 seconds instead of 4m13s on
an Intel i7-8700 CPU @ 3.20GHz.
* accurate: e.g. I ran the benchmark for SipHash_32b 10 times and
calculated the coefficient of variation (standard deviation / mean):
* 0.57% CV for old benchmarking framework
* 0.20% CV for nanobench
So the benchmark results with nanobench seem to vary less than with
the old framework.
* It automatically determines the runtime based on clock precision, so
there is no need to specify the number of evaluations.
* measures instructions, cycles, branches, instructions per cycle, and
branch misses (Linux only, when performance counters are available)
* outputs results in markdown table format
* warns about an unstable environment (frequency scaling, turbo, ...)
* For better profiling, it is possible to set the environment variable
NANOBENCH_ENDLESS to force endless running of a particular benchmark
without the need to recompile. This makes it easy to e.g. run `perf top`
and look at hotspots.
Here is an example, copied & pasted from the terminal output:
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 2.52 | 396,529,415.94 | 0.6% | 25.42 | 8.02 | 3.169 | 0.06 | 0.0% | 0.03 | `bench/crypto_hash.cpp RIPEMD160`
| 1.87 | 535,161,444.83 | 0.3% | 21.36 | 5.95 | 3.589 | 0.06 | 0.0% | 0.02 | `bench/crypto_hash.cpp SHA1`
| 3.22 | 310,344,174.79 | 1.1% | 36.80 | 10.22 | 3.601 | 0.09 | 0.0% | 0.04 | `bench/crypto_hash.cpp SHA256`
| 2.01 | 496,375,796.23 | 0.0% | 18.72 | 6.43 | 2.911 | 0.01 | 1.0% | 0.00 | `bench/crypto_hash.cpp SHA256D64_1024`
| 7.23 | 138,263,519.35 | 0.1% | 82.66 | 23.11 | 3.577 | 1.63 | 0.1% | 0.00 | `bench/crypto_hash.cpp SHA256_32b`
| 3.04 | 328,780,166.40 | 0.3% | 35.82 | 9.69 | 3.696 | 0.03 | 0.0% | 0.03 | `bench/crypto_hash.cpp SHA512`
[1] https://github.com/martinus/nanobench
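For reference, a minimal standalone nanobench benchmark looks like the
sketch below (adapted from the library's README; Bitcoin Core's benchmarks
keep registering through the existing BENCHMARK macro instead of calling
nanobench directly like this):
```cpp
// Minimal nanobench usage sketch: measure a trivial floating point op.
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

int main()
{
    double d = 1.0;
    ankerl::nanobench::Bench().run("some double ops", [&] {
        d += 1.0 / d;
        // keep the compiler from optimizing the benchmarked work away
        ankerl::nanobench::doNotOptimizeAway(d);
    });
}
```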
* Adds support for asymptotes
This adds support for calculating the asymptotic complexity of a
benchmark. This is similar to #17375, but currently only one asymptote
is supported, and I have added support in the benchmark `ComplexMemPool`
as an example.
Usage is e.g. like this:
```
./bench_bitcoin -filter=ComplexMemPool -asymptote=25,50,100,200,400,600,800
```
This runs the benchmark `ComplexMemPool` several times but with
different complexityN settings. The benchmark can extract that number
and use it accordingly. Here, it's used for `childTxs`. The output is
this:
| complexityN | ns/op | op/s | err% | ins/op | cyc/op | IPC | total | benchmark
|------------:|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|----------:|:----------
| 25 | 1,064,241.00 | 939.64 | 1.4% | 3,960,279.00 | 2,829,708.00 | 1.400 | 0.01 | `ComplexMemPool`
| 50 | 1,579,530.00 | 633.10 | 1.0% | 6,231,810.00 | 4,412,674.00 | 1.412 | 0.02 | `ComplexMemPool`
| 100 | 4,022,774.00 | 248.58 | 0.6% | 16,544,406.00 | 11,889,535.00 | 1.392 | 0.04 | `ComplexMemPool`
| 200 | 15,390,986.00 | 64.97 | 0.2% | 63,904,254.00 | 47,731,705.00 | 1.339 | 0.17 | `ComplexMemPool`
| 400 | 69,394,711.00 | 14.41 | 0.1% | 272,602,461.00 | 219,014,691.00 | 1.245 | 0.76 | `ComplexMemPool`
| 600 | 168,977,165.00 | 5.92 | 0.1% | 639,108,082.00 | 535,316,887.00 | 1.194 | 1.86 | `ComplexMemPool`
| 800 | 310,109,077.00 | 3.22 | 0.1% | 1,149,134,246.00 | 984,620,812.00 | 1.167 | 3.41 | `ComplexMemPool`
| coefficient | err% | complexity
|--------------:|-------:|------------
| 4.78486e-07 | 4.5% | O(n^2)
| 6.38557e-10 | 21.7% | O(n^3)
| 3.42338e-05 | 38.0% | O(n log n)
| 0.000313914 | 46.9% | O(n)
| 0.0129823 | 114.4% | O(log n)
| 0.0815055 | 133.8% | O(1)
The best fitting curve is O(n^2), so the algorithm seems to scale
quadratically with `childTxs` in the range 25 to 800.
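For illustration, here is a sketch of the underlying nanobench mechanism
that the `-asymptote` option drives (plain nanobench outside Bitcoin Core;
the benchmark reads the current N back via `complexityN()`, just like
`ComplexMemPool` reads it for `childTxs`):
```cpp
// Sketch: run one benchmark at several problem sizes and fit big-O curves.
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    ankerl::nanobench::Bench bench;
    for (uint64_t n : {25U, 50U, 100U, 200U, 400U, 600U, 800U}) {
        bench.complexityN(n).run("vector fill", [&] {
            // the benchmark extracts the current N and uses it accordingly
            std::vector<int> v(static_cast<std::size_t>(bench.complexityN()), 1);
            ankerl::nanobench::doNotOptimizeAway(v.data());
        });
    }
    // prints the fitted coefficients for O(1), O(n), O(n^2), ... as a table
    std::cout << bench.complexityBigO() << std::endl;
}
```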
e6e622e5a0 Implement O(1) OP_IF/NOTIF/ELSE/ENDIF logic (Pieter Wuille)
d0e8f4d5d8 [refactor] interpreter: define interface for vfExec (Anthony Towns)
89fb241c54 Benchmark script verification with 100 nested IFs (Pieter Wuille)
Pull request description:
While investigating what mechanisms are possible to maximize the per-opcode verification cost of scripts, I noticed that the logic for determining whether a particular opcode is to be executed is O(n) in the nesting depth. This issue was also pointed out by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/, and this PR implements a variant of the O(1) algorithm suggested there.
This is not a problem currently, because even with a nesting depth of 100 (the maximum possible right now due to the 201 ops limit), the slowdown caused by this on my machine is around 70 ns per opcode (or 0.25 s per block) at worst, far lower than what is possible with other opcodes.
This PR mostly serves as a proof of concept that it's possible to avoid it, which may be relevant in discussions around increasing the opcode limits in future script versions. Without it, the execution time of scripts can grow quadratically with the nesting depth, which very quickly becomes unreasonable.
This improves upon #14245 by completely removing the `vfExec` vector.
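A sketch of the data structure behind the O(1) approach (simplified; the
PR's actual implementation lives in the script interpreter): instead of
scanning a vector of booleans for the nested conditionals, it suffices to
track their count and the position of the first false entry.
```cpp
#include <cstdint>
#include <limits>

// Simplified sketch of an O(1) condition stack for OP_IF/NOTIF/ELSE/ENDIF.
class ConditionStack
{
    static constexpr uint32_t NO_FALSE = std::numeric_limits<uint32_t>::max();
    uint32_t m_stack_size = 0;             // number of nested conditionals
    uint32_t m_first_false_pos = NO_FALSE; // lowest level whose value is false

public:
    bool empty() const { return m_stack_size == 0; }
    // "Is the current opcode executed?" in O(1): true iff no level is false.
    bool all_true() const { return m_first_false_pos == NO_FALSE; }
    void push_back(bool f) // OP_IF / OP_NOTIF
    {
        if (m_first_false_pos == NO_FALSE && !f) m_first_false_pos = m_stack_size;
        ++m_stack_size;
    }
    void pop_back() // OP_ENDIF
    {
        --m_stack_size;
        if (m_first_false_pos == m_stack_size) m_first_false_pos = NO_FALSE;
    }
    void toggle_top() // OP_ELSE: flip the top level only
    {
        if (m_first_false_pos == NO_FALSE) {
            m_first_false_pos = m_stack_size - 1; // top was true, now false
        } else if (m_first_false_pos == m_stack_size - 1) {
            m_first_false_pos = NO_FALSE; // only the top was false, now true
        }
        // otherwise a deeper level is false; the top's value is irrelevant
    }
};
```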
ACKs for top commit:
jnewbery:
Code review ACK e6e622e5a0
MarcoFalke:
ACK e6e622e5a0🐴
fjahr:
ACK e6e622e5a0
ajtowns:
ACK e6e622e5a0
laanwj:
concept and code review ACK e6e622e5a0
jonatack:
ACK e6e622e5a0 code review, build, benches, fuzzing
Tree-SHA512: 1dcfac3411ff04773de461959298a177f951cb5f706caa2734073bcec62224d7cd103767cfeef85cd129813e70c14c74fa8f1e38e4da70ec38a0f615aab1f7f7
prototypes used in src/test/script_tests.cpp:
- CMutableTransaction BuildCreditingTransaction(const CScript& scriptPubKey, int nValue = 0);
- CMutableTransaction BuildSpendingTransaction(const CScript& scriptSig, const CScriptWitness& scriptWitness, const CTransaction& txCredit);
prototypes used in bench/verify_script.cpp:
- CMutableTransaction BuildCreditingTransaction(const CScript& scriptPubKey);
- CMutableTransaction BuildSpendingTransaction(const CScript& scriptSig, const CMutableTransaction& txCredit);
The more generic versions from the script tests are moved into a new file pair
transaction_utils.cpp/h and the calls are adapted accordingly in the
verify_script benchmark (passing the nValue of 1 explicitly for
BuildCreditingTransaction(), passing an empty scriptWitness explicitly, and
converting the txCredit parameter to CTransaction in BuildSpendingTransaction()).
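A sketch of what an adapted call site in the verify_script benchmark might
look like (the include path and surrounding function are assumptions; only
the parameter changes are taken from the description above):
```cpp
#include <test/util/transaction_utils.h> // assumed location of the new pair

void Setup(const CScript& scriptPubKey, const CScript& scriptSig)
{
    // nValue of 1 is now passed explicitly:
    CMutableTransaction txCredit = BuildCreditingTransaction(scriptPubKey, 1);
    // empty scriptWitness passed explicitly, txCredit converted to CTransaction:
    CMutableTransaction txSpend =
        BuildSpendingTransaction(scriptSig, CScriptWitness{}, CTransaction(txCredit));
}
```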
e306be7429 Use 72 byte dummy signatures when watching only inputs may be used (Andrew Chow)
48b1473c89 Use 71 byte signature for DUMMY_SIGNATURE_CREATOR (Andrew Chow)
18dfea0dd0 Always create 70 byte signatures with low R values (Andrew Chow)
Pull request description:
When creating signatures for transactions, always make one which has a 32 byte or smaller R and 32 byte or smaller S value. This results in signatures that are never more than 71 bytes (32 byte R + 32 byte S + 6 bytes DER + 1 byte sighash) and always have low R values. In most cases, the signature will be exactly 71 bytes.
Because R is not mutable in the same way that S is, a low R value can only be found by trying different nonces. RFC 6979 for deterministic nonce generation has the option to specify additional entropy, so we simply use that and add a uint32_t counter which we increment in order to try different nonces. Nonces are still deterministically generated, as the nonce used will be the first one where the counter results in a low R value. Because different nonces need to be tried, the time to produce a signature does increase: on average it takes twice as long, since on average two signatures need to be created to find one with a low R.
Having a fixed size signature makes size calculations easier and also saves half a byte of transaction size, on average.
DUMMY_SIGNATURE_CREATOR has been modified to produce 71 byte dummy signatures instead of 72 byte signatures.
Tree-SHA512: 3cd791505126ce92da7c631856a97ba0b59e87d9c132feff6e0eef1dc47768e81fbb38bfbe970371bedf9714b7f61a13a5fe9f30f962c81734092a4d19a4ef33
When extra entropy is not specified by the caller, CKey::Sign will
now always create a signature that has a low R value and is at most
70 bytes. The resulting signature on the stack will be 71 bytes when
the sighash byte is included.
Using low R signatures means that the resulting DER encoded signature
will never need to have additional padding to account for high R
values.
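A sketch of the grinding loop described above, written directly against
libsecp256k1 (CKey::Sign wraps the same idea; the endianness handling here
is simplified):
```cpp
#include <secp256k1.h>

#include <cstdint>
#include <cstring>

// Sign hash with seckey, retrying with a counter as RFC 6979 extra entropy
// until the signature's R value is "low" (top bit clear, so the DER
// encoding needs no 0x00 padding byte for R).
bool SignLowR(const secp256k1_context* ctx, const unsigned char hash[32],
              const unsigned char seckey[32], secp256k1_ecdsa_signature* sig)
{
    unsigned char extra_entropy[32] = {0};
    uint32_t counter = 0;
    while (true) {
        if (!secp256k1_ecdsa_sign(ctx, sig, hash, seckey,
                                  secp256k1_nonce_function_rfc6979,
                                  counter > 0 ? extra_entropy : nullptr)) {
            return false;
        }
        unsigned char compact[64];
        secp256k1_ecdsa_signature_serialize_compact(ctx, compact, sig);
        if ((compact[0] & 0x80) == 0) return true; // R < 2^255: low R
        ++counter; // deterministic: the first counter yielding a low R wins
        std::memcpy(extra_entropy, &counter, sizeof(counter));
    }
}
```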
In addition to having the scriptSig and scriptWitness, have SignatureData
also be able to store just the signatures (pubkeys mapped to sigs) and
scripts (script ids mapped to scripts).
Also have DataFromTransaction be able to extract signatures and scripts
from the scriptSig and scriptWitness of an input to put them in SignatureData.
Adds a new SignatureChecker which takes a SignatureData and puts pubkeys
and signatures into it when it successfully verifies a signature.
Adds a new field in SignatureData which stores whether the SignatureData
was complete. This allows us to also update the scriptSig and
scriptWitness to the final one when updating a SignatureData with another
one.
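Put together, the extended SignatureData might be sketched like this (field
names follow the description above; the stand-in type aliases are only there
to make the sketch self-contained):
```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for Bitcoin Core's CPubKey, CKeyID, CScript, CScriptID:
using CPubKey = std::vector<unsigned char>;
using CKeyID = std::string;
using CScript = std::vector<unsigned char>;
using CScriptID = std::string;
struct CScriptWitness { std::vector<std::vector<unsigned char>> stack; };

using SigPair = std::pair<CPubKey, std::vector<unsigned char>>; // pubkey, sig

struct SignatureData {
    bool complete = false;                // whether the data suffices to finalize
    CScript scriptSig;                    // (partially) signed scriptSig
    CScriptWitness scriptWitness;         // (partially) signed witness
    std::map<CKeyID, SigPair> signatures; // pubkeys mapped to sigs
    std::map<CScriptID, CScript> scripts; // script ids mapped to scripts
};
```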
* inline performance critical code
* Average runtime is specified and used to calculate iterations.
* Console: show median of multiple runs
* plot: show box plot
* filter benchmarks
* specify scaling factor
* ignore src/test and src/bench in command line check script
* number of iterations instead of time
* Replaced runtime in the BENCHMARK macro with a number of iterations (see
the sketch after this list).
* Added -? to bench_bitcoin
* Benchmark plotly.js URL, width, height can be customized
* Fixed incorrect precision warning
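For illustration, a benchmark registered under the iteration-based macro
might look like this sketch (the iteration count is made up):
```cpp
#include <bench/bench.h>
#include <crypto/sha256.h>

static void Sha256(benchmark::State& state)
{
    unsigned char buf[32] = {};
    while (state.KeepRunning()) {
        CSHA256().Write(buf, sizeof(buf)).Finalize(buf);
    }
}
// second argument: iterations per run, replacing the old runtime argument
BENCHMARK(Sha256, 100 * 1000);
```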
Avoid static analyzer warnings regarding "Function call argument
is a pointer to uninitialized value" in cases where we are
intentionally using such arguments.
This is achieved by using ...
`f(b.begin(), b.end())` (`std::array<char, N>`)
... instead of ...
`f(b, b + N)` (`char b[N]`)
Rationale:
* Reduce false positives by guiding static analyzers regarding our
intentions.
Before this commit:
```
$ clang-tidy-3.5 -checks=* src/bench/base58.cpp
bench/base58.cpp:23:9: warning: Function call argument is a pointer to uninitialized value [clang-analyzer-core.CallAndMessage]
EncodeBase58(b, b + 32);
^
$ clang-tidy-3.5 -checks=* src/bench/verify_script.cpp
bench/verify_script.cpp:59:5: warning: Function call argument is a pointer to uninitialized value [clang-analyzer-core.CallAndMessage]
key.Set(vchKey, vchKey + 32, false);
^
$
```
After this commit:
```
$ clang-tidy-3.5 -checks=* src/bench/base58.cpp
$ clang-tidy-3.5 -checks=* src/bench/verify_script.cpp
$
```
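A minimal sketch of the before/after pattern (Consume is a hypothetical
stand-in for functions like EncodeBase58 that read a [begin, end) range):
```cpp
#include <array>

template <typename It>
void Consume(It begin, It end)
{
    // walk the range without relying on its (intentionally indeterminate) contents
    for (It it = begin; it != end; ++it) {}
}

int main()
{
    // Before: unsigned char b[32]; Consume(b, b + 32); // analyzer warning
    std::array<unsigned char, 32> b; // intentionally uninitialized
    Consume(b.begin(), b.end());     // analyzers see our intent; no warning
}
```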
5113474 wallet: Use CDataStream.data() (Wladimir J. van der Laan)
e2300ff bench: Use CDataStream.data() (Wladimir J. van der Laan)
adff950 dbwrapper: Use new .data() method of CDataStream (Wladimir J. van der Laan)
a2141e4 streams: Remove special cases for ancient MSVC (Wladimir J. van der Laan)
af4c44c streams: Add data() method to CDataStream (Wladimir J. van der Laan)
The new benchmarks exercise script validation, CCoinsViewDB caching,
mempool eviction, and wallet coin selection code.
All of the benchmarks added here are extremely simple and don't
necessarily mirror common real world conditions or interesting
performance edge cases. Details about how specific benchmarks can be
improved are noted in comments.
Github-Issue: #7883