Remove the variant of ChaCha20Poly1305 AEAD that was previously added in
anticipation of BIP324 using it. BIP324 was updated to instead use rekeying
wrappers around otherwise unmodified versions of the ChaCha20 stream cipher
and the ChaCha20Poly1305 AEAD as specified in RFC8439.
A memory resource similar to std::pmr::unsynchronized_pool_resource, but
optimized for node-based containers.
Co-Authored-By: Pieter Wuille <pieter@wuille.net>
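For context, a minimal standalone sketch of the std::pmr facility it is being compared to, backing a node-based container, could look like this (plain C++17, nothing Bitcoin-specific; the new resource itself is a custom implementation):

```cpp
#include <memory_resource>
#include <unordered_map>

int main()
{
    // Pool resource: node allocations are served from pooled chunks rather
    // than individual operator new calls, which is exactly what node-based
    // containers (map, unordered_map, list) benefit from.
    std::pmr::unsynchronized_pool_resource pool;
    std::pmr::unordered_map<int, int> map{&pool};
    for (int i = 0; i < 1000; ++i) map.emplace(i, i * 2);
}
```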
db929893ef Faster -reindex by initially deserializing only headers (Larry Ruane)
c72de9990a util: add CBufferedFile::SkipTo() to move ahead in the stream (Larry Ruane)
48a68908ba Add LoadExternalBlockFile() benchmark (Larry Ruane)
Pull request description:
### Background
During the first part of reindexing, `LoadExternalBlockFile()` sequentially reads raw blocks from the `blocks/blk00nnn.dat` files (rather than receiving them from peers, as with initial block download) and eventually adds all of them to the block index. When an individual block is initially read, it can't be immediately added unless all its ancestors have been added, which is rare (only about 8% of the time), because the blocks are not sorted by height. When the block can't be immediately added to the block index, its disk location is saved in a map so it can be added later. When its parent is later added to the block index, `LoadExternalBlockFile()` reads and deserializes the block from disk a second time and adds it to the block index. Most blocks (92%) get deserialized twice.
### This PR
During the initial read, it's rarely useful to deserialize the entire block; only the header is needed to determine if the block can be added to the block index immediately. This change to `LoadExternalBlockFile()` initially deserializes only a block's header, then deserializes the entire block only if it can be added immediately. This reduces reindex time on mainnet by 7 hours on a Raspberry Pi, which translates to around a 25% reduction in the first part of reindexing (adding blocks to the index), and about a 6% reduction in overall reindex time.
Summary: The performance gain comes from deserializing each block only once; only its header is deserialized twice, and the header is just 80 bytes.
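A self-contained illustration of the scanning technique (generic stand-ins only, not the actual `LoadExternalBlockFile()` code): read a small fixed-size header per record, process the body immediately if its parent is already known, otherwise remember the offset and skip the body without deserializing it.

```cpp
#include <cstdint>
#include <fstream>
#include <unordered_map>
#include <unordered_set>

struct RecordHeader { uint64_t parent_id; uint64_t body_size; }; // stand-in for the 80-byte block header

void ScanFile(std::ifstream& in,
              const std::unordered_set<uint64_t>& known_parents,
              std::unordered_map<uint64_t, std::streampos>& pending)
{
    RecordHeader h{};
    while (in.read(reinterpret_cast<char*>(&h), sizeof(h))) {
        const std::streampos body_start = in.tellg();
        if (known_parents.count(h.parent_id)) {
            // Parent already indexed: deserialize and process the full body here, once.
        } else {
            // Parent unknown: record where the body starts for a later second pass.
            pending.emplace(h.parent_id, body_start);
        }
        // Either way, jump straight to the next record; only the "pending"
        // bodies are ever read a second time.
        in.seekg(body_start + static_cast<std::streamoff>(h.body_size));
    }
}
```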
ACKs for top commit:
andrewtoth:
ACK db929893ef
achow101:
ACK db929893ef
aureleoules:
ACK db929893ef - minor changes and new benchmark since last review
theStack:
re-ACK db929893ef
stickies-v:
re-ACK db929893e
Tree-SHA512: 5a5377192c11edb5b662e18f511c9beb8f250bc88aeadf2f404c92c3232a7617bade50477ebf16c0602b9bd3b68306d3ee7615de58acfd8cae664d28bb7b0136
Goal 1:
Benchmark the transaction creation process for pre-selected-inputs only.
`m_allow_other_inputs=false` is set to prevent the wallet from including other coins automatically.
Goal 2:
Benchmark the transaction creation process for pre-selected-inputs and coin selection.
-----------------------
Benchmark Setup:
1) Generate a 5k-block chain, loading the wallet with 5k transactions of two outputs each.
2) Fetch 4 random UTXOs from the wallet's available coins and pre-select them as inputs inside CoinControl.
Benchmark (Goal 1):
Call `CreateTransaction`, providing the coin control, which has `m_allow_other_inputs=false` set, and
the manually selected coins.
Benchmark (Goal 2):
Call `CreateTransaction`, providing the coin control, which has `m_allow_other_inputs=true` set, and
the manually selected coins.
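A rough sketch of the two variants, assuming the wallet's `CCoinControl` interface (`Select()`, `m_allow_other_inputs`); `preselected_outpoints`, `recipients`, `change_pos` and the exact `CreateTransaction` signature are simplified placeholders for the benchmark's setup:

```cpp
CCoinControl coin_control;
for (const COutPoint& outpoint : preselected_outpoints) {
    coin_control.Select(outpoint); // the 4 UTXOs fetched in step 2
}

// Goal 1: only the pre-selected inputs may be spent.
coin_control.m_allow_other_inputs = false;
auto result1 = CreateTransaction(wallet, recipients, change_pos, coin_control);

// Goal 2: pre-selected inputs plus automatic coin selection for the remainder.
coin_control.m_allow_other_inputs = true;
auto result2 = CreateTransaction(wallet, recipients, change_pos, coin_control);
```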
035fa1f07a build: Remove LIBTOOL_APP_LDFLAGS for bitcoin-chainstate (Cory Fields)
3f0595095d docs: Add libbitcoinkernel_la_SOURCES explanation (Carl Dong)
94ad45deb2 ci: Build libbitcoinkernel (Carl Dong)
26b2e7ffb3 build: Extract the libbitcoinkernel library (Carl Dong)
1df44dd20c b-cs: Define G_TRANSLATION_FUN in bitcoinkernel.cpp (Carl Dong)
83a0bb7cc9 build: Separate lib_LTLIBRARIES initialization (Carl Dong)
c1e16cb31f build: Create .la library for bitcoincrypto (Carl Dong)
8bdfe057c7 build: Create .la library for leveldb (Carl Dong)
05d1525b6d build: Create .la library for crc32c (Carl Dong)
64caf94479 build: Remove vestigial LIBLEVELDB_SSE42 (Carl Dong)
1392e8e2d8 build: Don't add unrelated libs to LIBTEST_* (Carl Dong)
Pull request description:
Part of: #24303
This PR introduces a `libbitcoinkernel` static library linking in the minimal list of files necessary to use our consensus engine as-is. `bitcoin-chainstate`, introduced in #24304, will now link against `libbitcoinkernel`.
Most of the changes are related to the build system.
Please read the commit messages for more details.
ACKs for top commit:
theuni:
This may be my favorite PR ever. It's a privilege to ACK 035fa1f07a.
Tree-SHA512: b755edc3471c7c1098847e9b16ab182a6abb7582563d9da516de376a770ac7543c6fdb24238ddd4d3d2d458f905a0c0614b8667aab182aa7e6b80c1cca7090bc
- LIBLEVELDB_SSE42_INT was defined, but never referenced anywhere
- LIBLEVELDB_SSE42 is referenced, but never defined anywhere
Apparently leveldb used to have platform-specific crc32 code before it
got split off into a separate lib.
This was used to, in effect, manually emulate --start-group/--end-group.
However, we can just order the libraries correctly and avoid specifying
libraries multiple times on the link line.
Note: lld (not ld.bfd) knows how to resolve out-of-order references and
doesn't seem to need the reordering.
fafe06c379 bench: Sort bench_bench_bitcoin_SOURCES (MarcoFalke)
fa31dc9b71 bench: Add logging benchmark (MarcoFalke)
Pull request description:
Might make finding performance bottlenecks or regressions (https://github.com/bitcoin/bitcoin/pull/17218) easier.
For example, fuzzing relies on disabled logging to be as fast as possible.
ACKs for top commit:
dergoegge:
ACK fafe06c379
Tree-SHA512: dd858f3234a4dfb00bd7dec4398eb076370a4b9746aa24eecee7da86f6882398a2d086e5ab0b7c9f7321abcb135e7ffc54cc78e60d18b90379c6dba6d613b3f7
Note that with this change we are no longer including PTHREAD_* flags
when building libbitcoinconsensus.
Also note that we are including PTHREAD_LIBS in AM_PTHREAD_FLAGS.
This replaces the current benchmarking framework with nanobench [1], an
MIT-licensed single-header benchmarking library, of which I am the
author. In my opinion this has several advantages, especially on Linux:
* fast: Running all benchmarks takes ~6 seconds instead of 4m13s on
an Intel i7-8700 CPU @ 3.20GHz.
* accurate: e.g. I ran the benchmark for SipHash_32b 10 times and
calculated the coefficient of variation (standard deviation / mean):
* 0.57% CV for old benchmarking framework
* 0.20% CV for nanobench
So the benchmark results with nanobench seem to vary less than with
the old framework.
* It automatically determines runtime based on clock precision, no need
to specify number of evaluations.
* Measures instructions, cycles, branches, instructions per cycle, and
branch misses (Linux only, when performance counters are available).
* Outputs results in markdown table format.
* Warns about an unstable environment (frequency scaling, turbo, ...).
* For better profiling, it is possible to set the environment variable
NANOBENCH_ENDLESS to force endless running of a particular benchmark
without the need to recompile. This makes it easy to e.g. run "perf top"
and look at hotspots.
Here is an example copy & pasted from the terminal output:
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 2.52 | 396,529,415.94 | 0.6% | 25.42 | 8.02 | 3.169 | 0.06 | 0.0% | 0.03 | `bench/crypto_hash.cpp RIPEMD160`
| 1.87 | 535,161,444.83 | 0.3% | 21.36 | 5.95 | 3.589 | 0.06 | 0.0% | 0.02 | `bench/crypto_hash.cpp SHA1`
| 3.22 | 310,344,174.79 | 1.1% | 36.80 | 10.22 | 3.601 | 0.09 | 0.0% | 0.04 | `bench/crypto_hash.cpp SHA256`
| 2.01 | 496,375,796.23 | 0.0% | 18.72 | 6.43 | 2.911 | 0.01 | 1.0% | 0.00 | `bench/crypto_hash.cpp SHA256D64_1024`
| 7.23 | 138,263,519.35 | 0.1% | 82.66 | 23.11 | 3.577 | 1.63 | 0.1% | 0.00 | `bench/crypto_hash.cpp SHA256_32b`
| 3.04 | 328,780,166.40 | 0.3% | 35.82 | 9.69 | 3.696 | 0.03 | 0.0% | 0.03 | `bench/crypto_hash.cpp SHA512`
[1] https://github.com/martinus/nanobench
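For reference, a minimal standalone nanobench example (using the library directly rather than this repo's bench wrapper) that produces a table like the one above:

```cpp
#include <nanobench.h> // single header from https://github.com/martinus/nanobench

#include <cstdint>

int main()
{
    uint64_t x = 1;
    ankerl::nanobench::Bench().run("toy hash", [&] {
        // Cheap stand-in workload; any code to be measured goes here.
        x = x * 0x9E3779B97F4A7C15ULL + 0xBF58476D1CE4E5B9ULL;
        ankerl::nanobench::doNotOptimizeAway(x);
    });
}
```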
* Adds support for asymptotes
This adds support for calculating the asymptotic complexity of a benchmark.
This is similar to #17375, but currently only one asymptote is
supported, and I have added support in the benchmark `ComplexMemPool`
as an example.
Usage is e.g. like this:
```
./bench_bitcoin -filter=ComplexMemPool -asymptote=25,50,100,200,400,600,800
```
This runs the benchmark `ComplexMemPool` several times but with
different complexityN settings. The benchmark can extract that number
and use it accordingly. Here, it's used for `childTxs`. The output is
this:
| complexityN | ns/op | op/s | err% | ins/op | cyc/op | IPC | total | benchmark
|------------:|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|----------:|:----------
| 25 | 1,064,241.00 | 939.64 | 1.4% | 3,960,279.00 | 2,829,708.00 | 1.400 | 0.01 | `ComplexMemPool`
| 50 | 1,579,530.00 | 633.10 | 1.0% | 6,231,810.00 | 4,412,674.00 | 1.412 | 0.02 | `ComplexMemPool`
| 100 | 4,022,774.00 | 248.58 | 0.6% | 16,544,406.00 | 11,889,535.00 | 1.392 | 0.04 | `ComplexMemPool`
| 200 | 15,390,986.00 | 64.97 | 0.2% | 63,904,254.00 | 47,731,705.00 | 1.339 | 0.17 | `ComplexMemPool`
| 400 | 69,394,711.00 | 14.41 | 0.1% | 272,602,461.00 | 219,014,691.00 | 1.245 | 0.76 | `ComplexMemPool`
| 600 | 168,977,165.00 | 5.92 | 0.1% | 639,108,082.00 | 535,316,887.00 | 1.194 | 1.86 | `ComplexMemPool`
| 800 | 310,109,077.00 | 3.22 | 0.1% |1,149,134,246.00 | 984,620,812.00 | 1.167 | 3.41 | `ComplexMemPool`
| coefficient | err% | complexity
|--------------:|-------:|------------
| 4.78486e-07 | 4.5% | O(n^2)
| 6.38557e-10 | 21.7% | O(n^3)
| 3.42338e-05 | 38.0% | O(n log n)
| 0.000313914 | 46.9% | O(n)
| 0.0129823 | 114.4% | O(log n)
| 0.0815055 | 133.8% | O(1)
The best-fitting curve is O(n^2), so the algorithm seems to scale
quadratically with `childTxs` in the range 25 to 800.
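Underneath, the `-asymptote` option relies on nanobench's `complexityN()`/`complexityBigO()` facility; a standalone sketch of that mechanism (not the bench_bitcoin wrapper itself):

```cpp
#include <nanobench.h>

#include <iostream>
#include <set>

int main()
{
    ankerl::nanobench::Bench bench;
    for (int n : {25, 50, 100, 200, 400}) {
        std::set<int> s;
        bench.complexityN(n).run("set insert", [&] {
            for (int i = 0; i < n; ++i) s.insert(i);
            s.clear();
        });
    }
    // Fits the measurements against O(1), O(n), O(n log n), ... and prints a
    // coefficient/err%/complexity table like the one above.
    std::cout << bench.complexityBigO() << std::endl;
}
```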
b0c774b48a Add new mempool benchmarks for a complex pool (Jeremy Rubin)
Pull request description:
This PR is related to #17268.
It adds a mempool stress test which builds a really big, complicated transaction graph and then, similar to the mempool_eviction test, trims the mempool size.
The test setup is to make 100 original transactions with Rand(10)+2 outputs each.
Then, 800 times:
we create a new transaction with Rand(10) + 1 parents that are randomly sampled from all existing transactions (with unspent outputs). From each such parent, we then select Rand(remaining outputs) + 1 outputs 50% of the time, or a single output the other 50% of the time (a standalone sketch of this construction appears below).
Then, we trim the size to 3/4. Then we trim it to just a single transaction.
This creates, hopefully, a big bundle of transactions with lots of complex structure, that should really put a strain on the mempool graph algorithms.
This ends up testing both the descendant and ancestor tracking.
I don't love that the test is "unstable". That is, to compare one run of this test against another, you really can't modify any of the internal state, because doing so changes the order of invocations of the deterministic randomness. However, it certainly suffices for comparing branches.
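A standalone sketch of the graph construction described above (plain integers and std::mt19937 stand in for real transactions and the codebase's deterministic RNG):

```cpp
#include <random>
#include <vector>

struct Tx { std::vector<int> unspent_outputs; };

int main()
{
    std::mt19937 rng{42};
    auto Rand = [&](size_t n) { return rng() % n; };

    // 100 original transactions with Rand(10)+2 outputs each.
    std::vector<Tx> txs;
    for (int i = 0; i < 100; ++i) txs.push_back({std::vector<int>(Rand(10) + 2)});

    // 800 descendants, each spending from Rand(10)+1 randomly chosen parents.
    for (int i = 0; i < 800; ++i) {
        const size_t n_parents = Rand(10) + 1;
        for (size_t p = 0; p < n_parents; ++p) {
            Tx& parent = txs[Rand(txs.size())];
            if (parent.unspent_outputs.empty()) continue; // only parents with unspent outputs qualify
            // 50%: spend Rand(remaining)+1 of its outputs; 50%: spend exactly one.
            const size_t spend = (Rand(2) == 0) ? Rand(parent.unspent_outputs.size()) + 1 : 1;
            parent.unspent_outputs.resize(parent.unspent_outputs.size() - spend);
        }
        // Give the new transaction outputs of its own so it can become a parent
        // later (the exact count is not specified above; two is an assumption).
        txs.push_back({std::vector<int>(2)});
    }
}
```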
Top commit has no ACKs.
Tree-SHA512: cabe96b849b9885878e20eec558915e921d49e6ed1e4b011b22ca191b4c99aa28930a8b963784c9adf78cc8b034a655513f7a0da865e280a1214ae15ebb1d574
prototypes used in src/test/script_tests.cpp:
- CMutableTransaction BuildCreditingTransaction(const CScript& scriptPubKey, int nValue = 0);
- CMutableTransaction BuildSpendingTransaction(const CScript& scriptSig, const CScriptWitness& scriptWitness, const CTransaction& txCredit);
prototypes used in bench/verify_script.cpp:
- CMutableTransaction BuildCreditingTransaction(const CScript& scriptPubKey);
- CMutableTransaction BuildSpendingTransaction(const CScript& scriptSig, const CMutableTransaction& txCredit);
The more generic versions from the script tests are moved into a new file pair
transaction_utils.cpp/h and the calls are adapted accordingly in the
verify_script benchmark (passing the nValue of 1 explicitly to
BuildCreditingTransaction(), passing an empty scriptWitness explicitly, and
converting the txCredit parameter to CTransaction in BuildSpendingTransaction()).
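A sketch of the adapted calls in bench/verify_script.cpp (script setup elided; `scriptPubKey` and `scriptSig` stand for the benchmark's existing locals), matching the more generic prototypes listed above:

```cpp
CMutableTransaction txCredit =
    BuildCreditingTransaction(scriptPubKey, /*nValue=*/1);        // nValue now passed explicitly
CMutableTransaction txSpend =
    BuildSpendingTransaction(scriptSig, CScriptWitness{},         // empty witness passed explicitly
                             CTransaction(txCredit));             // txCredit converted to CTransaction
```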
bb326add9f Add ChaCha20Poly1305@Bitcoin AEAD benchmark (Jonas Schnelli)
99aea045d6 Add ChaCha20Poly1305@Bitcoin tests (Jonas Schnelli)
af5d1b5f4a Add ChaCha20Poly1305@Bitcoin AEAD implementation (Jonas Schnelli)
Pull request description:
This adds a new AEAD (authenticated encryption with additional data) construct optimised for small messages (like those used in Bitcoin's p2p network).
Includes: #15519, #15512 (please review those first).
The construct is specified here:
https://gist.github.com/jonasschnelli/c530ea8421b8d0e80c51486325587c52#ChaCha20Poly1305Bitcoin_Cipher_Suite
This is intended for use in v2 peer-to-peer messages.
ACKs for top commit:
laanwj:
code review ACK bb326add9f
Tree-SHA512: 15bcb86c510fce7abb7a73536ff2ae89893b24646bf108c6cf18f064d672dbbbea8b1dd0868849fdac0c6854e498f1345d01dab56d1c92031afd728302234686
2dfe27517 Add ChaCha20 bench (Jonas Schnelli)
2bc2b8b49 Add ChaCha20 encryption option (XOR) (Jonas Schnelli)
Pull request description:
The current ChaCha20 implementation does not support message encryption (it can only output the keystream, which is sufficient for the RNG).
This PR adds the actual XORing of the `plaintext` with the `keystream` in order to return the desired `ciphertext`.
Required for v2 message transport protocol.
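A generic illustration of the stream-cipher step being added (not the actual ChaCha20 class): the cipher produces a keystream, and encryption is simply plaintext XOR keystream; applying the same operation again decrypts.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> XorKeystream(const std::vector<uint8_t>& input,
                                  const std::vector<uint8_t>& keystream)
{
    std::vector<uint8_t> output(input.size());
    for (size_t i = 0; i < input.size(); ++i) {
        output[i] = input[i] ^ keystream[i]; // byte-wise XOR with the keystream
    }
    return output;
}
```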
ACKs for commit 2dfe27:
jnewbery:
Looks good. utACK 2dfe275171.
jnewbery:
utACK 2dfe275171
sipa:
utACK 2dfe275171
ryanofsky:
utACK 2dfe275171. Changes since last review are just renaming the Crypt method, adding comments, and simplifying the benchmark.
Tree-SHA512: 84bb234da2ca9fdc44bc29a786d9dd215520f81245270c1aef801ef66b6091b7793e2eb38ad6dbb084925245065c5dce9e5582f2d0fa220ab3e182d43412d5b5