Merge bitcoin/bitcoin#31551 : [IBD] batch block reads/writes during AutoFile
serialization
...
8d801e3efb
optimization: bulk serialization writes in `WriteBlockUndo` and `WriteBlock` (Lőrinc)
520965e293
optimization: bulk serialization reads in `UndoRead`, `ReadBlock` (Lőrinc)
056cb3c0d2
refactor: clear up blockstorage/streams in preparation for optimization (Lőrinc)
67fcc64802
log: unify error messages for (read/write)[undo]block (Lőrinc)
a4de160492
scripted-diff: shorten BLOCK_SERIALIZATION_HEADER_SIZE constant (Lőrinc)
6640dd52c9
Narrow scope of undofile write to avoid possible resource management issue (Lőrinc)
3197155f91
refactor: collect block read operations into try block (Lőrinc)
c77e3107b8
refactor: rename leftover WriteBlockBench (Lőrinc)
Pull request description:
This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](https://github.com/bitcoin/bitcoin/pull/32043 )
### Summary
We can serialize the blocks and undos to any `Stream` which implements the appropriate read/write methods.
`AutoFile` is one of these, writing the results "directly" to disk (through the OS file cache). Batching these in memory first and reading/writing these to disk is measurably faster (likely because of fewer native fread calls or less locking, as [observed](https://github.com/bitcoin/bitcoin/pull/28226#issuecomment-1666842501 ) by Martinus in a similar change).
### Unlocking new optimization opportunities
Buffered writes will also enable batched obfuscation calculations (implemented in https://github.com/bitcoin/bitcoin/pull/31144 ) - especially since currently we need to copy the write input's std::span to do the obfuscation on it, and batching enables doing the operations on the internal buffer directly.
### Measurements (micro benchmarks, full IBDs and reindexes)
Microbenchmarks for `[Read|Write]BlockBench` show a ~**30%**/**168%** speedup with `macOS/Clang`, and ~**19%**/**24%** with `Linux/GCC` (the follow-up XOR batching improves these further):
<details>
<summary>macOS Sequoia - Clang 19.1.7</summary>
> Before:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,271,441.67 | 440.25 | 0.1% | 11.00 | `ReadBlockBench`
| 5,149,564.31 | 194.19 | 0.8% | 10.95 | `WriteBlockBench`
> After:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1,738,683.04 | 575.15 | 0.2% | 11.04 | `ReadBlockBench`
| 3,052,658.88 | 327.58 | 1.0% | 10.91 | `WriteBlockBench`
</details>
<details>
<summary>Ubuntu 24 - GNU 13.3.0</summary>
> Before:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 6,895,987.11 | 145.01 | 0.0% | 71,055,269.86 | 23,977,374.37 | 2.963 | 5,074,828.78 | 0.4% | 22.00 | `ReadBlockBench`
| 5,152,973.58 | 194.06 | 2.2% | 19,350,886.41 | 8,784,539.75 | 2.203 | 3,079,335.21 | 0.4% | 23.18 | `WriteBlockBench`
> After:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 5,771,882.71 | 173.25 | 0.0% | 65,741,889.82 | 20,453,232.33 | 3.214 | 3,971,321.75 | 0.3% | 22.01 | `ReadBlockBench`
| 4,145,681.13 | 241.21 | 4.0% | 15,337,596.85 | 5,732,186.47 | 2.676 | 2,239,662.64 | 0.1% | 23.94 | `WriteBlockBench`
</details>
2 full IBD runs against master (compiled with GCC where the gains seem more modest) for **888888** blocks (seeded from real nodes) indicates a ~**7%** total speedup.
<details>
<summary>Details</summary>
```bash
COMMITS="d2b72b13699cf460ffbcb1028bcf5f3b07d3b73a 652b4e3de5c5e09fb812abe265f4a8946fa96b54"; \
STOP_HEIGHT=888888; DBCACHE=1000; \
C_COMPILER=gcc; CXX_COMPILER=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
hyperfine \
--sort 'command' \
--runs 2 \
--export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$C_COMPILER.json" \
--parameter-list COMMIT ${COMMITS// /,} \
--prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && \
cmake --build build -j$(nproc) --target bitcoind && \
./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
--cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
"COMPILER=$C_COMPILER COMMIT=${COMMIT:0:10} ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
d2b72b1369
refactor: rename leftover WriteBlockBench
652b4e3de5
optimization: Bulk serialization writes in `WriteBlockUndo` and `WriteBlock`
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = d2b72b1369
)
Time (mean ± σ): 41528.104 s ± 354.003 s [User: 44324.407 s, System: 3074.829 s]
Range (min … max): 41277.786 s … 41778.421 s 2 runs
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 652b4e3de5
)
Time (mean ± σ): 38771.457 s ± 441.941 s [User: 41930.651 s, System: 3222.664 s]
Range (min … max): 38458.957 s … 39083.957 s 2 runs
Relative speed comparison
1.07 ± 0.02 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = d2b72b1369
)
1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 652b4e3de5
)
```
</details>
ACKs for top commit:
maflcko:
re-ACK 8d801e3efb
🐦
achow101:
ACK 8d801e3efb
ryanofsky:
Code review ACK 8d801e3efb
. Most notable change is switching from BufferedReader to ReadRawBlock for block reads, which makes sense, and there are also various cleanups in blockstorage and test code.
hodlinator:
re-ACK 8d801e3efb
Tree-SHA512: 24e1dee653b927b760c0ba3c69d1aba15fa5d9c4536ad11cfc2d70196ae16b9228ecc3056eef70923364257d72dc929882e73e69c6c426e28139d31299d08adc