Merge 3170e2c162 into c5e44a0435

prevector: store P2WSH/P2TR/P2PK scripts inline
The current `prevector` size of 28 bytes (chosen to fill the `sizeof(CScript)` aligned size) was introduced in 2015 (https://github.com/bitcoin/bitcoin/pull/6914) before SegWit and TapRoot. However, the increasingly common `P2WSH` and `P2TR` scripts are both 34 bytes, and are forced to use heap (re)allocation rather than efficient inline storage. The core trade-off of this change is to eliminate heap allocations for common 34-36 byte scripts at the cost of increasing the base memory footprint of all `CScript` objects by 8 bytes (while still respecting peak memory usage defined by `-dbcache`). Increasing the `prevector` size allows these scripts to be stored inline, avoiding extra heap allocations, reducing potential memory fragmentation, and improving performance during cache flushes. Massif analysis confirms a lower stable memory usage after flushing, suggesting the elimination of heap allocations outweighs the larger base size for common workloads. Due to memory alignment, increasing the `prevector` size to 36 bytes doesn't change the overall `sizeof(CScript)` compared to an increase to 34 bytes, allowing us to include `P2PK` scripts as well at no additional memory cost. Performance benchmarks for AssumeUTXO load and flush show: * Small dbcache (450MB): ~1-3% performance improvement (despite more frequent flushes) * Large dbcache (4500MB): ~6-8% performance improvement due to fewer heap allocations (and basically the number of flushes) * Very large dbcache (4500MB): ~5-6% performance improvement due to fewer heap allocations (and memory limit not being reached, so there's no memory penalty) Full IBD and reindex-chainstate with larger `dbcache` values also show an overall ~2-3% speedup. Co-authored-by: Ava Chow <github@achow101.com> Co-authored-by: Andrew Toth <andrewstoth@gmail.com>
2025-04-29 14:59:39 -04:00 · 2025-04-29 09:44:23 +00:00 · 2025-04-29 11:40:56 +02:00 · 2025-04-29 11:40:54 +02:00 · 2025-04-29 11:32:47 +02:00 · 2025-04-29 09:59:42 +01:00
6 changed files with 113 additions and 25 deletions
--- a/src/bench/checkqueue.cpp
+++ b/src/bench/checkqueue.cpp
@ -8,6 +8,7 @@
 #include <key.h>
 #include <prevector.h>
 #include <random.h>
+#include <script/script.h>

 #include <cstddef>
 #include <cstdint>
@ -16,7 +17,6 @@

 static const size_t BATCHES = 101;
 static const size_t BATCH_SIZE = 30;
-static const int PREVECTOR_SIZE = 28;
 static const unsigned int QUEUE_BATCH_SIZE = 128;

 // This Benchmark tests the CheckQueue with a slightly realistic workload,
--- a/src/bench/prevector.cpp
+++ b/src/bench/prevector.cpp
@ -5,17 +5,20 @@
 #include <prevector.h>

 #include <bench/bench.h>
+#include <script/script.h>
 #include <serialize.h>
 #include <streams.h>

 #include <type_traits>
 #include <vector>

-struct nontrivial_t {
+struct nontrivial_t
+{
    int x{-1};
    nontrivial_t() = default;
    SERIALIZE_METHODS(nontrivial_t, obj) { READWRITE(obj.x); }
 };
+
 static_assert(!std::is_trivially_default_constructible_v<nontrivial_t>,
              "expected nontrivial_t to not be trivially constructible");

@ -27,22 +30,22 @@ template <typename T>
 static void PrevectorDestructor(benchmark::Bench& bench)
 {
    bench.batch(2).run([&] {
-        prevector<28, T> t0;
-        prevector<28, T> t1;
-        t0.resize(28);
-        t1.resize(29);
+        prevector<PREVECTOR_SIZE, T> t0;
+        prevector<PREVECTOR_SIZE, T> t1;
+        t0.resize(PREVECTOR_SIZE);
+        t1.resize(PREVECTOR_SIZE + 1);
    });
 }

 template <typename T>
 static void PrevectorClear(benchmark::Bench& bench)
 {
-    prevector<28, T> t0;
-    prevector<28, T> t1;
+    prevector<PREVECTOR_SIZE, T> t0;
+    prevector<PREVECTOR_SIZE, T> t1;
    bench.batch(2).run([&] {
-        t0.resize(28);
+        t0.resize(PREVECTOR_SIZE);
        t0.clear();
-        t1.resize(29);
+        t1.resize(PREVECTOR_SIZE + 1);
        t1.clear();
    });
 }
@ -50,12 +53,12 @@ static void PrevectorClear(benchmark::Bench& bench)
 template <typename T>
 static void PrevectorResize(benchmark::Bench& bench)
 {
-    prevector<28, T> t0;
-    prevector<28, T> t1;
+    prevector<PREVECTOR_SIZE, T> t0;
+    prevector<PREVECTOR_SIZE, T> t1;
    bench.batch(4).run([&] {
-        t0.resize(28);
+        t0.resize(PREVECTOR_SIZE);
        t0.resize(0);
-        t1.resize(29);
+        t1.resize(PREVECTOR_SIZE + 1);
        t1.resize(0);
    });
 }
@ -64,8 +67,8 @@ template <typename T>
 static void PrevectorDeserialize(benchmark::Bench& bench)
 {
    DataStream s0{};
-    prevector<28, T> t0;
-    t0.resize(28);
+    prevector<PREVECTOR_SIZE, T> t0;
+    t0.resize(PREVECTOR_SIZE);
    for (auto x = 0; x < 900; ++x) {
        s0 << t0;
    }
@ -74,7 +77,7 @@ static void PrevectorDeserialize(benchmark::Bench& bench)
        s0 << t0;
    }
    bench.batch(1000).run([&] {
-        prevector<28, T> t1;
+        prevector<PREVECTOR_SIZE, T> t1;
        for (auto x = 0; x < 1000; ++x) {
            s0 >> t1;
        }
@ -86,7 +89,7 @@ template <typename T>
 static void PrevectorFillVectorDirect(benchmark::Bench& bench)
 {
    bench.run([&] {
-        std::vector<prevector<28, T>> vec;
+        std::vector<prevector<PREVECTOR_SIZE, T>> vec;
        vec.reserve(260);
        for (size_t i = 0; i < 260; ++i) {
            vec.emplace_back();
@ -99,11 +102,11 @@ template <typename T>
 static void PrevectorFillVectorIndirect(benchmark::Bench& bench)
 {
    bench.run([&] {
-        std::vector<prevector<28, T>> vec;
+        std::vector<prevector<PREVECTOR_SIZE, T>> vec;
        vec.reserve(260);
        for (size_t i = 0; i < 260; ++i) {
            // force allocation
-            vec.emplace_back(29, T{});
+            vec.emplace_back(PREVECTOR_SIZE + 1, T{});
        }
    });
 }
--- a/src/script/script.h
+++ b/src/script/script.h
@ -406,7 +406,8 @@ private:
 * Tests in October 2015 showed use of this reduced dbcache memory usage by 23%
 *  and made an initial sync 13% faster.
 */
-typedef prevector<28, unsigned char> CScriptBase;
+static constexpr unsigned int PREVECTOR_SIZE{36};
+typedef prevector<PREVECTOR_SIZE, unsigned char> CScriptBase;

 bool GetScriptOp(CScriptBase::const_iterator& pc, CScriptBase::const_iterator end, opcodetype& opcodeRet, std::vector<unsigned char>* pvchRet);

--- a/src/test/script_tests.cpp
+++ b/src/test/script_tests.cpp
@ -1158,6 +1158,91 @@ BOOST_AUTO_TEST_CASE(script_CHECKMULTISIG23)
    BOOST_CHECK_MESSAGE(err == SCRIPT_ERR_INVALID_STACK_OPERATION, ScriptErrorString(err));
 }

+BOOST_AUTO_TEST_CASE(script_size_and_capacity_test)
+{
+    BOOST_CHECK_EQUAL(sizeof(prevector<34, uint8_t>), sizeof(prevector<PREVECTOR_SIZE, uint8_t>));
+    BOOST_CHECK_EQUAL(sizeof(CScriptBase), 40);
+    BOOST_CHECK_EQUAL(sizeof(CScript), 40);
+    BOOST_CHECK_EQUAL(sizeof(CTxOut), 48);
+
+    CKey dummyKey;
+    dummyKey.MakeNewKey(true);
+
+    std::vector<std::vector<uint8_t>> dummyVSolutions;
+
+    // Small OP_RETURN has direct allocation
+    {
+        const auto scriptSmallOpReturn{CScript() << OP_RETURN << std::vector<uint8_t>(10, 0xaa)};
+        BOOST_CHECK_EQUAL(Solver(scriptSmallOpReturn, dummyVSolutions), TxoutType::NULL_DATA);
+        BOOST_CHECK_EQUAL(scriptSmallOpReturn.size(), 12);
+        BOOST_CHECK_EQUAL(scriptSmallOpReturn.capacity(), PREVECTOR_SIZE);
+        BOOST_CHECK_EQUAL(scriptSmallOpReturn.allocated_memory(), 0);
+    }
+
+    // P2WPKH has direct allocation
+    {
+        const auto scriptP2WPKH{GetScriptForDestination(WitnessV0KeyHash{PKHash{CKeyID{CPubKey{dummyKey.GetPubKey()}.GetID()}}})};
+        BOOST_CHECK_EQUAL(Solver(scriptP2WPKH, dummyVSolutions), TxoutType::WITNESS_V0_KEYHASH);
+        BOOST_CHECK_EQUAL(scriptP2WPKH.size(), 22);
+        BOOST_CHECK_EQUAL(scriptP2WPKH.capacity(), PREVECTOR_SIZE);
+        BOOST_CHECK_EQUAL(scriptP2WPKH.allocated_memory(), 0);
+    }
+
+    // P2SH has direct allocation
+    {
+        const auto scriptP2SH{GetScriptForDestination(ScriptHash{CScript{} << OP_TRUE})};
+        BOOST_CHECK(scriptP2SH.IsPayToScriptHash());
+        BOOST_CHECK_EQUAL(scriptP2SH.size(), 23);
+        BOOST_CHECK_EQUAL(scriptP2SH.capacity(), PREVECTOR_SIZE);
+        BOOST_CHECK_EQUAL(scriptP2SH.allocated_memory(), 0);
+    }
+
+    // P2PKH has direct allocation
+    {
+        const auto scriptP2PKH{GetScriptForDestination(PKHash{CKeyID{CPubKey{dummyKey.GetPubKey()}.GetID()}})};
+        BOOST_CHECK_EQUAL(Solver(scriptP2PKH, dummyVSolutions), TxoutType::PUBKEYHASH);
+        BOOST_CHECK_EQUAL(scriptP2PKH.size(), 25);
+        BOOST_CHECK_EQUAL(scriptP2PKH.capacity(), PREVECTOR_SIZE);
+        BOOST_CHECK_EQUAL(scriptP2PKH.allocated_memory(), 0);
+    }
+
+    // P2WSH has direct allocation
+    {
+        const auto scriptP2WSH{GetScriptForDestination(WitnessV0ScriptHash{CScript{} << OP_TRUE})};
+        BOOST_CHECK(scriptP2WSH.IsPayToWitnessScriptHash());
+        BOOST_CHECK_EQUAL(scriptP2WSH.size(), 34);
+        BOOST_CHECK_EQUAL(scriptP2WSH.capacity(), PREVECTOR_SIZE);
+        BOOST_CHECK_EQUAL(scriptP2WSH.allocated_memory(), 0);
+    }
+
+    // P2TR has direct allocation
+    {
+        const auto scriptTaproot{GetScriptForDestination(WitnessV1Taproot{XOnlyPubKey{CPubKey{dummyKey.GetPubKey()}}})};
+        BOOST_CHECK_EQUAL(Solver(scriptTaproot, dummyVSolutions), TxoutType::WITNESS_V1_TAPROOT);
+        BOOST_CHECK_EQUAL(scriptTaproot.size(), 34);
+        BOOST_CHECK_EQUAL(scriptTaproot.capacity(), PREVECTOR_SIZE);
+        BOOST_CHECK_EQUAL(scriptTaproot.allocated_memory(), 0);
+    }
+
+    // P2PK has direct allocation
+    {
+        const auto scriptPubKey{GetScriptForRawPubKey(CPubKey{dummyKey.GetPubKey()})};
+        BOOST_CHECK_EQUAL(Solver(scriptPubKey, dummyVSolutions), TxoutType::PUBKEY);
+        BOOST_CHECK_EQUAL(scriptPubKey.size(), 35);
+        BOOST_CHECK_EQUAL(scriptPubKey.capacity(), PREVECTOR_SIZE);
+        BOOST_CHECK_EQUAL(scriptPubKey.allocated_memory(), 0);
+    }
+
+    // MULTISIG needs extra allocation
+    {
+        const auto scriptMultisig{GetScriptForMultisig(1, std::vector{2, CPubKey{dummyKey.GetPubKey()}})};
+        BOOST_CHECK_EQUAL(Solver(scriptMultisig, dummyVSolutions), TxoutType::MULTISIG);
+        BOOST_CHECK_EQUAL(scriptMultisig.size(), 71);
+        BOOST_CHECK_EQUAL(scriptMultisig.capacity(), 103);
+        BOOST_CHECK_EQUAL(scriptMultisig.allocated_memory(), 103);
+    }
+}
+
 /* Wrapper around ProduceSignature to combine two scriptsigs */
 SignatureData CombineSignatures(const CTxOut& txout, const CMutableTransaction& tx, const SignatureData& scriptSig1, const SignatureData& scriptSig2)
 {
--- a/src/test/validation_flush_tests.cpp
+++ b/src/test/validation_flush_tests.cpp
@ -26,9 +26,8 @@ BOOST_AUTO_TEST_CASE(getcoinscachesizestate)
    LOCK(::cs_main);
    auto& view = chainstate.CoinsTip();

-    // The number of bytes consumed by coin's heap data, i.e. CScript
-    // (prevector<28, unsigned char>) when assigned 56 bytes of data per above.
-    //
+    // The number of bytes consumed by coin's heap data, i.e.
+    // CScript (prevector<36, unsigned char>) when assigned 56 bytes of data per above.
    // See also: Coin::DynamicMemoryUsage().
    constexpr unsigned int COIN_SIZE = is_64_bit ? 80 : 64;

--- a/test/functional/wallet_backwards_compatibility.py
+++ b/test/functional/wallet_backwards_compatibility.py
@ -87,7 +87,7 @@ class BackwardsCompatibilityTest(BitcoinTestFramework):
        # 0.21.x and 22.x would both produce bad derivation paths when topping up an inactive hd chain
        # Make sure that this is being automatically cleaned up by migration
        node_master = self.nodes[1]
-        node_v22 = self.nodes[self.num_nodes - 5]
+        node_v22 = self.nodes[self.num_nodes - 3]
        wallet_name = "bad_deriv_path"
        node_v22.createwallet(wallet_name=wallet_name, descriptors=False)
        bad_deriv_wallet = node_v22.get_wallet_rpc(wallet_name)
Author	SHA1	Message	Date
l0rinc	bcb2297811	Merge `3170e2c162` into `c5e44a0435`	2025-04-29 09:44:23 +00:00
Lőrinc	3170e2c162	prevector: store P2WSH/P2TR/P2PK scripts inline The current `prevector` size of 28 bytes (chosen to fill the `sizeof(CScript)` aligned size) was introduced in 2015 (https://github.com/bitcoin/bitcoin/pull/6914) before SegWit and TapRoot. However, the increasingly common `P2WSH` and `P2TR` scripts are both 34 bytes, and are forced to use heap (re)allocation rather than efficient inline storage. The core trade-off of this change is to eliminate heap allocations for common 34-36 byte scripts at the cost of increasing the base memory footprint of all `CScript` objects by 8 bytes (while still respecting peak memory usage defined by `-dbcache`). Increasing the `prevector` size allows these scripts to be stored inline, avoiding extra heap allocations, reducing potential memory fragmentation, and improving performance during cache flushes. Massif analysis confirms a lower stable memory usage after flushing, suggesting the elimination of heap allocations outweighs the larger base size for common workloads. Due to memory alignment, increasing the `prevector` size to 36 bytes doesn't change the overall `sizeof(CScript)` compared to an increase to 34 bytes, allowing us to include `P2PK` scripts as well at no additional memory cost. Performance benchmarks for AssumeUTXO load and flush show: * Small dbcache (450MB): ~1-3% performance improvement (despite more frequent flushes) * Large dbcache (4500MB): ~6-8% performance improvement due to fewer heap allocations (and basically the number of flushes) * Very large dbcache (4500MB): ~5-6% performance improvement due to fewer heap allocations (and memory limit not being reached, so there's no memory penalty) Full IBD and reindex-chainstate with larger `dbcache` values also show an overall ~2-3% speedup. Co-authored-by: Ava Chow <github@achow101.com> Co-authored-by: Andrew Toth <andrewstoth@gmail.com>	2025-04-29 11:40:56 +02:00
Lőrinc	ecc6c07e58	test: assert CScript allocation characteristics Verifies that script types are correctly allocated using prevector's direct or indirect storage based on their size: Direct allocated script types (size ≤ 28 bytes): * OP_RETURN (small) * P2WPKH * P2SH * P2PKH Indirect allocated script types (size > 28 bytes): * P2WSH * P2TR * P2PK * MULTISIG (small) This test provides a baseline for verifying changes to prevector's inline capacity.	2025-04-29 11:40:54 +02:00
Lőrinc	a5f2ffa951	refactor: extract `PREVECTOR_SIZE` to header constant Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org> Co-authored-by: kuegi <29012906+kuegi@users.noreply.github.com>	2025-04-29 11:32:47 +02:00
merge-script	c5e44a0435	Merge bitcoin/bitcoin#32369 : test: Use the correct node for doubled keypath test Some checks are pending CI / macOS 14 native, arm64, fuzz (push) Waiting to run Details CI / Windows native, VS 2022 (push) Waiting to run Details CI / Windows native, fuzz, VS 2022 (push) Waiting to run Details CI / Linux->Windows cross, no tests (push) Waiting to run Details CI / Windows, test cross-built (push) Blocked by required conditions Details CI / ASan + LSan + UBSan + integer, no depends, USDT (push) Waiting to run Details CI / test each commit (push) Waiting to run Details CI / macOS 14 native, arm64, no depends, sqlite only, gui (push) Waiting to run Details `32d55e28af` test: Use the correct node for doubled keypath test (Ava Chow) Pull request description: #29124 had a silent merge conflict with #32350 which resulted in it using the wrong node. Fix the test to use the correct v22 node. ACKs for top commit: maflcko: lgtm ACK `32d55e28af` rkrux: ACK `32d55e28af` BrandonOdiwuor: Code Review ACK `32d55e28af` Tree-SHA512: 1e0231985beb382b16e1d608c874750423d0502388db0c8ad450b22d17f9d96f5e16a6b44948ebda5efc750f62b60d0de8dd20131f449427426a36caf374af92	2025-04-29 09:59:42 +01:00
Ava Chow	32d55e28af	test: Use the correct node for doubled keypath test	2025-04-28 14:44:17 -07:00