dd065dae9f refactor: Make mapBlocksUnknownParent local, and rename it (Hennadii Stepanov)
Pull request description:
This PR is a second attempt at #19594. This PR has two motivations:
- Improve code hygiene by eliminating a global variable, `mapBlocksUnknownParent`
- Fix fuzz test OOM when running too long ([see #19594 comment](https://github.com/bitcoin/bitcoin/pull/19594#issuecomment-958801638))
A minor added advantage is to release `mapBlocksUnknownParent` memory when the reindexing phase is done. The current situation is somewhat similar to a memory leak because this map exists unused for the remaining lifetime of the process. It's true that this map should be empty of data elements after use, but its internal metadata (indexing structures, etc.) can have non-trivial size because there can be many thousands of simultaneous elements in this map.
This PR helps our efforts to reduce the use of global variables. This variable isn't just global, it's hidden inside a function (it looks like a local variable but has the `static` attribute).
This global variable exists because the `-reindex` processing code calls `LoadExternalBlockFile()` multiple times (once for each block file), but that function must preserve some state between calls (the `mapBlocksUnknownParent` map). This PR fixes this by allocating this map as a local variable in the caller's scope and passing it in on each call. When reindexing completes, the map goes out of scope and is deallocated.
I tested this manually by reindexing on mainnet and signet. Also, the existing `feature_reindex.py` functional test passes.
ACKs for top commit:
mzumsande:
re-ACK dd065dae9f
theStack:
re-ACK dd065dae9f
shaavan:
reACK dd065dae9f
Tree-SHA512: 9cd20e44d2fa1096dd405bc107bc065ea8f904f5b3f63080341b08d8cf57b790df565f58815c2f331377d044d5306708b4bf6bdfc5ef8d0ed85d8e97d744732c
3a61fc56a0 refactor: move CBlockIndex#ToString() from header to implementation (Jon Atack)
57865eb512 CDiskBlockIndex: rename GetBlockHash() to ConstructBlockHash() (Jon Atack)
99e8ec8721 CDiskBlockIndex: remove unused ToString() class member (Jon Atack)
14aeece462 CBlockIndex: ensure phashBlock is not nullptr before dereferencing (Jon Atack)
Pull request description:
Fix a few design issues, potential footguns and inconsistent behavior in the CBlockIndex and CDiskBlockIndex classes.
- Ensure phashBlock in `CBlockIndex#GetBlockHash()` is not nullptr before dereferencing and remove a now-redundant assert preceding a GetBlockHash() caller. This protects against UB here, and in case of failure (which would indicate a consensus bug), the debug log will print `bitcoind: chain.h:265: uint256 CBlockIndex::GetBlockHash() const: Assertion 'phashBlock != nullptr' failed. Aborted` instead of `Segmentation fault`.
- Remove the unused `CDiskBlockIndex#ToString()` class member, and mark the inherited `CBlockIndex#ToString()` public interface member as deleted to disallow calling it in the derived CDiskBlockIndex class.
- Rename the `CDiskBlockIndex GetBlockHash()` class member to `ConstructBlockHash()`, which also makes sense as they perform different operations to return a blockhash, and mark the inherited `CBlockIndex#GetBlockHash()` public interface member as deleted to disallow calling it in the derived CDiskBlockIndex class.
- Move `CBlockIndex#ToString()` from header to implementation, which also allows dropping `tinyformat.h` from the header file.
Rationale and discussion regarding the CDiskBlockIndex changes:
Here is a failing test on master that demonstrates the inconsistent behavior of the current design: calling the same inherited public interface functions on the same CDiskBlockIndex object should yield identical behavior, but does not.
```diff
diff --git a/src/test/validation_chainstatemanager_tests.cpp b/src/test/validation_chainstatemanager_tests.cpp
index 6dc522b421..dac3840f32 100644
--- a/src/test/validation_chainstatemanager_tests.cpp
+++ b/src/test/validation_chainstatemanager_tests.cpp
@@ -240,6 +240,15 @@ BOOST_FIXTURE_TEST_CASE(chainstatemanager_activate_snapshot, TestChain100Setup)
const CBlockIndex* tip = chainman.ActiveTip();
BOOST_CHECK_EQUAL(tip->nChainTx, au_data.nChainTx);
+ // CDiskBlockIndex "is a" CBlockIndex, as it publicly inherits from it.
+ // Test that calling the same inherited interface functions on the same
+ // object yields identical behavior.
+ CDiskBlockIndex index{tip};
+ CBlockIndex *pB = &index;
+ CDiskBlockIndex *pD = &index;
+ BOOST_CHECK_EQUAL(pB->GetBlockHash(), pD->GetBlockHash());
+ BOOST_CHECK_EQUAL(pB->ToString(), pD->ToString());
```
(build and run: `$ ./src/test/test_bitcoin -t validation_chainstatemanager_tests`)
The GetBlockHash() test assertion only passes on master because the different methods invoked by the current design happen to return the same result. If one of the two is changed, it fails like the ToString() assertion does.
Redefining inherited non-virtual functions is well-documented as incorrect design to avoid inconsistent behavior (see Scott Meyers, Effective C++, Item 36). Class usage is confusing when the behavior depends on the pointer definition instead of the object definition (static binding happening where dynamic binding was expected). This can lead to unsuspected or hard-to-track bugs.
Outside of critical hot spots, correctness usually comes before optimisation, but the current design dates back to main.cpp and it may possibly have been chosen to avoid the overhead of dynamic dispatch. This solution does the same: the class sizes are unchanged and no vptr or vtbl is added.
There are better designs for doing this that use composition instead of inheritance, or that separate the public interface from the private implementations. One example of the latter would be a non-virtual public interface that calls private virtual implementation methods, i.e. the Template pattern via the Non-Virtual Interface (NVI) idiom.
ACKs for top commit:
vasild:
ACK 3a61fc56a0
Tree-SHA512: 9ff358ab0a6d010b8f053ad8303c6d4d061e62d9c3755a56b9c9f5eab855d02f02bee42acc77dfa0cbf4bb5cb775daa72d675e1560610a29bd285c46faa85ab7
and remove a now-redundant assert preceding a GetBlockHash() caller.
This protects against UB here, and in case of failure (which would
indicate a consensus bug), the debug log will print
bitcoind: chain.h:265: uint256 CBlockIndex::GetBlockHash() const: Assertion `phashBlock != nullptr' failed.
Aborted
instead of
Segmentation fault
facc2fa7b8 Use AutoFile where possible (MacroFake)
6666803c89 streams: Add AutoFile without ser-type and ser-version (MacroFake)
Pull request description:
This was done in the context of https://github.com/bitcoin/bitcoin/pull/25284 , but I think it also makes sense standalone.
The basic idea is that serialization type should not be initialized when it is not needed. Same for the serialization version.
So do this here for `AutoFile`. `CAutoFile` remains in places where it is not yet possible.
ACKs for top commit:
laanwj:
Code review ACK facc2fa7b8
fanquake:
ACK facc2fa7b8
Tree-SHA512: d82d024d55af57565ac53d9d1517afafc12b46964effba0332de62a6c77869356fa77f89e6d4834438fff44c45b64fccdf5a1358bfea03e28dfe55013b3c099d
Also:
1. Have CChainState::LoadMempool and ::ThreadImport take in paths and
pass it through untouched to LoadMempool.
2. Make LoadMempool exit early if the load_path is empty.
3. Adjust the call to ::ThreadImport in ::AppInitMain to correctly pass
in an empty path if mempool persistence is disabled.
Not only does this increase coverage, it is also more correct in that
when ::LoadMempool is called with a mempool and chainstate, it calls
AcceptToMemoryPool with just the chainstate.
AcceptToMemoryPool will then act on the chainstate's mempool via
CChainState::GetMempool, which may be different from the mempool
originally passed to ::LoadMempool. (In this fuzz test's case, it
definitely is different)
Also, move DummyChainstate to its own file since it's now used by the
validation_load_mempool fuzz test to replace CChainState's m_mempool.
m_is_loaded/IsLoaded() doesn't actually indicate whether or not the
mempool was successfully, loaded, but rather if a load has been
attempted and did not result in a catastrophic ShutdownRequested.
-BEGIN VERIFY SCRIPT-
find_regex="\bm_is_loaded\b" \
&& git grep -l -E "$find_regex" \
| xargs sed -i -E "s@$find_regex@m_load_tried@g"
find_regex="\bIsLoaded\b" \
&& git grep -l -E "$find_regex" \
| xargs sed -i -E "s@$find_regex@GetLoadTried@g"
find_regex="\bSetIsLoaded\b" \
&& git grep -l -E "$find_regex" \
| xargs sed -i -E "s@$find_regex@SetLoadTried@g"
-END VERIFY SCRIPT-
Also introduce node::{ShouldPersistMempool,MempoolPath} helper functions
in node/mempool_persist_args.{h,cpp} which are used by non-kernel
DumpMempool callers to determine whether or not to automatically dump
the mempool and where to dump it to.
This new node policy setting enables to accept replaced-by-fee
transaction without inspection of the replaceability signaling
as described in BIP125 "explicit signaling".
If turns on, the node mempool accepts transaction replacement
as described in `policy/mempool-replacements.md`.
The default setting value is `false`, implying opt-in RBF
is enforced.
d1684beabe fees: Pass in a filepath instead of referencing gArgs (Carl Dong)
9a3d825c30 init: Remove redundant -*mempool*, -limit* queries (Carl Dong)
6c5c60c412 mempool: Use m_limit for UpdateTransactionsFromBlock (Carl Dong)
9e93b10301 node/ifaces: Use existing MemPoolLimits (Carl Dong)
38af2bcf35 mempoolaccept: Use limits from mempool in constructor (Carl Dong)
9333427014 mempool: Introduce (still-unused) MemPoolLimits (Carl Dong)
716bb5fbd3 scripted-diff: Rename anc/desc size limit vars to indicate SI unit (Carl Dong)
1ecc77321d scripted-diff: Rename DEFAULT_MEMPOOL_EXPIRY to indicate time unit (Carl Dong)
aa9141cd81 mempool: Pass in -mempoolexpiry instead of referencing gArgs (Carl Dong)
51c7a41a5e init: Only determine maxmempool once (Carl Dong)
386c9472c8 mempool: Make GetMinFee() with custom size protected (Carl Dong)
82f00de7a6 mempool: Pass in -maxmempool instead of referencing gArgs (Carl Dong)
f1941e8bfd pool: Add and use MemPoolOptions, ApplyArgsManOptions (Carl Dong)
0199bd35bb fuzz/rbf: Add missing TestingSetup (Carl Dong)
ccbaf546a6 scripted-diff: Rename DEFAULT_MAX_MEMPOOL_SIZE to indicate SI unit (Carl Dong)
fc02f77ca6 ArgsMan: Add Get*Arg functions returning optional (Carl Dong)
Pull request description:
This is part of the `libbitcoinkernel` project: #24303, https://github.com/bitcoin/bitcoin/projects/18
-----
As mentioned in the Stage 1 Step 2 description of [the `libbitcoinkernel` project](https://github.com/bitcoin/bitcoin/issues/24303), `ArgsManager` will not be part of `libbitcoinkernel`. Therefore, it is important that we remove any dependence on `ArgsManager` by code that will be part of `libbitcoinkernel`. This is the first in a series of PRs aiming to achieve this.
This PR removes `CTxMemPool+MempoolAccept`'s dependency on `ArgsManager` by introducing a `CTxMemPool::Options` struct, which is used to specify `CTxMemPool`'s various options at construction time.
These options are:
- `-maxmempool` -> `CTxMemPool::Options::max_size`
- `-mempoolexpiry` -> `CTxMemPool::Options::expiry`
- `-limitancestorcount` -> `CTxMemPool::Options::limits::ancestor_count`
- `-limitancestorsize` -> `CTxMemPool::Options::limits::ancestor_size`
- `-limitdescendantcount` -> `CTxMemPool::Options::limits::descendant_count`
- `-limitdescendantsize` -> `CTxMemPool::Options::limits::descendant_size`
More context can be gleaned from the commit messages. The important commits are:
- 56eb479ded8bfb2ef635bb6f3b484f9d5952c70d "pool: Add and use MemPoolOptions, ApplyArgsManOptions"
- a1e08b70f3068f4e8def1c630d8f50cd54da7832 "mempool: Pass in -maxmempool instead of referencing gArgs"
- 6f4bf3ede5812b374828f08fc728ceded2f10024 "mempool: Pass in -mempoolexpiry instead of referencing gArgs"
- 5958a7fe4806599fc620ee8c1a881ca10fa2dd16 "mempool: Introduce (still-unused) MemPoolLimits"
Reviewers: Help needed in the following commits (see commit messages):
- a1e08b70f3068f4e8def1c630d8f50cd54da7832 "mempool: Pass in -maxmempool instead of referencing gArgs"
- 0695081a797e9a5d7787b78b0f8289dafcc6bff7 "node/ifaces: Use existing MemPoolLimits"
Note to Reviewers: There are perhaps an infinite number of ways to architect `CTxMemPool::Options`, the current one tries to keep it simple, usable, and flexible. I hope we don't spend too much time arguing over the design here since that's not the point. In the case that you're 100% certain that a different design is strictly better than this one in every regard, please show us a fully-implemented branch.
-----
TODO:
- [x] Use the more ergonomic `CTxMemPool::Options` where appropriate
- [x] Doxygen comments for `ApplyArgsManOptions`, `MemPoolOptions`
-----
Questions for Reviewers:
1. Should we use `std::chrono::seconds` for `CTxMemPool::Options::expiry` and `CTxMemPool::m_expiry` instead of an `int64_t`? Something else? (`std::chrono::hours`?)
2. Should I merge `CTxMemPool::Limits` inside `CTxMemPool::Options`?
ACKs for top commit:
MarcoFalke:
ACK d1684beabe🍜
ryanofsky:
Code review ACK d1684beabe. Just minor cleanups since last review, mostly switching to brace initialization
Tree-SHA512: 2c138e52d69f61c263f1c3648f01c801338a8f576762c815f478ef5148b8b2f51e91ded5c1be915e678c0b14f6cfba894b82afec58d999d39a7bb7c914736e0b
Since:
- UpdateTransactionsFromBlock is only called by
MaybeUpdateMempoolForReorg, which calls it with the gArgs-determined
ancestor limits
- UpdateForDescendants is only called by UpdateTransactionsFromBlock
with the ancestor limits unchanged
We can remove the requirement to specify the ancestor limits for both
UpdateTransactionsFromBlock and UpdateForDescendants and just use the
values in the m_limits member.
Also move some removed comments to MemPoolLimits struct members.
The uint64_t cast in UpdateForDescendants is not new behavior,
see the diff in CChainState::MaybeUpdateMempoolForReorg for where they
were previously.
Better to be explicit when it comes to sizes to avoid unintentional
bugs. We use MB and KB all over the place.
-BEGIN VERIFY SCRIPT-
find_regex="DEFAULT_(ANCESTOR|DESCENDANT)_SIZE_LIMIT" \
&& git grep -l -E "$find_regex" \
| xargs sed -i -E "s@$find_regex@\0_KVB@g"
-END VERIFY SCRIPT-
- Store the mempool expiry (-mempoolexpiry) in CTxMemPool as a
std::chrono::seconds member.
- Remove the requirement to explicitly specify a mempool expiry for
LimitMempoolSize(...), just use the newly-introduced member.
- Remove all now-unnecessary instances of:
std::chrono::hours{gArgs.GetIntArg("-mempoolexpiry", DEFAULT_MEMPOOL_EXPIRY)}
- Store the mempool size limit (-maxmempool) in CTxMemPool as a member.
- Remove the requirement to explicitly specify a mempool size limit for
CTxMemPool::GetMinFee(...) and LimitMempoolSize(...), just use the
stored mempool size limit where possible.
- Remove all now-unnecessary instances of:
gArgs.GetIntArg("-maxmempool", DEFAULT_MAX_MEMPOOL_SIZE_MB) * 1000000
The code change in CChainState::GetCoinsCacheSizeState() is correct
since the coinscache should not repurpose "extra" mempool memory
headroom for itself if the mempool doesn't even exist.
Better to be explicit when it comes to sizes to avoid unintentional
bugs. We use MB and KB all over the place.
-BEGIN VERIFY SCRIPT-
find_regex="DEFAULT_MAX_MEMPOOL_SIZE" \
&& git grep -l -E "$find_regex" \
| xargs sed -i -E "s@$find_regex@\0_MB@g"
-END VERIFY SCRIPT-
ce893c0497 doc: Update developer notes (Anthony Towns)
d2852917ee sync.h: Imply negative assertions when calling LOCK (Anthony Towns)
bba87c0553 scripted-diff: Convert global Mutexes to GlobalMutexes (Anthony Towns)
a559509a0b sync.h: Add GlobalMutex type (Anthony Towns)
be6aa72f9f qt/clientmodel: thread safety annotation for m_cached_tip_mutex (Anthony Towns)
f24bd45b37 net_processing: thread safety annotation for m_tx_relay_mutex (Anthony Towns)
Pull request description:
This changes `LOCK(mutex)` for non-global, non-recursive mutexes to be annotated with the negative capability for the mutex it refers to, to prevent . clang applies negative capabilities recursively, so this helps avoid forgetting to annotate functions.
This can't reasonably be used for globals, because clang would require every function to be annotated with `EXCLUSIVE_LOCKS_REQUIRED(!g_mutex)` for each global mutex; so this introduces a trivial `GlobalMutex` subclass of `Mutex`, and reduces the annotations for both `GlobalMutex` to `LOCKS_EXCLUDED` which only catches trivial errors (eg (`LOCK(x); LOCK(x);`).
ACKs for top commit:
MarcoFalke:
review ACK ce893c0497🐦
hebasto:
ACK ce893c0497
Tree-SHA512: 5c35e8c7677ce3d994a7e3774f4344adad496223a51b3a1d1d3b5f20684b2e1d5cff688eb3fbc8d33e1b9940dfa76e515f9434e21de6f3ce3c935e29a319f529
fa4068b4e2 Move minRelayTxFee to policy/settings (MacroFake)
Pull request description:
Seems a bit confusing to put policy stuff into validation, so fix that.
Also fix includes via `iwyu`.
ACKs for top commit:
ariard:
ACK fa4068b, the includes move compiles well locally.
ryanofsky:
Code review ACK fa4068b4e2. Make sense to move the global variable to policy/settings and the default constant to policy/policy. Ariard points out other constants that could be moved, which seems fine, but it seems like moving the global variable to be with other related global variables is more significant.
Tree-SHA512: adf9619002610d1877f3aef0a9e6115fc4c2ad64135a3e5100824c650b560c47f47ac28894c6214a50a7888355252a9f6f7cec98c23a771a1964160ef1ca77de
This check isn't any longer just about bad pay-to-script-hash inputs; it
also excludes any kind of nonstandard input, unknown witness versions,
coinbases, etc.
This is the "fruit of our labor" for this patchset.
ChainstateManager::PopulateAndValidateSnapshot can now directly call
ComputeUTXOStats(...).
Our consensus engine is now fully decoupled from all indices.
See the src/Makefile.am for some satisfying removals.
In previous commits in this patchset, we removed all in-param members of
CCoinsStats. Now that that's done, we can modify GetUTXOStats to return
an optional CCoinsStats instead of a status bool. Callers are modified
accordingly.
In rpc/blockchain.cpp, we discover that GetUTXOStats' status bool when
getting UTXO stats for pprev was not checked for error. We fix this as
well.
Currently, CCoinsStats is a struct with both in-params and out-params
where the hash_type and index_requested members are the only in-params.
This change removes CCoinsStats' hash_type in-param member and adds it
to the relevant functions instead.
[META] In subsequent commits, all of CCoinsStats' members which serve as
in-params will be moved out so as to make CCoinsStats a pure
out-param struct.
This decouples validation.cpp from netaddress.cpp (transitively,
timedata.cpp, and asmap.cpp).
This is important for libbitcoinkernel as:
- There is no reason for the consensus engine to be coupled with
netaddress, timedata, and asmap
- Users of libbitcoinkernel can now easily supply their own
std::function that provides the adjusted time.
See the src/Makefile.am changes for some satisfying removals.
bb5c24b120 validation: move g_versionbitscache into ChainstateManager (Anthony Towns)
eca22c726a test/versionbits: make versionbitscache a parameter (Anthony Towns)
d603f1d8a7 deploymentstatus: make versionbitscache a parameter (Anthony Towns)
78adef1753 refactor: use chainman instead of chainParams for DeploymentActive* (Anthony Towns)
deffe0df6c deploymentstatus: allow chainman in place of consensusParams (Anthony Towns)
eaa2e3f25c validation: move UpdateUncommittedBlockStructures and GenerateCoinbaseCommitment into ChainstateManager (Anthony Towns)
5c67e84d37 validation: replace ::Params() calls with chainstate/chainman member (Anthony Towns)
38860f93b6 validation: remove redundant CChainParams params from ChainstateManager methods (Anthony Towns)
69675ea4e7 validation: add CChainParams to ChainstateManager (Anthony Towns)
Pull request description:
Gives `ChainstateManager` a reference to the `CChainParams` its working on, and simplifies some of the functions that would otherwise take that as a parameter. Removes the `g_versionbitscache` global by moving it into `ChainstateManager`.
ACKs for top commit:
dongcarl:
reACK bb5c24b120
MarcoFalke:
review ACK bb5c24b120📙
Tree-SHA512: 3fa74905e5df561e3e74bb0b8fce6085c5311e6633e7d74c0fb0c82a907f5bbb1fd4ebc5d11d4f0b1c019bb51eabb9f6e4bcc4652a696d36a5878c807b85f121