Instead of deserializing addresses, placing them in the buckets, and
then removing them if they're invalid, check first and don't place in
the buckets if they're invalid.
Negative `nNew` or `nTried` are not possible during normal operation.
So, if we read such values during unserialize, report addrman
corruption.
Fixes https://github.com/bitcoin/bitcoin/issues/22450
This is a temporary change to convert I2P addresses that have propagated
with port 8333 to ones with port 0.
It would cause a problem some day if indeed some bitcoin software is
listening on port 8333 only and rejects connections to port 0 and we are
still using SAM 3.1 which only supports port 0. In this case we would
replace 8333 with 0 and try to connect to such nodes.
This commit should be included in 22.0 and be reverted before 23.0 is
released.
00b875ba94 addrman: remove invalid addresses when unserializing (Vasil Dimov)
bdb62096f0 fuzz: reduce possible networks check (Vasil Dimov)
a164cd3ba6 net: simplify CNetAddr::IsRoutable() (Vasil Dimov)
Pull request description:
* Simplify some code, now that we know `CNetAddr::IsRFC4193()` and `CNetAddr::IsTor()` cannot be `true` at the same time.
* Drop Tor v2 addresses when loading addrman from `peers.dat` - they would have been loaded as dummy-all-zeros IPv6 addresses and linger in addrman, wasting space.
ACKs for top commit:
sipa:
ACK 00b875ba94. Reviewed the code, and tested with -DDEBUG_ADDRMAN (unit tests + mainnet run with peers.dat that contained v2 onions).
laanwj:
Code review and lightly tested ACK 00b875ba94
jonatack:
ACK 00b875ba94 reviewed, debug-built with -DEBUG_ADDRMAN rebased to current master, restarted node on mainnet/signet/testnet and verified that on each chain -addrinfo shows no change in address counts (as expected). Added some sanity check asserts, rebuilt/re-ran test. Checked that the new test fails on master with "test/addrman_tests.cpp(824): error: in "addrman_tests/remove_invalid": check addrman.size() == 2 has failed [4 != 2]"
jarolrod:
ACK 00b875ba94
Tree-SHA512: 6ed8e6745134b1b94fffaba28482de909ea39483b46b7f57bda61cdbae7a51251d15cb674de3631772fbeabe153d77a19269f96e62a89102a2d5c01e48f0ba06
The Tor v2 addresses, left over from when Tor v2 was supported will be
unserialized as a dummy, invalid `::` (all zeros) IPv6 address. Remove
them so that they do not take up space in addrman.
`CAddrMan` uses `std::map` internally even though it does not require
that the map's elements are sorted. `std::map`'s access time is
`O(log(map size))`. `std::unordered_map` is more suitable as it has a
`O(1)` access time.
This patch lowers the execution times of `CAddrMan`'s methods as follows
(as per `src/bench/addrman.cpp`):
```
AddrMan::Add(): -3.5%
AddrMan::GetAddr(): -76%
AddrMan::Good(): -0.38%
AddrMan::Select(): -45%
```
Only rebucket if the asmap checksum has changed, not if the file format
has changed but no asmap is provided.
Also, don't try to add an entry to another bucket if it already appears
in ADDRMAN_NEW_BUCKETS_PER_ADDRESS buckets.
Version implies that higher numbers take precendence. This is really a
checksum, to check whether the provided asmap is the same as the one
used when the peers.dat file was serialized.
Also update the comments to explain where/why this is used.
An addrman entry can appear in up to 8 new table buckets. We store this
entry->bucket indexing during shutdown so that on restart we can restore
the entries to their correct buckets.
Commit ec45646de9 broke the
deserialization code so that each entry could only be put in up to one
new bucket. Fix that.
0bfce9dc46 [addrman] Fix Connected() comment (John Newbery)
eefe194718 [net] Consolidate logic around calling CAddrMan::Connected() (John Newbery)
Pull request description:
Currently, the logic around whether we called CAddrMan::Connected() for
a peer is spread between verack processing (where we discard inbound
peers) and FinalizeNode (where we discard misbehaving and
block-relay-only peers). Consolidate that logic to a single place.
Also remove the CNode.fCurrentlyConnected bool, which is now
redundant. We can rely on CNode.fSuccessfullyConnected, since the two
bools were only ever flipped to true in the same place.
ACKs for top commit:
mzumsande:
Code review ACK 0bfce9dc46
amitiuttarwar:
code review ACK 0bfce9dc46. nice tidy, and bonus that we get to remove an unnecessary call to `cs_main`
Tree-SHA512: 1ab74dae3bc12a6846da57c853033e546bb4f91caa39f4c50bf0cf7eca59cb917bdb2ef795da55363e7e9f70949cb28bb3be004cb3afa4389f970d2fe097d932
Even though the format of `peers.dat` was changed in an incompatible
way (old software versions <0.21 cannot understand the new file format),
it is not guaranteed that old versions will fail to parse it. There is a
chance that old versions parse its contents as garbage and use it.
Old versions expect the "key size" field to be 32 and fail the parsing
if it is not. Thus, we put something other than 32 in it. This will make
versions between 0.11.0 and 0.20.1 deterministically fail on the new
format. Versions prior to https://github.com/bitcoin/bitcoin/pull/5941
(<0.11.0) will still parse it as garbage.
Also, introduce a way to increment the `peers.dat` format in a way that
does not necessary make older versions refuse to read it.
Change the serialization of `CAddrMan` to serialize its addresses
in ADDRv2/BIP155 format by default. Introduce a new `CAddrMan` format
version (3).
Add support for ADDRv2 format in `CAddress` (un)serialization.
Co-authored-by: Carl Dong <contact@carldong.me>
CAddrMan.GetAddr() would previously limit the number and percentage of
addresses returned (to ADDRMAN_GETADDR_MAX (1000) and
ADDRMAN_GETADDR_MAX_PCT (23) respectively). Instead, make it the callers
responsibility to specify the maximum addresses and percentage they want
returned.
For net_processing, the maximums are MAX_ADDR_TO_SEND (1000) and
MAX_PCT_ADDR_TO_SEND (23). For rpc/net, the maximum is specified by the
client.
- move asmap #includes to sorted positions in addrman and init (move-only)
- remove redundant quotes in asmap InitError, update test
- remove full stops from asmap logging to be consistent with debug logging,
update tests
3c1bc40205 Add extra logging of asmap use and bucketing (Gleb Naumenko)
e4658aa8ea Return mapped AS in RPC call getpeerinfo (Gleb Naumenko)
ec45646de9 Integrate ASN bucketing in Addrman and add tests (Gleb Naumenko)
8feb4e4b66 Add asmap utility which queries a mapping (Gleb Naumenko)
Pull request description:
This PR attempts to solve the problem explained in #16599.
A particular attack which encouraged us to work on this issue is explained here [[Erebus Attack against Bitcoin Peer-to-Peer Network](https://erebus-attack.comp.nus.edu.sg/)] (by @muoitranduc)
Instead of relying on /16 prefix to diversify the connections every node creates, we would instead rely on the (ip -> ASN) mapping, if this mapping is provided.
A .map file can be created by every user independently based on a router dump, or provided along with the Bitcoin release. Currently we use the python scripts written by @sipa to create a .map file, which is no larger than 2MB (awesome!).
Here I suggest adding a field to peers.dat which would represent a hash of asmap file used while serializing addrman (or 0 for /16 prefix legacy approach).
In this case, every time the file is updated (or grouping method changed), all buckets will be re-computed.
I believe that alternative selective re-bucketing for only updated ranges would require substantial changes.
TODO:
- ~~more unit tests~~
- ~~find a way to test the code without including >1 MB mapping file in the repo.~~
- find a way to check that mapping file is not corrupted (checksum?)
- comments and separate tests for asmap.cpp
- make python code for .map generation public
- figure out asmap distribution (?)
~Interesting corner case: I’m using std::hash to compute a fingerprint of asmap, and std::hash returns size_t. I guess if a user updates the OS to 64-bit, then the hash of asap will change? Does it even matter?~
ACKs for top commit:
laanwj:
re-ACK 3c1bc40205
jamesob:
ACK 3c1bc40205 ([`jamesob/ackr/16702.3.naumenkogs.p2p_supplying_and_using`](https://github.com/jamesob/bitcoin/tree/ackr/16702.3.naumenkogs.p2p_supplying_and_using))
jonatack:
ACK 3c1bc40205
Tree-SHA512: e2dc6171188d5cdc2ab2c022fa49ed73a14a0acb8ae4c5ffa970172a0365942a249ad3d57e5fb134bc156a3492662c983f74bd21e78d316629dcadf71576800c
-BEGIN VERIFY SCRIPT-
# Delete outdated alias for RecursiveMutex
sed -i -e '/CCriticalSection/d' ./src/sync.h
# Replace use of outdated alias with RecursiveMutex
sed -i -e 's/CCriticalSection/RecursiveMutex/g' $(git grep -l CCriticalSection)
-END VERIFY SCRIPT-
Instead of using /16 netgroups to bucket nodes in Addrman for connection
diversification, ASN, which better represents an actor in terms
of network-layer infrastructure, is used.
For testing, asmap.raw is used. It represents a minimal
asmap needed for testing purposes.
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.
818dc74 Support serialization as another type without casting (Pieter Wuille)
Pull request description:
This adds a `READWRITEAS(type, obj)` macro which serializes `obj` as if it were converted to `const type&` when `const`, and to `type&` when non-`const`. No actual cast is involved, so this only works when this conversion can be done automatically.
This makes it usable in serialization code that uses a single implementation for both serialization and deserializing, which doesn't know the constness of the object involved.
This is a redo of #12712, using a slightly different interface.
Tree-SHA512: 262f0257284ff99b5ffaec9b997c194e221522ba35c3ac8eaa9bb344449d7ea0a314de254dc77449fa7aaa600f8cd9a24da65aade8c1ec6aa80c6e9a7bba5ca7
This adds a READWRITEAS(type, obj) macro which serializes obj as if it
were casted to (const type&) when const, and to (type&) when non-const.
This makes it usable in serialization code that uses a single
implementation for both serialization and deserializing, which doesn't
know the constness of the object involved.
Changes addrman to use the test-before-evict discipline in which an
address is to be evicted from the tried table is first tested and if
it is still online it is not evicted.
Adds tests to provide test coverage for this change.
This change was suggested as Countermeasure 3 in
Eclipse Attacks on Bitcoin’s Peer-to-Peer Network, Ethan Heilman,
Alison Kendler, Aviv Zohar, Sharon Goldberg. ePrint Archive Report
2015/263. March 2015.
9ad6746ccd Use static_cast instead of C-style casts for non-fundamental types (practicalswift)
Pull request description:
A C-style cast is equivalent to try casting in the following order:
1. `const_cast(...)`
2. `static_cast(...)`
3. `const_cast(static_cast(...))`
4. `reinterpret_cast(...)`
5. `const_cast(reinterpret_cast(...))`
By using `static_cast<T>(...)` explicitly we avoid the possibility of an unintentional and dangerous `reinterpret_cast`. Furthermore `static_cast<T>(...)` allows for easier grepping of casts.
For a more thorough discussion, see ["ES.49: If you must use a cast, use a named cast"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast) in the C++ Core Guidelines (Stroustrup & Sutter).
Tree-SHA512: bd6349b7ea157da93a47b8cf238932af5dff84731374ccfd69b9f732fabdad1f9b1cdfca67497040f14eaa85346391404f4c0495e22c467f26ca883cd2de4d3c
680bc2cbb Use range-based for loops (C++11) when looping over map elements (practicalswift)
Pull request description:
Before this commit:
```c++
for (std::map<T1, T2>::iterator x = y.begin(); x != y.end(); ++x) {
T1 z = (*x).first;
…
}
```
After this commit:
```c++
for (auto& x : y) {
T1 z = x.first;
…
}
```
Tree-SHA512: 954b136b7f5e6df09f39248a6b530fd9baa9ab59d7c2c7eb369fd4afbb591b7a52c92ee25f87f1745f47b41d6828b7abfd395b43daf84a55b4e6a3d45015e3a0
A C-style cast is equivalent to try casting in the following order:
1. const_cast(...)
2. static_cast(...)
3. const_cast(static_cast(...))
4. reinterpret_cast(...)
5. const_cast(reinterpret_cast(...))
By using static_cast<T>(...) explicitly we avoid the possibility
of an unintentional and dangerous reinterpret_cast. Furthermore
static_cast<T>(...) allows for easier grepping of casts.
This changes the logging categories to boolean flags instead of strings.
This simplifies the acceptance testing by avoiding accessing a scoped
static thread local pointer to a thread local set of strings. It
eliminates the only use of boost::thread_specific_ptr outside of
lockorder debugging.
This change allows log entries to be directed to multiple categories
and makes it easy to change the logging flags at runtime (e.g. via
an RPC, though that isn't done by this commit.)
It also eliminates the fDebug global.
Configuration of unknown logging categories now produces a warning.
Remove the nType and nVersion as parameters to all serialization methods
and functions. There is only one place where it's read and has an impact
(in CAddress), and even there it does not impact any of the recursively
invoked serializers.
Instead, the few places that need nType or nVersion are changed to read
it directly from the stream object, through GetType() and GetVersion()
methods which are added to all stream classes.
Given that in default GetSerializeSize implementations created by
ADD_SERIALIZE_METHODS we're already using CSizeComputer(), get rid
of the specialized GetSerializeSize methods everywhere, and just use
CSizeComputer. This removes a lot of code which isn't actually used
anywhere.
For CCompactSize and CVarInt this actually removes a more efficient
size computing algorithm, which is brought back in a later commit.
There are only a few uses of `insecure_random` outside the tests.
This PR replaces uses of insecure_random (and its accompanying global
state) in the core code with an FastRandomContext that is automatically
seeded on creation.
This is meant to be used for inner loops. The FastRandomContext
can be in the outer scope, or the class itself, then rand32() is used
inside the loop. Useful e.g. for pushing addresses in CNode or the fee
rounding, or randomization for coin selection.
As a context is created per purpose, thus it gets rid of
cross-thread unprotected shared usage of a single set of globals, this
should also get rid of the potential race conditions.
- I'd say TxMempool::check is not called enough to warrant using a special
fast random context, this is switched to GetRand() (open for
discussion...)
- The use of `insecure_rand` in ConnectThroughProxy has been replaced by
an atomic integer counter. The only goal here is to have a different
credentials pair for each connection to go on a different Tor circuit,
it does not need to be random nor unpredictable.
- To avoid having a FastRandomContext on every CNode, the context is
passed into PushAddress as appropriate.
There remains an insecure_random for test usage in `test_random.h`.
If a node is offline failed outbound connection attempts will crank up
the addrman counter and effectively blow away our state.
This change reduces the problem by only counting attempts made while
the node believes it has outbound connections to at least two
netgroups.
Connect and addnode connections are also not counted, as there is no
reason to unequally penalize them for their more frequent
connections -- though there should be no real effect from this
unless their addnode configureation is later removed.
Wasteful repeated connection attempts while only a few connections are
up are avoided via nLastTry.
This is still somewhat incomplete protection because our outbound
peers could be down but not timed out or might all be on 'local'
networks (although the requirement for multiple netgroups helps).
Adds several unittests for CAddrMan and CAddrInfo.
Increases the accuracy of addrman tests.
Removes non-determinism in tests by overriding the random number generator.
Extracts testing code from addrman class to test class.
This change was suggested as Countermeasure 6 in
Eclipse Attacks on Bitcoin’s Peer-to-Peer Network, Ethan Heilman,
Alison Kendler, Aviv Zohar, Sharon Goldberg. ePrint Archive Report
2015/263. March 2015.
This change was suggested as Countermeasure 2 in
Eclipse Attacks on Bitcoin’s Peer-to-Peer Network, Ethan Heilman,
Alison Kendler, Aviv Zohar, Sharon Goldberg. ePrint Archive Report
2015/263. March 2015.
Give each address a single fixed location in the new and tried tables,
which become simple fixed-size arrays instead of sets and vectors.
This prevents attackers from having an advantages by inserting an
address multiple times.
This change was suggested as Countermeasure 1 in
Eclipse Attacks on Bitcoin’s Peer-to-Peer Network, Ethan Heilman,
Alison Kendler, Aviv Zohar, Sharon Goldberg. ePrint Archive Report
2015/263. March 2015.
It is also more efficient.
Thanks to Pieter Wuille for most of the work on this commit.
I did not fixup the overhaul commit, because a rebase conflicted
with "remove fields of ser_streamplaceholder".
I prefer not to risk making a mistake while resolving it.
The implementation of each class' serialization/deserialization is no longer
passed within a macro. The implementation now lies within a template of form:
template <typename T, typename Stream, typename Operation>
inline static size_t SerializationOp(T thisPtr, Stream& s, Operation ser_action, int nType, int nVersion) {
size_t nSerSize = 0;
/* CODE */
return nSerSize;
}
In cases when codepath should depend on whether or not we are just deserializing
(old fGetSize, fWrite, fRead flags) an additional clause can be used:
bool fRead = boost::is_same<Operation, CSerActionUnserialize>();
The IMPLEMENT_SERIALIZE macro will now be a freestanding clause added within
class' body (similiar to Qt's Q_OBJECT) to implement GetSerializeSize,
Serialize and Unserialize. These are now wrappers around
the "SerializationOp" template.
- ensures a consistent usage in header files
- also add a blank line after the copyright header where missing
- also remove orphan new-lines at the end of some files