51f7668d31 addrman: change nid_type from int to int64_t (Martin Zumsande)
051ba3290e addrman, refactor: introduce user-defined type for internal nId (Martin Zumsande)
Pull request description:
With `nIdCount` being incremented for each addr received, an attacker could cause an overflow in the past, see https://bitcoincore.org/en/2024/07/31/disclose-addrman-int-overflow/
Even though that attack was made infeasible indirectly by addr rate-limiting (PR #22387), to be on the safe side and prevent any regressions change the `nId`s used internally to `int64_t`.
This is being done by first introducing a user-defined type for `nId`s in the first commit, and then updating it to `int64_t` (thanks sipa for help with this!).
Note that `nId` is only used internally, it is not part of the serialization, so `peers.dat` should not be affected by this.
I assume that the only reason this was not done in the past is to not draw attention to this previously undisclosed issue.
ACKs for top commit:
naumenkogs:
ACK 51f7668d31
stratospher:
ACK 51f7668d31. I think it's a good change to make the nId space large(64 bits) so that the nId values are distinct.
achow101:
ACK 51f7668d31
Tree-SHA512: 68d4b8b0269a01a9544bedfa7c1348ffde00a288537e4c8bf2b88372ac7d96c4566a44dd6b06285f2fcf31b4f9336761e3bca7253fbc20db5e0d04e887156224
This makes it easier to track which spots refer to an nId
(as opposed to, for example, bucket index etc. which also use int)
Co-authored-by: Pieter Wuille <pieter@wuille.net>
In many cases, it is known at compile time how many bits are requested from
randbits. Provide a variant of randbits that accepts this number as a template,
to make sure the compiler can make use of this knowledge. This is used immediately
in rand32() and randbool(), and a few further call sites.
8d491ae9ec serialization: Add ParamsStream GetStream() method (Ryan Ofsky)
951203bcc4 net: Simplify ParamsStream usage (Ryan Ofsky)
e6794e475c serialization: Accept multiple parameters in ParamsStream constructor (Ryan Ofsky)
cb28849a88 serialization: Reverse ParamsStream constructor order (Ryan Ofsky)
83436d14f0 serialization: Drop unnecessary ParamsStream references (Ryan Ofsky)
84502b755b serialization: Drop references to GetVersion/GetType (Ryan Ofsky)
f3a2b52376 serialization: Support for multiple parameters (Ryan Ofsky)
Pull request description:
Currently it is only possible to attach one serialization parameter to a stream at a time. For example, it is not possible to set a parameter controlling the transaction format and a parameter controlling the address format at the same time because one parameter will override the other.
This limitation is inconvenient for multiprocess code since it is not possible to create just one type of stream and serialize any object to it. Instead it is necessary to create different streams for different object types, which requires extra boilerplate and makes using the new parameter fields a lot more awkward than the older version and type fields.
Fix this problem by allowing an unlimited number of serialization stream parameters to be set, and allowing them to be requested by type. Later parameters will still override earlier parameters, but only if they have the same type.
For an example of different ways multiple parameters can be set, see the new [`with_params_multi`](40f505583f/src/test/serialize_tests.cpp (L394-L410)) unit test.
This change requires replacing the `stream.GetParams()` method with a `stream.GetParams<T>()` method in order for serialization code to retrieve the desired parameters. The change is more verbose, but probably a good thing for readability because previously it could be difficult to know what type the `GetParams()` method would return, and now it is more obvious.
---
This PR is part of the [process separation project](https://github.com/bitcoin/bitcoin/issues/28722).
ACKs for top commit:
maflcko:
ACK 8d491ae9ec🔵
sipa:
utACK 8d491ae9ec
TheCharlatan:
ACK 8d491ae9ec
Tree-SHA512: 40b7041ee01c0372b1f86f7fd6f3b6af56ef24a6383f91ffcedd04d388e63407006457bf7ed056b0e37b4dec9ffd5ca006cb8192e488ea2c64678567e38d4647
Move parameter argument after stream argument so will be possible to accept
multiple variadic parameter arguments in the following commit.
Also reverse template parameter order for consistency.
9d1dbbd4ce scripted-diff: Fix bitcoin_config_h includes (TheCharlatan)
Pull request description:
As mentioned in https://github.com/bitcoin/bitcoin/pull/26924#issuecomment-1403449932 and https://github.com/bitcoin/bitcoin/pull/29263#issuecomment-1922334399, it is currently not safe to remove `bitcoin-config.h` includes from headers because some unrelated file might be depending on it.
See also #26972 for discussion.
Solve this by including the file directly everywhere it's required, regardless of whether or not it's already included by another header.
There should be no functional change here, but it will allow us to safely remove includes from headers in the future.
~I'm afraid it's a bit tedious to reproduce these commits, but it's reasonably straightforward:~
Edit: See note below
```bash
# All commands executed from the src/ subdir.
# Collect all tokens from bitcoin-config.h.in
# Isolate the tokens and remove blank lines
# Replace newlines with | and remove the last trailing one
# Collect all files which use these tokens
# Filter out subprojects (proper forwarding can be verified from Makefiles)
# Filter out .rc files
# Save to a text file
git grep -E -l `grep undef config/bitcoin-config.h.in | cut -d" " -f2 | grep -v '^$' | tr '\n' '|' | sed 's/|$//'` | grep -v -e "^leveldb/" -e "^secp256k1/" -e "^crc32c/" -e "^minisketch/" -e "^Makefile" -e "\.rc$" > files-with-config-include.txt
# Find all files from the above list which don't include bitcoin-config.h
git grep -L -E "config/bitcoin-config.h" -- `cat files-with-config-include.txt`
# Include them manually with the exception of some files in crypto:
# crypto/sha256_arm_shani.cpp crypto/sha256_avx2.cpp crypto/sha256_sse41.cpp crypto/sha256_x86_shani.cpp
# These are exceptions which don't use bitcoin-config.h, rather the Makefile.am adds these cppflags manually.
# Commit changes. This should match the first commit of this PR.
# Use the same search as above to find all files which DON'T use any config tokens
git grep -E -L `grep undef config/bitcoin-config.h.in | cut -d" " -f2 | grep -v '^$' | tr '\n' '|' | sed 's/|$//'` | grep -v -e "^leveldb/" -e "^secp256k1/" -e "^crc32c/" -e "^minisketch/" -e "^Makefile" -e "\.rc$" > files-without-config-include.txt
# Manually remove the includes and commit changes. This should match the second commit of this PR.
```
Edit: I'll keep this old description for posterity, but the manual approach has been replaced with a scripted diff from TheCharlatan
ACKs for top commit:
maflcko:
ACK 9d1dbbd4ce🚪
TheCharlatan:
ACK 9d1dbbd4ce
hebasto:
ACK 9d1dbbd4ce, I have reviewed the code and it looks OK.
fanquake:
ACK 9d1dbbd4ce
Tree-SHA512: f11ddc4ae6a887f96b954a6b77f310558ddb271088a3fda3edc833669c4251b7f392515224bbb8e5f67eb2c799b4ffed3b07d96454e82ec635c686d0df545872
-BEGIN VERIFY SCRIPT-
regex_string='^(?!//).*(AC_APPLE_UNIVERSAL_BUILD|BOOST_PROCESS_USE_STD_FS|CHAR_EQUALS_INT8|CLIENT_VERSION_BUILD|CLIENT_VERSION_IS_RELEASE|CLIENT_VERSION_MAJOR|CLIENT_VERSION_MINOR|COPYRIGHT_HOLDERS|COPYRIGHT_HOLDERS_FINAL|COPYRIGHT_HOLDERS_SUBSTITUTION|COPYRIGHT_YEAR|ENABLE_ARM_SHANI|ENABLE_AVX2|ENABLE_EXTERNAL_SIGNER|ENABLE_SSE41|ENABLE_TRACING|ENABLE_WALLET|ENABLE_X86_SHANI|ENABLE_ZMQ|HAVE_BOOST|HAVE_BUILTIN_CLZL|HAVE_BUILTIN_CLZLL|HAVE_BYTESWAP_H|HAVE_CLMUL|HAVE_CONSENSUS_LIB|HAVE_CXX20|HAVE_DECL_BE16TOH|HAVE_DECL_BE32TOH|HAVE_DECL_BE64TOH|HAVE_DECL_BSWAP_16|HAVE_DECL_BSWAP_32|HAVE_DECL_BSWAP_64|HAVE_DECL_FORK|HAVE_DECL_FREEIFADDRS|HAVE_DECL_GETIFADDRS|HAVE_DECL_HTOBE16|HAVE_DECL_HTOBE32|HAVE_DECL_HTOBE64|HAVE_DECL_HTOLE16|HAVE_DECL_HTOLE32|HAVE_DECL_HTOLE64|HAVE_DECL_LE16TOH|HAVE_DECL_LE32TOH|HAVE_DECL_LE64TOH|HAVE_DECL_PIPE2|HAVE_DECL_SETSID|HAVE_DECL_STRERROR_R|HAVE_DEFAULT_VISIBILITY_ATTRIBUTE|HAVE_DLFCN_H|HAVE_DLLEXPORT_ATTRIBUTE|HAVE_ENDIAN_H|HAVE_EVHTTP_CONNECTION_GET_PEER_CONST_CHAR|HAVE_FDATASYNC|HAVE_GETENTROPY_RAND|HAVE_GETRANDOM|HAVE_GMTIME_R|HAVE_INTTYPES_H|HAVE_LIBADVAPI32|HAVE_LIBCOMCTL32|HAVE_LIBCOMDLG32|HAVE_LIBGDI32|HAVE_LIBIPHLPAPI|HAVE_LIBKERNEL32|HAVE_LIBOLE32|HAVE_LIBOLEAUT32|HAVE_LIBSHELL32|HAVE_LIBSHLWAPI|HAVE_LIBUSER32|HAVE_LIBUUID|HAVE_LIBWINMM|HAVE_LIBWS2_32|HAVE_MALLOC_INFO|HAVE_MALLOPT_ARENA_MAX|HAVE_MINIUPNPC_MINIUPNPC_H|HAVE_MINIUPNPC_UPNPCOMMANDS_H|HAVE_MINIUPNPC_UPNPERRORS_H|HAVE_NATPMP_H|HAVE_O_CLOEXEC|HAVE_POSIX_FALLOCATE|HAVE_PTHREAD|HAVE_PTHREAD_PRIO_INHERIT|HAVE_STDINT_H|HAVE_STDIO_H|HAVE_STDLIB_H|HAVE_STRERROR_R|HAVE_STRINGS_H|HAVE_STRING_H|HAVE_STRONG_GETAUXVAL|HAVE_SYSCTL|HAVE_SYSCTL_ARND|HAVE_SYSTEM|HAVE_SYS_ENDIAN_H|HAVE_SYS_PRCTL_H|HAVE_SYS_RESOURCES_H|HAVE_SYS_SELECT_H|HAVE_SYS_STAT_H|HAVE_SYS_SYSCTL_H|HAVE_SYS_TYPES_H|HAVE_SYS_VMMETER_H|HAVE_THREAD_LOCAL|HAVE_TIMINGSAFE_BCMP|HAVE_UNISTD_H|HAVE_VM_VM_PARAM_H|LT_OBJDIR|PACKAGE_BUGREPORT|PACKAGE_NAME|PACKAGE_STRING|PACKAGE_TARNAME|PACKAGE_URL|PACKAGE_VERSION|PTHREAD_CREATE_JOINABLE|QT_QPA_PLATFORM_ANDROID|QT_QPA_PLATFORM_COCOA|QT_QPA_PLATFORM_MINIMAL|QT_QPA_PLATFORM_WINDOWS|QT_QPA_PLATFORM_XCB|QT_STATICPLUGIN|STDC_HEADERS|STRERROR_R_CHAR_P|USE_ASM|USE_BDB|USE_DBUS|USE_NATPMP|USE_QRCODE|USE_SQLITE|USE_UPNP|_FILE_OFFSET_BITS|_LARGE_FILES)'
exclusion_files=":(exclude)src/minisketch :(exclude)src/crc32c :(exclude)src/secp256k1 :(exclude)src/crypto/sha256_arm_shani.cpp :(exclude)src/crypto/sha256_avx2.cpp :(exclude)src/crypto/sha256_sse41.cpp :(exclude)src/crypto/sha256_x86_shani.cpp"
git grep --perl-regexp --files-with-matches "$regex_string" -- '*.cpp' $exclusion_files | xargs git grep -L "bitcoin-config.h" | while read -r file; do line_number=$(awk -v my_file="$file" '/\/\/ file COPYING or https?:\/\/www.opensource.org\/licenses\/mit-license.php\./ {line = NR} /^\/\// && NR == line + 1 {while(getline && /^\/\//) line = NR} END {print line+1}' "$file"); sed -i "${line_number}i\\\\n\#if defined(HAVE_CONFIG_H)\\n#include <config/bitcoin-config.h>\\n\#endif" "$file"; done;
git grep --perl-regexp --files-with-matches "$regex_string" -- '*.h' $exclusion_files | xargs git grep -L "bitcoin-config.h" | while read -r file; do sed -i "/#define.*_H/a \\\\n\#if defined(HAVE_CONFIG_H)\\n#include <config/bitcoin-config.h>\\n\#endif" "$file"; done;
for file in $(git grep --files-with-matches 'bitcoin-config.h' -- '*.cpp' '*.h' $exclusion_files); do if ! grep -q --perl-regexp "$regex_string" $file; then sed -i '/HAVE_CONFIG_H/{N;N;N;d;}' $file; fi; done;
-END VERIFY SCRIPT-
The first command creates a regular expression for matching all bitcoin-config.h symbols in the following form: ^(?!//).*(AC_APPLE_UNIVERSAL_BUILD|BOOST_PROCESS_USE_STD_FS|...|_LARGE_FILES). It was generated with:
./autogen.sh && printf '^(?!//).*(%s)' $(awk '/^#undef/ {print $2}' src/config/bitcoin-config.h.in | paste -sd "|" -)
The second command holds a list of files and directories that should not be processed. These include subtree directories as well as some crypto files that already get their symbols through the makefile.
The third command checks for missing bitcoin-config headers in .cpp files and adds the header if it is missing.
The fourth command checks for missing bitcoin-config headers in .h files and adds the header if it is missing.
The fifth command checks for unneeded bitcoin-config headers in sources files and removes the header if it is unneeded.
e064487ca2 addrman, refactor: improve stochastic test in `AddSingle` (brunoerg)
Pull request description:
This PR changes this algorithm to be O(1) instead of O(n). Also, in the current implementation, if `pinfo->nRefCount` is 0, we created an unnecessary variable (`nFactor`), this changes it. the change is relatively simple and does not cause conflicts.
ACKs for top commit:
achow101:
ACK e064487ca2
amitiuttarwar:
ACK e064487ca2
stratospher:
ACK e064487ca2. simple use of << instead of a loop, didn't observe any behaviour difference before and after.
Tree-SHA512: ff0a65155e47f65d2ce3cb5a3fd7a86efef1861181143df13a9d8e59cb16aee9be2f8801457bba8478b17fac47b015bff5cc656f6fac2ccc071ee7178a38d291
3ea54e5db7 net: Add continuous ASMap health check logging (Fabian Jahr)
28d7e55dff test: Add tests for unfiltered GetAddr usage (Fabian Jahr)
b8843d37ae fuzz: Let fuzzers use filter options in GetAddr/GetAddresses (Fabian Jahr)
e16f420547 net: Optionally include terrible addresses in GetAddr results (Fabian Jahr)
Pull request description:
There are certain statistics we can collect by running all our known clearnet addresses against the ASMap file. This could show issues with a maliciously manipulated file or with an old file that has decayed with time.
This is just a proof of concept for now. My idea currently is to run the analysis once per day and print the results to logs if an ASMap file is used.
ACKs for top commit:
achow101:
ACK 3ea54e5db7
mzumsande:
ACK 3ea54e5db7
brunoerg:
crACK 3ea54e5db7
Tree-SHA512: 777acbfac43cc43ce4a0a3612434e4ddbc65f59ae8ffc9e24f21de09011bccb297f0599cbaa82bcf40ef68e5af582c4e98556379db7ceff7d9f97574a1cf8e09
Exposing address manager table entries in a hidden RPC allows to introspect
addrman tables in tests and during development.
As response JSON object the following FORMAT1 is choosen:
{
"table": {
"<bucket>/<position>": { "address": "..", "port": .., ... },
"<bucket>/<position>": { "address": "..", "port": .., ... },
"<bucket>/<position>": { "address": "..", "port": .., ... },
...
}
}
An alternative would be FORMAT2
{
"table": {
"bucket": {
"position": { "address": "..", "port": .., ... },
"position": { "address": "..", "port": .., ... },
..
},
"bucket": {
"position": { "address": "..", "port": .., ... },
..
},
}
}
FORMAT1 and FORMAT2 have different encodings for the location of the
address in the address manager. While FORMAT2 might be easier to process
for downstream tools, it also mimics internal addrman mappings, which
might change at some point. Users not interested in the address location
can ignore the location key. They don't have to adapt to a new RPC
response format, when the internal addrman layout changes. Additionally,
FORMAT1 is also slightly easier to to iterate in downstream tools. The
RPC response-building implemenation complexcity is lower with FORMAT1
as we can more easily build a "<bucket>/<position>" key than a multiple
"bucket" objects with multiple "position" objects (FORMAT2).
This also cleans up the addrman (de)serialization code paths to only
allow `Disk` serialization. Some unit tests previously forced a
`Network` serialization, which does not make sense, because Bitcoin Core
in production will always `Disk` serialize.
This cleanup idea was suggested by Pieter Wuille and implemented by Anthony
Towns.
Co-authored-by: Pieter Wuille <pieter@wuille.net>
Co-authored-by: Anthony Towns <aj@erisian.com.au>
`Assume` is safer since the checks are non-fatal- errors in these functions
should provide feedback in debug builds, but do not need to deter further node
operations in production.
Add an optional parameter to the addrman Select function that allows callers to
specify which network the returned address should be on. Ensure that the proper
table is selected with different cases of whether the new or tried table has
network addresses that match.
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
in preparation for consolidating the logic for searching the new and tried
tables, generalize the call paths for both
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Extract the logic that decides whether the new or the tried table is going to
be searched to the beginning of the function.
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
c9d548c91f net: remove CService::ToStringPort() (Vasil Dimov)
fd4f0f41e9 gui: simplify OptionsDialog::updateDefaultProxyNets() (Vasil Dimov)
96c791dd20 net: remove CService::ToString() use ToStringAddrPort() instead (Vasil Dimov)
944a9de08a net: remove CNetAddr::ToString() and use ToStringAddr() instead (Vasil Dimov)
043b9de59a scripted-diff: rename ToStringIP[Port]() to ToStringAddr[Port]() (Vasil Dimov)
Pull request description:
Before this PR we had the somewhat confusing combination of methods:
`CNetAddr::ToStringIP()`
`CNetAddr::ToString()` (duplicate of the above)
`CService::ToStringIPPort()`
`CService::ToString()` (duplicate of the above, overrides a non-virtual method from `CNetAddr`)
`CService::ToStringPort()`
Avoid [overriding non-virtual methods](https://github.com/bitcoin/bitcoin/pull/25349/#issuecomment-1185226396).
"IP" stands for "Internet Protocol" and while sometimes "IP addresses" are called just "IPs", it is incorrect to call Tor or I2P addresses "IPs". Thus use "Addr" instead of "IP".
Change the above to:
`CNetAddr::ToStringAddr()`
`CService::ToStringAddrPort()`
The changes touch a lot of files, but are mostly mechanical.
ACKs for top commit:
sipa:
utACK c9d548c91f
achow101:
ACK c9d548c91f
jonatack:
re-ACK c9d548c91f only change since my previous reviews is rebase, but as a sanity check rebased to current master and at each commit quickly re-reviewed and re-verified clean build and green unit tests
LarryRuane:
ACK c9d548c91f
Tree-SHA512: 633fb044bdecf9f551b5e3314c385bf10e2b78e8027dc51ec324b66b018da35e5b01f3fbe6295bbc455ea1bcd1a3629de1918d28de510693afaf6a52693f2157
80f39c99ef addrman, refactor: combine two size functions (Amiti Uttarwar)
4885d6f197 addrman, refactor: move count increment into Create() (Martin Zumsande)
c77c877a8e net: Load fixed seeds from reachable networks for which we don't have addresses (Martin Zumsande)
d35595a78a addrman: add function to return size by network and table (Martin Zumsande)
Pull request description:
AddrMan currently doesn't track the number of its entries by network, it only knows the total number of addresses. This PR makes AddrMan keep track of these numbers, which would be helpful for multiple things:
1. Allow to specifically add fixed seeds to AddrMan of networks where we don't have any addresses yet - even if AddrMan as a whole is not empty (partly fixing #26035). This is in particular helpful if the user abruptly changes `-onlynet` settings (such that addrs that used to be reachable are no longer and vice versa), in which case they currently could get stuck and not find any outbound peers. The second commit of this PR implements this.
1. (Future work): Add logic for automatic connection management with respect to networks - such as making attempts to have at least one connection to each reachable network as suggested [here](https://github.com/bitcoin/bitcoin/issues/26035#issuecomment-1249420209). This would involve requesting an address from a particular network from AddrMan, and expanding its corresponding function `AddrMan::Select()` to do this requires internal knowledge of the current number of addresses for each network and table to avoid getting stuck in endless loops.
1. (Future work): Perhaps display the totals to users. At least I would find this helpful to debug, the existing option (`./bitcoin-cli -addrinfo`) is rather indirect by doing the aggregation itself in each call, doesn't distinguish between new and tried, and being based on `AddrMan::GetAddr()` it's also subject to a quality filter which we probably don't want in this spot.
ACKs for top commit:
naumenkogs:
utACK 80f39c9
stratospher:
ACK 80f39c9
achow101:
ACK 80f39c99ef
vasild:
ACK 80f39c99ef
Tree-SHA512: 6359f2e3f4db7c120c0789d92d74cb7d87a2ceedb7d6a34b5eff20c7f55c5c81092d10ed94efe29afc1c66947820a0d9c14876ee0c8d1f8e068a6df4e1131927
The functionality of the old size() is covered by the new Size()
when no arguments are specified, so this does not change behavior.
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
For now, the new functionality will be used in the context of
querying fixed seeds. Other possible applications for
future changes is the use in the context of making automatic
connections to specific networks, or making more detailed info
about addrman accessible via rpc.
The previous logic would call it once for serializing into the filestream,
and then again for serializing into the hasher. If AddrMan was changed
in between these calls by another thread, the resulting peers.dat would
be corrupt with non-matching checksum and data.
Fix this by using HashedSourceWriter, which writes the data
to the underlying stream and keeps track of the hash in one go.
Both methods do the same thing, so simplify to having just one.
`ToString()` is too generic in this case and it is unclear what it does,
given that there are similar methods:
`ToStringAddr()` (inherited from `CNetAddr`),
`ToStringPort()` and
`ToStringAddrPort()`.
Both methods do the same thing, so simplify to having just one.
Further, `CService` inherits `CNetAddr` and `CService::ToString()`
overrides `CNetAddr::ToString()` but the latter is not virtual which
may be confusing. Avoid such a confusion by not having non-virtual
methods with the same names in inheritance.
This makes the code less verbose. Also, future changes that change how
to get the time are less verbose.
Moreover, GetAdjustedTime() might arbitrarily change the value during
the execution of this function. For example, the system time advances
over a second boundary, or the network adjusts the time arbitrarily.
Most of the time however the value will not change, so it seems better
to always lock the value in this scope for clarity.