748977690e Add asmap_direct fuzzer that tests Interpreter directly (Pieter Wuille)
7cf97fda15 Make asmap Interpreter errors fatal and fuzz test it (Pieter Wuille)
c81aefc537 Add additional effiency checks to sanity checker (Pieter Wuille)
fffd8dca2d Add asmap sanity checker (Pieter Wuille)
5feefbe6e7 Improve asmap Interpret checks and document failures (Pieter Wuille)
2b3dbfa5a6 Deal with decoding failures explicitly in asmap Interpret (Pieter Wuille)
1479007a33 Introduce Instruction enum in asmap (Pieter Wuille)
Pull request description:
This improves/documents the failure cases inside the asmap interpreter. None of the changes are bug fixes (they only change behavior for corrupted asmap files), but they may make things easier to follow.
In a second step, a sanity checker is added that effectively executes every potential code path through the asmap file, checking the same failure cases as the interpreter, and more. It takes around 30 ms to run for me for a 1.2 MB asmap file.
I've verified that this accepts asmap files constructed by https://github.com/sipa/asmap/blob/master/buildmap.py with a large dataset, and no longer accepts it with 1 bit changed in it.
ACKs for top commit:
practicalswift:
ACK 748977690e modulo feedback below.
jonatack:
ACK 748977690e code review, regular build/tests/ran bitcoin with -asmap, fuzz build/ran both fuzzers overnight.
fjahr:
ACK 748977690e
Tree-SHA512: d876df3859735795c857c83e7155ba6851ce839bdfa10c18ce2698022cc493ce024b5578c1828e2a94bcdf2552c2f46c392a251ed086691b41959e62a6970821
3c1bc40205 Add extra logging of asmap use and bucketing (Gleb Naumenko)
e4658aa8ea Return mapped AS in RPC call getpeerinfo (Gleb Naumenko)
ec45646de9 Integrate ASN bucketing in Addrman and add tests (Gleb Naumenko)
8feb4e4b66 Add asmap utility which queries a mapping (Gleb Naumenko)
Pull request description:
This PR attempts to solve the problem explained in #16599.
A particular attack which encouraged us to work on this issue is explained here [[Erebus Attack against Bitcoin Peer-to-Peer Network](https://erebus-attack.comp.nus.edu.sg/)] (by @muoitranduc)
Instead of relying on /16 prefix to diversify the connections every node creates, we would instead rely on the (ip -> ASN) mapping, if this mapping is provided.
A .map file can be created by every user independently based on a router dump, or provided along with the Bitcoin release. Currently we use the python scripts written by @sipa to create a .map file, which is no larger than 2MB (awesome!).
Here I suggest adding a field to peers.dat which would represent a hash of asmap file used while serializing addrman (or 0 for /16 prefix legacy approach).
In this case, every time the file is updated (or grouping method changed), all buckets will be re-computed.
I believe that alternative selective re-bucketing for only updated ranges would require substantial changes.
TODO:
- ~~more unit tests~~
- ~~find a way to test the code without including >1 MB mapping file in the repo.~~
- find a way to check that mapping file is not corrupted (checksum?)
- comments and separate tests for asmap.cpp
- make python code for .map generation public
- figure out asmap distribution (?)
~Interesting corner case: I’m using std::hash to compute a fingerprint of asmap, and std::hash returns size_t. I guess if a user updates the OS to 64-bit, then the hash of asap will change? Does it even matter?~
ACKs for top commit:
laanwj:
re-ACK 3c1bc40205
jamesob:
ACK 3c1bc40205 ([`jamesob/ackr/16702.3.naumenkogs.p2p_supplying_and_using`](https://github.com/jamesob/bitcoin/tree/ackr/16702.3.naumenkogs.p2p_supplying_and_using))
jonatack:
ACK 3c1bc40205
Tree-SHA512: e2dc6171188d5cdc2ab2c022fa49ed73a14a0acb8ae4c5ffa970172a0365942a249ad3d57e5fb134bc156a3492662c983f74bd21e78d316629dcadf71576800c
Instead of using /16 netgroups to bucket nodes in Addrman for connection
diversification, ASN, which better represents an actor in terms
of network-layer infrastructure, is used.
For testing, asmap.raw is used. It represents a minimal
asmap needed for testing purposes.
After this commit, the only remaining output is:
$ test/lint/lint-spelling.sh
src/test/base32_tests.cpp:14: fo ==> of, for
src/test/base64_tests.cpp:14: fo ==> of, for
^ Warning: codespell identified likely spelling errors. Any false positives? Add them to the list of ignored words in test/lint/lint-spelling.ignore-words.txt
Note:
* I ignore several valid alternative spellings
* homogenous is present in tinyformat, hence should be addressed upstream
* process' is correct only if there are plural processes
The original ORCHID prefix was deprecated as of 2014-03, the new
ORCHIDv2 prefix was allocated by RFC7343 as of 2014-07. We did not
consider the original ORCHID prefix routable, and I don't see any reason
to consider the new one to be either.
b7b36decaf fix uninitialized read when stringifying an addrLocal (Kaz Wesley)
8ebbef0169 add test demonstrating addrLocal UB (Kaz Wesley)
Pull request description:
Reachable from either place where SetIP is used when all of:
- our best-guess addrLocal for a peer is IPv4
- the peer tells us it's reaching us at an IPv6 address
- NET logging is enabled
In that case, SetIP turns an IPv4 address into an IPv6 address without
setting the scopeId, which is subsequently read in GetSockAddr during
CNetAddr::ToStringIP and passed to getnameinfo. Fix by ensuring every
constructor initializes the scopeId field with something.
Tree-SHA512: 8f0159750995e08b985335ccf60a273ebd09003990bcf2c3838b550ed8dc2659552ac7611650e6dd8e29d786fe52ed57674f5880f2e18dc594a7a863134739e3
Reachable from either place where SetIP is used when our best-guess
addrLocal for a peer is IPv4, but the peer tells us it's reaching us at
an IPv6 address.
In that case, SetIP turns an IPv4 address into an IPv6 address without
setting the scopeId, which is subsequently read in GetSockAddr during
CNetAddr::ToStringIP and passed to getnameinfo. Fix by ensuring every
constructor initializes the scopeId field with something.
3fc20632a3 qt: Set BLOCK_CHAIN_SIZE = 220 (DrahtBot)
2b6a2f4a28 Regenerate manpages (DrahtBot)
eb7daf4d60 Update copyright headers to 2018 (DrahtBot)
Pull request description:
Some trivial maintenance to avoid having to do it again after the 0.17 branch off.
(The scripts to do this are in `./contrib/`)
Tree-SHA512: 16b2af45e0351b1c691c5311d48025dc6828079e98c2aa2e600dc5910ee8aa01858ca6c356538150dc46fe14c8819ed8ec8e4ec9a0f682b9950dd41bc50518fa
-BEGIN VERIFY SCRIPT-
sed --in-place'' --expression='s/NET_TOR/NET_ONION/g' $(git grep -I --files-with-matches 'NET_TOR')
-END VERIFY SCRIPT-
The --in-place'' hack is required for sed on macOS to edit files in-place without passing a backup extension.
9ad6746ccd Use static_cast instead of C-style casts for non-fundamental types (practicalswift)
Pull request description:
A C-style cast is equivalent to try casting in the following order:
1. `const_cast(...)`
2. `static_cast(...)`
3. `const_cast(static_cast(...))`
4. `reinterpret_cast(...)`
5. `const_cast(reinterpret_cast(...))`
By using `static_cast<T>(...)` explicitly we avoid the possibility of an unintentional and dangerous `reinterpret_cast`. Furthermore `static_cast<T>(...)` allows for easier grepping of casts.
For a more thorough discussion, see ["ES.49: If you must use a cast, use a named cast"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast) in the C++ Core Guidelines (Stroustrup & Sutter).
Tree-SHA512: bd6349b7ea157da93a47b8cf238932af5dff84731374ccfd69b9f732fabdad1f9b1cdfca67497040f14eaa85346391404f4c0495e22c467f26ca883cd2de4d3c
A C-style cast is equivalent to try casting in the following order:
1. const_cast(...)
2. static_cast(...)
3. const_cast(static_cast(...))
4. reinterpret_cast(...)
5. const_cast(reinterpret_cast(...))
By using static_cast<T>(...) explicitly we avoid the possibility
of an unintentional and dangerous reinterpret_cast. Furthermore
static_cast<T>(...) allows for easier grepping of casts.
Identified with `cppcheck --enable=unusedFunction .`.
- GetSendBufferSize()'s last use removed in
991955ee81
- SetPort()'s last use removed in
7e195e8459
- GetfLargeWorkInvalidChainFound() was introduced in
e3ba0ef956 and never used
We currently do two resolves for dns seeds: one for the results, and one to
serve in addrman as the source for those addresses.
There's no requirement that the source hostname resolves to the stored
identifier, only that the mapping is unique. So rather than incurring the
second lookup, combine a private subnet with a hash of the hostname.
The resulting v6 ip is guaranteed not to be publicy routable, and has only a
negligible chance of colliding with a user's internal network (which would be
of no consequence anyway).