Since `util::Xor` accepts `uint64_t` values, we're eliminating any repeated vector-to-uint64_t conversions going back to the loading/saving of these values (we're still serializing them as vectors, but converting as soon as possible to `uint64_t`). This is the reason the tests still generate vector values and convert to `uint64_t` later instead of generating it directly.
We're also short-circuit `Xor` calls with 0 key values early to avoid unnecessary calculations (e.g. `MakeWritableByteSpan`) - even assuming that XOR is never called for 0.
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release \
&& cmake --build build -j$(nproc) \
&& build/bin/bench_bitcoin -filter='XorObfuscationBench' -min-time=10000
C++ compiler .......................... AppleClang 17.0.0.17000013
| ns/MiB | MiB/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 14,730.40 | 67,886.80 | 0.1% | 11.01 | `XorObfuscationBench`
C++ compiler .......................... GNU 13.3.0
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 51,187.17 | 19,536.15 | 0.0% | 327,683.95 | 183,747.58 | 1.783 | 65,536.55 | 0.0% | 11.00 | `XorObfuscationBench`
----
A few other benchmarks that seem to have improved as well (tested with Clang only):
Before:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,202,618.49 | 454.01 | 0.2% | 11.01 | `ReadBlockBench`
| 734,444.92 | 1,361.57 | 0.3% | 10.66 | `ReadRawBlockBench`
After:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1,912,308.06 | 522.93 | 0.4% | 10.98 | `ReadBlockBench`
| 49,092.93 | 20,369.53 | 0.2% | 10.99 | `ReadRawBlockBench`
Since `CDBWrapper::Read` will still work with vectors, we won't be able to use the obfuscation key field to read into it directly.
This commit cleans up this part of the code, obviating that writing `obfuscate_key` is needed since following methods will actually use it implicitly, simplifying the `if (!key_exists` condition to extract the negation into the name of the boolean and inline the single-use `CreateObfuscateKey` which will just complicate the transition.
Remove the manual batch size estimation logic (`SizeEstimate()` method and `size_estimate` member) from `CDBBatch`.
Size is now determined solely by the `ApproximateSize()` method introduced in the previous commit, which delegates to the native LevelDB function.
The manual calculation is no longer necessary as LevelDB now provides this functionality directly, and the previous commit verified that the native function's results matched the manual estimation.
Assertions comparing the two methods are removed from `txdb.cpp`.
Co-authored-by: Wladimir J. van der Laan <laanwj@protonmail.com>
Serialized batch size can be queried via the underlying LevelDB implementation calling the native `leveldb::WriteBatch::ApproximateSize()`.
The previous manual calculation was added in e66dbde6d1 as part of https://github.com/bitcoin/bitcoin/pull/10195. At that time (April 2017), the version of LevelDB used by Bitcoin Core (and even the latest source) lacked a native function for this. LevelDB added this capability in 69e2bd224b, merged later that year.
The old manual estimation method (`SizeEstimate()`) is kept temporarily in this commit, and assertions are added in `txdb.cpp` to verify its results against `ApproximateSize()` during batch writes. This ensures the native function behaves as expected before removing the manual calculation in the subsequent commit.
The initialization of the manual `size_estimate` in `CDBBatch::Clear()` is corrected from `0` to `kHeader` (LevelDB's fixed batch header size).
This aligns the manual estimate with LevelDB's actual size immediately after clearing, fixing discrepancies that would otherwise be caught by tests in the next commit (e.g., `coins_tests`, `validation_chainstatemanager_tests`).
Historically, the headers have been bumped some time after a file has
been touched. Do it now to avoid having to touch them again in the
future for that reason.
-BEGIN VERIFY SCRIPT-
sed -i --regexp-extended 's;( 20[0-2][0-9])(-20[0-2][0-9])? The Bitcoin Core developers;\1-present The Bitcoin Core developers;g' $( git show --pretty="" --name-only HEAD~1 )
-END VERIFY SCRIPT-
In theory this commit should only touch the span.h header, because
std::span can implicilty convert into Span in most places, if needed.
However, at least when using the clang compiler, there are some
false-positive lifetimebound warnings and some implicit conversions can
not be resolved.
Thus, this refactoring commit also changed the affected places to
replace Span with std::span.
The default max file size for LevelDB is 2 MiB, which results in the
LevelDB compaction code generating ~4 disk cache flushes per second when
syncing with the Bitcoin network.
These disk cache flushes are triggered by fdatasync() syscall issued by the
LevelDB compaction code when reaching the max file size.
If the database is on a HDD this flush rate brings the whole system to a
crawl.
It also results in very slow throughput since 2 MiB * 4 flushes per second
is about 8 MiB / second max throughput, while even an old HDD can pull
100 - 200 MiB / second streaming throughput.
Increase the max file size for LevelDB to 32 MiB instead so the flush rate
drops significantly and the system no longer gets so sluggish.
The new max file size value chosen is a compromise between the one that
works best for HDD and SSD performance, as determined by benchmarks done by
various people.
d8f1222ac5 refactor: Correct dbwrapper key naming (TheCharlatan)
be8f159ac5 build: Remove leveldb from BITCOIN_INCLUDES (TheCharlatan)
c95b37d641 refactor: Move CDBWrapper leveldb members to their own context struct (TheCharlatan)
c534a615e9 refactor: Split dbwrapper CDBWrapper::EstimateSize implementation (TheCharlatan)
586448888b refactor: Move HandleError to dbwrapper implementation (TheCharlatan)
dede0eef7a refactor: Split dbwrapper CDBWrapper::Exists implementation (TheCharlatan)
a5c2eb5748 refactor: Fix logging.h includes (TheCharlatan)
84058e0eed refactor: Split dbwrapper CDBWrapper::Read implementation (TheCharlatan)
e4af2408f2 refactor: Pimpl leveldb::Iterator for CDBIterator (TheCharlatan)
ef941ff128 refactor: Split dbwrapper CDBIterator::GetValue implementation (TheCharlatan)
b7a1ab5cb4 refactor: Split dbwrapper CDBIterator::GetKey implementation (TheCharlatan)
d7437908cd refactor: Split dbwrapper CDBIterator::Seek implementation (TheCharlatan)
ea8135de7e refactor: Pimpl leveldb::batch for CDBBatch (TheCharlatan)
b9870c920d refactor: Split dbwrapper CDBatch::Erase implementation (TheCharlatan)
532ee812a4 refactor: Split dbwrapper CDBBatch::Write implementation (TheCharlatan)
afc534df9a refactor: Wrap DestroyDB in dbwrapper helper (TheCharlatan)
Pull request description:
Leveldb headers are currently included in the `dbwrapper.h` file and thus available to many of Bitcoin Core's source files. However, leveldb-specific functionality should be abstracted by the `dbwrapper` and does not need to be available to the rest of the code. Having leveldb included in a widely-used header such as `dbwrapper.h` bloats the entire project's header tree.
The `dbwrapper` is a key component of the libbitcoinkernel library. Future users of this library would not want to contend with having the leveldb headers exposed and potentially polluting their project's namespace.
For these reasons, the leveldb headers are removed from the `dbwrapper` by moving leveldb-specific code to the implementation file and creating a [pimpl](https://en.cppreference.com/w/cpp/language/pimpl) where leveldb member variables are indispensable. As a final step, the leveldb include flags are removed from the `BITCOIN_INCLUDES` and moved to places where the dbwrapper is compiled.
---
This pull request is part of the [libbitcoinkernel project](https://github.com/bitcoin/bitcoin/issues/27587), and more specifically its stage 1 step 3 "Decouple most non-consensus headers from libbitcoinkernel".
ACKs for top commit:
stickies-v:
re-ACK d8f1222ac5
MarcoFalke:
ACK d8f1222ac5🔠
Tree-SHA512: 0f58309be165af0162e648233451cd80fda88726fc10c0da7bfe4ec2ffa9afe63fbf7ffae9493698d3f39653b4ad870c372eee652ecc90ab1c29d86c387070f3
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Keep the generic serialization in the header, while moving
leveldb-specifics to the implementation file.
Since CharCast is no longer needed in the header, move it to the
implementation file.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Make it a static function in dbwrapper.cpp, since it is not used
elsewhere and when left in the header, would expose a leveldb type.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Keep the generic serialization in the header, while moving
leveldb-specifics to the implementation file.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Keep the generic serialization in the header, while moving
leveldb-specifics to the implementation file.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Hide the leveldb::Iterator member variable with a pimpl in order not to
expose it directly in the header.
Also, move CDBWrapper::NewIterator to the dbwrapper implementation to
use the pimpl for CDBIterator initialziation.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Keep the generic serialization in the header, while moving
leveldb-specifics to the implementation file.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Keep the generic serialization in the header, while moving
leveldb-specifics to the implementation file.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Keep the generic serialization in the header, while moving
leveldb-specifics to the implementation file.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Hide the leveldb::WriteBatch member variable with a pimpl in order not
to expose it directly in the header.
Also move CDBBatch::Clear to the dbwrapper implementation to use the new
impl_batch.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Keep the generic serialization in the header, while moving
leveldb-specifics to the implementation file.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Keep the generic serialization in the header, while moving
leveldb-specifics to the implementation file.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
Wrap leveldb::DestroyDB in a helper function without exposing
leveldb-specifics.
Also, add missing optional include.
The context of this commit is an effort to decouple the dbwrapper header
file from leveldb includes. To this end, the includes are moved to the
dbwrapper implementation file. This is done as part of the kernel
project to reduce the number of required includes for users of the
kernel.
The fs.* files are already part of the libbitcoin_util library. With the
introduction of the fs_helpers.* it makes sense to move fs.* into the
util/ directory as well.
This is an extraction of filesystem related functions from util/system
into their own utility file.
The background of this commit is an ongoing effort to decouple the
libbitcoinkernel library from the ArgsManager defined in system.h.
Moving these functions out of system.h allows including them from a
separate source file without including the ArgsManager definitions from
system.h.
Add DBParams and DBOptions structs to remove ArgsManager uses from dbwrapper.
To reduce size of this commit, this moves references to gArgs variable out of
dbwrapper.cpp to calling code in txdb.cpp. But these moves are temporary. The
gArgs references in txdb.cpp are moved out to calling code in init.cpp in later
commits.
This commit does not change behavior.
This is more consistent with the other functions, as well as with the
logging output itself. If we want to make this change, we should do it
before it's all over the place.
Messages with level `WARN` or higher should be logged even when
the category is not provided with `-debug=`, to make sure important
warnings are not lost.
6544ea5035 refactor: Block unsafe fs::path std::string conversion calls (Russell Yanofsky)
b39a477ec6 refactor: Add fs::PathToString, fs::PathFromString, u8string, u8path functions (Russell Yanofsky)
Pull request description:
The `fs::path` class has a `std::string` constructor which will implicitly convert from strings. Implicit conversions like this are not great in general because they can hide complexity and inefficiencies in the code, but this case is especially bad, because after the transition from `boost::filesystem` to `std::filesystem` in #20744 the behavior of this constructor on windows will be more complicated and can mangle path strings. The `fs::path` class also has a `.string()` method which is inverse of the constructor and has the same problems.
Fix this by replacing the unsafe method calls with `PathToString` and `PathFromString` function calls, and by forbidding unsafe method calls in the future.
ACKs for top commit:
kiminuo:
ACK 6544ea5035
laanwj:
Code review ACK 6544ea5035
hebasto:
re-ACK 6544ea5035, only added `fsbridge_stem` test case, updated comment, and rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/22937#pullrequestreview-765503126) review. Verified with the following command:
Tree-SHA512: c36324740eb4ee55151146626166c00d5ccc4b6f3df777e75c112bcb4d1db436c1d9cc8c29a1e7fb96051457d317961ab42e6c380c3be2771d135771b2b49fa0
There is no change in behavior. This just helps prepare for the
transition from boost::filesystem to std::filesystem by avoiding calls
to methods which will be unsafe after the transaction to std::filesystem
to due lack of a boost::filesystem::path::imbue equivalent and inability
to set a predictable locale.
Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>
Co-authored-by: Kiminuo <kiminuo@protonmail.com>
Co-authored-by: MarcoFalke <falke.marco@gmail.com>
Boost Filesystem basename() function is deprecated since v1.36.0.
Also, defining BOOST_FILESYSTEM_NO_DEPRECATED before including
filesystem headers is strongly recommended. This prevents inadvertent
use of old features, particularly legacy function names, that have been
replaced and are going to go away in the future.