bitcoin/contrib/linearize
fanquake 8625446b4d
Merge #17336: scripts: search for first block file for linearize-data with some block files pruned
317fb96de9 Add search for first blk file with pruned node (Rjected)

Pull request description:

  <!--
  *** Please remove the following help text before submitting: ***

  Pull requests without a rationale and clear improvement may be closed
  immediately.
  -->

  <!--
  Please provide clear motivation for your patch and explain how it improves
  Bitcoin Core user experience or Bitcoin Core developer experience
  significantly:

  * Any test improvements or new tests that improve coverage are always welcome.
  * All other changes should have accompanying unit tests (see `src/test/`) or
    functional tests (see `test/`). Contributors should note which tests cover
    modified code. If no tests exist for a region of modified code, new tests
    should accompany the change.
  * Bug fixes are most welcome when they come with steps to reproduce or an
    explanation of the potential issue as well as reasoning for the way the bug
    was fixed.
  * Features are welcome, but might be rejected due to design or scope issues.
    If a feature is based on a lot of dependencies, contributors should first
    consider building the system outside of Bitcoin Core, if possible.
  * Refactoring changes are only accepted if they are required for a feature or
    bug fix or otherwise improve developer experience significantly. For example,
    most "code style" refactoring changes require a thorough explanation why they
    are useful, what downsides they have and why they *significantly* improve
    developer experience or avoid serious programming bugs. Note that code style
    is often a subjective matter. Unless they are explicitly mentioned to be
    preferred in the [developer notes](/doc/developer-notes.md), stylistic code
    changes are usually rejected.
  -->
  When bitcoind is running in pruned mode, producing a hashlist with `./linearize-hashes.py linearize.cfg > hashlist.txt` and then executing `linearize-data.py linearize.cfg` will produce:
  ```
  Read 313001 hashes
  Input file /home/dan/.bitcoin/blocks/blk00000.dat
  Premature end of block data
  ```
  This happens because `linearize-data` starts by attempting to process `blk00000.dat` regardless of whether or not `blk00000.dat` actually exists - this may not be the case if working with a pruned node.
  This PR adds a function which finds the first block file that does exist, and calls that function when the `BlockDataCopier` is initialized.

  This is a refactor of #16431.

  <!--
  Bitcoin Core has a thorough review process and even the most trivial change
  needs to pass a lot of eyes and requires non-zero or even substantial time
  effort to review. There is a huge lack of active reviewers on the project, so
  patches often sit for a long time.
  -->

ACKs for top commit:
  darosior:
    ACK 317fb96de9
  laanwj:
    Code review ACK 317fb96de9
  theStack:
    Code review ACK 317fb96de9

Tree-SHA512: fc8014282df6cfe7b267e64db8ce7d82b86b758c302fbfea4a3c39b62d93512f5c2e31a0de4e9c5ec18fc0268c917f011257d37b45afaef6033eec90e4aa585f
2020-02-05 08:45:06 +08:00
..
example-linearize.cfg doc: Added regtest config for linearize script 2019-11-07 08:07:10 +02:00
linearize-data.py Merge #17336: scripts: search for first block file for linearize-data with some block files pruned 2020-02-05 08:45:06 +08:00
linearize-hashes.py scripted-diff: Bump copyright of files changed in 2019 2019-12-30 10:42:20 +13:00
README.md build: Require python 3.5 2019-03-02 10:40:23 -05:00

Linearize

Construct a linear, no-fork, best version of the Bitcoin blockchain.

Step 1: Download hash list

$ ./linearize-hashes.py linearize.cfg > hashlist.txt

Required configuration file settings for linearize-hashes:

  • RPC: datadir (Required if rpcuser and rpcpassword are not specified)
  • RPC: rpcuser, rpcpassword (Required if datadir is not specified)

Optional config file setting for linearize-hashes:

  • RPC: host (Default: 127.0.0.1)
  • RPC: port (Default: 8332)
  • Blockchain: min_height, max_height
  • rev_hash_bytes: If true, the written block hash list will be byte-reversed. (In other words, the hash returned by getblockhash will have its bytes reversed.) False by default. Intended for generation of standalone hash lists but safe to use with linearize-data.py, which will output the same data no matter which byte format is chosen.

The linearize-hashes script requires a connection, local or remote, to a JSON-RPC server. Running bitcoind or bitcoin-qt -server will be sufficient.

Step 2: Copy local block data

$ ./linearize-data.py linearize.cfg

Required configuration file settings:

  • output_file: The file that will contain the final blockchain. or
  • output: Output directory for linearized blocks/blkNNNNN.dat output.

Optional config file setting for linearize-data:

  • debug_output: Some printouts may not always be desired. If true, such output will be printed.
  • file_timestamp: Set each file's last-accessed and last-modified times, respectively, to the current time and to the timestamp of the most recent block written to the script's blockchain.
  • genesis: The hash of the genesis block in the blockchain.
  • input: bitcoind blocks/ directory containing blkNNNNN.dat
  • hashlist: text file containing list of block hashes created by linearize-hashes.py.
  • max_out_sz: Maximum size for files created by the output_file option. (Default: 1000*1000*1000 bytes)
  • netmagic: Network magic number.
  • out_of_order_cache_sz: If out-of-order blocks are being read, the block can be written to a cache so that the blockchain doesn't have to be sought again. This option specifies the cache size. (Default: 100*1000*1000 bytes)
  • rev_hash_bytes: If true, the block hash list written by linearize-hashes.py will be byte-reversed when read by linearize-data.py. See the linearize-hashes entry for more information.
  • split_timestamp: Split blockchain files when a new month is first seen, in addition to reaching a maximum file size (max_out_sz).