purpleKarrot Gedankenexperimente

Towards a Side-Effect Free Validation Library

At the recent Core developer meeting in Barcelona, there were several discussions about the possibility of a side-effect free validation library. The idea is appealing for obvious reasons: consensus validation is among the most critical parts of Bitcoin software, yet today it lives inside a much larger system concerned with networking, storage, chainstate management, and node operation.

One discussion that particularly influenced my thinking was with Pieter Wuille. While talking about validation, he explained why Bitcoin Core defers certain expensive validation work until a side chain is promoted to the active chain. The details are less important than the requirement this imposes: any validation library that hopes to be useful must be capable of supporting the way Bitcoin Core actually validates blocks. A library that is merely designed for external consumers but not usable by Bitcoin Core itself would not be a success.

Recently, sedited published a blog post analyzing Bitcoin Core's validation implementation. The article is valuable precisely because it focuses on understanding the architecture that exists today. Anyone interested in a validation library should read it first. Understanding the dependencies present in the current system is a prerequisite for deciding which responsibilities belong inside a consensus library and which do not.

A common assumption is that a validation library can be obtained by extracting existing validation code behind a cleaner API. I suspect the more interesting question is what the API should be in the first place.

The central design problem is not how to separate validation from the rest of the node. The central design problem is deciding which entities belong to the consensus vocabulary and which entities should be represented as abstractions.

It is entirely possible to design a validation library around abstract notions of blocks, transactions, scripts, and coins. Users could supply their own concrete representations while the library operates purely on these abstractions. Conversely, one could require users to adopt chain and UTXO representations provided by the library. Neither choice is dictated by the implementation. Instead, the boundary is a deliberate design decision.

My current view is that block, block_header, transaction, outpoint, tx_input, tx_output, script, and coin should be treated as vocabulary types, while chain_view and coin_index should serve as the primary abstractions.

Let me elaborate on these two abstractions, as they loosely correspond to CChain and CCoinsViewCache in Bitcoin Core, or HeaderAncestryView and UnspentOutputsView from Hornet -- though with some important differences.

As discussed in a previous post, I consider chain_view to be a sized, random-access view of block_header, which can be expressed in C++ as:

template <typename T>
concept chain_view = std::ranges::view<T>
  && std::ranges::sized_range<T>
  && std::ranges::random_access_range<T>
  && std::convertible_to<
       std::ranges::range_reference_t<T>,
       block_header>;

Unlike CChain in Bitcoin Core, which represents the current active chain, chain_view is intended as a lightweight snapshot of the ancestry leading to a particular block. In C++ terms, a view is a range with $O(1)$ copy complexity; in that sense, chain_view is a true view, whereas HeaderAncestryView, UnspentOutputsView (both from Hornet), and CCoinsViewCache (from Bitcoin Core) are not -- despite their names suggesting otherwise.

In contrast to HeaderAncestryView in Hornet, which exposes the extension points TimestampAt, HashAt, and MedianTimePast via virtual functions, chain_view instead delegates the derivation of such properties to the validation library itself. For example:

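#include <algorithm>
#include <cassert>
#include <ranges>
#include <vector>

// Requires C++23 for std::ranges::to.
auto median_time_past(chain_view auto chain) {
  // The median is undefined for an empty ancestry.
  assert(!std::ranges::empty(chain));
  auto get_time = [](block_header const& header) {
    return header.time();
  };
  // Collect the timestamps of the (up to) 11 most recent headers.
  auto times = chain
    | std::views::transform(get_time)
    | std::views::reverse
    | std::views::take(11)
    | std::ranges::to<std::vector>();
  // Partially sort so that the median ends up at the middle position.
  auto const middle = times.begin() + times.size() / 2;
  std::ranges::nth_element(times, middle);
  return *middle;
}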

The coin_index abstraction, in turn, is simply a partial mapping from outpoint to coin, which can be expressed in C++ as:

template <typename T>
concept coin_index = requires (T const& lookup, outpoint p) {
  { lookup(p) } -> std::convertible_to<std::optional<coin>>;
};

How this lookup is implemented -- whether it uses caching, what data structure backs it, or whether it is persisted in a database -- is entirely irrelevant to the validation library. One additional requirement may be needed, though: that the lookup function is safe to invoke concurrently.

Hornet's UnspentOutputsView exposes the extension points QueryPrevoutsUnspent and QueryOutPointsUnique as virtual functions. However, these are not merely state queries but effectively consensus decisions, which in turn push consensus logic into the database layer. In contrast, coin_index keeps a clean separation between state access and consensus logic, exposing only a pure lookup interface over state.

With the vocabulary types and abstractions in place, and following sedited's identification of the three stages of validation in his blog post, it becomes straightforward to define an overload set of verification functions.

Function overloading is preferred over introducing names such as ContextFreeCheck and ContextualCheck, since "context" is not a well-defined abstraction. Instead, additional parameters naturally represent additional pieces of consensus evidence, and the progression is made explicit through the type system.

verify(header);
verify(header, chain, now);

verify(tx);
verify(tx, chain);
verify(tx, chain, coins);

verify(block);
verify(block, chain, now);
verify(block, chain, now, coins);

All these functions have a boolean outcome: the object is either valid with respect to the provided evidence or it is not. If the object is valid, additional information or facts may be produced as a by-product, for example spent coins or undo data, which should be communicated back to the caller so that state updates can be applied without recomputing the same information. If the object is invalid, the caller may also want to know precisely which consensus rule was violated.

Structurally, this could be modeled as std::expected<fact, verify_status>. However, I am not entirely convinced this is the right abstraction. My concern is that std::expected tends to frame the second alternative as an "error" in the conventional error-handling sense, while a failed verification is not an exceptional condition but simply one of the two normal outcomes of consensus evaluation.
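
To make the trade-off tangible, here is one possible alternative shape, sketched over std::variant with neutral names for both outcomes. It is an illustration of the concern rather than a proposal: verdict, facts, violation, and rule are all hypothetical names, and the by-products are reduced to a list of spent coins.

```cpp
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-in for the vocabulary type.
struct coin {
  std::int64_t value = 0;
};

// By-products of a successful verification (for example, undo data).
struct facts {
  std::vector<coin> spent_coins;
};

// Hypothetical identifiers for violated consensus rules.
enum class rule { missing_input, duplicate_input };

struct violation {
  rule broken;
};

// Both alternatives are ordinary outcomes; neither is named "error".
class verdict {
public:
  verdict(facts f) : outcome_(std::move(f)) {}
  verdict(violation v) : outcome_(v) {}

  bool valid() const {
    return std::holds_alternative<facts>(outcome_);
  }
  // Non-null only if the object was valid.
  facts const* accepted() const {
    return std::get_if<facts>(&outcome_);
  }
  // Non-null only if the object was invalid.
  violation const* rejected() const {
    return std::get_if<violation>(&outcome_);
  }

private:
  std::variant<facts, violation> outcome_;
};
```

Structurally this carries the same information as std::expected; the difference is only in how the two alternatives are named and presented to the caller.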

All true failures, such as allocation errors or database issues, remain exceptional conditions and are still propagated via exceptions. As a historical aside, the 2013 chain split is often discussed in the context of a subtle bug where an exception path was effectively treated as a negative verification result, contributing to inconsistent validation behavior between nodes. This highlights why the separation between a negative validation outcome and an exceptional condition is essential: consensus logic must remain a pure decision process, while exceptional failures must never be allowed to masquerade as valid semantic results of that decision.