Yggdrasil Parity Proof Report
Document round: §1–§9 evidence captured at R248 (2026-05-02);
header refreshed R308 (2026-05-09); R313–R320 docstring-classification
arc note added R321 (2026-05-09).
Cumulative arc: R1 → R320+
Build: target/release/yggdrasil-node (Cargo release profile, Rust 1.95.0)
Workspace tests: 4,982 passing, 0 failing (post-R336 sister-tools Phase A skeleton milestone; +126 since R325 baseline). Five gates clean
(cargo fmt --check, cargo check-all, cargo lint, cargo test-all,
python3 scripts/check-strict-mirror.py --fail-on-violation); the
parity-flow validators clean too (check-parity-matrix.py over 8
entries against tag 11.0.1; check-fixture-manifest.py over the
cardano-base SHA pin matrix; check-reference-artifacts.py over the
11.0.1 install with 9 binaries + 3 network bundles).
R273-rename + R274–R311 file-mirror & tech-debt arc + R313–R320 docstring-classification cleanup (closed 2026-05-09). The vendored upstream tree was refreshed to policy tag 11.0.1. A strict 1:1 file-mirror CI drift-guard (scripts/check-strict-mirror.py) landed warn-only at R275 and was promoted to fail-build at R288; the per-file allowlist lives in docs/strict-mirror-audit.tsv (post-R324 final: 246 (a) DIRECT_MIRROR + 202 (c) NO_MIRROR_NEEDS_DOCSTRING = 448 graded files; zero (c) strict-partial after R320 closure; zero (a) auto + zero (a) auto (affinity-filtered) after R323/R324 closure, so every (a) row has an explicit **Strict mirror:** <upstream/path.hs> declaration with no basename-heuristic reliance; zero (b) rename-needed; zero (d) clash-regrade). Every production .rs either mirrors a single upstream .hs by snake_case basename or carries a ## Naming parity docstring stanza. All production #[allow(dead_code)] sites and the lone production TODO were resolved.
A new crates/cardano-cli/ workspace member (R289–R295) mirrors the full upstream Cardano.CLI.* surface (~237 Rust files mirroring 180 upstream .hs); concrete migration kicked off via R296 (Version) + R297 (ShowUpstreamConfig) with byte-equivalent output verified against .reference-haskell-cardano-node/install/bin/cardano-cli. Two new validators (check-fixture-manifest.py + check-reference-artifacts.py) joined the parity-flow surface at R303. The §1–§9 evidence below from R248 closure remains valid: the strict-mirror arc tightened the file-naming policy and reduced tech debt without changing any closure status.
This report documents yggdrasil’s parity status against upstream
IntersectMBO Cardano node / cardano-cli behavior. It is the
canonical reference for “what works end-to-end” today and what
remains. Each claim cites the round that closed it and the
operational evidence captured under docs/operational-runs/.
1. cardano-cli LSQ surface — 25/25 subcommands working
R205 verified all 25 cardano-cli conway query subcommands decode
end-to-end against yggdrasil’s NtC socket on preview with
YGG_LSQ_ERA_FLOOR=6:
Always-available (no era gating)
| Subcommand | Round | Wire encoder | Live data |
|---|---|---|---|
| query tip | R155 | BlockQuery (QueryHardFork ...) | ✅ live |
| query protocol-parameters | R156/R159/R160/R161 | Conway 31-elem PParams | ✅ live |
| query era-history | R153/R162 | Interpreter with bignum relativeTime | ✅ live |
| query slot-number | R162 | era-history coverage to slot 2^48 | ✅ live |
| query utxo --whole-utxo/--address/--tx-in | R157 | era-specific TxOut shapes | ✅ live |
| query tx-mempool info/next-tx/tx-exists | R158 | LocalTxMonitor v23+ | ✅ live |
Era-gated (require YGG_LSQ_ERA_FLOOR=6)
| Subcommand | Round | Wire encoder | Live data |
|---|---|---|---|
| query stake-pools | R163/R179 | tag 16 Set PoolKeyHash | ✅ live (3 pools on preview) |
| query stake-distribution | R179 | tag 37 [map, NonZero Coin] | ✅ live |
| query stake-snapshot | R173/R179/R202/R203 | tag 20 GetCBOR wrap | ✅ live (per-pool from sidecar) |
| query pool-state | R172/R179 | tag 19 GetCBOR wrap | ✅ live |
| query stake-address-info | R177 | tag 10 GetFilteredDelegationsAndRewardAccounts | ✅ live |
| query ref-script-size | R163 | era-specific TxOut script-ref | ✅ live |
Conway governance (R180–R188 + R193–R204)
| Subcommand | Round | Wire encoder | Live data |
|---|---|---|---|
| query constitution | R180 | tag 23 Constitution 2-elem | ✅ live (real anchor + script hash on preview) |
| query gov-state | R188/R193/R204 | tag 24 ConwayGovState 7-field | ✅ live (GovRelation from EnactState; OMap proposals shape adapter) |
| query drep-state | R180/R181 | tag 25 Map DRep DRepState | ✅ live |
| query drep-stake-distribution | R184/R194 | tag 26 Map DRep Coin | ✅ live |
| query committee-state | R182 | tag 27 CommitteeMembersState 3-elem | ✅ live |
| query treasury | R180 | tag 29 [treasury, reserves] | ✅ live |
| query spo-stake-distribution | R184/R194 | tag 30 Map (KeyHash 'StakePool) Coin | ✅ live (3 preview pools surfaced) |
| query proposals | R185 | tag 31 Seq GovActionState | ✅ live (empty on preview) |
| query ratify-state | R187 | tag 32 RatifyState 4-field record | ✅ live (real EnactState rendered) |
| query future-pparams | R183 | tag 33 Maybe (PParams era) | ✅ live |
| query stake-pool-default-vote | R185 | tag 35 DefaultVote enum | ✅ live |
| query ledger-peer-snapshot | R189/R195 | tag 34 V2 [1, [WithOrigin SlotNo, indef pools]] | ✅ live (3 preview pools with relays) |
Operational (R190)
| Subcommand | Round | Wire encoder | Live data |
|---|---|---|---|
| query ledger-state | R190 | tag 12 DebugNewEpochState (null acceptable per cli convention) | ✅ |
| query protocol-state | R190/R191/R196/R197/R198 | tag 13 Versioned 0 PraosState 8-field | ✅ live (real nonces + OCert counters) |
CLI-side validation (not yggdrasil bugs)
These three subcommands were initially flagged in the R190 audit but turned out to be client-side argument validation issues, not yggdrasil bugs. They work given correct CLI inputs:
- query kes-period-info: needs a valid --op-cert-file
- query leadership-schedule: needs --genesis FILEPATH + --stake-pool-verification-key STRING
- query stake-address-info: needs a Bech32-valid stake address (verified working in R190 with an address generated by cardano-cli conway stake-address build)
2. Consensus-side state persistence
R238 makes slot-indexed ChainDepState bundles the authoritative
nonce/OpCert state source. They persist atomically at ledger-checkpoint
cadence, restore on restart, restore on rollback, and feed LSQ
protocol-state for exact acquired points. stake_snapshots.cbor
remains the separate stake-snapshot mirror for stake-query surfaces:
| Sidecar | Round | Filename | Surfaces in | Restart resilient |
|---|---|---|---|---|
| ChainDepState nonce + OpCert bundle | R238 | chain_dep_state/<slot-hex>.cbor | query protocol-state nonces + oCertCounters | ✅ exact restart/rollback path |
| Stake snapshots | R202/R203 | stake_snapshots.cbor | query stake-snapshot per-pool totals | ✅ |
Current canonical verification is R238: exact ChainDepState sidecars are
saved under chain_dep_state/, rollback restores the newest bundle at or
before the target point, and LSQ protocol-state ignores any stale
root-level nonce/OpCert mirror files.
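The restore rule described above (pick the newest bundle at or before the target point) can be sketched as follows. This is a minimal illustration, not the code in crates/storage: it assumes bundle file names are the slot number in hexadecimal followed by .cbor, as the chain_dep_state/<slot-hex>.cbor layout suggests, and the function names parse_bundle_slot and newest_bundle_at_or_before are hypothetical. It also skips the step that checks the bundled point against the selected chain prefix.

```rust
/// Parse a sidecar file name of the assumed form `<slot-hex>.cbor`.
/// Returns `None` for anything that does not match that shape.
fn parse_bundle_slot(file_name: &str) -> Option<u64> {
    let hex = file_name.strip_suffix(".cbor")?;
    u64::from_str_radix(hex, 16).ok()
}

/// Pick the newest bundle whose slot is at or before `target_slot`.
/// File names that do not parse are ignored.
fn newest_bundle_at_or_before<'a>(
    file_names: &[&'a str],
    target_slot: u64,
) -> Option<(u64, &'a str)> {
    file_names
        .iter()
        .filter_map(|name| parse_bundle_slot(name).map(|slot| (slot, *name)))
        .filter(|(slot, _)| *slot <= target_slot)
        .max_by_key(|(slot, _)| *slot)
}
```

With bundles at slots 47, 2145, and 4305, a rollback to slot 2160 would select the slot 2145 bundle, while a rollback to slot 10 finds no usable bundle.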
R205 verified the live-nonce restart path before R238 replaced the root-level nonce/OpCert mirrors with slot-indexed ChainDepState bundles. The filenames below are retained only as historical evidence for that round:
Pre-restart at slot ~10K:
$ ls /tmp/ygg-r205-preview-db/*.cbor
nonce_state.cbor (114 B)
ocert_counters.cbor (218 B)
stake_snapshots.cbor ( 18 B)
Restart log:
[Node.Recovery] recovered ledger state from coordinated storage
checkpointSlot=9960
point=BlockPoint(SlotNo(10960), HeaderHash(c6dfa20907819b0c...))
replayedVolatileBlocks=50
Post-restart:
$ cardano-cli conway query protocol-state --testnet-magic 2
{
"candidateNonce": "509aed8ad40c83c7201fd99c84501c698137a7152127e2ebe1bb9fe70a39077c",
"evolvingNonce": "509aed8ad40c83c7201fd99c84501c698137a7152127e2ebe1bb9fe70a39077c",
"labNonce": "0e45467482b969fd4a2f50031bda686935efad281be1b714e408afe7f3eb523a",
"lastSlot": 11940,
...
}
The current sidecars are persisted via:
- crates/storage/src/ocert_sidecar.rs: atomic-write helpers for chain_dep_state/<slot-hex>.cbor and stake_snapshots.cbor
- node/src/sync.rs::update_ledger_checkpoint_after_progress: persists ChainDepState only at checkpoint landing after nonce/OpCert updates
- node/src/local_server.rs::attach_chain_dep_state_from_sidecar: loads exact point sidecars at LSQ acquire time and attaches to LedgerStateSnapshot via the R192 with_chain_dep_state and R202 with_stake_snapshots builders
2b. Genesis hash preflight
R244 closes the last genesis-hash verification asymmetry. Upstream
cardano-node reads Byron genesis through
Cardano.Chain.Genesis.readGenesisData; upstream cardano-ledger
parses Canonical JSON and hashes renderCanonicalJSON bytes. Yggdrasil
now mirrors that for ByronGenesisHash, while Shelley, Alonzo, and
Conway continue to use raw-file Blake2b-256.
Current operator-facing behavior:
| Genesis field | Hash basis | Startup/preflight status |
|---|---|---|
| ByronGenesisHash | Canonical JSON rendering | ✅ verified |
| ShelleyGenesisHash | raw file bytes | ✅ verified |
| AlonzoGenesisHash | raw file bytes | ✅ verified |
| ConwayGenesisHash | raw file bytes | ✅ verified |
validate-config --network mainnet now reports Genesis hashes: 4/4
verified, and Node.GenesisHash.Verified traces byronVerified=true
when the file/hash pair is present.
2c. BBODY header protocol-version policy
R245 mirrors the upstream cardano-ledger Conway BBODY change that
keeps HeaderProtVerTooHigh active on mainnet, temporarily suppresses
that check on testnets before Dijkstra, and re-enables it on testnets
once protocol major 12 is active. Yggdrasil threads network_magic
through VerificationConfig to classify mainnet vs. testnet, and keeps
the separate MaxMajorProtVer ceiling active for every network.
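The network-dependent gating described above can be sketched as a single predicate. This is an illustration under stated assumptions, not the VerificationConfig code: the function name header_prot_ver_check_active and the constant names are hypothetical, and the sketch covers only whether the HeaderProtVerTooHigh check runs, not the separate MaxMajorProtVer ceiling.

```rust
/// Cardano mainnet network magic.
const MAINNET_MAGIC: u32 = 764_824_073;

/// Protocol major version at which the check is re-enabled on testnets.
const TESTNET_REENABLE_MAJOR: u64 = 12;

/// Decide whether the HeaderProtVerTooHigh check should run.
/// Mainnet always runs it; testnets skip it until protocol major 12 is active.
fn header_prot_ver_check_active(network_magic: u32, current_major: u64) -> bool {
    network_magic == MAINNET_MAGIC || current_major >= TESTNET_REENABLE_MAJOR
}
```

For example, preview (magic 2) at protocol major 10 would skip the check, while mainnet at the same major would still enforce it.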
The paired upstream GOV drift switches preceedingHardFork from the
original state to accumulated proposals. Yggdrasil’s Conway proposal
validator already evaluates hard-fork sequencing against the accumulated
pending-proposal view, and the hard-fork sequencing regression tests
remain green.
2d. Preview Plutus well-formedness/runtime parity
R246 closes the preview replay blocker originally observed as
MalformedReferenceScripts near Babbage slot 730728, plus the later
Plutus validation mismatch at slot 840719. Follow-up preview replay also
closed Plutus serialiseData CBOR-shape drift and legacy
AccountRegistration redeemer/witness over-collection.
The local behavior now matches the upstream path that stores Plutus
scripts as PlutusBinary:
| Parity point | R246 behavior |
|---|---|
| Well-formedness input | Treat script bytes as raw PlutusBinary: a CBOR bytestring containing Flat; do not fall back between raw Flat and CBOR shapes |
| Language gate | Use protocol version: V1 >= PV5, V2 >= PV7, V3 >= PV9 |
| Reference inputs | Sort resolved Babbage/Conway reference inputs by ShelleyTxIn, matching upstream Set.toList order |
| CEK memory | Non-constant runtime values consume ExMemory = 1 |
| Plutus Integer | UPLC Constant::Integer, PlutusData::Integer, Flat integer decode, CBOR bignum decode/encode, and integer builtins use arbitrary-precision BigInt, matching upstream Haskell Integer |
| Validity intervals | Pre-Conway upper-only intervals use inclusive PV1.to; Conway/PV9+ upper bounds remain strict |
| serialiseData | PlutusData CBOR uses upstream constructor tags, bounded bytes chunks, and Haskell-list shape for non-empty lists/constructor fields |
| Legacy registration certs | Shelley AccountRegistration is not a Certifying script purpose and does not require a credential witness/redeemer; deposit-bearing Conway registration certs still do |
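The language gate row in the table above maps each Plutus language to a minimum protocol major version. A minimal sketch of that mapping follows; the type and function names are hypothetical and chosen for illustration only.

```rust
/// Plutus language versions named in the table above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PlutusLanguage {
    V1,
    V2,
    V3,
}

/// Minimum protocol major version at which each language is accepted.
fn min_protocol_major(lang: PlutusLanguage) -> u64 {
    match lang {
        PlutusLanguage::V1 => 5,
        PlutusLanguage::V2 => 7,
        PlutusLanguage::V3 => 9,
    }
}

/// A script language passes the gate when the current protocol major
/// version is at least the minimum for that language.
fn language_allowed(lang: PlutusLanguage, protocol_major: u64) -> bool {
    protocol_major >= min_protocol_major(lang)
}
```

Under this gate a V2 script is rejected at protocol major 6 and accepted from major 7 onward.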
Focused verification:
cargo fmt
cargo fmt --all -- --check
cargo test -p yggdrasil-plutus flat::tests --lib
cargo test -p yggdrasil-plutus builtins::tests::serialise_data --lib
cargo test -p yggdrasil-ledger cbor::tests::extract_block_tx_byte_spans --lib
cargo test -p yggdrasil-ledger plutus::tests --lib
cargo test -p yggdrasil-ledger plutus_validation::tests::validate_certifying_script_skips_legacy_registration_without_redeemer --lib
cargo test -p yggdrasil-ledger witnesses::tests --lib
cargo test -p yggdrasil-ledger witness_validation --test integration
cargo test -p yggdrasil-node sync:: --lib
cargo test -p yggdrasil-node runtime::tests::runtime_recovery_preserves_current_epoch_block_counts --lib
cargo build -p yggdrasil-node --release
These focused gates passed, and the broad cargo check-all,
cargo test-all, and cargo lint gates also pass after the latest
patches.
Operational replay:
cargo run --manifest-path tmp/refscan/Cargo.toml --release -- \
tmp/preview-producer/db/producer \
tmp/preview-producer/config/preview-producer.json
Result after the PlutusBinary/integer/ScriptContext fixes: checkpoint
SlotNo(834713), volatile replay blocks=255, and final tip
SlotNo(842753). After the serialiseData and legacy registration
certificate fixes, refscan advanced from checkpoint SlotNo(898474) to
final tip SlotNo(901725). No MalformedReferenceScripts,
ValidationTagMismatch, ledger decode error, or missing
legacy-registration redeemer occurred.
A later bounded live preview run reached checkpoint slot 1038614, then
stopped at slot 1038978 with WithdrawalExceedsBalance in transaction
ffe6f2f14fe4743872f13863eb2e64898928b68415bce7180abe8d69325b580c.
That stop was caused by stale persisted reward state from checkpoints
written before runtime recovery preserved current-epoch pool_block_counts;
it is not a Plutus failure. A clean or repaired preview replay is required
for final operational evidence past that point.
2e. Preview Origin-prefix BlockFetch parity
R247 fixes the clean-preview replay stop caused by the first verified
sync batch starting at Point::Origin, collecting several ChainSync
roll-forward headers, then requesting BlockFetch only for the last
announced header. Since BlockFetch requires concrete points and cannot
fetch from virtual Origin, the batch now uses the first announced concrete
header as the lower bound.
Local impact:
- sync_batch_verified_with_tentative() routes pending roll-forward headers through blockfetch_range_for_pending_forwards().
- normalize_blockfetch_range_points() keeps its Origin-to-upper behavior for ordinary single-point callers; the prefix-aware helper handles the multi-header verified-batch case.
- crates/ledger/src/state.rs comments now document upstream Byron genesis pseudo-output ids correctly: hash serialized Address only, not [address, amount].
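The Origin-start case described above can be sketched as choosing the first and last announced headers as the BlockFetch range. This is a simplified illustration, not the body of blockfetch_range_for_pending_forwards(): HeaderPoint and origin_batch_range are hypothetical names, points carry only a slot here, and the non-Origin path is left out.

```rust
/// A concrete chain point, reduced to its slot for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HeaderPoint {
    slot: u64,
}

/// For a batch that started at Origin, request the whole announced prefix:
/// lower bound = first announced header, upper bound = last announced header.
/// Returns `None` when no headers were announced.
fn origin_batch_range(pending: &[HeaderPoint]) -> Option<(HeaderPoint, HeaderPoint)> {
    let first = *pending.first()?;
    let last = *pending.last()?;
    Some((first, last))
}
```

Before the fix the equivalent of this range collapsed to (last, last), so headers announced earlier in the same batch were never fetched.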
Focused verification:
cargo test -p yggdrasil-node blockfetch_range_ --lib
Operational verification used a fresh preview database and the existing
preview producer config. The replay stored the early prefix blocks
including slots 0, 60, 300, and 320, resolved the previously
missing Byron lineage, and advanced to slot 101100 before the bounded
run ended.
2f. Preview TPraos overlay VRF parity
R248 closes the preview verified-sync stop at Alonzo slot 106220.
That slot is an active TPraos overlay slot because preview starts from
Shelley genesis with decentralisationParam = 1. Upstream validates
this branch with Cardano.Protocol.TPraos.Rules.Overlay.pbftVrfChecks:
the selected genesis delegate cold key and VRF key must match, both
TPraos VRF proofs must verify, and the pool stake leader-threshold check
is not applied.
Local behavior now matches that split:
| Overlay case | R248 behavior |
|---|---|
| Active overlay slot | Verify selected genesis delegate cold key, delegate VRF key, leader VRF proof, and nonce VRF proof; skip pool stake threshold |
| Reserved non-active overlay slot | Fail closed with TpraosOverlaySlotNotActive |
| Non-overlay TPraos slot | Keep existing pool stake distribution threshold validation |
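The three rows of the table above amount to a dispatch on the overlay classification of a slot. The sketch below assumes the slot has already been classified; all type and function names except the TpraosOverlaySlotNotActive error are hypothetical, and the actual cryptographic checks are represented only by labels.

```rust
/// Overlay classification of a TPraos slot (assumed to be computed elsewhere).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OverlaySlot {
    Active,
    ReservedNonActive,
    NotOverlay,
}

/// Which family of leader checks applies to a header in that slot.
#[derive(Debug, PartialEq, Eq)]
enum HeaderChecks {
    /// Genesis delegate cold key, delegate VRF key, leader and nonce VRF proofs;
    /// no pool stake threshold.
    GenesisDelegateVrf,
    /// Ordinary pool stake distribution threshold validation.
    PoolStakeThreshold,
}

#[derive(Debug, PartialEq, Eq)]
enum OverlayError {
    TpraosOverlaySlotNotActive,
}

/// Select the checks for a slot, failing closed on reserved non-active slots.
fn checks_for_slot(kind: OverlaySlot) -> Result<HeaderChecks, OverlayError> {
    match kind {
        OverlaySlot::Active => Ok(HeaderChecks::GenesisDelegateVrf),
        OverlaySlot::ReservedNonActive => Err(OverlayError::TpraosOverlaySlotNotActive),
        OverlaySlot::NotOverlay => Ok(HeaderChecks::PoolStakeThreshold),
    }
}
```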
Focused verification:
cargo test -p yggdrasil-consensus praos::tests --lib
cargo test -p yggdrasil-consensus epoch::tests --lib
cargo test -p yggdrasil-node tpraos_overlay_schedule --lib
cargo test -p yggdrasil-node sync:: --lib
cargo build -p yggdrasil-node --release
Operational verification used a copied preview database from before the
failure and the existing preview producer config. The first run passed
former blocker slot 106220, progressed into Babbage, and stopped only
after operator SIGTERM at slot 412896. A continuation run resumed that
same database, crossed the former 730728 MalformedReferenceScripts
region and the former 840719 ValidationTagMismatch region, and
reached slot 868687. Log scans found no VRF verification failed,
MalformedReferenceScripts, ValidationTagMismatch, ledger decode
error, or panic. Attempts to continue beyond 868687 hit repeated
preview peer mux closures during reconnect/intersection; that is
recorded as an operational peer availability issue, not a validation
failure.
3. Sync robustness — Phase B verified
R199 verified Phase B (R91 multi-peer storage livelock) is fully resolved:
Setup: --max-concurrent-block-fetch-peers 4 for 2 minutes
Result: 22 K blocks synced (slot 21960 reached)
667 immutable files written
volatile=963 KB, immutable=1.5 MB, ledger=22 KB
Restart: recovered ledger state from coordinated storage
checkpointSlot=21960
replayedVolatileBlocks=100
tip resumes from slot 23960 → advances to 25940
Multi-peer dispatch correctly persists to all three storage tiers and recovery from checkpoint resumes sync without re-fetching from origin. R91 was likely closed by R196’s checkpoint persistence wiring in the chaindb apply path; R199 explicitly verified the resolution.
4. Observability — Phase C.1 baseline (+R217 + R218 sync-rate quantification)
R200 added the yggdrasil_apply_batch_duration_seconds Prometheus
histogram (10 cumulative buckets [0.001, 0.005, 0.01, 0.05, 0.1,
0.5, 1.0, 5.0, 10.0, +Inf] + _sum + _count):
$ curl -s http://127.0.0.1:12400/metrics | grep apply_batch
yggdrasil_apply_batch_duration_seconds_bucket{le="0.5"} 2
yggdrasil_apply_batch_duration_seconds_bucket{le="1"} 2
yggdrasil_apply_batch_duration_seconds_bucket{le="+Inf"} 2
yggdrasil_apply_batch_duration_seconds_sum 0.412206
yggdrasil_apply_batch_duration_seconds_count 2
Preview baseline: ~206 ms/batch.
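To show how cumulative buckets, _sum, and _count relate to the ~206 ms figure, here is a small self-contained histogram sketch. It is not the node's metrics code: DurationHistogram is a hypothetical type, and the two observations in the usage note are invented values chosen only so that they add up to the reported sum.

```rust
/// Finite upper bounds in seconds; the implicit last bucket is +Inf.
const BOUNDS: [f64; 9] = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0];

#[derive(Debug, Default)]
struct DurationHistogram {
    /// Cumulative counts: buckets[i] counts observations <= BOUNDS[i];
    /// the final entry is the +Inf bucket.
    buckets: [u64; 10],
    sum: f64,
    count: u64,
}

impl DurationHistogram {
    /// Record one duration, incrementing every bucket whose bound covers it.
    fn observe(&mut self, seconds: f64) {
        for (bucket, bound) in self.buckets.iter_mut().zip(BOUNDS.iter()) {
            if seconds <= *bound {
                *bucket += 1;
            }
        }
        self.buckets[BOUNDS.len()] += 1; // +Inf always matches
        self.sum += seconds;
        self.count += 1;
    }

    /// Mean duration, i.e. _sum / _count.
    fn mean(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

Observing two hypothetical batches of 0.2 s and 0.212206 s leaves the le="0.1" bucket at 0, the le="0.5" bucket at 2, and a mean of about 0.206 s, which is how the preview baseline is derived from _sum and _count.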
R217 added the companion yggdrasil_fetch_batch_duration_seconds
histogram (same bucket boundaries; covers ChainSync RequestNext +
BlockFetch RequestRange round-trip + body-hash + KES verification).
Mainnet baseline (60 s, 4 batches, single-peer):
yggdrasil_fetch_batch_duration_seconds_sum 51.384605
yggdrasil_fetch_batch_duration_seconds_count 4
yggdrasil_apply_batch_duration_seconds_sum 0.871671
yggdrasil_apply_batch_duration_seconds_count 4
→ fetch avg = 12.85 s/batch, apply avg = 0.22 s/batch. Fetch is ~59× more expensive than apply on mainnet — pipelined fetch+apply (Phase C.2) saves at most ~1.7% throughput.
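The averages and the ~1.7% bound follow directly from the four counter values above. The sketch below reproduces the arithmetic; it assumes the bound is read as "perfect overlap hides the apply time entirely", so the saving per batch is apply / (fetch + apply). The function names are hypothetical.

```rust
/// Average seconds per batch from a histogram's _sum and _count.
fn batch_average(sum_seconds: f64, batches: u64) -> f64 {
    sum_seconds / batches as f64
}

/// Upper bound on the fraction of wall time saved by overlapping apply with
/// fetch, assuming fetch dominates and apply is hidden completely.
fn pipelining_saving_upper_bound(fetch_avg: f64, apply_avg: f64) -> f64 {
    apply_avg / (fetch_avg + apply_avg)
}
```

With the R217 values, batch_average(51.384605, 4) is about 12.85 s and batch_average(0.871671, 4) is about 0.22 s; their ratio is roughly 59, and the saving bound evaluates to roughly 0.0167, i.e. about 1.7%.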
R218 operationally validated multi-peer dispatch as the actual
sync-rate lever. --max-concurrent-block-fetch-peers 4 on mainnet
(2 active warm peers) produces:
| Configuration | fetch avg / batch | apply avg / batch | throughput |
|---|---|---|---|
| Single-peer (R217) | 12.85 s | 0.22 s | 3.33 blk/s |
| Multi-peer, 2 workers (R218) | 8.56 s | 0.23 s | 5.55 blk/s |
| Δ | −33% | flat (noise) | +67% |
Apply is unchanged — confirms multi-peer dispatch isolates fetch
parallelism without touching the apply path. Each additional warm
peer that registers as a worker subtracts ≈ (fetch_avg / N) from
the per-batch fetch time, so operators can recover sync rate by
adding topology peers.
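The Δ row of the table above is a plain relative change between the single-peer and multi-peer measurements. A short sketch of that calculation, with a hypothetical helper name:

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

Applied to the table, percent_change(12.85, 8.56) rounds to -33 for fetch time and percent_change(3.33, 5.55) rounds to 67 for throughput.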
4b. Phase D.2 — Multi-session peer accounting (5 lifetime counters)
R222 + R223 + R224 + R226 deliver the major Phase D.2 scope: a
parallel-tracking shadow data structure (PeerLifetimeStats) on
GovernorState that accumulates monotonically across reconnects,
distinct from the session-keyed failures map (which decays /
resets on record_success). Five Prometheus counters / gauges
expose the aggregate state:
| Metric | Type | Source | Round |
|---|---|---|---|
| peer_lifetime_sessions_total | counter | promote_to_warm Ok branch | R223 |
| peer_lifetime_failures_total | counter | promote_to_warm Err branch | R223 |
| peer_lifetime_bytes_in_total | counter | BlockFetch bytes_delivered mirror | R224 |
| peer_lifetime_unique_peers | gauge | lifetime_stats map cardinality | R226 |
| peer_lifetime_handshakes_total | counter | successful_handshakes aggregate | R226 |
Mainnet operator-derived signals (verified R226, 60 s knob=4):
yggdrasil_peer_lifetime_sessions_total 2
yggdrasil_peer_lifetime_failures_total 0
yggdrasil_peer_lifetime_bytes_in_total 1548246
yggdrasil_peer_lifetime_unique_peers 3
yggdrasil_peer_lifetime_handshakes_total 2
Operators compute:
# Reliability ratio
yggdrasil_peer_lifetime_failures_total
/ yggdrasil_peer_lifetime_sessions_total
# Avg bytes per session
yggdrasil_peer_lifetime_bytes_in_total
/ yggdrasil_peer_lifetime_sessions_total
# Registry-leakage indicator (peers tracked but never promoted)
1 - (yggdrasil_peer_lifetime_sessions_total
/ yggdrasil_peer_lifetime_unique_peers)
# Real peer churn rate (cumulative reconnects)
rate(yggdrasil_peer_lifetime_sessions_total[5m])
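The first three expressions above are simple ratios over the exported totals. The sketch below evaluates them for one scrape; PeerLifetimeTotals is a hypothetical holder for the scraped values, not the PeerLifetimeStats structure inside the node, and the rate() expression is omitted because it needs two samples over time.

```rust
/// Values scraped from the peer_lifetime_* metrics (hypothetical holder type).
#[derive(Clone, Copy, Debug)]
struct PeerLifetimeTotals {
    sessions: u64,
    failures: u64,
    bytes_in: u64,
    unique_peers: u64,
}

/// Divide two counters, returning `None` when the denominator is zero.
fn ratio(numerator: u64, denominator: u64) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(numerator as f64 / denominator as f64)
    }
}

impl PeerLifetimeTotals {
    /// failures_total / sessions_total (the "reliability ratio" expression).
    fn failure_ratio(&self) -> Option<f64> {
        ratio(self.failures, self.sessions)
    }

    /// bytes_in_total / sessions_total.
    fn avg_bytes_per_session(&self) -> Option<f64> {
        ratio(self.bytes_in, self.sessions)
    }

    /// 1 - sessions_total / unique_peers (registry-leakage indicator).
    fn registry_leakage(&self) -> Option<f64> {
        ratio(self.sessions, self.unique_peers).map(|r| 1.0 - r)
    }
}
```

For the R226 mainnet sample (2 sessions, 0 failures, 1548246 bytes in, 3 unique peers) these give a failure ratio of 0, 774123 bytes per session, and a leakage indicator of about 0.33.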
R234, R235, and R237 complete the aggregate server egress path: BlockFetch, ChainSync, KeepAlive, TxSubmission2, and PeerSharing bytes-out counters are recorded without high-cardinality Prometheus labels, and per-peer egress totals are folded into lifetime stats.
4c. Phase D.1 — Rollback recovery and sidecars (R225+R237+R238)
R225 adds the yggdrasil_rollback_depth_blocks Prometheus histogram,
which classifies actual rollback depths. Bucket boundaries
[1, 2, 5, 50, 2160 (k), 10_000, +Inf] span shallow chain
reorgs through the stability window edge to cross-epoch and
full-resync shapes.
Operators alert on rare deep rollbacks via:
histogram_quantile(0.99,
rate(yggdrasil_rollback_depth_blocks_bucket[1h]))
R237 adds epoch-boundary-aware checkpoint replay when stake
snapshots are enabled. R238 completes the code-level nonce/OpCert
sidecar hardening: storage keeps opaque slot-indexed
chain_dep_state/<slot-hex>.cbor bundles, verified sync writes
them only at ledger-checkpoint cadence after nonce/OpCert updates,
RollBackward terminates the current batch, recovery restores the
newest bundle at or before the rollback point, verifies the bundled
point against the selected chain prefix, and replays stored raw
blocks to the rollback target. LSQ protocol-state prefers exact
point sidecars and does not read nonce/OpCert latest mirrors.
5. Upstream alignment — Phase E.1 closed
R201 advanced the first documentary upstream pins to live HEAD, R216
refreshed the pins that drifted again, R239 completed the coordinated
cardano-base fixture refresh, R243 refreshed the import-only
cardano-ledger drift from upstream PR #5787, and R245 refreshed the
BBODY/GOV cardano-ledger drift through b90b97488da3…. All six canonical
IntersectMBO pins now match live HEAD and cardano-base still keeps
the test-vector directory name, crypto test constant, and node pin in
lockstep:
| Repository | Pinned audit baseline | Status |
|---|---|---|
| cardano-base | 7a8a991945d4… (R239 fixture refresh) | in-sync |
| cardano-ledger | b90b97488da3… (R245 BBODY/GOV refresh) | in-sync |
| ouroboros-consensus | b047aca4a731… (R216 advance) | in-sync |
| ouroboros-network | 0e84bced45c7… | in-sync |
| plutus | 4cd40a14e364… (R216 advance) | in-sync |
| cardano-node | 799325937a45… | in-sync |
Drift detector (bash crates/node/yggdrasil-node/scripts/check_upstream_drift.sh) reports
drifted=0 unreachable=0 total=6. Three drift-guard tests pass
(format, cardinality, vendored-directory match). R201 → R216 → R239 → R243 → R245
demonstrates the audit baseline is actively maintained against
upstream while preserving SHA-anchored vendored fixture provenance.
6. Cumulative phase status
| Phase | Item | Status | Round(s) |
|---|---|---|---|
| A.1 | ChainDepStateContext infrastructure | ✅ closed | R192 |
| A.2 | Live PraosState (OCert + nonces) | ✅ closed | R196+R197+R198 |
| A.3 | Live GovRelation + gov-state OMap shape | ✅ closed | R193+R204 |
| A.4 | Live DRep/SPO stake + deleg deposits | ✅ closed | R194 |
| A.5 | Live ledger-peer-snapshot pools | ✅ closed | R195 |
| A.6 | GetGenesisConfig ShelleyGenesis serialiser | ✅ closed | R214 |
| A.7 | Live stake-snapshots | ✅ closed | R202+R203 |
| B | R91 multi-peer dispatch livelock | ✅ verified resolved | R199 |
| B (mainnet) | Mainnet sync unblocked (Byron EBB hash + same-slot tolerance + mux egress) | ✅ closed | R211+R213 |
| B (P2P) | Bidirectional P2P parity (server ChainSync Tip envelope) | ✅ closed | R220+R221 |
| C.1 | Apply-batch duration histogram | ✅ wired | R200 |
| C.1+ | Fetch-batch duration histogram + multi-peer quantification | ✅ wired | R217+R218 |
| C.2 | Pipelined fetch+apply | 🚫 de-prioritised | R217 measurement showed ~1.7% gain — multi-peer dispatch is the actual sync-rate lever |
| D.1 | Deep rollback recovery and chain-dep sidecars | ✅ closed code-level slice | R225+R237+R238 |
| D.2 | Multi-session peer accounting + aggregate bytes-out | ✅ shipped | R222+R223+R224+R226+R234+R235+R237 |
| E.1 | Audit baseline pin refresh + cardano-base fixture refresh | ✅ closed, 6/6 pins in-sync | R201+R216+R239+R243+R245 |
| E.2a | Parallel BlockFetch soak automation | ✅ harness shipped | R240 |
| Config integrity | Byron + Shelley-family genesis hash verification | ✅ closed | R244 |
| Ledger drift | Conway BBODY testnet HeaderProtVerTooHigh grace | ✅ closed | R245 |
| Plutus replay | Preview reference-script well-formedness/runtime parity | ✅ closed | R246 |
| Preview prefix fetch | Origin-start verified sync preserves first ChainSync-announced BlockFetch prefix | ✅ closed | R247 |
| TPraos overlay VRF | Active overlay slots use genesis-delegate proof checks and skip pool stake threshold | ✅ closed | R248 |
| E.2 | Mainnet rehearsal (24h+) | ⏳ deferred | (long-running observation) |
| E.3 | Parity proof report | ✅ this document (R206) | — |
The remaining gates are no longer known code-level parity blockers.
They require sustained operator time: the 24h+ mainnet rehearsal and
the runbook §6.5 sign-off before changing the default
max_concurrent_block_fetch_peers.
7. Remaining gates
Phase E.2 — Mainnet rehearsal
24+ hour continuous mainnet sync from genesis with metrics capture. Validates the already-working R211/R213 mainnet path at operator duration, including restart cycles, hash comparison against the Haskell node, and R238 rollback sidecar behavior under real chain conditions.
Parallel BlockFetch default flip
Graduated at R258 (2026-05-06). max_concurrent_block_fetch_peers
now defaults to 2, matching upstream
Ouroboros.Network.BlockFetch.Decision::bfcMaxConcurrencyBulkSync = 2.
R218 (docs/operational-runs/archive/2026-04-30-round-218-mainnet-multipeer-fetch-rate.md)
measured the multi-peer path on mainnet at knob=4: 67% throughput
delta, 2 active workers (saturating the BulkSync cap). R258
graduates this empirical evidence into the shipped default while
preserving operator override (MaxConcurrentBlockFetchPeers = 1 for
strict single-peer audit/replay parity).
R240 (crates/node/yggdrasil-node/scripts/parallel_blockfetch_soak.sh) remains the
operator-facing soak harness for stress-testing knob > 2 or running
endurance verification: starts the node with the requested
concurrency knob, captures Prometheus snapshots, asserts worker
migration metrics, optionally runs Haskell tip comparison, and
preserves a summary under $LOG_DIR/summary.txt.
8. Verification commands
To reproduce R205’s comprehensive verification:
# Build
cargo build --release -p yggdrasil-node
# Boot
rm -rf /tmp/ygg-verify-db /tmp/ygg-verify.sock
YGG_LSQ_ERA_FLOOR=6 target/release/yggdrasil-node run \
--network preview \
--database-path /tmp/ygg-verify-db \
--socket-path /tmp/ygg-verify.sock \
--metrics-port 12400 &
sleep 30 # let sync establish
# Sweep all 25 cardano-cli subcommands
export CARDANO_NODE_SOCKET_PATH=/tmp/ygg-verify.sock
cardano-cli query tip --testnet-magic 2
cardano-cli conway query protocol-state --testnet-magic 2
cardano-cli conway query gov-state --testnet-magic 2
# ... (full list in R205 operational-run doc)
# Verify sidecars persist
find /tmp/ygg-verify-db -maxdepth 2 -name '*.cbor' -print
# Expected: stake_snapshots.cbor plus chain_dep_state/<slot-hex>.cbor
# Apply-batch histogram
curl -s http://127.0.0.1:12400/metrics | grep apply_batch
# Expected: 10 bucket lines + _sum + _count
# Drift detector
bash crates/node/yggdrasil-node/scripts/check_upstream_drift.sh
# Expected: drifted=0 unreachable=0 total=6
# Workspace gates
cargo fmt --all -- --check
cargo lint
cargo test-all
8b. Mainnet boot smoke test (R208, 2026-04-30)
This section is retained as historical diagnostic evidence. The failure was resolved by R211/R213 and then verified through the cardano-cli wire stack in R212.
R208 ran a 2-minute mainnet boot smoke test to validate the
--network mainnet codepath:
$ /workspaces/Cardano-node/target/release/yggdrasil-node run \
--network mainnet \
--database-path /tmp/ygg-r208-mainnet-db \
--socket-path /tmp/ygg-r208-mainnet.sock \
--metrics-port 12408 &
# After 2 minutes:
$ cardano-cli query tip --mainnet
{
"epoch": 0,
"era": "Byron",
"slotInEpoch": 0,
"slotsToEpochEnd": 21600,
"syncProgress": "0.00"
}
# Storage:
volatile/ (0 bytes — no blocks fetched)
ledger/ (no checkpoints persisted)
immutable/ (empty)
# Log shows repeated "ledger checkpoints cleared at origin"
# and verified-sync session establishes but doesn't advance.
Historical result: yggdrasil’s --network mainnet flag was
recognised and booted cleanly, but block fetch/apply did not
advance past Origin in this 2-minute R208 window.
Resolution: R211 fixed the Byron EBB hash prefix and same-slot
consensus tolerance, R213 fixed the mux egress limit for large
single LSQ payloads, and R212 verified mainnet cardano-cli
queries against an actively syncing node. Phase E.2 now means
long-duration operator rehearsal, not diagnosing this R208 stall.
8e. Mainnet operational verification with cardano-cli (R212, 2026-04-30)
R212 validates R211’s mainnet sync fix through the full LSQ wire stack — cardano-cli queries decode end-to-end against an actively syncing mainnet yggdrasil node.
Setup:
$ ./target/release/yggdrasil-node run \
--network mainnet \
--database-path /tmp/ygg-r212-mainnet-db \
--socket-path /tmp/ygg-r212-mainnet.sock \
--peer 3.135.125.51:3001 \
--metrics-port 12412 &
$ sleep 45
Sync after 45 s:
volatile/ 1 455 234 bytes
ledger/ 1 363 702 bytes
checkpoint persisted slot=47 retainedSnapshots=1
checkpoint skipped slot=97 / 147 (interval=2160)
cardano-cli query results (all against the active mainnet sync):
| Query | Result |
|---|---|
| query tip | {block: 197 → 397, era: "Shelley", hash: cf29…/a15b…} |
| query era-history | indef-length 2-era CBOR summary (Byron + Shelley) |
| query slot-number 2024-06-01T00:00:00Z | 125712000 |
| query protocol-parameters | 17-element Shelley shape, full PP JSON |
| query tx-mempool info | {capacity: 0, count: 0, size: 0, slot: 397} |
Sidecars (post-test mainnet <storage_dir>/, historical R212
pre-R238 filenames):
nonce_state.cbor 12 B
ocert_counters.cbor 1 B
stake_snapshots.cbor 14 B
These sidecars are smaller than on the testnets because mainnet at slot 397 is pre-Shelley (post-Byron consensus state is mostly empty, the same shape as the pre-Shelley testnet behaviour observed in R207).
Post-R238 runs should instead show ChainDepState point bundles under
<storage_dir>/chain_dep_state/ plus the separate
stake_snapshots.cbor mirror:
$ find <storage_dir>/chain_dep_state -type f -name '*.cbor' | sort
<storage_dir>/chain_dep_state/000000000000002f.cbor
...
Multi-network parity matrix (closed by R212):
| Network | Operational verification | LSQ subcommands | Sidecars | Round |
|---|---|---|---|---|
| Preview | ✅ (Conway era) | 25/25 with YGG_LSQ_ERA_FLOOR=6 | ✅ | R205 |
| Preprod | ✅ (Allegra era) | 6/6 baseline | ✅ | R207 |
| Mainnet | ✅ (Byron at slot 397) | 5/5 baseline (utxo TBD) | ✅ | R212 |
Known limitation (closed in R213, 2026-04-30): query utxo
--whole-utxo --mainnet initially failed with BearerClosed. Root
cause was a 10-line semantic miscoding in the mux back-pressure
check (current + len > limit rejected single large payloads even
with empty buffer; should be current > limit per upstream
network-mux::egressSoftBufferLimit). After R213’s fix the query
returns the full mainnet AVVM bootstrap UTxO: 14 505 entries
totaling 31 112 484 745 ADA — matching byron-genesis.json::avvmDistr
exactly. See R213 in docs/operational-runs/.
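The difference between the two back-pressure predicates can be sketched as follows. This is an illustration of the comparison described above, not the actual mux code; the function names and the numeric limit used in the usage note are hypothetical:

```rust
/// Pre-R213 predicate as described in the text: the incoming payload length
/// is added to the queued bytes before comparing against the limit, so one
/// payload larger than the limit is rejected even when nothing is queued.
fn rejects_before_fix(current: usize, len: usize, limit: usize) -> bool {
    current + len > limit
}

/// Post-R213 predicate as described in the text: only the bytes already
/// queued are compared against the soft limit; `_len` is kept in the
/// signature purely to make the two variants easy to compare.
fn rejects_after_fix(current: usize, _len: usize, limit: usize) -> bool {
    current > limit
}
```

With an empty buffer, a hypothetical limit of 1024 bytes and a 4096-byte payload, the first predicate rejects and the second accepts, which matches the whole-UTxO failure mode described above.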
8d. Mainnet sync unblocked — Byron EBB hash + same-slot tolerance (R211, 2026-04-30)
R211 closed the Phase E.2 critical path with a two-bug cascade fix:
Bug 1 — wrong hash prefix for Byron EBB headers. yggdrasil’s
node/src/sync.rs::point_from_raw_header
helper used byron_main_header_hash (prefix [0x82, 0x01]) for
EBB-shape headers. Byron EBBs require [0x82, 0x00] per
Cardano.Chain.Block.Header.boundaryHeaderHashAnnotated. Wrong
prefix → wrong hash → upstream BlockFetch can’t resolve the
upper-bound point → IOG peer closes mux mid-request.
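A minimal sketch of the prefix selection, using the two byte pairs quoted above. The enum and function names are hypothetical, and the digest step itself is omitted; only the construction of the bytes to be hashed is shown:

```rust
/// Hypothetical classification of a Byron header for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ByronHeaderKind {
    /// Epoch boundary block (EBB).
    Boundary,
    /// Regular main block.
    Main,
}

/// Two-byte prefix placed in front of the raw header before hashing:
/// 0x82 is a CBOR array(2) head, the second byte tags the header kind.
fn byron_hash_prefix(kind: ByronHeaderKind) -> [u8; 2] {
    match kind {
        ByronHeaderKind::Boundary => [0x82, 0x00],
        ByronHeaderKind::Main => [0x82, 0x01],
    }
}

/// Bytes that would be fed to the hash function (hashing not shown).
fn byron_header_hash_preimage(kind: ByronHeaderKind, raw_header: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + raw_header.len());
    out.extend_from_slice(&byron_hash_prefix(kind));
    out.extend_from_slice(raw_header);
    out
}
```

Because the prefix is part of the hashed bytes, picking the main-block prefix for an EBB changes the resulting hash even though the header bytes are identical.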
Bug 2 — strict slot-monotonicity rejects Byron EBB→main_block at
same slot. Consensus ChainState::roll_forward rejected the
legitimate Byron transition where the genesis EBB at slot 0 is
followed by the first main block of epoch 0 also at slot 0 (Byron
EBBs are virtual epoch-boundary markers). The ledger-side check
already had a Byron exemption; the consensus-side check was missing it.
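The strict and relaxed slot checks can be contrasted in a small sketch. It is not the real ChainState::roll_forward; the error type and function names are hypothetical, and the block-number contiguity check is assumed to run separately:

```rust
/// Hypothetical error type for this sketch; not the real consensus error.
#[derive(Debug, PartialEq, Eq)]
struct SlotRegression {
    tip_slot: u64,
    new_slot: u64,
}

/// Strict monotonicity: a block at the same slot as the tip is rejected.
/// `None` stands for an empty chain (tip at Origin).
fn check_slot_strict(tip_slot: Option<u64>, new_slot: u64) -> Result<(), SlotRegression> {
    match tip_slot {
        Some(tip) if new_slot <= tip => Err(SlotRegression { tip_slot: tip, new_slot }),
        _ => Ok(()),
    }
}

/// Relaxed check: only a slot strictly before the tip is rejected, so the
/// Byron EBB at slot 0 may be followed by a main block at slot 0.
fn check_slot_relaxed(tip_slot: Option<u64>, new_slot: u64) -> Result<(), SlotRegression> {
    match tip_slot {
        Some(tip) if new_slot < tip => Err(SlotRegression { tip_slot: tip, new_slot }),
        _ => Ok(()),
    }
}
```

For the genesis case, check_slot_strict(Some(0), 0) returns an error while check_slot_relaxed(Some(0), 0) returns Ok(()).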
Verification — mainnet now syncs:
$ rm -rf /tmp/ygg-r211e-mainnet-db
$ YGG_SYNC_DEBUG=1 timeout 60 ./target/release/yggdrasil-node run \
--network mainnet \
--database-path /tmp/ygg-r211e-mainnet-db \
--peer 3.135.125.51:3001 \
--max-concurrent-block-fetch-peers 1
[YGG_SYNC_DEBUG] shared applied
stable_block_count=0 epoch_events=0 rolled_back_tx_ids=0
tracking.tip=BlockPoint(SlotNo(197), HeaderHash(cf298afbb9eae55d…))
volatile/ 1 532 832 bytes ← non-zero
ledger/ 1 363 702 bytes ← checkpoint snapshots accumulating
Comparison R210 → R211:
| Signal | R210 | R211e |
|---|---|---|
| [YGG_SYNC_DEBUG] applied | 0 | 6 |
| volatile/ size | 0 B | 1.5 MB |
| ledger/ size | 0 B | 1.4 MB |
| Final tip | Origin | slot 197 |
| cleared-origin recoveries | 12 | 0 |
Code changes (4 files):
- node/src/sync.rs: new byron_ebb_header_hash helper; decode_point_from_byron_raw_header returns Some(Point) for EBBs (slot from inner epoch * BYRON_SLOTS_PER_EPOCH, hash via EBB prefix).
- crates/consensus/src/chain_state.rs: slot check relaxed from <= to <. The block-number contiguity check above it catches re-application; Praos guarantees ≤ 1 block/slot post-Byron.
- node/src/runtime.rs: R210’s YGG_SYNC_DEBUG=1 trace mirrored to the shared-chaindb apply call site (the production NtN+NtC path).
- Test updates: roll_forward_accepts_same_slot_byron_ebb_main_pair, point_from_raw_header_decodes_observed_byron_serialised_header_envelope updated to expect EBB hash + slot=0 from the inner header (the original test pinned the wrong slot 83 from the outer envelope + main hash, masking the bug for ~200 rounds).
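The EBB slot derivation mentioned in the sync.rs change (inner epoch times BYRON_SLOTS_PER_EPOCH) can be sketched as below. The constant name comes from the text; its value of 21 600 is an assumption based on the mainnet Byron epoch length, and the function name ebb_slot is hypothetical:

```rust
/// Assumption: mainnet Byron epoch length in slots (10 * k with k = 2160).
/// The real constant lives in the codebase and may be defined differently.
const BYRON_SLOTS_PER_EPOCH: u64 = 21_600;

/// Slot assigned to an epoch boundary block: the first slot of its epoch.
/// Returns None on arithmetic overflow instead of wrapping.
fn ebb_slot(epoch: u64) -> Option<u64> {
    epoch.checked_mul(BYRON_SLOTS_PER_EPOCH)
}
```

Under this assumption the genesis EBB (epoch 0) lands on slot 0, which is the value the updated test expects, rather than the slot 83 taken from the outer envelope.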
Strategic significance: yggdrasil now syncs mainnet end-to-end (subject to long-running stability + performance, separately tracked). The two-step diagnosis (R210 narrows to BlockFetch wire layer → R211 source-level diff identifies the encoding bug) is the canonical pattern for operational-parity work.
8c. Mainnet stall narrowed to BlockFetch wire layer (R210, 2026-04-30)
R210 added an opt-in YGG_SYNC_DEBUG=1 apply-side trace at the
apply_verified_progress_to_chaindb call site in
node/src/runtime.rs (~line 5008) to
answer R208’s open question: is the stall at BlockFetch (zero
blocks fetched per batch) or at apply (blocks fetched but
silently rejected)?
90 s mainnet run findings:
$ YGG_SYNC_DEBUG=1 timeout 90 ./target/release/yggdrasil-node run \
--network mainnet \
--database-path /tmp/ygg-r210-mainnet-db \
--peer 3.135.125.51:3001 \
--max-concurrent-block-fetch-peers 1
| Signal | Count |
| ------------------------------------------ | ------- |
| [YGG_SYNC_DEBUG] apply_verified_progress | 0 |
| [ygg-sync-debug] blockfetch-range | 634 |
| [ygg-sync-debug] demux-exit | 2 |
| Node.Recovery.Checkpoint cleared-origin | 12 |
| volatile/, immutable/, ledger/ | 0 B ea |
ChainSync header decodes cleanly (header_point_decoded=true
raw_header_len=94) for the first Byron-era range
Origin → SlotNo(648087), but the IOG backbone peer closes the
mux during the BlockFetch request, so apply_verified_progress
is never invoked and no checkpoint, sidecar, volatile, or
immutable file lands.
Conclusion: the R208 mainnet sync gap is at the BlockFetch wire layer, NOT at apply / ledger / storage. Every R208 hypothesis pointing at apply-path silent rejection or storage hand-off is now ruled out.
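A signal table like the one above can be produced by tallying marker substrings over the captured log. A minimal sketch, assuming the log is available as one string; the function name count_signals is hypothetical and this is not the tooling actually used for the R210 run:

```rust
use std::collections::BTreeMap;

/// Counts, for each marker, the number of log lines that contain it.
/// Every marker is pre-seeded with 0 so that absent signals (such as the
/// apply trace in the R210 run) still show up in the result.
fn count_signals<'a>(log: &str, markers: &[&'a str]) -> BTreeMap<&'a str, usize> {
    let mut counts: BTreeMap<&'a str, usize> =
        markers.iter().map(|m| (*m, 0)).collect();
    for line in log.lines() {
        for marker in markers {
            if line.contains(*marker) {
                *counts.entry(*marker).or_insert(0) += 1;
            }
        }
    }
    counts
}
```

A zero count for the apply marker next to a non-zero blockfetch-range count is the pattern that points at the fetch side rather than the apply side.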
Narrowed root-cause candidates:
- Byron BlockFetch MsgRequestRange CBOR shape divergence on the request side (most likely).
- NtN handshake version negotiation rejecting BlockFetch but accepting ChainSync.
- Byron EBB hash indirection upstream expects in the upper bound.
R211+ follow-up scope: capture MsgRequestRange bytes via
tcpdump/socat-relay against the same peer; run upstream
cardano-node 10.7.x for byte-comparison; fix in
crates/network/src/protocols/blockfetch_pool.rs
or the MsgRequestRange encoder.
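For the byte-comparison step, a reference encoding to diff against can be sketched by hand. The layout assumed here, a CBOR array [0, point, point] with each point encoded as [slot, 32-byte hash], is an assumption about the BlockFetch wire format and should be checked against the upstream CDDL; all function names are hypothetical:

```rust
/// Appends a CBOR unsigned integer (major type 0) in its shortest form.
fn cbor_uint(n: u64, out: &mut Vec<u8>) {
    if n < 24 {
        out.push(n as u8);
    } else if n <= 0xff {
        out.push(0x18);
        out.push(n as u8);
    } else if n <= 0xffff {
        out.push(0x19);
        out.extend_from_slice(&(n as u16).to_be_bytes());
    } else if n <= 0xffff_ffff {
        out.push(0x1a);
        out.extend_from_slice(&(n as u32).to_be_bytes());
    } else {
        out.push(0x1b);
        out.extend_from_slice(&n.to_be_bytes());
    }
}

/// Assumed point layout: array(2) of slot number and 32-byte header hash.
fn encode_point(slot: u64, hash: &[u8; 32], out: &mut Vec<u8>) {
    out.push(0x82); // array(2)
    cbor_uint(slot, out);
    out.extend_from_slice(&[0x58, 0x20]); // byte string, length 32
    out.extend_from_slice(hash);
}

/// Assumed message layout: array(3) of tag 0, lower point, upper point.
fn encode_msg_request_range(from: (u64, [u8; 32]), to: (u64, [u8; 32])) -> Vec<u8> {
    let mut out = vec![0x83, 0x00]; // array(3), message tag 0
    encode_point(from.0, &from.1, &mut out);
    encode_point(to.0, &to.1, &mut out);
    out
}

/// Index of the first differing byte between two captures, if any.
/// A pure length mismatch reports the length of the shorter capture.
fn first_divergence(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}
```

Running first_divergence over the bytes captured from each node gives the offset at which the two requests part ways, which narrows the search to one field of the message.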
The R210 instrumentation is permanent in the runtime, env-gated, zero-overhead when unset, and ready for use during the wire-byte diagnosis follow-up.
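An env-gated trace of this kind is commonly built by reading the variable once and caching the result. The sketch below illustrates that pattern under stated assumptions; it is not the code in runtime.rs, the function names are hypothetical, and treating only the exact value "1" as enabled is an assumption:

```rust
use std::sync::OnceLock;

/// Interprets the raw environment value. Assumption: only "1" enables
/// the trace; an unset variable or any other value disables it.
fn parse_debug_flag(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Reads YGG_SYNC_DEBUG once and caches the answer, so later calls cost
/// a single cached boolean read instead of an environment lookup.
fn sync_debug_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        parse_debug_flag(std::env::var("YGG_SYNC_DEBUG").ok().as_deref())
    })
}

/// Hypothetical trace call site: formats and prints only when enabled.
fn trace_apply(stable_block_count: u64, tip_slot: u64) {
    if sync_debug_enabled() {
        eprintln!(
            "[YGG_SYNC_DEBUG] applied stable_block_count={stable_block_count} tip_slot={tip_slot}"
        );
    }
}
```

Keeping the formatting inside the guarded branch means the disabled path does no string work, which is what makes the cost negligible when the variable is unset.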
8a. Multi-network verification (R207, 2026-04-30)
R207 verified the same gates work on preprod (Shelley-era):
$ /workspaces/Cardano-node/target/release/yggdrasil-node run \
--network preprod \
--database-path /tmp/ygg-r207-preprod-db \
--socket-path /tmp/ygg-r207-preprod.sock \
--metrics-port 12407 &
$ sleep 35
$ ls -la /tmp/ygg-r207-preprod-db/*.cbor
114 nonce_state.cbor
1 ocert_counters.cbor
18 stake_snapshots.cbor
# Historical pre-R238 mirror files; current runs use chain_dep_state/*.cbor
# for nonce/OpCert and stake_snapshots.cbor for stake-query snapshots.
$ cardano-cli query tip --testnet-magic 1
{
"block": 87440,
"epoch": 4,
"era": "Allegra",
"slot": 87440,
"syncProgress": "1.40"
}
# 87K blocks synced in 35s
# Sweep baseline cardano-cli queries:
OK: tip / protocol-parameters / era-history / slot-number /
utxo --whole-utxo / tx-mempool info
=== preprod: pass=6 fail=0 ===
R190 already verified the full era-gated + Conway suite on preview;
R207 confirms the always-available baseline subcommands work on
preprod (Shelley-era chain) without requiring YGG_LSQ_ERA_FLOOR.
Both networks demonstrate consistent yggdrasil parity end-to-end.
9. References
- Plan: /home/vscode/.claude/plans/clever-shimmying-quokka.md
- Operational runs: docs/operational-runs/
- Cumulative parity matrix: docs/UPSTREAM_PARITY.md
- Per-round summaries: docs/PARITY_SUMMARY.md
- Roadmap: docs/PARITY_PLAN.md
- Workspace journal: AGENTS.md
End of parity proof report (R206 / 2026-04-30).