Troubleshooting

A catalogue of common error messages and their resolutions. If your issue isn’t here, file a GitHub issue with the trace output.

Startup failures

`HashMismatch` for a genesis file

GenesisLoadError::HashMismatch {
    path: ".../shelley-genesis.json",
    expected: "1a3be38bcbb7911969283716ad7aa550250226b76a61fc51cc9a9a35d9276d81",
    actual:   "8d9f47..."
}

Cause: The genesis file does not match the expected Blake2b-256 hash from the config.

Resolution:

Verify you have not edited any genesis JSON files.
Re-clone the repository to get a fresh copy of the vendored configs.
If you are using a custom --config, confirm it points to the correct genesis files for the chosen network.

This check is intentional — a wrong genesis file silently corrupts every subsequent ledger state.

`OpCert signature verification failed against issuer cold key`

Cause: The OpCert was issued by a different cold key than the one configured.

Resolution:

Confirm --shelley-operational-certificate-issuer-vkey points to the correct cold-key vkey.
Re-run cardano-cli node issue-op-cert with the cold key whose vkey you have configured.
Restart with the new OpCert.

`KES period <n> outside valid OpCert window`

Cause: The current chain slot is past the OpCert’s validity window.

Resolution: Rotate KES key and reissue OpCert per Maintenance — KES rotation.

`Storage uninitialized`

Symptom: First-run warning from validate-config.

Resolution: This is normal on a fresh setup. run will initialise the database on first start.

Sync issues

Sync stalls with no progress

Symptom: yggdrasil_current_slot stops advancing.

Diagnose:

$ journalctl -u yggdrasil --since "5 minutes ago" | grep -E "Net.|ChainDB|Sync"

Look for:

Repeated reconnect events → peers unhealthy or local network issue.
MsgIntersectNotFound followed by genesis replay → see “ChainSync resets to genesis” below.
No events at all → the runtime is hung. Check for thread-pool exhaustion via htop/top.

ChainSync resets to genesis after every disconnect

Symptom: After a peer disconnect, the node re-syncs from slot 0.

Cause: Pre-6e8c8f9 regression where MsgFindIntersect was not issued after reconnect, defaulting peer read-pointer to Origin.

Resolution: Upgrade to current main. Already fixed.

`keep-alive timeout`

Symptom: Outbound connections drop every ~97 seconds.

Cause: KeepAlive heartbeat not being sent. Pre-d3f1c2a codec bug where MsgKeepAliveResponse and MsgDone had swapped tags.

Resolution: Upgrade. Already fixed.

`BlockFromFuture` rejecting valid blocks

Symptom: Trace error rejecting blocks whose declared slot is in the future.

Cause: Local clock is behind real time.

Resolution: Sync system clock (chrony, ntpd, systemd-timesyncd). Verify with chronyc tracking (drift should be < 100 ms).

Connection issues

`Connection refused` to all peers

Cause: Local network blocking outbound, or all configured peers down.

Resolution:

Test connectivity manually: nc -z backbone.cardano.iog.io 3001.
Check firewall: iptables -L -n | grep DROP.
Verify topology — bootstrap peers should be IOG-operated and very rarely down.

Inbound rate limiting hits

Symptom: yggdrasil_inbound_connections_rejected increasing.

Cause: Total inbound connections at accepted_connections_limit_hard (default 512).

Resolution:

For a relay serving many peers, raise AcceptedConnectionsLimit.hardLimit and softLimit.
For a block producer, this should never happen — your firewall should reject inbound at the IP layer.

`0 outbound connections established`

Symptom: Governor reports outbound_count = 0 after several ticks.

Diagnose:

Verify config points to reachable peers.
Check the handshake namespace traces for Refuse messages: journalctl -u yggdrasil | grep Handshake.
Confirm --network-magic matches the network of the configured peers.

Ledger / consensus errors

`ProtocolVersionTooHigh` on incoming block

Cause: A peer is on a newer protocol version than MaxKnownMajorProtocolVersion.

Resolution:

If a hard fork has just landed on mainnet, bump MaxKnownMajorProtocolVersion (e.g. from 10 to 11).
Alternatively, upgrade to a newer Yggdrasil release that bumps the default.

`BlockBodyHashMismatch` from a peer

Symptom: Trace shows a peer-sent block whose body hash doesn’t match the header.

Cause: Peer is misbehaving (or there’s a wire-codec bug on Yggdrasil’s side).

Resolution:

Yggdrasil treats this as peer-attributable: the peer is demoted and reconnect attempted.
If you see this from many peers, file a bug — likely a Yggdrasil decode regression.

`OcertCounterTooOld`

Cause: A peer block carries an OpCert sequence number lower than the most recent observed for that pool.

Resolution:

Yggdrasil is enforcing OpCert monotonicity. The peer’s block is invalid, and the peer is demoted.
If your own node logs this against your pool, you have an OpCert misconfiguration (issued counter went backwards). Fix the counter file.

Block production issues

`TraceForgedInvalidBlock` (Critical)

Cause: A block your node forged failed self-validation. This should never happen in healthy operation — it indicates a bug.

Action:

Capture the trace context.
Check for any local data corruption.
File a high-priority bug with the trace output.
Until resolved, run with the operational-certificate disabled (relay-only mode) to avoid further bad forges.

`TraceSlotIsImmutable`

Cause: Forge loop ticked for a slot that is at or behind the current chain tip.

Resolution:

This is informational — the node was leader for a slot that has already been claimed.
If frequent, your node is consistently behind tip. Check sync health.

Missed leader slots

Symptom: Pool tracker shows blocks were expected but not produced.

Diagnose:

Check journalctl for TraceNodeIsLeader events at the expected slots.
If the events fired but the block didn’t make it to the chain: peer connectivity issue at production time. Adoption traces (TraceAdoptedBlock / TraceDidntAdoptBlock) tell you whether your block was adopted by the relays.
If the events did not fire: VRF check declined, which is normal — leader election is probabilistic.

`Could not load KES key`

Cause: File missing, wrong permissions, or wrong format.

Resolution:

# ls -l /var/lib/yggdrasil/keys/
# stat -c '%a %U:%G %n' /var/lib/yggdrasil/keys/kes.skey
# Should be: 400 yggdrasil:yggdrasil

If correct, verify the file is a valid text-envelope KES key (starts with {"type":"KesSigningKey_ed25519_kes_2^6", ...}).

Storage / I/O

`Disk full`

Action:

df -h — confirm.
Identify large directories: du -h -d 1 /var/lib/yggdrasil.
Reduce max_ledger_snapshots in the config and restart — older snapshots will be deleted.
Move the database to a larger disk and update --database-path. Sync resumes.

`IO error: input/output error` from the database

Cause: Hardware-level disk problem.

Action:

Stop the node immediately.
dmesg | tail and smartctl -a /dev/<your-disk> to diagnose.
If the disk is failing, replace it. Restore the most recent chain DB backup.
If no backup, resync from genesis on the new disk.

Slow sync compared to peers

Diagnose:

iostat -x 5 6 — %util should be high during sync (you’re disk-bound during initial sync, that’s expected).
top — CPU usage. If consistently 100% on one core, the validator is the bottleneck.
Compare against cardano-node Haskell on similar hardware. Yggdrasil targets parity but performance characteristics may differ.

Performance issues

High memory usage

Yggdrasil targets 6–8 GB resident during steady-state mainnet operation. If you see > 16 GB:

Check yggdrasil_mempool_bytes — if it’s unbounded, see “Mempool growing” below.
Profile with heaptrack if you have a development build.
File an issue with the resident-set-size and top output.

Mempool growing unbounded

Cause: Tx admission rate exceeds eviction rate (block-application rollovers).

Mitigation:

Mempool has a hard cap (default ~4 MB total). Beyond that, new transactions are rejected.
If you see rejects from your own submission attempts, the network is congested — back off and retry.

Forge loop misses slots

Diagnose:

High CPU at slot tick time — look for competing processes.
High disk I/O — block production must read mempool snapshot quickly.
Network latency to relays — block adoption requires propagation through the relays.

A healthy block producer has htop showing < 50% sustained CPU and < 5 ms median I/O response.

Logs and diagnostics

Capture state for a bug report

$ yggdrasil-node --version > /tmp/diag.txt
$ yggdrasil-node validate-config --network mainnet --database-path /var/lib/yggdrasil/db >> /tmp/diag.txt 2>&1
$ yggdrasil-node status --database-path /var/lib/yggdrasil/db >> /tmp/diag.txt 2>&1
$ journalctl -u yggdrasil --since "1 hour ago" --no-pager > /tmp/journal.txt
$ ls -la /var/lib/yggdrasil/db > /tmp/db-listing.txt

Attach diag.txt, journal.txt, and db-listing.txt to the issue.

Increase trace verbosity for a single namespace

Edit config.json:

"TraceOptions": {
  "Net.BlockFetch": { "severity": "Debug", "detail": "DMaximum" }
}

Reload (restart node). Capture for the duration needed, then revert to avoid log volume.

Where to go next

Maintenance — preventative procedures.
Glossary — term definitions.
File issues at the GitHub repository.

Troubleshooting

Startup failures

HashMismatch for a genesis file

OpCert signature verification failed against issuer cold key

KES period <n> outside valid OpCert window

Storage uninitialized

Sync issues

Sync stalls with no progress

ChainSync resets to genesis after every disconnect

keep-alive timeout

BlockFromFuture rejecting valid blocks

Connection issues

Connection refused to all peers

Inbound rate limiting hits

0 outbound connections established

Ledger / consensus errors

ProtocolVersionTooHigh on incoming block

BlockBodyHashMismatch from a peer

OcertCounterTooOld

Block production issues

TraceForgedInvalidBlock (Critical)

TraceSlotIsImmutable

Missed leader slots

Could not load KES key

Storage / I/O

Disk full

IO error: input/output error from the database

Slow sync compared to peers

Performance issues

High memory usage

Mempool growing unbounded

Forge loop misses slots

Logs and diagnostics

Capture state for a bug report

Increase trace verbosity for a single namespace

Where to go next

`HashMismatch` for a genesis file

`OpCert signature verification failed against issuer cold key`

`KES period <n> outside valid OpCert window`

`Storage uninitialized`

`keep-alive timeout`

`BlockFromFuture` rejecting valid blocks

`Connection refused` to all peers

`0 outbound connections established`

`ProtocolVersionTooHigh` on incoming block

`BlockBodyHashMismatch` from a peer

`OcertCounterTooOld`

`TraceForgedInvalidBlock` (Critical)

`TraceSlotIsImmutable`

`Could not load KES key`

`Disk full`

`IO error: input/output error` from the database