Troubleshooting
A catalogue of common error messages and their resolutions. If your issue isn’t here, file a GitHub issue with the trace output.
Startup failures
HashMismatch for a genesis file
GenesisLoadError::HashMismatch {
path: ".../shelley-genesis.json",
expected: "1a3be38bcbb7911969283716ad7aa550250226b76a61fc51cc9a9a35d9276d81",
actual: "8d9f47..."
}
Cause: The genesis file does not match the expected Blake2b-256 hash from the config.
Resolution:
- Verify you have not edited any genesis JSON files.
- Re-clone the repository to get a fresh copy of the vendored configs.
- If you are using a custom
--config, confirm it points to the correct genesis files for the chosen network.
This check is intentional — a wrong genesis file silently corrupts every subsequent ledger state.
OpCert signature verification failed against issuer cold key
Cause: The OpCert was issued by a different cold key than the one configured.
Resolution:
- Confirm
--shelley-operational-certificate-issuer-vkeypoints to the correct cold-key vkey. - Re-run
cardano-cli node issue-op-certwith the cold key whose vkey you have configured. - Restart with the new OpCert.
KES period <n> outside valid OpCert window
Cause: The current chain slot is past the OpCert’s validity window.
Resolution: Rotate KES key and reissue OpCert per Maintenance — KES rotation.
Storage uninitialized
Symptom: First-run warning from validate-config.
Resolution: This is normal on a fresh setup. run will initialise the database on first start.
Sync issues
Sync stalls with no progress
Symptom: yggdrasil_current_slot stops advancing.
Diagnose:
$ journalctl -u yggdrasil --since "5 minutes ago" | grep -E "Net.|ChainDB|Sync"
Look for:
- Repeated reconnect events → peers unhealthy or local network issue.
MsgIntersectNotFoundfollowed by genesis replay → see “ChainSync resets to genesis” below.- No events at all → the runtime is hung. Check for thread-pool exhaustion via
htop/top.
ChainSync resets to genesis after every disconnect
Symptom: After a peer disconnect, the node re-syncs from slot 0.
Cause: Pre-6e8c8f9 regression where MsgFindIntersect was not issued after reconnect, defaulting peer read-pointer to Origin.
Resolution: Upgrade to current main. Already fixed.
keep-alive timeout
Symptom: Outbound connections drop every ~97 seconds.
Cause: KeepAlive heartbeat not being sent. Pre-d3f1c2a codec bug where MsgKeepAliveResponse and MsgDone had swapped tags.
Resolution: Upgrade. Already fixed.
BlockFromFuture rejecting valid blocks
Symptom: Trace error rejecting blocks whose declared slot is in the future.
Cause: Local clock is behind real time.
Resolution: Sync system clock (chrony, ntpd, systemd-timesyncd). Verify with chronyc tracking (drift should be < 100 ms).
Connection issues
Connection refused to all peers
Cause: Local network blocking outbound, or all configured peers down.
Resolution:
- Test connectivity manually:
nc -z backbone.cardano.iog.io 3001. - Check firewall:
iptables -L -n | grep DROP. - Verify topology — bootstrap peers should be IOG-operated and very rarely down.
Inbound rate limiting hits
Symptom: yggdrasil_inbound_connections_rejected increasing.
Cause: Total inbound connections at accepted_connections_limit_hard (default 512).
Resolution:
- For a relay serving many peers, raise
AcceptedConnectionsLimit.hardLimitandsoftLimit. - For a block producer, this should never happen — your firewall should reject inbound at the IP layer.
0 outbound connections established
Symptom: Governor reports outbound_count = 0 after several ticks.
Diagnose:
- Verify config points to reachable peers.
- Check the handshake namespace traces for
Refusemessages:journalctl -u yggdrasil | grep Handshake. - Confirm
--network-magicmatches the network of the configured peers.
Ledger / consensus errors
ProtocolVersionTooHigh on incoming block
Cause: A peer is on a newer protocol version than MaxKnownMajorProtocolVersion.
Resolution:
- If a hard fork has just landed on mainnet, bump
MaxKnownMajorProtocolVersion(e.g. from 10 to 11). - Alternatively, upgrade to a newer Yggdrasil release that bumps the default.
BlockBodyHashMismatch from a peer
Symptom: Trace shows a peer-sent block whose body hash doesn’t match the header.
Cause: Peer is misbehaving (or there’s a wire-codec bug on Yggdrasil’s side).
Resolution:
- Yggdrasil treats this as peer-attributable: the peer is demoted and reconnect attempted.
- If you see this from many peers, file a bug — likely a Yggdrasil decode regression.
OcertCounterTooOld
Cause: A peer block carries an OpCert sequence number lower than the most recent observed for that pool.
Resolution:
- Yggdrasil is enforcing OpCert monotonicity. The peer’s block is invalid, and the peer is demoted.
- If your own node logs this against your pool, you have an OpCert misconfiguration (issued counter went backwards). Fix the counter file.
Block production issues
TraceForgedInvalidBlock (Critical)
Cause: A block your node forged failed self-validation. This should never happen in healthy operation — it indicates a bug.
Action:
- Capture the trace context.
- Check for any local data corruption.
- File a high-priority bug with the trace output.
- Until resolved, run with the operational-certificate disabled (relay-only mode) to avoid further bad forges.
TraceSlotIsImmutable
Cause: Forge loop ticked for a slot that is at or behind the current chain tip.
Resolution:
- This is informational — the node was leader for a slot that has already been claimed.
- If frequent, your node is consistently behind tip. Check sync health.
Missed leader slots
Symptom: Pool tracker shows blocks were expected but not produced.
Diagnose:
- Check
journalctlforTraceNodeIsLeaderevents at the expected slots. - If the events fired but the block didn’t make it to the chain: peer connectivity issue at production time. Adoption traces (
TraceAdoptedBlock/TraceDidntAdoptBlock) tell you whether your block was adopted by the relays. - If the events did not fire: VRF check declined, which is normal — leader election is probabilistic.
Could not load KES key
Cause: File missing, wrong permissions, or wrong format.
Resolution:
# ls -l /var/lib/yggdrasil/keys/
# stat -c '%a %U:%G %n' /var/lib/yggdrasil/keys/kes.skey
# Should be: 400 yggdrasil:yggdrasil
If correct, verify the file is a valid text-envelope KES key (starts with {"type":"KesSigningKey_ed25519_kes_2^6", ...}).
Storage / I/O
Disk full
Action:
df -h— confirm.- Identify large directories:
du -h -d 1 /var/lib/yggdrasil. - Reduce
max_ledger_snapshotsin the config and restart — older snapshots will be deleted. - Move the database to a larger disk and update
--database-path. Sync resumes.
IO error: input/output error from the database
Cause: Hardware-level disk problem.
Action:
- Stop the node immediately.
dmesg | tailandsmartctl -a /dev/<your-disk>to diagnose.- If the disk is failing, replace it. Restore the most recent chain DB backup.
- If no backup, resync from genesis on the new disk.
Slow sync compared to peers
Diagnose:
iostat -x 5 6—%utilshould be high during sync (you’re disk-bound during initial sync, that’s expected).top— CPU usage. If consistently 100% on one core, the validator is the bottleneck.- Compare against
cardano-nodeHaskell on similar hardware. Yggdrasil targets parity but performance characteristics may differ.
Performance issues
High memory usage
Yggdrasil targets 6–8 GB resident during steady-state mainnet operation. If you see > 16 GB:
- Check
yggdrasil_mempool_bytes— if it’s unbounded, see “Mempool growing” below. - Profile with
heaptrackif you have a development build. - File an issue with the resident-set-size and
topoutput.
Mempool growing unbounded
Cause: Tx admission rate exceeds eviction rate (block-application rollovers).
Mitigation:
- Mempool has a hard cap (default ~4 MB total). Beyond that, new transactions are rejected.
- If you see rejects from your own submission attempts, the network is congested — back off and retry.
Forge loop misses slots
Diagnose:
- High CPU at slot tick time — look for competing processes.
- High disk I/O — block production must read mempool snapshot quickly.
- Network latency to relays — block adoption requires propagation through the relays.
A healthy block producer has htop showing < 50% sustained CPU and < 5 ms median I/O response.
Logs and diagnostics
Capture state for a bug report
$ yggdrasil-node --version > /tmp/diag.txt
$ yggdrasil-node validate-config --network mainnet --database-path /var/lib/yggdrasil/db >> /tmp/diag.txt 2>&1
$ yggdrasil-node status --database-path /var/lib/yggdrasil/db >> /tmp/diag.txt 2>&1
$ journalctl -u yggdrasil --since "1 hour ago" --no-pager > /tmp/journal.txt
$ ls -la /var/lib/yggdrasil/db > /tmp/db-listing.txt
Attach diag.txt, journal.txt, and db-listing.txt to the issue.
Increase trace verbosity for a single namespace
Edit config.json:
"TraceOptions": {
"Net.BlockFetch": { "severity": "Debug", "detail": "DMaximum" }
}
Reload (restart node). Capture for the duration needed, then revert to avoid log volume.
Where to go next
- Maintenance — preventative procedures.
- Glossary — term definitions.
- File issues at the GitHub repository.