Vault Unseal Failing After Restart — How to Fix It
HashiCorp Vault restarts sealed and won't come back up, blocking every service that reads secrets from it. Here's how to diagnose unseal failures and fix the root cause, not just unseal-and-pray.
Vault seals itself on every restart by design — it's a security feature, not a bug. The problem is when unsealing fails or behaves unexpectedly, and suddenly every service depending on Vault for secrets is stuck.
Step 1: Confirm the Actual Failure Mode
vault status
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed true
Total Shares 5
Threshold 3
Unseal Progress 0/3
Sealed: true with Unseal Progress: 0/3 just means nobody's submitted a key yet — that's expected after a restart, not a failure. The actual failure cases look different.
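If you script your restart runbook, it helps to turn that table into a single number. Below is a minimal sketch that assumes the plain-text vault status layout shown above. The function name shares_needed is hypothetical and is not part of the Vault CLI.

```shell
# Hypothetical helper: read `vault status` text on stdin and print how many
# more unseal key shares must be submitted. Prints 0 if Vault reports
# "Sealed false". Only the Sealed, Threshold and Unseal Progress rows are used.
shares_needed() {
  awk '
    $1 == "Sealed"    { sealed = $2 }
    $1 == "Threshold" { threshold = $2 }
    $1 == "Unseal" && $2 == "Progress" {
      split($3, parts, "/")   # "0/3" -> parts[1] = 0, parts[2] = 3
      progress = parts[1]
    }
    END {
      if (sealed == "false") { print 0; exit }
      print threshold - progress
    }
  '
}
```

Usage: vault status | shares_needed. For the output above it prints 3, meaning no shares have been submitted yet. This is the expected post-restart state, not a failure.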
Case 1: "Error: context deadline exceeded" During Unseal
vault operator unseal <key>
# Error: failed to unseal: context deadline exceeded
This almost always means Vault's storage backend (Consul, etcd, or integrated Raft storage) isn't reachable yet when you submit the unseal key. Check it first.
# For Raft integrated storage: this needs an unsealed node and a valid token,
# so run it against a healthy peer (point VAULT_ADDR at that peer)
vault operator raft list-peers
# For Consul backend
consul members
# For etcd backend
etcdctl endpoint health
If storage isn't responding, fix that first. Unsealing can't succeed without it, no matter how many times you retry the unseal command.
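One way to follow that advice in a script is to retry the storage check instead of the unseal. Below is a minimal sketch. The retry helper is hypothetical. Which check you hand it (consul members, etcdctl endpoint health, or a raft check against a healthy peer) depends on your backend.

```shell
# Hypothetical helper: run a health-check command until it succeeds.
# Usage: retry <attempts> <sleep-seconds> <command> [args...]
# Returns 0 on the first success, 1 once every attempt has failed.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Don't sleep after the final failed attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example: retry 30 2 consul members > /dev/null && vault operator unseal. With no key argument, vault operator unseal prompts for the key share without echoing it. The unseal runs only after storage answers, so you stop burning attempts against a backend that isn't there.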
Case 2: Unseal Succeeds But Vault Immediately Reseals
Look for this in Vault's logs right after a successful unseal:
journalctl -u vault -n 100 --no-pager | grep -iE 'seal|leader'
[WARN] core: vault is sealed
[ERROR] core: leadership setup failed: error="failed to acquire lock"
This pattern — unseal succeeds, then immediately reseals — usually means a high-availability storage backend issue: another Vault node already holds the leader lock and there's a split-brain or stale-lock situation.
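To check a node for this pattern without eyeballing the log, you can match both markers together. This sketch assumes the log lines read like the two quoted above. The name detect_lock_reseal is hypothetical. Real log formatting varies between Vault versions and log formats (for example JSON logging), so adjust the patterns to what your journal actually shows.

```shell
# Hypothetical helper: scan Vault log lines on stdin for the
# "unsealed, then immediately resealed" pattern, i.e. a "vault is sealed"
# warning plus a leadership setup failure on the HA lock.
# Exits 0 and prints a hint if both markers are present, 1 otherwise.
detect_lock_reseal() {
  awk '
    /vault is sealed/ { sealed = 1 }
    /leadership setup failed/ && /failed to acquire lock/ { lock = 1 }
    END {
      if (sealed && lock) {
        print "reseal after unseal: check the HA lock holder and raft peers"
        exit 0
      }
      exit 1
    }
  '
}
```

Usage: journalctl -u vault -n 100 --no-pager | detect_lock_reseal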
# For Raft, check who actually holds leadership
vault operator raft list-peers
vault status | grep -iE 'HA Mode|Active Node'
# If a dead node is still listed as a peer holding state, remove it
vault operator raft remove-peer <dead-node-id>
Case 3: "Error: invalid key" Even With Correct Unseal Keys
This is the scary one — it usually means one of two things:
You're using keys from a different cluster/init. If Vault storage was wiped and re-initialized (even accidentally, e.g. a fresh PVC after a bad Helm upgrade), the unseal keys from before are now meaningless. There's no recovery here except re-initializing and re-populating secrets from backup — which is exactly why Vault's storage backend needs the same backup discipline as a production database.
# Check init status: if this shows "false", your old keys are useless
vault status | grep Initialized
Key shares are correct but the threshold math is off. If your Shamir threshold is 3-of-5 and you can only collect 2 distinct shares because keyholders left the company without handing theirs off, you're stuck below threshold. This is an organizational problem disguised as a technical one, and rekeying can't rescue you while you're stuck, because vault operator rekey also requires a threshold of the current shares. Once you do reach threshold and unseal, rekey with vault operator rekey and distribute the new shares properly.
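Before hunting for old key shares, it's worth scripting the Initialized check so the answer is unambiguous. Below is a minimal sketch over the same plain-text vault status output. The function check_initialized and its exit codes (0, 1, 2) are hypothetical conventions chosen for this example.

```shell
# Hypothetical helper: read `vault status` text on stdin and report whether
# this storage backend has been initialized.
#   exit 0 = initialized, 1 = not initialized, 2 = Initialized row missing
check_initialized() {
  awk '
    $1 == "Initialized" { init = $2 }
    END {
      if (init == "true")  { print "initialized"; exit 0 }
      if (init == "false") { print "NOT initialized: old unseal keys will not work"; exit 1 }
      print "Initialized row not found"
      exit 2
    }
  '
}
```

Note that Initialized true doesn't prove your keys match. If storage was wiped and someone re-initialized it, the flag reads true again but the old keys still won't unseal it.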
Case 4: Auto-Unseal (KMS) Failing
If you're using cloud KMS auto-unseal instead of manual Shamir keys, failures usually trace back to permissions or connectivity:
journalctl -u vault -n 50 | grep -i kms
[ERROR] core: failed to unseal: error="failed to decrypt encrypted stored keys:
AccessDeniedException: User is not authorized to perform: kms:Decrypt"
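The action named in the AccessDeniedException tells you which permission to look for. Below is a minimal sketch that extracts it from log lines. The helper denied_kms_actions is hypothetical. It assumes the "not authorized to perform: kms:..." wording shown above, so other AWS error wordings may need a different pattern.

```shell
# Hypothetical helper: pull the denied KMS action(s) out of Vault log lines
# on stdin, e.g. "... not authorized to perform: kms:Decrypt" -> "kms:Decrypt".
# Prints each distinct action once; prints nothing if no line matches.
denied_kms_actions() {
  grep -o 'not authorized to perform: kms:[A-Za-z]*' \
    | sed 's/.*perform: //' \
    | sort -u
}
```

Usage: journalctl -u vault -n 50 --no-pager | denied_kms_actions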
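# Verify the Vault instance's IAM role actually has kms:Decrypt on the right key
aws iam get-role-policy --role-name vault-server-role --policy-name vault-kms-unseal
# get-role-policy only returns inline policies; also check attached managed policies
aws iam list-attached-role-policies --role-name vault-server-role
# ...and the key policy on the KMS key itself
aws kms get-key-policy --key-id <kms-key-id> --policy-name default
A common cause: the KMS key policy or IAM role was correct when Vault was first set up, but a later IAM cleanup or policy tightening silently removed the permission. Auto-unseal failures are often "it worked for months, then someone touched IAM", so check recent IAM changes around the time it broke.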
# AWS CloudTrail — find recent changes to the relevant role/policy
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=vault-server-role \
  --max-results 20
Preventing This Going Forward
# Run Vault HA with at least 3 nodes on integrated Raft storage:
# a single-node Vault has no redundancy if storage corrupts
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault
spec:
  replicas: 3
  # ... raft storage config with retry_join pointing at all 3 nodes
And critically: practice the unseal procedure before you need it under pressure. Run a planned Vault restart in staging quarterly, with whoever holds unseal key shares actually doing the unsealing. The first time someone tries to remember their key-share procedure should not be during a production incident.
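A drill is easier to repeat if the "is it back?" check is scripted. Below is a minimal sketch built on the documented exit codes of vault status (0 unsealed, 2 sealed, 1 error). The helper wait_unsealed is hypothetical. By default it runs vault status against whatever VAULT_ADDR points to.

```shell
# Hypothetical helper: poll a status command until it reports "unsealed".
# Relies on `vault status` exit codes: 0 = unsealed, 2 = sealed, other = error.
# Usage: wait_unsealed <attempts> <sleep-seconds> [status-command...]
wait_unsealed() {
  local attempts=$1 delay=$2 i=1 rc
  shift 2
  # Default to plain `vault status` when no command is given
  [ "$#" -gt 0 ] || set -- vault status
  while [ "$i" -le "$attempts" ]; do
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
      0) echo "unsealed after $i check(s)"; return 0 ;;
      2) echo "check $i: still sealed" >&2 ;;
      *) echo "check $i: status error (exit $rc)" >&2 ;;
    esac
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Run it once per node, with VAULT_ADDR set to each node's address, after the keyholders have submitted their shares (for example wait_unsealed 60 5). Treat a non-zero exit as a failed drill.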