🎉 DevOps Interview Prep Bundle is live — 1000+ Q&A across 20 topicsGet it →
All Articles

Vault Unseal Failing After Restart — How to Fix It

HashiCorp Vault restarts sealed and won't come back up, blocking every service that reads secrets from it. Here's how to diagnose unseal failures and fix the root cause, not just unseal-and-pray.

DevOpsBoysJun 16, 20264 min read
Share:Tweet

Vault seals itself on every restart by design — it's a security feature, not a bug. The problem is when unsealing fails or behaves unexpectedly, and suddenly every service depending on Vault for secrets is stuck.

Step 1: Confirm the Actual Failure Mode

bash
vault status
Key                Value
---                -----
Seal Type          shamir
Initialized         true
Sealed              true
Total Shares        5
Threshold           3
Unseal Progress     0/3

Sealed: true with Unseal Progress: 0/3 just means nobody's submitted a key yet — that's expected after a restart, not a failure. The actual failure cases look different.

Case 1: "Error: context deadline exceeded" During Unseal

bash
vault operator unseal <key>
# Error: failed to unseal: context deadline exceeded

This almost always means Vault's storage backend (Consul, etcd, integrated Raft storage) isn't reachable yet when you're submitting the unseal key. Check it first.

bash
# For Raft integrated storage
vault operator raft list-peers
 
# For Consul backend
consul members

If storage isn't responding, fix that first — unsealing can't succeed without it, no matter how many times you retry the unseal command.

Case 2: Unseal Succeeds But Vault Immediately Reseal

Look for this in Vault's logs right after a successful unseal:

bash
journalctl -u vault -n 100 --no-pager | grep -i seal
[WARN]  core: vault is sealed
[ERROR] core: leadership setup failed: error="failed to acquire lock"

This pattern — unseal succeeds, then immediately reseals — usually means a high-availability storage backend issue: another Vault node already holds the leader lock and there's a split-brain or stale-lock situation.

bash
# For Raft, check who actually holds leadership
vault operator raft list-peers
vault status | grep -i leader
 
# If a dead node is still listed as a peer holding state, remove it
vault operator raft remove-peer <dead-node-id>

Case 3: "Error: invalid key" Even With Correct Unseal Keys

This is the scary one — it usually means one of two things:

You're using keys from a different cluster/init. If Vault storage was wiped and re-initialized (even accidentally, e.g. a fresh PVC after a bad Helm upgrade), the unseal keys from before are now meaningless. There's no recovery here except re-initializing and re-populating secrets from backup — which is exactly why Vault's storage backend needs the same backup discipline as a production database.

bash
# Check init status — if this shows "false", your old keys are useless
vault status | grep Initialized

Key shares are correct but threshold math is off. If your Shamir threshold is 3-of-5 and you're only getting 2 distinct people to submit keys because one keyholder left the company without handing theirs off, you're stuck below threshold. This is an organizational problem disguised as a technical one — fix it by re-keying with vault operator rekey once you do get unsealed, distributing new shares properly.

Case 4: Auto-Unseal (KMS) Failing

If you're using cloud KMS auto-unseal instead of manual Shamir keys, failures usually trace back to permissions or connectivity:

bash
journalctl -u vault -n 50 | grep -i kms
[ERROR] core: failed to unseal: error="failed to decrypt encrypted stored keys: 
AccessDeniedException: User is not authorized to perform: kms:Decrypt"
bash
# Verify the Vault instance's IAM role actually has kms:Decrypt on the right key
aws iam get-role-policy --role-name vault-server-role --policy-name vault-kms-unseal

A common cause: the KMS key policy or IAM role was correct when Vault was first set up, but a later IAM cleanup or policy tightening silently removed the permission. Auto-unseal failures are often "it worked for months, then someone touched IAM" — check recent IAM changes around the time it broke.

bash
# AWS CloudTrail — find recent changes to the relevant role/policy
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=vault-server-role \
  --max-results 20

Preventing This Going Forward

yaml
# Run Vault HA with at least 3 nodes on integrated Raft storage —
# a single-node Vault has no redundancy if storage corrupts
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault
spec:
  replicas: 3
  # ... raft storage config with retry_join pointing at all 3 nodes

And critically: practice the unseal procedure before you need it under pressure. Run a planned Vault restart in staging quarterly, with whoever holds unseal key shares actually doing the unsealing. The first time someone tries to remember their key-share procedure should not be during a production incident.

Set up Vault properly the first time: How to Set Up HashiCorp Vault

🔧

Today I Fixed

Short real fixes from production — posted daily

Browse fixes
Newsletter

Stay ahead of the curve

Get the latest DevOps, Kubernetes, AWS, and AI/ML guides delivered straight to your inbox. No spam — just practical engineering content.

Related Articles

Comments