Kubernetes Backup: Velero, etcd Snapshots, and Disaster Recovery

Part of the Cybersecurity Skills Guide — This article is one deep-dive in our complete guide series.

By HADESS Team | February 28, 2026 | Updated: February 28, 2026 | 5 min read

Your cluster will fail. Hardware dies, someone runs kubectl delete namespace production, or a bad deployment corrupts persistent data. Without tested backups, you are rebuilding from scratch. Here is how to set up Kubernetes backup properly.

etcd Snapshots: Backing Up Cluster State

All Kubernetes state lives in etcd. Losing etcd means losing every resource definition in your cluster — deployments, services, config maps, secrets, RBAC rules, everything.

Take regular etcd snapshots:

```bash
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```

Verify every snapshot after creation with `etcdctl snapshot status` (on etcd 3.5 and later, `etcdutl snapshot status` is the preferred form because the etcdctl variant is deprecated). An unverified backup is not a backup. Store snapshots off-cluster: S3, GCS, or any object storage with versioning enabled. If your cluster is gone, your backup location still needs to be accessible.
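The snapshot, verify, and ship-off-cluster steps can be chained in one script. The following is a minimal sketch, not a hardened job: the `BACKUP_DIR` and `S3_URI` variables, the bucket name, and the dry-run default are assumptions made for illustration, and the certificate paths are the ones from the command above.

```shell
#!/usr/bin/env bash
# Sketch: snapshot etcd, verify the file, then copy it off-cluster.
# BACKUP_DIR, S3_URI, and DRY_RUN are illustrative assumptions.
set -euo pipefail

BACKUP_DIR="${BACKUP_DIR:-/backup}"
S3_URI="${S3_URI:-s3://example-etcd-backups}"  # hypothetical bucket
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# One snapshot file per day, named like the example above.
snapshot_path() {
  printf '%s/etcd-%s.db\n' "$BACKUP_DIR" "$(date +%Y%m%d)"
}

backup_etcd() {
  local snap
  snap="$(snapshot_path)"
  export ETCDCTL_API=3
  run etcdctl snapshot save "$snap" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key
  # Verify before shipping; with set -e a failed check aborts the upload.
  run etcdctl snapshot status "$snap" --write-out=table
  run aws s3 cp "$snap" "$S3_URI/"
}

backup_etcd
```

Run it once with the default `DRY_RUN=1` to review the commands, then with `DRY_RUN=0` on a control-plane node that has `etcdctl` and the AWS CLI installed.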

For managed Kubernetes (EKS, GKE, AKS), the provider handles etcd. But you still need to back up your workloads and persistent data.

Velero: Workload and Volume Backup

Velero backs up Kubernetes resources and persistent volumes together. It talks to cloud provider APIs for volume snapshots and stores resource manifests in object storage.

Basic setup:

```bash
velero install \
  --provider aws \
  --bucket my-velero-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1
```

Create scheduled backups:

```bash
# Cron expression: every day at 02:00
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production,staging \
  --ttl 720h
```

Key practices:

  • Use `--include-namespaces` rather than backing up everything. System namespaces rarely need backup from Velero since they are managed by your cluster provisioning.
  • Set a TTL (`--ttl`) to avoid unbounded storage growth.
  • Label-based selection lets you back up specific workloads: `--selector app=critical`.
  • File system backup (Kopia, or Restic on older Velero releases) handles volumes that do not support native snapshots, such as NFS. hostPath volumes are not supported by file system backup.
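The selection and volume practices above can be combined in a single on-demand backup. This is a sketch under assumptions: the backup name prefix and the dry-run wrapper are made up for illustration, and `--default-volumes-to-fs-backup` assumes Velero 1.10 or later with the node agent installed.

```shell
#!/usr/bin/env bash
# Sketch: on-demand backup of labelled workloads with file system backup.
# The "critical-" name prefix and DRY_RUN default are illustrative assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the command

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

backup_critical() {
  local name
  name="critical-$(date +%Y%m%d%H%M%S)"
  # Namespace scope + label selector + bounded retention, and
  # file system backup for volumes without native snapshot support.
  run velero backup create "$name" \
    --include-namespaces production \
    --selector app=critical \
    --default-volumes-to-fs-backup \
    --ttl 720h
}

backup_critical
```

Scoping by both namespace and label keeps the backup small, which also shortens the restore you will time later.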

Disaster Recovery Planning

Backups without recovery testing are wishful thinking. Schedule regular restore drills:

1. Spin up a test cluster (or use a separate namespace)
2. Restore from backup using `velero restore create --from-backup daily-backup-20260225`
3. Verify application functionality: not just that pods are running, but that the application works end-to-end
4. Document the restore time. Leadership will ask for your RTO, and you need a real number
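A restore drill along these lines can be scripted so that the restore time is measured rather than estimated. The sketch below is an illustration only: the `restore-drill` namespace, the `drill-` restore name prefix, and the dry-run wrapper are assumptions, and the end-to-end application check is left as a placeholder because it is specific to each workload.

```shell
#!/usr/bin/env bash
# Sketch: timed restore drill into a scratch namespace.
# Namespace names, the "drill-" prefix, and DRY_RUN are illustrative assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

restore_drill() {
  local backup="$1" target_ns="$2"
  local start end
  start="$(date +%s)"
  # Restore the production namespace under a different name and block until done.
  run velero restore create "drill-${backup}" \
    --from-backup "$backup" \
    --namespace-mappings "production:${target_ns}" \
    --wait
  # Pods being available is necessary but not sufficient.
  run kubectl --namespace "$target_ns" wait deployment --all \
    --for=condition=Available --timeout=600s
  # Placeholder: run your application-specific end-to-end check here.
  end="$(date +%s)"
  printf 'restore of %s took %ss\n' "$backup" "$((end - start))"
}

restore_drill daily-backup-20260225 restore-drill
```

The elapsed time printed at the end is a starting point for an RTO figure; a real number should also include the time to provision the target cluster and to run the application check.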

Cross-Cluster Restore

Velero supports restoring to a different cluster as long as both clusters can access the same backup storage location. This enables:

  • Migration between cloud providers or regions
  • Cluster upgrades — back up the old cluster, provision a new one, restore
  • Multi-region DR — maintain a warm standby cluster that can receive restores

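Watch for differences in StorageClasses and Ingress configurations between source and target clusters. Velero can remap some names at restore time: `--namespace-mappings` renames namespaces, and a storage class mapping ConfigMap in the Velero namespace rewrites StorageClass names on restored volumes. Ingress differences usually have to be handled by hand.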
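For StorageClass differences, Velero reads a ConfigMap labelled as a `change-storage-class` plugin configuration in its own namespace. The helper below is a small sketch that prints such a manifest; the function name and the `gp2` to `standard-rwo` pair are illustrative assumptions, and the `velero` namespace assumes a default install.

```shell
#!/usr/bin/env bash
# Sketch: emit a Velero storage class mapping ConfigMap.
# Function name and the example class names are illustrative assumptions.
set -euo pipefail

storage_class_mapping() {
  local old="$1" new="$2"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  ${old}: ${new}
EOF
}

# Example: volumes backed up from class "gp2" are restored as "standard-rwo".
storage_class_mapping gp2 standard-rwo
```

Pipe the output to `kubectl apply -f -` on the target cluster before running the restore, and check that the target class actually exists there.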

Securing Your Backups

Backups contain secrets, config maps with credentials, and application data. Treat backup storage with the same security as production:

  • Enable encryption at rest on your backup bucket
  • Restrict access with IAM policies — only the Velero service account should write
  • Enable object versioning to prevent accidental or malicious deletion
  • Monitor backup completion with alerts on failures
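For an S3 bucket, the versioning and encryption points map onto a few AWS CLI calls, and a basic failure check can be built on `velero backup get`. This is a sketch under assumptions: it uses the bucket name from the install example, SSE-S3 (`AES256`) rather than a KMS key, a dry-run wrapper, and it assumes the default table output where the backup name and status are the first two columns.

```shell
#!/usr/bin/env bash
# Sketch: harden the backup bucket and list failed Velero backups.
# DRY_RUN and the table-layout assumption are illustrative, not from the article.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

harden_bucket() {
  local bucket="$1"
  # Keep old object versions so deletions and overwrites can be undone.
  run aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled
  # Default server-side encryption for every new object.
  run aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
  # Block every form of public access.
  run aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
}

# Reads `velero backup get` table output on stdin and prints the names of
# backups whose STATUS column contains "Failed" (Failed, PartiallyFailed, ...).
failed_backups() {
  awk 'NR > 1 && $2 ~ /Failed/ { print $1 }'
}

harden_bucket my-velero-backups
```

A cron job or CI step could run `velero backup get | failed_backups` and raise an alert whenever the output is non-empty. Write access for the Velero identity still has to be restricted separately with an IAM policy.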

Related Career Paths

Disaster recovery and backup strategy are core skills for Cloud Security Engineers. Assess your readiness on the skills page and identify where you need to build depth.

Frequently Asked Questions

How long does it take to learn this skill?

Most practitioners build working proficiency in 4-8 weeks of dedicated study with hands-on practice. Mastery takes longer and comes primarily through on-the-job experience.

Do I need certifications for this skill?

Certifications validate your knowledge to employers but are not strictly required. Hands-on experience and portfolio projects often carry more weight in technical interviews. Check the certification roadmap for relevant options.

What career paths use this skill?

Explore the career path explorer to see which roles require this skill and how it fits into different cybersecurity specializations.

HADESS Team consists of cybersecurity practitioners, hiring managers, and career strategists who have collectively spent 50+ years in the field.
