TL;DR #
- Start with managed database backups if using RDS/Cloud SQL—they’re automatic and battle-tested
- For self-hosted databases, use native tools (pg_dump, mysqldump) + cron + S3 lifecycle policies
- Test restores quarterly. A backup you can’t restore from is just wasted storage
- 3-2-1 rule still applies: 3 copies, 2 different media types, 1 offsite
- Don’t overthink it—a simple working backup beats a complex one that fails silently
Who this is for #
Teams of 3-10 engineers running production databases without a dedicated DBA. You know backups matter but aren’t sure if you’re overdoing it or setting yourself up for data loss. This covers both managed cloud databases and self-hosted setups.
The backup pyramid: match complexity to risk #
Not all data is created equal. Your backup strategy should match your actual recovery needs:
Tier 1: “We’d be annoyed but fine”
- Development databases
- Analytics data you can regenerate
- Solution: Daily snapshots, 7-day retention
Tier 2: “This would hurt but we’d recover”
- Production data with paper trails elsewhere
- User-generated content with recent backups
- Solution: Hourly snapshots, 30-day retention, tested quarterly
Tier 3: “Company-ending if lost”
- Financial records
- Core user data
- Compliance-regulated data
- Solution: Continuous replication, point-in-time recovery, 90+ day retention, monthly restore drills
Most small teams treat everything as Tier 3. This wastes time and money. Be honest about what actually matters.
Option 1: Managed database backups (boring is good) #
If you’re on RDS, Cloud SQL, or Azure Database, use their built-in backups. Yes, it costs more than self-hosting. No, you shouldn’t care at your scale.
AWS RDS example:
- Automated backups: enabled by default, 7-day retention
- Manual snapshots: unlimited retention, ~$0.095/GB/month
- Point-in-time recovery: restore to any second within retention window
- Cross-region backup: one click in console
Cost for 100GB production database:
- Automated backups: free (within retention period)
- One monthly snapshot kept for a year: ~$9.50/month
- Cross-region replication: ~$20/month
That’s $30/month for peace of mind. Your engineering time costs more.
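If you'd rather script this than click through the console, the same setup can be sketched with the AWS CLI. The instance identifier `prod-db` and the account/region in the ARN below are placeholders, not real resources:

```shell
# Extend automated backup retention (hypothetical instance "prod-db")
aws rds modify-db-instance \
  --db-instance-identifier prod-db \
  --backup-retention-period 14 \
  --apply-immediately

# Take a manual snapshot before something risky (kept until you delete it)
aws rds create-db-snapshot \
  --db-instance-identifier prod-db \
  --db-snapshot-identifier "prod-db-manual-$(date +%Y%m%d)"

# Copy the snapshot to another region for the offsite leg of 3-2-1
# (cross-region copies take the source as an ARN)
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier "arn:aws:rds:us-east-1:111111111111:snapshot:prod-db-manual-20250101" \
  --target-db-snapshot-identifier prod-db-manual-20250101 \
  --region us-west-2
```

Drop these into a small deploy script or Terraform and the "one click" becomes repeatable.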
Option 2: Self-hosted database backups #
Running databases on EC2/VMs? You’ll need to roll your own. Here’s a production-ready setup that won’t wake you at 3 AM.
The simple approach: cron + native tools + S3 #
```shell
#!/bin/bash
# /opt/scripts/backup-postgres.sh
set -euo pipefail

# Configuration
DB_NAME="production"
S3_BUCKET="mycompany-db-backups"
BACKUP_PREFIX="postgres"
export PGPASSWORD="your-password-here"  # Better: use .pgpass or IAM auth

# Generate backup filename with timestamp
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_PREFIX}_${DB_NAME}_${TIMESTAMP}.sql.gz"
BACKUP_PATH="/tmp/${BACKUP_FILE}"

# Create backup (pipefail makes a pg_dump failure abort the script)
echo "Starting backup of ${DB_NAME}..."
pg_dump -h localhost -U postgres -d "${DB_NAME}" | gzip > "${BACKUP_PATH}"

# Upload to S3
echo "Uploading to S3..."
aws s3 cp "${BACKUP_PATH}" "s3://${S3_BUCKET}/daily/${BACKUP_FILE}"

# Clean up local file
rm "${BACKUP_PATH}"

# Prune old backups (30 days of hourly backups = 720 files)
echo "Pruning old backups..."
aws s3 ls "s3://${S3_BUCKET}/daily/" | \
  awk '{print $4}' | \
  sort -r | \
  tail -n +721 | \
  xargs -r -I {} aws s3 rm "s3://${S3_BUCKET}/daily/{}"

echo "Backup complete: ${BACKUP_FILE}"
```

Add to crontab for hourly backups:

```shell
0 * * * * /opt/scripts/backup-postgres.sh >> /var/log/db-backup.log 2>&1
```

This gives you hourly backups with 30-day retention. Total cost for a 100GB database with a daily 10% change rate: ~$5/month in S3 storage.
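Instead of pruning with a script, you can also let S3 handle retention with a lifecycle policy, as the TL;DR suggests. A minimal sketch (the bucket name is an example): transition `daily/` backups to Glacier after 30 days, delete them after a year.

```shell
# Write the lifecycle policy, then apply it to the backup bucket
cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "daily-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "daily/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket mycompany-db-backups \
  --lifecycle-configuration file:///tmp/lifecycle.json
```

This moves retention policy out of your backup script entirely, so a broken cron job can't silently stop pruning.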
Level up: Add monitoring #
The scariest backup failure is the silent one. Add simple monitoring:
- Backup freshness check: Alert if latest backup is >25 hours old
- Backup size check: Alert if backup size drops >20% (corruption indicator)
- Test restore: Monthly cron job that restores to a test instance
Most monitoring tools (Datadog, New Relic, even CloudWatch) can check S3 object age. Use them.
Option 3: Continuous replication (when you need point-in-time) #
Need to recover to a specific transaction? You want continuous archiving:
PostgreSQL: WAL archiving to S3
- Set `archive_mode = on` and `wal_level = replica`
- Use `archive_command` to ship WAL files to S3
- Combine with daily base backups
- Recovery: restore the base backup, then replay WAL files
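As a concrete starting point, the relevant `postgresql.conf` settings look like this. The bucket name is an example, and a production `archive_command` should also handle retries and alert on failure:

```ini
# postgresql.conf — minimal WAL archiving to S3 (sketch)
wal_level = replica
archive_mode = on
# %p = path of the WAL file to archive, %f = its file name
archive_command = 'aws s3 cp %p s3://mycompany-db-backups/wal/%f'
# Force a WAL segment switch at least every 5 minutes on quiet databases
archive_timeout = 300
```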
MySQL: Binary log shipping
- Enable binary logging
- Ship logs to S3 with mysqlbinlog
- Similar recovery process
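One low-tech way to ship binlogs from an hourly cron job, assuming binlogs live in `/var/lib/mysql` and an example bucket name:

```shell
# Rotate the active binary log so the previous one is closed...
mysql -e "FLUSH BINARY LOGS"

# ...then sync the closed binlog files to S3
aws s3 sync /var/lib/mysql/ "s3://mycompany-db-backups/binlog/" \
  --exclude "*" --include "mysql-bin.0*"
```

Your worst-case data loss is then roughly one cron interval, which is often good enough below Tier 3.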
This is more complex but gives you point-in-time recovery. Only worth it for Tier 3 data.
The 3-2-1 rule for small teams #
The classic backup rule: 3 copies, 2 different storage types, 1 offsite. Here’s how it maps to modern infrastructure:
- Primary: Your production database
- Secondary: S3 in same region (different storage type)
- Tertiary: S3 in different region or Glacier (offsite)
For most small teams, this translates to:
- Daily backups to S3 (automated lifecycle to Glacier after 30 days)
- Monthly snapshot to different region
- Quarterly backup to different cloud provider (if paranoid)
Testing restores: the part everyone skips #
A backup you’ve never restored is Schrödinger’s backup—simultaneously working and broken until observed.
Quarterly restore checklist:
- Pick a random daily backup from last month
- Restore to test instance
- Run basic smoke tests (row counts, recent data present)
- Document time to restore (your RTO)
- Delete test instance
Put it in the calendar. Make it someone’s OKR. Track it like deploys. Whatever it takes to actually do it.
When to upgrade your approach #
Your simple backup strategy stops being simple when:
- Restores take >4 hours (business can’t wait)
- You’re backing up >1TB (restore time becomes painful)
- Compliance requires specific retention/encryption
- You need cross-region HA (not just DR)
- Multiple databases need coordinated backups
At that point, look at:
- Dedicated backup tools (Percona XtraBackup, pgBackRest)
- Managed services (AWS Backup, Veeam)
- Database-native solutions (RDS Multi-AZ, Cloud SQL HA)
But don’t jump there prematurely. Most teams under 10 engineers don’t need enterprise backup solutions.
Common mistakes to avoid #
Backing up the replica instead of the primary: Replicas can lag or diverge. Always back up from the source of truth.
Not testing encryption: That encrypted backup is useless if you lose the key. Store keys separately from backups.
Forgetting the schema: use `mysqldump --no-data` (or `pg_dump --schema-only`) for schema-only backups. Version control these.
Ignoring backup windows: That 3 AM backup might coincide with batch jobs. Check your backup impact.
Over-retaining: 7 years of daily backups for a startup MVP is hoarding, not strategy.
Related reads #
- Incident Response for Small Teams — because backups are half of disaster recovery
- Minimal DevOps Stack — where database backups fit in your infrastructure
- Terraform for Small Teams — automate your backup infrastructure