How To Automate Backups of MySQL & Postgres Databases To An Amazon S3 Bucket
An important part of proper disaster recovery and preparation processes is the automatic creation of backups of production databases.
As a digital agency, we develop a number of different iPhone, Android and web apps, and in each project we need to plan for what the backup and recovery scenario will be for each of them.
In this post, I will show you how to automatically back up a MySQL and/or Postgres database (that is hosted on Amazon EC2) to an S3 bucket hosted on AWS.
For most of the apps we develop, we use RDS to host our databases, which takes care of all of the automated backups for us. However, there are times when we need to self-host a database on an EC2 machine, generally when RDS doesn’t support the specific DB version we are using.
Now there are 2 approaches one could take to set this up:
- take the lazy man’s way out and use an off-the-shelf tool,
- or pull up your proverbial bootstraps, and punch out some bash script to get this done.
Obviously, since I am a masochist and enjoy always taking the long way to a solution, this post is meant to cover the manual approach.
For those of you who just want something to install rather than set up manually, consider AutoMySQLBackup, which supports MySQL and MariaDB databases and can be configured to push gzipped backup files to an AWS S3 bucket. Sadly though, it will not work for Postgres.
Now that we’ve dispensed with the weak-stomached operators out there, let’s drill into how to quickly and easily set up an automated nightly backup job for a MySQL or Postgres db.
In this example, I will be walking through the steps I took to automate the backup process for the Blue Label Labs corporate site, which runs on a Bitnami-powered WordPress EC2 instance in AWS (the very instance that hosts the website you are reading this on).
Prerequisites
- An AWS account (duh).
- This tutorial assumes your MySQL or Postgres database is running on an EC2 machine running some flavor of Linux. If your database is hosted on RDS, then stop right now, close the browser window and go out for a beer. Amazon already does automatic nightly snapshots of your RDS instance.
- MySQL:
- Ensure that the mysqldump utility is installed on your server.
- Create a .my.cnf file within the home directory to hold the credentials for the database (see the example after this list).
- Postgres:
- Ensure that the pg_dump utility is installed on your server.
- Create a .pgpass file in the home directory with the credentials for the database (see the example after this list).
- Ensure the AWS CLI is installed on the EC2 machine.
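For reference, here is roughly what those two credential files look like. Every value below (user name, password, database name, host) is a placeholder, so substitute your own, and lock the files down with chmod 600; Postgres in particular will ignore a .pgpass file with looser permissions.

# ~/.my.cnf - read automatically by mysql and mysqldump (placeholder credentials)
[client]
user=backup_user
password=YourPasswordHere

# ~/.pgpass - read automatically by psql and pg_dump (placeholder values)
# Format is hostname:port:database:username:password
localhost:5432:bll_website:backup_user:YourPasswordHere

# Both files should be readable only by your user
chmod 600 ~/.my.cnf ~/.pgpass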
How to Set Up Automatic Backups for MySQL and Postgres to AWS S3
1.) Create and configure the S3 bucket that will be used to store the backups
- In the Amazon S3 console, I will first create the bucket that will host the backup files; in this case, I will go with the aptly named “bll-website-bucket”. Note the ARN of the bucket that is created, as we will need it soon enough.
- Make sure you do not enable public access to this bucket!
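If you prefer to script the bucket setup rather than click through the console, the same thing can be done with the AWS CLI; the bucket name and region below are just examples, so adjust them to match your own setup:

# Create the backup bucket (bucket name and region are examples)
aws s3 mb s3://bll-website-bucket --region us-east-1

# Turn on the "block all public access" settings for the bucket
aws s3api put-public-access-block --bucket bll-website-bucket \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true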
2.) Create an IAM Policy to govern access to the S3 bucket
- Open the IAM console in AWS, then under Policies click “Create policy”.
- In the resulting screen, click on “{} JSON” and paste the following (replacing <arn of S3 bucket> with the ARN of the bucket created in Step 1):
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:ListBucket", "s3:DeleteObject" ], "Resource": [ "<arn of S3 bucket>", "<arn of S3 bucket>/*" ] } ] }
- Name the policy “GrantAccessToBackupBucket” (clever, I know).
This policy does exactly what the name suggests: it grants whichever IAM principal it is attached to read/write access to the S3 bucket set up to receive the backups.
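If you would rather create the policy from the command line, save the JSON above to a local file (the file name below is just an example) and run:

# Create the IAM policy from the JSON document shown above
aws iam create-policy \
  --policy-name GrantAccessToBackupBucket \
  --policy-document file://backup-bucket-policy.json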
3.) Create an IAM Role that is attached to the “GrantAccessToBackupBucket” Policy
- Return to the IAM console, open up the Roles page and click “Create Role”.
- On the resulting screen, select “EC2” as the Service, then click Next.
- On the permission screen search for the policy named “GrantAccessToBackupBucket” and attach it.
- Next, we give our new IAM role a flashy name: “BLL-Website-Instance-Role”.
What I’ve done in this step is to create an IAM role that can be attached to an EC2 instance. This IAM role will enable the EC2 instance to transparently read/write to the S3 buckets for our backups without requiring me to do any configuration inside of the machine itself.
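For completeness, here is a rough CLI sketch of what the console does in this step; the trust policy file name and the <account id> placeholder are assumptions on my part:

# Trust policy that lets EC2 instances assume the role
cat > ec2-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role and attach the policy from Step 2 (replace <account id> with your own)
aws iam create-role --role-name BLL-Website-Instance-Role \
  --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy --role-name BLL-Website-Instance-Role \
  --policy-arn arn:aws:iam::<account id>:policy/GrantAccessToBackupBucket

# EC2 attaches roles through an instance profile of the same name
aws iam create-instance-profile --instance-profile-name BLL-Website-Instance-Role
aws iam add-role-to-instance-profile --instance-profile-name BLL-Website-Instance-Role \
  --role-name BLL-Website-Instance-Role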
4.) Attach the IAM Role to the EC2 instance hosting the database
- Open the EC2 console, identify the EC2 instance hosting the database, click on it and then under Actions->Security, select “Modify IAM role”.
- On the resulting page search for and select the IAM role named, you guessed it, “BLL-Website-Instance-Role”.
- Click Save.
- Optional: I’ve found that sometimes changing the IAM role on an EC2 machine doesn’t immediately take effect, so give your EC2 machine a reboot just to make sure it has the right permissions.
At this point, we’ve set up our EC2 instance to have read/write access to the S3 bucket being used for backups. Note that up to this point the instructions I’ve laid out are pretty generic and have nothing to do with database backups; you can follow these same steps any time you are building an app and want your backend code to be able to read/write to an S3 bucket.
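A quick sanity check that the role actually took effect: SSH into the instance and confirm that the CLI is picking up the role’s credentials and can see the bucket created in Step 1:

# Should report an assumed-role ARN containing BLL-Website-Instance-Role
aws sts get-caller-identity

# Should list the (currently empty) backup bucket without any credential errors
aws s3 ls s3://bll-website-bucket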
5.) Create backup script on EC2 machine to perform backup and upload to S3
- SSH into your EC2 machine.
- Create a new file named “backup-website.sh”.
- Grant execute permissions on the script by executing: chmod +x backup-website.sh
- Open up the file and paste the following:
For Postgres databases:
#!/bin/bash

# First, create the backup file
# Use a UTC timestamp to construct a descriptive file name
TIME=$(date --utc "+%Y%m%d_%H%M%SZ")
BACKUP_FILE="bll_website_${TIME}.pgdump"
DATABASE_NAME="<database name>"
pg_dump $DATABASE_NAME --format=custom > $BACKUP_FILE

# Second, copy the file to AWS S3
S3_BUCKET=s3://<url of S3 bucket>
S3_TARGET=$S3_BUCKET/$BACKUP_FILE
echo "Copying $BACKUP_FILE to $S3_TARGET"
aws s3 cp $BACKUP_FILE $S3_TARGET

# Verify the backup was uploaded correctly
echo "Backup completed for $DATABASE_NAME"
BACKUP_RESULT=$(aws s3 ls $S3_BUCKET | tail -n 1)
echo "Latest S3 backup: $BACKUP_RESULT"

# Clean up and delete the local backup file
rm $BACKUP_FILE
For MySQL/MariaDB databases:
#!/bin/bash

# Use a UTC timestamp to construct a descriptive file name
TIME=$(date --utc "+%Y%m%d_%H%M%SZ")
BACKUP_FILE="bll_website_${TIME}.sql"
DATABASE_NAME="<database name>"

# First, execute the mysqldump backup
<path to mysqldump>/mysqldump $DATABASE_NAME > $BACKUP_FILE

# Second, copy the file to AWS S3
S3_BUCKET=s3://<url of S3 bucket>
S3_TARGET=$S3_BUCKET/$BACKUP_FILE
echo "Copying $BACKUP_FILE to $S3_TARGET"
aws s3 cp $BACKUP_FILE $S3_TARGET

# Verify the backup was uploaded correctly
echo "Backup completed for $DATABASE_NAME"
BACKUP_RESULT=$(aws s3 ls $S3_BUCKET | tail -n 1)
echo "Latest S3 backup: $BACKUP_RESULT"

# Clean up and delete the local backup file
rm $BACKUP_FILE
The astute readers out there may have noticed there is really only one line of difference between the two! For my fellow Microsoft SQL Server-nauts out there, you should be able to take this example and extend it to SQL Server using FreeTDS on Linux.
At this point, manually execute the backup-website.sh file and watch the console log output. It should be pretty clear whether or not the backup and upload succeeded.
6.) Automate the execution of the backup with cron job
- Open the crontab by typing “crontab -e”.
- In the resulting editor window, we add the following line:
0 0 * * * ~/backup-website.sh &>> ~/backup-website.log
Note the above cron expression runs the backup script once per day at midnight. I have never really understood cron syntax, so if you want to run this at a different frequency, use this handy cron expression generator, which will take all the guesswork out of generating the right cron syntax to match your desired schedule.
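For reference, here are a few other common schedules written as crontab lines, using the same script and log file as above:

# Every day at 2:30 AM
30 2 * * * ~/backup-website.sh &>> ~/backup-website.log

# Every Sunday at midnight
0 0 * * 0 ~/backup-website.sh &>> ~/backup-website.log

# Every 6 hours
0 */6 * * * ~/backup-website.sh &>> ~/backup-website.log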
7.) Test your backup!
Now that everything is set up and running, it’s a good idea to go to the S3 console, open up the backup bucket and download one of the backup files and manually try to restore it to a MySQL or Postgres server.
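As a rough sketch of what that manual restore looks like (the file names and the restore_test database below are placeholders; note that the Postgres script produces a custom-format dump, so it goes through pg_restore rather than psql):

# Pull a backup file down from S3 (file name is an example)
aws s3 cp s3://bll-website-bucket/bll_website_20240101_000000Z.pgdump .

# Postgres: restore the custom-format dump into a scratch database
createdb restore_test
pg_restore --dbname=restore_test bll_website_20240101_000000Z.pgdump

# MySQL/MariaDB: restore the plain .sql dump into a scratch database
mysql -e "CREATE DATABASE restore_test"
mysql restore_test < bll_website_20240101_000000Z.sql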
While having an automated backup script to create nightly backups is definitely a good thing, it will be all for naught if you end up creating backups that are either corrupted, missing data or misconfigured!
Bobby Gill