This script offers customizable backup management, expiring and deleting backups according to a traditional rotation policy that the standard S3 management features do not provide.
Backups in S3 can be managed with S3 Versioning, S3 storage classes, or Lifecycle policies; however, none of these methods implements a typical backup rotation scheme in which daily, weekly, and monthly backups are rotated. Combined with this script, an AWS S3 Lifecycle policy can still be used to move older backups to a cheaper storage class as required.
Using the configuration file with the Lambda function, multiple backup types and rotation policies can be managed in the same S3 bucket.
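For illustration, such a configuration file might look like the sketch below. The key names and schema here are hypothetical examples only; refer to the Git repository for the actual format.

```json
{
  "backups": [
    {
      "prefix": "databases/prod/",
      "type": "file",
      "min_days": 7,
      "min_backups": 5,
      "keep_weekly": 4,
      "keep_monthly": 12
    },
    {
      "prefix": "wordpress/site1/",
      "type": "folder",
      "min_days": 3,
      "min_backups": 3,
      "keep_weekly": 2,
      "keep_monthly": 6
    }
  ]
}
```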
Both folder and file backups are supported: some backups are a single file such as a zip or tar.gz archive, while others are a set of files in a folder. The script handles both, so long as the file or folder name contains a recognised date/time format.
To ensure recent backups are not removed, there is a minimum-number-of-days option. This lets you always keep, for example, the last week of daily backups, or however many of the most recent days you choose.
A minimum number of backups can also be kept regardless of whether they would otherwise be expired: until at least that many backups exist, no deletion takes place. This is a valuable protection mechanism for backups that may be performed manually.
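The minimum-backup guard can be sketched as follows (an illustrative sketch only; the function and parameter names are assumptions, not the script's actual internals):

```python
def select_for_deletion(expired, total, min_backups):
    """Return the expired backups that may safely be deleted.

    Never delete so many that fewer than min_backups backups remain,
    even if more of them are technically expired."""
    allowed = max(0, total - min_backups)
    # Delete the oldest expired backups first (names sort chronologically
    # here because they carry ISO-style date stamps).
    return sorted(expired)[:allowed]

# With 6 backups total and a floor of 5, only one expired backup goes.
print(select_for_deletion(["2023-03-01", "2023-01-01", "2023-02-01"], 6, 5))
```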
Finally, there is the familiar rotation of weekly and monthly backups, where a minimum number of each is retained as desired.
This backup management script runs as a scheduled Lambda function. It can run as often or as rarely as required, and is configured by a simple JSON configuration file stored with the Lambda function.
If you are tempted to move the configuration file into the S3 bucket itself, be aware that the retention rules could then be modified by a bad actor, removing your required backups. This is why the configuration is kept centrally with the function, where it cannot be modified from the backup source.
The function works by first tagging each backup, based on its date/time stamp, as either recent, weekly, or monthly. Backups that do not receive one of these tags are marked as expired and removed as needed to meet the rotation requirements. A flow diagram is included at the end.
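The tagging pass can be pictured with a sketch like the one below. This is a simplified assumption of the logic, not the script's actual code; the function name, thresholds, and the "Sunday / first of month" selection rules are illustrative examples only.

```python
from datetime import datetime, timedelta

def classify_backup(backup_dt, now, min_days=7, keep_weekly=4, keep_monthly=12):
    """Label one backup as 'recent', 'weekly', 'monthly', or 'expired'
    based purely on its timestamp. Thresholds are example values."""
    age = now - backup_dt
    if age <= timedelta(days=min_days):
        return "recent"        # newest backups are always kept
    if age <= timedelta(weeks=keep_weekly) and backup_dt.weekday() == 6:
        return "weekly"        # e.g. keep Sunday backups for N weeks
    if age <= timedelta(days=30 * keep_monthly) and backup_dt.day == 1:
        return "monthly"       # e.g. keep first-of-month backups
    return "expired"           # candidate for deletion

now = datetime(2024, 6, 15)
print(classify_backup(datetime(2024, 6, 12), now))  # recent
print(classify_backup(datetime(2024, 6, 2), now))   # weekly (2024-06-02 is a Sunday)
print(classify_backup(datetime(2024, 3, 1), now))   # monthly
print(classify_backup(datetime(2023, 1, 10), now))  # expired
```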
The script attempts to automatically detect the date/time of each backup from the file or folder name. Using the AWS S3 object timestamp is unreliable for a couple of reasons:
• Directories/folders do not have a timestamp, as these objects do not actually carry any metadata within S3.
• File timestamps ('Last Modified' times) reflect when the backup was placed in S3 (assuming it is not modified afterwards). This may differ from when the backup was created, particularly during a migration to S3 or after sync failures.
This script detects a number of commonly used date/time stamp formats. You can see these in the get_datetime_fromfile function in the Lambda function.
If no recognised pattern can be found, the script falls back to the file's last modified date/time.
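The detect-then-fall-back approach can be sketched as below. The patterns shown are a couple of illustrative (regex, strptime format) pairs; the actual list lives in get_datetime_fromfile in the repository's lambda_function.py, and the helper name here is an assumption.

```python
import re
from datetime import datetime

# Illustrative patterns only; the script's real list is larger.
PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}"), "%Y-%m-%d_%H-%M-%S"),
    (re.compile(r"\d{8}-\d{6}"), "%Y%m%d-%H%M%S"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
]

def datetime_from_name(name, last_modified=None):
    """Try each known pattern against the object name; fall back to the
    S3 'Last Modified' time when nothing matches."""
    for regex, fmt in PATTERNS:
        match = regex.search(name)
        if match:
            return datetime.strptime(match.group(0), fmt)
    return last_modified  # may be None, e.g. for folder objects

print(datetime_from_name("db-backup_2024-06-01_03-15-00.tar.gz"))
```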
The base log shows basic information: the number of backups of each type, any backups being expired, and any objects whose date/time stamp could not be determined. If you need more detail, such as the exact tag given to each backup, you can enable debugging from within the Lambda function.
Deployment is as a Lambda function with a few simple environment variables specifying the bucket name and configuration file name to use. A sample CloudFormation template is provided in the Git repository.
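Reading those settings inside the function might look like the sketch below. The environment variable names are illustrative assumptions; check the sample CloudFormation template in the repository for the actual names.

```python
import json
import os

# Hypothetical variable names; the real ones are set in the template.
BUCKET_NAME = os.environ.get("BACKUP_BUCKET", "my-backup-bucket")
CONFIG_FILE = os.environ.get("CONFIG_FILE", "config.json")

def load_config():
    """Load the JSON configuration shipped inside the Lambda package,
    where it cannot be tampered with from the backup source side."""
    with open(CONFIG_FILE) as f:
        return json.load(f)
```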
The Git repository is located here: https://github.com/steven-geo/s3-backup-management. Before deployment, a build step is required to upload the Lambda function package to S3 first; due to its size, the function cannot be embedded directly in the CloudFormation template.
Default configuration can be set in the lambda_function.py file or in the backup configuration file.