This is a PSA for anyone who manages web servers or sites: not only should you have backups in place, you should make sure you understand how to restore from those backups if needed. You should also know how often your backups run, and confirm that they cover all the data you need backed up, including configurations, databases, user-generated data, etc.
For a case study: I recently nuked the entire contents of a server because of one dumb line in a bash script. The purpose of the script was to clear the log files on the server after they had been backed up to my local system.
Here’s the part of the script that did the deleting:
echo "Clearing the contents of production logs"
read -p "Are you sure you want to proceed? (y/n): " confirmation
if [ "$confirmation" != "y" ]; then
rm -rf $logPath
fi
The issue with this terrible script is that the variable $logPath was never defined. As a result, the rm -rf command simply started deleting all the files on my server.
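A minimal sketch of how the deletion step could be guarded is below. It is not the script that ran on the server: the function name clear_logs is hypothetical, and it assumes the log files sit directly inside one directory and end in .log. The idea is to refuse to run when the path is empty, missing, or resolves to the filesystem root, and to quote the path everywhere it is used.

```bash
#!/usr/bin/env bash

# Hypothetical helper: delete *.log files inside a directory, but only
# after checking that the path is set, exists, and is not "/".
clear_logs() {
    local log_path="${1:-}"

    # Refuse an empty or unset path (the failure mode described above).
    if [ -z "$log_path" ]; then
        echo "Refusing to delete: log path is empty" >&2
        return 1
    fi

    # Refuse anything that is not an existing directory.
    if [ ! -d "$log_path" ]; then
        echo "Refusing to delete: '$log_path' is not a directory" >&2
        return 1
    fi

    # Resolve symlinks and relative segments, then refuse the root.
    local resolved
    resolved="$(cd "$log_path" && pwd -P)" || return 1
    if [ "$resolved" = "/" ]; then
        echo "Refusing to delete: path resolves to /" >&2
        return 1
    fi

    # Assumption: logs end in .log. The path is quoted, and "--" stops
    # rm from treating an odd file name as an option.
    rm -f -- "$resolved"/*.log
}
```

Called as clear_logs "$logPath", an undefined $logPath now hits the first guard instead of reaching rm. Putting set -u at the top of a script is a complementary safeguard, since bash then aborts as soon as an unset variable is expanded.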
I should have known better because when I had ChatGPT generate the above code for me, it foretold the future and warned me of the dumb mistake I was going to make:
Please be very cautious when using rm -rf commands, especially with paths specified by variables, as it can result in the irreversible deletion of data. Ensure that the path is accurate, and you have a backup or confirmation of the directory you are deleting.
The rebuilding process
Fortunately, I had backups running as an add-on by my server provider, Linode. This meant I had a backup of not just my application’s code, but all of my server configurations as well as my databases.
After logging in to my server provider, I found that Linode offers to deploy backups to either your existing server or a brand new server.
I had recently chosen the “brand new server” option when I needed to access a backup of some corrupt assets on my server. This allowed me to access a point-in-time snapshot of my server without messing with its existing operation.
In this case, though, my existing server was toast, so I chose to deploy the backup directly to it. This process took roughly 25 minutes to complete, and my application was down for visitors during this time.
Once the backup was restored, I was able to immediately SSH into the rebuilt server because the backup included all of my configs including things like SSH authorized keys.
Amazingly, once the server was rebuilt from the backup, everything worked as it had before and my application was brought back online relatively quickly.
It’s because of this that I highly recommend your backup be a full server backup, configs and all. It’s not enough to just back up code and other assets.
What was lost
While the restore process went smoothly, there was some data loss, as the issue occurred at around 10am and my last backup had run at 10pm the night before. This means there were ~12 hours not accounted for in my backup.
Fortunately, all user-generated data on my application is stored externally on Amazon S3, so I didn’t lose data there. A case for putting your “eggs in multiple baskets”.
I did lose database interactions though. Fortunately, the application in question is not a high-traffic one and only about 15 interactions were lost, most of which I was able to rebuild manually by looking through the log files that had been downloaded before the server was nuked.
Takeaways
Reflecting on the above, here are my concluding thoughts on my backup setup:
- A once-daily backup is sufficient for the server itself, as the code is relatively static and also backed up on GitHub.
- If user-generated data were being written to the server instead of something like Amazon S3, I should consider more frequent backups.
- I should have more frequent/separate backups for databases.
- Linode’s backup feature and the ability to restore from backups went very smoothly. 10/10 would recommend.
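As a rough illustration of the “more frequent/separate backups for databases” point, here is a minimal sketch of a helper that saves a timestamped, compressed database dump. Everything in it is an assumption rather than part of my actual setup: the function name backup_database, the file naming scheme, and the idea of passing the dump command in as arguments (this post does not say which database engine is in use).

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a dump command and store its output as a
# timestamped gzip file. Prints the path of the finished backup.
# Usage: backup_database <backup_dir> <dump command and its arguments...>
backup_database() {
    if [ "$#" -lt 2 ]; then
        echo "usage: backup_database <backup_dir> <dump command...>" >&2
        return 1
    fi

    local backup_dir="$1"
    shift
    mkdir -p "$backup_dir" || return 1

    local stamp partial outfile
    stamp="$(date +%Y%m%d-%H%M%S)"
    partial="$backup_dir/db-$stamp.sql.partial"
    outfile="$backup_dir/db-$stamp.sql.gz"

    # Dump to a temporary file first, so a failed dump never leaves
    # behind something that looks like a complete backup.
    if ! "$@" > "$partial"; then
        echo "Dump command failed; no backup written" >&2
        rm -f -- "$partial"
        return 1
    fi

    # Compress the finished dump, then drop the uncompressed copy.
    if ! gzip -c "$partial" > "$outfile"; then
        rm -f -- "$partial" "$outfile"
        return 1
    fi
    rm -f -- "$partial"

    echo "$outfile"
}
```

The dump command itself depends on the database: for example, mysqldump for MySQL or pg_dump for PostgreSQL, each followed by a database name. Running a script like this hourly from cron, and copying the resulting files off the server, would shrink the window of lost database writes from ~12 hours to about one.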