On Friday, Dropbox took its cloud sharing service offline for scheduled maintenance, but even after it came back online three hours later, some users had trouble accessing their data stored in the Dropbox cloud.
However, Dropbox worked hard to correct the issues, and the core service was fully restored by 4:40 PM PT on Sunday. The company has now issued a detailed explanation of what happened and what it has learned.
“We use thousands of databases to run Dropbox. Each database has one master and two slave machines for redundancy. In addition, we perform full and incremental data backups and store them in a separate environment.
On Friday at 5:30 PM PT, we had a planned maintenance scheduled to upgrade the OS on some of our machines. During this process, the upgrade script checks to make sure there is no active data on the machine before installing the new OS.
A subtle bug in the script caused the command to reinstall a small number of active machines. Unfortunately, some master-slave pairs were impacted which resulted in the site going down.
Your files were never at risk during the outage. These databases do not contain file data. We use them to provide some of our features (for example, photo album sharing, camera uploads, and some API features).
To restore service as fast as possible, we performed the recovery from our backups. We were able to restore most functionality within 3 hours, but the large size of some of our databases slowed recovery, and it took until 4:40 PM PT today for core service to fully return.”
For more information on the outage and the measures Dropbox is putting in place to stop this happening again, jump over to the official Dropbox Tech blog.