Summary of the Clubhouse service disruption on June 1, 2017

Kurt Schrader

On the afternoon of June 1, 2017, Clubhouse experienced a partial service outage between 2:07pm ET and 3:52pm ET.

Organizations that added or modified something in Clubhouse during that period may not have had their data recorded in our database. We have been able to recover all customer data from the affected period, and are in the process of communicating the recovered data to all affected customers.

The root cause of the issue was the interaction of two previously undiscovered bugs in Datomic, the database software that we use to store data on the backend, that caused our database to enter an unstable state.

We have been working with the Datomic team over the last week to address these issues, and they have released a new version of Datomic that fixes the problems we encountered (http://docs.datomic.com/release-notices.html).

Statement from the Datomic team:

"Release 0.9.5561.50 fixes a bug in the catalog that, in the unlikely circumstance where one has deleted a database and restored it from a backup without first having called gc-deleted-dbs, can cause a subsequent gc-deleted-dbs to delete that (active) database."
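The sequence described in that statement can be illustrated with a toy model. This is only a sketch: the `ToyCatalog` class, its fields, and the `fixed` flag are hypothetical names invented for illustration, and the assumption that a restored database reuses the identifier it had before deletion is ours. It does not reflect Datomic's actual implementation or the actual fix.

```python
class ToyCatalog:
    """Toy model of a database catalog. NOT Datomic's implementation."""

    def __init__(self):
        self.active = {}         # database name -> database id
        self.pending_gc = set()  # ids of deleted databases awaiting garbage collection
        self.storage = {}        # database id -> stored data

    def create(self, name, db_id, data):
        self.active[name] = db_id
        self.storage[db_id] = data

    def delete(self, name):
        # Deleting only unlinks the name; storage is reclaimed later by gc.
        self.pending_gc.add(self.active.pop(name))

    def restore(self, name, db_id, data, fixed=False):
        # Assumption: the restored database comes back under its old id.
        self.active[name] = db_id
        self.storage[db_id] = data
        if fixed:
            # Hypothetical fix: a restored id is no longer eligible for gc.
            self.pending_gc.discard(db_id)

    def gc_deleted_dbs(self):
        # Reclaims storage for every id still marked as deleted.
        for db_id in self.pending_gc:
            self.storage.pop(db_id, None)
        self.pending_gc.clear()


def data_survives(fixed):
    """Run delete -> restore -> gc and report whether the active data survives."""
    catalog = ToyCatalog()
    catalog.create("prod", db_id=1, data="customer data")
    catalog.delete("prod")
    catalog.restore("prod", db_id=1, data="customer data", fixed=fixed)
    catalog.gc_deleted_dbs()
    return catalog.active["prod"] in catalog.storage


if __name__ == "__main__":
    print("buggy model keeps data:", data_survives(fixed=False))
    print("fixed model keeps data:", data_survives(fixed=True))
```

In the buggy variant of the model, the stale deletion marker left by the original delete causes the later garbage collection to remove the storage of the restored, active database, which is the kind of outcome the statement describes.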

We have now deployed the updated version of Datomic and do not expect to encounter this problem again.

Every day, thousands of companies trust Clubhouse to keep mission-critical data safe and secure, and we take that responsibility very seriously. Now that the issue is resolved, we are planning a complete review of our operations and recovery procedures to make sure we are taking every preventative measure we can to protect the reliability and security of your data.

Incident Timeline:

  • Jun 1, 12:00pm ET: We began the process of deleting old databases from the system.
  • Jun 1, 2:07pm ET: Our monitoring showed an increased rate of 500 errors occurring on our servers and we began our investigation.
  • Jun 1, 3:52pm ET: Our database was restored to a known good state. Some of the transactions from the prior 3 hours and 10 minutes were lost. The indexes that power historical reporting and the historical activity feed were also non-functional.
  • Jun 7, 4:00pm ET: The database indexes were rebuilt and merged to restore reporting and activity feed functionality. All lost data was recovered and is being communicated to affected customers.

If you have any questions about the outage, please don’t hesitate to contact us at support@clubhouse.io.