We are restarting the new database cluster to ensure the updated settings take effect.
Dec 6, 17:59 PST
Reporting data has begun to transfer from the new database. We are monitoring system load to ensure things continue to operate smoothly.
Dec 6, 11:04 PST
Traffic has been processing with no timeouts for over 20 minutes, and all automated alarms have closed as conditions improved. We will re-enable reporting data within the next 20-30 minutes if conditions hold.
Dec 6, 10:27 PST
After cutting over to our failover database, response times and throughput have begun returning to regular levels. Reporting is not currently updating but will resume in 5-10 minutes, once we confirm the new database is running properly.
Dec 6, 10:11 PST
Timeouts are still happening. We are going to fail over to our emergency recovery database cluster. Lead processing will be impacted for several minutes.
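For illustration only: a cutover like this can be driven at the connection layer, with clients preferring the primary and falling back to the recovery cluster. The sketch below is hypothetical (the driver, DSNs, and helper name are ours for the example, not our production tooling):

    import psycopg2

    # Hypothetical connection strings; not our real cluster names.
    PRIMARY_DSN = "host=db-primary.internal dbname=leads"
    RECOVERY_DSN = "host=db-recovery.internal dbname=leads"

    def connect_with_failover():
        # Prefer the primary; on a connection error, fall back to the
        # emergency recovery cluster. Requests in flight during the
        # cutover still fail, which is why lead processing is briefly
        # impacted.
        for dsn in (PRIMARY_DSN, RECOVERY_DSN):
            try:
                return psycopg2.connect(dsn, connect_timeout=5)
            except psycopg2.OperationalError:
                continue
        raise RuntimeError("no database cluster reachable")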
Dec 6, 09:52 PST
The reboot has completed and we are now monitoring system health to verify that things are functioning normally. We will update this incident again within the next 10 minutes.
Dec 6, 09:26 PST
Due to ongoing timeouts, we are going to restart our primary database. This will result in approximately 5-10 minutes of downtime across our entire platform. We plan to begin this process in about 10 minutes, at 9:15am PST.
Dec 6, 09:04 PST
We saw a drastic increase in our 95th percentile processing time when hourly caps reset, resulting in roughly 5% of leads taking at least 15 seconds to process. This is directly related to the ongoing database rollback and will continue until recovery is complete. We are doing all we can to maintain system uptime in the meantime.
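As background on the metric: the 95th percentile is the time under which 95% of leads finish, so even a small slow tail lifts it sharply. A minimal sketch of the calculation, with illustrative timings rather than measured values:

    def p95(samples):
        # Simplified nearest-rank percentile: sort the timings and take
        # the value 95% of the way through the list.
        ordered = sorted(samples)
        return ordered[int(len(ordered) * 0.95)]

    # Illustrative data: 19 sub-second leads plus one 16-second lead,
    # i.e. roughly 5% slow, as in the update above.
    timings = [0.5] * 19 + [16.0]
    print(p95(timings))  # 16.0 -- one slow lead in twenty sets the p95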
Dec 6, 07:18 PST
The rollback log for this task was significantly larger than we anticipated. While performance appears to be normal, our internal alarm may take longer to clear than we initially expected. We continue to monitor system health and will update this incident if conditions change.
Dec 6, 06:59 PST
Our 99th percentile response time has decreased from 20 seconds back down to under 10 seconds and continues to fall toward our typical levels. While service appears to be running within regular bounds, we will keep this incident open until all of our internal alarms meet nominal conditions and close automatically.
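For context, alarms of this kind latch open when a metric breaches its threshold and close on their own once readings return to nominal. A toy sketch (the threshold and names are hypothetical, not our monitoring code):

    P99_THRESHOLD_SECONDS = 10.0  # hypothetical threshold

    class LatencyAlarm:
        def __init__(self, threshold):
            self.threshold = threshold
            self.open = False

        def observe(self, p99_seconds):
            if p99_seconds > self.threshold:
                self.open = True       # breach: alarm opens
            elif self.open:
                self.open = False      # nominal again: auto-close
            return self.open

    alarm = LatencyAlarm(P99_THRESHOLD_SECONDS)
    for reading in (20.0, 14.0, 9.5, 8.0):  # p99 falling, as described
        print(reading, "open" if alarm.observe(reading) else "closed")

Real alarms typically require readings to stay nominal for a sustained window before closing, which is why clearing can lag recovery.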
Dec 6, 06:26 PST
At approximately 6am PST, we began to experience timeouts in lead processing.
We have identified the cause as a long-running maintenance task in our processing database. We have stopped the job and are now watching processing times recover. We will monitor this issue until processing times fall back in line with our regular performance.
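For illustration, assuming a PostgreSQL-style database (the update does not name the engine), a long-running task can be located and stopped through the server's activity view. A hypothetical sketch:

    import psycopg2

    # Hypothetical DSN; assumes PostgreSQL, which is not confirmed above.
    conn = psycopg2.connect("host=db-primary.internal dbname=leads")
    with conn.cursor() as cur:
        # Find active queries running longer than 30 minutes.
        cur.execute("""
            SELECT pid, now() - query_start AS runtime, query
            FROM pg_stat_activity
            WHERE state = 'active'
              AND now() - query_start > interval '30 minutes'
        """)
        for pid, runtime, query in cur.fetchall():
            print(pid, runtime, query[:80])
            # Cancel the offending query; pg_cancel_backend is gentler
            # than pg_terminate_backend.
            cur.execute("SELECT pg_cancel_backend(%s)", (pid,))
    conn.close()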
Dec 6, 06:16 PST