Lead processing timeouts
Incident Report for Decision Cloud
Resolved
We are restarting the new database cluster to ensure new settings take effect.
Posted Dec 06, 2021 - 17:59 PST
Update
Reporting data has begun to transfer from the new database. We are monitoring system load to ensure things continue to operate smoothly.
Posted Dec 06, 2021 - 11:04 PST
Update
Traffic has been processing with no timeouts for over 20 minutes. All automated alarms have closed as conditions improve. We will re-enable report data within the next 20-30 minutes if conditions hold.
Posted Dec 06, 2021 - 10:27 PST
Update
After cutting over to our failover database, response times and throughput have begun returning to regular levels. Reporting is not currently updating but will resume in 5-10 minutes, once we confirm the new database is running properly.
Posted Dec 06, 2021 - 10:11 PST
Update
Timeouts are still happening. We are going to fail over to our emergency recovery database cluster. Lead processing will be impacted for several minutes.
Posted Dec 06, 2021 - 09:52 PST
Update
The reboot has completed and we are now monitoring system health to verify that things are functioning normally. We will update this incident again within the next 10 minutes.
Posted Dec 06, 2021 - 09:26 PST
Update
Due to ongoing timeouts, we are going to restart our primary database. This will result in approximately 5-10 minutes of downtime on our entire platform. We plan to begin this process in about 10 minutes, at 9:15am PST.
Posted Dec 06, 2021 - 09:04 PST
Update
We saw a drastic increase in our 95th percentile processing time when hourly caps reset, resulting in roughly 5% of leads taking at least 15 seconds to process. This is directly related to the ongoing database rollback and will continue until recovery is complete. We are doing all we can to maintain system uptime in the meantime.
Posted Dec 06, 2021 - 07:18 PST
Update
The rollback log on this task was significantly larger than we anticipated. While performance seems to be normal, our internal alarm may take longer to clear than we initially expected. We continue to monitor system health and will update this incident if conditions change.
Posted Dec 06, 2021 - 06:59 PST
Update
Our 99th percentile response time has decreased from 20 seconds back down to under 10 seconds and is still falling toward our typical numbers. While service appears to be running within regular bounds, we will keep this incident open until all of our internal alarms meet nominal conditions and are automatically closed.
Posted Dec 06, 2021 - 06:26 PST
Monitoring
At approximately 6am PST, we began to experience timeouts in lead processing.

We have identified the cause as a long-running maintenance task in our processing database. We have stopped the job and are currently watching processing times recover. We will monitor this issue until processing times fall back in line with our regular performance.
Posted Dec 06, 2021 - 06:16 PST
This incident affected: Decision Cloud - API (Lead Processing).