Lead Timeouts

Monitoring - A small percentage of our overall traffic has been subject to increased error and timeout rates. We have identified the cause as a storage bandwidth constraint on our new database cluster. We are upgrading the primary storage array one disk at a time, and performance should improve linearly as upgrades complete over the next few hours. This evening we will allocate another read replica and move primary processing to a server with roughly 2x the current resources. We will continue to monitor this issue until all upgrades have been completed.
Dec 8, 11:09 PST

About This Site

This is our Decision Cloud status page, where you can get updates on how our systems are doing. If there are interruptions to service, we will post a note here.

As always, if you are experiencing any issues with Decision Cloud, don't hesitate to get in touch with us at support@insight.tm and we'll get back to you as soon as we can.

Decision Cloud - Web Site Degraded Performance
Decision Cloud - API (Lead Processing) Degraded Performance
Decision Cloud - Reports Degraded Performance
BigML Decisions Operational
File Builder Major Outage
Uptime over the past 90 days: 99.85%
System Metrics: Reporting Delay, Decision Cloud - Response Time
Past Incidents
Dec 8, 2021

Unresolved incident: Lead Timeouts.

Dec 7, 2021

No incidents reported.

Dec 6, 2021
Resolved - We are restarting the new database cluster to ensure the new settings take effect.
Dec 6, 17:59 PST
Update - Reporting data has begun to transfer from the new database. We are monitoring system load to ensure things continue to operate smoothly.
Dec 6, 11:04 PST
Update - Traffic has been processing with no timeouts for over 20 minutes. All automated alarms have closed as conditions improve. We will re-enable report data within the next 20-30 minutes if conditions hold.
Dec 6, 10:27 PST
Update - After cutting over to our failover database, response times and throughput have begun returning to regular levels. Reporting is not currently updating but will resume in 5-10 minutes, once we confirm the new database is running properly.
Dec 6, 10:11 PST
Update - Timeouts are still happening. We are going to fail over to our emergency recovery database cluster. Lead processing will be impacted for several minutes.
Dec 6, 09:52 PST
Update - The reboot has completed and we are now monitoring system health to verify that things are functioning normally. We will update this incident again within the next 10 minutes.
Dec 6, 09:26 PST
Update - Due to ongoing timeouts, we are going to restart our primary database. This will result in approximately 5-10 minutes of downtime across our entire platform. We plan to begin this process in 10 minutes, at 9:15am PST.
Dec 6, 09:04 PST
Update - We saw a drastic increase in our 95th percentile processing time when hourly caps reset, resulting in roughly 5% of leads taking at least 15 seconds to process. This is directly related to the ongoing database rollback and will continue until recovery is complete. We are doing all we can to maintain system uptime in the meantime.
Dec 6, 07:18 PST
Update - The rollback log for this task was significantly larger than we anticipated. While performance appears normal, our internal alarm may take longer to clear than we initially expected. We will continue to monitor system health and update this incident if conditions change.
Dec 6, 06:59 PST
Update - Our 99th percentile response time has decreased from 20 seconds back down to under 10 seconds and continues to fall toward our typical numbers. While service appears to be running within regular bounds, we will keep this incident open until all of our internal alarms return to nominal conditions and close automatically.
Dec 6, 06:26 PST
Monitoring - At approximately 6am PST, we began to experience timeouts in lead processing.

We have identified the cause as a long-running maintenance task in our processing database. We have stopped the job and are currently watching processing times recover. We will monitor this issue until processing times fall back in line with our regular performance.
Dec 6, 06:16 PST
Dec 5, 2021

No incidents reported.

Dec 4, 2021

No incidents reported.

Dec 3, 2021

No incidents reported.

Dec 2, 2021

No incidents reported.

Dec 1, 2021

No incidents reported.

Nov 30, 2021

No incidents reported.

Nov 29, 2021
Resolved - The incident appears to have been caused by a brief network outage. All systems appear to be functioning properly at this time.
Nov 29, 06:03 PST
Monitoring - Reports are once again available. We have not yet determined the cause of the outage, so we will continue to investigate the problem.
Nov 29, 05:42 PST
Investigating - Reports are currently experiencing an issue loading data in the web browser.

Report data still appears to be syncing in the background. We will determine the root cause of the rendering issue and update this incident.
Nov 29, 05:36 PST
Nov 28, 2021

No incidents reported.

Nov 27, 2021

No incidents reported.

Nov 26, 2021

No incidents reported.

Nov 25, 2021

No incidents reported.

Nov 24, 2021
Resolved - This incident has been resolved.
Nov 24, 14:46 PST
Monitoring - An application delay has been identified and a fix has been implemented. We are monitoring performance now.
Nov 24, 08:18 PST