INC0769829 - Presence, CivicLive, Resource Central & Talentova Sites Were Inaccessible
Incident Report for SchoolMessenger
Resolved
Resolution Notice & Summary:

At 1:55 PM CT on Thursday, February 4th, Intrado support teams began receiving monitoring alerts for multiple VM (virtual machine) storage drive failures in our Toronto processing facility. While the Intrado incident response team investigated the issue with the relevant technical resources, additional drive failures continued to be observed, cascading across the multiple VM clusters hosted in the data center.

The storage drive vendor was immediately engaged, and it was determined that the failures resulted from a defect in the storage drives' firmware. Intrado coordinated with the storage drive vendor to provide replacement parts, an effort that was delayed by the scarcity of compatible drives. During that delay, Intrado technical teams repaired the hardware in place where possible, restoring services to the extent of the capacity available at that time. As parts began arriving at the site, Intrado technical teams immediately installed the new hardware, allowing additional clients to be restored as available capacity increased.

We sincerely apologize for this service disruption and the impact it may have had on your business. In addition to the existing plans for platform migration to Microsoft Azure in 2021, Intrado is implementing corrective actions in support of the ongoing stability and availability of our web-hosting services.
Posted Feb 09, 2021 - 20:40 CST
Update
Service has been restored to Presence, CivicLive, and Resource Central. Some restored sites may have experienced latency prior to technical teams making further configuration adjustments that have normalized performance. Talentova remains unavailable at this time, though restoration work is ongoing.
Posted Feb 09, 2021 - 18:05 CST
Update
Service has been restored to Presence, CivicLive, and Resource Central. Some latency may still be experienced on some sites. Talentova is still unavailable at this time.
Posted Feb 09, 2021 - 11:08 CST
Update
Though service has been restored to Presence, CivicLive, and Resource Central, Intrado is receiving reports of restored CivicLive websites having page display issues. Technical teams are engaged and troubleshooting. Talentova remains unavailable at this time.
Posted Feb 09, 2021 - 10:12 CST
Update
Service has been restored to Presence, CivicLive, and Resource Central. Some latency may still be experienced by some users. Talentova is still unavailable at this time.
Posted Feb 09, 2021 - 04:30 CST
Update
Preliminary RCA:

Incident Identifier: INC0769829

Problem Statement: Presence, CivicLive, and Talentova web services in Intrado’s Toronto processing facility became unavailable due to a firmware defect in the platform’s solid-state drives, resulting in a large-scale service outage.

Date/Time of Impact: Impact began on Thursday, February 4th, 2021. While service has been restored for the majority of clients, as of the time of this RCA’s publication (2/8/21) Intrado is still working to restore the remaining clients.

Description: On Thursday, February 4th, 2021 at 1:55 PM CT, Intrado technical support was alerted to service unavailability for a subset of web-hosting services (Presence, CivicLive, and Talentova) located in the Toronto processing facility. As Intrado technical support began troubleshooting to restore these services, teams observed service unavailability continuing to spread throughout the site. At that time it was determined that all web-hosting services were offline.

Presence and CivicLive are in the process of being migrated to the Cloud (Microsoft Azure) with this work expected to be substantially completed in 2021.

Root Cause: A firmware defect within the solid-state drives used in the virtual storage supporting Presence, CivicLive, and Talentova. This issue was not related to a security incident.

Actions to Restore Service: Where possible, Intrado has repaired hardware to bring services back online to the extent of available capacity. Additional restoration of services will occur as additional drives are delivered and installed. This effort has been delayed due to the scarcity of compatible drives. All necessary teams are working continuously to bring service online.

Corrective Actions: Restoration of services remains Intrado’s primary focus. As previously mentioned, Presence and CivicLive will be migrated to Microsoft Azure. Once service is restored, and prior to migration to Microsoft Azure, Intrado is putting additional corrective actions in place to support the ongoing availability of the web-hosting platforms.
Posted Feb 08, 2021 - 20:27 CST
Update
Intrado continues to work diligently to restore our ability to deliver our Talentova, Presence, CivicLive, and Resource Central services. Increased latency has been observed on some previously restored sites; that issue is being investigated in parallel.

Root cause analysis still points to a hardware failure, and we can confirm that this incident is not cybersecurity related. Intrado's incident response team has continued to make progress on mitigating the impact, and we are working diligently to restore service for all customers. While we have restored over 80% of the services in these spaces, we continue to work around the clock to complete the remaining hardware replacement and restores from Feb 3 backups. Customers in the CivicLive and Talentova space that have not seen recovery will likely remain offline into the day on Monday as we continue to replace and restore, and a small subset of Presence customers may see the same delay as well. Additionally, Resource Central is currently inaccessible.

Other solutions you may use from Intrado remain unaffected by this incident.
Posted Feb 08, 2021 - 18:45 CST
Update
Intrado continues to work diligently to restore our ability to deliver our Talentova, Presence, and CivicLive services.

Root cause analysis still points to a hardware failure, and we can confirm that this incident is not cybersecurity related. Intrado's incident response team has continued to make progress on mitigating the impact, and we are working diligently to restore service for all customers. While we have restored over 70% of the services in these spaces, we continue to work around the clock to complete the remaining hardware replacement and restores from Feb 3 backups. Customers in the CivicLive and Talentova space that have not seen recovery will likely remain offline into the day on Monday as we continue to replace and restore, and a small subset of Presence customers may see the same delay as well.

Other solutions you may use from Intrado remain unaffected by this incident.
Posted Feb 07, 2021 - 19:55 CST
Update
Intrado continues to work diligently to restore our ability to deliver our Talentova, Presence, and CivicLive services.

Root cause analysis still points to a hardware failure, and we can confirm that this incident is not cybersecurity related. Intrado's incident response team has continued to make progress on mitigating the impact, and we are working diligently to restore service for all customers. While we have restored over 70% of the services in these spaces, we continue to work around the clock to complete the remaining hardware replacement and restores from Feb 3 backups. Customers in the CivicLive and Talentova space that have not seen recovery will likely remain offline into the day on Monday as we continue to replace and restore, and a small subset of Presence customers may see the same delay as well.

Other solutions you may use from Intrado remain unaffected by this incident.
Posted Feb 07, 2021 - 13:02 CST
Update
Intrado is experiencing technical difficulties interfering with our ability to deliver our Talentova, Presence, and CivicLive services.

Preliminary root cause analysis points to a hardware failure, and we can confirm that this incident is not cybersecurity related. Intrado's incident response team is continuing to make progress on mitigating the impact, and we are working diligently to restore service for all customers.

Other solutions you may use from Intrado remain unaffected by this incident.
Posted Feb 07, 2021 - 11:00 CST
Update
Intrado is experiencing technical difficulties interfering with our ability to deliver our Talentova, Presence, and CivicLive services.

Preliminary root cause analysis points to a hardware failure, and we can confirm that this incident is not cybersecurity related. Intrado technical teams are continuing to make progress on restoring service to customers. Please continue to check back for updates.
Posted Feb 07, 2021 - 08:00 CST
Update
Intrado is experiencing technical difficulties interfering with our ability to deliver our Talentova, Presence, and CivicLive services.

Preliminary root cause analysis points to a hardware failure, and we can confirm that this incident is not cybersecurity related. Intrado technical teams are continuing to make progress on restoring service to customers. Please continue to check back for updates.
Posted Feb 06, 2021 - 19:45 CST
Update
Intrado is experiencing technical difficulties interfering with our ability to deliver our Talentova, Presence, and CivicLive services.

Preliminary root cause analysis points to a hardware failure, and we can confirm that this incident is not cybersecurity related. Intrado technical teams are continuing to make progress on restoring service to customers. Please continue to check back for updates.
Posted Feb 06, 2021 - 15:45 CST
Update
Intrado is currently experiencing technical difficulties interfering with our ability to deliver our Talentova, Presence, and CivicLive services.

Preliminary root cause analysis points to a hardware failure, and we can confirm that this incident is not cybersecurity related. We've started the process of restoring service to customers. Please continue to check back for updates.
Posted Feb 05, 2021 - 19:50 CST
Identified
Intrado is currently experiencing technical difficulties interfering with our ability to deliver our Talentova, Presence, and CivicLive services. Our incident response team continues to actively work to implement the necessary steps to resolve the issue.

While we do not currently have an estimated time of resolution, progress on restoration is being made and we are working diligently to restore service for all customers as soon as possible. We will continue to provide updates as we have them.
Posted Feb 05, 2021 - 15:30 CST
Update
Intrado is currently experiencing technical difficulties interfering with our ability to deliver our Presence and CivicLive services. We have launched our incident response team and are actively working to implement the necessary steps to restore services.

While we do not currently have an estimated time of resolution, we are working diligently to restore your service as soon as possible. We will continue to provide updates as we have them.

Our customers are important to us and we take any interruption to your service very seriously. We sincerely apologize for this service disruption and the impact it may have on your business. Other solutions you may use from Intrado are not affected by this incident.

We will provide additional communications as further details become available. For real-time information, please continue to check this site: https://notification.west.com/.
Posted Feb 05, 2021 - 04:00 CST
Update
Presence and CivicLive sites remain inaccessible at this time. Intrado teams urgently continue to investigate this issue and have engaged all necessary vendors to assist in troubleshooting and mitigation efforts.
Posted Feb 04, 2021 - 20:50 CST
Update
Intrado has determined that all Presence and CivicLive sites are currently inaccessible. Teams urgently continue to investigate this issue.
Posted Feb 04, 2021 - 16:01 CST
Update
We are continuing to investigate this issue.
Posted Feb 04, 2021 - 15:18 CST
Update
CivicLive is also intermittently impacted.
Posted Feb 04, 2021 - 15:30 CST
Investigating
Intrado is currently investigating certain SchoolMessenger Presence websites that may be inaccessible at this time.
Posted Feb 04, 2021 - 15:15 CST
This incident affected: Presence and CivicLive.