Console API and UI experiencing instability due to an underlying outage in the AWS EC2 service
Incident Report for F5 Distributed Cloud
Postmortem

RFO 2022-07-28 

Incident Started: 2022-07-28 17:05 UTC

Resolution started: 2022-07-28 17:10 UTC 

Incident Resolved: 2022-07-28 21:24 UTC

Summary: 

Our SaaS management plane uses AWS us-east-2 as its primary region. AWS began experiencing networking and power issues, which broke some of our EC2 instances across all AZs. This resulted in login issues and API errors on console.ves.volterra.io.

Root cause: 

Our SaaS management plane uses AWS us-east-2 as the primary region. AWS began experiencing networking and power issues, which broke some of our EC2 instances across all AZs. This resulted in login issues and API errors on console.ves.volterra.io.

Incident flow: 

On Thursday, July 28, at 17:05 UTC, we received alerts about login and API errors. Immediately after being notified, we began investigating the fastest path to recovery. At 17:10 UTC we discovered that many of our VMs in AWS were in an error state, and AWS disclosed that it was experiencing an outage in us-east-2. Because of the AWS outage, we were unable to stop, restart, or remove the affected VMs. At 21:24 UTC all services were fully recovered.

Conclusion 

The management plane outage was caused by an outage in an AWS region that affected all of its availability zones.

Corrective measures 

Operations and Engineering are working to improve failover to a backup region to prevent console and API errors.

Posted Jul 30, 2022 - 06:28 UTC

Resolved
This incident has been resolved.
Posted Jul 28, 2022 - 21:38 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 28, 2022 - 21:24 UTC
Identified
One AWS region is experiencing an outage of EC2 services, and we are working on restoring services on different VMs.
Posted Jul 28, 2022 - 17:16 UTC
This incident affected: Services (Portal & Customer Login, Customer Dashboard) and Customer Support, Docs and WebSite (Customer Support).