Console Login Disruption
Incident Report for F5 Distributed Cloud
Postmortem

Incident Started: 2022-12-01 04:30 UTC

Resolution Started: 2022-12-01 04:32 UTC

Incident Resolved: 2022-12-01 05:03 UTC

Summary:

The F5XC SaaS management plane runs in AWS. New logins to Console were disrupted because the controller service responsible for managing new logins came under very high CPU load after receiving one or more very large messages. We first evacuated the overloaded controller service to another AWS node, but this did not solve the issue. The controller service became fully functional once we set the configuration that limits the maximum message size, so that messages are exchanged in smaller chunks. There was no impact to the data plane or to existing sessions in Console.

Root cause:

The F5XC SaaS Controller is deployed in AWS. One of the controller services received very large messages, which overloaded its CPU and left the service nonfunctional. The controller service was configured to receive each message in its entirety, so very large messages caused CPU overload. Had it been configured to receive a much smaller chunk at a time, we would not have seen this disruption.
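The difference between receiving a message whole and receiving it in bounded chunks can be illustrated with a minimal sketch. This is not the controller service's code or its actual configuration: the names `read_in_chunks`, `CHUNK_BYTES`, `MAX_MESSAGE_BYTES`, and `MessageTooLarge` are hypothetical, and the limits are arbitrary values chosen for illustration. The sketch reads a stream one fixed-size chunk at a time and rejects any message that exceeds a size cap, so a single oversized message is never buffered in full.

```python
import io
from typing import BinaryIO, Iterator

# Hypothetical limits for illustration only; not F5XC's actual values.
CHUNK_BYTES = 64 * 1024
MAX_MESSAGE_BYTES = 4 * 1024 * 1024


class MessageTooLarge(Exception):
    """Raised when a message exceeds the configured maximum size."""


def read_in_chunks(stream: BinaryIO,
                   chunk_bytes: int = CHUNK_BYTES,
                   max_bytes: int = MAX_MESSAGE_BYTES) -> Iterator[bytes]:
    """Yield the message piece by piece instead of buffering it whole."""
    total = 0
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break  # end of message
        total += len(chunk)
        if total > max_bytes:
            # Stop early rather than keep consuming an oversized message.
            raise MessageTooLarge(f"message exceeds {max_bytes} bytes")
        yield chunk


if __name__ == "__main__":
    message = io.BytesIO(b"x" * 200_000)
    for chunk in read_in_chunks(message):
        print(len(chunk))  # each chunk is at most CHUNK_BYTES long
```

With this shape, the work done per read is bounded by the chunk size, and the size cap turns an oversized message into an explicit, cheap rejection instead of a sustained CPU spike.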

Incident flow:

At 04:30 UTC, the F5XC monitoring system alerted that one of the controller services running in the SaaS management plane was unavailable. As a quick first remediation, SRE engineers moved this service to another node in AWS, but this did not solve the issue. In-depth analysis pointed to the configuration responsible for managing very large messages. At 05:00 UTC the controller service configuration was changed, and at 05:03 UTC the service was fully functional. F5XC Console was inaccessible for new logins for 33 minutes; existing sessions were not disrupted.
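The durations in this timeline can be checked with a short calculation. The sketch below uses only the timestamps stated in this report; the helper names `utc` and `minutes_between` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Build a UTC timestamp on the incident date from an HH:MM string."""
    parsed = datetime.strptime(f"2022-12-01 {hhmm}", "%Y-%m-%d %H:%M")
    return parsed.replace(tzinfo=timezone.utc)


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


# Timestamps as stated in this report.
alerted = utc("04:30")             # monitoring alert, incident started
resolution_started = utc("04:32")  # resolution started
config_changed = utc("05:00")      # controller configuration changed
restored = utc("05:03")            # service fully functional

time_to_respond = minutes_between(alerted, resolution_started)
time_to_recover_after_fix = minutes_between(config_changed, restored)
total_disruption = minutes_between(alerted, restored)

if __name__ == "__main__":
    print(time_to_respond, time_to_recover_after_fix, total_disruption)
```

The total matches the 33 minutes of login disruption reported above, of which the final 3 minutes elapsed between the configuration change and full recovery.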

Conclusion:

The F5XC SaaS Controller service responsible for managing new logins into Console is under further investigation to identify enhancements and operational efficiencies.

Posted Dec 01, 2022 - 09:15 UTC

Resolved
The F5XC SaaS Controller service responsible for managing new logins into Console is under further investigation to identify enhancements and operational efficiencies.
Posted Dec 01, 2022 - 03:00 UTC