Report Date: 2024-04-08
Incident Date(s): 2024-04-01 – 2024-04-02
On 2024-04-02 at 05:43 UTC, the F5 Distributed Cloud support team received the initial customer report of a delay in change propagation to load balancers. Upon receiving the report, the team initiated an investigation and determined that the issue was limited to the configuration propagation of new load balancers and modifications to existing ones. There was no impact on traffic processing.
Detailed analysis confirmed that the issue began on 2024-04-01 at 18:30 UTC, when the Distributed Cloud platform experienced an influx of anomalous traffic that affected load balancer configuration propagation from the Global Controller and Regional Edges.
To address the issue, the F5 Distributed Cloud team deployed a hotfix to the Global Controller and Regional Edges. Post-fix validation confirmed the propagation delay was resolved on 2024-04-02 at 19:50 UTC.
The service degradation lasted 1 day, 1 hour, and 20 minutes.
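The stated duration can be checked directly from the start and end timestamps given in this report. The following is a minimal sketch, not part of F5's tooling; the helper name `split_duration` is hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone


def split_duration(start: datetime, end: datetime) -> tuple[int, int, int]:
    """Return the elapsed time between two instants as (days, hours, minutes)."""
    elapsed = end - start
    # timedelta stores whole days separately from the leftover seconds.
    hours, remainder = divmod(elapsed.seconds, 3600)
    return elapsed.days, hours, remainder // 60


# Timestamps taken from the report: start and conclusion of the service event.
start = datetime(2024, 4, 1, 18, 30, tzinfo=timezone.utc)
end = datetime(2024, 4, 2, 19, 50, tzinfo=timezone.utc)

days, hours, minutes = split_duration(start, end)
print(f"{days} day, {hours} hour, {minutes} minutes")  # prints "1 day, 1 hour, 20 minutes"
```

Using timezone-aware values keeps the subtraction unambiguous, since every time in the report is expressed in UTC.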
ITEM | DETAILS |
---|---|
Start time of Service Event | 2024-04-01 18:30 UTC |
Conclusion of Service Event | 2024-04-02 19:50 UTC |
Event duration | 1 day, 1 hour and 20 minutes |
Impact | Customers encountered delays in the application of load balancer configurations when creating a new configuration or modifying an existing one. |
Root cause | F5 Distributed Cloud Load Balancer service experienced a partial degradation due to a high volume of anomalous requests in network traffic coming from non-trusted systems. |
DATE | TIME (UTC) | ACTION |
---|---|---|
2024-04-01 | 18:30 | F5 Distributed Cloud Team observed configuration propagation delays. |
2024-04-02 | 06:01 | F5 Distributed Cloud Team started receiving customer reports of delays in executing configuration changes on the load balancer. |
2024-04-02 | 06:10 | F5 Distributed Cloud Team sought additional information from the customer to understand the nature of the issue. |
2024-04-02 | 07:43 | F5 Distributed Cloud Team, after gathering the required information, escalated the issue to the internal engineering team. |
2024-04-02 | 08:23 | The F5 Distributed Cloud Team acknowledged the existence of a configuration push issue at the backend and commenced an investigation into the root cause. |
2024-04-02 | 11:15 | The F5 Distributed Cloud Team identified a large backlog in the system's configuration queue, which adversely affected the propagation of configurations updated by customers on the console. |
2024-04-02 | 11:50 | The F5 Distributed Cloud Team discovered an issue with the Regional Edges and Global Controller and began deploying a hotfix to address the issue. |
2024-04-02 | 17:00 | F5 Distributed Cloud Team completed the hotfix deployment to the Global Controller and continued updating the Regional Edges. |
2024-04-02 | 19:50 | The F5 Distributed Cloud Team successfully completed all hotfix deployment tasks for the Regional Edge (RE) clusters, and traffic was observed to be processing normally. The team closely monitored the situation to ensure that all load balancer propagation issues had been effectively resolved. End of service event. |
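To see where the time went, the intervals between the timeline's milestones can be computed from the DATE and TIME (UTC) columns. The sketch below is illustrative only: the milestone labels are shorthand paraphrases of the ACTION column rather than F5 terminology, the 06:10 information-gathering row is omitted for brevity, and the names `utc`, `MILESTONES`, and `phase_durations` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc(date: str, time: str) -> datetime:
    """Parse a timeline row's DATE and TIME (UTC) columns into an aware datetime."""
    parsed = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M")
    return parsed.replace(tzinfo=timezone.utc)


# Labels are shorthand for the ACTION column; timestamps are from the timeline.
MILESTONES = [
    ("propagation delays begin", utc("2024-04-01", "18:30")),
    ("first customer reports", utc("2024-04-02", "06:01")),
    ("escalated to engineering", utc("2024-04-02", "07:43")),
    ("backend issue acknowledged", utc("2024-04-02", "08:23")),
    ("configuration queue backlog identified", utc("2024-04-02", "11:15")),
    ("hotfix deployment begins", utc("2024-04-02", "11:50")),
    ("Global Controller hotfix complete", utc("2024-04-02", "17:00")),
    ("Regional Edge hotfix complete", utc("2024-04-02", "19:50")),
]


def phase_durations(milestones):
    """Pair each milestone with the next one and return the time between them."""
    return [
        (f"{name_a} -> {name_b}", time_b - time_a)
        for (name_a, time_a), (name_b, time_b) in zip(milestones, milestones[1:])
    ]


phases = phase_durations(MILESTONES)
total = sum((duration for _, duration in phases), timedelta())

for label, duration in phases:
    print(f"{duration!s:>8}  {label}")
print(f"total: {total}")
```

The individual phases sum to the event duration reported above, which is a quick consistency check on the timeline entries.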
The service degradation is resolved, and the load balancer service is fully operational.
To stop the large, well-crafted, distributed, and fully randomized influx of anomalous network traffic, multiple countermeasures were applied to the Distributed Cloud platform. These countermeasures inadvertently triggered congestion in the configuration path, which delayed load balancer configuration propagation.
The F5 Distributed Cloud team deployed a hotfix to the Global Controller and Regional Edges, which restored normal load balancer configuration propagation.
We will take several measures to prevent this service event from recurring and to ensure that we are better prepared to react to, and recover more quickly from, similar scenarios.
F5® understands how important reliability of the Distributed Cloud Platform is for customers. F5 will ensure the recommended changes in this document are canonized into our operational Methods of Procedure (MoP) moving forward. We are grateful you have chosen to partner with F5® for critical service delivery and are committed to evolving our platform and tooling to better anticipate and mitigate disruptions to Distributed Cloud Platform services.