Bots unstable
Incident Report for Chatlayer
Postmortem

Yesterday, we experienced about 1 hour of intermittent downtime of the bots of Chatlayer.ai in production. The instability was caused by a massive unexpected surge in requests, most of which were infinite loops caused by Chatlayer bot to external bot interactions.

To mitigate the problem, we blocked the problematic sessions and upscaled the resources in our cluster, which resolved the problem.

To avoid this problem in the future we will:

  • Implement automatic detection of infinite loop conversations & the blocking of these
  • Allow for more flexible scaling of our cluster to better support surges in load.
Posted Oct 09, 2020 - 15:19 CEST

Resolved
The issue is resolved. We will post a post mortem tomorrow with a root cause analysis of the problem and steps we are taking to fix it.
Posted Oct 08, 2020 - 21:06 CEST
Monitoring
We have upscaled the resources, the performance is improving again.
Posted Oct 08, 2020 - 20:38 CEST
Identified
We are noticing unstable performance on bots in production.
Posted Oct 08, 2020 - 20:19 CEST
This incident affected: Chatlayer Webhook API.