Issues with Single Sign-On
Incident Report for Upsales
Postmortem

On June 14th, following a routine system update, customers using our Single Sign-On feature were unable to log in. We were made aware of the issue at 08:37 and started working on a solution at 08:40. We apologize for the inconvenience this may have caused. The service has now been restored.

What happened?

Upsales CRM consists of multiple services that communicate with each other. During a system update, we release new versions of multiple services at the same time. This update contained changes to functionality that runs at login, and those changes did not work for customers using Single Sign-On.
Our initial investigation suggested the problem lay outside Upsales, because the system logs showed errors related to one Single Sign-On provider. When more customers with different providers reported similar issues and we could not quickly find a fix, we decided to roll back the release. The rollback appeared to execute properly, but it did not complete across all of our services, leaving the non-working service running the latest version of the code. Error rates were quite low after the faulty rollback, which led the team to believe the problem had been mitigated. We later discovered that the rollback had not completed successfully and ran it again, which restored service for the affected customers.

Timeline

08:30 - The faulty code was released in production
08:37 - The team was first notified that users were unable to log in with Single Sign-On
09:34 - A rollback of the release was executed. It left the system in a bad state, with part of the system running the old version and other parts still running the new version
11:53 - A second rollback was executed to revert the remaining parts of the system to an older version
11:59 - Incident fully mitigated
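
The durations implied by the timeline above can be worked out with a short calculation. This is only an illustrative sketch: the dictionary keys are names chosen here, and it assumes all timestamps fall on the same day in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the timeline above.
# Assumption: all on the same day and in the same time zone.
TIMELINE = {
    "release": "08:30",
    "first_report": "08:37",
    "first_rollback": "09:34",
    "second_rollback": "11:53",
    "mitigated": "11:59",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes elapsed between two HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


time_to_detect = minutes_between(TIMELINE["release"], TIMELINE["first_report"])
time_to_first_rollback = minutes_between(TIMELINE["first_report"], TIMELINE["first_rollback"])
time_in_mixed_state = minutes_between(TIMELINE["first_rollback"], TIMELINE["second_rollback"])
total_impact = minutes_between(TIMELINE["release"], TIMELINE["mitigated"])

print(f"Release to first report:         {time_to_detect} min")
print(f"First report to first rollback:  {time_to_first_rollback} min")
print(f"Time in mixed-version state:     {time_in_mixed_state} min")
print(f"Release to full mitigation:      {total_impact} min")
```

Under those assumptions, the mixed-version state after the first rollback accounts for 139 of the 209 minutes between release and mitigation.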

Where do we go from here?

Actions already taken

  • To prevent this from happening again, we have updated our rollback routines so that an incomplete rollback is immediately visible to the engineers.
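
One way to make an incomplete rollback visible is to compare the version each service reports against the rollback target once the rollback finishes. The sketch below is a minimal illustration of that idea, not a description of Upsales' actual tooling: the service names, version labels, and the `verify_rollback` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackReport:
    """Outcome of comparing running versions against the rollback target."""

    target_version: str
    stale_services: dict[str, str]  # service name -> version still running

    @property
    def complete(self) -> bool:
        # The rollback is complete only if no service is off-target.
        return not self.stale_services


def verify_rollback(running_versions: dict[str, str], target_version: str) -> RollbackReport:
    """Flag every service whose running version differs from the target."""
    stale = {
        name: version
        for name, version in running_versions.items()
        if version != target_version
    }
    return RollbackReport(target_version, stale)


# Hypothetical state after a partial rollback: one service kept the new version.
running = {"web": "v41", "auth": "v42", "api": "v41"}
report = verify_rollback(running, target_version="v41")

if not report.complete:
    for name, version in sorted(report.stale_services.items()):
        print(f"ROLLBACK INCOMPLETE: {name} still runs {version}, expected {report.target_version}")
```

A check like this does not depend on error rates, which in this incident were low enough after the first rollback to suggest the problem had been resolved.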

Long term actions

  • We will review our routines for how long a faulty release may remain in production before we decide to roll back
  • We will invest in improving our monitoring and alerting to quickly identify issues related to log-ins
Posted Jun 14, 2023 - 22:00 CEST

Resolved
This incident has been resolved
Posted Jun 14, 2023 - 13:28 CEST
Investigating
Customers using Single Sign-On may experience issues logging in to Upsales. Our engineers are aware of the issue and are working towards a solution.
Posted Jun 14, 2023 - 11:25 CEST
This incident affected: Upsales App.