Unprecedented traffic to the Uphold retail platform, driven by a combination of speculation in XRP, DOGE and Silver and a surge in sign-ups, caused intermittent availability and slower performance over three days, between 15:30 UTC on Saturday 30th January and 15:00 UTC on Monday 1st February.
Availability per day (%)
- Saturday Jan 30th saw 91.7% normal availability, with slow or intermittent service at other times.
- Sunday Jan 31st saw 92.9% of the day with either normal availability or scheduled server upgrades, with slow or intermittent service at other times.
- Monday Feb 1st saw 87.5% normal availability, with slow or intermittent service at other times.
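The daily figures above are simply the share of each day spent in normal service. A minimal sketch of that calculation (the degraded-minute inputs below are illustrative, not Uphold's actual measurements):

```python
def availability_pct(degraded_minutes, total_minutes=24 * 60):
    """Percentage of the day with normal service, rounded to one decimal place."""
    return round(100 * (1 - degraded_minutes / total_minutes), 1)

# Two hours of slow or intermittent service would yield Saturday's figure;
# three hours would yield Monday's.
saturday = availability_pct(120)  # 91.7
monday = availability_pct(180)    # 87.5
```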
During this period, targeted upgrade work was carried out in stages. The first batch was completed successfully on live systems on Saturday evening. Sunday saw further remedial and scaling work. Then on Monday, the main project was parallelizing a key 'connection pooling' service, to permanently steady the platform. These remediations, while necessary, had the effect of moving the bottleneck from one system to the next, so some impact continued until all systems had been brought to the same capacity.
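To illustrate the idea (this is a generic sketch, not Uphold's implementation): a connection pool hands out a bounded set of reusable connections, and a saturated pool blocks callers, which is exactly the bottleneck behaviour described above. "Parallelizing" the service can be sketched as running several independent pool shards and spreading requests across them, so no single pool saturates. All class and function names here are hypothetical.

```python
import queue
import threading

class ConnectionPool:
    """Minimal connection pool: a bounded set of reusable connections."""
    def __init__(self, size, connect):
        self._conns = queue.Queue(maxsize=size)
        for _ in range(size):
            self._conns.put(connect())

    def acquire(self, timeout=5.0):
        # Blocks when the pool is exhausted -- the bottleneck behaviour
        # the remediation work targeted.
        return self._conns.get(timeout=timeout)

    def release(self, conn):
        self._conns.put(conn)

class ShardedPool:
    """Several independent pool shards, requests spread round-robin."""
    def __init__(self, shards, size_per_shard, connect):
        self._shards = [ConnectionPool(size_per_shard, connect)
                        for _ in range(shards)]
        self._counter = 0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            shard = self._shards[self._counter % len(self._shards)]
            self._counter += 1
        return shard, shard.acquire()

# Hypothetical usage: four shards of two connections each.
pool = ShardedPool(shards=4, size_per_shard=2, connect=object)
shard, conn = pool.acquire()
shard.release(conn)
```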
A dedicated load testing function has been carved out of the current performance team. Its job is to ensure the service can handle the load we saw, and it will build load tests for 10x standard load.
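In outline, a "10x standard load" test drives the system at ten times the baseline request rate and checks that latency stays acceptable. A minimal sketch, assuming a hypothetical baseline of 50 requests and a stand-in for the real HTTP call (real numbers and endpoints would live with the load testing team):

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

BASELINE_REQUESTS = 50   # assumed "standard load" batch size (hypothetical)
TARGET_MULTIPLIER = 10   # the "10x standard load" commitment above

def fake_request():
    """Stand-in for a real HTTP call to the platform (hypothetical)."""
    start = time.perf_counter()
    time.sleep(0.001)  # simulate service latency
    return time.perf_counter() - start

def run_load_test(total_requests, concurrency):
    """Fire requests concurrently and summarise tail latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: fake_request(),
                                  range(total_requests)))
    return {
        "requests": total_requests,
        # 95th percentile latency in milliseconds.
        "p95_ms": statistics.quantiles(latencies, n=20)[18] * 1000,
    }

result = run_load_test(
    total_requests=BASELINE_REQUESTS * TARGET_MULTIPLIER,
    concurrency=25,
)
```

A real harness would sustain a target request *rate* over time rather than a fixed batch, but the pass/fail shape is the same: drive 10x traffic, assert tail latency stays within budget.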
Enhanced intra-service monitoring has been implemented and will be tested under load.
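Intra-service monitoring of this kind typically means timing each service's calls to its downstream dependencies, so a moving bottleneck shows up immediately in per-dependency latency. A minimal sketch of that pattern (names hypothetical; a production setup would export these samples to a metrics backend rather than hold them in memory):

```python
import time
from collections import defaultdict

class ServiceMetrics:
    """Minimal intra-service metrics: latency samples per downstream dependency."""
    def __init__(self):
        self.samples = defaultdict(list)

    def observe(self, dependency, seconds):
        self.samples[dependency].append(seconds)

    def timed(self, dependency):
        """Context manager that records how long a dependency call took."""
        metrics = self

        class _Timer:
            def __enter__(self):
                self.start = time.perf_counter()

            def __exit__(self, *exc):
                metrics.observe(dependency, time.perf_counter() - self.start)
                return False  # never swallow exceptions

        return _Timer()

# Hypothetical usage: time a call to the connection pooling service.
metrics = ServiceMetrics()
with metrics.timed("connection-pool"):
    time.sleep(0.001)  # stand-in for the real downstream call
```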