Amazon's cloud services were largely back online on Friday after overheating at one of its data centers triggered an outage that impacted companies including cryptocurrency exchange Coinbase.
The cloud giant said it was making progress in resolving the issue after a rapid spike in temperatures at a single data center in northern Virginia on Thursday knocked out power. A full recovery would take several hours, it said.
Coinbase said its services were restored after the outage hampered their availability. Derivatives marketplace CME Group was also back up after facing some issues, though it was not clear if the issues at AWS and CME were related.
Overheating in data centers has been a key problem for companies: advanced AI and cloud servers crunching data require massive amounts of power and give off intense heat. To regulate the heat, data center operators have been increasingly turning to water or specialized coolants, which are thousands of times more efficient than traditional air cooling.
Thursday's outage was the second major overheating-driven disruption in recent months, after CME Group suffered one of its longest outages in years last November, due to a cooling failure at data centers run by CyrusOne.
At 7:27 a.m. ET, outage reports for AWS on outage tracking website Downdetector had gone down to just 54, from a peak of nearly 600 on Thursday night.
AWS has been bringing additional cooling system capacity online but said it was taking longer than expected to add the capacity required to safely restore all remaining affected systems.
The cloud computing platform also said it had shifted traffic away from the impacted Availability Zone for most services. An "Availability Zone" comprises one or more connected physical data centers and is designed to operate independently within an AWS Region.
In an update on its website, CME, the world's largest derivatives marketplace, said it has completed essential maintenance work and users are now able to log in to its CME Direct trading platform. It did not identify the cause of the issues.
Neither CME nor AWS immediately responded to requests for comment.
AWS also saw a major outage last October that caused global turmoil among thousands of sites, including some of the most popular apps like Snapchat and Reddit.
That was the largest internet disruption since the CrowdStrike malfunction in 2024 hobbled technology systems in hospitals, banks and airports, highlighting the vulnerability of the world's interconnected technologies. (Reporting by Mrinmay Dey in Mexico City, Deborah Sophia, Rhea Rose Abraham and Shivani Tanna in Bengaluru; Additional reporting by Rishabh Jaiswal; Writing by Shubham Kalia; Editing by Sumana Nandy, Kim Coghill and Devika Syamnath)