Apple Enables Thunderbolt 5 Mac AI Clusters to Run Trillion-Parameter Models

In a direct challenge to Nvidia’s DGX AI workstation desktops, Apple has announced a significant capability for its Thunderbolt 5-equipped Macs: the ability to operate as linked “AI clusters.” Multiple Mac Studio desktops or MacBook Pro laptops can be connected to pool their resources and process a single AI model together, which lets them run much larger models than any one machine could handle alone.

This feature will roll out with the upcoming macOS 26.2 beta and relies on MLX, Apple’s open-source array framework for machine learning on Apple silicon, which developers use to build, run, and test AI models. Apple collaborated with the developer Exo Labs on the clustering tool, known as EXO 1.0.

EXO 1.0 lets up to four Mac Studio desktops equipped with the M3 Ultra chip, or two MacBook Pro laptops, work on the same AI model. By pooling their unified memory over Thunderbolt 5 connections, these clusters can run models with up to one trillion parameters. This is the first time Apple has used the Thunderbolt 5 standard for this purpose.
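A back-of-the-envelope calculation illustrates why pooled memory matters at this scale. The sketch below estimates the size of a model's weights from its parameter count and the number of bits stored per weight, then the number of machines needed to hold them. The 512 GB per-machine figure (the largest unified-memory configuration of the M3 Ultra Mac Studio), the quantization levels, and the 20% headroom reserved for activations and cache are illustrative assumptions, not figures from Apple's announcement.

```python
import math


def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def machines_needed(
    num_params: float,
    bits_per_weight: int,
    memory_per_machine_gb: float = 512.0,  # assumption: top M3 Ultra configuration
    headroom: float = 0.20,                # assumption: share kept free for cache, OS, etc.
) -> int:
    """Smallest number of machines whose pooled usable memory holds the weights."""
    usable_gb = memory_per_machine_gb * (1.0 - headroom)
    return math.ceil(weight_footprint_gb(num_params, bits_per_weight) / usable_gb)


if __name__ == "__main__":
    one_trillion = 1e12
    for bits in (16, 8, 4):
        size = weight_footprint_gb(one_trillion, bits)
        nodes = machines_needed(one_trillion, bits)
        print(f"{bits}-bit weights: {size:.0f} GB, needs {nodes} machine(s)")
```

Under these assumptions, a one-trillion-parameter model stored at 16 bits per weight would not fit on four machines, while 8-bit or 4-bit weights would. Real deployments also depend on how the model is split across machines and on interconnect bandwidth, which this estimate ignores.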

In a recent web demonstration, Apple showed four M3 Ultra Mac Studio desktops running Kimi-K2-Thinking, a 1-trillion-parameter model. The entire cluster drew less than 500 watts (W) of power. That points to a potential efficiency advantage: it is far less than the 700W a single traditional GPU might draw in a conventional AI cluster, and notably less than the theoretical maximum draw of rival Nvidia DGX Spark systems.

Separately, Apple provided insight into the AI performance of its next-generation M5 chip, which will also be accessible to developers via MLX in macOS 26.2. The M5 boasts a significant upgrade to its neural accelerator in each GPU core, which is critical for matrix-multiplication operations that dominate machine learning workloads.

In evaluating the M5 hardware, Apple found that Time-to-First-Token (TTFT), the delay between submitting a prompt and a Large Language Model (LLM) producing its first output token, was up to four times shorter than on M4 counterparts when running the Qwen model. Producing the first token requires processing the whole prompt, a compute-bound step, so this result reflects the M5’s raw compute power.
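For readers who want to measure TTFT themselves, here is a minimal sketch that times any stream of tokens. It is not Apple's benchmark harness: `measure_generation` and `fake_stream` are hypothetical names, and the simulated stream stands in for whatever streaming generator a real inference library would provide.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_generation(token_stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, decode_tokens_per_second, token_count) for a stream."""
    start = time.perf_counter()
    first_at = None
    last_at = start
    count = 0
    for _ in token_stream:
        last_at = time.perf_counter()
        if first_at is None:
            first_at = last_at  # the first token marks the end of prompt processing
        count += 1
    if first_at is None:
        raise ValueError("the stream produced no tokens")
    ttft = first_at - start
    decode_time = last_at - first_at
    # Decode throughput covers only the tokens produced after the first one.
    decode_tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return ttft, decode_tps, count


def fake_stream(prefill_delay: float, per_token_delay: float, n_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for a model: one prompt-processing pause, then steady tokens."""
    time.sleep(prefill_delay)
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_delay)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, tps, n = measure_generation(fake_stream(0.2, 0.02, 20))
    print(f"TTFT: {ttft:.3f} s, decode: {tps:.1f} tokens/s over {n} tokens")
```

Reporting TTFT and decode throughput separately matters because the two phases stress different parts of the hardware, so a single tokens-per-second figure can hide where a speedup comes from.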

The 4x gain does not extend to the rest of the response, because generating each subsequent token is limited by memory bandwidth rather than compute. Even so, the M5’s higher memory bandwidth delivered an overall LLM performance improvement of between 19% and 27% across various models. Image generation also saw large gains, with M5 hardware performing up to 3.8 times faster than M4 alternatives when generating a 1,024-by-1,024 image using the FLUX-dev-4bit model.
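The gap between a 4x faster first token and a much smaller gain elsewhere can be illustrated by splitting a response into two phases: prompt processing (prefill) and token-by-token generation (decode). The sketch below combines a separate speedup for each phase into one end-to-end figure. The baseline timings are hypothetical, and the 1.25x decode speedup is simply an illustrative value inside the 19% to 27% range quoted above.

```python
def end_to_end_speedup(
    prefill_s: float,
    decode_s: float,
    prefill_speedup: float,
    decode_speedup: float,
) -> float:
    """Overall speedup when the two phases of a response speed up by different factors."""
    old_total = prefill_s + decode_s
    new_total = prefill_s / prefill_speedup + decode_s / decode_speedup
    return old_total / new_total


if __name__ == "__main__":
    # Hypothetical baseline: 2 s of prompt processing, 10 s of token generation.
    overall = end_to_end_speedup(2.0, 10.0, prefill_speedup=4.0, decode_speedup=1.25)
    print(f"End-to-end speedup: {overall:.2f}x")
```

The longer the generated response is relative to the prompt, the closer the overall figure sits to the decode speedup, which is why long outputs benefit less from faster prompt processing.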

These advances mark an exciting development for both users taking advantage of Apple Intelligence features on macOS and for developers who want a powerful, energy-efficient platform for creating new AI models, setting the stage for a compelling rivalry with Nvidia’s specialized hardware.
