Alphabet's Google is working on a new initiative to improve its own artificial intelligence chips so they run well with PyTorch, the software framework most developers use to build and run AI models.
By making PyTorch run smoothly on Google's chips, the tech giant aims to weaken Nvidia's longstanding dominance in the AI computing market, according to Reuters.
Google is pushing hard to make its Tensor Processing Units (TPUs) a viable alternative to Nvidia's market-leading GPUs. These chips are becoming an important source of revenue for Google Cloud as it seeks to show investors that its AI investments are generating returns.
What is this new initiative?
However, powerful hardware alone is not enough to spur adoption. The new initiative, internally known as “TorchTPU,” is designed to remove a major barrier that has slowed uptake of TPU chips by making them fully compatible with PyTorch and easier for developers to use.
Google is also considering open-sourcing parts of the software to accelerate adoption among customers, Reuters reported, citing people familiar with the matter.
PyTorch, an open-source project heavily supported by Meta Platforms, is used by developers worldwide to make AI models. In Silicon Valley, very few developers write every line of code that chips from Nvidia, Advanced Micro Devices or Google will actually execute.
Instead, those developers rely on tools like PyTorch, which is a collection of pre-written code libraries and frameworks that automate many common tasks in developing AI software.
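As a brief, illustrative sketch (the model and names here are invented for this example, not drawn from the article), a few lines of PyTorch show the kind of work the framework automates: layers, the forward pass, and gradient computation are all pre-written, so developers never hand-write the low-level code the chip actually executes.

```python
# Illustrative PyTorch sketch: the framework supplies layers, autograd and
# backend dispatch, so none of the underlying chip code is written by hand.
import torch
import torch.nn as nn

# A toy two-layer network, assembled from pre-built components
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

x = torch.randn(3, 4)   # a batch of 3 example inputs
out = model(x)          # forward pass; PyTorch dispatches to the hardware backend
loss = out.sum()
loss.backward()         # autograd computes all gradients automatically

print(out.shape)        # torch.Size([3, 2])
```

Which chip the same code ultimately runs on depends on the backend PyTorch dispatches to, which is precisely why per-chip optimisation of the framework matters so much.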
Nvidia vs Google
Nvidia’s engineers have spent years optimising their chips to ensure that PyTorch-based software runs as fast and efficiently as possible. Google, by contrast, has traditionally relied on a different framework, Jax, used extensively by its internal teams, with its TPU chips optimised through a compiler called XLA. As a result, much of Google's AI software stack and performance optimisation has been built around Jax, creating a growing mismatch between how Google designs and uses its chips and how most customers prefer to work with them.
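To illustrate the Jax side of that mismatch (a hedged sketch with invented names, not code from the article): Jax traces a Python function and hands it to the XLA compiler, the compilation path Google's TPUs are tuned for. The same code runs on CPU, GPU or TPU depending on which backend is available.

```python
# Illustrative Jax sketch: jax.jit compiles the function through XLA,
# the compiler Google's TPUs are optimised for. On a machine without a
# TPU, the same code compiles for and runs on the CPU backend instead.
import jax
import jax.numpy as jnp

@jax.jit                      # traced and compiled via XLA on first call
def predict(w, x):
    return jnp.tanh(x @ w)    # a toy "model": one matrix multiply

w = jnp.ones((4, 2))
x = jnp.ones((3, 4))
y = predict(w, x)             # compiled once, then executed

print(y.shape)                # (3, 2)
```

A PyTorch program, by contrast, does not flow through this Jax/XLA path by default, which is the gap the TorchTPU effort is reported to be closing.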
Alphabet had long reserved the majority of its own chips, or TPUs, for in-house use. That changed in 2022, when Google’s cloud computing unit successfully lobbied to oversee the group that sells TPUs. The move drastically increased Google Cloud's allocation of TPUs, and as customers' interest in AI has grown, Google has sought to capitalise by ramping up production and sales of TPUs to external customers.
But the mismatch between PyTorch, the framework used by most of the world’s AI developers, and Jax, the framework Google’s chips are currently most finely tuned to run, means that most developers cannot easily adopt Google’s chips and get them to perform as well as Nvidia’s without significant extra engineering work. Such work takes time and money in the fast-paced AI race.
If successful, Google's “TorchTPU” initiative could significantly reduce switching costs for firms seeking alternatives to Nvidia’s GPUs.
Nvidia’s dominance in the AI chip market rests not only on its hardware but also on its CUDA software ecosystem, which is deeply embedded in PyTorch and has become the default method by which companies train and run large AI models, Reuters reported.
Google ties up with Meta
To accelerate adoption, Google is now working closely with Meta, the creator and steward of PyTorch, according to Reuters. In parallel, the two tech giants have been discussing deals for Meta to access more TPUs.
Early offerings for Meta were structured as Google-managed services, under which customers such as Meta deployed Google's chips, designed to run Google software and models, while Google provided operational support.
Meta has a strategic interest in developing software that makes it easier to run TPUs, as this could help lower inference costs and diversify its AI infrastructure away from Nvidia’s GPUs, strengthening its negotiating power, according to the Reuters report.