Google Takes Aim at Nvidia: TPU 8t and 8i Signal a New Phase in AI Chips

Google's TPU 8t and 8i challenge Nvidia's GPU lead
Google announced two purpose-built chips: the TPU 8t for training and the TPU 8i for inference, both expected to debut later this year. According to Google, CEO Sundar Pichai said the company now processes more than 16 billion tokens per minute, up from 10 billion last quarter, and that it is directing just over half of its 2026 machine-learning compute investment to Cloud.
What happened: Google splits training and inference into specialized chips
Google moved away from its previous one-chip-fits-all approach, releasing an eighth-generation TPU family that separates training and inference workloads. TPU 8t targets high-throughput model training while TPU 8i focuses on inference efficiency, and Google said both chips include larger on-chip memory to reduce off-chip memory traffic.
The launch follows a broader industry shift toward vertically integrated silicon. Amazon Web Services introduced Trainium and Inferentia years ago, and cloud providers have been racing to lower per-token and per-epoch costs for large models. Google says the chips are being used internally for Gemini models and will be offered to Cloud customers as a direct alternative to Nvidia GPUs.
Why this matters: cost, scale, and the ecosystem are on the line
Nvidia currently supplies the de facto standard GPUs for large-scale training and inference, and many data-center clusters run hundreds to thousands of A100 or H100 accelerators. That installed base creates high switching friction for enterprises and hyperscalers that have optimized their software stacks for CUDA.
Google’s move matters because specialization reduces cost per operation. If TPU 8t and 8i deliver even a 20% improvement in throughput per dollar compared with comparable GPU instances, Cloud could win market share on price and latency. Pichai’s statement that over 50% of 2026 ML compute investment will go to Cloud signals Google intends to capture those economics rather than buy all capacity from third parties.
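To make the throughput-per-dollar claim concrete, here is a back-of-envelope sketch. All prices and throughput figures below are illustrative assumptions, not published Google or Nvidia numbers; the point is only the arithmetic relationship between throughput and cost per token.

```python
# Hypothetical break-even sketch. Instance price and tokens/sec are
# made-up round numbers chosen for illustration.

def cost_per_million_tokens(hourly_price: float, tokens_per_sec: float) -> float:
    """Dollars to process one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_price / tokens_per_hour * 1_000_000

# Assumed GPU baseline: a $4.00/hr instance sustaining 10,000 tokens/sec.
gpu_cost = cost_per_million_tokens(4.00, 10_000)

# A 20% throughput-per-dollar edge can be modeled as the same price
# with 20% more throughput.
tpu_cost = cost_per_million_tokens(4.00, 12_000)

advantage = 1 - tpu_cost / gpu_cost
print(f"GPU: ${gpu_cost:.3f}/M tokens, TPU: ${tpu_cost:.3f}/M tokens")
print(f"Cost advantage: {advantage:.1%}")
```

Note the asymmetry the sketch exposes: a 20% throughput gain at equal price translates to roughly a 16.7% reduction in cost per token, not 20%, because cost scales with the reciprocal of throughput.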
History offers precedent. Amazon’s Inferentia reduced inference costs for AWS customers and won sizable adoption in recommendation and NLP workloads. When a cloud provider controls silicon and stack, it can reprice compute and expand margins. But the counterweight is software lock-in; Nvidia’s CUDA ecosystem and optimized libraries remain deeply entrenched, and many enterprise ML pipelines assume GPU-first compatibility.
Bull case
If TPU 8t and TPU 8i match Google’s internal claims, they can cut training and inference costs materially and attract large customers away from GPU-centric instances. A conservative 15% to 30% cost advantage could flip economics for enterprise training runs that consume millions of compute hours, accelerating migration to Google Cloud and improving gross margin on cloud services.
Google is already “customer zero” for Gemini and reports processing growth from 10 billion to 16 billion tokens per minute quarter-over-quarter. That volume will stress-test the chips at scale and provide compelling benchmarks for enterprise buyers if published transparently.
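The reported token volumes imply a growth rate and annual run-rate worth spelling out. The 10 billion and 16 billion tokens-per-minute figures come from the article; everything derived from them below is simple arithmetic, not a reported metric.

```python
# Back-of-envelope scale check on Google's reported token volumes.
prev = 10e9   # tokens per minute, last quarter (reported)
curr = 16e9   # tokens per minute, current (reported)

qoq_growth = curr / prev - 1              # quarter-over-quarter growth rate
tokens_per_day = curr * 60 * 24           # minutes per day
tokens_per_year = tokens_per_day * 365    # implied annual run-rate

print(f"QoQ growth: {qoq_growth:.0%}")
print(f"Implied annual volume: {tokens_per_year:.2e} tokens")
```

That is 60% quarter-over-quarter growth and an implied run-rate in the quadrillions of tokens per year, which is the scale at which even single-digit-percent efficiency gains compound into material cost savings.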
Bear case
Nvidia’s customer and developer ecosystem is the real moat. Customers optimized on CUDA face switching costs in code, tooling, and validation that can take 6 to 24 months and millions of dollars in migration expense. If TPU performance or software portability lags, adoption will be limited to greenfield workloads or internal Google projects.
Performance claims without transparent, third-party benchmarks are risky. If TPU instances lag H100-class GPUs on critical dense-matrix operations or if memory bandwidth remains a bottleneck, customers will stick with Nvidia despite a higher price per instance.
What this means for investors: concrete signals and tickers to watch
Short term, watch Google’s product rollout and benchmarks. Key signals: published throughput-cost comparisons to H100/A100, announced customer migrations, and Google Cloud margin improvement. If benchmarked claims arrive in the next 6 to 12 months and Cloud gross margin rises, GOOGL should re-rate higher as compute economics improve.
Specific tickers to monitor: GOOGL for cloud and Gemini monetization; NVDA for the durability of GPU pricing power; AMZN for AWS’s Trainium and Inferentia competitive response; MSFT for Azure’s hardware partnerships and enterprise pull; META for internal infrastructure bets and model deployment patterns. For each, track three metrics: price-performance benchmarks, customer win announcements, and quarterly cloud gross margin trends.
Actionable takeaways
- Buy GOOGL on credible benchmarks and a visible path to improved Cloud margins; watch for a 50–100 basis-point margin lift over 12–18 months as an inflection signal.
- Hold NVDA unless revenue growth slows or ASPs decline; NVDA’s ecosystem still commands pricing power, but margin pressure could arise if hyperscalers internalize more silicon.
- Monitor AMZN and MSFT for competitive silicon and partnership announcements; these names will define how fast hyperscalers diversify away from Nvidia.
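As a rough illustration of the margin-lift signal in the takeaways above, the basis-point figures translate into dollars as follows. The revenue base is a hypothetical round number chosen for scale, not a reported figure.

```python
# Illustrative only: dollar impact of a 50-100 bp gross-margin lift
# on a hypothetical annual cloud revenue base.
annual_cloud_revenue = 40e9  # dollars; assumption, not a reported number

for bp in (50, 100):
    lift = annual_cloud_revenue * bp / 10_000  # 1 bp = 0.01%
    print(f"{bp} bp margin lift -> ${lift / 1e6:,.0f}M incremental gross profit")
```

On a $40B base, the 50 to 100 basis-point range works out to roughly $200M to $400M of incremental annual gross profit, which frames why a margin inflection of that size would be a meaningful re-rating signal.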
“Our first-party models now process more than 16 billion tokens per minute, up from 10 billion last quarter,” Sundar Pichai said, according to Google.
Google’s TPU 8t and 8i are a consequential strategic bet, not an incremental product release. If the chips deliver real, verifiable cost and latency advantages, they will reshape cloud compute economics. Investors should prepare to act on three concrete milestones in the next 12 months: independent benchmarks, customer migrations, and measurable cloud margin improvement.