We're witnessing the third great scaling wave in AI, and it demands a fundamental rethinking of critical infrastructure. While the industry has been mesmerized by the race to build ever-larger frontier models, and businesses are making early efforts to identify killer apps and ROI, the real battleground for enterprise AI value creation has shifted to the edge.
The evolution of AI scaling waves
Why discuss these as scaling waves? Because waves are forces that build, crest, and transform the landscape — with the momentum, inevitability, and cascading effects that trigger the next wave. The three waves reshaping entire industries are:
First wave: Pre-training at massive scale
Second wave: Post-training and fine-tuning
Third wave: Multistep inference and reasoning
Let me explain why these matter for every business leader who is making AI infrastructure decisions today.
The first wave
The first wave — pre-training at massive scale — gave us GPT-4, Claude, and the frontier models that introduced enterprises to AI's transformative potential. These models became our productivity copilots, our brainstorming partners, our first glimpse into what artificial general intelligence might enable.
They also, however, came with a few catches: astronomical compute costs, significant latency for global users, and the inherent limitations of centralized processing.
As an industry, we ignore the latency because streaming tokens are a fresh way to engage — "Oh, look, the AI is thinking."
The second wave
The second wave — post-training and fine-tuning — brought AI closer to business reality. Enterprises learned to adapt foundation models to their specific domains by using proprietary data and institutional knowledge.
This wave delivered the first real ROI stories: customer service automation that actually understood context, code generation that followed company standards, and conversational interfaces that felt native to existing applications.
Yet even these specialized models remain tethered to centralized clouds. As use cases develop, this centralization will create bottlenecks for real-time, latency-sensitive use cases such as personalized recommendations; autonomous vehicles and manufacturing; and physical AI and robotics, which will require responses in milliseconds, not seconds or minutes.
A lack of distribution hasn't become a bottleneck only because the use cases that require it aren't running at scale … yet.
The third wave
Now we're entering the third wave, which requires models to reason through complex workflows, maintain context across extended interactions, and deliver responses in real time at global scale.
This isn't just about making models bigger — it's about making them work harder, think longer, and operate everywhere your users are. To make AI useful at scale, it needs to be fast, secure, accurate, and engaging.
Distributed inference will enable us to bring these various factors together at the edge, and it will change everything about how we engage, monetize, and grow our businesses.
The infrastructure challenge no one talks about … yet
Here's what most infrastructure providers won't tell you: The economics of centralized AI inference break down catastrophically for real-time use cases at scale. When every inference request requires a round trip to a distant data center, you're not just dealing with latency; you're facing compounding costs from bandwidth, queuing delays, and the sheer physics of distance.
Consider what happens when an AI agent needs to perform multistep reasoning. Each step might require multiple model calls, vector database lookups, and API integrations. In a centralized architecture, you're looking at hundreds of milliseconds per step, which quickly add up to unacceptable response times for real-world applications.
Now, multiply that by thousands of concurrent users across different geographies, and you understand why the hyperscalers' centralized approach hits a wall.
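To make the compounding effect concrete, here's a back-of-envelope sketch in Python. The step counts and per-hop latencies are illustrative assumptions, not measurements; the point is how per-call round trips stack up across an agent workflow.

```python
# Illustrative latency model for a multistep AI agent workflow.
# All numbers are assumptions for the sake of the example, not benchmarks.

STEPS = 6                  # reasoning steps in one agent workflow
CALLS_PER_STEP = 3         # model call + vector lookup + API integration per step

def total_latency_ms(network_rtt_ms: float, compute_ms: float) -> float:
    """Total wall-clock time when every call pays a network round trip."""
    per_call = network_rtt_ms + compute_ms
    return STEPS * CALLS_PER_STEP * per_call

# Assumed round-trip times: ~120 ms to a distant centralized region,
# ~15 ms to a nearby edge location; ~40 ms of compute per call in both cases.
centralized = total_latency_ms(network_rtt_ms=120, compute_ms=40)
edge = total_latency_ms(network_rtt_ms=15, compute_ms=40)

print(f"Centralized: {centralized / 1000:.1f} s per workflow")  # ~2.9 s
print(f"Edge:        {edge / 1000:.1f} s per workflow")         # ~1.0 s
```

Even with generous assumptions, the network round trip repeated on every call dominates the total; shrinking those round trips is what distributed inference at the edge is for.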
Akamai Inference Cloud: Built for real-time distributed AI
This is precisely why we built Akamai Inference Cloud. We're not trying to compete with the hyperscalers over who can build the biggest cluster in a single region. We're solving the actual problem enterprises face: how to deliver AI inference at planetary scale with local performance.
Our approach leverages three critical advantages that only Akamai can deliver:
Infrastructure at the edge
Platform-native AI operations
Security that understands AI
Infrastructure at the edge
We're deploying NVIDIA Blackwell RTX PRO 6000 Server Edition GPUs across our global network — not in a handful of mega data centers, but distributed where your users actually are. This isn't experimental; it's production-ready infrastructure built on the same network that already delivers approximately 30% of web traffic.
Platform-native AI operations
Through NVIDIA Inference Microservices (NIM) running on our Linode Kubernetes Engine (LKE), we're making it simple to deploy, scale, and manage AI workloads. Your teams get access to integrated vector databases for retrieval-augmented generation (RAG) implementations, object storage for model artifacts, and global load balancing that automatically routes inference requests to the optimal location.
This isn't bolted-on AI — it's AI-native platform design.
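To show what this looks like in practice: NIM microservices expose an OpenAI-compatible HTTP API, so a deployed endpoint can be called with the standard openai Python client. The endpoint URL, model name, and retrieved passages in this sketch are placeholders — a minimal illustration of sending an inference request with RAG context, not Akamai- or NVIDIA-specific code.

```python
# Minimal sketch: calling an LLM NIM endpoint deployed on LKE through its
# OpenAI-compatible API. The URL, model id, and context are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://nim.example-edge-region.example.com/v1",  # hypothetical edge endpoint
    api_key="YOUR_API_KEY",
)

# In a real RAG flow, these passages would come from a vector database lookup.
retrieved_context = [
    "Order #1234 shipped on 2025-01-10 via standard ground.",
    "Standard ground shipments arrive within 5-7 business days.",
]

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # example model id
    messages=[
        {"role": "system",
         "content": "Answer using only the provided context.\n\n" + "\n".join(retrieved_context)},
        {"role": "user", "content": "When should my order arrive?"},
    ],
)

print(response.choices[0].message.content)
```

The request looks the same whether the load balancer routes it to a GPU in the next rack or on the next continent; that consistency is what platform-native operations buys you.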
Security that understands AI
Here's where it gets especially interesting. Traditional security wasn't built for AI workloads. We've developed purpose-built protections across every layer, including:
Application security that discovers AI-augmented apps and APIs, continuously assesses their security posture, and applies intelligent controls
AI firewall capabilities specifically designed to protect prompts from injection attacks and models from extraction attempts
Workload security that uses AI itself to analyze east-west traffic patterns, automatically generate optimal segmentation policies, and continuously adapt to emerging threats
Access security through our Secure Enterprise Browser that controls employee interactions with LLM interfaces and prevents data exfiltration, combined with Zero Trust Network Access and multi-factor authentication for infrastructure access
Infrastructure security that provides real-time DNS and network posture assessment with AI-driven policy recommendations
Generative SIEM interfaces that transform security operations by making vast amounts of telemetry data conversationally accessible to teams in the security operations center
When you move inference from centralized clouds to the edge, latency drops (making real-time AI interactions actually feel real time), bandwidth costs decrease (by processing data where it's generated), and compliance becomes manageable (because you can guarantee data processing happens within specific geographic boundaries). Furthermore, reliability improves dramatically through true redundancy — not just via multiple availability zones in one region, but through genuinely distributed processing.
But the real transformation happens at the application level. Suddenly, use cases that were impossible become trivial. Multi-agent workflows that previously required complex orchestration become simple. Real-time personalization at scale becomes economically viable.
What this means for your AI strategy
If you're still thinking about AI infrastructure in terms of training clusters and model hosting, then you're solving yesterday's problem. The enterprises that win in this third wave will be those that recognize inference — not training — as the critical bottleneck, and distributed edge infrastructure as the solution to that problem.
The question isn't whether you need distributed inference infrastructure. The question is whether you'll build it yourself (good luck with that), wait for the hyperscalers to eventually offer it (while your competitors move ahead), or use the platform that's already operating at planetary scale.
At Akamai, we've spent 25 years solving the hard problems of distributed computing, security, and global scale. Akamai Inference Cloud is the natural evolution of the infrastructure that already powers the internet.
The third wave of AI is here. The infrastructure to support it is here. The only question is: Are you ready to ride the wave?