Executive summary
Akamai Inference Cloud is a full-stack cloud platform that enables organizations to build, protect, and optimize AI-powered applications at the edge.
The platform is designed to support agentic systems that adapt to users, communicate with other agents, and act in real time.
Key features include NVIDIA Blackwell GPUs, managed Kubernetes, vector databases, and AI-aware security.
The platform empowers three user roles: machine learning operations (MLOps) engineers, AI engineers, and agentic system architects.
Akamai Inference Cloud provides a reliable, secure, and scalable foundation for deploying advanced AI systems anywhere.
In 2017, a research paper quietly changed the course of technology. "Attention Is All You Need" introduced the Transformer architecture, a new kind of model for processing language and data that would soon underpin nearly every major advancement in artificial intelligence (AI). At the time, the breakthrough was confined mostly to academic and developer circles.
Five years later, in November 2022, OpenAI launched ChatGPT. For the first time, the broader public could interact firsthand with a system built on what this architecture made possible. It was a glimpse into a new kind of interface — not just with machines, but with knowledge itself.
Now, just three years after that launch, OpenAI reports more than 700 million weekly active users.
The World Economic Forum has noted: AI is introducing a dual disruption to the workforce. On one side, automation is generating overcapacity in established roles. On the other, demand for AI fluency is accelerating faster than education and hiring systems can adapt. The old models for work — and for how we prepare people to do it — are evolving quickly.
This is the end of the beginning.
Looking backward to move forward
Twenty-seven years ago, the world stood at a similar inflection point. The internet was expanding rapidly, and questions of scale, reliability, and security were unresolved. In that context, a group of researchers at MIT founded Akamai with a clear mission: to solve the “World Wide Wait.”
They did it by bringing compute, storage, and networking closer to the points of creation and consumption in a model that has since been frequently imitated.
The rise of the agentic web has brought us full circle, reintroducing new scale and proximity challenges unique to AI and the inference required to realize its potential.
Akamai Inference Cloud, which we announced today, builds on the distributed architecture work we pioneered nearly three decades ago to expand AI inference from core data centers to the edge and, once again, to remove the bottlenecks and move past the limits of centralized infrastructure.
The agentic web
This new generation of intelligent systems no longer waits for human commands or input. These systems observe, reason, and act on behalf of users who express their intent in natural language; they take initiative, coordinate with other systems, and deliver outcomes without step-by-step instructions. This is the agentic web.
The agentic web is changing how people and machines interact with digital services. Experiences are becoming conversational, multimodal, and personalized. Interfaces adapt to the user’s intent, not the other way around. A person might ask for a recommendation and receive it as a narrated summary, a visual comparison, or a written breakdown depending on their preferences, context, and device. The system selects the format and tone that fits best.
Enterprises need new ways to support agent-driven interactions
As these agent-driven interactions become pervasive, enterprises need new ways to support them. Inference must move closer to users. Response times need to be predictable and low. Tools and memory must be available in real time. The entire stack must support agents working on behalf of users and systems, not just handling one-off requests.
This shift is already underway, but today’s centralized cloud platforms were not designed to support it. Companies are forced to choose between raw infrastructure and narrow solutions. What is missing is a platform built specifically for agentic AI, one that reduces complexity, accelerates development, and delivers intelligent behavior at global scale.
Akamai Inference Cloud makes the future possible
Akamai Inference Cloud makes this future possible. Its approach to cloud and AI is centered on the needs of agentic systems and applications that adapt to users, communicate with other agents, and act in real time.
Its unique distributed architecture is specifically designed to support these patterns, bringing the high-performance compute, storage, and orchestration needed for complex inference workloads and applying routing, control, and responsiveness closer to the user.
Our customers are facing four critical missions:
Power the AI-enabled application
Manage AI as a new traffic channel
Resource AI agents for enterprise workloads
Enable responsible AI consumption by employees
Power the AI-enabled application
Every enterprise will embed intelligence into its applications. This represents the next stage of application architecture — from responsive design to multicloud and now to AI-integrated, real-time systems. Akamai remains the trusted backbone that enables and secures each evolution.
Manage AI as a new traffic channel
Users are reaching brands through AI platforms just as they once did through search, social, or mobile. Every brand, app, and API will need to define its desired versus undesired AI interactions and manage that traffic intelligently to turn AI traffic from risk to opportunity.
Resource AI agents for enterprise workloads
Our customers are using AI agents to operate parts of their business — from infrastructure management to data analysis. Agents need access to first-class resources across internal and external systems, but with appropriate guardrails — trust, identity, observability, and safety — so that enterprises can scale their AI-operated environments with confidence and efficiency.
Enable responsible AI consumption by employees
Employees across every enterprise are consuming AI services — Copilot, Cursor, ChatGPT, Claude, and others. Enterprises must manage the responsible use, cost, and data protection of this consumption.
Akamai Inference Cloud is how inference scales.
What is Akamai Inference Cloud?
Akamai Inference Cloud is a full-stack cloud platform designed to build, protect, and optimize the next generation of intelligent applications that are empowered by AI. It offers compute, storage, networking, orchestration, security, and developer tooling that is aligned to the unique requirements of real-time inference, agentic systems, and intelligence that lives closer to the user (Table).
|  | Build | Protect | Optimize |
|---|---|---|---|
| The problems |  |  |  |
| The solution(s) | Distributed intelligent infrastructure with a developer platform | AI-aware bot management and API security (app, API, and AI protections, working in concert with AI-aware bot management) | AI connectivity mesh for humans and agents |
| The products |  |  |  |

Table: Akamai Inference Cloud is a full-stack cloud platform designed to build, protect, and optimize the next generation of intelligent applications that are empowered by AI.
Who we’re building for
Akamai Inference Cloud is a modular platform that meets customers where they are. Whether you are consuming hosted API endpoints from OpenAI and Gemini in your apps or building an agentic workflow around your own fine-tuned, distilled models, Akamai Inference Cloud allows you to build, protect, and optimize at the edge.
Specifically, we are empowering three user roles:
Machine learning operations (MLOps) engineers: Engineers who automate the entire machine learning lifecycle to ensure models are continuously retrained, deployed, and monitored for performance in production
AI engineers: Data scientists or software engineers who build end-to-end agentic applications, often using pre-trained models, and bridge the gap between data science research and production software
Agentic system architects: Architects who have evolved beyond the traditional systems role — ones who design, build, and manage complex, autonomous agentic systems that can independently reason, plan, act, and adapt to achieve high-level business goals
With Akamai Inference Cloud, we are not locking users into a specific paradigm or solution but providing customers with flexibility to rent infrastructure, develop on a serverless platform, and seamlessly combine complex systems based on their preferences.
Putting NVIDIA’s AI stack closer to where decisions are made
On October 28, 2025, we announced the Akamai Inference Cloud and our goal to bring intelligent, agentic AI inference to the edge where personalized experiences, real-time decisions, and smart agents require it.
Customers now have access to the latest generation of NVIDIA Blackwell GPUs, coupled with NVIDIA BlueField networking; tiered memory across GDDR7, DRAM, and NVMe; high-performance, scalable block and object storage; managed vector databases; and virtual private cloud networking.
An MLOps engineer can rent a single GPU by the hour or build a high-performance inference cluster with up to 8 NVIDIA RTX PRO™ 6000 Blackwell Server Edition GPUs, NVIDIA BlueField-3® DPUs, 128 vCPUs, 1,472 GB of DRAM, and 8,192 GB of NVMe storage (Figure 1).
Akamai Inference Cloud is optimized for time to first token (TTFT) and tokens per second (TPS). When combined with the Akamai distributed edge infrastructure, Akamai Inference Cloud can reduce latency for real-time and interactive intelligent applications.
NVIDIA Blackwell GPUs deliver incredible performance, as described in our benchmarking analysis.
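To make these metrics concrete, the following is a minimal Python sketch for measuring TTFT and a rough TPS figure against any OpenAI-compatible streaming endpoint. The base URL, API key, and model name are placeholders, not Akamai Inference Cloud specifics, and streamed chunks are used as a simple proxy for tokens.

```python
import time
from openai import OpenAI  # pip install openai

# Placeholder endpoint and credentials; point these at any OpenAI-compatible service.
client = OpenAI(base_url="https://inference.example.com/v1", api_key="YOUR_API_KEY")

prompt = "Summarize the benefits of edge inference in two sentences."
start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="example-llm",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first token
        chunks += 1  # streamed chunks as a rough token-count proxy

end = time.perf_counter()
ttft = (first_token_at - start) if first_token_at else float("nan")
tps = chunks / (end - first_token_at) if first_token_at and end > first_token_at else 0.0
print(f"TTFT: {ttft:.3f}s  |  ~{tps:.1f} tokens/s")
```

Running the same script from different regions is a simple way to see how proximity to the serving endpoint affects TTFT for interactive applications.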
Deploy and monitor agentic applications with App Platform
To further help platform engineers, we go beyond just providing infrastructure. Platform engineers can easily deploy and monitor agentic applications with our pre-engineered cloud native platform that makes it simple to deploy large language models (LLMs), agents, and knowledge bases at scale.
The platform is highly customizable, yet opinionated — it accelerates deployment, reduces operational overhead, and includes pre-integrated AI-ready components like vector databases, LLM frameworks, and OpenAI-compatible APIs into a single self-service portal. App Platform is optimized to run on LKE, Akamai’s managed Kubernetes engine, and is portable to any conformant Kubernetes cluster.
App Platform for LKE integrates a suite of more than 30 trusted Cloud Native Computing Foundation (CNCF) open source tools including KServe, a Kubernetes-native framework for serving and scaling machine learning models in production, and Kubeflow Pipelines, a platform for building, deploying, and managing ML workflows on Kubernetes.
App Platform provides both the Kubernetes framework and the AI components needed for engineers to build their own AI platform. This helps avoid DIY approaches that demand heavy integration when building and maintaining your own Kubernetes-based platform or bespoke stack (Figure 2).
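As a rough illustration of the kind of deployment the KServe integration enables, here is a minimal sketch using the KServe Python SDK to declare an InferenceService on a conformant Kubernetes cluster. The service name, namespace, and model URI are illustrative placeholders (the storage URI is the public scikit-learn example from the KServe documentation); in practice, App Platform surfaces this through its self-service portal rather than ad hoc scripts.

```python
from kubernetes import client as k8s
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

# Declare an InferenceService; name, namespace, and model location are placeholders.
isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=k8s.V1ObjectMeta(name="demo-model", namespace="ml-serving"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
            )
        )
    ),
)

# Requires a kubeconfig pointing at an LKE (or any conformant) cluster with KServe installed.
KServeClient().create(isvc)
```

The same declarative pattern applies when swapping the predictor for an LLM-serving runtime, which is how a single manifest can move unchanged between clusters.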
Akamai Inference Cloud — Designed for NVIDIA AI Enterprise Integrations
NVIDIA AI Enterprise is the software platform built to streamline your journey from AI development to production. This cloud native suite accelerates and simplifies how you build, deploy, and scale AI applications. Using powerful tools like NVIDIA NIM inference microservices and NVIDIA NeMo microservices, it helps you cut infrastructure costs and significantly speed up your time to market (Figure 3).
Akamai Inference Cloud is evolving with native functionality to accommodate the entire suite of NVIDIA AI Enterprise software. The platform provides a reliable, secure, and scalable foundation for organizations of all sizes to deploy advanced AI systems anywhere — in the cloud, in the data center, or at the edge — all backed by an extensive partner ecosystem.
Find out more
Akamai Inference Cloud is evolving rapidly with many new product launches planned through 2026. Follow the Akamai blog or go to our website for more information on Akamai Inference Cloud.