I Built a RAG Bot to Decode Airline Bureaucracy (So You Don't Have To)

Dec 19, 2025

Written by Lena Hall

Lena Hall is an expert in practical AI adoption, data engineering, cloud and pragmatic architecture, and driving strategic AI integration at scale. She also has extensive experience leading large, high-performing technical teams. She helps developers and businesses get real results from AI by defining data and AI strategies, integrating LLMs into complex systems, connecting AI solutions with custom data and business tools, and optimizing outcomes through proven and innovative architectures. Lena has 15+ years of deeply technical background as a solution architect and a technical leader in large-scale data, analytics, machine learning, and cloud computing. She frequently shares practical knowledge on her LinkedIn channel and at industry conferences as an international keynote speaker.

I once got stranded at an airport because my first flight leg was delayed by air traffic control (ATC). The connecting flight left without me, and the gate agent looked me dead in the eye and said the airline wasn't responsible because the delay was "weather and air traffic control–related."

If I had known the exact clause in their contract of carriage, I might have stood a chance at an argument. But nobody has time to read 40 pages of legalese on a phone screen while standing at a gate. Between that and trying to figure out exactly how many checked bags I could bring, I’d reached my limit.

I got tired of spending time reading terms of service. So, naturally, I built a robot to read them for me.

This blog post is a walk-through of how I built and deployed a retrieval-augmented generation (RAG) chatbot with evidence-based Q&A capabilities (Figure 1). The goal was to ingest messy airline policy documents, vectorize them, and let a large language model (LLM) answer questions such as "Am I entitled to a hotel if ATC caused the delay?" with actual citations.

I used Terraform to deploy this on managed Kubernetes and PostgreSQL services on Akamai Cloud. If you want to skip the story and just grab the code, check out the GitHub repository. If you prefer to watch the video version of this post, check it out on YouTube.

The stack: Why these tools?

There are a million ways to build a RAG bot today, but I wanted a balance of power and simplicity. For the orchestration, I chose LangChain and LangGraph because I needed to manage conversation state, not just fire off one-off queries. For the vector store, I went with PostgreSQL combined with pgvector.

Unpopular opinion: You probably don't need a dedicated vector database like Pinecone or Weaviate. If you already have Postgres in your stack, the pgvector extension keeps your architecture significantly simpler by keeping your relational data and embeddings in the same place.
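
To make that concrete, here is a minimal sketch of what pgvector gives you at the SQL level. It assumes psycopg2, a 1536-dimension embedding model (OpenAI's text-embedding-3-small), and placeholder table, column, and connection names rather than anything from my repo.

```python
# Minimal pgvector sketch. Table, column, and connection details are
# placeholders; the point is that embeddings sit next to relational data.
import psycopg2

conn = psycopg2.connect("postgresql://user:password@db-host:5432/ragdb")
cur = conn.cursor()

# Installing the extension requires sufficient privileges on the database.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS policy_chunks (
        id         BIGSERIAL PRIMARY KEY,
        source_doc TEXT,
        content    TEXT,
        embedding  vector(1536)  -- matches text-embedding-3-small
    );
""")
conn.commit()

# Similarity search is plain SQL: <=> is pgvector's cosine distance operator.
query_embedding = [0.0] * 1536  # stand-in for a real query embedding
cur.execute(
    "SELECT content, source_doc FROM policy_chunks "
    "ORDER BY embedding <=> %s::vector LIMIT 5;",
    (str(query_embedding),),
)
top_chunks = cur.fetchall()
```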

The whole thing runs on Linode Kubernetes Engine (LKE) within Akamai Cloud (Figure 1). It gives me managed K8s without the complexity tax of AWS, and I can use Akamai’s S3-compatible object storage for the raw PDF documents. The API is served via FastAPI because it’s almost 2026 and I refuse to work without type hints.

The phases

The build breaks down into four phases:

  • Setting up the infrastructure (IaC)

  • Building the RAG pipeline

  • Deploying and creating secrets

  • Validating the brain

Phase 1: Setting up the infrastructure (IaC)

I didn't want to click around a web console all day, so the entire infrastructure is defined in Terraform. The setup requires a Kubernetes cluster, object storage for the policies, and two Postgres databases — one for the vector embeddings and one for the chat history state (Figure 2).

You start by cloning the repo and navigating to the Terraform folder. You only need to copy the example variables file and paste in your Linode API token. A standard run of the Terraform “init, plan, and apply” trilogy spins up the cluster and databases. Terraform then outputs a JSON file containing the database connection strings and S3 keys. You should guard this file with your life, as it essentially holds the keys to the kingdom.
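
The output file follows the usual `terraform output -json` shape, where each value is nested under a "value" key. A tiny, hypothetical reader looks like this; the key names are placeholders for whatever your copy of the repo actually exports.

```python
# Illustrative reader for the Terraform output JSON. The key names
# (vector_db_url, history_db_url, ...) are hypothetical placeholders.
import json

# e.g. produced with: terraform output -json > infra-outputs.json
with open("infra-outputs.json") as f:
    outputs = json.load(f)

# `terraform output -json` nests each value under a "value" key.
vector_db_url = outputs["vector_db_url"]["value"]
history_db_url = outputs["history_db_url"]["value"]
s3_access_key = outputs["s3_access_key"]["value"]
s3_secret_key = outputs["s3_secret_key"]["value"]
```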

Phase 2: Building the RAG pipeline

The core logic lives in the Python application, and the workflow is straightforward. First, we need to get the knowledge base into the cloud. I used s3cmd to sync a local folder full of PDF airline policies up to the newly created object storage bucket. In a production environment, you might have an automated pipeline for this, but manually syncing via command-line interface (CLI) feels satisfyingly tangible.
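
If you would rather script that step than run s3cmd by hand, any S3-compatible client works. Here is a rough boto3 sketch against a Linode Object Storage endpoint; the bucket name, region, and credentials are placeholders, not values from the repo.

```python
# Rough sketch: push every local PDF into the S3-compatible bucket.
# Bucket, endpoint, and credentials are placeholders.
from pathlib import Path

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://us-east-1.linodeobjects.com",  # your bucket's region
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

bucket = "airline-policies"
for pdf in Path("./policies").glob("*.pdf"):
    s3.upload_file(str(pdf), bucket, f"policies/{pdf.name}")
    print(f"uploaded {pdf.name}")
```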

Once the files are in the cloud, the application pulls the documents down, splits them into digestible chunks using LangChain, and sends them to the OpenAI embeddings API. The resulting vector representations are stored in the pgvector instance.
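
In broad strokes, the ingestion path looks something like the sketch below. It is simplified from the real application: the file names are placeholders, the chunking parameters are just reasonable defaults, and it writes into the same policy_chunks table sketched earlier.

```python
# Simplified ingestion sketch: load PDFs, chunk them, embed, store in pgvector.
# Not the repo's exact code; file names and parameters are illustrative.
import psycopg2
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
embedder = OpenAIEmbeddings(model="text-embedding-3-small")

conn = psycopg2.connect("postgresql://user:password@db-host:5432/ragdb")
cur = conn.cursor()

for pdf_path in ["contract_of_carriage.pdf", "baggage_policy.pdf"]:
    pages = PyPDFLoader(pdf_path).load()      # one Document per page
    chunks = splitter.split_documents(pages)  # digestible, overlapping chunks
    vectors = embedder.embed_documents([c.page_content for c in chunks])

    for chunk, vector in zip(chunks, vectors):
        cur.execute(
            "INSERT INTO policy_chunks (source_doc, content, embedding) "
            "VALUES (%s, %s, %s::vector);",
            (pdf_path, chunk.page_content, str(vector)),
        )

conn.commit()
```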

When you ask a question such as "What are the restrictions on snowboards?" the system converts that query to a vector, finds the nearest neighbors in Postgres, and feeds those specific text chunks to the LLM as context to generate the final answer.
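
The retrieval half is symmetric: embed the question, pull the nearest chunks, and hand them to the model as context. Again, this is a simplified sketch; the prompt wording and model choice are mine, not necessarily what the repo uses.

```python
# Simplified retrieval sketch: embed the question, fetch the nearest chunks,
# and pass them to the LLM as context. Prompt and model are illustrative.
import psycopg2
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

question = "What are the restrictions on snowboards?"

embedder = OpenAIEmbeddings(model="text-embedding-3-small")
query_vector = embedder.embed_query(question)

conn = psycopg2.connect("postgresql://user:password@db-host:5432/ragdb")
cur = conn.cursor()
cur.execute(
    "SELECT content, source_doc FROM policy_chunks "
    "ORDER BY embedding <=> %s::vector LIMIT 4;",
    (str(query_vector),),
)
rows = cur.fetchall()

context = "\n\n".join(f"[{doc}] {text}" for text, doc in rows)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
answer = llm.invoke(
    "Answer using only the policy excerpts below and cite the source document.\n\n"
    f"Excerpts:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```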

Phase 3: Deploying and creating secrets

This is where things usually get messy, but App Platform handles a lot of the heavy lifting for the K8s configuration. Security is non-negotiable here, so we aren't hardcoding keys. I used Sealed Secrets in Kubernetes to manage the sensitive data. You create secrets for the OpenAI API key, the database connection strings derived from the Terraform output, and the S3 credentials.

The build process is standard: Build the Docker image, push it to the registry, and point the App Platform workload to it. The configuration injects the secrets as environment variables, ensuring the app can talk to the database and the OpenAI API without exposing credentials in the repo. Once deployed, the service is exposed on port 80.

Phase 4: Validating the brain

When the pods first come online, the database is empty. The infrastructure is there, but the brain is blank. I exposed a couple of admin endpoints to handle this initialization manually.

First, I send a POST request to the initialization endpoint. This installs the vector extension on the Postgres instance and sets up the tables for conversation history. Next, I hit the indexing endpoint. This kicks off a background job that reads the PDFs from the object storage, vectorizes them, and fills the database. This is the magic moment where the bot actually learns the rules.
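
The shape of those endpoints is roughly what you would expect from FastAPI. This is a trimmed-down sketch rather than the repo's actual routes; the paths and helper functions (init_schema, index_documents) are placeholders.

```python
# Sketch of the admin endpoints; paths and helpers are placeholders.
# FastAPI's BackgroundTasks runs indexing after the response is sent,
# so the HTTP call returns immediately while the job fills the database.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


def init_schema() -> None:
    """Install the vector extension and create the chat-history tables."""
    ...


def index_documents() -> None:
    """Pull PDFs from object storage, chunk, embed, and write to pgvector."""
    ...


@app.post("/admin/init")
def initialize() -> dict:
    init_schema()
    return {"status": "schema ready"}


@app.post("/admin/index")
def index(background_tasks: BackgroundTasks) -> dict:
    background_tasks.add_task(index_documents)
    return {"status": "indexing started"}
```

Kicking things off is then just two POST requests against the deployed service, with curl or Python's requests, whichever you prefer.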

Does it actually work?

After the indexing job finished, I navigated to the service endpoint to test it against my past experiences. I asked, “Am I entitled to a hotel if ATC caused the delay?” (Figure 3).

The bot scanned the vectors, retrieved the relevant "contract of carriage" chunks, and delivered the bad news: Airlines generally do not provide accommodation for delays caused by force majeure or air traffic control. It even cited the specific text from the policy.

The next logical step

You just tagged along as I built a specialized legal expert using Python, Kubernetes, and Postgres that can navigate bureaucracy faster than any human agent.

Now, let's talk about the obvious bottleneck in this architecture: the reliance on external APIs like OpenAI. It’s great for getting started fast, but as you scale a RAG system, you eventually hit a wall. Maybe the API costs get too high at volume, or maybe you have strict data privacy requirements that don’t allow you to send payload data out to a third party.

The next logical step for a production system like this is eliminating the OpenAI dependency and self-hosting an open source model like gpt-oss, Llama, or Mistral directly in your cluster. But running inference on those models efficiently requires serious compute horsepower.

Since we are already running this on Akamai’s LKE, the upgrade path is actually pretty straightforward. When you’re ready to own the whole stack, you can provision GPU-backed nodes right into this same Kubernetes cluster. This lets you run your vectorization and inference workloads right next to your data, giving you better latency control and total data sovereignty.
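
One nice property of building on LangChain is that swapping providers is mostly configuration. If you serve Llama or Mistral behind an OpenAI-compatible endpoint inside the cluster (vLLM and Ollama both expose one), the application code barely changes; the service URL and model name below are placeholders.

```python
# Hypothetical swap: point the same chat client at an in-cluster,
# OpenAI-compatible inference service instead of api.openai.com.
# The URL and model name are placeholders.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://inference-service.default.svc.cluster.local:8000/v1",
    api_key="not-needed-for-local-serving",
    model="meta-llama/Llama-3.1-8B-Instruct",
    temperature=0,
)

print(llm.invoke("Am I entitled to a hotel if ATC caused the delay?").content)
```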

Let us know what you think!
