A technical, pragmatic guide for founders and CTOs who want to add LLM capabilities to a product already in production — without big-bang rewrites or vendor promises.
When a SaaS decides to adopt LLMs, the first temptation is to redesign everything: new vector database, new infrastructure, new backend. Six months later, the product still hasn't shipped and the original roadmap has accumulated technical debt.
There's a saner alternative: incremental integration. Identify one high-impact feature, add the LLM as an isolated service layer, measure, and only then expand. It's the strangler fig principle applied to AI.
This guide assumes you have a SaaS in production — Node.js, Python, or Rails, it doesn't matter much — and you want to add intelligence without stopping the product.
Before touching code, map candidate use cases on two axes: complexity and value. Low-complexity, high-value use cases come first.
The recommended architecture for getting started: a dedicated module (or lightweight microservice) that encapsulates all prompting logic, output parsing, and fallback. The rest of the application calls it like any other service.
This keeps business code decoupled from the model vendor. If you switch from OpenAI to Anthropic tomorrow, only that one module changes.
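A minimal sketch of such a module, assuming a hypothetical `LlmProvider` interface and `LlmService` class (neither name comes from a real SDK). The stub providers stand in for vendor clients so the example runs without network access; a real implementation would wrap the vendor SDK behind the same interface.

```typescript
// Hypothetical names throughout: LlmProvider, LlmService, summarize.
interface LlmProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

class LlmService {
  private readonly providers: LlmProvider[];

  constructor(primary: LlmProvider, fallback: LlmProvider) {
    this.providers = [primary, fallback];
  }

  // The only method business code calls; prompt wording stays inside this module.
  async summarize(text: string): Promise<{ summary: string; provider: string }> {
    const prompt = `Summarize the following text in one sentence:\n\n${text}`;
    for (const provider of this.providers) {
      try {
        const summary = (await provider.complete(prompt)).trim();
        if (summary.length > 0) {
          return { summary, provider: provider.name };
        }
      } catch {
        // Try the next provider; production code would log the failure here.
      }
    }
    throw new Error("All LLM providers failed");
  }
}

// Stub providers so the sketch runs without API keys.
const failingProvider: LlmProvider = {
  name: "primary-stub",
  complete: async () => {
    throw new Error("simulated outage");
  },
};

const stubProvider: LlmProvider = {
  name: "fallback-stub",
  complete: async (prompt) => `stub summary (${prompt.length} chars)`,
};

const llmService = new LlmService(failingProvider, stubProvider);
```

Business code only ever calls `llmService.summarize(...)`. Swapping vendors means writing a new `LlmProvider` and changing the constructor arguments, nothing else.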
In TypeScript, the Vercel AI SDK is the most practical choice: it abstracts providers, offers streaming out of the box, and has native support for tool calling and structured output with Zod.
The biggest mistake we see in LLM integrations in production: trusting free text from the model and trying to parse it manually with regex. Works in demos, fails with real users.
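The solution: define the output schema with Zod (TypeScript) or Pydantic (Python) from day one. The main providers support JSON mode and function calling, which make conforming output far more likely; validating every response against the schema in your own code is what actually enforces it.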
Practical example: if the LLM will classify a support ticket, the output must be `{ category: 'billing' | 'technical' | 'other', confidence: number, summary: string }`, validated by the schema, never a free-form string.
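To make the validation step concrete, here is a dependency-free sketch of a validator for the ticket shape above. It is hand-rolled so it runs anywhere; in a real project a Zod schema and its `safeParse` method would replace most of this code. The function name `parseTicketClassification` and the 0 to 1 range for `confidence` are assumptions, not something the schema above specifies.

```typescript
type TicketCategory = "billing" | "technical" | "other";

interface TicketClassification {
  category: TicketCategory;
  confidence: number;
  summary: string;
}

type ParseResult =
  | { ok: true; value: TicketClassification }
  | { ok: false; error: string };

const CATEGORIES: readonly string[] = ["billing", "technical", "other"];

// Hypothetical helper: turns raw model output into a typed value or an error.
function parseTicketClassification(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "not an object" };
  }
  const { category, confidence, summary } = data as Record<string, unknown>;
  if (typeof category !== "string" || CATEGORIES.indexOf(category) === -1) {
    return { ok: false, error: "invalid category" };
  }
  // Assumption: confidence is a probability in [0, 1].
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    return { ok: false, error: "confidence must be a number between 0 and 1" };
  }
  if (typeof summary !== "string" || summary.length === 0) {
    return { ok: false, error: "summary must be a non-empty string" };
  }
  return {
    ok: true,
    value: { category: category as TicketCategory, confidence, summary },
  };
}
```

The caller branches on `ok`: a valid result flows into business logic as a typed object, and an invalid one triggers a retry or a fallback instead of leaking malformed text to users.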
Retrieval-Augmented Generation (RAG) is powerful but adds complexity: embeddings, a vector store, a retrieval pipeline, and relevance tuning. Implement it only when the use case requires context that changes frequently and is too large to fit in the context window.
For many B2B SaaS products, a prompt with static context (product, policies, FAQ) covers the first 80% of cases. Start simple.
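A static-context prompt can be as small as the sketch below. The `StaticContext` and `ChatMessage` types and the `buildMessages` function are hypothetical names, and the instruction wording is only one possible phrasing; the system/user message shape is the one most chat APIs accept.

```typescript
// Hypothetical types: the three context fields mirror product, policies, FAQ.
interface StaticContext {
  product: string;
  policies: string;
  faq: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Builds the message list once per request; no retrieval step involved.
function buildMessages(context: StaticContext, question: string): ChatMessage[] {
  const system = [
    "You are a support assistant. Answer only from the context below.",
    "If the context does not cover the question, say you do not know.",
    "",
    "## Product",
    context.product,
    "",
    "## Policies",
    context.policies,
    "",
    "## FAQ",
    context.faq,
  ].join("\n");

  return [
    { role: "system", content: system },
    { role: "user", content: question },
  ];
}
```

Because the context is loaded from files or configuration rather than retrieved per query, there is nothing to index and nothing to tune until the content outgrows the context window.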
When RAG is needed, pgvector on existing Postgres is the path with least overhead for small teams. Only migrate to Pinecone, Qdrant, or Weaviate when you have concrete reasons (scale, latency, advanced filters).
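As an illustration of how little code the pgvector path needs, the sketch below builds a parameterized nearest-neighbour query. The table `documents` and its columns `id`, `content`, and `embedding` are assumptions; `<=>` is pgvector's cosine-distance operator. The query embedding is expected to come from whatever embedding model you already use.

```typescript
interface SqlQuery {
  text: string;
  values: (string | number)[];
}

// Hypothetical helper: returns the SQL and parameters for a top-k similarity search.
function buildSimilarityQuery(queryEmbedding: number[], limit = 5): SqlQuery {
  if (queryEmbedding.length === 0) {
    throw new Error("embedding must not be empty");
  }
  if (!queryEmbedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding must contain only finite numbers");
  }
  // pgvector accepts a vector as a text literal such as '[0.1,0.2,0.3]'.
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  return {
    // Assumed schema: documents(id, content, embedding vector(n)).
    text:
      "SELECT id, content FROM documents " +
      "ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [vectorLiteral, limit],
  };
}
```

The `{ text, values }` shape matches the query config object that node-postgres accepts, if that is the driver in use; other drivers may need the two parts passed separately.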
LLMs in production without observability are an expensive black box. The viable minimum: log tokens used per request, latency, and a sample of prompts and outputs for manual auditing.
For teams that want to go further: Langfuse and LangSmith are among the most widely adopted tools for LLM pipeline tracing. Both have free plans sufficient to get started.
Without this visibility floor, it's impossible to tell if the model is regressing, where costs are growing, or what's failing in responses.
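The visibility floor described above (tokens, latency, sampled prompts and outputs) can be sketched as a thin wrapper around any LLM call. All names here (`withLogging`, `LlmLogEntry`, `LlmResult`) are hypothetical, and the 5% default sample rate is an arbitrary assumption to adjust against your volume and privacy constraints.

```typescript
interface LlmResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

interface LlmLogEntry {
  feature: string;
  ok: boolean;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  sample?: { prompt: string; output: string };
}

type LlmCall = (prompt: string) => Promise<LlmResult>;

// Wraps a call so every request emits one log entry to `sink`.
function withLogging(
  call: LlmCall,
  sink: (entry: LlmLogEntry) => void,
  sampleRate = 0.05, // assumption: keep about 5% of prompt/output pairs
  random: () => number = Math.random,
): (feature: string, prompt: string) => Promise<LlmResult> {
  return async (feature, prompt) => {
    const startedAt = Date.now();
    let result: LlmResult;
    try {
      result = await call(prompt);
    } catch (error) {
      // Failed calls are logged too, with zero tokens recorded.
      sink({
        feature,
        ok: false,
        latencyMs: Date.now() - startedAt,
        inputTokens: 0,
        outputTokens: 0,
      });
      throw error;
    }
    const entry: LlmLogEntry = {
      feature,
      ok: true,
      latencyMs: Date.now() - startedAt,
      inputTokens: result.inputTokens,
      outputTokens: result.outputTokens,
    };
    if (random() < sampleRate) {
      entry.sample = { prompt, output: result.text };
    }
    sink(entry);
    return result;
  };
}
```

The `sink` can start as a function that writes a JSON line to your existing logger. Tagging each entry with a `feature` name is what later lets you see which part of the product is driving cost.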
Avoid these patterns that create technical debt:
A vector database is not necessarily required. Most integrations start with the existing database. Add pgvector (Postgres) or a separate vector store (Pinecone, Qdrant) only when dynamic context is truly needed. Don't optimise prematurely.
For most European B2B SaaS: OpenAI GPT-4o for general quality, Anthropic Claude 3.5 Sonnet for long-text tasks and structured reasoning, and open-source models (Llama, Mistral) if data must stay on-premise or you want to avoid per-token fees at volume. Start with a hosted API and migrate to self-hosted only once you have real usage data.
RAG (Retrieval-Augmented Generation) with citable sources is the most robust answer: the model is instructed to respond only from documents you control, and each claim can be traced back to its source. For critical flows, add output verification with Zod or a second model validating the structure.
Cost depends heavily on the model and on tokens per request. GPT-4o mini costs about $0.15 per million input tokens; for most B2B SaaS, costs stay under €200/month in the first 6 months. Our article on real AI costs for SMEs goes deeper into the calculations.
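The arithmetic is simple enough to keep in a small helper. In the sketch below, only the $0.15 per million input tokens comes from the figure above; the output price and the traffic numbers are placeholder assumptions to replace with your provider's current price list and your own logs.

```typescript
interface MonthlyUsage {
  requests: number;
  inputTokensPerRequest: number;
  outputTokensPerRequest: number;
}

interface ModelPricing {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

// Monthly cost = tokens / 1,000,000 * price per million, summed for input and output.
function estimateMonthlyCostUsd(usage: MonthlyUsage, pricing: ModelPricing): number {
  const inputTokens = usage.requests * usage.inputTokensPerRequest;
  const outputTokens = usage.requests * usage.outputTokensPerRequest;
  return (
    (inputTokens / 1e6) * pricing.inputUsdPerMillion +
    (outputTokens / 1e6) * pricing.outputUsdPerMillion
  );
}

// Placeholder traffic: 100,000 requests/month, 1,500 input and 300 output tokens each.
const exampleUsage: MonthlyUsage = {
  requests: 100000,
  inputTokensPerRequest: 1500,
  outputTokensPerRequest: 300,
};

// Input price from the text; output price is an assumption for illustration.
const examplePricing: ModelPricing = {
  inputUsdPerMillion: 0.15,
  outputUsdPerMillion: 0.6,
};

const exampleCostUsd = estimateMonthlyCostUsd(exampleUsage, examplePricing);
```

With these placeholder numbers, input accounts for 150 million tokens ($22.50) and output for 30 million ($18.00), about $40.50 per month in total. Prompt length per request is usually the lever that moves the result most.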
Next step
Integrating LLMs into an existing product? We help you evaluate the right approach for your stack — without selling hype.