Technical
January 20, 2025
18 min read

AI Development Stack: Our Battle-Tested Production Setup for 2025

The exact tools, frameworks, and services we use to ship AI products in 4 weeks. Complete stack breakdown with costs and alternatives.


After building 50+ AI products, we've refined our tech stack to the absolute essentials. This is the exact setup we use to deliver production-ready AI applications in 4 weeks for $10K.

The Complete Stack Overview

Quick Reference

Frontend:
  • Framework: Next.js 14 + TypeScript
  • Styling: Tailwind CSS + shadcn/ui
  • State: Zustand + React Query
  • Analytics: PostHog + Vercel Analytics

Backend:
  • Runtime: Node.js + Bun
  • API: tRPC or REST
  • Database: PostgreSQL + Prisma
  • Cache: Redis + Upstash

AI/ML:
  • LLMs: OpenAI + Anthropic + Groq
  • Vectors: Pinecone + pgvector
  • Framework: LangChain + LlamaIndex
  • Monitoring: Helicone + Langfuse

Infrastructure:
  • Hosting: Vercel + Railway
  • Storage: AWS S3 + Cloudflare R2
  • Queue: BullMQ + Upstash
  • Monitoring: Sentry + Better Stack

Frontend Stack Deep Dive

Core Framework: Next.js 14

Why Next.js over everything else:

// The power of App Router + Server Components
// app/dashboard/page.tsx
import { Suspense } from 'react';
import { getAIMetrics } from '@/lib/ai';

export default async function Dashboard() {
  // Server-side AI calls - no API route needed
  const metrics = await getAIMetrics();

  return (
    <Suspense fallback={<DashboardSkeleton />}>
      <AIMetricsChart data={metrics} />
    </Suspense>
  );
}

// Streaming AI responses built-in
// app/api/chat/route.ts
import { OpenAIStream, StreamingTextResponse } from 'ai';
import { openai } from '@/lib/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages,
    stream: true,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}

Cost: Free framework, ~$20/month hosting on Vercel

UI Components: shadcn/ui + Tailwind

Why this combo wins:

// Beautiful, accessible components in minutes
import { Button } from "@/components/ui/button";
import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card";
import { Badge } from "@/components/ui/badge";

export function AIAgentCard({ agent }) {
  return (
    <Card className="hover:shadow-lg transition-shadow">
      <CardHeader>
        <div className="flex justify-between items-start">
          <div>
            <CardTitle>{agent.name}</CardTitle>
            <CardDescription>{agent.description}</CardDescription>
          </div>
          <Badge variant={agent.status === 'active' ? 'default' : 'secondary'}>
            {agent.status}
          </Badge>
        </div>
      </CardHeader>
      <CardContent>
        <div className="grid grid-cols-2 gap-4 text-sm">
          <div>
            <p className="text-muted-foreground">Model</p>
            <p className="font-medium">{agent.model}</p>
          </div>
          <div>
            <p className="text-muted-foreground">Requests/day</p>
            <p className="font-medium">{agent.requestCount.toLocaleString()}</p>
          </div>
        </div>
        <Button className="w-full mt-4" size="sm">
          Configure Agent →
        </Button>
      </CardContent>
    </Card>
  );
}

Cost: Free (open source)

State Management: Zustand + React Query

// Global state with Zustand (simpler than Redux)
import { create } from 'zustand';
import { persist } from 'zustand/middleware';

interface AIStore {
  messages: Message[];
  addMessage: (message: Message) => void;
  clearChat: () => void;
  model: string;
  setModel: (model: string) => void;
}

export const useAIStore = create<AIStore>()(
  persist(
    (set) => ({
      messages: [],
      addMessage: (message) =>
        set((state) => ({ messages: [...state.messages, message] })),
      clearChat: () => set({ messages: [] }),
      model: 'gpt-3.5-turbo',
      setModel: (model) => set({ model }),
    }),
    {
      name: 'ai-storage',
    }
  )
);

// Server state with React Query
import { useMutation, useQueryClient } from '@tanstack/react-query';

export function useAICompletion() {
  const queryClient = useQueryClient();

  return useMutation({
    mutationFn: async (prompt: string) => {
      const response = await fetch('/api/complete', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
      });
      return response.json();
    },
    onSuccess: (data) => {
      // Refresh chat history once the completion lands
      queryClient.invalidateQueries({ queryKey: ['ai-history'] });
    },
  });
}

Backend Stack Deep Dive

API Layer: tRPC for Type Safety

// Fully typed API without code generation
// server/routers/ai.ts
import { z } from 'zod';
import { router, publicProcedure } from '../trpc';
import { openai } from '@/lib/openai';

export const aiRouter = router({
  complete: publicProcedure
    .input(z.object({
      prompt: z.string().min(1).max(10000),
      model: z.enum(['gpt-3.5-turbo', 'gpt-4']),
      temperature: z.number().min(0).max(2).default(0.7),
    }))
    .mutation(async ({ input }) => {
      const completion = await openai.chat.completions.create({
        model: input.model,
        messages: [{ role: 'user', content: input.prompt }],
        temperature: input.temperature,
      });

      return {
        text: completion.choices[0].message.content,
        usage: completion.usage,
      };
    }),

  generateImage: publicProcedure
    .input(z.object({
      prompt: z.string(),
      size: z.enum(['256x256', '512x512', '1024x1024']),
    }))
    .mutation(async ({ input }) => {
      const image = await openai.images.generate({
        prompt: input.prompt,
        size: input.size,
        n: 1,
      });

      return image.data[0].url;
    }),
});

// Client-side usage with full type safety
const { mutate: complete } = trpc.ai.complete.useMutation();

// TypeScript knows exactly what's required
complete({
  prompt: "Explain quantum computing",
  model: "gpt-4", // Autocomplete!
  temperature: 0.5
});

Database: PostgreSQL + Prisma

// schema.prisma - Your entire database in one file
generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

model User {
  id            String         @id @default(cuid())
  email         String         @unique
  name          String?
  credits       Int            @default(100)
  subscription  Subscription?
  conversations Conversation[]
  createdAt     DateTime       @default(now())
  updatedAt     DateTime       @updatedAt
}

model Conversation {
  id        String    @id @default(cuid())
  userId    String
  user      User      @relation(fields: [userId], references: [id])
  messages  Message[]
  model     String    @default("gpt-3.5-turbo")
  title     String?
  createdAt DateTime  @default(now())
  updatedAt DateTime  @updatedAt

  @@index([userId, createdAt])
}

model Message {
  id             String       @id @default(cuid())
  conversationId String
  conversation   Conversation @relation(fields: [conversationId], references: [id])
  role           String       // 'user' | 'assistant' | 'system'
  content        String       @db.Text
  tokens         Int?
  createdAt      DateTime     @default(now())

  @@index([conversationId, createdAt])
}

// Type-safe database queries
import { prisma } from '@/lib/prisma';

export async function getUserConversations(userId: string) {
  return prisma.conversation.findMany({
    where: { userId },
    include: {
      messages: { take: 1, orderBy: { createdAt: 'desc' } },
      _count: { select: { messages: true } }
    },
    orderBy: { updatedAt: 'desc' },
    take: 20
  });
}

Cost: ~$20/month for managed PostgreSQL (Railway/Supabase)

Caching: Redis + Upstash

// Edge-compatible Redis client
import { Redis } from '@upstash/redis';

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

// Intelligent caching for AI responses
export async function getCachedCompletion(
  prompt: string,
  model: string
): Promise<string | null> {
  // Create hash of prompt for cache key
  const key = `completion:${model}:${hashPrompt(prompt)}`;

  // Check cache
  const cached = await redis.get<string>(key);
  if (cached) return cached;

  // Generate new completion
  const completion = await generateCompletion(prompt, model);

  // Cache with TTL based on model
  const ttl = model === 'gpt-4' ? 3600 : 1800; // 1hr for GPT-4, 30min for others
  await redis.set(key, completion, { ex: ttl });

  return completion;
}

// Rate limiting
export async function checkRateLimit(
  userId: string,
  limit: number = 100
): Promise<boolean> {
  const key = `rate:${userId}:${Date.now() / 60000 | 0}`; // Per minute

  const count = await redis.incr(key);
  if (count === 1) {
    await redis.expire(key, 60);
  }

  return count <= limit;
}

Cost: Free tier covers most MVPs, ~$10/month for production

AI/ML Stack Deep Dive

LLM Orchestration: LangChain

// Advanced AI chains and agents
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { BufferMemory } from 'langchain/memory';
import { ConversationChain } from 'langchain/chains';
import { ChatPromptTemplate, MessagesPlaceholder } from 'langchain/prompts';

// Multi-model setup with fallbacks
const models = {
  fast: new ChatOpenAI({
    modelName: 'gpt-3.5-turbo',
    temperature: 0.7,
    maxRetries: 2,
  }),
  smart: new ChatOpenAI({
    modelName: 'gpt-4-turbo-preview',
    temperature: 0.5,
    maxRetries: 1,
  }),
  creative: new ChatOpenAI({
    modelName: 'gpt-4',
    temperature: 1.2,
    maxRetries: 1,
  }),
};

// Conversation with memory
export function createAIAssistant(systemPrompt: string) {
  const memory = new BufferMemory({
    returnMessages: true,
    memoryKey: 'history',
  });

  const prompt = ChatPromptTemplate.fromMessages([
    ['system', systemPrompt],
    new MessagesPlaceholder('history'),
    ['human', '{input}'],
  ]);

  return new ConversationChain({
    llm: models.smart.withFallbacks([models.fast]),
    prompt,
    memory,
  });
}

// RAG pipeline
import { VectorStoreRetriever } from 'langchain/vectorstores/base';
import { RetrievalQAChain } from 'langchain/chains';

export async function createRAGChain(vectorStore: VectorStoreRetriever) {
  return RetrievalQAChain.fromLLM(
    models.smart,
    vectorStore,
    {
      returnSourceDocuments: true,
      k: 5, // Return top 5 relevant documents
    }
  );
}

Vector Database: Pinecone + pgvector

// Pinecone for production scale
import { PineconeClient } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';

const pinecone = new PineconeClient();
await pinecone.init({
  apiKey: process.env.PINECONE_API_KEY!,
  environment: process.env.PINECONE_ENVIRONMENT!,
});

const index = pinecone.Index('knowledge-base');

// Store documents
export async function indexDocuments(documents: Document[]) {
  const embeddings = new OpenAIEmbeddings();

  await PineconeStore.fromDocuments(documents, embeddings, {
    pineconeIndex: index,
    namespace: 'production',
  });
}

// pgvector for smaller scale (cheaper)
-- PostgreSQL with pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE embeddings (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536),
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Create index for similarity search
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Similarity search query
SELECT content, metadata,
       1 - (embedding <=> $1) AS similarity
FROM embeddings
WHERE 1 - (embedding <=> $1) > 0.8
ORDER BY embedding <=> $1
LIMIT 10;

Cost: Pinecone ~$70/month, pgvector ~$0 (uses existing PostgreSQL)

AI Monitoring: Helicone + Langfuse

// Helicone for OpenAI monitoring (1 line setup!)
import { Configuration, OpenAIApi } from 'openai';

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  basePath: 'https://oai.hconeai.com/v1',
  defaultHeaders: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`,
    'Helicone-Cache-Enabled': 'true',
  },
});

const openai = new OpenAIApi(configuration);

// Langfuse for detailed tracing
import { Langfuse } from 'langfuse';

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
  secretKey: process.env.LANGFUSE_SECRET_KEY!,
});

// Trace complex AI workflows
export async function processWithTracing(input: string) {
  const trace = langfuse.trace({
    name: 'ai-processing',
    userId: 'user-123',
    metadata: { version: '1.0.0' },
  });

  // Track each step
  const span = trace.span({
    name: 'openai-completion',
    input: { prompt: input },
  });

  // gpt-3.5-turbo is a chat model, so use the chat completion endpoint
  const result = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: input }],
  });

  span.end({
    output: result.data,
    usage: result.data.usage,
  });

  await langfuse.flush();
  return result;
}

Cost: Helicone free tier, Langfuse ~$20/month

Infrastructure Stack

Hosting: Vercel + Railway

// vercel.json for optimal configuration
{
  "functions": {
    "app/api/ai/route.ts": {
      "maxDuration": 30, // 30 seconds for AI requests
      "memory": 1024
    }
  },
  "crons": [
    {
      "path": "/api/cron/cleanup",
      "schedule": "0 0 * * *" // Daily cleanup
    }
  ]
}

// railway.toml for backend services
[build]
builder = "nixpacks"
buildCommand = "pnpm install && pnpm build"

[deploy]
startCommand = "pnpm start"
healthcheckPath = "/api/health"
restartPolicyType = "on-failure"
restartPolicyMaxRetries = 3

[services.web]
port = 3000
protocol = "http"

[services.worker]
startCommand = "pnpm worker"

Cost: Vercel ~$20/month, Railway ~$20/month

Background Jobs: BullMQ

// Queue processing for heavy AI tasks
import { Queue, Worker } from 'bullmq';
import { Redis } from 'ioredis';

// BullMQ workers require maxRetriesPerRequest: null on the connection
const connection = new Redis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null,
});

// Create queue
export const aiQueue = new Queue('ai-processing', {
  connection,
  defaultJobOptions: {
    removeOnComplete: 100,
    removeOnFail: 500,
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 2000,
    },
  },
});

// Add jobs
export async function queueAITask(data: AITask) {
  return aiQueue.add('process', data, {
    priority: data.priority || 0,
    delay: data.delay || 0,
  });
}

// Process jobs
const worker = new Worker(
  'ai-processing',
  async (job) => {
    const { type, payload } = job.data;

    switch (type) {
      case 'generate-embeddings':
        return generateEmbeddings(payload);
      case 'process-document':
        return processDocument(payload);
      case 'batch-completion':
        return batchCompletion(payload);
      default:
        throw new Error(`Unknown job type: ${type}`);
    }
  },
  {
    connection,
    concurrency: 5,
    limiter: {
      max: 10,
      duration: 1000, // 10 jobs per second
    },
  }
);

// Monitor progress
worker.on('progress', (job, progress) => {
  console.log(`Job ${job.id} is ${progress}% complete`);
});

Monitoring: Sentry + Better Stack

// Comprehensive error tracking
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  integrations: [
    new Sentry.Integrations.Http({ tracing: true }),
    new Sentry.Integrations.Prisma({ client: prisma }),
  ],
  tracesSampleRate: 0.1,
  profilesSampleRate: 0.1,
  beforeSend(event, hint) {
    // Filter out sensitive data
    if (event.request?.data) {
      delete event.request.data.apiKey;
      delete event.request.data.password;
    }
    return event;
  },
});

// Custom AI error tracking
export function trackAIError(error: Error, context: any) {
  Sentry.captureException(error, {
    tags: {
      ai_model: context.model,
      ai_provider: context.provider,
    },
    extra: {
      prompt: context.prompt?.substring(0, 100),
      tokens: context.tokens,
      cost: context.cost,
    },
  });
}

// Better Stack for uptime monitoring
import { BetterStack } from '@betterstack/node';

const monitor = new BetterStack({
  apiKey: process.env.BETTER_STACK_API_KEY!,
});

// Health checks
export async function healthCheck() {
  const checks = await Promise.allSettled([
    prisma.$queryRaw`SELECT 1`,
    redis.ping(),
    openai.listModels(),
  ]);

  const status = checks.every(c => c.status === 'fulfilled')
    ? 'healthy'
    : 'degraded';

  monitor.heartbeat('api-health', { status });
  return status;
}

Cost: Sentry ~$26/month, Better Stack ~$10/month

Development Tools

Local Development

# .env.local - Development environment
DATABASE_URL="postgresql://localhost:5432/myapp_dev"
REDIS_URL="redis://localhost:6379"
OPENAI_API_KEY="sk-dev-..."

# docker-compose.yml for local services
version: '3.8'
services:
  postgres:
    image: pgvector/pgvector:pg15
    environment:
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
  mailhog:
    image: mailhog/mailhog
    ports:
      - "1025:1025"
      - "8025:8025"
volumes:
  postgres_data:

Testing

// AI testing with mocked responses
import { describe, it, expect, vi } from 'vitest';
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { AIChatComponent } from '@/components/ai-chat'; // adjust to your component's actual path

vi.mock('@/lib/openai', () => ({
  complete: vi.fn().mockResolvedValue({
    text: 'Mocked AI response',
    usage: { total_tokens: 100 },
  }),
}));

describe('AI Chat Component', () => {
  it('should display AI response', async () => {
    const user = userEvent.setup();
    render(<AIChatComponent />);

    const input = screen.getByRole('textbox');
    const button = screen.getByRole('button', { name: /send/i });

    await user.type(input, 'Test prompt');
    await user.click(button);

    await waitFor(() => {
      expect(screen.getByText('Mocked AI response')).toBeInTheDocument();
    });
  });
});

Cost Breakdown

Monthly Costs for Typical SaaS

Service         Free Tier   Startup ($/mo)   Scale ($/mo)
Hosting
  Vercel        Yes         20               100+
  Railway       No          20               50+
Database
  PostgreSQL    No          20               100+
  Redis         Yes         10               50+
AI/ML
  OpenAI API    No          100-500          1,000+
  Pinecone      Yes         70               200+
Monitoring
  Sentry        Yes         26               100+
  Better Stack  Yes         10               30+
  PostHog       Yes         0                50+
Total           -           $276             $1,680+

Cost Optimization Tips

  1. Start with free tiers: Most services offer generous free tiers
  2. Use caching aggressively: Reduce AI API calls by 60-80%
  3. Batch process when possible: Lower priority = lower cost
  4. Monitor usage closely: Set up alerts before limits
  5. Choose the right model: GPT-3.5 for 90% of tasks

Migration Path

Phase 1: MVP (Week 1-2)

  • Next.js + Vercel
  • PostgreSQL + Prisma
  • OpenAI API
  • Basic monitoring

Phase 2: Growth (Week 3-4)

  • Add Redis caching
  • Implement queues
  • Add vector search
  • Enhanced monitoring

Phase 3: Scale (Week 5-6)

  • Multi-model support
  • Advanced caching
  • Performance optimization
  • Full observability

Common Pitfalls to Avoid

1. Over-engineering

❌ Kubernetes for 100 users ✅ Vercel + Railway

2. Wrong database

❌ MongoDB for relational data ✅ PostgreSQL with JSONB
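
JSONB gives you document-style flexibility for the messy parts (model settings, latency, tool outputs) while keeping users, conversations, and billing relational. A minimal sketch, assuming you add a metadata Json? column to the Message model from the Prisma schema above:

import { prisma } from '@/lib/prisma';

// Assumes Message has a `metadata Json?` column (stored as JSONB in PostgreSQL)
export async function tagMessage(messageId: string) {
  await prisma.message.update({
    where: { id: messageId },
    data: { metadata: { model: 'gpt-4', latencyMs: 812, cached: false } },
  });
}

// Filter on a JSON path without leaving PostgreSQL
export function findGpt4Messages() {
  return prisma.message.findMany({
    where: { metadata: { path: ['model'], equals: 'gpt-4' } },
  });
}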

3. Expensive AI calls

❌ GPT-4 for everything ✅ Model routing by use case
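
A minimal routing sketch, reusing the shared OpenAI client from earlier; the task names and model table are illustrative, so tune them to your own workloads:

import { openai } from '@/lib/openai';

type TaskType = 'classify' | 'summarize' | 'reason';

// Cheap model by default; reserve GPT-4 for tasks that actually need it
const MODEL_BY_TASK: Record<TaskType, string> = {
  classify: 'gpt-3.5-turbo',
  summarize: 'gpt-3.5-turbo',
  reason: 'gpt-4-turbo-preview',
};

export async function routedCompletion(task: TaskType, prompt: string) {
  const completion = await openai.chat.completions.create({
    model: MODEL_BY_TASK[task],
    messages: [{ role: 'user', content: prompt }],
  });
  return completion.choices[0].message.content;
}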

4. No caching

❌ Direct API calls always ✅ Multi-layer caching
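
One way to layer it, sketched with the same Upstash client used in the caching section; the in-memory TTL, key scheme, and generate callback are placeholders for your own setup:

import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();
const memory = new Map<string, { value: string; expires: number }>();

export async function cachedCompletion(
  cacheKey: string,
  generate: () => Promise<string>,
  ttlSeconds = 1800
): Promise<string> {
  // Layer 1: in-process memory (fastest, but per-instance)
  const hit = memory.get(cacheKey);
  if (hit && hit.expires > Date.now()) return hit.value;

  // Layer 2: Redis (shared across instances and deploys)
  const cached = await redis.get<string>(cacheKey);
  if (cached) {
    memory.set(cacheKey, { value: cached, expires: Date.now() + 60_000 });
    return cached;
  }

  // Miss: call the model once, then populate both layers
  const value = await generate();
  await redis.set(cacheKey, value, { ex: ttlSeconds });
  memory.set(cacheKey, { value, expires: Date.now() + 60_000 });
  return value;
}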

5. Poor monitoring

❌ console.log debugging ✅ Proper observability stack

Quick Start Template

# Clone our battle-tested starter
git clone https://github.com/orristech/ai-saas-starter
cd ai-saas-starter

# Install dependencies
pnpm install

# Set up environment
cp .env.example .env.local
# Add your API keys

# Run database migrations
pnpm prisma migrate dev

# Start development
pnpm dev

The Bottom Line

This stack has powered 15+ successful AI products. It's not the shiniest or newest—it's battle-tested and proven. With this setup, you can:

  • Ship in 4 weeks instead of 6 months
  • Handle 100K+ users without rewriting
  • Keep costs under $300/month initially
  • Scale to millions when needed

Stop overthinking your stack. Use what works, ship fast, iterate based on users.


About the Author: James is the founder of Orris AI. Get the complete starter template at github.com/orris-ai.

Ready to Build Your AI MVP?

Launch your AI-powered product in 4 weeks for a fixed $10K investment.

Schedule Free Consultation →