ChatPDF.ai
A microservices-based RAG chatbot for PDF ingestion, vector storage in Pinecone, and low-latency conversational retrieval with streaming responses.
The Problem
Searching through long PDF documents — contracts, research papers, technical manuals — is tedious. Ctrl+F only finds exact matches, not semantic meaning. I wanted a tool where you upload a PDF, ask natural-language questions, and get answers grounded in the actual document content.
The Solution
I built ChatPDF.ai as a microservices RAG (Retrieval-Augmented Generation) pipeline.
- PDF ingestion service — Extracts text, splits into overlapping chunks, generates embeddings via OpenAI, and stores vectors in Pinecone.
- Retrieval service — On each query, embeds the question, performs similarity search in Pinecone, and retrieves the top-k relevant chunks.
- Generation service — Feeds retrieved chunks as context to OpenAI with the user's question, producing grounded answers.
- Streaming React frontend — Responses stream token-by-token for a real-time chat experience.
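The four stages above can be sketched end-to-end. This is a toy illustration, not the production code: an in-memory cosine-similarity search stands in for Pinecone, and a character-frequency stub stands in for the OpenAI embedding and generation calls. All function names here are illustrative.

```python
import math

def embed(text):
    # Stub embedding: character-frequency vector over a-z.
    # The real ingestion service calls the OpenAI embeddings API instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def ingest(chunks):
    # Ingestion service: embed each chunk and store (id, vector, text).
    return [(i, embed(c), c) for i, c in enumerate(chunks)]

def retrieve(index, question, top_k=2):
    # Retrieval service: embed the question, rank chunks by similarity.
    qv = embed(question)
    ranked = sorted(index, key=lambda rec: cosine(qv, rec[1]), reverse=True)
    return [rec[2] for rec in ranked[:top_k]]

def answer(question, context_chunks):
    # Generation service: in production this prompt goes to a
    # chat-completion model along with the retrieved context.
    context = "\n".join(context_chunks)
    return f"Question: {question}\nContext:\n{context}"

index = ingest(["The lease term is 24 months.", "Rent is due on the 1st."])
top = retrieve(index, "How long is the lease term?", top_k=1)
```

Swapping the stubs for real Pinecone and OpenAI clients changes only `embed`, the storage in `ingest`/`retrieve`, and the model call in `answer`; the data flow stays the same.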
What Went Wrong
The initial chunking strategy used fixed-size splits (500 tokens), which frequently cut sentences mid-thought. Retrieved chunks often lacked the context needed to answer questions accurately, so the model filled the gaps with hallucinated details.
The fix: I switched to semantic chunking that respects paragraph and section boundaries, with a 100-token overlap between chunks to preserve cross-boundary context. This significantly improved retrieval relevance and reduced hallucination in answers.
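The overlap idea can be sketched in a few lines. This simplified version splits on blank lines as a proxy for paragraph boundaries and approximates tokens with whitespace-separated words; the real pipeline uses the model's tokenizer, and the function name is illustrative.

```python
def chunk_text(text, max_tokens=500, overlap=100):
    # Split on blank lines so paragraph boundaries are respected, then
    # pack paragraphs into chunks of up to max_tokens, carrying the last
    # `overlap` tokens into the start of the next chunk so context that
    # straddles a boundary appears in both chunks.
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and len(current) + len(para) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # cross-boundary overlap
        current.extend(para)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single paragraph longer than `max_tokens` would still need a fallback sentence-level split, omitted here for brevity.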
Results
- Low-latency retrieval via Pinecone vector similarity search
- Streaming responses for real-time chat UX
- Microservices architecture enables independent scaling of each pipeline stage
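The streaming behavior in the list above boils down to forwarding tokens to the client as they arrive rather than buffering the full completion. A minimal generator-based sketch: the real service relays the model's streamed chunks to the React frontend (e.g. over SSE or a websocket), and `fake_model_stream` is a stand-in for that streamed response.

```python
def fake_model_stream(answer):
    # Stand-in for a streamed chat-completion response: yields one
    # token at a time, the way streaming APIs deliver partial chunks.
    for token in answer.split():
        yield token + " "

def stream_response(answer, sink):
    # Forward each token to the client as soon as it arrives,
    # instead of waiting for the complete answer.
    for token in fake_model_stream(answer):
        sink(token)

received = []
stream_response("The lease term is 24 months.", received.append)
```

Because the frontend renders each token on arrival, perceived latency is the time to the first token, not to the full answer.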
Interested in working together?
Let's Talk