ChatPDF.ai
A microservices-based RAG chatbot for PDF ingestion, vector storage in Pinecone, and low-latency conversational retrieval with streaming responses.
The Problem
Searching through long PDF documents — contracts, research papers, technical manuals — is tedious. Ctrl+F only finds exact matches, not semantic meaning. I wanted a tool where you upload a PDF, ask natural-language questions, and get answers grounded in the actual document content.
The Solution
I built ChatPDF.ai as a microservices RAG (Retrieval-Augmented Generation) pipeline.
- PDF ingestion service — Extracts text, splits into overlapping chunks, generates embeddings via OpenAI, and stores vectors in Pinecone.
- Retrieval service — On each query, embeds the question, performs similarity search in Pinecone, and retrieves the top-k relevant chunks.
- Generation service — Feeds retrieved chunks as context to OpenAI with the user's question, producing grounded answers.
- Streaming React frontend — Responses stream token-by-token for a real-time chat experience.
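The four stages above can be sketched end-to-end. This is a toy illustration, not the production code: an in-memory cosine-similarity search stands in for Pinecone, and a character-frequency stub stands in for the OpenAI embedding and generation calls. All function names here are illustrative.

```python
import math

def embed(text):
    # Stub embedding: character-frequency vector over a-z.
    # The real ingestion service calls the OpenAI embeddings API instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def ingest(chunks):
    # Ingestion service: embed each chunk and store (id, vector, text).
    return [(i, embed(c), c) for i, c in enumerate(chunks)]

def retrieve(index, question, top_k=2):
    # Retrieval service: embed the question, rank chunks by similarity.
    qv = embed(question)
    ranked = sorted(index, key=lambda rec: cosine(qv, rec[1]), reverse=True)
    return [rec[2] for rec in ranked[:top_k]]

def answer(question, context_chunks):
    # Generation service: in production this prompt goes to a
    # chat-completion model along with the retrieved context.
    context = "\n".join(context_chunks)
    return f"Question: {question}\nContext:\n{context}"

index = ingest(["The lease term is 24 months.", "Rent is due on the 1st."])
top = retrieve(index, "How long is the lease term?", top_k=1)
```

Swapping the stubs for real Pinecone and OpenAI clients changes only `embed`, the storage in `ingest`/`retrieve`, and the model call in `answer`; the data flow stays the same.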
What Went Wrong
The initial chunking strategy used fixed-size splits (500 tokens), which frequently cut sentences mid-thought. Retrieved chunks often lacked the context needed to answer questions accurately, so the model filled the gaps with hallucinated details.
The fix: I switched to semantic chunking that respects paragraph and section boundaries, with a 100-token overlap between chunks to preserve cross-boundary context. This significantly improved retrieval relevance and reduced hallucination in answers.
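The overlap idea can be sketched in a few lines. This simplified version splits on blank lines as a proxy for paragraph boundaries and approximates tokens with whitespace-separated words; the real pipeline uses the model's tokenizer, and the function name is illustrative.

```python
def chunk_text(text, max_tokens=500, overlap=100):
    # Split on blank lines so paragraph boundaries are respected, then
    # pack paragraphs into chunks of up to max_tokens, carrying the last
    # `overlap` tokens into the start of the next chunk so context that
    # straddles a boundary appears in both chunks.
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and len(current) + len(para) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # cross-boundary overlap
        current.extend(para)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single paragraph longer than `max_tokens` would still need a fallback sentence-level split, omitted here for brevity.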
Results
- Low-latency retrieval via Pinecone vector similarity search
- Streaming responses for real-time chat UX
- Microservices architecture enables independent scaling of each pipeline stage
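The streaming behavior in the list above boils down to forwarding tokens to the client as they arrive rather than buffering the full completion. A minimal generator-based sketch: the real service relays the model's streamed chunks to the React frontend (e.g. over SSE or a websocket), and `fake_model_stream` is a stand-in for that streamed response.

```python
def fake_model_stream(answer):
    # Stand-in for a streamed chat-completion response: yields one
    # token at a time, the way streaming APIs deliver partial chunks.
    for token in answer.split():
        yield token + " "

def stream_response(answer, sink):
    # Forward each token to the client as soon as it arrives,
    # instead of waiting for the complete answer.
    for token in fake_model_stream(answer):
        sink(token)

received = []
stream_response("The lease term is 24 months.", received.append)
```

Because the frontend renders each token on arrival, perceived latency is the time to the first token, not to the full answer.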
Interested in working together?
Let's Talk