DocsPodcast: AI Document to Podcast

2025
AI/ML • RAG • TTS
Full-Stack AI Application

Transform any document into an AI-generated podcast, or chat with your files through document-grounded Q&A. Supports PDF, DOCX, and TXT, with a dual LLM architecture, multi-speaker TTS, and a premium dark-themed interface with glassmorphism effects.

Featured RAG TTS LLM Podcast

About This Project

DocsPodcast is a full-stack AI application that takes your documents (PDF, DOCX, TXT) and turns them into two powerful experiences: an engaging multi-speaker podcast with natural-sounding voices, and an intelligent Q&A chat system that answers questions grounded in your document content.

The system uses a Dual LLM Architecture with OpenRouter (GPT-4o-mini) as the primary provider and Google Gemini 1.5 Flash as the fallback, so requests can still be served when the primary is unavailable. Documents are chunked and stored in a ChromaDB vector database for semantic retrieval, so answers draw only on passages retrieved from the uploaded content.
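
The primary-then-fallback behaviour can be pictured as a simple provider chain. The sketch below is illustrative only and is not taken from the project's code: `generate_with_fallback`, `AllProvidersFailed`, and the two stub functions are hypothetical names, and the stubs stand in for the real OpenRouter and Gemini clients so the example runs without network access.

```python
from typing import Callable, Sequence

# A provider is any callable that takes a prompt and returns the model's text.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def generate_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each (name, provider) pair in order; return (name, text) from the first success."""
    errors: list[str] = []
    for name, provider in providers:
        try:
            text = provider(prompt)
            if text.strip():  # treat an empty reply as a failure
                return name, text
            errors.append(f"{name}: empty response")
        except Exception as exc:  # e.g. timeouts, rate limits, network errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for the real API clients (assumptions, not project code).
def openrouter_stub(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def gemini_stub(prompt: str) -> str:
    return f"answer to: {prompt}"


if __name__ == "__main__":
    chain = [("openrouter", openrouter_stub), ("gemini", gemini_stub)]
    print(generate_with_fallback("Summarize the document", chain))
```

Keeping the chain as an ordered list makes the priority explicit and lets a further provider be added without touching the retry logic.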

For podcast generation, the system creates multi-speaker scripts with 2-4 speakers, then uses a tiered TTS system: Gemini 2.5 Flash TTS with multiple voice characters (Kore, Puck, Charon, Fenrir) as primary, with Google gTTS as fallback. Individual speaker segments are generated separately and combined into one seamless audio file.
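
A rough sketch of the per-speaker flow described above: split the script into speaker segments, give each speaker a voice, then join the resulting clips. Several things here are assumptions rather than project details: the `Speaker: line` script format, the round-robin voice assignment, and the use of WAV clips that all share one format. `silent_wav` is a hypothetical placeholder for the actual TTS call, and real fallback output (gTTS produces MP3) would need converting before it could be joined this way.

```python
import io
import wave

VOICES = ["Kore", "Puck", "Charon", "Fenrir"]  # voice names listed above


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split 'Speaker: line' text into (speaker, line) pairs (assumed format)."""
    segments = []
    for raw in script.splitlines():
        speaker, sep, text = raw.partition(":")
        if sep and speaker.strip() and text.strip():
            segments.append((speaker.strip(), text.strip()))
    return segments


def assign_voices(segments: list[tuple[str, str]]) -> dict[str, str]:
    """Give each distinct speaker one voice, in order of first appearance."""
    mapping: dict[str, str] = {}
    for speaker, _ in segments:
        if speaker not in mapping:
            mapping[speaker] = VOICES[len(mapping) % len(VOICES)]
    return mapping


def silent_wav(seconds: float, rate: int = 24000) -> bytes:
    """Placeholder for a TTS call: returns mono 16-bit silence as WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def combine_wavs(clips: list[bytes]) -> bytes:
    """Concatenate WAV clips that share channels, sample width, and rate."""
    if not clips:
        raise ValueError("no clips to combine")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        fmt = None
        for clip in clips:
            with wave.open(io.BytesIO(clip), "rb") as reader:
                clip_fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
                if fmt is None:
                    fmt = clip_fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif clip_fmt != fmt:
                    raise ValueError("all clips must share one audio format")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()


if __name__ == "__main__":
    segments = parse_script("Host: Welcome to the show.\nGuest: Glad to be here.")
    voices = assign_voices(segments)
    clips = [silent_wav(0.5) for _ in segments]  # one clip per segment
    print(voices, len(combine_wavs(clips)), "bytes")
```

Generating one clip per segment keeps each voice isolated, so a failed segment can be retried or sent to the fallback engine without regenerating the whole episode.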

Key Features

Dual LLM Architecture: OpenRouter + Gemini fallback for high availability
RAG Pipeline with ChromaDB for semantic document retrieval
Multi-speaker podcast generation with 2-4 natural voices
Tiered TTS: Gemini 2.5 Flash TTS → gTTS fallback
English and Arabic podcast generation support
Premium dark theme with glassmorphism and micro-animations
Drag & drop upload with a 0.5 MB file size limit
Rich markdown rendering in Q&A chat with typing animation
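
Two of the features above, upload validation and the chunking step that feeds the RAG pipeline, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `validate_upload` and `chunk_text` are hypothetical names, 0.5 MB is read as 512 KiB, and the chunk size and overlap are arbitrary example values. Embedding the chunks and storing them in ChromaDB is left out.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 512 * 1024  # assumption: 0.5 MB interpreted as 512 KiB
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}  # formats named in the feature list


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last window always adds new characters.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


if __name__ == "__main__":
    validate_upload("report.pdf", 200_000)  # passes silently
    chunks = chunk_text("lorem ipsum " * 100)
    print(len(chunks), "chunks")
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality.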