Perspectiva Studio
AI Content Creation Engine
A zero-framework content creation studio of 85K+ lines, written entirely in vanilla JS. It combines a Blog Studio with a 4-phase AI pipeline, Audiobook narration with 4 TTS providers, Super Chat with 14 function-calling tools, Gemini Live with 8 voices, a Video Studio with 4 AI motion providers, Publications for 6 social platforms, and client-side vector search.
Year
2024 — Present
Role
Full-Stack Developer
Tech Stack
9 technologies
The Challenge
Content creation across audio, text, image, video and PDF lives in separate tools with no shared context. Each format demands its own provider integrations, storage, and publishing pipeline.
- Format silos — blog, audiobook, video and PDF tools can't share session data or AI analysis
- Provider lock-in — switching between 10+ AI providers (LLM, TTS, STT, image, video) requires rewriting integrations
- No offline workflow — cloud-only tools break when connectivity drops mid-session
- Context lost between sessions — transcription, analysis and coaching insights aren't searchable after the fact
The Approach
Build a zero-framework monolith (85K+ lines of vanilla JS, IIFE module pattern) that manages the full content lifecycle — ideation → writing → illustration → narration → publishing — with AI at every step and a centralized Model Registry routing to any provider.
- Single HTML entry point — 14,615-line index.html dynamically loads 20+ JS modules with no build tools
- Model Registry — 7 AI categories (LLM, Image, TTS, STT, Embedding, Video, Other), 10+ providers, transparent provider switching via a single callAI() abstraction
- Hybrid persistence — IndexedDB for instant local access + R2 cloud sync; app works fully offline
- SessionTools pattern — identical function-calling interface shared between text chat (10 tools) and Super Chat (14 tools), so both text and voice AI execute the same operations
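The Model Registry idea above can be illustrated with a minimal sketch. It assumes a registry keyed by category that holds one active provider per category. The names `ModelRegistry`, `register`, `setActive`, `providerA` and `providerB`, and the `callAI(category, payload)` signature, are hypothetical. Only the existence of a single `callAI()` abstraction comes from the project description, and the stub handlers stand in for real network calls.

```javascript
// Hypothetical sketch of a category-based model registry (IIFE module style).
// The internal shape and all provider names are assumptions for illustration.
const ModelRegistry = (function () {
  // category -> { active: providerName, providers: { providerName: handler } }
  const categories = {};

  // Register a handler; the first provider registered for a category becomes active.
  function register(category, provider, handler) {
    if (!categories[category]) {
      categories[category] = { active: provider, providers: {} };
    }
    categories[category].providers[provider] = handler;
  }

  // Switch the active provider without touching any calling code.
  function setActive(category, provider) {
    const entry = categories[category];
    if (!entry || !entry.providers[provider]) {
      throw new Error(`Unknown provider "${provider}" for category "${category}"`);
    }
    entry.active = provider;
  }

  // Single entry point: callers name a category, never a provider.
  async function callAI(category, payload) {
    const entry = categories[category];
    if (!entry) {
      throw new Error(`No providers registered for category "${category}"`);
    }
    return entry.providers[entry.active](payload);
  }

  return { register, setActive, callAI };
})();

// Stub handlers standing in for real provider integrations.
ModelRegistry.register("LLM", "providerA", async ({ prompt }) => `A:${prompt}`);
ModelRegistry.register("LLM", "providerB", async ({ prompt }) => `B:${prompt}`);
```

With this shape, a module calls `ModelRegistry.callAI("LLM", { prompt })` and keeps working unchanged after `ModelRegistry.setActive("LLM", "providerB")`, which is the kind of transparent switching the registry is meant to provide.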
The Solution
Perspectiva Studio ships 12 integrated modules plus 9 backend endpoints (~2,920 lines of Cloudflare Workers code):
- Blog Studio — 4-phase AI pipeline (analyze → generate → assemble → auto-images), 8 section types, 4 writing tones, full SEO suite with Schema.org JSON-LD and RSS
- Audiobook Studio — 4 TTS providers (Browser, gTTS, MeloTTS, ElevenLabs with cloned voices), page-by-page narration, visual temperature system (literal → metaphorical)
- Super Chat — 5 AI providers, 14 function-calling tools (session introspection + web search), 7 purpose modes, creator detection with 17 regex patterns
- Gemini Live — WebSocket bidirectional voice with 8 voices, 10 session tools via function calling, dynamic system instructions from live session context
- Video Studio — 4 motion tiers (static → Ken Burns → parallax 3D → AI video), 4 AI providers (Luma, Runway, Kling, Haiper), 4 aspect ratios
- Publications Studio — 6 social platforms (Instagram, IG Story, X, Facebook, LinkedIn, TikTok) with platform-specific tone, character limits and hashtag optimization
- Embeddings Search — client-side vector store (IndexedDB), Gemini text-embedding-004, cosine similarity, 1K-char chunks with 100-char overlap
- AI Coach — 5 trigger patterns with 5-second debounce, real-time floating suggestions during live sessions
- Creative Library — dual persistence (IndexedDB + R2), 7 asset types (blogs, audiobooks, PDFs, images, publications, videos, audios)
- Model Tester — batch health checks per AI category, per-model parameter tuning via config modal
- Cost Viewer — per-perspectiva, per-category, per-provider and per-model cost tracking with daily trend charts
- CYOA Engine — graph-based interactive fiction with inventory, flags, stats and 4 ending types
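The function-calling tools that Super Chat and Gemini Live expose can be pictured as one shared module that both surfaces call into. The sketch below is an assumption-laden illustration: the tool names (`get_session_title`, `search_transcript`), the in-memory `session` object, and the `declarations`/`execute` methods are all hypothetical, and the real session shape is not shown in this write-up.

```javascript
// Hypothetical sketch of a shared tool module (IIFE style).
// Tool names, the session shape, and method names are assumptions.
const SessionTools = (function () {
  // Stand-in for live session state.
  const session = { title: "Demo session", transcript: ["hello", "world"] };

  const tools = {
    get_session_title: {
      description: "Return the title of the current session",
      parameters: { type: "object", properties: {} },
      run: () => ({ title: session.title }),
    },
    search_transcript: {
      description: "Find transcript lines containing a query string",
      parameters: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
      run: ({ query }) => ({
        matches: session.transcript.filter((line) => line.includes(query)),
      }),
    },
  };

  // Build the declarations a chat surface would send to the model.
  // Each surface can expose a different subset of the same tools.
  function declarations(names) {
    return names.map((name) => ({
      name,
      description: tools[name].description,
      parameters: tools[name].parameters,
    }));
  }

  // Execute a function call emitted by the model, whatever surface it came from.
  function execute(call) {
    const tool = tools[call.name];
    if (!tool) {
      return { error: `Unknown tool: ${call.name}` };
    }
    return tool.run(call.args || {});
  }

  return { declarations, execute };
})();
```

Because the text and voice surfaces would both route model-issued calls through `SessionTools.execute`, a tool added once becomes available to either surface by listing its name in that surface's `declarations` subset.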
Key Results
- 85K+ lines of zero-framework vanilla JS (IIFE modules, no build tools)
- Blog Studio: 4-phase AI pipeline, 8 section types, auto-SEO with Schema.org + RSS
- Audiobook: 4 TTS providers including ElevenLabs cloned voices
- Super Chat: 14 function-calling tools across 5 AI providers
- Gemini Live: 8 voices + 10 session tools via WebSocket
- Video Studio: 4 AI motion providers (Luma, Runway, Kling, Haiper)
- Publications: 6 social platforms with platform-specific optimization
- Client-side vector search: embeddings + cosine similarity in IndexedDB
- 9 backend endpoints on Cloudflare Workers (~2,920 lines)
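The client-side vector search can be sketched in a few functions: split text into 1,000-character chunks with a 100-character overlap, then rank stored vectors by cosine similarity against a query vector. This is a minimal illustration and not the project's code. The function names and the `{ id, vector }` record shape are assumptions. Plain arrays stand in for embeddings that the app would obtain from the embedding model and keep in IndexedDB.

```javascript
// Split text into fixed-size chunks where each chunk repeats the last
// `overlap` characters of the previous one.
function chunkText(text, size = 1000, overlap = 100) {
  const step = size - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once this chunk has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Cosine similarity of two equal-length numeric vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank records (hypothetical shape: { id, vector }) against a query vector
// and return the topK highest-scoring entries.
function searchVectors(queryVector, records, topK = 3) {
  return records
    .map((record) => ({
      id: record.id,
      score: cosineSimilarity(queryVector, record.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A brute-force scan like this is linear in the number of stored chunks, which is usually acceptable for a per-user store held in the browser. The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.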