Perspectiva Studio

AI Content Creation Engine

An 85K+-line, zero-framework content creation studio: Blog Studio with a 4-phase AI pipeline, Audiobook narration with 4 TTS providers, Super Chat with 14 function-calling tools, Gemini Live with 8 voices, Video Studio with 4 AI motion providers, Publications for 6 social platforms, and client-side vector search, all in vanilla JS.

Year

2024 — Present

Role

Full-Stack Developer

Tech Stack

9 technologies

The Challenge

Content creation across audio, text, image, video and PDF lives in separate tools with no shared context. Each format demands its own provider integrations, storage, and publishing pipeline.

Format silos — blog, audiobook, video and PDF tools can't share session data or AI analysis
Provider lock-in — switching between 10+ AI providers (LLM, TTS, STT, image, video) requires rewriting integrations
No offline workflow — cloud-only tools break when connectivity drops mid-session
Context lost between sessions — transcription, analysis and coaching insights aren't searchable after the fact

The Approach

Build a zero-framework monolith (85K+ lines of vanilla JS, IIFE module pattern) that manages the full content lifecycle — ideation → writing → illustration → narration → publishing — with AI at every step and a centralized Model Registry routing to any provider.

Single HTML entry point — 14,615-line index.html dynamically loads 20+ JS modules with no build tools
Model Registry — 7 AI categories (LLM, Image, TTS, STT, Embedding, Video, Other), 10+ providers, transparent provider switching via a single callAI() abstraction
Hybrid persistence — IndexedDB for instant local access + R2 cloud sync; app works fully offline
SessionTools pattern — identical function-calling interface shared between text chat (10 tools) and Super Chat (14 tools), so both text and voice AI execute the same operations
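
As a rough illustration of the Model Registry idea, the sketch below shows how a single callAI() entry point might route a request to whichever provider is active for a category. Only the name callAI() and the category label "LLM" come from the description above; the registry shape, the register() and setActive() helpers, and the provider names providerA and providerB are assumptions made for this example.

```javascript
// Hypothetical sketch: registry shape, helper names and providers are
// illustrative assumptions, not the project's actual code.
const ModelRegistry = (function () {
  // category -> { active: providerName, providers: { name: adapterFn } }
  const categories = {};

  // The first provider registered for a category becomes its default.
  function register(category, provider, adapter) {
    if (!categories[category]) {
      categories[category] = { active: provider, providers: {} };
    }
    categories[category].providers[provider] = adapter;
  }

  function setActive(category, provider) {
    const entry = categories[category];
    if (!entry || !entry.providers[provider]) {
      throw new Error(`Unknown provider "${provider}" for "${category}"`);
    }
    entry.active = provider;
  }

  // Single entry point: callers name a category, never a provider.
  async function callAI(category, payload) {
    const entry = categories[category];
    if (!entry) throw new Error(`No providers registered for "${category}"`);
    return entry.providers[entry.active](payload);
  }

  return { register, setActive, callAI };
})();

// Stub adapters stand in for real HTTP calls to each provider.
ModelRegistry.register("LLM", "providerA", async ({ prompt }) => `A:${prompt}`);
ModelRegistry.register("LLM", "providerB", async ({ prompt }) => `B:${prompt}`);
```

With this shape, calling code such as ModelRegistry.callAI("LLM", { prompt }) stays unchanged when setActive("LLM", "providerB") swaps the backend, which is the kind of transparent switching the registry is meant to provide.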

The Solution

Perspectiva Studio ships 12 integrated modules plus 9 backend endpoints (~2,920 lines of Cloudflare Workers):

Blog Studio — 4-phase AI pipeline (analyze → generate → assemble → auto-images), 8 section types, 4 writing tones, full SEO suite with Schema.org JSON-LD and RSS
Audiobook Studio — 4 TTS providers (Browser, gTTS, MeloTTS, ElevenLabs with cloned voices), page-by-page narration, visual temperature system (literal → metaphorical)
Super Chat — 5 AI providers, 14 function-calling tools (session introspection + web search), 7 purpose modes, creator detection with 17 regex patterns
Gemini Live — WebSocket bidirectional voice with 8 voices, 10 session tools via function calling, dynamic system instructions from live session context
Video Studio — 4 motion tiers (static → Ken Burns → parallax 3D → AI video), 4 AI providers (Luma, Runway, Kling, Haiper), 4 aspect ratios
Publications Studio — 6 social platforms (Instagram, IG Story, X, Facebook, LinkedIn, TikTok) with platform-specific tone, character limits and hashtag optimization
Embeddings Search — client-side vector store (IndexedDB), Gemini text-embedding-004, cosine similarity, 1K-char chunks with 100-char overlap
AI Coach — 5 trigger patterns with 5-second debounce, real-time floating suggestions during live sessions
Creative Library — dual persistence (IndexedDB + R2), 7 asset types (blogs, audiobooks, PDFs, images, publications, videos, audios)
Model Tester — batch health checks per AI category, per-model parameter tuning via config modal
Cost Viewer — per-perspectiva, per-category, per-provider and per-model cost tracking with daily trend charts
CYOA Engine — graph-based interactive fiction with inventory, flags, stats and 4 ending types
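
The Embeddings Search numbers above (1K-character chunks, 100-character overlap, cosine similarity) can be sketched as follows. This is a minimal in-memory version under stated assumptions: the function names chunkText, cosineSimilarity and search are hypothetical, the records array stands in for the IndexedDB store, and the vectors would in practice come from the embedding model rather than being written by hand.

```javascript
// Hypothetical sketch: function names and record shape are assumptions.

// Split text into fixed-size chunks whose start offsets are
// (size - overlap) characters apart, so neighbours share `overlap` chars.
function chunkText(text, size = 1000, overlap = 100) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); 0 for a zero-length vector.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored records ({ id, vector }) against a query vector,
// highest similarity first.
function search(queryVector, records, topK = 3) {
  return records
    .map((r) => ({ ...r, score: cosineSimilarity(queryVector, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A brute-force scan like this is linear in the number of stored chunks, which is usually acceptable for a per-user, client-side store; a larger corpus would likely need an approximate index instead.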

Key Results

85K+ lines of zero-framework vanilla JS (IIFE modules, no build tools)
Blog Studio: 4-phase AI pipeline, 8 section types, auto-SEO with Schema.org + RSS
Audiobook: 4 TTS providers including ElevenLabs cloned voices
Super Chat: 14 function-calling tools across 5 AI providers
Gemini Live: 8 voices + 10 session tools via WebSocket
Video Studio: 4 AI motion providers (Luma, Runway, Kling, Haiper)
Publications: 6 social platforms with platform-specific optimization
Client-side vector search: embeddings + cosine similarity in IndexedDB
9 backend endpoints on Cloudflare Workers (~2,920 lines)
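
The shared function-calling interface behind the Super Chat and Gemini Live results could be sketched as one tool table that each chat surface exposes a subset of. Everything here is an assumption for illustration: the tool names, the session shape, and the define(), declarations() and execute() helpers are hypothetical, and only the idea of one interface serving both text and voice comes from the description above.

```javascript
// Hypothetical sketch: tool names, session shape and helpers are
// illustrative assumptions, not the project's actual code.
const SessionTools = (function () {
  const tools = {};

  function define(name, description, handler) {
    tools[name] = { name, description, handler };
  }

  // Provider-neutral declarations; each surface would translate these
  // into its own function-calling format before sending them to a model.
  function declarations(names) {
    return names.map((n) => ({
      name: tools[n].name,
      description: tools[n].description,
    }));
  }

  // Run a model-issued call. Failures are returned as data so the
  // model can see the error and recover instead of the app throwing.
  function execute(session, call) {
    const tool = tools[call.name];
    if (!tool) return { error: `Unknown tool: ${call.name}` };
    try {
      return { result: tool.handler(session, call.args || {}) };
    } catch (err) {
      return { error: err.message };
    }
  }

  return { define, declarations, execute };
})();

SessionTools.define("get_transcript", "Return the session transcript",
  (session) => session.transcript);
SessionTools.define("count_words", "Count words in the transcript",
  (session) => session.transcript.split(/\s+/).filter(Boolean).length);
SessionTools.define("find_mentions", "Count occurrences of a term",
  (session, { term }) => session.transcript.split(term).length - 1);

// Two surfaces share handlers but expose different subsets.
const VOICE_TOOLS = ["get_transcript", "count_words"];
const SUPER_CHAT_TOOLS = [...VOICE_TOOLS, "find_mentions"];
```

Because both surfaces dispatch through the same execute() function, a tool fixed or added once behaves identically whether the call originated from typed chat or from a voice session.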

Tech Stack

Vanilla JS (85K+ lines), ElevenLabs, FLUX, Imagen 3, DALL-E 3, R2, Gemini, IndexedDB, WebSocket

$ cat project.json
{
  "name": "Perspectiva Studio",
  "status": "production",
  "stack": [9],
  "results": [9]
}
