Perspectiva Studio: 19,000 Lines of Vanilla JS That Create Audiobooks, Blogs, and AI Coach Sessions
Gonzalo Monzón
Founder & Lead Architect
What if one editor could produce narrated audiobooks, illustrated blog posts, professional PDFs, and interactive AI coaching sessions — all from the same interface? Perspectiva Studio is exactly that: a 19,000+ line Vanilla JS content engine that orchestrates 10+ AI models across text, image, voice, and search to create publish-ready content at the push of a button.
No React, no build step, no npm dependencies. Just raw JS, CSS, and an absurd number of AI provider integrations. This article covers the five creation modules, the publishing pipeline, the image generation strategy across 5 providers, and the developer tools (block versioning, embeddings search, cost viewer) that make the whole thing manageable.
Blog Studio: 12 Section Types, Zero Manual Formatting
Blog Studio generates complete articles with AI. Not just paragraphs of text — structured articles with 12 distinct section types: paragraph, list, blockquote, code block, image, table, FAQ, timeline, comparison, callout, statistics, and embedded media.
The generation pipeline:
- Topic input — a title, brief, or even a one-liner idea
- AI structures the article — Claude or Gemini creates a section plan with types, ordering, key points per section
- Content generation — each section is generated with its appropriate format (tables get data, FAQs get Q&A pairs, timelines get chronological entries)
- Image generation — FLUX/DALL-E/Imagen 3 create contextual images for image sections
- SEO automation — slugs, meta descriptions, Open Graph tags, Schema.org structured data, canonical URLs — all generated automatically
- Static generation — optimized HTML with lazy-loaded images and embedded JSON-LD
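The per-type generation step above can be sketched as a dispatch table mapping section types to renderers. This is a minimal illustration, not the production code; the renderer names and section shapes are assumptions.

```javascript
// Hypothetical sketch: dispatch a section plan to per-type renderers.
// Section type names mirror the article; shapes are illustrative.
const renderers = {
  paragraph: (s) => `<p>${s.text}</p>`,
  list: (s) => `<ul>${s.items.map((i) => `<li>${i}</li>`).join("")}</ul>`,
  faq: (s) => s.pairs.map((p) => `<h3>${p.q}</h3><p>${p.a}</p>`).join(""),
};

function renderSection(section) {
  const render = renderers[section.type];
  if (!render) throw new Error(`Unknown section type: ${section.type}`);
  return render(section);
}

// Example: a tiny two-section plan from the structuring step.
const html = [
  { type: "paragraph", text: "Intro." },
  { type: "list", items: ["a", "b"] },
].map(renderSection).join("\n");
```

With all 12 types registered this way, adding a new section type is one entry in the table rather than a change to the pipeline.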
The output publishes simultaneously to multiple destinations: cadences.app, Codex storefronts (client white-label sites), RSS feeds, and sitemaps. One click, multiple platforms.
Audiobook Studio: Professional Narration Pipeline
This is where things get genuinely impressive. Audiobook Studio converts long-form text into professionally narrated audiobooks using ElevenLabs:
Text → Intelligent Segmentation → TTS (ElevenLabs) → Audio chunks
→ Continuous narration → Metadata → Publication
Key capabilities:
| Feature | Detail |
|---|---|
| 15+ voices | ElevenLabs Multilingual v2 and Turbo v2.5 — different personalities, accents, tones |
| Intelligent chunking | Automatic chapter segmentation for long texts — respects paragraph boundaries and natural breaks |
| Prosody control | Pauses, emphasis, speed adjustable per section |
| Cloud storage | Audio stored in R2 with signed URLs — no public exposure |
| Publication | Available on Codex storefront as a purchasable audiobook or free content |
The chunking system is critical. A 50,000-word book can't be sent to ElevenLabs in one API call — it needs to be segmented into manageable chunks (typically 3,000-5,000 characters) while respecting natural boundary points. The system stitches the audio chunks into seamless continuous narration with consistent voice throughout.
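The boundary-aware packing can be sketched like this; it is an assumed reconstruction of the logic, not the actual studio code. It splits on paragraph breaks, then packs whole paragraphs into chunks under a character budget (a paragraph longer than the budget would additionally need sentence-level splitting, omitted here).

```javascript
// Sketch of boundary-aware chunking: pack whole paragraphs into
// chunks that stay under the per-request character budget.
function chunkText(text, maxChars = 4500) {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const p of paragraphs) {
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);   // budget reached: start a new chunk
      current = p;
    } else {
      current = current ? `${current}\n\n${p}` : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```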
AI Coach: Real-Time Interactive Sessions
AI Coach goes beyond text — it's a bidirectional voice conversation with an AI that has full context of your content and organization:
- You speak, the coach responds with voice — real conversation, not typing
- Live transcription — ElevenLabs Scribe diarizes the conversation (who said what)
- Persistent context — the coach remembers the entire session, building on previous exchanges
- Function calling — mid-conversation, the coach can execute actions: create content, search data, generate images
The Gemini Live integration takes this further with 8 distinct voices, bidirectional audio streaming, and real-time function calling — Gemini can trigger tools while speaking to you.
Use cases: content brainstorming sessions, editorial review, strategy discussions, data exploration guided by voice. The transcription becomes a structured artifact that can be converted into blog posts, meeting notes, or action items.
PDF Studio: Professional Document Generation
A 3-step pipeline for professional documents:
- Structure — AI defines the document skeleton: sections, hierarchy, content types
- Content — Each section populated with generated or curated content
- Format — Professional formatting: tables, headers, typography, branding (colors, logos, fonts matching the organization)
Optimized for both print and digital output. Used for client reports, proposals, and documentation that needs to look polished without manual design work.
Image Generation: 5 Providers, 1 Interface
Perspectiva Studio integrates 5 image generation providers, each with different cost-quality tradeoffs:
| Provider | Model | Cost | Quality | Best For |
|---|---|---|---|---|
| Cloudflare AI | FLUX Schnell | Free | Good | Drafts, internal content |
| Cloudflare AI | SDXL Lightning | Free | Very good | Blog posts, social media |
| OpenAI | DALL-E 3 | $0.04-0.08 | Excellent | Hero images, featured content |
| Google | Imagen 3 | $0.02-0.04 | Excellent | Photorealistic, marketing |
| Recraft | Recraft V3 | Variable | Excellent | Illustrations, brand-consistent art |
The strategy: use free Cloudflare models for iteration and drafts, then switch to premium providers (DALL-E 3 or Imagen 3) for final outputs. Most blog posts use FLUX Schnell or SDXL Lightning — free, fast, and good enough. Client-facing hero images get DALL-E 3 treatment. Total image generation cost for a typical month: under $5.
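The routing rule above can be written down as a small pure function. Provider ids here are illustrative; the real registry lives in the model dashboard.

```javascript
// Sketch of cost-aware image provider routing: free models for drafts,
// premium models only for final, client-facing outputs.
function pickImageProvider({ purpose, final = false }) {
  if (!final) return "cloudflare/flux-schnell";     // free, fast iteration
  if (purpose === "hero") return "openai/dall-e-3";  // premium, client-facing
  if (purpose === "photo") return "google/imagen-3"; // photorealistic
  return "cloudflare/sdxl-lightning";                // free, good enough
}
```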
All images are stored in R2 with signed URLs and automatically optimized for web delivery (lazy loading, responsive sizes).
The Publishing Pipeline
Content created in Perspectiva goes through an automated publishing pipeline:
Content created in Perspectiva
│
├── SEO automation
│ ├── Slug generated from title
│ ├── Meta description via AI
│ ├── Open Graph tags
│ ├── Schema.org structured data
│ └── Canonical URLs
│
├── Static generation (SSG)
│ ├── Optimized HTML
│ ├── Images with lazy loading
│ └── Embedded JSON-LD
│
└── Multi-destination
├── cadences.app/perspectiva/[slug]
├── Codex storefront (if configured)
├── RSS feed
└── Sitemap.xml
Every piece of content is SEO-ready from the moment it's created. No manual tagging, no separate SEO workflow. The AI generates meta descriptions that are actually good (because it has full context of the content), and Schema.org markup is specific to content type (Article, AudioObject, FAQPage, etc.).
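The content-type-specific markup can be sketched as a builder that picks the Schema.org `@type` from the content kind. Field names follow Schema.org; the mapping itself is an assumption about the pipeline, not the production code.

```javascript
// Sketch: content-type-specific Schema.org JSON-LD.
function buildJsonLd(content) {
  const base = {
    "@context": "https://schema.org",
    headline: content.title,
    description: content.metaDescription,
  };
  if (content.kind === "audiobook") {
    return { ...base, "@type": "AudioObject", duration: content.duration };
  }
  if (content.kind === "faq") {
    return {
      ...base,
      "@type": "FAQPage",
      mainEntity: content.faqs.map((f) => ({
        "@type": "Question",
        name: f.q,
        acceptedAnswer: { "@type": "Answer", text: f.a },
      })),
    };
  }
  return { ...base, "@type": "Article" }; // default for blog content
}
```

The resulting object is serialized and embedded in the page as a `<script type="application/ld+json">` block during static generation.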
Developer Tools: The Hidden Power
Block Versioning
Every content section has its own version history. Changed a paragraph and regret it? Roll back to any previous version with one click. Visual diff shows exactly what changed between versions. This isn't Git-level complexity — it's instant undo/redo at the block level.
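A minimal sketch of per-block history with rollback, assuming an in-memory version stack (the real implementation also stores diffs; names are illustrative):

```javascript
// Per-block version history with one-click rollback.
class BlockHistory {
  constructor(initial) {
    this.versions = [initial];
    this.index = 0;
  }
  save(content) {
    // Saving after a rollback discards the abandoned "redo" branch.
    this.versions = this.versions.slice(0, this.index + 1);
    this.versions.push(content);
    this.index = this.versions.length - 1;
  }
  rollback(toIndex) {
    if (toIndex < 0 || toIndex >= this.versions.length) {
      throw new RangeError("No such version");
    }
    this.index = toIndex;
    return this.versions[toIndex];
  }
  current() {
    return this.versions[this.index];
  }
}
```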
Embeddings Search
All content is vectorized through Cloudflare Vectorize. Instead of keyword search, you can search by meaning: "articles about cost optimization" finds content about budget management, pricing strategies, and efficiency — even if those exact words don't appear. Results ranked by contextual relevance using all-MiniLM-L6-v2 embeddings.
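The ranking principle is cosine similarity between the query embedding and each document embedding. In production the vectors come from all-MiniLM-L6-v2 and the lookup runs inside Vectorize; this standalone sketch uses toy vectors to show the math.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by semantic closeness to the query vector.
function rankByMeaning(queryVec, docs) {
  return docs
    .map((d) => ({ ...d, score: cosine(queryVec, d.vector) }))
    .sort((a, b) => b.score - a.score);
}
```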
Model Registry & Tester
A dashboard for managing every AI model integrated into the system. Quick-test any model with sample prompts, compare latency and output quality side by side, enable/disable models on the fly. When a new model launches (happens weekly now), we add it to the registry, run our test suite, and it's available across all modules within minutes.
Cost Viewer
Real-time cost tracking integrated with our AI Gateway:
- Cost breakdown by model, provider, and content category
- Historical spending trends
- Budget alerts before you hit limits
- Per-article and per-audiobook cost attribution
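The breakdowns above reduce to a group-by rollup over gateway log records. The record field names here are assumptions about the gateway's log shape, used only for illustration.

```javascript
// Sum cost per value of one dimension (e.g. "model" or "provider").
function costBreakdown(records, dimension) {
  const totals = {};
  for (const r of records) {
    const key = r[dimension] ?? "unknown";
    totals[key] = (totals[key] ?? 0) + r.costUsd;
  }
  return totals;
}
```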
The Numbers: 19,000+ Lines of Vanilla JS
| Metric | Value |
|---|---|
| perspectiva-studio.js | 19,000+ lines |
| index.html | 14,000+ lines |
| AI models integrated | 10+ |
| Blog section types | 12 |
| TTS voices | 15+ |
| Image providers | 5 |
| Dependencies | 0 |
19,000 lines of JavaScript in a single file. No modules, no imports, no bundler. Sounds insane? It works better than you'd expect. The file is organized by section (blog generation, audiobook pipeline, AI coach, image generation, publishing, tools). With modern IDE search, navigating 19K lines is faster than navigating 200 files in a typical React project.
And the deployment? The entire studio is static files served from Cloudflare Pages. No Node.js server, no Docker containers, no cold starts. Global CDN, instant loads.
AI Provider Matrix
| Function | Providers |
|---|---|
| Text Generation | Claude Sonnet 4, Gemini 2.5, GPT-4o |
| Image Generation | FLUX, SDXL, DALL-E 3, Imagen 3, Recraft V3 |
| Narration (TTS) | ElevenLabs (Multilingual v2, Turbo v2.5), Edge TTS |
| Transcription (STT) | ElevenLabs Scribe, OpenAI Whisper |
| Embeddings | Cloudflare Vectorize, all-MiniLM-L6-v2 |
Multi-provider by design. When OpenAI has an outage (happens more than you'd think), we switch to Gemini. When Gemini's image generation has policy blocks, we fall back to FLUX. Redundancy isn't a luxury — it's what keeps the content pipeline running 24/7.
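The fallback behavior amounts to trying providers in priority order until one succeeds. A hedged sketch with illustrative provider call signatures:

```javascript
// Try each provider in order; collect failures and only give up
// when the whole chain is exhausted.
async function generateWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await provider.generate(prompt);
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```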
Key Takeaways
1. A single editor for all content types is a force multiplier. Switching between tools (Canva for images, WordPress for blog, Descript for audio) kills momentum. One interface that handles text, audio, images, and interactive sessions means content creation flows instead of stalling.
2. Free-tier AI models are good enough for 80% of content. FLUX Schnell and SDXL Lightning produce images that work perfectly for blog posts and internal content. Premium models are reserved for client-facing hero images. Total image cost: under $5/month.
3. AI Coach sessions are the most underrated feature. Voice-based brainstorming with an AI that can execute actions mid-conversation (generate images, search data, create drafts) is dramatically more productive than typing prompts.
4. Block versioning makes AI-assisted editing fearless. When AI can regenerate any section, you need version history at the block level, not the document level. Try 5 different AI-generated introductions, compare them, pick the best — all without losing work.
5. 19,000 lines in one file is a feature, not a bug. When you control the entire codebase and there are zero dependencies, a single monolithic file is actually easier to maintain than a sprawling file tree. Search is instant, there are no import chains to trace, and deployment is copying files.
About the Author
Gonzalo Monzón
Founder & Lead Architect
Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
Related Articles
Why We Use 7 AI Providers (Not Just One) — And How We Track Every Cent
Vendor lock-in is a trap. Here's how our AI Gateway routes 11,200+ calls/month between Gemini, GPT-4o, Claude, DeepSeek, Groq, and more — with automatic fallback, cost tracking to the cent, and a ~$184/month total AI bill across 7 providers.
From 4-Hour Response Time to Instant: How Our AI Voice Agents Make Real Phone Calls
Twilio for calls, Gemini Flash for real-time conversation, ElevenLabs for 15+ natural voices. We built AI agents that confirm appointments in 35 seconds, qualify leads with 3 questions, and switch between Spanish, English, and Catalan mid-call. Plus: God Mode lets humans supervise and intervene live.
Synapse Studio: A 2D Virtual Office Where AI Agents Do the Real Work
We built a SimTower-style animated office where AI agents with multimodal capabilities — vision, image generation, web search, iterative image evolution — collaborate on real tasks. Zero dependencies, pure Vanilla JS, running on Cloudflare.