Perspectiva Studio: 19,000 Lines of Vanilla JS That Create Audiobooks, Blogs, and AI Coach Sessions
Gonzalo Monzón
Founder & Lead Architect
What if one editor could produce narrated audiobooks, illustrated blog posts, professional PDFs, and interactive AI coaching sessions — all from the same interface? Perspectiva Studio is exactly that: a 19,000+ line Vanilla JS content engine that orchestrates 10+ AI models across text, image, voice, and search to create publish-ready content at the push of a button.
No React, no build step, no npm dependencies. Just raw JS, CSS, and an absurd number of AI provider integrations. This article covers the five creation modules, the publishing pipeline, the image generation strategy across 5 providers, and the developer tools (block versioning, embeddings search, cost viewer) that make the whole thing manageable.
Blog Studio: 12 Section Types, Zero Manual Formatting
Blog Studio generates complete articles with AI. Not just paragraphs of text — structured articles with 12 distinct section types: paragraph, list, blockquote, code block, image, table, FAQ, timeline, comparison, callout, statistics, and embedded media.
The generation pipeline:
- Topic input — a title, brief, or even a one-liner idea
- AI structures the article — Claude or Gemini creates a section plan with types, ordering, key points per section
- Content generation — each section is generated with its appropriate format (tables get data, FAQs get Q&A pairs, timelines get chronological entries)
- Image generation — FLUX/DALL-E/Imagen 3 create contextual images for image sections
- SEO automation — slugs, meta descriptions, Open Graph tags, Schema.org structured data, canonical URLs — all generated automatically
- Static generation — optimized HTML with lazy-loaded images and embedded JSON-LD
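The per-type generation step above can be sketched as a dispatch table mapping section types to renderers. This is a minimal illustration, not the production code; the renderer names and section shapes are assumptions.

```javascript
// Hypothetical sketch: dispatch a section plan to per-type renderers.
// Section type names mirror the article; shapes are illustrative.
const renderers = {
  paragraph: (s) => `<p>${s.text}</p>`,
  list: (s) => `<ul>${s.items.map((i) => `<li>${i}</li>`).join("")}</ul>`,
  faq: (s) => s.pairs.map((p) => `<h3>${p.q}</h3><p>${p.a}</p>`).join(""),
};

function renderSection(section) {
  const render = renderers[section.type];
  if (!render) throw new Error(`Unknown section type: ${section.type}`);
  return render(section);
}

// Example: a tiny two-section plan from the structuring step.
const html = [
  { type: "paragraph", text: "Intro." },
  { type: "list", items: ["a", "b"] },
].map(renderSection).join("\n");
```

With all 12 types registered this way, adding a new section type is one entry in the table rather than a change to the pipeline.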
The output publishes simultaneously to multiple destinations: cadences.app, Codex storefronts (client white-label sites), RSS feeds, and sitemaps. One click, multiple platforms.
Audiobook Studio: Professional Narration Pipeline
This is where things get genuinely impressive. Audiobook Studio converts long-form text into professionally narrated audiobooks using ElevenLabs:
Text → Intelligent Segmentation → TTS (ElevenLabs) → Audio chunks
→ Continuous narration → Metadata → Publication
Key capabilities:
| Feature | Detail |
|---|---|
| 15+ voices | ElevenLabs Multilingual v2 and Turbo v2.5 — different personalities, accents, tones |
| Intelligent chunking | Automatic chapter segmentation for long texts — respects paragraph boundaries and natural breaks |
| Prosody control | Pauses, emphasis, speed adjustable per section |
| Cloud storage | Audio stored in R2 with signed URLs — no public exposure |
| Publication | Available on Codex storefront as a purchasable audiobook or free content |
The chunking system is critical. A 50,000-word book can't be sent to ElevenLabs in one API call — it needs to be segmented into manageable chunks (typically 3,000-5,000 characters) while respecting natural boundary points. The system stitches the audio chunks into seamless continuous narration with consistent voice throughout.
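The boundary-aware packing can be sketched like this; it is an assumed reconstruction of the logic, not the actual studio code. It splits on paragraph breaks, then packs whole paragraphs into chunks under a character budget (a paragraph longer than the budget would additionally need sentence-level splitting, omitted here).

```javascript
// Sketch of boundary-aware chunking: pack whole paragraphs into
// chunks that stay under the per-request character budget.
function chunkText(text, maxChars = 4500) {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const p of paragraphs) {
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);   // budget reached: start a new chunk
      current = p;
    } else {
      current = current ? `${current}\n\n${p}` : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```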
AI Coach: Real-Time Interactive Sessions
AI Coach goes beyond text — it's a bidirectional voice conversation with an AI that has full context of your content and organization:
- You speak, the coach responds with voice — real conversation, not typing
- Live transcription — ElevenLabs Scribe diarizes the conversation (who said what)
- Persistent context — the coach remembers the entire session, building on previous exchanges
- Function calling — mid-conversation, the coach can execute actions: create content, search data, generate images
The Gemini Live integration takes this further with 8 distinct voices, bidirectional audio streaming, and real-time function calling — Gemini can trigger tools while speaking to you.
Use cases: content brainstorming sessions, editorial review, strategy discussions, data exploration guided by voice. The transcription becomes a structured artifact that can be converted into blog posts, meeting notes, or action items.
PDF Studio: Professional Document Generation
A 3-step pipeline for professional documents:
- Structure — AI defines the document skeleton: sections, hierarchy, content types
- Content — Each section populated with generated or curated content
- Format — Professional formatting: tables, headers, typography, branding (colors, logos, fonts matching the organization)
Optimized for both print and digital output. Used for client reports, proposals, and documentation that needs to look polished without manual design work.
Image Generation: 5 Providers, 1 Interface
Perspectiva Studio integrates 5 image generation providers, each with different cost-quality tradeoffs:
| Provider | Model | Cost | Quality | Best For |
|---|---|---|---|---|
| Cloudflare AI | FLUX Schnell | Free | Good | Drafts, internal content |
| Cloudflare AI | SDXL Lightning | Free | Very good | Blog posts, social media |
| OpenAI | DALL-E 3 | $0.04-0.08 | Excellent | Hero images, featured content |
| Google | Imagen 3 | $0.02-0.04 | Excellent | Photorealistic, marketing |
| Recraft | Recraft V3 | Variable | Excellent | Illustrations, brand-consistent art |
The strategy: use free Cloudflare models for iteration and drafts, then switch to premium providers (DALL-E 3 or Imagen 3) for final outputs. Most blog posts use FLUX Schnell or SDXL Lightning — free, fast, and good enough. Client-facing hero images get DALL-E 3 treatment. Total image generation cost for a typical month: under $5.
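The routing rule above can be written down as a small pure function. Provider ids here are illustrative; the real registry lives in the model dashboard.

```javascript
// Sketch of cost-aware image provider routing: free models for drafts,
// premium models only for final, client-facing outputs.
function pickImageProvider({ purpose, final = false }) {
  if (!final) return "cloudflare/flux-schnell";     // free, fast iteration
  if (purpose === "hero") return "openai/dall-e-3";  // premium, client-facing
  if (purpose === "photo") return "google/imagen-3"; // photorealistic
  return "cloudflare/sdxl-lightning";                // free, good enough
}
```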
All images are stored in R2 with signed URLs and automatically optimized for web delivery (lazy loading, responsive sizes).
The Publishing Pipeline
Content created in Perspectiva goes through an automated publishing pipeline:
Content created in Perspectiva
│
├── SEO automation
│ ├── Slug generated from title
│ ├── Meta description via AI
│ ├── Open Graph tags
│ ├── Schema.org structured data
│ └── Canonical URLs
│
├── Static generation (SSG)
│ ├── Optimized HTML
│ ├── Images with lazy loading
│ └── Embedded JSON-LD
│
└── Multi-destination
├── cadences.app/perspectiva/[slug]
├── Codex storefront (if configured)
├── RSS feed
└── Sitemap.xml
Every piece of content is SEO-ready from the moment it's created. No manual tagging, no separate SEO workflow. The AI generates meta descriptions that are actually good (because it has full context of the content), and Schema.org markup is specific to content type (Article, AudioObject, FAQPage, etc.).
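The content-type-specific markup can be sketched as a builder that picks the Schema.org `@type` from the content kind. Field names follow Schema.org; the mapping itself is an assumption about the pipeline, not the production code.

```javascript
// Sketch: content-type-specific Schema.org JSON-LD.
function buildJsonLd(content) {
  const base = {
    "@context": "https://schema.org",
    headline: content.title,
    description: content.metaDescription,
  };
  if (content.kind === "audiobook") {
    return { ...base, "@type": "AudioObject", duration: content.duration };
  }
  if (content.kind === "faq") {
    return {
      ...base,
      "@type": "FAQPage",
      mainEntity: content.faqs.map((f) => ({
        "@type": "Question",
        name: f.q,
        acceptedAnswer: { "@type": "Answer", text: f.a },
      })),
    };
  }
  return { ...base, "@type": "Article" }; // default for blog content
}
```

The resulting object is serialized and embedded in the page as a `<script type="application/ld+json">` block during static generation.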
Developer Tools: The Hidden Power
Block Versioning
Every content section has its own version history. Changed a paragraph and regret it? Roll back to any previous version with one click. Visual diff shows exactly what changed between versions. This isn't Git-level complexity — it's instant undo/redo at the block level.
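A minimal sketch of per-block history with rollback, assuming an in-memory version stack (the real implementation also stores diffs; names are illustrative):

```javascript
// Per-block version history with one-click rollback.
class BlockHistory {
  constructor(initial) {
    this.versions = [initial];
    this.index = 0;
  }
  save(content) {
    // Saving after a rollback discards the abandoned "redo" branch.
    this.versions = this.versions.slice(0, this.index + 1);
    this.versions.push(content);
    this.index = this.versions.length - 1;
  }
  rollback(toIndex) {
    if (toIndex < 0 || toIndex >= this.versions.length) {
      throw new RangeError("No such version");
    }
    this.index = toIndex;
    return this.versions[toIndex];
  }
  current() {
    return this.versions[this.index];
  }
}
```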
Embeddings Search
All content is vectorized through Cloudflare Vectorize. Instead of keyword search, you can search by meaning: "articles about cost optimization" finds content about budget management, pricing strategies, and efficiency — even if those exact words don't appear. Results ranked by contextual relevance using all-MiniLM-L6-v2 embeddings.
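The ranking principle is cosine similarity between the query embedding and each document embedding. In production the vectors come from all-MiniLM-L6-v2 and the lookup runs inside Vectorize; this standalone sketch uses toy vectors to show the math.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by semantic closeness to the query vector.
function rankByMeaning(queryVec, docs) {
  return docs
    .map((d) => ({ ...d, score: cosine(queryVec, d.vector) }))
    .sort((a, b) => b.score - a.score);
}
```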
Model Registry & Tester
A dashboard for managing every AI model integrated into the system. Quick-test any model with sample prompts, compare latency and output quality side by side, enable/disable models on the fly. When a new model launches (happens weekly now), we add it to the registry, run our test suite, and it's available across all modules within minutes.
Cost Viewer
Real-time cost tracking integrated with our AI Gateway:
- Cost breakdown by model, provider, and content category
- Historical spending trends
- Budget alerts before you hit limits
- Per-article and per-audiobook cost attribution
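The breakdowns above reduce to a group-by rollup over gateway log records. The record field names here are assumptions about the gateway's log shape, used only for illustration.

```javascript
// Sum cost per value of one dimension (e.g. "model" or "provider").
function costBreakdown(records, dimension) {
  const totals = {};
  for (const r of records) {
    const key = r[dimension] ?? "unknown";
    totals[key] = (totals[key] ?? 0) + r.costUsd;
  }
  return totals;
}
```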
The Numbers: 19,000+ Lines of Vanilla JS
| Metric | Value |
|---|---|
| perspectiva-studio.js | 19,000+ lines |
| index.html | 14,000+ lines |
| AI models integrated | 10+ |
| Blog section types | 12 |
| TTS voices | 15+ |
| Image providers | 5 |
| Dependencies | 0 |
19,000 lines of JavaScript in a single file. No modules, no imports, no bundler. Sounds insane? It works better than you'd expect. The file is organized by section (blog generation, audiobook pipeline, AI coach, image generation, publishing, tools). With modern IDE search, navigating 19K lines is faster than navigating 200 files in a typical React project.
And the deployment? The entire studio is static files served from Cloudflare Pages. No Node.js server, no Docker containers, no cold starts. Global CDN, instant loads.
AI Provider Matrix
| Function | Providers |
|---|---|
| Text Generation | Claude Sonnet 4, Gemini 2.5, GPT-4o |
| Image Generation | FLUX, SDXL, DALL-E 3, Imagen 3, Recraft V3 |
| Narration (TTS) | ElevenLabs (Multilingual v2, Turbo v2.5), Edge TTS |
| Transcription (STT) | ElevenLabs Scribe, OpenAI Whisper |
| Embeddings | Cloudflare Vectorize, all-MiniLM-L6-v2 |
Multi-provider by design. When OpenAI has an outage (happens more than you'd think), we switch to Gemini. When Gemini's image generation has policy blocks, we fall back to FLUX. Redundancy isn't a luxury — it's what keeps the content pipeline running 24/7.
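The fallback behavior amounts to trying providers in priority order until one succeeds. A hedged sketch with illustrative provider call signatures:

```javascript
// Try each provider in order; collect failures and only give up
// when the whole chain is exhausted.
async function generateWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await provider.generate(prompt);
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```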
Key Takeaways
1. A single editor for all content types is a force multiplier. Switching between tools (Canva for images, WordPress for blog, Descript for audio) kills momentum. One interface that handles text, audio, images, and interactive sessions means content creation flows instead of stalling.
2. Free-tier AI models are good enough for 80% of content. FLUX Schnell and SDXL Lightning produce images that work perfectly for blog posts and internal content. Premium models are reserved for client-facing hero images. Total image cost: under $5/month.
3. AI Coach sessions are the most underrated feature. Voice-based brainstorming with an AI that can execute actions mid-conversation (generate images, search data, create drafts) is dramatically more productive than typing prompts.
4. Block versioning makes AI-assisted editing fearless. When AI can regenerate any section, you need version history at the block level, not the document level. Try 5 different AI-generated introductions, compare them, pick the best — all without losing work.
5. 19,000 lines in one file is a feature, not a bug. When you control the entire codebase and there are zero dependencies, a single monolithic file is actually easier to maintain than a sprawling file tree. Search is instant, there are no import chains to trace, and deployment is copying files.
About the Author
Gonzalo Monzón
Founder & Lead Architect
Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
Related Articles
Why We Use 7 AI Providers (Not Just One) — And How We Track Every Cent
Vendor lock-in is a trap. Here's how our AI Gateway routes 11,200+ calls/month between Gemini, GPT-4o, Claude, DeepSeek, Groq, and more — with automatic fallback, cost tracking to the cent, and a ~$184/month total AI bill across 7 providers.
From 4-Hour Response Time to Instant: How Our AI Voice Agents Make Real Phone Calls
Twilio for calls, Gemini Flash for real-time conversation, ElevenLabs for 15+ natural voices. We built AI agents that confirm appointments in 35 seconds, qualify leads with 3 questions, and switch between Spanish, English, and Catalan mid-call. Plus: God Mode lets humans supervise and intervene live.
Synapse Studio: A 2D Virtual Office Where AI Agents Do the Real Work
We built a SimTower-style animated office where AI agents with multimodal capabilities — vision, image generation, web search, iterative image evolution — collaborate on real tasks. Zero dependencies, pure Vanilla JS, running on Cloudflare.