AI & LLMs September 22, 2025 · 12 min read

Perspectiva Studio: 19,000 Lines of Vanilla JS That Create Audiobooks, Blogs, and AI Coach Sessions


Gonzalo Monzón

Founder & Lead Architect

What if one editor could produce narrated audiobooks, illustrated blog posts, professional PDFs, and interactive AI coaching sessions — all from the same interface? Perspectiva Studio is exactly that: a 19,000+ line Vanilla JS content engine that orchestrates 10+ AI models across text, image, voice, and search to create publish-ready content at the push of a button.

No React, no build step, no npm dependencies. Just raw JS, CSS, and an absurd number of AI provider integrations. This article covers the five creation modules, the publishing pipeline, the image generation strategy across 5 providers, and the developer tools (block versioning, embeddings search, cost viewer) that make the whole thing manageable.

Blog Studio: 12 Section Types, Zero Manual Formatting

Blog Studio generates complete articles with AI. Not just paragraphs of text — structured articles with 12 distinct section types: paragraph, list, blockquote, code block, image, table, FAQ, timeline, comparison, callout, statistics, and embedded media.

The generation pipeline:

  1. Topic input — a title, brief, or even a one-liner idea
  2. AI structures the article — Claude or Gemini creates a section plan with types, ordering, key points per section
  3. Content generation — each section is generated with its appropriate format (tables get data, FAQs get Q&A pairs, timelines get chronological entries)
  4. Image generation — FLUX/DALL-E/Imagen 3 create contextual images for image sections
  5. SEO automation — slugs, meta descriptions, Open Graph tags, Schema.org structured data, canonical URLs — all generated automatically
  6. Static generation — optimized HTML with lazy-loaded images and embedded JSON-LD
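
Step 2 hands a model-generated section plan to step 3, so it helps to check that plan before any content is generated. The following is a minimal sketch of such a check, not the Studio's actual code: the plan shape (`sections`, `type`, `keyPoints`), the type identifiers, and the `validatePlan` name are all assumptions made for illustration.

```javascript
// The 12 section types named above; the identifier spellings are hypothetical.
const SECTION_TYPES = new Set([
  "paragraph", "list", "blockquote", "code_block", "image", "table",
  "faq", "timeline", "comparison", "callout", "statistics", "embedded_media",
]);

// Returns a list of human-readable problems; an empty list means the plan is usable.
function validatePlan(plan) {
  const errors = [];
  plan.sections.forEach((section, index) => {
    if (!SECTION_TYPES.has(section.type)) {
      errors.push(`section ${index}: unknown type "${section.type}"`);
    }
    if (!Array.isArray(section.keyPoints) || section.keyPoints.length === 0) {
      errors.push(`section ${index}: no key points`);
    }
  });
  return errors;
}
```

A plan that comes back with errors can be sent to the model again with the error list attached, rather than failing later during content generation.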

The output publishes simultaneously to multiple destinations: cadences.app, Codex storefronts (client white-label sites), RSS feeds, and sitemaps. One click, multiple platforms.

Audiobook Studio: Professional Narration Pipeline

This is where things get genuinely impressive. Audiobook Studio converts long-form text into professionally narrated audiobooks using ElevenLabs:

Text → Intelligent Segmentation → TTS (ElevenLabs) → Audio chunks
  → Continuous narration → Metadata → Publication

Key capabilities:

| Feature | Detail |
| --- | --- |
| 15+ voices | ElevenLabs Multilingual v2 and Turbo v2.5, with different personalities, accents, and tones |
| Intelligent chunking | Automatic chapter segmentation for long texts; respects paragraph boundaries and natural breaks |
| Prosody control | Pauses, emphasis, and speed adjustable per section |
| Cloud storage | Audio stored in R2 with signed URLs; no public exposure |
| Publication | Available on Codex storefront as a purchasable audiobook or free content |

The chunking system is critical. A 50,000-word book can't be sent to ElevenLabs in one API call — it needs to be segmented into manageable chunks (typically 3,000-5,000 characters) while respecting natural boundary points. The system stitches the audio chunks into seamless continuous narration with consistent voice throughout.
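
The segmentation step can be sketched as a greedy packer that fills each chunk with whole paragraphs and only falls back to sentence boundaries when a single paragraph is too long. This is an illustration under assumptions, not the Studio's implementation: the function names are hypothetical, and the 5,000-character default is simply the upper end of the range quoted above.

```javascript
// Split one oversized paragraph at sentence ends; hard-slice any sentence
// that is itself longer than maxChars.
function splitLongParagraph(paragraph, maxChars) {
  const sentences = paragraph.split(/(?<=[.!?])\s+/);
  const pieces = [];
  let current = "";
  for (const sentence of sentences) {
    for (let i = 0; i < sentence.length; i += maxChars) {
      const part = sentence.slice(i, i + maxChars);
      const joined = current ? current + " " + part : part;
      if (joined.length > maxChars) {
        pieces.push(current);
        current = part;
      } else {
        current = joined;
      }
    }
  }
  if (current) pieces.push(current);
  return pieces;
}

// Pack whole paragraphs into chunks of at most maxChars characters.
function chunkText(text, maxChars = 5000) {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const paragraph of paragraphs) {
    if (paragraph.length > maxChars) {
      // Flush what we have, then emit the long paragraph as its own chunks.
      if (current) chunks.push(current);
      current = "";
      chunks.push(...splitLongParagraph(paragraph, maxChars));
      continue;
    }
    const joined = current ? current + "\n\n" + paragraph : paragraph;
    if (joined.length > maxChars) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = joined;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk would then be sent to the TTS provider in order with the same voice settings, and the resulting audio segments concatenated in that same order.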

AI Coach: Real-Time Interactive Sessions

AI Coach goes beyond text — it's a bidirectional voice conversation with an AI that has full context of your content and organization:

  • You speak, the coach responds with voice — real conversation, not typing
  • Live transcription — ElevenLabs Scribe diarizes the conversation (who said what)
  • Persistent context — the coach remembers the entire session, building on previous exchanges
  • Function calling — mid-conversation, the coach can execute actions: create content, search data, generate images

The Gemini Live integration takes this further with 8 distinct voices, bidirectional audio streaming, and real-time function calling: Gemini can trigger tools while speaking to you.
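
On the application side, function calling usually comes down to a dispatcher: the model emits a tool name plus arguments, and the app looks up a handler and returns its result to the conversation. The sketch below shows only that dispatch step. The tool names, the `{ name, args }` call shape, and the stub handlers are assumptions for illustration and do not reflect the Gemini or ElevenLabs wire formats.

```javascript
// Hypothetical registry of tools the coach may call: name -> async handler.
const toolRegistry = new Map([
  ["search_data", async ({ query }) => ({ results: [`stub result for "${query}"`] })],
  ["create_draft", async ({ title }) => ({ title, status: "draft" })],
]);

// Run one tool call and always resolve with either a result or an error,
// so a failing tool never interrupts the voice session.
async function dispatchToolCall(call, registry = toolRegistry) {
  const handler = registry.get(call.name);
  if (!handler) {
    return { name: call.name, error: `unknown tool: ${call.name}` };
  }
  try {
    const result = await handler(call.args ?? {});
    return { name: call.name, result };
  } catch (err) {
    return { name: call.name, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning errors as data rather than throwing lets the model explain the failure out loud and carry on with the conversation.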

Use cases: content brainstorming sessions, editorial review, strategy discussions, data exploration guided by voice. The transcription becomes a structured artifact that can be converted into blog posts, meeting notes, or action items.

PDF Studio: Professional Document Generation

A 3-step pipeline for professional documents:

  1. Structure — AI defines the document skeleton: sections, hierarchy, content types
  2. Content — Each section populated with generated or curated content
  3. Format — Professional formatting: tables, headers, typography, branding (colors, logos, fonts matching the organization)

Optimized for both print and digital output. Used for client reports, proposals, and documentation that needs to look polished without manual design work.

Image Generation: 5 Providers, 1 Interface

Perspectiva Studio integrates 5 image generation providers, each with different cost-quality tradeoffs:

| Provider | Model | Cost | Quality | Best For |
| --- | --- | --- | --- | --- |
| Cloudflare AI | FLUX Schnell | Free | Good | Drafts, internal content |
| Cloudflare AI | SDXL Lightning | Free | Very good | Blog posts, social media |
| OpenAI | DALL-E 3 | $0.04-0.08 | Excellent | Hero images, featured content |
| Google | Imagen 3 | $0.02-0.04 | Excellent | Photorealistic, marketing |
| Recraft | Recraft V3 | Variable | Excellent | Illustrations, brand-consistent art |

The strategy: use free Cloudflare models for iteration and drafts, then switch to premium providers (DALL-E 3 or Imagen 3) for final outputs. Most blog posts use FLUX Schnell or SDXL Lightning — free, fast, and good enough. Client-facing hero images get DALL-E 3 treatment. Total image generation cost for a typical month: under $5.
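
The routing rule can be written as a small lookup keyed by what the image is for, plus a budget check. This is a sketch under assumptions: the purpose keys and function names are hypothetical, and the listed prices are assumed to be per image, which the table does not state explicitly.

```javascript
// Purpose -> model, following the provider table; the purpose keys are hypothetical.
const IMAGE_MODEL_BY_PURPOSE = new Map([
  ["draft", { provider: "Cloudflare AI", model: "FLUX Schnell", free: true }],
  ["blog", { provider: "Cloudflare AI", model: "SDXL Lightning", free: true }],
  ["hero", { provider: "OpenAI", model: "DALL-E 3", free: false }],
  ["marketing", { provider: "Google", model: "Imagen 3", free: false }],
  ["illustration", { provider: "Recraft", model: "Recraft V3", free: false }],
]);

// Unknown purposes fall back to the free draft model.
function pickImageModel(purpose) {
  return IMAGE_MODEL_BY_PURPOSE.get(purpose) ?? IMAGE_MODEL_BY_PURPOSE.get("draft");
}

// How many paid images fit in a budget. Integer cents avoid floating-point drift.
function maxImagesForBudget(budgetCents, costPerImageCents) {
  return Math.floor(budgetCents / costPerImageCents);
}
```

Under the per-image assumption, `maxImagesForBudget(500, 8)` gives 62 and `maxImagesForBudget(500, 4)` gives 125, so a $5 month covers roughly 62 to 125 DALL-E 3 hero images.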

All images are stored in R2 with signed URLs and automatically optimized for web delivery (lazy loading, responsive sizes).

The Publishing Pipeline

Content created in Perspectiva goes through an automated publishing pipeline:

Content created in Perspectiva
  │
  ├── SEO automation
  │     ├── Slug generated from title
  │     ├── Meta description via AI
  │     ├── Open Graph tags
  │     ├── Schema.org structured data
  │     └── Canonical URLs
  │
  ├── Static generation (SSG)
  │     ├── Optimized HTML
  │     ├── Images with lazy loading
  │     └── Embedded JSON-LD
  │
  └── Multi-destination
        ├── cadences.app/perspectiva/[slug]
        ├── Codex storefront (if configured)
        ├── RSS feed
        └── Sitemap.xml

Every piece of content is SEO-ready from the moment it's created. No manual tagging, no separate SEO workflow. The AI generates meta descriptions that are actually good (because it has full context of the content), and Schema.org markup is specific to content type (Article, AudioObject, FAQPage, etc.).
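
Two of the SEO steps are deterministic and easy to illustrate: deriving a slug from the title and building the JSON-LD object for an article. The sketch below assumes hypothetical function names and input fields; only the `cadences.app/perspectiva/[slug]` URL pattern comes from the pipeline above, and the `https://` scheme is an assumption.

```javascript
// Lowercase, strip diacritics, and collapse everything else into single hyphens.
function slugify(title) {
  return title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build a Schema.org Article object, ready to serialize into the page.
function buildArticleJsonLd({ title, description, author, datePublished }) {
  const url = `https://cadences.app/perspectiva/${slugify(title)}`;
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: title,
    description,
    author: { "@type": "Person", name: author },
    datePublished,
    mainEntityOfPage: url,
  };
}
```

The resulting object would be serialized with `JSON.stringify` and embedded in a `<script type="application/ld+json">` tag during static generation. Other content types would swap the `@type` and fields accordingly.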

Developer Tools: The Hidden Power

Block Versioning

Every content section has its own version history. Changed a paragraph and regret it? Roll back to any previous version with one click. Visual diff shows exactly what changed between versions. This isn't Git-level complexity — it's instant undo/redo at the block level.
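
One simple way to model block-level history is an append-only list of versions per block, where a rollback is recorded as a new version instead of deleting anything. The class below is a sketch of that idea only; the `BlockHistory` name and its methods are hypothetical and are not taken from the Studio's code.

```javascript
// Append-only version history for a single content block.
class BlockHistory {
  constructor(initialContent) {
    this.versions = [initialContent];
  }

  // The latest version is always the live one.
  get current() {
    return this.versions[this.versions.length - 1];
  }

  // Store a new version (a manual edit or an AI regeneration).
  commit(content) {
    this.versions.push(content);
    return this.versions.length - 1; // index of the new version
  }

  // Restore an older version by re-committing it, so no history is lost.
  rollbackTo(index) {
    if (!Number.isInteger(index) || index < 0 || index >= this.versions.length) {
      throw new RangeError(`no version ${index}`);
    }
    return this.commit(this.versions[index]);
  }
}
```

Because rollbacks append instead of truncating, every AI-generated variant stays available for comparison even after an earlier one has been restored.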

Embeddings Search

All content is vectorized through Cloudflare Vectorize. Instead of keyword search, you can search by meaning: "articles about cost optimization" finds content about budget management, pricing strategies, and efficiency, even if those exact words don't appear. Results are ranked by contextual relevance using all-MiniLM-L6-v2 embeddings.
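
Ranking by meaning reduces to comparing vectors. The sketch below shows the ranking step with cosine similarity over toy three-dimensional vectors. It is an illustration only: in production the vector index performs this lookup, all-MiniLM-L6-v2 vectors have 384 dimensions, and the function names and document shape here are assumptions.

```javascript
// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query vector and keep the best topK.
function rankByMeaning(queryVector, documents, topK = 3) {
  return documents
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Two texts about the same topic end up with vectors pointing in similar directions, which is why a query about cost optimization can match an article that only mentions budgets.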

Model Registry & Tester

A dashboard for managing every AI model integrated into the system. Quick-test any model with sample prompts, compare latency and output quality side by side, enable/disable models on the fly. When a new model launches (happens weekly now), we add it to the registry, run our test suite, and it's available across all modules within minutes.

Cost Viewer

Real-time cost tracking integrated with our AI Gateway:

  • Cost breakdown by model, provider, and content category
  • Historical spending trends
  • Budget alerts before you hit limits
  • Per-article and per-audiobook cost attribution
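
The breakdowns and alerts listed above are aggregations over a log of usage events. The sketch below assumes a hypothetical event shape (`provider`, `model`, `articleId`, `costUsd`) and an 80% warning threshold; none of these names or values come from the AI Gateway itself.

```javascript
// Sum cost per distinct value of one field, e.g. "provider", "model", or "articleId".
function totalBy(events, key) {
  const totals = new Map();
  for (const event of events) {
    totals.set(event[key], (totals.get(event[key]) ?? 0) + event.costUsd);
  }
  return totals;
}

// Classify total spend against a budget: "ok", "warning", or "over".
function budgetStatus(events, budgetUsd, warnRatio = 0.8) {
  const spent = events.reduce((sum, event) => sum + event.costUsd, 0);
  if (spent >= budgetUsd) return "over";
  if (spent >= budgetUsd * warnRatio) return "warning";
  return "ok";
}
```

Calling `totalBy(events, "articleId")` gives per-article attribution, and the same function with `"model"` or `"provider"` gives the other breakdowns without extra code.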

The Numbers: 19,000+ Lines of Vanilla JS

| Metric | Value |
| --- | --- |
| perspectiva-studio.js | 19,000+ lines |
| index.html | 14,000+ lines |
| AI models integrated | 10+ |
| Blog section types | 12 |
| TTS voices | 15+ |
| Image providers | 5 |
| Dependencies | 0 |

19,000 lines of JavaScript in a single file. No modules, no imports, no bundler. Sounds insane? It works better than you'd expect. The file is organized by section (blog generation, audiobook pipeline, AI coach, image generation, publishing, tools). With modern IDE search, navigating 19K lines is faster than navigating 200 files in a typical React project.

And the deployment? The entire studio is static files served from Cloudflare Pages. No Node.js server, no Docker containers, no cold starts. Global CDN, instant loads.

AI Provider Matrix

| Function | Providers |
| --- | --- |
| Text Generation | Claude Sonnet 4, Gemini 2.5, GPT-4o |
| Image Generation | FLUX, SDXL, DALL-E 3, Imagen 3, Recraft V3 |
| Narration (TTS) | ElevenLabs (Multilingual v2, Turbo v2.5), Edge TTS |
| Transcription (STT) | ElevenLabs Scribe, OpenAI Whisper |
| Embeddings | Cloudflare Vectorize, all-MiniLM-L6-v2 |

Multi-provider by design. When OpenAI has an outage (happens more than you'd think), we switch to Gemini. When Gemini's image generation has policy blocks, we fall back to FLUX. Redundancy isn't a luxury — it's what keeps the content pipeline running 24/7.
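
The fallback behavior can be sketched as an ordered list of providers that is tried until one succeeds. This is a minimal illustration, not the Studio's code: `generateWithFallback` and the `{ name, generate }` provider shape are hypothetical, and real provider calls would replace the `generate` functions.

```javascript
// Try each provider in order and return the first successful output.
// Errors are collected so a total failure reports every cause.
async function generateWithFallback(providers, prompt) {
  const errors = [];
  for (const { name, generate } of providers) {
    try {
      const output = await generate(prompt);
      return { provider: name, output };
    } catch (err) {
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

The order of the list encodes the preference, so switching the primary provider during an outage is a matter of reordering an array rather than changing code.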

Key Takeaways

1. A single editor for all content types is a force multiplier. Switching between tools (Canva for images, WordPress for blog, Descript for audio) kills momentum. One interface that handles text, audio, images, and interactive sessions means content creation flows instead of stalling.

2. Free-tier AI models are good enough for 80% of content. FLUX Schnell and SDXL Lightning produce images that work perfectly for blog posts and internal content. Premium models are reserved for client-facing hero images. Total image cost: under $5/month.

3. AI Coach sessions are the most underrated feature. Voice-based brainstorming with an AI that can execute actions mid-conversation (generate images, search data, create drafts) is dramatically more productive than typing prompts.

4. Block versioning makes AI-assisted editing fearless. When AI can regenerate any section, you need version history at the block level, not the document level. Try 5 different AI-generated introductions, compare them, pick the best — all without losing work.

5. 19,000 lines in one file is a feature, not a bug. When you control the entire codebase and there are zero dependencies, a monolith file is actually easier to maintain than a sprawling file tree. Search is instant, there are no import chains to trace, and deployment is copying files.

Tags

Content Creation, Audiobooks, ElevenLabs, Image Generation, Vanilla JS, AI Coach

About the Author

Gonzalo Monzón

Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
