Synapse Studio: A 2D Virtual Office Where AI Agents Do the Real Work
Gonzalo Monzón
Founder & Lead Architect
What if your AI wasn't a chat window but a virtual office? A visual space where you can see agents walking between desks, sitting down to work on tasks, thinking, collaborating — and delivering real results? That's Synapse Studio: a 2D animated office (think SimTower meets AI) where multi-agent teams with 7 multimodal capabilities process real tasks autonomously.
No React. No frameworks. Zero npm dependencies. Just Vanilla JS, CSS animations, and Cloudflare's edge infrastructure. This article covers the architecture, the multi-agent orchestration, and the iterative image evolution system that makes it unique.
The Concept: Why a Virtual Office?
AI is abstract. You send a prompt, you get a response. But when you're orchestrating multiple AI agents working in parallel on a complex task — research, analysis, image generation, quality review, final compilation — the abstraction breaks down. Who's doing what? What's the status? Where's the bottleneck?
We turned the abstraction into a spatial metaphor:
- Each agent has a desk in a 2D office building
- Departments occupy different floors (Marketing, Sales, Support, Creative)
- Agents physically move to collaborate — you see them walk to a colleague's desk
- Visual states: idle (at desk), thinking (animation), working (typing), collaborating (at another desk)
- Task progress is visible in real-time through the office activity
It's not a gimmick — it's a genuine UX solution to the multi-agent observability problem. When you have 4 agents working on a task, you see the work happening.
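Those four visual states map naturally onto a small state machine. A minimal sketch (the state names come from the list above; the transition rules are our illustrative assumption, not the actual Synapse Studio code):

```javascript
// The four visual agent states, modeled as a tiny state machine.
// Transition rules here are illustrative assumptions.
const TRANSITIONS = {
  idle:          ['thinking'],                 // a task arrives at the desk
  thinking:      ['working', 'collaborating'], // plan made: work alone or walk over
  working:       ['idle', 'collaborating'],    // finish, or go ask a colleague
  collaborating: ['working', 'idle'],          // return to own desk
};

function transition(agent, next) {
  if (!TRANSITIONS[agent.state].includes(next)) {
    throw new Error(`illegal transition: ${agent.state} -> ${next}`);
  }
  return { ...agent, state: next };
}

let bot = { id: 'agent-1', state: 'idle' };
bot = transition(bot, 'thinking');
bot = transition(bot, 'working');
```

Keeping transitions explicit is what makes the office legible: an agent can never teleport from idle to collaborating without visibly thinking first.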
7 Multimodal Capabilities
Each agent can be configured with any combination of these capabilities:
| Capability | Code | What It Does | Provider |
|---|---|---|---|
| LLM (Text) | Base | Chat, reasoning, analysis | Gemini / Claude / GPT-4o / Groq |
| Web Search | F1 | Real-time internet search | Groq Compound Beta |
| Vision (ITT) | F2 | Analyze and understand images | Gemini 2.0 Flash |
| Text-to-Image (TTI) | F3 | Generate images from text | FLUX Schnell / SDXL Lightning |
| Speech-to-Text (STT) | F4 | Transcribe audio | ElevenLabs Scribe / Whisper |
| Text-to-Speech (TTS) | F5 | Generate voice narration | ElevenLabs |
| Image-to-Image (I2I) | F6 | Transform images iteratively | Stable Diffusion 1.5 / SDXL |
The key insight: agents aren't just chatbots with extra features. They're multimodal workers. A "Visual Designer" agent has TTI + I2I + Vision. A "Researcher" has LLM + Web Search. A "Fullstack Media" agent has all 7 capabilities. The role defines the tool belt, not the other way around.
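The tool-belt idea is easy to express as configuration. A sketch (the object shape and names are our assumption, not the real schema; codes and role examples come from the table and text above):

```javascript
// Capability codes from the table above; the role-to-capability mapping
// mirrors the examples in the text. Field names are illustrative.
const CAPS = { LLM: 'Base', WEB_SEARCH: 'F1', VISION: 'F2', TTI: 'F3', STT: 'F4', TTS: 'F5', I2I: 'F6' };

const ROLES = {
  'Visual Designer': ['TTI', 'I2I', 'VISION'],
  'Researcher':      ['LLM', 'WEB_SEARCH'],
  'Fullstack Media': Object.keys(CAPS), // all 7 capabilities
};

// The role defines the tool belt: an agent can only use what its role grants.
function hasCapability(role, cap) {
  return (ROLES[role] ?? []).includes(cap);
}
```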
Multi-Agent Orchestration
When a task arrives — from Cadences, a public form, a scraper, or a webhook — the orchestration engine takes over:
```
Incoming task
│
├── Orchestrator AI analyzes the task
│     └── Generates a multi-agent execution plan
│
├── Role assignment
│     ├── Agent 1: Research (data gathering, web search)
│     ├── Agent 2: Expertise (deep analysis with premium AI)
│     ├── Agent 3: Review (quality check, fact verification)
│     └── Agent 4: Compilation (final deliverable)
│
├── Dual AI execution
│     ├── Fast AI: quick responses (Groq, DeepSeek)
│     └── Deep AI: thorough analysis (Gemini 2.5, Claude)
│
└── Delivery
      ├── Email
      ├── WhatsApp
      └── Webhook
```
The Dual AI system is critical for cost optimization. The Fast AI handles initial analysis, routing, and simple subtasks at near-zero cost (Groq is essentially free at our volume). The Deep AI only activates for complex reasoning, creative generation, or quality-critical outputs — saving ~70% on AI costs compared to running everything through premium models.
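In code, the Fast/Deep split boils down to a per-subtask routing decision. A hedged sketch (tier names and providers come from the article; the complexity heuristic and field names are invented placeholders):

```javascript
// Route each subtask to the cheap Fast AI unless it genuinely needs
// deep reasoning. The 0.6 threshold is an illustrative assumption.
function routeSubtask(subtask) {
  const needsDeep =
    subtask.kind === 'creative' ||
    subtask.kind === 'quality-review' ||
    (subtask.complexity ?? 0) > 0.6;

  return needsDeep
    ? { tier: 'deep', providers: ['gemini-2.5', 'claude'] } // thorough, pricier
    : { tier: 'fast', providers: ['groq', 'deepseek'] };    // near-zero cost
}
```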
1-8 Agents Per Task
Not every task needs 8 agents. A simple "research this topic" might use 1 agent with LLM + Web Search. A complex "create a marketing campaign with visuals" might use 5 agents across Research, Copy, Design, Review, and Compilation roles. The orchestrator decides based on task complexity.
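One way such a plan size could be derived (a deliberately naive heuristic of ours, not the orchestrator's real logic):

```javascript
// Naive plan-size heuristic: start with Research, add specialist roles
// the task calls for, and add Review/Compilation only for multi-agent plans.
function planRoles(task) {
  const roles = ['Research'];
  if (task.needsCopy)    roles.push('Copy');
  if (task.needsVisuals) roles.push('Design');
  if (roles.length > 1)  roles.push('Review', 'Compilation');
  return roles.slice(0, 8); // platform cap: at most 8 agents per task
}
```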
I2I: Iterative Image Evolution
This is the most innovative feature. Instead of generating one image and hoping it's right, Synapse Studio evolves images through chains of transformations:
```
Original image → I2I Transform (strength: 0.7) → Result v1
       │
  AI proposes next step
       │
  Auto-subtask created
       │
Result v1 → I2I Transform → Result v2
       │
  ...continues
```
Each transformation has two key parameters:
- Strength (0.0-1.0): How much the image changes. 0.3 = subtle refinement. 0.9 = dramatic reimagining.
- Guidance (1-20): How literally the AI follows the text prompt. Higher = more literal.
After each transformation, the AI analyzes the result and proposes the next iteration — what to change, what to keep, what strength to use. The chain continues until max_depth is reached or the AI decides the result is optimal.
Use cases: logo refinement, concept art iteration, style transfer evolution, progressive detail enhancement. Each step builds on the previous one, maintaining visual consistency while exploring creative directions.
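The loop above can be sketched in a few lines. Here `transform` and `proposeNext` stand in (synchronously, for clarity) for the real model calls; the parameter ranges match the article, but the control flow is our assumption:

```javascript
// Sketch of the I2I evolution loop: apply a transform, let the AI review
// the result and propose the next step, stop at maxDepth or when the AI
// returns null (result judged optimal).
function evolveImage(original, { maxDepth = 4, transform, proposeNext }) {
  const versions = [original];
  let step = { prompt: 'initial refinement', strength: 0.7, guidance: 7 };
  while (step && versions.length <= maxDepth) {
    versions.push(transform(versions.at(-1), step)); // one I2I pass
    step = proposeNext(versions.at(-1));             // AI review; null = done
  }
  return versions;
}
```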
Two Built-In Bots
SynapseBot (The Architect)
A conversational wizard that creates complete agencies from natural language:
"I need a marketing agency with 5 agents specializing in social media, copywriting, and visual design"
SynapseBot generates the entire structure: agency name, departments, agents with predefined roles and capabilities. Instant multi-agent team setup.
CommandBot (The CEO)
An AI CEO that supervises operations in real-time:
- Approves/rejects subtasks proposed by agents
- Monitors execution progress
- Makes prioritization decisions when agents compete for resources
- Can override agent decisions when quality doesn't meet standards
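A minimal sketch of what such an approval gate might look like (thresholds and field names are our invention, not CommandBot's real criteria):

```javascript
// CommandBot-style approval gate: a proposed subtask passes only if it
// stays inside budget and depth limits. All fields are illustrative.
function reviewProposal(proposal, limits) {
  if (proposal.estimatedCost > limits.remainingBudget) {
    return { approved: false, reason: 'over budget' };
  }
  if (proposal.depth > limits.maxDepth) {
    return { approved: false, reason: 'scope explosion' };
  }
  return { approved: true };
}
```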
3-Level Quality System
Every organization can choose its quality tier, balancing cost vs. output quality:
| Level | AI Provider | Image Model | Cost |
|---|---|---|---|
| L1 | Workers AI (free) | FLUX Schnell (free) | $0 |
| L2 | Groq / DeepSeek | FLUX Schnell | ~$0.001/task |
| L3 | Gemini / Claude / GPT-4 | SDXL Lightning | ~$0.02-0.10/task |
L1 is completely free — using Cloudflare's Workers AI models. Good enough for internal tasks, drafts, and experimentation. L3 is premium quality for client-facing deliverables. Most organizations run L2 for routine work and escalate to L3 for final outputs.
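The tier table translates directly into configuration. A sketch (cost figures are the article's approximations; the key names are ours):

```javascript
// Quality tiers from the table above, as a config object.
const QUALITY_TIERS = {
  L1: { llm: 'workers-ai',              image: 'flux-schnell',   approxCostPerTask: 0 },
  L2: { llm: 'groq / deepseek',         image: 'flux-schnell',   approxCostPerTask: 0.001 },
  L3: { llm: 'gemini / claude / gpt-4', image: 'sdxl-lightning', approxCostPerTask: 0.06 },
};

// The common pattern: L2 for routine work, escalate to L3 for
// client-facing deliverables.
function pickTier(deliverable) {
  return deliverable.clientFacing ? 'L3' : 'L2';
}
```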
Input/Output Sources
Tasks can come from anywhere and results can go anywhere:
| Input Sources | Output Destinations |
|---|---|
| Cadences Bridge (automatic flow) | Email |
| Public DATA_TABLE forms | WhatsApp |
| Web scrapers | Webhook (any URL) |
| Webhooks (external APIs) | Back to Cadences |
The Cadences Bridge is the most interesting: tasks created in Cadences automatically flow to Synapse Studio for AI processing. Results flow back. The user never leaves Cadences — they just see their task completed with AI-generated deliverables attached.
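That round trip can be sketched like this (the transport function and payload fields are hypothetical; only the flow itself comes from the article):

```javascript
// Cadences Bridge round trip, with the HTTP transport injected so the
// flow is easy to follow. All field names are illustrative.
function bridgeTask(task, sendToSynapse) {
  // 1. The Cadences task flows to Synapse Studio for AI processing.
  const { deliverable } = sendToSynapse({ source: 'cadences', task });
  // 2. The result flows back; the user never leaves Cadences.
  return { ...task, status: 'completed', deliverable };
}
```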
The Subtask System: Agents That Think Ahead
Agents don't just complete their assigned work — they propose additional work when they identify gaps:
- Agent detects that it needs additional information or a complementary output
- Proposes a subtask with type and parameters
- CommandBot (CEO AI) validates the proposal
- Subtask executes with its own pipeline
- Result feeds back into the original task
Subtask types: additional research, image generation, I2I evolution, data analysis, complementary writing. This means a simple "write a blog post about Morocco" can organically expand into: research current travel trends → generate hero image → write the post → evolve the image with I2I → review quality → compile final output with images embedded.
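The five steps above reduce to a small loop: validate, execute, feed back. A sketch with invented field names and injected stand-ins for the gate and the pipeline:

```javascript
// One pass of the subtask loop: the CEO gate validates the proposal,
// the subtask runs its own pipeline, and the result feeds back into
// the parent task. `validate` and `execute` are injected stand-ins.
function runSubtask(parentTask, proposal, { validate, execute }) {
  if (!validate(proposal)) return parentTask;   // rejected: no change
  const result = execute(proposal);             // subtask's own pipeline
  return { ...parentTask, results: [...parentTask.results, result] };
}
```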
Zero Dependencies, Maximum Control
Synapse Studio is built with zero npm dependencies. No React, no Vue, no build step. Pure Vanilla JS + CSS.
Why? Because the animation system — agents walking between floors, sitting at desks, showing thought bubbles, status indicators — requires precise control over DOM manipulation and CSS transitions. No framework overhead, no virtual DOM reconciliation delays. Every animation runs at 60fps because there's nothing between our code and the browser.
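As a flavor of what framework-free animation looks like, here is a sketch of a walk: compute a CSS transform and let a transition do the movement. The selector, property, and function names are illustrative, not the real synapse-canvas.js internals:

```javascript
// Compute the CSS transform that places an agent sprite at a desk.
function walkTransform(desk) {
  return `translate(${desk.x}px, ${desk.y}px)`;
}

// Move an agent element to a desk via a CSS transition; state-specific
// CSS (thought bubbles, typing animation) hooks off data-state.
function walkTo(agentEl, desk) {
  agentEl.style.transition = 'transform 1.2s ease-in-out'; // compositor-friendly property
  agentEl.style.transform = walkTransform(desk);
  agentEl.dataset.state = 'collaborating';
}
```

Animating `transform` rather than `top`/`left` keeps the movement on the browser's compositor thread, which is what makes smooth 60fps motion cheap.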
The 5 main JS files:
- synapse-canvas.js — The 2D office renderer (building, floors, desks, agents)
- synapse-engine.js — Task orchestration, agent assignment, execution pipeline
- synapse-ui.js — Panels, interactions, responsive layout
- config-panel.js — Agency/agent/capability configuration
- detail-panel.js — Rich result display (images, I2I comparisons, text)
Backend: a single [[path]].js catch-all route on Cloudflare Workers. Storage: R2 for images, D1 for agencies/agents/tasks/results. Total infrastructure cost? Part of the $65/month Cloudflare bill we detailed in our Cloudflare article.
Key Takeaways
1. Spatial metaphors solve observability. When you have multiple AI agents working in parallel, a chat log isn't enough. A visual office where you see agents move, think, and collaborate makes the invisible visible.
2. Dual AI is the cost sweet spot. Fast AI for routing and simple tasks, Deep AI for quality-critical outputs. The 70% cost savings make multi-agent systems economically viable even for small teams.
3. Image evolution > one-shot generation. The I2I chain system produces dramatically better results than single-prompt image generation, because each iteration refines while maintaining visual consistency.
4. The subtask proposal system is emergent behavior. Agents that can identify gaps and propose additional work create outcomes that exceed what you'd get from rigid task assignment. The CEO Bot prevents scope explosion while allowing organic expansion.
5. Zero dependencies isn't a limitation — it's freedom. No framework lock-in, no dependency vulnerabilities, no build complexity. The entire studio ships as static files to Cloudflare Pages. Deploys in seconds.
About the Author
Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.