January 29, 2026 · 10 min read

Building a WhatsApp Bot That Doesn't Get Banned — The Desktop Agent Approach

Gonzalo Monzón

Founder & Lead Architect

In Spain and Latin America, business happens on WhatsApp. Not email, not Slack — WhatsApp. Our clients needed to automate follow-ups, appointment reminders, and lead qualification at scale. The official Business API was too expensive and too restrictive. Every unofficial library we tried got the number banned within 3 weeks. So we built something different: a desktop agent that doesn't fight WhatsApp's rules — it plays by them, at human speed.

The Three Paths of WhatsApp Automation (And Why Two Fail)

Before building anything, we evaluated every option on the market:

Path 1: Official WhatsApp Business API (Meta)

The "safe" choice. But it has real limitations that kill it for most SMB use cases:

Per-message pricing: $0.05–0.15 per message; at 200+ messages/day (roughly 6,000/month), that's $300–900/month in messaging fees alone
24-hour window: You can only reply within 24 hours of the user's last message. After that, you need pre-approved templates
Template approval: Every outbound message template needs Meta's approval — 24-72h review, frequent rejections
No multimedia freedom: Limited attachment types, no audio messages, no location sharing in templates
Business verification: Requires Facebook Business Manager, domain verification, 2-week onboarding
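The fee range above is simple arithmetic: daily volume × days × per-message price. Here is a minimal sketch that assumes a flat 30-day month and ignores any pricing tiers or free message categories Meta may apply:

```javascript
// Monthly per-message fees, assuming a flat 30-day month and a single flat rate.
function monthlyMessagingCost(messagesPerDay, pricePerMessage, daysPerMonth = 30) {
  return messagesPerDay * daysPerMonth * pricePerMessage;
}

// The $0.05 to $0.15 per-message range at 200 messages/day.
const low = monthlyMessagingCost(200, 0.05);
const high = monthlyMessagingCost(200, 0.15);
console.log(`$${low.toFixed(0)} to $${high.toFixed(0)} per month`); // "$300 to $900 per month"
```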

Path 2: Unofficial Libraries (Baileys, whatsapp-web.js)

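Open-source libraries that either reimplement the WhatsApp Web protocol directly (Baileys) or drive WhatsApp Web in an automated, typically headless browser while hooking its internal modules (whatsapp-web.js). Fast to set up, but:

Ban rate: 85%+ within 4 weeks in our testing. WhatsApp appears to flag these clients as non-standard connections
Protocol breaks: WhatsApp updates its protocol and web client regularly, breaking these libraries for days at a time
No session recovery: When a session breaks, you need to re-scan the QR code manually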

Path 3: Our Approach — The Desktop Agent

What if instead of trying to bypass WhatsApp, we just... used it? Like a human would. Through a real browser, with realistic timing, on a real desktop app. That's the core insight behind our WhatsApp Local Agent.

The Architecture: Electron + Playwright + WebSocket

The agent is an Electron desktop app running on the client's machine (Windows/macOS/Linux). Inside, it runs a Playwright-controlled browser instance pointed at WhatsApp Web. The key components:

Playwright browser: A real Chromium instance with WhatsApp Web loaded — not headless, actually visible on screen
WebSocket bridge: Bidirectional connection to the Cadences cloud that receives tasks and reports results
SQLite queue: Local message queue with retry logic, delivery status tracking, and deduplication
Session persistence: Automatic session restoration — no re-scanning QR codes after restarts
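The queue's behavior (deduplication, retries, delivery status) fits in a few lines. The sketch below is an in-memory stand-in for the SQLite table. The field names, the exponential backoff policy, and the three-attempt limit are assumptions, not the agent's actual schema. Unlike the real queue, this version does not survive a restart.

```javascript
// In-memory stand-in for the agent's SQLite queue table (hypothetical schema).
// Each entry: id (dedup key), to, body, status, attempts, notBefore (ms since epoch).
class MessageQueue {
  constructor({ maxAttempts = 3, baseBackoffMs = 30_000 } = {}) {
    this.items = new Map(); // Map preserves insertion order, which gives FIFO delivery
    this.maxAttempts = maxAttempts;
    this.baseBackoffMs = baseBackoffMs;
  }

  // Returns false if a message with the same id is already queued (deduplication).
  enqueue({ id, to, body }, now = Date.now()) {
    if (this.items.has(id)) return false;
    this.items.set(id, { id, to, body, status: "pending", attempts: 0, notBefore: now });
    return true;
  }

  // Oldest pending message whose retry time has arrived, or null.
  next(now = Date.now()) {
    for (const item of this.items.values()) {
      if (item.status === "pending" && item.notBefore <= now) return item;
    }
    return null;
  }

  markSent(id) {
    const item = this.items.get(id);
    if (item) item.status = "sent";
  }

  // Schedule a retry with exponential backoff (30 s, 60 s, ...) or give up after maxAttempts.
  markFailed(id, now = Date.now()) {
    const item = this.items.get(id);
    if (!item) return;
    item.attempts += 1;
    if (item.attempts >= this.maxAttempts) {
      item.status = "failed";
    } else {
      item.notBefore = now + this.baseBackoffMs * 2 ** (item.attempts - 1);
    }
  }
}
```

Keeping the dedup key on the message id means a task re-sent by the cloud after a dropped WebSocket connection is queued only once.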

The flow is simple:

Cadences Workflow triggers "Send WhatsApp"
  → WebSocket → Desktop Agent receives task
    → Agent opens/finds the conversation
    → Types message with human-like delays
    → Sends text / image / audio / location
    → Reports delivery status back via WebSocket
  → Workflow continues with result
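On the agent side, this flow comes down to three steps: parse a task from the WebSocket, dispatch it to the matching browser action, and send back a status report. The sketch below assumes a hypothetical `{ taskId, action, payload }` message format; the real wire protocol between Cadences and the agent is not documented here, and the browser actions are injected as plain async functions.

```javascript
// Hypothetical wire format. Inbound task: { taskId, action, payload }.
// Outbound report: { taskId, status: "done" | "error", result?, error? }.
async function handleTask(rawMessage, handlers) {
  let task;
  try {
    task = JSON.parse(rawMessage);
  } catch {
    return JSON.stringify({ taskId: null, status: "error", error: "invalid JSON" });
  }
  if (task === null || typeof task !== "object") {
    return JSON.stringify({ taskId: null, status: "error", error: "task must be an object" });
  }
  // Only dispatch to actions the agent explicitly registered.
  if (!Object.prototype.hasOwnProperty.call(handlers, task.action)) {
    return JSON.stringify({ taskId: task.taskId, status: "error", error: `unknown action: ${task.action}` });
  }
  try {
    const result = await handlers[task.action](task.payload);
    return JSON.stringify({ taskId: task.taskId, status: "done", result });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return JSON.stringify({ taskId: task.taskId, status: "error", error: message });
  }
}
```

With the `ws` npm package, the wiring would be roughly `socket.on("message", async (data) => socket.send(await handleTask(data.toString(), handlers)))`, where `handlers` maps action names such as a hypothetical `send_text` to the Playwright routines.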

The Anti-Detection Layer: 6 Techniques

This is where the engineering matters. WhatsApp's detection systems look for automation patterns. We eliminate every single one:

1. Stealth Plugin

We use puppeteer-extra-plugin-stealth adapted for Playwright. It patches 11 different browser fingerprint leaks: WebDriver detection, Chrome DevTools protocol, navigator.plugins, WebGL renderer strings, and more. The browser looks indistinguishable from a regular Chrome install.

2. Natural Typing Speed

Not fixed delays. We modeled actual human typing behavior:

Base speed: 40-60 WPM with Gaussian variance
Pauses between words: 80-200ms (longer for punctuation)
Occasional "thinking" pauses: 1-3s mid-sentence
Typo simulation disabled (too risky) but hesitation patterns included

3. Realistic Mouse Movements

Every click is preceded by a mouse movement following a Bézier curve from the current position. No teleportation. Movement speed varies, with acceleration and deceleration curves matching human motor control.

4. Random Action Delays

Between every action (open chat, type, send, next chat), the agent waits 1-5 seconds with random distribution. Between sessions (batches of 10-15 messages), it takes a 3-8 minute break — simulating a human doing other work.

5. Rate Limiting

Hard limits set below what a busy human would plausibly send:

Max 20 new conversations per hour
Max 60 messages per hour across all chats
No messaging between 1 AM and 7 AM (configurable timezone)
Weekend mode: 50% reduced volume
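These rules translate directly into a sliding-window limiter. This sketch rests on several assumptions. The class and method names are hypothetical, the default time zone is only an illustrative choice, and the quiet window runs from 01:00 to 07:00 local time. "Weekend mode" is modeled as halving both hourly limits, since the article does not say exactly how the 50% reduction is applied.

```javascript
// Sliding-window rate limiter with quiet hours and a weekend multiplier (hypothetical names).
const HOUR_MS = 60 * 60 * 1000;

class RateLimiter {
  constructor({
    timeZone = "Europe/Madrid", // illustrative default; configurable per client
    maxNewChatsPerHour = 20,
    maxMessagesPerHour = 60,
    quietStartHour = 1, // inclusive, local time
    quietEndHour = 7,   // exclusive, local time
    weekendFactor = 0.5,
  } = {}) {
    Object.assign(this, { maxNewChatsPerHour, maxMessagesPerHour, quietStartHour, quietEndHour, weekendFactor });
    this.sent = []; // { at, isNewChat } for sends in the last hour
    this.fmt = new Intl.DateTimeFormat("en-US", { timeZone, hour: "numeric", hourCycle: "h23", weekday: "short" });
  }

  // Local hour (0-23) and short weekday ("Mon".."Sun") in the configured time zone.
  localParts(now) {
    const parts = Object.fromEntries(this.fmt.formatToParts(new Date(now)).map((p) => [p.type, p.value]));
    return { hour: Number(parts.hour), weekday: parts.weekday };
  }

  canSend(isNewChat, now = Date.now()) {
    const { hour, weekday } = this.localParts(now);
    if (hour >= this.quietStartHour && hour < this.quietEndHour) return false;
    const factor = weekday === "Sat" || weekday === "Sun" ? this.weekendFactor : 1;
    this.sent = this.sent.filter((e) => now - e.at < HOUR_MS); // drop sends older than one hour
    if (this.sent.length >= Math.floor(this.maxMessagesPerHour * factor)) return false;
    const newChats = this.sent.filter((e) => e.isNewChat).length;
    if (isNewChat && newChats >= Math.floor(this.maxNewChatsPerHour * factor)) return false;
    return true;
  }

  record(isNewChat, now = Date.now()) {
    this.sent.push({ at: now, isNewChat });
  }
}
```

A sliding window rather than fixed clock-hour buckets prevents a burst of 60 messages at 10:59 followed by another 60 at 11:00.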

6. Rotating Fingerprints

User-Agent strings rotate weekly. Viewport size varies slightly between sessions. Language preferences match the phone's locale. Every detail that could flag the session as automated is randomized within human-plausible ranges.

Full Capability Set

The agent isn't just a text sender. It handles everything a human can do in WhatsApp Web:

Send text: With natural typing delays and WhatsApp's Markdown-style formatting (*bold*, _italic_, ~strikethrough~)
Send images: File upload with optional caption text
Send audio: Pre-recorded audio or real-time TTS (via ElevenLabs/Edge TTS)
Send location: GPS coordinates with place name
Read chats: Extract last N messages from any conversation — text, timestamps, sender
Search chats: Find conversations by contact name or phone number
Audio transcription: Automatically transcribe received voice messages using Groq Whisper / OpenAI / Cloudflare Workers AI
Read receipts: Detect blue ticks and online status

Audio transcription is particularly useful: when a client sends a 2-minute voice message, the agent transcribes it as soon as it arrives, passes the transcript to an AI model for interpretation, and can reply with a contextually appropriate text message, all within 15 seconds.
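The transcribe, interpret, reply chain can be sketched as two awaited steps that share one time budget. `transcribe` and `interpret` are injected, hypothetical functions; in the agent they would wrap one of the STT providers named above and an LLM call. The 15-second budget mirrors the figure in the text. Throwing on timeout is an assumption.

```javascript
// Reject if `promise` does not settle within `ms`; always clears the timer.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical pipeline: both steps together must finish within budgetMs.
async function handleVoiceNote(audio, { transcribe, interpret }, budgetMs = 15_000) {
  const started = Date.now();
  const transcript = await withTimeout(Promise.resolve(transcribe(audio)), budgetMs, "transcription");
  const remaining = Math.max(budgetMs - (Date.now() - started), 0); // what is left for the AI step
  const reply = await withTimeout(Promise.resolve(interpret(transcript)), remaining, "interpretation");
  return { transcript, reply };
}
```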

Integration with Cadences Workflows

The real power isn't the agent alone — it's the agent as a node in a workflow. In the Cadences visual workflow editor, WhatsApp actions are first-class nodes:

WhatsApp Send node: Send message to a specific number or to a variable (from CRM, Data Table, etc.)
WhatsApp Read node: Get recent messages from a contact — useful for checking if they replied
WhatsApp Wait node: Pause the workflow until the contact replies (with configurable timeout)
STT node: Transcribe any audio message received during the conversation
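The Wait node's semantics (resume on the contact's next message, or take a timeout branch) can be modeled as a promise that resolves with the reply or with `null`. This is a hypothetical, in-process sketch; the article does not describe how Cadences persists long waits.

```javascript
// Hypothetical in-process "wait for reply" helper.
class ReplyWaiter {
  constructor() {
    this.pending = new Map(); // contact -> array of { resolve, timer }
  }

  // Resolves with the contact's next inbound message, or null after timeoutMs.
  waitForReply(contact, timeoutMs) {
    return new Promise((resolve) => {
      const entry = { resolve, timer: null };
      entry.timer = setTimeout(() => {
        this.remove(contact, entry);
        resolve(null); // timeout: the workflow takes its "no reply" branch
      }, timeoutMs);
      const list = this.pending.get(contact) ?? [];
      list.push(entry);
      this.pending.set(contact, list);
    });
  }

  // Called whenever the agent reads a new inbound message from a chat.
  onIncoming(contact, message) {
    const list = this.pending.get(contact);
    if (!list) return;
    this.pending.delete(contact);
    for (const { resolve, timer } of list) {
      clearTimeout(timer);
      resolve(message);
    }
  }

  remove(contact, entry) {
    const list = (this.pending.get(contact) ?? []).filter((e) => e !== entry);
    if (list.length > 0) this.pending.set(contact, list);
    else this.pending.delete(contact);
  }
}
```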

Example workflow: Lead comes in → AI scores the lead → WhatsApp introduces the service → waits for reply → if reply contains pricing keywords, generates personalized quote with AI → sends quote as PDF → schedules follow-up in 48h.
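The "if reply contains pricing keywords" branch in that example is essentially a normalized keyword match. The keyword list below is hypothetical. Lowercasing and stripping accents makes "Cuánto" match "cuanto", and word boundaries stop "cost" from matching inside "costumbre".

```javascript
// Hypothetical keyword list; the real node's configuration is not described in the article.
const PRICING_KEYWORDS = ["precio", "precios", "cuanto cuesta", "presupuesto", "tarifa", "coste", "price", "quote", "cost"];

// Lowercase and strip diacritics (NFD splits "á" into "a" + a combining accent).
function normalize(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

function mentionsPricing(reply) {
  const text = normalize(reply);
  return PRICING_KEYWORDS.some((kw) => new RegExp(`\\b${kw}\\b`).test(text));
}
```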

Cost Comparison: API vs Desktop Agent

📊 Official API cost: $0.05–0.15/message → ~$450/month at our volume

🖥️ Desktop Agent cost: $0/message → electricity + one machine running

📋 Template approvals needed: API: Yes (24-72h review). Agent: No, any message content (still subject to the agent's own rate limits and quiet hours)

📎 Multimedia support: API: Limited. Agent: Full (images, audio, video, location, documents)

8+ Months in Production: The Numbers

🛡️ Account bans: 0 (across 3 different phone numbers, 3 different clients)

📱 Messages sent: 14,000+ automated messages total

⏱️ Uptime: 98.7% (downtime: Windows updates + WhatsApp Web maintenance)

🤝 Response rate: 31% (compare: email marketing averages 8-12%)

🎤 Voice messages transcribed: 2,400+ (automatic, in-conversation)

🔄 Avg delivery time: 4-12 seconds per message (human-speed)

What Can Go Wrong (And How We Handle It)

Honesty time — this approach isn't bulletproof. Here are the real risks and our mitigations:

WhatsApp Web updates: Meta occasionally changes the DOM structure, which breaks our selectors. Mitigation: we prefer data-testid attributes where available (they change less often) and run a DOM change detection system that alerts us within minutes
Session expiration: WhatsApp Web sessions can expire after 14 days of inactivity. Mitigation: daily keepalive pings
Machine goes offline: Laptop closes, power outage. Mitigation: SQLite queue persists — on restart, all queued messages are sent in order
Rate limit detection: If we accidentally exceed safe volumes, WhatsApp shows captchas. Mitigation: the agent detects captchas, pauses all activity, and alerts the operator
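The captcha mitigation works as a circuit breaker: one detected anomaly stops all outbound activity until a person resumes it. The names in this sketch are hypothetical. `detectCaptcha`, `sendNext`, and `alertOperator` are injected because the article does not describe how the agent inspects the page or notifies the operator.

```javascript
// Hypothetical circuit breaker: any anomaly pauses all outbound activity until an operator resumes.
class AgentGuard {
  constructor(alertOperator) {
    this.alertOperator = alertOperator;
    this.pausedReason = null;
  }

  get paused() {
    return this.pausedReason !== null;
  }

  reportAnomaly(reason) {
    if (this.paused) return; // alert once per incident
    this.pausedReason = reason;
    this.alertOperator({ type: "agent_paused", reason, at: new Date().toISOString() });
  }

  // Only called by a human operator after checking the session.
  resume() {
    this.pausedReason = null;
  }
}

// One worker-loop step: check the guard and the page before sending anything.
async function runStep(guard, detectCaptcha, sendNext) {
  if (guard.paused) return "paused";
  if (await detectCaptcha()) {
    guard.reportAnomaly("captcha");
    return "paused";
  }
  await sendNext();
  return "sent";
}
```

Requiring an explicit `resume()` rather than an automatic cool-down is the conservative choice: nothing is sent again until someone has looked at the session.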

Ethics and Responsible Use

This is a powerful tool, and we take responsible use seriously:

No cold spam: We only message contacts who have previously interacted with the business or explicitly opted in
Unsubscribe mechanism: Every automated conversation includes a way to opt out. "STOP" or "No más mensajes" immediately blocklists the contact
Human handoff: When the AI detects frustration, complaints, or complex requests, it immediately transfers to a human operator with full conversation context
Volume limits: We enforce hard daily caps per client. No client sends more than 150 messages/day
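The opt-out rule and the daily cap are simple enough to show in full. This sketch uses hypothetical names. It treats a message as an opt-out when the normalized text is one of the phrases or starts with one. That catches "Stop, please" without blocklisting someone who writes "don't stop"; the real agent's matching rule is not specified. The 150/day cap comes from the text.

```javascript
// Hypothetical opt-out matching plus the per-client daily cap from the article.
const OPT_OUT_PHRASES = ["stop", "no mas mensajes"];
const DAILY_CAP = 150;

// Lowercase, strip accents and punctuation, collapse whitespace.
function normalize(text) {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .replace(/\s+/g, " ")
    .trim();
}

function isOptOut(text) {
  const t = normalize(text);
  return OPT_OUT_PHRASES.some((p) => t === p || t.startsWith(p + " "));
}

class ComplianceGate {
  constructor() {
    this.blocklist = new Set();
    this.sentToday = 0;
  }
  // Returns true if the inbound message blocklisted the contact.
  handleInbound(contact, text) {
    if (!isOptOut(text)) return false;
    this.blocklist.add(contact);
    return true;
  }
  mayMessage(contact) {
    return !this.blocklist.has(contact) && this.sentToday < DAILY_CAP;
  }
  recordSend() {
    this.sentToday += 1;
  }
  resetDay() {
    this.sentToday = 0;
  }
}
```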

The human-like behavior isn't deception — it's user experience design. Nobody wants to feel like they're talking to a robot. The timing, the natural flow, the occasional pause — it creates a conversation that feels respectful of the recipient's time.

The Technical Stack

Electron: Desktop app shell — cross-platform, auto-updates, system tray integration
Playwright: Browser automation — more reliable than Puppeteer for long-running sessions
puppeteer-extra-plugin-stealth: Anti-fingerprinting for browser detection bypass
SQLite (sql.js): Local queue, message history, delivery tracking, contact blocklist
WebSocket: Real-time bidirectional communication with Cadences cloud
Groq/OpenAI/CF Workers AI: STT for voice message transcription (selectable per client)

The entire agent is ~3,500 lines of JavaScript. It runs on any machine with 4GB RAM and a stable internet connection. There is no Docker and no server to maintain on the client side; apart from the WebSocket link to the Cadences cloud, it is just a desktop app that WhatsApp sees as a regular user browsing the web.

About the Author

Gonzalo Monzón

Founder & Lead Architect

Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
