Automation January 29, 2026 · 10 min read

Building a WhatsApp Bot That Doesn't Get Banned — The Desktop Agent Approach

Gonzalo Monzón

Founder & Lead Architect

In Spain and Latin America, business happens on WhatsApp. Not email, not Slack: WhatsApp. Our clients needed to automate follow-ups, appointment reminders, and lead qualification at scale. The official Business API was too expensive and too restrictive for them, and the unofficial libraries we tried got most numbers banned within weeks. So we built something different: a desktop agent that doesn't try to bypass WhatsApp Web. It uses it the way a person would, at human speed.

The Three Paths of WhatsApp Automation (And Why Two Fail)

Before building anything, we evaluated every option on the market:

Path 1: Official WhatsApp Business API (Meta)

The "safe" choice. But it has real limitations that kill it for most SMB use cases:

  • $0.05–0.15 per message — at 200+ messages/day, that's $300–900/month just in messaging fees
  • 24-hour window: You can only reply within 24 hours of the user's last message. After that, you need pre-approved templates
  • Template approval: Every outbound message template needs Meta's approval — 24-72h review, frequent rejections
  • No multimedia freedom: Limited attachment types, no audio messages, no location sharing in templates
  • Business verification: Requires Facebook Business Manager, domain verification, 2-week onboarding
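The fee range in the first bullet is plain arithmetic. A minimal sketch, assuming a 30-day month and a flat per-message price; the helper name and the use of integer cents are illustrative choices, not Meta's actual billing model:

```javascript
// Hypothetical helper: estimate monthly messaging fees for a flat per-message price.
// Prices are passed in cents so the arithmetic stays exact.
function monthlyFeeUsd(messagesPerDay, centsPerMessage, daysPerMonth = 30) {
  return (messagesPerDay * daysPerMonth * centsPerMessage) / 100;
}

const low = monthlyFeeUsd(200, 5);   // $0.05 per message
const high = monthlyFeeUsd(200, 15); // $0.15 per message
console.log(`$${low}-$${high} per month`); // $300-$900 per month
```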

Path 2: Unofficial Libraries (Baileys, whatsapp-web.js)

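Open-source libraries that automate WhatsApp without the official API. Baileys speaks the WhatsApp Web protocol directly over a WebSocket; whatsapp-web.js drives WhatsApp Web through Puppeteer. Fast to set up, but:

  • Ban rate: 85%+ within 4 weeks in our testing. Our best explanation is that WhatsApp flags non-browser connections and machine-speed activity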
  • Protocol breaks: WhatsApp updates their protocol regularly, breaking these libraries for days
  • No session recovery: When a session breaks, you need to re-scan the QR code manually

Path 3: Our Approach — The Desktop Agent

What if instead of trying to bypass WhatsApp, we just... used it? Like a human would. Through a real browser, with realistic timing, on a real desktop app. That's the core insight behind our WhatsApp Local Agent.

The Architecture: Electron + Playwright + WebSocket

The agent is an Electron desktop app running on the client's machine (Windows/macOS/Linux). Inside, it runs a Playwright-controlled browser instance pointed at WhatsApp Web. The key components:

  • Playwright browser: A real Chromium instance with WhatsApp Web loaded — not headless, actually visible on screen
  • WebSocket bridge: Bidirectional connection to Cadences cloud. Receives tasks, reports results
  • SQLite queue: Local message queue with retry logic, delivery status tracking, and deduplication
  • Session persistence: Automatic session restoration — no re-scanning QR codes after restarts
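To make the queue component concrete, here is a rough sketch of the deduplication and retry behavior. It uses an in-memory Map as a stand-in for the SQLite table, and every field name, the dedup key format, and the retry limit are assumptions for illustration rather than the agent's actual schema:

```javascript
// In-memory stand-in for the local queue; the real agent keeps this in SQLite.
// Field names, key format, and retry limit are illustrative assumptions.
class MessageQueue {
  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
    this.items = new Map(); // dedupKey -> entry, kept in insertion order
  }

  // Returns false when the same task is already queued (deduplication).
  enqueue(task) {
    const dedupKey = `${task.to}|${task.taskId}`;
    if (this.items.has(dedupKey)) return false;
    this.items.set(dedupKey, { ...task, dedupKey, status: "pending", attempts: 0 });
    return true;
  }

  // Oldest entry still waiting to be sent, or undefined if there is none.
  next() {
    for (const item of this.items.values()) {
      if (item.status === "pending") return item;
    }
    return undefined;
  }

  markSent(dedupKey) {
    this.items.get(dedupKey).status = "sent";
  }

  // A failed send returns to "pending" until the attempt budget is used up.
  markFailed(dedupKey) {
    const item = this.items.get(dedupKey);
    item.attempts += 1;
    item.status = item.attempts >= this.maxAttempts ? "failed" : "pending";
  }
}

const queue = new MessageQueue(2);
queue.enqueue({ taskId: "t1", to: "+34600000000", text: "Reminder" });
const duplicate = queue.enqueue({ taskId: "t1", to: "+34600000000", text: "Reminder" });
console.log(duplicate); // false
```

Because a Map iterates in insertion order, next() always returns the oldest pending entry, which mirrors sending queued messages in order after a restart.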

The flow is simple:

Cadences Workflow triggers "Send WhatsApp"
  → WebSocket → Desktop Agent receives task
    → Agent opens/finds the conversation
    → Types message with human-like delays
    → Sends text / image / audio / location
    → Reports delivery status back via WebSocket
  → Workflow continues with result

The Anti-Detection Layer: 6 Techniques

This is where the engineering matters. WhatsApp's detection systems look for automation patterns. We target each pattern we know of:

1. Stealth Plugin

We use puppeteer-extra-plugin-stealth adapted for Playwright. It patches 11 different browser fingerprint leaks: WebDriver detection, Chrome DevTools protocol, navigator.plugins, WebGL renderer strings, and more. The goal is a browser that fingerprinting scripts cannot tell apart from a regular Chrome install.

2. Natural Typing Speed

Not fixed delays. We modeled actual human typing behavior:

  • Base speed: 40-60 WPM with Gaussian variance
  • Pauses between words: 80-200ms (longer for punctuation)
  • Occasional "thinking" pauses: 1-3s mid-sentence
  • Typo simulation disabled (too risky) but hesitation patterns included
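The timing model above could be sketched roughly as follows. This is not the agent's actual code: the 5-characters-per-word convention, the standard deviation of 5 WPM, the 1.5x multiplier after punctuation, and the 2% chance of a thinking pause are all assumed values chosen for illustration.

```javascript
// Standard normal sample via the Box-Muller transform.
function gaussian(rng) {
  const u = 1 - rng(); // shift to (0, 1] so Math.log never receives 0
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function clamp(x, lo, hi) { return Math.min(hi, Math.max(lo, x)); }
function uniform(rng, lo, hi) { return lo + rng() * (hi - lo); }

// Returns one delay in milliseconds per character of `text`.
// Assumed parameters: 5 characters per word, speed ~ N(50, 5) WPM clamped
// to 40-60, 1.5x word pause after punctuation, 2% chance of a thinking pause.
function typingDelays(text, rng = Math.random) {
  const delays = [];
  for (let i = 0; i < text.length; i++) {
    const wpm = clamp(50 + 5 * gaussian(rng), 40, 60);
    let delay = 60000 / (wpm * 5); // ms per keystroke at this speed
    if (text[i] === " ") {
      const afterPunctuation = i > 0 && /[.,;:!?]/.test(text[i - 1]);
      delay += uniform(rng, 80, 200) * (afterPunctuation ? 1.5 : 1);
      if (rng() < 0.02) delay += uniform(rng, 1000, 3000); // "thinking" pause
    }
    delays.push(Math.round(delay));
  }
  return delays;
}

const delays = typingDelays("Hola, tu cita es mañana.");
console.log(delays.length, delays.slice(0, 6));
```

With Playwright, delays like these could be applied by typing one character at a time with page.keyboard.type and waiting the computed time between keystrokes.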

3. Realistic Mouse Movements

Every click is preceded by a mouse movement following a Bézier curve from the current position. No teleportation. Movement speed varies, with acceleration and deceleration curves matching human motor control.
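A minimal sketch of that kind of path, assuming a cubic Bézier curve whose two control points are pushed sideways by a random amount (up to 20% of the travel distance, an arbitrary choice) and a smoothstep easing to approximate acceleration and deceleration. The function names are hypothetical:

```javascript
// Point on a cubic Bezier curve p0..p3 at parameter t in [0, 1].
function cubicBezier(p0, p1, p2, p3, t) {
  const u = 1 - t;
  const a = u * u * u, b = 3 * u * u * t, c = 3 * u * t * t, d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Smoothstep easing: slow start, fast middle, slow end.
function easeInOut(t) { return t * t * (3 - 2 * t); }

// Returns steps + 1 points from `from` to `to` along a gently bent curve.
function mousePath(from, to, steps = 25, rng = Math.random) {
  const dx = to.x - from.x, dy = to.y - from.y;
  // Offset the control points perpendicular to the straight line.
  const bend = (rng() - 0.5) * 0.4; // -0.2 .. 0.2 of the distance (assumed)
  const c1 = { x: from.x + dx / 3 - dy * bend, y: from.y + dy / 3 + dx * bend };
  const c2 = { x: from.x + (2 * dx) / 3 - dy * bend, y: from.y + (2 * dy) / 3 + dx * bend };
  const points = [];
  for (let i = 0; i <= steps; i++) {
    points.push(cubicBezier(from, c1, c2, to, easeInOut(i / steps)));
  }
  return points;
}

const route = mousePath({ x: 100, y: 200 }, { x: 640, y: 380 });
console.log(route.length, route[0], route[route.length - 1]);
```

Because the points are sampled at eased parameter values, they bunch up near both ends of the curve, so feeding them to page.mouse.move at a fixed interval produces a slow start and a slow finish.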

4. Random Action Delays

Between every action (open chat, type, send, next chat), the agent waits a randomly chosen 1-5 seconds. Between sessions (batches of 10-15 messages), it takes a 3-8 minute break, simulating a human doing other work.
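As an illustration of the pacing, the sketch below plans the pause before each message in a run. It simplifies the description above to one pause per message rather than one per UI action, assumes uniform distributions, and uses hypothetical function names:

```javascript
function randomBetween(min, max, rng = Math.random) {
  return min + rng() * (max - min);
}

// Integer in [min, max], both ends included.
function randomInt(min, max, rng = Math.random) {
  return Math.floor(randomBetween(min, max + 1, rng));
}

// Returns the pause to take before each of `messageCount` messages.
// After every batch of 10-15 messages the next pause is a 3-8 minute break.
function planPauses(messageCount, rng = Math.random) {
  const pauses = [];
  let leftInBatch = randomInt(10, 15, rng);
  for (let i = 0; i < messageCount; i++) {
    if (leftInBatch === 0) {
      pauses.push({ kind: "break", ms: Math.round(randomBetween(3 * 60000, 8 * 60000, rng)) });
      leftInBatch = randomInt(10, 15, rng);
    } else {
      pauses.push({ kind: "action", ms: Math.round(randomBetween(1000, 5000, rng)) });
    }
    leftInBatch -= 1;
  }
  return pauses;
}

const pauses = planPauses(30);
console.log(pauses.filter((p) => p.kind === "break").length, "breaks planned");
```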

5. Rate Limiting

Hard limits, set below what a busy human operator would plausibly send:

  • Max 20 new conversations per hour
  • Max 60 messages per hour across all chats
  • No messaging between 1 AM and 7 AM (configurable timezone)
  • Weekend mode: 50% reduced volume
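One way to express these limits is a sliding one-hour window. The sketch below is an assumption-laden illustration: it reads the hour and weekday in the machine's local timezone instead of a configurable one, and it interprets weekend mode as halving both hourly caps. The class and method names are hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;

class RateLimiter {
  constructor() {
    this.sent = []; // { at: epoch ms, isNew: boolean } for the last hour
  }

  // `now` is a Date. Returns true when a send is allowed right now.
  canSend(now, isNewConversation) {
    const hour = now.getHours();
    if (hour >= 1 && hour < 7) return false; // quiet hours, 1 AM to 7 AM

    const day = now.getDay(); // 0 = Sunday, 6 = Saturday
    const factor = day === 0 || day === 6 ? 0.5 : 1; // weekend mode (assumed: half caps)

    // Keep only sends from the last hour.
    this.sent = this.sent.filter((s) => s.at > now.getTime() - HOUR_MS);
    if (this.sent.length >= 60 * factor) return false;

    const newCount = this.sent.filter((s) => s.isNew).length;
    return !(isNewConversation && newCount >= 20 * factor);
  }

  // Call after every successful send.
  record(now, isNewConversation) {
    this.sent.push({ at: now.getTime(), isNew: isNewConversation });
  }
}

const limiter = new RateLimiter();
const threeAm = new Date(2026, 0, 28, 3, 0, 0);
console.log(limiter.canSend(threeAm, false)); // false: inside quiet hours
```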

6. Rotating Fingerprints

User-Agent strings rotate weekly. Viewport size varies slightly between sessions. Language preferences match the phone's locale. Every detail that could flag the session as automated is randomized within human-plausible ranges.

Full Capability Set

The agent isn't just a text sender. It handles everything a human can do in WhatsApp Web:

  • Send text: With natural typing delays and WhatsApp's Markdown-like formatting (*bold*, _italic_, ~strikethrough~)
  • Send images: File upload with optional caption text
  • Send audio: Pre-recorded audio or real-time TTS (via ElevenLabs/Edge TTS)
  • Send location: GPS coordinates with place name
  • Read chats: Extract last N messages from any conversation — text, timestamps, sender
  • Search chats: Find conversations by contact name or phone number
  • Audio transcription: Automatically transcribe received voice messages using Groq Whisper / OpenAI / Cloudflare Workers AI
  • Read receipts: Detect blue ticks and online status

The audio transcription is particularly powerful: a client sends a 2-minute voice message, the agent transcribes it in real-time, feeds it to an AI for interpretation, and can respond with a contextually appropriate text — all within 15 seconds.

Integration with Cadences Workflows

The real power isn't the agent alone — it's the agent as a node in a workflow. In the Cadences visual workflow editor, WhatsApp actions are first-class nodes:

  • WhatsApp Send node: Send message to a specific number or to a variable (from CRM, Data Table, etc.)
  • WhatsApp Read node: Get recent messages from a contact — useful for checking if they replied
  • WhatsApp Wait node: Pause the workflow until the contact replies (with configurable timeout)
  • STT node: Transcribe any audio message received during the conversation

Example workflow: Lead comes in → AI scores the lead → WhatsApp introduces the service → waits for reply → if reply contains pricing keywords, generates personalized quote with AI → sends quote as PDF → schedules follow-up in 48h.
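The branching step in that example, deciding what to do once the contact replies, might look something like the sketch below. The keyword list, the step names, and the handling of timeouts and non-pricing replies are assumptions; the workflow above only specifies the pricing branch.

```javascript
// Hypothetical pricing keywords, matched as whole words after accent removal.
const PRICING_PATTERN = /\b(precios?|presupuesto|cuanto cuesta|tarifas?|price|pricing|quote)\b/;

// Lowercase and strip diacritics so "Cuánto" and "cuanto" compare equal.
function normalize(text) {
  return text.toLowerCase().normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// `reply` is the contact's message, or null if the Wait node timed out.
function nextStep(reply) {
  if (reply === null) return "schedule_follow_up";
  return PRICING_PATTERN.test(normalize(reply)) ? "generate_quote" : "continue_conversation";
}

console.log(nextStep("¿Cuánto cuesta el servicio?")); // generate_quote
console.log(nextStep("Qué precioso"));                // continue_conversation
console.log(nextStep(null));                          // schedule_follow_up
```

Matching on word boundaries rather than substrings avoids false positives such as "precioso" triggering the "precio" keyword.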

Cost Comparison: API vs Desktop Agent

📊 Official API cost: $0.05–0.15/message → ~$450/month at our volume

🖥️ Desktop Agent cost: $0/message → electricity + one machine running

📋 Template approvals needed: API: Yes (24-72h). Agent: No — any message, any time

📎 Multimedia support: API: Limited. Agent: Full (images, audio, video, location, documents)

8+ Months in Production: The Numbers

🛡️ Account bans: 0 (across 3 different phone numbers, 3 different clients)

📱 Messages sent: 14,000+ automated messages total

⏱️ Uptime: 98.7% (downtime: Windows updates + WhatsApp Web maintenance)

🤝 Response rate: 31% (compare: email marketing averages 8-12%)

🎤 Voice messages transcribed: 2,400+ (automatic, in-conversation)

🔄 Typical delivery time: 4-12 seconds per message (human-speed)

What Can Go Wrong (And How We Handle It)

Honesty time — this approach isn't bulletproof. Here are the real risks and our mitigations:

  • WhatsApp Web updates: Meta occasionally changes the DOM structure. Our selectors break. Mitigation: we use data-testid attributes when available (stable) and have a DOM change detection system that alerts us within minutes
  • Session expiration: WhatsApp Web sessions can expire after 14 days of inactivity. Mitigation: daily keepalive pings
  • Machine goes offline: Laptop closes, power outage. Mitigation: SQLite queue persists — on restart, all queued messages are sent in order
  • Rate limit detection: If we accidentally exceed safe volumes, WhatsApp shows captchas. Mitigation: the agent detects captchas, pauses all activity, and alerts the operator

Ethics and Responsible Use

This is a powerful tool, and we take responsible use seriously:

  • No cold spam: We only message contacts who have previously interacted with the business or explicitly opted in
  • Unsubscribe mechanism: Every automated conversation includes a way to opt out. "STOP" or "No más mensajes" immediately blocklists the contact
  • Human handoff: When the AI detects frustration, complaints, or complex requests, it immediately transfers to a human operator with full conversation context
  • Volume limits: We enforce hard daily caps per client. No client sends more than 150 messages/day
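The opt-out rule can be sketched in a few lines. This version assumes the whole message must equal an opt-out phrase after normalization (so "Don't stop" does not trigger it) and keeps the blocklist in memory; the class and function names are hypothetical:

```javascript
// Opt-out phrases from the list above, stored lowercase and without accents.
const OPT_OUT_PHRASES = new Set(["stop", "no mas mensajes"]);

// Lowercase, strip accents, then trim surrounding whitespace and punctuation.
function normalizeReply(text) {
  return text
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/^[\s"'¡!.,]+|[\s"'!.,]+$/g, "");
}

class Blocklist {
  constructor() {
    this.blocked = new Set();
  }

  // Call for every incoming message; returns true when the sender opted out.
  handleIncoming(from, text) {
    if (OPT_OUT_PHRASES.has(normalizeReply(text))) {
      this.blocked.add(from);
      return true;
    }
    return false;
  }

  // Check this before every automated send.
  canMessage(number) {
    return !this.blocked.has(number);
  }
}

const blocklist = new Blocklist();
blocklist.handleIncoming("+34600000000", "STOP");
console.log(blocklist.canMessage("+34600000000")); // false
```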

The human-like behavior isn't deception — it's user experience design. Nobody wants to feel like they're talking to a robot. The timing, the natural flow, the occasional pause — it creates a conversation that feels respectful of the recipient's time.

The Technical Stack

  • Electron: Desktop app shell — cross-platform, auto-updates, system tray integration
  • Playwright: Browser automation — more reliable than Puppeteer for long-running sessions
  • puppeteer-extra-plugin-stealth: Anti-fingerprinting for browser detection bypass
  • SQLite (sql.js): Local queue, message history, delivery tracking, contact blocklist
  • WebSocket: Real-time bidirectional communication with Cadences cloud
  • Groq/OpenAI/CF Workers AI: STT for voice message transcription (selectable per client)

The entire agent is ~3,500 lines of JavaScript. It runs on any machine with 4GB RAM and a stable internet connection. No Docker, no servers, no cloud infra — just a desktop app that WhatsApp sees as a regular user browsing the web.

Tags

WhatsApp Automation Anti-Detection Playwright Electron Desktop Tools WebSocket

About the Author

Gonzalo Monzón

Founder & Lead Architect

Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
