Engineering · July 15, 2025 · 14 min read

Local-First Desktop Tools: Why We Moved 73K Lines Off the Cloud


Gonzalo Monzón

Founder & Lead Architect

Not everything belongs in the cloud. When your workload involves ML model training on private data, real-time IoT device communication, browser automation with anti-detection, or local audio processing — latency, privacy, and hardware access make cloud-first architectures impractical. We built 5 local-first desktop tools totaling ~73K lines of code across 114 files. Each tool runs independently on the user's machine, syncs with our cloud platform when needed, and handles workloads that would be impossible (or illegal) to run remotely.

This is the engineering story: why local-first, how each tool works, and the architectural patterns that let a small team maintain 73K lines across 5 desktop apps without going insane.

Why Local-First? The Cloud Isn't Always the Answer

Cloud computing solves real problems — scalability, availability, zero-ops. But some workloads have constraints that make cloud deployment awkward or impossible:

  • Privacy: ML model training on customer data that can't leave the machine. GDPR doesn't care about your cloud provider's compliance certification when the data shouldn't be transmitted at all
  • Hardware access: IoT devices connected via MQTT, Serial ports, and Bluetooth don't have cloud endpoints. You need a process running on the same network — or the same USB port
  • Anti-detection: Browser automation for web scraping requires persistent browser profiles, stealth plugins, and human-like behavior patterns. Running this from a cloud IP range is a detection signal
  • Latency: Real-time audio processing (speech-to-text, text-to-speech) needs sub-100ms response. A round trip to an API adds 200-500ms of latency that breaks the conversational flow
  • Cost: Running a GPU-intensive ML training job 24/7 on cloud infrastructure costs more per month than the laptop doing the training. Local hardware is already paid for

Our answer: Electron for the shell, Fastify for the local API, SQLite for persistence, and Cadences cloud sync for data that matters. Each tool is a standalone application that works offline, syncs when connected, and handles its workload locally.

The Shared Architecture: 5 Tools, 1 Pattern

All 5 tools follow the same architectural pattern, which dramatically reduces maintenance cost:

┌─────────────────────────────┐
│       Electron Shell        │  ← OS integration, tray, notifications
│  ┌───────────────────────┐  │
│  │    Fastify HTTP API   │  │  ← localhost:PORT, versioned routes
│  │  ┌─────────────────┐  │  │
│  │  │ SQLite (local)  │  │  │  ← better-sqlite3 or sql.js WASM
│  │  └─────────────────┘  │  │
│  │  ┌─────────────────┐  │  │
│  │  │ Cadences Sync   │  │  │  ← WebSocket/REST to cloud
│  │  └─────────────────┘  │  │
│  └───────────────────────┘  │
└─────────────────────────────┘

This pattern means every tool exposes the same interface: a local HTTP API that other tools (or Cadences cloud) can call. The ML Trainer exposes /v1/embeddings. The Scraper exposes /api/jobs. The IoT Hub exposes /api/devices. Same pattern, different domain.

Why Fastify (Not Express)

Fastify gives us schema validation, plugin architecture, and ~3x the throughput of Express — and on desktop, where the sole client is the local Electron UI, that headroom means we can handle WebSocket streams, file uploads, and ML inference requests simultaneously without blocking. The plugin system keeps each tool's routes modular.
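The pattern itself is small enough to sketch. Here it is as a dependency-free route table (the real tools register these as Fastify plugins with JSON-schema validation; the route names and handler shapes are illustrative):

```javascript
// Minimal sketch of the local-API pattern: each tool registers
// versioned routes in a table and dispatches on the path.
// (Illustrative only — the real tools use Fastify plugins with
// schema validation on every route.)
const routes = new Map();

function register(path, handler) {
  routes.set(path, handler);
}

function handle(path, payload) {
  const handler = routes.get(path);
  if (!handler) return { status: 404, body: { error: "not found" } };
  return { status: 200, body: handler(payload) };
}

// Each tool contributes its own domain routes under the same shape:
register("/v1/embeddings", ({ input }) => ({ model: "local-minilm", input }));
register("/api/devices", () => ({ devices: [] }));

// Wire `handle` into http.createServer(...) and listen on localhost:PORT.
```

Because every tool speaks this same shape, adding a sixth tool means writing new handlers, not a new architecture.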

Why SQLite (Not IndexedDB, Not PostgreSQL)

SQLite is the only database engine that makes sense for local-first desktop tools:

  • Single file — no server, no connection string, no port conflicts. The database is a file next to the app
  • better-sqlite3 — synchronous API that's faster than async alternatives for single-user desktop workloads. No callback hell for simple reads
  • WAL mode — concurrent reads during writes. The Electron UI can read device state while the IoT Hub writes sensor data
  • Portable backups — copy the .sqlite file, and you have a complete backup. No dump/restore ceremonies
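The per-tool setup boils down to a few pragmas run once at startup. A typical configuration for this pattern (tune synchronous and busy_timeout to taste):

```sql
PRAGMA journal_mode = WAL;      -- concurrent readers during writes
PRAGMA synchronous = NORMAL;    -- safe with WAL, much faster than FULL
PRAGMA busy_timeout = 5000;     -- wait up to 5s instead of failing on a lock
PRAGMA foreign_keys = ON;       -- enforce relational integrity
```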

Tool 1: ML Trainer (11.5K Lines)

The ML Trainer is an OpenAI-compatible local API server for training and serving ML models. It exposes the same endpoints as OpenAI — /v1/embeddings, /v1/chat/completions, /v1/classify, /v1/entities, /v1/predict, /v1/similarity — so any client that speaks OpenAI protocol can use locally-trained models without code changes.

Training Types

| Type | Model | Use Case |
| --- | --- | --- |
| Embeddings | MiniLM-L6-v2 | Semantic search, similarity matching |
| Classifier | DistilBERT | Text categorization, intent detection |
| NER | BERT-NER | Named entity recognition in domain text |
| Regression | Custom | Numerical prediction with cross-validation |
| LoRA Fine-tuning | GPT-2 / TinyLlama | Domain-specific text generation |

Models are trained locally using @xenova/transformers (the JS port of Hugging Face Transformers) and registered in Cadences as LOCAL_MODEL type. This means Cadences workflows can route AI tasks to either cloud providers (GPT-4, Claude, Gemini) or local models based on cost, privacy, or performance requirements.
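The routing decision itself stays simple. A sketch of how a workflow step might pick a provider (field names like `requiresPrivacy` and `costSensitive` are illustrative, not the actual Cadences schema):

```javascript
// Route an AI task to a local model or a cloud provider.
// Local wins when the data must not leave the machine, or when a
// registered LOCAL_MODEL covers the task and cost matters.
function pickProvider(task, localModels) {
  const local = localModels.find((m) => m.task === task.type);
  if (task.requiresPrivacy) {
    if (!local) throw new Error(`no local model for private task: ${task.type}`);
    return { kind: "LOCAL_MODEL", model: local.name };
  }
  if (local && task.costSensitive) {
    return { kind: "LOCAL_MODEL", model: local.name };
  }
  return { kind: "CLOUD", model: task.preferredCloudModel ?? "gpt-4" };
}
```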

Tool 2: WhatsApp Agent (20.4K Lines)

The WhatsApp Agent is a Playwright-based automation system with a deep stealth layer that simulates human behavior. At 20.4K lines it's the second-largest tool, and most of that complexity comes from one thing: not getting detected as a bot.

The humanDelay Module

The core anti-detection system simulates human interaction patterns:

  • humanType — types messages character by character with variable delays. Each keystroke has a Gaussian-distributed delay (mean ~80ms, σ ~30ms) with occasional pauses at word boundaries
  • humanClick — moves the mouse along a Bézier curve to the target element before clicking. The curve parameters are randomized, and the movement speed follows a bell curve (fast in the middle, slow at start/end)
  • readingDelay — calculates a realistic reading time based on message length in words per minute (~200-250 WPM with variance). After receiving a message, the agent "reads" it before responding
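The keystroke sampler can be sketched in a few lines (Box-Muller for the Gaussian; the mean and σ match the numbers above, the word-boundary pause probability is illustrative):

```javascript
// Sample a standard normal via the Box-Muller transform, scaled to
// the requested mean and standard deviation.
function gaussian(mean, sigma) {
  const u1 = Math.random() || Number.MIN_VALUE; // avoid log(0)
  const u2 = Math.random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + sigma * z;
}

// Human-like keystroke delay: Gaussian around ~80ms (σ ~30ms), clamped
// so it never goes implausibly fast, with an occasional longer pause
// at word boundaries.
function keystrokeDelay(char) {
  let delay = Math.max(15, gaussian(80, 30));
  if (char === " " && Math.random() < 0.15) {
    delay += Math.max(0, gaussian(300, 100)); // pause at a word boundary
  }
  return delay;
}
```

Uniform random delays would be easier, but a uniform distribution is itself a detection signal; real typing clusters around a mean.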

AI Pipeline

Messages flow through a Smart Pipeline with 5 AI providers (Cloudflare Workers AI, Gemini, Groq, OpenAI, DeepSeek) with automatic failover. Voice messages are transcribed via Groq/OpenAI Whisper or a local Workers-based Whisper endpoint. Images are analyzed via Gemini/OpenAI Vision or local LLaVA models.

Tool 3: Scraper (25.3K Lines)

The Scraper is a multi-purpose data extraction engine with 7 specialized scraper types. At 25.3K lines, it's the largest tool — and the most architecturally diverse, because each scraper type handles fundamentally different data sources.

The 7 Scraper Types

| Type | Lines | Target | Key Feature |
| --- | --- | --- | --- |
| Real Estate | 1,256 | Property listing portals | Anti-detection with Gaussian delays + anti-bot evasion |
| Freelance | ~2,800 | 9+ job platforms (4 regions) | Cross-platform normalization of postings |
| Documents | ~1,200 | PDF, Excel, CSV, Word, TXT, JSON | Format detection + content extraction |
| FileSystem | ~900 | Local disk scan | MD5/SHA256 dedup across drives |
| ML Pipeline | ~600 | TensorFlow.js bridge | Feeds training data to ML Trainer |
| Legal | ~1,500 | 5 government gazette sources | Procurement + regulation monitoring |
| API | ~800 | REST/GraphQL endpoints | Generic API polling with transform rules |

The real estate scraper implements anti-detection with Gaussian-distributed request delays (not uniform random, which is a detection signal), anti-bot evasion techniques, and intelligent pagination that detects the initial page from any URL. The freelance scraper normalizes job postings from 9+ platforms across Europe, US, and LatAm into a unified format for Cadences to process.
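Normalization is mostly per-platform field mapping plus unit cleanup. A sketch with two hypothetical platform shapes (all field names here are invented for illustration):

```javascript
// Normalize job postings from different platforms into one unified
// record. Each platform gets a small adapter; the unified record is
// what gets pushed to Cadences data tables.
const adapters = {
  platformA: (raw) => ({
    title: raw.jobTitle,
    budgetUsd: raw.budget_cents / 100, // platformA quotes cents
    region: raw.country,
    source: "platformA",
  }),
  platformB: (raw) => ({
    title: raw.name,
    budgetUsd: raw.rate.amount, // platformB already quotes USD
    region: raw.location,
    source: "platformB",
  }),
};

function normalize(platform, raw) {
  const adapter = adapters[platform];
  if (!adapter) throw new Error(`no adapter for ${platform}`);
  return adapter(raw);
}
```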

Supporting Systems

  • AI File Analyzer (735 lines) — classifies and summarizes extracted documents using AI
  • Cross-Disk Sync (700 lines) — synchronizes scraped data across multiple storage locations
  • Remote Worker — polls Cadences for scraping jobs, enabling cloud-triggered local execution

Tool 4: IoT Hub (8.4K Lines)

The IoT Hub manages physical devices across 3 core protocols — MQTT 5.0, SerialPort v12, and HTTP polling — with 14 additional protocol types defined in the architecture (CoAP, Modbus, Zigbee, Z-Wave, LoRa, BLE, RTSP, ONVIF...).

Device Registry

60+ device types organized across 10 categories: Environmental (temperature, humidity, air quality), Security (motion, door/window, smoke), Energy (smart plugs, solar inverters), Camera (IP cameras with PTZ), Industrial (PLC, flow meters), and more. Each device type has a capability matrix that determines which protocols, commands, and data formats it supports.

Automation Engine (796 Lines)

The Automation Engine is the brain of the IoT Hub — a rule engine with 6 trigger types:

  • device_state — fires when a device attribute changes (e.g., motion detected)
  • threshold — fires when a numeric value crosses a boundary (e.g., temperature > 30°C)
  • schedule — fires at cron-like intervals
  • sunrise/sunset — fires relative to solar position (useful for outdoor lighting and blinds)
  • webhook — fires on external HTTP trigger from Cadences or other systems
  • scene — fires multiple actions in sequence (e.g., "movie mode" = dim lights + close blinds + turn on projector)
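The first two trigger types reduce to a predicate check against each incoming state update. A simplified sketch (the rule shape is illustrative, not the engine's actual schema):

```javascript
// Evaluate device_state and threshold triggers against a state update.
// Returns the actions of every rule that fires.
function evaluate(rules, update) {
  const fired = [];
  for (const rule of rules) {
    if (rule.deviceId !== update.deviceId) continue;
    const value = update.attributes[rule.attribute];
    if (value === undefined) continue;
    let hit = false;
    if (rule.trigger === "device_state") {
      hit = value === rule.equals;
    } else if (rule.trigger === "threshold") {
      hit =
        (rule.above !== undefined && value > rule.above) ||
        (rule.below !== undefined && value < rule.below);
    }
    if (hit) fired.push(...rule.actions);
  }
  return fired;
}
```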

A Camera Manager (619 lines) handles PTZ (pan-tilt-zoom) control and FFmpeg integration for stream recording. The entire IoT state auto-syncs to Cadences every 5 minutes.

Tool 5: AudioHub (7.4K Lines)

AudioHub bridges JavaScript and Python for real-time audio processing. It's the smallest tool but solves a critical problem: running Whisper STT and TTS locally when cloud latency is unacceptable.

The Python Bridge Pattern

AudioHub spawns a Python subprocess and communicates via JSON-line IPC over stdin/stdout. Each line is a JSON object with a command and payload:

// Node.js → Python (stdin)
{"cmd": "transcribe", "file": "/tmp/audio.wav", "model": "base"}

// Python → Node.js (stdout)
{"status": "ok", "text": "Hello, how are you?", "confidence": 0.94}

The bridge includes capability detection (which Python packages are installed? Is CUDA available?) and a 10-second startup timeout. If the Python process fails, AudioHub falls back to cloud STT providers.
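The fiddly part of JSON-line IPC is that stdout arrives in arbitrary chunks, so messages have to be re-framed on newlines before parsing. A sketch of the parser side (the real bridge also handles the startup timeout and capability handshake):

```javascript
// Incremental JSON-lines parser: feed it raw stdout chunks, get back
// complete parsed messages. Partial lines stay buffered until the
// terminating newline arrives.
function createLineParser() {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const messages = [];
    let idx;
    while ((idx = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1);
      if (line) messages.push(JSON.parse(line));
    }
    return messages;
  };
}
```

Attach `feed` to the child process's `stdout.on("data", ...)` handler and each complete response pops out as a parsed object, regardless of how the OS chunked the stream.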

Audio Pipeline

  • STT: Local Whisper (configurable model size: tiny/base/small/medium/large) with automatic language detection
  • TTS: 3-provider fallback chain — ElevenLabs (12 voice presets), OpenAI TTS, then Edge-TTS/gTTS/pyttsx3 as free fallbacks
  • Bluetooth: Device scanning for earpieces and headsets with heartbeat monitoring for connection stability
  • Caching: MD5-based audio file caching to avoid re-synthesizing identical text

The Cloud Sync Layer

Every tool syncs with the Cadences cloud through a layer designed around one principle: local-first, cloud-eventual. If the internet drops, every tool continues working. When connectivity returns, changes sync automatically.

  • ML Trainer — syncs trained model metadata and metrics. Models stay local (too large to upload), but capabilities are registered so Cadences can route inference requests
  • WhatsApp Agent — syncs contacts, conversations, and group membership to Cadences CRM
  • Scraper — pushes extracted data to Cadences data tables. Remote Worker polls for new jobs
  • IoT Hub — pushes device state snapshots every 5 minutes. Receives automation rule updates
  • AudioHub — syncs transcription results and TTS usage metrics
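Behind that principle sits a pending-change queue: writes always land locally first, and the queue drains when connectivity returns. A sketch (shapes illustrative; the real implementation persists the queue in SQLite so it survives restarts):

```javascript
// Local-first change queue: every change is applied locally at once;
// syncing to the cloud is best-effort and retried when the connection
// comes back. `push` returns false while offline.
function createSyncQueue(push) {
  const pending = [];
  return {
    record(change) {
      pending.push(change); // local state was already updated by the caller
    },
    flush() {
      while (pending.length > 0) {
        if (!push(pending[0])) return pending.length; // offline: keep queued
        pending.shift();
      }
      return 0; // fully drained
    },
  };
}
```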

Lessons After 73K Lines of Desktop Code

1. Electron is fine, actually

The "Electron is bloated" criticism misses the point for internal tools. We don't ship these to millions of users — they run on known machines with 16GB+ RAM. The development speed of using web technologies for the UI, combined with Node.js for system access, beats native desktop development by 5-10x for our team size.

2. Fastify on desktop is underrated

Running a proper HTTP server inside an Electron app sounds unusual, but it unlocks powerful patterns: inter-tool communication, external API access, webhook reception, and a clean separation between UI and business logic. Every tool is its own microservice that happens to have a desktop UI.

3. SQLite eliminates an entire category of bugs

No network partitions. No connection pool exhaustion. No ORM mapping mismatches. Database operations are synchronous function calls that either succeed or throw. After years of fighting PostgreSQL connection limits and Redis cache invalidation, SQLite on desktop is refreshingly simple.

4. The stealth tax is real

Roughly 30% of the WhatsApp Agent's code exists purely for anti-detection: human-like delays, mouse movement simulation, browser fingerprint management, session persistence. This is a maintenance burden that grows with every platform update. Build stealth systems only when the use case genuinely requires them.

5. Python bridges work surprisingly well

JSON-line IPC over stdin/stdout is simple, debuggable, and fast enough for audio processing. We considered gRPC, WebSocket, and HTTP for the Python bridge — JSON-lines won because it requires zero infrastructure: spawn a process, write lines, read lines. No ports, no certificates, no service discovery.

Local-first isn't for everything. But when your workload needs privacy, hardware access, or sub-100ms latency, the cloud is the wrong default. 73K lines of desktop code later, we've learned that the best architecture is the one that puts computation where the data and devices actually are — and sometimes, that's the user's own machine.

Tags

Electron Local-First Desktop IoT Privacy Machine Learning Automation

About the Author

Gonzalo Monzón


Founder & Lead Architect

Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
