Engineering · July 15, 2025 · 14 min read

Local-First Desktop Tools: Why We Moved 73K Lines Off the Cloud


Gonzalo Monzón

Founder & Lead Architect

Not everything belongs in the cloud. When your workload involves ML model training on private data, real-time IoT device communication, browser automation with anti-detection, or local audio processing — latency, privacy, and hardware access make cloud-first architectures impractical. We built 5 local-first desktop tools totaling ~73K lines of code across 114 files. Each tool runs independently on the user's machine, syncs with our cloud platform when needed, and handles workloads that would be impossible (or illegal) to run remotely.

This is the engineering story: why local-first, how each tool works, and the architectural patterns that let a small team maintain 73K lines across 5 desktop apps without going insane.

Why Local-First? The Cloud Isn't Always the Answer

Cloud computing solves real problems — scalability, availability, zero-ops. But some workloads have constraints that make cloud deployment awkward or impossible:

  • Privacy: ML model training on customer data that can't leave the machine. GDPR doesn't care about your cloud provider's compliance certification when the data shouldn't be transmitted at all
  • Hardware access: IoT devices connected via MQTT, Serial ports, and Bluetooth don't have cloud endpoints. You need a process running on the same network — or the same USB port
  • Anti-detection: Browser automation for web scraping requires persistent browser profiles, stealth plugins, and human-like behavior patterns. Running this from a cloud IP range is a detection signal
  • Latency: Real-time audio processing (speech-to-text, text-to-speech) needs sub-100ms response. A round trip to an API adds 200-500ms of latency that breaks the conversational flow
  • Cost: Running a GPU-intensive ML training job 24/7 on cloud infrastructure costs more per month than the laptop doing the training. Local hardware is already paid for

Our answer: Electron for the shell, Fastify for the local API, SQLite for persistence, and Cadences cloud sync for data that matters. Each tool is a standalone application that works offline, syncs when connected, and handles its workload locally.

The Shared Architecture: 5 Tools, 1 Pattern

All 5 tools follow the same architectural pattern, which dramatically reduces maintenance cost:

┌─────────────────────────────┐
│       Electron Shell        │  ← OS integration, tray, notifications
│  ┌───────────────────────┐  │
│  │    Fastify HTTP API   │  │  ← localhost:PORT, versioned routes
│  │  ┌─────────────────┐  │  │
│  │  │ SQLite (local)  │  │  │  ← better-sqlite3 or sql.js WASM
│  │  └─────────────────┘  │  │
│  │  ┌─────────────────┐  │  │
│  │  │ Cadences Sync   │  │  │  ← WebSocket/REST to cloud
│  │  └─────────────────┘  │  │
│  └───────────────────────┘  │
└─────────────────────────────┘

This pattern means every tool exposes the same interface: a local HTTP API that other tools (or Cadences cloud) can call. The ML Trainer exposes /v1/embeddings. The Scraper exposes /api/jobs. The IoT Hub exposes /api/devices. Same pattern, different domain.

Why Fastify (Not Express)

Fastify gives us schema validation, plugin architecture, and ~3x the throughput of Express — and on desktop, where the sole client is the local Electron UI, that headroom means we can handle WebSocket streams, file uploads, and ML inference requests simultaneously without blocking. The plugin system keeps each tool's routes modular.
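The pattern itself is small enough to sketch. Here it is as a dependency-free route table (the real tools register these as Fastify plugins with JSON-schema validation; the route names and handler shapes are illustrative):

```javascript
// Minimal sketch of the local-API pattern: each tool registers
// versioned routes in a table and dispatches on the path.
// (Illustrative only — the real tools use Fastify plugins with
// schema validation on every route.)
const routes = new Map();

function register(path, handler) {
  routes.set(path, handler);
}

function handle(path, payload) {
  const handler = routes.get(path);
  if (!handler) return { status: 404, body: { error: "not found" } };
  return { status: 200, body: handler(payload) };
}

// Each tool contributes its own domain routes under the same shape:
register("/v1/embeddings", ({ input }) => ({ model: "local-minilm", input }));
register("/api/devices", () => ({ devices: [] }));

// Wire `handle` into http.createServer(...) and listen on localhost:PORT.
```

Because every tool speaks this same shape, adding a sixth tool means writing new handlers, not a new architecture.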

Why SQLite (Not IndexedDB, Not PostgreSQL)

SQLite is the only database engine that makes sense for local-first desktop tools:

  • Single file — no server, no connection string, no port conflicts. The database is a file next to the app
  • better-sqlite3 — synchronous API that's faster than async alternatives for single-user desktop workloads. No callback hell for simple reads
  • WAL mode — concurrent reads during writes. The Electron UI can read device state while the IoT Hub writes sensor data
  • Portable backups — copy the .sqlite file, and you have a complete backup. No dump/restore ceremonies
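The per-tool setup boils down to a few pragmas run once at startup. A typical configuration for this pattern (tune synchronous and busy_timeout to taste):

```sql
PRAGMA journal_mode = WAL;      -- concurrent readers during writes
PRAGMA synchronous = NORMAL;    -- safe with WAL, much faster than FULL
PRAGMA busy_timeout = 5000;     -- wait up to 5s instead of failing on a lock
PRAGMA foreign_keys = ON;       -- enforce relational integrity
```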

Tool 1: ML Trainer (11.5K Lines)

The ML Trainer is an OpenAI-compatible local API server for training and serving ML models. It exposes the same endpoints as OpenAI — /v1/embeddings, /v1/chat/completions, /v1/classify, /v1/entities, /v1/predict, /v1/similarity — so any client that speaks OpenAI protocol can use locally-trained models without code changes.

Training Types

| Type | Model | Use Case |
| --- | --- | --- |
| Embeddings | MiniLM-L6-v2 | Semantic search, similarity matching |
| Classifier | DistilBERT | Text categorization, intent detection |
| NER | BERT-NER | Named entity recognition in domain text |
| Regression | Custom | Numerical prediction with cross-validation |
| LoRA Fine-tuning | GPT-2 / TinyLlama | Domain-specific text generation |

Models are trained locally using @xenova/transformers (the JS port of Hugging Face Transformers) and registered in Cadences as LOCAL_MODEL type. This means Cadences workflows can route AI tasks to either cloud providers (GPT-4, Claude, Gemini) or local models based on cost, privacy, or performance requirements.
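The routing decision itself stays simple. A sketch of how a workflow step might pick a provider (field names like `requiresPrivacy` and `costSensitive` are illustrative, not the actual Cadences schema):

```javascript
// Route an AI task to a local model or a cloud provider.
// Local wins when the data must not leave the machine, or when a
// registered LOCAL_MODEL covers the task and cost matters.
function pickProvider(task, localModels) {
  const local = localModels.find((m) => m.task === task.type);
  if (task.requiresPrivacy) {
    if (!local) throw new Error(`no local model for private task: ${task.type}`);
    return { kind: "LOCAL_MODEL", model: local.name };
  }
  if (local && task.costSensitive) {
    return { kind: "LOCAL_MODEL", model: local.name };
  }
  return { kind: "CLOUD", model: task.preferredCloudModel ?? "gpt-4" };
}
```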

Tool 2: WhatsApp Agent (20.4K Lines)

The WhatsApp Agent is a Playwright-based automation system with a deep stealth layer that simulates human behavior. At 20.4K lines it's the second-largest tool, and most of that complexity comes from one thing: not getting detected as a bot.

The humanDelay Module

The core anti-detection system simulates human interaction patterns:

  • humanType — types messages character by character with variable delays. Each keystroke has a Gaussian-distributed delay (mean ~80ms, σ ~30ms) with occasional pauses at word boundaries
  • humanClick — moves the mouse along a Bézier curve to the target element before clicking. The curve parameters are randomized, and the movement speed follows a bell curve (fast in the middle, slow at start/end)
  • readingDelay — calculates a realistic reading time based on message length in words per minute (~200-250 WPM with variance). After receiving a message, the agent "reads" it before responding
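The keystroke sampler can be sketched in a few lines (Box-Muller for the Gaussian; the mean and σ match the numbers above, the word-boundary pause probability is illustrative):

```javascript
// Sample a standard normal via the Box-Muller transform, scaled to
// the requested mean and standard deviation.
function gaussian(mean, sigma) {
  const u1 = Math.random() || Number.MIN_VALUE; // avoid log(0)
  const u2 = Math.random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + sigma * z;
}

// Human-like keystroke delay: Gaussian around ~80ms (σ ~30ms), clamped
// so it never goes implausibly fast, with an occasional longer pause
// at word boundaries.
function keystrokeDelay(char) {
  let delay = Math.max(15, gaussian(80, 30));
  if (char === " " && Math.random() < 0.15) {
    delay += Math.max(0, gaussian(300, 100)); // pause at a word boundary
  }
  return delay;
}
```

Uniform random delays would be easier, but a uniform distribution is itself a detection signal; real typing clusters around a mean.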

AI Pipeline

Messages flow through a Smart Pipeline with 5 AI providers (Cloudflare Workers AI, Gemini, Groq, OpenAI, DeepSeek) with automatic failover. Voice messages are transcribed via Groq/OpenAI Whisper or a local Workers-based Whisper endpoint. Images are analyzed via Gemini/OpenAI Vision or local LLaVA models.

Tool 3: Scraper (25.3K Lines)

The Scraper is a multi-purpose data extraction engine with 7 specialized scraper types. At 25.3K lines, it's the largest tool — and the most architecturally diverse, because each scraper type handles fundamentally different data sources.

The 7 Scraper Types

| Type | Lines | Target | Key Feature |
| --- | --- | --- | --- |
| Real Estate | 1,256 | Property listing portals | Anti-detection with Gaussian delays + anti-bot evasion |
| Freelance | ~2,800 | 9+ job platforms (4 regions) | Cross-platform normalization of postings |
| Documents | ~1,200 | PDF, Excel, CSV, Word, TXT, JSON | Format detection + content extraction |
| FileSystem | ~900 | Local disk scan | MD5/SHA256 dedup across drives |
| ML Pipeline | ~600 | TensorFlow.js bridge | Feeds training data to ML Trainer |
| Legal | ~1,500 | 5 government gazette sources | Procurement + regulation monitoring |
| API | ~800 | REST/GraphQL endpoints | Generic API polling with transform rules |

The real estate scraper implements anti-detection with Gaussian-distributed request delays (not uniform random, which is a detection signal), anti-bot evasion techniques, and intelligent pagination that detects the initial page from any URL. The freelance scraper normalizes job postings from 9+ platforms across Europe, US, and LatAm into a unified format for Cadences to process.
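Normalization is mostly per-platform field mapping plus unit cleanup. A sketch with two hypothetical platform shapes (all field names here are invented for illustration):

```javascript
// Normalize job postings from different platforms into one unified
// record. Each platform gets a small adapter; the unified record is
// what gets pushed to Cadences data tables.
const adapters = {
  platformA: (raw) => ({
    title: raw.jobTitle,
    budgetUsd: raw.budget_cents / 100, // platformA quotes cents
    region: raw.country,
    source: "platformA",
  }),
  platformB: (raw) => ({
    title: raw.name,
    budgetUsd: raw.rate.amount, // platformB already quotes USD
    region: raw.location,
    source: "platformB",
  }),
};

function normalize(platform, raw) {
  const adapter = adapters[platform];
  if (!adapter) throw new Error(`no adapter for ${platform}`);
  return adapter(raw);
}
```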

Supporting Systems

  • AI File Analyzer (735 lines) — classifies and summarizes extracted documents using AI
  • Cross-Disk Sync (700 lines) — synchronizes scraped data across multiple storage locations
  • Remote Worker — polls Cadences for scraping jobs, enabling cloud-triggered local execution

Tool 4: IoT Hub (8.4K Lines)

The IoT Hub manages physical devices across 3 core protocols — MQTT 5.0, SerialPort v12, and HTTP polling — with 14 additional protocol types defined in the architecture (CoAP, Modbus, Zigbee, Z-Wave, LoRa, BLE, RTSP, ONVIF...).

Device Registry

60+ device types organized across 10 categories: Environmental (temperature, humidity, air quality), Security (motion, door/window, smoke), Energy (smart plugs, solar inverters), Camera (IP cameras with PTZ), Industrial (PLC, flow meters), and more. Each device type has a capability matrix that determines which protocols, commands, and data formats it supports.

Automation Engine (796 Lines)

The Automation Engine is the brain of the IoT Hub — a rule engine with 6 trigger types:

  • device_state — fires when a device attribute changes (e.g., motion detected)
  • threshold — fires when a numeric value crosses a boundary (e.g., temperature > 30°C)
  • schedule — fires at cron-like intervals
  • sunrise/sunset — fires relative to solar position (useful for outdoor lighting and blinds)
  • webhook — fires on external HTTP trigger from Cadences or other systems
  • scene — fires multiple actions in sequence (e.g., "movie mode" = dim lights + close blinds + turn on projector)
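The first two trigger types reduce to a predicate check against each incoming state update. A simplified sketch (the rule shape is illustrative, not the engine's actual schema):

```javascript
// Evaluate device_state and threshold triggers against a state update.
// Returns the actions of every rule that fires.
function evaluate(rules, update) {
  const fired = [];
  for (const rule of rules) {
    if (rule.deviceId !== update.deviceId) continue;
    const value = update.attributes[rule.attribute];
    if (value === undefined) continue;
    let hit = false;
    if (rule.trigger === "device_state") {
      hit = value === rule.equals;
    } else if (rule.trigger === "threshold") {
      hit =
        (rule.above !== undefined && value > rule.above) ||
        (rule.below !== undefined && value < rule.below);
    }
    if (hit) fired.push(...rule.actions);
  }
  return fired;
}
```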

A Camera Manager (619 lines) handles PTZ (pan-tilt-zoom) control and FFmpeg integration for stream recording. The entire IoT state auto-syncs to Cadences every 5 minutes.

Tool 5: AudioHub (7.4K Lines)

AudioHub bridges JavaScript and Python for real-time audio processing. It's the smallest tool but solves a critical problem: running Whisper STT and TTS locally when cloud latency is unacceptable.

The Python Bridge Pattern

AudioHub spawns a Python subprocess and communicates via JSON-line IPC over stdin/stdout. Each line is a JSON object with a command and payload:

// Node.js → Python (stdin)
{"cmd": "transcribe", "file": "/tmp/audio.wav", "model": "base"}

// Python → Node.js (stdout)
{"status": "ok", "text": "Hello, how are you?", "confidence": 0.94}

The bridge includes capability detection (which Python packages are installed? Is CUDA available?) and a 10-second startup timeout. If the Python process fails, AudioHub falls back to cloud STT providers.
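The fiddly part of JSON-line IPC is that stdout arrives in arbitrary chunks, so messages have to be re-framed on newlines before parsing. A sketch of the parser side (the real bridge also handles the startup timeout and capability handshake):

```javascript
// Incremental JSON-lines parser: feed it raw stdout chunks, get back
// complete parsed messages. Partial lines stay buffered until the
// terminating newline arrives.
function createLineParser() {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const messages = [];
    let idx;
    while ((idx = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1);
      if (line) messages.push(JSON.parse(line));
    }
    return messages;
  };
}
```

Attach `feed` to the child process's `stdout.on("data", ...)` handler and each complete response pops out as a parsed object, regardless of how the OS chunked the stream.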

Audio Pipeline

  • STT: Local Whisper (configurable model size: tiny/base/small/medium/large) with automatic language detection
  • TTS: 3-provider fallback chain — ElevenLabs (12 voice presets), OpenAI TTS, then Edge-TTS/gTTS/pyttsx3 as free fallbacks
  • Bluetooth: Device scanning for earpieces and headsets with heartbeat monitoring for connection stability
  • Caching: MD5-based audio file caching to avoid re-synthesizing identical text

The Cloud Sync Layer

Every tool syncs with the Cadences cloud through a layer designed around one principle: local-first, cloud-eventual. If the internet drops, every tool continues working. When connectivity returns, changes sync automatically.

  • ML Trainer — syncs trained model metadata and metrics. Models stay local (too large to upload), but capabilities are registered so Cadences can route inference requests
  • WhatsApp Agent — syncs contacts, conversations, and group membership to Cadences CRM
  • Scraper — pushes extracted data to Cadences data tables. Remote Worker polls for new jobs
  • IoT Hub — pushes device state snapshots every 5 minutes. Receives automation rule updates
  • AudioHub — syncs transcription results and TTS usage metrics
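Behind that principle sits a pending-change queue: writes always land locally first, and the queue drains when connectivity returns. A sketch (shapes illustrative; the real implementation persists the queue in SQLite so it survives restarts):

```javascript
// Local-first change queue: every change is applied locally at once;
// syncing to the cloud is best-effort and retried when the connection
// comes back. `push` returns false while offline.
function createSyncQueue(push) {
  const pending = [];
  return {
    record(change) {
      pending.push(change); // local state was already updated by the caller
    },
    flush() {
      while (pending.length > 0) {
        if (!push(pending[0])) return pending.length; // offline: keep queued
        pending.shift();
      }
      return 0; // fully drained
    },
  };
}
```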

Lessons After 73K Lines of Desktop Code

1. Electron is fine, actually

The "Electron is bloated" criticism misses the point for internal tools. We don't ship these to millions of users — they run on known machines with 16GB+ RAM. The development speed of using web technologies for the UI, combined with Node.js for system access, beats native desktop development by 5-10x for our team size.

2. Fastify on desktop is underrated

Running a proper HTTP server inside an Electron app sounds unusual, but it unlocks powerful patterns: inter-tool communication, external API access, webhook reception, and a clean separation between UI and business logic. Every tool is its own microservice that happens to have a desktop UI.

3. SQLite eliminates an entire category of bugs

No network partitions. No connection pool exhaustion. No ORM mapping mismatches. Database operations are synchronous function calls that either succeed or throw. After years of fighting PostgreSQL connection limits and Redis cache invalidation, SQLite on desktop is refreshingly simple.

4. The stealth tax is real

Roughly 30% of the WhatsApp Agent's code exists purely for anti-detection: human-like delays, mouse movement simulation, browser fingerprint management, session persistence. This is a maintenance burden that grows with every platform update. Build stealth systems only when the use case genuinely requires them.

5. Python bridges work surprisingly well

JSON-line IPC over stdin/stdout is simple, debuggable, and fast enough for audio processing. We considered gRPC, WebSocket, and HTTP for the Python bridge — JSON-lines won because it requires zero infrastructure: spawn a process, write lines, read lines. No ports, no certificates, no service discovery.

Local-first isn't for everything. But when your workload needs privacy, hardware access, or sub-100ms latency, the cloud is the wrong default. 73K lines of desktop code later, we've learned that the best architecture is the one that puts computation where the data and devices actually are — and sometimes, that's the user's own machine.

Tags

Electron Local-First Desktop IoT Privacy Machine Learning Automation

About the Author

Gonzalo Monzón


Founder & Lead Architect

Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
