Why Every Database Migration Expert Started with Key-Value: From MUMPS Globals to D1
Gonzalo Monzón
Founder & Lead Architect
In 1997, my first production database was a MUMPS global — a hierarchical key-value tree stored on an HP 9000 running MSM. In 2024, my production database is Cloudflare D1 — SQLite at the edge. Between those two points, I migrated data through Caché, SQL Server, PostgreSQL, and more variations of "store this, retrieve that" than I care to count. The surprising lesson: every database is fundamentally a key-value store with varying amounts of ceremony on top. Understanding this changes how you think about every migration you'll ever do.
The Global: Where It All Started
MUMPS globals are the simplest possible data structure that could work for a hospital. A global is a persistent, hierarchical, sparse array — essentially a tree where every node has a key and optionally a value.
A patient record looked like this:
^PATIENT(12345,"NAME") = "García López, María"
^PATIENT(12345,"DOB") = "19450312"
^PATIENT(12345,"ALLERGIES",1) = "Penicillin"
^PATIENT(12345,"ALLERGIES",2) = "Sulfa"
^PATIENT(12345,"LAB","20130415","GLU") = 98
^PATIENT(12345,"LAB","20130415","HB") = 14.2
^PATIENT(12345,"LAB","20130416","GLU") = 105
No schema. No types. No foreign keys. No ALTER TABLE. You just wrote data with a hierarchical key and read it back. The key itself was the schema — the naming convention determined the data model. And it worked for decades in hospitals handling millions of patient records.
Why Globals Were Brilliant (and Dangerous)
The global model had real strengths that modern NoSQL databases rediscovered independently:
- Schema flexibility: Adding a new field was trivial — just write to a new key path. No migration scripts, no downtime, no schema versioning. If patient 12345 needed a "PACEMAKER_MODEL" field, you set
^PATIENT(12345,"PACEMAKER_MODEL") = "Medtronic MiniMed"and it existed - Sparse data: Not every patient has allergies. Not every visit has lab results. In a relational model, you need nullable columns or junction tables. In globals, absent data simply doesn't exist as a key — it takes zero space
- Natural hierarchies: Patient → visits → lab results → individual tests maps perfectly to the global tree. No JOINs needed. Traversing a patient's history was a single
$ORDERwalk down the tree - O(1) access: Retrieving a known value —
^PATIENT(12345,"NAME")— was a single B-tree lookup. No query optimizer. No execution plan. Direct addressing
The danger was equally real: the application logic was the schema. There was no database-level enforcement of data integrity. If a developer stored a date as "March 15" instead of "20130315", the database accepted it silently. Consistency depended entirely on disciplined coding conventions, passed down as institutional knowledge between MUMPS programmers. One careless developer could corrupt the implicit schema for an entire subsystem.
Caché: The Bridge Database
InterSystems Caché was the evolutionary successor to MSM — it maintained the global storage model but added a SQL projection layer on top. You could access the same data as globals (for legacy MUMPS code) or as SQL tables (for modern applications).
This dual-access model was revolutionary for migration. During my years supporting healthcare systems with Caché, I learned what I now consider the fundamental principle of database migration: you don't migrate the data — you add a new access layer.
250 Clients, Two Paradigms
At the height of my Caché work, I supported approximately 250 concurrent client connections to a healthcare system running 24/7. Some of those clients were MUMPS terminals accessing globals directly. Others were web applications querying SQL views. Both hit the same underlying data.
This taught me something that most database engineers never experience: the same data, accessed through different paradigms, reveals different problems. A query that was instant via global access ($GET(^PATIENT(12345,"NAME")) — direct key lookup) might be slow via SQL (SELECT name FROM patients WHERE id = 12345 — query parse, plan, execute, return). Conversely, an aggregate query trivial in SQL (SELECT AVG(glucose) FROM lab_results WHERE date > '2013-01-01') would require a full global traversal in MUMPS.
The lesson: storage models are access patterns. The same bytes on disk can be fast or slow depending on how you ask for them. Migration isn't about moving data — it's about changing which access patterns are fast.
The EAV Pattern: NoSQL Inside SQL Server
At Werfen (a multinational medical diagnostics company), I encountered the Entity-Attribute-Value (EAV) pattern in a Delphi application backed by SQL Server. The product, MediVector, managed medical device configurations for 40+ hospitals.
The EAV schema looked like this:
CREATE TABLE device_attributes (
entity_id INT, -- which device
attribute VARCHAR(100), -- what property
value NVARCHAR(MAX) -- the value (always a string)
);
-- A device might have:
-- (1001, 'model', 'ACLTOP 700')
-- (1001, 'serial_number', 'WF-2019-44821')
-- (1001, 'firmware_version', '3.2.1')
-- (1001, 'last_calibration', '2024-01-15')
-- (1001, 'hospital', 'Hospital del Mar')
-- (1001, 'department', 'Hematology')
If this looks familiar, it's because EAV is globals with SQL overhead. Same idea — entity + attribute path = value. But instead of the direct B-tree access of globals, every read goes through SQL's query parser, optimizer, and executor. You get the flexibility of key-value storage but pay the full cost of relational infrastructure.
The Migration: EAV to Proper REST
My job at Werfen was to migrate MediVector from the Delphi/EAV architecture to a .NET Core REST API with a proper normalized schema. The migration revealed the same pattern I'd seen with MUMPS → SQL:
- Discover the implicit schema. EAV tables have no schema — the schema lives in the application code. I had to read thousands of lines of Delphi to understand which attributes were always present (mandatory), which were optional, and which were calculated. This is exactly the same work I did reverse-engineering MUMPS globals
- Normalize gradually. Instead of a big-bang migration, I created SQL views that projected the EAV data as normalized tables. The old Delphi app continued writing EAV rows; the new .NET API read from the normalized views. Sound familiar? It's the Strangler Fig again — same pattern, different database
- Preserve access patterns. The old application was optimized for "show me all attributes of device X" (a single
WHERE entity_id = 1001query). The new schema was optimized for "show me all devices where firmware is outdated" (a proper indexed column query). Both had to work during the transition
The Circle Closes: SQLite at the Edge
Today, at Cadences Lab, our production database is Cloudflare D1 — SQLite running at the edge of Cloudflare's network. And here's the ironic twist: SQLite's storage engine is, at its core, a B-tree key-value store.
When you write INSERT INTO users (name, email) VALUES ('Ana', '[email protected]'), SQLite stores it as a key-value pair where the key is the rowid and the value is the serialized row data. The SQL layer is ceremony — useful, important ceremony that provides queries, types, constraints, and indexing. But underneath, it's key-value all the way down.
The full circle of my career:
| Era | Database | Storage | Access | Schema |
|---|---|---|---|---|
| 1997 | MSM/MUMPS | B-tree globals | Direct key | Convention |
| 2006 | Caché | B-tree globals + SQL | Key or SQL | Convention + projected |
| 2015 | SQL Server (EAV) | B-tree pages | SQL over key-value | Application code |
| 2018 | .NET/SQL Server | B-tree pages | SQL (normalized) | Explicit DDL |
| 2024 | D1/SQLite | B-tree pages | SQL at edge | Explicit DDL |
Every single one is a B-tree storing key-value pairs. The difference is the ceremony layer — how much structure the system imposes between your intention and the bits on disk.
What 25 Years of Key-Value Taught Me
1. Every migration is a schema discovery project
Whether you're migrating MUMPS globals, EAV tables, or a MongoDB collection to PostgreSQL, the hardest part is never the data transfer. It's understanding the implicit schema — the rules encoded in application logic that constrain what the data actually looks like, as opposed to what the database allows it to look like. Budget 60% of migration time for schema archaeology.
2. Dual-access transitions are the only safe path
Every successful migration I've done maintained both old and new access patterns simultaneously. Caché let globals and SQL coexist. My Werfen migration used SQL views over EAV. The MUMPS→SQL CDC engine kept both systems live. Never cut over in one step. The risk isn't the data — it's the access patterns you didn't know existed.
3. Storage model ≠ data model
MUMPS globals could store relational data. SQL Server with EAV stored document data. SQLite stores everything as key-value pairs under the hood. The storage model is an implementation detail. What matters is the data model — the logical structure of your domain — and whether your access patterns are aligned with it.
4. Schema flexibility is a spectrum, not a binary
MUMPS globals: maximum flexibility, zero enforcement. SQL with constraints: minimum flexibility, maximum enforcement. EAV: somewhere in between (schemaless values, but SQL structure around them). Modern systems like D1 with JSON columns occupy yet another point on this spectrum. Choose the flexibility level that matches your team's discipline. More flexibility demands more coding discipline.
5. The best database is the one you understand completely
MUMPS programmers who understood globals deeply built systems that ran for 30 years. SQL experts who understand query plans avoid the performance traps that catch everyone else. I've seen teams adopt MongoDB, PostgreSQL, or DynamoDB not because they understood the trade-offs, but because it was popular. Then they spent years fighting the database instead of using it.
My recommendation: start with SQLite. It's the purest expression of "key-value + SQL ceremony" available today. With D1, it runs at the edge. It has no configuration. No connection pooling. No cluster management. It forces you to understand the fundamentals — B-trees, indexes, query plans — without the complexity of distributed systems. When you outgrow it (and you may not — SQLite handles more than people think), you'll know exactly why you need something more complex.
6. The circle always closes
In 1997, I stored patient data as hierarchical key-value pairs in MUMPS globals. In 2024, I store application state as key-value pairs in D1's B-tree pages, accessed via SQL. The paradigms shift — relational, document, graph, time-series — but the fundamental operation remains: store a value at a key, retrieve it later, make it fast. If you understand this primitive deeply, every new database technology becomes a variation on a theme you already know.
The engineers who struggle with database migrations are invariably the ones who learned a specific database product instead of learning data storage primitives. Learn how B-trees work. Learn how write-ahead logs provide durability. Learn how indexes trade write performance for read performance. Then every database — from MUMPS globals to Cloudflare D1 — becomes a familiar instrument playing a different tune.
Tags
About the Author
Gonzalo Monzón
Founder & Lead Architect
Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
Related Articles
Edge Computing: Why We Bet Everything on Cloudflare (And What $65/Month Gets You)
No servers, no containers, no Kubernetes. We run 14+ interconnected products on 9 Cloudflare products — Workers, D1, R2, Durable Objects, Pages, KV, Vectorize, Workers AI and WAF. $65/month for what would cost $400-600 on AWS. Here's the full architecture.
SQLite Is the Production Database You Already Know (You Just Don't Know It Yet)
DHH hit 30,000 concurrent writers on SQLite. We run 14+ products on Cloudflare D1 (SQLite at the edge) for $5/month. SQLite isn't the toy database you think it is — it's powering everything from 37signals' ONCE to our entire multi-tenant platform. Here's why the industry is converging on the world's most deployed database.
Vanilla JS Is the Assembly Language of the Browser (And That's Why We Use It)
Gmail ships 20MB of JavaScript. Slack ships 55MB. Our cookie consent system? 4KB, zero dependencies. When you understand the low-level primitives — vanilla JS, pure CSS, raw DOM APIs — you don't need a framework to tell you what the browser already knows. But we're not framework haters: we use Astro for SSG and React islands where they genuinely help. The difference is that we choose our tools — they don't choose us.