What I learned from one million accounts

Two years as tech lead at a fintech that scaled past a million accounts. The lesson wasn't about scale. It was about the difference between code that runs and code that earns trust.

Sometime in early 2021, around two in the morning, I sat at the kitchen table with my laptop, looking at a dashboard showing roughly 4,800 customer accounts in a state that should not have existed. Their balances disagreed with their transaction histories by amounts ranging from a few cents to several thousand. The discrepancies were concentrated in the previous six hours. And I knew, looking at the dashboard, that the root cause had been live in production for three days.

That night taught me the difference between code that runs and code that earns trust. The two are not the same thing. The first is what you ship. The second is what your customers are actually paying for, and what you are actually responsible for, even when nobody talks about it that way.

Two years as tech lead at a fintech that scaled past a million accounts gave me five lessons that I now bring to every engagement. The lessons sound obvious when read in a list. They are very hard to actually do. The cost of not doing them is paid quietly for months and then spectacularly all at once.

Code that runs vs code that earns trust

Code that runs ships features, passes tests, doesn't crash under normal load. The CI is green. The deploy succeeded. Customers can use the thing. By every metric a typical software project tracks, the code is fine.

Code that earns trust still does the right thing when reality refuses to behave. When three subsystems are talking to each other and one was just deployed five minutes ago. When a network partition splits write traffic for ninety seconds and heals itself. When a backfill is mid-flight. When the clock is wrong on one server. When a retry loop fires twice. When customer data has a shape you didn't anticipate. When the model your AI feature depends on silently changes between Tuesday and Wednesday. (See "Eval harnesses are the new test suite" for the AI version of this.)

Most code is never tested for any of this. You find out about it in production. The fintech taught me to test for it on purpose, before shipping, because the cost of finding out in production with real money was unacceptable.

Lesson one: idempotency at every layer

Every operation that touches money must be safe to repeat. This sounds simple read in a sentence. It is structurally hard to enforce across an entire system.

An example: a customer makes a payment. The request goes through the API, hits the payment processor, succeeds. The processor sends a webhook back to confirm. The processor retries that webhook for resilience: it cannot be sure the first attempt landed, so it sends three. If your code charges the customer once per webhook received, you have charged them three times. The customer notices. Compliance notices. Your morning is bad.

The fix is not "be careful." The fix is structural: every endpoint, every job, every webhook handler is designed from the ground up to be safely re-runnable. Each request carries an idempotency key. Every state change is wrapped in a deduplicated insert. Retries are explicit and bounded. The system is built on the assumption that any operation might be invoked twice, and is proven correct under that assumption.
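To make the deduplicated insert concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the code we ran: the table names (processed_events, ledger), the function names, and the shape of the idempotency key are all hypothetical, and a production system would use its own database and key scheme. The point it shows is that the dedup row and the ledger row are written in one transaction, so a repeated webhook either finds the key already present or applies the change exactly once.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the hypothetical dedup and ledger tables used by this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        " idempotency_key TEXT PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ledger ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " account_id TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"
    )


def handle_payment_webhook(
    conn: sqlite3.Connection,
    idempotency_key: str,
    account_id: str,
    amount_cents: int,
) -> bool:
    """Apply a payment at most once per idempotency key.

    Returns True if the payment was applied, False if this key was
    already processed (a retried or duplicated webhook).
    """
    # One transaction: the dedup row and the ledger row commit together,
    # or neither does.
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (idempotency_key) VALUES (?)",
            (idempotency_key,),
        )
        if cur.rowcount == 0:
            # The primary key already existed, so this is a duplicate delivery.
            return False
        conn.execute(
            "INSERT INTO ledger (account_id, amount_cents) VALUES (?, ?)",
            (account_id, amount_cents),
        )
        return True
```

Calling handle_payment_webhook three times with the same key, as in the three-webhook example above, writes one ledger row and reports the other two calls as duplicates. The design choice worth noticing is that the uniqueness constraint in the database does the deduplication, not an in-memory check that two concurrent workers could both pass.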

The cost is care during design — you write more code, you think about edge cases, you build deduplication tables. The benefit is sleeping at night. At a million accounts, processing thousands of payments per minute, the assumption that "things only happen once" is not safe. Build for the world that exists.

Lesson two: the audit trail is the actual product

I used to think the product was the app — the screens customers see, the API endpoints partners integrate with, the dashboard that shows balances. The fintech taught me that the actual product was a complete, accurate, queryable record of every change to every account, ever, with attribution.

Why? Because everyone needed it. Customer support needed it to answer "what happened to my account on October 17th." Compliance needed it to satisfy regulators. Engineering needed it to debug the dashboard mismatch from the opening of this piece. Auditors needed it for the annual review. The board needed it for board meetings. Without it, every one of these conversations was a slow, error-prone reconstruction from logs.

Once you accept that the audit trail is the product, system design changes:

Append-only event logs, not just the latest state.
Every state derived from the event stream, never directly mutated.
Every change attributed to an actor (user, system, job, webhook), with a timestamp, a reason, and a correlation ID linking to the originating request.
Every backfill that adjusts historical state recorded as its own event, with provenance.
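
The four properties above can be sketched in a few lines of Python. This is a toy in-memory version under assumptions: the class and field names (AccountEvent, EventLog, correlation_id and so on) are hypothetical, and a real system would persist events in a database table that permits only inserts. What it shows is the shape: events are immutable, every one carries attribution, and the balance is computed from the stream instead of being stored and mutated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an event cannot be edited after creation
class AccountEvent:
    account_id: str
    amount_cents: int    # positive = credit, negative = debit
    actor: str           # e.g. "user:42", "job:backfill", "webhook:processor"
    reason: str
    correlation_id: str  # links the event to the originating request
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class EventLog:
    """Append-only log: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._events: list[AccountEvent] = []

    def append(self, event: AccountEvent) -> None:
        self._events.append(event)

    def history(self, account_id: str) -> list[AccountEvent]:
        """Every event for one account, in the order it was recorded."""
        return [e for e in self._events if e.account_id == account_id]

    def balance(self, account_id: str) -> int:
        """State is derived from the event stream, never stored directly."""
        return sum(e.amount_cents for e in self.history(account_id))
```

A correction made by a backfill is then just another appended event, for example one with actor "job:backfill" and a reason explaining the adjustment, so the history shows both the original mistake and the fix.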

This is more code. It is structurally more disciplined than the typical mutate-the-row pattern most apps use. It is also the difference between "we'll look into it and get back to you" and "here's exactly what happened to your account at 14:23:08 GMT, who or what triggered it, and the resulting state, with a permalink."

The first answer loses customers slowly. The second answer keeps them.

Lesson three: backfills, migrations, and the cost of a bad afternoon

At a million accounts, a ten-minute "maintenance window" cost real money — failed transactions, support tickets, a small public-relations problem on Twitter. By the time we had grown past half a million accounts, downtime was no longer something we could schedule. Migrations had to be online. Backfills had to be incremental. Schema changes had to be backward-compatible with the running version of the application during the rollout window.

The discipline that emerged:

Always shadow-write, then read. Never atomic-flip a column. The new column is written alongside the old one, the read path is updated to prefer the new one, the old one is decommissioned only after every read confirms parity.
Backfills run in batches with rate limiting and resume-from-last-success checkpoints. A backfill that fails at row 850,000 of 1,000,000 must be able to pick up at row 850,001, not start over.
Every migration tested on a production-shaped staging environment with realistic data volume, realistic write traffic, realistic read traffic. "It worked locally" is not a useful signal at scale.
Roll-forward, not roll-back. By the time you've shipped a migration to a million-account system, the rollback path is more dangerous than fixing forward. Plan for forward.
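
As an illustration of the second point, here is a sketch of a batched backfill with a resume-from-last-success checkpoint, again using sqlite3. Everything named here is an assumption made for the example: the accounts table, the old column balance_cents, the new shadow column balance_cents_v2, the backfill_checkpoint table, and the job name. The max_batches parameter exists only so an interruption can be simulated, and the sleep between batches is a crude stand-in for real rate limiting.

```python
import sqlite3
import time
from typing import Optional

JOB_NAME = "copy_balance_to_v2"  # hypothetical job identifier


def run_backfill(
    conn: sqlite3.Connection,
    batch_size: int = 1000,
    pause_seconds: float = 0.0,
    max_batches: Optional[int] = None,
) -> int:
    """Copy balance_cents into balance_cents_v2 in batches, ordered by id.

    Progress is saved after each batch, so a rerun continues after the
    last committed row. Returns the number of rows processed in this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backfill_checkpoint ("
        " name TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    row = conn.execute(
        "SELECT last_id FROM backfill_checkpoint WHERE name = ?", (JOB_NAME,)
    ).fetchone()
    last_id = row[0] if row else 0

    processed = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        ids = [
            r[0]
            for r in conn.execute(
                "SELECT id FROM accounts WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            break  # nothing left to backfill
        # The data change and the checkpoint commit in the same transaction,
        # so the checkpoint can never run ahead of the data.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance_cents_v2 = balance_cents"
                " WHERE id > ? AND id <= ?",
                (last_id, ids[-1]),
            )
            conn.execute(
                "INSERT OR REPLACE INTO backfill_checkpoint (name, last_id)"
                " VALUES (?, ?)",
                (JOB_NAME, ids[-1]),
            )
        last_id = ids[-1]
        processed += len(ids)
        batches += 1
        time.sleep(pause_seconds)  # crude rate limit between batches
    return processed
```

If the process dies mid-batch, that batch's transaction rolls back along with its checkpoint update, and the next run starts again from the last committed id. The update is also safe to repeat, because copying the same value twice leaves the row unchanged.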

The discipline of "we never take downtime" forced a level of architectural care that most projects never need. It is also a discipline that, once internalized, applies at every scale. Even a small SaaS with a few hundred accounts benefits from idempotent migrations and resumable backfills. The cost of building it that way is small in the early days. The cost of not building it that way grows nonlinearly with the account count.

Lesson four: observability before features

Before we shipped a feature, we shipped its observability. Logs structured into queryable formats. Metrics tagged with the dimensions you'd actually want to filter by. Traces correlated end-to-end across services. Dashboards built before launch, not after the first incident.

You cannot fix what you cannot see. You cannot see what your application does not surface. The instrumentation has to come first.
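
For the logging part of this, a small sketch with Python's standard logging module shows what "structured into queryable formats" can mean in practice. The field names (correlation_id, account_id), the logger name, and the JSON layout are assumptions chosen for the example, not a standard and not the format we used. Each log line becomes one JSON object, so a log store can filter on the same dimensions you would want on a dashboard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Values passed via extra={...} become attributes on the record.
            "correlation_id": getattr(record, "correlation_id", None),
            "account_id": getattr(record, "account_id", None),
        }
        return json.dumps(payload)


def build_logger(stream=None) -> logging.Logger:
    """Return a logger named "payments" that emits JSON lines.

    With stream=None, StreamHandler writes to sys.stderr.
    """
    logger = logging.getLogger("payments")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A call such as build_logger().info("payment applied", extra={"correlation_id": "req-123", "account_id": "acct_1"}) then produces a line that can be joined to traces and audit events by correlation ID. Metrics and traces need their own tooling. This covers only the logging piece.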

This sounds obvious. It is also commonly skipped, because it costs time during the development of the feature and produces no immediately visible value. Then the feature ships, an incident happens, and the postmortem reads "we were unable to determine the root cause because we lacked the necessary instrumentation." The cost of that conversation is much higher than the cost of building the dashboards before launch.

The rule I now use, scaled down to my current solo practice: a feature is not done when the code is shipped. A feature is done when I can observe whether it is behaving correctly without asking the customer. If I cannot answer that question from a dashboard, the feature is not done.

Lesson five: the team's shared mental model

A million-account fintech does not run on individuals. It runs on a team that knows how each member thinks under pressure. When the dashboard lights up at 02:14 on a Thursday morning, the engineer on call needs to know — without asking — what the senior would do, what the lead would prioritize, where to look first, and which person to wake up if it's worse than expected.

This shared mental model is not built in an offsite. It is built through:

Pre-mortems before launches. "What's the worst-case failure mode here? What's our response if it happens?"
Post-mortems after every incident. Blameless. Recorded. Indexed. The lessons compound across years.
Documented runbooks for every common failure. "If this alert fires, do these things in this order." Reduces the cognitive load on the responder during the worst possible time to be cognitively loaded.
Real conversations about real incidents. Not just the postmortem document — the part where you sit with the team and talk about what felt confusing, what felt scary, what someone wishes they had known.

The team's shared knowledge is the resilience. Documents support it. Tools support it. But the actual store of knowledge is in the heads of the people, and the practice of keeping that knowledge synchronized is the practice of operating a serious system.

I run solo now. The shared mental model is between me and the named tools (Cassiopeia, Atlas, Cartographer, Sextant, Polaris) and the documented runbooks I keep for myself. The discipline is the same. The number of minds is different.

What this means for engagements now

Every engagement I run today inherits these five lessons, even when the system involved is much smaller than a million accounts. The discipline does not require scale. It requires the willingness to apply care that most projects defer until "we're bigger."

Idempotency on every endpoint that touches money or state. Even on a 100-customer client portal, the webhook handler is built to deduplicate. The cost is trivial. The protection is not.
Append-only audit trails on everything that matters. The client portal I provision on day one of every engagement has a complete event log of every change to every record, with attribution.
Online migrations as default. If a schema change needs downtime, it gets re-designed until it doesn't.
Observability before features. Every AI feature has its eval harness. Every API endpoint has its dashboard. Every job has its alert.
A single shared mental model. In the solo practice, that mental model lives in the runbooks I keep, the naming of the tools I use, and the discipline of postmortems even when the only postmortem participant is me.

The lessons learned at a million accounts apply at a hundred accounts. The cost of applying them is small at small scale. The cost of not applying them is paid invisibly until something breaks, at which point the cost shows up all at once.

The closing argument

Code that runs is the table-stakes work — it ships, it passes tests, it doesn't crash. Code that earns trust is the work that stays right when reality refuses to behave. The difference shows up most clearly at scale, but the discipline that produces it can be applied at any scale, and should be.

What scaled past a million accounts wasn't the architecture. The architecture was the consequence. What scaled was the team's shared insistence that every line of code be the kind of code customers can rely on without thinking about it. That insistence is portable. It travels. It's the discipline I now bring to every engagement, even — especially — the small ones, because they are how the discipline is practiced before it gets tested at scale.

The 4,800 accounts I sat looking at one Tuesday morning in early 2021 were eventually all reconciled. The bug that caused the discrepancies was traced (a missing idempotency check in a webhook retry path), fixed, and tested. No customer lost a cent. The audit trail let us reconstruct exactly what had happened to each account and explain it to anyone who asked. The dashboard let us see the discrepancies as they appeared, before any customer noticed.

That is what code that earns trust looks like. It is more work to build. It is the only kind worth shipping when real things are on the line.

— Mikkel, Bangkok