Engineer on the optimisation arc
Provstiskyen: performance work on a 10-year SaaS
Profiled and fixed the cold-start path on a 44,000-line R Shiny production app: 50-second logins down to 18, and 35-minute deploys down to 80 seconds, all on the existing codebase. The full rewrite that came later was made possible by a year of targeted optimisation work first.
50s → 18s
App startup time
35 min → 80 s
CI build time
~44k
Lines of R in the legacy app
17
Modules migrated to the new platform
The starting point
Provstiskyen is a Danish administration platform for parish councils: the bookkeeping, reporting, and appropriations workflow that runs ~40 of the country's ~100 deaneries. It had been built and maintained for roughly ten years by the founder, single-handed, in R and Shiny, with ShinyProxy hosting one R process per user on a 16-core / 64 GB host. (R is a popular default for business and biology in Denmark; less so for serving a production multi-tenant web app to thousands of users.) It worked. It had steady customers. The problem was success at scale: every new user added a fixed per-user resource bill, and the cold-start experience was rough.
I joined as the first hired engineer in May 2024, originally to help move the platform to Kubernetes. The remit grew from there. Across the next year and a half I led three successive waves of work: an infrastructure migration, a deep performance and build-pipeline optimisation pass on the existing R Shiny app, and finally, once the architecture itself had become the ceiling, a full rewrite onto a modern stack designed to carry the platform another decade. The optimisation work is what matters most about this case. The rewrite is the natural follow-on once the existing system has been measured, profiled, and pushed as far as its architecture allows; the optimisation work would have shipped on its own merits even if the rewrite had never been authorised.
Wave 1: Kubernetes migration (May 2024)
The flat 16-core host was replaced with Google Kubernetes Engine. Pods scale up on demand and scale to zero when nobody is logged in. The architecture didn't change yet, but the bill did: from a fixed monthly cost regardless of usage, to paying only for what the platform actually serves. Same app, same code, lower floor.
Wave 2: Three optimisations on the existing R Shiny app
Kubernetes solved the cost-and-elasticity story. Cold-start was still the visible problem. Users felt every second of the 50-second login wait, and deploys took long enough that we were shipping monthly instead of daily. The next year was a sustained optimisation pass on the legacy stack, before any rewrite.
Pre-warmed pod pool
ShinyProxy normally spins up a fresh R process on user login. That process is the 50-second cost. I changed the cluster topology to keep a small pool of warm pods ready ahead of demand: when a user logs in, an already-running pod is claimed instantly and a new warm one is started in the background. The login latency the user experiences becomes the speed of the load-balancer redirect, not the speed of R booting.
The cost trade-off is real: you pay for the warm pods that aren't being used yet. But the pool size is small relative to total capacity, and the UX win is dramatic.
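The production pool lives in cluster configuration rather than application code, but the claim-and-replenish pattern looks roughly like this. A minimal Python sketch using the Kubernetes client, assuming hypothetical `pool`/`claimed` labels and a `shiny-warm-pool` deployment; none of these names are from the real cluster, and releasing pods when sessions end is left out:

```python
# Hedged sketch of the warm-pool claim pattern, not the production ShinyProxy setup.
# Assumes: pods are created with labels pool=warm, claimed=false; the deployment's
# selector is pool=warm only, so relabelling a claimed pod does not detach it.
from kubernetes import client, config

config.load_incluster_config()  # running inside the cluster
core = client.CoreV1Api()
apps = client.AppsV1Api()

NAMESPACE = "shiny"
WARM_SELECTOR = "pool=warm,claimed=false"
WARM_DEPLOYMENT = "shiny-warm-pool"

def claim_warm_pod() -> str | None:
    """Hand the user an already-running pod; replenish the pool in the background."""
    pods = core.list_namespaced_pod(NAMESPACE, label_selector=WARM_SELECTOR).items
    ready = [p for p in pods if p.status.phase == "Running"]
    if not ready:
        return None  # fall back to the normal cold start

    pod = ready[0]
    # Mark the pod as claimed so no other login grabs it.
    core.patch_namespaced_pod(
        pod.metadata.name,
        NAMESPACE,
        body={"metadata": {"labels": {"claimed": "true"}}},
    )
    # Scale the warm pool up by one so a replacement starts warming immediately.
    scale = apps.read_namespaced_deployment_scale(WARM_DEPLOYMENT, NAMESPACE)
    apps.patch_namespaced_deployment_scale(
        WARM_DEPLOYMENT,
        NAMESPACE,
        body={"spec": {"replicas": scale.spec.replicas + 1}},
    )
    return pod.metadata.name
```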
Base image: 35-minute builds → 80 seconds
Every deploy of the legacy app reinstalled the entire R-package dependency tree from source: about 35 minutes per build, which made shipping anything during the workday painful. I split the Dockerfile into two layers: a base image with R and all package dependencies installed once (rebuilt only when the dependency manifest changes), and a thin app layer on top that contains only the source code.
Cold base-image rebuilds still take 35 minutes; routine app rebuilds run in roughly 80 seconds. That's a 26× drop on the hot path, and it unblocked the deploy cadence. We went from monthly to multiple times per day without trying.
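The split itself lives in the two Dockerfiles; the CI side only has to decide when the slow base build is actually needed. A hedged Python sketch of that decision, with illustrative names (`renv.lock`, `Dockerfile.base`, `Dockerfile.app`, the registry path) rather than the real pipeline:

```python
# Hedged sketch of the CI-side logic, not the production pipeline.
# File, image, and registry names are illustrative.
import hashlib
import subprocess
from pathlib import Path

REGISTRY = "europe-docker.pkg.dev/example/app"  # hypothetical registry path

def manifest_hash() -> str:
    """The base image is keyed by the dependency manifest, not by the app source."""
    return hashlib.sha256(Path("renv.lock").read_bytes()).hexdigest()[:12]

def base_exists(tag: str) -> bool:
    """Ask the registry whether a base image with this manifest hash already exists."""
    return subprocess.run(
        ["docker", "manifest", "inspect", tag], capture_output=True
    ).returncode == 0

def build() -> None:
    base_tag = f"{REGISTRY}/base:{manifest_hash()}"
    if not base_exists(base_tag):
        # Slow path (~35 min): only runs when the dependency manifest changed.
        subprocess.run(["docker", "build", "-f", "Dockerfile.base", "-t", base_tag, "."], check=True)
        subprocess.run(["docker", "push", base_tag], check=True)
    # Hot path (~80 s): thin app layer built FROM the prebuilt base.
    subprocess.run(
        ["docker", "build", "-f", "Dockerfile.app",
         "--build-arg", f"BASE_IMAGE={base_tag}",
         "-t", f"{REGISTRY}/app:latest", "."],
        check=True,
    )

if __name__ == "__main__":
    build()
```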
Flame graph + Polars + a single stored procedure
Startup time was 50 seconds even after the pod-pool work, because every newly-claimed pod had to load the app's data into memory before serving the first page. I profiled the startup path with R's flame-graph tooling and found two things worth fixing:
- Many separate database round-trips during the initialisation phase. Each one was small and harmless on its own, but they were sequential, and sequential network round-trips against MariaDB stack up fast. I consolidated them into a single stored procedure that returns every table the app needs in one response.
- Two compute-heavy R functions on the startup path: financial data transformations the app needed before any UI could render. I ported both to Polars (the Rust-based dataframe library, called from R via its arrow integration). More than 10 seconds saved on those two functions alone.
Combined effect on warm-pool-claimed pods: 50 seconds → 18 seconds of startup, on top of the pre-warming already giving users an instant pod.
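The real functions are the platform's financial transformations, so only the shape of the port is worth showing. A sketch in Python Polars (the production port is called from R) with hypothetical column names: a lazy group-by aggregation plus a join, where the legacy code did row-wise work on startup:

```python
# Hedged illustration of the shape of the Polars port, not the real financial logic.
# Column names (parish_id, account, year, amount, budgeted) are hypothetical.
import polars as pl

def appropriation_summary(ledger: pl.DataFrame, budget: pl.DataFrame) -> pl.DataFrame:
    """Aggregate actuals per parish/account and join them against the budget.

    In the legacy app this ran as row-wise R work on the startup path;
    expressed as Polars expressions it runs vectorised in Rust.
    """
    actuals = (
        ledger.lazy()
        .group_by("parish_id", "account", "year")
        .agg(pl.col("amount").sum().alias("actual"))
    )
    return (
        budget.lazy()
        .join(actuals, on=["parish_id", "account", "year"], how="left")
        .with_columns(
            (pl.col("actual").fill_null(0) - pl.col("budgeted")).alias("deviation")
        )
        .collect()
    )
```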
Wave 3: The platform rewrite (July 2025 onwards)
After a year of optimising R Shiny, it became clear the architecture itself had a ceiling. Per-user processes don't scale to ten times the user count, the R ecosystem for web-first concerns (auth, tenancy, caching) is thinner than the Python or TypeScript equivalents, and a ~44,000-line R codebase (the legacy app is one large app.R plus the surrounding modules and helper scripts) was slowing iteration on new features. I proposed a from-scratch rewrite, and framed the argument as choosing the right tool for where the platform is going, not as a verdict that the old one was wrong.
The stack I chose, with the reasoning:
| Layer | Choice | Why |
|---|---|---|
| Backend | FastAPI + Polars | Async-friendly, automatic OpenAPI, dataframe operations that beat anything R can offer for the analytics workloads. |
| Frontend | Astro 6 + React 19 + TanStack Query | Static-first shell, React islands only where pages need interactivity. Charts via Plotly.js with WebGL fallback for older hardware. |
| Database | MariaDB (Cloud SQL) | Same engine as the legacy app — the shared production database is the strangler-fig bridge while modules migrate one at a time. |
| Cache | DragonflyDB | Drop-in Redis protocol, much higher throughput per node. Cache-aside pattern with Arrow IPC serialisation. |
| Auth | Auth0 | Separate staging vs production tenants so local development can't ever touch real user data. |
| Hosting | GKE | Already in place from Wave 1; no reason to change. |
Migration strategy: Strangler Fig
A big-bang rewrite of a live system is irresponsible at this scale. Instead, the NGINX ingress sits in front of both apps and routes per feature: when a new module goes live, the ingress sends that path to the new platform and the corresponding legacy module is switched off. The legacy app shrinks as the new one grows. There is never a "migration weekend", and users never see the migration happening.
Backend architecture
Dumb router, smart service. Routers do HTTP semantics only (request parsing, status codes, content types). Services own business logic and call data loaders. Data loaders are split: BaseDataLoader abstract → MariaDBDataLoader for SQL → CachedDataLoader wrapping the above with DragonflyDB cache-aside. Each layer is independently testable; tests mock at the data-loader boundary so real service and router code runs against known fixtures.
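A minimal sketch of those three layers. The class names (BaseDataLoader, MariaDBDataLoader, CachedDataLoader) are the ones above; the route, service, method names, cache-key scheme, and TTL are assumptions for illustration, not the production code:

```python
# Hedged sketch of the layering; only the loader class names come from the text above.
from abc import ABC, abstractmethod
import hashlib

import polars as pl
import redis  # DragonflyDB speaks the Redis wire protocol
from fastapi import APIRouter, Depends

# --- Data loaders: the boundary the tests mock -------------------------------
class BaseDataLoader(ABC):
    @abstractmethod
    def load(self, query: str) -> pl.DataFrame: ...

class MariaDBDataLoader(BaseDataLoader):
    def __init__(self, uri: str):
        self._uri = uri  # hypothetical connection URI

    def load(self, query: str) -> pl.DataFrame:
        # Polars runs the SQL through a connector behind read_database_uri.
        return pl.read_database_uri(query, self._uri)

class CachedDataLoader(BaseDataLoader):
    """Cache-aside wrapper: Arrow IPC bytes in DragonflyDB, SQL only on a miss."""
    def __init__(self, inner: BaseDataLoader, cache: redis.Redis, ttl_s: int = 300):
        self._inner, self._cache, self._ttl = inner, cache, ttl_s

    def load(self, query: str) -> pl.DataFrame:
        key = "loader:" + hashlib.sha256(query.encode()).hexdigest()[:16]
        if (hit := self._cache.get(key)) is not None:
            return pl.read_ipc(hit)           # hit: deserialise the Arrow IPC blob
        df = self._inner.load(query)          # miss: fall through to MariaDB
        self._cache.setex(key, self._ttl, df.write_ipc(None).getvalue())
        return df

# --- Service: owns the business logic ----------------------------------------
class BudgetService:  # hypothetical module service
    def __init__(self, loader: BaseDataLoader):
        self._loader = loader

    def totals_by_parish(self) -> list[dict]:
        df = self._loader.load("SELECT parish_id, amount FROM budget_lines")
        return df.group_by("parish_id").agg(pl.col("amount").sum()).to_dicts()

# --- Router: HTTP semantics only ----------------------------------------------
router = APIRouter(prefix="/api/budget")

def get_budget_service() -> BudgetService:
    # Wired to the real cache and database at app startup; overridden in tests.
    raise NotImplementedError

@router.get("/totals")
def budget_totals(service: BudgetService = Depends(get_budget_service)) -> list[dict]:
    return service.totals_by_parish()
```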
Single-page-per-island. Every Astro page renders exactly one client:only="react" component that wraps an AuthenticatedLayout. Two islands on a page would mean two React trees with disjoint Auth0 / QueryClient / nanostore contexts: a class of bug we ran into early and locked out structurally.
Snapshot-immutable analysis runs. The Analyse module (the last major piece, currently in progress) runs four engine modules (buildings, activities, cemeteries, administration) against a shared cached data loader. Each run's input config and output blob are persisted together (gzipped JSON in the analysis_runs.results column), so two runs can be diffed to explain why their outputs differ. Outputs are parity-tested against a fixture from the legacy R app, so we know the new engines agree with the old ones to within tolerance.
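The persist-and-diff shape is small enough to sketch. Only the analysis_runs.results column is from the description above; the blob layout, helper names, and tolerance are assumptions:

```python
# Hedged sketch of the snapshot blob handling, not the production Analyse module.
import gzip
import json
import math
from typing import Any

def pack_results(results: dict[str, Any]) -> bytes:
    """Gzipped JSON blob stored in analysis_runs.results alongside the input config."""
    return gzip.compress(json.dumps(results, sort_keys=True).encode("utf-8"))

def unpack_results(blob: bytes) -> dict[str, Any]:
    return json.loads(gzip.decompress(blob))

def diff_runs(blob_a: bytes, blob_b: bytes) -> dict[str, tuple[Any, Any]]:
    """Explain why two snapshot runs differ: every key whose value changed."""
    a, b = unpack_results(blob_a), unpack_results(blob_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }

def parity_ok(new: float, legacy: float, rel_tol: float = 1e-6) -> bool:
    """Parity check of a new-engine value against a legacy-R fixture value."""
    return math.isclose(new, legacy, rel_tol=rel_tol)
```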
Where things stand today
17 modules are migrated and live on the new platform. The Analyse module's four sub-engines are implemented and parity-tested; the remaining work is the parsonages engine and the cross-module summary. 271 backend tests pass (non-integration). The access matrix is locked behind a parametrised test suite covering 11 user personas against 13 endpoint groups (149 tests total) so no permission regression can ship without flagging itself.
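The real matrix covers 11 personas against 13 endpoint groups; the sketch below only shows its shape, with made-up personas, paths, expectations, and an assumed auth_header_for fixture, not the production table:

```python
# Hedged sketch of the access-matrix suite's shape, not the production matrix.
import pytest
from fastapi.testclient import TestClient

from app.main import app  # hypothetical application module

PERSONAS = ["dean", "parish_treasurer", "read_only_auditor"]  # 11 in production
ENDPOINT_GROUPS = {                                            # 13 in production
    "budget": "/api/budget",
    "appropriations": "/api/appropriations",
}
# expected[(persona, group)] -> allowed?  Illustrative values only.
EXPECTED = {
    ("dean", "budget"): True,
    ("dean", "appropriations"): True,
    ("parish_treasurer", "budget"): True,
    ("parish_treasurer", "appropriations"): False,
    ("read_only_auditor", "budget"): True,
    ("read_only_auditor", "appropriations"): False,
}

client = TestClient(app)

@pytest.mark.parametrize("persona", PERSONAS)
@pytest.mark.parametrize("group", sorted(ENDPOINT_GROUPS))
def test_access_matrix(persona: str, group: str, auth_header_for):
    """auth_header_for is an assumed conftest fixture that mints a token per persona.

    A permission regression shows up as a flipped cell in this matrix.
    """
    response = client.get(ENDPOINT_GROUPS[group], headers=auth_header_for(persona))
    assert (response.status_code < 400) == EXPECTED[(persona, group)]
```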
DragonflyDB hit rates run above 95%, and every migrated module responds in under a second. The legacy R app is on a defined sunset path; we expect to switch it off within the next two quarters.
What I think this case shows
Three things, mostly:
- Optimisation first, rewrite later (or never). The R Shiny app got twelve months of focused performance and infrastructure work before any rewrite began. The 50-second login was 18 seconds before a single FastAPI route existed. The optimisation wins were the headline outcome; the rewrite is the natural follow-on once the architecture itself is the bottleneck.
- A 26× build-time win is the most valuable optimisation you can do for a small team, because it changes how the team works. Shipping monthly versus shipping daily isn't 30× faster delivery; it's a different culture.
- The "boring" stack is the right answer for a 10-year platform. FastAPI + Polars + Astro + React isn't novel. It's specifically chosen because it can be built on for many years without bet-the-company technology decisions.
Related work
This site
Tachyon
The same haversine kernel walked from a naïve pandas `.apply` through C++, Rust, Zig SIMD, and finally an analyzer-driven V7 in Zig that reads its own compiled assembly to land at 150 GB/s, plus a WebGPU compute lab in the browser. End-to-end demo of the optimisation work I do for clients.
Optimization DevOps Fullstack
Enterprise CI cluster
Jenkins pipeline right-sizing
Took 2,600 production pipelines from 8% to ~60% memory utilisation by building per-build telemetry, then designing bins from real percentile data. Same hardware, several multiples more headroom, no rewrite of any pipeline required.
DevOps Observability Optimization