Performance · Infrastructure · Cost · Migrations
I make slow software fast, and expensive systems cheap.
I find what's actually slow, expensive, or hard to maintain, then ship the
fix on the codebase you already have: a tuned hot path, a cheaper cluster,
or a full rewrite when the architecture itself is the ceiling. Always with
the numbers to prove it moved. As one worked example, this site walks the
same haversine kernel from a naïve pandas .apply all the way down to a
Zig + AVX-512 implementation sustaining 150 GB/s, with the maintenance cost
of each rung made explicit.
Haversine · 1M pairs · 9950X3D
Δ 46k×
- V0 pandas .apply: baseline
- V1 numpy vectorised: 229×
- V2 Polars planner: 351×
- Z1 Zig naïve scalar: 505×
- Z2 Zig polynomial: 403×
- Z3 Zig AVX2 SIMD: 41.4k×
- Z4 Zig AVX-512 SIMD: 83.2k×
- Z5 + multithreading: 21k×
- Z6 + Estrin's scheme: 26k×
- Z7 + FMA + pool: 46k×
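For readers who have not met the kernel, the gap between the first two rungs can be sketched roughly as below. This is an illustration only, not the benchmark code behind the numbers above: the function names and the Earth-radius constant are assumptions, and a plain Python loop stands in for the per-row calls that a pandas `.apply` makes.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, for illustration


def haversine_pair(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    a = min(1.0, a)  # guard asin against rounding just above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def haversine_rowwise(lat1, lon1, lat2, lon2):
    """V0-like shape: one Python-level function call per pair."""
    return [haversine_pair(*row) for row in zip(lat1, lon1, lat2, lon2)]


def haversine_vectorised(lat1, lon1, lat2, lon2):
    """V1-like shape: the same formula as whole-array NumPy arithmetic."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(x, dtype=np.float64)) for x in (lat1, lon1, lat2, lon2)
    )
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    a = np.minimum(1.0, a)  # same rounding guard as the scalar version
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

Both functions compute the same distances; the vectorised one simply moves the loop out of the Python interpreter and into NumPy's compiled array routines.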
What I've built
Real systems with real numbers. Click any card for the writeup.
Two acts on a parish-admin platform
Provstiskyen: optimising then rewriting a 10-year SaaS
50s → 18s
App startup
Two acts on a 44,000-line R Shiny platform that runs about half of Denmark's deaneries. Act I cut cold start from 50s to 18s and deploys from 35 min to 80s on the existing codebase. Act II, once the architecture itself was the ceiling, is a full rewrite onto FastAPI, Polars, and React: performant by default, far more maintainable, with the legacy app retiring as the last module ports across.
Enterprise CI cluster
Jenkins pipeline right-sizing
8% → 60%
RAM utilisation
Took 2,600 production pipelines from 8% to ~60% memory utilisation by building per-build telemetry, then designing bins from real percentile data. Same hardware, several multiples more headroom, no rewrite of any pipeline required.
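One plausible reading of "bins from real percentile data" is sketched below: take each pipeline's observed peak memory, read off a high percentile, add a margin, and place the pipeline in the smallest bin that covers it. The pipeline names, sample values, bin sizes, the p95 choice, and the 1.2× safety factor are all hypothetical, chosen only to make the sketch runnable.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def assign_bin(peak_mb_samples, bins_mb, q=95, safety=1.2):
    """Smallest bin covering the q-th percentile peak plus a safety margin."""
    need = percentile(peak_mb_samples, q) * safety
    for size in sorted(bins_mb):
        if size >= need:
            return size
    return max(bins_mb)  # nothing fits: fall back to the largest bin


# Hypothetical per-build telemetry: peak resident memory in MB per run.
telemetry = {
    "build-docs": [310, 295, 330, 305, 320],
    "integration-tests": [3100, 2900, 3600, 3300, 3400],
}
BINS_MB = [512, 1024, 2048, 4096, 8192]  # hypothetical memory classes

plan = {name: assign_bin(samples, BINS_MB) for name, samples in telemetry.items()}
```

With these made-up samples the small job lands in the 512 MB bin and the heavy one in the 8192 MB bin, instead of both reserving the same worst-case allocation.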
Equities backtester + risk budget
Thoth
DSR + CI
Selection-bias-aware ranking
An equities backtester built with the statistical honesty of a real-money system: selection-bias-aware deflated Sharpe with explicit trial count, stationary block-bootstrap confidence intervals, predicted-vs-realised calibration against a live trade journal, and correlation-adjusted quarter-Kelly sizing with heat / sector / currency caps. The engine work (pure Polars strategies, vectorised regime detection with hysteresis, threaded scanner) is what makes the trust layer cheap enough to actually run every morning.
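To make the bootstrap step concrete, here is a minimal sketch of a stationary block bootstrap in the style of Politis and Romano, used to put a percentile confidence interval around a per-period Sharpe ratio. It is not Thoth's code: the function names, the mean block length of 10, the resample count, and the use of a simple percentile interval are all assumptions.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio: mean over sample standard deviation."""
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


def stationary_resample(returns, mean_block_len, rng):
    """One resample: blocks of geometric length, wrapping around the series."""
    n = len(returns)
    p_new_block = 1.0 / mean_block_len
    idx = rng.randrange(n)
    out = []
    for _ in range(n):
        out.append(returns[idx])
        if rng.random() < p_new_block:
            idx = rng.randrange(n)  # start a new block at a random position
        else:
            idx = (idx + 1) % n  # continue the current block (circular)
    return out


def sharpe_ci(returns, mean_block_len=10, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the Sharpe ratio."""
    rng = random.Random(seed)
    stats = sorted(
        sharpe(stationary_resample(returns, mean_block_len, rng)) for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * (n_boot - 1))]
    hi = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi
```

Resampling in blocks rather than single observations keeps short-range autocorrelation in the returns intact, which is why the resulting interval is usually wider, and more honest, than one from an i.i.d. bootstrap.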
Anonymised, long-catalogue specialty e-commerce
Inventory decision engine
+27%
Pinball-90 tail forecast lift
Replacing a legacy 121K-line per-SKU integer-programming procurement system, whose actual demand forecaster was this-year-over-last-year, with a two-stage decision engine: a LightGBM quantile demand forecaster feeding a HiGHS LP capital allocator. A four-way ablation cleanly attributes wins between the forecaster and the allocator, on a simulation engine that runs ~30× faster than the Python-idiomatic baseline and is locked by nine source-level invariants.
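For context on the headline metric, pinball loss at the 0.9 quantile penalises under-forecasts nine times as heavily as over-forecasts, which is what makes it a tail metric. A minimal sketch follows; the function names are illustrative, and reading "lift" as relative reduction in loss against a baseline is an assumption, not something the writeup above states.

```python
def pinball_loss(actual, predicted, q=0.9):
    """Mean pinball (quantile) loss at quantile q over paired observations."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        diff = y - y_hat
        # Under-forecast (diff >= 0) costs q per unit; over-forecast costs 1 - q.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actual)


def relative_lift(baseline_loss, model_loss):
    """Fractional reduction in loss versus a baseline (assumed meaning of 'lift')."""
    return (baseline_loss - model_loss) / baseline_loss
```

A call such as `relative_lift(pinball_loss(y, baseline_q90), pinball_loss(y, model_q90))` would then express the improvement as a fraction, where `y`, `baseline_q90`, and `model_q90` are hypothetical arrays of demand and 90th-percentile forecasts.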
This site
Tachyon
9,840 ns → 210 ps
Python V0 → Zig V7 per pair
The same haversine kernel walked from a naïve pandas `.apply` through C++, Rust, Zig SIMD, and finally an analyzer-driven V7 in Zig that reads its own compiled assembly to land at 150 GB/s, plus a WebGPU compute lab in the browser. End-to-end demo of the optimisation work I do for clients.
Horus / Neper / Maat
Home GitOps cluster
4 nodes
ARM64 GitOps cluster
Bare-metal Kubernetes on 4× Raspberry Pi 4 with Flux, Cilium, Tailscale, an in-cluster Zot registry, and MinIO. Hands-on platform engineering: the same GitOps patterns I apply to bigger clusters at work.
How I work
- Measure first. The diagnostic usually delivers more value than the fix, because most teams never had numbers to argue from.
- Cheap fix before expensive rewrite. A targeted profile and a 100-line change ship in days. A rewrite takes quarters and might not converge. Most slow code has a cheap fix waiting in the existing codebase, and the discipline is finding it.
- Defensible methods. Numbers come from real percentile distributions, not vibes; every choice of tool gets justified on paper before it goes anywhere near production.
- Phased rollout with rollback. Shadow mode first, then a single canary, then staged batches. Every step has a defined revert path before it ships.
Now
What's next
Day-job DevOps work is the bulk of the week. Outside of that, I keep pushing on Provstiskyen's Analyse module: the last piece before the legacy R Shiny app can finally retire.
If something you run is slower than it should be, get in touch.