Performance · Optimization · Engineering

I make slow software fast.

Performance work across the stack: algorithms, application code, build pipelines, infrastructure. I profile what's actually slow and ship the fix on the codebase you already have. As one worked example, this site walks a single haversine kernel from a naïve pandas `.apply` all the way down to a Zig + AVX-512 implementation sustaining 150 GB/s, with the maintenance cost of each rung made explicit.
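For concreteness, here is the naïve end of that ladder as an illustrative sketch (not the site's actual code; the coordinates and column names are made up): a Python-level function invoked once per row via `.apply`, which is exactly the pattern the rest of the ladder eliminates.

```python
import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

df = pd.DataFrame({
    "lat1": [55.676, 40.713], "lon1": [12.568, -74.006],   # Copenhagen, New York
    "lat2": [51.507, 34.052], "lon2": [-0.128, -118.244],  # London, Los Angeles
})

# The slow baseline: one interpreted Python call per row.
dist_km = df.apply(lambda row: haversine(row.lat1, row.lon1,
                                         row.lat2, row.lon2), axis=1)
```

The math is already NumPy; the cost is the per-row Python dispatch wrapped around it.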

What I've built

Real systems with real numbers. Click any card for the writeup.

Engineer on the optimisation arc

Provstiskyen: performance work on a 10-year SaaS

50s → 18s

App startup

Profiled and fixed the cold-start path on a 44,000-line R Shiny production app: 50-second logins down to 18, and 35-minute deploys down to 80 seconds, all on the existing codebase. The full rewrite that came later was made possible by a year of targeted optimisation work first.

Optimization Fullstack DevOps
R Shiny FastAPI Polars Docker GKE MariaDB

Backtesting engine and scanner

Thoth

<2 ms

per-ticker backtest

Hot-loop discipline at the small scale: a 13-strategy backtest of the US equities universe finishes in seconds. Pure Polars expressions, threaded bulk runner, regime-gated strategies. The kind of code-level performance work I bring to bigger systems.

Optimization Fullstack Data
FastAPI Polars PostgreSQL TimescaleDB Astro React

Anonymised, Danish specialty retailer

Inventory simulation arena

9 invariants

hot-loop locks

Polars-based simulation engine where seven candidate inventory strategies compete on years of real demand data. Nine source-level invariants guard the hot loop from accidental refactor regressions: the optimisation contract is written into the test suite.

Optimization Fullstack Data
FastAPI Polars MariaDB React Pixi Vite
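What a source-level invariant can look like, as a hypothetical sketch (the function, patterns, and test name are illustrative, not the project's actual suite): a test that fails the build if a row-wise construct creeps back into the hot loop during a refactor.

```python
import re

# Stand-in for the hot loop's source text; in a real suite this would be
# read from the module file via inspect.getsource or pathlib.
HOT_LOOP_SOURCE = """
def run_strategies(demand: pl.DataFrame) -> pl.DataFrame:
    return demand.with_columns(reorder=pl.col("on_hand") < pl.col("reorder_point"))
"""

# Patterns that would silently de-vectorise the loop if reintroduced.
FORBIDDEN = [r"\.iter_rows\(", r"\.apply\(", r"\bfor\s+\w+\s+in\s+demand"]

def test_hot_loop_stays_vectorised():
    for pattern in FORBIDDEN:
        assert re.search(pattern, HOT_LOOP_SOURCE) is None, pattern

test_hot_loop_stays_vectorised()
```

The point is that the optimisation contract lives in the test suite: a well-meaning refactor that swaps an expression for a Python loop fails CI instead of shipping a 100× regression.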

This site

Tachyon

9,100 → 0.29 ns/pair

Python V0 → Zig V7

The same haversine kernel walked from a naïve pandas `.apply` through C++, Rust, Zig SIMD, and finally an analyzer-driven V7 in Zig that reads its own compiled assembly to land at 150 GB/s, plus a WebGPU compute lab in the browser. End-to-end demo of the optimisation work I do for clients.

Optimization DevOps Fullstack
Python Zig Rust C++ WebGPU FastAPI Astro Fly.io
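One intermediate rung of that ladder, sketched for illustration (the real implementations live in the writeup): the same haversine math moved from per-row Python into a single vectorised NumPy pass over contiguous arrays.

```python
import numpy as np

def haversine_vec(lat1, lon1, lat2, lon2, r=6371.0):
    """Vectorised great-circle distance in km; inputs are degree arrays."""
    lat1, lon1, lat2, lon2 = (np.radians(x) for x in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

rng = np.random.default_rng(0)
n = 1_000_000
lat1, lat2 = rng.uniform(-90, 90, n), rng.uniform(-90, 90, n)
lon1, lon2 = rng.uniform(-180, 180, n), rng.uniform(-180, 180, n)

d = haversine_vec(lat1, lon1, lat2, lon2)  # one pass, no per-row dispatch
```

Identical formula, but the interpreter now runs once per array instead of once per pair; the remaining rungs attack memory layout and instruction selection.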

Horus / Neper / Maat

Home GitOps cluster

4 nodes

ARM64 GitOps cluster

Bare-metal Kubernetes on 4× Raspberry Pi 4 with Flux, Cilium, Tailscale, an in-cluster Zot registry, and MinIO. The infrastructure layer of the optimisation work; same patterns I apply to bigger clusters at work.

DevOps Fullstack
Kubernetes Flux Cilium Tailscale MinIO Zot ARM64

How I work

  • Measure first. The diagnostic usually delivers more value than the fix, because most teams have never had numbers to argue from.
  • Cheap fix before expensive rewrite. A targeted profile and a 100-line change ship in days. A rewrite takes quarters and might not converge. Most slow code has a cheap fix waiting in the existing codebase, and the discipline is finding it.
  • Defensible methods. Numbers come from real percentile distributions, not vibes; every choice of tool gets justified on paper before it goes anywhere near production.
  • Phased rollout with rollback. Shadow mode first, then a single canary, then staged batches. Every step has a defined revert path before it ships.
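What "real percentile distributions, not vibes" means in practice, as a minimal sketch on synthetic latency samples (the distribution parameters are made up):

```python
import numpy as np

# Synthetic stand-in for production latency traces: service latencies are
# typically right-skewed, so a lognormal is a reasonable toy model.
rng = np.random.default_rng(42)
latencies_ms = rng.lognormal(mean=3.0, sigma=0.6, size=10_000)

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])

# Argue from the tail, not the mean: p99 is what users actually feel,
# and it is where an optimisation's before/after claim gets defended.
```

Before-and-after comparisons then happen percentile by percentile, so a fix that helps the median but fattens the tail is caught rather than celebrated.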

Now

What's next

Day-job performance-tuning work is the bulk of the week. Outside of that, I keep pushing on Provstiskyen's Analyse module: the last piece before the legacy R Shiny app can finally retire.

If something you run is slower than it should be, get in touch.