
Performance · Infrastructure · Cost · Migrations

I make slow software fast, and expensive systems cheap.

I find what's actually slow, expensive, or hard to maintain, then ship the fix on the codebase you already have: a tuned hot path, a cheaper cluster, or a full rewrite when the architecture itself is the ceiling. Always with the numbers to prove it moved. As one worked example, this site walks the same haversine kernel from a naïve pandas .apply all the way down to a Zig + AVX-512 implementation sustaining 150 GB/s, with the maintenance cost of each rung made explicit.
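For reference, the kernel on every rung is the standard haversine great-circle distance. Here is a minimal pure-Python sketch of one pair, roughly what a naïve per-row `.apply` would call. It assumes inputs in degrees and a mean Earth radius of 6371 km. Both are assumptions, since the site's own kernel may use different units or a different radius.

```python
import math

# Mean Earth radius in kilometres (assumed; some kernels use 6371.0088 km or miles).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # a = sin^2(dphi/2) + cos(phi1) * cos(phi2) * sin^2(dlam/2)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # d = 2r * asin(sqrt(a))
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # One degree of latitude along a meridian, about 111.19 km with r = 6371 km.
    print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))
```

Every faster rung computes this same function. Only the evaluation strategy changes: vectorisation, polynomial approximations of the trig calls, SIMD, and threads.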

Haversine · 1M pairs · 9950X3D

Overall speedup: 46k×

  1. V0 · pandas .apply · baseline
  2. V1 · NumPy vectorised · 229×
  3. V2 · Polars planner · 351×
  4. Z1 · Zig naïve scalar · 505×
  5. Z2 · Zig polynomial · 403×
  6. Z3 · Zig AVX2 SIMD4 · 1.4k×
  7. Z4 · Zig AVX-512 SIMD8 · 3.2k×
  8. Z5 · + multithreading · 21k×
  9. Z6 · + Estrin's scheme · 26k×
  10. Z7 · + FMA + pool · 46k×
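The "polynomial" and "Estrin's scheme" rungs presumably evaluate a polynomial approximation in place of library trig calls, and then change the order in which that polynomial is evaluated. The sketch below shows only the evaluation-order difference, using an arbitrary degree-7 polynomial. The coefficients are made up for illustration and are not the site's approximation. Horner's rule forms one long dependency chain. Estrin's scheme computes independent pairs first, so a superscalar or SIMD core can overlap them.

```python
def horner(coeffs: list[float], x: float) -> float:
    """Evaluate c0 + c1*x + ... + cn*x^n with Horner's rule.

    Each step depends on the previous one: n sequential multiply-adds.
    """
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin7(c: list[float], x: float) -> float:
    """Evaluate a degree-7 polynomial (8 coefficients) with Estrin's scheme.

    The four pair terms do not depend on each other, so hardware can compute
    them in parallel; the dependency depth grows like log2(n) instead of n.
    """
    assert len(c) == 8
    x2 = x * x
    x4 = x2 * x2
    # Level 1: independent linear terms.
    p01 = c[0] + c[1] * x
    p23 = c[2] + c[3] * x
    p45 = c[4] + c[5] * x
    p67 = c[6] + c[7] * x
    # Level 2: combine neighbouring pairs with x^2.
    q0 = p01 + p23 * x2
    q1 = p45 + p67 * x2
    # Level 3: combine the two halves with x^4.
    return q0 + q1 * x4


if __name__ == "__main__":
    coeffs = [1.0, -0.5, 0.25, -0.125, 0.0625, -0.03125, 0.015625, -0.0078125]
    print(horner(coeffs, 0.3), estrin7(coeffs, 0.3))
```

In CPython both versions run at about the same speed. The payoff appears only on hardware that can issue independent multiply-adds in parallel, which is where the Z6 rung operates.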

What I've built

Real systems with real numbers.

Two acts on a parish-admin platform

Provstiskyen: optimising then rewriting a 10-year-old SaaS

50s → 18s

App startup

Two acts on a 44,000-line R Shiny platform that runs about half of Denmark's deaneries. Act I cut cold start from 50s to 18s and deploys from 35min to 80s on the existing codebase. Act II, once the architecture itself was the ceiling, is a full rewrite onto FastAPI, Polars, and React: performant by default, far more maintainable, with the legacy app retiring as the last module ports across.

DevOps Optimization Fullstack
R Shiny FastAPI Polars React TanStack Kubernetes MariaDB DragonflyDB Auth0

Enterprise CI cluster

Jenkins pipeline right-sizing

8% → 60%

RAM utilisation

Took 2,600 production pipelines from 8% to ~60% memory utilisation by building per-build telemetry, then designing memory bins from real percentile data. Same hardware, several times more headroom, and no pipeline rewrites required.
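Here is a minimal sketch of sizing pipelines from percentile data. It makes several assumptions: each pipeline's telemetry is a list of observed peak-memory samples in GiB, a pipeline is sized from its p95 plus a fixed safety margin, and the bins form a fixed ladder. The percentile, the margin, the bin ladder, and the utilisation metric are all hypothetical choices, because the page does not state the real ones.

```python
from statistics import quantiles

# Hypothetical bin ladder in GiB; real bins would be designed from the fleet's distribution.
BINS_GIB = [1, 2, 4, 8, 16, 32]
SAFETY_MARGIN = 1.2  # assumed 20% headroom over the observed p95


def p95(samples: list[float]) -> float:
    """95th percentile of observed peak memory (inclusive method, needs >= 2 samples)."""
    return quantiles(samples, n=20, method="inclusive")[-1]


def assign_bin(samples: list[float]) -> int:
    """Smallest bin that covers p95 * margin; falls back to the largest bin."""
    target = p95(samples) * SAFETY_MARGIN
    for size in BINS_GIB:
        if size >= target:
            return size
    return BINS_GIB[-1]


def utilisation(samples: list[float], bin_gib: int) -> float:
    """One possible metric: mean observed peak as a fraction of the requested bin."""
    return sum(samples) / len(samples) / bin_gib


if __name__ == "__main__":
    observed = [1.1, 1.3, 1.2, 1.4, 2.9, 1.2, 1.3]  # made-up samples, GiB
    chosen = assign_bin(observed)
    print(chosen, round(utilisation(observed, chosen), 2))
```

Sizing from a high percentile, rather than from the worst build ever seen, is what frees the headroom. The margin and the fallback bin then absorb rare outliers.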

DevOps Observability Optimization Data
Kubernetes Jenkins OpenTelemetry Thanos Grafana Polars Groovy

Equities backtester + risk budget

Thoth

DSR + CI

Selection-bias-aware ranking

An equities backtester built with the statistical honesty of a real-money system: a selection-bias-aware deflated Sharpe ratio with an explicit trial count, stationary block-bootstrap confidence intervals, predicted-vs-realised calibration against a live trade journal, and correlation-adjusted quarter-Kelly sizing with heat, sector, and currency caps. The engine work (pure Polars strategies, vectorised regime detection with hysteresis, a threaded scanner) is what makes the trust layer cheap enough to actually run every morning.
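The deflated Sharpe ratio (Bailey and López de Prado) asks how likely it is that the observed Sharpe beats the best Sharpe you would expect from N skill-less trials. The sketch below uses only the standard library. It assumes per-period (non-annualised) Sharpe ratios, a known variance of Sharpe across trials, and the non-excess kurtosis convention (3 for normal returns). The function names and arguments are hypothetical, not Thoth's API.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """Expected maximum Sharpe among n_trials skill-less strategies (the benchmark SR0)."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z = _STD_NORMAL.inv_cdf
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * z(1 - 1 / n_trials)
        + EULER_GAMMA * z(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr: float, n_obs: int, skew: float, kurt: float,
                    n_trials: int, sr_variance: float) -> float:
    """Probability that the true Sharpe exceeds SR0, corrected for non-normal returns.

    sr: observed per-period Sharpe; n_obs: number of return observations;
    kurt: non-excess kurtosis (3.0 for normal returns).
    """
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _STD_NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


if __name__ == "__main__":
    # Made-up inputs: daily Sharpe 0.1 over ~4 years of data, 50 strategies tried.
    print(round(deflated_sharpe(0.1, 1000, -0.5, 5.0, 50, 0.001), 3))
```

The explicit trial count matters because SR0 rises with N. The same backtest ranks lower once you admit how many variants were tried before it.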

Optimization Fullstack Data
FastAPI Polars PostgreSQL React Vite TanStack
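The stationary bootstrap (Politis and Romano) resamples blocks whose lengths are random and geometrically distributed, so the autocorrelation in returns survives resampling. Below is a minimal percentile-interval sketch for the mean return. The expected block length of 20 periods and the 2,000 resamples are illustrative defaults, not Thoth's settings.

```python
import random
from statistics import fmean


def stationary_bootstrap_sample(x: list[float], mean_block: float,
                                rng: random.Random) -> list[float]:
    """One resample of len(x) points built from wrap-around blocks.

    After each point, a new block starts with probability 1/mean_block,
    so block lengths are geometric with mean mean_block.
    """
    n = len(x)
    p = 1.0 / mean_block
    idx = rng.randrange(n)
    out = []
    for _ in range(n):
        out.append(x[idx])
        if rng.random() < p:
            idx = rng.randrange(n)  # start a new block at a random position
        else:
            idx = (idx + 1) % n     # extend the current block, wrapping circularly
    return out


def bootstrap_ci(x: list[float], stat=fmean, mean_block: float = 20.0,
                 n_boot: int = 2000, alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile confidence interval for stat(x) under the stationary bootstrap."""
    rng = random.Random(seed)
    stats = sorted(stat(stationary_bootstrap_sample(x, mean_block, rng)) for _ in range(n_boot))
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Choosing the mean block length is the main tuning decision. It should roughly cover the horizon over which returns stay correlated.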

Anonymised, long-catalogue specialty e-commerce

Inventory decision engine

+27%

Pinball-90 tail forecast lift

Replaces a legacy 121K-line procurement system built on per-SKU integer programming, whose actual demand forecaster was this-year-over-last-year, with a two-stage decision engine: a LightGBM quantile demand forecaster feeding a HiGHS LP capital allocator. A four-way ablation cleanly attributes wins between the forecaster and the allocator. It runs on a simulation engine that is ~30× faster than the Python-idiomatic baseline and locked by nine source-level invariants.
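The headline metric is pinball (quantile) loss at τ = 0.9, which is the loss a quantile forecaster minimises. Below is a minimal sketch of the loss and of expressing a lift as a relative reduction. It assumes "lift" means the percentage reduction in mean pinball loss against the baseline, and the writeup may define it differently.

```python
def pinball_loss(y_true: list[float], y_pred: list[float], tau: float = 0.9) -> float:
    """Mean quantile loss: under-forecasts cost tau per unit, over-forecasts cost (1 - tau)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)


def lift(baseline_loss: float, model_loss: float) -> float:
    """Relative improvement: 0.27 means the model's loss is 27% below the baseline's."""
    return (baseline_loss - model_loss) / baseline_loss
```

At τ = 0.9 an under-forecast costs nine times as much as an over-forecast of the same size. That asymmetry pushes predictions toward the upper tail of demand, which is the part that matters for stockouts.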

Optimization Fullstack Data
FastAPI Polars LightGBM HiGHS MariaDB MongoDB React TanStack
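A four-way ablation here presumably means a 2×2 grid: legacy or new forecaster, crossed with legacy or new allocator. The sketch below shows how such a grid can split a metric between the two components. It assumes a higher-is-better score per cell, and the numbers in the usage line are placeholders, not results.

```python
def attribute(score: dict[tuple[str, str], float]) -> dict[str, float]:
    """Split a 2x2 ablation into main effects and an interaction.

    score[(forecaster, allocator)], with each key in {"old", "new"}.
    The two main effects always sum to the total (a two-player Shapley split).
    """
    oo = score[("old", "old")]
    no = score[("new", "old")]
    on = score[("old", "new")]
    nn = score[("new", "new")]
    return {
        # Average gain from swapping in the new forecaster, across both allocators.
        "forecaster": ((no - oo) + (nn - on)) / 2,
        # Average gain from swapping in the new allocator, across both forecasters.
        "allocator": ((on - oo) + (nn - no)) / 2,
        # Extra gain only when both new parts run together (0 if the effects simply add).
        "interaction": (nn - no) - (on - oo),
        "total": nn - oo,
    }


if __name__ == "__main__":
    placeholder = {("old", "old"): 100.0, ("new", "old"): 110.0,
                   ("old", "new"): 105.0, ("new", "new"): 120.0}
    print(attribute(placeholder))
```

Averaging each swap over both settings of the other component keeps the attribution order-independent. A naive "swap forecaster first, then allocator" path would credit any interaction to whichever component went second.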

This site

Tachyon

9,840 ns → 210 ps

Python V0 → Zig V7 per pair

The same haversine kernel walked from a naïve pandas `.apply` through C++, Rust, Zig SIMD, and finally an analyzer-driven V7 in Zig that reads its own compiled assembly to land at 150 GB/s, plus a WebGPU compute lab in the browser. End-to-end demo of the optimisation work I do for clients.
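The headline numbers are consistent with each other, and the arithmetic is easy to check. This sketch assumes each pair is four float64 inputs (two latitudes and two longitudes, 32 bytes) and that throughput counts input bytes only. The site's own byte accounting may differ.

```python
V0_NS_PER_PAIR = 9_840.0   # Python V0, from the Tachyon headline
V7_NS_PER_PAIR = 0.210     # Zig V7: 210 ps
BYTES_PER_PAIR = 4 * 8     # assumed: lat1, lon1, lat2, lon2 as float64
PAIRS = 1_000_000

speedup = V0_NS_PER_PAIR / V7_NS_PER_PAIR          # about 46,857x, the "46k×" headline
throughput_gb_s = BYTES_PER_PAIR / V7_NS_PER_PAIR  # bytes per ns equals GB/s, about 152
batch_time_us = PAIRS * V7_NS_PER_PAIR / 1_000     # about 210 us for 1M pairs

print(f"{speedup:,.0f}x  {throughput_gb_s:.0f} GB/s  {batch_time_us:.0f} us")
```

At roughly 150 GB/s the kernel is moving input at memory-bandwidth speeds, so further arithmetic tricks would show diminishing returns.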

Optimization DevOps Fullstack
Python Zig Rust C++ WebGPU FastAPI Astro Fly.io

Horus / Neper / Maat

Home GitOps cluster

4 nodes

ARM64 GitOps cluster

Bare-metal Kubernetes on 4× Raspberry Pi 4 with Flux, Cilium, Tailscale, an in-cluster Zot registry, and MinIO. Hands-on platform engineering: the same GitOps patterns I apply to bigger clusters at work.

DevOps Fullstack
Kubernetes Flux Cilium Tailscale MinIO Zot ARM64

How I work

  • Measure first. The diagnostic usually delivers more value than the fix, because most teams never had numbers to argue from.
  • Cheap fix before expensive rewrite. A targeted profile and a 100-line change ship in days. A rewrite takes quarters and might not converge. Most slow code has a cheap fix waiting in the existing codebase, and the discipline is finding it.
  • Defensible methods. Numbers come from real percentile distributions, not vibes; every choice of tool gets justified on paper before it goes anywhere near production.
  • Phased rollout with rollback. Shadow mode first, then a single canary, then staged batches. Every step has a defined revert path before it ships.
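The phased-rollout point can be sketched as a small plan-and-execute loop. This is a minimal sketch under stated assumptions: a fleet of named targets, one canary, then batches that double in size, plus hypothetical `deploy`, `healthy`, and `revert` callbacks that stand in for the real tooling. Shadow mode is assumed to have run beforehand and is not modelled.

```python
from typing import Callable


def rollout_plan(targets: list[str], first_batch: int = 2) -> list[list[str]]:
    """Split targets into [single canary] followed by batches that double in size."""
    if not targets:
        return []
    plan = [targets[:1]]  # the canary
    i, size = 1, first_batch
    while i < len(targets):
        plan.append(targets[i:i + size])
        i += size
        size *= 2
    return plan


def run_rollout(targets: list[str],
                deploy: Callable[[str], None],
                healthy: Callable[[str], bool],
                revert: Callable[[str], None]) -> bool:
    """Deploy batch by batch; on the first unhealthy batch, revert everything done so far."""
    done: list[str] = []
    for batch in rollout_plan(targets):
        for t in batch:
            deploy(t)
            done.append(t)
        if not all(healthy(t) for t in batch):
            for t in reversed(done):  # defined revert path: newest first
                revert(t)
            return False
    return True
```

Doubling batches keep the blast radius small early, when confidence is lowest, and still finish a large fleet in a logarithmic number of steps.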

Now

What's next

Day-job DevOps work is the bulk of the week. Outside of that, I keep pushing on Provstiskyen's Analyse module: the last piece before the legacy R Shiny app can finally retire.

If something you run is slower than it should be, get in touch.