Performance · Optimization · Engineering

I make slow software fast.

Performance work across the stack: algorithms, application code, build pipelines, infrastructure. I profile what's actually slow and ship the fix on the codebase you already have. As one worked example, this site walks the same haversine kernel from a naïve pandas .apply all the way down to a Zig + AVX-512 implementation sustaining 150 GB/s, with the maintenance cost of each rung made explicit.

Haversine · 1M pairs · 9950X3D

Δ 46k×

  1. V0 · pandas .apply (baseline)
  2. V1 · numpy vectorised · 229×
  3. V2 · Polars planner · 351×
  4. Z1 · Zig naïve scalar · 505×
  5. Z2 · Zig polynomial · 403×
  6. Z3 · Zig AVX2 SIMD4 · 1.4k×
  7. Z4 · Zig AVX-512 SIMD8 · 3.2k×
  8. Z5 · + multithreading · 21k×
  9. Z6 · + Estrin's scheme · 26k×
  10. Z7 · + FMA + pool · 46k×
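
For readers who have not met the kernel, here is a minimal scalar sketch of what the ladder above is optimising, written in plain Python. It is not the code behind the benchmark: the Earth radius constant, the function names, and the Taylor coefficients for sine are all assumptions chosen for illustration, and the polynomial used in the Zig rungs is not shown on this page. The second half contrasts Horner's rule with Estrin's scheme, which evaluates the same polynomial but pairs terms so that the multiply-adds within each round do not depend on each other.

```python
import math

# Mean Earth radius in km. An assumed constant, not taken from the benchmark.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... as one serial multiply-add chain."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial by combining adjacent pairs each round.

    Every pair in a round is independent of the others, which is what lets
    wide hardware overlap the work. Expects a non-empty coefficient list.
    """
    terms = list(coeffs)
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)  # pad so every term has a partner
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power *= power  # x, x^2, x^4, ...
    return terms[0]


# Illustrative only: Taylor coefficients of sin(x)/x in u = x^2.
SIN_COEFFS = [1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0]


def sin_poly(x):
    """Approximate sin(x) for small |x| using the polynomial above."""
    return x * estrin(SIN_COEFFS, x * x)
```

In pure Python the two evaluation orders cost about the same; the point of Estrin's scheme only shows up on hardware that can issue several independent fused multiply-adds at once, which is presumably why it appears late in the ladder, after SIMD and threading.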

What I've built

Real systems with real numbers. Click any card for the writeup.

How I work

  • Measure first. The diagnostic usually delivers more value than the fix, because most teams never had numbers to argue from.
  • Cheap fix before expensive rewrite. A targeted profile and a 100-line change ships in days. A rewrite takes quarters and might not converge. Most slow code has a cheap fix waiting in the existing codebase, and the discipline is finding it.
  • Defensible methods. Numbers come from real percentile distributions, not vibes; every choice of tool gets justified on paper before it goes anywhere near production.
  • Phased rollout with rollback. Shadow mode first, then a single canary, then staged batches. Every step has a defined revert path before it ships.
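
As a rough illustration of "real percentile distributions", the sketch below times a callable repeatedly and reports p50, p95 and p99 instead of a single mean. The function names, warm-up count and run count are assumptions made for the example, not a description of any particular client engagement, and a real harness would also control for CPU frequency scaling and noisy neighbours.

```python
import statistics
import time


def measure(fn, *, warmup=5, runs=200):
    """Return per-call wall-clock timings in seconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # discard: caches, lazy imports and allocators settle here
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Summarise timings by percentile so tail latency stays visible."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples),
    }


if __name__ == "__main__":
    timings = measure(lambda: sum(range(10_000)))
    for name, seconds in summarize(timings).items():
        print(f"{name}: {seconds * 1e6:.1f} µs")
```

Comparing the same percentiles before and after a change gives both sides of an argument the same numbers to point at.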

Now

What's next

Day-job performance-tuning work is the bulk of the week. Outside of that, I keep pushing on Provstiskyen's Analyse module: the last piece before the legacy R Shiny app can finally retire.

If something you run is slower than it should be, get in touch.