Jenkins pipeline right-sizing

The problem

A large enterprise CI cluster ran 2,600 Jenkins pipelines on a uniform pod allocation: 8 GB RAM and 4 CPU each. Capacity incidents were a recurring operational theme. Queues built up. Builds occasionally failed for unclear scheduling reasons. The general suspicion was over-allocation, but nobody had numbers.

There was a structural reason for that. The Jenkins agent setup used long-lived, generic pods drawn from a static pool, and none of them advertised which job was currently running. Resource attribution per pipeline didn't exist. You could see that the cluster was loaded; you couldn't see by what.

So before sizing the bins, the diagnostic infrastructure had to be built. That ended up being the more interesting half of the project.

What I built

1. Per-build pod identity

Replaced the static label-based agent reference with the Kubernetes Jenkins plugin's declarative kubernetes { inheritFrom ... } block, combined with idleMinutes 0 and podRetention never(). This makes the plugin render a fresh pod spec at build start, with access to the live Run object. It interpolates the job name, build number, and a uniqueness hash into the pod name, and injects build metadata as pod annotations (runUrl, buildUrl).

The result: every Jenkins build runs in a single-use pod whose Kubernetes identity points deterministically back to a specific build, so cAdvisor metrics can now be correlated with individual Jenkins runs.

2. Annotation scraping in OpenTelemetry

The pod-name path had a sharp edge: pod names are capped at 63 characters, and the plugin truncates from the front when job names are long, so extracting the job name from the pod name with a regex was lossy.
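To make the truncation problem concrete, here is a sketch of a per-build naming scheme. It is hypothetical, not the Kubernetes plugin's actual algorithm: the slug rules, the SHA-1 hash and its 5-character length are assumptions. It only shows why front truncation defeats regex extraction of the job name.

```python
import hashlib
import re

MAX_NAME = 63  # name-length cap described above


def pod_name(job_name: str, build_number: int) -> str:
    """Hypothetical per-build pod name: <job-slug>-<build>-<hash>."""
    # Lowercase and collapse anything outside [a-z0-9-] to a hyphen (DNS-safe).
    slug = re.sub(r"[^a-z0-9-]+", "-", job_name.lower()).strip("-")
    # Short uniqueness hash over the job and build number (assumed scheme).
    digest = hashlib.sha1(f"{job_name}#{build_number}".encode()).hexdigest()[:5]
    name = f"{slug}-{build_number}-{digest}"
    if len(name) > MAX_NAME:
        # Front truncation keeps the build number and hash but drops the
        # leading part of the job path, which is what breaks regex extraction.
        name = name[-MAX_NAME:].lstrip("-")
    return name


long_job = "acme/platform-team/monorepo/services/payments-reconciliation-worker/main"
print(pod_name(long_job, 42))  # starts at "monorepo-...": the team prefix is gone
```

The build number and hash survive, but two pipelines that differ only in their leading folders would map to the same visible job fragment.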

Extended the OTel collector's k8sattributes processor to promote the pod's annotations into metric labels (job_name from runUrl, job_run from buildUrl). This gave clean, structured identifiers, immune to truncation, queryable directly in Thanos.
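The collector does this through configuration, but the parsing logic itself is easy to sketch. The version below assumes the standard Jenkins job/<name>/<number>/ URL layout (nested folders repeat the job/<name> pair, and the URL may carry a base URL or context path); the function name is illustrative.

```python
from urllib.parse import unquote, urlparse


def parse_run_url(run_url: str) -> tuple[str, str]:
    """Split a Jenkins build URL into (job_name, job_run).

    Assumes the usual layout job/<folder>/job/<name>/<number>/, optionally
    behind an absolute base URL or a context path such as /jenkins/.
    """
    parts = urlparse(run_url).path.strip("/").split("/")
    build, segs = parts[-1], parts[:-1]
    if not build.isdigit():
        raise ValueError(f"no build number in {run_url!r}")
    # Skip any context path before the first "job" marker.
    i = 0
    while i < len(segs) and segs[i] != "job":
        i += 1
    # Each "job/<name>" pair is one folder level; names are URL-encoded.
    names = []
    while i + 1 < len(segs) and segs[i] == "job":
        names.append(unquote(segs[i + 1]))
        i += 2
    if not names or i != len(segs):
        raise ValueError(f"unexpected job path in {run_url!r}")
    return "/".join(names), build
```

For example, parse_run_url("job/team/job/app/job/main/42/") gives ("team/app/main", "42"), with the full folder path intact no matter how long it is.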

3. Data pipeline

Built a Grafana panel with three queries that each produce one row per build: max_over_time of working-set memory (peak RAM), max_over_time over a rate(...) of CPU time (peak CPU; taking the maximum of a rate needs PromQL's subquery form, max_over_time(rate(...)[range:step])), and count_over_time × scrape interval as an approximate duration. CSV-exported the three tables, joined them in Polars on (job_name, job_run), and computed per-pipeline distribution statistics (P50, P75, P90, P95, P99, max, mean) for both metrics.
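The Polars step can be approximated in dependency-free Python. This is a sketch under assumptions: the row field names and the 30-second scrape interval are hypothetical, and percentiles use linear interpolation (NumPy's default), so values can differ slightly from Polars depending on its interpolation setting.

```python
from collections import defaultdict
from statistics import fmean

SCRAPE_INTERVAL_S = 30  # hypothetical; use the collector's real scrape interval
PCTS = (50, 75, 90, 95, 99)


def percentile(sorted_vals, p):
    """Linear-interpolation percentile over an already sorted list."""
    if not sorted_vals:
        raise ValueError("empty sample")
    rank = (len(sorted_vals) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def join_builds(mem_rows, cpu_rows, count_rows):
    """Inner-join the three per-build exports on (job_name, job_run)."""
    cpu = {(r["job_name"], r["job_run"]): r["cpu_cores"] for r in cpu_rows}
    cnt = {(r["job_name"], r["job_run"]): r["samples"] for r in count_rows}
    joined = []
    for r in mem_rows:
        key = (r["job_name"], r["job_run"])
        if key in cpu and key in cnt:
            joined.append({
                "job_name": key[0],
                "job_run": key[1],
                "peak_mem_mb": r["peak_mem_mb"],
                "peak_cpu_cores": cpu[key],
                # count_over_time x scrape interval ~ build duration
                "duration_s": cnt[key] * SCRAPE_INTERVAL_S,
            })
    return joined


def per_pipeline_stats(builds, metric):
    """P50..P99, max and mean of one metric, grouped by pipeline."""
    by_job = defaultdict(list)
    for b in builds:
        by_job[b["job_name"]].append(b[metric])
    out = {}
    for job, vals in by_job.items():
        vals.sort()
        stats = {f"p{p}": percentile(vals, p) for p in PCTS}
        stats["max"] = vals[-1]
        stats["mean"] = fmean(vals)
        out[job] = stats
    return out
```

Builds missing from any one export are dropped by the inner join, which is usually what you want for short or failed scrapes.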

The first surprise: median peak RAM was 441 MB against an 8 GB allocation. P95 across all pipelines was 999 MB. Average utilization of the existing allocation was around 8%.

4. Defensible bin design

The temptation here is k-means with k=3 and ship. Standard k-means tooling is the wrong fit: Lloyd's algorithm with random initialisation is non-deterministic and can converge to a local optimum, so reruns on the same data can move the bin boundaries. Jenks natural breaks is the better tool for 1D segmentation. It minimises within-class variance (the same squared-deviation objective k-means targets), but on one-dimensional data it finds the optimum exactly, so the breaks are deterministic and reproducible. It has also been a standard method for choropleth classification for decades.
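A minimal sketch of exact 1D natural breaks: dynamic programming over the sorted values, minimising the total within-class sum of squared deviations. The function name and the return convention (the upper bound of each class) are my own choices, not a particular library's API. It is O(k·n²), which is tolerable in pure Python for a few thousand pipeline P95s.

```python
def jenks_breaks(values, k):
    """Exact natural-breaks classification of 1D data into k classes.

    Returns the upper bound of each class, in ascending order.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    # Prefix sums give the squared deviation of any run xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean of xs[i:j] (j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: minimal SSD when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            # The last class is xs[i:j]; try every valid start i.
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    start[c][j] = i
    # Walk back through the recorded class starts to recover the bounds.
    uppers = []
    j = n
    for c in range(k, 0, -1):
        uppers.append(xs[j - 1])
        j = start[c][j]
    return uppers[::-1]
```

Because the search is exhaustive and ties resolve to the first candidate, the same input always yields the same breaks, which is the property the bin design depends on.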

Final bins, each sized with safety headroom above the measured P95 of the pipelines assigned to it (about 50% at the XS ceiling, less at the upper thresholds of the larger bins):

Bin	RAM (request = limit)	CPU request	Assigned pipelines (peak-RAM P95)	Share of pipelines
XS	768 MiB	500m (0.5 core)	P95 ≤ 500 MB	~50%
S	1.25 GiB	1.0 core	P95 ≤ 1 GB	~44%
M	2 GiB	1.5 cores	P95 ≤ 1.5 GB	~5%
L	3 GiB	2.0 cores	P95 ≤ 2.5 GB	<1%
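The assignment rule implied by the table can be sketched as "smallest bin whose P95 ceiling fits". The bin identifiers other than extra-small and medium are assumptions, as is returning None for pipelines above every ceiling (they would stay on the original template). Note the mixed units: ceilings are in MB, bin sizes in MiB, so the headroom helper converts before comparing.

```python
# (bin id, RAM MiB, CPU millicores, P95 ceiling MB), taken from the table.
# Ids "small" and "large" are hypothetical names.
BINS = [
    ("extra-small", 768, 500, 500),
    ("small", 1280, 1000, 1000),
    ("medium", 2048, 1500, 1500),
    ("large", 3072, 2000, 2500),
]


def assign_bin(p95_mb):
    """Return the smallest bin whose ceiling covers the pipeline's P95."""
    for name, _mem_mib, _cpu_m, ceiling_mb in BINS:
        if p95_mb <= ceiling_mb:
            return name
    return None  # above every bin: keep the original 8 GB template


def headroom(p95_mb, mem_mib):
    """Fractional headroom of a bin's RAM over a measured P95 (MB -> MiB)."""
    p95_mib = p95_mb * 1e6 / 2**20
    return mem_mib / p95_mib - 1
```

For example, assign_bin(441) returns "extra-small", and headroom(500, 768) is about 0.61 once the MB-to-MiB conversion is applied.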

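RAM request equals limit, so a build can never use more memory than the node reserved for it. Under node memory pressure the kubelet evicts pods that exceed their memory requests first, which keeps these pods near the back of the eviction queue mid-build. CPU limit is deliberately unset, so pods can burst above their request when nodes are underutilised: CPU contention only slows a build, while exceeding a memory limit gets it OOM-killed. One consequence of the unset CPU limit is that the pods fall in the Burstable QoS class rather than Guaranteed, which requires requests to equal limits for CPU as well as memory.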
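Kubernetes derives a pod's QoS class from its containers' CPU and memory requests and limits. A minimal sketch of those rules shows what the unset CPU limit does to an XS-style pod. It assumes quantities are already normalised (millicores, MiB) and ignores init containers, which the real classification also considers.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Approximate Kubernetes QoS class from container requests/limits.

    Each container is {"requests": {...}, "limits": {...}} with normalised
    numeric values for "cpu" (millicores) and "memory" (MiB).
    """
    any_set = False
    guaranteed = True
    for c in containers:
        req = c.get("requests", {})
        lim = c.get("limits", {})
        for res in RESOURCES:
            limit = lim.get(res)
            # A missing request defaults to the limit when only the limit is set.
            request = req.get(res, limit)
            if request is not None or limit is not None:
                any_set = True
            # Guaranteed needs every container to set limit == request for both.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


xs_pod = [{"requests": {"cpu": 500, "memory": 768}, "limits": {"memory": 768}}]
print(qos_class(xs_pod))  # Burstable: memory request == limit, CPU limit unset
```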

5. Centralised registry in a shared pipeline library

Sizing decisions live in a single YAML resource inside the organisation's shared Jenkins pipeline library, with a four-level resolution chain: disabled list → per-pipeline assignment → default_bin fallback → ultimate fallback to the original 8 GB template. An unmigrated sentinel keeps every pipeline not yet on the list running on the original template, so the migration is strictly opt-in. Every resolved template is echoed in the build console with its source (assigned:extra-small, default:unmigrated, disabled), so what the pipeline is actually getting is visible from the first line of build output.
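The resolution chain can be sketched as a small pure function. The real resolver lives in the shared Jenkins library; this Python version is only an illustration, and the registry keys, template names and the "fallback" source label are hypothetical. The source strings mirror the console labels above.

```python
ORIGINAL_TEMPLATE = "original-8gb"  # hypothetical name of the legacy template
UNMIGRATED = "unmigrated"           # sentinel that maps to the original template

# Hypothetical in-memory equivalent of the YAML registry.
REGISTRY = {
    "disabled": ["frontend-web"],
    "assignments": {"billing-api": "extra-small"},
    "default_bin": UNMIGRATED,
    "bins": {"extra-small": "ci-xs", "small": "ci-s", "medium": "ci-m", "large": "ci-l"},
}


def resolve_template(job_name, registry):
    """Return (pod template, source label) for a pipeline."""
    bins = registry.get("bins", {})
    # 1. Explicit opt-out always wins.
    if job_name in registry.get("disabled", []):
        return ORIGINAL_TEMPLATE, "disabled"
    # 2. Per-pipeline assignment, only if it names a bin that exists.
    assigned = registry.get("assignments", {}).get(job_name)
    if assigned in bins:
        return bins[assigned], f"assigned:{assigned}"
    # 3. default_bin; the sentinel keeps unlisted pipelines on the old template.
    default = registry.get("default_bin")
    if default == UNMIGRATED:
        return ORIGINAL_TEMPLATE, f"default:{UNMIGRATED}"
    if default in bins:
        return bins[default], f"default:{default}"
    # 4. Ultimate fallback: an unknown bin never breaks a build.
    return ORIGINAL_TEMPLATE, "fallback"
```

Letting an assignment to an unknown bin fall through, rather than fail, is a design choice in this sketch: a typo in the registry degrades to the old 8 GB template instead of breaking builds.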

Rollout discipline

The migration is intentionally slow and reversible.

Shadow mode. The resolver shipped to production before any pipeline was assigned to a smaller bin; every pipeline still resolved to the original 8 GB template. This validated the lookup path under real production load.
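Canary. One pipeline I knew well, comfortably below the XS threshold, watched for a week: build success, build duration, peak RAM, OOM events. JVM heap behaviour audited (a hardcoded -Xmx above the bin's RAM usually won't fail at startup, since the JVM only reserves that address space up front; it gets OOM-killed later, once the heap actually grows past the container limit).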
Staged batches. Five pipelines, then twenty, then a hundred. Diverse teams in each batch so a team-specific failure mode (e.g. a shared JVM tuning convention) wouldn't take out one group entirely.
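One way to get team diversity into each batch is to interleave pipelines round-robin across teams before slicing off batches. This is a hypothetical planner, not the tooling used here; it assumes each pipeline is known as a (team, job) pair and sorts teams by name so the plan is reproducible.

```python
from collections import defaultdict
from itertools import islice


def plan_batches(pipelines, sizes=(5, 20, 100)):
    """Split (team, job) pairs into rollout batches, round-robin across teams."""
    by_team = defaultdict(list)
    for team, job in pipelines:
        by_team[team].append(job)
    # Interleave teams so every batch draws from as many teams as possible.
    queues = [list(jobs) for _, jobs in sorted(by_team.items())]
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.pop(0))
    # Consume the interleaved order batch by batch.
    it = iter(order)
    return [list(islice(it, n)) for n in sizes]
```

With two teams of ten pipelines each, a first batch of four takes two from each team rather than four from one.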

Things that went wrong (and how I caught them)

A case study with no surprises is one that is hiding something. Four real ones from this project:

Cloud-level retention overrode pipeline-level idleMinutes 0. Pods were sticking around for exactly 3 minutes plus 5–20 seconds after each build. The 3 minutes came from a default in the Jenkins cloud config that was taking precedence over the declarative override. Fixed by adding podRetention never() explicitly and dropping the cloud default to 0.
The cloud-level concurrency cap caused cross-workload contention. During master rollout, unrelated pipelines couldn't spawn agents: Jenkins rejected their pod requests because the ephemeral master pods churning through the 30-pod cloud-level cap were holding every slot. The fix wasn't to add more nodes; it was to understand which limit was binding.
Default bin pointing at an uncreated template would have broken every unmigrated pipeline. Caught in review: the initial default_bin: medium would have resolved to a pod template that didn't exist yet. Introduced the unmigrated sentinel, which maps explicitly to the original 8 GB template, so staying on the original template is the default and moving to a smaller bin is opt-in.
Frontend pipelines blew through their bin because of a 2 GB SonarQube analysis step. In pipelines that build frontend code, the SonarQube scanner spawns a Node.js process for JavaScript/TypeScript analysis that can take around 2 GB on its own, enough to OOM a pipeline sized for its non-frontend peak. Those pipelines are pinned to the original template through the disabled (opt-out) list, which is exactly the escape hatch the registry was built to provide.

What this is worth

Moving utilization from 8% to ~60% on a cluster of this size means real, ongoing infrastructure savings; the dollar figure is worth a conversation rather than a published number.

The takeaway I'd point at instead is structural: the most expensive part of work like this is usually the missing telemetry, not the missing decisions. Pod identity, attribution, percentile analysis, defensible bin design: those are the deliverable. The bins themselves fall out of the data once the data exists.