Why Hardware Projects Overrun by 12+ Months
Hardware programs don't slip because engineers are slow. They slip because decisions are deferred. The four structural reasons, and how to spot them.
A hardware program does not overrun by twelve months in one event. It overruns a week at a time, every week, for a year, and nobody sees it until the ship date has moved twice and the CM is asking for a new window.
The conventional explanation is execution. Engineers were slow. A supplier went sideways. A bug was harder than expected. Each is true in the small. None is the cause.
Hardware programs slip because decisions are deferred. Every week a decision stays open, the cost of making it rises, and the work that cannot start without it piles up. By the time the slip is visible, the decision that caused it was avoided six months earlier. “From Prototype to Production” covers the five decisions that most often get deferred on the way to first shipment. This article covers the mechanics of how they get deferred, and what to do about it.
The twelve-month overrun is not an engineering problem. It is a decision problem. Teams that ship on schedule are not faster. They are more decisive. They close questions early, even when the answer is provisional, and refuse to let parallel work drift while the question stays open.
Hardware projects don’t slip because engineers are slow
Walk into a slipping program and ask where the delay is. You will get a coherent story. Mech is waiting on a tolerance decision. Firmware is waiting on a board rev. Procurement is waiting on a locked BOM. The CM is waiting on a test fixture spec. Everybody is blocked on somebody else, and the central question — which decision is actually open — has no owner.
This is not engineers being slow. It is the program not converging. Each discipline waits for a call that never comes, optimises locally, and the system drifts. “The hardware product development process, reframed” describes why phase charts make this worse, not better.
Four structural causes show up on every slipping program. Each has a tell, and a fix that has to be applied before the slip is visible on the Gantt chart.
- Decision latency: questions stay open for weeks because nobody owns the call. Parallel work drifts while the decision waits.
- Parallel drift: mech, electronics, and firmware each optimise locally. The interfaces between them rot.
- The blame cycle: a bug bounces between teams for weeks. No single owner. No instrumented reproduction. No forward progress.
- Parts of unknown provenance: a BOM that was never properly scrubbed. Lead times, EOL, MOQ, second sources: all assumed, none verified.
Cause 1 — Decision latency
The most common slip driver is a decision that should have taken a day sitting open for six weeks.
A question surfaces — which connector family, which processor, where the split between real-time and supervisory control lives. It gets raised in an architecture review. Somebody says “let’s think about it.” The meeting ends with no owner and no date. Two weeks later the same question is on the agenda. The team has spent those weeks producing more context, not making the call.
The tell: an architecture review runs ninety minutes with no action items assigned. “We should probably” appears three or more times. The same question appears on two consecutive weekly agendas.
The cost: every open decision blocks downstream work. Firmware cannot commit to a memory footprint. Mech cannot commit to an envelope. Procurement cannot get real quotes. Each discipline stops — which shows up as slip — or guesses and continues, which shows up as rework.
The fix: every open question gets a named owner, a decide-by date, and a default. “If this is not decided by Friday, we default to option A.” The default forces a call, because nobody wants it accidentally locked in.
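The owner, decide-by date, and default rule can be kept as a small register rather than as meeting notes. The following is a minimal sketch, not a prescribed tool: the `OpenDecision` fields, the `apply_defaults` helper, and the example questions, owners, and dates are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OpenDecision:
    question: str
    owner: str                     # a named person, not a team
    decide_by: date                # last day the owner can still make the call
    default: str                   # option that locks in if nobody decides
    outcome: Optional[str] = None  # filled in when the question is closed


def apply_defaults(decisions: list[OpenDecision], today: date) -> list[OpenDecision]:
    """Close every undecided question whose decide-by date has passed.

    Returns the decisions that were closed by default on this pass.
    """
    defaulted = []
    for d in decisions:
        if d.outcome is None and today > d.decide_by:
            d.outcome = d.default
            defaulted.append(d)
    return defaulted


# Hypothetical register: one question is past its Friday deadline, one is not.
register = [
    OpenDecision("Connector family", "A. Owner", date(2024, 3, 8), "option A"),
    OpenDecision("Real-time boundary", "B. Owner", date(2024, 3, 22), "option B"),
]
closed = apply_defaults(register, today=date(2024, 3, 11))
print([d.question for d in closed])
```

Run on the Monday after the deadline, the connector question closes to its default and the boundary question stays open with its owner. The point of the sketch is that the default is applied mechanically, which is what puts pressure on the owner to decide first.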
The most expensive deferred decisions are architecture calls: what runs on what processor, where the real-time boundary sits. We cover the specific case of a Raspberry Pi in a production architecture in “When Raspberry Pi Strands Production.” Programs that defer this one call routinely lose six months to it.
Cause 2 — Parallel drift between disciplines
A healthy program converges at interfaces: the PCB fits the enclosure, the firmware fits the memory, the test rig fits the line. A slipping program drifts. Each discipline optimises locally, and the interfaces rot silently.
The typical case: mech freezes an enclosure based on a PCB outline from three weeks ago. In those weeks, electrical added a heatsink, moved a connector, widened a board by four millimetres for EMC. Nobody flagged it. Mech finds out at the next fit check. A week of work becomes scrap.
The tell: the last cross-discipline integration review was more than two weeks ago. Disciplines exchange artefacts on ad-hoc request rather than on a cadence. “I thought you knew” appears in retro notes.
The cost: drift surfaces as rework (a board re-spin) or as late surprise (the unit doesn’t fit, firmware doesn’t boot on the shipped rev). Integration debt compounds super-linearly.
The fix: a weekly cross-discipline integration review with a single owner — usually the technical lead — who holds the interface spec and forces every change through it. The spec is a living document. Every field has an owner. No discipline commits to work against a pre-change interface without explicit re-baselining.
Unglamorous. Also the highest-leverage meeting on a hardware program.
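One way to make the re-baselining rule checkable is to version each interface field and record which revision each discipline last accepted. This is a sketch under assumptions: the `InterfaceSpec` class, its method names, and the `pcb_width_mm` field with its values are hypothetical, loosely modelled on the widened-board example above.

```python
from dataclasses import dataclass, field


@dataclass
class InterfaceField:
    name: str
    owner: str        # every field has a named owning discipline
    value: str
    revision: int = 1


@dataclass
class InterfaceSpec:
    fields: dict[str, InterfaceField] = field(default_factory=dict)
    # baselines[discipline][field name] = revision that discipline last accepted
    baselines: dict[str, dict[str, int]] = field(default_factory=dict)

    def add(self, f: InterfaceField) -> None:
        self.fields[f.name] = f

    def change(self, name: str, new_value: str) -> None:
        """Every change goes through the spec and bumps the field revision."""
        f = self.fields[name]
        f.value = new_value
        f.revision += 1

    def rebaseline(self, discipline: str) -> None:
        """The discipline explicitly accepts the current state of every field."""
        self.baselines[discipline] = {n: f.revision for n, f in self.fields.items()}

    def stale_fields(self, discipline: str) -> list[str]:
        """Fields that changed since this discipline last re-baselined."""
        accepted = self.baselines.get(discipline, {})
        return [n for n, f in self.fields.items() if accepted.get(n) != f.revision]


spec = InterfaceSpec()
spec.add(InterfaceField("pcb_width_mm", owner="electrical", value="80"))
spec.rebaseline("mech")                # mech freezes against revision 1
spec.change("pcb_width_mm", "84")      # electrical widens the board
print(spec.stale_fields("mech"))       # mech is now working against a stale outline
```

If `stale_fields` returns anything for a discipline, that discipline is committing work against a pre-change interface, which is exactly what the weekly review exists to catch.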
Cause 3 — The firmware/hardware blame cycle
A bug appears at the firmware/hardware seam — a timing glitch, an intermittent I2C error, a sensor that drifts with temperature. Firmware says it’s hardware. Hardware says it’s firmware. Neither team has the instrumentation to prove it. The bug bounces for three weeks.
Somebody senior then takes an afternoon, hooks up a logic analyser, and finds the answer in an hour. Usually both — a marginal hardware behaviour the firmware was not defensive against. The three weeks are gone.
The tell: a bug ticket has more than two status changes between “hardware” and “firmware” with no new data in between. No shared reproduction setup with instrumented I/O. The last scope trace is more than a week old.
The cost: seam bugs are the single most common source of multi-week unplanned slip in the DVT-to-PVT window. Left unresolved, they collapse trust between teams, which slows everything.
The fix: no bug gets reassigned across the boundary without a reproduction on an instrumented bench. Logic analyser, scope, or firmware trace attached to the ticket. If neither team can reproduce it deterministically, that is the work. Guessing whose problem it is does not count as debugging.
A program we inherited had twenty-three open “intermittent” seam bugs. Six weeks of disciplined repro closed nineteen of them, and those nineteen turned out to be visibility problems. Only the remaining four were genuinely hard.
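The reassignment rule in this section can be enforced in the ticket workflow itself. The sketch below is illustrative only: `SeamBug`, `MissingEvidence`, and the field names are hypothetical and do not correspond to any particular issue tracker.

```python
from dataclasses import dataclass, field


class MissingEvidence(Exception):
    """Raised when a bug is bounced across the seam with no new data."""


@dataclass
class SeamBug:
    title: str
    team: str  # "hardware" or "firmware"
    evidence: list[str] = field(default_factory=list)
    evidence_at_last_move: int = 0  # artefact count at the previous handover

    def attach(self, artefact: str) -> None:
        """Record a capture from the instrumented bench (trace, scope shot, log)."""
        self.evidence.append(artefact)

    def reassign(self, to_team: str) -> None:
        """Move the ticket across the boundary only if new evidence comes with it."""
        if len(self.evidence) <= self.evidence_at_last_move:
            raise MissingEvidence(
                f"'{self.title}': attach a reproduction before reassigning"
            )
        self.team = to_team
        self.evidence_at_last_move = len(self.evidence)


bug = SeamBug("Intermittent I2C error", team="firmware")
try:
    bug.reassign("hardware")   # refused: no capture attached yet
except MissingEvidence as exc:
    blocked = str(exc)

bug.attach("logic-analyser capture of the failing transaction")
bug.reassign("hardware")       # allowed: new data travels with the ticket
```

Because the check compares against the evidence count at the last handover, a second bounce also needs fresh data. That targets the tell described above: status changes between teams with nothing new in between.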
Cause 4 — Parts of unknown provenance
Every slipping hardware program has a BOM that was never scrubbed. Parts were added during prototyping because they were in the drawer, in a reference design, or recommended by the distributor. Nobody went back line by line.
The failure mode is quiet. The prototype works. DVT works. The CM quotes the BOM and comes back with twenty questions: this connector has a sixteen-week lead time, this MCU is on EOL notice, this sensor has an MOQ of twenty-five thousand, this PMIC is allocated to automotive through Q3. Each answer triggers a sub-decision. A two-week scrub becomes a four-month trickle of surprises.
The tell: the BOM has no columns for production-volume lead time, EOL status, MOQ, or second source. There has never been a full BOM review with procurement in the room. More than three parts are marked “TBD” or “distributor recommended.”
The cost: sourcing-driven slip is the most common cause of missing a first production window by a quarter or more. Also the most preventable. Every surprise is a question somebody could have asked in DVT.
The fix: a structured BOM review at DVT, and again at PVT with the CM in the room, producing a line-item answer for every critical part: real volume lead time, EOL horizon, MOQ, pin-compatible second source, price trajectory. No critical part survives unless all five are answered. “Why Hardware MVPs Fail” and “From Prototype to Production” call this the BOM maturity gate.
Boring work. Also the cheapest month of engineering a program can spend.
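The five-answer gate lends itself to a mechanical check over the BOM. What follows is a minimal sketch assuming the BOM is available as a list of dictionaries: the column names, the `gate_failures` helper, and the example parts and values are hypothetical, not a real BOM or a real tool.

```python
# The five questions every critical part must answer (column names are assumed).
REQUIRED = (
    "volume_lead_time_weeks",
    "eol_horizon",
    "moq",
    "second_source",
    "price_trajectory",
)
UNANSWERED = (None, "", "TBD")


def gate_failures(bom: list[dict]) -> dict[str, list[str]]:
    """Map each critical part to the gate questions it has not answered."""
    failures = {}
    for line in bom:
        if not line.get("critical"):
            continue  # the gate applies to critical parts only
        missing = [col for col in REQUIRED if line.get(col) in UNANSWERED]
        if missing:
            failures[line["part"]] = missing
    return failures


# Hypothetical BOM lines for illustration.
bom = [
    {"part": "CONN-1", "critical": True, "volume_lead_time_weeks": 16,
     "eol_horizon": "no notice", "moq": 500,
     "second_source": "TBD", "price_trajectory": None},
    {"part": "MCU-1", "critical": True, "volume_lead_time_weeks": 12,
     "eol_horizon": "no notice", "moq": 1000,
     "second_source": "MCU-1B", "price_trajectory": "flat"},
    {"part": "R-10K", "critical": False},
]
print(gate_failures(bom))
```

An empty result means every critical part has all five answers. Anything else is the list of questions to settle in DVT rather than in the CM's quote.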
How to spot slip two months before it happens
Slip is visible months before it hits the schedule. The signals are behavioural, not metric.
Concrete signals a program is eight weeks from a slip it does not yet see:
- An architecture decision has been deferred in two or more consecutive reviews.
- The cross-discipline integration review has been skipped twice because “the team is heads-down.”
- A bug ticket has moved between firmware and hardware with no new instrumented data.
- A part on the critical path is marked “distributor recommended” and has never been priced at production volume.
- The CM asks a question the design team needs more than twenty-four hours to answer.
- The program lead cannot name the three decisions currently blocking the most work.
Any one is a yellow flag. Two or more, and the program is already slipping — the schedule just hasn’t caught up. The EVT, DVT, PVT gate framework exists partly to force these signals into the open on a schedule, rather than at first production.
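The checklist and its threshold can be written down as a simple tally for a weekly review. This is a sketch, not a scoring model: the signal keys and the `assess` function are hypothetical, and the "no signal observed" result for zero signals is an assumption, since the rule above only defines one and two-or-more.

```python
# Keys are hypothetical shorthand for the six signals listed above.
SIGNALS = {
    "architecture_deferred_twice": "Architecture decision deferred in 2+ consecutive reviews",
    "integration_review_skipped_twice": "Integration review skipped twice",
    "bug_bounced_without_data": "Bug moved between firmware and hardware with no new data",
    "critical_part_unpriced": "Critical-path part never priced at production volume",
    "cm_question_over_24h": "CM question takes the design team more than 24 hours",
    "lead_cannot_name_blockers": "Program lead cannot name the top three blocking decisions",
}


def assess(observed: set[str]) -> str:
    """Apply the rule: one signal is a yellow flag, two or more is slip."""
    unknown = observed - SIGNALS.keys()
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    if len(observed) == 0:
        return "no signal observed"  # assumption: not defined in the text
    if len(observed) == 1:
        return "yellow flag"
    return "already slipping"


print(assess({"bug_bounced_without_data"}))
print(assess({"bug_bounced_without_data", "critical_part_unpriced"}))
```

The tally is deliberately unweighted, because the rule in the text treats every signal as equal.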
What a program on schedule actually looks like
Programs that ship on schedule have a specific feel. Fewer meetings. The ones that happen end with named action items and decide-by dates. Every discipline knows what it is blocked on and who owns the unblock. The interface spec is a live document, not a slide from two months ago.
The program lead answers three questions in thirty seconds: what decisions are open, who owns each, and when each closes. If any answer is fuzzy, the program is drifting.
The bias at each level:
- Architecture: close questions early with a provisional answer rather than keep them open for a better one. An open question costs schedule every day.
- Execution: instrumented debugging over argued debugging. When something does not work, the first move is the probe, not the hypothesis.
- Sourcing: the BOM as a living document — lead time, EOL, second source, MOQ, price trajectory carried as columns, reviewed on a cadence.
The difference between shipping in twelve months and overrunning by twelve is not talent. It is the weekly discipline of closing decisions, enforcing interfaces, instrumenting bugs, and scrubbing BOMs. Our framework page describes how we run this as ad-hoc CTO — not a methodology, a set of weekly disciplines that keep the four causes from compounding into a quarter of lost time.
The programs we inherit from slipping teams almost never need more engineers. They need decisions that have been open for three months closed by Friday. A different kind of work, and usually why someone external has to do it.