Embedded Systems Development: A Systems Discipline
Embedded systems development is a systems architecture discipline, not a firmware-writing one. How the layers fit together, and why most teams get the boundary wrong.
Most embedded programs we are asked to rescue do not have a firmware problem. They have an architecture problem surfacing as firmware pain. The symptoms look like firmware — timing jitter, OTA that bricks units, watchdog resets under load, a codebase forked across three SKUs — but the root cause is upstream. Somebody decided which processor owned which responsibility, where the real-time boundary lived, and how the mechanical, electrical, firmware, application, and cloud layers would negotiate. Those decisions were made implicitly, by whichever engineer was in front of a whiteboard first, and the team has been paying interest on them ever since.
Embedded systems development, done well, is a systems architecture discipline. It decides where behavior lives, which deadlines are physics-bound, what is allowed to fail and how, and how a unit shipping today will still be buildable and updateable in five years. The firmware is the artifact. It is not the discipline.
Embedded systems development is not the act of writing firmware for a chosen MCU. It is the act of deciding which chip does what, where the hard-real-time boundary lives, how the layers above and below it negotiate, and what the product still looks like three revisions and five years from now. Get the architecture right and the firmware is plumbing. Get it wrong and no amount of firmware talent unsticks the program.
Embedded systems development is not firmware development
Most written material on embedded development reads like a vendor tutorial. Choose a microcontroller. Install a toolchain. Write a HAL. Blink an LED. Useful the first time an engineer opens a datasheet; nearly useless as a description of what embedded development actually is when the product has to ship at scale, certify in multiple jurisdictions, survive a decade in the field, and share behavior with a backend, a mobile app, and a factory provisioning line.
Firmware development is writing and debugging code that runs on an embedded target. It is necessary. It is also downstream of every interesting decision.
Embedded systems development is the act of deciding:
- Which processor class is in the product, and why: an Arm Cortex-M0 for a sensor node because the power budget dominates; a Cortex-M4 or M7 for real-time control because DSP throughput and deterministic interrupt latency dominate; an A-series (Cortex-A) SoC running Linux for the supervisory layer.
- Whether there is one processor or two, and what the boundary between them carries — a fieldbus, a shared memory region, a signed IPC channel, a hardware interlock.
- What runs bare-metal, what runs on an RTOS, what runs on Linux, and why those choices are not interchangeable.
- Where the real-time deadlines live, how tight they are, and what happens when they are missed.
- How the device, the cloud, the manufacturing tooling, and the test fixtures share a single view of identity, configuration, and firmware.
- What the product does when a component goes end-of-life in year four, when a CVE drops against the TLS library in year six, when a unit has run for nine years uninterrupted.
A firmware engineer who has not been asked these questions will answer them anyway, implicitly, in the shape of the code they write. The answers will be local: locally correct, globally wrong, because optimizing each layer in isolation is how embedded products fail in production. This is the Phase 1 work our From Prototype to Production framework pivots on: architecture decisions made before the first line of firmware determine whether the product becomes a fleet or stays a demonstration.
The layers most teams conflate
An embedded product is five concurrent disciplines, each making commitments the others have to honor. Easy to name, hard to keep separate under schedule pressure.
- Mechanical. Pins the thermal envelope, the vibration profile, and the ingress spec.
- Electrical. Picks processor class, sensor topology, power tree, and analog front-end inside those constraints. An MCU choice is not a firmware choice. It is a decision about deterministic interrupt latency, DSP throughput, peripheral set, package availability in ten years, second-source availability, and whether secure boot and a hardware root of trust are on the die.
- Firmware. Owns the boundaries between ISR and task context, between DMA-driven and CPU-driven paths, and between bare-metal regions and RTOS-scheduled regions.
- Application. Typically Linux on an A-series SoC. Owns supervisory behavior: state machines that span seconds to hours, UI, fleet protocols, cloud connectivity, logging, and diagnostics.
- Cloud. Owns fleet identity, OTA distribution, telemetry ingestion, and remote diagnostics. It is not optional on a connected product, and its shape is decided by what the device can emit, authenticate, and receive.
The failure mode we see most often is a firmware-first team treating the other four layers as fixed boundary conditions around the code they own. They are not fixed. They are co-decided, and the interesting architecture work happens at the negotiation between two adjacent layers.
Where the real-time boundary actually lives
“Real-time” is the most abused term in embedded development. It gets used to mean “fast,” “responsive,” or “we use an RTOS.” None of these is the definition. Real-time means a behavior has a bounded deadline, and missing that deadline is either a failure or has a known, acceptable consequence.
The interesting question is not “is this real-time.” It is “which behaviors are on a physics-bound clock and which are not.” That boundary is the most important architectural line in the product. Most teams draw it in the wrong place and do not realize they drew it at all.
A motor control loop: PWM updates must land inside a fixed window relative to commutation phase, or the motor loses torque, draws excess current, heats, and destroys itself or its driver. Deadline in microseconds, no acceptable miss rate. The controller must have deterministic interrupt latency, a known worst-case ISR path, and nothing above it allowed to preempt it. Hard real-time.
A UI update showing motor temperature: deadline is “before the operator notices staleness” — hundreds of milliseconds. Miss consequence is mild. The control loop does not care if the screen is late. Not real-time at all — it is supervisory.
A single MCU can sometimes host both. It can also fail to, subtly. A UI framework, a network stack, a logging subsystem, or a filesystem on the same processor as a hard real-time loop will eventually steal cache lines, take interrupts, block on memory, and introduce jitter the control loop cannot tolerate. The motor will be fine most of the time. Not on the run where the network stack handles a DHCP renewal while the UI redraws a chart.
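That is why a timing claim for a hard-real-time loop has to be a worst-case number measured under load, not an average. The sketch below is a minimal host-side example. It assumes latency samples in microseconds have already been captured on the target, for example by toggling a GPIO in the ISR and timing it with a logic analyzer. `check_deadline`, `TimingReport`, and the window value are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class TimingReport:
    mean_us: float
    worst_us: float
    jitter_us: float  # spread between fastest and slowest observed sample
    misses: int       # samples that landed outside the allowed window


def check_deadline(samples_us: List[float], window_us: float) -> TimingReport:
    """Summarize captured latency samples against a hard deadline window.

    samples_us: measured latency from trigger (e.g. commutation event) to
    PWM update, one entry per control cycle, captured on hardware under load.
    """
    if not samples_us:
        raise ValueError("no samples captured")
    worst = max(samples_us)
    return TimingReport(
        mean_us=mean(samples_us),
        worst_us=worst,
        jitter_us=worst - min(samples_us),
        misses=sum(1 for s in samples_us if s > window_us),
    )


def is_hard_real_time_safe(report: TimingReport) -> bool:
    # For a hard deadline the acceptable miss count is zero; the mean is irrelevant.
    return report.misses == 0
```

Take the samples `[4.1, 4.3, 4.2, 4.4, 4.2, 11.8]` against a 10 µs window. The mean is 5.5 µs, comfortably inside budget. Yet the report shows one miss, and for a hard-real-time loop that is a fail.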
Three viable architectures, and the choice is not stylistic:
- One processor, bare-metal or thin RTOS. Works when the full supervisory surface is small. Good for deeply embedded devices with minimal UI and no cloud.
- One processor, dual-context. A hard-real-time region (ISR, DMA, tight loops) plus an RTOS scheduling the rest. Workable on M4/M7-class MCUs when the supervisory layer is modest. Requires strict discipline about what runs where and what preempts what.
- Two processors. A dedicated real-time MCU handling physics-bound loops; a separate supervisory processor (Linux SoC or higher-end MCU) handling UI, connectivity, state; connected by an industrial-grade bus with deterministic timing. This is where most serious industrial, robotics, and power-electronics products converge.
The third pattern is how Aerones ended up after the Beckhoff migration. The previous architecture put motor commutation, winch tension regulation, safety interlocks, mission planning, telemetry, and remote access on a Raspberry Pi (a Linux SoC with no deterministic interrupt guarantees, a consumer thermal envelope, and component availability measured in quarters). For prototypes, it worked. For a fleet servicing 10,000 turbines a year at 120 meters in crosswinds, it was a liability. The migrated architecture moved hard-real-time work (motor commutation, winch tension regulation, safety interlocks) onto industrial Beckhoff controllers with guaranteed cycle times in the low-microsecond range. It moved supervisory functions (mission planning, telemetry, Starlink uplink, remote operator control) onto a separate supervisory layer. An industrial fieldbus between the two carried control-plane traffic with deterministic timing. The system does not fail at height in a crosswind, because the real-time boundary was finally drawn where physics demanded it, and everything above it is free to be late without consequence.
The hard-real-time / supervisory split (and why to care)
Every embedded product above a certain complexity benefits from drawing this split explicitly. Even products that do not need two physical processors benefit from the logical separation.
The hard-real-time side owns behaviors whose deadlines come from physics: motor commutation, power-stage switching, PFC control, safety interlocks and e-stop handling, sensor sampling where aliasing matters, closed-loop control (PID, MPC) running at kilohertz or above, fieldbus cycle participation (EtherCAT, Profinet IRT, CANopen sync).
The supervisory side owns behaviors whose deadlines come from humans, networks, or business logic: mission planning and sequencing, UI and audio, connectivity (LTE, Wi-Fi, Starlink), telemetry aggregation, OTA orchestration, diagnostic flows, configuration and identity.
Three design rules follow:
Asymmetric trust. The supervisory side can ask the real-time side to do things. It cannot override safety decisions the real-time side makes. If the real-time side declares an interlock, no supervisory command unlatches it without an explicit operator action routed through a hardware-enforced path. “Safety-critical” is not a label on a function; it is a constraint on who can tell whom what.
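The rule is easiest to see as a state machine in which the supervisory interface simply has no unlatch path. This is a minimal Python model, not firmware. The class and method names are hypothetical, and `hw_reset_asserted` stands in for a physical reset input read by the real-time controller.

```python
from enum import Enum, auto


class InterlockState(Enum):
    ARMED = auto()    # normal operation, motion permitted
    TRIPPED = auto()  # latched; motion inhibited until a valid operator reset


class SafetyInterlock:
    """Model of a latching interlock owned by the real-time side.

    The supervisory side may *request* motion but has no method that clears
    a trip. Only operator_reset(), which in hardware would be wired to a
    physical reset input, can unlatch, and only once the fault is gone.
    """

    def __init__(self) -> None:
        self.state = InterlockState.ARMED
        self.fault_active = False

    def on_fault(self) -> None:
        # Called from the real-time context when a limit is violated.
        self.fault_active = True
        self.state = InterlockState.TRIPPED

    def on_fault_cleared(self) -> None:
        # The condition went away, but the latch deliberately stays tripped.
        self.fault_active = False

    def supervisor_request_motion(self) -> bool:
        # Requests are honored only while armed; a trip is never overridden.
        return self.state is InterlockState.ARMED

    def operator_reset(self, hw_reset_asserted: bool) -> bool:
        # Unlatch only via the hardware-enforced path and only with no active fault.
        if hw_reset_asserted and not self.fault_active:
            self.state = InterlockState.ARMED
            return True
        return False
```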
Deterministic interface. The link between the two sides — fieldbus, shared memory, dedicated SPI — has a defined cycle time, a defined worst-case latency, and a defined behavior on link loss. The last is skipped most often and causes the most expensive failures.
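Defined behavior on link loss can be as small as a missed-cycle counter on the real-time side. The minimal model below assumes one call per control cycle. `safe_setpoint` is a hypothetical stand-in for whatever the product's safe hold is. Short gaps hold the last good setpoint. A sustained gap latches into the safe state until an explicit resync, so a flapping link cannot bounce the loop in and out of supervisory control.

```python
from typing import Optional


class LinkWatchdog:
    """Real-time-side model of defined behavior on link loss."""

    def __init__(self, max_missed_cycles: int, safe_setpoint: float) -> None:
        self.max_missed_cycles = max_missed_cycles
        self.safe_setpoint = safe_setpoint
        self.missed = 0
        self.last_setpoint = safe_setpoint
        self.link_lost = False

    def cycle(self, frame_setpoint: Optional[float]) -> float:
        """Return the setpoint the control loop should use this cycle.

        frame_setpoint is None when no valid frame (missing, late, or failed
        CRC) arrived from the supervisory side during this cycle.
        """
        if self.link_lost:
            # Latched: ignore traffic until an explicit resync handshake.
            return self.safe_setpoint
        if frame_setpoint is not None:
            self.missed = 0
            self.last_setpoint = frame_setpoint
            return frame_setpoint
        self.missed += 1
        if self.missed >= self.max_missed_cycles:
            self.link_lost = True
            return self.safe_setpoint
        # Tolerate a short gap by holding the last good setpoint.
        return self.last_setpoint

    def resync(self) -> None:
        # Explicit re-arm once the supervisory side has re-established the link.
        self.link_lost = False
        self.missed = 0
        self.last_setpoint = self.safe_setpoint
```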
Graceful degradation, layered. When the supervisory side crashes or goes unreachable, the real-time side continues the physics-bound work and enters a safe holding state. When the real-time side declares a fault, the supervisory side reports it and does not attempt to work around it. At Aerones, this is why a robot at height with a momentarily degraded Starlink link does not become a six-figure crane deployment. The real-time controller is not waiting on the cloud.
Marconi Technologies ships this split differently — the product class is different, the principle identical. Their UPS platform separates 20 kHz real-time control of the power conversion stage from the application layer handling UI, LTE monitoring, diagnostics, and fleet reporting. When building mains drops and the UPS has to transfer to battery in under four milliseconds, the real-time layer does it. The application layer finds out afterward. The system does not negotiate with the cloud during a transfer event, because the cloud is not on a four-millisecond clock.
Device software, cloud, manufacturing tooling, test — one architecture
The second most common embedded architecture mistake, after misplacing the real-time boundary, is treating the device, the cloud, the manufacturing line, and the test infrastructure as separate projects. They are not. They share identity, configuration, and firmware. An architecture that does not recognize this produces a product that works on the bench and fails on the line.
Identity. Every unit has a serial, a device certificate, a manufacturing signature, and an identity that persists from factory flash through every OTA for its service life. The cloud recognizes a unit the first time it connects. The factory provisions identity during end-of-line test without the design engineer present. OTA knows which variant to ship to which unit. Field service looks up a unit by serial and gets its full firmware history. One data model, not four.
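The "one data model" idea can be sketched with a single registry keyed by serial. The field and function names are illustrative, not a schema from any specific product. The factory creates the record exactly once. Every later flash, factory or OTA, appends to the same history that field service reads.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FirmwareEvent:
    version: str
    channel: str  # "factory" or "ota"; both paths append to the same history


@dataclass
class DeviceIdentity:
    serial: str
    cert_fingerprint: str  # fingerprint of the per-unit device certificate
    variant: str           # SKU / hardware variant; OTA targets images by this
    history: List[FirmwareEvent] = field(default_factory=list)

    def record_flash(self, version: str, channel: str) -> None:
        if channel not in ("factory", "ota"):
            raise ValueError(f"unknown channel: {channel!r}")
        self.history.append(FirmwareEvent(version, channel))

    @property
    def current_version(self) -> Optional[str]:
        return self.history[-1].version if self.history else None


def provision(registry: Dict[str, DeviceIdentity], serial: str,
              cert_fingerprint: str, variant: str,
              factory_version: str) -> DeviceIdentity:
    """Create a unit's identity exactly once, during end-of-line provisioning."""
    if serial in registry:
        raise ValueError(f"serial {serial} already provisioned")
    unit = DeviceIdentity(serial, cert_fingerprint, variant)
    unit.record_flash(factory_version, "factory")
    registry[serial] = unit
    return unit
```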
Configuration. Per-unit calibration (sensor offsets, motor parameters, ADC trim) is produced at end-of-line test and must persist across OTA. Per-variant config (SKU, populated peripherals, regulatory region) is produced at manufacturing. Per-deployment config (site, customer, policy set) is produced in the field. These three sources feed into one configuration surface the firmware reads at boot. A design that does not separate them cleanly produces units that lose calibration on update, variants that ship with wrong region settings, or deployments that reset to defaults on power cycle.
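One way to keep the three sources separate is an ownership table. Every key belongs to exactly one source, and the merge rejects a key arriving from the wrong one. This is a minimal sketch: the key names and the table are hypothetical. On a device the sources would live in separate storage regions, for example a calibration area that OTA never writes.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping

# Hypothetical ownership table: every key belongs to exactly one source.
OWNER = {
    "adc_trim": "unit",               # per-unit, from end-of-line test
    "motor_kv": "unit",
    "sku": "variant",                 # per-variant, from manufacturing
    "region": "variant",
    "site_id": "deployment",          # per-deployment, set in the field
    "telemetry_interval_s": "deployment",
}


def build_config(unit: Mapping[str, Any], variant: Mapping[str, Any],
                 deployment: Mapping[str, Any]) -> Mapping[str, Any]:
    """Merge the three sources into the read-only surface firmware reads at boot.

    A key supplied by the wrong source is rejected instead of silently
    overriding, so a field setting can never clobber factory calibration.
    """
    merged: Dict[str, Any] = {}
    sources = (("unit", unit), ("variant", variant), ("deployment", deployment))
    for source_name, source in sources:
        for key, value in source.items():
            owner = OWNER.get(key)
            if owner != source_name:
                raise KeyError(f"{key!r} belongs to {owner!r}, not {source_name!r}")
            merged[key] = value
    # Factory-produced keys must always be present; a missing one means
    # calibration or variant data was lost (for example by an update).
    missing = [k for k, o in OWNER.items() if o != "deployment" and k not in merged]
    if missing:
        raise KeyError(f"missing factory-provisioned keys: {missing}")
    return MappingProxyType(merged)
```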
Firmware provisioning. The factory flashes firmware at end-of-line. The field receives OTA. Both are deployment targets. Both need signed binaries, a partition strategy supporting rollback, and a bootloader that enforces the signature chain. The factory flow is not “a human with a laptop” — it is a fixture with a cable, a script, a CM operator, and a data log. If the firmware team has not delivered a factory-flash path alongside the OTA path, the CM will build one, and it will be wrong.
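Because the factory and the field are both deployment targets, they can share one acceptance check. This is a minimal sketch; `Manifest`, `deliver`, and the manifest encoding are hypothetical. `verify` is a placeholder for the real signature check. On a device that check would typically be an asymmetric scheme such as ECDSA or Ed25519, enforced by the bootloader, and Python's standard library does not provide one. The manifest carries the image hash and the signature covers the manifest, so one signature check plus one hash compare validates the whole image.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Manifest:
    version: Tuple[int, int, int]
    sha256: str       # hex digest of the firmware image
    signature: bytes  # release-key signature over the fields above


def manifest_bytes(m: Manifest) -> bytes:
    # Canonical encoding of the signed fields (illustrative format).
    return f"{m.version}|{m.sha256}".encode()


def accept_image(image: bytes, m: Manifest,
                 verify: Callable[[bytes, bytes], bool]) -> bool:
    """One acceptance rule for every deployment target."""
    if not verify(manifest_bytes(m), m.signature):
        return False  # manifest not signed by the release key
    return hmac.compare_digest(hashlib.sha256(image).hexdigest(), m.sha256)


def deliver(channel: str, image: bytes, m: Manifest,
            verify: Callable[[bytes, bytes], bool], log: List[tuple]) -> bool:
    # The factory fixture and the OTA agent call the same function;
    # only the channel label in the data log differs.
    ok = accept_image(image, m, verify)
    log.append((channel, m.version, ok))
    return ok
```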
Test. End-of-line test produces calibration values, verifies functional behavior, exercises interfaces, and writes pass/fail into the manufacturing system of record. It has to run at line speed with a line operator, not the design engineer. Test firmware is a first-class artifact, not a side project. In many programs we rescue, the test rig is an afterthought — a laptop running a Python script the original engineer wrote in a hurry — and it becomes the bottleneck of first production. The EVT, DVT, PVT sequence has a specific place for test-fixture maturation; skipping it costs a first production run.
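Treating test firmware and fixtures as first-class artifacts means each step produces a structured record, not a console printout. The sketch below shows one such step. It assumes the fixture drives a precision reference into an ADC input. The step, limits, and record fields are hypothetical, and a real line would write `EolResult` to the manufacturing system of record.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class EolResult:
    serial: str
    adc_trim: int  # calibration value written to the unit and to the log
    passed: bool
    reason: str


def run_adc_trim_step(serial: str, readings: List[int], reference_counts: int,
                      max_abs_trim: int) -> EolResult:
    """One end-of-line step: derive ADC trim from readings of a known reference.

    readings: raw ADC counts sampled while the fixture applies the reference.
    The median is used so a single glitched sample (for example from a noisy
    fixture contact) does not skew the trim.
    """
    if not readings:
        return EolResult(serial, 0, False, "no readings")
    trim = reference_counts - round(median(readings))
    if abs(trim) > max_abs_trim:
        return EolResult(serial, trim, False, f"trim {trim} exceeds +/-{max_abs_trim}")
    return EolResult(serial, trim, True, "ok")
```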
Device software, cloud, manufacturing tooling, and field service are one architecture: the same firmware binary across all, the same identity model connecting all, the same OTA flow patching all. Programs that design this as one thing ship. Programs that design it as four ship late, if at all. This sits inside the broader sequence described in The Hardware Product Development Process, Reframed. Embedded architecture is one of the decisions that determines whether the prototype-to-production transition holds together.
Long-lifecycle constraints: OTA, security, component obsolescence
Embedded products outlive the teams that design them. Consumer products ship for three years; industrial for ten to fifteen; infrastructure for twenty. The architecture has to survive the horizon.
OTA. Over-the-air update is not a feature added once connectivity ships. It is an architectural decision made at EVT at the latest. It requires a bootloader with a partition table supporting A/B slots or a known-good fallback; signed firmware with a verified signature chain; anti-rollback protection; a fleet policy controlling staged rollout, canary deployment, and forced recovery; and a device state machine that handles power loss mid-update without bricking. All of these have to exist before the first connected unit ships. Retrofitting OTA onto a product that did not plan for it is a common reason programs lose six months.
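The bootloader side of that list can be modeled as a small state machine with four rules:

- Write only to the inactive slot.
- Boot the new slot on trial for a bounded number of attempts.
- Roll back if the new image never confirms itself healthy.
- Keep a monotonic anti-rollback floor.

The model below works under those assumptions. The names, the attempt count, and integer versions are illustrative. A real implementation keeps this metadata in power-fail-safe storage managed by the bootloader.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BootState:
    """Bootloader metadata (a model, not an on-flash layout)."""
    active: str = "A"
    versions: Dict[str, Optional[int]] = field(
        default_factory=lambda: {"A": 1, "B": None})
    min_version: int = 1          # anti-rollback floor; only ever increases
    trial: Optional[str] = None   # slot on probation after an update
    attempts_left: int = 0


def inactive(s: BootState) -> str:
    return "B" if s.active == "A" else "A"


def install(s: BootState, version: int, completed: bool, max_attempts: int = 3) -> bool:
    """Write an image into the inactive slot. The running slot is never touched."""
    if version < s.min_version:
        return False  # anti-rollback: refuse images older than the floor
    slot = inactive(s)
    s.versions[slot] = None  # slot is invalid while being written
    if not completed:
        return False  # power lost mid-write: slot stays invalid, active slot unharmed
    s.versions[slot] = version
    s.trial, s.attempts_left = slot, max_attempts
    return True


def boot(s: BootState) -> str:
    """Pick the slot to boot. Called by the bootloader on every reset."""
    if s.trial is not None:
        if s.attempts_left > 0:
            s.attempts_left -= 1
            return s.trial
        # Trial attempts exhausted without confirm(): roll back.
        s.trial = None
    return s.active


def confirm(s: BootState) -> None:
    """Called by the new image once its own health checks pass."""
    if s.trial is None:
        return
    s.active, s.trial = s.trial, None
    s.min_version = max(s.min_version, s.versions[s.active])
```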
Security. The threat model is not a web application’s. The attacker has physical access — probe the debug port, dump flash, desolder the MCU, extract keys from EEPROM. Protection lives in hardware: secure boot rooted in on-die OTP, flash encryption, disabled debug interfaces on production units, per-unit device certificates, hardware-backed key storage. These choices are made when the electrical layer picks the processor, because not every processor supports them, and adding them later means a respin. The software supply chain also has to be audited — every third-party library, bootloader, TLS stack — because a CVE in OpenSSL five years from now is a fleet-wide exposure.
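Auditing the supply chain pays off when an advisory lands and the question becomes which units are exposed. That question is only answerable if every firmware build has a software bill of materials (SBOM) and the identity model records what each unit runs. The sketch below assumes purely numeric dotted version strings. The library name and versions in the tests are made up.

```python
from typing import List, Mapping, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    # Assumes purely numeric dotted versions such as "3.0.13".
    return tuple(int(part) for part in text.split("."))


def exposed_units(fleet: Mapping[str, str],
                  sboms: Mapping[str, Mapping[str, str]],
                  library: str, fixed_in: str) -> List[str]:
    """Serials whose installed firmware bundles `library` older than `fixed_in`.

    fleet: serial -> installed firmware version
    sboms: firmware version -> {library name: library version}
    A KeyError here means a shipped build has no SBOM, which is itself a finding.
    """
    fixed = parse_version(fixed_in)
    hits = []
    for serial, fw in sorted(fleet.items()):
        lib_version = sboms[fw].get(library)
        if lib_version is not None and parse_version(lib_version) < fixed:
            hits.append(serial)
    return hits
```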
Component obsolescence. The MCU you ship in year one will have a different availability profile in year five. The sensor with the best datasheet today may be allocated to automotive by the time you need volume. A serious embedded architecture plans for this: critical components have a qualified second source at pin and footprint level, or at minimum a BSP abstraction that lets a drop-in swap happen without application-layer changes. When a component goes EOL, the migration is a scoped engineering task, not an emergency.
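In practice, the BSP abstraction works like this. The application depends on a narrow interface, and each qualified part gets its own driver behind it, selected by board revision at boot. The sketch is in Python for readability; on the device this would be a C interface or function-pointer table. The sensor, its register units, and the revision strings are hypothetical.

```python
from typing import Callable, Protocol


class PressureSensor(Protocol):
    """BSP-level interface the application codes against."""

    def read_kpa(self) -> float: ...


class PrimaryDriver:
    """Original part (illustrative): register holds pressure in pascals."""

    def __init__(self, read_raw: Callable[[], int]) -> None:
        self._read_raw = read_raw

    def read_kpa(self) -> float:
        return self._read_raw() / 1000.0


class SecondSourceDriver:
    """Qualified alternate (illustrative): register holds tenths of a kPa."""

    def __init__(self, read_raw: Callable[[], int]) -> None:
        self._read_raw = read_raw

    def read_kpa(self) -> float:
        return self._read_raw() / 10.0


# The board revision, read at boot (for example from strap pins), picks the driver.
DRIVERS = {"rev-A": PrimaryDriver, "rev-B": SecondSourceDriver}


def make_sensor(board_rev: str, read_raw: Callable[[], int]) -> PressureSensor:
    return DRIVERS[board_rev](read_raw)


def over_pressure(sensor: PressureSensor, limit_kpa: float) -> bool:
    # Application code: identical on both board revisions.
    return sensor.read_kpa() > limit_kpa
```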
Service and diagnostics. A unit in the field will eventually misbehave. The architecture decides whether that is debuggable from the cloud, from a technician’s laptop on-site, or only by shipping the unit back. Good embedded systems expose structured telemetry, log rings that survive reboots, signed and rate-limited remote diagnostic commands. Bad ones have none of this, and every field issue becomes a forensics project.
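A log ring that survives reboots mostly comes down to two details:

- Records carry a sequence number, so order can be rebuilt after a reset.
- Each record carries its own checksum, so a write interrupted by a reset is discarded rather than trusted.

The minimal model below assumes fixed 64-byte slots in a region that persists across resets (retained RAM or a flash sector). The layout is illustrative. A flash-backed version also has to respect erase granularity, which this model ignores.

```python
import struct
import zlib
from typing import List, Tuple

SLOT_SIZE = 64
HEADER = struct.Struct("<IH")  # sequence number, payload length
CRC = struct.Struct("<I")
MAX_PAYLOAD = SLOT_SIZE - HEADER.size - CRC.size


class LogRing:
    """Fixed-size log ring whose contents survive a reboot (model).

    `region` stands in for persistent memory. Each slot carries its own
    CRC-32, so a record torn by a reset mid-write is skipped on recovery.
    """

    def __init__(self, region: bytearray) -> None:
        if len(region) % SLOT_SIZE:
            raise ValueError("region must be a whole number of slots")
        self.region = region
        self.slots = len(region) // SLOT_SIZE
        records = self.recover()
        self.next_seq = records[-1][0] + 1 if records else 0

    def append(self, message: bytes) -> None:
        payload = message[:MAX_PAYLOAD]  # oversize messages are truncated
        body = HEADER.pack(self.next_seq, len(payload)) + payload.ljust(MAX_PAYLOAD, b"\x00")
        offset = (self.next_seq % self.slots) * SLOT_SIZE  # oldest slot is overwritten
        self.region[offset:offset + SLOT_SIZE] = body + CRC.pack(zlib.crc32(body))
        self.next_seq += 1

    def recover(self) -> List[Tuple[int, bytes]]:
        """Return valid (seq, message) records, oldest first."""
        out = []
        for i in range(self.slots):
            slot = bytes(self.region[i * SLOT_SIZE:(i + 1) * SLOT_SIZE])
            body, (crc,) = slot[:-CRC.size], CRC.unpack(slot[-CRC.size:])
            if zlib.crc32(body) != crc:
                continue  # empty, erased, or torn slot
            seq, length = HEADER.unpack_from(body)
            out.append((seq, body[HEADER.size:HEADER.size + length]))
        return sorted(out)
```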
When consumer components strand you (and when they don’t)
The single most common architectural mistake at the prototype-to-production transition is a product built on consumer components that cannot survive the run. Raspberry Pi is the emblematic case. Off-the-shelf NUCs, dev-kit modules with opaque long-term support, USB-connected peripherals repurposed as product components — same failure profile.
The question is not “is this part good enough for the prototype.” The question is whether it can stay in the product at volume, in the field, for the lifetime the business requires. The axes that matter:
- Thermal. Consumer SBCs are designed for 0–40 °C ambient, often without forced airflow. A part that throttles at 55 °C is not viable in a product that sees 70 °C on a summer day.
- Vibration and shock. Consumer modules use connectors and mounting schemes that fail under industrial vibration profiles.
- EMC. Consumer EMC is not industrial or medical EMC. Many consumer SBCs cannot pass without rework that compromises what made them attractive.
- Lifecycle. Consumer components are supported for quarters, not years. A decade-long product cannot depend on a part that will be revised in eighteen months.
- Determinism. Consumer Linux on a consumer SoC is not deterministic. Fine for supervisory work. Not for real-time control, and no amount of kernel tuning closes the gap reliably.
- Certification. A part without a certification lineage forces the product to certify from scratch, multiplying cost and time.
Not every product is wrong to ship on consumer hardware. A Raspberry Pi Compute Module 4 in a properly thermal-managed enclosure, on a custom carrier, running a locked-down Yocto image, doing supervisory work only, is a valid production architecture for some product classes. The question is whether the architecture made the decision deliberately, or drifted into it because that is what the prototype used. We have written this out in detail in When Raspberry Pi Strands Production (And When It Doesn’t). Short version: a prototype lives on the parts that are easy to buy now; a product lives on the parts that will be buildable in three years. If the two do not match, the architecture has a migration problem that is cheaper to solve at EVT than at PVT.
What good embedded architecture looks like, measured
Good embedded architecture is not subjective. It produces measurable outcomes that poor architecture does not.
The outcome-level signals of a program that did embedded architecture well:
- First-pass factory yield above 90% on first real production. Firmware provisioning without a laptop. End-of-line test catches the defects that actually occur. Calibration survives OTA.
- OTA rollout measured in days, not quarters. Signed binary, staged rollout policy, canary fleet, rollback that works. No unit bricked in the field.
- Timing budget documented and measured. Every hard-real-time loop has a worst-case execution time measured on hardware under load, not estimated from compiler output. Every deadline has an observed jitter bound.
- A single application codebase across variants. SKU differences sit inside a BSP layer. The application, the cloud protocol, and the UI are identical across variants. No parallel forks drifting apart.
- Field debuggability without shipping units back. Structured telemetry, persistent log rings, signed remote commands, enough instrumentation that a field issue becomes a ticket, not a forensic project.
- A component lifecycle plan for every critical part. Second sources qualified at pin and footprint level, or a documented migration path with a scoped cost. No surprises in year four.
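Rollout measured in days depends on the decision between stages being mechanical rather than a meeting. The sketch below is a staged policy with a canary gate. It assumes the fleet backend can count, per stage, how many units attempted the update and how many failed to confirm it (for example by rolling back). The stage names, fractions, and thresholds are hypothetical, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    fraction: float          # cumulative share of the fleet targeted at this stage
    max_failure_rate: float  # halt threshold for failed or rolled-back updates


POLICY = [
    Stage("canary", 0.01, 0.0),
    Stage("early", 0.10, 0.005),
    Stage("broad", 0.50, 0.005),
    Stage("all", 1.00, 0.005),
]


def next_action(stage_index: int, updated: int, failed: int) -> str:
    """Decide whether to advance, hold, or halt after observing a stage.

    updated: units that attempted the update in the current stage
    failed: units that failed to confirm the new image (rolled back or silent)
    """
    if updated == 0:
        return "hold"  # no evidence yet
    stage = POLICY[stage_index]
    if failed / updated > stage.max_failure_rate:
        return "halt"
    if stage_index + 1 == len(POLICY):
        return "done"
    return f"advance:{POLICY[stage_index + 1].name}"
```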
These are the outputs of a program that treated embedded systems development as a systems architecture discipline from day one. They are rare in programs that did not. The delta is not talent — it is where the architecture decisions were made, when, and with which constraints in view. The place this fits in a program is documented end-to-end in our framework. Embedded architecture is not a standalone deliverable; it sits inside the broader process by which a hardware product is taken from a working prototype to a fleet that ships, updates, and survives.
Good embedded architecture is 80% decided in Phase 1. The real-time boundary, the processor split, the OTA strategy, the identity model, the manufacturing and test path — these are Phase 1 decisions that shape every later phase. Everything after is mostly plumbing. The programs that get embedded right treat the first four weeks as the most consequential work in the product, not the last four before ship.
Related reading
- From Prototype to Production
- The Hardware Product Development Process, Reframed
- When Raspberry Pi Strands Production (And When It Doesn’t)
- EVT, DVT, PVT: What Each Gate Actually Decides
Architecting an embedded product right now — or trying to unblock one that is stuck between firmware pain and production?
We work as ad-hoc CTO and senior product team on exactly this kind of program. Every engagement starts with a fixed-scope definition phase — no open-ended billing, no ambiguous timelines.