
Embedded system


An embedded system is a special-purpose computer system that is completely encapsulated by the device it controls. Unlike a general-purpose personal computer, an embedded system has specific requirements and performs pre-defined tasks. It is essentially a programmed hardware device: a programmable chip is the 'raw material', and it is programmed with a particular application. This contrasts with older systems built entirely from fixed-function hardware, and with systems built from general-purpose hardware and externally loaded software. As a combination of hardware and software, embedded systems lend themselves to mass production and to a wide variety of applications.

History

[Image: Apollo Guidance Computer. Source: The Computer History Museum (fair use)]

The first recognizably modern embedded system was the Apollo Guidance Computer, developed by Charles Stark Draper at the MIT Instrumentation Laboratory. Each flight to the Moon carried two of them, running the inertial guidance systems of both the command module and the LEM.

At the project's inception, the Apollo guidance computer was considered the riskiest item in the Apollo project. The use of the then-new monolithic integrated circuits, to reduce size and weight, only increased this risk.

[Image: Autonetics D-17 guidance computer from a Minuteman I missile.]

The first mass-produced embedded system was the guidance computer for the Minuteman missile in 1961. It was the Autonetics D-17 guidance computer, built using discrete transistor logic and a hard disk for main memory. When the Minuteman II went into production in 1966, the D-17 was replaced with a new computer that was the first high-volume user of integrated circuits. Without this program, integrated circuits might never have reached a usable price point.

The crucial design features of the Minuteman computer were that its guidance algorithm could be reprogrammed later in the program, which made the missile more accurate, and that the computer could also test the missile, saving cable and connector weight.

Examples of embedded systems

Characteristics

Embedded computer systems constitute the widest possible use of computer systems: they include all computers other than those specifically intended as general-purpose machines. Examples of embedded systems range from a portable music player to the real-time control of systems like the Space Shuttle. They are characterized by providing a function, or functions, that is not itself that of a computer.

The majority of commercial embedded systems are designed to perform selected functions at a low cost. Many, but not all, embedded systems have real-time constraints that must be met. Such a system may need to be very fast for a few of its functions, but most of its other functions will not need speed. These systems meet their real-time constraints with a combination of special-purpose hardware and software tailored to the system requirements.

It is difficult to characterize embedded systems by speed or cost requirements, but in high-volume systems, cost often dominates the design. Fortunately, most systems have limited real-time requirements that can usually be met with a combination of custom hardware and a limited amount of high-performance software. Take, for instance, a digital set-top box for satellite television. Even though such a system has to process tens of megabits of continuous data per second, most of the heavy lifting is done by custom hardware that parses, directs, and decodes the multi-channel digital stream down into a single video output. The embedded CPU is left to set up data paths, handle interrupts at frame boundaries, generate and display graphics, and so on, to provide the set-top look and feel. Many parts of an embedded system therefore need only low performance compared to the primary mission of the system. This allows the architecture of an embedded system to be intentionally simplified, using a CPU that is merely "good enough" for these secondary functions, lowering cost compared to a general-purpose computer accomplishing the same task.

For embedded systems that are not high volume, personal computers can often be conscripted into service, either by limiting the programs they run or by replacing the operating system with a real-time operating system. In this case, special-purpose hardware may be replaced by one or more high-performance CPUs. Still, some embedded systems may require high-performance CPUs, special hardware, and large memories to accomplish a required task.

In the domain of high-volume embedded systems, such as portable music players, reducing cost becomes a major concern. These systems often have just a few chips: a highly integrated CPU, a custom chip that controls all other functions, and a single memory chip. In these designs, each component is selected and designed to minimize system cost.

The software written for many embedded systems, especially those without a disk drive, is sometimes called firmware: the name for software that is embedded in hardware devices, e.g. in one or more ROM or flash memory chips.

Programs on an embedded system often run with real-time constraints on limited hardware resources: often there is no disk drive, operating system, keyboard or screen. The software may have nothing remotely like a file system, or, if one is present, flash memory may replace rotating media. If a user interface is present, it may be a small keypad and a liquid crystal display.

Embedded systems reside in machines that are expected to run continuously for years without errors. The software and firmware are therefore usually developed and tested more carefully than software for personal computers. Many embedded systems avoid mechanical moving parts such as disk drives, switches or buttons, because these are unreliable compared to solid-state parts such as flash memory.

In addition, the embedded system may be outside the reach of humans (down an oil well borehole, launched into outer space, etc.), so it must be able to restart itself even if catastrophic data corruption has taken place. This is usually accomplished with a standard electronic part called a watchdog timer that resets the computer unless the software periodically resets the timer (a minimal example follows).
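As an illustration, the main loop typically "kicks" the watchdog on every healthy pass; if the software hangs, the kicks stop, the timer expires and the hardware resets the system. The C sketch below is a minimal example; the register address and key value are made-up placeholders, since every watchdog peripheral defines its own.

  #include <stdint.h>

  /* Hypothetical watchdog restart register and key value; real hardware
     defines its own addresses and protocol. */
  #define WDT_RESTART (*(volatile uint32_t *)0x40001000u)
  #define WDT_KEY     0xA5u

  static void kick_watchdog(void)
  {
      WDT_RESTART = WDT_KEY;   /* restart the countdown before it expires */
  }

  void main_loop(void)
  {
      for (;;) {
          /* ... one pass of the application work ... */
          kick_watchdog();     /* stops happening if the software hangs,
                                  so the hardware resets the computer */
      }
  }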

Design of embedded systems

The electronics usually uses either a microprocessor or a microcontroller. Some large or old systems use general-purpose mainframe computers or minicomputers.

User interfaces

User interfaces for embedded systems vary widely, and thus deserve some special comment.

Interface designers at PARC, Apple Computer, Boeing and HP minimize the number of types of user actions. For example, their systems use two buttons (the absolute minimum) to control a menu system (just to be clear, one button should be "next menu entry" the other button should be "select this menu entry").

A touch-screen or screen-edge buttons also minimize the types of user actions.


Another basic trick is to minimize and simplify the type of output. Designs sometimes use a status light for each interface plug or failure condition, to show what failed. A cheap variation is to have two light bars with a printed matrix of errors that they select; the user can glue on the labels for the language that they speak.

For example, Boeing's standard test interface is a button and some lights. When you press the button, all the lights turn on. When you release the button, the lights with failures stay on. The labels are in Basic English.

Another example is probably right next to you. Look at a computer printer. Very often the lights are labelled with stick-on labels that can be printed in any language. In some markets, devices are delivered with several sets of labels, so customers can pick the most comfortable language.

Designers use colors. Red means the users can get hurt (think of blood). Yellow means something might be wrong. Green means the status is OK/good. This is intentionally like a stop-light, because most people understand those.

Most designs arrange for a display to change immediately after a user action. If the machine is going to do anything, it usually starts within 7 seconds, or gives progress reports.

If a design needs a screen, many designers use plain text. It is preferred because users have been reading signs for years. A GUI is pretty and can do anything, but it typically adds a year of artist, approval and translation delays, and one or two programmers, to a project's cost, without adding any value. Often an overly clever GUI actually confuses users, because it can use unfamiliar symbols.

If a design needs to point to parts of the machine (as in copiers), these are often labelled with numbers on the actual machine, that are visible with the doors closed.

A network interface is just a remote screen. It behaves much like any other user interface.

One of the most successful general-purpose screen-based interfaces is the two menu buttons and a line of text in the user's native language. It's used in pagers, medium-priced printers, network switches, and other medium-priced situations that require complex behavior from users.

When there's text, the designer chooses one or more languages. The default language is usually the one most widely understood by the targeted group of users.

Most designers try to use the native character sets, no matter how painful. Users with unusual character sets feel well served when their language shows up on machinery they use.

Text is usually translated by professional translators, even if native speakers are on staff. Marketing staff have to be able to tell foreign distributors that the translations are professional.

An organization selling abroad often gives its highest-volume distributor the duty of reviewing and correcting any translations into that distributor's native language. This forestalls critiques by other native speakers, who tend to believe that no foreign organization will ever know their language as well as they do.

Another common trick is that modes are made absolutely clear on the user's display. If an interface has modes, they are almost always reversible in an obvious way.

Most authorities consider a usability test more important than any number of opinions. Designers recommend testing the user interface for usability at the earliest possible opportunity. A commonly used quick-and-dirty test is to ask an executive secretary to use cardboard models drawn with magic markers and manipulated by an engineer. The videotaped result is likely to be both humorous and very educational. In the tapes, every time the engineer talks, the interface has failed, because it would cause a service call.

In well-run organizations, one person approves the user interface. Often this is a customer, the major distributor or someone directly responsible for selling the system. Committees do not quickly make decisions, and some people never do. This causes avoidable, expensive delays.

Platform

There are many different CPU architectures used in embedded designs, such as ARM, MIPS, ColdFire/68k, PowerPC, x86, PIC, 8051 and Atmel AVR.

This is in contrast to the desktop computer market, which as of this writing (2003) is limited to just a few competing architectures, mainly the Intel/AMD x86 and the Apple/Motorola/IBM PowerPC used in the Apple Macintosh. As a side note, with the growing acceptance of Java in this field, there is a tendency to further reduce the dependency on specific CPU/hardware (and OS) requirements.

The PC/104 standard is a typical base for small, low-volume embedded and ruggedized system designs. These often use DOS, Linux, or an embedded real-time operating system such as QNX or Inferno.

A common configuration for very-high-volume embedded systems is the system on a chip, an application-specific integrated circuit, for which the CPU was purchased as intellectual property to add to the IC's design. A related common scheme is to use a field-programmable gate array, and program it with all the logic, including the CPU. Most modern FPGAs are designed for this purpose.

Tools

Like typical computer programmers, embedded system designers use compilers, assemblers, and debuggers to develop embedded system software.

Those software tools can come from several sources:

  • Software companies that specialize in the embedded market
  • Ports of the GNU software development tools (see, for example, the cross-compiler notes at http://www.kegel.com/linux/embed/ )
  • Sometimes, development tools for a personal computer can be used, if the embedded processor is a close relative of a common PC processor.

Embedded system designers also use a few software tools rarely used by typical computer programmers.

  • More common are utility programs that add a checksum or CRC to a program, so it can check its program data before executing it (a minimal self-check sketch follows this list).
  • Less common are utility programs that turn data files into code, so that any kind of data can be included in a program.
  • Uncommon are synchronous programming languages, used for extra reliability.
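As a concrete illustration of the first item, the program usually recomputes a checksum or CRC over its own image and compares it with a value the build tool appended to the image. The C sketch below assumes a hypothetical memory layout (base address, covered length, and a trailing CRC-16-CCITT value); a real project defines its own layout and polynomial.

  #include <stdint.h>

  /* Assumed layout: the build tool appends a 16-bit CRC after ROM_LENGTH
     bytes of program data starting at ROM_START. */
  #define ROM_START  ((const uint8_t *)0x08000000u)
  #define ROM_LENGTH 0x0000FFFEu
  #define STORED_CRC (*(const uint16_t *)(ROM_START + ROM_LENGTH))

  static uint16_t crc16_ccitt(const uint8_t *data, uint32_t len)
  {
      uint16_t crc = 0xFFFFu;
      while (len--) {
          crc ^= (uint16_t)(*data++) << 8;
          for (int bit = 0; bit < 8; bit++)
              crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                    : (uint16_t)(crc << 1);
      }
      return crc;
  }

  int program_image_is_valid(void)
  {
      return crc16_ccitt(ROM_START, ROM_LENGTH) == STORED_CRC;
  }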

Operating system

[Image: An Internet payphone loading Windows XP]

These systems often have no operating system at all, or a specialized embedded operating system (often a real-time operating system); alternatively, the programmer is assigned to port one of these to the new system.

Built-In Self-Test

Most embedded systems have some degree of built-in self-test. There are several basic types:

  1. Testing the computer: CPU, RAM, and program memory. These tests often run once at power-up; in safety-critical systems they are also run periodically, or spread out over time (a minimal RAM-test sketch follows this list).
  2. Tests of peripherals: These simulate inputs and read-back or measure outputs. A surprising number of communication, analog and control systems can have these tests, often very cheaply.
  3. Tests of power: These usually measure each rail of the power supply, and may check the input (battery or mains) as well. Power supplies are often highly stressed, with low margins.
  4. Communication tests: These verify the receipt of a simple message from connected units. The Internet, for example, has the ICMP echo request ("ping") for this purpose.
  5. Cabling tests: These usually run a wire in a serpentine arrangement through representative pins of the cables that have to be attached. Synchronous communications systems, like telephone media, often use "sync" tests for this purpose. Cable tests are cheap, and extremely useful when the unit has plugs.
  6. Rigging tests: Often a system has to be adjusted when it is installed. Rigging tests provide indicators to the person that installs the system.
  7. Consumables tests: These measure what a system uses up, and warn when the quantities are low. The most common example is the gas gauge of a car. The most complex examples may be the automated medical analysis systems that maintain inventories of chemical reagents.
  8. Operational tests: These measure things that a user would care about to operate the system. Notably, these have to run when the system is operating. This includes navigational instruments on aircraft, a car's speedometer, and disk-drive lights.
  9. Safety tests: These run within a 'safety interval' and assure that the system is still reliable. The safety interval is usually less than the minimum time in which a fault could cause harm.
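As promised in item 1, here is a minimal RAM-test sketch in C. It writes an address-dependent pattern and then its inverse, reading both back, which catches stuck and shorted data bits. The test region's address and size are illustrative assumptions, and the test is destructive, so it must run before the RAM holds anything the system needs.

  #include <stdint.h>

  #define TEST_RAM_START ((volatile uint32_t *)0x20000000u)  /* assumed */
  #define TEST_RAM_WORDS 1024u                               /* assumed */

  int ram_test(void)
  {
      /* pass 1: write an address-dependent pattern */
      for (uint32_t i = 0; i < TEST_RAM_WORDS; i++)
          TEST_RAM_START[i] = i ^ 0xAAAA5555u;

      /* pass 2: verify pass 1, then write the inverted pattern */
      for (uint32_t i = 0; i < TEST_RAM_WORDS; i++) {
          if (TEST_RAM_START[i] != (i ^ 0xAAAA5555u))
              return 0;                      /* stuck or shorted bit */
          TEST_RAM_START[i] = ~(i ^ 0xAAAA5555u);
      }

      /* pass 3: verify the inverted pattern */
      for (uint32_t i = 0; i < TEST_RAM_WORDS; i++)
          if (TEST_RAM_START[i] != ~(i ^ 0xAAAA5555u))
              return 0;

      return 1;   /* both patterns read back correctly */
  }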

Reliability regimes

Reliability has different definitions depending on why people want it. Interestingly, there are relatively few distinct reliability requirements, and systems with similar requirements employ similar embedded system designs and built-in self-tests:

  1. The system is too unsafe, or too inaccessible, to repair. (Space systems, undersea cables, navigational beacons, bore-hole systems, and, oddly, automobiles and other mass-produced products.) Generally, the embedded system tests its subsystems and switches redundant spares on line, or incorporates "limp modes" that provide partial function. Mass-produced consumer equipment (such as cars, PCs or printers) often falls in this category because repairs are expensive and repair staff far away, compared to the initial cost of the unit.
  2. The system cannot be safely shut down. (Aircraft navigation, reactor control systems, some chemical factory controls, engines on single-engine aircraft) Like the above, but "limp modes" are less tolerable. Often the backups are selected by an operator.
  3. The system will lose large amounts of money when shut down. (Telephone switches, factory controls, bridge and elevator controls, automated sales and service) These usually have a few go/no-go tests, with on-line spares or limp-modes using alternative equipment and manual procedures.
  4. The system cannot be operated when unsafe. (Medical equipment, aircraft equipment with hot spares, such as engines) The testing can be quite exotic, but the only action is to shut down the whole unit and indicate a failure.
  5. The system cannot be operated when it will lose large amounts of money. (chemical factory controls, financial systems) Very similar to medical equipment, above.

Debugging

Debugging is usually performed with an in-circuit emulator, or some type of debugger that can interrupt the microcontroller's internal microcode.

The microcode interrupt lets the debugger operate in hardware in which only the CPU works. The CPU-based debugger can be used to test and debug the electronics of the computer from the viewpoint of the CPU. This feature was pioneered on the PDP-11.

Developers should insist on debugging tools that show the high-level language, with breakpoints and single-stepping, because these features are widely available. Also, developers should write and use simple logging facilities to debug sequences of real-time events (a sketch of such a facility follows).
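A simple logging facility of the kind mentioned above is often just a ring buffer of time-stamped event codes that interrupt handlers can append to cheaply, to be read out later by a background task or a debugger. In the sketch below, get_tick_count() is an assumed platform function returning a free-running timer value, and the event codes are whatever the developer finds useful.

  #include <stdint.h>

  extern uint32_t get_tick_count(void);   /* assumed free-running timer */

  #define LOG_SIZE 64u    /* power of two, so the index wraps cheaply */

  struct log_entry {
      uint32_t tick;      /* when the event happened */
      uint16_t event;     /* developer-defined event code */
      uint16_t arg;       /* optional extra data */
  };

  static struct log_entry log_buf[LOG_SIZE];
  static volatile uint32_t log_head;

  void log_event(uint16_t event, uint16_t arg)
  {
      uint32_t i = log_head++ & (LOG_SIZE - 1u);  /* oldest entries overwritten */
      log_buf[i].tick  = get_tick_count();
      log_buf[i].event = event;
      log_buf[i].arg   = arg;
  }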

PC or mainframe programmers first encountering this sort of programming often become confused about design priorities and acceptable methods. Mentoring, code-reviews and egoless programming are recommended.

As the complexity of embedded systems grows, higher level tools and operating systems are migrating into machinery where it makes sense. For example, cellphones, personal digital assistants and other consumer computers often need significant software that is purchased or provided by a person other than the manufacturer of the electronics. In these systems, an open programming environment such as Linux, OSGi or Embedded Java is required so that the third-party software provider can sell to a large market.

Most such open environments have a reference design that runs on a personal computer. Much of the software for such systems can be developed on a conventional PC. However, the porting of the open environment to the specialized electronics and the development of the device drivers for the electronics are usually still the responsibility of a classic embedded software engineer. In some cases, the engineer works for the integrated circuit manufacturer, but there is still such a person somewhere.


Start-up

All embedded systems have start-up code. Usually it disables interrupts, sets up the electronics, tests the computer (RAM, CPU and software), and then starts the application code (a sketch follows). Many embedded systems recover from short-term power failures by restarting without repeating the full self-tests. Restart times under a tenth of a second are common.
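A hedged sketch of such a start-up sequence is given below; every helper function is assumed to exist elsewhere in the project and is named only for illustration.

  extern void disable_interrupts(void);
  extern void enable_interrupts(void);
  extern void init_hardware(void);      /* clocks, I/O pins, peripherals */
  extern int  ram_test(void);
  extern int  rom_crc_ok(void);
  extern void fault_handler(void);      /* e.g. blink an error pattern */
  extern void application_main(void);   /* never returns */

  void start_up(void)
  {
      disable_interrupts();             /* nothing may fire before setup */
      init_hardware();

      if (!ram_test() || !rom_crc_ok()) /* power-on self test */
          fault_handler();

      enable_interrupts();
      application_main();               /* hand over to the application */
  }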

Many designers have found one or more hardware- and software-controlled LEDs useful for indicating errors during development (and in some instances, after product release, for producing troubleshooting diagnostics). A common scheme is to have the electronics turn on all of the LEDs at reset (thereby proving that power is applied and the LEDs themselves work), after which the software changes the LED pattern as the power-on self-test executes. After that, the software may blink the LEDs or display light patterns during normal operation to indicate program execution progress and/or errors. This serves to reassure most technicians/engineers and some users. An interesting exception: on electric power meters and other items on the street, blinking lights are known to attract attention and vandalism.

Types of embedded software architectures

There are several fundamentally different types of software architecture in common use.

The control loop

In this design, the software simply has a loop that calls subroutines, each of which manages a part of the hardware or software. Interrupts generally set flags or update counters that are read by the rest of the software, as in the sketch below.
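The minimal C sketch below shows the shape of such a loop: the interrupt handler does almost nothing except set a flag, and the loop polls that flag, together with the other service routines, on every pass. The peripheral names are illustrative.

  #include <stdint.h>

  static volatile uint8_t uart_rx_flag;   /* set by the UART interrupt */

  void uart_isr(void)                     /* interrupt handler: keep it short */
  {
      uart_rx_flag = 1;
  }

  static void service_uart(void)
  {
      if (uart_rx_flag) {
          uart_rx_flag = 0;
          /* ... read and process the received byte ... */
      }
  }

  static void service_keypad(void)  { /* ... poll and debounce keys ... */ }
  static void service_display(void) { /* ... refresh the display ...    */ }

  int main(void)
  {
      for (;;) {                          /* the control loop */
          service_uart();
          service_keypad();
          service_display();
      }
  }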

A simple API disables and enables interrupts. Done right, it handles nested calls in nested subroutines, and restores the preceding interrupt state in the outermost enable (see the sketch below). This is one of the simplest methods of creating an exokernel.
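One common way to build such an API is a nesting counter: every "off" call disables interrupts and increments the counter, and only the "on" call that brings the counter back to zero re-enables them. The sketch below assumes low-level cpu_irq_disable()/cpu_irq_enable() primitives exist for the target CPU; a fuller version would save and restore the previous interrupt state instead of assuming interrupts start out enabled.

  extern void cpu_irq_disable(void);   /* assumed CPU-specific primitives */
  extern void cpu_irq_enable(void);

  static volatile int irq_nesting;     /* depth of nested critical sections */

  void interrupts_off(void)
  {
      cpu_irq_disable();
      irq_nesting++;
  }

  void interrupts_on(void)
  {
      if (--irq_nesting == 0)          /* only the outermost caller re-enables */
          cpu_irq_enable();
  }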

Typically, there is some sort of subroutine in the loop that manages a list of software timers, using a periodic real-time interrupt. When a timer expires, an associated subroutine is run or a flag is set.
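One simple way to keep such a timer list, sketched below in C, is for the periodic tick interrupt to count every active timer down while the main loop runs the callback of any timer that has reached zero. The tick source and the fixed number of timer slots are assumptions for illustration.

  #include <stdint.h>

  #define MAX_TIMERS 8u

  struct sw_timer {
      volatile uint32_t ticks_left;    /* 0 means expired (or unused) */
      void (*callback)(void);          /* run from the main loop on expiry */
  };

  static struct sw_timer timers[MAX_TIMERS];

  void start_timer(uint32_t slot, uint32_t ticks, void (*cb)(void))
  {
      timers[slot].ticks_left = ticks;
      timers[slot].callback   = cb;
  }

  void timer_tick_isr(void)            /* called by the periodic interrupt */
  {
      for (uint32_t i = 0; i < MAX_TIMERS; i++)
          if (timers[i].ticks_left > 0u)
              timers[i].ticks_left--;
  }

  void service_timers(void)            /* called once per pass of the loop */
  {
      for (uint32_t i = 0; i < MAX_TIMERS; i++) {
          if (timers[i].callback != 0 && timers[i].ticks_left == 0u) {
              void (*expired)(void) = timers[i].callback;
              timers[i].callback = 0;  /* one-shot: clear before running */
              expired();
          }
      }
  }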

Any expected hardware event should be backed-up with a software timer. Hardware events fail about once in a trillion times. That's about once a year with modern hardware. With a million mass-produced devices, leaving out a software timer is a business disaster.

State machines may be implemented with a function pointer per state machine (in C++, C or assembly, anyway). A change of state stores a different function into the pointer, and the function pointer is executed every time the loop runs, as in the sketch below.
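A minimal C sketch of this pattern follows: the current state is a pointer to the function that handles it, and a state change is simply an assignment of a different function. The button and lamp helpers are illustrative assumptions.

  extern int  button_pressed(void);   /* assumed hardware helpers */
  extern void lamp_on(void);
  extern void lamp_off(void);

  static void state_idle(void);
  static void state_lit(void);

  static void (*current_state)(void) = state_idle;   /* initial state */

  static void state_idle(void)
  {
      if (button_pressed()) {
          lamp_on();
          current_state = state_lit;   /* change state */
      }
  }

  static void state_lit(void)
  {
      if (!button_pressed()) {
          lamp_off();
          current_state = state_idle;
      }
  }

  void run_state_machine(void)         /* called once per control-loop pass */
  {
      current_state();
  }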

Many designers recommend reading each IO device once per loop, and storing the result so the logic acts on consistent values.

Many designers prefer to design their state machines to check only one or two things per state. Usually this is a hardware event, and a software timer.

Designers recommend that hierarchical state machines run the lower-level state machines before the higher-level ones, so the higher levels run with accurate information.

Complex functions like internal combustion engine controls are often handled with multi-dimensional tables. Instead of complex calculations, the code looks up the values. The software can interpolate between entries, to keep the tables small and cheap (a one-dimensional sketch follows).
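The sketch below shows the idea in one dimension, in C, with made-up numbers: the table is indexed by engine speed in coarse steps, and linear interpolation fills in the values between entries. Real engine controllers use two- or three-dimensional tables, but the principle is the same.

  #include <stdint.h>

  /* illustrative fuel values, indexed by engine speed in steps of 1000 RPM */
  static const int16_t fuel_table[8] = { 10, 14, 22, 30, 38, 42, 45, 46 };

  int16_t fuel_lookup(uint16_t rpm)
  {
      uint16_t idx  = rpm / 1000u;     /* which pair of entries to use */
      uint16_t frac = rpm % 1000u;     /* position between them */

      if (idx >= 7u)
          return fuel_table[7];        /* clamp at the top of the table */

      /* linear interpolation between the two neighbouring entries */
      return (int16_t)(fuel_table[idx] +
             ((fuel_table[idx + 1] - fuel_table[idx]) * (int32_t)frac) / 1000);
  }

For example, fuel_lookup(2500) returns 26, halfway between the 2000 RPM entry (22) and the 3000 RPM entry (30).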

One major weakness of this system is that it does not guarantee a time to respond to any particular hardware event.

Careful coding can easily assure that nothing disables interrupts for long. Thus interrupt code can run at very precise timings.

Another major weakness of this system is that it can become complex to add new features. Algorithms that take a long time to run must be carefully broken down so only a little piece gets done each time through the main loop.

This system's strength is its simplicity, and on small pieces of software the loop is usually so fast that nobody cares that it is not predictable.

Another advantage is that this system guarantees that the software will run. There is no mysterious operating system to blame for bad behavior.

Nonpreemptive multitasking

This system is very similar to the above, except that the loop is hidden in an API. One defines a series of tasks, and each task gets its own subroutine stack. Then, when a task is idle, it calls an idle routine (usually called "pause", "wait", "yield", etc.).

An architecture with similar properties is to have an event queue, and a loop that removes events and calls subroutines based on a field in the queue entry (see the sketch below).
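A minimal sketch of this style is shown below: interrupt handlers post events into a small ring buffer, and the loop removes them and dispatches on the type field. The event types and handler names are illustrative.

  #include <stdint.h>

  enum event_type { EV_NONE, EV_KEY, EV_TIMER, EV_UART };

  struct event {
      uint8_t type;
      uint8_t data;
  };

  #define QUEUE_SIZE 16u                  /* power of two */
  static struct event queue[QUEUE_SIZE];
  static volatile uint8_t q_head, q_tail;

  void post_event(uint8_t type, uint8_t data)   /* typically called from ISRs */
  {
      queue[q_head & (QUEUE_SIZE - 1u)] = (struct event){ type, data };
      q_head++;
  }

  static void handle_key(uint8_t key)   { (void)key;  /* ... */ }
  static void handle_timer(uint8_t id)  { (void)id;   /* ... */ }
  static void handle_uart(uint8_t byte) { (void)byte; /* ... */ }

  void event_loop(void)
  {
      for (;;) {
          while (q_tail != q_head) {                     /* events pending */
              struct event e = queue[q_tail & (QUEUE_SIZE - 1u)];
              q_tail++;
              switch (e.type) {                          /* dispatch on the field */
              case EV_KEY:   handle_key(e.data);   break;
              case EV_TIMER: handle_timer(e.data); break;
              case EV_UART:  handle_uart(e.data);  break;
              default:       break;
              }
          }
          /* idle: a real system might sleep here until the next interrupt */
      }
  }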

The advantages and disadvantages are very similar to the control loop, except that adding new software is easier. One simply writes a new task, or adds to the queue-interpreter.

Preemptive timers

Take any of the above systems, but add a timer system that runs subroutines from a timer interrupt. This adds completely new capabilities to the system. For the first time, the timer routines can occur at a guaranteed time.

Also, for the first time, the code can step on its own data structures at unexpected times. The timer routines must be treated with the same care as interrupt routines.

Preemptive tasks

Take the above nonpreemptive task system, and run it from a preemptive timer or other interrupts.

Suddenly the system is quite different. Any piece of task code can damage the data of another task, so tasks must be precisely separated. Access to shared data must be rigidly controlled by some synchronization strategy, for example message queues or semaphores (a sketch of one simple strategy follows). More recently, non-blocking synchronization strategies have also been developed.
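The simplest such strategy on a small system is to wrap every access to shared data in a short critical section, as in the hedged sketch below. It assumes the nesting-safe interrupt off/on pair sketched earlier; a real-time operating system would normally supply a semaphore or mutex instead.

  #include <stdint.h>

  extern void interrupts_off(void);   /* nesting-safe pair, assumed available */
  extern void interrupts_on(void);

  static volatile uint32_t shared_counter;   /* updated by several tasks */

  void shared_counter_add(uint32_t amount)
  {
      interrupts_off();               /* no task switch or ISR can intervene */
      shared_counter += amount;
      interrupts_on();
  }

  uint32_t shared_counter_read(void)
  {
      interrupts_off();
      uint32_t value = shared_counter;   /* consistent snapshot */
      interrupts_on();
      return value;
  }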

Often, at this stage, the developing organization buys a real-time operating system. This can be a wise decision if the organization lacks people with the skills to write one, or if the port of the operating system to the hardware will be used in several products. Otherwise, be aware that it usually adds six to eight weeks to the schedule, and forever after programmers can blame delays on it.

Office-style operating systems

These are popular for embedded projects that have no systems budget. In the opinion of at least one author of this article, they are usually a mistake. Here's the logic:

  • Operating systems are specially-packaged libraries of reusable code. If the code does something useful, the designer saves time and money. If not, it's worthless.
  • Operating systems for business systems lack interfaces to embedded hardware. Example: if one uses Linux to write a motor controller or telephone switch, most of the real control operations end up as numbered functions in an IOCTL call. Meanwhile, the normal read, write, fseek, interface is purposeless. So the operating system actually interferes with development.
  • Most embedded systems perform no office work, so most code of office operating systems is wasted. Example: most embedded systems never use a file system or screen, so file system and GUI logic is wasted. Unused code is just a reliability liability.
  • Office style operating systems protect the hardware from user programs. That is, they interfere with embedded systems development profoundly.
  • Operating systems must invariably be ported to an embedded system. That is, the hardware driver code must always be written anyway. This is the most difficult part of the operating system, so little is saved by using one.
  • The genuinely useful, portable features of operating systems are small pieces of code. Examples: a basic TCP/IP interface is about 3,000 lines of C code, as is a simple file system. If a design needs these, they can be had for less than 10% of the typical embedded system's development budget, without royalty, just by writing them. And, if the needed code is sufficiently generic, the back pages of embedded systems magazines typically have vendors selling royalty-free C implementations.

Nevertheless, embedded Linux is increasing in popularity, especially on more powerful embedded devices such as wireless routers and GPS navigation systems. Here are some of the reasons:

  • Ports to common embedded platforms are available.
  • The ability to configure the distribution to exclude unneeded functionality.
  • The ability to develop embedded applications in user mode, which makes the development process easier and more portable.

Exotic custom operating systems

Some systems require safe, timely, reliable or efficient behavior unobtainable with the above architectures. There are well-known tricks to construct these systems:

  • Hire a real system programmer. They cost a little more, but can save years of debugging, and the associated loss of revenue.
  • RMA (rate monotonic analysis) can be used to find whether a set of tasks can run under a defined hardware system. In its simplest form, the designer assures that the tasks with the shortest periods have the highest priorities, and that, on average, the CPU has at least 30% of its time free (the classic schedulability bound is given after this list).
  • Harmonic tasks optimize CPU efficiency. Basically, designers assure that everything runs from a heartbeat timer. It's hard to do this with a real-time operating system, because these usually switch tasks when they wait for an I/O device.
  • Systems with exactly two levels of priority (usually "running" and "interrupts disabled") cannot have priority inversion problems, in which a higher-priority task waits for a lower-priority task to release a semaphore or other resource.
  • Systems with monitors can't have deadlocks. A monitor locks a region of code from interrupts or other preemption. If the monitor is only applied to small, fast pieces of code, this can work acceptably well. If the monitor API can be proven to run to completion in all cases (say, if it merely disables interrupts), then no hangs are possible.
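For reference, the schedulability bound usually cited for rate monotonic analysis (the Liu and Layland bound) can be written as follows, where $C_i$ is the worst-case computation time and $T_i$ the period of task $i$:

  U = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right)

As $n$ grows, the right-hand side falls toward $\ln 2 \approx 0.693$, which is where rules of thumb like "keep roughly 30% of the CPU free" come from.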

This means that systems that use dual priority and monitors are safe and reliable, because they lack both deadlocks and priority inversion. If the monitors run to completion, they will never hang. If they use harmonic tasks, they can even be fairly efficient. However, RMA cannot characterize these systems, and additional levels of priority had better not exist anywhere, including in the operating system and hardware.
