Minimal instruction set computer

- (Not to be confused with multiple instruction set computer, also abbreviated MISC, such as the HLH Orion or the OROCHI VLIW processor.)
Minimal Instruction Set Computer (MISC) is a processor architecture with a very small number of basic operations and corresponding opcodes. Such instruction sets are commonly stack-based rather than register-based to reduce the size of operand specifiers.
Such a stack machine architecture is inherently simpler since all instructions operate on the top-most stack entries.
The result of the stack architecture is an overall smaller instruction set and a smaller, faster instruction decode unit, with overall faster operation of individual instructions.
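A minimal sketch in Python of how such a stack machine executes, assuming a small hypothetical opcode set (the mnemonics LIT, ADD, DUP, SWAP, and DROP are illustrative names, not any particular chip's instruction set):

```python
# Sketch of a stack-based MISC-style interpreter. Every instruction
# implicitly operates on the top of the stack, so instructions need
# no register or operand specifiers, keeping opcodes small.

def run(program):
    stack = []
    for op, *arg in program:
        if op == "LIT":      # push a literal value
            stack.append(arg[0])
        elif op == "ADD":    # replace top two entries with their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "DUP":    # duplicate the top entry
            stack.append(stack[-1])
        elif op == "SWAP":   # exchange the top two entries
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "DROP":   # discard the top entry
            stack.pop()
    return stack

# Computes 2 + 3, duplicates the result, and adds the copies.
print(run([("LIT", 2), ("LIT", 3), ("ADD",), ("DUP",), ("ADD",)]))  # [10]
```

Because no instruction names an operand location, the decode step reduces to a single opcode lookup, which is what makes the decode unit small and fast.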
Separately from its stack-based character, a MISC architecture is also defined with respect to the number of instructions it supports.
- Typically a Minimal Instruction Set Computer is viewed as having 32 or fewer instructions,[1][2][3] where NOP, RESET and CPUID type instructions are generally not counted by consensus due to their fundamental nature.
- 32 instructions is viewed as the highest allowable number of instructions for a MISC, as 16 or 8 instructions are closer to what is meant by "Minimal Instructions".
- A MISC CPU cannot have zero instructions, as that would be a zero instruction set computer.
- A MISC CPU cannot have one instruction, as that would be a one-instruction set computer.[4]
- The implemented CPU instructions should by default not support a wide set of input widths, so a MISC is typically an 8-bit or 16-bit CPU.
- If a CPU has an NX bit, it is more likely to be viewed as being CISC or RISC.
- MISC chips typically don't have hardware memory protection of any kind unless there is an application specific reason to have the feature.
- If a CPU has a microcode subsystem, that excludes it from being a MISC system.
- The only addressing mode considered acceptable for a MISC CPU to have is load/store, the same as for RISC CPUs.
- MISC CPUs can typically have between 64 KB and 4 GB of accessible addressable memory, but most MISC designs are under 1 megabyte.
Also, the instruction pipelines of MISC designs as a rule tend to be very simple. Deep instruction pipelines, branch prediction, out-of-order execution, register renaming, and speculative execution broadly exclude a CPU from being classified as a MISC architecture.
History
Some of the first digital computers implemented with instruction sets were, by the modern definition, minimal instruction set computers.
Among these various computers, only ILLIAC and ORDVAC had compatible instruction sets.
- Manchester Small-Scale Experimental Machine (SSEM), nicknamed "Baby" (University of Manchester, England), made its first successful run of a stored program on June 21, 1948.
- EDSAC (University of Cambridge, England) was the first practical stored-program electronic computer (May 1949)
- Manchester Mark 1 (University of Manchester, England) Developed from the SSEM (June 1949)
- CSIRAC (Council for Scientific and Industrial Research) Australia (November 1949)
- EDVAC (Ballistic Research Laboratory, Computing Laboratory at Aberdeen Proving Ground 1951)
- ORDVAC (U-Illinois) at Aberdeen Proving Ground, Maryland (completed November 1951)[5]
- IAS machine at Princeton University (January 1952)
- MANIAC I at Los Alamos Scientific Laboratory (March 1952)
- ILLIAC at the University of Illinois, (September 1952)
Early stored-program computers
- The IBM SSEC had the ability to treat instructions as data, and was publicly demonstrated on January 27, 1948. This ability was claimed in a US patent.[6] However it was partially electromechanical, not fully electronic. In practice, instructions were read from paper tape due to its limited memory.[7]
- The Manchester SSEM (the Baby) was the first fully electronic computer to run a stored program. It ran a factoring program for 52 minutes on June 21, 1948, after running a simple division program and a program to show that two numbers were relatively prime.
- The ENIAC was modified to run as a primitive read-only stored-program computer (using the Function Tables for program ROM) and was demonstrated as such on September 16, 1948, running a program by Adele Goldstine for von Neumann.
- The BINAC ran some test programs in February, March, and April 1949, although was not completed until September 1949.
- The Manchester Mark 1 developed from the SSEM project. An intermediate version of the Mark 1 was available to run programs in April 1949, but was not completed until October 1949.
- The EDSAC ran its first program on May 6, 1949.
- The EDVAC was delivered in August 1949, but it had problems that kept it from being put into regular operation until 1951.
- The CSIR Mk I ran its first program in November 1949.
- The SEAC was demonstrated in April 1950.
- The Pilot ACE ran its first program on May 10, 1950 and was demonstrated in December 1950.
- The SWAC was completed in July 1950.
- The Whirlwind was completed in December 1950 and was in actual use in April 1951.
- The first ERA Atlas (later the commercial ERA 1101/UNIVAC 1101) was installed in December 1950.
Design weaknesses
The disadvantage of a MISC is that instructions tend to have more sequential dependencies, reducing overall instruction-level parallelism. In particular, MISC architectures that have much in common with the Forth programming language and the Java virtual machine tend to be weak at exposing instruction-level parallelism.
A technique called macro-op fusion can be used to regain some of the otherwise lost parallelism, albeit at the cost of adding significant complexity to the instruction decoder logic. In this approach, the CPU transparently recognizes whole patterns of instructions, decoding and executing supported patterns as single instructions. For example, DROP DUP or DROP 0 in Forth could be executed as a single instruction (either copying the second stack entry over the top of stack or replacing the top of stack with the false flag value 0), as could common idioms such as + @ or addr + ! (where addr is some literal number representing a base address) to support indexed addressing.
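The decoder side of macro-op fusion can be sketched as a pattern match over adjacent instructions. This is an illustration only; the fusion table and the fused opcode names (COPY2ND, ZERO-TOP, FETCH+) are invented for the example, not taken from any real chip:

```python
# Sketch of macro-op fusion: known adjacent instruction pairs are
# replaced by single fused operations before execution, so each
# fused pair retires in one cycle instead of two.

FUSION = {
    ("DROP", "DUP"): "COPY2ND",    # replace top with a copy of 2nd entry
    ("DROP", "0"):   "ZERO-TOP",   # replace top with the false flag 0
    ("+", "@"):      "FETCH+",     # indexed fetch (base + offset, then load)
}

def fuse(instructions):
    out, i = [], 0
    while i < len(instructions):
        pair = tuple(instructions[i:i + 2])
        if pair in FUSION:
            out.append(FUSION[pair])
            i += 2                 # two instructions become one macro-op
        else:
            out.append(instructions[i])
            i += 1
    return out

print(fuse(["DROP", "DUP", "+", "@", "SWAP"]))
# ['COPY2ND', 'FETCH+', 'SWAP']
```

A real decoder would do this matching in hardware across the fetch window, which is where the added decoder complexity comes from.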
For example, given the following code, which might be found inside the inner loop of a graphics sprite subsystem:
\ Transfer a word of data from a buffer pointed at by P to another
\ buffer pointed at by Q. Update pointers for the next iteration.
\ @ fetches data, ! stores data.
p @ @ q @ !
p @ 256 + p !
q @ 80 + q !
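The effect of the Forth phrase above can be sketched in Python, modeling memory as a word-addressed dictionary. The addresses used here (100, 101, 1000, 2000) are arbitrary illustrative values; only the strides 256 and 80 come from the Forth code:

```python
# What the Forth buffer-transfer phrase does, step by step.
# mem is a flat word-addressed memory; the cells at P and Q hold
# the source and destination buffer pointers.

mem = {100: 1000, 1000: 42, 101: 2000}   # p at 100, q at 101 (arbitrary)
P, Q = 100, 101

src = mem[mem[P]]        # p @ @         : fetch the word the source pointer names
mem[mem[Q]] = src        # q @ !         : store it through the destination pointer
mem[P] = mem[P] + 256    # p @ 256 + p ! : advance the source pointer
mem[Q] = mem[Q] + 80     # q @ 80 + q !  : advance the destination pointer

print(mem[2000], mem[P], mem[Q])  # 42 1256 2080
```

Each Forth token above is one instruction on a simple stack machine, which is why the cycle count below comes out to the raw token count.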
might run in 18 clock cycles, if we ignore instruction fetch overhead and assume the 1 clock per instruction typical of contemporary MISC architectures. It would also access external memory fairly frequently. With macro-op fusion combined with caching techniques, p @ and q @ could be mapped to an internal, unnamed register fetch, assuming a cache of some kind exists for this purpose. In this way, p @ @ (interpreted as a fetch from the address held in the cached p register) can be recognized as a single instruction once the internal register cache is hot. That reduces p @ @ q @ ! from 6 cycles to 2 cycles under these conditions.
Along the same lines, p @ 256 + can be recognized as a single instruction, decoding to a typical 3-operand RISC addition operation, where the top of stack is the destination, the register assigned to p @ as the source, and 256 as the immediate constant. A more advanced decoder can recognize p @ 256 + p ! as a single RISC operation, executing that entire phrase in a single cycle.
Best case, this code can be made to run in only four clock cycles with aggressive macro-op fusion techniques. This would give the programmer the illusion that the CPU is executing roughly four to five instructions per clock cycle.
If we keep everything on the stack,
OVER OVER @ SWAP ! 256 + SWAP 80 + SWAP
we can expect this to run in 11 clocks on a truly minimal microarchitecture. With macro-op fusion, we can recognize OVER OVER as a single 2DUP operation, and SWAP ! occurs with sufficient frequency in real-world programs that it should be treated as a single instruction as well. So the first line can run in as few as 3 cycles. The second line can run in as few as 2 cycles, if N + and SWAP N + SWAP are recognized as valid macro-ops as well. This brings the total runtime down to 5 cycles, with no expensive register caching needed. However, the cost is that the data stack requires random-access capability for the ALU circuitry.
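The cycle accounting above can be checked with a small Python sketch that greedily matches the macro-op patterns named in the text, assuming one cycle per (possibly fused) instruction. This is an illustration of the counting argument, not a model of any real pipeline:

```python
# Rough cycle-count check for the stack-only sequence, under the
# 1-cycle-per-(fused)-instruction assumption used in the text.

SEQ = ["OVER", "OVER", "@", "SWAP", "!", "256", "+", "SWAP", "80", "+", "SWAP"]

# Fusable patterns, longest first: SWAP N + SWAP, OVER OVER (-> 2DUP),
# SWAP ! (fused store), and N + (add-immediate).
PATTERNS = [
    ["SWAP", "256", "+", "SWAP"],
    ["SWAP", "80", "+", "SWAP"],
    ["OVER", "OVER"],
    ["SWAP", "!"],
    ["256", "+"],
    ["80", "+"],
]

def cycles(seq):
    i, n = 0, 0
    while i < len(seq):
        for p in PATTERNS:          # consume the longest matching pattern
            if seq[i:i + len(p)] == p:
                i += len(p)
                break
        else:                       # no pattern: one plain instruction
            i += 1
        n += 1
    return n

print(len(SEQ), cycles(SEQ))  # 11 naive cycles vs. 5 with fusion
```

The five counted cycles correspond to 2DUP, @, the fused SWAP !, the fused 256 +, and the fused SWAP 80 + SWAP, matching the totals in the text.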
Notable CPUs
Probably the most commercially successful MISC was the original INMOS transputer architecture, which had no floating-point unit. However, many eight-bit microcontrollers (for embedded computer applications) also fit into this category.
Each STEREO spacecraft includes two P24 MISC CPUs and two CPU24 MISC CPUs.[8][9]
References
- ^ Chen-hanson Ting and Charles H. Moore. "MuP21--A High Performance MISC Processor". 1995.
- ^ Michael A. Baxter. "Minimal instruction set computer architecture and multiple instruction issue method". 1993.
- ^ Richard Halverson, Jr. and Art Lew. "An FPGA-Based Minimal Instruction Set Computer". 1995. p. 23.
- ^ Kong, J.H.; Ang, L.-M.; Seng, K.P. "Minimal Instruction Set AES Processor using Harvard Architecture". 2010. doi:10.1109/ICCSIT.2010.5564522
- ^ James E. Robertson (1955), Illiac Design Techniques, report number UIUCDCS-R-1955-146, Digital Computer Laboratory, University of Illinois at Urbana-Champaign
- ^ F.E. Hamilton; R.R. Seeber; R.A. Rowley; E.S. Hughes (January 19, 1949). "Selective Sequence Electronic Calculator". US Patent 2,636,672. Issued April 28, 1953. Retrieved April 28, 2011.
- ^ Herbert R.J. Grosch (1991), Computer: Bit Slices From a Life, Third Millennium Books, ISBN 0-88733-085-1
- ^ R. A. Mewaldt, C. M. S. Cohen, W. R. Cook, A. C. Cummings, et al. "The Low-Energy Telescope (LET) and SEP Central Electronics for the STEREO Mission".
- ^ C.T. Russell. "The STEREO Mission". 2008.
External links
- Forth MISC chip designs
- seaForth-24 - the next to latest multi-core MISC design from Chuck Moore
- Green Arrays - the latest multi-core MISC design from Chuck Moore